Top 10 Optimizations for Production¶

Coming from Django ORM?

In SQLAlchemy, most performance tuning happens through loader strategies, query options, and bulk operations.

1. Solve the N+1 Query Problem¶

Problem: Querying related data triggers many additional queries.

# Bad: 1 query for books + N queries for authors
books = session.query(Book).all()
for b in books:
    print(b.author.name)

Solution:

from sqlalchemy.orm import joinedload, selectinload

# For many-to-one (e.g. Book → Author)
books = session.query(Book).options(
    joinedload(Book.author)
).all()

# For one-to-many (e.g. Author → Books)
authors = session.query(Author).options(
    selectinload(Author.books)
).all()

Relationship	Strategy
Many-to-one	`joinedload`
One-to-many	`selectinload`

2. Load Only Required Columns¶

Problem: APIs often fetch entire rows when only a few columns are needed.

# Bad
users = session.query(User).all()

# Better
from sqlalchemy.orm import load_only

users = session.query(User).options(
    load_only(User.id, User.email)
)

Reduces network traffic and memory usage — critical in high-traffic APIs.

3. Use `yield_per()` for Large Datasets¶

Problem: Loading millions of rows into memory at once.

# Bad
users = session.query(User).all()

# Better
for user in session.query(User).yield_per(1000):
    process(user)

Rows are streamed in batches, avoiding memory exhaustion. Common in ETL jobs, background workers, and analytics pipelines.

4. Bulk Inserts Instead of ORM Loops¶

Problem: Inserting rows one by one.

# Bad
for data in dataset:
    session.add(User(**data))

# Better
session.bulk_insert_mappings(User, dataset)

Can be 10–100x faster. Common in log ingestion, data migration scripts, and analytics pipelines.

5. Use `exists()` Instead of Loading Objects¶

Problem: Loading full rows just to check existence.

# Bad
user = session.query(User).filter_by(email=email).first()
if user:
    ...

# Better
from sqlalchemy import exists

session.query(
    exists().where(User.email == email)
).scalar()

Avoids unnecessary object creation. Used heavily in authentication and uniqueness checks.

6. Avoid ORM for Heavy Reporting Queries¶

For complex reports, use raw SQL or SQLAlchemy Core queries.

from sqlalchemy import select, func

stmt = select(
    Order.user_id,
    func.sum(Order.total)
).group_by(Order.user_id)

ORM object creation becomes expensive at scale. Production systems often mix ORM for CRUD and SQLAlchemy Core for analytics.

7. Configure Connection Pooling¶

Problem: DB connection exhaustion and slow connection creation under high concurrency.

engine = create_engine(
    DATABASE_URL,
    pool_size=20,
    max_overflow=40,
    pool_timeout=30
)

Critical for microservices and high-concurrency APIs (FastAPI, Flask).

8. Prevent Duplicate Joins with `contains_eager()`¶

Problem: Manually joining tables and then triggering additional queries when accessing the relationship.

# Bad: join exists but accessing book.author triggers another query
query = session.query(Book).join(Book.author)

# Better
from sqlalchemy.orm import contains_eager

query = session.query(Book).join(Book.author).options(
    contains_eager(Book.author)
)

9. Row Locking for Concurrent Writes¶

Problem: Race conditions when two transactions modify the same row simultaneously (e.g. concurrent withdrawals).

account = (
    session.query(Account)
    .with_for_update()
    .filter(Account.id == acc_id)
    .one()
)

Emits SELECT ... FOR UPDATE. Essential in banking, order systems, and inventory management.

10. Keyset Pagination for Large Tables¶

Problem: OFFSET-based pagination becomes slow on large tables.

# Bad: slow on large offsets
session.query(Post).offset(10000).limit(20)

# Better: keyset pagination
session.query(Post).filter(Post.id > last_id).limit(20)

Index-friendly and consistently fast. Used in feeds, logs, and messaging systems.

Production Pattern Summary¶

Pattern	Problem Solved
Eager loading	N+1 queries
`load_only`	Over-fetching columns
`yield_per`	Memory exhaustion
Bulk insert	Slow row-by-row inserts
`exists()`	Unnecessary row loading
Core queries	Heavy analytics
Connection pooling	Connection exhaustion
`contains_eager`	Duplicate queries from manual joins
Row locking	Race conditions
Keyset pagination	Slow `OFFSET` queries

Senior engineers combine multiple patterns

A production-grade query might look like this:

posts = (
    session.query(Post)
    .options(selectinload(Post.comments))
    .options(load_only(Post.id, Post.title))
    .filter(Post.id > last_id)
    .limit(20)
)