Skip to content

Top 10 Optimizations for Production

Coming from Django ORM?

In SQLAlchemy, most performance tuning happens through loader strategies, query options, and bulk operations.

1. Solve the N+1 Query Problem

Problem: Querying related data triggers many additional queries.

# Bad: 1 query for books + N queries for authors
books = session.query(Book).all()
for b in books:
    print(b.author.name)

Solution:

from sqlalchemy.orm import joinedload, selectinload

# For many-to-one (e.g. Book → Author)
books = session.query(Book).options(
    joinedload(Book.author)
).all()

# For one-to-many (e.g. Author → Books)
authors = session.query(Author).options(
    selectinload(Author.books)
).all()
Relationship Strategy
Many-to-one joinedload
One-to-many selectinload

2. Load Only Required Columns

Problem: APIs often fetch entire rows when only a few columns are needed.

# Bad
users = session.query(User).all()

# Better
from sqlalchemy.orm import load_only

users = session.query(User).options(
    load_only(User.id, User.email)
)

Reduces network traffic and memory usage — critical in high-traffic APIs.


3. Use yield_per() for Large Datasets

Problem: Loading millions of rows into memory at once.

# Bad
users = session.query(User).all()

# Better
for user in session.query(User).yield_per(1000):
    process(user)

Rows are streamed in batches, avoiding memory exhaustion. Common in ETL jobs, background workers, and analytics pipelines.


4. Bulk Inserts Instead of ORM Loops

Problem: Inserting rows one by one.

# Bad
for data in dataset:
    session.add(User(**data))

# Better
session.bulk_insert_mappings(User, dataset)

Can be 10–100x faster. Common in log ingestion, data migration scripts, and analytics pipelines.


5. Use exists() Instead of Loading Objects

Problem: Loading full rows just to check existence.

# Bad
user = session.query(User).filter_by(email=email).first()
if user:
    ...

# Better
from sqlalchemy import exists

session.query(
    exists().where(User.email == email)
).scalar()

Avoids unnecessary object creation. Used heavily in authentication and uniqueness checks.


6. Avoid ORM for Heavy Reporting Queries

For complex reports, use raw SQL or SQLAlchemy Core queries.

from sqlalchemy import select, func

stmt = select(
    Order.user_id,
    func.sum(Order.total)
).group_by(Order.user_id)

ORM object creation becomes expensive at scale. Production systems often mix ORM for CRUD and SQLAlchemy Core for analytics.


7. Configure Connection Pooling

Problem: DB connection exhaustion and slow connection creation under high concurrency.

engine = create_engine(
    DATABASE_URL,
    pool_size=20,
    max_overflow=40,
    pool_timeout=30
)

Critical for microservices and high-concurrency APIs (FastAPI, Flask).


8. Prevent Duplicate Joins with contains_eager()

Problem: Manually joining tables and then triggering additional queries when accessing the relationship.

# Bad: join exists but accessing book.author triggers another query
query = session.query(Book).join(Book.author)

# Better
from sqlalchemy.orm import contains_eager

query = session.query(Book).join(Book.author).options(
    contains_eager(Book.author)
)

9. Row Locking for Concurrent Writes

Problem: Race conditions when two transactions modify the same row simultaneously (e.g. concurrent withdrawals).

account = (
    session.query(Account)
    .with_for_update()
    .filter(Account.id == acc_id)
    .one()
)

Emits SELECT ... FOR UPDATE. Essential in banking, order systems, and inventory management.


10. Keyset Pagination for Large Tables

Problem: OFFSET-based pagination becomes slow on large tables.

# Bad: slow on large offsets
session.query(Post).offset(10000).limit(20)

# Better: keyset pagination
session.query(Post).filter(Post.id > last_id).limit(20)

Index-friendly and consistently fast. Used in feeds, logs, and messaging systems.


Production Pattern Summary

Pattern Problem Solved
Eager loading N+1 queries
load_only Over-fetching columns
yield_per Memory exhaustion
Bulk insert Slow row-by-row inserts
exists() Unnecessary row loading
Core queries Heavy analytics
Connection pooling Connection exhaustion
contains_eager Duplicate queries from manual joins
Row locking Race conditions
Keyset pagination Slow OFFSET queries

Senior engineers combine multiple patterns

A production-grade query might look like this:

posts = (
    session.query(Post)
    .options(selectinload(Post.comments))
    .options(load_only(Post.id, Post.title))
    .filter(Post.id > last_id)
    .limit(20)
)