Top 10 Optimizations for Production
Coming from Django ORM?
In SQLAlchemy, most performance tuning happens through loader strategies, query options, and bulk operations.
1. Solve the N+1 Query Problem
Problem: Querying related data triggers many additional queries.
```python
# Bad: 1 query for books + N queries for authors
books = session.query(Book).all()
for b in books:
    print(b.author.name)
```
Solution:
```python
from sqlalchemy.orm import joinedload, selectinload

# For many-to-one (e.g. Book → Author)
books = session.query(Book).options(
    joinedload(Book.author)
).all()

# For one-to-many (e.g. Author → Books)
authors = session.query(Author).options(
    selectinload(Author.books)
).all()
```
| Relationship | Strategy |
|---|---|
| Many-to-one | joinedload |
| One-to-many | selectinload |
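Both strategies can be exercised end to end. The sketch below is self-contained but relies on assumptions: illustrative Author/Book models and an in-memory SQLite engine, not the article's own codebase.

```python
from sqlalchemy import create_engine, Column, ForeignKey, Integer, String
from sqlalchemy.orm import (
    Session, declarative_base, joinedload, relationship, selectinload,
)

Base = declarative_base()

class Author(Base):
    __tablename__ = "authors"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    books = relationship("Book", back_populates="author")

class Book(Base):
    __tablename__ = "books"
    id = Column(Integer, primary_key=True)
    title = Column(String)
    author_id = Column(Integer, ForeignKey("authors.id"))
    author = relationship("Author", back_populates="books")

engine = create_engine("sqlite://")  # in-memory DB for demonstration
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Author(name="Ada", books=[Book(title="B1"), Book(title="B2")]))
    session.commit()

with Session(engine) as session:
    # joinedload: the JOIN pulls each book's author in the same statement
    books = session.query(Book).options(joinedload(Book.author)).all()
    names = [b.author.name for b in books]  # no extra queries here

with Session(engine) as session:
    # selectinload: one extra SELECT ... WHERE author_id IN (...) loads
    # every author's books in a single batch
    authors = session.query(Author).options(selectinload(Author.books)).all()
    titles = [b.title for a in authors for b in a.books]
```

Turning on `echo=True` in `create_engine()` makes it easy to count the statements each strategy emits.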
2. Load Only Required Columns
Problem: APIs often fetch entire rows when only a few columns are needed.
```python
# Bad
users = session.query(User).all()

# Better
from sqlalchemy.orm import load_only

users = session.query(User).options(
    load_only(User.id, User.email)
).all()
```
Reduces network traffic and memory usage — critical in high-traffic APIs.
3. Use yield_per() for Large Datasets
Problem: Loading millions of rows into memory at once.
```python
# Bad
users = session.query(User).all()

# Better
for user in session.query(User).yield_per(1000):
    process(user)
```
Rows are streamed in batches, avoiding memory exhaustion. Common in ETL jobs, background workers, and analytics pipelines.
4. Bulk Inserts Instead of ORM Loops
Problem: Inserting rows one by one.
```python
# Bad
for data in dataset:
    session.add(User(**data))

# Better
session.bulk_insert_mappings(User, dataset)
```
Can be 10–100x faster. Common in log ingestion, data migration scripts, and analytics pipelines.
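If you are on SQLAlchemy 1.4 or 2.0, note that bulk_insert_mappings() is documented there as a legacy API; the equivalent path is executing a Core insert() with a list of dicts, which uses the same fast executemany batching. A runnable sketch, with an illustrative User model and synthetic dataset:

```python
from sqlalchemy import Column, Integer, String, create_engine, func, insert, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine("sqlite://")  # in-memory DB for demonstration
Base.metadata.create_all(engine)

dataset = [{"name": f"user{i}"} for i in range(1000)]

with Session(engine) as session:
    # One batched executemany instead of 1000 individual ORM flushes
    session.execute(insert(User), dataset)
    session.commit()
```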
5. Use exists() Instead of Loading Objects
Problem: Loading full rows just to check existence.
```python
# Bad
user = session.query(User).filter_by(email=email).first()
if user:
    ...

# Better
from sqlalchemy import exists

session.query(
    exists().where(User.email == email)
).scalar()
```
Avoids unnecessary object creation. Used heavily in authentication and uniqueness checks.
6. Avoid ORM for Heavy Reporting Queries
For complex reports, use raw SQL or SQLAlchemy Core queries.
```python
from sqlalchemy import select, func

stmt = select(
    Order.user_id,
    func.sum(Order.total)
).group_by(Order.user_id)

# Rows come back as plain tuples; no ORM objects are built
results = session.execute(stmt).all()
```
ORM object creation becomes expensive at scale. Production systems often mix ORM for CRUD and SQLAlchemy Core for analytics.
7. Configure Connection Pooling
Problem: DB connection exhaustion and slow connection creation under high concurrency.
Solution: tune the pool via create_engine() arguments such as pool_size, max_overflow, pool_timeout, and pool_pre_ping. Critical for microservices and high-concurrency APIs (FastAPI, Flask).
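A minimal configuration sketch; the URL is a placeholder and the numbers are illustrative starting points, not recommendations for every workload:

```python
from sqlalchemy import create_engine

engine = create_engine(
    "postgresql+psycopg2://user:pass@db/app",  # placeholder URL
    pool_size=10,        # persistent connections kept per process
    max_overflow=20,     # extra connections allowed under burst load
    pool_timeout=30,     # seconds to wait for a free connection
    pool_recycle=1800,   # recycle connections before the server drops them
    pool_pre_ping=True,  # detect stale connections before handing them out
)
```

Size the pool against your database's max_connections budget divided across all worker processes.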
8. Prevent Duplicate Joins with contains_eager()
Problem: Manually joining tables and then triggering additional queries when accessing the relationship.
```python
# Bad: join exists but accessing book.author triggers another query
query = session.query(Book).join(Book.author)

# Better
from sqlalchemy.orm import contains_eager

query = session.query(Book).join(Book.author).options(
    contains_eager(Book.author)
)
```
9. Row Locking for Concurrent Writes
Problem: Race conditions when two transactions modify the same row simultaneously (e.g. concurrent withdrawals).
Solution: lock the rows with with_for_update(), which emits SELECT ... FOR UPDATE; competing transactions block until the lock is released. Essential in banking, order systems, and inventory management.
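A runnable sketch of the locking pattern. The Account model is an illustrative assumption, and SQLite silently ignores FOR UPDATE, so a production setup would run this against PostgreSQL or MySQL:

```python
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Account(Base):
    __tablename__ = "accounts"
    id = Column(Integer, primary_key=True)
    balance = Column(Integer, nullable=False)

engine = create_engine("sqlite://")  # demo only; SQLite ignores FOR UPDATE
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Account(id=1, balance=100))
    session.commit()

    # SELECT ... FOR UPDATE: the row stays locked until commit/rollback,
    # so a concurrent withdrawal waits instead of reading a stale balance
    account = (
        session.query(Account)
        .filter(Account.id == 1)
        .with_for_update()
        .one()
    )
    account.balance -= 30
    session.commit()
```

Keep the locked section short: every transaction touching the same row queues behind it.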
10. Keyset Pagination for Large Tables
Problem: OFFSET-based pagination becomes slow on large tables.
```python
# Bad: slow on large offsets
session.query(Post).order_by(Post.id).offset(10000).limit(20)

# Better: keyset pagination (a stable ORDER BY is required)
session.query(Post).filter(Post.id > last_id).order_by(Post.id).limit(20)
```
Index-friendly and consistently fast. Used in feeds, logs, and messaging systems.
Production Pattern Summary
| Pattern | Problem Solved |
|---|---|
| Eager loading | N+1 queries |
| load_only | Over-fetching columns |
| yield_per | Memory exhaustion |
| Bulk insert | Slow row-by-row inserts |
| exists() | Unnecessary row loading |
| Core queries | Heavy analytics |
| Connection pooling | Connection exhaustion |
| contains_eager | Duplicate queries from manual joins |
| Row locking | Race conditions |
| Keyset pagination | Slow OFFSET queries |