# Challenges
## Concurrency & Consistency Issues
These occur when multiple users or processes try to access or modify the same data at the same time.
- Race Conditions: When the outcome depends on the sequence or timing of uncontrollable events. For example, two users withdrawing $100 from a $150 account at the exact same millisecond; if not handled, both might succeed, leaving the balance at -$50.
- Dirty Reads: A transaction reads data that has been modified by another transaction but not yet committed. If that other transaction rolls back, your data is now "imaginary."
- Non-Repeatable Reads: You read a row, someone else updates it and commits, and you read it again only to find the values have changed.
- Phantom Reads: You query a range of rows (e.g., "all users in New York"), someone else inserts a new user in New York and commits, and your second query suddenly sees a "phantom" row that wasn't there before.
- Deadlocks: Transaction A locks Row 1 and wants Row 2. Transaction B locks Row 2 and wants Row 1. They wait forever for each other.
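The withdrawal race above can often be closed without explicit locks by making the balance check and the debit a single atomic statement. Here is a minimal sketch using Python's built-in `sqlite3` module; the `accounts` table and its columns are illustrative assumptions, not a real schema:

```python
import sqlite3

# Hypothetical schema for illustration: one account with a $150 balance.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 150)")
conn.commit()

def withdraw(conn, account_id, amount):
    """Withdraw only if funds are sufficient.

    The balance check and the debit happen in one UPDATE, so two
    concurrent withdrawals cannot both pass the check and overdraw.
    """
    cur = conn.execute(
        "UPDATE accounts SET balance = balance - ? "
        "WHERE id = ? AND balance >= ?",
        (amount, account_id, amount),
    )
    conn.commit()
    return cur.rowcount == 1  # True if the withdrawal went through

print(withdraw(conn, 1, 100))  # first $100 withdrawal succeeds
print(withdraw(conn, 1, 100))  # second is refused: only $50 left
```

The same conditional-UPDATE pattern works in most SQL databases; the alternative is pessimistic locking (e.g., `SELECT ... FOR UPDATE`), which holds a row lock for the duration of the transaction.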
## Performance & Optimization
These are usually the result of how the application interacts with the database engine.
- N+1 Query Problem: This happens when an application makes one query to fetch a list of items, then executes \(N\) additional queries to fetch related data for each item. Example: fetching 50 blog posts (1 query) and then running a separate query per post to get its comments (50 more queries, 51 in total).
- Missing or Poor Indexing: Without indexes, the DB must perform a "Full Table Scan," looking at every single row to find what you need. Conversely, too many indexes slow down writes (INSERT/UPDATE) because every index must be updated on each change.
- Slow Joins: Joining massive tables without indexes on the join columns or adequate filtering can make execution time blow up; in the worst case (a missing join condition) the result is a full Cartesian product.
- Connection Exhaustion: Every database has a limit on simultaneous connections. If your app doesn't use a "connection pool," it can crash the DB by opening too many doors at once.
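The N+1 problem and its fix can be shown side by side. This is a minimal sketch using Python's built-in `sqlite3` module with made-up `posts` and `comments` tables; the fix collapses the per-post queries into one `IN (...)` query and groups the rows in application code:

```python
import sqlite3
from collections import defaultdict

# Hypothetical blog schema for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT);
INSERT INTO posts VALUES (1, 'First'), (2, 'Second');
INSERT INTO comments VALUES (1, 1, 'Nice'), (2, 1, 'Agreed'), (3, 2, 'Hm');
""")

posts = conn.execute("SELECT id, title FROM posts").fetchall()

# N+1 version: one extra round trip per post.
for post_id, _title in posts:
    conn.execute("SELECT body FROM comments WHERE post_id = ?", (post_id,))

# Fix: fetch comments for ALL listed posts in a single query,
# then bucket them by post in memory.
ids = [p[0] for p in posts]
placeholders = ",".join("?" * len(ids))
rows = conn.execute(
    f"SELECT post_id, body FROM comments WHERE post_id IN ({placeholders})",
    ids,
).fetchall()
comments_by_post = defaultdict(list)
for post_id, body in rows:
    comments_by_post[post_id].append(body)
```

ORMs offer the same batching under names like eager loading (`JOIN`-based or `IN`-based), which is usually the idiomatic fix rather than hand-rolled grouping.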
## Architectural & Scaling Challenges
As data grows, the fundamental structure of the database starts to buckle.
- Data Fragmentation: Over time, as rows are deleted and updated, data becomes physically scattered on the disk, slowing down read speeds.
- Schema Evolution (Migration Pain): Changing a table structure (adding a column to a table with 100 million rows) can lock the table for hours, causing downtime.
- Vertical vs. Horizontal Scaling: You can only buy a bigger server (Vertical) for so long. Eventually, you have to split data across multiple servers (Horizontal), which introduces "Sharding" complexity.
- CAP Theorem Trade-offs: In a distributed system, network partitions will happen, so when one does you must sacrifice either Consistency or Availability. Deciding which to give up, and when, is a major engineering hurdle.
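The sharding complexity mentioned above starts with a routing function: every read and write for a given key must land on the same server. A minimal sketch of hash-based shard routing (the shard names are hypothetical placeholders):

```python
import hashlib

# Hypothetical shard pool; in practice these would be connection
# strings or pool handles for separate database servers.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]

def shard_for(user_id: str) -> str:
    """Map a user ID to one shard, deterministically.

    Hashing (rather than e.g. ranges of IDs) spreads keys evenly,
    and determinism guarantees reads go where the writes went.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]
```

Note the catch this simple modulo scheme illustrates: adding or removing a shard changes `len(SHARDS)` and remaps almost every key, which is why production systems typically use consistent hashing instead.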
## Integrity & Security
- Data Silos: When data is duplicated across different databases and becomes out of sync (e.g., the "Email" field is updated in the Marketing DB but not the Billing DB).
- SQL Injection: Still one of the most common vulnerabilities, where malicious users "inject" SQL into your queries to read, alter, or delete data.
- Stale Data: In systems using "Eventual Consistency" (like many NoSQL DBs), a user might update their profile but still see the old version for several seconds.
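The standard defense against SQL injection is parameterized queries: user input is passed as a bound value, never spliced into the SQL string. A minimal demonstration with Python's built-in `sqlite3` module and a made-up `users` table:

```python
import sqlite3

# Hypothetical schema for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'a@example.com')")

malicious = "alice' OR '1'='1"

# Vulnerable: string interpolation lets the input rewrite the query,
# turning the WHERE clause into "name = 'alice' OR '1'='1'".
leaked = conn.execute(
    f"SELECT email FROM users WHERE name = '{malicious}'"
).fetchall()  # returns every row in the table

# Safe: a parameterized query treats the input as a plain value.
safe = conn.execute(
    "SELECT email FROM users WHERE name = ?", (malicious,)
).fetchall()  # returns nothing: no user is literally named that
```

The `?` placeholder syntax is sqlite3's; other drivers use `%s` or `:name`, but the principle is identical in all of them.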