How Query Optimizers Work
Your query is slow. You stare at EXPLAIN ANALYZE output, wondering why the database chose a sequential scan when you have a perfectly good index. Or why a simple join takes 30 seconds when similar queries return instantly.
I've been there. And what I discovered surprised me: the database completely ignores how you wrote your query. It considers dozens, sometimes hundreds, of alternative ways to get the same result, estimates the cost of each one, and picks what it thinks is cheapest.
Sometimes it picks wrong.
Understanding how the optimizer makes these decisions is the difference between guessing and knowing. In this post, we'll build that understanding from the ground up: how the database finds rows, how it joins tables, why it sometimes ignores your index, and how to fix it when things go wrong.
The Simplest Query: Finding Rows
Let's start with a basic query:
SELECT * FROM orders WHERE total > 100

How does the database find the rows where total > 100?
Without any special data structures, there's only one option: check every single row. This is called a sequential scan (or "table scan"), where the database reads every row from first to last, checking each one against the filter. Step through it and watch what happens:
The database examines every row, even though only 3 match. With 6 rows this is fine. With 6 million rows, you're waiting minutes instead of milliseconds.
Now step through the index scan. The index already knows which rows match, so it jumps straight to them without checking the rest:
Count how many rows each method examines. Sequential scan: 6. Index scan: 3. But the table still has 6 rows. How did the index know which 3 to skip without looking at them?
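If you prefer code to demos, here's a minimal Python sketch of the two access paths. The table is invented for illustration, and the "index" here is just a pre-sorted copy of the column standing in for the real structure a database maintains:

import bisect

# Toy table: each row is (row_id, total).
rows = [(1, 250), (2, 80), (3, 120), (4, 60), (5, 90), (6, 310)]

# Sequential scan: examine every row, keep the ones that match.
def seq_scan(rows, threshold):
    examined = 0
    matches = []
    for row_id, total in rows:
        examined += 1
        if total > threshold:
            matches.append(row_id)
    return matches, examined

# Toy "index": (total, row_id) pairs kept in sorted order ahead of time.
index = sorted((total, row_id) for row_id, total in rows)

# Index scan: binary-search to the first entry above the threshold,
# then read only the matching entries.
def index_scan(index, threshold):
    start = bisect.bisect_right(index, (threshold, float("inf")))
    matches = [row_id for _, row_id in index[start:]]
    return matches, len(matches)

print(seq_scan(rows, 100))    # ([1, 3, 6], 6): examined all 6 rows
print(index_scan(index, 100)) # ([3, 1, 6], 3): touched only the 3 matching entries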
We'll get to exactly how the index pulls that off soon. But first, let's look at an even more expensive problem.
What Happens When You Join Two Tables
Consider this query:
SELECT customers.name, orders.total
FROM customers
JOIN orders ON customers.id = orders.customer_id

The simplest join algorithm is called a nested loop: for each row in the first table, scan through every row in the second table looking for matches. Step through it:
For each customer, the database scans through all orders checking for matches. When customer 1 finishes, the scan starts over from the beginning for customer 2.
Count the comparisons: 3 customers × 6 orders = 18 comparisons for this tiny example. This is O(n × m), a notation meaning the work grows as the product of the two table sizes. With 1,000 customers and 10,000 orders, that's 10,000,000 comparisons.
Step through slowly and watch for the restart. How many times does order #3 get checked? (It's once per customer.)
The optimizer knows this, so it considers alternatives. For larger tables, it might build a hash table (an in-memory lookup table) from one side, then check each row from the other side against it. That's O(n + m), where work grows as the sum of the table sizes rather than their product. Or it could sort both tables and merge them together in a single pass. The optimizer picks based on table sizes, available memory, and existing indexes.
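To make the contrast concrete, here's a rough Python sketch of the first two strategies, using invented tables (real executors batch rows and spill to disk, but the shape is the same): the nested loop compares every customer against every order, while the hash join builds a dictionary on customer id once and probes it once per order.

customers = [(1, "Ada"), (2, "Grace"), (3, "Alan")]            # (id, name)
orders = [(101, 1, 250), (102, 2, 80), (103, 1, 120),
          (104, 3, 60), (105, 2, 90), (106, 1, 310)]           # (id, customer_id, total)

# Nested loop join: O(n * m) comparisons.
def nested_loop_join(customers, orders):
    comparisons = 0
    results = []
    for cust_id, name in customers:
        for _, order_cust_id, total in orders:
            comparisons += 1
            if order_cust_id == cust_id:
                results.append((name, total))
    return results, comparisons

# Hash join: build a lookup table from one side, probe it with the other. O(n + m).
def hash_join(customers, orders):
    lookup = {cust_id: name for cust_id, name in customers}    # build phase
    results = []
    for _, order_cust_id, total in orders:                     # probe phase
        name = lookup.get(order_cust_id)
        if name is not None:
            results.append((name, total))
    return results, len(customers) + len(orders)

print(nested_loop_join(customers, orders)[1])  # 18 comparisons
print(hash_join(customers, orders)[1])         # 9 operations: 3 to build + 6 to probe

But there's an even more impactful optimization the optimizer can make.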
Why Filter Order Matters
Consider this query:
SELECT customers.name, orders.total
FROM orders
JOIN customers ON orders.customer_id = customers.id
WHERE orders.total > 100

There are two ways to execute this. First, the naive approach: join everything, then filter. Step through it and watch the operations counter climb:
All 18 join comparisons happen first, then the filter discards the rows that don't match. Now step through the smarter approach: filter the orders before the join, so there are far fewer comparisons:
Compare the final operations count between the two demos. The join-first path racks up 36 operations. The filter-first path finishes in 15. Where did the savings come from?
With 100,000 orders where 10% match the filter, join-first does 300,000 comparisons. Filter-first does 130,000. The optimizer just saved you nearly 60% of the work.
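If you'd rather count in code than in the demo, here's a small Python sketch of the two orderings, reusing the invented tables from the join sketch above. (The counts won't exactly match the demo's bookkeeping, but the shape of the savings is the same.)

customers = [(1, "Ada"), (2, "Grace"), (3, "Alan")]
orders = [(101, 1, 250), (102, 2, 80), (103, 1, 120),
          (104, 3, 60), (105, 2, 90), (106, 1, 310)]

def join_then_filter(customers, orders, threshold):
    ops = 0
    results = []
    for cust_id, name in customers:
        for _, order_cust_id, total in orders:
            ops += 1                                   # join comparison
            if order_cust_id == cust_id:
                ops += 1                               # filter check on the joined row
                if total > threshold:
                    results.append((name, total))
    return results, ops

def filter_then_join(customers, orders, threshold):
    ops = 0
    kept = []
    for order in orders:
        ops += 1                                       # filter check before the join
        if order[2] > threshold:
            kept.append(order)
    results = []
    for cust_id, name in customers:
        for _, order_cust_id, total in kept:
            ops += 1                                   # join comparison, smaller inner loop
            if order_cust_id == cust_id:
                results.append((name, total))
    return results, ops

print(join_then_filter(customers, orders, 100)[1])     # 24 operations
print(filter_then_join(customers, orders, 100)[1])     # 15 operations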
This works because SQL is declarative: you specify what you want, not how to get it. The optimizer is free to reorder operations as long as the result is identical. This technique, pushing filters earlier in the execution plan, is called predicate pushdown (a predicate is just a filter condition, like total > 100).
But wait. I wrote the JOIN before the WHERE clause. Doesn't the database follow my order?
No. The optimizer rewrites your query into whatever execution order is cheapest. Your SQL is a specification, not a recipe.
The Trick That Makes Indexes Fast
Earlier, we saw that an index lets the database skip rows. But how does it know which rows to skip?
Most database indexes use a data structure called a B-tree (balanced tree). It works like a decision tree: start at the top, compare your value, follow a pointer, repeat. At each level, you only visit one node. Search for a few values below and watch how the tree navigates:
Notice the pattern? Three levels, three node visits. No matter which value you search for, it takes the same number of steps.
Search for 125, then 100, then 245. Count how many nodes each search visits. Now imagine a tree with 1 million entries but only 3-4 levels. How many comparisons would you need?
This is the power of B-trees: O(log n) lookup instead of O(n). A sequential scan through 1 million rows checks 500,000 on average. A B-tree checks 3-4.
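Here's a stripped-down Python sketch of that traversal. A real B-tree node holds hundreds of keys and lives in a disk page; this toy version only shows the shape of the walk, where each level costs one node visit regardless of table size.

import bisect

# A toy B-tree node: sorted keys plus child pointers; leaves carry row ids instead.
class Node:
    def __init__(self, keys, children=None, row_ids=None):
        self.keys = keys           # sorted keys in this node
        self.children = children   # child nodes (internal nodes only)
        self.row_ids = row_ids     # row locations (leaf nodes only)

def btree_search(node, key, visited=0):
    visited += 1
    if node.children is None:                     # leaf: look for the key here
        i = bisect.bisect_left(node.keys, key)
        found = i < len(node.keys) and node.keys[i] == key
        return (node.row_ids[i] if found else None), visited
    i = bisect.bisect_right(node.keys, key)       # pick which child to descend into
    return btree_search(node.children[i], key, visited)

# A tiny two-level tree over the totals from the earlier examples.
leaves = [Node([60, 80, 90], row_ids=[4, 2, 5]),
          Node([120, 250, 310], row_ids=[3, 1, 6])]
root = Node([100], children=leaves)

print(btree_search(root, 250))  # (1, 2): found row 1 after visiting 2 nodes
print(btree_search(root, 70))   # (None, 2): not found, still only 2 node visits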
Now you can see why certain SQL patterns work the way they do. Range queries like WHERE price BETWEEN 100 AND 200 are fast because you find 100 in the tree, then follow the linked leaf nodes until you hit 200, reading values in order without re-traversing the tree. Leading wildcards like LIKE '%gmail.com' break indexes because the tree sorts values left-to-right, and a leading wildcard means the database can't pick a starting point in the tree. And column order in multi-column indexes matters because the tree sorts by the first column, then the second: an index on (country, city) can answer WHERE country = 'US' but not WHERE city = 'Seattle' alone, since city values are scattered across different country subtrees.
When Does the Optimizer Use an Index?
So B-trees are fast. Why doesn't the optimizer always use them?
Here's the catch: the optimizer sometimes ignores your index.
To use an index, the database must:
- Traverse the B-tree to find matching entries (cheap: O(log n))
- For each match, read the actual row from the table (expensive: random I/O)
Random I/O is expensive because each row fetch might hit a different location on disk. Sequential scans read data in order, which is much faster per byte. Step through the sequential access pattern below and watch the disk head barely move:
Adjacent pages live on nearby tracks, so sequential reads barely move the disk head. Now step through the random access pattern and watch the head jump back and forth:
Run both patterns to completion and compare the time. Now think about this tradeoff at scale: if your query returns 30% of the table, random I/O from an index might actually be slower than just reading everything sequentially.
The optimizer knows this tradeoff. The percentage of rows your query returns is called selectivity. When selectivity is low (say, 5% of the table), the index wins because you only do a small number of random lookups. But as selectivity grows toward 25-30%, those random lookups add up. At some crossover point, it becomes cheaper to just read the entire table sequentially than to bounce around the disk fetching individual rows.
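You can get a feel for the crossover with a back-of-the-envelope cost model in Python. Every constant below is invented for illustration (real optimizers use tuned parameters such as Postgres's seq_page_cost and random_page_cost, plus caching and clustering effects this toy ignores), but the shape is the point: index cost grows with the number of matching rows, sequential cost stays flat.

# Toy cost model with invented constants, just to show where the lines cross.
TOTAL_ROWS = 1_000_000
ROWS_PER_PAGE = 100
TOTAL_PAGES = TOTAL_ROWS // ROWS_PER_PAGE
SEQ_PAGE_COST = 1.0          # reading pages in order is cheap per page
RANDOM_PAGE_COST = 4.0       # an index match may fetch a far-away page
CACHE_HIT_FRACTION = 0.99    # invented: most repeated fetches find the page cached

def seq_scan_cost():
    return TOTAL_PAGES * SEQ_PAGE_COST

def index_scan_cost(selectivity):
    matching_rows = TOTAL_ROWS * selectivity
    # Only the cache misses pay the random-I/O price in this toy model.
    return matching_rows * RANDOM_PAGE_COST * (1 - CACHE_HIT_FRACTION)

for pct in (1, 5, 10, 25, 30, 50):
    s = pct / 100
    winner = "index scan" if index_scan_cost(s) < seq_scan_cost() else "sequential scan"
    print(f"{pct:>2}% of rows match -> {winner}")

With these made-up numbers the crossover lands around 25%. Change any constant and it moves, which is exactly why the optimizer computes the comparison per query instead of following a fixed rule.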
This explains a common frustration: "I added an index, but the optimizer isn't using it." The optimizer isn't broken. It calculated that your query returns too many rows for the index to help.
When Statistics Lie to the Optimizer
Everything we've discussed depends on one thing: the optimizer knowing how many rows each operation will produce. This row count is called cardinality. The optimizer gets cardinality estimates from statistics it maintains: row counts, distinct values per column, and value distribution histograms.
But what happens when these statistics are wrong?
Say you bulk-loaded 10 million rows last night. The statistics still say the table has 100 rows. The optimizer sees WHERE region = 'US', estimates 10 matching rows, and picks a nested loop with index lookups. Fast plan for 10 rows. But in reality, 2 million rows match. Those 2 million random index lookups take orders of magnitude longer than a single sequential scan would have. The optimizer made the right decision for the data it thought it had — it just had the wrong data.
You'll see this in EXPLAIN ANALYZE output as a gap between rows=10 (the estimate) and actual rows=2000000. When those numbers diverge by more than 10x, the plan is almost certainly wrong.
The fix is to run ANALYZE (Postgres) or ANALYZE TABLE (MySQL) to refresh statistics after bulk loads or major data changes. If you loaded millions of rows last night and queries are suddenly slow this morning, this is probably why. Some teams run ANALYZE as the last step of every ETL pipeline to prevent this entirely.
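Here's a small Python sketch of how a stale row count flips the decision. The costs are invented and deliberately crude; the point is that the optimizer compares plan costs computed from its estimate, while you pay the runtime of the chosen plan on the actual data.

# Invented per-plan costs, just to contrast the two choices.
INDEX_LOOKUP_COST = 4.0      # one random fetch per matching row
SEQ_SCAN_COST = 100_000.0    # flat cost to read the whole table once

def choose_plan(estimated_matching_rows):
    index_cost = estimated_matching_rows * INDEX_LOOKUP_COST
    return "index + nested loop" if index_cost < SEQ_SCAN_COST else "sequential scan"

def actual_cost(plan, actual_matching_rows):
    if plan == "index + nested loop":
        return actual_matching_rows * INDEX_LOOKUP_COST
    return SEQ_SCAN_COST

# Stale statistics after a bulk load: the estimate says only ~10 rows will match.
plan = choose_plan(estimated_matching_rows=10)
print(plan, actual_cost(plan, actual_matching_rows=2_000_000))
# index + nested loop 8000000.0  (80x the cost of just scanning the table)

# After ANALYZE refreshes the statistics, the estimate is close to reality.
plan = choose_plan(estimated_matching_rows=2_000_000)
print(plan, actual_cost(plan, actual_matching_rows=2_000_000))
# sequential scan 100000.0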
Why Your Index Doesn't Always Help
You added an index. The query is still slow. What happened?
Some SQL patterns prevent the optimizer from using indexes, and once you understand why, you can spot and fix them.
Wrapping a column in a function
-- Slow: optimizer can't use index on created_at
WHERE YEAR(created_at) = 2024
-- Fast: index works, statistics apply
WHERE created_at >= '2024-01-01'
AND created_at < '2025-01-01'

The index and the statistics both describe raw created_at values: the tree is ordered by them, and the optimizer knows how many rows exist for each date. But you wrapped the column in YEAR(). The index isn't ordered by YEAR(created_at), so the optimizer can't traverse it for this condition, and it has no statistics on the function's output, so it can't estimate the cost either. It falls back to a sequential scan. The fix is to rewrite the condition so the bare column appears on one side of the comparison. This is what database engineers call a sargable predicate ("Search ARGument ABLE"), meaning the optimizer can use it with an index.
Type mismatch
-- Slow: user_id is INTEGER, comparison forces conversion
WHERE user_id = '123'
-- Fast: types match
WHERE user_id = 123

When the types don't match, the database has to apply a conversion before it can compare, and depending on its coercion rules that conversion can land on the column side, forcing a convert-and-compare on every row. When that happens, the index can't be used and the statistics may not apply. The fix is simple: match the type of the value you're comparing against.
Leading wildcard
-- Slow: can't use index
WHERE email LIKE '%@gmail.com'
-- Fast: index works
WHERE email LIKE 'alice@%'Remember how B-trees sort values left-to-right? A leading wildcard means the database can't pick a starting point in the tree. It has to check every value, which is no better than a sequential scan.
OR across different columns
-- Slow: can't use a single index
WHERE status = 'active' OR region = 'US'
-- Fast: can use indexes separately
SELECT ... WHERE status = 'active'
UNION ALL
SELECT ... WHERE region = 'US' AND status != 'active'

An OR between different columns prevents a single index lookup. The optimizer may have to fall back to a sequential scan. Rewriting as a UNION lets the database use one index for each branch.
Seeing What the Optimizer Chose
Every database lets you see the plan it chose. In Postgres:
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;

The output shows:
Index Scan using orders_customer_idx on orders
(cost=0.43..8.45 rows=5 width=56)
(actual time=0.025..0.031 rows=7 loops=1)
Index Cond: (customer_id = 42)
Planning Time: 0.085 ms
Execution Time: 0.052 ms
The most important thing to check is rows=5 vs actual rows=7. When the estimate and reality are close, the optimizer is making good decisions. When the estimate says rows=5 but the actual is rows=5000, you have a statistics problem. Run ANALYZE and try again.
Then check the operation type: did it use an Index Scan or a Seq Scan? For large joins, is it doing a Hash Join or a Nested Loop? If a Nested Loop is running against a large table, that's usually a sign of a cardinality misestimate.
When a query is slow, EXPLAIN ANALYZE is your first diagnostic tool.
Back to Your Slow Query
Remember that slow query from the beginning? Now you can actually diagnose it.
Run EXPLAIN ANALYZE. Look at the estimated rows vs actual rows. If they're wildly different, run ANALYZE and try again. Check whether it's using your index or doing a sequential scan. If it's scanning when it shouldn't be, check if your filter condition has a function wrapping the column, a type mismatch, or a leading wildcard.
The optimizer makes good decisions when it has good information. Your job is to give it that information: fresh statistics, sargable predicates, and indexes on the right columns. When something goes wrong, you now know where to look.
The query isn't slow because databases are mysterious. It's slow for a specific, diagnosable reason. You can find it.