How JIT compilation works
If you've ever benchmarked JavaScript, you've noticed something strange: the first few iterations are sluggish, then suddenly everything speeds up. You're running the same code on the same machine, but it's 5-10x faster.
Paste this into your browser console:
function compute(n) {
let total = 0;
for (let i = 0; i < n; i++) total += i * i;
return total;
}
for (let run = 1; run <= 7; run++) {
const start = performance.now();
compute(1_000_000);
console.log(`Run ${run}: ${(performance.now() - start).toFixed(1)}ms`);
}
You'll see something like:
Run 1: 4.8ms ← cold start
Run 2: 1.1ms ← JIT kicks in
Run 3: 1.2ms
Run 4: 0.7ms ← fully optimized
Run 5: 0.7ms
Run 6: 0.8ms
Run 7: 0.7ms
The first run is slow. Then suddenly, it gets faster. This speedup is the result of Just-In-Time (JIT) compilation. Here's what's actually happening.
Why interpreters are slow
When you write code in Python or JavaScript, your computer doesn't run it directly. CPUs only understand machine code, raw binary instructions. So your code first gets translated into an intermediate form called bytecode.
Bytecode is a sequence of simple instructions like PUSH 5 or ADD. The interpreter reads these one at a time and executes them. Here's what happens to an expression like (5 + 3) * 2:
- The parser breaks it into tokens: (, 5, +, 3, ), *, 2
- It builds a tree representing the structure: multiply (add 5 and 3) by 2
- The compiler walks that tree and emits bytecode instructions
The compiler visits each node in post-order (left child, right child, then the current node) and emits an instruction for it.
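Here's a minimal sketch of that walk for (5 + 3) * 2. The hand-built tree and node shapes below are invented for illustration; a real parser would produce them from the token stream:

// Hand-built AST for (5 + 3) * 2, standing in for the parser's output.
const ast = {
  op: "*",
  left: { op: "+", left: { value: 5 }, right: { value: 3 } },
  right: { value: 2 },
};

// Post-order walk: emit code for both children first, then for the node itself.
function emit(node, bytecode = []) {
  if ("value" in node) {
    bytecode.push(`PUSH ${node.value}`);
  } else {
    emit(node.left, bytecode);
    emit(node.right, bytecode);
    bytecode.push(node.op === "+" ? "ADD" : "MUL");
  }
  return bytecode;
}

console.log(emit(ast)); // ["PUSH 5", "PUSH 3", "ADD", "PUSH 2", "MUL"]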
Most interpreters use a stack to execute this bytecode. PUSH 5 puts 5 on the stack. ADD pops two values, adds them, and pushes the result.
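A toy version of that execution loop, using the made-up instruction format from the sketch above, makes the mechanics concrete:

// Toy stack machine for the bytecode emitted above.
function run(bytecode) {
  const stack = [];
  for (const instr of bytecode) {
    const [op, arg] = instr.split(" ");                            // fetch and decode
    if (op === "PUSH") stack.push(Number(arg));                    // put a constant on the stack
    else if (op === "ADD") stack.push(stack.pop() + stack.pop());  // pop two, push the sum
    else if (op === "MUL") stack.push(stack.pop() * stack.pop());  // pop two, push the product
  }
  return stack.pop(); // the result is left on top of the stack
}

console.log(run(["PUSH 5", "PUSH 3", "ADD", "PUSH 2", "MUL"])); // 16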
For every single instruction, the interpreter must fetch the next bytecode from memory, decode what operation it represents, dispatch to the handler for that operation, and execute the actual computation. That's a lot of overhead just to add two numbers. The CPU spends more time figuring out what to do than actually doing it.
If interpreting is so slow, why not compile everything to machine code upfront, like C does?
There are three problems with this approach. First, startup time explodes. Your program has thousands of functions, and compiling them all before running means waiting seconds or minutes before anything happens. Second, most code never runs. Error handlers, edge cases, initialization code: maybe 90% of your codebase runs rarely or never, so compiling it is wasted effort. Third, dynamic features break. In JavaScript, a function's behavior can change at runtime. How do you compile something that doesn't exist yet?
What if we could start fast like an interpreter, but eventually run fast like compiled code? We just need to be smart about what we compile and when.
JIT compilation solves this by observing first and compiling later.
Finding hot code
Before compiling, the JIT needs to identify which code is worth optimizing. It does this by counting how many times each function runs.
The interpreter maintains a call counter for each function. When the counter crosses a threshold, that function is marked as "hot" and queued for compilation. V8 (Chrome's JavaScript engine) might compile after ~1000 calls, PyPy (an alternative Python runtime) after ~1000 loop iterations. The exact numbers vary, but every JIT only compiles code that actually runs frequently.
This is called profiling, and every JIT compiler does it.
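A stripped-down model of the counter looks like this; the threshold and data structures are made up for illustration, and real engines track far more than a single number:

// Toy profiler: count calls per function, flag it as hot past a threshold.
const HOT_THRESHOLD = 1000;           // illustrative; real engines tune this per tier
const callCounts = new Map();
const hotFunctions = new Set();

function recordCall(fnName) {
  const count = (callCounts.get(fnName) ?? 0) + 1;
  callCounts.set(fnName, count);
  if (count === HOT_THRESHOLD) {
    hotFunctions.add(fnName);         // mark as hot and queue it for compilation
  }
}

// The interpreter would call recordCall("add") on every invocation of add().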
From bytecode to machine code
Once code is hot, the JIT compiles it through several stages. Take a simple function:
function add(a, b) {
return a + b;
}
First, it becomes bytecode:
LOAD_ARG 0
LOAD_ARG 1
ADD
RETURN
The profiler records type information from actual calls:
add(1, 2) → int, int
add(3, 4) → int, int
add(5, 6) → int, int
With this data, the optimizer generates a specialized intermediate representation (IR). Unlike the generic bytecode, this IR assumes both arguments are integers:
LoadInt(arg0)
LoadInt(arg1)
AddInt32
Return
There are no type checks, no dispatch tables, just integer operations. The optimizer can now apply further transformations: inlining, constant folding, dead code elimination.
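Constant folding is the easiest of those to sketch. A toy pass over an expression IR (node shapes invented here) replaces an operation whose inputs are already constants with its result, so the work never reaches the generated code:

// Toy constant folding over a tiny expression IR (node shapes invented for this sketch).
function fold(node) {
  if (node.kind === "const") return node;
  const left = fold(node.left);
  const right = fold(node.right);
  if (node.kind === "addInt32" && left.kind === "const" && right.kind === "const") {
    return { kind: "const", value: left.value + right.value };  // computed at compile time
  }
  return { ...node, left, right };
}

const folded = fold({
  kind: "addInt32",
  left: { kind: "const", value: 5 },
  right: { kind: "const", value: 3 },
});
console.log(folded); // { kind: "const", value: 8 }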
Finally, it emits native machine code. Here's what it might look like in x86 assembly:
mov eax, [rdi] ; load first argument into register
add eax, [rsi] ; add second argument to it
ret            ; return the result in eax
That's just three instructions, with no interpreter loop, no bytecode dispatch, and no type checks. The CPU executes this directly, at full speed.
This is fundamentally different from ahead-of-time compilation. A C compiler must generate code that handles every possible case. A JIT compiler generates code for the common case, because it knows what the common case is from the profiling data.
Tiered compilation
Compilation takes time. If a function only runs 5 times, spending 100ms compiling it to save 0.1ms per call is a terrible trade. But if it runs a million times, that 100ms investment buys back 100 seconds of execution time, a 1,000x return.
Instead of one compiler, you build several. Each tier trades compilation speed for code quality:
| Tier | When to use | Compile time | Execution speed |
|---|---|---|---|
| Interpreter | First few calls | None | 1x (baseline) |
| Baseline JIT | After ~10 calls | Fast (~1ms) | ~3x faster |
| Optimizing JIT | After ~100+ calls | Slow (~10-100ms) | ~10x faster |
Most functions stabilize at the baseline tier. Only the truly hot inner loops justify the full optimizing pass. V8 calls its tiers Ignition (interpreter), Sparkplug (baseline), and TurboFan (optimizing). All of them defer expensive work until they're certain it's worth it.
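The decision logic itself is little more than a pair of thresholds. The numbers below are illustrative, not any engine's real values:

// Illustrative tier selection; thresholds are made up.
const BASELINE_THRESHOLD = 10;
const OPTIMIZE_THRESHOLD = 100;

function chooseTier(callCount) {
  if (callCount >= OPTIMIZE_THRESHOLD) return "optimizing";  // hot: worth the slow compile
  if (callCount >= BASELINE_THRESHOLD) return "baseline";    // warm: cheap compile, decent code
  return "interpreter";                                      // cold: spend nothing yet
}

console.log(chooseTier(3));    // "interpreter"
console.log(chooseTier(42));   // "baseline"
console.log(chooseTier(5000)); // "optimizing"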
Type specialization
In a dynamic language, every operation must check types. Even a simple a + b could mean integer addition, float addition, string concatenation, or a custom __add__ method. How do we eliminate these checks?
Profiling data tells us both how often code runs and what types we see. If the JIT observes consistent types:
add(1, 2) // int
add(3, 4) // int
add(5, 6) // int
It generates a fast path with no type checks:
mov eax, [a]
add eax, [b]
ret
But if types are mixed:
add(1, 2) // int
add("a", "b") // string
add(1.5, 2.5) // float
The JIT must add checks before every operation:
check_type(a)
if int: add_int()
if str: concat()
if float: add_float()
ret
That's three instructions on the fast path versus a chain of type checks and branches on the generic one, so keeping your types consistent makes a real difference.
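You can model the idea in plain JavaScript with a call site that remembers the argument types it has seen. This is a rough sketch of type feedback, not V8's actual machinery:

// Rough model of type feedback at a call site; not how a real engine implements it.
let seenTypes = null;

function addSpecialized(a, b) {
  const types = typeof a + "," + typeof b;
  if (seenTypes === null) seenTypes = types;       // first call: record the observed types
  if (seenTypes === "number,number" && types === seenTypes) {
    return a + b;                                  // fast path: both arguments are numbers
  }
  seenTypes = "megamorphic";                       // mixed types seen: stop specializing
  return a + b;                                    // generic path (numbers, strings, ...)
}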
Deoptimization
We've generated specialized code assuming integers. Then someone calls add("hello", "world"). What now?
Our optimized code would produce garbage (or crash) if we ran integer addition on strings, so the engine needs a way to bail out.
The engine handles this through deoptimization. When type assumptions break, it bails out of optimized code and falls back to the interpreter. It stops the optimized code mid-execution, reconstructs the interpreter's state (stack frames, variables, program counter), resumes in the interpreter at the correct point, and potentially re-profiles and re-compiles with new type information.
Paste this into your console:
function add(a, b) {
return a + b;
}
for (let i = 0; i < 100000; i++) add(i, i);
for (let run = 1; run <= 10; run++) {
if (run === 6) add("x", "y");
const start = performance.now();
for (let i = 0; i < 10_000_000; i++) add(1, 2);
console.log(`Run ${run}: ${(performance.now() - start).toFixed(1)}ms`);
}
You'll see something like:
Run 1: 10.2ms ← cold start
Run 2: 6.5ms ← warming up
Run 3: 2.8ms ← optimized
Run 4: 2.8ms
Run 5: 2.8ms
Run 6: 5.6ms ← deopt spike
Run 7: 2.7ms ← re-optimized
Run 8: 2.8ms
Run 9: 2.8ms
Run 10: 2.9ms
Run 6 spikes because the string call forced the engine to bail out and recompile. V8 recovers quickly, but in a tight loop that spike adds up.
This is why "type pollution" can destroy performance. A single call with the wrong type can invalidate the entire optimized version of a function, forcing future compilations to be more conservative.
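The practical defense is to keep hot functions monomorphic. If you genuinely need both behaviors, one option (a general pattern, not an engine requirement) is to split them so each function only ever sees one type profile:

// One way to avoid polluting a hot function's type profile:
// give each type combination its own function so each stays monomorphic.
function addNumbers(a, b) {
  return a + b;          // only ever called with numbers
}

function concatStrings(a, b) {
  return a + b;          // only ever called with strings
}

// Route each call site to the helper it needs instead of funneling
// mixed types through a single add().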
Summary
Four techniques turn a slow interpreter into something that rivals ahead-of-time compiled languages. The JIT counts executions to identify hot code, then applies compilation effort proportional to how often code runs. It generates specialized code for the types it observes, and falls back to the interpreter when type assumptions break.
This is how JavaScript runs nearly as fast as C in benchmarks, how PyPy makes Python 10x faster, and how the JVM turns bytecode into native speed.
The next time your code runs slow at first and then speeds up, you'll know why.
Real-world JIT engines
Every major language runtime uses these techniques:
| Engine | Language | Tiers |
|---|---|---|
| V8 | JavaScript (Chrome, Node) | Ignition → Sparkplug → TurboFan |
| SpiderMonkey | JavaScript (Firefox) | Baseline → WarpMonkey |
| JavaScriptCore | JavaScript (Safari) | LLInt → Baseline → DFG → FTL |
| PyPy | Python | Tracing JIT (compiles loops) |
| LuaJIT | Lua | Tracing JIT (can approach C performance) |
| HotSpot | Java | C1 (client) → C2 (server) |
| GraalVM | Java, JS, Python, Ruby | Partial evaluation + Truffle |
The implementations vary, but they all watch what actually happens and then optimize for that.