
How JIT compilation works

If you've ever benchmarked JavaScript, you've noticed something strange: the first few iterations are sluggish, then suddenly everything speeds up. You're running the same code on the same machine, but it's 5-10x faster.

Paste this into your browser console:

function compute(n) {
  let total = 0;
  for (let i = 0; i < n; i++) total += i * i;
  return total;
}
 
for (let run = 1; run <= 7; run++) {
  const start = performance.now();
  compute(1_000_000);
  console.log(`Run ${run}: ${(performance.now() - start).toFixed(1)}ms`);
}

You'll see something like:

Run 1:  4.8ms  ← cold start
Run 2:  1.1ms  ← JIT kicks in
Run 3:  1.2ms
Run 4:  0.7ms  ← fully optimized
Run 5:  0.7ms
Run 6:  0.8ms
Run 7:  0.7ms

The first run is slow. Then suddenly, it gets faster. This speedup is the result of Just-In-Time (JIT) compilation. Here's what's actually happening.

Why interpreters are slow

When you write code in Python or JavaScript, your computer doesn't run it directly. CPUs only understand machine code, raw binary instructions. So your code first gets translated into an intermediate form called bytecode.

Bytecode is a sequence of simple instructions like PUSH 5 or ADD. The interpreter reads these one at a time and executes them. Here's what happens to an expression like (5 + 3) * 2:

  1. The parser breaks it into tokens: (, 5, +, 3, ), *, 2
  2. It builds a tree representing the structure: multiply (add 5 and 3) by 2
  3. The compiler walks that tree and emits bytecode instructions

The compiler visits each node in post-order (left child, right child, then the node itself) and emits an instruction for each. For (5 + 3) * 2, the walk produces:

PUSH 5
PUSH 3
ADD
PUSH 2
MUL

Most interpreters use a stack to execute this bytecode. PUSH 5 puts 5 on the stack. ADD pops two values, adds them, and pushes the result. Executing the sequence above looks like this:

PUSH 5    stack: [5]
PUSH 3    stack: [5, 3]
ADD       stack: [8]
PUSH 2    stack: [8, 2]
MUL       stack: [16]

For every single instruction, the interpreter must fetch the next bytecode from memory, decode what operation it represents, dispatch to the handler for that operation, and execute the actual computation. That's a lot of overhead just to add two numbers. The CPU spends more time figuring out what to do than actually doing it.
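To make that overhead concrete, here is a toy stack-machine interpreter for the bytecode above. It's a sketch for illustration only; the opcode names are the ones used in this post, not any real engine's format.

function run(bytecode) {
  const stack = [];
  let pc = 0;                          // program counter
  while (pc < bytecode.length) {
    const [op, arg] = bytecode[pc++];  // fetch and decode
    switch (op) {                      // dispatch to the handler
      case "PUSH": stack.push(arg); break;
      case "ADD":  stack.push(stack.pop() + stack.pop()); break;
      case "MUL":  stack.push(stack.pop() * stack.pop()); break;
      default: throw new Error(`unknown opcode: ${op}`);
    }
  }
  return stack.pop();
}

// (5 + 3) * 2
run([["PUSH", 5], ["PUSH", 3], ["ADD"], ["PUSH", 2], ["MUL"]]); // 16

Every trip around that while loop is fetch, decode, dispatch, and only then the actual work.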

If interpreting is so slow, why not compile everything to machine code upfront, like C does?

There are three problems with this approach. First, startup time explodes. Your program has thousands of functions, and compiling them all before running means waiting seconds or minutes before anything happens. Second, most code never runs. Error handlers, edge cases, initialization code: maybe 90% of your codebase runs rarely or never, so compiling it is wasted effort. Third, dynamic features get in the way. In JavaScript, functions can be redefined and new code can be created while the program runs. How do you compile, ahead of time, something that doesn't exist yet?

What if we could start fast like an interpreter, but eventually run fast like compiled code? We just need to be smart about what we compile and when.

JIT compilation solves this by observing first and compiling later.

Finding hot code

Before compiling, the JIT needs to identify which code is worth optimizing. It does this by counting how many times each function runs.

The interpreter maintains a call counter for each function. When the counter crosses a threshold, that function is marked as "hot" and queued for compilation. V8 (Chrome's JavaScript engine) might compile after ~1000 calls, PyPy (an alternative Python runtime) after ~1000 loop iterations. The exact numbers vary, but every JIT only compiles code that actually runs frequently.

This is called profiling, and every JIT compiler does it.
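In spirit, the bookkeeping is just a counter per function. The sketch below is illustrative: the threshold is made up, and a real engine does this inside its interpreter loop rather than in user code.

const HOT_THRESHOLD = 1000;       // illustrative; real thresholds vary by engine and tier
const callCounts = new Map();

function recordCall(fn) {
  const count = (callCounts.get(fn) ?? 0) + 1;
  callCounts.set(fn, count);
  if (count === HOT_THRESHOLD) {
    // A real engine would hand the hot function to its compiler here.
    console.log(`${fn.name} is hot; queue it for compilation`);
  }
}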

From bytecode to machine code

Once code is hot, the JIT compiles it through several stages. Take a simple function:

function add(a, b) {
  return a + b;
}

First, it becomes bytecode:

LOAD_ARG 0
LOAD_ARG 1
ADD
RETURN

The profiler records type information from actual calls:

add(1, 2) → int, int
add(3, 4) → int, int
add(5, 6) → int, int

With this data, the optimizer generates a specialized intermediate representation (IR). Unlike the generic bytecode, this IR assumes both arguments are integers:

LoadInt(arg0)
LoadInt(arg1)
AddInt32
Return

There are no type checks, no dispatch tables, just integer operations. The optimizer can now apply further transformations: inlining, constant folding, dead code elimination.
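For example, once add is inlined into a hot caller that happens to pass constants, constant folding can remove the arithmetic entirely. This is a conceptual before/after, not actual engine output:

// What you wrote:
function perimeter() {
  return 2 * add(3, 4);
}

// What the optimizer can reduce it to after inlining add and folding constants:
function perimeter_optimized() {
  return 14;
}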

Finally, it emits native machine code. Here's what it might look like in x86 assembly:

mov eax, edi     ; first argument arrives in edi
add eax, esi     ; add the second argument from esi
ret              ; return the result in eax

That's just three instructions, with no interpreter loop, no bytecode dispatch, and no type checks. The CPU executes this directly, at full speed.

This is fundamentally different from ahead-of-time compilation. A C compiler must generate code that handles every possible case. A JIT compiler generates code for the common case, because it knows what the common case is from the profiling data.

Tiered compilation

Compilation takes time. If a function only runs 5 times, spending 100ms compiling it to save 0.1ms per call is a terrible trade. But if it runs a million times, that same 100ms investment saves 100 seconds, a 1,000x return.

Instead of one compiler, you build several. Each tier trades compilation speed for code quality:

Tier            When to use         Compile time       Execution speed
Interpreter     First few calls     None               1x (baseline)
Baseline JIT    After ~10 calls     Fast (~1ms)        ~3x faster
Optimizing JIT  After ~100+ calls   Slow (~10-100ms)   ~10x faster

Most functions stabilize at the baseline tier. Only the truly hot inner loops justify the full optimizing pass. V8 calls its tiers Ignition (interpreter), Sparkplug (baseline), and TurboFan (optimizing). All of them defer expensive work until they're certain it's worth it.
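The tier-up decision itself can be as simple as two thresholds. A hypothetical sketch (the cutoffs are illustrative, not V8's actual values):

const BASELINE_AT = 10;       // illustrative cutoffs
const OPTIMIZE_AT = 100;

function chooseTier(callCount) {
  if (callCount >= OPTIMIZE_AT) return "optimizing";  // full optimizing JIT
  if (callCount >= BASELINE_AT) return "baseline";    // quick, lightly optimized machine code
  return "interpreter";                               // keep interpreting for now
}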

Type specialization

In a dynamic language, every operation must check types. Even a simple a + b could mean integer addition, float addition, string concatenation, or a custom __add__ method. How do we eliminate these checks?

Profiling data tells us both how often code runs and what types we see. If the JIT observes consistent types:

add(1, 2)   // int
add(3, 4)   // int
add(5, 6)   // int

It generates a fast path with no type checks:

mov eax, [a]
add eax, [b]
ret

But if types are mixed:

add(1, 2)     // int
add("a", "b") // string
add(1.5, 2.5) // float

The JIT must add checks before every operation:

check_type(a)
if int: add_int()
if str: concat()
if float: add_float()
ret

The specialized path is three instructions. The generic path has to check the type and branch before it can do any work at all, on every single call. Keeping your types consistent makes a real difference.
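You can see this with a variation of the earlier console experiment. One call site only ever sees numbers; the other is fed a mix of types first. The exact gap depends on your engine and machine, but the consistently-typed version should come out ahead:

function addNumbers(a, b) { return a + b; }
function addMixed(a, b) { return a + b; }

// Warm up one function with consistent types...
for (let i = 0; i < 100000; i++) addNumbers(i, i);
// ...and the other with a mix of ints, strings, and floats.
for (let i = 0; i < 100000; i++) addMixed(i % 2 ? i : String(i), i % 3 ? i : 1.5);

let start = performance.now();
for (let i = 0; i < 10_000_000; i++) addNumbers(1, 2);
console.log(`consistent types: ${(performance.now() - start).toFixed(1)}ms`);

start = performance.now();
for (let i = 0; i < 10_000_000; i++) addMixed(1, 2);
console.log(`mixed types:      ${(performance.now() - start).toFixed(1)}ms`);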

Deoptimization

We've generated specialized code assuming integers. Then someone calls add("hello", "world"). What now?

Our optimized code would produce garbage (or crash) if we ran integer addition on strings, so the engine needs a way to bail out.

The engine handles this through deoptimization. When type assumptions break, it bails out of optimized code and falls back to the interpreter. It stops the optimized code mid-execution, reconstructs the interpreter's state (stack frames, variables, program counter), resumes in the interpreter at the correct point, and potentially re-profiles and re-compiles with new type information.

Paste this into your console:

function add(a, b) {
  return a + b;
}
 
for (let i = 0; i < 100000; i++) add(i, i);
 
for (let run = 1; run <= 10; run++) {
  if (run === 6) add("x", "y");
  const start = performance.now();
  for (let i = 0; i < 10_000_000; i++) add(1, 2);
  console.log(`Run ${run}: ${(performance.now() - start).toFixed(1)}ms`);
}

You'll see something like:

Run 1:  10.2ms  ← cold start
Run 2:   6.5ms  ← warming up
Run 3:   2.8ms  ← optimized
Run 4:   2.8ms
Run 5:   2.8ms
Run 6:   5.6ms  ← deopt spike
Run 7:   2.7ms  ← re-optimized
Run 8:   2.8ms
Run 9:   2.8ms
Run 10:  2.9ms

Run 6 spikes because the string call forced the engine to bail out and recompile. V8 recovers quickly here, but if the types keep changing, those bail-out-and-recompile cycles add up.

This is why "type pollution" can destroy performance. A single call with the wrong type can invalidate the entire optimized version of a function, forcing future compilations to be more conservative.

Summary

Four techniques turn a slow interpreter into something that rivals ahead-of-time compiled languages. The JIT counts executions to identify hot code, then applies compilation effort proportional to how often code runs. It generates specialized code for the types it observes, and falls back to the interpreter when type assumptions break.

This is how JavaScript runs nearly as fast as C in benchmarks, how PyPy makes Python 10x faster, and how the JVM turns bytecode into native speed.

The next time your code runs slow at first and then speeds up, you'll know why.

Real-world JIT engines

Every major language runtime uses these techniques:

Engine          Language                    Tiers
V8              JavaScript (Chrome, Node)   Ignition → Sparkplug → TurboFan
SpiderMonkey    JavaScript (Firefox)        Baseline → WarpMonkey
JavaScriptCore  JavaScript (Safari)         LLInt → Baseline → DFG → FTL
PyPy            Python                      Tracing JIT (compiles loops)
LuaJIT          Lua                         Tracing JIT (often faster than C)
HotSpot         Java                        C1 (client) → C2 (server)
GraalVM         Java, JS, Python, Ruby      Partial evaluation + Truffle

The implementations vary, but they all watch what actually happens and then optimize for that.