Back to Blog

Build a Computer from NAND Gates

Part 5: Virtual Machine II - Program Control

In Part 4, we built a stack machine that evaluates arithmetic expressions. But real programs need more: loops, conditionals, and functions.

This part adds program control commands that transform our calculator into a complete virtual machine.

Branching commands

Three commands enable control flow:

CommandEffect
label LOOPMarks a location in code
goto LOOPUnconditional jump
if-goto LOOPJump if top of stack ≠ 0
Branching Commands: label, goto, if-goto
VM PROGRAM
0push constant 5
1pop local 0
2label LOOP
3push local 0
4push constant 0
5eq
6if-goto END
7push local 0
8push constant 1
9sub
10pop local 0
11goto LOOP
12label END
13push constant 42
PC
0
Counter
5
Stack
[]
EXECUTION LOG
Click Step to begin
Labels: LOOP → line 2, END → line 12.goto = unconditional jump, if-goto = jump if top of stack ≠ 0

How branching works

Labels are markers, not instructions. They don't generate any operations. They just name a location.

goto performs an unconditional jump. The program counter (PC) changes to the label's address.

if-goto pops the top of the stack:

  • If the value is not zero (true), jump to the label
  • If the value is zero (false), continue to the next instruction

This is enough to implement any control structure:

Compiling control structures

If-else:

// if (condition) { A } else { B }
[compute condition]
not
if-goto ELSE
[code for A]
goto END
label ELSE
[code for B]
label END

While loop:

// while (condition) { body }
label LOOP
[compute condition]
not
if-goto END
[loop body]
goto LOOP
label END

The compiler translates high-level control structures into these primitives.

Label scoping

Labels are scoped to functions. When the translator sees label LOOP inside function Main.foo, it generates the label Main.foo$LOOP. This prevents collisions when different functions use the same label names.

Function commands

Functions are the heart of structured programming. Our VM supports three commands:

CommandEffect
function f kDeclare function f with k local variables
call f nCall function f with n arguments
returnReturn to caller
Function Call Mechanism
CALL STACK
Main.main
locals: [10]
WORKING STACK
3
4
Initial State

Caller is about to call mult(3, 4)

Arguments 3 and 4 are on the stack
Frame structure (saved on call): return address, saved LCL, saved ARG, saved THIS, saved THAT

The calling convention

When a function is called, the VM must:

  1. Save the caller's state (return address, segment pointers)
  2. Set up the callee's environment (arguments, local variables)
  3. Transfer control to the called function

When a function returns:

  1. Restore the caller's state
  2. Push the return value where the caller expects it
  3. Transfer control back to the caller

The call frame

Each function call creates a frame on the stack:

┌─────────────────┐
│ argument 0      │ ← ARG points here
│ argument 1      │
│ ...             │
├─────────────────┤
│ return address  │ ← saved caller state
│ saved LCL       │
│ saved ARG       │
│ saved THIS      │
│ saved THAT      │
├─────────────────┤
│ local 0         │ ← LCL points here
│ local 1         │
│ ...             │
├─────────────────┤
│ working stack   │ ← SP points to top
└─────────────────┘

The five saved values (return address + 4 pointers) form the frame. This enables returning to the exact state before the call.

The call stack

Call Stack Frames
CALL STACK (1 frames)
FRAME DETAILS: Sys.init
Return Address
0
LCL
0
ARG
0
THIS / THAT
0 / 0
ARGUMENTS
(none)
LOCALS
(none)
Each function call creates a new frame. On return, the frame is popped and pointers restored.

Each function call pushes a new frame. Each return pops a frame. This creates a stack of frames, the call stack.

Why a stack?

The stack structure naturally supports:

  • Nested calls: f calls g calls h. The functions return in reverse order: h, then g, then f
  • Recursion: Each recursive call gets its own frame
  • Automatic cleanup: Returning pops the frame, freeing its memory

The call stack is one of computer science's most elegant data structures.

Translating function commands

Function Commands Translation
SELECT COMMAND
ASSEMBLY OUTPUT
@RETURN_1
D=A
@SP
A=M
M=D // push return addr
@SP
M=M+1
@LCL
D=M // push LCL
// ... push ARG, THIS, THAT
@SP
D=M
@7
D=D-A
@ARG
M=D // ARG = SP - 7
@SP
D=M
@LCL
M=D // LCL = SP
@Math.mult
0;JMP // jump to function
(RETURN_1)
call f n: Pushes return address + saved frame (5 values), repositions ARG and LCL, jumps to f

function f k

Declares a function with k local variables:

(f)              // Label for function entry
push constant 0  // Initialize local 0
push constant 0  // Initialize local 1
...              // k times total

The label (f) allows call f to jump here.

call f n

Calls function f with n arguments (already on stack):

push return-address  // Where to return
push LCL             // Save caller's LCL
push ARG             // Save caller's ARG
push THIS            // Save caller's THIS
push THAT            // Save caller's THAT
ARG = SP - 5 - n     // Position ARG for callee
LCL = SP             // Position LCL for callee
goto f               // Jump to function
(return-address)     // Label for return

The arguments were pushed before the call, so they're below the saved frame.

return

Returns to the caller:

endFrame = LCL                   // Save frame pointer
retAddr = *(endFrame - 5)        // Get return address
*ARG = pop()                     // Copy return value
SP = ARG + 1                     // Reposition stack
THAT = *(endFrame - 1)           // Restore THAT
THIS = *(endFrame - 2)           // Restore THIS
ARG = *(endFrame - 3)            // Restore ARG
LCL = *(endFrame - 4)            // Restore LCL
goto retAddr                     // Jump back to caller

The return value ends up at ARG[0], exactly where the caller expects it.

Recursion

The call stack naturally supports recursion. Each recursive call gets its own frame with its own locals and arguments:

Recursive Function Execution: Fibonacci
Step 1 / 26
EXECUTION TRACE
fib(4) called
CALL TREE
fib(4)
├─ fib(3)
│  ├─ fib(2)
│  │  ├─ ...
│  └─ fib(1)
└─ fib(2)
   └─ ...
Each recursive call pushes a new frame. The call stack grows with depth O(n) for fib(n).

Stack growth

Recursive functions grow the stack with each call. For fib(n):

  • Depth: O(n) stack frames
  • Work: O(2^n) total operations (exponential!)

Deep recursion can overflow the stack. Real systems set stack limits (typically a few MB).

Tail call optimization

Some recursion can be optimized. A tail call is when a function's last action is calling another function:

function factorial(n, acc)
  if n == 0 return acc
  return factorial(n-1, n*acc)  // Tail call

Since there's nothing to do after the call, we can reuse the current frame instead of creating a new one. Our simple VM doesn't implement this, but real VMs often do.

Bootstrap code

When a VM program starts, who calls the first function?

Bootstrap Code
BOOTSTRAP ASSEMBLY
// Bootstrap code
@256
D=A
@SP
M=D // SP = 256
@Sys.init
0;JMP // call Sys.init
INITIAL MEMORY STATE
R0SP256Stack starts at 256
R1LCL256Local (set by call)
R2ARG256Arguments (set by call)
R3THIS3000Heap pointer
R4THAT4000Heap pointer
The bootstrap code runs before any VM code. It initializes SP to 256 and calls Sys.init, which is the entry point of every VM program.

The bootstrap code runs before any VM code. It:

  1. Sets SP to 256 (stack base address)
  2. Calls Sys.init, the program's entry point

Sys.init is a special function that:

  • Performs any necessary initialization
  • Calls Main.main, your actual program
  • Enters an infinite loop when main returns

Memory layout summary

RAM[0]      SP       Stack pointer (starts at 256)
RAM[1]      LCL      Current function's local base
RAM[2]      ARG      Current function's argument base
RAM[3]      THIS     Current object base
RAM[4]      THAT     Current array base
RAM[5-12]   temp     Temp segment
RAM[13-15]  R13-R15  VM translator scratch space
RAM[16-255] static   Static variables
RAM[256+]   stack    Stack (grows upward)

A complete example

Let's trace a simple program with function calls:

// Main.vm
function Main.main 0
  push constant 3
  push constant 4
  call Math.multiply 2
  return

// Math.vm
function Math.multiply 1
  push constant 0
  pop local 0         // result = 0
  label LOOP
    push argument 1
    push constant 0
    eq
    if-goto END       // if b == 0, done
    push local 0
    push argument 0
    add
    pop local 0       // result += a
    push argument 1
    push constant 1
    sub
    pop argument 1    // b--
    goto LOOP
  label END
  push local 0        // return result
  return

Execution trace for 3 * 4:

  1. Main.main pushes 3 and 4
  2. Calls Math.multiply with 2 arguments
  3. Math.multiply loops 4 times, adding 3 each iteration
  4. Returns 12
  5. Main.main receives 12 on its stack

Error handling

The VM translator should catch:

Call errors:

  • Calling undefined function
  • Wrong number of arguments (can't always detect)

Return errors:

  • Return outside a function
  • Missing return at function end

Label errors:

  • Duplicate label in same function
  • Goto to undefined label

Good error messages include file name, line number, and context.

What we've built

Our VM is now complete:

CategoryCommands
Arithmeticadd, sub, neg, eq, gt, lt, and, or, not
Memorypush/pop segment index
Branchinglabel, goto, if-goto
Functionsfunction, call, return

This is a real virtual machine, similar in capability to the JVM or CLR. It can run any program that:

  • Uses stack-based evaluation
  • Follows our calling convention
  • Fits in available memory

Comparison with real VMs

FeatureOur VMJVMCLR
Stack-basedYesYesYes
Memory segments8ManyMany
Object supportVia THIS/THATNativeNative
Garbage collectionNoYesYes
Exception handlingNoYesYes
Bytecode verificationNoYesYes

Our VM is simpler but captures the essential ideas.

Next steps

We have hardware (Parts 1-2), an assembler (Part 3), and a VM (Parts 4-5). Now we need something to generate VM code.

In Part 6, we'll introduce Jack, a high-level programming language. It's object-based, like Java, but simpler to compile.

Parts 7-8 will build a compiler that translates Jack to VM code. Then Part 9 covers the operating system.

The layers of abstraction are building up:

Jack (high-level language)
     ↓ Compiler (Parts 7-8)
VM code
     ↓ VM Translator (Parts 4-5)
Assembly
     ↓ Assembler (Part 3)
Machine code
     ↓ CPU (Parts 1-2)
Hardware

Each layer hides complexity, letting programmers work at higher levels of abstraction.


Next: Part 6 introduces the Jack programming language.