Build a Computer from NAND Gates
Part 5: Virtual Machine II - Program Control
In Part 4, we built a stack machine that evaluates arithmetic expressions. But real programs need more: loops, conditionals, and functions.
This part adds program control commands that transform our calculator into a complete virtual machine.
Branching commands
Three commands enable control flow:
| Command | Effect |
|---|---|
label LOOP | Marks a location in code |
goto LOOP | Unconditional jump |
if-goto LOOP | Jump if top of stack ≠ 0 |
How branching works
Labels are markers, not instructions. They don't generate any operations. They just name a location.
goto performs an unconditional jump. The program counter (PC) changes to the label's address.
if-goto pops the top of the stack:
- If the value is not zero (true), jump to the label
- If the value is zero (false), continue to the next instruction
This is enough to implement any control structure:
Compiling control structures
If-else:
// if (condition) { A } else { B }
[compute condition]
not
if-goto ELSE
[code for A]
goto END
label ELSE
[code for B]
label END
While loop:
// while (condition) { body }
label LOOP
[compute condition]
not
if-goto END
[loop body]
goto LOOP
label END
The compiler translates high-level control structures into these primitives.
Label scoping
Labels are scoped to functions. When the translator sees label LOOP inside function Main.foo, it generates the label Main.foo$LOOP. This prevents collisions when different functions use the same label names.
Function commands
Functions are the heart of structured programming. Our VM supports three commands:
| Command | Effect |
|---|---|
function f k | Declare function f with k local variables |
call f n | Call function f with n arguments |
return | Return to caller |
Caller is about to call mult(3, 4)
The calling convention
When a function is called, the VM must:
- Save the caller's state (return address, segment pointers)
- Set up the callee's environment (arguments, local variables)
- Transfer control to the called function
When a function returns:
- Restore the caller's state
- Push the return value where the caller expects it
- Transfer control back to the caller
The call frame
Each function call creates a frame on the stack:
┌─────────────────┐
│ argument 0 │ ← ARG points here
│ argument 1 │
│ ... │
├─────────────────┤
│ return address │ ← saved caller state
│ saved LCL │
│ saved ARG │
│ saved THIS │
│ saved THAT │
├─────────────────┤
│ local 0 │ ← LCL points here
│ local 1 │
│ ... │
├─────────────────┤
│ working stack │ ← SP points to top
└─────────────────┘
The five saved values (return address + 4 pointers) form the frame. This enables returning to the exact state before the call.
The call stack
Each function call pushes a new frame. Each return pops a frame. This creates a stack of frames, the call stack.
Why a stack?
The stack structure naturally supports:
- Nested calls:
fcallsgcallsh. The functions return in reverse order: h, then g, then f - Recursion: Each recursive call gets its own frame
- Automatic cleanup: Returning pops the frame, freeing its memory
The call stack is one of computer science's most elegant data structures.
Translating function commands
function f k
Declares a function with k local variables:
(f) // Label for function entry
push constant 0 // Initialize local 0
push constant 0 // Initialize local 1
... // k times total
The label (f) allows call f to jump here.
call f n
Calls function f with n arguments (already on stack):
push return-address // Where to return
push LCL // Save caller's LCL
push ARG // Save caller's ARG
push THIS // Save caller's THIS
push THAT // Save caller's THAT
ARG = SP - 5 - n // Position ARG for callee
LCL = SP // Position LCL for callee
goto f // Jump to function
(return-address) // Label for return
The arguments were pushed before the call, so they're below the saved frame.
return
Returns to the caller:
endFrame = LCL // Save frame pointer
retAddr = *(endFrame - 5) // Get return address
*ARG = pop() // Copy return value
SP = ARG + 1 // Reposition stack
THAT = *(endFrame - 1) // Restore THAT
THIS = *(endFrame - 2) // Restore THIS
ARG = *(endFrame - 3) // Restore ARG
LCL = *(endFrame - 4) // Restore LCL
goto retAddr // Jump back to caller
The return value ends up at ARG[0], exactly where the caller expects it.
Recursion
The call stack naturally supports recursion. Each recursive call gets its own frame with its own locals and arguments:
fib(4) ├─ fib(3) │ ├─ fib(2) │ │ ├─ ... │ └─ fib(1) └─ fib(2) └─ ...
Stack growth
Recursive functions grow the stack with each call. For fib(n):
- Depth: O(n) stack frames
- Work: O(2^n) total operations (exponential!)
Deep recursion can overflow the stack. Real systems set stack limits (typically a few MB).
Tail call optimization
Some recursion can be optimized. A tail call is when a function's last action is calling another function:
function factorial(n, acc)
if n == 0 return acc
return factorial(n-1, n*acc) // Tail call
Since there's nothing to do after the call, we can reuse the current frame instead of creating a new one. Our simple VM doesn't implement this, but real VMs often do.
Bootstrap code
When a VM program starts, who calls the first function?
Sys.init, which is the entry point of every VM program.The bootstrap code runs before any VM code. It:
- Sets SP to 256 (stack base address)
- Calls
Sys.init, the program's entry point
Sys.init is a special function that:
- Performs any necessary initialization
- Calls
Main.main, your actual program - Enters an infinite loop when main returns
Memory layout summary
RAM[0] SP Stack pointer (starts at 256)
RAM[1] LCL Current function's local base
RAM[2] ARG Current function's argument base
RAM[3] THIS Current object base
RAM[4] THAT Current array base
RAM[5-12] temp Temp segment
RAM[13-15] R13-R15 VM translator scratch space
RAM[16-255] static Static variables
RAM[256+] stack Stack (grows upward)
A complete example
Let's trace a simple program with function calls:
// Main.vm
function Main.main 0
push constant 3
push constant 4
call Math.multiply 2
return
// Math.vm
function Math.multiply 1
push constant 0
pop local 0 // result = 0
label LOOP
push argument 1
push constant 0
eq
if-goto END // if b == 0, done
push local 0
push argument 0
add
pop local 0 // result += a
push argument 1
push constant 1
sub
pop argument 1 // b--
goto LOOP
label END
push local 0 // return result
return
Execution trace for 3 * 4:
Main.mainpushes 3 and 4- Calls
Math.multiplywith 2 arguments Math.multiplyloops 4 times, adding 3 each iteration- Returns 12
Main.mainreceives 12 on its stack
Error handling
The VM translator should catch:
Call errors:
- Calling undefined function
- Wrong number of arguments (can't always detect)
Return errors:
- Return outside a function
- Missing return at function end
Label errors:
- Duplicate label in same function
- Goto to undefined label
Good error messages include file name, line number, and context.
What we've built
Our VM is now complete:
| Category | Commands |
|---|---|
| Arithmetic | add, sub, neg, eq, gt, lt, and, or, not |
| Memory | push/pop segment index |
| Branching | label, goto, if-goto |
| Functions | function, call, return |
This is a real virtual machine, similar in capability to the JVM or CLR. It can run any program that:
- Uses stack-based evaluation
- Follows our calling convention
- Fits in available memory
Comparison with real VMs
| Feature | Our VM | JVM | CLR |
|---|---|---|---|
| Stack-based | Yes | Yes | Yes |
| Memory segments | 8 | Many | Many |
| Object support | Via THIS/THAT | Native | Native |
| Garbage collection | No | Yes | Yes |
| Exception handling | No | Yes | Yes |
| Bytecode verification | No | Yes | Yes |
Our VM is simpler but captures the essential ideas.
Next steps
We have hardware (Parts 1-2), an assembler (Part 3), and a VM (Parts 4-5). Now we need something to generate VM code.
In Part 6, we'll introduce Jack, a high-level programming language. It's object-based, like Java, but simpler to compile.
Parts 7-8 will build a compiler that translates Jack to VM code. Then Part 9 covers the operating system.
The layers of abstraction are building up:
Jack (high-level language)
↓ Compiler (Parts 7-8)
VM code
↓ VM Translator (Parts 4-5)
Assembly
↓ Assembler (Part 3)
Machine code
↓ CPU (Parts 1-2)
Hardware
Each layer hides complexity, letting programmers work at higher levels of abstraction.
Next: Part 6 introduces the Jack programming language.