Build a Computer from NAND Gates
Part 4: Virtual Machine I - Stack Arithmetic
We've built the hardware, designed the CPU, and created an assembler. But writing assembly code directly is still tedious. Modern compilers don't generate assembly. They generate code for an intermediate virtual machine.
A virtual machine (VM) is an abstraction layer between high-level languages and machine code. It provides:
- Portability: The same VM code runs on different hardware
- Simplicity: Easier to target than raw assembly
- Safety: Controlled execution environment
Let's build one.
The stack machine model
Our VM uses a stack as its primary data structure. Instead of registers, values are pushed onto and popped off a stack.
Push the constant 7 onto the stack
Why a stack?
Stack-based VMs are elegant because:
- No register allocation: Compilers don't need to decide which register holds which value
- Simple instructions:
addmeans "pop two values, push their sum" - Natural for expressions:
(a + b) * cbecomespush a, push b, add, push c, mul
The Java Virtual Machine (JVM), Python bytecode, and WebAssembly all use stack-based architectures.
The stack pointer
The stack lives in RAM. A special register called SP (Stack Pointer) tracks where the top of the stack is:
- Push stores a value at RAM[SP] and increments SP
- Pop decrements SP and reads from RAM[SP]
The stack grows upward in memory. Initially SP = 256, leaving room for segment pointers below.
VM commands
Our VM supports three types of commands:
1. Arithmetic commands
These operate on values at the top of the stack:
Binary operations pop two values and push the result:
add: x + ysub: x - y (note: second value minus first)and: bitwise ANDor: bitwise OReq: -1 if x = y, else 0gt: -1 if x > y, else 0lt: -1 if x < y, else 0
Unary operations pop one value and push the result:
neg: -x (arithmetic negation)not: ~x (bitwise NOT)
In our VM, true is represented as -1 (all bits set) and false as 0.
2. Push commands
Push commands add values to the stack. The syntax is:
push segment index
Where segment indicates where to get the value from, and index specifies which element within that segment.
3. Pop commands
Pop commands remove the top value and store it somewhere:
pop segment index
You can't pop to constant (constants are read-only).
Memory segments
The VM divides memory into segments, each serving a specific purpose:
| Segment | Base | Purpose |
|---|---|---|
| local | LCL | Local variables |
| argument | ARG | Function arguments |
| this | THIS | Object fields |
| that | THAT | Array elements |
| constant | - | Virtual segment for constants |
| temp | R5-R12 | Temporary storage |
| pointer | R3-R4 | THIS/THAT pointers |
| static | R16+ | Static variables |
The eight segments
| Segment | Purpose | Implementation |
|---|---|---|
| constant | Virtual segment for constants | push only, value = index |
| local | Function's local variables | base at LCL pointer |
| argument | Function's arguments | base at ARG pointer |
| this | Current object's fields | base at THIS pointer |
| that | Array elements | base at THAT pointer |
| temp | Temporary storage | fixed at RAM[5-12] |
| pointer | THIS/THAT pointers | RAM[3] = this, RAM[4] = that |
| static | Static variables | RAM[16-255], named by file |
Segment implementation
Pointer-based segments (local, argument, this, that):
- The segment's base address is stored in a pointer (LCL, ARG, THIS, THAT)
push local 2reads RAM[LCL + 2]pop argument 0writes to RAM[ARG + 0]
Fixed segments (temp, pointer):
tempis at RAM[5-12], sopush temp 3reads RAM[8]pointer 0is THIS (RAM[3]),pointer 1is THAT (RAM[4])
Static segment:
- Each file gets its own static variables
push static 5in File.vm refers to a variable namedFile.5- These are allocated in RAM[16-255]
Constant segment:
- Virtual (no actual memory location)
push constant 17just pushes the value 17
Why segments matter
Segments separate concerns:
- local and argument support function calls
- this and that support object-oriented programming
- static supports class-level variables
- temp is scratch space for the compiler
A compiler targeting this VM can focus on logic, not memory layout.
Translating VM to assembly
The VM translator converts each VM command into Hack assembly:
Translation patterns
push constant n:
@n
D=A
@SP
A=M
M=D
@SP
M=M+1
This loads n into D, stores it at RAM[SP], and increments SP.
push local i:
@LCL
D=M
@i
A=D+A
D=M
@SP
A=M
M=D
@SP
M=M+1
This computes LCL + i, reads from that address, and pushes the value.
pop local i:
@LCL
D=M
@i
D=D+A
@R13
M=D // R13 = target address
@SP
AM=M-1
D=M // D = popped value
@R13
A=M
M=D // RAM[target] = D
Pop is trickier because we need to compute the target address before popping (the stack pointer changes during the pop).
add:
@SP
AM=M-1
D=M
A=A-1
M=D+M
This pops into D, points to the new top, and adds D to it.
Comparison commands
Comparisons like eq, gt, lt need conditional jumps:
// eq
@SP
AM=M-1
D=M
A=A-1
D=M-D
@TRUE_0
D;JEQ
@SP
A=M-1
M=0
@END_0
0;JMP
(TRUE_0)
@SP
A=M-1
M=-1
(END_0)
This compares the top two values and pushes -1 (true) or 0 (false).
Stack trace visualization
Let's trace through a complete program:
Watch how each command transforms the stack. The trace shows:
- The command being executed
- The stack state after execution
This step-by-step execution helps you understand exactly what the VM does.
Example: Evaluating an expression
Let's trace (3 + 4) * 2:
push constant 3 // stack: [3]
push constant 4 // stack: [3, 4]
add // stack: [7]
push constant 2 // stack: [7, 2]
mul // stack: [14]
The compiler translates the expression into postfix (reverse Polish notation), then each operation pops its arguments and pushes the result.
Stack-based expression evaluation
For any arithmetic expression:
- Convert to postfix notation
- For each token:
- If it's a number, push it
- If it's an operator, pop arguments, compute, push result
This is exactly what our VM does.
Memory layout
Here's how memory is organized:
RAM[0] SP Stack pointer
RAM[1] LCL Local segment base
RAM[2] ARG Argument segment base
RAM[3] THIS This segment base
RAM[4] THAT That segment base
RAM[5-12] temp Temp segment
RAM[13-15] R13-R15 General purpose (used by VM translator)
RAM[16-255] static Static variables
RAM[256+] stack Stack (grows upward)
The segment pointers (LCL, ARG, THIS, THAT) are set by function call/return commands (covered in Part 5).
Error handling
The VM translator should catch:
Syntax errors:
- Unknown command:
multiply(should bemul... wait, we don't havemuleither!) - Missing arguments:
push constant(missing index) - Invalid segment:
push register 0
Semantic errors:
- Pop to constant:
pop constant 0(constants are read-only) - Invalid temp index:
push temp 10(temp only has 8 slots) - Invalid pointer index:
push pointer 2(pointer only has 0 and 1)
A good translator provides clear error messages with line numbers.
What's next
In Part 5, we'll add:
- Branching:
label,goto,if-gotofor control flow - Functions:
function,call,returnfor subroutines - The call stack: Managing frames for recursive functions
These additions transform our simple stack calculator into a complete virtual machine capable of running real programs.
Summary
In this part, we built:
- Stack machine: Push/pop architecture for expression evaluation
- Memory segments: local, argument, this, that, temp, pointer, static, constant
- Arithmetic commands: add, sub, neg, eq, gt, lt, and, or, not
- VM translator: Converts VM commands to Hack assembly
The VM provides a clean abstraction layer. A compiler targeting this VM doesn't need to worry about:
- Register allocation
- Memory management
- Hardware-specific details
It just generates push, pop, and arithmetic commands. The VM translator handles the rest.
Next: Part 5 covers branching and function calls, completing our virtual machine implementation.