Build a Computer from NAND Gates
Part 8: Compiler II - Code Generation
The frontend gave us an AST. Now we traverse it and emit VM code.
Code generation is surprisingly systematic: each AST node type maps to a specific pattern of VM instructions.
Compiling expressions
Expressions evaluate to a value left on the stack:
The pattern
For any expression:
- Recursively compile sub-expressions (they push results)
- Emit the operation (which pops operands and pushes result)
This naturally produces postfix evaluation:
x + 5→push x, push 5, addx + y * z→push x, push y, add, push z, multiply(Jack is left-to-right)
Term compilation
Each term type has its own code:
| Term | VM Code |
|---|---|
| Integer constant | push constant n |
| Variable x | push segment index (from symbol table) |
| true | push constant 0 then not (yields -1) |
| false/null | push constant 0 |
| this | push pointer 0 |
| -expr | compile expr, then neg |
| ~expr | compile expr, then not |
| (expr) | compile expr (parens are just grouping) |
Operator compilation
Binary operators pop two values, push one:
| Operator | VM Code |
|---|---|
+ | add |
- | sub |
& | and |
| | or |
< | lt |
> | gt |
= | eq |
| * | call Math.multiply 2 |
| / | call Math.divide 2 |
Note: * and / aren't primitive VM operations. We call OS functions.
Compiling statements
let x = 5 + 3;
let statement
let varName = expression;
- Compile the expression (pushes value)
- Pop into the variable's segment
// let x = 5 + 3;
push constant 5
push constant 3
add
pop local 0 // x
let with array index
let arr[i] = value;
Array assignment is trickier:
- Compute target address:
arr + i - Compile the value expression
- Store value at computed address via THAT
push arr // base address
push i // index
add // arr + i
// compile value...
pop temp 0 // save value temporarily
pop pointer 1 // THAT = arr + i
push temp 0 // retrieve value
pop that 0 // arr[i] = value
We need temp because both the address and value are on the stack.
if statement
if (condition) { then } else { else }
Generates:
[compile condition]
not
if-goto ELSE_n
[compile then statements]
goto END_n
label ELSE_n
[compile else statements]
label END_n
The not inverts the condition because if-goto jumps on true, but we want to skip to ELSE on false.
while statement
while (condition) { body }
Generates:
label LOOP_n
[compile condition]
not
if-goto END_n
[compile body statements]
goto LOOP_n
label END_n
do statement
do subroutineCall;
Compiles the call (which pushes a return value), then discards it:
[compile call]
pop temp 0 // discard return value
Even void functions push 0, so we always pop.
return statement
return expression; // or just: return;
[compile expression, or push constant 0 for void]
return
Compiling subroutines
function int add(int a, int b) {
return a + b;
}Functions
Functions are the simplest. Arguments are already in the argument segment:
function ClassName.funcName nLocals
[compile body]
nLocals is the count of local variables.
Methods
Methods receive the object as argument 0. First thing: set THIS:
function ClassName.methodName nLocals
push argument 0 // "this" was passed
pop pointer 0 // THIS = argument 0
[compile body]
Now this 0 refers to field 0 of the object.
Constructors
Constructors allocate memory and initialize fields:
function ClassName.new nLocals
push constant nFields
call Memory.alloc 1
pop pointer 0 // THIS = new object
[compile initialization statements]
push pointer 0 // return this
return
The caller gets back a pointer to the new object.
Compiling method calls
Call types
Function call: ClassName.funcName(args)
[push args]
call ClassName.funcName nArgs
Method call on object: obj.methodName(args)
push obj // object becomes "this"
[push args]
call ClassName.methodName nArgs+1
Method call on this: methodName(args) (inside a method)
push pointer 0 // current object
[push args]
call ClassName.methodName nArgs+1
The key insight: method calls pass the object as an extra first argument.
Resolving call targets
When compiling target.method(args):
- Look up
targetin symbol table - If found → it's an object variable → method call
- If not found →
targetis a class name → function call
This determines whether to push the object and increment nArgs.
Special patterns
String constants
Jack strings are objects. "hello" compiles to:
push constant 5 // length
call String.new 1 // create String object
push constant 104 // 'h'
call String.appendChar 2
push constant 101 // 'e'
call String.appendChar 2
// ... and so on
Each character's ASCII code is appended.
Multiplication and division
The VM has no * or / instructions. We use OS calls:
// x * y
push x
push y
call Math.multiply 2
// x / y
push x
push y
call Math.divide 2
The OS implements these using addition and bit shifting.
Boolean constants
true → push constant 0; not // yields -1
false → push constant 0
null → push constant 0
In Jack, -1 (all bits set) is true, 0 is false.
Complete compilation
The code generator algorithm
compileClass(classNode):
for each classVarDec:
add to symbol table (static or field)
for each subroutineDec:
compileSubroutine(subroutineDec)
compileSubroutine(sub):
reset subroutine symbol table
add parameters to symbol table
add local variables to symbol table
emit: function ClassName.subName nLocals
if constructor:
emit: allocate memory, set THIS
if method:
emit: set THIS from argument 0
compileStatements(body.statements)
compileStatements(statements):
for each statement:
compileStatement(statement)
compileExpression(expr):
compileTerm(expr.term)
for each (op, term) in expr.ops:
compileTerm(term)
emitOp(op)
Label generation
Control flow needs unique labels. Use a counter:
labelCounter = 0
newLabel(prefix):
return prefix + "_" + labelCounter++
Generates: IF_ELSE_0, IF_END_0, WHILE_LOOP_1, etc.
The complete pipeline
From Jack to hardware
Jack Source Code
↓ Tokenizer
Token Stream
↓ Parser
Abstract Syntax Tree
↓ Code Generator
VM Code
↓ VM Translator
Assembly Code
↓ Assembler
Machine Code (binary)
↓ CPU
Execution
Each stage transforms the representation, hiding complexity from the stage above.
Multi-file compilation
Real Jack programs span multiple files:
- Compile each
.jackfile to.vmindependently - VM translator processes all
.vmfiles together - Bootstrap code calls
Sys.init Sys.initcallsMain.main
The linker (VM translator) combines everything.
Error handling in code generation
Errors can still occur during code generation:
Undefined variable:
let x = undefined; // 'undefined' not in symbol table
Type errors (limited without a type checker):
let x = obj.field; // Is obj actually an object?
Wrong argument count:
Math.sqrt(1, 2); // sqrt takes 1 argument
Our simple compiler may not catch all these. Production compilers have type checkers.
Testing the compiler
Unit tests
Test each compilation function:
- Expressions:
5,x,x+5,-x,~b,f(x),arr[i] - Statements: let, if, while, do, return
- Subroutines: function, method, constructor
Integration tests
Compile full programs, run on VM simulator:
- Simple arithmetic
- Conditionals and loops
- Method calls
- Array manipulation
- Object creation
Test programs
// Test: 2 + 3 = 5
class Test {
function void main() {
do Output.printInt(2 + 3);
return;
}
}
Compile, run on VM emulator, verify output.
What we've built
The Jack compiler:
- Tokenizes source into tokens
- Parses tokens into an AST
- Generates VM code from the AST
Combined with earlier parts:
| Tool | Input | Output |
|---|---|---|
| Compiler | Jack (.jack) | VM code (.vm) |
| VM Translator | VM code (.vm) | Assembly (.asm) |
| Assembler | Assembly (.asm) | Binary (.hack) |
Three tools, each doing one transformation.
Next steps
We have a complete software stack! But it relies on OS functions like:
Memory.alloc/Memory.deAllocMath.multiply/Math.divideString.new/String.appendCharOutput.printInt/Output.printStringScreen.drawPixel/Screen.drawLineKeyboard.readChar
Part 9 implements these operating system functions in Jack itself.
Next: Part 9 builds the operating system in Jack.