Back to Blog

Build a Computer from NAND Gates

Part 8: Compiler II - Code Generation

The frontend gave us an AST. Now we traverse it and emit VM code.

Code generation is surprisingly systematic: each AST node type maps to a specific pattern of VM instructions.

Compiling expressions

Expressions evaluate to a value left on the stack:

Compiling Expressions to VM Code
JACK EXPRESSION
x + 5
VM CODE
push local 0 // x
push constant 5
add
STACK RESULT
[x, 5] → [x+5]
Pattern: Recursively compile operands (pushing results), then emit the operator. This naturally evaluates expressions in postfix order.

The pattern

For any expression:

  1. Recursively compile sub-expressions (they push results)
  2. Emit the operation (which pops operands and pushes result)

This naturally produces postfix evaluation:

  • x + 5push x, push 5, add
  • x + y * zpush x, push y, add, push z, multiply (Jack is left-to-right)

Term compilation

Each term type has its own code:

TermVM Code
Integer constantpush constant n
Variable xpush segment index (from symbol table)
truepush constant 0 then not (yields -1)
false/nullpush constant 0
thispush pointer 0
-exprcompile expr, then neg
~exprcompile expr, then not
(expr)compile expr (parens are just grouping)

Operator compilation

Binary operators pop two values, push one:

OperatorVM Code
+add
-sub
&and
|or
<lt
>gt
=eq
*call Math.multiply 2
/call Math.divide 2

Note: * and / aren't primitive VM operations. We call OS functions.

Compiling statements

Compiling Statements
JACK STATEMENT
let x = 5 + 3;
VM CODE
push constant 5
push constant 3
add
pop local 0 // x
Compile expression, pop result to variable

let statement

let varName = expression;
  1. Compile the expression (pushes value)
  2. Pop into the variable's segment
// let x = 5 + 3;
push constant 5
push constant 3
add
pop local 0     // x

let with array index

let arr[i] = value;

Array assignment is trickier:

  1. Compute target address: arr + i
  2. Compile the value expression
  3. Store value at computed address via THAT
push arr        // base address
push i          // index
add             // arr + i
// compile value...
pop temp 0      // save value temporarily
pop pointer 1   // THAT = arr + i
push temp 0     // retrieve value
pop that 0      // arr[i] = value

We need temp because both the address and value are on the stack.

if statement

if (condition) { then } else { else }

Generates:

[compile condition]
not
if-goto ELSE_n
[compile then statements]
goto END_n
label ELSE_n
[compile else statements]
label END_n

The not inverts the condition because if-goto jumps on true, but we want to skip to ELSE on false.

while statement

while (condition) { body }

Generates:

label LOOP_n
[compile condition]
not
if-goto END_n
[compile body statements]
goto LOOP_n
label END_n

do statement

do subroutineCall;

Compiles the call (which pushes a return value), then discards it:

[compile call]
pop temp 0      // discard return value

Even void functions push 0, so we always pop.

return statement

return expression;  // or just: return;
[compile expression, or push constant 0 for void]
return

Compiling subroutines

Compiling Subroutines
JACK FUNCTION
function int add(int a, int b) {
  return a + b;
}
VM CODE
function MyClass.add 0
push argument 0 // a
push argument 1 // b
add
return
Functions are simplest: arguments are in order, no "this".

Functions

Functions are the simplest. Arguments are already in the argument segment:

function ClassName.funcName nLocals
[compile body]

nLocals is the count of local variables.

Methods

Methods receive the object as argument 0. First thing: set THIS:

function ClassName.methodName nLocals
push argument 0     // "this" was passed
pop pointer 0       // THIS = argument 0
[compile body]

Now this 0 refers to field 0 of the object.

Constructors

Constructors allocate memory and initialize fields:

function ClassName.new nLocals
push constant nFields
call Memory.alloc 1
pop pointer 0       // THIS = new object
[compile initialization statements]
push pointer 0      // return this
return

The caller gets back a pointer to the new object.

Compiling method calls

Compiling Subroutine Calls
JACK CALL
Math.sqrt(x)
VM CODE
push local 0 // x (argument)
call Math.sqrt 1
Function call: push arguments, call with argument count.

Call types

Function call: ClassName.funcName(args)

[push args]
call ClassName.funcName nArgs

Method call on object: obj.methodName(args)

push obj            // object becomes "this"
[push args]
call ClassName.methodName nArgs+1

Method call on this: methodName(args) (inside a method)

push pointer 0      // current object
[push args]
call ClassName.methodName nArgs+1

The key insight: method calls pass the object as an extra first argument.

Resolving call targets

When compiling target.method(args):

  1. Look up target in symbol table
  2. If found → it's an object variable → method call
  3. If not found → target is a class name → function call

This determines whether to push the object and increment nArgs.

Special patterns

Special Code Generation Patterns
JACK
"hello"
VM CODE
push constant 5 // length
call String.new 1
push constant 104 // h
call String.appendChar 2
push constant 101 // e
call String.appendChar 2
// ... l, l, o
Create String, append each character

String constants

Jack strings are objects. "hello" compiles to:

push constant 5         // length
call String.new 1       // create String object
push constant 104       // 'h'
call String.appendChar 2
push constant 101       // 'e'
call String.appendChar 2
// ... and so on

Each character's ASCII code is appended.

Multiplication and division

The VM has no * or / instructions. We use OS calls:

// x * y
push x
push y
call Math.multiply 2

// x / y
push x
push y
call Math.divide 2

The OS implements these using addition and bit shifting.

Boolean constants

true  →  push constant 0; not    // yields -1
false →  push constant 0
null  →  push constant 0

In Jack, -1 (all bits set) is true, 0 is false.

Complete compilation

Complete Jack Compilation
JACK SOURCE
VM OUTPUT
12 lines
function Main.main 2
push constant 1
pop local 0
push constant 2
pop local 1
push local 0
push local 1
add
call Output.printInt 1
pop temp 0
push constant 0
return

The code generator algorithm

compileClass(classNode):
    for each classVarDec:
        add to symbol table (static or field)

    for each subroutineDec:
        compileSubroutine(subroutineDec)

compileSubroutine(sub):
    reset subroutine symbol table
    add parameters to symbol table
    add local variables to symbol table

    emit: function ClassName.subName nLocals

    if constructor:
        emit: allocate memory, set THIS
    if method:
        emit: set THIS from argument 0

    compileStatements(body.statements)

compileStatements(statements):
    for each statement:
        compileStatement(statement)

compileExpression(expr):
    compileTerm(expr.term)
    for each (op, term) in expr.ops:
        compileTerm(term)
        emitOp(op)

Label generation

Control flow needs unique labels. Use a counter:

labelCounter = 0

newLabel(prefix):
    return prefix + "_" + labelCounter++

Generates: IF_ELSE_0, IF_END_0, WHILE_LOOP_1, etc.

The complete pipeline

Complete Compilation Pipeline
Jack Source
class Main { ... }
Tokenizer
KEYWORD:class IDENTIFIER:Main ...
Parser
ClassNode { name: "Main", ... }
Code Generator
function Main.main 0 ...
VM Translator
@256 D=A @SP M=D ...
Assembler
0000000100000000 ...
Hardware
Flip-flops & Logic Gates
From high-level language to transistors—each layer builds on the one below.

From Jack to hardware

Jack Source Code
       ↓ Tokenizer
Token Stream
       ↓ Parser
Abstract Syntax Tree
       ↓ Code Generator
VM Code
       ↓ VM Translator
Assembly Code
       ↓ Assembler
Machine Code (binary)
       ↓ CPU
Execution

Each stage transforms the representation, hiding complexity from the stage above.

Multi-file compilation

Real Jack programs span multiple files:

  1. Compile each .jack file to .vm independently
  2. VM translator processes all .vm files together
  3. Bootstrap code calls Sys.init
  4. Sys.init calls Main.main

The linker (VM translator) combines everything.

Error handling in code generation

Errors can still occur during code generation:

Undefined variable:

let x = undefined;  // 'undefined' not in symbol table

Type errors (limited without a type checker):

let x = obj.field;  // Is obj actually an object?

Wrong argument count:

Math.sqrt(1, 2);    // sqrt takes 1 argument

Our simple compiler may not catch all these. Production compilers have type checkers.

Testing the compiler

Unit tests

Test each compilation function:

  • Expressions: 5, x, x+5, -x, ~b, f(x), arr[i]
  • Statements: let, if, while, do, return
  • Subroutines: function, method, constructor

Integration tests

Compile full programs, run on VM simulator:

  • Simple arithmetic
  • Conditionals and loops
  • Method calls
  • Array manipulation
  • Object creation

Test programs

// Test: 2 + 3 = 5
class Test {
    function void main() {
        do Output.printInt(2 + 3);
        return;
    }
}

Compile, run on VM emulator, verify output.

What we've built

The Jack compiler:

  1. Tokenizes source into tokens
  2. Parses tokens into an AST
  3. Generates VM code from the AST

Combined with earlier parts:

ToolInputOutput
CompilerJack (.jack)VM code (.vm)
VM TranslatorVM code (.vm)Assembly (.asm)
AssemblerAssembly (.asm)Binary (.hack)

Three tools, each doing one transformation.

Next steps

We have a complete software stack! But it relies on OS functions like:

  • Memory.alloc / Memory.deAlloc
  • Math.multiply / Math.divide
  • String.new / String.appendChar
  • Output.printInt / Output.printString
  • Screen.drawPixel / Screen.drawLine
  • Keyboard.readChar

Part 9 implements these operating system functions in Jack itself.


Next: Part 9 builds the operating system in Jack.