Build a Computer from NAND Gates
Part 6: The Jack Programming Language
We have hardware, assembly, and a virtual machine. But programming in VM code is still tedious. We need a high-level language.
Meet Jack, a simple, Java-like language designed for our platform. It has:
- Classes and objects
- Methods and functions
- Variables and expressions
- Control flow (if, while)
- Arrays and strings
Jack is simpler than Java or Python, but powerful enough for real programs.
Language overview
class Main {
// Class variables
static int count;
field int value;
// Subroutines
constructor Main new() { ... }
function void helper() { ... }
method int getValue() { ... }
}Core concepts
Classes: Every Jack program is a collection of classes. Each class lives in its own file (ClassName.jack).
Subroutines: Three types exist:
- Constructors create and return new objects
- Methods operate on objects (receive implicit "this")
- Functions are static (no object context)
Variables:
- static: Shared across all objects of a class
- field: Per-object instance variables
- var: Local variables within a subroutine
- Parameters: Passed by value
Types:
- int: 16-bit signed integer (-32768 to 32767)
- char: Unicode character
- boolean: true or false
- ClassName: Reference to object of that type
Jack vs other languages
| Feature | Jack | Java | Python |
|---|---|---|---|
| Variable Declaration | var int x; | int x; | x: int |
| Assignment | let x = 5; | x = 5; | x = 5 |
| Equality | if (x = y) | if (x == y) | if x == y: |
| Method Call | do obj.print(); | obj.print(); | obj.print() |
| Constructor | let p = Point.new(1, 2); | Point p = new Point(1, 2); | p = Point(1, 2) |
| Array Access | let arr[i] = 5; | arr[i] = 5; | arr[i] = 5 |
| Void Return | return; | return; | return |
let for assignment, do for void calls,= for comparison, and explicit ClassName.new() for object creation.Key differences
Assignment uses let: Unlike most languages, Jack requires let x = 5; instead of just x = 5;.
Void calls use do: When calling a function for its side effect, use do Output.print();.
Equality is =: Jack uses single = for comparison (like == in other languages). There's no assignment expression.
Constructors are explicit: You write Point.new(x, y) instead of new Point(x, y).
No garbage collection: Objects allocated with new aren't automatically freed. The OS provides Memory.deAlloc().
These differences simplify the compiler while keeping the language expressive.
Object-based programming
class Point {
field int x;
field int y;
}How objects work
Each object is a block of memory on the heap:
- Constructor calls
Memory.alloc(n)to get n words - Field variables are stored at offsets from the base address
- this pointer holds the base address
When you write let p = Point.new(3, 4):
- Memory is allocated (e.g., at address 3000)
- Fields are initialized (RAM[3000] = 3, RAM[3001] = 4)
- The constructor returns 3000
- Variable
pstores 3000
Method calls pass this as the first argument. When p.getX() is called:
- Push 3000 (value of p) as argument 0
- Jump to Point.getX
- Method accesses x as
argument 0 + offset 0
Object vs class
Jack is object-based, not fully object-oriented:
- No inheritance (no "extends")
- No polymorphism (no method overriding)
- No interfaces
This keeps the compiler simple while supporting essential object concepts.
Keywords and symbols
The 21 keywords
Jack has exactly 21 reserved words. They fall into categories:
Program structure (4): class, constructor, function, method
Variable kinds (3): field, static, var
Types (4): int, char, boolean, void
Constants (4): true, false, null, this
Statements (6): let, do, if, else, while, return
The 19 symbols
Symbols serve as operators and punctuation:
Grouping (6): { } ( ) [ ]
Arithmetic (5): + - * / ~
Comparison (4): < > = & (where & is logical AND, | is OR)
Punctuation (3): . , ;
Everything else is either a keyword, number, string, or identifier.
Tokenization
Before parsing, the compiler breaks source code into tokens:
Token types
| Type | Examples | Description |
|---|---|---|
| KEYWORD | class, if, let | Reserved words |
| SYMBOL | { + ; | Single characters |
| INT_CONST | 0, 42, 32767 | Integer literals |
| STRING_CONST | "hello" | String literals |
| IDENTIFIER | x, Main, getValue | Names |
Tokenization rules
- Skip whitespace and comments (
//and/* */) - If character is a symbol, emit SYMBOL token
- If character is digit, read integer constant
- If character is
", read until closing" - Otherwise, read word (alphanumeric + underscore)
- If word is a keyword, emit KEYWORD; else IDENTIFIER
Tokenization is the first compiler phase. It simplifies parsing.
Jack grammar
class: 'class' className '{' classVarDec* subroutineDec* '}'class Point { field int x; method int getX() {...} }Context-free grammar
Jack's syntax is defined by a context-free grammar (CFG). Each grammar rule describes how constructs are formed:
class → 'class' className '{' classVarDec* subroutineDec* '}'
classVarDec → ('static'|'field') type varName (',' varName)* ';'
type → 'int' | 'char' | 'boolean' | className
The grammar is recursive: expressions contain terms, terms can contain expressions.
Why context-free?
Context-free means each rule can be parsed independently. When we see if, we know what follows ((condition) { statements }).
This enables recursive descent parsing, one function per grammar rule:
parseClass()callsparseClassVarDec()andparseSubroutineDec()parseExpression()callsparseTerm()parseTerm()might callparseExpression()(recursively!)
We'll build this parser in Part 7.
Expressions
Jack expressions follow standard precedence:
expression → term (op term)*
term → intConst | stringConst | keyword | varName
| varName '[' expression ']'
| subroutineCall
| '(' expression ')'
| unaryOp term
op → '+' | '-' | '*' | '/' | '&' | '|' | '<' | '>' | '='
unaryOp → '-' | '~'
No operator precedence
Jack doesn't define precedence between binary operators. The expression 1 + 2 * 3 is parsed left-to-right as (1 + 2) * 3 = 9, not 1 + (2 * 3) = 7.
To get correct precedence, use parentheses: 1 + (2 * 3).
This simplifies the parser but requires more careful coding.
Subroutine calls
Two forms exist:
Method/function on class or object:
target.subroutineName(arguments)
// Examples: Math.sqrt(x), point.getX()
Method on current object:
subroutineName(arguments)
// Implicitly calls this.subroutineName
Statements
Jack has five statement types:
letStatement → 'let' varName ('[' expression ']')? '=' expression ';'
ifStatement → 'if' '(' expression ')' '{' statements '}' ('else' '{' statements '}')?
whileStatement → 'while' '(' expression ')' '{' statements '}'
doStatement → 'do' subroutineCall ';'
returnStatement → 'return' expression? ';'
Statement semantics
let: Assigns a value to a variable or array element.
let x = 5;
let arr[i] = x + 1;
if/else: Conditional execution.
if (x > 0) {
do Output.printString("positive");
} else {
do Output.printString("non-positive");
}
while: Loop while condition is true.
while (i < 10) {
let sum = sum + i;
let i = i + 1;
}
do: Call a subroutine for its side effect (discard return value).
do Output.printInt(x);
do screen.drawPixel(100, 50);
return: Exit subroutine with optional value.
return x + 1;
return; // void subroutines
Complete programs
// Simple class
class Main {
function void main() {
var int x, y;
let x = 1;
let y = 2;
do Output.printInt(x + y);
return;
}
}Program entry point
Every Jack program needs:
- A class named
Main - A function
Main.main()with no arguments
The OS calls Main.main() to start your program.
Multi-file programs
Large programs span multiple files:
- Each class in its own
ClassName.jackfile - All files in the same directory
- Compiler processes each file to
ClassName.vm - VM translator combines all
.vmfiles
Standard library
Jack comes with built-in classes:
| Class | Purpose |
|---|---|
| Math | Mathematical operations (multiply, divide, sqrt) |
| String | String manipulation |
| Array | Array allocation |
| Output | Text output to screen |
| Screen | Graphics (drawPixel, drawLine, etc.) |
| Keyboard | Input from keyboard |
| Memory | Memory allocation (alloc, deAlloc) |
| Sys | System utilities (halt, error, wait) |
These are implemented in Part 9 (Operating System).
Design philosophy
Jack was designed with compiler simplicity in mind:
Explicit typing: Every variable has a declared type.
No overloading: Each subroutine name is unique within a class.
Simple memory model: No garbage collection, explicit alloc/deAlloc.
Limited operators: No operator precedence, no compound assignment.
Statement-oriented: Expressions can't be statements (need do).
These choices make the compiler straightforward while preserving expressiveness.
What's next
We now understand the Jack language. Next:
- Part 7: Build the compiler frontend (tokenizer and parser that produce an AST)
- Part 8: Build the compiler backend (code generator that translates AST to VM code)
By the end, we'll have a complete pipeline:
Jack source (.jack)
↓ Tokenizer
Tokens
↓ Parser
Abstract Syntax Tree
↓ Code Generator
VM code (.vm)
↓ VM Translator
Assembly (.asm)
↓ Assembler
Machine code (.hack)
Each layer hides complexity from the layer above.
Next: Part 7 builds the compiler frontend, covering tokenization and parsing.