Build a Computer from NAND Gates

Part 3: Building an Assembler

We've built the hardware (Part 1) and learned the machine language (Part 2). Now for the final piece: a tool that translates human-readable assembly into machine code.

Writing binary by hand is error-prone and tedious. An assembler solves this by letting us write:

@100
D=A
@sum
M=D

Instead of:

0000000001100100
1110110000010000
0000000000010000
1110001100001000

Let's build one from scratch.

Assembly vs machine code

First, let's see why assembly language matters. Compare these two representations of the same program:

Assembly vs Machine Code

ASSEMBLY (human-readable)

0@2

1D=A

2@3

3D=D+A

4@R0

5M=D

MACHINE CODE (what CPU sees)

00000000000000010

11110110000010000

20000000000000011

31110000010010000

40000000000000000

51110001100001000

WHAT EACH LINE DOES

0: Load 2 into A register

1: Copy A to D register

2: Load 3 into A register

3: Add A to D, store in D

4: Load 0 (R0 address) into A

5: Store D into RAM[A] (R0)

The assembler's job: Convert human-readable assembly into binary machine code.

The assembly version tells us what we're doing: loading values, adding, storing. The binary version is just numbers. An assembler bridges this gap.

What the assembler must do

The assembler performs three main tasks:

Parse: Break each line into components (dest, comp, jump)
Resolve symbols: Convert labels and variables to addresses
Generate code: Convert parsed components to binary

Let's tackle each step.

The two-pass algorithm

Here's the challenge: when we encounter @LOOP, we don't yet know what address LOOP refers to. The label might be defined later in the code.

The solution: two passes.

Symbol Table: Two-Pass Assembly

SOURCE CODE

@R0

D=M

@sum

M=0

(LOOP)

@R0

D=M

@END

D;JEQ

@R0

D=M

@sum

M=D+M

@R0

M=M-1

@LOOP

0;JMP

(END)

@END

0;JMP

SYMBOL TABLE

Pass 1: Labels → ROM Addresses

Label	ROM Addr
LOOP	4
END	16

Pass 1: Collect Labels

Scan through the code and record where each label (LOOP) appears. Labels don't generate machine code—they're just markers for ROM addresses.

Pass 1: Collect labels

In the first pass, we scan through the code and record every label's ROM address. We don't generate any code. We just build the symbol table.

When we see (LOOP):

Note that LOOP maps to the current ROM address
Don't increment the address counter (labels produce no code)

When we see @value or dest=comp:

Increment the ROM address counter
These instructions will occupy one word each

Pass 2: Resolve and generate

In the second pass, we generate actual machine code:

When we see @symbol, look it up in the symbol table
If not found, it's a new variable. Allocate RAM starting at address 16
Convert each instruction to its 16-bit binary form

This two-pass approach handles forward references elegantly.

The parser

The parser breaks each line into structured components:

Parser: Breaking Down Instructions

Enter an instruction:

INSTRUCTION TYPE

C-instruction

PARSED FIELDS

dest: D

comp: D+A

jump: JGT

Try: • • • • •

Parsing rules

A-instructions start with @:

If followed by a number, it's a constant: @100 → value 100
If followed by a name, it's a symbol: @LOOP → look up in table

C-instructions have the form dest=comp;jump:

dest is optional (defaults to null)
comp is required
jump is optional (defaults to null)

Labels are names in parentheses:

(LOOP) defines a label at the current ROM address
Labels don't generate code

Comments and whitespace are stripped:

Everything after // is ignored
Leading/trailing spaces are trimmed

The parser algorithm

function parseLine(line):
    remove comments (everything after //)
    trim whitespace

    if line is empty:
        return EMPTY

    if line starts with '(' and ends with ')':
        return LABEL with name

    if line starts with '@':
        value = rest of line
        if value is a number:
            return A-INSTRUCTION with value
        else:
            return A-INSTRUCTION with symbol

    else:
        parse as C-instruction:
        - if '=' exists, split on '=' for dest
        - if ';' exists, split on ';' for jump
        - middle part is comp
        return C-INSTRUCTION with dest, comp, jump

Code generation

Once we've parsed the instruction, we need to generate the corresponding binary:

Code Generation: Assembly → Binary

Destination (dest)

Computation (comp)

Jump (jump)

BINARY ENCODING

111

prefix

0000010

comp (D+A)

010

dest (D)

000

jump (null)

Assembly

D=D+A

→

Binary

1110000010010000

A-instruction encoding

A-instructions are simple: the value becomes the lower 15 bits.

@value → 0vvv vvvv vvvv vvvv

The leading 0 bit distinguishes A-instructions from C-instructions.

C-instruction encoding

C-instructions are more complex:

111a cccc ccdd djjj

111: C-instruction prefix (3 bits)
a: Use M instead of A? (1 bit)
cccccc: Computation code (6 bits)
ddd: Destination code (3 bits)
jjj: Jump code (3 bits)

Each component maps to a lookup table:

Comp	Binary	Dest	Binary	Jump	Binary
0	0101010	null	000	null	000
1	0111111	M	001	JGT	001
D	0001100	D	010	JEQ	010
A	0110000	MD	011	JGE	011
D+A	0000010	A	100	JLT	100
...	...	...	...	JMP	111

The assembler looks up each component and concatenates the bits.

The assembler playground

Now let's put it all together. The playground below shows the complete assembler in action:

Assembler Playground

ASSEMBLY SOURCE

MACHINE CODE OUTPUT

Try editing the source code:

Change values and see the binary update
Add labels and watch them appear in the symbol table
Introduce errors and see the error messages

Predefined symbols

The assembler comes with built-in symbols:

Symbol	Value	Purpose
R0-R15	0-15	Virtual registers
SP	0	Stack pointer
LCL	1	Local segment
ARG	2	Argument segment
THIS	3	This segment
THAT	4	That segment
SCREEN	16384	Screen memory base
KBD	24576	Keyboard memory

These predefined symbols make programs more readable and portable.

Integrated development

With the assembler complete, we can now write, assemble, and run programs in one environment:

Integrated Development Environment

SOURCE CODE

1// Add 2 + 3 and store in R0

2@2

3D=A

4@3

5D=D+A

6@R0

7M=D

9(END)

10@END

110;JMP

Watch your assembly code execute step by step. The highlighted line shows which instruction is about to execute, while the registers and memory display the current state.

Error handling

A good assembler doesn't just translate. It catches mistakes:

Syntax errors:

Invalid characters in symbols
Missing computation in C-instruction
Unbalanced parentheses in labels

Semantic errors:

Duplicate label definitions
Invalid computation mnemonics
Invalid destination or jump codes

Value errors:

A-instruction value > 32767
Negative values in A-instructions

Each error should report:

The line number
The problematic source text
A clear description of what's wrong

A complete example

Let's trace through assembling a multiplication program:

// Multiply R0 by R1, store in R2
    @R2
    M=0

(LOOP)
    @R0
    D=M
    @END
    D;JEQ

    @R1
    D=M
    @R2
    M=D+M

    @R0
    M=M-1

    @LOOP
    0;JMP

(END)
    @END
    0;JMP

Pass 1 builds the symbol table:

LOOP → ROM address 2
END → ROM address 14

Pass 2 generates machine code:

Line	Assembly	Binary	Explanation
1	@R2	0000000000000010	A = 2
2	M=0	1110101010001000	RAM[2] = 0
3	@R0	0000000000000000	A = 0
4	D=M	1111110000010000	D = RAM[0]
5	@END	0000000000001110	A = 14 (END)
6	D;JEQ	1110001100000010	if D=0 jump
...	...	...	...

The result: 16 instructions that multiply any two numbers.

What we've built

Over these three parts, we've constructed:

Hardware (Part 1)
- Logic gates from NAND
- Arithmetic circuits (adders, ALU)
- Memory (registers, RAM)
Architecture (Part 2)
- Machine language encoding
- CPU fetch-decode-execute cycle
- Memory-mapped I/O
Tools (Part 3)
- Parser for assembly syntax
- Symbol table with two-pass resolution
- Code generator for binary output

Starting from a single NAND gate, we've built a complete computer system. The hardware executes instructions. The assembler translates human-readable code to those instructions.

Next steps

This is just the beginning. From here, you could explore:

High-level languages: Build a compiler that translates a language like Jack to assembly
Operating systems: Write software that manages memory and processes
Hardware description languages: Describe circuits in HDL and simulate them
Real hardware: Implement this architecture on an FPGA

The principles we've covered (logic gates, machine language, translation) are the foundation of all computing. Every program you write, every app you use, ultimately becomes bits flowing through circuits we now understand.

The computer is no longer a black box.

This concludes the 3-part series on building a computer from first principles. The complete code for all simulators and demos is available in this project.