Build a Computer from NAND Gates

Part 3: Building an Assembler

We've built the hardware (Part 1) and learned the machine language (Part 2). Now for the final piece: a tool that translates human-readable assembly into machine code.

Writing binary by hand is error-prone and tedious. An assembler solves this by letting us write:

@100
D=A
@sum
M=D

Instead of:

0000000001100100
1110110000010000
0000000000010000
1110001100001000

Let's build one from scratch.

Assembly vs machine code

First, let's see why assembly language matters. Compare these two representations of the same program:

Addr  Assembly  Machine code      What it does
0     @2        0000000000000010  Load 2 into A register
1     D=A       1110110000010000  Copy A to D register
2     @3        0000000000000011  Load 3 into A register
3     D=D+A     1110000010010000  Add A to D, store in D
4     @R0       0000000000000000  Load 0 (R0 address) into A
5     M=D       1110001100001000  Store D into RAM[A] (R0)

The assembler's job: convert human-readable assembly into binary machine code.

The assembly version tells us what we're doing: loading values, adding, storing. The binary version is just numbers. An assembler bridges this gap.

What the assembler must do

The assembler performs three main tasks:

  1. Parse: Classify each line as an A-instruction, C-instruction, or label, and break C-instructions into their fields (dest, comp, jump)
  2. Resolve symbols: Convert labels and variables to addresses
  3. Generate code: Convert parsed components to binary

Let's tackle each step.

The two-pass algorithm

Here's the challenge: when we encounter @LOOP, we don't yet know what address LOOP refers to. The label might be defined later in the code.

The solution: two passes.

Source code:

    @R0
    D=M
    @sum
    M=0
(LOOP)
    @R0
    D=M
    @END
    D;JEQ
    @R0
    D=M
    @sum
    M=D+M
    @R0
    M=M-1
    @LOOP
    0;JMP
(END)
    @END
    0;JMP

Symbol table after pass 1 (labels → ROM addresses):

Label  ROM address
LOOP   4
END    16

Pass 1: Collect labels

In the first pass, we scan through the code and record every label's ROM address. We don't generate any code. We just build the symbol table.

When we see (LOOP):

  1. Note that LOOP maps to the current ROM address
  2. Don't increment the address counter (labels produce no code)

When we see @value or dest=comp:

  1. Increment the ROM address counter
  2. These instructions will occupy one word each
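
As a rough illustration, the first pass can be sketched in Python as below. The function name first_pass and the idea of passing the source as a list of lines are assumptions made for this example, not part of the series' code. The sample program is the one from the symbol table walkthrough above.

```python
def first_pass(lines):
    """Return {label: ROM address} for every (LABEL) line in the source."""
    symbols = {}
    rom_address = 0
    for raw in lines:
        line = raw.split("//")[0].strip()  # drop comments and whitespace
        if not line:
            continue  # blank or comment-only line
        if line.startswith("(") and line.endswith(")"):
            # A label marks the address of the NEXT instruction; no code emitted.
            symbols[line[1:-1]] = rom_address
        else:
            # Any A- or C-instruction occupies exactly one ROM word.
            rom_address += 1
    return symbols


source = """
    @R0
    D=M
    @sum
    M=0
(LOOP)
    @R0
    D=M
    @END
    D;JEQ
    @R0
    D=M
    @sum
    M=D+M
    @R0
    M=M-1
    @LOOP
    0;JMP
(END)
    @END
    0;JMP
""".splitlines()

print(first_pass(source))  # {'LOOP': 4, 'END': 16}
```

Note that the counter only advances on real instructions, which is why LOOP lands on address 4 even though it sits on the fifth non-empty source line.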

Pass 2: Resolve and generate

In the second pass, we generate actual machine code:

  1. When we see @symbol, look it up in the symbol table
  2. If it isn't found, it's a new variable: assign it the next free RAM address, starting at 16, and add it to the table
  3. Convert each instruction to its 16-bit binary form

This two-pass approach handles forward references elegantly.
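
The symbol-resolution half of the second pass might look like the following sketch. It assumes the input is a list of already-cleaned instructions (labels, comments, and blank lines removed) and that the table passed in already holds the labels from pass 1 plus any predefined symbols. The name resolve_symbols is made up for this example.

```python
def resolve_symbols(instructions, symbols):
    """Replace every @symbol with @address; unknown names become variables."""
    table = dict(symbols)   # copy, so the caller's table is left untouched
    next_variable = 16      # variables are allocated from RAM[16] upward
    resolved = []
    for ins in instructions:
        if ins.startswith("@") and not ins[1:].isdigit():
            name = ins[1:]
            if name not in table:
                # First time we see this name: it is a new variable.
                table[name] = next_variable
                next_variable += 1
            ins = f"@{table[name]}"
        resolved.append(ins)
    return resolved


program = ["@i", "M=1", "@LOOP", "0;JMP", "@i", "@sum", "@100"]
print(resolve_symbols(program, {"LOOP": 4}))
# ['@16', 'M=1', '@4', '0;JMP', '@16', '@17', '@100']
```

The second @i reuses address 16 rather than allocating a new slot, because the variable was added to the table the first time it appeared.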

The parser

The parser breaks each line into structured components:

Example: D=D+A;JGT

  • Instruction type: C-instruction
  • dest: D
  • comp: D+A
  • jump: JGT

Parsing rules

A-instructions start with @:

  • If followed by a number, it's a constant: @100 → value 100
  • If followed by a name, it's a symbol: @LOOP → look up in table

C-instructions have the form dest=comp;jump:

  • dest is optional (defaults to null)
  • comp is required
  • jump is optional (defaults to null)

Labels are names in parentheses:

  • (LOOP) defines a label at the current ROM address
  • Labels don't generate code

Comments and whitespace are stripped:

  • Everything after // is ignored
  • Leading/trailing spaces are trimmed

The parser algorithm

function parseLine(line):
    remove comments (everything after //)
    trim whitespace

    if line is empty:
        return EMPTY

    if line starts with '(' and ends with ')':
        return LABEL with name

    if line starts with '@':
        value = rest of line
        if value is a number:
            return A-INSTRUCTION with value
        else:
            return A-INSTRUCTION with symbol

    else:
        parse as C-instruction:
        - if '=' exists, split on '=' for dest
        - if ';' exists, split on ';' for jump
        - middle part is comp
        return C-INSTRUCTION with dest, comp, jump
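
For readers who prefer runnable code, here is one possible Python rendering of the pseudocode above. The tuple shapes it returns and the use of None for a missing dest or jump are choices made for this sketch only.

```python
def parse_line(line):
    """Classify one source line and split it into its fields."""
    line = line.split("//")[0].strip()  # remove comment, trim whitespace

    if not line:
        return ("EMPTY",)

    if line.startswith("(") and line.endswith(")"):
        return ("LABEL", line[1:-1])

    if line.startswith("@"):
        value = line[1:]
        if value.isdigit():
            return ("A", int(value))  # numeric constant
        return ("A", value)           # symbol, resolved later

    # C-instruction: dest=comp;jump, where dest and jump are optional.
    dest, comp, jump = None, line, None
    if "=" in comp:
        dest, comp = comp.split("=", 1)
    if ";" in comp:
        comp, jump = comp.split(";", 1)
    return ("C", dest, comp, jump)


print(parse_line("D=D+A;JGT"))     # ('C', 'D', 'D+A', 'JGT')
print(parse_line("0;JMP"))         # ('C', None, '0', 'JMP')
print(parse_line("@100 // load"))  # ('A', 100)
print(parse_line("(LOOP)"))        # ('LABEL', 'LOOP')
```

This version does no validation: a malformed line such as D= would come back as a C-instruction with an empty comp, so a real assembler would check the fields afterwards.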

Code generation

Once we've parsed the instruction, we need to generate the corresponding binary:

Binary encoding of D=D+A:

Bits     Field
111      prefix
0000010  comp (D+A)
010      dest (D)
000      jump (null)

Assembly: D=D+A
Binary:   1110000010010000

A-instruction encoding

A-instructions are simple: the value becomes the lower 15 bits.

@value → 0vvv vvvv vvvv vvvv

The leading 0 bit distinguishes A-instructions from C-instructions.
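
A minimal sketch of this encoding in Python, assuming the value has already been resolved to an integer. The function name encode_a is hypothetical.

```python
def encode_a(value):
    """Encode a resolved A-instruction value as a 16-bit binary string."""
    if not 0 <= value <= 32767:
        raise ValueError(f"A-instruction value out of range: {value}")
    # Zero-padding to 16 digits leaves the top bit 0 for any 15-bit value.
    return format(value, "016b")


print(encode_a(100))  # 0000000001100100
print(encode_a(16))   # 0000000000010000
```

The range check matters because 15 bits can only hold values up to 32767; anything larger would spill into the leading bit and be mistaken for a C-instruction.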

C-instruction encoding

C-instructions are more complex:

111a cccc ccdd djjj
  • 111: C-instruction prefix (3 bits)
  • a: Use M instead of A? (1 bit)
  • cccccc: Computation code (6 bits)
  • ddd: Destination code (3 bits)
  • jjj: Jump code (3 bits)

Each component maps to a lookup table:

Comp  Binary (a + cccccc)  Dest  Binary  Jump  Binary
0     0101010              null  000     null  000
1     0111111              M     001     JGT   001
D     0001100              D     010     JEQ   010
A     0110000              MD    011     JGE   011
D+A   0000010              A     100     JLT   100
...   ...                  ...   ...     JMP   111

The assembler looks up each component and concatenates the bits.
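
The lookup-and-concatenate step can be sketched as follows. The three dictionaries contain only the rows shown in the table above, so they are deliberately incomplete; a full assembler would need every mnemonic. The names COMP, DEST, JUMP, and encode_c are assumptions for this example, and None stands in for a null dest or jump.

```python
# Partial tables: only the entries listed in the article's table.
COMP = {  # 7 bits: the a bit followed by cccccc
    "0": "0101010",
    "1": "0111111",
    "D": "0001100",
    "A": "0110000",
    "D+A": "0000010",
}
DEST = {None: "000", "M": "001", "D": "010", "MD": "011", "A": "100"}
JUMP = {None: "000", "JGT": "001", "JEQ": "010", "JGE": "011",
        "JLT": "100", "JMP": "111"}


def encode_c(dest, comp, jump):
    """Encode a parsed C-instruction as a 16-bit binary string."""
    # A KeyError here means the mnemonic is invalid (or missing from
    # these partial tables).
    return "111" + COMP[comp] + DEST[dest] + JUMP[jump]


print(encode_c("D", "D+A", None))  # 1110000010010000
print(encode_c("M", "D", None))    # 1110001100001000
print(encode_c(None, "0", "JMP"))  # 1110101010000111
```

Storing the a bit together with the six comp bits keeps the lookup to a single dictionary access per field.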

The assembler playground

Now let's put it all together. The playground below shows the complete assembler in action:


Try editing the source code:

  • Change values and see the binary update
  • Add labels and watch them appear in the symbol table
  • Introduce errors and see the error messages

Predefined symbols

The assembler comes with built-in symbols:

Symbol   Value   Purpose
R0-R15   0-15    Virtual registers
SP       0       Stack pointer
LCL      1       Local segment
ARG      2       Argument segment
THIS     3       This segment
THAT     4       That segment
SCREEN   16384   Screen memory base
KBD      24576   Keyboard memory

These predefined symbols make programs more readable and portable.
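
One way to seed the symbol table with these entries before pass 1 runs is shown below. The function name predefined_symbols is an assumption for this sketch.

```python
def predefined_symbols():
    """Build the initial symbol table from the built-in symbols."""
    table = {f"R{i}": i for i in range(16)}  # R0..R15 map to 0..15
    table.update({
        "SP": 0,
        "LCL": 1,
        "ARG": 2,
        "THIS": 3,
        "THAT": 4,
        "SCREEN": 16384,
        "KBD": 24576,
    })
    return table


symbols = predefined_symbols()
print(symbols["R5"], symbols["SCREEN"], symbols["KBD"])  # 5 16384 24576
```

Several names share an address (SP and R0 are both 0, for example), which is fine: the table maps names to addresses, not the other way around.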

Integrated development

With the assembler complete, we can now write, assemble, and run programs in one environment:

// Add 2 + 3 and store in R0
    @2
    D=A
    @3
    D=D+A
    @R0
    M=D

(END)
    @END
    0;JMP

Watch your assembly code execute step by step. The highlighted line shows which instruction is about to execute, while the registers and memory display the current state.

Error handling

A good assembler doesn't just translate. It catches mistakes:

Syntax errors:

  • Invalid characters in symbols
  • Missing computation in C-instruction
  • Unbalanced parentheses in labels

Semantic errors:

  • Duplicate label definitions
  • Invalid computation mnemonics
  • Invalid destination or jump codes

Value errors:

  • A-instruction value > 32767
  • Negative values in A-instructions

Each error should report:

  1. The line number
  2. The problematic source text
  3. A clear description of what's wrong
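
As an illustration of that reporting format, here is a small Python sketch covering only the two value errors listed above. The AssemblerError class and the check_a_instruction helper are hypothetical names invented for this example.

```python
class AssemblerError(Exception):
    """Carries the line number, the source text, and a description."""

    def __init__(self, line_number, source, message):
        super().__init__(f"line {line_number}: {message} in {source!r}")
        self.line_number = line_number
        self.source = source
        self.message = message


def check_a_instruction(line_number, source):
    """Raise AssemblerError if a numeric A-instruction is out of range."""
    text = source.strip()
    value = text[1:]  # everything after the '@'
    if value.startswith("-"):
        raise AssemblerError(line_number, text, "negative value")
    if value.isdigit() and int(value) > 32767:
        raise AssemblerError(line_number, text, "value exceeds 32767")


try:
    check_a_instruction(7, "@40000")
except AssemblerError as err:
    print(err)  # line 7: value exceeds 32767 in '@40000'
```

Keeping the three pieces as separate attributes, rather than only a formatted string, lets an editor or playground highlight the offending line.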

A complete example

Let's trace through assembling a multiplication program:

// Multiply R0 by R1, store in R2
    @R2
    M=0

(LOOP)
    @R0
    D=M
    @END
    D;JEQ

    @R1
    D=M
    @R2
    M=D+M

    @R0
    M=M-1

    @LOOP
    0;JMP

(END)
    @END
    0;JMP

Pass 1 builds the symbol table:

  • LOOP → ROM address 2
  • END → ROM address 14

Pass 2 generates machine code:

Line  Assembly  Binary            Explanation
1     @R2       0000000000000010  A = 2
2     M=0       1110101010001000  RAM[2] = 0
3     @R0       0000000000000000  A = 0
4     D=M       1111110000010000  D = RAM[0]
5     @END      0000000000001110  A = 14 (END)
6     D;JEQ     1110001100000010  if D=0 jump
...   ...       ...               ...

The result: 16 instructions that multiply R0 by R1 using repeated addition, with R0 serving as the loop counter (so its original value is overwritten).

What we've built

Over these three parts, we've constructed:

  1. Hardware (Part 1)

    • Logic gates from NAND
    • Arithmetic circuits (adders, ALU)
    • Memory (registers, RAM)
  2. Architecture (Part 2)

    • Machine language encoding
    • CPU fetch-decode-execute cycle
    • Memory-mapped I/O
  3. Tools (Part 3)

    • Parser for assembly syntax
    • Symbol table with two-pass resolution
    • Code generator for binary output

Starting from a single NAND gate, we've built a complete computer system. The hardware executes instructions. The assembler translates human-readable code to those instructions.

Next steps

This is just the beginning. From here, you could explore:

  • High-level languages: Build a compiler that translates a language like Jack to assembly
  • Operating systems: Write software that manages memory and processes
  • Hardware description languages: Describe circuits in HDL and simulate them
  • Real hardware: Implement this architecture on an FPGA

The principles we've covered (logic gates, machine language, translation) are the foundation of all computing. Every program you write, every app you use, ultimately becomes bits flowing through circuits we now understand.

The computer is no longer a black box.


This concludes the 3-part series on building a computer from first principles. The complete code for all simulators and demos is available in this project.