Build a Computer from NAND Gates

Part 2: Machine Language and CPU Architecture

In Part 1, we built the hardware foundation: logic gates, arithmetic circuits, and memory. Now we need to bring these components to life. How do we tell a computer what to compute?

The answer is machine language, the binary instructions that directly control the CPU. In this part, we'll:

Understand how 16 bits encode an instruction
Design a CPU that executes those instructions
See how memory-mapped I/O connects the CPU to the world

Let's decode the language of computers.

Machine language: speaking binary

Every instruction our CPU executes is just a 16-bit number. But those 16 bits are carefully structured to encode:

What operation to perform
Where to get the data
Where to store the result
Whether to jump to a different instruction

For example, the bit pattern 1110110000010000 (decimal 60432, hex 0xEC10) decodes as a C-instruction with comp = A, dest = D, and no jump. In assembly that is D=A: compute A and store the result in D.

Notice how different bit patterns create completely different instructions. The CPU doesn't see "add" or "jump". It sees 1110000010010000.

A-instructions and C-instructions

Our computer has two types of instructions:

A-instructions (Address): Load a value into the A register. The first bit is 0, and the remaining 15 bits hold the value.

C-instructions (Compute): Perform a computation. The first three bits are 111, and the rest encode what to compute, where to store it, and whether to jump.
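The two formats can be told apart and split into fields with a few shifts and masks. The sketch below is only an illustration: the function name decode is hypothetical, and the C-instruction layout it assumes (111a cccc ccdd djjj, with one a bit, six comp bits, three dest bits, and three jump bits) comes from the standard Hack specification rather than from the text above.

```python
def decode(word: int) -> dict:
    """Split a 16-bit instruction into its fields (assumed Hack layout)."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("instruction must fit in 16 bits")
    if word >> 15 == 0:
        # A-instruction: leading 0, then a 15-bit value.
        return {"type": "A", "value": word & 0x7FFF}
    # C-instruction: 111 a cccccc ddd jjj.
    # This sketch only checks the leading bit, not all three 1s.
    return {
        "type": "C",
        "a": (word >> 12) & 0b1,         # selects A or M as the ALU input
        "comp": (word >> 6) & 0b111111,  # what to compute
        "dest": (word >> 3) & 0b111,     # where to store the result
        "jump": word & 0b111,            # whether to jump
    }


print(decode(0b0000000000010101))  # the A-instruction @21
print(decode(0b1110110000010000))  # the C-instruction D=A
```

For D=A the dest field comes out as 0b010 (store in D) and the jump field as 0 (no jump).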

An A-instruction has the format 0vvv vvvv vvvv vvvv, where the 15 v bits hold a value from 0 to 32767. For example, 0000000000010101 is a leading 0 followed by the value 000000000010101 (decimal 21). In assembly that is @21.

Why two instruction types?

The A register serves two purposes:

Data: Load a constant value for computation
Address: Point to a memory location for read/write

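For example, suppose we want to set memory address 100 to the value 42. A first attempt might load both numbers through A:

@100    // A-instruction: A = 100
D=A     // C-instruction: D = 100
@42     // A-instruction: A = 42
D=D+A   // Oops: this computes 100 + 42 instead of storing anything

That loses track of which number is the address. Instead, load the value first, save it in D, and only then point A at the address: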

@42     // A = 42 (the value)
D=A     // D = 42
@100    // A = 100 (the address)
M=D     // RAM[100] = D = 42

This is the rhythm of Hack programming: A-instruction to set up, C-instruction to compute.

The fetch-decode-execute cycle

Every CPU, from the simplest microcontroller to the fastest supercomputer, follows the same fundamental loop:

Fetch: Read the instruction at the current PC (Program Counter)
Decode: Interpret what the instruction means
Execute: Perform the operation and update state

For example, with PC = 0, the fetch phase reads the instruction at ROM[0], which is 0000000000000010 (the A-instruction @2).

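In our CPU, each pass through this cycle takes exactly one clock cycle, the heartbeat of computation. Modern CPUs tick billions of times per second, although they overlap the phases of many instructions instead of finishing one instruction per tick.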
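The loop can be sketched in a few lines of Python. This is a simplified model, not the CPU design itself: the names run, COMP, and ROM are illustrative, the field positions and comp codes are assumed from the standard Hack specification, only three computations are supported, and jumps are left out so that PC always advances by 1 and the run ends when PC passes the last instruction.

```python
# Comp codes (the a bit followed by six c bits) for the computations used here.
COMP = {
    0b0110000: lambda a, d: a,      # A
    0b0001100: lambda a, d: d,      # D
    0b0000010: lambda a, d: d + a,  # D+A
}


def run(rom):
    a = d = pc = 0
    ram = {}      # sparse data memory: address -> 16-bit word
    cycles = 0
    while pc < len(rom):
        word = rom[pc]                      # fetch
        if word >> 15 == 0:                 # decode: leading 0 means A-instruction
            a = word & 0x7FFF               # execute: load the 15-bit value into A
        else:                               # decode: C-instruction fields
            comp = (word >> 6) & 0b1111111
            dest = (word >> 3) & 0b111
            out = COMP[comp](a, d) & 0xFFFF  # execute: compute, keep 16 bits
            if dest & 0b001:
                ram[a] = out                # M: write RAM at the address in A
            if dest & 0b010:
                d = out
            if dest & 0b100:
                a = out
        pc += 1                             # no jumps in this sketch
        cycles += 1
    return {"A": a, "D": d, "PC": pc, "RAM": ram, "cycles": cycles}


ROM = [int(bits, 2) for bits in (
    "0000000000000010",  # @2
    "1110110000010000",  # D=A
    "0000000000000011",  # @3
    "1110000010010000",  # D=D+A
    "0000000000000000",  # @0
    "1110001100001000",  # M=D
)]
state = run(ROM)
print(state)
```

After six cycles, D and RAM[0] both hold 5.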

The program counter

The PC register holds the address of the next instruction to execute. Normally, it simply increments by 1 after each instruction. But jump instructions can change this, creating loops and conditionals.

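For example, suppose the ALU result is 5 and the jump condition is JGT (jump if greater than zero). The condition holds, so the next PC is the jump target held in the A register rather than the current PC plus 1.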

The CPU checks the ALU result against the jump condition. If the condition is true, PC jumps to the address in A. Otherwise, PC increments by 1.

The jump decision is based on the ALU's output flags:

zr (zero): The result is 0
ng (negative): The result is negative (highest bit is 1)

Combining these flags with jump conditions (JGT, JEQ, JLT, etc.) enables all control flow.
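The jump decision can be modeled as a small function of the two flags and the three jump bits. The sketch below is illustrative: next_pc and JUMP_BITS are hypothetical names, and the bit assignments (one bit each for "negative", "zero", and "positive") are assumed from the standard Hack specification.

```python
# Jump mnemonics as three bits: 0b100 = jump if negative,
# 0b010 = jump if zero, 0b001 = jump if positive.
JUMP_BITS = {
    "null": 0b000, "JGT": 0b001, "JEQ": 0b010, "JGE": 0b011,
    "JLT": 0b100, "JNE": 0b101, "JLE": 0b110, "JMP": 0b111,
}


def next_pc(pc, a, alu_out, jump):
    """Return the next PC given the ALU output and a jump mnemonic."""
    alu_out &= 0xFFFF
    zr = alu_out == 0              # zero flag
    ng = bool(alu_out & 0x8000)    # negative flag: highest bit is 1
    positive = not zr and not ng
    j = JUMP_BITS[jump]
    take = (
        bool(j & 0b100 and ng)
        or bool(j & 0b010 and zr)
        or bool(j & 0b001 and positive)
    )
    return a if take else pc + 1


print(next_pc(pc=10, a=3, alu_out=5, jump="JGT"))  # 5 > 0, so jump to A
print(next_pc(pc=10, a=3, alu_out=0, jump="JGT"))  # 0 is not > 0, so PC + 1
```

Every condition is just a combination of the same three tests, which is why two flags are enough.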

The CPU simulator

Let's run actual programs on our virtual CPU. The simulator below shows:

The program in ROM (instruction memory)
The current register values (A, D, PC)
The data memory (R0-R7)

The simulator starts with the Add 2+3 program loaded in ROM, and with PC and R0-R7 all at 0:

@2      // ROM[0]
D=A     // ROM[1]
@3      // ROM[2]
D=D+A   // ROM[3]
@0      // ROM[4]
M=D     // ROM[5]
@6      // ROM[6]
0;JMP   // ROM[7]

Try both example programs:

Add 2+3: Loads two values and adds them
Sum 1 to 10: Uses a loop to sum numbers

Watch the PC jump back during the loop. See how values flow through registers and into memory.

Memory-mapped I/O

How does the CPU interact with the outside world? Through memory-mapped I/O. Certain memory addresses don't store data. Instead, they connect to hardware:

Address Range	Purpose
0x0000-0x3FFF	RAM (16K words)
0x4000-0x5FFF	Screen (8K words)
0x6000	Keyboard (1 word)

Writing to screen memory turns on pixels. Reading the keyboard address returns the currently pressed key.


This elegant design means the CPU doesn't need special "draw pixel" or "read keyboard" instructions. Memory operations handle everything.
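One way to picture this is a memory object that routes each address to the right device. The class below is a minimal sketch using the address ranges from the table above. The name Memory and its fields are illustrative, and treating the keyboard word as read-only is an assumption of this sketch.

```python
SCREEN_BASE = 0x4000   # screen occupies 0x4000-0x5FFF
KEYBOARD = 0x6000      # single keyboard word


class Memory:
    def __init__(self):
        self.ram = [0] * SCREEN_BASE                  # 16K words
        self.screen = [0] * (KEYBOARD - SCREEN_BASE)  # 8K words
        self.key = 0   # would be set by the keyboard device, not the CPU

    def read(self, addr):
        if 0 <= addr < SCREEN_BASE:
            return self.ram[addr]
        if SCREEN_BASE <= addr < KEYBOARD:
            return self.screen[addr - SCREEN_BASE]
        if addr == KEYBOARD:
            return self.key
        raise IndexError(f"address {addr:#06x} is not mapped")

    def write(self, addr, value):
        value &= 0xFFFF
        if 0 <= addr < SCREEN_BASE:
            self.ram[addr] = value
        elif SCREEN_BASE <= addr < KEYBOARD:
            self.screen[addr - SCREEN_BASE] = value   # turns pixels on or off
        else:
            raise IndexError(f"address {addr:#06x} is not writable")


mem = Memory()
mem.write(100, 42)         # ordinary RAM
mem.write(0x4000, 0xFFFF)  # first 16 pixels of the top row
mem.key = 65               # pretend a key with code 65 is held down
print(mem.read(100), mem.read(0x4000), mem.read(KEYBOARD))
```

The CPU side only ever calls read and write. Which device answers depends purely on the address.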

Drawing to the screen

The screen is 512×256 pixels. Each 16-bit word controls 16 horizontal pixels. To draw a pixel at position (x, y):

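1. Calculate word address: SCREEN + (y * 32) + (x / 16), where SCREEN is 0x4000, the division discards the remainder, and 32 is the number of words per 512-pixel row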
2. Calculate bit position: x mod 16
3. Set that bit to 1

Writing 0xFFFF (-1 in two's complement) to a screen address turns on 16 pixels.
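The three steps translate directly into integer arithmetic. In this sketch the names pixel_location and set_pixel are illustrative, the screen is modeled as a plain dictionary from address to word, and counting bit 0 as the least significant bit is an assumption taken from the standard Hack screen layout.

```python
SCREEN = 0x4000          # base address of screen memory
WIDTH, HEIGHT = 512, 256
WORDS_PER_ROW = WIDTH // 16   # 32


def pixel_location(x, y):
    """Return (word address, bit position) for the pixel at (x, y)."""
    if not (0 <= x < WIDTH and 0 <= y < HEIGHT):
        raise ValueError("pixel is off screen")
    return SCREEN + y * WORDS_PER_ROW + x // 16, x % 16


def set_pixel(words, x, y):
    """Turn on one pixel in a dict that maps address -> 16-bit word."""
    addr, bit = pixel_location(x, y)
    words[addr] = words.get(addr, 0) | (1 << bit)


words = {}
set_pixel(words, 17, 1)
print(pixel_location(17, 1), words)
print(pixel_location(511, 255))  # last pixel lands in the last screen word
```

The bottom-right pixel maps to address 24575 (0x5FFF), the last word of the screen range.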

Computer architecture

Let's zoom out and see how all the pieces fit together:


Our computer uses a Harvard architecture, which separates instruction memory (ROM) from data memory (RAM). Because the two memories are separate, the CPU can fetch an instruction from ROM and read or write data in RAM in the same clock cycle.

Key connections:

CPU reads instructions from ROM using PC
CPU reads and writes data in RAM at the address held in the A register
Screen and keyboard are mapped into the RAM address space

The CPU internals

Inside the CPU:

A Register: Address or data
D Register: Data only
PC: Program counter
ALU: Performs all computations

The A register is special. It can either be used as data (for computations) or as an address (to access RAM). This dual-purpose design keeps the instruction set simple.

Putting it together: a multiplication example

Let's trace through a program that multiplies two numbers. Since our CPU has no multiply instruction, we use repeated addition:

// Multiply R0 by R1, store result in R2
@R2
M=0       // R2 = 0 (result)

(LOOP)
@R1
D=M       // D = R1
@END
D;JEQ     // if R1 == 0, goto END

@R0
D=M       // D = R0
@R2
M=D+M     // R2 = R2 + R0

@R1
M=M-1     // R1--

@LOOP
0;JMP     // goto LOOP

(END)
@END
0;JMP     // infinite loop (halt)

This program:

Initializes R2 to 0
Loops R1 times, adding R0 to R2 each iteration
Halts when R1 reaches 0

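Each A- or C-instruction line becomes exactly one 16-bit instruction in ROM. Labels such as (LOOP), comments, and blank lines produce no instructions. A label simply names the ROM address of the instruction that follows it.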
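The same loop in a high-level language makes the cost and the limits easier to see. The function below is a rough Python model of the assembly, not a translation tool. The name multiply is illustrative, and the 16-bit mask stands in for the fixed word size of the registers.

```python
def multiply(r0, r1):
    """Model of the loop above: add r0 to r2, r1 times, in 16-bit words."""
    if r1 < 0:
        raise ValueError("this sketch only models a non-negative R1")
    r2 = 0                         # @R2 / M=0
    while r1 != 0:                 # D=M / D;JEQ
        r2 = (r2 + r0) & 0xFFFF    # M=D+M, wrapping at 16 bits
        r1 -= 1                    # M=M-1
    return r2


print(multiply(6, 7))
print(multiply(300, 300))  # 90000 does not fit in 16 bits, so it wraps
```

Two things stand out: the running time grows with R1, and the loop counts R1 down to 0, so the original value of R1 is gone by the time the program halts.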

The pain of binary

By now you've probably noticed: writing programs in binary is tedious. Even simple operations require carefully crafted bit patterns.

Consider adding 5 + 3:

0000000000000101  // @5
1110110000010000  // D=A
0000000000000011  // @3
1110000010010000  // D=D+A
0000000000000000  // @R0
1110001100001000  // M=D

Who wants to write 1110000010010000 when they mean D=D+A?

This is where assembly language comes in. Instead of binary, we write:

@5
D=A
@3
D=D+A
@R0
M=D

An assembler translates this human-readable code into binary. That's exactly what we'll build in Part 3.
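To give a feel for how mechanical that translation is, here is a deliberately tiny sketch that handles only the six lines above. The names encode, COMP, DEST, and SYMBOLS are hypothetical, the tables contain just the entries this program needs, and the bit codes are taken from the standard Hack specification. It is not the assembler from Part 3.

```python
COMP = {"A": "0110000", "D": "0001100", "D+A": "0000010"}  # a bit + 6 c bits
DEST = {"M": "001", "D": "010"}
SYMBOLS = {"R0": 0}   # R0 is a predefined name for address 0


def encode(line):
    """Translate one assembly line (no labels, no jumps) to 16 bits."""
    if line.startswith("@"):
        operand = line[1:]
        value = SYMBOLS[operand] if operand in SYMBOLS else int(operand)
        if not 0 <= value <= 0x7FFF:
            raise ValueError("A-instruction value must fit in 15 bits")
        return format(value, "016b")
    dest, comp = line.split("=")
    return "111" + COMP[comp] + DEST[dest] + "000"   # 000 = no jump


program = ["@5", "D=A", "@3", "D=D+A", "@R0", "M=D"]
for line in program:
    print(encode(line), " //", line)
```

The output reproduces the binary listing shown earlier, one 16-bit word per line.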

What we've learned

In this part, we explored:

Machine language: How 16 bits encode instructions
A and C instructions: Address loading vs. computation
The fetch-decode-execute cycle: The heartbeat of the CPU
Program counter and jumps: Control flow
Memory-mapped I/O: Connecting to the world
Computer architecture: How CPU, ROM, RAM, and I/O fit together

We now understand how the hardware from Part 1 becomes programmable. But programming in binary is painful.

In Part 3, we'll build an assembler that translates human-readable assembly into machine code, completing our journey from NAND gates to software.

This is Part 2 of a 3-part series on building a computer from first principles.