Build a Computer from NAND Gates
Part 2: Machine Language and CPU Architecture
In Part 1, we built the hardware foundation: logic gates, arithmetic circuits, and memory. Now we need to bring these components to life. How do we tell a computer what to compute?
The answer is machine language, the binary instructions that directly control the CPU. In this part, we'll:
- Understand how 16 bits encode an instruction
- Design a CPU that executes those instructions
- See how memory-mapped I/O connects the CPU to the world
Let's decode the language of computers.
Machine language: speaking binary
Every instruction a CPU executes is just a 16-bit number. But those 16 bits are carefully structured to encode:
- What operation to perform
- Where to get the data
- Where to store the result
- Whether to jump to a different instruction
Try clicking the bits below to see how the instruction changes. For example, 1110110000010000 encodes "compute A and store the result in D", which assembly writes as D=A.
Notice how different bit patterns create completely different instructions. The CPU doesn't see "add" or "jump". It sees 1110000010010000.
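To make that concrete, here is a minimal Python sketch (not part of any toolchain) that slices the example instruction into the fields the CPU actually looks at. The 7/3/3 grouping follows the C-instruction format introduced in the next section.

```python
# Minimal sketch: slicing a 16-bit Hack instruction into its fields.
instruction = 0b1110000010010000       # the same bits the CPU sees

comp = (instruction >> 6) & 0b1111111  # what to compute (and where the data comes from)
dest = (instruction >> 3) & 0b111      # where to store the result
jump = instruction & 0b111             # whether (and when) to jump

print(f"comp={comp:07b} dest={dest:03b} jump={jump:03b}")
# comp=0000010 dest=010 jump=000, which is D=D+A with no jump
```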
A-instructions and C-instructions
Our computer has two types of instructions:
A-instructions (Address): Load a value into the A register. The first bit is 0, and the remaining 15 bits hold the value.
C-instructions (Compute): Perform a computation. The first three bits are 111, and the rest encode what to compute, where to store it, and whether to jump.
Explore both formats:
A-instructions load a 15-bit value into the A register. Format: 0vvv vvvv vvvv vvvv. C-instructions pack a computation, a destination, and a jump condition after the leading 111. Format: 111a cccc ccdd djjj.
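As a small sketch of the distinction, one test of the most significant bit is enough to tell the two formats apart. The describe function below is illustrative, not a real disassembler:

```python
# Sketch: classifying an instruction by its most significant bit.
def describe(instruction: int) -> str:
    if instruction & 0x8000 == 0:
        # A-instruction: 0vvv vvvv vvvv vvvv, the low 15 bits are the value.
        return f"A-instruction: load {instruction & 0x7FFF} into A"
    # C-instruction: 111a cccc ccdd djjj, sliced as in the earlier sketch.
    return "C-instruction: compute, store, maybe jump"

print(describe(0b0000000001100100))  # A-instruction: load 100 into A
print(describe(0b1110110000010000))  # C-instruction: compute, store, maybe jump
```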
Why two instruction types?
The A register serves two purposes:
- Data: Load a constant value for computation
- Address: Point to a memory location for read/write
For example, to set memory address 100 to the value 42:
@100 // A-instruction: A = 100
D=A // C-instruction: D = 100
@42 // A-instruction: A = 42
D=D+A // Oops, this adds 42 to 100 instead of storing anything!
The problem is that A can hold only one thing at a time, and it must hold the address at the moment we write to memory. So we load the value first, park it in D, and only then load the address:
@42 // A = 42 (the value)
D=A // D = 42
@100 // A = 100 (the address)
M=D // RAM[100] = D = 42
This is the rhythm of Hack programming: A-instruction to set up, C-instruction to compute.
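To see that rhythm from another angle, here is a tiny Python sketch of what those four instructions do to the registers and memory, with RAM modeled as a plain list (an illustration, not the real simulator):

```python
# Sketch: the effect of the four instructions above, in plain Python.
RAM = [0] * 16384

A = 42       # @42  : A-instruction sets up the value
D = A        # D=A  : stash it in D before A gets reused
A = 100      # @100 : A-instruction sets up the address
RAM[A] = D   # M=D  : write D to the address held in A

print(RAM[100])  # 42
```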
The fetch-decode-execute cycle
Every CPU, from the simplest microcontroller to the fastest supercomputer, follows the same fundamental loop:
- Fetch: Read the instruction at the current PC (Program Counter)
- Decode: Interpret what the instruction means
- Execute: Perform the operation and update state
Watch the cycle in action below; it begins by fetching the instruction at address PC=0 from ROM.
This cycle repeats billions of times per second in modern CPUs. Each iteration is one clock cycle, the heartbeat of computation.
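If you want to hold the whole loop in your head, here is a deliberately tiny Python sketch of it. It is not the real Hack CPU: it understands only the handful of instructions used in the 5 + 3 example near the end of this article, and it ignores jumps entirely.

```python
# Toy fetch-decode-execute loop (illustrative sketch, not the full Hack CPU).
ROM = [
    0b0000000000000101,  # @5
    0b1110110000010000,  # D=A
    0b0000000000000011,  # @3
    0b1110000010010000,  # D=D+A
    0b0000000000000000,  # @R0
    0b1110001100001000,  # M=D
]
RAM = [0] * 16
A = D = PC = 0

while PC < len(ROM):
    instruction = ROM[PC]                       # fetch
    if instruction & 0x8000 == 0:               # decode: A-instruction?
        A = instruction & 0x7FFF                # execute: load the value into A
    else:                                       # decode: C-instruction
        comp = (instruction >> 6) & 0b1111111
        dest = (instruction >> 3) & 0b111
        result = {0b0110000: A, 0b0000010: D + A, 0b0001100: D}[comp]
        if dest & 0b010: D = result             # execute: store the result
        if dest & 0b001: RAM[A] = result
    PC += 1                                     # no jumps in this sketch

print(RAM[0])  # 8
```

Real hardware does all three stages with combinational logic rather than Python branches, but the shape of the loop is the same.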
The program counter
The PC register holds the address of the next instruction to execute. Normally, it simply increments by 1 after each instruction. But jump instructions can change this, creating loops and conditionals.
The CPU checks the ALU result against the jump condition. If the condition is true, PC jumps to the address in A. Otherwise, PC increments by 1.
The jump decision is based on the ALU's output flags:
- zr (zero): The result is 0
- ng (negative): The result is negative (highest bit is 1)
Combining these flags with jump conditions (JGT, JEQ, JLT, etc.) enables all control flow.
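Sketched in Python, the jump decision is a small function of the three jump bits and the two flags. The bit meanings follow the standard Hack convention (1xx means "if negative", x1x "if zero", xx1 "if positive"); the function itself is illustrative:

```python
# Sketch: should the PC jump, given the jump bits and the ALU flags?
def should_jump(jump_bits: int, zr: bool, ng: bool) -> bool:
    pos = not zr and not ng                     # result is strictly positive
    return bool((jump_bits & 0b100 and ng) or   # "if negative" bit (JLT family)
                (jump_bits & 0b010 and zr) or   # "if zero" bit (JEQ family)
                (jump_bits & 0b001 and pos))    # "if positive" bit (JGT family)

print(should_jump(0b010, zr=True,  ng=False))  # JEQ with a zero result: True
print(should_jump(0b001, zr=False, ng=True))   # JGT with a negative result: False
```

If the function returns true, the next PC is the value in A; otherwise it is PC + 1.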
The CPU simulator
Let's run actual programs on our virtual CPU. The simulator below shows:
- The program in ROM (instruction memory)
- The current register values (A, D, PC)
- The data memory (R0-R7)
Try both example programs:
- Add 2+3: Loads two values and adds them
- Sum 1 to 10: Uses a loop to sum numbers
Watch the PC jump back during the loop. See how values flow through registers and into memory.
Memory-mapped I/O
How does the CPU interact with the outside world? Through memory-mapped I/O. Certain memory addresses don't store data. Instead, they connect to hardware:
| Address Range | Purpose |
|---|---|
| 0x0000-0x3FFF | RAM (16K words) |
| 0x4000-0x5FFF | Screen (8K words) |
| 0x6000 | Keyboard (1 word) |
Writing to screen memory turns on pixels. Reading the keyboard address returns the currently pressed key.
This elegant design means the CPU doesn't need special "draw pixel" or "read keyboard" instructions. Memory operations handle everything.
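One way to picture this is that every memory access goes through the same address decoding, and the address alone decides which device responds. Here is a rough Python sketch using the ranges from the table above; the screen and keyboard variables are stand-ins for real hardware:

```python
# Sketch: address decoding for memory-mapped I/O.
RAM      = [0] * 0x4000   # 0x0000-0x3FFF: ordinary data memory
screen   = [0] * 0x2000   # 0x4000-0x5FFF: one word per 16 horizontal pixels
keyboard = 0              # 0x6000: set by whatever key is currently pressed

def read(address: int) -> int:
    if address < 0x4000:
        return RAM[address]
    if address < 0x6000:
        return screen[address - 0x4000]
    return keyboard       # 0x6000; higher addresses are unused in this sketch

def write(address: int, value: int) -> None:
    if address < 0x4000:
        RAM[address] = value
    elif address < 0x6000:
        screen[address - 0x4000] = value   # writing here lights up pixels
    # writes to the keyboard address are ignored
```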
Drawing to the screen
The screen is 512×256 pixels. Each 16-bit word controls 16 horizontal pixels. To draw a pixel at position (x, y):
1. Calculate word address: SCREEN + (y * 32) + (x / 16), where x / 16 is integer division
2. Calculate bit position: x mod 16
3. Set that bit to 1
Writing 0xFFFF (-1 in two's complement) to a screen address turns on 16 pixels.
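Here is that arithmetic as a small Python sketch. SCREEN is the base address 0x4000 from the memory map; the pixel_location helper is just an illustration:

```python
# Sketch: which word and which bit control pixel (x, y)?
SCREEN = 0x4000  # base of screen memory; 32 words span one 512-pixel row

def pixel_location(x: int, y: int) -> tuple[int, int]:
    address = SCREEN + y * 32 + x // 16  # which 16-pixel word
    bit = x % 16                         # which pixel within that word
    return address, bit

address, bit = pixel_location(100, 50)
print(hex(address), bit)  # 0x4646 4
# Turning the pixel on means OR-ing (1 << bit) into the word at that address.
```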
Computer architecture
Let's zoom out and see how all the pieces fit together:
Hover over a component to learn more. The CPU reads instructions from ROM and data from RAM. Screen and keyboard are memory-mapped.
The Harvard architecture separates instruction memory (ROM) from data memory (RAM). This allows the CPU to fetch the next instruction while executing the current one.
Key connections:
- CPU reads instructions from ROM using PC
- CPU reads/writes data from RAM using A register
- Screen and keyboard are mapped into the RAM address space
The CPU internals
Inside the CPU:
- A Register: Address or data
- D Register: Data only
- PC: Program counter
- ALU: Performs all computations
The A register is special. It can either be used as data (for computations) or as an address (to access RAM). This dual-purpose design keeps the instruction set simple.
Putting it together: a multiplication example
Let's trace through a program that multiplies two numbers. Since our CPU has no multiply instruction, we use repeated addition:
// Multiply R0 by R1, store result in R2
@R2
M=0 // R2 = 0 (result)
(LOOP)
@R1
D=M // D = R1
@END
D;JEQ // if R1 == 0, goto END
@R0
D=M // D = R0
@R2
M=D+M // R2 = R2 + R0
@R1
M=M-1 // R1--
@LOOP
0;JMP // goto LOOP
(END)
@END
0;JMP // infinite loop (halt)
This program:
- Initializes R2 to 0
- Loops R1 times, adding R0 to R2 each iteration
- Halts when R1 reaches 0
Each instruction line becomes one 16-bit word in ROM; the (LOOP) and (END) labels generate no instructions at all, they just name addresses.
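Stripped of registers and jumps, the algorithm is plain repeated addition. A Python sketch of the same idea, with r0, r1, r2 standing in for R0, R1, R2:

```python
# Sketch: the multiplication loop above, written as ordinary Python.
def multiply(r0: int, r1: int) -> int:
    r2 = 0             # R2 = 0 (result)
    while r1 != 0:     # loop until R1 reaches 0
        r2 = r2 + r0   # R2 = R2 + R0
        r1 = r1 - 1    # R1--
    return r2

print(multiply(6, 7))  # 42
```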
The pain of binary
By now you've probably noticed: writing programs in binary is tedious. Even simple operations require carefully crafted bit patterns.
Consider adding 5 + 3:
0000000000000101 // @5
1110110000010000 // D=A
0000000000000011 // @3
1110000010010000 // D=D+A
0000000000000000 // @R0
1110001100001000 // M=D
Who wants to write 1110000010010000 when they mean D=D+A?
This is where assembly language comes in. Instead of binary, we write:
@5
D=A
@3
D=D+A
@R0
M=D
An assembler translates this human-readable code into binary. That's exactly what we'll build in Part 3.
What we've learned
In this part, we explored:
- Machine language: How 16 bits encode instructions
- A and C instructions: Address loading vs. computation
- The fetch-decode-execute cycle: The heartbeat of the CPU
- Program counter and jumps: Control flow
- Memory-mapped I/O: Connecting to the world
- Computer architecture: How CPU, ROM, RAM, and I/O fit together
We now understand how the hardware from Part 1 becomes programmable. But programming in binary is painful.
In Part 3, we'll build an assembler that translates human-readable assembly into machine code, completing our journey from transistors to software.
This is Part 2 of a 3-part series on building a computer from first principles.