
Build a Computer from NAND Gates

Part 2: Machine Language and CPU Architecture

In Part 1, we built the hardware foundation: logic gates, arithmetic circuits, and memory. Now we need to bring these components to life. How do we tell a computer what to compute?

The answer is machine language, the binary instructions that directly control the CPU. In this part, we'll:

  • Understand how 16 bits encode an instruction
  • Design a CPU that executes those instructions
  • See how memory-mapped I/O connects the CPU to the world

Let's decode the language of computers.

Machine language: speaking binary

Every instruction our CPU executes is just a 16-bit number. But those 16 bits are carefully structured to encode:

  • What operation to perform
  • Where to get the data
  • Where to store the result
  • Whether to jump to a different instruction

Binary Instruction Decoder (interactive demo: each of the 16 bits, numbered 15 down to 0, can be toggled). One example setting:

Binary: 1110110000010000 (Dec: 60432, Hex: 0xEC10)
Type: C-instruction
comp: A
dest: D
jump: none
Assembly: D=A (compute A, store the result in D)

Notice how different bit patterns create completely different instructions. The CPU doesn't see "add" or "jump". It sees 1110000010010000.

A-instructions and C-instructions

Our computer has two types of instructions:

A-instructions (Address): Load a value into the A register. The leftmost (most significant) bit is 0, and the remaining 15 bits hold the value.

C-instructions (Compute): Perform a computation. The leftmost three bits are 111, and the rest encode what to compute, where to store it, and whether to jump.
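
To make the two formats concrete, here is a minimal Python sketch that splits a 16-bit word into its fields. It assumes the standard Hack C-instruction layout 111a cccc ccdd djjj; the function name decode and the dictionary keys are our own, chosen for illustration.

```python
def decode(word):
    """Split a 16-bit Hack instruction into its fields (illustrative sketch)."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("instruction must fit in 16 bits")
    if word >> 15 == 0:
        # A-instruction: leading 0, then a 15-bit value
        return {"type": "A", "value": word & 0x7FFF}
    # Any word with a leading 1 is treated as a C-instruction here.
    # Assumed layout: 111a cccc ccdd djjj
    return {
        "type": "C",
        "a": (word >> 12) & 0b1,          # ALU second input: A (0) or M (1)
        "comp": (word >> 6) & 0b111111,   # which computation to perform
        "dest": (word >> 3) & 0b111,      # destination bits, in order A, D, M
        "jump": word & 0b111,             # jump condition bits
    }


print(decode(0b0000000000010101))   # the A-instruction @21
print(decode(0xEC10))               # the C-instruction D=A
```

For 0xEC10 the dest field comes out as 010, which selects D, and the jump field is 000, meaning no jump.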

Instruction Format Explorer (interactive demo), shown here for the A-instruction:

A-instructions load a 15-bit value into the A register. Format: 0vvv vvvv vvvv vvvv

Example: 0000000000010101
  type bit: 0
  value:    000000000010101
  Assembly: @21

Why two instruction types?

The A register serves two purposes:

  1. Data: Load a constant value for computation
  2. Address: Point to a memory location for read/write

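For example, suppose we want to set memory address 100 to the value 42. A tempting first attempt loads the address first:

@100    // A-instruction: A = 100
D=A     // C-instruction: D = 100
@42     // A-instruction: A = 42 (the address 100 is no longer in A)
D=D+A   // Oops: D = 142, and nothing is stored in RAM

The address must be in A at the moment we write to M, so we load the value into D first and the address into A second: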

@42     // A = 42 (the value)
D=A     // D = 42
@100    // A = 100 (the address)
M=D     // RAM[100] = D = 42

This is the rhythm of Hack programming: A-instruction to set up, C-instruction to compute.

The fetch-decode-execute cycle

Every CPU, from the simplest microcontroller to the fastest supercomputer, follows the same fundamental loop:

  1. Fetch: Read the instruction at the current PC (Program Counter)
  2. Decode: Interpret what the instruction means
  3. Execute: Perform the operation and update state

The Fetch-Decode-Execute Cycle (interactive demo). It steps through three phases: fetch, decode, execute. In the initial state, A = 0, D = 0, and PC = 0, and the fetch phase reads the instruction at address PC = 0 from ROM:

ROM[0] = 0000000000000010

This cycle repeats billions of times per second in modern CPUs. In our CPU, each iteration takes exactly one clock cycle, the heartbeat of computation.

The program counter

The PC register holds the address of the next instruction to execute. Normally, it simply increments by 1 after each instruction. But jump instructions can change this, creating loops and conditionals.

Program Counter and Jumps (interactive demo). Example: value 5 with condition JGT, so the jump is taken. Current PC: 42. Next PC: 10.

The CPU checks the ALU result against the jump condition. If the condition is true, PC jumps to the address in A. Otherwise, PC increments by 1.

The jump decision is based on the ALU's output flags:

  • zr (zero): The result is 0
  • ng (negative): The result is negative (highest bit is 1)

Combining these flags with jump conditions (JGT, JEQ, JLT, etc.) enables all control flow.
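
The jump decision can be sketched in a few lines of Python. This assumes the standard Hack encoding of the three jump bits (one bit each for "less than zero", "equal to zero", and "greater than zero"); the names should_jump and next_pc are ours, not part of the machine.

```python
# Jump mnemonics and their 3-bit codes (assumed standard Hack encoding).
JUMPS = {"null": 0b000, "JGT": 0b001, "JEQ": 0b010, "JGE": 0b011,
         "JLT": 0b100, "JNE": 0b101, "JLE": 0b110, "JMP": 0b111}


def should_jump(jump_bits, zr, ng):
    """Combine the jump bits with the ALU flags zr (zero) and ng (negative)."""
    lt = bool(jump_bits & 0b100) and ng
    eq = bool(jump_bits & 0b010) and zr
    gt = bool(jump_bits & 0b001) and not zr and not ng
    return lt or eq or gt


def next_pc(pc, a, mnemonic, alu_out):
    """Return the address of the next instruction."""
    out = alu_out & 0xFFFF        # view the result as a 16-bit word
    zr = out == 0                 # zero flag
    ng = bool(out & 0x8000)       # negative flag: highest bit is 1
    return a if should_jump(JUMPS[mnemonic], zr, ng) else pc + 1


print(next_pc(42, 10, "JGT", 5))   # 5 > 0, so the PC jumps to A
print(next_pc(42, 10, "JGT", 0))   # 0 is not > 0, so the PC increments
```

With A = 10, the first call models a taken JGT from PC 42, and the second falls through to PC 43.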

The CPU simulator

Let's run actual programs on our virtual CPU. The simulator below shows:

  • The program in ROM (instruction memory)
  • The current register values (A, D, PC)
  • The data memory (R0-R7)

CPU Simulator (interactive demo), initial state:

PROGRAM (ROM)
0  @2
1  D=A
2  @3
3  D=D+A
4  @0
5  M=D
6  @6
7  0;JMP

REGISTERS: A = 0, D = 0, PC = 0
MEMORY (R0-R7): all 0
Cycles: 0

Try both example programs:

  • Add 2+3: Loads two values and adds them
  • Sum 1 to 10: Uses a loop to sum numbers

Watch the PC jump back during the loop. See how values flow through registers and into memory.
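
If you want to trace the Add 2+3 program without the interactive simulator, the sketch below runs it in Python. It is a simplified model, not the hardware itself: it assumes the standard Hack ALU control bits (zx, nx, zy, ny, f, no) as the comp field, uses a plain list for RAM, and does no bounds checking. The names alu, run, and ADD_2_3 are our own.

```python
def alu(x, y, control):
    """Hack-style ALU: six control bits (zx, nx, zy, ny, f, no) pick the function."""
    zx, nx, zy, ny, f, no = ((control >> shift) & 1 for shift in (5, 4, 3, 2, 1, 0))
    if zx:
        x = 0
    if nx:
        x = ~x & 0xFFFF
    if zy:
        y = 0
    if ny:
        y = ~y & 0xFFFF
    out = (x + y) & 0xFFFF if f else x & y
    if no:
        out = ~out & 0xFFFF
    return out


def run(rom, cycles):
    """Execute `cycles` instructions; the program must not run off the end of rom."""
    ram = [0] * 0x6001
    a = d = pc = 0
    for _ in range(cycles):
        word = rom[pc]                              # fetch
        if word >> 15 == 0:                         # decode: A-instruction
            a, pc = word, pc + 1
            continue
        y = ram[a] if (word >> 12) & 1 else a       # a-bit selects M or A
        out = alu(d, y, (word >> 6) & 0b111111)     # execute
        dest, jump = (word >> 3) & 0b111, word & 0b111
        target = a                                  # jump target: A before any update
        if dest & 0b001:
            ram[a] = out
        if dest & 0b010:
            d = out
        if dest & 0b100:
            a = out
        zr, ng = out == 0, bool(out & 0x8000)
        taken = ((jump & 0b100 and ng) or (jump & 0b010 and zr)
                 or (jump & 0b001 and not zr and not ng))
        pc = target if taken else pc + 1
    return a, d, pc, ram


ADD_2_3 = [int(bits, 2) for bits in (
    "0000000000000010",  # 0: @2
    "1110110000010000",  # 1: D=A
    "0000000000000011",  # 2: @3
    "1110000010010000",  # 3: D=D+A
    "0000000000000000",  # 4: @0
    "1110001100001000",  # 5: M=D
    "0000000000000110",  # 6: @6
    "1110101010000111",  # 7: 0;JMP
)]

a, d, pc, ram = run(ADD_2_3, cycles=20)
print("R0 =", ram[0], "D =", d)
```

After the first six instructions R0 holds 5; the remaining cycles just spin in the two-instruction loop at addresses 6 and 7.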

Memory-mapped I/O

How does the CPU interact with the outside world? Through memory-mapped I/O. Certain memory addresses don't store data. Instead, they connect to hardware:

Address Range    Purpose
0x0000-0x3FFF    RAM (16K words)
0x4000-0x5FFF    Screen (8K words)
0x6000           Keyboard (1 word)

Writing to screen memory turns on pixels. Reading the keyboard address returns the currently pressed key.

Memory-Mapped I/O (interactive demo): screen memory occupies addresses 16384-24575 (0x4000-0x5FFF); the demo displays a 256 × 64 pixel region of the screen.

This elegant design means the CPU doesn't need special "draw pixel" or "read keyboard" instructions. Memory operations handle everything.
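
One way to picture the memory map is as a lookup from address to device. The Python sketch below is only an illustration of that routing, using the address ranges from the table; the function name device_for is ours.

```python
def device_for(address):
    """Return which component a data-memory address refers to."""
    if 0x0000 <= address <= 0x3FFF:
        return "RAM"
    if 0x4000 <= address <= 0x5FFF:
        return "Screen"
    if address == 0x6000:
        return "Keyboard"
    raise ValueError(f"address {address:#06x} is outside the memory map")


for addr in (100, 0x4000, 0x5FFF, 0x6000):
    print(f"{addr:#06x} -> {device_for(addr)}")
```

From the program's point of view nothing changes between these cases: the same M=D instruction that writes RAM[100] also writes a screen word.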

Drawing to the screen

The screen is 512×256 pixels. Each 16-bit word controls 16 horizontal pixels, so each row of the screen takes 512 / 16 = 32 words. To draw a pixel at position (x, y):

1. Calculate word address: SCREEN + (y * 32) + (x / 16), using integer division, where SCREEN = 0x4000
2. Calculate bit position: x mod 16
3. Set that bit to 1

Writing 0xFFFF (-1 in two's complement) to a screen address turns on 16 pixels.
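
The pixel arithmetic can be checked with a short Python sketch. It assumes that bit 0 (the least significant bit) of each word is the leftmost of its 16 pixels, which is the usual Hack convention; the helper names pixel_location, set_pixel, and to_signed are ours.

```python
SCREEN = 0x4000          # base address of screen memory
WORDS_PER_ROW = 32       # 512 pixels per row / 16 pixels per word


def pixel_location(x, y):
    """Return (word address, bit position) for the pixel at (x, y)."""
    if not (0 <= x < 512 and 0 <= y < 256):
        raise ValueError("pixel is off screen")
    return SCREEN + y * WORDS_PER_ROW + x // 16, x % 16


def set_pixel(ram, x, y):
    """Turn on one pixel by setting its bit, leaving the word's other bits alone."""
    address, bit = pixel_location(x, y)
    ram[address] |= 1 << bit


def to_signed(word):
    """Interpret a 16-bit word as a two's complement number."""
    return word - 0x10000 if word & 0x8000 else word


ram = [0] * 0x6001
set_pixel(ram, 17, 1)
print(pixel_location(17, 1), ram[pixel_location(17, 1)[0]])
print(pixel_location(511, 255))   # bottom-right pixel: last screen word
print(to_signed(0xFFFF))
```

The bottom-right pixel lands in word 24575, the last address of the screen range, which is a quick sanity check on the formula.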

Computer architecture

Let's zoom out and see how all the pieces fit together:

Computer Architecture (interactive diagram): CPU, ROM, RAM, Screen, and Keyboard, connected by paths labeled "instructions" and "data".

The CPU reads instructions from ROM and data from RAM. Screen and keyboard are memory-mapped.


This is a Harvard architecture: instruction memory (ROM) is separate from data memory (RAM). Because the two memories are independent, the CPU can fetch an instruction and access data in the same clock cycle.

Key connections:

  • CPU reads instructions from ROM using PC
  • CPU reads and writes data in RAM at the address held in the A register
  • Screen and keyboard are mapped into the RAM address space

The CPU internals

Inside the CPU:

  • A Register: Address or data
  • D Register: Data only
  • PC: Program counter
  • ALU: Performs all computations

The A register is special. It can either be used as data (for computations) or as an address (to access RAM). This dual-purpose design keeps the instruction set simple.

Putting it together: a multiplication example

Let's trace through a program that multiplies two numbers. Since our CPU has no multiply instruction, we use repeated addition:

// Multiply R0 by R1, store result in R2
@R2
M=0       // R2 = 0 (result)

(LOOP)
@R1
D=M       // D = R1
@END
D;JEQ     // if R1 == 0, goto END

@R0
D=M       // D = R0
@R2
M=D+M     // R2 = R2 + R0

@R1
M=M-1     // R1--

@LOOP
0;JMP     // goto LOOP

(END)
@END
0;JMP     // infinite loop (halt)

This program:

  1. Initializes R2 to 0
  2. Loops R1 times, adding R0 to R2 each iteration
  3. Halts when R1 reaches 0

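Each A- or C-instruction becomes exactly one 16-bit word in ROM. Labels such as (LOOP) and (END), along with comments and blank lines, generate no code; a label simply names the address of the instruction that follows it.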
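
For comparison, here is the same repeated-addition loop as a Python sketch, with each step annotated with the assembly it mirrors. The 16-bit masking models the register width; the function name multiply is ours. Like the assembly version, it assumes R1 is not negative.

```python
def multiply(r0, r1):
    """Multiply by repeated addition, mirroring the assembly loop above."""
    r2 = 0                          # @R2 / M=0
    while r1 != 0:                  # @R1 / D=M / @END / D;JEQ
        r2 = (r2 + r0) & 0xFFFF     # @R0 / D=M / @R2 / M=D+M
        r1 = (r1 - 1) & 0xFFFF      # @R1 / M=M-1
    return r2                       # (END)


print(multiply(6, 7))
print(multiply(5, 0))
```

Note that the loop runs R1 times, so putting the smaller operand in R1 makes the program finish sooner.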

The pain of binary

By now you've probably noticed: writing programs in binary is tedious. Even simple operations require carefully crafted bit patterns.

Consider adding 5 + 3:

0000000000000101  // @5
1110110000010000  // D=A
0000000000000011  // @3
1110000010010000  // D=D+A
0000000000000000  // @R0
1110001100001000  // M=D

Who wants to write 1110000010010000 when they mean D=D+A?

This is where assembly language comes in. Instead of binary, we write:

@5
D=A
@3
D=D+A
@R0
M=D

An assembler translates this human-readable code into binary. That's exactly what we'll build in Part 3.
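
As a small preview of that translation, the Python sketch below encodes only the handful of mnemonics used in this example. It is not the assembler from Part 3: the lookup tables are deliberately incomplete, they assume the standard Hack encodings, and the name assemble_line is ours.

```python
# Partial lookup tables: just enough for the example program.
COMP = {"0": "0101010", "A": "0110000", "D": "0001100", "D+A": "0000010"}
DEST = {"": "000", "M": "001", "D": "010", "A": "100"}
JUMP = {"": "000", "JEQ": "010", "JMP": "111"}
SYMBOLS = {f"R{i}": i for i in range(16)}   # R0..R15 name addresses 0..15


def assemble_line(line):
    """Translate one assembly instruction into a 16-character bit string."""
    line = line.split("//")[0].strip()       # drop comments and whitespace
    if line.startswith("@"):
        symbol = line[1:]
        value = SYMBOLS[symbol] if symbol in SYMBOLS else int(symbol)
        return format(value, "016b")          # A-instruction: 0 + 15-bit value
    dest, comp = line.split("=") if "=" in line else ("", line)
    comp, jump = comp.split(";") if ";" in comp else (comp, "")
    return "111" + COMP[comp] + DEST[dest] + JUMP[jump]


program = ["@5", "D=A", "@3", "D=D+A", "@R0", "M=D"]
for source in program:
    print(assemble_line(source), " //", source)
```

Running it reproduces the six bit patterns listed earlier for 5 + 3, which is a handy way to double-check a hand-written encoding.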

What we've learned

In this part, we explored:

  1. Machine language: How 16 bits encode instructions
  2. A and C instructions: Address loading vs. computation
  3. The fetch-decode-execute cycle: The heartbeat of the CPU
  4. Program counter and jumps: Control flow
  5. Memory-mapped I/O: Connecting to the world
  6. Computer architecture: How CPU, ROM, RAM, and I/O fit together

We now understand how the hardware from Part 1 becomes programmable. But programming in binary is painful.

In Part 3, we'll build an assembler that translates human-readable assembly into machine code, completing our journey from NAND gates to software.


This is Part 2 of a 3-part series on building a computer from first principles.