"I speak Spanish to God, Italian to women, French to men, and German to my horse."
: Charles V, Holy Roman Emperor (1500–1558)
You might think that the languages of computers would be as diverse as those of people, but in reality computer languages are quite similar, more like regional dialects than independent languages. Hence, once you learn one, it is easy to pick up others.
In this article, we'll explore the MIPS instruction set, an elegant example of the instruction sets designed since the 1980s. We'll see how high-level code becomes the binary patterns that processors actually execute.
By the end, you'll understand:
"There must certainly be instructions for performing the fundamental arithmetic operations."
: Burks, Goldstine, and von Neumann, 1946
Every computer must be able to perform arithmetic. The MIPS assembly language notation for addition is:
add a, b, c # a = b + c
This instructs the computer to add the two variables b and c and put their sum in a.
This notation is rigid: each MIPS arithmetic instruction performs only one operation and must always have exactly three variables. Want to add four variables? You need multiple instructions:
add a, b, c # a = b + c
add a, a, d # a = a + d = b + c + d
add a, a, e # a = a + e = b + c + d + e
Why this rigidity? Requiring every instruction to have exactly three operands conforms to a fundamental principle of hardware design:
Simplicity favors regularity.
Hardware for a variable number of operands would be more complicated than hardware for a fixed number. Keeping the instruction format regular makes the processor simpler, faster, and cheaper.
A compiler translates high-level code to assembly by breaking complex expressions into multiple instructions. It creates temporary variables (stored in temporary registers) to hold intermediate results.
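The splitting step can be sketched in a few lines of Python. This is a toy illustration, not how a real compiler is built: the nested-tuple expression format, the function name `compile_expr`, and the temporary names `t0`, `t1`, ... are all assumptions made for this example.

```python
from itertools import count

def compile_expr(expr, dest, code, temps):
    """Emit three-operand instructions for a nested (op, left, right) tuple.

    Leaves are variable names; `dest` names the variable that receives
    the result of this (sub)expression.
    """
    op, left, right = expr
    operands = []
    for sub in (left, right):
        if isinstance(sub, tuple):
            # A nested expression needs its own temporary for the result.
            tmp = f"t{next(temps)}"
            compile_expr(sub, tmp, code, temps)
            operands.append(tmp)
        else:
            operands.append(sub)
    code.append(f"{op} {dest}, {operands[0]}, {operands[1]}")
    return code

# a = (b + c) + (d + e)
expr = ("add", ("add", "b", "c"), ("add", "d", "e"))
program = compile_expr(expr, "a", [], count())
print("\n".join(program))
```

Each emitted line has exactly one operation and three operands. The sketch never reuses a temporary, so a long expression would need more temporaries than a real machine provides.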
High-level languages have variables, but processors don't work with variables. So where do the operands of arithmetic instructions come from?
Unlike variables in high-level languages, the operands of arithmetic instructions are restricted. They must come from a limited number of special locations built directly into the hardware, called registers.
Registers are the fastest form of storage in a computer. The MIPS architecture has 32 registers, each 32 bits wide. This 32-bit quantity is called a word.
Variables map to registers. The compiler assigns frequently used variables to registers for fast access. Less frequently used data lives in memory.
Why only 32 registers? This brings us to our second design principle:
Smaller is faster.
A very large number of registers would increase the clock cycle time because electronic signals take longer to travel farther. The designer must balance the programmer's desire for more registers with the need for speed.
MIPS register names start with a dollar sign, usually followed by two characters. The convention reflects usage:
$s0–$s7: Saved registers for long-lived variables
$t0–$t9: Temporaries for intermediate calculations
$a0–$a3: Arguments to functions
$v0–$v1: Return values from functions
$zero: Hardwired to constant 0
Let's compile a C assignment using actual register names. If variables f, g, h, i, and j are assigned to registers $s0–$s4:
f = (g + h) - (i + j);
This compiles to:
add $t0, $s1, $s2 # $t0 = g + h
add $t1, $s3, $s4 # $t1 = i + j
sub $s0, $t0, $t1 # f = $t0 - $t1
Programming languages have arrays and structures that contain far more data than 32 registers can hold. This data lives in memory, a large, single-dimensional array where each location has an address.
MIPS includes data transfer instructions that move data between memory and registers:
lw (load word): Copies data from memory to a register
sw (store word): Copies data from a register to memory
MIPS addresses individual bytes, not words. Since a word is 4 bytes, word addresses are always multiples of 4. This is called alignment.
# Array A starts at address in $s3
# To access A[8], we need byte offset 8 × 4 = 32
lw $t0, 32($s3) # $t0 = A[8]
The address is computed as base register + offset: the offset (32) is added to the contents of $s3 to form the memory address.
8-bit bytes are useful for characters and many other purposes. Virtually all modern architectures address individual bytes. The tradeoff is that accessing element i of a word array requires multiplying the index by 4 to get its byte offset.
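The address arithmetic can be modelled in Python with a byte array standing in for memory. This is a sketch under stated assumptions: the helper names `load_word` and `store_word`, the base address 64, and the big-endian byte order are choices made for this example (MIPS hardware can be configured for either byte order).

```python
import struct

WORD_BYTES = 4

def _check_aligned(address: int) -> None:
    # Word accesses must use an address that is a multiple of 4.
    if address % WORD_BYTES != 0:
        raise ValueError(f"unaligned word address: {address}")

def load_word(memory: bytearray, base: int, offset: int) -> int:
    """Model lw: read the 32-bit word at byte address base + offset."""
    address = base + offset
    _check_aligned(address)
    return struct.unpack_from(">i", memory, address)[0]

def store_word(memory: bytearray, base: int, offset: int, value: int) -> None:
    """Model sw: write a 32-bit word at byte address base + offset."""
    address = base + offset
    _check_aligned(address)
    struct.pack_into(">i", memory, address, value)

memory = bytearray(256)
base_a = 64  # hypothetical contents of $s3: the start of array A

# Fill A[0..9]; element i lives at byte offset i * 4.
for index in range(10):
    store_word(memory, base_a, index * WORD_BYTES, index * 100)

a8 = load_word(memory, base_a, 8 * WORD_BYTES)  # same offset as lw $t0, 32($s3)
print(a8)
```

Asking for offset 30 instead of 32 raises an error in this model, which mirrors the alignment rule for word accesses.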
Many programs use small constants. Loading them from memory would be wasteful. MIPS provides immediate instructions with constants built in:
addi $s3, $s3, 4 # $s3 = $s3 + 4
The constant 0 is so common that MIPS dedicates register $zero to always hold 0. This makes operations like "move" trivial:
add $t0, $s0, $zero # $t0 = $s0 (copy)
Computers represent numbers as sequences of bits. A 32-bit word can represent 2³² different patterns. How we interpret these patterns determines whether they're signed or unsigned.
For signed integers, MIPS uses two's complement representation. The key insight: the leftmost bit indicates the sign (0 = non-negative, 1 = negative).
To negate a number in two's complement: invert all bits, then add 1.
# Negate 2 (in 8 bits)
2 = 0000 0010
# Step 1: Invert all bits
1111 1101
# Step 2: Add 1
1111 1110 = -2
Two's complement has one negative number (-2³¹) with no positive counterpart. But it makes addition and subtraction simpler in hardware: the same adder circuit produces correct results for both signed and unsigned operands.
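The invert-and-add-1 rule is easy to check in Python, as long as the result is masked to a fixed width (Python integers are otherwise unbounded). The function names `negate` and `as_signed` are illustrative choices for this sketch.

```python
def negate(pattern: int, bits: int = 32) -> int:
    """Return the two's complement negation of a bit pattern."""
    mask = (1 << bits) - 1
    inverted = ~pattern & mask      # step 1: invert all bits
    return (inverted + 1) & mask    # step 2: add 1, discarding any carry out

def as_signed(pattern: int, bits: int = 32) -> int:
    """Interpret a bit pattern as a two's complement signed integer."""
    sign_bit = 1 << (bits - 1)
    return pattern - (1 << bits) if pattern & sign_bit else pattern

neg_two = negate(0b0000_0010, bits=8)
print(f"{neg_two:08b}", as_signed(neg_two, bits=8))

# The most negative 32-bit value negates to itself: it has no positive twin.
most_negative = 1 << 31
print(negate(most_negative) == most_negative, as_signed(most_negative))
```

The same bit pattern reads as 254 when treated as unsigned and as -2 when treated as signed; only the interpretation differs.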
When a 16-bit immediate needs to be used with 32-bit registers, the sign bit is replicated to fill the extra bits. This preserves the number's value.
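This replication is usually called sign extension. A minimal Python sketch, with `sign_extend_16` as a name chosen for this example:

```python
def sign_extend_16(immediate: int) -> int:
    """Extend a 16-bit pattern to 32 bits by replicating bit 15."""
    immediate &= 0xFFFF
    if immediate & 0x8000:
        # Negative value: fill the upper 16 bits with ones.
        return immediate | 0xFFFF0000
    # Non-negative value: the upper 16 bits stay zero.
    return immediate

print(f"{sign_extend_16(0x0004):#010x}")  # +4 as a 32-bit pattern
print(f"{sign_extend_16(0xFFFC):#010x}")  # -4 as a 32-bit pattern
```

Both results still represent +4 and -4 when read as 32-bit two's complement numbers, which is the sense in which the value is preserved.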
We've written instructions as text like add $t0, $s1, $s2. But computers only understand binary. How do we represent instructions as numbers?
Instructions are stored as 32-bit binary patterns. Each instruction is divided into fields that encode the operation and operands.
MIPS uses fixed-size 32-bit instructions with standardized field layouts. Register names become 5-bit numbers, operations become opcodes, and the pieces combine into machine code.
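For R-type instructions (register operations), the fields are op (6 bits), rs (5 bits), rt (5 bits), rd (5 bits), shamt (5 bits), and funct (6 bits).
For I-type instructions (immediate operations and memory access), the fields are op (6 bits), rs (5 bits), rt (5 bits), and a 16-bit immediate.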
Good design demands good compromises.
MIPS uses different formats for different instruction types. This complicates the hardware slightly, but the alternative, limiting immediates to 5 bits or having variable-length instructions, would be worse.
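To make the two formats concrete, here is a small Python encoder. It is a sketch covering only a handful of instructions from this article; the function names `encode_r` and `encode_i` and the table layout are assumptions of the example, while the register numbers, opcodes, and function codes follow the standard MIPS assignments.

```python
# Register name -> 5-bit register number (standard MIPS numbering).
REGISTERS = {"$zero": 0, "$v0": 2, "$v1": 3, "$t8": 24, "$t9": 25,
             "$sp": 29, "$ra": 31}
REGISTERS.update({f"$a{i}": 4 + i for i in range(4)})
REGISTERS.update({f"$t{i}": 8 + i for i in range(8)})
REGISTERS.update({f"$s{i}": 16 + i for i in range(8)})

FUNCT = {"add": 0x20, "sub": 0x22}    # R-type: opcode is 0, funct selects the operation
OPCODE = {"addi": 0x08, "lw": 0x23}   # I-type: opcode selects the operation

def encode_r(name: str, rd: str, rs: str, rt: str, shamt: int = 0) -> int:
    """R-type layout: op(6)=0 | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)."""
    return (REGISTERS[rs] << 21 | REGISTERS[rt] << 16 | REGISTERS[rd] << 11
            | (shamt & 0x1F) << 6 | FUNCT[name])

def encode_i(name: str, rt: str, rs: str, immediate: int) -> int:
    """I-type layout: op(6) | rs(5) | rt(5) | immediate(16)."""
    return (OPCODE[name] << 26 | REGISTERS[rs] << 21 | REGISTERS[rt] << 16
            | (immediate & 0xFFFF))

print(f"{encode_r('add', '$s0', '$s1', '$s2'):#010x}")  # add  $s0, $s1, $s2
print(f"{encode_i('addi', '$s3', '$s3', 4):#010x}")     # addi $s3, $s3, 4
print(f"{encode_i('lw', '$t0', '$s3', 32):#010x}")      # lw   $t0, 32($s3)
```

Note that the assembly operand order (destination first) differs from the field order in the word, which is why the encoder shuffles its arguments.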
Besides arithmetic, processors need operations that work on individual bits:
sll $t0, $s0, 4 # $t0 = $s0 << 4 (shift left)
srl $t0, $s0, 4 # $t0 = $s0 >> 4 (shift right)
and $t0, $s0, $s1 # $t0 = $s0 & $s1 (bitwise AND)
or $t0, $s0, $s1 # $t0 = $s0 | $s1 (bitwise OR)
nor $t0, $s0, $s1 # $t0 = ~($s0 | $s1) (bitwise NOR)
Shifting left by n bits multiplies by 2ⁿ. Shifting right logically by n bits divides an unsigned value by 2ⁿ, discarding the remainder. Shifts are much faster than actual multiplication and division.
MIPS doesn't have a NOT instruction, but NOR with $zero achieves the same effect:
nor $t0, $s0, $zero # $t0 = ~$s0
This is another example of keeping the instruction set minimal while remaining powerful.
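These operations map directly onto Python's bitwise operators, provided each result is masked back to 32 bits. The helper names below mirror the MIPS mnemonics but are otherwise just an illustrative sketch.

```python
MASK32 = 0xFFFFFFFF

def sll(value: int, amount: int) -> int:
    """Shift left logical: bits shifted past bit 31 are lost."""
    return (value << amount) & MASK32

def srl(value: int, amount: int) -> int:
    """Shift right logical: zeros fill in from the left."""
    return (value & MASK32) >> amount

def nor(a: int, b: int) -> int:
    """Bitwise NOR, truncated to 32 bits."""
    return ~(a | b) & MASK32

s0 = 0x000000F0
print(hex(sll(s0, 4)))   # multiply by 16
print(hex(srl(s0, 4)))   # divide by 16
print(hex(nor(s0, 0)))   # NOT, via NOR with zero
```

The last call shows why no separate NOT instruction is needed: OR-ing with zero leaves the value unchanged, so NOR with zero simply inverts every bit.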
Computers need conditional execution. MIPS provides branch instructions that compare registers and jump if a condition is true:
beq $s0, $s1, Label # if ($s0 == $s1) goto Label
bne $s0, $s1, Label # if ($s0 != $s1) goto Label
Combined with slt (set on less than), these handle all comparisons:
slt $t0, $s0, $s1 # $t0 = ($s0 < $s1) ? 1 : 0
bne $t0, $zero, Less # if $s0 < $s1, goto Less
A simple while loop in C:
while (save[i] == k)
i += 1;
With i in $s3, k in $s5, and the base address of save in $s6, this becomes:
Loop:
sll $t1, $s3, 2 # $t1 = i × 4
add $t1, $t1, $s6 # $t1 = address of save[i]
lw $t0, 0($t1) # $t0 = save[i]
bne $t0, $s5, Exit # if save[i] != k, exit
addi $s3, $s3, 1 # i += 1
j Loop # repeat
Exit:
Procedures (functions) require saving and restoring state. MIPS provides:
jal (jump and link): Calls a procedure, saving the return address in $ra
jr (jump register): Returns to the saved address
The caller places arguments in $a0–$a3. The callee returns results in $v0–$v1.
# Calling a procedure
addi $a0, $zero, 5 # argument = 5
jal factorial # call factorial(5)
# $v0 now contains the result
factorial:
# procedure body
jr $ra # return
When procedures call other procedures, or need more than 4 arguments, they use the stack, a region of memory pointed to by $sp.
Saved registers ($s0–$s7) must be preserved across calls. If a procedure uses them, it must save the old values on the stack and restore them before returning.
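The save-and-restore discipline can be modelled in Python with a dictionary for registers and another for stack memory. Everything here is a hypothetical illustration: the `push` and `pop` helpers, the starting $sp value, and the `callee` body are invented for this sketch. It does follow the MIPS convention that the stack grows toward lower addresses.

```python
registers = {"$sp": 0x100, "$s0": 42, "$a0": 5, "$v0": 0}
stack = {}  # byte address -> word, standing in for stack memory

def push(reg: str) -> None:
    """Make room for one word, then store the register there."""
    registers["$sp"] -= 4
    stack[registers["$sp"]] = registers[reg]

def pop(reg: str) -> None:
    """Reload the register from the top of the stack, then release the word."""
    registers[reg] = stack[registers["$sp"]]
    registers["$sp"] += 4

def callee() -> None:
    """Hypothetical procedure that needs $s0 as scratch space."""
    push("$s0")                               # save the caller's $s0
    registers["$s0"] = registers["$a0"] * 2   # overwrite $s0 freely
    registers["$v0"] = registers["$s0"] + 1   # return value goes in $v0
    pop("$s0")                                # restore before returning

callee()
print(registers["$s0"], registers["$v0"], hex(registers["$sp"]))
```

After the call, $s0 and $sp hold the same values as before, so the caller cannot tell that the callee used $s0 at all.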
We've seen that instructions are just numbers. This leads to a profound insight: programs can be stored in memory as numbers, exactly like data.
This is the stored-program concept, and it makes computers universal.
A computer's memory can contain both programs and the data they operate on. All of it is just numbers in memory. The same hardware can run any program.
Let's trace the complete journey from high-level code to execution:
# The journey of: a = b + c
# 1. C code
a = b + c;
# 2. Assembly (compiler output)
add $s0, $s1, $s2
# 3. Machine code (assembler output)
0000 0010 0011 0010 1000 0000 0010 0000
# 4. Hexadecimal
0x02328020
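Running the journey in reverse is a useful check. The Python sketch below pulls the R-type fields back out of the hexadecimal word; `decode_r` is a name invented for this example, and it assumes the word really is an R-type instruction.

```python
def decode_r(word: int) -> dict:
    """Split a 32-bit word into R-type fields by shifting and masking."""
    return {
        "op": (word >> 26) & 0x3F,     # bits 31..26
        "rs": (word >> 21) & 0x1F,     # bits 25..21
        "rt": (word >> 16) & 0x1F,     # bits 20..16
        "rd": (word >> 11) & 0x1F,     # bits 15..11
        "shamt": (word >> 6) & 0x1F,   # bits 10..6
        "funct": word & 0x3F,          # bits 5..0
    }

fields = decode_r(0x02328020)
print(fields)
```

The fields come back as op 0 and funct 32 (add), with rs 17, rt 18, and rd 16, which are the register numbers of $s1, $s2, and $s0.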
We've explored how computers understand and execute instructions:
These principles, established in the 1940s, still underpin every computer today. The specifics vary across ARM, x86, and RISC-V, but the core ideas remain the same.
The really decisive considerations in selecting an instruction set are simplicity of the equipment and the clarity of its application to important problems. That is as true today as it was in 1946.