Instructions: Language of the Computer

"I speak Spanish to God, Italian to women, French to men, and German to my horse."

Charles V, Holy Roman Emperor (1500–1558)

You might think that the languages of computers would be as diverse as those of people, but in reality computer languages are quite similar, more like regional dialects than independent languages. Hence, once you learn one, it is easy to pick up others.

In this article, we'll explore the MIPS instruction set, an elegant example of the instruction sets designed since the 1980s. We'll see how high-level code becomes the binary patterns that processors actually execute.

What you'll learn

By the end, you'll understand:

How arithmetic operations work at the hardware level
The 32 MIPS registers and their purposes
How memory is addressed and accessed
Binary and two's complement number representation
How assembly instructions become machine code
The stored-program concept that makes computers universal

Operations of the Computer Hardware

"There must certainly be instructions for performing the fundamental arithmetic operations."

Burks, Goldstine, and von Neumann, 1946

Every computer must be able to perform arithmetic. The MIPS assembly language notation for addition is:

add a, b, c    # a = b + c

This instructs the computer to add the two variables b and c and put their sum in a.

This notation is rigid: each MIPS arithmetic instruction performs only one operation and must always have exactly three variables. Want to add four variables? You need multiple instructions:

add a, b, c    # a = b + c
add a, a, d    # a = a + d = b + c + d
add a, a, e    # a = a + e = b + c + d + e

Why this rigidity? Requiring every instruction to have exactly three operands conforms to a fundamental principle of hardware design:

Design Principle 1

Simplicity favors regularity.

Hardware for a variable number of operands would be more complicated than hardware for a fixed number. Keeping the instruction format regular makes the processor simpler, faster, and cheaper.

Compiling C to MIPS

A compiler translates each high-level statement into one or more assembly instructions. It must break complex expressions into multiple instructions, creating temporary variables (stored in temporary registers) to hold intermediate results.
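
The decomposition into single-operation instructions can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how a real compiler is built: expressions are nested tuples such as ("+", "g", "h"), only + and - are handled, and the names compile_assignment, MNEMONIC, and the temporaries t0, t1, ... are hypothetical.

```python
from itertools import count

# Hypothetical mapping from C operators to MIPS mnemonics (only + and -).
MNEMONIC = {"+": "add", "-": "sub"}


def compile_assignment(target, expr):
    """Lower `target = expr` into three-operand instructions.

    `expr` is assumed to be a nested tuple (op, left, right) whose
    leaves are variable names.
    """
    instructions = []
    temps = count()  # yields 0, 1, 2, ... for temporary names

    def lower(node, dest=None):
        if isinstance(node, str):
            return node  # a plain variable needs no instruction
        op, left, right = node
        a = lower(left)
        b = lower(right)
        if dest is None:
            dest = f"t{next(temps)}"  # intermediate result gets a temporary
        instructions.append(f"{MNEMONIC[op]} {dest}, {a}, {b}")
        return dest

    lower(expr, target)
    return instructions
```

For example, compile_assignment("f", ("-", ("+", "g", "h"), ("+", "i", "j"))) yields two add instructions into temporaries followed by one sub into f.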

Operands: The Computer's Registers

The Problem

High-level languages have variables, but processors don't work with variables. So where do the operands of arithmetic instructions come from?

Unlike high-level languages, the operands of arithmetic instructions are restricted. They must come from a limited number of special locations built directly into the hardware called registers.

Registers are the fastest form of storage in a computer. The MIPS architecture has 32 registers, each 32 bits wide. This 32-bit quantity is called a word.

Solution

Variables map to registers. The compiler assigns frequently-used variables to registers for fast access. Less frequently-used data lives in memory.

Why only 32 registers? This brings us to our second design principle:

Design Principle 2

Smaller is faster.

A very large number of registers would increase the clock cycle time because electronic signals take longer to travel farther. The designer must balance the programmer's desire for more registers with the need for speed.

MIPS register names begin with a dollar sign, usually followed by two characters. The naming convention reflects usage:

$s0–$s7: Saved registers for long-lived variables
$t0–$t9: Temporaries for intermediate calculations
$a0–$a3: Arguments to functions
$v0–$v1: Return values from functions
$zero: Hardwired to constant 0

Using Registers

Let's compile a C assignment using actual register names. If variables f, g, h, i, and j are assigned to registers $s0–$s4:

f = (g + h) - (i + j);

Compiles to:

add $t0, $s1, $s2    # $t0 = g + h
add $t1, $s3, $s4    # $t1 = i + j
sub $s0, $t0, $t1    # f = $t0 - $t1

Memory Operands

Programming languages have arrays and structures that contain far more data than 32 registers can hold. This data lives in memory, a large, single-dimensional array where each location has an address.

MIPS includes data transfer instructions that move data between memory and registers:

lw (load word): Copies data from memory to a register
sw (store word): Copies data from a register to memory

Byte Addressing and Alignment

MIPS addresses individual bytes, not words. Since a word is 4 bytes, the addresses of consecutive words differ by 4, and every word must start at an address that is a multiple of 4. This requirement is called an alignment restriction.

# Array A starts at address in $s3
# To access A[8], we need byte offset 8 × 4 = 32
 
lw $t0, 32($s3)    # $t0 = A[8]

The address is computed as: base register + offset. The offset (32) is added to the contents of $s3 to form the memory address.
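
The address arithmetic can be checked with a small Python sketch. The helper names and the base address 0x10008000 are made up for illustration; only the rule itself (byte offset = index × 4, address = base + offset) comes from the text.

```python
WORD_BYTES = 4  # a MIPS word is 4 bytes


def element_offset(index):
    """Byte offset of word element `index` from the start of an array."""
    return index * WORD_BYTES


def effective_address(base, offset):
    """Address used by lw/sw: contents of the base register plus the offset."""
    return base + offset


def is_word_aligned(address):
    """True when `address` is a multiple of 4."""
    return address % WORD_BYTES == 0


base = 0x10008000  # hypothetical contents of $s3
address = effective_address(base, element_offset(8))  # address of A[8]
```

Here element_offset(8) is 32, the same offset that appears in lw $t0, 32($s3), and the resulting address is word-aligned as long as the base is.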

Why byte addressing?

8-bit bytes are useful for characters and many other purposes. Virtually all modern architectures address individual bytes. The tradeoff is that accessing element i of an array of 32-bit words requires multiplying the index by 4 to get the byte offset.

Immediate Operands

Many programs use small constants. Loading them from memory would be wasteful. MIPS provides immediate instructions with constants built in:

addi $s3, $s3, 4    # $s3 = $s3 + 4

The constant 0 is so common that MIPS dedicates register $zero to always hold 0. This makes operations like "move" trivial:

add $t0, $s0, $zero    # $t0 = $s0 (copy)

Signed and Unsigned Numbers

Computers represent numbers as sequences of bits. A 32-bit word can represent 2³² different patterns. How we interpret these patterns determines whether they're signed or unsigned.

Two's Complement

For signed integers, MIPS uses two's complement representation. The key insight: the leftmost bit indicates the sign (0 = zero or positive, 1 = negative).

To negate a number in two's complement: invert all bits, then add 1.

# Negate 2 (in 8 bits)
2  = 0000 0010

# Step 1: Invert all bits
     1111 1101

# Step 2: Add 1
     1111 1110 = -2

Two's complement has one negative number (-2³¹) with no positive counterpart. In exchange, it simplifies the hardware: the same adder circuit produces the correct result for both signed and unsigned addition and subtraction.
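
A short Python sketch makes these properties concrete. The function names are hypothetical, and Python's unbounded integers are masked to a fixed width to imitate a register.

```python
def negate(pattern, bits=32):
    """Two's complement negation: invert all bits, then add 1."""
    mask = (1 << bits) - 1
    return ((pattern ^ mask) + 1) & mask


def to_signed(pattern, bits=32):
    """Interpret a bit pattern as a two's complement signed integer."""
    sign_bit = 1 << (bits - 1)
    return (pattern & (sign_bit - 1)) - (pattern & sign_bit)


def add32(a, b):
    """32-bit addition that ignores the carry out, as a hardware adder does."""
    return (a + b) & 0xFFFFFFFF
```

With these definitions, negate(2, 8) gives the pattern 1111 1110, negating -2³¹ gives back the same pattern (it has no positive counterpart), and add32 applied to the pattern for -2 and the pattern for 3 yields 1 without any sign-specific logic.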

Sign Extension

When a 16-bit immediate needs to be used with 32-bit registers, the sign bit is replicated to fill the extra bits. This preserves the number's value.
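
Sign extension can be sketched in Python as follows. The function name is hypothetical, and the result is returned as a 32-bit pattern rather than a Python signed integer.

```python
def sign_extend_16(immediate):
    """Widen a 16-bit pattern to 32 bits by replicating bit 15."""
    immediate &= 0xFFFF              # keep only the low 16 bits
    if immediate & 0x8000:           # sign bit set: fill the upper half with 1s
        return immediate | 0xFFFF0000
    return immediate                 # sign bit clear: upper half stays 0
```

The 16-bit pattern 0xFFFC (-4) becomes 0xFFFFFFFC, which is still -4 in 32 bits, while 0x0004 stays 4.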

Representing Instructions in the Computer

The Problem

We've written instructions as text like add $t0, $s1, $s2. But computers only understand binary. How do we represent instructions as numbers?

Instructions are stored as 32-bit binary patterns. Each instruction is divided into fields that encode the operation and operands.

Solution

MIPS uses fixed-size 32-bit instructions with standardized field layouts. Register names become 5-bit numbers, operations become opcodes, and the pieces combine into machine code.

MIPS Instruction Formats

R-type (Register)

Arithmetic and logical operations between registers, for example add $t0, $s1, $s2

Field	Bits	Description
op	6	Opcode (always 0 for R-type)
rs	5	First source register
rt	5	Second source register
rd	5	Destination register
shamt	5	Shift amount
funct	6	Function code

Total: 32 bits = 6 + 5 + 5 + 5 + 5 + 6

The MIPS Fields

For R-type instructions (register operations):

op (6 bits): Operation code. Always 0 for R-type.
rs (5 bits): First source register (0–31)
rt (5 bits): Second source register
rd (5 bits): Destination register
shamt (5 bits): Shift amount (for shift instructions)
funct (6 bits): Function code (specifies exact operation)

For I-type instructions (immediate operations and memory access):

op (6 bits): Operation code (identifies instruction)
rs (5 bits): Source/base register
rt (5 bits): Destination or source register
immediate (16 bits): Constant or address offset
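
As a rough sketch of how the fields pack into a word, the Python below encodes a few instructions. The function and table names are hypothetical and only a handful of registers and operations are included; the register numbers, function codes, and opcodes listed are the standard MIPS values for those instructions.

```python
# Register numbers from the standard MIPS convention (small subset).
REG = {"$zero": 0, "$t0": 8, "$t1": 9,
       "$s0": 16, "$s1": 17, "$s2": 18, "$s3": 19}

FUNCT = {"add": 0x20, "sub": 0x22}                 # R-type function codes
OPCODE = {"addi": 0x08, "lw": 0x23, "sw": 0x2B}    # I-type opcodes


def encode_r(name, rd, rs, rt, shamt=0):
    """Pack an R-type word: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return ((0 << 26) | (REG[rs] << 21) | (REG[rt] << 16)
            | (REG[rd] << 11) | (shamt << 6) | FUNCT[name])


def encode_i(name, rt, rs, immediate):
    """Pack an I-type word: op(6) rs(5) rt(5) immediate(16)."""
    return ((OPCODE[name] << 26) | (REG[rs] << 21) | (REG[rt] << 16)
            | (immediate & 0xFFFF))   # negative immediates wrap to 16 bits
```

Note that the argument order follows the assembly text (destination first), while the bit layout places rs and rt ahead of rd. For instance, encode_r("add", "$t0", "$s1", "$s2") corresponds to add $t0, $s1, $s2.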

Design Principle 3

Good design demands good compromises.

MIPS keeps every instruction the same length but uses different formats for different instruction types. This complicates the hardware slightly, but the alternatives (limiting immediates to 5 bits or using variable-length instructions) would be worse.

Logical Operations

Besides arithmetic, processors need operations that work on individual bits:

sll $t0, $s0, 4    # $t0 = $s0 << 4 (shift left)
srl $t0, $s0, 4    # $t0 = $s0 >> 4 (shift right)
and $t0, $s0, $s1  # $t0 = $s0 & $s1 (bitwise AND)
or  $t0, $s0, $s1  # $t0 = $s0 | $s1 (bitwise OR)
nor $t0, $s0, $s1  # $t0 = ~($s0 | $s1) (bitwise NOR)

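Shifting left by n bits multiplies by 2ⁿ, as long as no significant bits are shifted out. Shifting right with srl divides an unsigned value by 2ⁿ, discarding the remainder. Shifts are much cheaper than general multiplication and division.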
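
The relationship between shifts and powers of two can be modeled in Python. The function names mirror the mnemonics but are otherwise hypothetical, and values are masked to 32 bits to imitate a register.

```python
MASK32 = 0xFFFFFFFF


def sll(value, amount):
    """Shift left logical: bits pushed past bit 31 are lost."""
    return (value << amount) & MASK32


def srl(value, amount):
    """Shift right logical: zeros are shifted in from the left."""
    return (value & MASK32) >> amount
```

Here sll(3, 4) is 48 (3 × 16) and srl(48, 4) is 3. Because srl fills with zeros, it treats the register as unsigned: srl(0xFFFFFFF0, 4) is 0x0FFFFFFF, not a negative number divided by 16.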

Why NOR instead of NOT?

MIPS doesn't have a NOT instruction, but NOR with $zero achieves the same effect:

nor $t0, $s0, $zero    # $t0 = ~$s0

This is another example of keeping the instruction set minimal while remaining powerful.
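
The NOR-with-zero trick is easy to verify with a Python model of a 32-bit register (the function name nor is simply borrowed from the mnemonic):

```python
MASK32 = 0xFFFFFFFF


def nor(a, b):
    """Bitwise NOR on 32-bit patterns: NOT (a OR b)."""
    return ~(a | b) & MASK32


inverted = nor(0x0000FFFF, 0)  # OR with 0 changes nothing, so this is NOT a
```

Since a | 0 equals a, NOR with zero reduces to plain inversion.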

Making Decisions

Computers need conditional execution. MIPS provides branch instructions that compare registers and jump if a condition is true:

beq $s0, $s1, Label    # if ($s0 == $s1) goto Label
bne $s0, $s1, Label    # if ($s0 != $s1) goto Label

Combined with slt (set on less than), these handle all comparisons:

slt $t0, $s0, $s1    # $t0 = ($s0 < $s1) ? 1 : 0
bne $t0, $zero, Less # if $s0 < $s1, goto Less
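
To see why slt plus beq/bne is enough, the following Python sketch derives the four ordering tests from a single slt model. The helper names are hypothetical; each one corresponds to an slt followed by a branch on whether the result is zero.

```python
def slt(a, b):
    """Set on less than: 1 if a < b, else 0."""
    return 1 if a < b else 0


def less_than(a, b):          # slt t, a, b ; bne t, $zero, Label
    return slt(a, b) != 0


def greater_than(a, b):       # slt t, b, a ; bne t, $zero, Label
    return slt(b, a) != 0


def less_or_equal(a, b):      # slt t, b, a ; beq t, $zero, Label
    return slt(b, a) == 0


def greater_or_equal(a, b):   # slt t, a, b ; beq t, $zero, Label
    return slt(a, b) == 0
```

Swapping the operands or switching between beq and bne covers every case, so no separate "branch if less than" hardware instruction is needed.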

Loops in Assembly

A simple while loop in C:

while (save[i] == k)
    i += 1;

Assuming i and k live in $s3 and $s5 and the base address of save is in $s6, this becomes:

Loop:
    sll  $t1, $s3, 2      # $t1 = i × 4
    add  $t1, $t1, $s6    # $t1 = address of save[i]
    lw   $t0, 0($t1)      # $t0 = save[i]
    bne  $t0, $s5, Exit   # if save[i] != k, exit
    addi $s3, $s3, 1      # i += 1
    j    Loop             # repeat
Exit:
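
The loop can be traced step by step with a Python model in which each statement stands for one assembly instruction. The function run_loop and the dictionary-based memory are illustrative assumptions; like the C original, the sketch assumes some element of save differs from k.

```python
def run_loop(save, i, k, base=0):
    """Mimic the assembly loop; returns the final value of i."""
    # Byte-addressed memory: word n of `save` lives at base + 4 * n.
    memory = {base + 4 * n: word for n, word in enumerate(save)}
    while True:
        t1 = i << 2          # sll  $t1, $s3, 2
        t1 = t1 + base       # add  $t1, $t1, $s6
        t0 = memory[t1]      # lw   $t0, 0($t1)
        if t0 != k:          # bne  $t0, $s5, Exit
            break
        i = i + 1            # addi $s3, $s3, 1
        # j Loop: fall through to the next iteration
    return i
```

For save = [7, 7, 7, 3, 7] and k = 7, starting at i = 0, the loop exits with i = 3, the index of the first element that is not equal to k.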

Supporting Procedures

Procedures (functions) require saving and restoring state. MIPS provides:

jal (jump and link): Calls a procedure, saving return address in $ra
jr (jump register): Returns to the saved address

The caller places arguments in $a0–$a3. The callee returns results in $v0–$v1.

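# Caller
addi $a0, $zero, 5    # argument = 5
jal  factorial        # call factorial(5); $ra = address of the next instruction
# $v0 now contains the result; the caller continues here

# Callee (located elsewhere in the program)
factorial:
    # procedure body
    jr $ra            # return to the caller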

The stack

When procedures call other procedures, or need more than 4 arguments, they use the stack, a region of memory pointed to by $sp.

Saved registers ($s0–$s7) must be preserved across calls. If a procedure uses them, it must save the old values on the stack and restore them before returning.
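
The save-and-restore discipline can be modeled in Python. Everything here is a simplified assumption for illustration: registers are a dictionary, memory is a dictionary keyed by byte address, and callee is a made-up procedure that needs $s0 as scratch space.

```python
def push(regs, memory, name):
    """Grow the stack by one word and store register `name` there."""
    regs["$sp"] -= 4                  # the MIPS stack grows toward lower addresses
    memory[regs["$sp"]] = regs[name]


def pop(regs, memory, name):
    """Reload register `name` from the top of the stack and shrink the stack."""
    regs[name] = memory[regs["$sp"]]
    regs["$sp"] += 4


def callee(regs, memory):
    """A hypothetical procedure that computes 2 * $a0 + 1 using $s0."""
    push(regs, memory, "$s0")         # save the caller's $s0
    regs["$s0"] = regs["$a0"] * 2     # $s0 is now free to use
    regs["$v0"] = regs["$s0"] + 1     # result goes in $v0
    pop(regs, memory, "$s0")          # restore $s0 before returning


regs = {"$sp": 0x7FFFFFFC, "$s0": 99, "$a0": 5, "$v0": 0}
memory = {}
callee(regs, memory)
```

After the call, $v0 holds the result while $s0 and $sp have the values the caller left in them, which is exactly what the convention promises.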

The Stored-Program Concept

We've seen that instructions are just numbers. This leads to a profound insight:

Instructions are represented as numbers.
Programs are stored in memory to be read or written, just like data.

This is the stored-program concept, and it makes computers universal.

A computer's memory can contain:

The source code for an editor
The compiled machine code for that editor
The text document being edited
The compiler that produced the machine code

All of it is just numbers in memory. The same hardware can run any program.
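
One way to make this tangible is a Python sketch in which the same list of integers holds both an instruction and a data value. The decode_r helper is hypothetical and only splits a word into the R-type fields; nothing in the number itself says whether it is code or data.

```python
def decode_r(word):
    """Split a 32-bit word into the six R-type fields."""
    return {
        "op": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


memory = [0x02328020, 42]      # word 0: an instruction; word 1: plain data
fields = decode_r(memory[0])   # interpret word 0 as an R-type instruction
```

Decoding memory[0] gives rs = 17, rt = 18, rd = 16, and funct = 32, the fields of an add on registers $s1, $s2, and $s0. The hardware treats a word as an instruction only because the processor fetches it as one.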

From C to Execution

Let's trace the complete journey from high-level code to execution:

C Compiler translates C to assembly language
Assembler translates assembly to machine code (object files)
Linker combines object files and libraries into an executable
Loader places the executable in memory and starts execution

# The journey of: a = b + c

# 1. C code
a = b + c;

# 2. Assembly (compiler output)
add $s0, $s1, $s2

# 3. Machine code (assembler output)
0000 0010 0011 0010 1000 0000 0010 0000

# 4. Hexadecimal
0x02328020
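
Steps 3 and 4 are two spellings of the same 32-bit number, which a couple of Python formatting helpers (hypothetical names) can confirm:

```python
def as_binary(word):
    """Format a 32-bit word as eight groups of four bits."""
    bits = f"{word:032b}"
    return " ".join(bits[i:i + 4] for i in range(0, 32, 4))


def as_hex(word):
    """Format a 32-bit word as eight hexadecimal digits."""
    return f"0x{word:08X}"


word = int("0000 0010 0011 0010 1000 0000 0010 0000".replace(" ", ""), 2)
```

Each group of four bits maps to exactly one hexadecimal digit, which is why hexadecimal is the usual shorthand for machine code.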

Summary

We've explored how computers understand and execute instructions:

Instruction sets define the vocabulary a processor understands. MIPS uses simple, regular instructions with three operands.
Registers provide fast, limited storage for operands. Memory holds data that doesn't fit in registers.
Binary representation uses two's complement for signed integers, allowing the same circuits to handle signed and unsigned addition and subtraction.
Instruction formats encode operations as 32-bit binary patterns. R-type for register operations, I-type for immediates and memory.
The stored-program concept means programs are data. This makes computers universal machines.

These principles, established in the 1940s, still underpin every computer today. The specifics vary across architectures such as ARM, x86, and RISC-V, but the core ideas remain the same.

The really decisive considerations in selecting an instruction set are simplicity of the equipment and the clarity of its application to important problems. That is as true today as it was in 1946.