Instructions: Language of the Computer

"I speak Spanish to God, Italian to women, French to men, and German to my horse."

: Charles V, Holy Roman Emperor (1500–1558)

You might think that the languages of computers would be as diverse as those of people, but in reality computer languages are quite similar, more like regional dialects than independent languages. Hence, once you learn one, it is easy to pick up others.

In this article, we'll explore the MIPS instruction set, an elegant example of the instruction sets designed since the 1980s. We'll see how high-level code becomes the binary patterns that processors actually execute.

What you'll learn

By the end, you'll understand:

  • How arithmetic operations work at the hardware level
  • The 32 MIPS registers and their purposes
  • How memory is addressed and accessed
  • Binary and two's complement number representation
  • How assembly instructions become machine code
  • The stored-program concept that makes computers universal

Operations of the Computer Hardware

"There must certainly be instructions for performing the fundamental arithmetic operations."

: Burks, Goldstine, and von Neumann, 1946

Every computer must be able to perform arithmetic. The MIPS assembly language notation for addition is:

add a, b, c    # a = b + c

This instructs the computer to add the two variables b and c and put their sum in a.

This notation is rigid: each MIPS arithmetic instruction performs only one operation and must always have exactly three variables. Want to add four variables? You need multiple instructions:

add a, b, c    # a = b + c
add a, a, d    # a = a + d = b + c + d
add a, a, e    # a = a + e = b + c + d + e

Why this rigidity? Requiring every instruction to have exactly three operands conforms to a fundamental principle of hardware design:

Design Principle 1

Simplicity favors regularity.

Hardware for a variable number of operands would be more complicated than hardware for a fixed number. Keeping the instruction format regular makes the processor simpler, faster, and cheaper.

Compiling C to MIPS

Let's see how a compiler translates high-level code to assembly:

C to MIPS Compilation

C code:

f = (g + h) - (i + j);

MIPS assembly (variable mapping: f→$s0, g→$s1, h→$s2, i→$s3, j→$s4):

add $t0, $s1, $s2    # $t0 = g + h
add $t1, $s3, $s4    # $t1 = i + j
sub $s0, $t0, $t1    # f = $t0 - $t1

The compiler must break complex expressions into multiple instructions, creating temporary variables (stored in temporary registers) to hold intermediate results.
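To check that a sequence of two adds and one subtract really computes f = (g + h) - (i + j), here is a minimal Python sketch. The `regs` dictionary and the `add`/`sub` helpers are illustrative stand-ins for the register file, not part of MIPS, and the sample values for g, h, i, and j are arbitrary.

```python
MASK32 = 0xFFFFFFFF  # registers hold 32-bit values

# Hypothetical register file: g, h, i, j live in $s1..$s4 (sample values).
regs = {"$s0": 0, "$s1": 10, "$s2": 20, "$s3": 3, "$s4": 4, "$t0": 0, "$t1": 0}

def add(rd, rs, rt):
    """rd = rs + rt, wrapped to 32 bits."""
    regs[rd] = (regs[rs] + regs[rt]) & MASK32

def sub(rd, rs, rt):
    """rd = rs - rt, wrapped to 32 bits."""
    regs[rd] = (regs[rs] - regs[rt]) & MASK32

add("$t0", "$s1", "$s2")   # $t0 = g + h
add("$t1", "$s3", "$s4")   # $t1 = i + j
sub("$s0", "$t0", "$t1")   # f = $t0 - $t1

print(regs["$s0"])  # (10 + 20) - (3 + 4) = 23
```

Each helper takes exactly three operands, mirroring the fixed three-operand shape of the instructions.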


Operands: The Computer's Registers

The Problem

High-level languages have variables. But processors don't work with variables. So where do the operands of arithmetic instructions come from?

Unlike high-level languages, the operands of arithmetic instructions are restricted. They must come from a limited number of special locations, called registers, built directly into the hardware.

Registers are the fastest form of storage in a computer. The MIPS architecture has 32 registers, each 32 bits wide. This 32-bit quantity is called a word.

Solution

Variables map to registers. The compiler assigns frequently-used variables to registers for fast access. Less frequently-used data lives in memory.

MIPS 32 Registers (name = register number)

  • $zero = 0, $at = 1
  • $v0–$v1 = 2–3
  • $a0–$a3 = 4–7 (Arguments)
  • $t0–$t7 = 8–15 (Temporary)
  • $s0–$s7 = 16–23 (Saved)
  • $t8–$t9 = 24–25 (Temporary)
  • $k0–$k1 = 26–27
  • $gp = 28, $sp = 29, $fp = 30, $ra = 31

Legend: Saved ($s), Temporary ($t), Arguments ($a), Special

Why only 32 registers? This brings us to our second design principle:

Design Principle 2

Smaller is faster.

A very large number of registers would increase the clock cycle time because electronic signals take longer to travel farther. The designer must balance the programmer's desire for more registers with the need for speed.

Register naming conventions

MIPS uses two-character names following a dollar sign. The convention reflects usage:

  • $s0–$s7: Saved registers for long-lived variables
  • $t0–$t9: Temporaries for intermediate calculations
  • $a0–$a3: Arguments to functions
  • $v0–$v1: Return values from functions
  • $zero: Hardwired to constant 0

Using Registers

Let's compile a C assignment using actual register names. If variables f, g, h, i, and j are assigned to registers $s0–$s4:

f = (g + h) - (i + j);

Compiles to:

add $t0, $s1, $s2    # $t0 = g + h
add $t1, $s3, $s4    # $t1 = i + j
sub $s0, $t0, $t1    # f = $t0 - $t1

Memory Operands

Programming languages have arrays and structures that contain far more data than 32 registers can hold. This data lives in memory, a large, single-dimensional array where each location has an address.

MIPS includes data transfer instructions that move data between memory and registers:

Memory Addressing (example with $s3 = 100)

Base register ($s3): 100
Offset: 8
Effective address: 100 + 8 = 108
Instruction: lw $t0, 8($s3)

Memory (word aligned):

Address   Value
100       42
104       7
108       131
112       99
116       256
120       1024
124       8
128       64

Addresses are multiples of 4 (word alignment).

Byte Addressing and Alignment

MIPS addresses individual bytes, not words. Since a word is 4 bytes, word addresses are always multiples of 4. This is called alignment.

# Array A starts at address in $s3
# To access A[8], we need byte offset 8 × 4 = 32
 
lw $t0, 32($s3)    # $t0 = A[8]

The address is computed as: base register + offset. The offset (32) is added to the contents of $s3 to form the memory address.
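The base-plus-offset calculation can be sketched in Python. This is only an illustration: the `memory` dictionary, the base address 100, and the array contents are made-up values, and `lw` here is a plain function modelling the load, not a real API.

```python
# Hypothetical word-aligned memory: byte address -> 32-bit word.
# Array A starts at byte address 100 and holds A[0]..A[8].
BASE = 100
memory = {BASE + 4 * i: value
          for i, value in enumerate([5, 6, 7, 8, 9, 10, 11, 12, 13])}

def lw(base, offset):
    """Load the word at byte address base + offset."""
    address = base + offset
    if address % 4 != 0:
        raise ValueError(f"unaligned word address: {address}")
    return memory[address]

s3 = BASE            # $s3 holds the base address of A
t0 = lw(s3, 8 * 4)   # lw $t0, 32($s3)  ->  A[8]
print(t0)            # 13
```

The index is scaled by 4 because addresses count bytes while each array element is a 4-byte word.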

Why byte addressing?

8-bit bytes are useful for characters and many other purposes. Virtually all modern architectures address individual bytes. The tradeoff is that indexing an array of 32-bit words requires multiplying the index by 4 to get the byte offset.

Immediate Operands

Many programs use small constants. Loading them from memory would be wasteful. MIPS provides immediate instructions with constants built in:

addi $s3, $s3, 4    # $s3 = $s3 + 4

The constant 0 is so common that MIPS dedicates register $zero to always hold 0. This makes operations like "move" trivial:

add $t0, $s0, $zero    # $t0 = $s0 (copy)

Signed and Unsigned Numbers

Computers represent numbers as sequences of bits. A 32-bit word can represent 2³² different patterns. How we interpret these patterns determines whether they're signed or unsigned.
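A short Python sketch shows one bit pattern read two ways. The helper names `as_unsigned` and `as_signed` are hypothetical, and the signed reading assumes the two's complement scheme described in the next section.

```python
def as_unsigned(bits, width=32):
    """Read the low `width` bits as an unsigned integer."""
    return bits & ((1 << width) - 1)

def as_signed(bits, width=32):
    """Read the low `width` bits as a two's complement signed integer."""
    bits &= (1 << width) - 1
    sign_bit = 1 << (width - 1)
    return bits - (1 << width) if bits & sign_bit else bits

pattern = 0xFFFFFFFE
print(as_unsigned(pattern))  # 4294967294
print(as_signed(pattern))    # -2
print(2 ** 32)               # 4294967296 distinct 32-bit patterns
```

The bits never change; only the interpretation does.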

Example (8 bits, unsigned): binary 0000 1011 = decimal 11 = hex 0x0B. The 8-bit unsigned range is 0 to 255.

Two's Complement

For signed integers, MIPS uses two's complement representation. The key insight: the leftmost bit indicates the sign (0 = non-negative, 1 = negative).

To negate a number in two's complement: invert all bits, then add 1.

# Negate 2 (in 8 bits)
2  = 0000 0010

# Step 1: Invert all bits
     1111 1101

# Step 2: Add 1
     1111 1110 = -2
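
The invert-then-add-1 rule can be written as a small Python function. This is a sketch: the name `negate` and the default 8-bit width are choices made for the example, and the mask is needed only because Python integers are not fixed-width.

```python
def negate(value, width=8):
    """Two's complement negation within `width` bits."""
    mask = (1 << width) - 1
    inverted = ~value & mask       # step 1: invert all bits
    return (inverted + 1) & mask   # step 2: add 1

bits = negate(0b00000010)
print(format(bits, "08b"))          # 11111110
print(format(negate(bits), "08b"))  # 00000010 (negating twice restores 2)
```

Negating the most negative pattern (1000 0000 in 8 bits) returns the same pattern, since that value has no positive counterpart.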

Two's complement has one negative number (-2³¹ for 32-bit words) with no positive counterpart. In exchange, it makes addition and subtraction simpler in hardware: you don't need separate circuits for signed and unsigned operations.

Sign Extension

When a 16-bit immediate needs to be used with 32-bit registers, the sign bit is replicated to fill the extra bits. This preserves the number's value.

Sign extension example (the 16-bit value -2):

16-bit representation:  1111 1111 1111 1110
32-bit sign extended:   1111 1111 1111 1111 1111 1111 1111 1110

The sign bit is copied 16 times to fill the upper half, preserving the number's value.
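
Sign extension can be sketched in a few lines of Python. The function name `sign_extend_16_to_32` is hypothetical; the logic follows the description above by copying bit 15 into the upper 16 bits.

```python
def sign_extend_16_to_32(imm16):
    """Extend a 16-bit two's complement value to 32 bits."""
    imm16 &= 0xFFFF
    if imm16 & 0x8000:             # sign bit set: fill the upper half with 1s
        return imm16 | 0xFFFF0000
    return imm16                   # sign bit clear: upper half stays 0

print(hex(sign_extend_16_to_32(0xFFFE)))  # 0xfffffffe (-2 stays -2)
print(hex(sign_extend_16_to_32(0x0004)))  # 0x4
```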

Representing Instructions in the Computer

The Problem

We've written instructions as text like add $t0, $s1, $s2. But computers only understand binary. How do we represent instructions as numbers?

Instructions are stored as 32-bit binary patterns. Each instruction is divided into fields that encode the operation and operands.

Solution

MIPS uses fixed-size 32-bit instructions with standardized field layouts. Register names become 5-bit numbers, operations become opcodes, and the pieces combine into machine code.

MIPS Instruction Formats

R-type (Register): arithmetic and logical operations between registers, for example add $t0, $s1, $s2

Field   Bits   Description
op      6      Opcode (always 0 for R-type)
rs      5      First source register
rt      5      Second source register
rd      5      Destination register
shamt   5      Shift amount
funct   6      Function code

Total: 32 bits = 6 + 5 + 5 + 5 + 5 + 6

The MIPS Fields

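For R-type instructions (register operations): op (6 bits), rs (5), rt (5), rd (5), shamt (5), funct (6).

For I-type instructions (immediate operations and memory access): op (6 bits), rs (5), rt (5), and a 16-bit immediate (a constant or an address offset).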

Design Principle 3

Good design demands good compromises.

MIPS uses different formats for different instruction types. This complicates the hardware slightly, but the alternative, limiting immediates to 5 bits or having variable-length instructions, would be worse.
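
As an illustration of the I-type layout (6-bit op, 5-bit rs, 5-bit rt, 16-bit immediate), here is a hedged Python sketch that packs the fields into one word. The function name `encode_i` is made up for this example. The opcode values 8 (addi) and 35 (lw) come from the standard MIPS encoding rather than from this article.

```python
def encode_i(op, rs, rt, imm):
    """Pack I-type fields: op(6) | rs(5) | rt(5) | immediate(16)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

# addi $s3, $s3, 4  ->  op 8, rs = rt = 19 ($s3)
print(hex(encode_i(8, 19, 19, 4)))    # 0x22730004

# lw $t0, 32($s3)   ->  op 35, rs = 19 ($s3), rt = 8 ($t0)
print(hex(encode_i(35, 19, 8, 32)))   # 0x8e680020
```

Masking the immediate to 16 bits means a negative constant such as -1 is stored as 0xFFFF, which sign extension later restores.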

Encoding example

The instruction add $t0, $s1, $s2 encodes in R-type format as follows:

Field   Value   Bits
op      0       000000
rs      17      10001
rt      18      10010
rd      8       01000
shamt   0       00000
funct   32      100000

Binary: 000000 10001 10010 01000 00000 100000
Hex: 0x02324020
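
The same R-type packing can be reproduced with shifts and ORs. This is a minimal sketch; `encode_r` is a hypothetical helper name, and the field values are those of add $t0, $s1, $s2.

```python
def encode_r(rs, rt, rd, shamt, funct, op=0):
    """Pack R-type fields: op(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

# add $t0, $s1, $s2: rs = 17 ($s1), rt = 18 ($s2), rd = 8 ($t0), funct = 32
word = encode_r(rs=17, rt=18, rd=8, shamt=0, funct=32)
print(f"0x{word:08X}")  # 0x02324020
```

Note that the destination register comes first in assembly text but sits in the third register field of the machine word.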

Logical Operations

Besides arithmetic, processors need operations that work on individual bits:

sll $t0, $s0, 4    # $t0 = $s0 << 4 (shift left)
srl $t0, $s0, 4    # $t0 = $s0 >> 4 (shift right)
and $t0, $s0, $s1  # $t0 = $s0 & $s1 (bitwise AND)
or  $t0, $s0, $s1  # $t0 = $s0 | $s1 (bitwise OR)
nor $t0, $s0, $s1  # $t0 = ~($s0 | $s1) (bitwise NOR)

Shifting left by n bits multiplies by 2ⁿ. Shifting right by n bits with srl divides an unsigned value by 2ⁿ, discarding the remainder. Shifts are much faster than general multiplication and division.
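
A small Python sketch makes the shift arithmetic concrete. The functions `sll` and `srl` below only model the 32-bit behavior; the mask stands in for the fixed register width, which Python integers lack.

```python
MASK32 = 0xFFFFFFFF

def sll(value, n):
    """Shift left logical: bits pushed past bit 31 are lost."""
    return (value << n) & MASK32

def srl(value, n):
    """Shift right logical: zeros enter from the left."""
    return (value & MASK32) >> n

print(sll(5, 4))            # 80 = 5 * 2**4
print(srl(80, 4))           # 5  = 80 // 2**4
print(srl(81, 4))           # 5  (remainder discarded)
print(srl(0xFFFFFFF0, 4))   # 268435455, not -1
```

The last line shows why srl is division only for unsigned values: the pattern for -16 becomes a large positive number because zeros fill the vacated sign bits.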

Why NOR instead of NOT?

MIPS doesn't have a NOT instruction, but NOR with $zero achieves the same effect:

nor $t0, $s0, $zero    # $t0 = ~$s0

This is another example of keeping the instruction set minimal while remaining powerful.
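
The NOR-as-NOT identity is easy to check in Python. This is a sketch with a hypothetical `nor` helper; the 32-bit mask again stands in for the register width.

```python
MASK32 = 0xFFFFFFFF

def nor(a, b):
    """Bitwise NOR of two 32-bit values."""
    return ~(a | b) & MASK32

x = 0x0000FFFF
print(hex(nor(x, 0)))   # 0xffff0000, the bitwise NOT of x
```

Because x | 0 is just x, NOR with zero leaves only the inversion.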


Making Decisions

Computers need conditional execution. MIPS provides branch instructions that compare registers and jump if a condition is true:

beq $s0, $s1, Label    # if ($s0 == $s1) goto Label
bne $s0, $s1, Label    # if ($s0 != $s1) goto Label

Combined with slt (set on less than), these handle all comparisons:

slt $t0, $s0, $s1    # $t0 = ($s0 < $s1) ? 1 : 0
bne $t0, $zero, Less # if $s0 < $s1, goto Less
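
To see why slt plus beq and bne is enough, the sketch below builds each of the four ordering tests from a single `slt` helper. The function names are hypothetical, and Python integers stand in for signed register values.

```python
def slt(a, b):
    """Set on less than: 1 if a < b, else 0."""
    return 1 if a < b else 0

def less_than(a, b):       # slt $t0, a, b ; bne $t0, $zero, Target
    return slt(a, b) != 0

def greater_than(a, b):    # slt $t0, b, a ; bne $t0, $zero, Target
    return slt(b, a) != 0

def less_equal(a, b):      # slt $t0, b, a ; beq $t0, $zero, Target
    return slt(b, a) == 0

def greater_equal(a, b):   # slt $t0, a, b ; beq $t0, $zero, Target
    return slt(a, b) == 0

print(less_than(3, 5), greater_than(3, 5), less_equal(5, 5), greater_equal(3, 5))
# True False True False
```

Swapping the operands or testing for 0 instead of 1 gives the other three comparisons without adding any new instructions.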

Loops in Assembly

A simple while loop in C:

while (save[i] == k)
    i += 1;

Becomes (with i in $s3, k in $s5, and the base address of save in $s6):

Loop:
    sll  $t1, $s3, 2      # $t1 = i × 4
    add  $t1, $t1, $s6    # $t1 = address of save[i]
    lw   $t0, 0($t1)      # $t0 = save[i]
    bne  $t0, $s5, Exit   # if save[i] != k, exit
    addi $s3, $s3, 1      # i += 1
    j    Loop             # repeat
Exit:
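
The loop can be traced one instruction at a time with a Python sketch. The function `run_loop`, the base address 0, and the sample array are assumptions made for this example. Like the assembly, it does no bounds checking, so it expects some element to differ from k.

```python
def run_loop(save, i, k):
    """Mirror the assembly loop; returns the final value of i."""
    base = 0  # assumed base address of save (the value held in $s6)
    memory = {base + 4 * n: word for n, word in enumerate(save)}
    while True:
        t1 = i << 2        # sll  $t1, $s3, 2
        t1 = t1 + base     # add  $t1, $t1, $s6
        t0 = memory[t1]    # lw   $t0, 0($t1)
        if t0 != k:        # bne  $t0, $s5, Exit
            break
        i = i + 1          # addi $s3, $s3, 1
                           # j    Loop
    return i               # Exit:

print(run_loop([7, 7, 7, 3, 7], i=0, k=7))  # 3
```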

Supporting Procedures

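Procedures (functions) require saving and restoring state. MIPS provides:

  • jal (jump and link): jumps to the procedure and saves the return address in $ra
  • jr $ra (jump register): returns to the caller

The caller places arguments in $a0–$a3. The callee returns results in $v0–$v1.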

# Calling a procedure
addi $a0, $zero, 5    # argument = 5
jal  factorial        # call factorial(5)
# $v0 now contains the result
 
factorial:
    # procedure body
    jr $ra            # return

The stack

When procedures call other procedures, or need more than 4 arguments, they use the stack, a region of memory pointed to by $sp.

Saved registers ($s0–$s7) must be preserved across calls. If a procedure uses them, it must save the old values on the stack and restore them before returning.
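
The save-and-restore discipline can be sketched in Python. This is only a model. The starting $sp value and register contents are made up. The `push` and `pop` helpers are hypothetical. The comments use sw (store word), a real MIPS instruction this article has not introduced, and the sketch assumes the usual MIPS convention that the stack grows toward lower addresses.

```python
memory = {}  # byte address -> word
regs = {"$sp": 0x7FFFFFFC, "$s0": 111, "$s1": 222}  # hypothetical contents

def push(reg):
    regs["$sp"] -= 4                  # addi $sp, $sp, -4
    memory[regs["$sp"]] = regs[reg]   # sw   reg, 0($sp)

def pop(reg):
    regs[reg] = memory[regs["$sp"]]   # lw   reg, 0($sp)
    regs["$sp"] += 4                  # addi $sp, $sp, 4

def callee():
    push("$s0")                       # save the caller's values
    push("$s1")
    regs["$s0"], regs["$s1"] = 1, 2   # use the saved registers freely
    pop("$s1")                        # restore in reverse order
    pop("$s0")

sp_before = regs["$sp"]
callee()
print(regs["$s0"], regs["$s1"], regs["$sp"] == sp_before)  # 111 222 True
```

From the caller's point of view nothing changed: both saved registers and the stack pointer have their original values.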


The Stored-Program Concept

We've seen that instructions are just numbers. This leads to a profound insight:

  1. Instructions are represented as numbers.
  2. Programs are stored in memory to be read or written, just like data.

This is the stored-program concept, and it makes computers universal.

A computer's memory can contain both machine-code programs and the data they operate on.

All of it is just numbers in memory. The same hardware can run any program.
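
A toy Python sketch illustrates the idea. The `memory` layout, the register contents, and the `execute` helper are all assumptions for the example, and only the add instruction is decoded. The word 0x02328020 is the encoding of add $s0, $s1, $s2.

```python
# One memory holds both an instruction and plain data.
memory = {
    0: 0x02328020,  # fetched as an instruction: add $s0, $s1, $s2
    4: 0x02328020,  # the same bit pattern, treated as ordinary data
}
regs = [0] * 32
regs[17], regs[18] = 6, 7  # $s1 = 6, $s2 = 7 (sample values)

def execute(word):
    """Decode an R-type word and run it; only add is handled."""
    op = word >> 26
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    rd = (word >> 11) & 0x1F
    funct = word & 0x3F
    if op == 0 and funct == 32:  # add
        regs[rd] = (regs[rs] + regs[rt]) & 0xFFFFFFFF
    else:
        raise NotImplementedError(f"unsupported word: {word:#010x}")

execute(memory[0])
print(regs[16])                # 13, the sum now in $s0
print(memory[4] == memory[0])  # True: data and code are the same kind of number
```

Whether a word acts as code or data depends only on how the processor uses it.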


From C to Execution

Let's trace the complete journey from high-level code to execution:

  1. C Compiler translates C to assembly language
  2. Assembler translates assembly to machine code (object files)
  3. Linker combines object files and libraries into an executable
  4. Loader places the executable in memory and starts execution

# The journey of: a = b + c

# 1. C code
a = b + c;

# 2. Assembly (compiler output)
add $s0, $s1, $s2

# 3. Machine code (assembler output)
0000 0010 0011 0010 1000 0000 0010 0000

# 4. Hexadecimal
0x02328020
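
The assembler step can be sketched as a tiny Python translator. This is a deliberately minimal model, not a real assembler: `assemble` is a hypothetical name, it handles only add and sub, and the register table lists just the names used in this article. The funct code 34 for sub comes from the standard MIPS encoding.

```python
REGISTERS = {"$zero": 0, "$t0": 8, "$t1": 9,
             "$s0": 16, "$s1": 17, "$s2": 18, "$s3": 19, "$s4": 20}
FUNCT = {"add": 32, "sub": 34}  # R-type function codes

def assemble(line):
    """Translate 'op rd, rs, rt' into a 32-bit R-type machine word."""
    mnemonic, operands = line.split(None, 1)
    rd, rs, rt = (REGISTERS[name.strip()] for name in operands.split(","))
    return (rs << 21) | (rt << 16) | (rd << 11) | FUNCT[mnemonic]

word = assemble("add $s0, $s1, $s2")
print(f"0x{word:08X}")  # 0x02328020
```

The result matches the hexadecimal word shown in step 4.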

Summary

We've explored how computers understand and execute instructions, from register and memory operands to binary instruction encodings and the stored-program concept.

These principles, established in the 1940s, still underpin every computer today. The specifics may vary (ARM, x86, RISC-V), but the core ideas remain the same.

The really decisive considerations in selecting an instruction set are simplicity of the equipment and the clarity of its application to important problems. That is as true today as it was in 1946.