Memory in Computer Architecture
From tiny registers to massive hard disks: how computers organize memory and why speed always costs more than size.
Every computer must remember things. Your code. Your data. The screen you're looking at. But remembering comes with a cost: memory can be fast or cheap, rarely both. This tension shapes everything about how computers work.
What we're building intuition for
Understanding memory helps you write faster code by keeping data in CPU caches, understand why your program slows down on certain operations, and see the core tradeoff in computing: speed vs. cost.
The memory hierarchy
We need storage that's both fast and enormous. But fast storage is expensive and expensive storage is slow. How do we build a computer that's both quick and spacious?
The answer: we don't pick one. We use many. Computers are built in layers, from tiny fast memories near the CPU to vast slow memories on disk.
Think of a library. The book you're reading right now? On your desk. Reference books you might need today? On the shelf. Archived books you might need someday? In the basement. As you move further down, storage gets cheaper but access takes longer.
Click on different levels to simulate finding data there. Notice how accessing data in L1 Cache is nearly instant (~1ns), but finding data in SSD Storage takes over 50 microseconds. That's 50,000 times slower.
Each level caches copies of data from the level below, making frequently accessed data available faster. Your CPU has a few kilobytes of registers, some megabytes of cache, gigabytes of RAM, and terabytes on disk.
Why so many levels?
Economics. Registers and cache are made from expensive transistors. RAM is cheaper but slower. Disk is cheapest but much slower still. By using a pyramid, you get speed where it matters (the hot data) and capacity where you need it (the bulk storage).
The longer answer involves physics. As memory gets bigger, it gets slower. Electrons take time to move through wires. Capacitors take time to charge and discharge.
Primary memory: RAM and ROM
Primary memory is what the CPU can directly address. This is where your program runs, where variables live, where the stack and heap exist.
RAM (Random Access Memory)
RAM is readable, writable, and volatile: turn off power and it all vanishes. Each byte has an address. Access any address and you get the data in roughly the same time (hence "random" access).
A 32-bit address space gives you 2^32 (about 4.3 billion) addresses, each holding 1 byte, for 4 GB total. A 64-bit address space is astronomically larger.
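As a quick sanity check (plain arithmetic, nothing here beyond standard C), you can compute these limits directly:

#include <stdio.h>

int main(void) {
    // 2^32 addresses, one byte each = 4 GiB of addressable memory
    unsigned long long addresses_32 = 1ULL << 32;
    printf("32-bit: %llu bytes (%llu GiB)\n", addresses_32, addresses_32 >> 30);

    // On a 64-bit machine, pointers are 8 bytes wide
    printf("pointer size on this machine: %zu bytes\n", sizeof(void *));
    return 0;
}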
How does RAM actually work?
Main memory is typically DRAM, which stores each bit in a tiny capacitor: two conductive plates with an insulator between them. Apply a voltage and the plates store charge. Remove the voltage and the charge slowly leaks away.
Step through to see how a DRAM capacitor stores data. Watch the leakage step carefully: this is why RAM needs constant refreshing. Without it, your data would gradually fade away.
This is why DRAM must be "refreshed" constantly. Every ~64 milliseconds, the memory controller reads each bit and writes it back.
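A rough sketch of the scheduling math, not real controller logic (the row count is an assumed example figure, and the 64 ms window is a typical budget rather than a spec for any particular chip):

#include <stdio.h>

#define NUM_ROWS   8192   // assumed number of rows for illustration
#define REFRESH_MS 64     // typical refresh window

int main(void) {
    // To touch every row within the 64 ms window, the controller
    // spaces refreshes roughly (64 ms / NUM_ROWS) apart.
    double interval_us = (REFRESH_MS * 1000.0) / NUM_ROWS;
    printf("one row refreshed every ~%.1f microseconds\n", interval_us);
    return 0;
}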
Static RAM
Cache memory uses a different technology: SRAM. Instead of a capacitor, each SRAM bit is held by a small circuit of cross-coupled transistors (similar to the flip-flops in CPU registers), so nothing leaks and nothing needs refreshing. The catch is that an SRAM cell takes about six transistors where a DRAM cell takes one transistor and a capacitor, which makes SRAM much faster but also much more expensive per bit.
The rowhammer vulnerability
Modern DRAM is so tightly packed that electric fields between neighboring capacitors can interfere. If an attacker rapidly accesses one row of memory, they can corrupt neighboring rows. It's a physical security problem: computers are so small and memory is so dense that we've hit limits imposed by physics, not just engineering.
ROM (Read-Only Memory)
ROM is permanent. Turn off the power and it survives. Its contents are baked in at the factory and can't be changed (or at least, not easily). Your computer's boot firmware lives in ROM.
Modern machines use flash memory and EEPROMs, which can be rewritten. Your SSD is basically a huge, fast-ish flash memory.
Memory addressing
How do computers distinguish between different memory locations? They use addresses, just like houses on a street. But there's a quirk in how those addresses work.
Modern computers use byte addressing: each byte (8 bits) has its own address. When you
store a multi-byte value like 0x12345678, which byte goes where? There are two conventions.
Toggle between Big Endian and Little Endian to see how the same value (0x12345678) is stored differently. Big Endian reads naturally left-to-right, while Little Endian puts the least significant byte at the lowest address.
Big endian stores the most significant byte first. It reads naturally, like Western writing. Little endian stores the least significant byte first. It looks backwards, but arithmetic operations find the units place immediately at offset 0.
// Big endian: most significant byte first
// Address 0x1000: 0x12   0x1001: 0x34   0x1002: 0x56   0x1003: 0x78
// Little endian: least significant byte first
// Address 0x1000: 0x78   0x1001: 0x56   0x1002: 0x34   0x1003: 0x12

Intel chose little endian. Most networking protocols chose big endian. When two machines try to talk, they have to convert. Little endian won on desktops because Intel dominated, not because it's inherently better.
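One standard trick to see which convention your own machine uses, sketched here in C: store a multi-byte value and inspect the byte at the lowest address.

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t value = 0x12345678;
    // Reinterpret the 4-byte integer as individual bytes
    uint8_t *bytes = (uint8_t *)&value;

    if (bytes[0] == 0x78)
        printf("little endian: lowest address holds 0x78\n");
    else
        printf("big endian: lowest address holds 0x12\n");
    return 0;
}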
Caches: the middleman
A trip to RAM takes on the order of 100 CPU cycles. Even a simple operation becomes a bottleneck if it has to wait on memory every time. How do we make memory feel fast without making it actually fast?
Keep a small amount of fast memory near the CPU and store copies of frequently used data there. This is a cache.
The strategy hinges on locality. Programs don't access memory randomly. They reuse the same data over short windows (temporal locality) and touch data near what they just touched (spatial locality), often in predictable sequential patterns. Caches exploit these patterns by keeping copies of recently accessed data close to the CPU.
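A common way these patterns show up in ordinary code (the array size here is arbitrary): traversing a 2D array row by row touches consecutive bytes and stays inside cache lines, while traversing it column by column jumps a whole row's worth of bytes on every access.

#define N 1024
int grid[N][N];

long sum_row_major(void) {
    long sum = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += grid[i][j];   // consecutive addresses: good spatial locality
    return sum;
}

long sum_column_major(void) {
    long sum = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += grid[i][j];   // 4 KB stride per access: poor spatial locality
    return sum;
}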
Click 0x1000, 0x1004, 0x1008 in order: you'll see hits after the first load. Then
click 0x2000: cache miss. Alternating between the two ranges drops your hit rate.
Cache organizations
There are three main ways to organize a cache, each making different tradeoffs.
Direct mapped maps each address to exactly one cache line. Simple and fast, but inflexible: two addresses that map to the same line constantly evict each other.
Fully associative lets any address go in any line. Flexible, but expensive: checking every line in parallel needs a comparator per line.
Set associative is the compromise: divide the cache into sets. Any address can go in any line within its set, but you only check one set, not the whole cache.
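Here's a rough sketch of how a set-associative lookup splits an address (the line size, set count, and way count are made-up example numbers); the direct-mapped snippet below is effectively the special case of one way per set.

// Example geometry: 64-byte lines, 64 sets, 4 ways per set (assumed numbers)
#define LINE_SIZE 64
#define NUM_SETS  64
#define NUM_WAYS  4

unsigned set_index(unsigned long address) {
    return (address / LINE_SIZE) % NUM_SETS;   // which set to search
}

unsigned long tag_bits(unsigned long address) {
    return address / (LINE_SIZE * NUM_SETS);   // identifies the line within its set
}

// On a lookup, only the NUM_WAYS tags in set_index(address) are compared,
// not every line in the cache.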
// Direct mapped: the address bits pick exactly one line
cache_line = (address / line_size) % num_lines
// Problem: address1 and address2 both map to line 0
// They fight for the same slot in tight loops

Cache hits and misses
Most programs have hit rates above 90%. The misses go to RAM at roughly 100x the cost, so the miss rate is what dominates. One algorithm might have a 95% hit rate and another 70%, and that gap alone can make the second several times slower.
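You can see why with a back-of-the-envelope average access time (the 1 ns hit time and 100 ns miss penalty are assumed round numbers, not measurements):

#include <stdio.h>

// average access time = hit_time + miss_rate * miss_penalty
double amat(double hit_rate, double hit_ns, double miss_penalty_ns) {
    return hit_ns + (1.0 - hit_rate) * miss_penalty_ns;
}

int main(void) {
    printf("95%% hits: %.1f ns per access\n", amat(0.95, 1.0, 100.0)); // ~6 ns
    printf("70%% hits: %.1f ns per access\n", amat(0.70, 1.0, 100.0)); // ~31 ns
    return 0;
}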
How cache writes work
When the CPU writes data into the cache, when should that data also reach main memory? Immediately (simple but slow), or later, when the cache line is evicted (fast but risky)?
Reading from cache is straightforward: if it's there, great. If not, fetch it. Writing is trickier. When you store a value in cache, you also need to eventually update main memory.
Step through in Write-Through mode: both cache and memory update simultaneously. Then switch to Write-Back and notice the DIRTY flag. Memory stays stale until the cache line is evicted.
Write-through writes to both cache and RAM on every store. Simple, always consistent, but every write waits for slow RAM.
Write-back writes to cache only and marks the line as "dirty". The data gets flushed to RAM later, when the cache needs the space. Faster writes, but memory can briefly be stale. The same tradeoff shows up one level down: the OS buffers disk writes in RAM, which is why sudden power loss can lose data that was never flushed to disk.
// Write-through: always write to both
cache[address] = value;
mainMemory[address] = value;

// Write-back: write to cache only, flush later
cache[address] = value;
cache.markDirty(address);
// On eviction: if dirty, write back to mainMemory

Modern CPUs use write-back because the speed gains outweigh the complexity.
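A minimal sketch of the eviction path (the struct and names are invented for illustration, and the tag is treated as a simple block number): only dirty lines cost a trip to memory when they're evicted.

#include <stdbool.h>
#include <string.h>

#define LINE_SIZE 64

struct cache_line {
    unsigned long tag;              // simplified: the block number in main memory
    bool valid;
    bool dirty;                     // set on every write hit
    unsigned char data[LINE_SIZE];
};

// Called when a line must make room for new data
void evict(struct cache_line *line, unsigned char *main_memory) {
    if (line->valid && line->dirty) {
        // Write-back: the stale copy in memory is updated only now
        memcpy(main_memory + line->tag * LINE_SIZE, line->data, LINE_SIZE);
    }
    line->valid = false;
    line->dirty = false;
}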
Secondary storage: disks
Everything below main RAM is secondary storage. It's not directly accessible by the CPU. To use it, the CPU has to load it into RAM first.
Hard Disk Drives (HDDs)
An HDD is a spinning magnetic platter with a read-write head on an arm. To access a specific bit: move the arm to the right track (seek time), wait for the platter to spin the right sector under the head (rotational latency), then read the data.
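A rough worked example with assumed drive numbers (a ~9 ms average seek and 7200 RPM are typical consumer-drive figures, not specs for any particular model):

#include <stdio.h>

int main(void) {
    double seek_ms = 9.0;                         // assumed average seek time
    double rpm = 7200.0;
    // On average the platter has to spin half a revolution
    double rotation_ms = 0.5 * (60000.0 / rpm);   // ~4.2 ms at 7200 RPM
    printf("average access: ~%.1f ms\n", seek_ms + rotation_ms);
    // Compare: a RAM access is on the order of 100 nanoseconds,
    // roughly 100,000 times faster.
    return 0;
}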
Click different tracks and watch the read head move. Reading from a track far from the current position takes longer (higher seek time). SSDs eliminate both delays entirely by having no moving parts.
SSDs are faster because there's no spinning, no seeking, just electrical access. But they're still slow compared to RAM.
Magnetic tape
Magnetic tape is even cheaper than HDD but much slower: the drive has to wind through the tape to reach your data. It's still used for archival backups because it's so cheap (around $10/TB) and survives decades in storage.
Memory performance in practice
Understanding the hierarchy helps explain real-world performance mysteries.
Cache lines
Caches don't store individual bytes. They store cache lines, typically 64 bytes. When you access one byte, the whole 64-byte line comes into cache. This is why sequential access is so efficient: you pay for the first byte, but get 63 free neighbors.
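Concretely (assuming the usual 4-byte ints and 64-byte lines), a sequential walk over an int array can miss at most once per line:

#include <stdio.h>

int main(void) {
    int line_size = 64;                  // bytes per cache line (typical)
    int element_size = sizeof(int);      // usually 4 bytes
    int per_line = line_size / element_size;

    // Walking an int array in order misses at most once per line:
    printf("%d ints per line -> at worst 1 miss per %d accesses (~%.1f%% miss rate)\n",
           per_line, per_line, 100.0 / per_line);
    return 0;
}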
Click any byte in the array. The CPU fetches an entire 64-byte cache line, not just the single byte you requested. The green bytes are "free" data that came along for the ride.
Sequential vs random access
Sequential access exploits cache lines. Adjacent elements share lines, so after the first miss, the next several accesses are free hits. Random access jumps around, wasting cache lines and forcing more memory fetches.
Run the simulation in Sequential mode first, then try Random. Sequential access achieves 75%+ hits. Random access thrashes the cache.
// Sequential: ~95% hit rate
for (int i = 0; i < 1000000; i++)
    sum += array[i];

// Random: ~10% hit rate, roughly 100x slower
for (int i = 0; i < 1000000; i++)
    sum += array[random() % 1000000];

The same 1 million accesses, roughly 100x slower. The only difference is the access pattern.
Prefetching
Modern CPUs notice patterns (like sequential access) and fetch the next cache line before you ask for it. If they predict correctly, you get a free hit. If they predict wrong, the prefetched data just takes up space.
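Hardware prefetchers do this automatically, but GCC and Clang also expose a hint you can issue yourself; whether it actually helps is workload-dependent, so treat this as a sketch rather than a recipe.

// __builtin_prefetch is a GCC/Clang extension: it requests a cache line
// early, and has no effect on program behavior if the hint is ignored.
long sum_with_prefetch(const long *array, int n) {
    long sum = 0;
    for (int i = 0; i < n; i++) {
        if (i + 16 < n)
            __builtin_prefetch(&array[i + 16]);  // ask for data ~16 elements ahead
        sum += array[i];
    }
    return sum;
}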
Virtual memory
Modern operating systems let programs pretend they have infinite memory. The OS maps virtual addresses to physical RAM, and when RAM runs out, it swaps data to disk.
This is transparent to the program. You allocate memory, use it, and the OS figures out where it actually lives. Each program has its own virtual address space.
But if a program accesses memory that's on disk instead of RAM, the OS has to fetch it. This is a page fault. It takes millions of cycles. If your program page faults frequently, it becomes glacially slow.
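The translation itself is simple arithmetic once you know the page size (4 KB here, the common default): the high bits of a virtual address pick a page, the low bits are the offset within it.

#include <stdio.h>

#define PAGE_SIZE 4096UL   // common default page size

int main(void) {
    unsigned long vaddr = 0x00403a7f;                // example virtual address
    unsigned long page_number = vaddr / PAGE_SIZE;   // looked up in the page table
    unsigned long offset = vaddr % PAGE_SIZE;        // unchanged by translation

    // If the page table says this page lives on disk, the access
    // triggers a page fault and the OS loads it into RAM first.
    printf("page %lu, offset %lu\n", page_number, offset);
    return 0;
}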
Why this matters
Memory is a hierarchy because of physics: fast is small, slow is big, and you can't have both.
Your code has to navigate this hierarchy carefully. Keep working data small (in caches). Access it sequentially (to exploit spatial locality). Reuse it often (to exploit temporal locality). When you violate these rules, your program becomes orders of magnitude slower.
The memory hierarchy isn't a limitation to overcome. It's how computers work. Master it and you write fast software. Ignore it and your program mysteriously slows down at scale.