
Serial and Parallel Execution

Imagine you're a chef following a recipe. You read the first instruction, do it, read the next, do that, and so on until the dish is ready. That's how a computer works at its most basic level: it fetches an instruction, figures out what it means, executes it, and moves on to the next one. This loop, the fetch-execute cycle, is the heartbeat of every processor.

Step through the cycle and watch how each stage must finish before the next one begins:

The Fetch-Execute Cycle
[Interactive demo: step through Fetch → Decode → Execute → Store]

Each instruction passes through four stages. The processor cannot skip ahead or work on two instructions at once. This constraint, doing one thing at a time, is where our story starts.
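
If it helps to see that loop as code, here's a minimal sketch in Go of a toy processor: a made-up three-instruction program, an accumulator, and a program counter that marches through it one instruction at a time. The instruction set is invented purely for illustration.

```go
package main

import "fmt"

// A toy instruction: an opcode and an operand.
type instruction struct {
	op  string
	arg int
}

func main() {
	// A tiny "program" held in memory.
	program := []instruction{
		{"LOAD", 5},  // put 5 in the accumulator
		{"ADD", 3},   // add 3
		{"STORE", 0}, // write the result to memory cell 0
	}
	memory := make([]int, 4)
	acc, pc := 0, 0

	// The fetch-execute loop: one instruction at a time, in order.
	for pc < len(program) {
		instr := program[pc] // fetch
		switch instr.op {    // decode + execute
		case "LOAD":
			acc = instr.arg
		case "ADD":
			acc += instr.arg
		case "STORE":
			memory[instr.arg] = acc // store
		}
		pc++ // move on to the next instruction
	}
	fmt.Println("accumulator:", acc, "memory:", memory)
}
```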


One thing at a time

When we write programs, we break problems into a series of small tasks and execute them one after another, or serially. The first task runs to completion, then the second, then the third. No overlap.

Hit Run and watch the four tasks execute in sequence:

Serial Execution
[Interactive demo: one CPU running Task A through Task D, with an elapsed-time counter]

Notice how Task B can't start until Task A finishes. The processor is busy the entire time, but it's only ever working on one task. If each task takes 2 seconds, four tasks take 8 seconds. Simple arithmetic, no surprises.

This model is called serial execution, and it's how most programs work. It's predictable: you never have to worry about whether a previous step finished, because the next step won't start until it does.
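
Here's roughly what that looks like in code. This Go sketch uses sleeps as stand-ins for real work and hard-codes the demo's 2-second tasks; the task names are just labels.

```go
package main

import (
	"fmt"
	"time"
)

// doTask stands in for any unit of work; here it just sleeps.
func doTask(name string, d time.Duration) {
	time.Sleep(d)
	fmt.Println(name, "done")
}

func main() {
	start := time.Now()
	// Serial execution: each call returns before the next one begins.
	for _, name := range []string{"Task A", "Task B", "Task C", "Task D"} {
		doTask(name, 2*time.Second)
	}
	// ~8 seconds: four 2-second tasks, one after another.
	fmt.Println("total:", time.Since(start))
}
```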

Why serial execution persists

Serial execution is the default because it's simple. There's no need to coordinate between tasks, no risk of one task interfering with another. For many programs, this simplicity is more valuable than speed.


When order matters

Serial execution isn't just a choice. Sometimes the problem demands it.

Play a few moves of tic-tac-toe. Each move depends on the previous one: you can't decide where to place your mark without seeing what your opponent did. The game is inherently sequential: each step needs the result of the step before it.

Sequential Dependencies

Programs with this kind of chain, where each task depends on the output of the previous one, are called sequential computations. No matter how many processors you have, you can't speed them up by doing multiple steps at once. The dependency between tasks is the bottleneck, not the hardware.
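
A tiny sketch of such a chain in Go, with three hypothetical steps where each one consumes the previous one's output:

```go
package main

import (
	"fmt"
	"strings"
)

func step1Trim(s string) string     { return strings.TrimSpace(s) }
func step2Split(s string) []string  { return strings.Fields(s) }
func step3Count(words []string) int { return len(words) }

// Each step consumes the previous step's output, so the chain cannot
// be sped up by adding cores: step2 can't start until step1 finishes,
// and step3 can't start until step2 does.
func main() {
	raw := "  hello concurrent world  "
	trimmed := step1Trim(raw)    // needs raw
	words := step2Split(trimmed) // needs trimmed
	count := step3Count(words)   // needs words
	fmt.Println("word count:", count)
}
```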

But here's the thing: most real problems aren't like tic-tac-toe. In most programs, many tasks are completely independent of each other. And that opens the door to something faster.


More workers, less time

Think about it this way. You have four loads of laundry and one washing machine. Each load takes 30 minutes, so you're stuck waiting two hours. But what if you went to a laundromat with four machines? All four loads run at the same time, and you're done in 30 minutes.

That's parallel execution: multiple tasks running simultaneously on multiple processors. Toggle between 1, 2, and 4 workers and watch the difference:

Parallel Execution
[Interactive demo: worker toggle (1, 2, or 4) with an elapsed-time counter]

With 1 worker, the tasks run one after another, the same serial timeline from before. With 4 workers, all four tasks run at the same time, and the total time drops from 8 seconds to 2. The speedup is proportional to the number of workers.
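
In Go, the same four stand-in tasks can be launched as goroutines so they overlap instead of queuing. A sketch follows; since these tasks only sleep, the overlap comes for free, while CPU-bound work would also need enough physical cores to see the same drop in wall-clock time.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func doTask(name string, d time.Duration) {
	time.Sleep(d)
	fmt.Println(name, "done")
}

func main() {
	start := time.Now()
	var wg sync.WaitGroup
	// Parallel execution: all four tasks start at once, each in its own goroutine.
	for _, name := range []string{"Task A", "Task B", "Task C", "Task D"} {
		wg.Add(1)
		go func(n string) {
			defer wg.Done()
			doTask(n, 2*time.Second)
		}(name)
	}
	wg.Wait()
	// ~2 seconds: the four tasks overlap instead of queuing.
	fmt.Println("total:", time.Since(start))
}
```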

What parallelism requires

Parallel execution requires two things: independent tasks (so workers don't need to wait for each other) and multiple processors (the physical hardware to run them simultaneously).

Tasks that require little or no coordination between processors are sometimes called embarrassingly parallel. Password cracking, rendering pixels, searching through data: these problems split naturally into independent chunks.


Parallelism in action

Here's a concrete example. Suppose you need to crack a 4-digit PIN by brute force: try every combination from 0000 to 9999, hash each one, and compare it to the target. Sequentially, you check one combination at a time. In parallel, you split the range across multiple cores and check them simultaneously.

Switch between sequential and parallel mode and compare the time:

Brute-Force Password Cracking
[Interactive demo: guess counter (0 / 10,000), elapsed time, and the hidden target hash]

Each password check is completely independent. You don't need to know whether 0000 was correct before checking 0001. The problem decomposes perfectly, which means throwing more cores at it gives you a near-linear speedup.
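
A sketch of that split in Go: four goroutines each take a contiguous quarter of the 0000–9999 range and hash their guesses independently. The target PIN here (4871) is made up so the example is self-contained; in a real attack the target hash would come from elsewhere.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
)

func main() {
	// Hypothetical target, derived here so the example runs on its own.
	target := sha256.Sum256([]byte("4871"))

	const workers = 4
	const span = 10000 / workers // each worker gets a contiguous slice of the range
	results := make(chan string, workers)
	var wg sync.WaitGroup

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(lo, hi int) {
			defer wg.Done()
			// Each check is independent: no worker cares what the others found.
			for i := lo; i < hi; i++ {
				guess := fmt.Sprintf("%04d", i)
				if sha256.Sum256([]byte(guess)) == target {
					results <- guess
				}
			}
		}(w*span, (w+1)*span)
	}

	wg.Wait()
	close(results)
	fmt.Println("cracked PIN:", <-results)
}
```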


There's a catch

One mother can deliver a baby in nine months. Can nine mothers deliver a baby in one month?

It seems like we could just keep adding processors to make any program run faster. But Gene Amdahl spotted the flaw in that reasoning. Most programs have some portion that must run sequentially, the part that can't be parallelized. And that sequential portion puts a hard ceiling on how much speedup you can achieve.

Consider searching through a pile of index cards to find the ones about concurrency. Dividing the pile into stacks and collecting the results at the end are inherently serial steps. Only the searching itself parallelizes. Even if you recruit a hundred friends to search, you still have to divide and collect.

Drag the sliders to see how the sequential fraction limits the speedup, no matter how many processors you add:

Amdahl's Law
[Interactive demo: sliders for sequential fraction and processor count, with the maximum achievable speedup shown]

The formula is Speedup = 1 / (S + (1-S)/N), where S is the sequential fraction and N is the number of processors. If 10% of your program is sequential, you can never exceed 10x speedup, even with infinite processors. The curve flattens out and hits a wall.
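
Plugging numbers into that formula makes the ceiling concrete. A small Go sketch for a program that is 10% sequential:

```go
package main

import "fmt"

// amdahl returns the speedup predicted by Amdahl's law for a program
// with sequential fraction s running on n processors.
func amdahl(s, n float64) float64 {
	return 1 / (s + (1-s)/n)
}

func main() {
	// 10% sequential: the speedup creeps toward, but never reaches, 10x.
	for _, n := range []float64{1, 2, 4, 8, 64, 1024} {
		fmt.Printf("%6.0f processors -> %.2fx\n", n, amdahl(0.10, n))
	}
}
```

Even at 1,024 processors the predicted speedup is still just short of 10x.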

The mall analogy

Hundreds of people can shop in a mall simultaneously. But when it's time to pay, lines form because there are fewer cashiers than shoppers. The checkout is the sequential bottleneck, and no amount of extra floor space fixes it.


A different way to look at it

Amdahl's law can feel discouraging. But it makes one assumption: the problem size stays fixed. What if we change the question?

Instead of asking "how fast can we solve this problem?", ask "how much work can we do in the same amount of time?" If we keep the time fixed and add more processors, we can tackle bigger problems. The sequential portion stays the same size, but the parallel portion grows with more workers, so the sequential part becomes a smaller and smaller fraction of the total.

Toggle the worker count and watch the total work increase while the time stays fixed:

Gustafson's Law
[Interactive demo: fixed-time window showing sequential and parallel work units, total work, and throughput]

This is Gustafson's law, and it explains why supercomputers and distributed systems succeed with parallelism. They don't try to solve the same problem faster. They solve bigger problems in the same time. More processors means more data processed, more simulations run, more pixels rendered.
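
Reading the demo's numbers as 2 units of sequential work plus 3 units of parallel work per worker (an assumption about what the widget shows), the scaled speedup works out like this:

```go
package main

import "fmt"

// gustafson returns the scaled speedup when the time is held fixed:
// the sequential work stays the same, but each extra worker fits
// another slice of parallel work into the same window.
func gustafson(seqUnits, parUnitsPerWorker, workers float64) float64 {
	totalWork := seqUnits + parUnitsPerWorker*workers
	baseline := seqUnits + parUnitsPerWorker // what a single worker gets done
	return totalWork / baseline
}

func main() {
	// Mirrors the demo above: 2 sequential units, 3 parallel units
	// per worker, 4 workers -> 14 units of work vs 5, a 2.8x throughput gain.
	fmt.Printf("throughput: %.1fx\n", gustafson(2, 3, 4))
}
```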


Dealing with things vs doing things

We've been talking about parallelism, doing multiple things at the same instant. But there's a related concept that often gets confused with it: concurrency.

Step through this demo and compare the two timelines:

Concurrency vs Parallelism
[Interactive demo: a concurrent timeline (1 worker) next to a parallel timeline (2 workers)]

In the concurrent timeline, a single worker alternates between Task A and Task B. Both tasks make progress, but never at the same instant. The worker context-switches between them, like a cook who stirs the soup, then chops the salad, then stirs again.

In the parallel timeline, two workers run simultaneously. Task A and Task B progress at the same instant on separate tracks.

The distinction matters because they solve different problems. Concurrency is about structuring a program to handle multiple things at once; it's a design decision. Parallelism is about executing multiple things at once; it's a hardware capability. A program can be concurrent on a single-core CPU (interleaving tasks), parallel on a multi-core CPU (simultaneous execution), both, or neither.
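
In Go terms, the same program can be either. This sketch caps the runtime to one worker with runtime.GOMAXPROCS(1), so the two CPU-bound goroutines are concurrent but never parallel; remove the cap on a multi-core machine and they can run at the same instant. The busy-loop task is an arbitrary stand-in.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// spin is a CPU-bound stand-in task, so the scheduling actually matters.
func spin(name string, iterations int, wg *sync.WaitGroup) {
	defer wg.Done()
	sum := 0
	for i := 0; i < iterations; i++ {
		sum += i
	}
	fmt.Println(name, "finished, sum =", sum)
}

func main() {
	// With 1, the two goroutines are concurrent but never parallel:
	// a single worker interleaves them. Raise this (or delete the call)
	// and they can run simultaneously on separate cores.
	runtime.GOMAXPROCS(1)

	var wg sync.WaitGroup
	wg.Add(2)
	go spin("Task A", 50_000_000, &wg)
	go spin("Task B", 50_000_000, &wg)
	wg.Wait()
}
```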

As Rob Pike put it: "Concurrency is about dealing with lots of things at once. Parallelism is about doing lots of things at once."


Where we've landed

We started with the simplest model: one processor, one instruction at a time, serial execution. We saw that some problems demand this sequential approach because each step depends on the last. But for independent tasks, we can use multiple processors to run them simultaneously, cutting the total time proportionally.

Amdahl's law showed us the ceiling: the sequential portion of a program limits how much we can speed it up. Gustafson's law showed us the escape hatch: keep the time fixed and scale up the work. And concurrency gave us a way to structure programs around multiple tasks even when we only have one processor to run them on.

In the next articles, we'll explore the hardware and runtime systems that make concurrency possible: processes, threads, and the coordination mechanisms that keep them from stepping on each other.