Inside asyncio: how Python pauses and resumes your code

You write await fetch_data() and your code pauses. When the data arrives, it resumes right where it left off, local variables intact, as if nothing happened. But while it was paused, your function wasn't on the call stack. The CPU was doing other work. So how did Python bring everything back?

Where did the local variables go? What does "pause" even mean at the implementation level? And what's actually watching for that data to arrive?

Most explanations of async Python teach you the syntax: put async before def, sprinkle await in front of I/O calls, run it with asyncio.run(). That's useful, but it leaves a black box at the center. This article opens that box. We're going to build up the machinery of asyncio from its foundations, one piece at a time, starting with a feature of Python you might already know: generators.

What this article covers

This is not a tutorial on how to use asyncio. If you want practical patterns (async HTTP requests, error handling, when to use asyncio vs multiprocessing), see the companion article on async I/O. This article is about the machinery underneath: how Python implements pausing, resuming, I/O watching, and task scheduling at the language and runtime level.


The problem: functions run to completion

When you call a function, it runs all the way through. It starts at the first line, executes each statement in order, and returns. The local variables live on the call stack, and when the function returns, they're gone.

def compute(x):
    a = x * 2
    b = a + 10
    return b

Call compute(5), and a, b, and x exist for exactly the duration of that call. The moment return b executes, the stack frame is destroyed.

This is fine for computation. But what about a function that needs to wait for a network response halfway through? You don't want to block the entire thread. You want to pause the function, let other code run, and come back later when the data arrives. That means you need the local variables to survive even after the function "leaves" the call stack.

Problem

Normal function calls are all-or-nothing. The function runs, returns, and its state is destroyed. How do we create functions that can pause mid-execution and resume later with their state intact?

Python already has a mechanism for this. You've probably used it for lazy iteration without realizing it also solves the pausing problem: generators.


Generators: functions that pause

A generator function looks like a regular function, but it uses yield instead of (or in addition to) return.

def running_total():
    total = 0
    while True:
        n = yield total   # yield sends total OUT, send() pushes n IN
        total += n

When you call running_total(), Python does not execute the function body. Instead, it creates a generator object, a frozen snapshot of the function that hasn't started yet. The code only runs when you advance the generator, and it runs only until it hits yield. Then it pauses and hands control back to you.

The generator object holds the function's frame, including all local variables, the instruction pointer (which line to resume at), and the full execution context. The function doesn't live on the call stack while it's paused. It lives inside the generator object, waiting to be resumed.

There are two ways to advance a generator. next() resumes it and discards the yield expression's value. send(value) resumes it and makes the yield expression evaluate to value, which is how you push data back into a paused generator.

gen = running_total()
next(gen)           # advance to first yield → receives 0
gen.send(10)        # n = 10, total becomes 10 → receives 10
gen.send(25)        # n = 25, total becomes 35 → receives 35

Why send() requires next() first

You must call next() (or send(None)) once before you can send a real value. The generator hasn't started executing yet, so there's no yield expression waiting to receive a value. The first next() advances to the first yield. After that, send() works. This is why asyncio internally calls send(None) to start a coroutine.
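
You can see the restriction directly, reusing running_total() from above:

gen = running_total()
try:
    gen.send(10)            # nothing is waiting at a yield to receive 10
except TypeError as e:
    print(e)                # can't send non-None value to a just-started generator
print(gen.send(None))       # same as next(gen): runs to the first yield → 0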

yield pushes values out, send() pushes values in. The generator pauses at each yield, and the caller decides when to resume it and what value to inject. Asyncio uses the same pattern: a coroutine yields control to the event loop, and the event loop sends the I/O result back when it's ready.

Interactive demo: Bidirectional Communication with send() — the caller and the generator trade values through send() and yield.

You can step through the demo below to see how the generator's state persists between send() calls. Notice that total keeps its value across yields. The function doesn't restart from the top each time; it resumes at the exact line where it paused.

Interactive demo: Generator State Preservation — step through running_total() line by line and watch total and n persist across yields.
Solution

Generators solve the pausing problem. When a generator yields, Python saves its entire execution state inside the generator object. The function leaves the call stack, but its state survives. Calling next() or send() puts it back on the stack and resumes from where it left off.

You can verify this yourself. Every generator object has a gi_frame attribute that holds the saved frame, and inspect.getgeneratorstate() reports which state it's in (created, running, suspended, or closed).

import inspect
 
def running_total():
    total = 0
    while True:
        n = yield total
        total += n
 
gen = running_total()
print(inspect.getgeneratorstate(gen))   # GEN_CREATED
print(gen.gi_frame.f_locals)            # {}
 
next(gen)
print(inspect.getgeneratorstate(gen))   # GEN_SUSPENDED
print(gen.gi_frame.f_locals)            # {'total': 0}
 
gen.send(10)
print(gen.gi_frame.f_locals)            # {'total': 10, 'n': 10}
 
gen.close()
print(inspect.getgeneratorstate(gen))   # GEN_CLOSED
print(gen.gi_frame)                     # None — frame is gone

The frame is a real Python frame object, the same kind that appears in tracebacks. When the generator is suspended, the frame sits inside gi_frame with all its locals intact. When you call send(), Python pushes that frame back onto the call stack and resumes execution. When the generator is closed, the frame is released.

So we have functions that pause (yield), state that persists (generator object), and bidirectional communication (send()). What's left is formalizing this into a protocol that the event loop can drive.


The coroutine protocol: what await actually does

When you write async def, Python creates a coroutine function. Calling it returns a coroutine object, not a generator, but coroutines are built on the same machinery. They implement the same send(), throw(), and close() methods. The difference is they also implement __await__().

If you strip away the syntax sugar, await does something like this:

# What you write:
result = await fetch_data(url)

# What Python does (approximately):
_iter = fetch_data(url).__await__()
_to_send = None
try:
    while True:
        _yielded = _iter.send(_to_send)  # resume the chain below
        _to_send = yield _yielded        # pass control to the outer awaiter
except StopIteration as e:
    result = e.value   # the return value

The __await__() method returns an iterator. Python drives that iterator with send(), exactly like a generator. When the coroutine returns, it raises StopIteration with the return value smuggled inside the exception's .value attribute. That's how the return value gets back to the caller.
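
You can drive this protocol by hand. The sketch below builds a stand-in fetch_data() with types.coroutine, a decorator that turns a generator into an awaitable; the names suspend and fetch_data are illustrative, not asyncio's:

import types

@types.coroutine
def suspend():
    value = yield "suspended"    # yields to whatever is driving the coroutine
    return value                 # travels back up inside StopIteration

async def fetch_data():
    result = await suspend()     # pauses here until a value is sent back
    return result * 2

coro = fetch_data()
print(coro.send(None))           # 'suspended' — the coroutine is now paused
try:
    coro.send(21)                # resume it with the "I/O result"
except StopIteration as e:
    print(e.value)               # 42 — the return value, smuggled in the exception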

The demo below breaks this down phase by phase.

Interactive demo: What await Actually Does — result = await fetch_data(url), broken down phase by phase.

The await keyword is doing the same thing as iterating over a generator: calling send() to resume, receiving yielded values, and catching StopIteration when it's done. The event loop sits at the top of this process. It calls send() on the top-level coroutine, which calls send() on the next one down, forming a chain that reaches the actual I/O operation.

Problem

We can pause and resume coroutines. But what actually triggers the resume? When we await sock.recv(), something needs to notice that data has arrived on the socket. What is that something?


I/O multiplexing: how the loop watches for data

When a coroutine awaits a socket read, the event loop needs to know when data arrives on that socket. It can't busy-loop checking because that would waste CPU. Instead, it asks the operating system: "Tell me when any of these sockets have data."

This is called I/O multiplexing, and the OS provides system calls for it: select(), poll(), epoll() (Linux), and kqueue() (macOS/BSD). Python wraps all of these behind the selectors module, which picks the best one for your OS.

You register file descriptors (sockets, pipes, files, anything the OS gives you a number for) with the selector, and then call selector.select(). This call blocks until at least one descriptor is ready, at which point the OS wakes you up and tells you which ones.

import selectors

sel = selectors.DefaultSelector()
# sock_a/sock_b and callback_a/callback_b are placeholders for real
# sockets and handler functions
sel.register(sock_a, selectors.EVENT_READ, data=callback_a)
sel.register(sock_b, selectors.EVENT_READ, data=callback_b)

# This blocks until at least one socket has data
ready = sel.select()  # returns a list of (key, events) pairs
for key, events in ready:
    key.data()  # call the callback stored at registration

selector.select() wraps the system call that makes asyncio work. It allows a single thread to wait for thousands of connections simultaneously. Instead of one thread per socket, you have one thread watching all of them.
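
Here's the same pattern as a self-contained sketch you can actually run, with socket.socketpair() standing in for real network connections:

import selectors
import socket

sel = selectors.DefaultSelector()
recv_a, send_a = socket.socketpair()
recv_b, send_b = socket.socketpair()
sel.register(recv_a, selectors.EVENT_READ, data="socket A")
sel.register(recv_b, selectors.EVENT_READ, data="socket B")

send_b.send(b"ping")                 # only socket B becomes readable

for key, events in sel.select():     # returns as soon as something is ready
    print(key.data, key.fileobj.recv(4096))   # socket B b'ping'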

Interactive demo: I/O Multiplexing with Selectors — three sockets (fd=3 fetch_users(), fd=5 fetch_posts(), fd=7 fetch_comments()); send data on one and selector.select() reports it as ready.

When a coroutine awaits sock.recv(), the event loop registers that socket's file descriptor with the selector. The coroutine is now suspended. When selector.select() returns with that file descriptor in the ready list, the event loop calls send(data) on the waiting coroutine, resuming it with the received data.

select() vs epoll()

select() is the oldest and most portable, but it's slow: it checks every registered descriptor on each call. epoll() (Linux) and kqueue() (macOS) are event-driven and only report descriptors that actually changed. For a handful of sockets, it doesn't matter. For thousands, epoll() is orders of magnitude faster. Python's selectors.DefaultSelector() automatically picks the best option.
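
You can check which implementation you got on your machine:

import selectors

sel = selectors.DefaultSelector()
print(type(sel).__name__)   # 'EpollSelector' on Linux, 'KqueueSelector' on macOS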


The event loop: one iteration

The event loop is a while True loop that repeats these steps:

  1. Pop a task from the ready queue
  2. Run it by calling send() on its coroutine
  3. If the coroutine yields (awaits I/O), register its socket with the selector
  4. If the coroutine completes (raises StopIteration), mark the task as done
  5. When the ready queue is empty, call selector.select() to wait for I/O
  6. For each ready descriptor, wake up the waiting task and put it back in the ready queue

A simplified implementation:

from collections import deque
import selectors

class EventLoop:
    def __init__(self):
        self.ready = deque()           # tasks ready to run
        self.selector = selectors.DefaultSelector()

    def run(self):
        while self.ready or self.selector.get_map():
            # Run all ready tasks
            while self.ready:
                task = self.ready.popleft()
                try:
                    # Resume the coroutine (the first send delivers None)
                    fd = task.coro.send(task.result)
                    # Coroutine yielded a file descriptor — wait for I/O
                    self.selector.register(fd, selectors.EVENT_READ, data=task)
                except StopIteration:
                    task.set_done()

            # Nothing ready — wait for I/O events (skip if none are pending,
            # otherwise select() would block forever on an empty selector)
            if not self.selector.get_map():
                continue
            for key, events in self.selector.select():
                task = key.data
                self.selector.unregister(key.fileobj)
                task.result = read_data(key.fileobj)  # stand-in for the socket read
                self.ready.append(task)
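
To make that sketch actually runnable, here is one minimal set of supporting pieces. Task, read_data(), and reader() are illustrative stand-ins (the real asyncio versions are far richer), and the "coroutines" are plain generators that yield the socket they want to wait on:

import socket

class Task:
    """Tracks a coroutine plus the value to deliver on the next send()."""
    def __init__(self, coro):
        self.coro = coro
        self.result = None      # the first send() must deliver None
        self.done = False

    def set_done(self):
        self.done = True

def read_data(sock):
    return sock.recv(4096)

def reader(sock, name):
    data = yield sock           # yield the socket; resume with its data
    print(name, "received", data)

sock_a, peer_a = socket.socketpair()
sock_b, peer_b = socket.socketpair()
peer_a.send(b"hello")           # make both sockets readable
peer_b.send(b"world")

loop = EventLoop()
loop.ready.append(Task(reader(sock_a, "task_a")))
loop.ready.append(Task(reader(sock_b, "task_b")))
loop.run()
# task_a received b'hello'
# task_b received b'world'   (order may vary)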

The demo below shows tasks moving between the ready queue, execution, and I/O wait.

Interactive demo: One Event Loop Iteration — task_a, task_b, and task_c move between the ready queue, execution, and I/O wait.

There's no threading, no locking, no shared mutable state. It's just a queue, a selector, and a loop. Each task runs until it voluntarily yields, and the loop picks up the next one. This is cooperative multitasking: the tasks cooperate by yielding at appropriate points.


The await chain: nested coroutines

In practice, await calls are nested. You write await fetch_page(url), and inside fetch_page there's await http_get(url), and inside that there's await sock.recv(4096). How does this chain work?

Each await delegates to the next coroutine down. When main() awaits fetch_page(), it calls fetch_page().__await__() and iterates over it. When fetch_page() awaits sock.recv(), it does the same thing one level deeper. The event loop sits at the top and only ever calls send() on the outermost coroutine, but the value propagates all the way down to wherever the chain is actually suspended.

When the innermost coroutine completes, its return value propagates back up the chain via StopIteration. Each level catches the exception, extracts the value, and returns it to its own caller. The whole chain unwinds in one go.
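
You can reproduce this propagation with plain generators and yield from, the delegation mechanism await is built on. The innermost/middle/outer names are illustrative:

def innermost():
    data = yield "need data"         # suspends the entire chain
    return data.upper()              # rides back up inside StopIteration

def middle():
    result = yield from innermost()  # delegates: send() passes straight through
    return result + "!"

def outer():
    return (yield from middle())

chain = outer()
print(chain.send(None))    # 'need data' — yielded from three levels down
try:
    chain.send("hello")    # resumes innermost(); every level then returns
except StopIteration as e:
    print(e.value)         # 'HELLO!' — the whole chain unwound in one go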

Interactive demo: The Await Chain — main() → fetch_page() → sock.recv(), suspending and resuming as a single chain.

Because of this layering, each function doesn't need to know about the event loop. It just awaits things, and the protocol handles the rest. The chain of __await__() calls creates a pipeline from the event loop down to the actual I/O operation, and the values flow back up the same pipeline when the I/O completes.

What about asyncio.sleep()?

asyncio.sleep() doesn't use the selector. Instead, it schedules a callback on the event loop's timer heap (a min-heap ordered by wake-up time). The loop uses the earliest deadline as the timeout for selector.select(), then runs any timers that have expired once it returns. This is why await asyncio.sleep(0) is useful: it yields control without any I/O or timer, letting other tasks in the ready queue run before you continue.
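
A rough sketch of how a timer heap can interact with the select() timeout, with time.sleep() standing in for the blocking selector.select(timeout) call; the details here are simplified, not asyncio's actual code:

import heapq
import time
from itertools import count

timers = []        # min-heap of (deadline, tiebreaker, callback)
seq = count()      # tiebreaker so equal deadlines never compare callbacks
now = time.monotonic()
heapq.heappush(timers, (now + 1.0, next(seq), lambda: print("timer A fired")))
heapq.heappush(timers, (now + 0.5, next(seq), lambda: print("timer B fired")))

# The loop uses the nearest deadline as the select() timeout...
timeout = max(0.0, timers[0][0] - time.monotonic())
time.sleep(timeout)          # stand-in for selector.select(timeout)

# ...then runs every timer whose deadline has passed.
current = time.monotonic()
while timers and timers[0][0] <= current:
    _, _, callback = heapq.heappop(timers)
    callback()               # prints "timer B fired"; timer A is still pending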


Tasks: wrapping coroutines for scheduling

A bare coroutine is just a pausable function. To schedule it on the event loop, asyncio wraps it in a Task object. A Task tracks the coroutine's state (pending, running, done, cancelled), stores its result or exception, and manages callbacks that fire when it completes.

When you call asyncio.create_task(coro()), the event loop creates a Task and immediately puts it in the ready queue. The next time the loop iterates, it pops the task and calls send() on its coroutine. If the coroutine yields, the task moves to the I/O waiting set. When the I/O completes, the task moves back to the ready queue.

It's round-robin scheduling. Tasks take turns running, and each task runs until it hits an await that can't be resolved immediately. No task can hog the CPU indefinitely (unless it does CPU-bound work without yielding, which is exactly why you shouldn't do heavy computation inside async functions).
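
You can watch this turn-taking with real asyncio. Each worker runs until its await, the loop switches to the next, and gather() collects the results in order:

import asyncio

async def worker(name, delay):
    print(name, "started")
    await asyncio.sleep(delay)   # yields to the loop; other tasks run meanwhile
    print(name, "finished")
    return name

async def main():
    # create_task puts each coroutine in the ready queue immediately
    tasks = [asyncio.create_task(worker(f"task_{i}", 0.01 * i)) for i in range(3)]
    return await asyncio.gather(*tasks)

print(asyncio.run(main()))
# task_0 started, task_1 started, task_2 started — each runs to its first await
# then task_0 finished, task_1 finished, task_2 finished
# ['task_0', 'task_1', 'task_2']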

Interactive demo: Task Scheduling — Task A, Task B, and Task C cycle through the ready queue, running, and I/O wait.

The tasks cycle through three states: ready, running, and I/O wait. No task runs while another is running, because each task voluntarily yields when it hits an await. If a task never awaits, it blocks the entire loop and no other task gets to run. That's the trade-off of single-threaded concurrency.


The full picture

Let's trace through the entire stack, from your code all the way down to the OS:

  1. You write result = await fetch_page(url)
  2. Python calls fetch_page(url).__await__(), getting an iterator
  3. The event loop calls send(None) on the outermost iterator
  4. The call propagates down the await chain to sock.recv()
  5. sock.recv() yields the socket's file descriptor to the event loop
  6. The event loop registers that fd with the selector and pauses the task
  7. The loop runs other tasks from the ready queue
  8. When the ready queue empties, the loop calls selector.select()
  9. The OS wakes the loop when data arrives on the socket
  10. The loop calls send(data) on the paused task
  11. The data propagates back up the await chain
  12. fetch_page() processes the data and returns
  13. StopIteration carries the return value back to your result variable

That's it. There are no threads, no special runtime tricks. It's generators, an iterator protocol, and a system call that watches file descriptors. Each layer (your code, the library, the event loop, the OS) only needs to know about the layer directly below it. Your code just awaits things, the library manages sockets, the event loop manages scheduling, and the OS manages hardware.

The next time you write await, you'll know what's happening underneath: a generator pauses, a selector watches, and a send() resumes.