
How HTTP Works (and How It Got Faster)

Every time you open a webpage, your browser fetches dozens of files from a server: the HTML, the stylesheets, the scripts, the images. HTTP is the set of rules that governs how your browser asks for those files and how the server sends them back. It's a protocol, a shared language that both sides agree to follow so they can understand each other.

But HTTP doesn't work alone. Before the browser can ask for anything, it needs to establish a connection. And the way that connection works has shaped everything about how the web performs.


One request, one connection

HTTP runs on top of another protocol called TCP (Transmission Control Protocol). TCP is what actually delivers data between two computers over the internet, reliably and in order. But before any data can flow, TCP requires both sides to go through a short introduction ritual called a three-way handshake. Think of it like a phone call: you dial, the other person picks up, you confirm you can hear each other, and then you start talking.

Step through this to see what it looks like:

TCP Three-Way Handshake

That handshake takes roughly one full roundtrip across the network. A roundtrip is the time for a message to go from client to server and for the response to come back. Across the Atlantic, the propagation delay in fiber is about 28 milliseconds each way, so a roundtrip costs at least 56 milliseconds. On a mobile network, it can be 100ms or more.
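You can see the cost of the handshake from ordinary client code. Here's a minimal sketch in Python (the hostname is just a placeholder); it resolves DNS up front so the timer covers only the TCP connect, which completes after roughly one roundtrip:

import socket
import time

# Time the TCP three-way handshake from the client side.
HOST, PORT = "example.com", 80  # placeholder; any reachable server works

# Resolve DNS first so the timer measures only the connect (SYN, SYN-ACK, ACK).
family, socktype, proto, _, addr = socket.getaddrinfo(HOST, PORT, type=socket.SOCK_STREAM)[0]

start = time.perf_counter()
sock = socket.socket(family, socktype, proto)
sock.connect(addr)  # returns once the handshake completes
rtt_ms = (time.perf_counter() - start) * 1000
sock.close()

print(f"TCP handshake took {rtt_ms:.1f} ms (roughly one roundtrip)")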

In early HTTP (up through version 1.0), every single request opened a new TCP connection. Ask for the HTML? Handshake, then request. Ask for the CSS? Another handshake, then request. Ask for each image? Handshake, handshake, handshake.

For a webpage with just three resources, that's three handshakes. For a modern page with 90+ resources (the average today), the handshake overhead alone adds seconds of latency.

There has to be a better way.


What if we left the line open?

Problem

Every request opens a new TCP connection. The handshake takes a full roundtrip, and for small resources, the handshake is more expensive than the actual data transfer.

HTTP/1.1 introduced keepalive connections (also called persistent connections). Instead of closing the connection after each response, the server leaves it open so the browser can send additional requests over the same connection. One handshake, many requests.

Keepalive Connections

The savings scale with the number of resources. For N requests, you save N-1 handshakes. With 90 resources on a page and even a fast 28ms roundtrip, that's roughly 2.5 seconds of latency eliminated just from avoiding redundant handshakes.

GET /page.html HTTP/1.1
Host: example.com
Connection: keep-alive

GET /style.css HTTP/1.1
Host: example.com
Connection: keep-alive

GET /app.js HTTP/1.1
Host: example.com
Connection: keep-alive

// All three on the same TCP connection
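The same pattern with Python's standard library: one connection object, several requests, one handshake. A sketch (hostname and paths are placeholders, and it assumes the server doesn't send Connection: close):

import http.client

conn = http.client.HTTPConnection("example.com")  # one TCP connection

for path in ("/page.html", "/style.css", "/app.js"):
    conn.request("GET", path)
    response = conn.getresponse()
    response.read()  # drain the body so the connection can be reused
    print(path, response.status)

conn.close()  # a single handshake served all three requests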

Keepalive became the default in HTTP/1.1. Every modern browser uses it automatically. But look at the demo again. Even with keepalive, the requests happen one at a time. The client sends a request, waits for the response, then sends the next one. The connection is open, but it's sitting idle between each request-response cycle.

Can we do better?


Sending everything at once (and why it broke)

Problem

Even with keepalive, the client waits for each response before sending the next request. The connection sits idle during those waits. What if we sent all requests upfront?

That's the idea behind HTTP pipelining. The client sends multiple requests without waiting for any responses. The server processes them in order and sends responses back. In theory, this eliminates idle time on both ends.

In practice, pipelining has a fatal flaw. Step through this to see it:

Pipelining and Head-of-Line Blocking

The server must respond in the same order it receives requests. If the first response is slow to generate, every response behind it gets stuck in a queue. A fast 10ms CSS response can't be sent until the slow 100ms HTML response finishes. This is head-of-line blocking, and it's the reason pipelining never worked on the open web.
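On the wire, pipelining is nothing more exotic than writing every request before reading any response. A raw-socket sketch (hostname and paths are placeholders, and it assumes a server that tolerates pipelined requests, which many don't):

import socket

HOST = "example.com"  # placeholder

# Build all three requests up front and send them back-to-back.
requests = b"".join(
    f"GET {path} HTTP/1.1\r\nHost: {HOST}\r\nConnection: keep-alive\r\n\r\n".encode()
    for path in ("/page.html", "/style.css", "/app.js")
)

with socket.create_connection((HOST, 80), timeout=5) as sock:
    sock.sendall(requests)    # no waiting between requests
    data = sock.recv(65536)   # responses arrive strictly in request order

print(data.decode(errors="replace")[:200])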

There are other problems too. Not all proxies (intermediary servers between the client and the origin) understand pipelining. Some serialize the requests, eliminating the benefit. Others close the connection unpredictably. If the connection drops mid-pipeline, the client can't tell which requests were already processed, making retries unsafe.

Because of all this, browsers either shipped with pipelining disabled or have since removed support entirely. It exists in the spec, but it's effectively dead on the open web.

Pipelining can work in controlled environments

Apple reportedly saw 300% performance improvements using pipelining in iTunes, where they controlled both client and server. The key: they could handle edge cases like connection aborts and knew exactly which requests were safe to retry. On the open web, with unknown proxies and diverse servers, you can't make those guarantees.

So we're stuck. Keepalive reuses connections but requests are still sequential. Pipelining tries to parallelize but breaks in practice. How do developers get around this?


Working around the protocol

Problem

Browsers limit themselves to about 6 TCP connections per hostname. With 90 resources on a page and sequential requests per connection, most resources sit in a queue. How do we get more parallelism?

Since the protocol itself doesn't offer real parallelism, developers invented workarounds. These aren't elegant solutions. They're patches for protocol limitations.

Domain sharding

The browser's connection limit is per hostname, not per IP address. So if your resources come from different hostnames, each one gets its own pool of 6 connections.

Developers exploit this by spreading resources across subdomains: shard1.example.com, shard2.example.com, etc. These often point to the same server (via a DNS alias), but the browser treats each hostname independently.

Domain Sharding

More shards means more parallel connections, but each shard adds overhead. Every new hostname requires its own DNS lookup (translating the hostname to an IP address) and its own set of TCP handshakes. Many of those connections never last long enough to reach peak bandwidth because TCP starts slow and ramps up over time (a mechanism called slow start). In practice, 2-4 shards is the sweet spot. More than that and the overhead outweighs the benefit.
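In practice the shard for each resource is chosen deterministically, so a given URL keeps hitting the same hostname across page loads and stays cacheable. A sketch with hypothetical shard hostnames:

from hashlib import md5

SHARDS = ["shard1.example.com", "shard2.example.com", "shard3.example.com"]

def shard_for(path: str) -> str:
    # Hash the path so the same resource always maps to the same shard.
    index = int(md5(path.encode()).hexdigest(), 16) % len(SHARDS)
    return SHARDS[index]

print(shard_for("/img/logo.png"))    # always the same shard for this path
print(shard_for("/css/theme.css"))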

Concatenation and spriting

Instead of fetching 20 CSS files, combine them into one. Instead of 50 small icons, combine them into a single large image called a sprite sheet and use CSS to show just the right portion of it. Fewer requests means less protocol overhead.

/* Sprite sheet: single image, many icons */
.icon-home { background: url(sprite.png) 0 0; }
.icon-menu { background: url(sprite.png) -32px 0; }
.icon-user { background: url(sprite.png) -64px 0; }
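The concatenation half is just a build step. A minimal sketch (the file names are hypothetical):

from pathlib import Path

# Merge many small stylesheets into one bundle so the page makes one request.
css_files = ["reset.css", "layout.css", "buttons.css", "forms.css"]

bundle = "\n".join(Path(name).read_text() for name in css_files)
Path("bundle.css").write_text(bundle)

print(f"{len(css_files)} files -> bundle.css ({len(bundle)} bytes)")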

The tradeoff: you lose cache granularity. Change one icon and the browser re-downloads the entire sprite. Change one line of CSS and the entire concatenated bundle is invalidated. The browser also has to decode the full sprite image even if only one icon is visible.

Resource inlining

For very small resources, embed them directly in the HTML using data URIs or inline <style> and <script> tags. This eliminates the request entirely but means the resource can't be cached separately. If the same CSS appears on every page, the browser downloads it again with each page load instead of caching it once.
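A sketch of what inlining looks like for a small image (the file name is hypothetical):

import base64
from pathlib import Path

# Embed a tiny image directly in the markup as a data URI. The request
# disappears, but so does the ability to cache the icon separately.
icon = Path("icon.png").read_bytes()
data_uri = "data:image/png;base64," + base64.b64encode(icon).decode("ascii")

print(f'<img src="{data_uri[:40]}..." alt="icon">')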

All of these workarounds share the same pattern: developers restructuring their code and assets around the protocol's limitations. The code isn't organized for clarity or maintainability; it's organized for transport efficiency. That's a sign the protocol needs to change.


The hidden weight of every request

Even with fewer requests (thanks to concatenation and spriting), each request still carries surprising weight. Every HTTP/1.x request includes headers: the user-agent string, accepted content types, cookies, cache directives, and more. These are sent as uncompressed plain text.

POST /api HTTP/1.1
User-Agent: curl/7.24.0 (x86_64-apple-darwin12.0)
Host: www.example.com
Accept: */*
Content-Length: 15
Content-Type: application/x-www-form-urlencoded

{"msg":"hello"}

That's roughly 180 bytes of headers for a 15-byte payload: the metadata is more than ten times larger than the actual data, and a real browser request, with cookies and cache directives attached, is heavier still. And here's the thing: most of those headers are identical across every request. The same user-agent, the same host, the same accept types, sent over and over and over.
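The arithmetic, spelled out: a quick sketch that counts the bytes of the example above, CRLF line endings included:

headers = (
    "POST /api HTTP/1.1\r\n"
    "User-Agent: curl/7.24.0 (x86_64-apple-darwin12.0)\r\n"
    "Host: www.example.com\r\n"
    "Accept: */*\r\n"
    "Content-Length: 15\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    "\r\n"
)
body = '{"msg":"hello"}'

print(len(headers.encode()), "header bytes")   # 179
print(len(body.encode()), "payload bytes")     # 15
print(f"{len(headers) / len(body):.0f}x more metadata than data")  # 12x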

Cookies make it worse

Every cookie is sent with every request to that domain. A few kilobytes of tracking and session cookies can dominate the payload size, especially for small API calls. Developers work around this by serving static assets from a separate cookieless domain, so the browser doesn't waste bandwidth attaching irrelevant cookies to image requests.

We now have two problems stacked on top of each other. The protocol forces sequential requests (or at best, a limited number of parallel connections). And every request carries hundreds of bytes of redundant metadata. Workarounds help, but they add complexity and fragility. What if we redesigned the protocol to fix these problems at the root?


What if we started over?

Solution

Instead of patching around HTTP/1's limitations, redesign the protocol from scratch. Eliminate head-of-line blocking, enable true parallelism on a single connection, and compress those redundant headers.

That's exactly what happened. In 2009, Google built an experimental protocol called SPDY (pronounced "speedy") to test these ideas. Real-world results were striking: pages loaded up to 55% faster. The HTTP Working Group used SPDY as the starting point for HTTP/2, which was standardized in 2015.

The elegant part: HTTP/2 didn't change what HTTP means. GET requests, POST requests, status codes like 200 and 404, headers, URIs — all exactly the same. What changed is how the data is packaged and delivered over the wire. Your web applications don't need to change at all. The browser and server handle everything.

Binary framing: the foundation

The first change: HTTP/2 replaces human-readable text with a binary framing layer. Instead of sending requests as blocks of plain text, HTTP/2 splits everything into small binary frames, each with a type and a stream ID.

Binary Framing
HTTP/1.1 (plain text, sent as-is):

GET /style.css HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0
Accept: text/css
Connection: keep-alive

Why binary? Text protocols are ambiguous (optional whitespace, unclear boundaries between headers and body). Binary frames have fixed structures that are fast and unambiguous to parse. There's no confusion about where one message ends and another begins.

This might seem like a minor implementation detail, but it enables everything that follows. Because messages are now discrete frames with stream IDs, we can do something that was impossible with plain text.
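To make that concrete: every HTTP/2 frame starts with a fixed 9-byte header holding a 24-bit payload length, an 8-bit type, 8 bits of flags, and a 31-bit stream ID. The layout below follows RFC 7540, but the code is only an illustrative sketch, not an HTTP/2 implementation:

import struct

def pack_frame_header(length, frame_type, flags, stream_id):
    # 24-bit length (split into a byte and a 16-bit word), type, flags, stream ID.
    return struct.pack(">BHBBI", (length >> 16) & 0xFF, length & 0xFFFF,
                       frame_type, flags, stream_id & 0x7FFFFFFF)

def unpack_frame_header(header):
    hi, lo, frame_type, flags, stream_id = struct.unpack(">BHBBI", header)
    return ((hi << 16) | lo, frame_type, flags, stream_id & 0x7FFFFFFF)

header = pack_frame_header(42, 0x1, 0x4, 1)  # a 42-byte HEADERS frame on stream 1
print(header.hex())                          # 00002a010400000001
print(unpack_frame_header(header))           # (42, 1, 4, 1)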

Multiplexing: many requests, one connection

Because every frame is tagged with a stream ID, frames from different requests can be interleaved on the same connection. The client sends frames for the HTML, CSS, and JavaScript requests all mixed together. The server sends response frames the same way. Both sides reassemble each stream using the IDs.

HTTP/2 Multiplexing

This eliminates head-of-line blocking between streams. A slow HTML response doesn't block CSS or JavaScript frames from flowing. It eliminates the need for multiple connections. One TCP connection handles everything, which means one handshake, one slow-start ramp-up, and cleaner resource usage on both client and server. And it makes domain sharding pointless. All resources from one hostname with full parallelism.
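A toy sketch of the receiving side, with made-up frame payloads: frames from different streams arrive interleaved on one connection, and the stream ID is all the receiver needs to reassemble each response:

from collections import defaultdict

# (stream_id, payload) pairs as they might arrive, interleaved on one connection
interleaved_frames = [
    (1, b"<html>"), (3, b"body{"), (5, b"console."),
    (3, b"margin:0}"), (1, b"...</html>"), (5, b"log('hi')"),
]

streams = defaultdict(bytearray)
for stream_id, payload in interleaved_frames:
    streams[stream_id] += payload   # order within each stream is preserved

for stream_id, data in sorted(streams.items()):
    print(f"stream {stream_id}: {bytes(data)}")
# stream 1: the HTML, stream 3: the CSS, stream 5: the JS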

HPACK: compressing those redundant headers

With all requests flowing over a single connection, those repetitive headers become an even bigger waste. HTTP/2 addresses this with HPACK, a compression scheme designed specifically for HTTP headers.

HPACK Header Compression
Request headers (full text), 340 bytes:

:method: GET
:path: /api/users
:scheme: https
host: example.com
user-agent: Mozilla/5.0...
accept: application/json
cookie: session=abc123

HPACK uses three techniques. First, a static table of common headers (like :method: GET and :status: 200) that are pre-indexed on both sides and never need to be sent at all. Second, a dynamic table that both client and server build up during the connection — once a header has been sent, future requests reference it by index instead of repeating the full text. Third, Huffman encoding that compresses the remaining header values with variable-length codes.

The result: after the first few requests warm up the dynamic table, header overhead drops by 85% or more. For API-heavy applications sending many small requests, this is a significant bandwidth savings.

HPACK is stateful

Unlike traditional compression (like gzip), HPACK maintains state across requests. Both sides keep a synchronized copy of the dynamic table. This makes it efficient — each request builds on what came before — but it also means you can't decompress headers in isolation. You need the full history of the connection.
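A toy sketch of the shared dynamic table, to make that statefulness concrete (a deliberate simplification: real HPACK also has a pre-populated static table, size-based eviction, and Huffman coding):

class ToyHeaderTable:
    def __init__(self):
        self.table = []  # both sides must build up the same table

    def encode(self, headers):
        out = []
        for header in headers:
            if header in self.table:
                out.append(("index", self.table.index(header)))  # a few bytes
            else:
                out.append(("literal", header))                  # full text, once
                self.table.append(header)
        return out

encoder = ToyHeaderTable()
first = encoder.encode([("user-agent", "Mozilla/5.0 ..."), (":path", "/api/users")])
second = encoder.encode([("user-agent", "Mozilla/5.0 ..."), (":path", "/api/orders")])
print(first)   # everything sent as literals
print(second)  # the repeated user-agent collapses to a tiny index reference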

Stream priorities and server push

HTTP/2 includes two more features worth mentioning. Stream priorities let the browser tell the server which responses matter most. The HTML is critical (the browser can't do anything without it), so it gets high priority. Images can arrive later. The server can allocate bandwidth accordingly, though in practice not all servers implement priorities well.

Server push lets the server proactively send resources the client hasn't asked for yet. The server knows that when the browser requests index.html, it will also need style.css, so it pushes the CSS along with the HTML without waiting for the browser to parse the HTML and discover the dependency. This saves a roundtrip but can waste bandwidth if the client already has the resource cached.


What changed for developers

With HTTP/2, many of the workarounds from the HTTP/1 era become unnecessary or even counterproductive. Domain sharding adds overhead (extra DNS lookups, extra handshakes) without benefit since HTTP/2 already multiplexes on one connection. File concatenation and spriting sacrifice cache granularity for something HTTP/2 provides natively. Resource inlining prevents independent caching of assets that HTTP/2 can deliver efficiently.

The web becomes more modular. You can serve many small, granular files instead of a few large bundles. Each file gets cached independently. Your project structure can match your code's logical organization rather than being dictated by transport efficiency.

HTTP/2 requires HTTPS in practice (browsers only support it over encrypted connections), which means adopting it also means adopting TLS (the encryption layer that turns HTTP into HTTPS). The connection upgrade happens automatically through a mechanism called ALPN (Application-Layer Protocol Negotiation), where the client and server agree on which HTTP version to use during the TLS handshake. If the server doesn't support HTTP/2, the connection falls back to HTTP/1.1 gracefully.
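You can watch that negotiation happen with a few lines of Python's standard library (a sketch; the hostname is a placeholder, and the server decides which protocol comes back):

import socket
import ssl

HOST = "example.com"  # placeholder

context = ssl.create_default_context()
context.set_alpn_protocols(["h2", "http/1.1"])  # offered during the TLS handshake

with socket.create_connection((HOST, 443), timeout=5) as raw:
    with context.wrap_socket(raw, server_hostname=HOST) as tls:
        print("negotiated:", tls.selected_alpn_protocol())  # 'h2' or 'http/1.1'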

What about HTTP/3?

HTTP/3, standardized in 2022, goes further by replacing TCP entirely with a new transport protocol called QUIC. QUIC combines the connection handshake with the encryption handshake into a single step, eliminating a full roundtrip. It also solves a problem HTTP/2 inherited from TCP: if a single packet is lost, TCP stalls all streams until it's retransmitted. QUIC handles loss per-stream, so one lost packet doesn't block unrelated data. Most of the hard problems in web performance were identified in the HTTP/1 → HTTP/2 transition. HTTP/3 refines the solutions.


The thread through it all

HTTP/1.0 gave us one connection per request. HTTP/1.1 gave us keepalive, but requests were still sequential. Pipelining tried to fix that but broke on head-of-line blocking. Developers worked around the protocol with domain sharding, concatenation, and spriting, restructuring code for transport instead of clarity.

HTTP/2 solved the underlying problems: binary framing enabled multiplexed streams on a single connection, HPACK eliminated redundant header overhead, and the workarounds could be retired.

Every HTTP/2 feature is a direct answer to a real problem developers encountered with HTTP/1. And every HTTP/1 workaround is a testament to developer ingenuity in the face of protocol constraints. The protocol changed, but the impulse — making the web faster — stayed the same.