Build Your Own TCP/IP Stack: TCP Data Flow

You've established a TCP connection. The three-way handshake is complete. Both sides know each other's starting sequence numbers.

Now you need to send a 10MB file.

The naive approach: send one packet, wait for acknowledgment, send the next packet, wait for acknowledgment...

But there's a problem with this.


The Throughput Problem

Let's do the math on sending one packet at a time.

Your round-trip time (RTT) to the server is 100ms. That's the time for a packet to reach the server plus the time for the ACK to come back.

Each packet carries about 1,500 bytes (the typical Ethernet MTU; we'll ignore header overhead to keep the math simple).

The Problem

If you send one packet, wait 100ms for the ACK, then send another packet and wait another 100ms, how many packets can you send per second? What's your throughput?

Let's calculate:

1 packet every 100ms = 10 packets per second
10 packets × 1,500 bytes = 15,000 bytes per second
= 15 KB/s
= 0.12 Mbps

That's terrible. You have a gigabit ethernet connection (1000 Mbps) and you're using 0.012% of it.
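
The same arithmetic as a small Python sketch. The function name and its parameters are made up for illustration, and the model assumes exactly one packet per round trip with no loss:

```python
def stop_and_wait_throughput(packet_bytes, rtt_seconds):
    """Bytes per second when only one packet may be unacknowledged at a time."""
    packets_per_second = 1 / rtt_seconds
    return packets_per_second * packet_bytes


bytes_per_second = stop_and_wait_throughput(packet_bytes=1500, rtt_seconds=0.1)
mbps = bytes_per_second * 8 / 1_000_000
link_mbps = 1000  # gigabit Ethernet

print(f"{bytes_per_second:.0f} B/s = {mbps:.2f} Mbps "
      f"({mbps / link_mbps:.3%} of the link)")
```

Notice that the link speed never appears in the throughput formula. With stop-and-wait, only the RTT and the packet size matter.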

The problem: you're waiting. The network is sitting idle while the ACK travels back. During those 100ms, a gigabit link could have carried thousands more packets.

How do we fix this?


The Solution: Pipelining

Instead of waiting for each ACK, what if we send multiple packets before waiting for any ACKs?

Think of it like a pipeline:

  1. Send packet 1 → (starts traveling)
  2. Send packet 2 → (starts traveling)
  3. Send packet 3 → (starts traveling)
  4. ACK for packet 1 arrives → send packet 4
  5. ACK for packet 2 arrives → send packet 5
  6. ...

The network is never idle. While packets are traveling one direction and ACKs are traveling back, you keep sending more packets.

This is called pipelining and it's the key to TCP's speed.
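
To get a feel for the difference, here is a rough sketch that estimates how long the 10MB file takes when up to N packets may be in flight. It is a deliberately crude model: it assumes no loss, ignores the time needed to put bits on the wire, and treats every batch of N packets as costing one RTT. The function name is hypothetical.

```python
import math


def transfer_time(file_bytes, packet_bytes, rtt_seconds, packets_in_flight):
    """Estimated seconds to deliver a file, one RTT per batch of packets."""
    packets = math.ceil(file_bytes / packet_bytes)
    round_trips = math.ceil(packets / packets_in_flight)
    return round_trips * rtt_seconds


for n in (1, 3, 100):
    seconds = transfer_time(10_000_000, 1500, 0.1, n)
    print(f"{n:>3} in flight: {seconds:6.1f} s")
```

With one packet in flight the transfer takes over eleven minutes. With 100 in flight it drops to a few seconds.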

But how many packets can we have "in flight" at once?


Inventing the Sliding Window

We need a limit. You can't send unlimited packets: the receiver has to hold incoming data in a finite buffer until the application reads it, and it might not be able to keep up.

Let's track a window of bytes we're allowed to send without waiting for ACKs.

The window has three zones:

  1. Sent and ACKed: done, safe, can forget about these
  2. Sent but not ACKed: in flight, waiting for confirmation
  3. Ready to send: window has room for these

As ACKs arrive, the window slides forward:

Before ACK:
[Acked][In Flight......][Ready...]
       ^
       Window left edge

After ACK arrives:
[Acked.....][In Flight...][Ready...]
            ^
            Window slides right!

[Interactive figure: TCP sliding window with 12 segments (Seq 1 to Seq 12). It tracks the window start (SND.UNA), the next segment to send (SND.NXT), and a window size of 4, and marks each segment as ACKed, sent, in window, or not yet sendable.]

This mechanism has a name: the sliding window.
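
The zones fall out of just two pointers plus the window size. Here is a minimal sketch using segment numbers like the figure; the function name is made up, and sequence-number wraparound is ignored:

```python
def classify(seq, snd_una, snd_nxt, window_size):
    """Return the zone a segment belongs to, given the sender's pointers."""
    if seq < snd_una:
        return "acked"          # sent and ACKed
    if seq < snd_nxt:
        return "in flight"      # sent but not ACKed
    if seq < snd_una + window_size:
        return "ready to send"  # window has room
    return "not yet"            # beyond the window


# Segments 1-2 are ACKed, 3-4 are in flight, and the window holds 4 segments.
for seq in range(1, 9):
    print(seq, classify(seq, snd_una=3, snd_nxt=5, window_size=4))
```

When an ACK arrives, snd_una moves right, and segments that were "not yet" become "ready to send". That movement is the slide.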

The window size determines how much data can be "in flight" at once. Bigger window = more data in flight = better throughput (up to a point).
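
Where is that point? A window-limited sender moves one window of data per round trip, so throughput is roughly window / RTT. The window that exactly fills the link is the bandwidth multiplied by the RTT, usually called the bandwidth-delay product. A quick sketch with hypothetical helper names, ignoring headers and loss:

```python
def window_limited_throughput(window_bytes, rtt_seconds):
    """Bytes per second when the sender can move one window per round trip."""
    return window_bytes / rtt_seconds


def window_to_fill_link(link_bps, rtt_seconds):
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return link_bps / 8 * rtt_seconds


rate = window_limited_throughput(window_bytes=65535, rtt_seconds=0.1)
needed = window_to_fill_link(link_bps=1_000_000_000, rtt_seconds=0.1)

print(f"64KB window at 100ms RTT: {rate * 8 / 1e6:.2f} Mbps")
print(f"Window needed to fill 1 Gbps at 100ms RTT: {needed / 1e6:.1f} MB")
```

A window larger than the bandwidth-delay product buys nothing more, because the link itself becomes the bottleneck.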

But who decides the window size?

Key Insight

The receiver controls the window size! Every ACK includes a "window" field saying "I can accept this many more bytes." If the receiver is overwhelmed, it shrinks the window (flow control). If it catches up, it expands it. This prevents the sender from overwhelming the receiver's buffer.


When Packets Get Lost

The sliding window is working great. Packets are flowing, ACKs are coming back, the window slides forward. Life is good.

Then a packet gets lost.

A router's queue overflows. The packet is dropped. The receiver never sees it.

The Problem

You've sent packets with sequence numbers 1000, 2000, 3000, 4000. Packet 3000 gets lost. The receiver gets 1000, 2000, 4000. What happens?

The sender has packet 3000 in its "sent but not ACKed" zone. It's waiting for an ACK.

But the ACK will never come. The receiver never got the packet.

We need timeouts.

Retransmission Timeout

The sender starts a timer when it sends a packet. If no ACK arrives before the timer expires, it retransmits.

But how long should the timeout be?

  • Too short: You retransmit packets that were just slow, wasting bandwidth
  • Too long: You wait forever, killing throughput

TCP solves this by measuring RTT (round-trip time) continuously and adjusting the timeout dynamically. Fast network → short timeout. Slow, variable network → longer timeout.
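
One way to do that adjustment is the smoothing rule standardized in RFC 6298: keep a running average of the RTT (SRTT) and of how much it varies (RTTVAR), and set the timeout to SRTT + 4 × RTTVAR. The sketch below follows that rule but is simplified: the class name is made up, clock granularity is ignored, and there is no backoff after a timeout.

```python
class RtoEstimator:
    """Simplified RFC 6298-style retransmission timeout estimator."""

    ALPHA = 1 / 8  # weight of a new sample in the smoothed RTT
    BETA = 1 / 4   # weight of a new sample in the RTT variation

    def __init__(self, min_rto=1.0):
        self.srtt = None
        self.rttvar = None
        self.min_rto = min_rto  # RFC 6298 recommends a 1 second floor
        self.rto = 1.0          # initial RTO before any measurement

    def on_rtt_sample(self, rtt):
        if self.srtt is None:
            # First measurement
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # Update the variation first, using the old SRTT
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - rtt))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        self.rto = max(self.min_rto, self.srtt + 4 * self.rttvar)
        return self.rto


# min_rto=0.0 is used here only so the effect of each sample is visible.
estimator = RtoEstimator(min_rto=0.0)
for sample in (0.100, 0.100, 0.100, 0.300):
    rto = estimator.on_rtt_sample(sample)
    print(f"RTT sample {sample:.3f}s -> RTO {rto:.3f}s")
```

Steady samples pull the timeout down toward the RTT. A single slow sample pushes it back up, because the variation term grows.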

The Faster Way: Duplicate ACKs

There's a cleverer mechanism. Remember, TCP ACKs are cumulative. The ACK number says "I've received everything up to (but not including) this sequence."

If the receiver gets packets 1000, 2000, then jumps to 4000 (missing 3000), what does it ACK?

It ACKs 3000. "I've received everything up to 3000."

When packet 5000 arrives, it still ACKs 3000. "Still waiting for 3000." When packet 6000 arrives, it STILL ACKs 3000. "STILL waiting for 3000!"

These are duplicate ACKs. The receiver keeps sending the same ACK number.

When the sender sees three duplicate ACKs, it knows a packet is missing and retransmits immediately, without waiting for the timeout. This is called fast retransmit and it's much faster than waiting for a timeout.
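
To see where duplicate ACKs come from, here is a sketch of the receiver's side. The class name is hypothetical, and it assumes segments never overlap or arrive partially. It replays the scenario above with 1000-byte segments:

```python
class CumulativeAckReceiver:
    """Toy receiver that answers every segment with a cumulative ACK."""

    def __init__(self, initial_seq):
        self.next_expected = initial_seq
        self.out_of_order = {}  # seq -> length, segments held past a gap

    def on_segment(self, seq, length):
        """Process one segment and return the ACK number to send back."""
        if seq == self.next_expected:
            self.next_expected += length
            # The gap may now be filled: absorb any buffered segments
            while self.next_expected in self.out_of_order:
                self.next_expected += self.out_of_order.pop(self.next_expected)
        elif seq > self.next_expected:
            # Arrived past a gap: hold it, but the ACK number cannot advance
            self.out_of_order[seq] = length
        # seq < next_expected is an old duplicate: nothing to store
        return self.next_expected


rx = CumulativeAckReceiver(initial_seq=1000)
acks = [rx.on_segment(seq, 1000) for seq in (1000, 2000, 4000, 5000, 6000)]
print(acks)                        # [2000, 3000, 3000, 3000, 3000]
print(rx.on_segment(3000, 1000))   # 7000: the retransmission fills the gap
```

The ACK for segment 2000 is followed by three identical ACKs for 3000, which is exactly the signal that triggers fast retransmit. Once the missing segment arrives, a single ACK covers everything that was buffered.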


Closing the Connection

You've sent all your data. The file is transferred. Now you want to close the connection.

But there's a complication: TCP is full-duplex. Data flows in both directions independently. Just because you're done sending doesn't mean the other side is done.

The Problem

How do you close a connection when each side might finish at different times? How do you ensure both sides agree the connection is closed?

We need a graceful shutdown that handles each direction separately.

The Four-Way Close

Step 1: Client → Server (FIN)

FIN flag set
"I'm done sending data."

Step 2: Server → Client (ACK)

"OK, I received your FIN."

At this point, the client won't send more data. But the server might still have data to send!

Step 3: Server → Client (FIN)

FIN flag set
"I'm also done sending data."

Step 4: Client → Server (ACK)

"OK, I received your FIN. Connection fully closed."

Four messages. Each direction closes independently.

Why the wait? The server might still be sending the last chunks of a large file. It can't close until it's done. The client waits patiently, receiving data, until the server sends its FIN.
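
The four steps can be sketched as a small state machine. The state names follow the TCP specification (RFC 793); the table layout, the event strings, and the run helper are invented for illustration, and the simultaneous-close case is left out.

```python
# (current state, event) -> (next state, segment to send, if any)
TRANSITIONS = {
    # The side that closes first (the client in the steps above)
    ("ESTABLISHED", "close"): ("FIN_WAIT_1", "send FIN"),
    ("FIN_WAIT_1", "recv ACK"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_2", "recv FIN"): ("TIME_WAIT", "send ACK"),
    ("TIME_WAIT", "2MSL timeout"): ("CLOSED", None),
    # The side that closes second (the server in the steps above)
    ("ESTABLISHED", "recv FIN"): ("CLOSE_WAIT", "send ACK"),
    ("CLOSE_WAIT", "close"): ("LAST_ACK", "send FIN"),
    ("LAST_ACK", "recv ACK"): ("CLOSED", None),
}


def run(state, events):
    """Apply a sequence of events and return the final state."""
    for event in events:
        next_state, action = TRANSITIONS[(state, event)]
        note = f" ({action})" if action else ""
        print(f"{state} --{event}--> {next_state}{note}")
        state = next_state
    return state


client_final = run("ESTABLISHED",
                   ["close", "recv ACK", "recv FIN", "2MSL timeout"])
server_final = run("ESTABLISHED", ["recv FIN", "close", "recv ACK"])
```

The server can stay in CLOSE_WAIT for as long as it still has data to send. Only its own close moves it forward.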

TIME_WAIT: The Annoying Wait

After sending the final ACK, the client enters a state called TIME_WAIT. It sits there for twice the Maximum Segment Lifetime (typically between one and four minutes, depending on the operating system) doing nothing.

This is infuriating when you're developing. You close a connection, try to restart your server, and get "Address already in use." The old connection is still in TIME_WAIT!

But it exists for good reasons:

  1. Ensure the final ACK arrives: If the server's FIN gets retransmitted (because our ACK was lost), we need to still be around to ACK it again
  2. Kill old duplicate packets: Old packets from this connection might still be bouncing around the network. TIME_WAIT ensures they all die before we reuse the port number

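After 2 × Maximum Segment Lifetime (MSL) has elapsed, TIME_WAIT expires and the port is truly free. RFC 793 suggests an MSL of 2 minutes, which gives a 4-minute wait; Linux uses a fixed 60 seconds in total.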
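
For the "Address already in use" annoyance during development, the usual workaround on Unix-like systems is to set SO_REUSEADDR on the listening socket before binding. A minimal sketch follows; the loopback address and port 0 (let the OS pick a free port) are placeholder choices, and the option behaves differently on Windows:

```python
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Must be set before bind(): allows binding even if old connections
# on this port are still sitting in TIME_WAIT.
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

server.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
server.listen()

reuse_enabled = server.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR)
port = server.getsockname()[1]
print(f"Listening on port {port}, SO_REUSEADDR enabled: {bool(reuse_enabled)}")

server.close()
```

This does not remove TIME_WAIT. The old connection still waits out its timer; the option only lets a new listener bind alongside it.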


Building It: Python Implementation

Let's implement the sliding window and retransmission logic:

Sliding Window Implementation

import time
 
class SlidingWindow:
    """TCP sliding window for flow control."""
 
    def __init__(self, window_size=65535):
        self.window_size = window_size
        self.base_seq = 1000  # Start of window: oldest unACKed byte (SND.UNA)
        self.next_seq = 1000  # Next byte to send (SND.NXT)
 
        # Track segments in flight
        self.in_flight = {}  # seq -> (data, send_time)
 
        # Received ACKs
        self.last_ack = 1000
 
    def can_send(self, data_len):
        """Check if we can send this much data."""
        # In-flight bytes
        in_flight_bytes = self.next_seq - self.base_seq
 
        # Can we fit more data in the window?
        return in_flight_bytes + data_len <= self.window_size
 
    def send(self, data):
        """Send data if window allows."""
        if not self.can_send(len(data)):
            raise RuntimeError("Window full! Cannot send.")
 
        seq = self.next_seq
        self.next_seq += len(data)
 
        # Track this segment
        self.in_flight[seq] = (data, time.time())
 
        print(f"SEND: seq={seq}, len={len(data)}, "
              f"window=[{self.base_seq}..{self.next_seq}]")
 
        return seq
 
    def receive_ack(self, ack_num):
        """Process acknowledgment."""
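        if ack_num <= self.last_ack:
            # Duplicate (or stale) ACK: nothing new is acknowledged
            print(f"ACK: {ack_num} (DUPLICATE)")
            return False

        if ack_num > self.next_seq:
            # ACK for data we never sent: ignore it
            print(f"ACK: {ack_num} (INVALID, beyond next_seq)")
            return False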
 
        print(f"ACK: {ack_num} (NEW, advancing window)")
 
        # Remove ACKed segments from in-flight
        acked_seqs = [seq for seq in self.in_flight.keys()
                      if seq < ack_num]
        for seq in acked_seqs:
            del self.in_flight[seq]
 
        # Slide window forward
        self.base_seq = ack_num
        self.last_ack = ack_num
 
        print(f"  Window now: [{self.base_seq}..{self.next_seq}]")
        print(f"  In flight: {len(self.in_flight)} segments")
 
        return True
 
    def get_status(self):
        """Get window status."""
        in_flight_bytes = self.next_seq - self.base_seq
        available = self.window_size - in_flight_bytes
 
        return {
            'base': self.base_seq,
            'next': self.next_seq,
            'window_size': self.window_size,
            'in_flight': in_flight_bytes,
            'available': available
        }
 
# Example: Sending data with sliding window
window = SlidingWindow(window_size=6000)  # 6KB window
 
print("=== Sliding Window Demo ===\n")
 
# Send some segments
window.send(b'A' * 1000)  # Send 1KB
window.send(b'B' * 1000)  # Send 1KB
window.send(b'C' * 1000)  # Send 1KB
 
print(f"\nStatus: {window.get_status()}")
 
# Receive ACK for first segment
window.receive_ack(2000)  # ACK first 1KB
 
# Now we can send more
window.send(b'D' * 1000)  # Send 1KB
window.send(b'E' * 1000)  # Send 1KB
 
print(f"\nStatus: {window.get_status()}")
 
# Receive ACK for everything
window.receive_ack(6000)  # ACK all
 
print(f"\nFinal status: {window.get_status()}")

Retransmission with Timeouts

import time
from collections import OrderedDict
 
class TCPSender:
    """TCP sender with retransmission."""
 
    def __init__(self, rto=1.0):  # Retransmission timeout in seconds
        self.rto = rto
        self.window = SlidingWindow()
        self.timers = OrderedDict()  # seq -> timeout_time
 
    def send(self, data):
        """Send data and start timer."""
        seq = self.window.send(data)
 
        # Start retransmission timer
        self.timers[seq] = time.time() + self.rto
        print(f"  Timer started: {self.rto}s")
 
        return seq
 
    def receive_ack(self, ack_num):
        """Process ACK and cancel timers."""
        # Slide window
        advanced = self.window.receive_ack(ack_num)
 
        if advanced:
            # Cancel timers for ACKed segments
            cancelled = [seq for seq in self.timers.keys()
                        if seq < ack_num]
            for seq in cancelled:
                del self.timers[seq]
                print(f"  Timer cancelled: seq={seq}")

        # Let callers know whether the window moved
        return advanced
 
    def check_timeouts(self):
        """Check for timeouts and retransmit."""
        now = time.time()
        timed_out = []
 
        for seq, timeout_time in self.timers.items():
            if now >= timeout_time:
                timed_out.append(seq)
 
        for seq in timed_out:
            print(f"\nTIMEOUT: seq={seq}")
            data, _ = self.window.in_flight[seq]
 
            # Retransmit
            print(f"RETRANSMIT: seq={seq}, len={len(data)}")
 
            # Reset timer
            self.timers[seq] = now + self.rto
 
        return len(timed_out)
 
# Example: Simulate packet loss with retransmission
print("\n=== Retransmission Demo ===\n")
 
sender = TCPSender(rto=0.5)  # 500ms timeout
 
# Send 3 segments
sender.send(b'Segment 1')
sender.send(b'Segment 2')
sender.send(b'Segment 3')
 
# ACK first two (third one "lost")
sender.receive_ack(1009)  # ACK segment 1
sender.receive_ack(1018)  # ACK segment 2
 
# Wait for timeout
print("\nWaiting for timeout...")
time.sleep(0.6)
 
# Check for timeouts (segment 3 should time out)
sender.check_timeouts()

Fast Retransmit (Duplicate ACKs)

class TCPSenderWithFastRetransmit(TCPSender):
    """TCP sender with fast retransmit."""
 
    def __init__(self, rto=1.0):
        super().__init__(rto)
        self.dup_ack_count = 0
        self.last_ack = 0
 
    def receive_ack(self, ack_num):
        """Process ACK with duplicate ACK detection."""
        if ack_num == self.last_ack:
            # Duplicate ACK!
            self.dup_ack_count += 1
            print(f"ACK: {ack_num} (DUPLICATE #{self.dup_ack_count})")
 
            if self.dup_ack_count == 3:
                # Fast retransmit!
                print("\n>>> FAST RETRANSMIT triggered! <<<")
                seq = ack_num
                if seq in self.window.in_flight:
                    data, _ = self.window.in_flight[seq]
                    print(f"RETRANSMIT: seq={seq}, len={len(data)}")
                    # Reset timer
                    self.timers[seq] = time.time() + self.rto
 
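        elif ack_num > self.last_ack:
            # New ACK: advance the window and reset the duplicate counter
            super().receive_ack(ack_num)
            self.last_ack = ack_num
            self.dup_ack_count = 0
        else:
            # Stale ACK (older than one we've already seen): ignore it so
            # last_ack never moves backwards
            print(f"ACK: {ack_num} (STALE, ignored)")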
 
# Example: Fast retransmit demo
print("\n=== Fast Retransmit Demo ===\n")
 
sender = TCPSenderWithFastRetransmit(rto=5.0)
 
# Send 5 segments
for i in range(5):
    sender.send(f'Segment {i+1}'.encode())
 
# Receiver gets 1, 2, then 4, 5 (3 is lost)
sender.receive_ack(1009)  # ACK segment 1
sender.receive_ack(1018)  # ACK segment 2
 
# Now receiver keeps getting out-of-order packets
# It keeps ACKing 1018 (still waiting for segment 3)
sender.receive_ack(1018)  # Duplicate ACK #1
sender.receive_ack(1018)  # Duplicate ACK #2
sender.receive_ack(1018)  # Duplicate ACK #3 -> FAST RETRANSMIT!

Output:

=== Fast Retransmit Demo ===

SEND: seq=1000, len=9, window=[1000..1009]
  Timer started: 5.0s
SEND: seq=1009, len=9, window=[1000..1018]
  Timer started: 5.0s
SEND: seq=1018, len=9, window=[1000..1027]
  Timer started: 5.0s
SEND: seq=1027, len=9, window=[1000..1036]
  Timer started: 5.0s
SEND: seq=1036, len=9, window=[1000..1045]
  Timer started: 5.0s
ACK: 1009 (NEW, advancing window)
  Window now: [1009..1045]
  In flight: 4 segments
  Timer cancelled: seq=1000
ACK: 1018 (NEW, advancing window)
  Window now: [1018..1045]
  In flight: 3 segments
  Timer cancelled: seq=1009
ACK: 1018 (DUPLICATE #1)
ACK: 1018 (DUPLICATE #2)
ACK: 1018 (DUPLICATE #3)

>>> FAST RETRANSMIT triggered! <<<
RETRANSMIT: seq=1018, len=9

Key Insight

Fast retransmit is much faster than waiting for timeouts! Three duplicate ACKs signal "I got something after this, so this specific packet is missing." The sender can immediately retransmit without waiting seconds for the timeout. This keeps throughput high even when packets are lost.


The Complete Stack

Let's step back and see what we've built across this entire series.

Layer 1: Ethernet (Local Delivery)

  • Frames with MAC addresses
  • ARP for address discovery
  • Broadcast for "ask everyone"
  • Works only on the local network

Layer 2: IP (Global Routing)

  • Packets with IP addresses
  • TTL to prevent loops
  • Checksum for error detection
  • ICMP for diagnostics (ping, traceroute)
  • Best-effort, unreliable delivery

Layer 3: TCP (Reliable Connections)

  • Three-way handshake to establish connections
  • Sequence numbers for ordering
  • Acknowledgments for confirmation
  • Sliding window for throughput
  • Retransmission for loss recovery
  • Flow control to protect the receiver
  • Four-way close for graceful shutdown

These three layers work together to provide what applications need: a reliable stream of bytes between two computers anywhere on the internet.


What Applications See

Here's the beautiful part: applications don't see any of this complexity.

When you write code, you call:

  • connect(): TCP does the three-way handshake
  • send(data): TCP breaks it into packets, adds sequence numbers, manages the window
  • recv(): TCP reassembles bytes in order, handles retransmissions
  • close(): TCP does the four-way close

The kernel handles everything:

  • Building Ethernet frames
  • Calculating IP checksums
  • Managing TCP state
  • Retransmitting lost packets
  • Sliding the window
  • Handling timeouts

From the application's perspective, there's just a reliable stream of bytes going in and out. The chaos of the unreliable network (lost packets, reordered packets, duplicates) is completely hidden.
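
Here is what that looks like from the application side: a tiny echo exchange over the loopback interface using Python's standard socket module. The uppercase echo server and the choice of port 0 are just for the demo. Every handshake, ACK, and FIN is handled by the kernel.

```python
import socket
import threading


def run_echo_server(listener):
    """Accept one connection and echo everything back in uppercase."""
    conn, _addr = listener.accept()
    with conn:
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # empty bytes: the peer sent its FIN
                break
            conn.sendall(chunk.upper())


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]

server_thread = threading.Thread(target=run_echo_server, args=(listener,))
server_thread.start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))   # three-way handshake happens here
client.sendall(b"hello")              # segmentation, sequencing, windowing
client.shutdown(socket.SHUT_WR)       # "I'm done sending" (our FIN)

reply = b""
while True:
    chunk = client.recv(4096)         # in-order bytes, losses already repaired
    if not chunk:                     # the server's FIN
        break
    reply += chunk

client.close()
server_thread.join()
listener.close()
print(reply)
```

The shutdown call is the half-close from the four-way close: the client stops sending but keeps reading until the server finishes.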

This abstraction is one of the internet's greatest achievements.


You've Built the Internet

When you type ping google.com in your terminal, here's what actually happens:

  1. DNS lookup resolves google.com to an IP address (e.g., 142.250.185.46)
  2. Routing decision determines the next hop (your home router)
  3. ARP request discovers your router's MAC address
  4. ICMP packet (Echo Request) is created
  5. IP header is added with destination = Google's IP
  6. Ethernet frame is built with destination = router's MAC
  7. Frame sent on the wire to your router
  8. Router receives, decrements TTL, recalculates checksum, forwards to ISP
  9. Hops across the internet, router by router, until reaching Google
  10. Google responds with ICMP Echo Reply
  11. Reply travels back through the same layers in reverse
  12. Your stack unwraps the frame → IP packet → ICMP reply
  13. ping displays "64 bytes from 142.250.185.46: time=23ms"
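
As a reminder of how little magic is involved, step 4 can be sketched in a few lines: an ICMP Echo Request is an 8-byte header (type 8, code 0, checksum, identifier, sequence number) followed by a payload, protected by the standard Internet checksum. The function names, identifier, and payload below are arbitrary choices for the example, and actually sending the packet (which needs a raw socket and usually elevated privileges) is left out.

```python
import struct


def internet_checksum(data):
    """16-bit ones' complement of the ones' complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return ~total & 0xFFFF


def build_echo_request(identifier, sequence, payload):
    """Build an ICMP Echo Request: type 8, code 0."""
    # The checksum is computed with the checksum field set to zero
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence)
    return header + payload


packet = build_echo_request(identifier=0x1234, sequence=1, payload=b"ping")
print(packet.hex())

# A receiver verifies by checksumming the whole packet: the result must be 0
print(internet_checksum(packet))
```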

Every single step uses concepts we've built in this series.


Going Deeper

This is just the beginning. Modern networks add many more layers:

  • TLS for encryption
  • HTTP/2 and HTTP/3 for efficient web transfers
  • QUIC for faster connection setup
  • Congestion control algorithms (Reno, CUBIC, BBR)
  • Quality of Service (QoS) for prioritizing traffic

But it all builds on the foundation we've created: Ethernet, IP, and TCP.

If you want to go deeper:

  • RFC 793: Original TCP specification (since superseded by RFC 9293, which consolidates decades of updates)
  • TCP/IP Illustrated by W. Richard Stevens: The classic book
  • Build your own: Use a TUN/TAP interface to implement a real TCP/IP stack

The best way to truly understand networking is to build it yourself. Every line of code teaches you something about how the internet actually works.

You've now built the core of the TCP/IP stack from scratch. You understand how data travels from your computer to anywhere in the world, and back.

That's the internet.