Linux Pipes: How Processes Talk
When you type cat file | grep pattern, a stream of bytes flows from one
process to another through an invisible channel. Let's build the mental model
for how that works.
The isolation problem
Every process lives in its own world. Thanks to virtual memory, Process A cannot simply read Process B's memory. This isolation is a feature, not a bug. It prevents misbehaving programs from corrupting each other.
But isolation creates a problem: how do processes share data? They could write to a file and have the other process read it. But files are slow, and you need to worry about synchronization. What if Process B reads before Process A finishes writing?
We need something faster, something that handles synchronization automatically. That's a pipe: a kernel-managed buffer that connects two processes. One process writes bytes into it, the other reads them out. The kernel handles the rest.
A channel for bytes
A pipe is exactly what it sounds like: a channel through which data flows. Bytes go in one end and come out the other, in the same order. The kernel maintains a buffer in memory where the data temporarily lives.
The pipe is unidirectional: data flows in only one direction, from the write end to the read end. There are no message boundaries. If you write 100 bytes, a reader might read 32, then 68. It's just a stream of bytes, first-in, first-out. When you read bytes, they're removed from the pipe. No other process can read those same bytes. And the buffer has a fixed capacity (typically 64KB on Linux). When it's full, writes block until a reader drains some bytes.
Where does the pipe live?
Pipes don't exist in the filesystem. They're in-kernel objects managed by a
hidden filesystem called pipefs. The kernel allocates buffers, tracks
readers and writers, and handles blocking. When all processes close their ends, the
pipe is destroyed.
Creating a pipe
The pipe() system call creates a pipe and returns two file descriptors:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int pipefd[2];
if (pipe(pipefd) == -1) {
    perror("pipe");
    exit(EXIT_FAILURE);
}
// pipefd[0] is the read end
// pipefd[1] is the write end

The array indices follow a convention: pipefd[0] is always the read end,
pipefd[1] is always the write end.
Once you have the file descriptors, you use the standard read() and
write() system calls:
// Write to the pipe
const char *message = "Hello through the pipe!";
write(pipefd[1], message, strlen(message));

// Read from the pipe
char buffer[100];
ssize_t n = read(pipefd[0], buffer, sizeof(buffer) - 1);
if (n > 0) {
    buffer[n] = '\0'; // Null-terminate for printing
    printf("Received: %s\n", buffer);
}

But a pipe created by a single process isn't very useful. We need two different processes to share it.
Sharing a pipe with fork()
Here's the key insight: when a process calls fork(), the child inherits
copies of all the parent's open file descriptors. This includes the pipe's
read and write ends.
The typical pattern is: parent creates a pipe, calls fork(), and then each
process closes the end it won't use. The parent closes the read end if it's
writing, and the child closes the write end if it's reading.
Why closing unused ends matters
This step is crucial and often forgotten by beginners. You must close the pipe end you're not using.
For the reader: if you keep the write end open, the kernel thinks someone might
still write to the pipe. When the pipe is empty, read() will block forever
waiting for data that will never come. For the writer: if no process has the read
end open and you try to write, the kernel delivers SIGPIPE, which terminates the
process by default; if SIGPIPE is ignored or handled, write() fails with EPIPE instead.
int pipefd[2];
pipe(pipefd);

switch (fork()) {
case -1:
    perror("fork");
    exit(EXIT_FAILURE);
case 0: {             // Child: will READ
    close(pipefd[1]); // Close write end FIRST!
    char buf[100];
    while (read(pipefd[0], buf, 1) > 0) {
        write(STDOUT_FILENO, buf, 1);
    }
    close(pipefd[0]);
    exit(EXIT_SUCCESS);
}
default:              // Parent: will WRITE
    close(pipefd[0]); // Close read end
    write(pipefd[1], "Hello child!", 12);
    close(pipefd[1]); // Signals EOF to child
    wait(NULL);
    exit(EXIT_SUCCESS);
}

The most common pipe bug
Forgetting to close the write end in a reading process is the most common pipe bug.
The symptom: your program hangs forever at a read() call even though
the writer finished. The fix: always close the unused end before doing anything else.
When reads and writes block
Understanding when read() and write() block, succeed, or fail
is essential for working with pipes correctly.
A read blocks when the pipe is empty and at least one process has the write end open. It stays blocked until data arrives or all writers close their ends. A write blocks when the pipe buffer is full and at least one process has the read end open. It stays blocked until a reader drains some bytes.
When all processes close the write end and a reader has drained all remaining
data, the next read() returns 0. This signals end-of-file: the reader
knows no more data will ever arrive. When all processes close the read end and a
writer tries to write, the kernel sends SIGPIPE. By default, this terminates
the process, which is usually what you want: if you pipe yes to head, yes
should not keep running forever after head exits.
Atomic writes
POSIX guarantees that writes of PIPE_BUF bytes or fewer are atomic.
On Linux, PIPE_BUF is 4096 bytes. If two processes each write 4000 bytes
simultaneously, their data won't be interleaved. If they each write 5000 bytes,
the kernel may interleave them.
How the shell connects commands
When you type cat file | grep pattern, the shell performs a precise
sequence of system calls to connect the two commands.
The challenge: cat and grep are separate programs. They don't
know about pipes. cat writes to stdout (file descriptor 1), and grep
reads from stdin (file descriptor 0). How does the shell redirect these?
Redirecting with dup2()
The dup2() system call duplicates a file descriptor into a specific slot:
dup2(oldfd, newfd);
// After this, newfd refers to the same open file as oldfd
// If newfd was already open, it is closed first

This is how the shell redirects stdout to a pipe.
The complete algorithm
A simpler way: popen()
Creating pipes, forking, and redirecting file descriptors is tedious. The C library
provides popen() to simplify common cases:
FILE *fp = popen("ls -l", "r");
// Read output of ls -l from fp
char buffer[256];
while (fgets(buffer, sizeof(buffer), fp) != NULL) {
    printf("Got line: %s", buffer);
}
pclose(fp); // NOT fclose()!

The second argument determines the direction: "r" means you read the command's
stdout, "w" means you write to the command's stdin.
popen() is convenient but limited: you can only read or write (not both), it
spawns a shell to interpret the command (slower, with potential security issues),
and you must use pclose(), not fclose().
Security warning
Never pass user input directly to popen() without sanitization. The
command is interpreted by a shell, so an attacker can inject commands. For example,
if filename is ; rm -rf /, building the command string by appending
filename to "cat " and handing it to popen() would be catastrophic.
Reaching beyond parent and child
Regular pipes have a limitation: they only work between related processes (parent and child, or siblings). What if two unrelated processes need to communicate?
A FIFO (First-In-First-Out), also called a named pipe, solves this. It appears in the filesystem as a special file. Any process that can access the file can open the pipe.
# Create a FIFO at the command line
$ mkfifo /tmp/myfifo
$ ls -l /tmp/myfifo
prw-r--r-- 1 user user 0 Jan 15 10:30 /tmp/myfifo
#^ The 'p' indicates it's a pipe

In C, you create a FIFO with mkfifo():
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

if (mkfifo("/tmp/myfifo", 0644) == -1) {
    if (errno != EEXIST) {
        perror("mkfifo");
        exit(EXIT_FAILURE);
    }
}
// Now any process can open it
int fd = open("/tmp/myfifo", O_RDONLY); // or O_WRONLY

Opening a FIFO blocks until both ends are opened: open(fifo, O_RDONLY) blocks
until another process opens for writing, and vice versa. This ensures both
parties are ready before communication begins. You can use O_NONBLOCK to
avoid this, but then you need to handle partial opens.
FIFOs vs Unix sockets
FIFOs are unidirectional. For bidirectional communication, you need either two FIFOs or a Unix domain socket. Sockets also support multiple concurrent clients, while a FIFO is typically used for one producer and one consumer.
Summary
Pipes are one of Unix's most elegant abstractions. A simple idea: connect the output
of one process to the input of another. The pipe() system call creates the channel,
fork() lets children inherit it, dup2() redirects stdin and stdout to the pipe
ends, and closing unused ends is what makes the whole thing terminate cleanly.
Next time you use the | operator, you'll know exactly what's happening: the shell
creates a pipe, forks two children, uses dup2 to connect them, and cleans up when
they finish.