Linux Pipes: How Processes Talk
When you type cat file | grep pattern, a stream of bytes flows from one
process to another through an invisible channel. Let's build the mental model
for how that works.
The isolation problem
Every process lives in its own world. Thanks to virtual memory, Process A cannot simply read Process B's memory. This isolation is a feature, not a bug. It prevents misbehaving programs from corrupting each other.
But isolation creates a problem: how do processes share data? They could write to a file and have the other process read it. But files are slow, and you need to worry about synchronization. What if Process B reads before Process A finishes writing?
We need something faster, something that handles synchronization automatically. That's a pipe: a kernel-managed buffer that connects two processes. One process writes bytes into it, the other reads them out. The kernel handles the rest.
A channel for bytes
A pipe is exactly what it sounds like: a channel through which data flows. Bytes go in one end and come out the other, in the same order. The kernel maintains a buffer in memory where the data temporarily lives.
The pipe is unidirectional: data flows in only one direction, from the write end to the read end. There are no message boundaries. If you write 100 bytes, a reader might read 32, then 68. It's just a stream of bytes, first-in, first-out. When you read bytes, they're removed from the pipe. No other process can read those same bytes. And the buffer has a fixed capacity (typically 64KB on Linux). When it's full, writes block until a reader drains some bytes.
Where does the pipe live?
Pipes don't exist in the filesystem. They're in-kernel objects managed by a
hidden filesystem called pipefs. The kernel allocates buffers, tracks
readers and writers, and handles blocking. When all processes close their ends, the
pipe is destroyed.
Creating a pipe
The pipe() system call creates a pipe and returns two file descriptors:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int pipefd[2];
if (pipe(pipefd) == -1) {
    perror("pipe");
    exit(EXIT_FAILURE);
}
// pipefd[0] is the read end
// pipefd[1] is the write end

The array indices follow a convention: pipefd[0] is always the read end,
pipefd[1] is always the write end.
Once you have the file descriptors, you use the standard read() and
write() system calls:
// Write to the pipe
const char *message = "Hello through the pipe!";
write(pipefd[1], message, strlen(message));

// Read from the pipe
char buffer[100];
ssize_t n = read(pipefd[0], buffer, sizeof(buffer) - 1);
if (n > 0) {
    buffer[n] = '\0'; // Null-terminate for printing
    printf("Received: %s\n", buffer);
}

But a pipe created by a single process isn't very useful. We need two different processes to share it.
Sharing a pipe with fork()
Here's the key insight: when a process calls fork(), the child inherits
copies of all the parent's open file descriptors. This includes the pipe's
read and write ends.
The typical pattern is: parent creates a pipe, calls fork(), and then each
process closes the end it won't use. The parent closes the read end if it's
writing, and the child closes the write end if it's reading.
Why closing unused ends matters
This step is crucial and often forgotten by beginners. You must close the pipe end you're not using.
For the reader: if you keep the write end open, the kernel thinks someone might
still write to the pipe. When the pipe is empty, read() will block forever
waiting for data that will never come. For the writer: if no process has the read
end open and you try to write, the kernel delivers SIGPIPE, which terminates the
process by default; if SIGPIPE is ignored or handled, write() fails with EPIPE instead.
int pipefd[2];
pipe(pipefd);

switch (fork()) {
case -1:
    perror("fork");
    exit(EXIT_FAILURE);
case 0: {             // Child: will READ
    close(pipefd[1]); // Close write end FIRST!
    char buf[100];
    while (read(pipefd[0], buf, 1) > 0) {
        write(STDOUT_FILENO, buf, 1);
    }
    close(pipefd[0]);
    exit(EXIT_SUCCESS);
}
default:              // Parent: will WRITE
    close(pipefd[0]); // Close read end
    write(pipefd[1], "Hello child!", 12);
    close(pipefd[1]); // Signals EOF to child
    wait(NULL);
    exit(EXIT_SUCCESS);
}

The most common pipe bug
Forgetting to close the write end in a reading process is the most common pipe bug.
The symptom: your program hangs forever at a read() call even though
the writer finished. The fix: always close the unused end before doing anything else.
When reads and writes block
Understanding when read() and write() block, succeed, or fail
is essential for working with pipes correctly.
A read blocks when the pipe is empty and at least one process has the write end open. It stays blocked until data arrives or all writers close their ends. A write blocks when the pipe buffer is full and at least one process has the read end open. It stays blocked until a reader drains some bytes.
When all processes close the write end and a reader has drained all remaining
data, the next read() returns 0. This signals end-of-file: the reader
knows no more data will ever arrive. When all processes close the read end and a
writer tries to write, the kernel sends SIGPIPE. By default, this terminates
the process, which is usually what you want: if you pipe yes to head, yes
should not keep running forever after head exits.
Atomic writes
POSIX guarantees that writes of PIPE_BUF bytes or fewer are atomic.
On Linux, PIPE_BUF is 4096 bytes. If two processes each write 4000 bytes
simultaneously, their data won't be interleaved. If they each write 5000 bytes,
the kernel may interleave them.
How the shell connects commands
When you type cat file | grep pattern, the shell performs a precise
sequence of system calls to connect the two commands.
The challenge: cat and grep are separate programs. They don't
know about pipes. cat writes to stdout (file descriptor 1), and grep
reads from stdin (file descriptor 0). How does the shell redirect these?
Redirecting with dup2()
The dup2() system call duplicates a file descriptor into a specific slot:
dup2(oldfd, newfd);
// After this, newfd refers to the same open file as oldfd
// If newfd was already open, it is closed first

This is how the shell redirects stdout to a pipe.
The complete algorithm
A simpler way: popen()
Creating pipes, forking, and redirecting file descriptors is tedious. The C library
provides popen() to simplify common cases:
FILE *fp = popen("ls -l", "r");
// Read output of ls -l from fp
char buffer[256];
while (fgets(buffer, sizeof(buffer), fp) != NULL) {
    printf("Got line: %s", buffer);
}
pclose(fp); // NOT fclose()!

The second argument determines the direction: "r" means you read the command's
stdout, "w" means you write to the command's stdin.
popen() is convenient but limited: you can only read or write (not both), it
spawns a shell to interpret the command (slower, with potential security issues),
and you must use pclose(), not fclose().
Security warning
Never pass user input directly to popen() without sanitization. The
command is interpreted by a shell, so an attacker can inject commands. For example,
if filename is ; rm -rf /, building the command string by appending
filename to "cat " and handing it to popen() would be catastrophic.
Reaching beyond parent and child
Regular pipes have a limitation: they only work between related processes (parent and child, or siblings). What if two unrelated processes need to communicate?
A FIFO (First-In-First-Out), also called a named pipe, solves this. It appears in the filesystem as a special file. Any process that can access the file can open the pipe.
# Create a FIFO at the command line
$ mkfifo /tmp/myfifo
$ ls -l /tmp/myfifo
prw-r--r-- 1 user user 0 Jan 15 10:30 /tmp/myfifo
#^ The 'p' indicates it's a pipe

In C, you create a FIFO with mkfifo():
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

if (mkfifo("/tmp/myfifo", 0644) == -1) {
    if (errno != EEXIST) {
        perror("mkfifo");
        exit(EXIT_FAILURE);
    }
}
// Now any process can open it
int fd = open("/tmp/myfifo", O_RDONLY); // or O_WRONLY

Opening a FIFO blocks until both ends are opened: open(fifo, O_RDONLY) blocks
until another process opens for writing, and vice versa. This ensures both
parties are ready before communication begins. You can use O_NONBLOCK to
avoid this, but then you need to handle partial opens.
FIFOs vs Unix sockets
FIFOs are unidirectional. For bidirectional communication, you need either two FIFOs or a Unix domain socket. Sockets also support multiple concurrent clients, while a FIFO is typically used for one producer and one consumer.
Summary
Pipes are one of Unix's most elegant abstractions. A simple idea: connect the output
of one process to the input of another. The pipe() system call creates the channel,
fork() lets children inherit it, dup2() redirects stdin and stdout to the pipe
ends, and closing unused ends is what makes the whole thing terminate cleanly.
Next time you use the | operator, you'll know exactly what's happening: the shell
creates a pipe, forks two children, uses dup2 to connect them, and cleans up when
they finish.