File Descriptors, Sockets, and the poll() Loop: How I/O Multiplexing Actually Works

A deep dive into what file descriptors really are at the kernel level, how sockets build on top of them, and how poll() lets a single thread handle hundreds of concurrent clients without blocking.

2025-01-15

When I started building ft_irc, a full IRC server in C++98 with no external libraries, I hit a wall almost immediately. My first version worked perfectly with one client. The moment a second client connected, everything froze. The server was stuck waiting for the first client to send something, completely unaware that a second one was trying to talk to it.

That bug forced me to understand something most programmers never have to confront directly: what I/O actually is at the system level, and why the obvious approach to handling multiple clients is fundamentally broken.

What is a File Descriptor, Really?

Before sockets or multiplexing make sense, you need to understand the abstraction everything else is built on: the file descriptor.

A file descriptor is an integer. That's it on the surface. But what that integer represents is a layered chain of kernel data structures.

When you open a file, or a socket, or a pipe, the kernel does three things:

  1. Creates an entry in the kernel's global file table: a structure containing the file's current offset, access mode, and a pointer to the underlying object (the inode for a regular file, the socket structure for a socket)
  2. Creates an entry in your process's file descriptor table: a per-process array indexed by small integers, pointing at that file-table entry
  3. Returns the index of that entry to you as an int (the lowest-numbered descriptor not already in use in your process)

That integer is your file descriptor. When you call read(fd, buf, n), the kernel looks up fd in your process table, follows the pointer to the kernel file table, follows that to the inode or socket, and copies data from there into your buffer.

#include <fcntl.h>   // open(), O_RDONLY
int fd = open("data.txt", O_RDONLY);   // returns -1 on failure
// fd is typically 3: 0 (stdin), 1 (stdout), and 2 (stderr) are normally already open
// kernel: process_table[3] -> file_table[X] -> inode
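The layering has a visible consequence: the file offset lives in the kernel's file-table entry, not in the integer you hold. dup() gives you a second descriptor pointing at the same file-table entry, so the two share one offset, while a second open() of the same path creates a new entry with its own offset. A small sketch to check this, assuming a writable /tmp; read_n() and offset_demo() are hypothetical helpers written for illustration:

```cpp
#include <cassert>
#include <cstdlib>    // mkstemp()
#include <string>
#include <fcntl.h>    // open(), O_RDONLY
#include <unistd.h>   // read(), write(), dup(), close(), unlink()

// Read up to n bytes (at most 64) from fd and return them as a string.
std::string read_n(int fd, size_t n) {
    char buf[64];
    if (n > sizeof(buf))
        n = sizeof(buf);
    ssize_t got = read(fd, buf, n);
    return got > 0 ? std::string(buf, got) : std::string();
}

// Returns true if dup()'d FDs share one offset while a second open()
// of the same file gets an independent offset.
bool offset_demo() {
    char path[] = "/tmp/fd_offset_XXXXXX";
    int w = mkstemp(path);                  // create and open a temporary file
    if (w < 0 || write(w, "abcdef", 6) != 6)
        return false;

    int r1 = open(path, O_RDONLY);          // new descriptor AND new file-table entry
    int r2 = dup(r1);                       // new descriptor, SAME file-table entry
    int r3 = open(path, O_RDONLY);          // independent file-table entry, offset 0

    bool shared = read_n(r1, 3) == "abc" && read_n(r2, 3) == "def";
    bool separate = read_n(r3, 3) == "abc";

    unlink(path);
    close(w);
    close(r1);
    close(r2);
    close(r3);
    return shared && separate;
}
```

offset_demo() returns true: after r1 reads "abc", r2 continues at "def", while r3 starts over at "abc".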

What I find genuinely cool about this design: sockets, pipes, and regular files are all file descriptors. The kernel exposes the same read() / write() interface for all of them. The difference is only in what sits at the other end of the pointer chain.
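To see that in action, here is a minimal sketch, assuming a writable /tmp; round_trip() and same_interface_demo() are hypothetical helper names. The same function moves bytes through a pipe, a connected pair of Unix-domain sockets, and a regular file, using nothing but write() and read():

```cpp
#include <cassert>
#include <cstdlib>        // mkstemp()
#include <cstring>        // std::strlen()
#include <string>
#include <fcntl.h>        // open(), O_RDONLY
#include <sys/socket.h>   // socketpair(), AF_UNIX, SOCK_STREAM
#include <unistd.h>       // pipe(), read(), write(), close(), unlink()

// Write msg to write_fd, then read it back from read_fd using only write()
// and read(). Nothing here knows what kind of object the FDs refer to.
std::string round_trip(int write_fd, int read_fd, const char* msg) {
    size_t len = std::strlen(msg);
    if (write(write_fd, msg, len) != static_cast<ssize_t>(len))
        return std::string();
    char buf[256];
    ssize_t n = read(read_fd, buf, sizeof(buf));
    return n > 0 ? std::string(buf, n) : std::string();
}

// Run the same round trip over a pipe, a socket pair, and a regular file.
bool same_interface_demo() {
    int p[2], s[2];
    if (pipe(p) != 0)
        return false;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, s) != 0)
        return false;
    char path[] = "/tmp/fd_demo_XXXXXX";
    int file_w = mkstemp(path);             // create and open a temporary file
    int file_r = open(path, O_RDONLY);      // open it again just for reading
    if (file_w < 0 || file_r < 0)
        return false;

    bool ok = round_trip(p[1], p[0], "through a pipe") == "through a pipe"
           && round_trip(s[0], s[1], "through a socket") == "through a socket"
           && round_trip(file_w, file_r, "through a file") == "through a file";

    unlink(path);
    int all[6] = { p[0], p[1], s[0], s[1], file_w, file_r };
    for (int i = 0; i < 6; i++)
        close(all[i]);
    return ok;
}
```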

Sockets: A File Descriptor That Talks Over a Network

A socket is a file descriptor backed by a kernel networking structure rather than an inode. Creating one follows a specific sequence of syscalls:

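#include <sys/socket.h>   // socket(), bind(), listen(), accept()
#include <netinet/in.h>   // sockaddr_in, INADDR_ANY
#include <arpa/inet.h>    // htons(), htonl()
#include <cstring>        // std::memset()

// 1. Create the socket, get an FD back (every call below returns -1 on failure)
int server_fd = socket(AF_INET, SOCK_STREAM, 0);

// 2. Bind it to an address and port
struct sockaddr_in addr;
std::memset(&addr, 0, sizeof(addr));        // zero the padding (sin_zero)
addr.sin_family = AF_INET;
addr.sin_port = htons(6667);                // IRC port, in network byte order
addr.sin_addr.s_addr = htonl(INADDR_ANY);   // accept connections on any local interface
bind(server_fd, (struct sockaddr*)&addr, sizeof(addr));

// 3. Mark it as a listening socket
listen(server_fd, 128);                     // backlog: max pending connections

// 4. Accept an incoming connection, get a NEW FD for this client
int client_fd = accept(server_fd, NULL, NULL);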

Each step matters:

  • socket() allocates the kernel socket structure and gives you its FD.
  • bind() associates the socket with a local address and port. That's how the kernel knows which socket should receive connections arriving at port 6667.
  • listen() tells the kernel to start accepting TCP handshakes and queue them. The backlog limits how many unaccepted connections can pile up.
  • accept() dequeues the next completed connection and creates a new FD for it. The server socket (server_fd) stays open and keeps listening.

With 50 clients connected, you have 51 socket FDs: one listening socket for the server, plus one per client.
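The setup sequence skips error handling. A slightly more defensive version, as a sketch: make_listener() and bound_port() are hypothetical helper names, and SO_REUSEADDR is a common addition not in the snippet above; it lets a restarted server bind the port again right away instead of failing while old connections on it are still in TIME_WAIT.

```cpp
#include <cassert>
#include <cstring>        // std::memset()
#include <arpa/inet.h>    // htons(), htonl(), ntohs()
#include <netinet/in.h>   // sockaddr_in, INADDR_ANY, INADDR_LOOPBACK
#include <sys/socket.h>   // socket(), setsockopt(), bind(), listen(), getsockname()
#include <unistd.h>       // close()

// Create a TCP socket listening on `port` (0 = let the kernel pick one).
// Returns the listening FD, or -1 if any step fails.
int make_listener(unsigned short port, int backlog) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int yes = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes)) < 0) {
        close(fd);
        return -1;
    }

    struct sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_ANY);

    if (bind(fd, (struct sockaddr*)&addr, sizeof(addr)) < 0 || listen(fd, backlog) < 0) {
        close(fd);                          // don't leak the FD on failure
        return -1;
    }
    return fd;
}

// The port the socket is actually bound to, or -1 on error.
int bound_port(int fd) {
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    if (getsockname(fd, (struct sockaddr*)&addr, &len) < 0)
        return -1;
    return ntohs(addr.sin_port);
}
```

Passing 0 as the port asks the kernel for any free port, and bound_port() reads back which one it picked, which is handy in tests where 6667 might already be taken.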

The Blocking Problem

With your client FD in hand, reading data seems simple:

char buf[512];
ssize_t n = read(client_fd, buf, sizeof(buf));   // -1 on error, 0 at end of stream

But read() is blocking by default. If no data has arrived yet, read() does not return. It suspends your thread in the kernel until data arrives or the connection closes.

Fine for a single client. For multiple clients, it's a disaster. Your thread is parked inside read() waiting for client A. Client B could be sending data non-stop and you won't see any of it until client A sends something first. The kernel has data sitting in client B's receive buffer, but your process is sleeping inside read(fd_A, ...).

The obvious fix is threads: one thread per client, each blocking independently. This works up to a point, then falls apart. Thread creation is expensive, each thread reserves its own stack (8 MB of address space by default on a typical Linux system), and context switching between hundreds of threads has real overhead. For an IRC server that may hold thousands of mostly idle connections, one thread per client is a poor fit.

You need to invert the model. Instead of asking "give me data from this FD," you ask the kernel "tell me which FDs have data ready."

select() and Why It Wasn't Enough

The classic answer is select(), which dates back to 4.2BSD and is specified by POSIX:

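#include <sys/select.h>   // fd_set, FD_ZERO, FD_SET, FD_ISSET, select()

fd_set read_fds;
FD_ZERO(&read_fds);
FD_SET(server_fd, &read_fds);
FD_SET(client_fd, &read_fds);
int max_fd = (server_fd > client_fd) ? server_fd : client_fd;

select(max_fd + 1, &read_fds, NULL, NULL, NULL);   // first argument: highest FD + 1
// select() overwrites read_fds: now FD_ISSET(fd, &read_fds) is true only for ready FDs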

select() blocks until at least one FD is ready to read (a NULL timeout means wait forever), then leaves only the ready ones marked in the set. You call read() only on those, which normally won't block.
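To make the bookkeeping concrete, here is a sketch of one select() pass, assuming a hypothetical readable_now() helper that takes a list of FDs and uses a zero timeout so it never blocks. It shows the chores select() demands: the fd_set is filled in again before every call, because select() overwrites it with the result, and the ready FDs are found by testing each one with FD_ISSET.

```cpp
#include <cassert>
#include <sys/select.h>   // fd_set, FD_*, select(), struct timeval
#include <unistd.h>       // pipe(), write(), close()
#include <vector>

// Return the subset of fds that are readable right now.
std::vector<int> readable_now(const std::vector<int>& fds) {
    std::vector<int> ready;
    fd_set read_fds;
    FD_ZERO(&read_fds);                   // rebuilt from scratch on every call
    int max_fd = -1;
    for (size_t i = 0; i < fds.size(); i++) {
        if (fds[i] < 0 || fds[i] >= FD_SETSIZE)
            return ready;                 // select() cannot watch this FD at all
        FD_SET(fds[i], &read_fds);
        if (fds[i] > max_fd)
            max_fd = fds[i];
    }

    struct timeval timeout = { 0, 0 };   // zero timeout: check once, don't block
    if (select(max_fd + 1, &read_fds, NULL, NULL, &timeout) <= 0)
        return ready;                     // error, or nothing ready

    // select() only says how many FDs are ready, so test each one to find them
    for (size_t i = 0; i < fds.size(); i++)
        if (FD_ISSET(fds[i], &read_fds))
            ready.push_back(fds[i]);
    return ready;
}

// Two pipes, data written to the second only: exactly that read end is reported.
bool select_demo() {
    int a[2], b[2];
    if (pipe(a) != 0 || pipe(b) != 0)
        return false;
    std::vector<int> watch;
    watch.push_back(a[0]);
    watch.push_back(b[0]);

    bool quiet_before = readable_now(watch).empty();
    bool wrote = write(b[1], "x", 1) == 1;
    std::vector<int> ready = readable_now(watch);
    bool only_b = ready.size() == 1 && ready[0] == b[0];

    close(a[0]); close(a[1]); close(b[0]); close(b[1]);
    return quiet_before && wrote && only_b;
}
```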

The problems are real though:

  • fd_set is a fixed-size bitmap of FD_SETSIZE bits, typically 1024. You can't watch any FD numbered 1024 or higher, no matter how few FDs you watch in total.
  • select() overwrites the fd_set with its result, so every call requires rebuilding it from scratch.
  • The return value only tells you how many FDs are ready, not which ones. You have to iterate the entire set to find them.
  • The kernel receives the full set on every call and scans it internally, even if only one FD out of 1000 is active.

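poll() was designed to fix the first two. The last two remain: you still walk the whole array to find the ready entries, and the kernel still checks every entry on each call. That remaining cost is what epoll later addressed.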

poll(): The pollfd Struct

#include <poll.h>
 
int poll(struct pollfd *fds, nfds_t nfds, int timeout);

Instead of a fixed bitmap, poll() takes an array of pollfd structs:

struct pollfd {
    int   fd;       // the file descriptor to watch
    short events;   // events you're interested in (input)
    short revents;  // events that occurred (output, set by kernel)
};

The events and revents fields are bitmasks:

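Flag      Meaning
POLLIN    Data available to read (on a listening socket: a connection is waiting for accept())
POLLOUT   Socket ready to accept writes
POLLHUP   Hang-up: the connection has been closed
POLLERR   Error condition on the FD
POLLNVAL  FD is not open

POLLHUP, POLLERR, and POLLNVAL are output-only: poll() reports them in revents even if you never asked for them in events. Also note that on Linux, a TCP peer closing its end normally shows up as POLLIN, with recv() returning 0, rather than as POLLHUP.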
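Since each flag is its own bit, revents is inspected with &, and several flags can be set at once. A hypothetical logging helper, just for illustration, that names whichever of the flags from the table are set:

```cpp
#include <cassert>
#include <poll.h>     // POLLIN, POLLOUT, POLLHUP, POLLERR, POLLNVAL
#include <string>

// Turn a revents bitmask into readable text, e.g. "POLLIN|POLLHUP".
std::string describe_revents(short revents) {
    static const struct { short bit; const char* name; } flags[] = {
        { POLLIN, "POLLIN" }, { POLLOUT, "POLLOUT" }, { POLLHUP, "POLLHUP" },
        { POLLERR, "POLLERR" }, { POLLNVAL, "POLLNVAL" }
    };
    std::string out;
    for (size_t i = 0; i < sizeof(flags) / sizeof(flags[0]); i++) {
        if (revents & flags[i].bit) {     // test one bit at a time
            if (!out.empty())
                out += "|";
            out += flags[i].name;
        }
    }
    return out.empty() ? "none" : out;
}
```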

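You build an array of pollfd structs, one per FD, set the events you want, and call poll(). It blocks until at least one FD is ready or the timeout expires (-1 means wait forever), then returns the number of entries whose revents is non-zero (0 on timeout, -1 on error). Then you iterate the array checking revents on each entry.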

struct pollfd fds[MAX_CLIENTS];
 
// Server socket: we only care about incoming connections
fds[0].fd = server_fd;
fds[0].events = POLLIN;
 
// Client sockets: we want to know when they send data
fds[1].fd = client_fd_A;
fds[1].events = POLLIN;
 
int ready = poll(fds, 2, -1);   // -1 = block indefinitely
 
if (fds[0].revents & POLLIN) {
    // New client trying to connect, call accept()
}
if (fds[1].revents & POLLIN) {
    // Client A sent something, call recv()
}

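No more rebuilding a set, since events (your input) and revents (the kernel's output) are separate fields. No 1024 FD limit. The kernel marks exactly which entries are ready in revents, though you still walk the array to find them.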
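The same idea as a self-contained sketch, using socketpair() in place of real TCP clients so it runs without a network. watch_for_input() and ready_indices() are hypothetical helper names, and the zero timeout makes each poll() a quick check rather than a wait.

```cpp
#include <cassert>
#include <poll.h>         // poll(), struct pollfd, POLLIN
#include <sys/socket.h>   // socketpair(), recv()
#include <unistd.h>       // write(), close()
#include <vector>

// Wrap a list of FDs in pollfd entries that all ask for POLLIN.
std::vector<struct pollfd> watch_for_input(const std::vector<int>& fds) {
    std::vector<struct pollfd> pfds;
    for (size_t i = 0; i < fds.size(); i++) {
        struct pollfd p;
        p.fd = fds[i];
        p.events = POLLIN;
        p.revents = 0;
        pfds.push_back(p);
    }
    return pfds;
}

// One poll() pass with a zero timeout. poll() returns how many entries have
// non-zero revents, but not which ones, so walk the array to collect indices.
std::vector<size_t> ready_indices(std::vector<struct pollfd>& pfds) {
    std::vector<size_t> ready;
    if (pfds.empty())
        return ready;                      // &pfds[0] would be invalid
    int n = poll(&pfds[0], pfds.size(), 0);   // &pfds[0]: vector::data() is C++11
    for (size_t i = 0; n > 0 && i < pfds.size(); i++)
        if (pfds[i].revents != 0)
            ready.push_back(i);
    return ready;
}
```

After writing to one pair and closing the peer of the other, both entries come back with POLLIN set: the first because data is waiting, the second because end-of-file counts as readable, and recv() on it then returns 0.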

The Server Loop

A real IRC server's main loop looks like this:

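while (true) {
    // &_fds[0], not _fds.data(): vector::data() is C++11 and ft_irc is C++98
    int ready = poll(&_fds[0], _fds.size(), -1);
    if (ready < 0) {
        if (errno == EINTR)
            continue;                           // interrupted by a signal: poll again
        break;                                  // a real error
    }

    size_t i = 0;
    while (i < _fds.size()) {
        int   fd      = _fds[i].fd;             // copy out: _fds may change below
        short revents = _fds[i].revents;
        if (revents == 0) {
            i++;                                // nothing happened on this FD
            continue;
        }

        if (fd == _server_fd) {
            if (revents & POLLIN)
                _accept_client();               // new connection, appended at the end
        } else {
            if (revents & POLLIN)
                _receive_data(fd);              // may remove the client (EOF, error, QUIT)
            if ((revents & (POLLHUP | POLLERR | POLLNVAL)) && _clients.count(fd))
                _disconnect_client(fd);         // hung up or broken, not yet removed
        }

        // If this FD was removed, swap-and-pop moved the last entry into slot i,
        // so examine slot i again instead of skipping that entry.
        if (i < _fds.size() && _fds[i].fd == fd)
            i++;
    }
}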

That's the entire concurrency model. One thread. One loop. poll() does the waiting; the loop does the dispatching.

Managing a Dynamic pollfd[] Array

Every time a client connects, you add their FD to the array. Every time one disconnects, you remove it. This is the most error-prone part.

Adding is straightforward:

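void Server::_accept_client() {
    int client_fd = accept(_server_fd, NULL, NULL);
    if (client_fd < 0)
        return;                        // accept() failed: don't register a bogus FD

    struct pollfd pfd;
    pfd.fd = client_fd;
    pfd.events = POLLIN;               // wake up when this client sends data
    pfd.revents = 0;
    _fds.push_back(pfd);

    _clients[client_fd] = Client(client_fd);
}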

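Removing is trickier. Erasing from the middle of a vector shifts every later element down one slot, which costs O(n) and reshuffles the indices the poll loop is walking. The cheaper way: overwrite the element you want to remove with the last element, then pop the back. poll() doesn't care about the order of the array. This still moves a different, not-yet-handled entry into the removed slot, though, so a loop that removes entries while walking the array must re-examine that slot instead of advancing past it.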

void Server::_remove_client(int fd) {
    close(fd);
    _clients.erase(fd);
    
    for (size_t i = 0; i < _fds.size(); i++) {
        if (_fds[i].fd == fd) {
            _fds[i] = _fds.back();  // overwrite with last element
            _fds.pop_back();        // remove the now-duplicate last element
            break;
        }
    }
}
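Swap-and-pop interacts with the loop that walks _fds: once the entry at index i is removed, slot i holds the former last entry, which hasn't been handled yet. A loop that always advances i skips that entry for this pass, and if the removed entry was the last one, reading _fds[i] afterwards goes past the end of the vector. The sketch below checks the alternative (advance only if slot i still holds the FD just handled) with socket pairs standing in for accepted TCP clients. MiniLoop is a hypothetical, stripped-down stand-in with no listening socket and no IRC parsing, not the ft_irc Server class.

```cpp
#include <cassert>
#include <cerrno>
#include <map>
#include <string>
#include <vector>
#include <poll.h>         // poll(), struct pollfd, POLLIN
#include <sys/socket.h>   // recv(), socketpair()
#include <unistd.h>       // close()

class MiniLoop {
public:
    std::vector<struct pollfd> fds;
    std::map<int, std::string> received;   // fd -> bytes read so far

    void add(int fd) {
        struct pollfd p;
        p.fd = fd;
        p.events = POLLIN;
        p.revents = 0;
        fds.push_back(p);
        received[fd] = std::string();
    }

    // close + swap-and-pop, same shape as _remove_client()
    void remove(int fd) {
        close(fd);
        received.erase(fd);
        for (size_t i = 0; i < fds.size(); i++) {
            if (fds[i].fd == fd) {
                fds[i] = fds.back();
                fds.pop_back();
                break;
            }
        }
    }

    // One poll() pass with the given timeout; returns poll()'s result.
    int run_once(int timeout_ms) {
        if (fds.empty())
            return 0;
        int ready = poll(&fds[0], fds.size(), timeout_ms);
        if (ready <= 0)
            return ready;                      // timeout, or error

        size_t i = 0;
        while (i < fds.size()) {
            int fd = fds[i].fd;                // copy out: remove() rewrites slot i
            if (fds[i].revents == 0) {
                i++;
                continue;
            }
            char buf[512];
            ssize_t n = recv(fd, buf, sizeof(buf), 0);
            if (n > 0)
                received[fd].append(buf, n);
            else if (n == 0 || (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR))
                remove(fd);                    // EOF or real error

            // After a removal slot i holds the old last entry: look at it next.
            if (i < fds.size() && fds[i].fd == fd)
                i++;
        }
        return ready;
    }
};
```

For example, with three socket pairs where the first two peers close and the third sends "PING\r\n", a single run_once(0) leaves only the third FD in fds, with "PING\r\n" recorded in received.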

O_NONBLOCK and EAGAIN

Even with poll() telling you an FD is ready, there's one more thing to handle: O_NONBLOCK.

By default, read() blocks if no data is available. When poll() reports POLLIN, a read normally won't block, but there are edge cases (spurious readiness, or the data being consumed elsewhere between poll() returning and your read()) where it could. It's safer to set every client FD to non-blocking:

int flags = fcntl(fd, F_GETFL, 0);        // current file status flags (-1 on error)
fcntl(fd, F_SETFL, flags | O_NONBLOCK);   // add O_NONBLOCK, keep the other flags

With O_NONBLOCK, if read() or recv() finds no data actually available, instead of blocking it returns -1 and sets errno to EAGAIN (same value as EWOULDBLOCK on Linux). You just loop back to poll().

ssize_t n = recv(fd, buf, sizeof(buf), 0);
if (n < 0) {
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return;    // no data yet, back to poll()
    // real error
    _remove_client(fd);
} else if (n == 0) {
    // recv() returning 0 means the connection was closed cleanly
    _remove_client(fd);
} else {
    // n bytes of data in buf, process it
}
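Here are both pieces as a small sketch: set_nonblocking() wraps the two fcntl() calls, and try_recv() mirrors the three branches of the recv() snippet. The helper names and the RecvResult enum are illustrative, not from any library, and a socketpair() would stand in for a client connection to check the behavior without a network.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>        // fcntl(), F_GETFL, F_SETFL, O_NONBLOCK
#include <sys/socket.h>   // recv(), socketpair()
#include <unistd.h>       // write(), close()

// Add O_NONBLOCK to fd's status flags, keeping the flags already set.
bool set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return false;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}

// Illustrative names for the outcomes of one recv() call.
enum RecvResult { GOT_DATA, NO_DATA_YET, PEER_CLOSED, RECV_ERROR };

RecvResult try_recv(int fd, char* buf, size_t len, ssize_t* out_n) {
    ssize_t n = recv(fd, buf, len, 0);
    *out_n = n;
    if (n > 0)
        return GOT_DATA;                       // n bytes are in buf
    if (n == 0)
        return PEER_CLOSED;                    // orderly shutdown by the peer
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return NO_DATA_YET;                    // would have blocked: back to poll()
    return RECV_ERROR;                         // anything else is a real error
}
```

On a fresh non-blocking socket pair, try_recv() first returns NO_DATA_YET instead of hanging; after the peer writes "abc" it returns GOT_DATA with n == 3; after the peer closes, it returns PEER_CLOSED.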

Partial Reads and Message Framing

Here's the thing that took me longest to internalize: TCP is a stream protocol, not a message protocol. When you call recv(), you might get 7 bytes of a 15-byte message, or three messages concatenated. TCP guarantees ordering and delivery, not message boundaries.

The IRC protocol frames messages with \r\n (RFC 1459 caps each message at 512 bytes, terminator included). So recv() might return:

"JOIN #general\r\nPRIVMSG #general :hel"

Two messages: one complete, one partial. You need a per-client receive buffer that accumulates bytes until it sees \r\n, then extracts and processes complete messages:

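void Server::_receive_data(int fd) {
    char buf[512];
    ssize_t n = recv(fd, buf, sizeof(buf), 0);

    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return;                                  // nothing to read after all: back to poll()
    if (n <= 0) {
        _remove_client(fd);                      // 0 = peer closed, -1 = real error
        return;
    }

    _clients[fd].buffer.append(buf, n);          // append exactly n bytes; no '\0' needed

    size_t pos;
    while ((pos = _clients[fd].buffer.find("\r\n")) != std::string::npos) {
        std::string message = _clients[fd].buffer.substr(0, pos);
        _clients[fd].buffer.erase(0, pos + 2);
        _process_message(fd, message);           // handle one complete IRC command
        if (_clients.find(fd) == _clients.end())
            return;                              // command (e.g. QUIT) removed the client;
    }                                            // _clients[fd] would silently re-create it
}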

The protocol layer, parsing JOIN, PRIVMSG, KICK, managing channel state, sits entirely on top of this. The I/O layer just delivers complete, correctly-framed messages. That separation is what keeps the code sane as complexity grows.
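Because framing is pure string handling, it can also live in a free function that never touches a socket, which makes it easy to test on its own. A minimal sketch, assuming a hypothetical extract_messages() that takes the per-client buffer by reference and returns the complete messages it removed:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Pull every complete "\r\n"-terminated message out of a per-client buffer.
// Whatever follows the last "\r\n" (a partial message) stays in the buffer
// until more bytes arrive.
std::vector<std::string> extract_messages(std::string& buffer) {
    std::vector<std::string> messages;
    std::string::size_type pos;
    while ((pos = buffer.find("\r\n")) != std::string::npos) {
        messages.push_back(buffer.substr(0, pos));
        buffer.erase(0, pos + 2);          // drop the message and its "\r\n"
    }
    return messages;
}
```

A terminator split across two reads ("\r" now, "\n" later) is handled naturally: find() only matches the complete "\r\n", so the partial message simply waits in the buffer.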

Where This Left Me

The finished ft_irc server handles hundreds of simultaneous clients with a single thread and a single poll() loop. Full IRC command set, channels, operators, modes, private messages, a built-in bot. No threading, no forking.

Understanding file descriptors was the key: sockets, pipes, regular files, they're all the same abstraction. The kernel manages the actual I/O; your job is to tell it which FDs you care about and react when they're ready. poll() is just the mechanism for having that conversation efficiently.

Once you've built this by hand, reading about epoll or io_uring is just reading about faster versions of the same idea.