Threads, Mutexes, and the Anatomy of a Deadlock

The Dining Philosophers problem is 60 years old and still the clearest demonstration of deadlock. Implementing it in C with POSIX threads forces you to understand data races, mutex internals, the four Coffman conditions, and why asymmetry breaks circular wait.

2025-04-01

Threads, Mutexes, and the Anatomy of a Deadlock

The first time I ran my philosophers implementation with five philosophers, all five died immediately. Every philosopher picked up their left fork and waited. None of them ever picked up their right fork. The program didn't crash. It just sat there, all threads alive but none making progress.

That's deadlock, and the frustrating thing is it looked completely reasonable in the code.

Understanding why it happened, and how to fix it with one change, is a clean window into everything that matters about concurrent programming.

Threads Share Memory, and That's the Problem

A process has one address space. When you create multiple threads inside that process, they all share the same heap, the same global variables, and the same file descriptor table. Each thread gets its own stack, but that's it.

#include <pthread.h>
 
int shared_counter = 0;   // visible to all threads
 
void *increment(void *arg) {
    for (int i = 0; i < 1000000; i++)
        shared_counter++;      // all threads write here
    return NULL;
}
 
int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("%d\n", shared_counter);   // NOT 2000000
}

Run this and you get a number less than 2,000,000, different every run. The shared heap is what makes threads useful (communication without IPC overhead) and what makes them dangerous.

Data Races: Not Just a "Bad Luck" Bug

shared_counter++ looks like one operation. It is not. On x86, it compiles to three instructions:

mov eax, [shared_counter]    ; load
inc eax                      ; increment
mov [shared_counter], eax    ; store

Two threads can interleave these instructions:

Thread 1: load  -> eax = 5
Thread 2: load  -> eax = 5
Thread 1: inc   -> eax = 6
Thread 2: inc   -> eax = 6
Thread 1: store -> shared_counter = 6
Thread 2: store -> shared_counter = 6   <- one increment lost

Both threads incremented, but the counter only went from 5 to 6. One increment was lost.

This is a data race: two threads access the same memory location concurrently, at least one access is a write, and there's no synchronization between them. The C standard says a data race is undefined behavior. The program is not just incorrect, it's formally invalid. The compiler and CPU are allowed to reorder operations, cache values in registers, and make assumptions that break completely in the presence of concurrent writes.

The race doesn't require simultaneous access. It requires only that the load-modify-store sequence be non-atomic and unprotected.

`pthread_mutex_t` at the OS Level

A mutex prevents concurrent access to a shared resource by ensuring only one thread can hold it at a time.

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
 
void *safe_increment(void *arg) {
    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&lock);
        shared_counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

What is a pthread_mutex_t at the kernel level? On Linux, it's built on futex (Fast Userspace Mutex). A futex is a 32-bit integer in shared memory with two states: 0 (unlocked) and 1 (locked).

When you call pthread_mutex_lock():

The library attempts an atomic compare-and-swap (CAS) to change the futex from 0 to 1
If it succeeds, the lock is yours. No syscall needed.
If it fails (someone else holds it), the library calls futex(FUTEX_WAIT), a syscall that puts your thread to sleep in the kernel's wait queue for that futex address

When you call pthread_mutex_unlock():

The library sets the futex back to 0
If other threads are waiting, it calls futex(FUTEX_WAKE) to wake one

Locking an uncontended mutex is just a CAS instruction in userspace. No syscall, no kernel transition. That's why mutexes are fast when there's no contention.

The Four Coffman Conditions

Deadlock is not a random failure. It's a deterministic consequence of four conditions being simultaneously true:

1. Mutual exclusion: at least one resource can only be held by one thread at a time. Each fork can only be held by one philosopher.

2. Hold and wait: a thread holds at least one resource while waiting to acquire another. Each philosopher holds their left fork and waits for their right fork.

3. No preemption: resources can't be forcibly taken from a thread, they must be released voluntarily. You can't steal a fork from a philosopher.

4. Circular wait: there's a cycle in the wait graph. Philosopher 0 waits for philosopher 1's fork, who waits for philosopher 2's fork, ..., who waits for philosopher 0's fork.

All four must be true simultaneously for deadlock to be possible. Breaking any one of them prevents it. The Coffman conditions aren't just academic: they're the diagnostic framework for any real deadlock. Find which condition holds, eliminate it.

The Naive Implementation Guarantees Deadlock

The natural implementation: each philosopher locks their left fork, then locks their right fork.

void *philosopher(void *arg) {
    t_philo *philo = (t_philo *)arg;
    
    while (1) {
        think(philo);
        
        pthread_mutex_lock(philo->left_fork);    // pick up left
        pthread_mutex_lock(philo->right_fork);   // pick up right
        
        eat(philo);
        
        pthread_mutex_unlock(philo->right_fork);
        pthread_mutex_unlock(philo->left_fork);
    }
    return NULL;
}

If all philosophers execute pthread_mutex_lock(left_fork) before any of them reaches pthread_mutex_lock(right_fork), the wait graph looks like:

philo 0 holds fork 0, wants fork 1
philo 1 holds fork 1, wants fork 2
philo 2 holds fork 2, wants fork 3
philo 3 holds fork 3, wants fork 4
philo 4 holds fork 4, wants fork 0

Circular wait. All four Coffman conditions met. Deadlock. No philosopher ever eats.

Is this guaranteed to happen? Not on every run. The scheduler might give one philosopher both forks before the others get their left. But the deadlock state is reachable, and on any real system running this program long enough, you'll hit it.

The Asymmetric Fix

Deadlock requires circular wait. Circular wait requires a cycle in the dependency graph. Break the cycle, break deadlock.

The simplest fix: make one philosopher pick up forks in the opposite order.

void *philosopher(void *arg) {
    t_philo *philo = (t_philo *)arg;
    
    while (1) {
        think(philo);
        
        if (philo->id % 2 == 0) {
            pthread_mutex_lock(philo->left_fork);
            pthread_mutex_lock(philo->right_fork);
        } else {
            pthread_mutex_lock(philo->right_fork);  // odd: right first
            pthread_mutex_lock(philo->left_fork);
        }
        
        eat(philo);
        
        pthread_mutex_unlock(philo->right_fork);
        pthread_mutex_unlock(philo->left_fork);
    }
    return NULL;
}

Now consider the worst case: all philosophers reach their first mutex_lock simultaneously. Even-numbered philosophers compete for even-numbered forks first. Odd-numbered philosophers compete for odd-numbered forks first. At least one philosopher will successfully acquire both forks. The chain of dependency no longer wraps around. Circular wait is impossible.

No additional locks, no sleeping, no condition variables. One ordering rule, zero deadlock.

Livelock and Starvation: The Other Failure Modes

Deadlock-free doesn't mean correct.

Livelock: threads are active but making no progress. If every philosopher picks up their left fork, finds the right fork unavailable, puts down their left fork, and repeats, they're all running, all changing state, but no one eats. Livelock usually comes from retry-without-backoff logic.

// Livelock-prone pattern
while (pthread_mutex_trylock(right_fork) != 0) {
    pthread_mutex_unlock(left_fork);
    usleep(1);   // too short, everyone retries in sync
    pthread_mutex_lock(left_fork);
}

Starvation: the system makes progress globally, but one thread is perpetually denied a resource. Even with the asymmetric fix, a fast-thinking philosopher might repeatedly grab both forks before their neighbor gets a chance.

Preventing starvation is harder than preventing deadlock. It typically requires tracking how long each philosopher has been waiting, and prioritizing the one who's been waiting longest.

The philosophers project adds a death condition: if a philosopher doesn't eat within time_to_die milliseconds, they die. That makes starvation prevention non-optional.

Building the Monitor Thread

A dedicated monitor thread periodically checks all philosophers for death:

void *monitor(void *arg) {
    t_table *table = (t_table *)arg;
    
    while (1) {
        for (int i = 0; i < table->num_philos; i++) {
            pthread_mutex_lock(&table->philos[i].meal_lock);
            long time_since_last_meal = get_time_ms() - table->philos[i].last_meal_time;
            pthread_mutex_unlock(&table->philos[i].meal_lock);
            
            if (time_since_last_meal > table->time_to_die) {
                announce_death(table, i);
                return NULL;
            }
        }
        usleep(1000);   // check every 1ms
    }
}

The monitor is a separate pthread_create() that runs concurrently. It reads last_meal_time from each philosopher struct, which is also written by the philosopher threads. Every access to that field needs a mutex.

Timing: Why `usleep()` Lies

The project requires detecting death within 10ms. Sounds easy: check every 1ms. But usleep() doesn't guarantee precision:

usleep(1000);   // sleep for 1000 microseconds (1ms)
// Actual sleep: anywhere from 1ms to ~15ms depending on scheduler

usleep() asks the kernel to wake the thread after at least N microseconds. On Linux with the default CFS scheduler, you often get plus or minus 10ms of variance. On a loaded system, much more.

Use clock_gettime(CLOCK_MONOTONIC) instead: nanosecond resolution, monotonic, doesn't jump when the system clock is adjusted. Structure your death check as a comparison against elapsed time, not a countdown timer.

long get_time_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000L + ts.tv_nsec / 1000000L;
}

Data Races on Shared State

Every field a philosopher thread writes and the monitor thread reads is shared state:

last_meal_time (written on each meal, read by monitor)
meal_count (written by philosopher, read by monitor for stopping condition)
is_dead / simulation_running (written by monitor, read by all philosophers)

Each of these needs mutex protection. Not only because reads and writes aren't atomic in the load/store sense (on x86, 64-bit aligned reads actually are atomic at the hardware level), but because:

The compiler can cache values in registers across iterations, never reloading from memory
The CPU can reorder stores and loads across thread boundaries
The C memory model says concurrent unsynchronized access to the same location is undefined behavior. The compiler is allowed to assume it doesn't happen and optimize accordingly.

A mutex doesn't just serialize access. It also acts as a memory barrier. pthread_mutex_lock() prevents the CPU and compiler from reordering loads and stores across the lock boundary. Without it, the monitor might read a stale cached value of last_meal_time that was updated by the philosopher thread but not yet visible to the monitor's CPU core.

volatile is not a substitute. volatile prevents the compiler from caching a value, but it doesn't prevent CPU reordering, and it doesn't prevent the compiler from reordering other operations around the volatile access. Only proper synchronization, mutexes or atomics with appropriate memory ordering, is correct.

What It Actually Taught Me

The philosophers problem looks simple. It's a dinner table with forks. It's not simple. It's a precise model of every resource contention problem in concurrent systems: databases acquiring row locks, kernels acquiring spinlocks, applications acquiring semaphores. The failure modes are the same ones you find in every concurrent system at every scale.

The asymmetric fix is satisfying because it's minimal. One ordering rule, derived from first principles: break circular wait. No additional synchronization needed. Once you understand why it works, you can apply the same reasoning to any resource ordering problem you run into.

Threads, Mutexes, and the Anatomy of a Deadlock

Threads Share Memory, and That's the Problem

Data Races: Not Just a "Bad Luck" Bug

pthread_mutex_t at the OS Level

The Four Coffman Conditions

The Naive Implementation Guarantees Deadlock

The Asymmetric Fix

Livelock and Starvation: The Other Failure Modes

Building the Monitor Thread

Timing: Why usleep() Lies

Data Races on Shared State

What It Actually Taught Me

`pthread_mutex_t` at the OS Level

Timing: Why `usleep()` Lies