Tech Notes

1. Overview

Linux exposes processes, threads, and coroutines as different points on the same sharing spectrum: a process is isolated by page tables, a thread is a kernel-scheduled task sharing the process resources, and a coroutine is a user-space control-flow object inside one thread. The kernel center of gravity is copy_process(), which creates a task_struct and decides what to share based on clone flags.

Process vs Thread vs Coroutine

A process buys fault isolation and independent VM; a pthread buys cheap sharing and kernel preemption; a coroutine buys extremely cheap cooperative switching but depends on the current thread to yield. The model below shows what is actually separate.

Model	Scheduler	Isolation	Typical Switch Cost	Memory Cost
`process`	Kernel scheduler	Separate VM, FD table copied unless shared explicitly	Microseconds; may include TLB effects	Page tables, VMAs, copied metadata, private stacks
`thread`	Kernel scheduler	Shares VM, heap, most process resources	Usually lower than process switch	One user stack, TLS, and kernel task per thread
`coroutine`	User-space event loop or runtime	No memory isolation inside the process	Tens to hundreds of nanoseconds in simple runtimes	Saved registers plus stack or frame state

2. Key Data Structures

The main structures are not just C structs: they are ownership boundaries. task_struct is the schedulable entity, mm_struct owns the address space, files_struct owns descriptor slots, and ucontext_t is a user-space snapshot for coroutine switching.

Structure	Important Fields	Purpose	Shared By
`task_struct`	`pid`, `tgid`, `mm`, `files`, `signal`	One kernel scheduling object per process thread.	Never shared; each thread has its own.
`mm_struct`	`mmap`, `pgd`, `mm_users`	Address space, VMA list/tree, and top page-table pointer.	Threads share it with `CLONE_VM`.
`files_struct`	`fdt`, `close_on_exec`, `open_fds`	Maps integer FDs to open file descriptions.	Pthreads share it with `CLONE_FILES`; fork copies it.
`vm_area_struct`	`vm_start`, `vm_end`, `vm_flags`, `vm_file`	One mapped region: text, heap, stack, mmap file, shared memory.	Referenced through `mm_struct`.
`ucontext_t`	saved registers, stack pointer, signal mask, successor context	User-space coroutine context; no kernel task is created.	Owned by the coroutine runtime.

3. Core Mechanism

fork and Copy-On-Write

Background: A shell often forks only to immediately exec a new program. Copying every page of a large parent would be wasteful, so Linux copies metadata and shares physical pages until one side writes.

The COW state is visible in the page tables: both parent and child map the same physical page with write permission removed, even though the VMA still says the mapping is logically writable.

Plan: create a child task, duplicate resource tables, mark writable PTEs read-only, and let the page-fault handler allocate a private page only if a write occurs.

Step	Parent State	Child State	Kernel Reason
Before fork	Writable PTE points at page A	Does not exist	Normal private anonymous memory.
After fork	Read-only PTE points at page A	Read-only PTE points at page A	Sharing makes fork proportional to metadata, not resident memory.
Child writes	Still maps page A	Faults, gets copied page B	Private semantics are restored lazily.
After fault	Sees original bytes	Sees modified bytes	Each process now has the version it should observe.

The fork-plus-threads trap

In a multi-threaded process, only the calling thread exists in the child after fork(). Any lock held by a vanished sibling may remain permanently locked in the child process copied memory.

fork, vfork, clone, and exec

fork(), vfork(), and clone() differ mostly in sharing policy. execve() is different: it keeps the task identity but replaces the user-space image.

Call	Address Space	FD Table	Parent Behavior	Use Case
`fork`	Copied metadata, COW physical pages	Copied slots pointing at same open file descriptions	Runs immediately	General process creation.
`vfork`	Temporarily shared with parent	Shared until exec or exit	Suspended	Immediate exec in constrained environments.
`clone thread`	Shared via `CLONE_VM`	Shared via `CLONE_FILES`	Runs immediately	POSIX thread implementation.
`clone namespace`	Usually separate like fork	Policy-dependent	Runs immediately	Container bootstrap.

A typical shell path is fork, adjust descriptors in the child, then exec. The close-on-exec bit is what prevents helper pipes, secrets, and listening sockets from leaking into the new image.

The descriptor table itself survives fork, but execve() consults the close_on_exec bitmap and closes marked descriptors before the new program starts.

Wrapper	Arguments	PATH Search	Environment
`execl`	List	No	Inherit
`execlp`	List	Yes	Inherit
`execle`	List	No	Explicit
`execv`	Vector	No	Inherit
`execvp`	Vector	Yes	Inherit
`execve`	Vector	No	Explicit; actual syscall

4. Minimal C Demo

Process vs Thread vs Coroutine Switch Cost — C Demo

stdin (optional)

fork + pthread_atfork Lock Repair — C Demo

stdin (optional)

clone Namespace Skeleton — C Demo

stdin (optional)

Mini Shell: pipe + fork + execve — C Demo

stdin (optional)

5. Kernel Source Pointers

kernel/fork.c: kernel_clone, copy_process, copy_mm, copy_files.
mm/memory.c: do_cow_fault, wp_page_copy.
fs/exec.c: do_execveat_common, begin_new_exec.
fs/file.c: descriptor table growth, close_on_exec bitmap handling.
arch/x86/entry/syscalls/syscall_64.tbl: syscall numbers for clone, fork, vfork, execve.

6. Interview Prep

Question	Concise Answer
Why is `fork()` fast even for a 20 GB process?	Linux copies page-table metadata and marks writable pages COW. Physical pages are copied only on first write.
What is shared after `fork()`?	FD entries point to the same open file descriptions, so file offsets and status flags are shared; address-space pages are COW, not independently copied.
Why is `fork()` dangerous in a multi-threaded program?	The child keeps only the calling thread but inherits memory containing locks held by vanished threads. Before exec, call only async-signal-safe functions or use atfork carefully.
What makes a pthread a thread rather than a process?	NPTL uses clone flags such as `CLONE_VM`, `CLONE_FILES`, `CLONE_SIGHAND`, and `CLONE_THREAD`.
Why use `O_CLOEXEC` instead of open then `fcntl`?	It is atomic. In a multi-threaded process, another thread could fork and exec between open and fcntl, leaking the FD.

7. §18.5-18.9 Process Lifecycle, Daemons, Job Control, Limits, and Privilege

After creation and exec, the operating-system story becomes lifecycle management: tasks move between runnable, sleeping, stopped, and zombie states; parents reap children; sessions and process groups control terminal signals; limits constrain damage; and UID/capability state decides which privileged operations are allowed.

8. Key Data Structures

Process state is a compact kernel-visible state machine. A zombie is not a running program; most resources are already gone, but the kernel keeps enough status for the parent to collect a reliable result.

Structure / API	Important Fields	Purpose	Interview Detail
`task_struct`	`state`, `exit_state`, `exit_code`, `real_parent`	Tracks scheduling state, parentage, and exit status until reaped.	Zombie state retains PID and status, not the full address space.
`signal_struct`	`pids`, `rlim`, `tty`, `flags`	Thread-group-wide signal, limit, and session-related state.	Limits such as `RLIMIT_NOFILE` live with signal-wide process state.
`struct rlimit`	`rlim_cur`, `rlim_max`	Soft and hard resource ceilings inherited across fork and exec.	A non-privileged process can lower hard limits but cannot raise them back.
`cred`	`uid`, `euid`, `suid`, `fsuid`, capability sets	Immutable credential snapshot referenced by permission checks.	Effective UID and effective capabilities are what most checks consult.

Terminal job control is a hierarchy: a session owns the controlling terminal, the terminal designates one foreground process group, and terminal-generated signals are delivered to that foreground group.

Resource limits are inherited defaults with two ceilings. The soft limit is enforced now; the hard limit bounds how far an unprivileged process can raise the soft limit later.

9. Core Mechanism

Background

A parent starts a worker, the worker exits with status 42, and the parent is busy doing other work. The kernel cannot throw the status away because the parent may later call waitpid(), so the child becomes a zombie until reaped.

Plan

On exit, release expensive resources such as VM mappings, open files, and userspace memory.
Retain a small process record containing PID, parent link, signal state, and exit status.
Notify the parent with SIGCHLD unless the disposition says otherwise.
When the parent calls waitpid or waitid, copy the status out and free the final task record.

Walkthrough

Initial state: parent PID 900 forks child PID 901. PID 901 exits with _exit(42); ps shows state Z because PID 900 has not reaped it. When PID 900 runs waitpid(901, &status, 0), the kernel returns 901, WEXITSTATUS(status) is 42, and PID 901 disappears from the process table.

Daemonization

A classic daemon detaches from the launching terminal by forking, creating a new session, forking again, resetting process environment details, and redirecting standard descriptors. The second fork matters because a session leader can acquire a controlling terminal; the grandchild is not a session leader.

Privilege Transition

Privilege is not just UID 0. Linux combines real/effective/saved UIDs, file ownership transitions, and capability sets. Modern services should acquire the one privileged resource they need, drop IDs and capabilities, and often set PR_SET_NO_NEW_PRIVS before processing untrusted input.

10. Minimal C Demo

These demos keep the mechanisms observable: a zombie appears before waitpid, a reusable daemon function performs the double fork, process group/session IDs change after setsid, descriptor limits produce EMFILE, and a privileged process permanently drops to nobody.

Zombie Until waitpid Reaps — C Demo

stdin (optional)

Reusable daemonize Function — C Demo

stdin (optional)

Process Group and Session IDs — C Demo

stdin (optional)

RLIMIT_NOFILE Produces EMFILE — C Demo

stdin (optional)

Drop Privilege to nobody — C Demo

stdin (optional)

11. Kernel Source Pointers

Area	Files / symbols	What to inspect
Exit and reaping	`kernel/exit.c`, `do_exit`, `do_wait`, `release_task`	How resources are released first and the final zombie record is released only after wait.
Signals	`kernel/signal.c`, `do_notify_parent`, `SIGCHLD`	How child exit notification is queued and delivered to the parent.
Sessions and groups	`kernel/sys.c`, `setsid`, `setpgid`, `drivers/tty/tty_jobctrl.c`	How sessions, process groups, and controlling tty foreground groups are maintained.
Resource limits	`kernel/sys.c`, `do_prlimit`, `include/uapi/linux/resource.h`	Where soft/hard rlimits are checked and updated.
Credentials and capabilities	`kernel/cred.c`, `kernel/capability.c`, `security/commoncap.c`	How immutable credentials and capability checks implement least privilege.

12. Interview Prep

Question	Concise Answer
What is a zombie process?	A child that exited but has not been reaped. The address space and files are gone; PID and exit status remain for `wait`.
What is an orphan process?	A process whose parent exited first. Linux reparents it to PID 1 or the nearest subreaper, which later reaps it.
Why does daemonization fork twice?	The first fork lets the child call `setsid`. The second fork ensures the final daemon is not a session leader and cannot reacquire a controlling terminal.
How do job-control signals reach a pipeline?	The terminal tracks one foreground process group. Ctrl-C and Ctrl-Z are delivered to every member of that foreground group.
What is the difference between effective UID and capabilities?	Effective UID is the traditional identity used by many permission checks. Capabilities split root privileges into smaller bits such as `CAP_NET_ADMIN` or `CAP_SYS_PTRACE`.

13. §18.10-18.11 pthread Lifecycle and Mutexes

POSIX threads are user-space API objects backed by Linux kernel tasks in the NPTL 1:1 model. Creating a thread is mostly libc preparing a stack, TLS, and a start trampoline, then asking the kernel to create a schedulable task that shares the process address space and file table.

14. Key Data Structures

A pthread is a small set of coordinated objects: a join handle, a user stack with a guard page, a TCB/TLS area, kernel task state, and optional synchronization words that stay in user memory until contention forces a futex syscall.

Structure / API	Important Fields	Purpose	Interview Detail
`pthread_t`	Opaque handle, often points at the TCB in NPTL	Identifies a joinable or detached pthread to libc APIs.	Do not compare with `==`; use `pthread_equal`.
`pthread_attr_t`	stack size, stack address, guard size, scheduling policy, detach state	Controls how `pthread_create` builds the thread.	Large default stacks can cap practical thread count before CPU does.
Thread stack	guard page, mapped stack region, current frames	Private call frames for one thread in the shared address space.	Overflow corrupts adjacent memory unless the guard page catches it first.
TCB / TLS	self pointer, errno slot, thread-local storage vector	Stores per-thread libc and language runtime state.	`errno` is thread-local, not one global integer.
`pthread_mutex_t`	lock word, owner, waiter bit, kind/protocol fields	Mutual exclusion object implemented with atomic fast path plus futex slow path.	Uncontended lock/unlock usually stays entirely in user space.

The mutex word has a deliberately small state machine. The common case is a compare-exchange from unlocked to locked; the kernel only sees the operation when a thread has to sleep or wake a waiter.

Mutex Type / Attribute	Behavior	Use Case	Pitfall
`NORMAL`	Default fast mutex; recursive lock by same thread deadlocks.	Most production code with clear ownership.	Undefined behavior on wrong-thread unlock.
`RECURSIVE`	Same owner may lock multiple times; unlock decrements count.	Legacy callback-heavy code where re-entry is unavoidable.	Often hides poor lock design and lengthens critical sections.
`ERRORCHECK`	Recursive lock returns `EDEADLK`; bad unlock returns an error.	Debug builds and teaching.	Slower; not a general deadlock detector.
`ROBUST`	Next locker gets `EOWNERDEAD` if the owner died.	Shared memory state that can be recovered after process death.	Caller must repair state and call `pthread_mutex_consistent`.
`PRIO_INHERIT`	Kernel boosts owner priority when a higher-priority waiter blocks.	Realtime threads using `SCHED_FIFO` or `SCHED_RR`.	Requires PI futex slow path; not needed for ordinary CFS code.
`PROCESS_SHARED`	Mutex may synchronize threads in different processes through shared memory.	`mmap MAP_SHARED` or POSIX shm regions.	The object must actually live in shared memory, not private heap.

15. Core Mechanism

Background

A server creates eight worker threads to share an in-memory cache. Thread creation needs independent stacks and TLS, while the cache update path needs mutual exclusion without paying a syscall on every uncontended lock.

Plan

Initialize attributes, optionally reducing stack size or choosing detached state.
Call pthread_create; libc allocates stack/TCB/TLS and uses clone sharing flags.
Keep joinable threads until pthread_join collects the return value and releases resources, or detach them when no result is needed.
Protect shared state with a mutex; the uncontended path is atomic user-space code, while contended waiters sleep through futex.

Walkthrough

Initial state: counter=0, four threads each run 100,000 increments. Without a mutex, two threads can read 17, both compute 18, and both store 18, losing one increment. With pthread_mutex_lock, one thread changes the mutex word from 0 to 1, updates the counter, unlocks, and only then can another thread enter; the final protected counter is exactly 400,000.

Robust Owner Death

Robust mutexes are for recoverable shared state, especially across processes. If an owner dies in the critical section, the next locker is told that protected data may be inconsistent rather than silently continuing.

16. Minimal C Demo

The first demo shows pthread creation attributes, kernel TIDs, return values, and joins. The second demo contrasts a racing increment with the same increment protected by a normal mutex.

pthread_create, pthread_attr, and join — C Demo

stdin (optional)

Race Counter vs Mutex Counter — C Demo

stdin (optional)

17. Kernel Source Pointers

Area	Files / symbols	What to inspect
Thread creation	`kernel/fork.c`, `kernel_clone`, `copy_process`	How clone flags decide shared `mm_struct`, files, sighand, and thread group membership.
TLS setup	`arch/x86/kernel/process_64.c`, `copy_thread_tls`	Where the new thread stack, register state, and TLS pointer are prepared.
Futex wait/wake	`kernel/futex/core.c`, `futex_wait`, `futex_wake`	How contended pthread mutexes sleep and wake through a kernel hash table keyed by user address.
Priority inheritance	`kernel/futex/pi.c`, `FUTEX_LOCK_PI`, `rt_mutex`	How PI futexes connect pthread priority inheritance to kernel realtime mutex logic.
Robust lists	`kernel/futex/core.c`, `exit_robust_list`, `set_robust_list`	How the kernel marks owner-dead mutexes when a task exits while holding them.

18. Interview Prep

Question	Concise Answer
What does `pthread_create` do on Linux?	NPTL prepares stack and TLS in libc, then uses clone with flags such as `CLONE_VM`, `CLONE_FILES`, `CLONE_SIGHAND`, `CLONE_THREAD`, and `CLONE_SETTLS`.
Joinable vs detached thread?	A joinable thread retains termination resources and a return value until joined. A detached thread releases resources automatically and cannot be joined.
Why can default pthread stacks limit scalability?	Each thread reserves a stack mapping, often megabytes plus a guard page. Thousands of threads can exhaust virtual memory or address-space policy before CPU is saturated.
How does a pthread mutex use futex?	The fast path is an atomic user-space transition. On contention, libc marks waiters and calls futex wait; unlock calls futex wake when needed.
When do you use robust or process-shared mutexes?	Use robust when owner death must be detected and repaired. Use process-shared only when the mutex object lives in shared memory and coordinates multiple processes.

19. §18.12-18.17 Sync Primitives Beyond Mutexes

Mutexes protect exclusive critical sections, but real pthread programs also need to wait for predicates, admit many readers, synchronize phases, survive cancellation, initialize shared state once, store per-thread data, and handle process-directed signals without running unsafe code in an async handler.

20. Key Data Structures

A condition variable is not the condition itself. The condition is the protected predicate such as count > 0; the condition variable is only the wait queue used when the predicate is false.

Primitive	Important State	Purpose	Rule Interviewers Care About
`pthread_cond_t`	wait sequence, wake sequence, futex word, associated mutex protocol	Sleep until a mutex-protected predicate may have changed.	Always wait in `while (!predicate)`, never `if`.
`pthread_rwlock_t`	reader count, writer ownership, reader/writer wait queues	Allow many readers or one writer.	Reader preference can starve writers; recursive read locking can deadlock with pending writers.
`pthread_spinlock_t`	one atomic lock word	Busy-wait for extremely short critical sections.	Bad around syscalls, blocking I/O, or any critical section longer than a context switch.
`pthread_barrier_t`	required count, arrived count, generation number	Release a group only after N participants reach the same phase.	Reusable barriers need a generation counter so old waiters do not mix with the next round.
cleanup stack	LIFO cleanup handlers pushed by `pthread_cleanup_push`	Run unlock/free handlers when a thread is canceled or exits.	Deferred cancellation is manageable; async cancellation is almost never safe in C.
`pthread_key_t`	global key index plus per-thread slot values and destructor	Dynamic thread-specific data with cleanup on thread exit.	Different threads see different values for the same key.

A reader-writer lock is useful when protected data is read often and mutated rarely. The state must distinguish active readers, an active writer, and waiters whose preference policy decides who enters next.

Spinlocks and barriers solve opposite timing problems: a spinlock avoids sleeping because the hold time should be tiny, while a barrier deliberately waits until every participant has reached the same phase boundary.

Thread-specific data is a two-level lookup. The key is process-wide, but each thread has its own slot value, and a destructor receives that thread's non-null value when the thread exits.

21. Core Mechanism

Background

A bounded queue starts empty. A consumer must sleep until a producer inserts an item, but the sleep must not miss the producer's wakeup and must not consume from an empty buffer after a spurious wake.

Plan

Protect queue state with one mutex: head, tail, and count.
Consumers use while (count == 0) pthread_cond_wait(&not_empty, &mutex).
Producers use while (count == CAP) pthread_cond_wait(&not_full, &mutex).
After changing the predicate, signal the opposite side before unlocking or while still holding enough ordering through the mutex.

Walkthrough

Initial state: count=0, consumer C holds the mutex and checks the predicate. C calls pthread_cond_wait, which atomically places C on the wait queue and unlocks the mutex. Producer P locks, writes item 7, changes count to 1, signals not_empty, and unlocks. C wakes, reacquires the mutex before returning, checks count > 0 again, removes item 7, signals not_full, and unlocks. The mutex creates the happens-before edge; the loop creates correctness under spurious or competing wakes.

Cancellation and Cleanup

Cancellation is only sane when resources are paired with cleanup handlers. The common pattern is lock, push an unlock handler, call cancellation points, then pop the handler when the protected region is finished.

Multi-Threaded Signals

Signal handlers are process-wide, but each thread has its own signal mask. Production pthread programs usually block signals in all workers and dedicate one thread to sigwait(), so shutdown code can use normal mutexes and condition variables instead of async-signal-safe handler rules.

22. Minimal C Demo

These demos focus on the failure-resistant patterns: predicate loops with two condition variables, phase synchronization with a barrier, cleanup handlers around cancellation points, exactly-once initialization with TSD destructors, and synchronous signal handling through sigwait.

Bounded Buffer with Two Condition Variables — C Demo

stdin (optional)

RWLock Readers with Barrier Phases — C Demo

stdin (optional)

Cancellation Cleanup Unlocks Mutex — C Demo

stdin (optional)

pthread_once and Thread-Specific Data — C Demo

stdin (optional)

Dedicated sigwait Signal Thread — C Demo

stdin (optional)

23. Kernel Source Pointers

Area	Files / symbols	What to inspect
Futex-backed waits	`kernel/futex/core.c`, `futex_wait`, `futex_wake`, `futex_requeue`	How pthread condition variables and locks sleep on user-space words only after libc decides the fast path cannot proceed.
Signal delivery	`kernel/signal.c`, `complete_signal`, `get_signal`, `do_sigtimedwait`	How Linux chooses an eligible thread and how synchronous waiting consumes pending signals.
Thread masks	`kernel/signal.c`, `set_current_blocked`, `rt_sigprocmask`	Where per-thread blocked signal masks are updated.
Scheduling impact	`kernel/sched/core.c`, `schedule`, `try_to_wake_up`	What happens after a futex wait parks a thread and a wake makes it runnable again.

24. Interview Prep

Question	Concise Answer
Why must `pthread_cond_wait` be in a while loop?	Because wakeups can be spurious and another waiter may consume the state first. The predicate, not the wakeup, decides whether progress is legal.
What is a lost wakeup?	A signal happens before the waiter is actually waiting and the predicate is not protected consistently, so the waiter sleeps even though work is available.
When is an rwlock better than a mutex?	When reads dominate, critical sections are large enough to justify rwlock overhead, and writer starvation policy is acceptable or configured.
When should you use a pthread spinlock?	Only for very short non-blocking critical sections where sleeping would cost more than waiting. Never hold it across I/O or calls that may block.
How should a multi-threaded program handle SIGINT or SIGTERM?	Block those signals in all threads, create one signal thread that uses `sigwait` or `signalfd`, then notify workers through normal synchronization.

25. Overview

Concurrency patterns are reusable shapes for moving work through shared state without letting memory, latency, or failure domains grow without control. The practical Linux building blocks are bounded queues, pthread synchronization, process workers, event loops, and a clear choice of where isolation should live.

26. Key Data Structures

A bounded producer-consumer queue is a ring plus two predicates. not_full is true when producers may insert, and not_empty is true when consumers may remove; both predicates are guarded by the same mutex.

The single-producer single-consumer ring removes the mutex by giving each side exclusive ownership of one index. Release/acquire ordering publishes the slot contents before the consumer observes the advanced tail.

Structure	Fields	Purpose	Failure Mode
bounded ring queue	`buf`, `head`, `tail`, `count`, mutex, two condvars	Move finite work between producers and consumers with backpressure.	Using `if` instead of `while` consumes from empty or writes into full under competing wakeups.
SPSC ring	atomic `head`, atomic `tail`, power-of-two slots	Fast one-producer one-consumer handoff.	Extending it to MPMC without CAS claim/publish corrupts slots.
thread pool	worker array, bounded queue, shutdown flag, join path	Bound CPU parallelism and thread stack memory.	Unbounded queues hide overload until memory or tail latency fails.
prefork worker pool	master process, inherited listen FD, worker PIDs	Isolate faults and avoid thread-shared memory bugs.	High memory footprint and heavier context switches than threads.
epoll worker loop	nonblocking FDs, epoll interest list, event batch	Scale many mostly-idle connections with few threads.	One blocking call can stall the whole event loop thread.

Reader-writer patterns choose different costs for the same read-mostly problem: rwlock blocks writers until readers drain, seqlock makes readers retry, and RCU lets old readers finish before freeing replaced data.

Userspace RCU separates pointer publication from reclamation. Readers mark a read-side region, writers publish a replacement, and the old object is freed only after every pre-existing reader has passed through a quiescent state.

A fixed thread pool is a bounded queue plus sleeping workers. Shutdown is just another state transition: set the quit flag, broadcast the waiters, and join every worker.

27. Core Mechanism

Background

A service receives bursts of requests, but only has enough CPU and memory for a fixed number of workers and queued jobs. The concurrency design must absorb small bursts, apply backpressure during overload, and shut down without losing ownership of queued work.

Plan

Use a bounded queue so submitters block or fail fast when the system is overloaded.
Protect queue state with one mutex and use not_empty and not_full predicates.
Workers sleep while the queue is empty and the shutdown flag is false.
Shutdown sets the flag, broadcasts workers, drains or rejects new submissions, then joins every thread.

Walkthrough

Initial state: queue capacity is 8, count=0, and three workers are sleeping on not_empty. A producer submits jobs 1 through 8, advancing tail and signaling workers. Job 9 blocks because count == CAP. Worker 0 pops job 1, decrements count, signals not_full, and executes outside the mutex, so job 9 can be queued while CPU work continues. During shutdown, the main thread sets quit=1 and broadcasts; a worker exits only when the queue is empty and quit is set, so already accepted work is not abandoned.

Work Stealing

Work stealing reduces contention by giving each worker a private deque. The owner pushes and pops recent tasks from the head for cache locality, while idle workers steal older tasks from the tail to rebalance load.

Server Models

Apache prefork spends memory to isolate each request handler in a process. Nginx spends engineering complexity on event-driven workers that epoll shared listening sockets and keep many idle connections without one stack per connection.

In the Nginx-style model, the master supervises workers while the kernel owns readiness notification and accept-queue behavior. SO_REUSEPORT can shard incoming connections across worker sockets.

A master can also accept centrally and pass connected sockets to workers over a Unix socket with SCM_RIGHTS. The receiving worker gets a valid duplicate FD in its own descriptor table.

Multi-threaded servers choose between one thread per connection, a bounded worker pool around epoll, M:N runtimes, and io_uring variants. The common rule is to keep blocking work away from the event loop.

An epoll worker repeatedly sleeps for readiness, accepts or reads nonblocking FDs, writes responses, and re-arms interest when edge-triggered or one-shot modes are used.

Model Comparison

Concurrency models are trade-offs between fault isolation, switch cost, memory footprint, and reasoning cost. Processes isolate failures; threads share memory cheaply but make data races possible; coroutines and event loops are fast but require cooperative nonblocking discipline.

28. Minimal C Demo

These runnable demos show the core interview patterns: a many-producer many-consumer bounded queue, a minimal thread pool with graceful shutdown, and the shape of a prefork worker pool using a pipe as the stand-in for accepted connections.

MPMC Bounded Queue with Backpressure — C Demo

stdin (optional)

Tiny Thread Pool with Graceful Shutdown — C Demo

stdin (optional)

Prefork Worker Pool Skeleton — C Demo

stdin (optional)

29. Kernel Source Pointers

Area	Files / symbols	What to inspect
Blocking queue waits	`kernel/futex/core.c`, `futex_wait`, `futex_wake`	How pthread condvars park and wake threads after user-space predicates decide the fast path cannot continue.
Worker scheduling	`kernel/sched/core.c`, `schedule`, `try_to_wake_up`	What happens when a sleeping worker becomes runnable after a queue signal.
Process worker creation	`kernel/fork.c`, `copy_process`, `dup_fd`	How prefork workers inherit the listening socket and why each worker has a separate address space.
FD passing	`net/unix/af_unix.c`, `unix_stream_sendmsg`, `scm_detach_fds`	How `SCM_RIGHTS` transfers references to open files across Unix sockets.
Event readiness	`fs/eventpoll.c`, `ep_poll`, `ep_send_events`	How epoll batches readiness for event-loop and thread-pool server designs.

30. Interview Prep

Question	Concise Answer
How do you design a correct producer-consumer queue?	Use one mutex for queue state, two predicates for not-full and not-empty, wait in loops, signal after changing the predicate, and bound capacity for backpressure.
What makes a thread pool production-ready?	Bounded queue, clear rejection or blocking policy, worker lifecycle management, graceful shutdown, no executing jobs while holding the queue mutex, and observable backlog.
When is RCU better than an rwlock?	When reads dominate and can tolerate seeing old versions while writers copy-swap and defer reclamation until all old readers finish.
Apache prefork vs Nginx worker model?	Prefork favors isolation and simple blocking code at high memory cost. Nginx favors event-driven scalability with fewer workers but requires nonblocking discipline.
Thread per connection vs epoll thread pool?	Thread per connection is simple but stack-heavy and context-switch-heavy. An epoll pool handles many idle connections with fewer threads and offloads blocking CPU work to workers.

31. §18.24-18.28 Deadlocks, PI Futexes, IPC Sync, and Coroutines

Advanced user-space concurrency is about the failure modes that appear after the basic mutex and queue code works: lock cycles, realtime priority inversion, cross-process ownership, file-lock semantics, and runtimes that multiplex logical tasks without creating one kernel thread per task.

32. Key Data Structures

A deadlock is a cycle in a wait-for graph. In the classic two-lock bug, each thread owns one lock and waits for the other, so neither edge can be removed by normal execution.

Cross-process synchronization works only when the synchronization word and protected data live in shared memory. A process-shared pthread mutex on a private heap is just two unrelated copies after fork.

Structure / API	Important State	Purpose	Interview Detail
wait-for graph	threads, resources, waits-for edges	Detect cycles that imply deadlock.	Pthreads do not maintain a general deadlock detector for you.
`PTHREAD_PRIO_INHERIT`	owner TID, waiter priority, PI futex state	Boost a low-priority mutex owner when a higher-priority thread blocks.	Useful for realtime scheduling, not ordinary throughput tuning.
`PROCESS_SHARED` mutex	futex word stored in shared mapping	Synchronize independent processes without a server process.	Combine with robust mutexes when owner death must be recovered.
file locks	inode, byte range, owner scope	Coordinate access to persistent files across processes.	`fcntl` process-scoped locks surprise many people; OFD locks are often cleaner.
`ucontext_t`	saved registers, stack, signal mask, link context	Stackful cooperative coroutine state in user space.	No preemption: a fiber that never yields blocks all fibers on that thread.

Linux exposes several cross-process coordination tools. The crucial difference is ownership scope: whole file, byte range, process, open file description, named kernel semaphore, or futex word in a shared page.

A stackful coroutine is just another saved CPU context and stack inside one OS thread. Switching saves the current instruction and stack pointers, then restores another coroutine's saved state.

33. Core Mechanism

Background

A service has two shared indexes guarded by two locks. One path updates cache then index; another updates index then cache. Under load, thread A owns the cache lock and waits for the index lock while thread B owns the index lock and waits for the cache lock.

Plan

Assign every lock a canonical rank: static enum, object address, or lock class.
Acquire multiple locks in increasing rank and release them in reverse rank.
For optional locks, use pthread_mutex_trylock with release-and-backoff instead of waiting while holding lower-priority resources.
In complex systems, log wait-for edges and fail fast when adding an edge would create a cycle.

Walkthrough

Initial state: lock A has rank 10 and lock B has rank 20. Thread 1 asks for A then B; thread 2 asks for B then A. The ordered helper sorts both requests into A then B. Thread 1 may run first and hold both; thread 2 waits at A without owning B, so there is no cycle. When thread 1 releases B then A, thread 2 proceeds and the circular-wait Coffman condition is removed.

Priority Inversion

Priority inversion appears when a low-priority lock owner is preempted by medium-priority work while a high-priority thread waits behind it. The high-priority thread is indirectly blocked by a thread that does not even need the lock.

A priority-inheritance mutex lets the kernel see the dependency. When the high-priority waiter blocks, the kernel temporarily boosts the low-priority owner so it can finish and release the mutex.

Coroutine Runtimes

Modern coroutine systems all split logical waiting from OS-thread blocking. Go schedules goroutines onto worker threads, while C++20 and Rust represent suspended work as state machines that an executor resumes when I/O or timers complete.

A stackless coroutine moves through explicit compiler-generated states at each suspension point. Local variables that must survive a suspension live in the coroutine frame rather than on an ordinary caller stack.

34. Minimal C Demo

These demos show the practical core without requiring realtime privileges: ordered two-lock acquisition prevents a deadlock, a process-shared mutex coordinates two forked children through anonymous shared memory, stackful coroutines ping-pong with swapcontext, and the model comparison prints equivalent coroutine shapes.

Deadlock Prevention by Lock Ordering — C Demo

stdin (optional)

PROCESS_SHARED Mutex in mmap Memory — C Demo

stdin (optional)

ucontext Coroutine Ping-Pong — C Demo

stdin (optional)

Coroutine Model Comparison — C Demo

stdin (optional)

35. Kernel Source Pointers

Area	Files / symbols	What to inspect
PI futexes	`kernel/futex/pi.c`, `futex_lock_pi`, `futex_unlock_pi`	How userspace PRIO_INHERIT mutexes route contention through kernel realtime mutex logic.
Realtime mutex core	`kernel/locking/rtmutex.c`, `rt_mutex_adjust_prio_chain`	How priority donation walks lock-owner chains and prevents unbounded inversion.
Shared-memory futex keys	`kernel/futex/core.c`, `get_futex_key`	How the kernel distinguishes private futexes from futex words in shared mappings.
File locks	`fs/locks.c`, `fcntl_setlk`, `flock_lock_inode`	Where BSD locks, POSIX byte-range locks, and OFD locks are represented.
Scheduling	`kernel/sched/core.c`, `setscheduler`, `schedule`	How realtime priority changes and wakeups affect runnable thread selection.

36. Interview Prep

Question	Concise Answer
What are the Coffman deadlock conditions?	Mutual exclusion, hold-and-wait, no preemption, and circular wait. Lock ordering removes circular wait.
What is priority inversion?	A high-priority thread waits on a lock held by a low-priority thread while medium-priority work prevents the owner from running. PI boosts the owner until it releases the lock.
How do process-shared pthread mutexes work?	The mutex object and protected data live in a shared mapping, and pthreads uses futex operations keyed to that shared memory address.
Why are OFD locks often safer than classic fcntl locks?	OFD locks are scoped to the open file description, so duplicate FDs and threads behave more predictably than process-scoped POSIX record locks.
Stackful fiber vs C++20 coroutine?	A stackful fiber switches an entire stack and can suspend deep in calls. A C++20 coroutine is a compiler-generated state machine that suspends only at explicit coroutine points.

§ 18.1-18.28 Processes, pthreads, Sync, and Advanced Concurrency

1. Overview

Process vs Thread vs Coroutine

2. Key Data Structures

3. Core Mechanism

fork and Copy-On-Write

The fork-plus-threads trap

fork, vfork, clone, and exec

4. Minimal C Demo

5. Kernel Source Pointers

6. Interview Prep

7. §18.5-18.9 Process Lifecycle, Daemons, Job Control, Limits, and Privilege

8. Key Data Structures

9. Core Mechanism

Background

Plan

Walkthrough

Daemonization

Privilege Transition

10. Minimal C Demo

11. Kernel Source Pointers

12. Interview Prep

13. §18.10-18.11 pthread Lifecycle and Mutexes

14. Key Data Structures

15. Core Mechanism

Background

Plan

Walkthrough

Robust Owner Death

16. Minimal C Demo

17. Kernel Source Pointers

18. Interview Prep

19. §18.12-18.17 Sync Primitives Beyond Mutexes

20. Key Data Structures

21. Core Mechanism

Background

Plan

Walkthrough

Cancellation and Cleanup

Multi-Threaded Signals

22. Minimal C Demo

23. Kernel Source Pointers

24. Interview Prep

25. Overview

26. Key Data Structures

27. Core Mechanism

Background

Plan

Walkthrough

Work Stealing

Server Models

Model Comparison

28. Minimal C Demo

29. Kernel Source Pointers

30. Interview Prep

31. §18.24-18.28 Deadlocks, PI Futexes, IPC Sync, and Coroutines

32. Key Data Structures

33. Core Mechanism

Background

Plan

Walkthrough

Priority Inversion

Coroutine Runtimes

34. Minimal C Demo

35. Kernel Source Pointers

36. Interview Prep