Part XXVI — libev

§ 35 — libev Internals: Loop Phases · Watcher Types · 4-Heap Timer · ANFD Map · Fork Safety · Threading

Why libev bridges the gap between raw epoll and a real event loop (§35.1) · ev_loop struct and ANFD fd map (§35.2) · six loop phases in ev_run() (§35.3) · ev_io lifecycle through epoll_ctl and dispatch (§35.4) · 4-heap timer algorithm and cache efficiency (§35.5) · ev_timer vs ev_periodic clock semantics (§35.6) · signal delivery via self-pipe (§35.7) · all watcher types and their syscalls (§35.8) · priority queue dispatch (§35.9) · fork safety and ev_loop_fork() (§35.10) · threading model and ev_async (§35.11) · ANFD merger reduces epoll_ctl calls (§35.12) · worked echo server O(k) dispatch (§35.14) · libev vs libevent vs libuv (§35.15)

1. § 35.1 — Why libev? From Raw epoll to an Event Library

epoll_wait() answers one question: which fds are readable or writable right now? A real server also needs timers, signal handling, child-process exit notification, cross-thread wakeup, and periodic callbacks — all wired together correctly, fork-safely, and with minimal overhead. libev provides this through a single abstraction: a watcher that attaches a callback to any event source.

Full Syscall Path: Application → libev → Kernel → Hardware

When your application calls ev_io_start(loop, w) for a socket fd, here is the exact path through the kernel before your callback fires:

  1. ev_io_start() links the watcher into anfds[fd].head and adds fd to fdchanges[] (no syscall yet — deferred).
  2. On the next ev_run() iteration Phase 2: epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) — kernel inserts an epitem into the eventpoll RB-tree. It also installs ep_poll_callback as a wait-queue entry in the socket's sk_wq.
  3. Phase 4: epoll_wait(epfd, events, 64, timeout_ms) — process sleeps on ep->wq.
  4. NIC receives a packet → DMA → sk_data_ready()ep_poll_callback() fires in softirq context → appends epitem to rdllist wake_up(&ep->wq).
  5. epoll_wait returns → libev looks up anfds[fd] → walks the ev_io watcher list → adds matching watchers to pendings[priority].
  6. Phase 6: watcher callback fires in the event loop thread.

libev vs libevent vs libuv

Aspectlibevlibeventlibuv
Timer heap4-heap (cache-optimal)Binary heapBinary heap
Bytes/watcher~56 B (ev_io)~120 B (event)~120 B (uv_poll_t)
I/O bufferingNone — user handles read/writebufferevent (rx/tx queues)Stream with alloc_cb
Thread poolNoneNone (use threads addon)Yes — uv_work_t for blocking ops
DNSNoneevdns (async)c-ares via thread pool
Timer clockdouble (CLOCK_MONOTONIC, ns)timeval (µs)uint64_t (ms)
Used bynginx (historical), many daemonsTor, tmux, ChromiumNode.js, Julia

Interactive: Backend Selection

libev backend selection — epoll vs kqueue vs poll vs select — C Demo
stdin (optional)

2. § 35.2 — The Loop: struct ev_loop Internals

Every libev application creates one ev_loop per thread (or uses the default loop). The struct is the central hub connecting all watcher types to the backend. Its key fields fall into four groups:

ev_loop Field Map

FieldTypeRole
backend_fdintepoll fd (Linux) or kqueue fd (BSD) — created by epoll_create1(EPOLL_CLOEXEC) in ev_loop_new()
anfds / anfdmaxANFD[] / intArray indexed by fd number. Each ANFD holds emask (current epoll interest) and the linked list of ev_io watchers for that fd
fdchanges[]int[]Pending fd updates. fds whose interest mask changed since last iteration. Flushed to epoll_ctl in Phase 2 — never called immediately on ev_io_start/stop
timers / timercntev_timer** / int4-ary min-heap of ev_timer structs. Root (index 0) always holds the earliest deadline
periodics / periodiccntev_periodic** / intSeparate min-heap for ev_periodic (wall-clock recurrence, self-correcting after NTP jumps)
pendings[NUMPRI][]WATCHER*[]Per-priority arrays of ready-to-dispatch watchers. Indexed by priority level (EV_MINPRI=-2 to EV_MAXPRI=2)
prepares / checks / idleslinked listsev_prepare (Phase 1), ev_check (post-Phase 4), ev_idle (when no other pending) watcher lists
async_fdinteventfd or self-pipe read end. Internal ev_io watches this — woken by ev_async_send() from other threads
curpidpid_tCached getpid() result. At loop start: if getpid() != curpid, fork was detected → ev_loop_fork() called automatically
now / now_floordoubleCached CLOCK_MONOTONIC time (seconds). Updated once per iteration to minimize clock_gettime() overhead

ANFD Array — the fd → watcher Index

anfds[fd] is libev's merge point for all watchers on the same fd. When multiple ev_io watchers monitor the same fd, only ONE epitem exists in the kernel — the union of all interest masks is passed to epoll_ctl(MOD).

Interactive: ev_loop Struct Explorer

ev_loop struct — fields, ANFD array, fdchanges flush — C Demo
stdin (optional)

3. § 35.3 — The Loop Iteration: ev_run() Phase-by-Phase

Each call to ev_run(loop, 0) enters a tight loop. Every iteration has exactly 6 phases in strict order. Understanding each phase tells you exactly when a syscall is made and why.

PhaseWhat happensSyscallData structure read/written
1 — PrepareAll ev_prepare callbacks fire (e.g., integrate foreign event loops, flush buffers)Noneprepares linked list
2 — fd flushfdchanges[] drained: compute new interest mask per fd, call epoll_ctl ADD/MOD/DELepoll_ctl()fdchanges[], anfds[].emask, kernel RB-tree
3 — TimeoutPeek 4-heap root (earliest timer), compute epoll_wait ms timeoutclock_gettime() (vDSO, no context switch)timers[0].at, ev_now()
4 — PollBlock in epoll_wait; on return, walk anfds[fd] for each ready event, enqueue pendingsepoll_wait()rdllist → anfds → pendings[]
5 — TimersUpdate ev_now(); pop expired timers from 4-heap; mark pending + re-heap if repeat>0clock_gettime() (vDSO)timers[] 4-heap, pendings[]
6 — DispatchDrain pendings[EV_MAXPRI] first down to EV_MINPRI; call ev_check; call ev_idle if pendings emptyNone (user callbacks)pendings[5][]

Why fdchanges Defers epoll_ctl

A naive implementation would call epoll_ctl() immediately inside ev_io_start(). libev instead queues the fd into fdchanges[] and flushes them all in Phase 2. This batches multiple watcher changes on the same fd into a single epoll_ctl call. If a callback calls ev_io_stop then ev_io_start on the same fd, only one epoll_ctl(MOD) is issued — not a DEL followed by an ADD.

Interactive: ev_run() Phase Stepper

ev_run() — step through all 6 phases with 2 ev_io + 1 ev_timer — C Demo
stdin (optional)

4. § 35.4 — ev_io Watcher: From Registration to Callback

ev_io is the most commonly used watcher. It bridges application fd interest to the backend epoll. The complete path from ev_io_start() to callback invocation touches three data structures: the anfds[] array, fdchanges[], and pendings[].

ev_io_start() → epoll_ctl → Dispatch

StepLocationWhat happens
ev_io_start(loop, w)libev (userspace)Link w into anfds[w->fd].head; compute new interest mask; if changed: add fd to fdchanges[]
Phase 2 flushlibev (userspace)epoll_ctl(epfd, ADD|MOD|DEL, fd, mask) — batch call covering all fdchanges
Kernel: epoll_ctl ADDkernel (syscall)Insert epitem into eventpoll RB-tree; install ep_poll_callback as wait_queue entry in sock->sk_wq
Data arrives on fdkernel softirqNIC DMA → sk_data_ready() → ep_poll_callback() → append epitem to rdllist → wake_up(&ep->wq)
epoll_wait returnslibev (userspace)events[i] = {fd, EPOLLIN}; walk anfds[fd].head; for each w: if w->events & revents → pendings[w->priority].append(w)
Phase 6 dispatchlibev (userspace)w->cb(loop, w, EV_READ) — your application callback

Interactive: ev_io Lifecycle (Start → epoll_ctl → Event → Stop)

ev_io lifecycle — registration, ANFD merge, epoll_ctl, dispatch — C Demo
stdin (optional)

Interactive: ANFD Merger — Multiple Watchers, One epitem

ANFD merger — multiple ev_io on same fd → single epoll_ctl — C Demo
stdin (optional)

5. § 35.5 — Timer Internals: The 4-Heap Algorithm

libev uses a quaternary min-heap (4-heap) rather than the more common binary heap. The root always holds the earliest deadline; its value is peeked in Phase 3 to compute the epoll_wait timeout.

4-Heap Tree Structure

Each node at index i has children at indices 4i+1, 4i+2, 4i+3, 4i+4. Its parent is at (i-1)/4. Compared to a binary heap:

PropertyBinary heap (libevent)4-heap (libev)
Children per node24
Tree height for N=1Mlog₂(1M) = 20 levelslog₄(1M) = 10 levels
Comparisons per insert (sift-up)20 × 1 = 2010 × 3 = 30 (compare up to 3 siblings)
Comparisons per delete (sift-down)20 × 2 = 4010 × 4 = 40
Cache lines per sift-down level2 children = 16B → may span 2 cache lines4 children = 32B → ONE 64B cache line
Real-world performanceBaseline~20% faster due to cache locality

Heap Operations

  • ev_timer_start(loop, w): compute w->at = ev_now() + w->after; append to timers[timercnt++];upheap(timercnt - 1) — O(log₄ n)
  • ev_timer_stop(loop, w): swap timers[w->active] with timers[--timercnt]; call downheap() or upheap() on the moved element — O(log₄ n)
  • Phase 5 expiry: while (timers[0]->at <= ev_now()) → pop root, mark pending, if w->repeat > 0: w->at = ev_now() + w->repeat → re-heap

Interactive: 4-Heap Visualizer

4-heap timer — insert, upheap, sift-down, expiry — C Demo
stdin (optional)

6. § 35.6 — ev_timer vs ev_periodic: Monotonic vs Wall Clock

Propertyev_timerev_periodic
Clock usedCLOCK_MONOTONIC (never jumps backward)CLOCK_REALTIME (wall clock, can jump)
SchedulingRelative: now + after, repeat from current ev_now()Absolute: next multiple of (offset + N × interval)
NTP jump behaviorUnaffected — monotonic clock continuesSelf-corrects to next wall-clock slot
DriftAccumulates if callback is slow (reschedules from ev_now())No drift — always anchors to wall-clock grid
Reschedule callbackNoneOptional reschedule_cb(w, now) for arbitrary schedule logic
Use caseTimeouts, retries, keepalivesCron-like jobs that must fire at wall-clock times

Interactive: Timer Drift and NTP Correction

ev_timer vs ev_periodic — drift, NTP jump, self-correction — C Demo
stdin (optional)

7. § 35.7 — Signal Watchers: ev_signal and the Signal Pipe

Signal handlers can interrupt any code — including malloc, epoll bookkeeping, and your own data structures. Calling libev dispatch functions from a signal handler is not async-signal-safe. libev's solution: the signal handler only writes 1 byte to a self-pipe (or increments an eventfd), and the actual ev_signal callback fires safely in the event loop thread.

Signal Delivery Path Step-by-Step

  1. ev_signal_start(loop, w, SIGINT) sigaction(SIGINT, &sa, NULL) installs handler → creates self-pipe with pipe2(O_NONBLOCK|O_CLOEXEC) → internal ev_io on pipefd[0] registered with epoll.
  2. SIGINT arrives → kernel delivers signal → handler runs: signals[SIGINT].pending = 1; write(pipefd[1], "\0", 1) — async-signal-safe.
  3. epoll_wait returns EPOLLIN on pipefd[0].
  4. libev reads and drains the pipe; scans signals[]; dispatches all pending ev_signal callbacks.

Interactive: Signal Pipe Delivery

ev_signal — self-pipe trick, sigaction, safe dispatch path — C Demo
stdin (optional)

8. § 35.8 — All Watcher Types and Their Syscalls

WatcherRegistration syscallEvent deliveryTypical use
ev_ioepoll_ctl(ADD/MOD)epoll_wait returns fd readyNetwork I/O, any fd
ev_timerNone (heap insert)epoll_wait timeout expiresConnection timeout, retry
ev_periodicNone (heap insert)epoll_wait timeout, wall-clock alignedCron-like, hourly jobs
ev_signalsigaction(signum) + pipe2()EPOLLIN on self-pipeSIGINT/SIGTERM handlers
ev_childsigaction(SIGCHLD)SIGCHLD → pipe → waitpid(WNOHANG)Child process exit
ev_stat (Linux)inotify_add_watch()EPOLLIN on inotify_fd → stat()File/dir change detection
ev_idleNoneFires when pendings[] empty (no kernel)Background GC, log flush
ev_prepareNonePhase 1 hook (before epoll_wait)State update, foreign loop
ev_checkNonePost-Phase 4 hook (after epoll_wait)Foreign loop dispatch
ev_asynceventfd() or pipe2()EPOLLIN on eventfd from any threadCross-thread wakeup
ev_embednested ev_loopev_io on nested loop backend_fdEmbed foreign event loop
ev_forkNonefork detection via getpid() checkRe-init after fork

ev_child — waitpid in the Event Loop

ev_child reuses the signal pipe: SIGCHLD fires → pipe wakes epoll → libev calls waitpid(-1, &status, WNOHANG) in a loop until it returns 0 → dispatches matching ev_child watchers by PID. The watcher's rstatus field contains the exit status for inspection in the callback.

ev_async — the Only Thread-Safe libev Call

ev_async_send(loop, w) may be called from any thread or signal handler. It calls write(loop->async_fd, &one, 8) on an eventfd, which is specified as atomic. The main loop's epoll_wait returns EPOLLIN on async_fd; libev reads to reset the counter and dispatches all pending ev_async watchers.

9. § 35.10 — Fork Safety: Why epoll Breaks After fork()

fork() copies the parent's address space, including ev_loop and its backend_fd (the epoll fd). Both parent and child now hold file descriptors pointing to the same kernel eventpoll object. The child's new sockets are not registered, and calling epoll_wait in the child monitors the parent's file descriptors — a serious bug.

ev_loop_fork() — What It Does

  1. close(old_epfd) — only closes this process's fd; the parent's epoll is unaffected.
  2. epfd = epoll_create1(EPOLL_CLOEXEC) — fresh kernel eventpoll for the child.
  3. Iterate all active anfds[]: for each entry with non-zero emask, call epoll_ctl(new_epfd, ADD, fd, mask) — O(active fds).
  4. Re-initialize signal pipe and async_fd.
  5. curpid = getpid() — resets fork detection for the child's own future forks.

libev detects fork automatically: at the start of each ev_run()iteration it compares getpid() against loop->curpid. If they differ, it calls ev_loop_fork() before proceeding. You can also call it explicitly right after fork().

Interactive: fork() + epoll Bug and Fix

fork() invalidates epoll fd — ev_loop_fork() fixes it — C Demo
stdin (optional)

10. § 35.11 — Threading Model and ev_async

libev's threading rule is simple: one event loop per thread; never call ev_io_start/stop or ev_run from a different thread. The only cross-thread primitive is ev_async_send().

Correct Multi-Thread Pattern

  1. Main thread creates ev_async watcher, starts it, then calls ev_run().
  2. Worker threads do CPU-intensive work (compute, compress, encrypt).
  3. Worker completion: ev_async_send(loop, &async_w) — writes to eventfd, thread-safe.
  4. Main thread wakes from epoll_wait → dispatchesev_async callback → safely calls ev_io_start/stop, enqueues response, modifies loop state.

Interactive: Cross-Thread Wakeup with ev_async

ev_async — cross-thread wakeup via eventfd — C Demo
stdin (optional)

11. § 35.14 — Worked Example: Echo Server O(k) Dispatch

Sequence: Client Connect → Data → Echo

Why O(k), Not O(N)

With 10,000 connected clients and only 3 sending data simultaneously, epoll_wait returns 3 events. libev looks up anfds[fd_A], anfds[fd_B], anfds[fd_C] — the other 9,997 clients are never touched. Compare with select/poll which scans all 10,000 fds every wakeup.

Interactive: O(k) Dispatch Demo

Echo server — O(k) dispatch: N clients, k active, epoll does not scan idle fds — C Demo
stdin (optional)

12. § 35.15 — libev vs libevent vs libuv

Interactive: Timer Heap and Memory Comparison

libev vs libevent vs libuv — heap performance and memory — C Demo
stdin (optional)

13. Interview Prep

QuestionAnswer
Walk through one full ev_run() iteration for a server with 10 ev_io watchers and 2 ev_timer watchers.Phase 1: ev_prepare callbacks (none if not registered). Phase 2: flush fdchanges[] — call epoll_ctl ADD/MOD/DEL for any changed fds. Phase 3: peek timers[0] from 4-heap root, compute epoll_wait timeout = min(deadline - ev_now(), max_block). Phase 4: epoll_wait() blocks; on return walk anfds[fd] for each ready event, enqueue matching ev_io watchers into pendings[priority]. Phase 5: clock_gettime(CLOCK_MONOTONIC), pop expired timers from 4-heap, mark pending + re-heap if repeat>0. Phase 6: drain pendings highest-priority first, call callbacks; then ev_check; then ev_idle if pendings empty.
What is ANFD? Why does libev need it? How many epoll_ctl calls result from 3 ev_io watchers on the same fd?ANFD (Array of Notified File Descriptors) is anfds[fd] — one entry per fd, holding a linked list of all ev_io watchers for that fd and the current epoll interest mask (emask). It is needed because epoll allows only ONE epitem per (epfd, fd) pair, but multiple watchers (e.g., EV_READ + EV_WRITE) may exist. libev merges all watcher event masks via OR and issues ONE epoll_ctl call. With 3 ev_io watchers on the same fd: still just 1 epoll_ctl ADD (or MOD if mask changes).
Why does libev use a 4-heap instead of a binary heap for timers? Quantify the cache benefit.A 4-heap has log₄(n) height vs log₂(n) for a binary heap — half the levels for the same N. During sift-down, all 4 children of a node fit in one 64-byte cache line (4 × 8 bytes of pointers). A binary heap may need 2 cache line accesses for 2 children. With N=1M timers: binary heap needs 20 levels × 2 cache accesses = 40 cache misses per op; 4-heap needs 10 levels × 1 cache access = 10 cache misses. Real-world: ~20% faster in timer-heavy benchmarks.
What is the difference between ev_timer and ev_periodic? Which clock does each use? What happens to each after an NTP time jump?ev_timer uses CLOCK_MONOTONIC and schedules relative to ev_now() (current loop time). After a slow callback, it reschedules from the current time, so it can drift. NTP jumps do not affect it. ev_periodic uses CLOCK_REALTIME and anchors to absolute wall-clock slots (offset + N × interval). After an NTP jump, ev_periodic self-corrects by advancing its deadline to the next valid wall-clock slot. Use ev_timer for timeouts and retries; ev_periodic for cron-like tasks that must fire at wall-clock times.
Explain why fork() breaks an inherited epoll fd. What does ev_loop_fork() do?fork() copies the parent's ev_loop including backend_fd. Both parent and child now hold fds pointing to the SAME kernel eventpoll object. The child's new sockets are not in that eventpoll; if the child calls epoll_wait it monitors the parent's fds. ev_loop_fork() fixes this: close(old_epfd), epoll_create1() for a fresh eventpoll, re-register all active anfds with epoll_ctl(ADD), re-init signal pipe, update curpid. libev also detects fork automatically via getpid() != curpid at the start of ev_run().
How does ev_signal deliver signal events safely? Draw the path.sigaction(signum) installs a handler; the handler only calls write(pipefd[1], 1 byte) — async-signal-safe. An internal ev_io monitors pipefd[0] with EPOLLIN registered in epoll. When the signal arrives: signal handler writes to pipe → epoll_wait returns EPOLLIN on pipe fd → libev reads pipe to drain, checks signals[signum].pending flag → dispatches ev_signal callback in the event loop thread. This guarantees the callback runs in a safe, non-signal context.
What is ev_async_send()? Why is it the only libev function safe to call from another thread?ev_async_send(loop, w) writes to an eventfd (or pipe) using write(), which is specified as async-signal-safe and can be called from any thread. The main loop's epoll_wait returns EPOLLIN on the eventfd fd, drains it with read(), then dispatches ev_async callbacks — all in the main thread where it is safe to modify loop state. All other libev functions (ev_io_start, ev_io_stop, etc.) must run in the loop's owning thread.
How does fdchanges[] reduce syscall overhead?ev_io_start/stop never call epoll_ctl immediately. Instead they add the fd to fdchanges[]. Phase 2 of each ev_run() iteration flushes all changes in a batch. If start + stop + start happen on the same fd in one iteration, only one epoll_ctl(MOD) is issued — not ADD+DEL+ADD. For a server accepting many connections in one callback, all new fds are batched into one set of epoll_ctl calls at the next Phase 2, not one syscall per accept().
Compare the delivery path for ev_io EV_READ vs ev_timer — which involves a kernel interrupt?ev_io EV_READ: NIC interrupt → softirq NAPI → sk_data_ready() → ep_poll_callback() (still in softirq) → rdllist append → wake_up() → epoll_wait returns → anfds[fd] lookup → pendings → callback. ev_timer: no kernel interrupt at all. libev peeks the 4-heap root in Phase 3 to compute epoll_wait timeout. When epoll_wait returns (either due to I/O or timeout), Phase 5 checks clock_gettime() against heap[0].at and pops expired timers. The timer fires via a timeout in epoll_wait, not via an IRQ.
How does the pendings[] priority array work?pendings is an array of arrays: pendings[EV_MAXPRI - EV_MINPRI + 1] = pendings[5]. Each slot holds watchers at one priority level. Phase 6 drains from index 4 (EV_MAXPRI=2) down to index 0 (EV_MINPRI=-2). Set priority with ev_set_priority(w, EV_MAXPRI) before ev_start. Priority only affects dispatch order within one loop iteration, not which event epoll_wait returns first.