Part XVII — libc

§17.1–17.34 libc, Loader, ELF, Toolchain, ABI, and Common C Libraries

How C code reaches libc, how glibc manages runtime state, how the compiler, linker, and loader turn source into ELF, how cross toolchains target another Linux system, how inspection tools explain real binaries, and how ABI rules plus common compression, serialization, and container libraries shape production C programs.

1. Overview

libc is the ABI adapter between portable C or POSIX calls and the Linux process model. Some calls are pure userspace library work, some wrap the syscall instruction, and a small set use the kernel-mapped vDSO so the process can read current kernel state without paying for a privilege transition.

The choice of libc is an engineering trade-off: glibc favors compatibility and breadth, musl favors simplicity and static linking, embedded libcs favor size and configurability, and Android or bare-metal libcs adapt the same C surface to very different kernels or no kernel at all.

libc | License | Typical target | Size profile | Practical trade-off
glibc | LGPL-2.1+ | Linux servers and desktops | large | Full POSIX surface, NSS plugins, locales, IFUNC-optimized memcpy, broad ABI stability.
musl | MIT | Alpine, static binaries, containers | small | Simple codebase, static link friendly, predictable DNS and locale behavior, fewer legacy extensions.
uClibc-ng | LGPL-2.1 | embedded Linux | configurable | Build-time feature selection for small root filesystems and older embedded targets.
Bionic | BSD-style | Android | medium | Android-specific libc with tight platform integration and different POSIX coverage trade-offs.
newlib/picolibc | BSD-style | bare metal and RTOS | tiny | No kernel assumption; board support package supplies syscall stubs such as _write and _sbrk.
dietlibc/Cosmopolitan | mixed | size or portable experiments | tiny/niche | Special-purpose alternatives for very small Linux binaries or portable single-file executables.

2. Key Data Structures

A dynamically linked Linux process contains the application, libc, the vDSO mapping, thread-local storage, and the initial auxiliary vector that tells libc where kernel-provided user pages live.

Object | Shape | Purpose
libc.so | ELF shared object or static archive content | Provides C/POSIX APIs, wrapper policy, allocator, stdio, resolver, and startup glue.
linux-vdso.so.1 | Small read-only ELF image mapped by the kernel | Exports symbols such as __vdso_clock_gettime for selected pseudo-syscalls.
vvar | Kernel-updated read-only data page | Stores timekeeping base values and sequence counters consumed by vDSO code.
auxv | Array of ElfW(auxv_t) entries on the initial stack | Passes process facts from kernel to loader; AT_SYSINFO_EHDR points at the vDSO ELF header.
errno | Thread-local integer behind a macro | Holds the positive error code after libc converts a negative kernel return into API convention.

3. Core Mechanism

Background

A C program wants one stable API, but Linux exposes a lower-level syscall ABI where numbers and arguments live in specific CPU registers. libc hides that ABI, adds POSIX behavior such as cancellation points, and converts kernel error returns into the familiar -1 plus errno convention.

Plan

  1. Application code calls a libc wrapper such as write(fd, buf, len).
  2. libc moves the syscall number and arguments into the architecture syscall registers.
  3. The CPU executes syscall on x86_64 or svc #0 on aarch64.
  4. The kernel dispatches through its syscall table and returns either a non-negative result or a negative errno value.
  5. libc maps negative kernel results to thread-local errno and returns -1.

Architecture | Trap instruction | Syscall number | Arguments | Return | Note
x86_64 | syscall | rax | rdi, rsi, rdx, r10, r8, r9 | rax | rcx and r11 clobbered by the instruction.
aarch64 | svc #0 | x8 | x0, x1, x2, x3, x4, x5 | x0 | Negative errno is returned in x0; libc converts it.

Walkthrough

Suppose the program calls write(-1, "x", 1). The kernel does not set userspace errno; it returns a negative value in the return register. The wrapper recognizes the reserved negative range, stores the positive code in TLS, and returns -1 so the POSIX API stays uniform across architectures.

vDSO Fast Path

Time calls are different because the kernel can publish read-only timekeeping data into every process. libc finds the vDSO through AT_SYSINFO_EHDR, resolves symbols like __vdso_clock_gettime, and calls them directly when the clock source allows a safe user-mode calculation.

A successful vDSO clock_gettime(CLOCK_MONOTONIC) call never enters ring 0. It reads a sequence counter, base time, and CPU counter. If the sequence counter changed mid-read, the vDSO code retries the read; if the clock ID or the current clock source is unsupported, it falls back to the real syscall path.

API | Common path | Why it can avoid the kernel
clock_gettime | Usually vDSO | Reads vvar timekeeper data plus a CPU counter; falls back to syscall when unsupported.
gettimeofday | Usually vDSO | Legacy wall-clock API; still accelerated for compatibility-heavy programs.
time | Usually vDSO | Seconds precision can be answered from shared kernel-maintained data.
getcpu | Usually vDSO | Returns CPU and NUMA node without entering the scheduler path.

4. Minimal C Demo

This program prints with both the libc write wrapper and the raw syscall(SYS_write) interface, then compares normal clock_gettime against a forced real syscall. On most Linux machines the libc path is faster because it uses vDSO.

libc Wrapper, Raw Syscall, and vDSO Timing — C Demo

5. Kernel Source Pointers

Area | Files / symbols | What to inspect
Syscall entry | arch/x86/entry/entry_64.S, arch/x86/entry/common.c | Register save, syscall dispatch, return-to-user checks.
Syscall tables | arch/x86/entry/syscalls/syscall_64.tbl, include/linux/syscalls.h | Mapping from syscall numbers to kernel handler names.
write path | fs/read_write.c, ksys_write, vfs_write | How an fd and user buffer become a VFS write operation.
vDSO setup | arch/x86/entry/vdso/, arch/x86/include/asm/vdso/ | vDSO image, exported symbols, and timekeeping data access.
Auxiliary vector | fs/binfmt_elf.c, AT_SYSINFO_EHDR | How the kernel passes vDSO location and process metadata to user space.

6. Interview Prep

Why does libc exist if Linux already has syscalls?

Syscalls are a kernel ABI, not the full C/POSIX programming environment. libc provides startup code, wrappers, errno handling, cancellation semantics, malloc, stdio, DNS, locale, dynamic loading hooks, and portability across kernel versions and architectures.

What is the difference between glibc and musl in practice?

glibc is broad, highly compatible, and optimized for mainstream Linux distributions. musl is smaller, simpler, MIT-licensed, and strong for static containers, but may differ in extensions, resolver behavior, locale coverage, and performance characteristics.

What happens when a syscall fails?

The kernel returns a negative errno value in the return register. The libc wrapper converts it to -1, stores the positive error number in thread-local errno, and returns to the caller.

Why is vDSO faster than a normal syscall?

vDSO code runs in user mode and reads kernel-maintained shared pages, so it avoids privilege transition, syscall dispatch, register save/restore overhead, and scheduler return-to-user checks.

When would a program call syscall(2) directly instead of a libc wrapper?

Programs can call syscall(2) for newer or uncommon syscalls before a stable wrapper exists. The trade-off is that the caller must handle syscall numbers, argument types, restart behavior, and portability details explicitly.

7. §17.4–17.5 Allocator Internals

glibc malloc is ptmalloc2: a dlmalloc-derived allocator with arenas to reduce lock contention, bins to classify free chunks by size, and a per-thread tcache fast path for the hottest small allocations.

Field | Type / size | Purpose
prev_size | size_t, one machine word | Valid when the previous chunk is free; lets free coalesce backward.
size | size_t, one machine word | Chunk size plus low-bit flags such as previous-in-use and mmap-backed.
fd / bk | two pointers | Links used only while a chunk is free and parked in allocator bins.
payload | aligned byte range | The address returned to the caller; allocator metadata lives immediately before it.

Background

A high-throughput C service allocates many short-lived objects but cannot take a global heap lock for every request. ptmalloc2 first tries thread-local state, then arena-local bins, then grows the process with brk or mmap.

Plan

  1. Round the request up to an aligned chunk size and choose a size class.
  2. Check the calling thread's tcache; no arena lock is needed on a hit.
  3. For small objects, consult fastbins or smallbins in the selected arena.
  4. For larger reusable chunks, scan the unsorted bin first, then size-sorted largebins.
  5. If no reusable chunk exists, split the top chunk or request memory from the kernel.

Walkthrough

Suppose thread A frees seven 32-byte objects and then allocates another 32-byte object. The first allocation after the frees pops directly from A's tcache, so it does not touch another thread, does not lock an arena, and usually returns memory that is still warm in the CPU cache.

Alternative allocators make different bets: jemalloc and tcmalloc emphasize scalable size classes and thread caches, mimalloc emphasizes locality and eager release, and scudo spends more metadata work to catch or contain heap corruption. DPDK's rte_malloc is a separate NUMA-aware hugepage allocator for packet-processing memory, not a drop-in general-purpose libc allocator.

Allocator | Thread cache | Central structure | Best interview takeaway
ptmalloc2 | per-thread tcache plus multiple arenas | fastbins, smallbins, largebins, unsorted bin | Default glibc allocator; good compatibility, can fragment under mixed long-lived workloads.
jemalloc | thread cache plus arenas | size classes and extent metadata | Strong multi-threaded behavior; common in databases and browsers.
tcmalloc | thread cache plus central free list | page heap and spans | Low allocation latency for many small objects; central lists rebalance caches.
mimalloc | per-thread heaps | segments and pages | Good locality and eager page release; designed for predictable latency.
scudo | per-thread cache with quarantine | checksummed headers and delayed reuse | Security-hardened allocator used where corruption detection matters.

8. §17.6 stdio Buffering

stdio wraps file descriptors with a FILE object and a userspace buffer. That buffer reduces syscall count, but it also means printf, write, fork, fflush, and fsync are different layers of the I/O path.

Mode | Name | Common default | Flush behavior
_IOFBF | full buffering | regular files | Flushes when the buffer is full, on fflush, or on close.
_IOLBF | line buffering | interactive tty stdout | Flushes on newline or explicit fflush.
_IONBF | unbuffered | stderr by convention | Each stdio call tends to reach the underlying fd immediately.

Background

A program prints a prompt, forks, and then both parent and child exit. If the prompt was still sitting in a userspace FILE buffer when fork copied the address space, both processes can flush the same bytes.

Plan

  1. Use setvbuf when a stream needs explicit full, line, or no buffering.
  2. Call fflush before fork or before mixing stdio with raw write.
  3. Use fsync only when data must reach storage durability, not merely the kernel page cache.

Walkthrough

If stdout is redirected to a file, printf("x") usually stores one byte in the userspace buffer. After fork, parent and child both own a private copy of that byte, so two exits can produce two writes unless the program flushed before forking.

9. §17.7–17.8 NPTL pthread and TLS

NPTL implements POSIX threads as a 1:1 mapping: every pthread_t is backed by a kernel task_struct created with clone, while glibc manages stacks, thread control blocks, TLS, joins, cancellation, and futex-backed synchronization.

Object | Type / size | Purpose
pthread_t | opaque libc handle | Identifies a userspace thread object; not portable as a numeric kernel TID.
task_struct | kernel scheduler object | One per NPTL thread; shares VM, files, fs state, and signal handlers with the thread group.
TCB | thread control block near thread pointer | Stores pthread metadata, TLS base information, cancellation state, and per-thread errno location.
DTV | dynamic thread vector | Maps ELF TLS module IDs to this thread's per-module TLS block.

Background

A thread library must create independent CPU schedulable contexts without giving each one a separate process address space. Linux solves that with clone flags, and libc layers the POSIX API and TLS model on top.

Plan

  1. Allocate a stack, guard page, TCB, and TLS area for the new thread.
  2. Call clone with shared VM, fd table, filesystem state, signal handlers, and thread-group membership.
  3. Pass the TLS pointer with CLONE_SETTLS so the CPU thread register points at the new TCB.
  4. Run a libc trampoline that calls the user's start routine and records the return value for pthread_join.

Walkthrough

When two threads read errno, they are not reading one process-global integer. On x86_64, the FS base points to the current thread's TCB, and libc's errno macro resolves to storage reached through that per-thread base.

TLS model | Where it works | Access shape | Speed
General dynamic | Works for any shared object, including ones loaded with dlopen | Calls runtime resolver through TLS descriptor or __tls_get_addr | most flexible, slowest
Local dynamic | Several TLS symbols in the same module | Resolve module base once, then use offsets | middle
Initial exec | TLS in libraries loaded at program start | Load the variable's thread-pointer offset from a GOT slot, then add the thread pointer | fast
Local exec | Main executable or fully static code | Direct thread-pointer relative offset | fastest

10. Minimal C Demo

These demos keep the core runtime mechanisms small enough to trace: allocator accounting after small-object churn, stdio buffer duplication across fork, and NPTL threads with both a racy shared counter and per-thread TLS addresses.

ptmalloc2 Small Allocation Accounting — C Demo
stdio Buffering Across fork — C Demo
pthread Mutex and TLS Addresses — C Demo

11. Kernel Source Pointers

Area | Files / symbols | What to inspect
Thread creation | kernel/fork.c, copy_process, clone3 | How clone flags decide which process resources are shared by the new task.
Futex mutex slow path | kernel/futex/, futex_wait, futex_wake | How pthread mutexes sleep only after the userspace atomic fast path fails.
TLS setup | arch/x86/kernel/process_64.c, arch_prctl, set_thread_area | How user-mode FS or architecture thread-pointer state is installed.
Memory mapping for large chunks | mm/mmap.c, do_mmap, do_brk_flags | Kernel VM paths hit when malloc expands the heap or maps a large allocation.
File descriptor writes | fs/read_write.c, vfs_write | The syscall reached after stdio finally drains its userspace buffer.

12. Interview Prep

Why does glibc use multiple malloc arenas?

A single global heap lock would serialize allocation-heavy multi-threaded programs. Multiple arenas let different threads allocate concurrently, while tcache handles many small allocations without taking an arena lock at all.

What is the difference between tcache and fastbins?

tcache is per-thread and lock-free for recently freed small chunks. Fastbins are arena-owned singly linked lists for small chunks; they still belong to allocator arena state and are consolidated later.

Why can printf output duplicate after fork?

fork copies userspace memory, including unflushed FILE buffers. If both parent and child exit normally, both can flush the copied bytes.

Is a pthread a process or a kernel thread?

On modern Linux with NPTL, each pthread is a kernel schedulable task in the same thread group. Threads share address space and many process resources, but each has its own stack, registers, TCB, and TLS.

Why does TLS need several models?

Code linked into the main executable can use direct thread-pointer offsets, while shared libraries loaded later need runtime lookup through module IDs and the DTV. The models trade dynamic-loading flexibility for faster instruction sequences.

13. §17.9 Dynamic Linker and Loader

Dynamically linked executables do not jump straight from the kernel into main. The ELF PT_INTERP segment names ld-linux.so, and that interpreter maps dependencies, resolves relocations, prepares libc startup, and only then enters the application.

Object | Type / size | Purpose
PT_INTERP | ELF program header plus string path | Tells the kernel which dynamic loader should receive control first.
DT_NEEDED | dynamic-section tag | Names shared objects that ld.so must locate and map before the program starts.
GOT | array of pointer-sized slots | Stores runtime addresses for external objects and functions after relocation.
PLT | executable stubs | Routes imported function calls through the GOT and lazy resolver.

Background

A program calls printf, but the final address of printf inside libc is not known when the executable is linked. Position-independent shared libraries can be loaded at different addresses, so ld.so must fix up references at runtime.

Plan

  1. The static linker emits PLT stubs, GOT slots, and relocation records for imported symbols.
  2. At startup, ld.so maps dependencies listed by DT_NEEDED and applies required relocations.
  3. With lazy binding, a PLT call enters the resolver the first time a function is called.
  4. The resolver finds the winning symbol, patches the GOT slot, and jumps to the resolved function.
  5. Later calls use the patched GOT slot directly instead of repeating symbol lookup.

Walkthrough

Suppose main calls malloc for the first time. With lazy binding enabled, the call reaches malloc@plt, which jumps through a GOT slot that still points at the resolver path. ld.so searches the link map, honors interposition rules such as LD_PRELOAD, writes the chosen address into the GOT, and transfers control to the real function.

14. §17.10 Runtime Loading with dlopen

dlopen exposes part of the dynamic loader as a runtime plugin API: the host maps a shared object, asks for symbol addresses with dlsym, calls them through function pointers, and eventually releases the handle with dlclose.

Plugin ABI Shape

Real plugin systems avoid scattering many raw dlsym calls. They usually export one initialization symbol that returns a versioned table of function pointers, so the host can check ABI compatibility before it calls into untrusted extension code.

Flag / API | Effect | Common pitfall
RTLD_LAZY | Defers function binding until first call. | Missing symbols may fail later, far from the load site.
RTLD_NOW | Resolves required relocations during dlopen. | Startup is stricter and sometimes slower, but failures are immediate.
RTLD_GLOBAL | Makes symbols visible to subsequently loaded objects. | Can create surprising symbol interposition between plugins.
dlerror | Returns and clears the loader's thread-local error string. | Call once to clear before dlsym, then again to test the result.

15. §17.11 ELF File Format

ELF has two views of the same file. The program header table is the loader view used by the kernel and ld.so, while the section header table is the linker and debugger view that names pieces such as .text, .data, .bss, and relocation sections.

Program headers group sections into mappable segments. For example, .text and .rodata often share a read-execute load segment, while .data and .bss live in a writable segment where the file-backed bytes are followed by zero-filled memory.

Section | Flags | Purpose
.text | ALLOC + EXEC | Machine instructions mapped into an executable segment.
.rodata | ALLOC | Read-only constants, string literals, jump tables, and format strings.
.data | ALLOC + WRITE | Initialized global or static storage copied from the file image.
.bss | ALLOC + WRITE (section type NOBITS) | Zero-initialized storage; occupies memory but not bytes in the file.
.dynsym/.dynstr | ALLOC | Dynamic symbol table and symbol names used by ld.so.
.rela.plt | ALLOC | PLT relocation records, often R_X86_64_JUMP_SLOT for imported functions.
.got/.plt | ALLOC, sometimes WRITE/EXEC split | Indirection tables and stubs for external data and function references.
.init_array/.fini_array | ALLOC + WRITE | Constructor and destructor function pointer arrays.

Background

When execve loads an ELF executable, it does not care about every named section. It uses program headers to map pages with the correct permissions, locate the interpreter, find dynamic linking metadata, and set the initial instruction pointer.

Plan

  1. Read the ELF header to validate magic, class, endianness, machine type, and entry point.
  2. Use program headers to map PT_LOAD segments with read, write, and execute permissions.
  3. Map the interpreter from PT_INTERP when the binary is dynamically linked.
  4. Let ld.so process PT_DYNAMIC, symbol tables, and relocation records.
  5. Use section headers later for linking, debugging, symbol inspection, and tools like readelf -S.

Walkthrough

A file with a 4 KB zero-initialized global array grows the process memory image, but it does not add 4 KB of zero bytes to the executable. The linker records that storage as .bss with SHT_NOBITS; the loader maps writable memory and guarantees the extra bytes start as zero.

16. Minimal C Demo

These snippets demonstrate the loader surface area: interposing malloc with LD_PRELOAD, loading a plugin function with dlopen, and compiling a tiny ELF that can be inspected with readelf.

LD_PRELOAD malloc Interposition — C Demo
dlopen Plugin Function Pointer — C Demo
ELF Sections for readelf — C Demo

17. Kernel Source Pointers

Area | Files / symbols | What to inspect
ELF executable loading | fs/binfmt_elf.c, load_elf_binary | Program header parsing, segment mapping, interpreter setup, and initial stack construction.
Memory mappings | mm/mmap.c, do_mmap | How executable and shared-library segments become VMAs with permissions.
Auxiliary vector | include/uapi/linux/auxvec.h, create_elf_tables | How the kernel passes loader facts such as page size, program headers, random bytes, and vDSO base.
Userspace loader | glibc/elf/rtld.c, glibc/elf/dl-load.c | ld.so bootstrap, library search, relocation, PLT binding, and link-map maintenance.
Dynamic loading API | glibc/dlfcn/, dlopen, dlsym | The libc-facing wrapper around loader namespaces, handles, and error strings.

18. Interview Prep

What does ld.so do before main runs?

It maps needed shared libraries, applies relocations, initializes TLS and loader state, runs constructors in the required order, and enters libc startup code that eventually calls main.

Why do PLT and GOT exist?

They let position-independent code call functions whose final addresses are known only after shared libraries are mapped. PLT stubs route calls through GOT slots, and ld.so can patch those slots during relocation or lazy binding.

How does LD_PRELOAD override functions?

The loader inserts preload objects early in the symbol search order. When relocation asks for malloc, the preloaded definition can win, and that wrapper can use dlsym(RTLD_NEXT, ...) to call the next implementation.

What is the difference between RTLD_LAZY and RTLD_NOW?

RTLD_LAZY defers function symbol binding until first call. RTLD_NOW resolves required symbols during load, which makes failures earlier and more deterministic.

Why does .bss not increase file size like .data?

.data stores initialized bytes in the file. .bss records only size and alignment; the loader allocates memory for it and zeros it at load time.

19. §17.12 C Compilation Pipeline

A C build is a staged translation pipeline. The driver named gcc or cc usually orchestrates the preprocessor, compiler proper, assembler, and linker rather than doing every job itself.

Before the linker sees an object file, the assembler has already produced ELF sections, a symbol table, and relocation records. Undefined references such as printf are not bugs at this point; they are promises for the linker to resolve later.

Stage | Input | Output | Driver flag | What changes
Preprocess | main.c | main.i | -E | Expands includes and macros; no machine code exists yet.
Compile | main.i or main.c | main.s | -S | Parses, optimizes, and emits target assembly.
Assemble | main.s | main.o | -c | Turns assembly into a relocatable ELF object with symbols and relocations.
Link | .o, .a, .so | executable or .so | none (default stage; -o only names the output) | Resolves symbols, lays out segments, records runtime dependencies, and writes final ELF metadata.

Background

Build failures are much easier to diagnose when you know which stage produced the artifact. A missing header is a preprocessing problem, bad C syntax is a compile problem, an unknown instruction is often assembler-facing, and an undefined external symbol is usually a link problem.

Plan

  1. Use -E to inspect macro expansion and included declarations.
  2. Use -S to inspect optimized assembly before it becomes ELF bytes.
  3. Use -c to create a relocatable object without linking.
  4. Link objects and libraries into an executable with final segment layout and dynamic metadata.

Walkthrough

Suppose main.c contains #define TWICE(x) ((x) * 2) and calls printf. The .i file contains the expanded expression, the .s file contains an assembly call, the .o file records an unresolved printf relocation, and the final executable records either linked libc code or a dynamic dependency.

20. §17.13 Static vs Dynamic Linking

Static linking copies selected object files from archives into the final ELF. A .a file is an ar archive of relocatable objects, and the linker pulls only members needed to satisfy currently unresolved symbols.

Dynamic linking records dependency names and relocation work instead of copying all library implementation code. At process start, ld-linux.so maps the named shared objects, applies relocations, and resolves calls through GOT and PLT machinery.

Shared library naming separates the developer-facing linker name from the ABI contract and the real file. The executable normally records the SONAME, not the exact patch-level filename, so compatible library updates can replace the real file.

Mechanism | File shape | Resolution time | What executable stores | Trade-off
Static archive | .a | Link time | Needed object files copied into the final executable | Single binary, larger image, no runtime library replacement.
Shared object | .so | Load time and optionally first call | Executable records DT_NEEDED plus relocation data | Smaller binaries, library updates, ABI and deployment sensitivity.
SONAME | libfoo.so.1 | Runtime compatibility contract | Dynamic section name selected by the linker | Lets libfoo.so.1.2.3 update without breaking programs built for ABI version 1.

Background

A deployment wants a single binary, but operations wants library security updates without rebuilding every application. Static and dynamic linking choose different points on that deployment and update trade-off.

Plan

  1. For static archives, scan unresolved symbols and pull matching archive members into the output.
  2. For shared objects, record DT_NEEDED, symbol references, and relocations in the executable.
  3. At runtime, let ld.so locate libraries through rpath, LD_LIBRARY_PATH, runpath, the ld.so cache, and default directories.
  4. Use SONAME major versions to preserve ABI compatibility across real-file upgrades.

Walkthrough

If an app links against libfoo.so whose SONAME is libfoo.so.1, the executable records NEEDED libfoo.so.1. A distro can repoint the libfoo.so.1 symlink from libfoo.so.1.2.3 to libfoo.so.1.2.4 without relinking the app, as long as ABI version 1 remains compatible.

21. §17.14 Position Independent Code and PIE

Shared libraries need -fPIC so their text can be mapped at any virtual address. Position-independent executables use -fPIE and link as PIE, giving normal programs the same address-randomization property.

External calls still use indirection. PIC code can call a PLT stub, the stub consults a GOT slot, and the dynamic loader patches that slot when the final callee address is known. With -fno-plt, compilers can emit an indirect call through the GOT slot itself and skip the PLT stub for some calls.

Mode | Typical flag | Used for | Addressing shape
Non-PIC | -fno-pic | Fixed-address code or old executable models | May require absolute relocations that are invalid or costly in shared text.
PIC | -fPIC | Shared libraries | Uses RIP-relative addressing and GOT entries for global data and external symbols.
PIE | -fPIE -pie | ASLR-friendly executables | Executable is loaded like a shared object at a randomized base address.

Background

If a shared library contained fixed absolute addresses in its text, the loader would have to modify executable pages for each process. PIC keeps text pages shareable by moving address-specific state into relocatable data such as the GOT.

Plan

  1. Compile shared-library code with -fPIC so code references data through position-independent sequences.
  2. Keep runtime-patched addresses in writable GOT slots rather than patching executable code.
  3. Compile executables as PIE when ASLR should randomize the main program image.
  4. Inspect relocations and disassembly to see GOT-relative references and PLT calls.

Walkthrough

A shared library function that reads a global variable does not bake the final address of that variable into the instruction stream. On x86_64 it can use RIP-relative code to reach a GOT entry, and ld.so patches that data slot after choosing the library load address.

22. Minimal C Demo

These snippets create small files under /tmp and run the real compiler tools so the intermediate artifacts are visible: preprocessing, assembly, relocatable objects, archives, shared objects, SONAME metadata, and PIC relocations.

Compilation Pipeline Artifacts — C Demo
Static Archive vs Shared Object — C Demo
PIC Relocations and Shared Library Link — C Demo

23. Kernel Source Pointers

Area | Files / symbols | What to inspect
ELF execution handoff | fs/binfmt_elf.c, load_elf_binary | Kernel-visible effect of linking: program headers, interpreter mapping, stack, and auxv.
Memory protection | mm/mmap.c, mm/mprotect.c, mprotect_fixup | How linked segment permissions become VMAs and why writable text is undesirable.
ASLR base selection | fs/binfmt_elf.c, arch_mmap_rnd | Where PIE and shared-library load addresses receive randomized bases.
Userspace linker | binutils/ld/, gcc/collect2 | Archive scanning, relocation application, constructor collection, and final ELF writing.
Dynamic loader | glibc/elf/dl-reloc.c, glibc/elf/dl-lookup.c | Runtime relocation, symbol lookup, GOT patching, and lazy binding.

24. Interview Prep

What are the four main stages from C source to executable?

Preprocessing expands includes and macros, compilation emits assembly, assembly writes relocatable ELF objects, and linking resolves symbols plus writes the final executable or shared object.

Why can an object file contain undefined symbols?

A relocatable .o is not the final program. It can record unresolved references and relocation entries so the linker can bind them to another object, an archive member, or a shared-library import.

What is the practical difference between static and dynamic linking?

Static linking copies library object code into the executable. Dynamic linking records dependencies and lets ld.so map shared libraries at runtime, enabling smaller binaries and library updates at the cost of runtime dependency management.

What problem does SONAME solve?

SONAME gives the runtime loader an ABI-versioned library name such as libfoo.so.1, decoupling the executable's compatibility requirement from the exact real file name installed on disk.

Why is -fPIC required for shared libraries?

A shared library may load at different addresses in different processes. PIC keeps code address-independent and shareable by using RIP-relative instructions and GOT indirection instead of fixed absolute addresses in text.

25. §17.15 Cross Compilation

Cross compilation means the compiler runs on one machine but emits binaries for another. The confusing words are build, host, and target: for normal applications, the host is the machine that will run the result; target matters mainly when the thing being built is itself a compiler.

A cross compiler is not just a different code generator. It must pair target binutils with a target sysroot, otherwise it may accidentally compile against host headers or link host libraries that cannot run on the target CPU or libc.

Term | Meaning | Concrete example
build | Machine that runs the build tools | Your x86_64 CI runner executing configure and gcc.
host | Machine that runs the output program | An ARM64 Linux board that will run hello-arm64.
target | Machine for which a compiler being built will emit code | A cross compiler running on ARM64 but producing RISC-V firmware.
sysroot | Target filesystem view used during compile and link | Headers, crt objects, libc, and pkg-config metadata from the target rootfs.
multilib | One compiler supporting several ABI variants | Same GCC prefix selecting 32-bit or 64-bit libraries with flags such as -m32.

Background

A common embedded workflow builds on x86_64 CI but deploys to an ARM64 board. The produced ELF must contain AArch64 instructions, use the target dynamic loader path, and link to the target libc, not the CI runner's libc.

Plan

  1. Select a triplet such as aarch64-linux-gnu or arm-linux-musleabihf.
  2. Point the compiler at a sysroot containing target headers, crt*.o, libc, and dependent libraries.
  3. Configure pkg-config with PKG_CONFIG_SYSROOT_DIR and PKG_CONFIG_LIBDIR.
  4. Verify the output with file, readelf -h, and optionally qemu-aarch64.

Walkthrough

For aarch64-linux-gnu-gcc -static hello.c -o hello-arm64, the driver invokes an AArch64 assembler and linker, searches the AArch64 sysroot for stdio.h and libc, writes an AArch64 ELF, and produces a binary that the x86_64 build host can inspect but cannot execute directly without qemu-user or real ARM64 hardware.

Sysroot and pkg-config must agree on the same target tree. If pkg-config --libs zlib returns host paths, the final link may mix architectures even when the compiler prefix is correct.

26. §17.16 Cross Toolchains

A complete Linux cross toolchain is built in layers: target binutils first, a bootstrap compiler next, kernel UAPI headers into the sysroot, libc built for that target, then a final compiler that knows how to find the completed runtime.

Tool | Best fit | Complexity | Output
crosstool-NG | Build one standalone cross toolchain | Medium | binutils, gcc, libc, gdb, sysroot, reproducible config file.
Buildroot | Build an embedded root filesystem plus toolchain | Medium | Toolchain, kernel, busybox/userspace packages, final rootfs images.
Yocto / OpenEmbedded | Production distro construction from layered metadata | High | SDKs, package feeds, images, BSP layers, long-term product customization.
Linaro prebuilt | Use maintained ARM-focused binaries quickly | Low | Ready compiler tarballs or distro packages, less control over exact libc/options.

Background

Toolchain choice depends on whether you need only a compiler, a whole root filesystem, or a product distro with repeatable package metadata. The libc choice is part of that decision because glibc, musl, and uClibc expose different ABI and feature surfaces.

Plan

  1. Use crosstool-NG when the deliverable is a pinned compiler and sysroot.
  2. Use Buildroot when the compiler and a small root filesystem should be generated together.
  3. Use Yocto when product images, package feeds, BSP layers, and SDK export matter more than simplicity.
  4. Smoke-test every toolchain by compiling, inspecting ELF metadata, and running under emulator or hardware.

Walkthrough

In a glibc cross bootstrap, the first GCC cannot build normal programs because libc is not available yet. It exists to compile libc itself after kernel headers are installed. The final GCC is rebuilt after libc so its specs can find startup files, libgcc helpers, pthread support, and the target dynamic loader path.

27. Minimal C Demo

This demo tries the real aarch64-linux-gnu-gcc and qemu-aarch64 path when those tools are installed, then creates a tiny sysroot layout and shows how pkg-config is redirected for cross builds.

Cross Compiler and Sysroot Inspection — C Demo

28. Kernel Source Pointers

Area | Files / symbols | What to inspect
UAPI headers | include/uapi/, arch/*/include/uapi/ | The syscall, ioctl, struct, and constant definitions exported into a target sysroot.
Header installation | make headers_install, usr/include/ | How kernel headers are sanitized before libc and userspace consume them.
Architecture ABI | arch/arm64/include/uapi/asm/, arch/x86/include/uapi/asm/ | Target-specific syscall numbers, signal frames, stat layout, and auxv details.
ELF loading | fs/binfmt_elf.c, load_elf_binary | Why the target dynamic loader path and ELF machine type must match the runtime system.
Compatibility layers | fs/compat_binfmt_elf.c, arch/*/kernel/sys_*.c | Where 32-bit user ABIs and multilib-style runtime compatibility meet the kernel.

29. Interview Prep

What is the difference between build, host, and target?

Build is where the build tools run. Host is where the output program runs. Target is what a compiler being built will emit; for ordinary app cross compilation, build and host are usually the important pair.

Why is a sysroot required?

It gives the cross compiler the target's headers, startup objects, libc, and libraries. Without it, the build can accidentally include or link files from the build machine.

How do you validate a cross-compiled binary?

Use file and readelf -h to verify the ELF machine and interpreter, then run it on target hardware or with qemu-user when the syscall and library surface is compatible.

When would you choose Buildroot over crosstool-NG?

Choose Buildroot when you need a root filesystem and packages in addition to the compiler. Choose crosstool-NG when the main deliverable is a standalone, reproducible cross toolchain.

Why can pkg-config break cross builds?

Host .pc files can return host include and library paths. Cross builds should set PKG_CONFIG_SYSROOT_DIR and PKG_CONFIG_LIBDIR so metadata comes from the target sysroot.

30. §17.17 Library Inspection Tools

Library inspection is the practical loop for answering "what is this binary, what will load it, which symbols does it need, and where did this address come from?" The tools look at different layers of the same artifact: ELF identity, loader metadata, link-time sections, symbol tables, disassembly, archive indexes, and debug line tables.

Tool | Layer | Question it answers
file | artifact identity | Is this ELF 32/64-bit, which machine, static or dynamic, stripped or debug-rich?
ldd / LD_TRACE_LOADED_OBJECTS | dynamic loader plan | Which shared objects would ld.so map for this executable?
readelf | ELF metadata | What do the ELF header, program headers, sections, dynamic tags, and relocations say?
objdump | code and section bytes | What instructions were emitted, and which relocations or sections sit near them?
nm | symbol table | Which symbols are defined, undefined, weak, local, or exported dynamically?
strings / strip / size | payload and footprint | Which printable constants remain, how large are sections, and what changes after stripping?
ar / ranlib | static archives | Which object members are inside a .a, and is its symbol index present?
c++filt / addr2line | debugging names and PCs | How do mangled C++ names and instruction addresses map back to source?

Key Data Structures

The inspection surface is not one structure; it is a stack of ELF records and tool-specific views. Program headers are the loader's contract, sections and symbols are the linker/debugger's contract, and archive indexes let the static linker pull only the object files that satisfy unresolved symbols.

Record | Read with | Fields that matter | Purpose
ELF header | readelf -h, file | class, endian, machine, type, entry, header offsets | Confirms architecture, executable/shared-object kind, and table locations.
Program headers | readelf -l | PT_LOAD, PT_INTERP, PT_DYNAMIC, permissions | Shows what the kernel and ld.so map into memory.
Dynamic section | readelf -d | DT_NEEDED, RUNPATH, relocation tables, init arrays | Explains runtime library dependencies and relocation work.
Symbol tables | nm, readelf -s | name, value, size, type, bind, visibility, section index | Identifies defined, undefined, weak, local, and exported symbols.
Relocations | readelf -r, objdump -r | offset, relocation type, addend, referenced symbol | Tells where the linker or loader must patch addresses.
Debug line tables | addr2line | PC ranges mapped to source files and line numbers | Turns crash addresses into source locations when debug info exists.

31. §17.18 Runtime Tracing

Runtime tracing observes the program after the loader and kernel start executing it. strace sees syscall boundaries, ltrace sees dynamically linked library calls, perf trace uses kernel tracing infrastructure, and LD_DEBUG asks ld.so to print its own search and binding decisions.

Dynamic-loader debug output is strongest before main even starts: it shows which files were searched, which relocations were processed, and which library definition won a symbol binding.

Tracer | What it catches | Mechanism | Best use
strace | syscall entry and exit | ptrace syscall stops | open/read/write/connect/ioctl failures, errno, forked process behavior with -f.
ltrace | dynamic library calls | ptrace breakpoints on PLT entries | printf, malloc, dlopen calls when the program uses dynamically linked symbols.
perf trace | syscalls and tracepoints | kernel tracepoints/perf events | Lower-overhead syscall tracing on a live system, often better for production-like sampling.
LD_DEBUG | dynamic loader decisions | ld.so debug logging | Library search, relocation, symbol lookup, and binding order before or during startup.

Background

A binary works on one machine but fails on another with ENOENT, an unresolved symbol, or a missing shared object. Static inspection tells you what the binary requested; runtime tracing tells you what actually happened on this host.

Plan

  1. Start with file and readelf -h/-l/-d to verify architecture, interpreter, and dependencies.
  2. Use ldd or LD_TRACE_LOADED_OBJECTS=1 to see the loader's dependency resolution plan.
  3. Use nm, readelf -s, and c++filt when symbol names or C++ ABI names are suspect.
  4. Use strace -e for failing syscalls and LD_DEBUG=bindings,files for loader failures.

Walkthrough

Suppose ./app prints "error while loading shared libraries: libfoo.so.1". readelf -d ./app confirms a NEEDED entry for libfoo.so.1. LD_DEBUG=files ./app then shows each searched directory. If the file exists but a function is missing, switch to nm -D libfoo.so.1 and LD_DEBUG=bindings to see whether the expected symbol is exported and which object won the binding.

32. Minimal C Demo

This demo builds a tiny shared library and executable under /tmp, then runs the normal inspection commands against them. When installed in the environment, strace and ltrace are used as optional runtime tracers; otherwise the demo prints a skipped message.

ELF Inspection and Runtime Tracing — C Demo

33. Kernel Source Pointers

Area | Files / symbols | What to inspect
ELF loading | fs/binfmt_elf.c, load_elf_binary | How program headers, PT_INTERP, stack setup, and auxv become a running process.
ptrace syscall tracing | kernel/ptrace.c, arch/*/kernel/ptrace.c | The stop/resume mechanism that lets strace inspect registers at syscall entry and exit.
syscall tracepoints | kernel/trace/trace_syscalls.c, include/trace/events/syscalls.h | Where perf-style syscall tracing gets structured syscall enter/exit events.
perf events | kernel/events/core.c | How perf trace attaches to tracepoints and streams events with lower overhead than ptrace.
Userspace loader | glibc/elf/dl-load.c, glibc/elf/dl-lookup.c | Library search, link-map construction, symbol lookup, and the source of LD_DEBUG output.

34. Interview Prep

What is the difference between readelf and objdump?

readelf is a direct ELF metadata reader and does not rely on BFD interpretation. objdump is stronger for disassembly and mixed section/code views.

Why can ldd be misleading?

It shows the loader's dependency resolution for this host and environment, not a universal truth. Environment variables, RUNPATH, loader cache, architecture, and container root can change the answer.

How do you debug "undefined symbol" at runtime?

Check readelf -d for dependencies, nm -D or readelf -s for exported symbols, then use LD_DEBUG=symbols,bindings to see the lookup scope and selected definition.

When do you use strace instead of ltrace?

Use strace when the boundary of interest is the kernel: files, sockets, mmap, signals, permissions, and errno. Use ltrace when the question is which dynamically linked library function was called.

What does addr2line need to work well?

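It needs an address in the correct binary or shared object plus DWARF debug line information. For shared libraries and PIE executables under ASLR, subtract the object's load base from the runtime address first, because addr2line expects addresses as they appear in the file, not as they appear in the running process.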

35. §17.19 C Calling Conventions and ABI

A calling convention is the contract that lets separately compiled object files call each other: which registers carry arguments, which registers a callee must preserve, where return values live, how the stack is aligned, and where extra arguments spill.

ABI | Integer / pointer args | Float args | Return | Caller-saved | Callee-saved | Important rule
SysV AMD64 | rdi, rsi, rdx, rcx, r8, r9 | xmm0..xmm7 | rax or rax:rdx; xmm0 for floating point | rax, rcx, rdx, rsi, rdi, r8-r11 | rbx, rbp, r12-r15 | 128-byte red zone; stack 16-byte aligned before call.
x86 cdecl | all arguments on stack | on stack by default; floating-point results return in x87 st(0) | eax or edx:eax | eax, ecx, edx | ebx, esi, edi, ebp | Caller cleans arguments after call.
ARM AAPCS | r0-r3, then stack | s0-s15 or d0-d7 under hard-float ABI | r0 or r0:r1 | r0-r3, r12, lr | r4-r11 | Stack is 8-byte aligned at public call boundaries.

Background

The compiler can optimize inside one function freely, but a function boundary must be predictable to the linker, debugger, unwinder, assembly code, and foreign-function interfaces. That boundary is why an object built by Clang can call one built by GCC when both obey the platform ABI.

Plan

  1. Put the first arguments in the ABI-defined registers and spill the rest to stack slots.
  2. Maintain the required stack alignment before call or branch-link instructions.
  3. Let the caller assume caller-saved registers may be clobbered after the call.
  4. Make the callee save and restore callee-saved registers when it uses them.
  5. Return small scalar values in the ABI return registers and larger aggregates through hidden pointers when required.

Walkthrough

For add8(a,b,c,d,e,f,g,h) on SysV AMD64, arguments one through six arrive in rdi, rsi, rdx, rcx, r8, and r9. The seventh and eighth arguments are stack slots above the return address, which is exactly what gcc -S makes visible in assembly.

36. §17.20 C and C++ ABI

Linux C++ compilers usually follow the Itanium C++ ABI for name mangling, object layout, vtables, RTTI, exception metadata, and rules for crossing shared-library boundaries. C interop uses extern "C" because C has no C++ namespace, overload, or class encoding in symbol names.

A virtual call is not a name lookup at runtime. The generated code loads the object's vptr, indexes a fixed vtable slot known at compile time, then indirect-calls the function pointer with the object address passed as the implicit this argument.

ABI feature | Example | Why it matters
Name mangling | _ZN3foo3barEi demangles to foo::bar(int) | Encodes namespace, class, overload, and argument types into linker-visible symbols.
vtable | _ZTV7Derived (vtable for Derived) | Provides stable slots for virtual dispatch across object files.
RTTI | type_info | Supports dynamic_cast, typeid, and exception matching.
C boundary | extern "C" int plugin_init(...) | Exports a stable unmangled symbol for C callers, dlsym, and plugins.

37. §17.21 errno and Reentrancy

Modern libc does not expose errno as one writable global integer. The macro expands to a dereference of the pointer returned by a function such as __errno_location(), and that function returns the address of the current thread's TLS-backed errno.

API | Reentrant? | Async-signal-safe? | Reason
strtok | No | No | Uses hidden process-global parse state; concurrent tokenization corrupts progress.
strtok_r | Yes | No | Caller supplies save pointer, so each thread or parse stream owns state.
gmtime | No | No | Returns pointer to static storage that later calls overwrite.
gmtime_r | Yes | No | Caller supplies output struct.
printf | Mostly thread-safe through internal locks | No | May lock stdio state and allocate; unsafe inside signal handlers.
write | Yes for independent buffers/fds | Yes | Small async-signal-safe primitive used by signal handlers.
errno | Yes | N/A | Macro resolves to thread-local storage, not one global integer.

Background

Library functions that keep hidden static state are convenient in single-threaded programs and dangerous in concurrent or signal-heavy programs. Reentrant variants push that state into caller-owned storage, and async-signal-safe functions avoid locks, malloc, and complex libc internals.

Plan

  1. Use TLS for state that should be independent per thread, such as errno.
  2. Prefer *_r APIs or explicit state objects when two calls must progress independently.
  3. Inside signal handlers, restrict work to async-signal-safe calls such as write and _exit.

38. §17.22 C11 Atomics and Memory Model

C11 atomics separate atomicity from ordering. A relaxed atomic counter prevents torn updates but says almost nothing about nearby non-atomic data; release/acquire pairs are the common pattern for publishing initialized data to another thread.

Memory order | Guarantee | Relative cost | Common use
memory_order_relaxed | Atomicity only, no ordering of surrounding reads/writes | lowest | Counters, statistics, reference counts with separate synchronization.
memory_order_acquire | Later reads/writes cannot move before this load | low to medium | Consumer reads a published pointer or ready flag.
memory_order_release | Earlier reads/writes become visible before this store | low to medium | Producer publishes initialized data.
memory_order_acq_rel | Acquire plus release on read-modify-write | medium | CAS or fetch_add used as a handoff point.
memory_order_seq_cst | Acquire/release plus one global order for seq_cst operations | highest | Default choice when correctness matters more than tuning.

Background

A producer writes a payload and sets a ready flag. If the flag is atomic but the payload writes are not ordered before the flag store, a consumer on a weakly ordered CPU can observe the flag and still read stale payload fields.

Plan

  1. Write the payload with ordinary stores while only the producer can see it.
  2. Publish readiness with atomic_store_explicit(..., memory_order_release).
  3. Poll readiness with atomic_load_explicit(..., memory_order_acquire).
  4. Read the payload only after the acquire load observes the released value.

Walkthrough

In a single-producer single-consumer handoff, the producer fills slot.text and then stores ready = 1 with release ordering. The consumer spins with acquire loads; once a load observes the value 1, the C memory model guarantees that the earlier payload writes happen-before the consumer's payload reads.

39. Minimal C Demo

This program keeps the ABI and memory-model ideas concrete: add8 is suitable for compiling with cc -S to inspect register versus stack arguments, the producer-consumer pair uses release/acquire atomics, and two threads print distinct TLS-backed errno addresses.

ABI Arguments, errno TLS, and Release/Acquire Atomics — C Demo

40. Kernel Source Pointers

Area | Files / symbols | What to inspect
Syscall ABI entry | arch/x86/entry/entry_64.S, arch/arm64/kernel/entry.S | How register state crosses from userspace ABI into kernel entry code.
Signal frame ABI | arch/x86/kernel/signal.c, arch/arm64/kernel/signal.c | How kernel-built user stacks preserve register state for signal delivery and return.
TLS base | arch/x86/kernel/process_64.c, arch/arm64/kernel/process.c | How thread-pointer registers are installed or saved across context switches.
Futex atomics boundary | kernel/futex/, futex_wait, futex_wake | Where userspace atomics hand off to kernel blocking primitives after a fast path fails.
Compiler ABI references | gcc/config/*, llvm/lib/Target/* | Userspace compiler backends that implement register assignment, stack alignment, and C++ ABI lowering.

41. Interview Prep

What happens to the seventh integer argument on SysV AMD64?

The first six integer or pointer arguments use registers. The seventh and later arguments are passed in stack slots, while the return address from call also lives on the stack.

Why does C++ need name mangling?

The linker sees flat symbol names, but C++ has namespaces, classes, overloads, templates, and argument types. Mangling encodes that information into unique linker symbols; extern "C" disables it for C ABI boundaries.

Why is errno thread-safe?

It is a macro that resolves to the current thread's TLS storage, commonly through __errno_location(). Two threads can set errno independently because they write different addresses.

What is the difference between reentrant and async-signal-safe?

Reentrant code can be safely called concurrently when each caller owns its state. Async-signal-safe code can run from a signal handler even if the signal interrupted libc while it held locks or was mutating internal state.

When is relaxed atomic ordering enough?

It is enough when only atomicity matters, such as a diagnostic counter whose exact ordering relative to other memory is irrelevant. For publishing data between threads, use release on the producer side and acquire on the consumer side.

42. §17.23 glibc Internals

glibc is not only a collection of exported C functions. Its runtime loader, private symbol namespace, symbol-version database, IFUNC resolvers, NSS modules, malloc state, pthread state, and startup code cooperate before main ever runs.

Internal mechanism | Shape | Why it exists
ld.so self relocation | early dynamic loader state | The interpreter must fix its own relocations before it can safely relocate anybody else.
link_map | loader list of mapped ELF objects | Records load addresses, dynamic sections, scopes, and dependency order for symbol lookup.
hidden_def | glibc internal visibility macro | Keeps private implementation names from becoming interposable ABI promises.
versioned_symbol | symbol plus GLIBC_x.y export contract | Lets old binaries bind old semantics while new binaries use the current default version.
IFUNC | indirect function relocation | Runs a resolver once so hot functions such as memcpy can select CPU-specific implementations.

IFUNC is glibc's hot-path dispatch trick. A symbol can be a resolver instead of a direct implementation; during relocation the resolver inspects hardware capabilities and returns the address of the best implementation for this CPU.

Symbol versioning is glibc's long-term ABI contract. The dynamic symbol is not just realpath or printf; it also carries a version tag, so old binaries can keep binding the old behavior while new links select the default version.

Background

The loader has a bootstrapping problem: it is itself a shared ELF object, but it must execute before the normal relocation machinery is available. glibc solves this with a small self-contained early path that relocates ld.so first, then uses the now-working loader to process the main program and dependencies.

Plan

  1. Kernel maps the executable and the PT_INTERP interpreter.
  2. ld.so relocates its own GOT, dynamic data, and early function pointers.
  3. It builds the link_map for the main executable and each DT_NEEDED object.
  4. It resolves normal relocations, IFUNC relocations, symbol versions, and constructor arrays.
  5. It transfers into libc startup, which initializes process state and calls main.

Walkthrough

On a Haswell-era x86_64 machine, a relocation for memcpy can point at an IFUNC resolver. During startup, the resolver sees AVX2 and ERMS CPU feature bits, picks the tuned implementation, and ld.so writes that address into the relocation target. The application pays the dispatch cost during relocation, not on every copy.

43. §17.24 Security Hardening

Linux C hardening is layered: the compiler can insert object-size checks and stack canaries, the linker can make relocation tables read-only, the loader and kernel can randomize mappings, and page tables can forbid execution from writable data pages.

RELRO focuses on the dynamic linker's writable relocation targets. Full RELRO combines -z relro with -z now, resolves PLT entries before the program starts, then makes GOT pages read-only so a later memory corruption cannot redirect imported calls through GOT overwrite.

Flag / feature | Helps prevent | Mechanism | Cost
-D_FORTIFY_SOURCE=2/3 | Overflows in strcpy, memcpy, and sprintf calls whose destination size is known | Compile-time diagnostics plus checked runtime wrappers when the compiler can infer destination size. | Low; requires optimization (-O1 or higher) to take effect.
-fstack-protector-strong | Stack smashing that crosses from local buffers toward saved control data | Places a random canary between local buffers and saved frame data and calls __stack_chk_fail when it changes. | Low for most code; more prologue/epilogue work.
PIE + ASLR | Exploits that rely on hard-coded code and library addresses | Builds executables as position-independent ELF so the kernel and loader can randomize load addresses. | Usually low on x86_64 and aarch64.
RELRO | GOT overwrite after relocation | Marks relocation metadata and GOT pages read-only after ld.so has finished required writes. | Partial is cheap; full adds eager binding cost.
NX / DEP | Executing injected bytes from stack or heap | Uses page permissions so writable data pages are not executable. | Near zero on modern hardware.
CFI | Indirect calls to invalid targets | Adds compiler/runtime checks that restrict function-pointer and virtual-call destinations. | Medium; strongest with LTO and compatible toolchain.

Background

A classic stack overflow writes beyond a local buffer toward saved control data. Without hardening, that bug can overwrite the return address; with stack protector enabled, the overwrite usually corrupts the canary first and the function aborts before returning through attacker-controlled data.

Plan

  1. Compile with bounds-aware libc wrappers using -D_FORTIFY_SOURCE=2 or 3 when supported.
  2. Add stack canaries for vulnerable frames with -fstack-protector-strong.
  3. Build PIE executables so ASLR can randomize the main program as well as shared libraries.
  4. Use full RELRO for network-facing programs when startup binding cost is acceptable.
  5. Keep NX enabled and use CFI or sanitizers where the toolchain and deployment model allow them.

Walkthrough

Compile a vulnerable function with a local char buf[8] and strcpy. A short input returns normally. A long input overwrites past the buffer; before the function returns, generated epilogue code compares the saved canary against TLS guard state and calls __stack_chk_fail if the value changed.

44. Minimal C Demo

This demo writes a tiny vulnerable program, compiles it with stack protector and fortify flags, runs one safe input, then runs an overflowing input so the abort path is visible. It also prints a few ELF facts that interviewers often ask you to verify with readelf and nm.

Stack Canary, Fortify, and ELF Hardening Checks — C Demo

45. Kernel Source Pointers

Area | Files / symbols | What to inspect
ELF interpreter handoff | fs/binfmt_elf.c, load_elf_binary | How the kernel maps PT_INTERP, builds auxv, and transfers control to ld.so.
ASLR policy | mm/mmap.c, arch_mmap_rnd, randomize_va_space | Where mmap bases, stack, brk, and PIE load addresses receive entropy.
Non-executable stack | fs/binfmt_elf.c, PT_GNU_STACK | How ELF stack permission metadata affects user stack VM flags.
Memory permissions | mm/mprotect.c, do_mprotect_pkey | How ld.so can mark GOT pages read-only after relocation.
Userspace pieces | glibc/elf/rtld.c, glibc/elf/dl-reloc.c, glibc/sysdeps/*/multiarch/ | Loader bootstrap, relocation processing, and IFUNC-selected multiarch implementations.

46. Interview Prep

Why does ld.so have to relocate itself?

The kernel maps the interpreter but does not run the full dynamic-link process for it. ld.so starts in a restricted early mode, fixes its own relocation state, then uses that working machinery to relocate the main executable and its libraries.

What problem does IFUNC solve?

It lets one exported symbol choose a CPU-specific implementation at load time. Hot functions such as memcpy can use AVX2, ERMS, or a baseline path without checking CPU features on every call.

What does printf@@GLIBC_2.2.5 mean?

The double at-sign marks the default symbol version used by new links. A single at-sign names a non-default older version retained so old binaries keep their original ABI contract.

What is full RELRO?

Full RELRO uses -z relro -z now: ld.so resolves PLT entries eagerly, then marks GOT-related pages read-only. That blocks later GOT overwrite attacks at the cost of more startup binding work.

What does a stack canary actually protect?

It detects many contiguous overwrites from local buffers toward saved frame data before a function returns. It does not prevent all memory corruption, heap bugs, arbitrary writes, information leaks, or logic bugs.

47. §17.25 System Libraries

Many names that used to feel like separate Unix libraries are now part of the libc surface on modern glibc, but the link names still matter for portability, older distributions, static links, and interview debugging. Know which API family owns the concept before chasing missing symbols.

Library | Important APIs | What to remember
libpthread | pthread_create, mutexes, condition variables, rwlocks | Threading API; since glibc 2.34 the symbols live in libc and libpthread remains only as a compatibility stub.
librt | clock_nanosleep, timer_create, shm_open, mq_open | POSIX realtime extensions; also folded into libc on modern glibc.
libdl | dlopen, dlsym, dlerror, dladdr | Runtime loader API for plugins and symbol inspection; modern glibc keeps compatibility stubs.
libm | sin, cos, exp, pow, sqrt, fma | Math library; still a separate library in glibc, so link it explicitly with -lm and place it after the objects that use it.
libresolv | res_init, res_query, res_search | Lower-level DNS resolver controls beneath getaddrinfo and NSS.
libcrypt / libxcrypt | crypt, crypt_r | Password hash compatibility layer for traditional Unix password formats.

Background

A build that works on a current Linux laptop can fail on an older production image because glibc moved symbols over time. Before glibc 2.34, pthread, realtime, and dl APIs often required explicit libraries; after 2.34 many of those symbols live directly in libc, while compatibility linker names remain.

Plan

  1. Identify the API family: threads, realtime timers, dynamic loading, math, resolver, or password hashing.
  2. Check whether the target libc version folds that family into libc or still needs an explicit library.
  3. Put dependent libraries after the object files that reference them when using traditional linkers.
  4. Use pkg-config, CMake imported targets, or build-system probes instead of hardcoding assumptions for portable projects.

Walkthrough

Suppose a small daemon uses clock_gettime, pthread_create, dlopen, and sqrt. On new glibc, the first three may link through libc alone, but sqrt still commonly needs -lm. On an older glibc build host, the portable command may include -pthread -ldl -lrt -lm, with -pthread preferred over raw -lpthread because it also sets compiler flags.

48. §17.26 Crypto / Network Libraries

Network-facing C programs usually compose several specialized libraries: libcurl owns protocol policy, c-ares can make DNS nonblocking, OpenSSL owns TLS and cryptographic primitives, and libpcap or netlink libraries expose lower-level packet and kernel networking surfaces.

Packet capture follows a different path from socket I/O. libpcap compiles a BPF predicate, asks the kernel capture path to filter frames early, then returns packet metadata and captured bytes to the userspace sniffer.

Library | Surface | Typical job
OpenSSL | libssl + libcrypto | TLS protocol state machine, EVP crypto API, certificates, X.509 verification, provider-backed algorithms.
libcurl | easy + multi interface | Protocol client for HTTP, FTP, SMTP, and more; integrates TLS, DNS, proxy, redirect, and connection reuse policy.
c-ares | async DNS resolver | Nonblocking DNS queries for event-loop programs that cannot afford blocking getaddrinfo calls.
libpcap | pcap_compile, pcap_next_ex | Portable packet capture wrapper around BPF filters, AF_PACKET, capture files, and platform-specific capture backends.
libnl / libnl-3 | rtnetlink, generic netlink, nl80211 | Structured userspace access to kernel networking configuration APIs.
libnftnl / libmnl | nftables netlink messages | Low-level netlink message builders/parsers used by nftables tooling.

Background

An HTTPS client looks simple at the call site, but the library stack hides several failure domains: DNS lookup, socket connect, TLS negotiation, certificate validation, HTTP framing, proxy behavior, retries, and timeout accounting. Good C services keep those responsibilities explicit.

Plan

  1. Use libcurl when the task is an application protocol client rather than a custom socket protocol.
  2. Use OpenSSL's EVP interface for cryptography; avoid direct low-level AES/RSA calls in new code.
  3. Use c-ares or the libcurl multi interface when DNS must integrate with an event loop.
  4. Use libpcap for passive packet capture, and libnl/libmnl/libnftnl for configuring kernel networking state.

Walkthrough

For https://api.example.com/v1, libcurl asks DNS for addresses, opens a nonblocking TCP socket, hands the connected file descriptor to OpenSSL, verifies the certificate chain against trust anchors, then sends HTTP bytes through TLS records. If capture is needed, libpcap observes packets from the side; it is not part of the client's send/receive path.

49. Minimal C Demo

The first demo shows the practical link behavior around clock_gettime, -lrt, and -lm. The second writes a minimal libpcap sniffer that prints the first 32 bytes of captured packets when libpcap and capture permissions are available.

System Library Link Checks — C Demo
Minimal libpcap Sniffer — C Demo

50. Kernel Source Pointers

Area | Files / symbols | What to inspect
Realtime clocks | kernel/time/posix-timers.c, kernel/time/hrtimer.c | How POSIX timers and high-resolution timer queues back libc and librt timer APIs.
POSIX shared memory | mm/shmem.c, fs/namei.c | How shm_open maps onto tmpfs-backed files under the hood.
Packet capture | net/packet/af_packet.c, sk_filter | Where AF_PACKET sockets tap frames and run classic/eBPF packet filters.
Netlink | net/netlink/af_netlink.c, net/core/rtnetlink.c | How userspace libraries such as libnl speak routing and generic netlink to kernel subsystems.
Userspace pieces | glibc/nptl/, glibc/resolv/, openssl/ssl/, libpcap/pcap-linux.c | Thread wrappers, resolver internals, TLS state machines, and Linux packet capture glue.

51. Interview Prep

Why does -pthread differ from -lpthread?

-lpthread only asks the linker for a library. -pthread is a compiler-driver option that can also define thread-related macros and choose the correct startup/link flags for the platform.

Why did old code use -lrt for clock_gettime?

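Before glibc 2.17, clock_gettime and the other clock_* functions lived in librt, so linking them required -lrt. glibc 2.17 moved the clock_* functions into libc, and glibc 2.34 folded the rest of librt (POSIX timers, message queues, asynchronous I/O, shm_open) into libc as well. Retaining -lrt is usually harmless on current glibc and keeps old build recipes working on older hosts.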

Why prefer OpenSSL EVP APIs?

EVP gives algorithm-independent contexts, provider support, padding/mode handling, and future agility. Direct low-level crypto calls hardcode algorithms and are easier to misuse.

What does libpcap add over a raw socket?

It provides a portable capture API, filter compilation, capture-file support, timestamp/header metadata, and platform-specific backend handling. On Linux it commonly wraps AF_PACKET plus BPF filtering.

When would you use libnl instead of shelling out to ip?

Use libnl when a program must configure routes, links, addresses, wireless state, or generic netlink families directly and needs structured errors, batching, subscriptions, or daemon-grade control.

52. §17.27 Event Loops & Async I/O Libraries

Event-loop libraries turn many blocking-looking resources into one explicit readiness machine: register file descriptors, wait for the kernel to report progress, dispatch short callbacks, then rearm or remove the interest. The design keeps one thread useful while thousands of sockets mostly wait.

io_uring changes the shape from readiness to submission and completion. Userspace fills SQEs in a shared submission ring, the kernel posts CQEs after work finishes, and liburing hides most memory-barrier and ring-index details.

Library | Backends | Surface | Best fit
libev | epoll, kqueue, poll, select | I/O, timers, signal watchers | Small event core for C programs that want minimal policy and very low overhead.
libevent | epoll, kqueue, poll, select | Events, bufferevent, DNS, HTTP helpers | Larger batteries-included loop used by daemons and network services.
libuv | epoll, kqueue, IOCP, event ports | TCP/UDP, fs work queue, timers, processes | Cross-platform runtime layer used by Node.js and embeddable async tools.
libaio | Linux kernel AIO | O_DIRECT file I/O | Older async disk interface; narrow scope and weaker socket integration.
liburing | io_uring SQ/CQ rings | Files, sockets, accept, timeout, splice-like operations | Modern Linux async interface with shared rings and batching.

Background

A proxy with 50,000 mostly idle TCP connections cannot afford one blocking thread per connection. The kernel already knows which descriptors are ready, so an event loop asks for readiness notifications and runs only the code that can make progress.

Plan

  1. Represent each connection as state plus callbacks, not as a long blocking call stack.
  2. Register read, write, timer, and signal interests with the loop backend.
  3. Keep callbacks short; move CPU-heavy work to worker threads or a separate queue.
  4. Use liburing when completion-style asynchronous kernel operations matter more than portable readiness APIs.

Walkthrough

Suppose connection 42 needs to read a request, connect upstream, and write a response. The loop first registers the client socket for EPOLLIN. When bytes arrive, the read callback parses headers, starts a nonblocking connect to the upstream, and registers that socket for EPOLLOUT so the loop learns when the connect completes. A later write-ready event flushes pending bytes. At no point does the loop block inside that connection; it returns to the kernel wait after each small state transition.

53. §17.28 DPDK / Kernel-Bypass Libraries

DPDK is a userspace packet-processing stack, but its library split matters for ordinary C systems work: EAL owns process and hardware setup, mbuf/mempool own packet memory, rings connect lcores, and PMDs poll NIC queues without the kernel socket path.

liburcu brings the kernel's read-copy-update (RCU) pattern for read-mostly data to userspace services. Readers mark a tiny critical section and dereference a protected pointer; writers publish a replacement and wait for a grace period before freeing the old object.

Library | Role | Hot-path lesson
librte_eal | Environment Abstraction Layer | Discovers hugepages, PCI devices, lcores, memory zones, and launches worker lcores.
librte_mempool | Fixed-size object allocator | Preallocates packet objects so RX/TX does not call general malloc in the hot path.
librte_mbuf | Packet buffer metadata | Carries packet length, data offset, offload flags, refcount, and segment chaining state.
librte_ring | Lock-free queue | Moves packets or work items between lcores with single/multi producer-consumer variants.
librte_hash / librte_lpm | Classifier tables | Cuckoo hash for exact flow lookup and DIR-24-8 LPM for routing prefixes.
libnuma / libhugetlbfs | Memory placement helpers | Keep packet memory near the NIC and CPU socket, often backed by huge pages.
liburcu | Userspace RCU | Lets readers run without locks while writers wait for a grace period before freeing old data.

Background

A 100 Gbps packet path has no budget for per-packet malloc, syscalls, interrupts, or remote-NUMA memory misses. Kernel-bypass libraries move those costs to initialization and keep the hot loop to polling descriptors and touching preallocated cache-hot objects.

Plan

  1. Reserve hugepage memory and bind lcores close to the NIC NUMA node during startup.
  2. Allocate packet buffers from mempools, not from the general-purpose heap.
  3. Move packets between stages with rings and keep ownership transfer explicit.
  4. Use RCU-style publication for read-mostly tables such as routing or policy snapshots.

Walkthrough

On node 0, EAL reserves hugepages and PMD queue 0 receives packets into mbufs from a node-local mempool. Lcore 2 classifies each mbuf with rte_hash, enqueues work to an rte_ring, and another lcore transmits. If a routing table changes, the writer publishes a new table pointer and frees the old table only after all current RCU readers exit.

54. §17.29 eBPF Userspace Libraries

eBPF userspace libraries are loaders and control planes. They parse BPF ELF objects, create maps, ask the kernel verifier to load programs, attach links to hooks, and then read events or update maps while the actual program runs inside kernel hook points.

AF_XDP uses an XDP program to redirect packets into rings backed by UMEM. libxdp provides setup helpers and dispatcher support so programs can share XDP hooks instead of overwriting each other.

Library | Surface | Typical use
libbpf | BPF object loader | Parses ELF/BTF, creates maps, applies CO-RE relocations, loads verifier-checked programs, and attaches links.
libxdp | XDP and AF_XDP helpers | Manages xdp-loader flows, multi-program dispatch, UMEM rings, and AF_XDP socket setup helpers.
libelf | ELF parsing | Reads sections, symbols, relocations, and string tables needed by loaders and inspection tools.
libdebuginfod | Debug-info retrieval | Fetches build-id keyed DWARF and source metadata for tools such as perf and gdb.
BCC | Tracing toolkit | Python/Lua front end that compiles tracing snippets and drives BPF programs for ad-hoc observability.

Background

A BPF program is not just a blob of bytecode. It has maps, relocations, type metadata, attach points, and permissions. libbpf packages those details so production loaders can be deterministic instead of a pile of raw bpf() syscalls.

Plan

  1. Compile the kernel-side BPF object with BTF information for CO-RE relocations.
  2. Open the ELF with libbpf, let it create maps, and load programs through the verifier.
  3. Attach with persistent link objects where possible so lifecycle is explicit.
  4. Use ring buffers, perf buffers, or maps for communication back to userspace.

Walkthrough

A kprobe tracer starts with trace.bpf.o. libbpf opens the ELF, applies CO-RE field relocations against the running kernel BTF, creates a map for events, loads the program through the verifier, attaches it to the chosen kprobe, and returns a link handle. The userspace process then polls a ring buffer while the kernel invokes the BPF program on each matching event.

55. Minimal C Demo

These demos keep the core ideas runnable without requiring the full libraries in the sandbox: a tiny callback loop, a libnuma allocation build probe, and the minimal libbpf loader sequence a real tracing tool follows.

Event Loop Callback Dispatch — C Demo
NUMA Allocation with libnuma — C Demo
libbpf Loader Sketch — C Demo

56. Kernel Source Pointers

Area | Files / symbols | What to inspect
Readiness I/O | fs/eventpoll.c, ep_poll, ep_send_events | How epoll stores interests and returns ready file descriptors to event-loop libraries.
io_uring | io_uring/io_uring.c, io_submit_sqes, io_cqring_event_overflow | How SQEs become kernel requests and CQEs are posted back to shared completion rings.
Huge pages and NUMA | mm/hugetlb.c, mm/mempolicy.c, do_mbind | How userspace libraries request hugepage-backed memory and node-local allocation policy.
BPF load path | kernel/bpf/syscall.c, kernel/bpf/verifier.c, BPF_PROG_LOAD | Where libbpf's syscalls create maps, load programs, and pass bytecode through the verifier.
XDP and AF_XDP | net/core/dev.c, net/xdp/xsk.c, bpf_redirect_map | How XDP programs redirect packets into AF_XDP sockets and shared UMEM rings.

57. Interview Prep

How do libev, libevent, and libuv differ?

libev is a small event core, libevent adds higher-level helpers such as bufferevents and HTTP/DNS pieces, and libuv is a cross-platform runtime abstraction with filesystem work queues, process APIs, and Windows IOCP support.

What is the key difference between epoll and io_uring?

epoll reports readiness: the fd can make progress if you call read or write. io_uring submits operations and reports completion: the kernel tells you the requested operation finished with a result.

Why does DPDK care so much about NUMA and huge pages?

Packet rates are high enough that remote socket memory and TLB misses become visible throughput limits. Huge pages reduce TLB pressure, and NUMA placement keeps NIC DMA, CPU polling, and packet buffers close together.

What does libbpf do before a BPF program runs?

It opens the BPF ELF, parses BTF and maps, applies CO-RE relocations, creates maps, loads bytecode through the verifier with bpf(), and attaches the resulting program to a hook.

Why use liburcu instead of a rwlock for read-mostly data?

RCU readers avoid contended locks and usually execute only lightweight bookkeeping. Writers pay the heavier cost by publishing a replacement and waiting for old readers before freeing retired objects.

58. §17.30 Compression Libraries

Compression libraries trade CPU cycles for fewer bytes on disk, network, or cache. The useful question is not which algorithm is best globally; it is whether the workload is latency-bound, storage-bound, decompression-heavy, dictionary-friendly, or locked to a format such as gzip or PNG.

On the same corpus, LZ4 and Snappy optimize throughput, zlib optimizes compatibility, zstd gives the broadest tunable speed/ratio envelope, and xz or bzip2 are usually archive choices rather than hot-path service choices.

Library | Format / algorithm | Speed | Ratio | Typical fit
zlib | DEFLATE | medium | medium | gzip streams, PNG, HTTP content-encoding, broadly portable baseline.
libbz2 | BWT + Huffman | slow | high | Archive workloads where decompression speed is less important than smaller files.
liblzma | LZMA2 / xz | slowest | highest | Release tarballs and firmware images where maximum ratio matters.
lz4 | LZ4 | fastest | modest | Hot paths such as log shipping, squashfs, zswap, and RPC payloads with latency budgets.
libzstd | Zstandard | fast | high | Modern default for services: tunable levels, dictionaries, streaming, strong ratio/speed balance.
snappy-c | Snappy | very fast | modest | Simple block compression for storage systems that prioritize CPU time.

Background

A logging pipeline writes repeated symbols, timestamps, and field names. Compressing each batch before disk or network transfer can save I/O, but a slow codec can steal CPU from request handling.

Plan

  1. Pick the wire or file format first when compatibility is fixed.
  2. Benchmark representative data, not random bytes, because compressors depend on repetition.
  3. Measure compression and decompression separately; many systems decompress far more often than they compress.
  4. Use streaming APIs for large files and bounded in-memory APIs for request-sized payloads.

Walkthrough

Suppose a service batches 64 KB of JSON logs. LZ4 may shrink it to roughly 25-40 KB with very low latency, zstd level 3 may shrink it further while staying service-friendly, and xz may win ratio but lose the request budget. For an online path, the best answer is usually the fastest codec that hits the network or storage target.

59. §17.31 Serialization Libraries

Serialization libraries define the contract between processes, files, and languages. Schema-based formats move type decisions to build time and make compatibility rules explicit; schemaless formats are easier to inspect but push validation and evolution rules into application code.

Library | Model | Practical notes
protobuf-c | Schema-based binary | Small typed messages, explicit field tags, generated C structs and pack/unpack helpers.
msgpack-c | Binary object tree | JSON-like model without text overhead; useful for dynamic maps and RPC frames.
libcjson / jansson | JSON DOM | Human-readable config and APIs; larger payloads and runtime type checks.
libxml2 | XML SAX and DOM | Standards-heavy documents, schemas, streaming parse, XPath, and legacy protocols.
libyaml | YAML parser/emitter | Human-edited config with comments and anchors; avoid on untrusted complex input without limits.

Background

An RPC payload must survive version skew: new clients, old servers, missing fields, and unknown fields. The format must make those cases boring instead of turning deploy order into a correctness condition.

Plan

  1. Use field numbers or stable names as the compatibility contract.
  2. Reserve removed fields so future versions do not accidentally reuse old meaning.
  3. Keep unknown fields harmless and optional fields truly optional.
  4. Use text formats for human-authored config and binary formats for hot RPC or storage records.

Walkthrough

A Trade message starts with id and symbol. Version 2 adds price as field 3. Old decoders ignore field 3, new decoders treat it as optional when reading old messages, and the field number is never reused for another meaning.

60. §17.32 Data Structure Libraries

C does not ship a standard hash table, vector, queue, or tree library, so production programs either build narrow local containers or adopt a utility library. GLib is the most common broad option; specialized libraries cover balanced trees, concurrent hashes, and compressed sparse arrays.

Library | Structures | Where it fits
GLib | GArray, GHashTable, GTree, GAsyncQueue, GMainLoop | Production C apps that want containers, refcounted objects, event loops, and portable utilities.
libavl | AVL, red-black, threaded trees | Teaching and embedded-style balanced tree code where dependency size matters.
libcuckoo | Concurrent cuckoo hash table | High-concurrency exact lookup with relocation-based cuckoo hashing.
Judy arrays | Compressed sparse arrays | Large sparse integer or string-keyed sets where memory density matters.

Background

A daemon needs a map from interface name to runtime state. A hand-rolled table is easy to start but easy to get wrong on resizing, ownership, iteration during deletion, and cleanup after partial initialization.

Plan

  1. Decide key and value ownership up front; destroy callbacks should match that decision.
  2. Use a stable hash and equality function for the exact key type.
  3. Keep mutation rules clear while iterating; remove through iterator-aware APIs where available.
  4. Reach for specialized concurrent or sparse structures only when profiling shows the generic container is the bottleneck.

Walkthrough

The daemon inserts eth0 => up and eth1 => down into a GHashTable created by g_hash_table_new_full with g_str_hash, g_str_equal, and g_free as both the key and value destroy callbacks. Inserting eth1 again frees the old value (and the redundant new key) automatically, so the container encodes both lookup policy and memory ownership.

61. Minimal C Demo

These demos write small standalone programs that use the real library APIs when the matching development packages are installed, while still showing the complete compile and run commands in the sandbox output.

zstd In-Memory Roundtrip — C Demo
protobuf-c Encode / Decode — C Demo
GLib GHashTable Walkthrough — C Demo

62. Kernel Source Pointers

Area | Files / symbols | What to inspect
Kernel decompression | lib/decompress_*.c, lib/zlib_*, lib/lz4/, lib/zstd/ | How the kernel carries small decompressor implementations for boot images, filesystems, and compressed memory paths.
zswap and compressed memory | mm/zswap.c, crypto/, zpool | How compressed pages trade CPU for fewer swap writes and less memory pressure.
Netlink serialization | include/net/netlink.h, lib/nlattr.c, nla_parse | Kernel-native TLV-style message parsing that mirrors many userspace serialization concerns.
Generic maps | include/linux/hashtable.h, lib/rhashtable.c | How kernel hash tables handle bucket arrays, linked nodes, resizing, and RCU-friendly lookup.
Trees and sparse arrays | lib/rbtree.c, lib/xarray.c, include/linux/xarray.h | Kernel equivalents for balanced trees and sparse integer-indexed storage.

63. Interview Prep

When would you choose LZ4 over zstd?

Choose LZ4 when latency and CPU time dominate and a modest ratio is enough. Choose zstd when you can spend more CPU for a better ratio or need dictionaries and tunable compression levels.

Why is gzip still everywhere if newer codecs are better?

DEFLATE compatibility is universal across HTTP, archives, package formats, and old tooling. In many systems, deployment compatibility matters more than peak compression ratio.

What is the protobuf compatibility rule interviewers expect?

Field numbers are the wire contract. Do not change a field number's meaning, do not reuse removed numbers, add new fields as optional or backward-compatible, and make old readers safely ignore unknown fields.

Why use GLib containers instead of plain arrays and structs?

GLib gives tested resize, lookup, iteration, reference, queue, and cleanup behavior. It is valuable when ownership and error-path cleanup matter more than removing one dependency.

How is a userspace hash table similar to kernel hash tables?

Both start with a hash function, bucket selection, collision handling, and ownership rules. The kernel adds constraints such as RCU-safe traversal, lock granularity, allocation context, and no blocking in atomic paths.

64. §17.33–17.34 Sanitizers, Build Systems, and pkg-config

Sanitizers move memory, thread, and undefined-behavior bugs from production crashes into reproducible test failures by adding compiler checks and runtime shadow state. Build systems decide how those flags, dependency headers, link libraries, and generated backend files reach every translation unit.

65. Key Data Structures

Sanitizers are easiest to compare by bug class, runtime cost, and whether the program must be rebuilt. Use ASan/UBSan in routine CI, add TSan for focused concurrency tests, and keep Valgrind for binaries you cannot rebuild.

Tool | Catches | Cost | Recompile? | Practical use
ASan | heap/stack/global out-of-bounds, use-after-free, leaks with LSan | ~2x CPU, ~2-3x memory | Yes | Best first pass for memory safety bugs in tests and fuzzers.
MSan | use of uninitialized memory | ~3x CPU, high memory | Yes; Clang and fully instrumented deps | Powerful but requires clean instrumented dependency trees.
TSan | data races and lock-order issues | ~5-15x CPU, high memory | Yes | Use in focused concurrency tests; false positives can come from custom atomics.
UBSan | signed overflow, bad shifts, null misuse, invalid casts | low to medium | Yes | Good always-on CI target, often combined with ASan.
valgrind memcheck | invalid reads/writes, leaks, uninitialized reads | ~20x CPU | No | Useful for third-party binaries or when rebuilding is hard.
helgrind / drd | thread races and mutex misuse | high | No | Valgrind race detectors for pthread-heavy programs.
cachegrind / massif | cache behavior and heap growth | high | No | Profiling tools, not correctness sanitizers.
clang-tidy / cppcheck / scan-build | suspicious code patterns before runtime | build-time only | No runtime | Static analysis finds paths tests may not execute.

Build tools form a pipeline: a project description becomes concrete compile and link commands, often through a fast backend such as Ninja. Package discovery fills in the non-portable details: include paths, library names, library directories, and sometimes feature versions.

Tool | Config model | Backend | Speed profile | Where it fits
Make | Makefile rules and variables | make | fast incremental, no configure unless written | Small C projects, kernel-style trees, explicit dependency control.
CMake | CMakeLists.txt meta build | Make, Ninja, IDE generators | medium configure, fast with Ninja | Portable libraries/apps that need package discovery and IDE support.
Meson | meson.build declarative DSL | Ninja | fast configure and builds | Modern C/C++ projects that value simple dependency declarations.
autotools | configure.ac, Makefile.am, libtool | Makefiles | slow configure, portable output | GNU and legacy Unix packages with many platform probes.
Ninja | generated build.ninja | executor only | very fast | Backend used by CMake and Meson; usually not hand-written.
pkg-config | .pc metadata | compiler/linker flags | query-time only | Library discovery: --cflags, --libs, --modversion, PKG_CONFIG_PATH.

A .pc file is a small build metadata record. pkg-config --cflags --libs openssl turns that metadata into the exact compiler and linker flags needed on this machine or sysroot.

66. Core Mechanism

Background

A test suite passes until one path reads a pointer after free. Without instrumentation the program may print an old value, corrupt a later allocation, or crash far away from the real bug.

Plan

  1. Compile with -fsanitize=address -g -fno-omit-frame-pointer so every relevant load and store is checked.
  2. Let ASan wrap allocations with redzones and poison freed memory in shadow state.
  3. Run the normal test; the stale load consults shadow memory before reading the actual heap byte.
  4. Abort immediately with allocation, free, and failing access stack traces.

Walkthrough

A heap allocation returns p, the program writes 123, and then free(p) poisons that address range. When *p is evaluated later, the compiler-inserted check sees poisoned shadow bytes and stops at the exact use-after-free instead of letting execution continue.

67. Minimal C Demo

The first demo creates and runs a deliberate heap use-after-free under AddressSanitizer. The second writes a tiny CMake project that discovers OpenSSL with find_package and also prints the raw pkg-config flags when available.

AddressSanitizer Heap UAF — C Demo
CMake and pkg-config Discovery — C Demo

68. Kernel Source Pointers

Area | Files / symbols | What to inspect
Kernel sanitizers | lib/Kconfig.kasan, mm/kasan/, CONFIG_KASAN | KASAN applies the same shadow-memory idea to kernel heap, stack, and globals.
Race detection | kernel/kcsan/, CONFIG_KCSAN | KCSAN samples memory accesses to find data races in kernel code.
Undefined behavior | lib/ubsan.c, CONFIG_UBSAN | Runtime checks for undefined operations such as overflow and invalid shifts.
Kernel build | Makefile, scripts/Kbuild.include, scripts/Makefile.* | The Linux kernel uses Kbuild on top of GNU Make rather than CMake or Meson.
Tool checks | scripts/Makefile.compiler, scripts/cc-version.sh | How kernel builds probe compiler features and version-specific flags.

69. Interview Prep

What is the difference between ASan and valgrind memcheck?

ASan needs recompilation and uses compiler-inserted checks plus a runtime, so it is much faster and better for CI. Valgrind runs unmodified binaries through dynamic instrumentation, so it is slower but useful when source or rebuilds are unavailable.

Why does TSan complain about code that uses custom atomics?

TSan understands compiler-recognized atomic operations and pthread synchronization. If a project hides synchronization in inline assembly or unsupported primitives, the tool may miss the happens-before edge and report a race.

What does pkg-config actually return?

It reads .pc files and prints compiler flags from Cflags and linker flags from Libs. PKG_CONFIG_PATH adds search directories, which matters for custom prefixes and cross sysroots.

CMake versus Make: what is the real distinction?

Make is a rule executor driven by Makefiles. CMake is a meta-build system: it configures compilers, finds packages, and generates Makefiles, Ninja files, or IDE projects.

Which sanitizer would you enable first in a C service CI job?

Start with ASan plus UBSan on unit and integration tests because they catch common memory safety and UB bugs at tolerable cost. Add TSan as a separate, slower job for concurrency-heavy paths.