Part I — Boot

§ 1.2 Kernel Boot Process

From the first C instruction in start_kernel() to userspace PID 1: early memory, CPU init, allocators, VFS, scheduler, and network stack bootstrap.

1. Overview

After the bootloader decompresses the kernel and jumps to startup_64 (x86) or _text (ARM64), a few dozen assembly instructions enable paging and branch to start_kernel() — the first C function ever called. From here the kernel must bootstrap everything from scratch: memory allocators, interrupt handling, scheduling, filesystems, and the network stack, all in a carefully ordered sequence where each subsystem may only use services already initialized before it.

The entire sequence runs on a single CPU with interrupts disabled until local_irq_enable() is explicitly called. When rest_init() is reached, the kernel spawns PID 1 (kernel_init) and PID 2 (kthreadd), then becomes the idle thread (PID 0).

2. Key Data Structures

struct boot_params — x86 Boot Protocol

On x86, the bootloader fills a 4 KB boot_params structure (the "zero page") in low physical memory and, with the 64-bit boot protocol, passes its address in register rsi. setup_arch() reads it in the very first lines of kernel C code. The most important sub-structure is the e820_table — the physical memory map.

struct e820_entry — Physical Memory Map

Each entry describes one contiguous physical region. The BIOS/UEFI firmware fills these via INT 0x15, AX=0xe820. The kernel uses this map to know which RAM is free, which is reserved (MMIO, ACPI tables), and which is unusable. It is the foundation for memblock.

Type value        | Meaning                                  | Kernel action
1 — E820_RAM      | Usable system RAM                        | Add to memblock; later given to buddy allocator
2 — E820_RESERVED | Reserved by firmware / MMIO              | Never touched; ioremap() maps these for drivers
3 — E820_ACPI     | ACPI tables — readable RAM               | Freed after ACPI init; reclaimed as normal RAM
4 — E820_NVS      | ACPI NVS — firmware needs it across S3   | Never freed; survives suspend/resume cycle
5 — E820_UNUSABLE | Broken / bad RAM                         | Excluded from all allocators permanently

memblock — The Boot-Time Allocator

Before the buddy allocator exists there must be a simpler allocator to manage memory. memblock is that allocator: it keeps two sorted arrays of physical ranges — memory[] (available) and reserved[] (in use). Allocation is a simple range split; freeing is rare (memblock_free() exists, mainly for early error paths), and the bulk is released in one pass when mem_init() hands the remaining free ranges to the buddy allocator.

Field (in struct memblock) | Type                   | Purpose
bottom_up                  | bool                   | Allocate from low addresses when true; default is top-down (false)
current_limit              | phys_addr_t            | Upper bound for allocations (lowmem cap)
memory                     | struct memblock_type   | Array of available physical ranges from e820
reserved                   | struct memblock_type   | Array of ranges already allocated (kernel, initrd…)
memory.regions[]           | struct memblock_region | Each region: base, size, flags, NUMA node id

3. Core Mechanism — From memblock to Buddy

Background: The kernel needs to allocate memory for its own data structures before the buddy allocator exists (page tables, IRQ descriptors, CPU stacks…). But it also needs to give the buddy allocator the correct list of free physical pages once it is ready. memblock is the bridge: it answers early allocation requests, then transfers ownership to the buddy allocator in a single pass during mem_init().

Plan:

  1. setup_arch() calls e820__memblock_setup(): every E820_RAM range becomes a memblock.memory entry; the kernel image, initrd, and ACPI tables are immediately reserved.
  2. Subsystems allocate from memblock via memblock_alloc(size, align). The allocator finds the last free range that fits and splits it, adding the allocated portion to memblock.reserved.
  3. Before mm_init() runs, the arch code (paging_init()/zone_sizes_init() on x86) calls free_area_init() to set up zone descriptors (DMA, Normal, HighMem on 32-bit) without touching actual pages yet.
  4. mem_init() calls memblock_free_all(), which iterates every memblock.memory range, skips reserved sub-ranges, and calls __free_pages_memory() on the gaps — this puts pages into the buddy free lists.
  5. After mem_init() returns, the buddy allocator is live. memblock data can still be read (for NUMA queries) but is no longer used for allocation.

Example — 512 MB RAM, kernel at 0x1000000:

Step                           | memblock.memory                        | memblock.reserved
After e820__memblock_setup()   | [0x0–0x9FFFF] + [0x100000–0x1FFFFFFF]  | (empty)
Kernel image reserved          | (unchanged)                            | [0x1000000–0x1FFFFFF] kernel text+data+bss
initrd reserved                | (unchanged)                            | + [0x4000000–0x4200000] initramfs
memblock_alloc(4 KB) for IDT   | (unchanged)                            | + [0x1FF000–0x1FFFFF] IDT table
mem_init() transfers to buddy  | (read-only from now on)                | gaps between reserved ranges → buddy free lists

After mem_init(), the buddy allocator owns roughly 493 MB of free pages (about 511.6 MB of usable RAM minus roughly 18 MB of permanent reservations), split across order-0 through order-10 free lists. The kernel image and initrd are permanently reserved and never returned to the allocator.

4. Minimal C Demos

Demo A — memblock Range Allocator

memblock keeps a sorted array of free ranges and a sorted array of reserved ranges. Allocation is a top-down scan: find the last free range that fits, mark the tail as reserved, return its address.

memblock — boot allocator simulation — C Demo

Demo B — e820 Memory Map Parser

setup_arch() walks the e820 table from boot_params.e820_table and calls memblock_add() for every E820_RAM entry. The demo below simulates this scan, computing total usable RAM and the largest contiguous free region — the same logic as e820__memblock_setup().

e820 table scan — usable RAM — C Demo

5. Kernel Source Pointers

File / Function                                   | What it does
init/main.c :: start_kernel()                     | First C function; calls every subsystem init in order
arch/x86/kernel/setup.c :: setup_arch()           | x86 arch init: parse boot_params, e820, KASLR, NUMA
arch/arm64/kernel/setup.c :: setup_arch()         | ARM64 arch init: unflatten DTB, map memory, set up CPU
mm/memblock.c :: memblock_alloc()                 | Boot-time allocator; split free range, add to reserved list
arch/x86/kernel/e820.c :: e820__memblock_setup()  | Walk e820 table and call memblock_add() for each RAM range
mm/page_alloc.c :: free_area_init()               | Initialise per-zone free_area[MAX_ORDER] buddy lists
arch/x86/mm/init_64.c :: mem_init()               | Calls memblock_free_all() to hand memblock free ranges to the buddy allocator
mm/slub.c (or mm/slab.c) :: kmem_cache_init()     | Bootstrap slab/slub allocator using early buddy pages
kernel/sched/core.c :: sched_init()               | Create per-CPU run-queues; init CFS, RT, deadline classes
fs/dcache.c :: vfs_caches_init()                  | Allocate dentry/inode hash tables; register bdev/char filesystems
init/main.c :: rest_init()                        | Spawn kernel_init (PID 1) + kthreadd (PID 2); become idle loop
init/init_task.c :: init_task                     | Static compile-time definition of PID 0 (swapper/idle)

6. Interview Prep

Q1. What is the first C function the kernel runs after decompression?
start_kernel() in init/main.c. The assembly stubs in startup_32/startup_64 (x86) or head.S (ARM64) set up page tables and stack, then branch to it. It runs with interrupts off on a single CPU.

Q2. What does memblock do and why is it needed before the buddy allocator?
memblock is a static boot-time allocator: two sorted arrays of physical ranges (available and reserved). It answers allocation requests before the buddy page allocator is live, then transfers all remaining free ranges to the buddy allocator in a single pass during mem_init().

Q3. Walk me through the full boot sequence from BIOS to PID 1.
BIOS/UEFI → bootloader → decompress kernel → startup_64/head.S (MMU on) → start_kernel() → setup_arch (e820/DTB, memblock) → mm_init (buddy) → kmem_cache_init (slab) → sched_init → vfs_caches_init → rest_init → kernel_init thread mounts rootfs → exec /sbin/init (PID 1).

Q4. Why does start_kernel() run with interrupts disabled?
Interrupts require a live IDT (x86) or interrupt vector table (ARM64), per-CPU stacks, and a softirq mechanism — none of which exist yet. trap_init() and init_IRQ() set these up during start_kernel(); local_irq_enable() is called only after all interrupt infrastructure is ready.

Q5. What is PID 0 and how does it relate to the idle thread?
PID 0 is init_task, a statically compiled task_struct. It is the bootstrap task that runs start_kernel(). After rest_init() spawns PID 1 and PID 2, init_task becomes the per-CPU idle thread: the idle loop runs HLT (x86) / WFI (ARM) whenever no other task is runnable.