§ 1.2 Kernel Boot Process
From the first C instruction in start_kernel() to userspace PID 1: early memory, CPU init, allocators, VFS, scheduler, and network stack bootstrap.
1. Overview
After the bootloader loads the compressed image and the kernel's self-extracting stub unpacks it, execution reaches startup_64 (x86-64) or the head.S entry code (ARM64). A short stretch of assembly sets up initial page tables, enables paging/the MMU, and branches to start_kernel() — the first C function of the kernel proper. From here the kernel must bootstrap everything from scratch: memory allocators, interrupt handling, scheduling, filesystems, and the network stack, all in a carefully ordered sequence where each subsystem may use only services initialized before it.
The entire sequence runs on a single CPU with interrupts disabled until local_irq_enable() is explicitly called. When rest_init() is reached, the kernel spawns PID 1 (kernel_init) and PID 2 (kthreadd), then becomes the idle thread (PID 0).
2. Key Data Structures
struct boot_params — x86 Boot Protocol
On x86, the bootloader fills the 4 KB boot_params structure (the "zero page") and, with the 64-bit boot protocol, passes its physical address in register rsi. setup_arch() parses it in the very first lines of kernel C code. The most important sub-structure is the e820_table — the firmware-provided physical memory map.
struct e820_entry — Physical Memory Map
Each entry describes one contiguous physical region. BIOS firmware fills these via INT 0x15, EAX=0xE820; on UEFI systems the bootloader or EFI stub converts the EFI memory map into the same format. The kernel uses this map to know which RAM is free, which is reserved (MMIO, ACPI tables), and which is unusable. It is the foundation for memblock.
| Type value | Meaning | Kernel action |
|---|---|---|
| 1 — E820_RAM | Usable system RAM | Add to memblock; later given to buddy allocator |
| 2 — E820_RESERVED | Reserved by firmware / MMIO | Never touched; ioremap() maps these for drivers |
| 3 — E820_ACPI | ACPI tables — readable RAM | Freed after ACPI init; reclaimed as normal RAM |
| 4 — E820_NVS | ACPI NVS — firmware needs it across S3 | Never freed; survives suspend/resume cycle |
| 5 — E820_UNUSABLE | Broken / bad RAM | Excluded from all allocators permanently |
memblock — The Boot-Time Allocator
Before the buddy allocator exists, something simpler must manage physical memory. memblock is that allocator: it keeps two sorted arrays of physical ranges — memory[] (all RAM the firmware reported) and reserved[] (ranges already in use); free memory is whatever lies in memory[] but not in reserved[]. Allocation is a simple range split, and individual frees are rare — nearly everything is released in one pass when mem_init() hands the remaining ranges to the buddy allocator.
| Field (in struct memblock) | Type | Purpose |
|---|---|---|
| bottom_up | bool | Allocate from low addresses when true (default is top-down) |
| current_limit | phys_addr_t | Upper bound for allocations (lowmem cap) |
| memory | struct memblock_type | Array of available physical ranges from e820 |
| reserved | struct memblock_type | Array of ranges already allocated (kernel, initrd…) |
| memory.regions[] | struct memblock_region | Each region: base, size, flags, NUMA node id |
3. Core Mechanism — From memblock to Buddy
Background: The kernel needs to allocate memory for its own data structures before the buddy allocator exists (page tables, IRQ descriptors, CPU stacks…). But it also needs to give the buddy allocator the correct list of free physical pages once it is ready. memblock is the bridge: it answers early allocation requests, then transfers ownership to the buddy allocator in a single pass during mem_init().
Plan:
- setup_arch() calls e820__memblock_setup(): every E820_RAM range becomes a memblock.memory entry; the kernel image, initrd, and ACPI tables are immediately reserved.
- Subsystems allocate from memblock via memblock_alloc(size, align). The allocator finds the last free range that fits and splits it, adding the allocated portion to memblock.reserved.
- mm_init() calls free_area_init() to set up zone descriptors (DMA, Normal, HighMem on 32-bit) without touching actual pages yet.
- mem_init() iterates every memblock.memory range, skips reserved sub-ranges, and calls __free_pages_memory() on the gaps — this puts pages into the buddy free lists.
- After mem_init() returns, the buddy allocator is live. memblock data can still be read (for NUMA queries) but is no longer used for allocation.
Example — 512 MB RAM, kernel at 0x1000000:
| Step | memblock.memory | memblock.reserved |
|---|---|---|
| After e820__memblock_setup() | [0x0–0x9FFFF] + [0x100000–0x1FFFFFFF] | (empty) |
| Kernel image reserved | (unchanged) | [0x1000000–0x1FFFFFF] kernel text+data+bss |
| initrd reserved | (unchanged) | + [0x4000000–0x4200000] initramfs |
| memblock_alloc(4 KB) for IDT | (unchanged) | + [0x1FF000–0x1FFFFF] IDT table |
| mem_init() transfers to buddy | (read-only from now on) | gaps between reserved ranges → buddy free lists |
After mem_init(), the buddy allocator owns roughly 493 MB of free pages (512 MB minus the kernel image, initrd, and firmware holes), split across order-0 through order-10 free lists. The kernel image and initrd are permanently reserved and never returned to the allocator.
4. Minimal C Demos
Demo A — memblock Range Allocator
memblock keeps a sorted array of free ranges and a sorted array of reserved ranges. Allocation is a top-down scan: find the last free range that fits, mark the tail as reserved, return its address.
Demo B — e820 Memory Map Parser
setup_arch() walks the e820 table from boot_params.e820_table and calls memblock_add() for every E820_RAM entry. The demo below simulates this scan, computing total usable RAM and the largest contiguous free region — the same logic as e820__memblock_setup().
5. Kernel Source Pointers
| File / Function | What it does |
|---|---|
init/main.c :: start_kernel() | First C function; calls every subsystem init in order |
arch/x86/kernel/setup.c :: setup_arch() | x86 arch init: parse boot_params, e820, KASLR, NUMA |
arch/arm64/kernel/setup.c :: setup_arch() | ARM64 arch init: unflatten DTB, map memory, setup CPU |
mm/memblock.c :: memblock_alloc() | Boot-time allocator; split free range, add to reserved list |
arch/x86/kernel/e820.c :: e820__memblock_setup() | Walk e820 table and call memblock_add() for each RAM range |
mm/page_alloc.c :: free_area_init() | Initialise per-zone free_area[MAX_ORDER] buddy lists |
mm/memblock.c :: memblock_free_all() | Transfer memblock's remaining free ranges to the buddy allocator; called from each arch's mem_init()
mm/slab_common.c :: kmem_cache_init() | Bootstrap slab/slub allocator using early buddy pages |
kernel/sched/core.c :: sched_init() | Create per-CPU run-queues; init CFS, RT, deadline classes |
fs/dcache.c :: vfs_caches_init() | Allocate dentry/inode hash tables; register bdev/char filesystems |
init/main.c :: rest_init() | Spawn kernel_init (PID 1) + kthreadd (PID 2); become idle loop |
init/init_task.c :: init_task | Static compile-time definition of PID 0 (swapper/idle) |
6. Interview Prep
| # | Question | Concise Answer |
|---|---|---|
| Q1 | What is the first C function the kernel runs after decompression? | start_kernel() in init/main.c. The assembly stubs in startup_32/startup_64 (x86) or head.S (ARM64) set up page tables and stack, then branch to it. It runs with interrupts off on a single CPU. |
| Q2 | What does memblock do and why is it needed before the buddy allocator? | memblock is a static boot-time allocator: two sorted arrays of physical ranges (available and reserved). It answers allocation requests before the buddy page allocator is live, then transfers all remaining free ranges to the buddy allocator in a single pass during mem_init(). |
| Q3 | Walk me through the full boot sequence from BIOS to PID 1. | BIOS/UEFI → bootloader → decompress kernel → startup_64/head.S (MMU on) → start_kernel() → setup_arch (e820/DTB, memblock) → mm_init (buddy) → kmem_cache_init (slab) → sched_init → vfs_caches_init → rest_init → kernel_init thread mounts rootfs → exec /sbin/init (PID 1). |
| Q4 | Why does start_kernel() run with interrupts disabled? | Interrupts require a live IDT (x86) or interrupt vector table (ARM64), per-CPU stacks, and a softirq mechanism — none of which exist yet. trap_init() and init_IRQ() set these up during start_kernel(); local_irq_enable() is called only after all interrupt infrastructure is ready. |
| Q5 | What is PID 0 and how does it relate to the idle thread? | PID 0 is init_task, a statically compiled task_struct. It is the bootstrap task that runs start_kernel(). After rest_init() spawns PID 1 and PID 2, init_task becomes the per-CPU idle thread: cpu_idle_loop() runs HLT (x86) / WFI (ARM) whenever no other task is runnable. |