§ 10.1-10.5 XDP Architecture, AF_XDP, Maps, and Use Cases
Early packet processing in the Linux RX path: XDP modes, AF_XDP rings, BPF maps, and production uses such as DDoS defense, Katran, and Cilium.
1. Overview
XDP runs an eBPF program at the earliest practical point in packet receive: inside the NIC driver's NAPI poll path, after the RX descriptor is ready but before the kernel allocates a sk_buff. That is the key trade: XDP keeps Linux networking available for passed packets while letting simple decisions happen before the expensive stack.
The program returns a verdict, and that verdict decides whether the packet enters the normal stack, is dropped, is transmitted back out the NIC it arrived on, or is redirected to another kernel object such as a devmap, cpumap, or AF_XDP socket.
XDP vs DPDK
| Aspect | XDP | DPDK |
|---|---|---|
| Kernel integration | Retains routing, netfilter, sockets for XDP_PASS | Full userspace bypass; app owns the datapath |
| Performance shape | Excellent for early drop, redirect, sampling, L4 steering | Best for full packet processing pipelines and virtual switches |
| CPU model | No mandatory dedicated polling cores | Dedicated PMD cores are normal |
| Programming model | Verifier-limited eBPF, bounded loops, explicit packet bounds checks | Unrestricted C in userspace with large library ecosystem |
| Sweet spot | DDoS mitigation, L4 load balancing, observability, kernel-integrated policy | NFV, OVS-DPDK, telecom packet pipelines, full bypass appliances |
2. Key Data Structures
The XDP program sees an xdp_buff, not a socket buffer. The most important fields are the packet bounds: every header access must prove data + size <= data_end or the BPF verifier rejects the program.
| Structure | Field | Purpose |
|---|---|---|
| `xdp_buff` | data / data_end | Verifier-checked packet byte range visible to the program. |
| `xdp_buff` | data_meta | Optional metadata space used to pass information to later hooks. |
| `xdp_rxq_info` | dev, queue_index | Ingress device and RX queue identity. |
| `xsk_buff_pool` | UMEM chunks, DMA mapping, rings | AF_XDP buffer pool used by zero-copy drivers. |
| BPF map | key/value storage | Shared policy and state between userspace control plane and XDP datapath. |
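The bounds rule on `data` and `data_end` can be modelled in ordinary userspace C. The sketch below is not an XDP program: `pkt_view`, `in_bounds`, and `parse_ipv4_src` are hypothetical names, and it assumes an untagged Ethernet frame carrying IPv4. It only shows the habit the verifier enforces, which is to prove a header fits before reading any byte of it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two xdp_buff fields the verifier tracks. */
struct pkt_view {
    const uint8_t *data;      /* first packet byte */
    const uint8_t *data_end;  /* one past the last packet byte */
};

#define ETH_HDR_LEN   14u
#define IP4_HDR_LEN   20u      /* fixed part of the IPv4 header */
#define ETHERTYPE_IP4 0x0800u

/* Same condition an XDP program writes as: data + off + len <= data_end. */
static int in_bounds(const struct pkt_view *p, size_t off, size_t len)
{
    size_t avail = (size_t)(p->data_end - p->data);
    return off <= avail && len <= avail - off;
}

/*
 * Returns 1 and stores the IPv4 source address (host byte order) in *src
 * when the frame is IPv4 and long enough; returns 0 otherwise.
 */
static int parse_ipv4_src(const struct pkt_view *p, uint32_t *src)
{
    /* Check 1: the whole Ethernet header must be inside the packet. */
    if (!in_bounds(p, 0, ETH_HDR_LEN))
        return 0;
    const uint8_t *eth = p->data;
    uint16_t ethertype = (uint16_t)((eth[12] << 8) | eth[13]);
    if (ethertype != ETHERTYPE_IP4)
        return 0;

    /* Check 2: the fixed IPv4 header must be inside the packet too. */
    if (!in_bounds(p, ETH_HDR_LEN, IP4_HDR_LEN))
        return 0;
    const uint8_t *ip = eth + ETH_HDR_LEN;
    *src = ((uint32_t)ip[12] << 24) | ((uint32_t)ip[13] << 16) |
           ((uint32_t)ip[14] << 8)  |  (uint32_t)ip[15];
    return 1;
}
```

A truncated frame fails the second check and is rejected without any out-of-range read, which is the behaviour the verifier requires a real program to prove statically.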
AF_XDP adds a userspace-visible memory allocator and four rings. Fill and completion rings move buffer ownership; RX and TX rings move packet descriptors.
BPF Map Types Used by XDP
| Map Type | Typical Key / Value | Use Case |
|---|---|---|
| `BPF_MAP_TYPE_HASH` | 5-tuple, IP, tenant id → action | Blocklists, session tables, service policy. |
| `BPF_MAP_TYPE_LPM_TRIE` | prefix length + IP → route or policy | Longest-prefix route or CIDR block matching. |
| `BPF_MAP_TYPE_DEVMAP` | index → netdev ifindex | XDP_REDIRECT to another NIC. |
| `BPF_MAP_TYPE_CPUMAP` | CPU id → queue config | Move packet processing to another CPU. |
| `BPF_MAP_TYPE_XSKMAP` | RX queue id → AF_XDP socket | Redirect packets into userspace through AF_XDP. |
| `BPF_MAP_TYPE_PERCPU_ARRAY` | index → per-CPU counter | Stats without cache-line contention. |
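To make the "prefix length + IP" key shape concrete, here is a small userspace model of longest-prefix matching. It is a sketch under stated assumptions: it scans a flat array instead of walking a trie, handles IPv4 only, and `lpm_rule`, `prefix_mask`, and `lpm_lookup` are hypothetical names rather than kernel or libbpf APIs. The matching rule itself is the same one the map type applies: among all prefixes that cover the address, the longest wins.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rule entry mirroring "prefix length + IP -> value". */
struct lpm_rule {
    uint32_t prefixlen;  /* leading bits that must match; assumed 0..32 */
    uint32_t addr;       /* IPv4 prefix in host byte order */
    int      action;     /* caller-defined value, e.g. 0 = pass, 1 = drop */
};

static uint32_t prefix_mask(uint32_t prefixlen)
{
    /* A 32-bit shift by 32 is undefined in C, so /0 is handled separately. */
    return prefixlen == 0 ? 0u : 0xFFFFFFFFu << (32u - prefixlen);
}

/* Returns the action of the longest matching prefix, or default_action. */
static int lpm_lookup(const struct lpm_rule *rules, size_t n,
                      uint32_t ip, int default_action)
{
    int best_len = -1;
    int action = default_action;

    for (size_t i = 0; i < n; i++) {
        uint32_t mask = prefix_mask(rules[i].prefixlen);
        int len = (int)rules[i].prefixlen;

        if ((ip & mask) == (rules[i].addr & mask) && len > best_len) {
            best_len = len;
            action = rules[i].action;
        }
    }
    return action;
}
```

With rules for 10.0.0.0/8 (drop) and 10.1.2.0/24 (pass), the address 10.1.2.5 matches both and takes the /24 action, while 10.99.99.99 only matches the /8.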
3. Core Mechanism
XDP has three attach modes on the RX path. Offload mode runs on capable NIC hardware, native mode runs in the driver before sk_buff, and generic mode runs after sk_buff allocation so it works almost everywhere but loses the main early-drop advantage.
Background
Suppose a host is receiving 20 Mpps of spoofed UDP packets. If every packet allocates a sk_buff, runs netfilter, and reaches UDP demux before being dropped, the CPU budget is spent on attack traffic. XDP moves the reject decision before those allocations.
Plan
- Attach a native XDP program to the ingress NIC.
- Parse Ethernet and IP headers with explicit `data_end` checks.
- Look up source IP, prefix, or service tuple in a BPF map maintained by userspace.
- Return `XDP_DROP` for attack traffic, `XDP_PASS` for normal Linux traffic, or `XDP_REDIRECT` for fast forwarding.
Walkthrough
In AF_XDP zero-copy mode, the userspace process gives the kernel empty UMEM chunks, the NIC DMA-writes packet bytes directly into those chunks, and XDP redirects the descriptor to the socket. The userspace process receives a descriptor, not a copied packet.
| Step | State Change | Why It Matters |
|---|---|---|
| 1 | Userspace posts UMEM addresses to fill ring. | Kernel has empty buffers ready for RX. |
| 2 | NIC DMA-writes packet into UMEM chunk. | No kernel-to-user copy is needed later. |
| 3 | XDP program calls `bpf_redirect_map()` on an XSKMAP, keyed by RX queue id. | Packet is delivered to the AF_XDP socket bound to that queue. |
| 4 | Userspace reads RX ring descriptor and parses UMEM bytes. | The packet path is a descriptor handoff, not a socket receive path. |
| 5 | Userspace recycles the chunk through fill or transmits it through TX. | Buffer ownership stays explicit and batched. |
BPF maps are the control plane boundary: userspace changes policy with map updates, while the XDP program stays attached and reads those maps on every packet.
4. Minimal C Demo
This first demo models the AF_XDP setup skeleton: allocate UMEM, configure rings, bind a socket to an interface queue, publish fill buffers, and receive one descriptor.
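A plain-C model of that flow might look like the sketch below. It makes several simplifying assumptions: everything runs in one thread with no memory barriers, the fill ring reuses the RX descriptor type (the real fill ring carries bare 64-bit UMEM addresses), socket creation and bind are omitted, and `model_rx` stands in for NIC DMA plus the XDP redirect. All type and function names here (`xsk_model`, `post_fill`, `model_rx`, and so on) are hypothetical and are not the kernel or libxdp API.

```c
#include <stdint.h>
#include <string.h>

#define CHUNK_SIZE 2048u
#define NUM_CHUNKS 4u
#define RING_SIZE  4u   /* power of two so index masking works */

struct desc { uint64_t addr; uint32_t len; };  /* addr = byte offset into UMEM */

struct ring {
    uint32_t prod, cons;           /* free-running producer/consumer indices */
    struct desc slots[RING_SIZE];
};

static int ring_push(struct ring *r, struct desc d)
{
    if (r->prod - r->cons == RING_SIZE)
        return 0;                                  /* ring full */
    r->slots[r->prod++ & (RING_SIZE - 1)] = d;
    return 1;
}

static int ring_pop(struct ring *r, struct desc *d)
{
    if (r->prod == r->cons)
        return 0;                                  /* ring empty */
    *d = r->slots[r->cons++ & (RING_SIZE - 1)];
    return 1;
}

struct xsk_model {
    uint8_t umem[CHUNK_SIZE * NUM_CHUNKS];         /* the registered buffer area */
    struct ring fill, rx;
};

/* Userspace side: hand every empty chunk to the kernel via the fill ring. */
static void post_fill(struct xsk_model *x)
{
    for (uint32_t i = 0; i < NUM_CHUNKS; i++)
        ring_push(&x->fill, (struct desc){ .addr = (uint64_t)i * CHUNK_SIZE, .len = 0 });
}

/* Kernel/NIC stand-in: take a fill entry, write the bytes, publish on RX. */
static int model_rx(struct xsk_model *x, const void *pkt, uint32_t len)
{
    struct desc d;

    if (len > CHUNK_SIZE || !ring_pop(&x->fill, &d))
        return 0;                                  /* no buffer posted: packet is lost */
    memcpy(x->umem + d.addr, pkt, len);            /* models the DMA write into UMEM */
    d.len = len;
    return ring_push(&x->rx, d);
}
```

After `post_fill()` and one `model_rx()` call, userspace pops a descriptor from `rx`, reads the bytes at `umem + addr`, and pushes the same descriptor back onto `fill` to recycle the chunk. Only the descriptor moves between the two sides; the packet bytes stay where they were written.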
This second demo models an XDP program using a BPF hash map as an IP blocklist. Real XDP code uses helper calls and verifier-safe pointer arithmetic; the control-flow idea is the same.
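The control flow can be sketched in userspace C as follows. This is a model, not loadable BPF: a fixed-size open-addressing table stands in for `BPF_MAP_TYPE_HASH`, the caller is assumed to have already extracted the source address with bounds checks, and `blocklist`, `blocklist_add`, and `filter_packet` are hypothetical names. The two verdict constants deliberately reuse the numeric values of `XDP_DROP` (1) and `XDP_PASS` (2).

```c
#include <stddef.h>
#include <stdint.h>

enum verdict { V_DROP = 1, V_PASS = 2 };  /* same values as XDP_DROP / XDP_PASS */

#define SLOTS 64u  /* power of two; hash_ip() below yields 6 bits to match */

struct blocklist {
    uint32_t key[SLOTS];   /* blocked IPv4 addresses, host byte order */
    uint8_t  used[SLOTS];  /* 1 when the slot holds a key */
};

static uint32_t hash_ip(uint32_t ip)
{
    return (ip * 2654435761u) >> 26;  /* multiplicative hash, top 6 bits */
}

/* Control-plane side: the analogue of a userspace map update. */
static int blocklist_add(struct blocklist *b, uint32_t ip)
{
    uint32_t i = hash_ip(ip);

    for (uint32_t n = 0; n < SLOTS; n++, i = (i + 1) & (SLOTS - 1)) {
        if (!b->used[i] || b->key[i] == ip) {
            b->key[i] = ip;
            b->used[i] = 1;
            return 1;
        }
    }
    return 0;  /* table full */
}

static int blocklist_has(const struct blocklist *b, uint32_t ip)
{
    uint32_t i = hash_ip(ip);

    for (uint32_t n = 0; n < SLOTS; n++, i = (i + 1) & (SLOTS - 1)) {
        if (!b->used[i])
            return 0;  /* hit an empty slot: key was never inserted */
        if (b->key[i] == ip)
            return 1;
    }
    return 0;
}

/* Datapath side: one lookup per packet, then a verdict. */
static enum verdict filter_packet(const struct blocklist *b, uint32_t src_ip)
{
    return blocklist_has(b, src_ip) ? V_DROP : V_PASS;
}
```

The split mirrors the real design: `blocklist_add` plays the role of the userspace control plane, and `filter_packet` plays the role of the attached program, which never changes while the table contents do.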
5. Kernel Source Pointers
| Path / Function | What to Read |
|---|---|
| `include/net/xdp.h` | struct xdp_buff, struct xdp_frame, RX queue metadata. |
| `net/core/dev.c` | Generic RX path, NAPI receive flow, generic XDP integration. |
| `net/core/filter.c` | BPF helper implementations including redirect behavior. |
| `kernel/bpf/devmap.c` | DEVMAP redirect path used by XDP_REDIRECT. |
| `kernel/bpf/cpumap.c` | CPUMAP redirect and per-CPU packet handoff. |
| `net/xdp/xsk.c` | AF_XDP socket lifecycle, bind, sendmsg, poll, mmap. |
| `net/xdp/xsk_queue.h` | AF_XDP ring producer/consumer mechanics. |
| `net/xdp/xsk_buff_pool.c` | UMEM registration, DMA mapping, zero-copy buffer pool. |
| `Documentation/networking/af_xdp.rst` | Official AF_XDP architecture and userspace API notes. |
| `tools/testing/selftests/bpf/` | Small runnable examples for XDP programs, maps, and AF_XDP behavior. |
6. Use Cases
Katran uses XDP as a high-performance L4 load balancer: match a VIP, choose a backend with consistent hashing, encapsulate the packet, and forward before the normal host stack handles it.
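The per-packet half of that backend choice can be sketched as a flow hash followed by a table lookup. This is an illustrative model only, not Katran's code: it assumes the lookup table has already been filled by a control plane (the consistent-hashing table construction is omitted entirely), uses FNV-1a as an arbitrary hash, and `flow`, `flow_hash`, and `pick_backend` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 5-tuple; addresses in host byte order. */
struct flow {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
};

#define TABLE_SIZE 13u  /* small prime for illustration; real tables are far larger */

/* Feed the low `bytes` bytes of v into a running FNV-1a hash. */
static uint32_t fnv1a_step(uint32_t h, uint32_t v, unsigned bytes)
{
    for (unsigned i = 0; i < bytes; i++) {
        h ^= (v >> (8 * i)) & 0xFFu;
        h *= 16777619u;
    }
    return h;
}

static uint32_t flow_hash(const struct flow *f)
{
    uint32_t h = 2166136261u;  /* FNV-1a 32-bit offset basis */

    h = fnv1a_step(h, f->src_ip, 4);
    h = fnv1a_step(h, f->dst_ip, 4);
    h = fnv1a_step(h, f->src_port, 2);
    h = fnv1a_step(h, f->dst_port, 2);
    h = fnv1a_step(h, f->proto, 1);
    return h;
}

/* table[i] holds a backend id; every packet of a flow lands on the same slot. */
static int pick_backend(const int table[TABLE_SIZE], const struct flow *f)
{
    return table[flow_hash(f) % TABLE_SIZE];
}
```

Because the hash depends only on the 5-tuple, every packet of a connection maps to the same table slot without per-flow state. How much traffic moves when a backend is added or removed depends on how the table is rebuilt, which is the part a consistent-hashing scheme is responsible for and which this sketch leaves out.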
Cilium uses BPF maps as its datapath state store for identities, services, policy, and connection tracking. XDP is useful for the earliest drop or redirect decisions, while other BPF hooks handle paths that need richer context.
- DDoS mitigation: drop attack packets before `sk_buff`, conntrack, or socket lookup.
- Monitoring and sampling: redirect selected traffic to AF_XDP for userspace analysis.
- Hybrid migration: keep non-XDP traffic on Linux or OVS while moving hot paths to XDP or AF_XDP.
7. Interview Prep
| Question | Concise Answer |
|---|---|
| What is XDP and where exactly does it run? | XDP is an eBPF hook in the RX path, normally inside the NIC driver's NAPI poll before sk_buff allocation. Generic mode runs later after an skb exists. |
| What are the four common XDP verdicts? | XDP_PASS continues into Linux networking, XDP_DROP discards, XDP_TX sends back out the same NIC, and XDP_REDIRECT sends to another device, CPU, or AF_XDP socket. |
| Native mode vs generic mode? | Native mode needs driver support and runs before skb allocation, so it is fast. Generic mode works on most NICs but runs after skb allocation, making it useful for testing and fallback rather than maximum performance. |
| What is AF_XDP? | An address-family socket that lets XDP redirect packets to userspace through UMEM and four rings: fill, completion, RX, and TX. |
| How does AF_XDP zero-copy work? | Userspace registers UMEM, posts empty chunks to the fill ring, and the NIC DMA-writes directly into those chunks. The kernel delivers descriptors to the RX ring, so userspace parses packet bytes without memcpy. |
| Which BPF maps matter for XDP? | Hash maps for exact policy, LPM tries for prefixes, devmaps for NIC redirect, cpumaps for CPU redirect, xskmaps for AF_XDP sockets, and per-CPU maps for counters. |
| How does Katran use XDP? | It performs L4 load balancing at the edge: parse packet, look up VIP and backend maps, choose a backend with consistent hashing, encapsulate (IP-in-IP or GUE), and send the packet back out with XDP_TX. |
| How does Cilium use XDP? | Cilium uses BPF maps for endpoint, service, policy, and conntrack state. XDP can enforce early drop or redirect decisions; tc-BPF and socket hooks handle richer policy paths. |
| When should you choose XDP instead of DPDK? | Choose XDP when you need kernel integration, safe verified programs, early drop, redirect, or observability. Choose DPDK when the whole datapath belongs in userspace and needs a full C packet-processing framework. |
| What are XDP-to-OVS migration trade-offs? | XDP can peel off hot early-drop or steering paths while OVS keeps feature-rich switching, OpenFlow, tunnels, and operational tooling. The cost is split datapath ownership: maps, counters, troubleshooting, and fallback behavior must stay consistent. |