Part X — XDP

§ 10.1-10.5 XDP Architecture, AF_XDP, Maps, and Use Cases

Early packet processing in the Linux RX path: XDP modes, AF_XDP rings, BPF maps, and production uses such as DDoS defense, Katran, and Cilium.

1. Overview

XDP runs an eBPF program at the earliest practical point in packet receive: inside the NIC driver's NAPI poll path, after the RX descriptor is ready but before the kernel allocates a sk_buff. That is the key trade: XDP keeps Linux networking available for passed packets while letting simple decisions happen before the expensive stack.

The program returns a verdict that decides the packet's fate: it enters the normal stack (XDP_PASS), is dropped (XDP_DROP), is sent back out the receiving NIC (XDP_TX), or is redirected (XDP_REDIRECT) to another device, CPU, or AF_XDP socket through a devmap, cpumap, or xskmap.

XDP vs DPDK

Aspect | XDP | DPDK
Kernel integration | Retains routing, netfilter, sockets for XDP_PASS | Full userspace bypass; app owns the datapath
Performance shape | Excellent for early drop, redirect, sampling, L4 steering | Best for full packet processing pipelines and virtual switches
CPU model | No mandatory dedicated polling cores | Dedicated PMD cores are normal
Programming model | Verifier-limited eBPF, bounded loops, explicit packet bounds checks | Unrestricted C in userspace with large library ecosystem
Sweet spot | DDoS mitigation, L4 load balancing, observability, kernel-integrated policy | NFV, OVS-DPDK, telecom packet pipelines, full bypass appliances

2. Key Data Structures

The XDP program sees an xdp_buff, not a socket buffer. The most important fields are the packet bounds: every header access must prove data + size <= data_end or the BPF verifier rejects the program.
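The bounds rule can be illustrated in ordinary userspace C. This is a minimal sketch, not kernel code: struct pkt_view and read_ethertype are hypothetical names that stand in for the data and data_end fields and for a header read guarded by them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two xdp_buff fields that matter here. */
struct pkt_view {
    const uint8_t *data;      /* first packet byte */
    const uint8_t *data_end;  /* one past the last packet byte */
};

#define ETH_HDR_LEN 14u

/* Returns the EtherType in host byte order, or -1 when the frame is too
 * short to hold an Ethernet header. The length test is the same condition
 * as "data + ETH_HDR_LEN > data_end" in an XDP program. */
int read_ethertype(const struct pkt_view *p)
{
    assert(p->data <= p->data_end);
    if ((size_t)(p->data_end - p->data) < ETH_HDR_LEN)
        return -1;                        /* access would cross data_end */
    return (p->data[12] << 8) | p->data[13];
}
```

In a real XDP program the same comparison is what lets the verifier accept the later loads from the header; without it the program is rejected at load time rather than failing at run time.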

Structure | Field | Purpose
xdp_buff | data / data_end | Verifier-checked packet byte range visible to the program.
xdp_buff | data_meta | Optional metadata space used to pass information to later hooks.
xdp_rxq_info | dev, queue_index | Ingress device and RX queue identity.
xsk_buff_pool | UMEM chunks, DMA mapping, rings | AF_XDP buffer pool used by zero-copy drivers.
BPF map | key/value storage | Shared policy and state between userspace control plane and XDP datapath.

AF_XDP adds a userspace-visible memory allocator and four rings. Fill and completion rings move buffer ownership; RX and TX rings move packet descriptors.
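All four rings share the same single-producer, single-consumer shape. The sketch below is a single-threaded model under assumptions: RING_SIZE, struct ring, ring_push, and ring_pop are hypothetical names, and the memory barriers that a real shared ring needs between kernel and userspace are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u   /* assumed size for the sketch; must be a power of two */

struct ring {
    uint32_t producer;          /* free-running index, written by the producer only */
    uint32_t consumer;          /* free-running index, written by the consumer only */
    uint64_t slot[RING_SIZE];   /* UMEM addresses (fill/completion) or descriptors */
};

/* Number of entries currently queued; unsigned wraparound keeps this correct. */
static uint32_t ring_used(const struct ring *r)
{
    return r->producer - r->consumer;
}

/* Producer side: returns false when the ring is full. */
bool ring_push(struct ring *r, uint64_t v)
{
    if (ring_used(r) == RING_SIZE)
        return false;
    r->slot[r->producer & (RING_SIZE - 1)] = v;
    r->producer++;
    return true;
}

/* Consumer side: returns false when the ring is empty. */
bool ring_pop(struct ring *r, uint64_t *out)
{
    assert(out != NULL);
    if (ring_used(r) == 0)
        return false;
    *out = r->slot[r->consumer & (RING_SIZE - 1)];
    r->consumer++;
    return true;
}
```

For the fill and completion rings userspace and the kernel swap producer and consumer roles, which is how buffer ownership moves in both directions without locks.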

BPF Map Types Used by XDP

Map Type | Typical Key / Value | Use Case
BPF_MAP_TYPE_HASH | 5-tuple, IP, tenant id → action | Blocklists, session tables, service policy.
BPF_MAP_TYPE_LPM_TRIE | prefix length + IP → route or policy | Longest-prefix route or CIDR block matching.
BPF_MAP_TYPE_DEVMAP | index → netdev ifindex | XDP_REDIRECT to another NIC.
BPF_MAP_TYPE_CPUMAP | CPU id → queue config | Move packet processing to another CPU.
BPF_MAP_TYPE_XSKMAP | RX queue id → AF_XDP socket | Redirect packets into userspace through AF_XDP.
BPF_MAP_TYPE_PERCPU_ARRAY | index → per-CPU counter | Stats without cache-line contention.
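To make the LPM_TRIE row concrete, here is a rough userspace model of longest-prefix matching. It uses a linear scan rather than a trie, so it only illustrates the lookup result, not the kernel's data structure; struct lpm_entry, lpm_lookup, and the integer action values are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct lpm_entry {
    uint32_t prefix;     /* IPv4 network in host byte order */
    uint8_t  prefixlen;  /* 0..32 */
    int      action;     /* hypothetical policy value */
};

/* Network mask for a prefix length; len 0 is special-cased because
 * shifting a 32-bit value by 32 is undefined in C. */
static uint32_t mask_for(uint8_t len)
{
    return len == 0 ? 0u : 0xFFFFFFFFu << (32 - len);
}

/* Returns the action of the most specific entry covering addr,
 * or dflt when no entry matches. */
int lpm_lookup(const struct lpm_entry *tbl, size_t n, uint32_t addr, int dflt)
{
    int best_len = -1;
    int best = dflt;
    for (size_t i = 0; i < n; i++) {
        assert(tbl[i].prefixlen <= 32);
        uint32_t m = mask_for(tbl[i].prefixlen);
        if ((addr & m) == (tbl[i].prefix & m) &&
            (int)tbl[i].prefixlen > best_len) {
            best_len = tbl[i].prefixlen;
            best = tbl[i].action;
        }
    }
    return best;
}
```

With entries for 10.0.0.0/8 and 10.1.0.0/16, an address inside 10.1.0.0/16 takes the /16 action even though the /8 also covers it, which is the behavior a CIDR blocklist with exceptions relies on.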

3. Core Mechanism

XDP has three attach modes on the RX path. Offload mode runs on capable NIC hardware, native mode runs in the driver before sk_buff, and generic mode runs after sk_buff allocation so it works almost everywhere but loses the main early-drop advantage.

Background

Suppose a host is receiving 20 Mpps of spoofed UDP packets. If every packet allocates a sk_buff, runs netfilter, and reaches UDP demux before being dropped, the CPU budget is spent on attack traffic. XDP moves the reject decision before those allocations.
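The per-packet budget behind that scenario is simple arithmetic. The helper names below are hypothetical, the sketch assumes packets are spread evenly across cores, and the clock speed is whatever figure the caller supplies.

```c
#include <assert.h>
#include <stdint.h>

/* Nanoseconds available per packet when `cores` CPUs share `pps`
 * packets per second evenly. */
double ns_per_packet(uint64_t pps, unsigned cores)
{
    assert(pps > 0 && cores > 0);
    return 1e9 * cores / (double)pps;
}

/* Same budget expressed in CPU cycles for a clock of `ghz` GHz
 * (1 GHz is one cycle per nanosecond). */
double cycles_per_packet(uint64_t pps, unsigned cores, double ghz)
{
    return ns_per_packet(pps, cores) * ghz;
}
```

At 20 Mpps a single core has 50 ns per packet, which is about 150 cycles at an assumed 3 GHz. Spreading the load over 8 cores raises that to 400 ns each, still a tight budget once allocation and multiple stack layers are involved.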

Plan

  1. Attach a native XDP program to the ingress NIC.
  2. Parse Ethernet and IP headers with explicit data_end checks.
  3. Look up source IP, prefix, or service tuple in a BPF map maintained by userspace.
  4. Return XDP_DROP for attack traffic, XDP_PASS for normal Linux traffic, or XDP_REDIRECT for fast forwarding.

Walkthrough

In AF_XDP zero-copy mode, the userspace process gives the kernel empty UMEM chunks, the NIC DMA-writes packet bytes directly into those chunks, and XDP redirects the descriptor to the socket. The userspace process receives a descriptor, not a copied packet.

Step | State Change | Why It Matters
1 | Userspace posts UMEM addresses to fill ring. | Kernel has empty buffers ready for RX.
2 | NIC DMA-writes packet into UMEM chunk. | No kernel-to-user copy is needed later.
3 | XDP program redirects queue id through XSKMAP. | Packet is delivered to the matching AF_XDP socket.
4 | Userspace reads RX ring descriptor and parses UMEM bytes. | The packet path is a descriptor handoff, not a socket receive path.
5 | Userspace recycles the chunk through fill or transmits it through TX. | Buffer ownership stays explicit and batched.
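The five steps amount to a small ownership state machine for each UMEM chunk. The sketch below encodes the legal hand-offs under assumptions: the state names and chunk_handoff are hypothetical, and the NIC write plus XDP redirect (steps 2 and 3) are folded into a single fill-to-RX transition.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for who currently owns one UMEM chunk. */
enum chunk_owner {
    OWN_USER,        /* application may read or write the bytes */
    OWN_FILL,        /* posted to the fill ring, waiting for the NIC */
    OWN_RX,          /* packet written, descriptor sitting in the RX ring */
    OWN_TX,          /* queued on the TX ring for transmission */
    OWN_COMPLETION,  /* sent, waiting on the completion ring */
    OWN_COUNT
};

/* allowed[from][to] is true only for the hand-offs in the walkthrough. */
static const bool allowed[OWN_COUNT][OWN_COUNT] = {
    [OWN_USER]       = { [OWN_FILL] = true, [OWN_TX] = true },
    [OWN_FILL]       = { [OWN_RX] = true },
    [OWN_RX]         = { [OWN_USER] = true },
    [OWN_TX]         = { [OWN_COMPLETION] = true },
    [OWN_COMPLETION] = { [OWN_USER] = true },
};

/* Applies the hand-off if it is legal; otherwise returns false and
 * leaves *cur unchanged. */
bool chunk_handoff(enum chunk_owner *cur, enum chunk_owner next)
{
    assert(*cur < OWN_COUNT && next < OWN_COUNT);
    if (!allowed[*cur][next])
        return false;
    *cur = next;
    return true;
}
```

Tracking ownership this way in a debug build is one way to catch the classic AF_XDP bug of touching a chunk that has already been handed back to the kernel.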

BPF maps are the control plane boundary: userspace changes policy with map updates, while the XDP program stays attached and reads those maps on every packet.

4. Minimal C Demo

This first demo models the AF_XDP setup skeleton: allocate UMEM, configure rings, bind a socket to an interface queue, publish fill buffers, and receive one descriptor.

AF_XDP UMEM and Ring Setup Skeleton — C Demo
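A minimal userspace model of that skeleton might look like the following. It makes no AF_XDP system calls: struct model_xsk and the model_* functions are hypothetical, a memcpy stands in for the NIC's DMA write, and the chunk size and count are assumed values. A real program would create a socket with AF_XDP, register UMEM and size the rings through setsockopt, mmap the rings, and bind to an interface queue.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE 2048u   /* assumed UMEM chunk size */
#define NUM_CHUNKS 4u      /* assumed chunk count, also the ring depth here */

/* Models the addr/len pair carried by an RX descriptor. */
struct model_desc { uint64_t addr; uint32_t len; };

struct model_xsk {
    uint8_t *umem;                      /* NUM_CHUNKS * CHUNK_SIZE bytes */
    uint64_t fill[NUM_CHUNKS];          /* chunk offsets handed to the "kernel" */
    uint32_t fill_head, fill_tail;
    struct model_desc rx[NUM_CHUNKS];   /* received descriptors */
    uint32_t rx_head, rx_tail;
    uint32_t queue_id;                  /* RX queue the socket is bound to */
};

/* Allocate UMEM, record the queue, and publish every chunk to the fill ring. */
int model_xsk_open(struct model_xsk *x, uint32_t queue_id)
{
    memset(x, 0, sizeof(*x));
    x->umem = calloc(NUM_CHUNKS, CHUNK_SIZE);
    if (!x->umem)
        return -1;
    x->queue_id = queue_id;
    for (uint32_t i = 0; i < NUM_CHUNKS; i++)
        x->fill[x->fill_tail++ % NUM_CHUNKS] = (uint64_t)i * CHUNK_SIZE;
    return 0;
}

/* Stand-in for NIC + XDP redirect: take one fill entry, copy the payload
 * into that chunk, and publish an RX descriptor. RX cannot overflow here
 * because at most NUM_CHUNKS chunks are ever posted. */
int model_nic_rx(struct model_xsk *x, const void *payload, uint32_t len)
{
    assert(len <= CHUNK_SIZE);
    if (x->fill_head == x->fill_tail)
        return -1;                       /* no empty buffers posted */
    uint64_t addr = x->fill[x->fill_head++ % NUM_CHUNKS];
    memcpy(x->umem + addr, payload, len);
    x->rx[x->rx_tail++ % NUM_CHUNKS] = (struct model_desc){ addr, len };
    return 0;
}

/* Userspace receive: pop one descriptor, or -1 when the RX ring is empty. */
int model_xsk_recv(struct model_xsk *x, struct model_desc *out)
{
    if (x->rx_head == x->rx_tail)
        return -1;
    *out = x->rx[x->rx_head++ % NUM_CHUNKS];
    return 0;
}
```

After model_xsk_recv succeeds, the packet bytes live at umem + addr, so the application reads them in place and only the descriptor has moved.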

This second demo models an XDP program using a BPF hash map as an IP blocklist. Real XDP code uses helper calls and verifier-safe pointer arithmetic; the control-flow idea is the same.

XDP Map Lookup for DDoS Drop — C Demo
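One way to model that control flow in plain C is sketched below. The hash map is a small open-addressing table rather than a BPF map, there are no helper calls, and the names (struct blocklist, blocklist_add, blocklist_verdict, V_DROP, V_PASS) are hypothetical. The frame is assumed to be untagged Ethernet carrying IPv4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SLOTS 64u   /* assumed table size; must be a power of two */

struct blocklist { uint32_t key[SLOTS]; bool used[SLOTS]; };

enum verdict { V_DROP = 1, V_PASS = 2 };

static uint32_t slot_of(uint32_t ip)
{
    return (ip * 2654435761u) & (SLOTS - 1);   /* multiplicative hash */
}

/* Control-plane side: insert an IPv4 address (host byte order). */
bool blocklist_add(struct blocklist *b, uint32_t ip)
{
    assert(b != NULL);
    uint32_t s = slot_of(ip);
    for (uint32_t i = 0; i < SLOTS; i++, s = (s + 1) & (SLOTS - 1)) {
        if (!b->used[s]) { b->used[s] = true; b->key[s] = ip; return true; }
        if (b->key[s] == ip) return true;       /* already present */
    }
    return false;                                /* table full */
}

static bool blocklist_has(const struct blocklist *b, uint32_t ip)
{
    uint32_t s = slot_of(ip);
    for (uint32_t i = 0; i < SLOTS; i++, s = (s + 1) & (SLOTS - 1)) {
        if (!b->used[s]) return false;           /* hit an empty slot: absent */
        if (b->key[s] == ip) return true;
    }
    return false;
}

/* Datapath side: bounds-check, parse, look up the source IP, decide. */
enum verdict blocklist_verdict(const struct blocklist *b,
                               const uint8_t *data, const uint8_t *data_end)
{
    if (data_end - data < 14 + 20)
        return V_PASS;                 /* too short to judge; let the stack decide */
    if (data[12] != 0x08 || data[13] != 0x00)
        return V_PASS;                 /* not IPv4 */
    const uint8_t *ip = data + 14;     /* IPv4 header; source address at offset 12 */
    uint32_t src = ((uint32_t)ip[12] << 24) | ((uint32_t)ip[13] << 16) |
                   ((uint32_t)ip[14] << 8) | (uint32_t)ip[15];
    return blocklist_has(b, src) ? V_DROP : V_PASS;
}
```

Passing unparseable frames to the stack instead of dropping them is a design choice in this sketch; a stricter policy could drop anything it cannot classify.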

5. Kernel Source Pointers

Path / Function | What to Read
include/net/xdp.h | struct xdp_buff, struct xdp_frame, RX queue metadata.
net/core/dev.c | Generic RX path, NAPI receive flow, generic XDP integration.
net/core/filter.c | BPF helper implementations including redirect behavior.
kernel/bpf/devmap.c | DEVMAP redirect path used by XDP_REDIRECT.
kernel/bpf/cpumap.c | CPUMAP redirect and per-CPU packet handoff.
net/xdp/xsk.c | AF_XDP socket lifecycle, bind, sendmsg, poll, mmap.
net/xdp/xsk_queue.h | AF_XDP ring producer/consumer mechanics.
net/xdp/xsk_buff_pool.c | UMEM registration, DMA mapping, zero-copy buffer pool.
Documentation/networking/af_xdp.rst | Official AF_XDP architecture and userspace API notes.
tools/testing/selftests/bpf/ | Small runnable examples for XDP programs, maps, and AF_XDP behavior.

6. Use Cases

Katran uses XDP as a high-performance L4 load balancer: match a VIP, choose a backend with consistent hashing, encapsulate the packet, and forward before the normal host stack handles it.
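The backend-selection step can be sketched with rendezvous (highest-random-weight) hashing, a simple consistent-hashing variant. This is a swapped-in technique chosen for brevity, not Katran's own algorithm or code; pick_backend, the backend ids, and the precomputed flow_hash input are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit bit mixer (the splitmix64 finalizer) used to score pairs. */
static uint64_t mix64(uint64_t x)
{
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

/* Scores every backend against the flow and returns the id with the
 * highest score. flow_hash is assumed to be a hash of the 5-tuple. */
uint32_t pick_backend(uint64_t flow_hash, const uint32_t *backend_ids, size_t n)
{
    assert(backend_ids != NULL && n > 0);
    size_t best = 0;
    uint64_t best_score = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t score = mix64(flow_hash ^ mix64(backend_ids[i]));
        if (i == 0 || score > best_score) {
            best = i;
            best_score = score;
        }
    }
    return backend_ids[best];
}
```

The property that matters for a load balancer is stability: removing one backend only remaps the flows that were assigned to it, so existing connections to the remaining backends keep their target. The per-packet cost here grows with the number of backends, which is one reason production balancers prefer a precomputed lookup table.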

Cilium uses BPF maps as its datapath state store for identities, services, policy, and connection tracking. XDP is useful for the earliest drop or redirect decisions, while other BPF hooks handle paths that need richer context.

  • DDoS mitigation: drop attack packets before sk_buff, conntrack, or socket lookup.
  • Monitoring and sampling: redirect selected traffic to AF_XDP for userspace analysis.
  • Hybrid migration: keep non-XDP traffic on Linux or OVS while moving hot paths to XDP or AF_XDP.

7. Interview Prep

Question | Concise Answer
What is XDP and where exactly does it run? | XDP is an eBPF hook in the RX path, normally inside the NIC driver's NAPI poll before sk_buff allocation. Generic mode runs later after an skb exists.
What are the four common XDP verdicts? | XDP_PASS continues into Linux networking, XDP_DROP discards, XDP_TX sends back out the same NIC, and XDP_REDIRECT sends to another device, CPU, or AF_XDP socket. A fifth code, XDP_ABORTED, signals a program error and also drops the packet.
Native mode vs generic mode? | Native mode needs driver support and runs before skb allocation, so it is fast. Generic mode works on most NICs but runs after skb allocation, making it useful for testing and fallback rather than maximum performance.
What is AF_XDP? | An address-family socket that lets XDP redirect packets to userspace through UMEM and four rings: fill, completion, RX, and TX.
How does AF_XDP zero-copy work? | Userspace registers UMEM, posts empty chunks to the fill ring, and the NIC DMA-writes directly into those chunks. The kernel delivers descriptors to the RX ring, so userspace parses packet bytes without memcpy.
Which BPF maps matter for XDP? | Hash maps for exact policy, LPM tries for prefixes, devmaps for NIC redirect, cpumaps for CPU redirect, xskmaps for AF_XDP sockets, and per-CPU maps for counters.
How does Katran use XDP? | It performs L4 load balancing at the edge: parse packet, look up VIP and backend maps, choose a backend with consistent hashing, encapsulate (IP-in-IP, with GUE as an option), and send the packet back out the same NIC with XDP_TX.
How does Cilium use XDP? | Cilium uses BPF maps for endpoint, service, policy, and conntrack state. XDP can enforce early drop or redirect decisions; tc-BPF and socket hooks handle richer policy paths.
When choose XDP instead of DPDK? | Choose XDP when you need kernel integration, safe verified programs, early drop, redirect, or observability. Choose DPDK when the whole datapath belongs in userspace and needs a full C packet-processing framework.
What are XDP-to-OVS migration trade-offs? | XDP can peel off hot early-drop or steering paths while OVS keeps feature-rich switching, OpenFlow, tunnels, and operational tooling. The cost is split datapath ownership: maps, counters, troubleshooting, and fallback behavior must stay consistent.