Tech Notes

Reference Books

Code	Book	Focus
`TCPIP1`	TCP/IP Illustrated Vol.1 (2nd ed.) — Fall & Stevens	Wire-level walk of every protocol
`TCPIP2`	TCP/IP Illustrated Vol.2 — Wright & Stevens	BSD kernel implementation
`TCPIP3`	TCP/IP Illustrated Vol.3 — Stevens	T/TCP, HTTP, NNTP — historical
`UNP`	UNIX Network Programming Vol.1 (3rd ed.) — Stevens, Fenner & Rudoff	Sockets API bible — select/poll patterns
`KUROSE`	Computer Networking: A Top-Down Approach — Kurose & Ross	University-level CN reference
`DOYLE1`	Routing TCP/IP Vol.1 (2nd ed.) — Jeff Doyle	IGPs — RIP, EIGRP, OSPF, IS-IS
`DOYLE2`	Routing TCP/IP Vol.2 — Jeff Doyle	BGP, multicast, NAT, IPv6
`HALABI`	Internet Routing Architectures (2nd ed.) — Sam Halabi	BGP design — the BGP bible
`MPLS`	MPLS-Enabled Applications — Minei & Lucek	MPLS, L3VPN, EVPN, SR
`IPV6`	IPv6 Essentials (3rd ed.) — Silvia Hagen	Practical IPv6 reference
`MCAST`	Developing IP Multicast Networks — Beau Williamson	PIM-SM, MSDP, RP design
`HPBN`	High Performance Browser Networking — Ilya Grigorik	TLS, HTTP/2, HTTP/3, WebRTC
`CCNP`	CCNP Enterprise ENCOR / ENARSI Official Cert Guides — Cisco Press	CCNP exam-aligned, hands-on
`CCIE`	CCIE Enterprise Infrastructure / SP / DC v1.x Blueprint — Cisco	Lab-aligned curriculum
`IBTA`	InfiniBand Architecture Spec Vol.1 & 2 — IBTA	RDMA verbs, transports, packet format
`MELLA`	RDMA Aware Networks Programming Manual — NVIDIA/Mellanox	Verbs API and ConnectX behavior
`NCCL`	NCCL Documentation & Source — NVIDIA	Collective algorithms, channels, transports
`SPDK`	SPDK Programmer's Guide — Intel/Linux Foundation	User-space NVMe & NVMe-oF target
`LIBEV`	libev Documentation — Marc Lehmann	Definitive source on libev internals
`LIBEVT`	The libevent Book — Nick Mathewson	Event-loop design patterns
`TLPI`	The Linux Programming Interface — Michael Kerrisk	select/poll/epoll chapters

Part I — Cisco CCNP / CCIE Foundations

Doyle Vol.1 · CCNP ENCOR · CCIE EI v1.1 blueprint

§1.1OSI vs TCP/IP Layered Model (encapsulation, MTU/MSS, fragmentation, ethertype)
§1.2Cisco IOS / IOS-XE / NX-OS / IOS-XR (CLI modes, AAA, SSH, NETCONF/YANG)
§1.3Certification Path (CCNA → CCNP Enterprise → CCIE EI; CCNP DC → CCIE DC; CCNP SP → CCIE SP)
§1.4Lab Tooling (GNS3, EVE-NG, Cisco CML/VIRL, Containerlab, Packet Tracer)
§1.5Network Automation (Ansible, NETCONF, RESTCONF, gNMI, Cisco DNA Center)
§1.6Reading the CCIE Lab Topology (control plane vs data plane, stateful vs stateless)

Part II — Layer 2 Switching

CCNP ENCOR · Doyle Vol.1 Ch.4-7

§2.1Ethernet Frame & MAC Forwarding (CAM/TCAM, MAC aging, unicast flooding)
§2.2VLAN (access/trunk, native VLAN, 802.1Q tagging, voice VLAN, QinQ stacking)
§2.3VTP (server/client/transparent, pruning, VTPv3 password)
§2.4STP / RSTP / MSTP (root election, BPDU, port states, edge/PortFast, BPDU guard, root guard, loop guard)
§2.5EtherChannel / LACP (active/passive, PAgP auto/desirable, load-balance hash, min-links)
§2.6Switchport Security (port-security, DHCP snooping, DAI, IPSG, storm-control)
§2.7First-Hop Security (RA Guard, DHCPv6 Guard, IPv6 Source Guard, ND inspection)
§2.8TRILL / SPB (modern L2 multipath alternatives to STP)

Part III — Stack & De-stack Architectures

Cisco StackWise / VSS / vPC / MLAG · interview-critical

§3.1StackWise / StackWise-480 / StackWise-1T (Catalyst 3750/3850/9300 stack ring, active/standby/member)
§3.2StackWise Virtual / SVL (Catalyst 9500/9600 — two physical switches as one logical)
§3.3VSS (Virtual Switching System on Catalyst 6500/6800 — VSL link, dual-active detection, RPR/SSO)
§3.4vPC (Nexus 9K/7K Virtual Port Channel — peer-link, peer-keepalive, orphan ports, vPC roles)
§3.5MLAG / MC-LAG (Arista, Juniper MC-LAG, peer-gateway, ARP sync)
§3.6De-stack Architecture: Pure L3 ECMP Spine-Leaf (no MLAG, BGP-only, server multi-homing via routing)
§3.7Failure Domain Comparison (stack vs MLAG vs vPC vs L3-only)
§3.8ISSU / GIR (in-service software upgrade, Graceful Insertion & Removal)
§3.9Dual-Active / Split-Brain Detection (enhanced PAgP, fast-hello, BFD-based detection)
§3.10Migration Stories (StackWise → SVL → leaf-spine — operational lessons)

Part IV — Routing Protocols — IGP

Doyle Vol.1 · CCIE Routing TCP/IP

§4.1Static & Floating Static Routes (administrative distance, recursive lookup, IP SLA tracking)
§4.2RIPv2 / RIPng (distance vector, split horizon, poison reverse, hold-down — legacy but useful baseline)
§4.3OSPFv2 (LSA Type 1-7, areas, NSSA / Totally NSSA, virtual link, DR/BDR election, SPF & iSPF)
§4.4OSPFv3 (per-link LSAs, IPv6 transport, address-family for v4)
§4.5OSPF Optimization (LSA throttle, SPF throttle, prefix suppression, fast-hello, BFD)
§4.6EIGRP (DUAL algorithm, feasible successor, FD/RD, named mode, classic vs wide metric)
§4.7IS-IS (Level-1/Level-2, NET addressing, CSNP/PSNP, wide metric, multi-topology — common in ISP/DC)
§4.8Route Redistribution (mutual redistribution, route-map, tag-based loop prevention)
§4.9Policy Routing (PBR, route-map on interface, BFD-tracked next-hop)
§4.10Convergence Tuning (BFD sub-second, fast-hello, LFA, RLFA, TI-LFA — only TI-LFA requires SR)

Part V — Routing Protocols — BGP

Halabi · RFC 4271/7606 · CCIE SP / DC

§5.1BGP Fundamentals (eBGP vs iBGP, full mesh, AS, TCP/179)
§5.2Path Attributes (AS_PATH, NEXT_HOP, LOCAL_PREF, MED, ORIGIN, COMMUNITIES, AGGREGATOR)
§5.3Best Path Selection (Cisco's 13-step algorithm, abridged — weight → local-pref → locally originated → AS-path → origin → MED → eBGP/iBGP → IGP → router-id)
§5.4Route Reflector & Confederation (scaling iBGP)
§5.5BGP Communities (well-known, extended, large communities, community-based policy)
§5.6BGP Security (RPKI/ROA, BGPsec, max-prefix, TTL security via GTSM RFC 5082)
§5.7BGP Convergence (BGP-PIC, ADD-PATH, BFD, graceful restart)
§5.8Multipath BGP (eBGP multipath, iBGP multipath, AS-PATH multipath-relax)
§5.9BGP for Data Center (eBGP unnumbered, allowas-in, FRR, GoBGP, BIRD)
§5.10BGP Monitoring (BMP RFC 7854, gNMI streaming telemetry)
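
The abridged comparison order in §5.3 can be sketched as a sort key. This is a simplified illustration, not a router's implementation: the `Path` class and its field names are hypothetical, the locally-originated step is omitted, and MED is compared across all paths (as if always-compare-med were set) rather than only between paths from the same neighbor AS.

```python
import ipaddress
from dataclasses import dataclass

# Lower rank is preferred: IGP < EGP < incomplete.
ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


@dataclass(frozen=True)
class Path:
    """Hypothetical container for the attributes compared below."""
    name: str
    weight: int = 0             # Cisco-local, higher wins
    local_pref: int = 100       # higher wins
    as_path: tuple = ()         # shorter wins
    origin: str = "igp"
    med: int = 0                # lower wins
    is_ebgp: bool = True        # eBGP preferred over iBGP
    igp_metric: int = 0         # lower metric to NEXT_HOP wins
    router_id: str = "0.0.0.0"  # lowest router-id is the final tie-break


def preference_key(p: Path) -> tuple:
    # Tuples compare element by element, mirroring the step order.
    # "Higher wins" attributes are negated so min() picks the best path.
    return (
        -p.weight,
        -p.local_pref,
        len(p.as_path),
        ORIGIN_RANK[p.origin],
        p.med,
        0 if p.is_ebgp else 1,
        p.igp_metric,
        int(ipaddress.IPv4Address(p.router_id)),
    )


def best_path(paths):
    return min(paths, key=preference_key)
```

For instance, a path with `local_pref=200` and a three-AS path beats a path with the default `local_pref` and a one-AS path, because LOCAL_PREF is compared before AS_PATH length.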

Part VI — MPLS, VPN & Segment Routing

MPLS-Enabled Applications (Minei) · CCIE SP

§6.1MPLS Forwarding (label stack, EXP/TC, S bit, TTL, PHP)
§6.2Label Distribution (LDP, RSVP-TE, BGP-LU, downstream unsolicited vs on-demand)
§6.3MPLS L3VPN / RFC 4364 (VRF, RD, RT, MP-BGP VPNv4/v6, PE-CE protocols)
§6.4MPLS L2VPN (VPWS pseudo-wire, VPLS, H-VPLS, control word)
§6.5EVPN (RFC 7432 — Type 1-4 routes; Type 5 IP prefix route per RFC 9136; VXLAN/MPLS data plane, EVPN-VPWS)
§6.6Traffic Engineering (RSVP-TE, FRR/MPLS-FRR, auto-bandwidth)
§6.7Segment Routing — SR-MPLS (SID, prefix-SID, adj-SID, anycast SID, IGP shortest path)
§6.8SRv6 (locator + function + arg, micro-SID/uSID, end.x, end.dt4/dt6, SRv6 BE vs Policy)
§6.9TI-LFA (Topology-Independent LFA — sub-50ms protection over SR)
§6.10VXLAN (RFC 7348, head-end replication, flood-and-learn vs EVPN control plane)
§6.11Geneve / NVGRE / STT (alternative overlays — Geneve is the modern winner)

Part VII — Campus Network Design

Cisco SAFE · CCNP ENCOR / ENARSI

§7.1Three-Tier Hierarchy (access / distribution / core)
§7.2Two-Tier Collapsed Core (medium campus, distribution = core)
§7.3SD-Access (LISP control plane, VXLAN data plane, ISE for policy, fabric edge/border/control)
§7.4 802.1X / MAB / Profiling (Cisco ISE, dot1x event-driven, change-of-authorization)
§7.5Wireless Integration (CAPWAP local-mode vs FlexConnect, WLC, anchor controller, AP groups)
§7.6Wi-Fi 6 / 6E / 7 (OFDMA, BSS coloring, 6GHz channels, MLO multi-link operation)
§7.7PoE / PoE+ / UPoE / 802.3bt (15.4 W → 90 W, LLDP power negotiation)
§7.8Campus Multicast (PIM-SM, IGMP snooping, AutoRP / BSR)
§7.9Campus QoS (DSCP marking trust boundary, queueing on access/uplink)
§7.10Cisco DNA Center / Catalyst Center (assurance, automation, fabric provisioning)

Part VIII — Data Center & Cloud Network

RFC 7938 · Clos · ACI / NSX-T / OVN · interview-critical

§8.1Spine-Leaf Clos Topology (k-ary fat-tree, ECMP, oversubscription ratio, rail design)
§8.2BGP-Only Underlay (RFC 7938, eBGP unnumbered, allowas-in, ECMP load-balance)
§8.3EVPN-VXLAN Overlay (Type 2 MAC/IP, Type 3 IMET, Type 5 IP prefix, anycast gateway)
§8.4Cisco ACI (APIC controller, EPG, contracts, bridge domain, VRF, policy model)
§8.5NSX-T (T0/T1 routers, segments, distributed firewall, GENEVE)
§8.6Open vSwitch / OVN (flow tables, OpenFlow, OVN northbound DB, logical routers/switches)
§8.7Container CNI (Calico BGP, Cilium eBPF, Flannel VXLAN, Antrea, Multus)
§8.8Hyperscale Fabrics (Facebook F16/Disaggregated Backbone, Google Jupiter, Azure SONiC)
§8.9AWS VPC Internals (mapping service, ENI, Hyperplane, GWLB, Transit Gateway)
§8.10Lossless DC Fabric for RoCE (PFC, ECN, DCQCN, headroom buffering, Spectrum-X)
§8.11DCI (Data Center Interconnect — OTV, EVPN-VXLAN over DWDM, VPLS legacy)

Part IX — Service Provider / ISP Network

Doyle Vol.2 · CCIE SP · MEF · ITU-T

§9.1ISP Topology (PE-P-PE backbone, ASBR, route reflector hierarchy, IGP design)
§9.2Internet Peering (transit, settlement-free peering, full table vs partial vs default-only)
§9.3IXP (Internet Exchange Points — DE-CIX, AMS-IX, route servers, BGP Looking Glass)
§9.4BGP/MPLS L3VPN at Carrier Scale (RD/RT design, inter-AS option A/B/C, CSC)
§9.5 6PE / 6VPE (carrying IPv6 over an MPLS IPv4 core)
§9.6Carrier Ethernet & MEF (E-Line, E-LAN, E-Tree, E-Access, OAM 802.1ag/Y.1731)
§9.7Optical Transport (SDH/SONET legacy, OTN ODU/OTU, DWDM, ROADM, coherent optics)
§9.8 5G Transport (xHaul: fronthaul/midhaul/backhaul, eCPRI, IETF network slicing, SR-aware)
§9.9SD-WAN (Cisco Viptela/Meraki, VMware VeloCloud, Versa, Fortinet — overlay tunneling)
§9.10NFV (VNF chaining, NFV-MANO, OpenStack Tacker, ONAP)
§9.11Anycast Services (DNS root servers, CDN POPs, BGP-injected /32-/24)
§9.12DDoS Mitigation (BGP Flowspec, RTBH, scrubbing centers, BCP38)

Part X — TCP State Machine & Connection Lifecycle

TCP/IP Illustrated Vol.1 Ch.12-13 · RFC 9293 · interview-critical

§10.1TCP Header Anatomy (seq, ack, flags S/A/F/R/P/U/E/C, window, urg-ptr, options TLV)
§10.2TCP State Machine — 11 States (CLOSED, LISTEN, SYN-SENT, SYN-RCVD, ESTABLISHED, FIN-WAIT-1/2, CLOSE-WAIT, CLOSING, LAST-ACK, TIME-WAIT)
§10.3Three-Way Handshake (SYN → SYN-ACK → ACK; ISN selection, RFC 6528 hash-based)
§10.4Four-Way Close (FIN, half-close, simultaneous close → CLOSING)
§10.5TIME_WAIT Deep Dive (2*MSL rationale: orphan dup segments + reliable last ACK; tw_reuse vs tw_recycle hazards)
§10.6SYN Flood & SYN Cookie (encode an MSS index into the ISN via a keyed hash; wscale/SACK survive only when carried in the timestamp option, so they are lost if timestamps are off)
§10.7SYN Queue vs Accept Queue (somaxconn, tcp_max_syn_backlog, listen() backlog meaning)
§10.8RST Handling & RST Attacks (blind reset, off-path attacker, RFC 5961 challenge ACK)
§10.9Half-Open / Half-Closed Connections (keepalive vs application heartbeat to detect)
§10.10Connection Table Sizing (ephemeral port range, conntrack table, source port reuse with SO_REUSEADDR)
§10.11Linux TCP Tracing (ss -tnipo, /proc/net/tcp, tcptrace, bpftrace tcp:* tracepoints)
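
The 11 states in §10.2 and the open/close paths in §10.3 and §10.4 can be walked with a small lookup table. This is a simplified sketch of the RFC 9293 diagram: the event names are illustrative labels chosen here, and RST handling, timeouts other than 2*MSL, and retransmissions are left out.

```python
# (current state, event) -> next state. Event names are illustrative
# labels for this sketch, not kernel or RFC identifiers.
TRANSITIONS = {
    ("CLOSED", "active_open"): "SYN-SENT",
    ("CLOSED", "passive_open"): "LISTEN",
    ("LISTEN", "recv_syn"): "SYN-RCVD",
    ("SYN-SENT", "recv_syn_ack"): "ESTABLISHED",
    ("SYN-SENT", "recv_syn"): "SYN-RCVD",        # simultaneous open
    ("SYN-RCVD", "recv_ack"): "ESTABLISHED",
    ("ESTABLISHED", "close"): "FIN-WAIT-1",      # active closer sends FIN
    ("ESTABLISHED", "recv_fin"): "CLOSE-WAIT",   # passive closer
    ("FIN-WAIT-1", "recv_ack"): "FIN-WAIT-2",
    ("FIN-WAIT-1", "recv_fin"): "CLOSING",       # simultaneous close
    ("FIN-WAIT-1", "recv_fin_ack"): "TIME-WAIT",
    ("FIN-WAIT-2", "recv_fin"): "TIME-WAIT",
    ("CLOSING", "recv_ack"): "TIME-WAIT",
    ("CLOSE-WAIT", "close"): "LAST-ACK",
    ("LAST-ACK", "recv_ack"): "CLOSED",
    ("TIME-WAIT", "timeout_2msl"): "CLOSED",
}


def run(state: str, events: list) -> str:
    """Apply events in order; an event with no entry raises KeyError."""
    for event in events:
        state = TRANSITIONS[(state, event)]
    return state


# Active opener that also closes first ends in TIME-WAIT before CLOSED.
client_path = ["active_open", "recv_syn_ack", "close", "recv_ack", "recv_fin"]
assert run("CLOSED", client_path) == "TIME-WAIT"
```

Only the side that closes first passes through TIME-WAIT; the passive closer goes CLOSE-WAIT, LAST-ACK, CLOSED.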

Part XI — TCP Reliability & Flow Control

TCP/IP Illustrated Vol.1 Ch.14-17 · RFC 5681/6675/9293

§11.1Sliding Window — Sender (cwnd, snd.una, snd.nxt, snd.wnd; pipe = bytes in flight)
§11.2Sliding Window — Receiver (rcv.wnd advertise, zero-window probe, silly-window-syndrome avoidance)
§11.3Cumulative ACK vs SACK vs D-SACK (RFC 2018, RFC 2883 — duplicate detection)
§11.4RTT & RTO Estimation (Karn's algorithm, Jacobson SRTT/RTTVAR, RFC 6298)
§11.5Fast Retransmit & Fast Recovery (3 dup ACK trigger, NewReno partial ACK handling)
§11.6RACK Loss Detection (RFC 8985 — time-based, replaces dup-ACK threshold)
§11.7Nagle Algorithm vs TCP_NODELAY (small-packet coalescing trade-off, why interactive apps disable it)
§11.8Delayed ACK (200ms typical, ACK-every-other-segment; the Nagle ↔ delayed-ACK 200ms stall classic)
§11.9TCP_CORK / MSG_MORE (block partial sends until uncorked; sendfile + cork pattern for HTTP)
§11.10TCP Keepalive (TCP_KEEPIDLE / KEEPINTVL / KEEPCNT; default 7200s — almost useless, app-layer heartbeat preferred)
§11.11Path MTU Discovery (DF bit, ICMP need-frag, blackhole detection, PLPMTUD RFC 4821, DPLPMTUD RFC 8899)
§11.12TCP Fast Open (TFO cookie, 0-RTT data on subsequent connects, middlebox interference)
§11.13Window Scaling, Timestamps, PAWS (RFC 7323 — large windows over high-BDP links)
§11.14Urgent Pointer & OOB Data (legacy, broken interop — never use it)
§11.15TCP Linger / SO_LINGER (close behavior, abortive vs graceful)
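
The SRTT/RTTVAR update in §11.4 follows RFC 6298 and is short enough to write out. The sketch below uses the RFC's constants (alpha = 1/8, beta = 1/4, K = 4); the class name, the method names, and the `granularity`, `min_rto`, and `max_rto` parameters are assumptions made for illustration. Per Karn's algorithm, the caller is expected not to feed samples taken from retransmitted segments.

```python
class RtoEstimator:
    """RFC 6298 style RTO computation (times in seconds)."""

    ALPHA = 1 / 8   # gain for SRTT
    BETA = 1 / 4    # gain for RTTVAR
    K = 4

    def __init__(self, granularity=0.001, min_rto=1.0, max_rto=60.0):
        self.granularity = granularity
        self.min_rto = min_rto    # RFC 6298 suggests 1 s; Linux uses 200 ms
        self.max_rto = max_rto
        self.srtt = None
        self.rttvar = None
        self.rto = 1.0            # initial RTO before any sample

    def on_rtt_sample(self, r: float) -> float:
        if self.srtt is None:
            # First measurement: SRTT = R, RTTVAR = R / 2.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR is updated first, using the old SRTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        rto = self.srtt + max(self.granularity, self.K * self.rttvar)
        self.rto = min(max(rto, self.min_rto), self.max_rto)
        return self.rto

    def on_timeout(self) -> float:
        # Exponential backoff: double the RTO on each expiry.
        self.rto = min(self.rto * 2, self.max_rto)
        return self.rto
```

With `min_rto=0.2`, a first sample of 100 ms yields SRTT = 0.1, RTTVAR = 0.05, and an RTO of 0.3 s.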

Part XII — TCP Congestion Control

RFC 5681/8312/9438 · BBR papers · interview-critical

§12.1Framework (cwnd, ssthresh, slow start, congestion avoidance, AIMD, fast recovery)
§12.2Tahoe (slow start + cong-avoid, no fast recovery — historical)
§12.3Reno (fast retransmit + fast recovery on triple-dup-ACK)
§12.4NewReno (RFC 6582 — handles multiple losses in one window without SACK)
§12.5BIC (Binary Increase) — predecessor of CUBIC
§12.6CUBIC (RFC 8312 — cubic-function cwnd vs time-since-loss; Linux default since 2.6.19)
§12.7Westwood / Westwood+ (bandwidth estimate to set ssthresh after loss; for wireless lossy links)
§12.8Vegas (delay-based, RTT increase = congestion; mostly displaced by BBR)
§12.9Compound TCP (Microsoft — combines AIMD with delay component)
§12.10BBR v1 (max-bw × min-RTT, 4-state machine: Startup/Drain/ProbeBW/ProbeRTT, pacing)
§12.11BBR v2 / v3 (ECN integration, fairness with CUBIC, loss tolerance)
§12.12DCTCP (RFC 8257 — ECN-marked fraction → multiplicative decrease; data center workhorse)
§12.13PCC / Copa (PCC: utility-driven online rate learning; Copa: delay-based target rate; both avoid hand-tuned loss thresholds)
§12.14ECN (RFC 3168, ECT/CE codepoint, AccECN) & AQM (RED, CoDel, FQ-CoDel, PIE)
§12.15Bufferbloat & Pacing (TSO/GSO interaction with pacing, sch_fq pacing)
§12.16Pluggable CC in Linux (net.ipv4.tcp_congestion_control, /proc/sys/net/ipv4/tcp_available_congestion_control)
§12.17CC Selection Cheat Sheet (DC = DCTCP/BBRv2; long-haul = BBR; mobile = BBR/Westwood; small RTT LAN = CUBIC)
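
The "cubic-function cwnd vs time-since-loss" in §12.6 is the curve W(t) = C(t - K)^3 + W_max from RFC 8312. A minimal sketch, assuming the RFC's constants C = 0.4 and beta = 0.7, windows in MSS units, and time in seconds; the function names are chosen here, and the TCP-friendly region and fast convergence are omitted.

```python
C = 0.4      # scaling constant from RFC 8312
BETA = 0.7   # multiplicative decrease factor


def cubic_k(w_max: float) -> float:
    """Time (s) for the curve to climb back to w_max after a loss."""
    return (w_max * (1 - BETA) / C) ** (1 / 3)


def cubic_window(t: float, w_max: float) -> float:
    """Target cwnd (in MSS) t seconds after the last congestion event."""
    return C * (t - cubic_k(w_max)) ** 3 + w_max


w_max = 100.0
k = cubic_k(w_max)
for t in (0.0, k / 2, k, 1.5 * k):
    print(f"t={t:5.2f}s  cwnd={cubic_window(t, w_max):7.2f}")
```

At t = 0 the window equals beta * W_max, it flattens out near W_max at t = K (the concave region), and it grows convexly beyond that to probe for more bandwidth.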

Part XIII — Modern Transports — QUIC, KCP, SCTP, MPTCP

RFC 9000-9002 · KCP / kcptun · interview-critical

§13.1Why Move Off TCP — middlebox ossification, head-of-line blocking, slow handshake, no migration
§13.2QUIC History (Google QUIC 2013 → IETF QUIC RFC 9000 in 2021)
§13.3QUIC Wire Image (long header — Initial/Handshake/0-RTT/1-RTT; short header; CONNECTION_ID; varint encoding)
§13.4QUIC Streams (per-stream flow control, no HoL between streams, server/client + uni/bidi 4 stream types)
§13.5QUIC Crypto / TLS 1.3 Integration (CRYPTO frames, key updates, packet protection)
§13.6QUIC Handshake (1-RTT new conn, 0-RTT with a cached TLS session ticket — replay-safety constraints)
§13.7QUIC Connection Migration (stable Connection ID survives NAT rebinding / Wi-Fi → cellular)
§13.8QUIC Loss Recovery (RFC 9002 — packet number monotonic, ACK frame with ranges, time threshold + packet threshold)
§13.9QUIC Congestion Control (NewReno default, BBR/CUBIC pluggable, separate from TCP stack)
§13.10HTTP/3 (QUIC + QPACK header compression — replaces HPACK; H3 frame layer)
§13.11QUIC Implementations (Cloudflare quiche, Google quiche, Microsoft msquic, Linux kernel UDP GSO+GRO support)
§13.12KCP — ARQ over UDP (selective repeat, fast retransmit on N skip, FEC, kcptun + crypto wrapping)
§13.13KCP Tuning Knobs (nodelay, interval, resend, nc — interactive vs throughput presets)
§13.14SCTP (RFC 9260 — multi-streaming, multi-homing, message-oriented; used in telecom signaling)
§13.15MPTCP (RFC 8684 — multi-path TCP, subflows, scheduler, fallback to plain TCP, used by Apple Siri / Korea KT GiGA)

Part XIV — UDP & UDP-Based Protocols

RFC 768 · TCP/IP Illustrated Vol.1 Ch.10

§14.1UDP Header (8 bytes — src/dst port, length, checksum); pseudo-header for checksum
§14.2UDP Socket API (sendto/recvfrom, connect() on UDP for filtering, getsockname for ephemeral port)
§14.3Batched I/O (recvmmsg / sendmmsg — single syscall for N packets)
§14.4UDP-Lite (RFC 3828 — partial checksum coverage for tolerant codecs)
§14.5UDP GSO / GRO (GSO since Linux 4.18, GRO since 5.0; segmentation offload for large UDP batches, key to QUIC throughput)
§14.6SO_REUSEPORT (kernel hashing across listening sockets — UDP load balancing without an LB)
§14.7UDP Fragmentation Risks (DF + ICMP needed; black-holed by middleboxes; QUIC restricts to PMTU)
§14.8RTP / RTCP (real-time media, sequence + timestamp, SR/RR reports, SSRC)
§14.9DTLS (TLS over UDP — used by WebRTC for DTLS-SRTP keying and data channels, Cisco AnyConnect/OpenConnect VPN, CoAP)
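
The pseudo-header checksum from §14.1 can be computed in a few lines. This sketch covers IPv4 only; the function names are chosen here for illustration. It builds the 12-byte pseudo-header (source, destination, zero, protocol 17, UDP length), prepends it to the UDP header and payload, and takes the 16-bit ones' complement sum.

```python
import socket
import struct


def ones_complement_sum16(data: bytes) -> int:
    """16-bit ones' complement sum with end-around carry."""
    if len(data) % 2:
        data += b"\x00"                     # pad to a whole 16-bit word
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def udp_checksum_ipv4(src_ip, dst_ip, src_port, dst_port, payload: bytes) -> int:
    length = 8 + len(payload)               # UDP header is 8 bytes
    pseudo = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!BBH", 0, socket.IPPROTO_UDP, length)
    )
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    checksum = ~ones_complement_sum16(pseudo + header + payload) & 0xFFFF
    # In IPv4 a transmitted 0 means "no checksum", so 0 is sent as 0xFFFF.
    return checksum or 0xFFFF
```

A receiver verifies by summing the same pseudo-header, header (with the checksum filled in), and payload; a valid datagram sums to 0xFFFF.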

Part XV — IP Layer & ICMP

RFC 791/792/4443 · TCP/IP Illustrated Vol.1 Ch.5-8

§15.1IPv4 Header (version, IHL, TOS/DSCP/ECN, total len, ID/flags/frag-offset, TTL, proto, checksum, options)
§15.2IP Fragmentation (DF flag, MF flag, identification, reassembly buffer, fragment overlap attacks)
§15.3ICMP Types & Codes (Echo Request/Reply, Dest Unreachable codes, Time Exceeded, Redirect, Source Quench legacy)
§15.4ICMP Probe Tools (ping, traceroute UDP/ICMP/TCP variants, MTR, paris-traceroute)
§15.5ICMP Errors (orig packet header in payload, rate-limiting per RFC 4443)
§15.6PMTUD via ICMP (Need Frag with next-hop MTU) — and why it often breaks (firewalls drop ICMP)
§15.7PLPMTUD (RFC 4821 for TCP, RFC 8899 DPLPMTUD for datagram transports — search MTU at transport layer, no ICMP dependency, used by QUIC)
§15.8ICMP Attacks (smurf, ping of death, ICMP redirect injection, ICMP tunneling)
§15.9ICMPv6 (RFC 4443 — types reorganized, multicast for NDP/MLD, ICMPv6 essential not blockable)

Part XVI — ARP & Layer 2 Discovery

RFC 826/3927/5227 · TCP/IP Illustrated Vol.1 Ch.4

§16.1ARP Operation (request broadcast, unicast reply, ARP cache aging, GC threshold)
§16.2Gratuitous ARP — GARP (announce IP move; MAC change; used by VRRP/keepalived takeover; duplicate-IP detection RFC 5227)
§16.3Proxy ARP (router answers ARP for hosts in another subnet; use cases: PPP, mobile IP, transparent firewall)
§16.4ARP Spoofing & Defenses (DAI on switches, static ARP, arpwatch, arp-scan)
§16.5ARP Sponging (IXP technique, e.g. the AMS-IX arpsponge daemon: answer ARP for unreachable addresses to damp ARP broadcast storms)
§16.6RARP / InARP (legacy — Reverse ARP for diskless boot; Inverse ARP on Frame Relay)
§16.7Linux ARP Table Tuning (gc_thresh1/2/3, base_reachable_time, when ARP table overflows in HPC clusters)

Part XVII — IPv6 Deep Dive

RFC 8200/4861/4862/8106/8415 · IPv6 Essentials (Hagen)

§17.1IPv6 Header (40-byte fixed; flow label; no checksum; no header-level fragmentation)
§17.2Address Architecture (Global Unicast 2000::/3, Link-Local fe80::/10, ULA fc00::/7, Multicast ff00::/8, Anycast)
§17.3Address Types & Scopes (interface-local, link-local, site-local deprecated, global, multicast scopes)
§17.4EUI-64 Interface ID (MAC-derived, U/L bit flip; privacy concerns)
§17.5SLAAC — Stateless Address Autoconfig (RFC 4862 — RA + on-link prefix → host generates address; DAD)
§17.6Privacy Extensions (RFC 8981 — random IID rotation; RFC 7217 stable but opaque IID)
§17.7NDP — Neighbor Discovery (RFC 4861 — RS, RA, NS, NA, Redirect; replaces ARP + ICMP Redirect)
§17.8RA — Router Advertisement (M/O flags, prefix info option, MTU option, route info option)
§17.9DAD — Duplicate Address Detection (NS to solicited-node multicast before claiming)
§17.10Optimistic DAD (RFC 4429) — start using address before DAD completes
§17.11RDNSS / DNSSL (RFC 8106 — DNS resolver via RA, replaces stateless DHCPv6 for DNS)
§17.12DHCPv6 Stateful (IA_NA assignment, RFC 8415, replaces DHCPv4 for managed networks)
§17.13DHCPv6 Stateless (only options like NTP/SIP, address still SLAAC)
§17.14DHCPv6-PD — Prefix Delegation (IA_PD; how home routers get a /56 or /60 from ISP)
§17.15IPv6 Extension Headers (Hop-by-Hop, Routing, Fragment, Destination, AH/ESP — ordering rules)
§17.16IPv6 Transition Mechanisms (dual-stack, 6to4 deprecated, 6rd, Teredo legacy, NAT64+DNS64, MAP-T/MAP-E, 464XLAT)
§17.17IPv6 Multicast & MLD (MLDv1/v2, replaces IGMP, runs over ICMPv6)
§17.18IPv6 Security (RA Guard, DHCPv6 Guard, ND inspection, SAVI, why disabling IPv6 hurts more than helps)
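
Two of the address derivations above are pure bit manipulation: the EUI-64 interface ID from §17.4 (flip the U/L bit, insert ff:fe in the middle of the MAC) and the solicited-node multicast group used by DAD in §17.9 (ff02::1:ff00:0/104 plus the low 24 bits of the address). A sketch with function names chosen here for illustration:

```python
import ipaddress


def eui64_interface_id(mac: str) -> bytes:
    """48-bit MAC 'aa:bb:cc:dd:ee:ff' -> 64-bit modified EUI-64 IID."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 6-octet MAC address")
    octets[0] ^= 0x02                        # flip the universal/local bit
    return bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])


def slaac_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Combine a /64 prefix with the EUI-64 IID derived from the MAC."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("SLAAC with EUI-64 needs a /64 prefix")
    iid = int.from_bytes(eui64_interface_id(mac), "big")
    return ipaddress.IPv6Address(int(net.network_address) | iid)


def solicited_node(addr: str) -> ipaddress.IPv6Address:
    """Solicited-node multicast group for a unicast address."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(base | low24)


addr = slaac_address("fe80::/64", "00:11:22:33:44:55")
print(addr)                       # fe80::211:22ff:fe33:4455
print(solicited_node(str(addr)))  # ff02::1:ff33:4455
```

The MAC is recoverable from such an address, which is the privacy concern that the RFC 8981 and RFC 7217 schemes in §17.6 address.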

Part XVIII — Multicast

RFC 3376/4604/7761/7450 · Doyle Vol.2

§18.1IPv4 Multicast Addressing (224.0.0.0/4, 224.0.0.x link-local, 232/8 SSM, 233/8 GLOP, 239/8 admin-scoped)
§18.2MAC Mapping for Multicast (01:00:5e:00:00:00 + lower 23 bits of group; 32→1 collision)
§18.3IGMP v1 / v2 / v3 (host membership; v3 adds source filter for SSM)
§18.4IGMP Snooping (L2 switch tracks IGMP joins; multicast not flooded; querier election)
§18.5PIM-DM — Dense Mode (flood-and-prune, state refresh, only for small dense groups)
§18.6PIM-SM — Sparse Mode (RP, shared tree (*,G), source tree (S,G), Register encapsulation, SPT switchover)
§18.7PIM-SSM — Source-Specific Multicast (no RP, IGMPv3/MLDv2 host signals (S,G) directly; for IPTV)
§18.8RP Discovery (static, AutoRP, BSR — Bootstrap Router; Anycast-RP via MSDP)
§18.9RPF — Reverse Path Forwarding Check (loop prevention, unicast routing table by default)
§18.10Source Tree vs Shared Tree (latency vs state trade-off; SPT-threshold knob)
§18.11MSDP — Multicast Source Discovery Protocol (inter-domain SA messages, Anycast-RP within a domain)
§18.12MLD v1/v2 (IPv6 host membership over ICMPv6; MLDv2 is SSM-capable)
§18.13Bidir-PIM (RFC 5015 — many-to-many, single shared tree, no source state)
§18.14BIER — Bit Indexed Explicit Replication (RFC 8279 — stateless multicast, ingress encodes bitstring)
§18.15Multicast in EVPN-VXLAN (Type 6/7/8 routes, head-end vs underlay multicast replication)
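
The 32-to-1 collision in §18.2 falls straight out of the mapping: only the low 23 bits of the group address are copied into the MAC, so the 5 variable high-order bits of the group are lost. A short sketch (the function name is chosen here):

```python
import ipaddress


def multicast_mac(group: str) -> str:
    """Map an IPv4 multicast group to its Ethernet MAC (01:00:5e + low 23 bits)."""
    ip = ipaddress.IPv4Address(group)
    if not ip.is_multicast:
        raise ValueError(f"{group} is not in 224.0.0.0/4")
    low23 = int(ip) & 0x7FFFFF
    mac = 0x01005E000000 | low23
    # Emit the 6 octets from most to least significant.
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -1, -8))


print(multicast_mac("224.0.0.1"))    # 01:00:5e:00:00:01
print(multicast_mac("224.128.0.1"))  # 01:00:5e:00:00:01, same MAC
print(multicast_mac("239.255.0.1"))  # 01:00:5e:7f:00:01
```

Because 224.0.0.1 and 224.128.0.1 share a MAC, a NIC filtering on the MAC alone still delivers both groups to the host, and the IP layer has to discard the unwanted one.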

Part XIX — NAT & NAT Traversal

RFC 4787/5128/5389/5766/8489 · WebRTC ICE

§19.1NAT Types (Full Cone, Restricted Cone, Port-Restricted Cone, Symmetric — classic RFC 3489 STUN taxonomy; RFC 4787 recasts these as mapping and filtering behaviors)
§19.2NAPT / PAT — Port Address Translation (port overload, conntrack tuple, port allocation strategies)
§19.3NAT Conntrack (Linux nf_conntrack — tuple, expectations, helpers for FTP/SIP)
§19.4Hairpin / NAT Loopback (internal client → public IP → back to internal server)
§19.5Carrier-Grade NAT — CGN / NAT444 (port-block allocation, IPv4 exhaustion mitigation, logging volume)
§19.6NAT64 + DNS64 (RFC 6146/6147 — IPv6-only client to IPv4 server)
§19.7 464XLAT (RFC 6877 — CLAT on phone + NAT64 in network; T-Mobile US)
§19.8NPTv6 (stateless 1:1 IPv6 prefix translation, RFC 6296)
§19.9STUN (RFC 8489 — discover external mapping, NAT type detection)
§19.10TURN (RFC 8656 — relay server when direct fails; allocation, permissions)
§19.11ICE (RFC 8445 — gather candidates → pair → connectivity checks → nominate; trickle ICE)
§19.12UDP Hole Punching (mutual STUN, simultaneous send; works for cone NATs, fails for symmetric)
§19.13TCP Hole Punching (TCP simultaneous open; SYN crossing; sequence & state machine challenges)
§19.14UPnP IGD / NAT-PMP / PCP (router-mediated mapping; PCP is the modern winner)
§19.15Tailscale / WireGuard / Nebula NAT Traversal (DERP relays, peer-to-peer establishment)
§19.16WebRTC End-to-End (signaling out-of-band, ICE for media, DTLS-SRTP for security)

Part XX — DHCP

RFC 2131 · RFC 8415 (DHCPv6)

§20.1DHCPv4 DORA (Discover broadcast → Offer → Request → Ack; xid correlation; lease renewal T1/T2)
§20.2DHCP Options (1=mask, 3=router, 6=DNS, 12=hostname, 43=vendor, 51=lease, 55=PRL, 60/61=class/client-id)
§20.3DHCP Option 82 (relay agent info — circuit-ID, remote-ID; used by DHCP snooping & ISP CGNAT)
§20.4DHCP Option 121 / 249 (classless static routes; how to push routes via DHCP)
§20.5DHCP Relay (UDP broadcast on access VLAN → unicast to server; ip helper-address)
§20.6DHCP Snooping (security — only trusted ports may answer; binding table feeds DAI/IPSG)
§20.7DHCP Server Implementations (ISC dhcpd legacy, Kea modern, dnsmasq for SOHO, Windows DHCP)
§20.8DHCP HA / Failover (ISC failover protocol, Kea HA hooks, primary/secondary lease split)
§20.9DHCPv6 (UDP/546-547, multicast ff02::1:2; SOLICIT/ADVERTISE/REQUEST/REPLY; M/O flags interplay with SLAAC)
§20.10DHCPv6 Prefix Delegation (IA_PD — how home routers get /56 from ISP)
§20.11PXE Boot / iPXE (next-server in siaddr, TFTP server name option 66, boot file option 67, UEFI HTTP boot)

Part XXI — High Availability

RFC 5798 (VRRP) · RFC 5880 (BFD) · keepalived docs

§21.1HSRP (Cisco — virtual IP/MAC 0000.0c07.acXX, active/standby, group, priority, preempt)
§21.2VRRP (RFC 5798 — open standard, virtual MAC 0000.5e00.01XX, master/backup election)
§21.3GLBP (Cisco — load-balancing FHRP, AVG + AVF, weighted vs round-robin vs host-dependent)
§21.4keepalived (Linux VRRPv2/v3 daemon, healthcheck scripts, IPVS director integration)
§21.5BFD — Bidirectional Forwarding Detection (RFC 5880, sub-second protocol-agnostic detection; async/demand/echo modes)
§21.6BFD Multi-hop (RFC 5883 — for iBGP / RR / IPsec tunnels)
§21.7Anycast HA (BGP-injected /32 from healthy node; DNS root, public DNS 1.1.1.1 / 8.8.8.8)
§21.8MC-LAG / MLAG (multi-chassis link aggregation, dual-active forwarding, peer-link, peer-keepalive)
§21.9LVS / IPVS Modes (DR — direct routing same L2; NAT — return through LB; TUN — IP-in-IP)
§21.10L4 LB Architectures (Maglev consistent hashing, Katran XDP, GitHub GLB, Cloudflare Unimog)
§21.11L7 LB / Proxy (HAProxy, NGINX, Envoy — health checks, retries, circuit breaker, outlier detection)
§21.12Stateful Firewall / NAT HA (conntrackd, pacemaker, session sync)
§21.13DNS-Based Failover (low TTL, GeoDNS, weighted policy)
§21.14Cluster Resource Manager (Pacemaker + Corosync, Linux-HA, fencing/STONITH)

Part XXII — RDMA & RoCE

IBTA Vol.1/2 · RoCE v2 (IBTA Annex A17) · Mellanox / NVIDIA docs · interview-critical

§22.1Why RDMA — kernel bypass, zero-copy, CPU offload (motivation: 100/200/400/800 GbE saturating CPU memcpy)
§22.2RDMA Operations (SEND/RECV — two-sided; RDMA WRITE / RDMA READ — one-sided; ATOMIC fetch-add / compare-swap)
§22.3RDMA Verbs API — libibverbs (PD, MR, CQ, QP, WR, WC; ibv_post_send / ibv_post_recv)
§22.4Queue Pair States (RESET → INIT → RTR → RTS → SQD/SQE → ERR; transitions via ibv_modify_qp)
§22.5Memory Registration (MR, lkey/rkey, ODP — On-Demand Paging, FRWR — Fast Reg WR)
§22.6Transport Types (RC — Reliable Connection, UC, UD — Unreliable Datagram, XRC — eXtended Reliable Connection)
§22.7InfiniBand Fundamentals (HCA, subnet manager OpenSM, LID/GID, GUID, SL/VL — service level / virtual lanes)
§22.8RoCE v1 — RDMA over Ethernet (L2-only, ethertype 0x8915, no IP routing)
§22.9RoCE v2 — RDMA over UDP/IP (UDP/4791, routable, used in cloud DC fabrics; uses BTH header)
§22.10iWARP — RDMA over TCP (RFC 5040 — older, less performant, but tolerates lossy networks)
§22.11PFC — Priority Flow Control (802.1Qbb — per-priority pause, lossless class for RoCE)
§22.12ECN & DCQCN (Data Center QCN — RoCE congestion control, ConnectX hardware-offloaded reaction)
§22.13PFC Deadlock & Headroom Buffering (cyclic dependency on credits; DCBX exchange; CC vs PFC roles)
§22.14Lossless Ethernet Design (DCB stack — PFC + ETS + DCBX; mlnx_qos tooling)
§22.15Adaptive Routing & Flowlet (NVIDIA AR; per-packet vs per-flow vs flowlet-level spraying)
§22.16RDMA in Storage (NVMe-oF over RDMA, NFS over RDMA, SMB Direct, Ceph BlueStore msgr2 RDMA)
§22.17RDMA Diagnostics (perftest ib_send_bw / ib_write_bw, ibv_devinfo, ibstat, mlx5dump, NVIDIA NEO/UFM)
§22.18RDMA in K8s (SR-IOV, Multus, RDMA CNI, GPU-Operator, Network Operator)

Part XXIII — AI Training & Inference Networking

NCCL docs · NVIDIA Spectrum-X · Meta RoCE · OCP HPN — interview-critical, intentionally deep

§23.1Why AI Networking Is Different (synchronous bulk-synchronous traffic, all-to-all incast, microsecond tail-latency)
§23.2Collective Communication Primitives (Broadcast, Reduce, AllReduce, AllGather, ReduceScatter, AllToAll, Scatter, Gather, Barrier)
§23.3AllReduce Algorithms — Ring (2(N-1) bandwidth-optimal steps, used by NCCL default for large messages)
§23.4AllReduce Algorithms — Tree (latency-optimal, log N depth; NCCL Tree for small messages)
§23.5AllReduce Algorithms — Halving-Doubling, Recursive Doubling, Hierarchical (multi-node + intra-node split)
§23.6AllToAll Patterns (used in MoE expert routing, sequence parallelism — most network-stressful collective)
§23.7NCCL — NVIDIA Collective Communications Library (architecture: comm, channel, proxy thread, work queue)
§23.8NCCL Topology Detection (PCIe / NVLink / NVSwitch / CPU NUMA / NIC affinity → graph search → optimal channels)
§23.9NCCL Transports (Shared Memory intra-node, P2P over PCIe/NVLink, IB Verbs RDMA, Sockets fallback)
§23.10NCCL Tuning (NCCL_ALGO ring/tree, NCCL_PROTO LL/LL128/Simple, NCCL_NTHREADS, NCCL_BUFFSIZE, NCCL_IB_HCA)
§23.11NCCL Plugin API (Networking plugin, e.g. AWS OFI plugin for EFA, Microsoft MSCCL plugin for custom algos)
§23.12MSCCL / MSCCLPP (Microsoft programmable collectives — XML algo description, GPU-driven for inference)
§23.13RCCL (AMD ROCm fork of NCCL for MI300/MI250)
§23.14Gloo / MPI Alternatives (Gloo CPU, OpenMPI, MVAPICH2-GDR, UCX, OneCCL — when not NCCL)
§23.15NVLink / NVSwitch (intra-node fabric — 5th-gen NVLink 1.8 TB/s, NVL72 rack-scale, NVLink-C2C for Grace-Hopper)
§23.16GPUDirect RDMA (GPU memory ↔ NIC without host bounce — ConnectX + NVIDIA driver path)
§23.17GPUDirect Storage / GDS (GPU ↔ NVMe direct via cuFile + nvidia-fs)
§23.18Rail-Optimized Fat-Tree / Clos for AI (per-GPU rail to dedicated leaf, 8 rails per server, no rail crossing)
§23.19NVIDIA Spectrum-X (Spectrum-4 + BlueField-3 + adaptive routing + congestion control + DDP, Direct Data Placement)
§23.20Meta AI Backend Network (RoCE-based, 24K-GPU clusters, FBOSS, dual-rail design)
§23.21OCP Hyperscale Network for AI / SONiC AI Optimizations (open AI fabric reference)
§23.22Adaptive Routing in AI Fabrics (per-packet spraying with reordering tolerance via NIC, flowlet, IB AR)
§23.23ECN/PFC Tuning for AI (DCQCN target rate, headroom, watchdog timer; lossless gotchas)
§23.24Congestion Control for AI — HPCC (INT-based), Swift (delay-based), AWS SRD (Annapurna Labs), EQDS (receiver-driven)
§23.25Inference Networking — Disaggregated Prefill/Decode (PD disaggregation, KV-cache transfer over RDMA)
§23.26Tensor Parallelism / Pipeline Parallelism / Expert Parallelism Traffic Patterns (TP all-reduce per-layer, PP send/recv, EP all-to-all)
§23.27Communication Frameworks (PyTorch DDP / FSDP, Megatron-LM, DeepSpeed ZeRO — what each demands of the network)
§23.28AWS EFA / Google JCT / Azure SDN-AI (cloud AI fabric implementations)
§23.29Backend AI vs Frontend AI (training cluster vs inference serving — different latency/throughput profiles)
§23.30Worked Example — Tracing One AllReduce (8-GPU node, 2-node, 16 GPUs total: ring chunks, schedule, RDMA WR queue)
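
The 2(N-1) steps of ring AllReduce in §23.3 split into N-1 reduce-scatter steps and N-1 all-gather steps. The simulation below is a plain-Python sketch, not NCCL's implementation: each rank holds N chunks (one number per chunk for brevity), and in every step each rank sends exactly one chunk to its right-hand neighbor. The function name and data layout are assumptions made for illustration.

```python
def ring_allreduce(data):
    """data[rank][chunk] -> (buffers after allreduce, number of steps)."""
    n = len(data)
    buf = [list(row) for row in data]
    steps = 0

    # Phase 1, reduce-scatter: in step s, rank r sends chunk (r - s) mod n
    # and the receiver adds it. Afterwards rank r owns the fully reduced
    # chunk (r + 1) mod n.
    for s in range(n - 1):
        # Snapshot outgoing values first so all sends happen "at once".
        sends = [(r, (r - s) % n, buf[r][(r - s) % n]) for r in range(n)]
        for r, chunk, value in sends:
            buf[(r + 1) % n][chunk] += value
        steps += 1

    # Phase 2, all-gather: in step s, rank r forwards chunk (r + 1 - s) mod n
    # and the receiver overwrites its stale partial sum.
    for s in range(n - 1):
        sends = [(r, (r + 1 - s) % n, buf[r][(r + 1 - s) % n]) for r in range(n)]
        for r, chunk, value in sends:
            buf[(r + 1) % n][chunk] = value
        steps += 1

    return buf, steps


ranks = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400], [1000, 2000, 3000, 4000]]
result, steps = ring_allreduce(ranks)
print(steps, result[0])
```

Each rank sends 2(N-1)/N of its buffer in total, independent of N for large N, which is why the ring is bandwidth-optimal but pays N-proportional latency.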

Part XXIV — SPDK — Storage Performance Dev Kit

SPDK docs · DPDK shared model · Intel/NVMe

§24.1SPDK Motivation (kernel I/O stack overhead, polling > interrupts at >1M IOPS, kernel-bypass storage parallel to DPDK)
§24.2SPDK Architecture (event framework, reactor per lcore, message-passing thread model, no shared mutable state)
§24.3User-Space NVMe Driver (PCI BAR mmap, SQ/CQ doorbells, MSI-X interrupts via VFIO eventfd, polling preferred)
§24.4SPDK BDEV Layer (block device abstraction, drivers for NVMe / AIO / virtio-blk / Ceph RBD / iSCSI / NVMe-oF initiator)
§24.5NVMe-oF Target — Transports (TCP per the NVM Express TCP Transport Specification, RDMA, FC; subsystems, namespaces, controllers)
§24.6BlobStore & BlobFS (lightweight storage abstraction; not a POSIX FS — used as a RocksDB backend)
§24.7vhost-user-blk / vhost-user-scsi (zero-copy VM I/O — virtqueue shared mem with QEMU)
§24.8SPDK + DPDK Shared Model (memory allocator rte_malloc, mempool, ring; runs as DPDK secondary or unified)
§24.9SPDK NVMe-oF Performance (1M+ IOPS per CPU core, sub-10µs latency over RDMA)
§24.10Real-World Deployments (Alibaba PolarStore, Ceph BlueStore + SPDK, AWS Nitro storage, Azure Premium SSD v2)
§24.11SPDK vs io_uring (when each wins — full bypass vs in-kernel batched async)

Part XXV — I/O Multiplexing — select / poll / epoll

TLPI Ch.63 · Linux source fs/select.c, fs/eventpoll.c · interview-critical

§25.1Five I/O Models (blocking, non-blocking, I/O multiplexing, signal-driven, async I/O — Stevens UNP Ch.6)
§25.2select(2) — fd_set bitmap, FD_SETSIZE=1024, copy in/out every call, O(n) full scan; sys_select() in kernel
§25.3select Internals (fs/select.c — do_select() loop: poll_wait()/sets bits, wait via __pollwait, restartable timeout)
§25.4select Limitations (1024 fd cap, expensive setup, no edge-trigger, returns count not list)
§25.5poll(2) — pollfd array, no FD_SETSIZE limit, still O(n) scan, still copy-in/out every call
§25.6poll Internals (do_poll() builds wait queues per fd, walks list)
§25.7epoll Architecture — Big Picture (interest set persistent in kernel; ready list maintained on event; O(1) wait)
§25.8epoll Kernel Data Structures (struct eventpoll: rbr RB-tree of registered fds, rdllist ready list, ovflist; struct epitem)
§25.9epoll_create / epoll_create1 (creates anon inode; returns fd; EPOLL_CLOEXEC flag on epoll_create1)
§25.10epoll_ctl ADD / MOD / DEL (insert/update/remove epitem; hooks into target fd's wait queue via ep_ptable_queue_proc)
§25.11epoll_wait — How Events Reach Ready List (target fd's poll callback ep_poll_callback fires, splices epitem onto rdllist, wakes waiters)
§25.12Level-Triggered (LT) — Default Semantics (re-reports as long as condition holds; safer; equivalent to poll)
§25.13Edge-Triggered (EPOLLET) — One Notify Per Transition (must drain till EAGAIN; non-blocking fd required)
§25.14EPOLLONESHOT (one-shot, must rearm via EPOLL_CTL_MOD; clean handoff between threads)
§25.15EPOLLEXCLUSIVE (Linux 4.5+; wakes only one waiter; mitigates accept thundering herd in multi-process listeners)
§25.16epoll Drain Rule (with ET: read until EAGAIN; with LT: optional but faster with batching)
§25.17Common Pitfalls (close() of fd auto-removes from epoll only if last ref; dup'd fds trap; TOCTOU on unregister)
§25.18epoll vs kqueue vs IOCP (BSD/macOS unified event filters; Windows completion-based vs readiness-based)
§25.19epoll vs io_uring (readiness vs true async; io_uring SQ/CQ shared rings; multishot + zero-copy)
§25.20Reactor Pattern (epoll_wait → dispatch → handler — one thread per loop)
§25.21Proactor Pattern (true async completion — io_uring, IOCP)
§25.22Worked Example — Echo Server Progression (select → poll → epoll-LT → epoll-ET; benchmarks; pitfalls at each step)
§25.23Worked Example — High-Concurrency HTTP Server with epoll-ET + accept4 + SO_REUSEPORT
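
The level-triggered semantics in §25.12 and the drain rule in §25.16 can be observed without writing C. The following is a minimal sketch, assuming Python's standard `selectors` module (which picks epoll on Linux and kqueue on BSD/macOS, both in level-triggered mode); the function name `demo_level_triggered` is hypothetical. It does not demonstrate EPOLLET, only why a partially read fd keeps being reported and how reading until EAGAIN ends that.

```python
import selectors
import socket


def demo_level_triggered():
    """Show that a level-triggered selector re-reports an fd until it is drained."""
    a, b = socket.socketpair()
    b.setblocking(False)                # non-blocking so the drain loop can hit EAGAIN
    sel = selectors.DefaultSelector()   # epoll on Linux, kqueue on BSD/macOS
    sel.register(b, selectors.EVENT_READ)
    observations = []
    drained = 0
    try:
        a.sendall(b"0123456789")
        # First wait: data is pending, so the fd is reported ready.
        observations.append(len(sel.select(timeout=1)))
        b.recv(4)                       # partial read leaves 6 bytes buffered
        # Second wait: the condition still holds, so LT reports it again.
        observations.append(len(sel.select(timeout=0)))
        # Drain rule: keep reading until the kernel returns EAGAIN/EWOULDBLOCK.
        while True:
            try:
                chunk = b.recv(4)
            except BlockingIOError:
                break
            if not chunk:               # peer closed the connection
                break
            drained += len(chunk)
        # Third wait: nothing is buffered, so nothing is reported.
        observations.append(len(sel.select(timeout=0)))
    finally:
        sel.close()
        a.close()
        b.close()
    return observations, drained
```

With edge-triggered registration the second wait would report nothing, which is why §25.13 requires the drain loop rather than treating it as an optimization.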

Part XXVI — Event Loop Libraries — libev / libevent / libuv

libev manual · libevent book · libuv design · interview-critical

§26.1Why Wrap epoll/kqueue/IOCP — portability, watcher abstraction, timer wheel, signal safety
§26.2Library Landscape (libev — minimalist by Marc Lehmann; libevent — older, more features; libuv — Node.js, cross-platform incl. Windows)
§26.3libev Architecture — Loops & Watchers (one struct ev_loop, many ev_*_watcher embedded into user struct)
§26.4libev Watcher Types (ev_io fd readiness, ev_timer relative, ev_periodic absolute/repeating, ev_signal, ev_child SIGCHLD, ev_stat inotify, ev_idle, ev_prepare/check loop hooks, ev_async cross-thread wakeup, ev_embed nested loop, ev_fork)
§26.5libev Backend Selection (auto: epoll on Linux, kqueue on BSD/Mac, port on Solaris, poll/select fallback; EVBACKEND_* flags)
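§26.6libev Core Loop (ev_run / ev_loop) — Phases (1. fork check + fork watchers → 2. prepare watchers → 3. fd reify (push pending changes to the backend) → 4. update ev_now, compute block timeout → 5. backend poll → 6. queue I/O events, expired timers and periodics, idle watchers → 7. queue check watchers → 8. invoke pending → repeat)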
§26.7libev Timer Implementation (4-heap min-heap; O(log n) insert/extract; ev_now caching to avoid repeated clock_gettime)
§26.8libev fd-to-Watchers Map (ANFD array indexed by fd; multiple watchers per fd via linked list; reify on next loop iteration)
§26.9libev Priority Queue (priority -2..+2; pending events queued by priority; invoke_pending walks from highest)
§26.10libev Signal Handling — Safe Async (signalfd if available; pipe-based wakeup fallback; ev_signal watcher coalesces deliveries)
§26.11libev Fork Handling (ev_loop_fork: re-arm epoll fd in child, re-register signals; child should not call old loop)
§26.12libev Threading Model (one loop per thread; ev_async only safe cross-thread API; ev_loop is NOT thread-safe)
§26.13libev Embed Watcher (run a child loop inside parent — used to mix backends, e.g. select inside epoll loop)
§26.14libev vs libevent (libev = simpler, faster, less feature creep; libevent = HTTP/RPC helpers, evbuffer, evdns; its old global-base event_init() API is deprecated in favor of an explicit event_base)
§26.15libuv Internals — Cross-Platform (epoll/kqueue/IOCP/event ports; thread pool for FS + DNS; req-based async file I/O)
§26.16Worked Example — libev Echo Server (ev_io accept watcher → spawn ev_io read watcher per conn; ev_timer idle reaper)
§26.17Worked Example — Tearing Down libev for an Interviewer (loop-by-loop walkthrough; how each watcher type maps to a kernel mechanism)
§26.18Common Pitfalls (forgetting ev_io_stop on close; ET vs LT mismatch — libev assumes LT; not draining means infinite wakeups)
§26.19Choosing Library (libev for embedded / minimalist; libevent for HTTP; libuv for cross-platform Node-like; raw epoll if you want zero abstraction)
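
The timer handling described in §26.7 (a heap of deadlines, with the nearest deadline becoming the backend's poll timeout) can be sketched as follows. This is an illustration under stated assumptions, not libev's code: libev uses a 4-ary heap in C, while this sketch swaps in Python's binary `heapq`, and the class and method names (`TimerHeap`, `next_timeout`, `run_expired`) are hypothetical. The caller passes `now` explicitly, which mirrors the idea of caching the loop time instead of reading the clock repeatedly.

```python
import heapq
import itertools


class TimerHeap:
    """Min-heap of (deadline, seq, callback). Hypothetical helper, not libev's API."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()   # tie-breaker so callbacks are never compared

    def add(self, now, after, callback):
        """Schedule callback to fire `after` seconds past `now`."""
        heapq.heappush(self._heap, (now + after, next(self._seq), callback))

    def next_timeout(self, now):
        """Seconds the backend may block; None means block indefinitely."""
        if not self._heap:
            return None
        return max(0.0, self._heap[0][0] - now)

    def run_expired(self, now):
        """Pop and invoke every timer whose deadline has passed; return how many fired."""
        fired = 0
        while self._heap and self._heap[0][0] <= now:
            _, _, callback = heapq.heappop(self._heap)
            callback()
            fired += 1
        return fired


log = []
timers = TimerHeap()
timers.add(now=0.0, after=5.0, callback=lambda: log.append("idle-reaper"))
timers.add(now=0.0, after=1.5, callback=lambda: log.append("retry"))
timeout = timers.next_timeout(now=0.0)  # value handed to the backend's wait call
```

An event loop built this way would pass `timeout` to its epoll_wait equivalent, then call `run_expired` with the refreshed loop time after the wait returns.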

Appendix — Common Protocols & Well-Known Ports

Protocol	Transport / Port	Notes
`DNS`	UDP/53, TCP/53, DoT TCP/853, DoH TCP/443	UDP for queries, TCP for >512B / zone xfer
`DHCP / BOOTP`	UDP/67 server, UDP/68 client	Broadcast at L2 then unicast
`DHCPv6`	UDP/547 server, UDP/546 client	ff02::1:2 link-scoped multicast
`HTTP / HTTPS`	TCP/80, TCP/443; HTTP/3 UDP/443	QUIC over UDP for HTTP/3
`SSH`	TCP/22	Default for sshd, scp, sftp, ssh tunnels
`BGP`	TCP/179	MD5 / TCP-AO authentication
`OSPF`	IP proto 89	224.0.0.5 (all SPF), 224.0.0.6 (DR)
`EIGRP`	IP proto 88	224.0.0.10
`IS-IS`	L2 directly (no IP)	AllL1ISs / AllL2ISs MAC
`VRRP`	IP proto 112	224.0.0.18
`PIM`	IP proto 103	224.0.0.13
`IGMP`	IP proto 2	v2 gen-query 224.0.0.1
`LDP`	TCP/646, UDP/646 hello	Targeted hellos for remote LDP
`GRE`	IP proto 47	Generic encap; classic tunnel
`IPsec ESP`	IP proto 50	AH IP proto 51, IKE UDP/500, NAT-T UDP/4500
`VXLAN`	UDP/4789 (RFC 7348)	Linux historically used 8472 (pre-IANA)
`Geneve`	UDP/6081	Variable-length TLV options
`WireGuard`	UDP (configurable, 51820 default)	Single UDP port
`NVMe-oF / TCP`	TCP/4420	NVM Express TCP transport spec (not an RFC); discovery service on TCP/8009; NVMe/RDMA over RoCE v2 rides UDP/4791
`RoCE v2`	UDP/4791	BTH header inside UDP
`RDMA CM`	No well-known port; application-chosen port in the RDMA CM port space (e.g. RDMA_PS_TCP)	CM exchanges QP numbers during connection setup; on RoCE v2 the CM messages ride UDP/4791
`NTP`	UDP/123	Stratum hierarchy, leap seconds
`Syslog`	UDP/514, TCP/6514 TLS	RFC 5424 structured data
`SNMP`	UDP/161 query, UDP/162 trap	v3 has authPriv security
`NetFlow / IPFIX`	UDP/2055 / 4739	Templated flow records
`BFD`	UDP/3784 single-hop, UDP/4784 multi-hop, UDP/3785 echo	Sub-second liveness

Appendix — TCP State Machine Quick Reference

State	Side	Triggered by	Next on normal path
`CLOSED`	both	Initial / after teardown	LISTEN (server) / SYN-SENT (client)
`LISTEN`	server	listen()	SYN-RCVD on incoming SYN
`SYN-SENT`	client	connect() sends SYN	ESTABLISHED on SYN-ACK
`SYN-RCVD`	server	Got SYN, sent SYN-ACK	ESTABLISHED on ACK
`ESTABLISHED`	both	Handshake complete	FIN-WAIT-1 (active close) / CLOSE-WAIT (passive close)
`FIN-WAIT-1`	active closer	close() sends FIN	FIN-WAIT-2 (ACK only) / CLOSING (FIN crosses) / TIME-WAIT (FIN+ACK)
`FIN-WAIT-2`	active closer	Peer ACKed our FIN	TIME-WAIT on peer's FIN
`CLOSE-WAIT`	passive closer	Got peer's FIN	LAST-ACK after own close()
`LAST-ACK`	passive closer	Sent FIN after peer's	CLOSED on peer's ACK
`CLOSING`	both (rare)	Simultaneous close — FIN crossing	TIME-WAIT on ACK
`TIME-WAIT`	active closer	Final ACK sent	CLOSED after 2*MSL
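
The table above can be read as a lookup from (state, event) to next state. A minimal sketch of that reading follows; the event labels (`rcv_syn`, `timeout_2msl`, and so on) and the `walk` helper are illustrative names chosen here, not identifiers from any real TCP stack, and only the normal-path transitions listed in the table are encoded.

```python
# (state, event) -> next state, transcribed from the quick-reference table.
TRANSITIONS = {
    ("CLOSED", "listen"): "LISTEN",
    ("CLOSED", "connect"): "SYN-SENT",           # connect() sends SYN
    ("LISTEN", "rcv_syn"): "SYN-RCVD",           # replies with SYN-ACK
    ("SYN-SENT", "rcv_synack"): "ESTABLISHED",
    ("SYN-RCVD", "rcv_ack"): "ESTABLISHED",
    ("ESTABLISHED", "close"): "FIN-WAIT-1",      # active close sends FIN
    ("ESTABLISHED", "rcv_fin"): "CLOSE-WAIT",    # passive close
    ("FIN-WAIT-1", "rcv_ack"): "FIN-WAIT-2",
    ("FIN-WAIT-1", "rcv_fin"): "CLOSING",        # FINs crossed
    ("FIN-WAIT-1", "rcv_fin_ack"): "TIME-WAIT",
    ("FIN-WAIT-2", "rcv_fin"): "TIME-WAIT",
    ("CLOSE-WAIT", "close"): "LAST-ACK",
    ("LAST-ACK", "rcv_ack"): "CLOSED",
    ("CLOSING", "rcv_ack"): "TIME-WAIT",
    ("TIME-WAIT", "timeout_2msl"): "CLOSED",
}


def walk(start, events):
    """Return the list of states visited when `events` are applied in order."""
    state = start
    path = [state]
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"no transition from {state} on {event!r}")
        state = TRANSITIONS[key]
        path.append(state)
    return path


client = walk("CLOSED", ["connect", "rcv_synack", "close",
                         "rcv_ack", "rcv_fin", "timeout_2msl"])
server = walk("CLOSED", ["listen", "rcv_syn", "rcv_ack",
                         "rcv_fin", "close", "rcv_ack"])
```

Here `client` traces the active closer, which passes through TIME-WAIT, and `server` traces the passive closer, which passes through CLOSE-WAIT and LAST-ACK instead.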

Appendix — Congestion Control Algorithms Cheat Sheet

Algo	Signal	cwnd Behavior	Best For
`Tahoe`	Loss (3 dup ACK or RTO)	cwnd = 1, slow start to ssthresh = cwnd/2	Historical baseline
`Reno`	Loss (3 dup ACK)	cwnd = ssthresh = cwnd/2 + fast recovery	Low-loss small-RTT links
`NewReno`	Loss + partial ACK	Stay in fast recovery for multiple losses	Pre-SACK era; still default fallback
`CUBIC`	Loss (cubic function of t since loss)	Cubic concave then convex around W_max	Long-haul high-BDP TCP — Linux default
`BBR v1`	Bandwidth × min-RTT (model-based)	Pace at estimated BtlBw; cap inflight at a multiple of BtlBw × min-RTT; no multiplicative cwnd cut on isolated loss	Long-haul, lossy, video/CDN
`BBR v2`	BBR signal + ECN + loss	Adds ECN response and CUBIC-fairness	DC + WAN mixed traffic
`Vegas`	RTT increase (delay-based)	Reduce on RTT growth, no loss needed	Low-loss links; loses to Reno in mixed
`Westwood`	Loss + bandwidth estimate	ssthresh = bw * min-RTT after loss	Wireless / lossy links
`DCTCP`	ECN-CE fraction	α-weighted multiplicative decrease per round	Data center fabrics with ECN-marking switches
`CTCP`	Loss + delay	AIMD + delay-based component (Microsoft)	Windows long-haul
`HTCP`	Loss, time-since-loss	Aggressive cwnd growth on long no-loss periods	Very high-BDP scientific links
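
Three of the cwnd reactions in the table are simple enough to write down directly. The sketch below works in units of segments and makes two assumptions beyond the table: the ssthresh floor of 2 segments follows RFC 5681, and the DCTCP gain `g = 1/16` is the value recommended in RFC 8257. The function names are hypothetical, and Reno's temporary window inflation during fast recovery is omitted.

```python
def tahoe_on_loss(cwnd):
    """Tahoe: ssthresh = cwnd/2, then restart slow start from cwnd = 1."""
    ssthresh = max(cwnd / 2, 2)
    return 1, ssthresh


def reno_on_triple_dupack(cwnd):
    """Reno: ssthresh = cwnd/2 and cwnd resumes from ssthresh (no slow-start restart)."""
    ssthresh = max(cwnd / 2, 2)
    return ssthresh, ssthresh


def dctcp_round(cwnd, alpha, marked, total, g=1 / 16):
    """DCTCP per-window update: alpha is a moving average of the ECN-marked fraction."""
    fraction = marked / total if total else 0.0
    alpha = (1 - g) * alpha + g * fraction
    if marked:
        # Cut in proportion to congestion extent rather than always halving.
        cwnd = max(cwnd * (1 - alpha / 2), 1)
    return cwnd, alpha
```

For a window of 40 segments, Tahoe returns to cwnd = 1 with ssthresh = 20 while Reno continues from 20; DCTCP halves only when alpha has reached 1, that is, when every packet in recent windows was marked.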

Appendix — I/O Multiplexing API Comparison

API	OS	Style	Complexity	Limit	Notes
`select`	POSIX everywhere	Readiness	O(n) scan, O(n) copy	FD_SETSIZE = 1024	Bitmap in/out, oldest, broken at scale
`poll`	POSIX everywhere	Readiness	O(n) scan, O(n) copy	RLIMIT_NOFILE	Better than select; still O(n)
`epoll`	Linux 2.6+	Readiness	O(1) wait, O(log n) ctl	RLIMIT_NOFILE	ET / LT, EPOLLEXCLUSIVE, persistent kernel state
`kqueue`	BSD / macOS	Readiness + filters	O(1) wait	kern.maxfilesperproc	Filters on fs, signals, timers, processes
`IOCP`	Windows	Completion	O(1)	—	True async — kernel completes I/O, posts to queue
`io_uring`	Linux 5.1+	Completion (true async)	O(1) batched	RLIMIT_NOFILE	SQ/CQ shared rings, SQPOLL, multishot, registered FDs/buffers
`AIO (libaio)`	Linux	Completion	O(1) batched	—	Only O_DIRECT; effectively replaced by io_uring
`POSIX AIO`	POSIX	Completion	User-thread emulation	—	Slow — glibc emulates with threads

Appendix — High Availability Mechanisms

Mechanism	Layer	Failover Time	Common Use
`HSRP`	L3 first-hop (Cisco)	~3-10s default, sub-second tuned	Default-gateway redundancy on access network
`VRRP`	L3 first-hop (RFC)	~3s default, sub-second tuned	Open-standard FHRP, used by keepalived
`GLBP`	L3 first-hop + LB (Cisco)	Like HSRP	Active-active gateway load balancing
`BFD`	L3-agnostic liveness	<50ms typical	Speed up OSPF/BGP/static convergence
`MC-LAG / vPC`	L2	Sub-second	Server multi-homing without STP blocking
`StackWise / VSS`	Chassis	SSO switchover sub-second; RPR slower (line cards reload)	Two physical → one logical control plane
`Anycast (BGP)`	L3 routed	BGP convergence (1-30s)	DNS, CDN, public services
`LFA / TI-LFA`	IGP	<50ms	IGP-driven sub-50ms protection
`MPLS FRR`	MPLS	<50ms	RSVP-TE backup tunnels
`Pacemaker / Corosync`	Service	Seconds	Resource manager + STONITH
`Keepalived + IPVS`	L4 LB	Sub-second	Linux virtual server with VRRP failover

Appendix — AI Collective Operations Cheat Sheet

Op	What it does	Bandwidth Cost (N ranks, M bytes)	When used
`Broadcast`	1 → all (root sends to everyone)	M (per link in tree)	Initial weight distribution
`Reduce`	all → 1 (sum/max/min at root)	M	Aggregating loss / metrics to rank 0
`AllReduce`	all → all of reduced value	2M(N-1)/N (ring; bandwidth-optimal)	DDP / FSDP gradient sync — most common
`AllGather`	concat tensors from all → all	M(N-1)/N	FSDP unshard, sequence parallelism gather
`ReduceScatter`	elementwise reduce + scatter slices	M(N-1)/N	FSDP gradient pre-shard; half of AllReduce
`AllToAll`	rank i sends slice j to rank j	M(N-1)/N — but every pair	MoE expert dispatch, sequence parallelism
`Scatter`	1 → all (each gets a slice)	M(N-1)/N	Initial data partitioning
`Gather`	all → 1 (concatenate slices)	M(N-1)/N	Collect outputs to rank 0
`Barrier`	synchronize without data	log N rounds	Phase boundaries
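
The 2M(N-1)/N figure for ring AllReduce comes from running a reduce-scatter followed by an all-gather, each taking N-1 steps of M/N elements per rank. The following is a small single-process simulation of that schedule, offered as a sketch: ranks are plain Python lists, "sending" is a list copy, and the function name `ring_allreduce` and its return shape are choices made here rather than any library's API. It assumes the vector length is divisible by N.

```python
def ring_allreduce(vectors):
    """Simulate ring AllReduce (sum) over N ranks.

    Returns (per-rank result vectors, number of elements each rank sent).
    """
    n = len(vectors)
    size = len(vectors[0])
    assert size % n == 0, "sketch assumes the vector length divides evenly by N"
    c = size // n                       # elements per chunk
    data = [list(v) for v in vectors]
    sent = [0] * n

    def chunk(i):
        return slice(i * c, (i + 1) * c)

    # Phase 1, reduce-scatter: in step s, rank r sends chunk (r - s) mod n to
    # rank r + 1, which adds it into its own copy of that chunk.
    for s in range(n - 1):
        msgs = [(r, (r - s) % n, data[r][chunk((r - s) % n)]) for r in range(n)]
        for r, idx, payload in msgs:
            dst = (r + 1) % n
            sl = chunk(idx)
            data[dst][sl] = [x + y for x, y in zip(data[dst][sl], payload)]
            sent[r] += len(payload)

    # Rank r now holds the fully reduced chunk (r + 1) mod n.
    # Phase 2, all-gather: in step s, rank r forwards chunk (r + 1 - s) mod n.
    for s in range(n - 1):
        msgs = [(r, (r + 1 - s) % n, data[r][chunk((r + 1 - s) % n)]) for r in range(n)]
        for r, idx, payload in msgs:
            dst = (r + 1) % n
            data[dst][chunk(idx)] = payload
            sent[r] += len(payload)

    return data, sent


inputs = [[rank * 10 + i for i in range(8)] for rank in range(4)]
results, sent = ring_allreduce(inputs)
```

With N = 4 ranks and M = 8 elements, each rank sends 2 × 8 × 3 / 4 = 12 elements, matching the table's formula, and the per-rank traffic does not grow with N beyond the (N-1)/N factor.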

Appendix — RDMA Verb Operations

Operation	Sided	Receiver CPU?	Notes
`SEND / RECV`	Two-sided	Yes (must post RECV)	Like sockets — needs matching RECV WR posted
`RDMA WRITE`	One-sided	No	Initiator writes into peer's pre-registered MR using rkey
`RDMA WRITE with IMM`	One-sided + signal	Yes (consumes RECV)	Write + 4-byte immediate value triggers receive completion
`RDMA READ`	One-sided	No	Initiator reads from peer's MR; lower throughput than WRITE
`ATOMIC FETCH_ADD`	One-sided RMW	No	8-byte atomic, consistent across HCA & host CPU only on certain hw
`ATOMIC CMP_SWP`	One-sided RMW	No	Compare-and-swap on remote 8 bytes
`SEND with INVALIDATE`	Two-sided	Yes	Invalidates a receiver-side rkey atomically with delivery

Appendix — libev Watcher Types Quick Reference

Watcher	Triggered by	Mapped to
`ev_io`	fd readable / writable	epoll_ctl ADD on backend
`ev_timer`	Relative timeout (after X seconds, optional repeat)	Min-heap; loop computes nearest deadline for epoll_wait timeout
`ev_periodic`	Absolute time / cron-like reschedule callback	Min-heap with reschedule cb
`ev_signal`	POSIX signal received	signalfd or pipe + sigaction handler
`ev_child`	SIGCHLD for a specific PID	Internal signal watcher + waitpid
`ev_stat`	File stat changes (path-based)	inotify if available, else periodic stat()
`ev_idle`	No other events pending	Run after all ready events processed
`ev_prepare`	Before each loop iteration's poll	Hook used by glue layers (Perl, etc.)
`ev_check`	After each poll, before invoke	Hook for glue layers
`ev_async`	ev_async_send() from another thread	eventfd / pipe wakeup — ONLY safe cross-thread API
`ev_embed`	Inner ev_loop made pollable as one fd	Run a kqueue inside an epoll loop, etc.
`ev_fork`	After fork() in child	Cleanup on fork
`ev_cleanup`	Loop destroyed	Final teardown hook

Appendix — Cisco Certification Path Quick Reference

Track	CCNA	CCNP (core + concentration)	CCIE (lab)
Enterprise	200-301 CCNA	ENCOR 350-401 + ENARSI / ENSLD / ENWLSI / ENWLSD / ENSDWI / ENAUTO	CCIE Enterprise Infrastructure (Lab v1.x)
Data Center	200-301 CCNA	DCCOR 350-601 + DCID / DCACI / DCACIA / DCAUI	CCIE Data Center
Service Provider	200-301 CCNA	SPCOR 350-501 + SPRI / SPVI / SPCNI / SPAUI	CCIE Service Provider
Security	200-301 CCNA	SCOR 350-701 + SISE / SNCF / SVPN / SWSA / SAUTO	CCIE Security
Collaboration	200-301 CCNA	CLCOR 350-801 + CLICA / CLACCM / CLCEI / CLAUTO	CCIE Collaboration
DevNet	DEVASC	DEVCOR 350-901 + concentration	CCDE / DevNet Expert

Deep Dive — Study Catalog