Part XII — Modern TCP Congestion Control

§ 13. TCP CC Modern

BBR, DCTCP, ECN, AQM, pacing, and Linux's pluggable congestion-control framework.

1. § 13.1 — BBR v1 Pipe Model

BBR does not wait for loss to infer congestion. It builds a path model from two windowed estimates: BtlBw, the maximum recent delivery rate, and RTprop, the minimum recent RTT. Their product is the bandwidth-delay product, BDP = BtlBw × RTprop. The target operating point is a full pipe with little standing queue: send near BtlBw and keep roughly one BDP in flight.

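BBR uses pacing as a first-class control surface. Instead of releasing a burst whenever ACKs arrive, it spaces packets out at a rate derived from BtlBw. In ProbeBW it cycles the pacing gain above 1 to probe for more bandwidth and below 1 to drain the queue that probe created; ProbeRTT separately cuts inflight to refresh the RTprop measurement.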

Minimal C Demo — BBR Pipe Model

Minimal C Demo — BBR Phase Cycle

2. § 13.2 — BBR v2 and v3

BBR v1 can be unfair beside CUBIC on shallow buffers because it largely ignores packet loss and keeps probing for bandwidth, while loss-based flows treat the resulting drops as a hard congestion signal and back off. BBR v2 adds explicit responses to loss and ECN; BBR v3 refines the probe cycle and fairness behavior further while preserving the bandwidth-delay model.

  • Startup can exit on loss or ECN evidence, not only a bandwidth plateau.
  • Inflight bounds keep a BBR flow from dominating a bottleneck shared with classic loss-based TCP.
  • ProbeBW changes are designed to reduce latency spikes while still checking for new capacity.

3. § 13.3 — DCTCP

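DCTCP is built for data centers where switches can mark ECN precisely and buffers are intentionally shallow. The switch marks packets with CE once the queue crosses threshold K; the receiver echoes the marks, and the sender maintains alpha, a moving average of the fraction of bytes that were marked: alpha = (1 - g) × alpha + g × F, where F is the marked fraction over the last window and g is a small gain (RFC 8257 recommends 1/16).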

The key interview point is proportional response. Reno halves the window after a loss regardless of severity; DCTCP applies cwnd = cwnd × (1 - alpha / 2) at most once per window of data, so a small queue excursion causes only a small reduction, while alpha = 1 degenerates to Reno's halving.

Minimal C Demo — DCTCP Alpha Response

4. § 13.4 — Vegas, Westwood, and Compound

These algorithms are useful because they show the design space between pure loss response and pure path modeling. Vegas reacts to RTT growth before drops occur, Westwood estimates bandwidth from the ACK arrival rate and uses that estimate to set the window after a loss instead of blindly halving, and Compound TCP combines a Reno-style loss component with a delay component.

| Algorithm | Primary signal | Best mental model |
|---|---|---|
| Vegas | Expected throughput minus actual throughput. | Queue is forming when RTT rises before loss. |
| Westwood | ACK-rate bandwidth estimate. | Random wireless loss should not always imply full queue overflow. |
| Compound TCP | Loss plus delay. | Probe like Reno, temper growth when delay rises. |

5. § 13.5 — ECN and AQM

ECN lets routers signal congestion by setting the CE codepoint in the IP header instead of dropping the packet. The TCP receiver echoes that signal with the ECE flag, and the sender, after reducing its window, sets CWR to tell the receiver the signal was acted on. AQM is the queue discipline side of the same story: RED, CoDel, FQ-CoDel, and PIE act before a buffer becomes a latency reservoir.

  • RED drops or marks probabilistically as average queue size grows, but it is sensitive to tuning.
  • CoDel watches packet sojourn time and acts when delay stays above target for an interval.
  • FQ-CoDel adds per-flow queues, so one large flow cannot hide latency-sensitive flows behind it.
  • PIE uses a proportional-integral controller and is common in cable-access networks.

Minimal C Demo — ECN/AQM Queue Response

6. § 13.6 — Pacing and Pluggable CC in Linux

Linux exposes congestion control through the tcp_congestion_ops framework. The system default comes from net.ipv4.tcp_congestion_control, but a process can choose a different algorithm per socket with TCP_CONGESTION. Pacing can be driven by the TCP stack and enforced through queue disciplines such as sch_fq.

  • net/ipv4/tcp_bbr.c: BBR model, pacing gains, ProbeBW, and ProbeRTT behavior.
  • net/ipv4/tcp_dctcp.c: DCTCP ECN accounting and alpha response.
  • net/ipv4/tcp_cong.c: registration, lookup, and default congestion-control helpers.
  • include/net/tcp.h: tcp_congestion_ops and per-socket state.
  • net/ipv4/tcp_output.c: pacing rate and packet output decisions.

7. Interview Prep

| Question | Concise answer |
|---|---|
| What are BBR's two core measurements? | BtlBw, the maximum recent delivery rate, and RTprop, the minimum recent RTT. |
| Why does ProbeRTT temporarily cut inflight? | It drains queues so the sender can refresh a clean propagation-delay sample. |
| How does DCTCP differ from Reno? | DCTCP reduces proportionally to the ECN-marked fraction; Reno-style loss response usually halves the window. |
| What is bufferbloat? | Excess buffering that preserves throughput but turns standing queues into large latency. |
| How can one Linux process choose BBR while the system default is CUBIC? | Call setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, "bbr", 3) on that socket. The algorithm must be available in the kernel, and unprivileged processes are limited to the names in net.ipv4.tcp_allowed_congestion_control. |