Part XV - IP and ICMP

17. IP & ICMP Deep Dive

IPv4 wire format, fragmentation arithmetic, ICMP error payloads, ping, traceroute, PMTUD, PLPMTUD, and ICMPv6.

1. Overview

IPv4 is the best-effort packet layer: it names endpoints, chooses a next hop, decrements TTL, and either forwards, fragments, or drops each datagram. ICMP is the control channel beside it, carrying errors such as TTL expiry, unreachable ports, and fragmentation-needed feedback.

The protocol field tells the receiver which payload parser comes next: ICMP is 1, TCP is 6, UDP is 17, OSPF is 89, and SCTP is 132. The header checksum covers only the IPv4 header, so every router must recompute or incrementally update it after decrementing TTL.
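The checksum rule can be sketched in a few lines of C. This is a minimal illustration, not kernel code: the function names are made up for this example, the header is treated as a raw byte array, and the hop simply recomputes the full checksum instead of using the incremental update that production routers typically prefer.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum of the header's 16-bit words, folded and inverted.
 * The checksum field (bytes 10-11) must be zero when computing a new value.
 * Run over a header whose checksum is already in place, the result is 0. */
uint16_t ipv4_header_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Forwarding step in this sketch: decrement TTL (byte 8) and refresh the
 * checksum. Returns 0 when TTL would reach zero, meaning the datagram
 * must be dropped instead of forwarded. */
int ipv4_decrement_ttl(uint8_t *hdr, size_t len)
{
    if (hdr[8] <= 1)
        return 0;
    hdr[8]--;
    hdr[10] = 0;
    hdr[11] = 0;
    uint16_t c = ipv4_header_checksum(hdr, len);
    hdr[10] = (uint8_t)(c >> 8);
    hdr[11] = (uint8_t)(c & 0xFF);
    return 1;
}
```

A receiver can validate a header by running ipv4_header_checksum over it unchanged and checking for a result of 0.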

3. 17.2 - IP Fragmentation

IPv4 fragmentation splits one datagram when the next link MTU is smaller than the packet and DF is clear. Every fragment gets its own IPv4 header, the same Identification value, and an offset measured in 8-byte units. For a 4000-byte datagram over a 1500-byte MTU, the 3980-byte payload becomes two full 1480-byte fragments and one 1020-byte tail.
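The arithmetic above can be checked with a small planner. This is a sketch under stated assumptions: the struct and function names are hypothetical, the IPv4 header is assumed to be 20 bytes with no options, and only the offset, length, and MF values are computed (no packet bytes are copied).

```c
#include <stddef.h>
#include <stdint.h>

struct frag_plan {
    uint16_t offset_units;   /* Fragment Offset field, in 8-byte units */
    uint16_t payload_len;    /* payload bytes carried by this fragment */
    int      more_fragments; /* MF flag: 1 on all but the last fragment */
};

/* Plans the fragments of a datagram of total_len bytes (header included)
 * for a link with the given MTU. Returns the fragment count, or 0 if the
 * inputs are invalid or out[] is too small. A datagram that already fits
 * yields a single entry with MF=0. */
size_t plan_fragments(uint32_t total_len, uint32_t mtu,
                      struct frag_plan *out, size_t cap)
{
    const uint32_t hdr = 20;            /* assumption: no IP options */
    if (total_len <= hdr || total_len > 65535 || mtu < hdr + 8)
        return 0;
    uint32_t payload  = total_len - hdr;
    uint32_t per_frag = (mtu - hdr) & ~7u;  /* round down to 8-byte multiple */
    size_t n = 0;
    for (uint32_t off = 0; off < payload; off += per_frag) {
        if (n == cap)
            return 0;
        uint32_t left = payload - off;
        out[n].offset_units   = (uint16_t)(off / 8);
        out[n].payload_len    = (uint16_t)(left > per_frag ? per_frag : left);
        out[n].more_fragments = left > per_frag;
        n++;
    }
    return n;
}
```

For plan_fragments(4000, 1500, ...) this yields three entries with offsets 0, 185, and 370, payload lengths 1480, 1480, and 1020, and MF set on the first two.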

Reassembly happens only at the destination. The receiver groups fragments by source IP, destination IP, Identification, and protocol, then sorts by offset and waits until every byte range is present and an MF=0 tail fragment has arrived.
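The completeness test can be sketched as follows. This is illustrative only: it assumes the fragments have already been grouped by the (source, destination, Identification, protocol) key, uses hypothetical names, expresses offsets in bytes, and tolerates duplicates and overlaps by simply ignoring them, which real stacks may not do.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct rx_frag {
    uint32_t start;          /* byte offset (offset field * 8) */
    uint32_t len;            /* payload bytes in this fragment */
    int      more_fragments; /* MF flag as received */
};

static int by_start(const void *a, const void *b)
{
    const struct rx_frag *x = a, *y = b;
    return (x->start > y->start) - (x->start < y->start);
}

/* Sorts the fragments by offset and returns 1 when they cover every byte
 * from 0 up to the end of an MF=0 tail fragment, otherwise 0. */
int reassembly_complete(struct rx_frag *f, size_t n)
{
    if (n == 0)
        return 0;
    qsort(f, n, sizeof *f, by_start);
    uint32_t next = 0;   /* first byte not yet covered */
    int have_tail = 0;
    for (size_t i = 0; i < n; i++) {
        if (f[i].start > next)
            return 0;    /* hole before this fragment */
        uint32_t end = f[i].start + f[i].len;
        if (end > next)
            next = end;
        if (!f[i].more_fragments)
            have_tail = 1;
    }
    return have_tail;
}
```

A real receiver also runs a reassembly timer and discards the partial datagram when it expires, which this sketch leaves out.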

Fragmentation is fragile: losing one fragment loses the whole datagram, non-first fragments carry no TCP or UDP header and therefore no ports for firewalls or NAT to match on, and overlapping fragments have a long security history. Modern stacks normally avoid it with DF and PMTUD.

4. 17.3 - ICMP Types and Codes

ICMP error messages quote the original IPv4 header plus at least the first 8 bytes of the offending payload. For TCP and UDP this is enough to include source and destination ports, so the sender can map the error back to a socket or flow.
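Mapping an error back to a flow amounts to parsing the quoted bytes. The sketch below assumes the caller has already skipped the 8-byte ICMP header and passes a pointer to the quoted datagram; the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct quoted_flow {
    uint8_t  protocol;  /* 6 = TCP, 17 = UDP */
    uint16_t src_port;
    uint16_t dst_port;
};

/* quote points at the original IPv4 header as echoed inside an ICMP error.
 * Returns 1 and fills *out for TCP or UDP, 0 if the quote is truncated,
 * not IPv4, or carries another protocol. */
int parse_icmp_quote(const uint8_t *quote, size_t len, struct quoted_flow *out)
{
    if (len < 20 || (quote[0] >> 4) != 4)
        return 0;
    size_t ihl = (size_t)(quote[0] & 0x0F) * 4;  /* header length in bytes */
    if (ihl < 20 || len < ihl + 4)
        return 0;
    out->protocol = quote[9];
    if (out->protocol != 6 && out->protocol != 17)
        return 0;
    /* TCP and UDP both start with source port, then destination port. */
    out->src_port = (uint16_t)((quote[ihl] << 8) | quote[ihl + 1]);
    out->dst_port = (uint16_t)((quote[ihl + 2] << 8) | quote[ihl + 3]);
    return 1;
}
```

The IHL field matters here: if the original datagram carried IP options, the ports do not sit at byte 20.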

Type  Code  Name                     Use
0     0     Echo Reply               Reply to ping.
3     0-15  Destination Unreachable  Network, host, protocol, port, fragmentation-needed, or admin-prohibited failures.
4     0     Source Quench            Deprecated congestion signal.
5     0-3   Redirect                 Router tells a host a better next hop on the local link.
8     0     Echo Request             Ping request.
11    0-1   Time Exceeded            TTL expired in transit or fragment reassembly timeout.
12    0-2   Parameter Problem        Bad IP header field.

Routers and hosts rate-limit ICMP because error generation can amplify traffic. Linux exposes knobs such as net.ipv4.icmp_ratelimit. ICMP redirect, smurf amplification, and historical ping-of-death bugs are examples of why operators filter specific ICMP messages, but blanket filtering breaks PMTUD.

5. 17.4 - ping Internals

ping sends ICMP Echo Request, type 8 code 0, and expects Echo Reply, type 0 code 0. The identifier is often derived from the process ID so that replies can be matched to the right process when several pings run on the same host, and the sequence number detects loss or reordering.
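Building the request is mostly byte layout plus the Internet checksum. The following is a sketch with hypothetical function names; it only fills a buffer and does not open a socket or send anything.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Internet checksum over an arbitrary-length buffer (odd lengths are
 * padded with a zero byte for the purpose of the sum). */
uint16_t inet_checksum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    size_t i;
    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((p[i] << 8) | p[i + 1]);
    if (i < len)
        sum += (uint32_t)(p[i] << 8);
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Writes an ICMP Echo Request (type 8, code 0) into buf.
 * Returns the message length, or 0 if buf is too small. */
size_t build_echo_request(uint8_t *buf, size_t cap, uint16_t id, uint16_t seq,
                          const uint8_t *payload, size_t payload_len)
{
    size_t total = 8 + payload_len;
    if (cap < total)
        return 0;
    buf[0] = 8;                      /* type: Echo Request */
    buf[1] = 0;                      /* code */
    buf[2] = 0;                      /* checksum, zero while summing */
    buf[3] = 0;
    buf[4] = (uint8_t)(id >> 8);     /* identifier */
    buf[5] = (uint8_t)(id & 0xFF);
    buf[6] = (uint8_t)(seq >> 8);    /* sequence number */
    buf[7] = (uint8_t)(seq & 0xFF);
    if (payload_len)
        memcpy(buf + 8, payload, payload_len);
    uint16_t c = inet_checksum(buf, total);
    buf[2] = (uint8_t)(c >> 8);
    buf[3] = (uint8_t)(c & 0xFF);
    return total;
}
```

Unlike the IPv4 header checksum, the ICMP checksum covers the whole ICMP message, payload included.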

RTT measurement is simple: put a timestamp in the request payload, receive the same bytes in the reply, then subtract. Classic ping used raw sockets; modern Linux can also allow unprivileged ICMP datagram sockets through net.ipv4.ping_group_range.
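The timestamp trick can be sketched like this. The helper names are hypothetical, and the code assumes the payload has room for a struct timespec and that both timestamps come from the same clock on the same host (for example a monotonic clock read with clock_gettime).

```c
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Copies the send time into the first bytes of the Echo payload. */
void stamp_payload(uint8_t *payload, const struct timespec *sent)
{
    memcpy(payload, sent, sizeof *sent);
}

/* Recovers the send time from the echoed payload and returns the
 * round-trip time in milliseconds. */
double rtt_ms(const uint8_t *echoed_payload, const struct timespec *received)
{
    struct timespec sent;
    memcpy(&sent, echoed_payload, sizeof sent);
    return (double)(received->tv_sec - sent.tv_sec) * 1e3 +
           (double)(received->tv_nsec - sent.tv_nsec) / 1e6;
}
```

Because the timestamp travels in the packet, the sender needs no per-probe state to compute RTT; the remote host never interprets the bytes.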

6. 17.5 - traceroute Internals

Traceroute deliberately sends probes with TTL 1, 2, 3, and upward. Each router that decrements TTL to zero drops the probe and returns ICMP Time Exceeded, so the sender learns the address and latency of that hop.

  • UDP traceroute uses high destination ports; final hop returns ICMP port unreachable.
  • ICMP traceroute sends Echo Requests; final hop returns Echo Reply.
  • TCP traceroute sends SYN packets; final hop returns SYN+ACK or RST, and the probes often pass firewalls that block UDP or ICMP probes.
  • Paris traceroute keeps the flow hash stable so ECMP does not make consecutive probes appear to take unrelated paths.
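The TTL walk can be simulated without touching the network. The sketch below models the UDP variant over a made-up path array (routers first, destination last); all names are hypothetical and no real packets are sent.

```c
#include <stddef.h>
#include <stdint.h>

enum probe_reply { REPLY_NONE, REPLY_TIME_EXCEEDED, REPLY_PORT_UNREACHABLE };

struct hop_result {
    unsigned         ttl;        /* TTL the probe was sent with */
    uint32_t         responder;  /* address that answered */
    enum probe_reply reply;
};

/* Walks one probe along the simulated path. Each router decrements TTL and
 * answers Time Exceeded if it reaches zero; the destination (last entry)
 * answers Port Unreachable. */
struct hop_result send_probe(const uint32_t *path, size_t path_len, unsigned ttl)
{
    struct hop_result r = { ttl, 0, REPLY_NONE };
    unsigned remaining = ttl;
    for (size_t i = 0; i < path_len; i++) {
        if (i == path_len - 1) {
            r.responder = path[i];
            r.reply = REPLY_PORT_UNREACHABLE;
            return r;
        }
        if (--remaining == 0) {
            r.responder = path[i];
            r.reply = REPLY_TIME_EXCEEDED;
            return r;
        }
    }
    return r;
}

/* Sends probes with TTL 1, 2, 3, ... until the destination answers or
 * max_ttl is reached. out[] must hold max_ttl entries. Returns the number
 * of hops recorded. */
size_t simulate_traceroute(const uint32_t *path, size_t path_len,
                           unsigned max_ttl, struct hop_result *out)
{
    size_t n = 0;
    for (unsigned ttl = 1; ttl <= max_ttl; ttl++) {
        out[n] = send_probe(path, path_len, ttl);
        if (out[n++].reply != REPLY_TIME_EXCEEDED)
            break;
    }
    return n;
}
```

A real tool also has to cope with hops that never answer, which is why traceroute output shows asterisks after a timeout.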

7. 17.6 - PMTUD and PLPMTUD

Path MTU Discovery sends packets with DF=1. If a packet is larger than a router's next-hop MTU, the router drops it and returns ICMP Destination Unreachable, type 3 code 4, carrying the next-hop MTU. The sender lowers its PMTU cache and, for TCP, lowers MSS so later segments fit.
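The sender-side reaction can be sketched as below. It assumes the RFC 1191 message layout, where bytes 6-7 of the ICMP header carry the next-hop MTU, and fixed 20-byte IPv4 and TCP headers; the function names are hypothetical. Routers that predate RFC 1191 report an MTU of 0, which this sketch simply ignores rather than falling back to a table of common MTUs.

```c
#include <stddef.h>
#include <stdint.h>

/* icmp points at the ICMP header of a received error message.
 * Returns the updated path MTU: lowered if the message is a valid
 * fragmentation-needed report, otherwise unchanged. */
uint16_t pmtu_update(uint16_t current_pmtu, const uint8_t *icmp, size_t len)
{
    if (len < 8 || icmp[0] != 3 || icmp[1] != 4)
        return current_pmtu;             /* not type 3 code 4 */
    uint16_t next_hop = (uint16_t)((icmp[6] << 8) | icmp[7]);
    if (next_hop < 68)
        return current_pmtu;             /* below the IPv4 minimum MTU */
    return next_hop < current_pmtu ? next_hop : current_pmtu;
}

/* TCP MSS that fits the path MTU, assuming 20-byte IPv4 and 20-byte TCP
 * headers with no options. */
uint16_t tcp_mss_for_pmtu(uint16_t pmtu)
{
    return (uint16_t)(pmtu - 40);
}
```

With a 1500-byte starting PMTU and a reported next-hop MTU of 1400, the sender would move to a PMTU of 1400 and an MSS of 1360.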

The classic blackhole is a firewall that drops ICMP fragmentation-needed messages. The sender keeps retransmitting packets that are too large, but never receives the signal needed to shrink them.

PLPMTUD moves discovery into the packetization layer. Instead of depending on ICMP, the transport sends probes of controlled sizes and treats successful acknowledgment as proof that the size works. QUIC commonly uses this style: it runs its transport machinery in user space over UDP, so it can size and acknowledge its own probes without relying on ICMP delivery. IPv6 makes this more important: routers never fragment packets, and ICMPv6 Packet Too Big is mandatory for normal operation.
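One way to picture the probing loop is a binary search between a size known to work and the largest size worth trying. This is a simplified sketch, not the search procedure any particular stack uses: the callback type and both function names are hypothetical, and a failed probe is treated as definitive, whereas a real implementation must retry because probe loss is ambiguous.

```c
#include <stdint.h>

/* Hypothetical callback: sends a probe of the given size and returns
 * nonzero if the peer acknowledged it. */
typedef int (*probe_fn)(uint16_t size, void *ctx);

/* Returns the largest size in [base, max] that the probe callback
 * confirms, assuming base itself is already known to work. */
uint16_t plpmtud_search(uint16_t base, uint16_t max, probe_fn probe, void *ctx)
{
    uint16_t lo = base;  /* largest size confirmed so far */
    uint16_t hi = max;   /* largest size still possible */
    while (lo < hi) {
        /* Round up so the loop always makes progress. */
        uint16_t mid = (uint16_t)(lo + (hi - lo + 1) / 2);
        if (probe(mid, ctx))
            lo = mid;
        else
            hi = (uint16_t)(mid - 1);
    }
    return lo;
}

/* Stand-in for a network path in tests: ctx points at the true path MTU
 * and any probe at or below it is "acknowledged". */
int probe_fixed_path(uint16_t size, void *ctx)
{
    return size <= *(const uint16_t *)ctx;
}
```

Starting from base 1200 and max 1500 over a simulated 1400-byte path, the search settles on 1400 without any ICMP feedback.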

8. 17.7 - ICMPv6

ICMPv6 is not just an error protocol. It carries IPv6 errors, ping, Neighbor Discovery, router discovery, redirects, and multicast listener signaling. Filtering ICMPv6 as if it were optional IPv4 ICMP breaks basic IPv6 functions such as address resolution and PMTUD.
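The breadth of ICMPv6 shows up in its type space: types below 128 are errors and types from 128 upward are informational. The sketch below names a handful of well-known types and flags the ones the paragraph above calls out; the function names and the "needed for basic operation" selection are illustrative, not a complete firewall policy.

```c
#include <stdint.h>

/* ICMPv6 error messages use types 0-127; informational messages use
 * types 128-255. */
int icmpv6_is_error(uint8_t type)
{
    return type < 128;
}

const char *icmpv6_type_name(uint8_t type)
{
    switch (type) {
    case 1:   return "Destination Unreachable";
    case 2:   return "Packet Too Big";
    case 3:   return "Time Exceeded";
    case 4:   return "Parameter Problem";
    case 128: return "Echo Request";
    case 129: return "Echo Reply";
    case 130: return "Multicast Listener Query";
    case 131: return "Multicast Listener Report";
    case 132: return "Multicast Listener Done";
    case 133: return "Router Solicitation";
    case 134: return "Router Advertisement";
    case 135: return "Neighbor Solicitation";
    case 136: return "Neighbor Advertisement";
    case 137: return "Redirect";
    default:  return "other";
    }
}

/* Illustrative subset: Packet Too Big (PMTUD) plus router and neighbor
 * discovery (address resolution). Dropping these breaks basic IPv6. */
int icmpv6_needed_for_basic_operation(uint8_t type)
{
    return type == 2 || (type >= 133 && type <= 136);
}
```

Neighbor Solicitation and Advertisement (135 and 136) do the job ARP does for IPv4, which is why a blanket ICMPv6 drop rule stops hosts on the same link from reaching each other.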

The most important operational difference is fragmentation: IPv6 routers do not fragment. A source may fragment using a Fragment extension header, but a router that receives a packet larger than the next-hop MTU drops it and sends back a Packet Too Big message.

9. Kernel Source Pointers

Area             Linux files and functions
IPv4 input       net/ipv4/ip_input.c: ip_rcv, ip_local_deliver
IPv4 forwarding  net/ipv4/ip_forward.c: ip_forward
Fragmentation    net/ipv4/ip_fragment.c: reassembly queues; net/ipv4/ip_output.c: fragmentation output path
ICMP             net/ipv4/icmp.c: icmp_rcv, icmp_send
ICMPv6           net/ipv6/icmp.c and net/ipv6/ndisc.c

10. Interview Prep

Question                                            Answer checkpoint
How does traceroute determine each hop?             It sends probes with increasing TTL; the router where TTL reaches zero returns ICMP Time Exceeded.
Why is ICMP type 3 code 4 critical?                 It tells a DF sender that the packet exceeds the next-hop MTU, allowing PMTUD and MSS reduction.
How does IPv4 reassembly know when it is complete?  Fragments share src, dst, ID, and protocol; offsets cover all byte ranges and the final fragment has MF=0.
Why use TCP traceroute?                             TCP SYN probes to ports like 80 or 443 may pass firewalls that drop UDP or ICMP probes.
How does IPv6 fragmentation differ?                 Routers never fragment IPv6 packets; only sources fragment, and PMTUD uses ICMPv6 Packet Too Big.