26. First-Hop HA
HSRP, VRRPv2/v3, GLBP, keepalived, and BFD as the practical toolkit for redundant default gateways.
1. Overview
First-hop HA hides router failure from end hosts. Hosts keep one default gateway IP, while routers elect which node owns the virtual IP and virtual MAC. Fast designs add interface tracking or BFD so that an upstream failure moves the gateway role before hosts have to sit through a long protocol timeout.
2. 26.1 - HSRP
HSRP is Cisco's classic active/standby first-hop protocol. The Active router owns the virtual IP and virtual MAC and sends hello packets on UDP 1985. It remains Active until a higher-priority peer with preemption enabled takes over, or until the Standby stops hearing its hellos for the hold time and promotes itself.
End hosts do not ARP for a router's physical address. They ARP for the gateway VIP and cache the HSRP virtual MAC, so failover changes the router behind the MAC rather than every host configuration.
| Field | HSRP v1 | HSRP v2 |
|---|---|---|
| Multicast | 224.0.0.2 | 224.0.0.102 |
| Transport | UDP 1985 | UDP 1985 |
| Virtual MAC | 0000.0C07.ACxx | 0000.0C9F.Fxxx |
| Timers | 3s hello / 10s hold default | Same 3s / 10s defaults; advertises millisecond timer values; adds IPv6 |
| Preemption | Disabled by default | Disabled by default |
Minimal C Demo - HSRP Election and Tracking
3. 26.2 - VRRPv2 / VRRPv3
VRRP is the open-standard equivalent. A Master sends advertisements to 224.0.0.18 using IP protocol 112. The highest priority wins; priority 255 is special because it means the router owns the real address used as the virtual IP.
VRRPv3 supports IPv4 and IPv6 and expresses timers in centiseconds, making it a better fit for tuned sub-second failover. BFD can feed VRRP an immediate down signal instead of waiting for missed advertisements.
| Trait | HSRP | VRRPv2 | VRRPv3 |
|---|---|---|---|
| Standard | Cisco proprietary | RFC 3768 | RFC 5798 |
| Address families | IPv4, v2 adds IPv6 support | IPv4 only | IPv4 and IPv6 |
| Role names | Active / Standby | Master / Backup | Master / Backup |
| Preemption default | Off | On | On |
| Owner priority | No special owner rule | IP owner priority 255 | IP owner priority 255 |
Minimal C Demo - VRRP Master Election
4. 26.3 - GLBP
GLBP keeps a single gateway IP but avoids leaving a standby router idle. The Active Virtual Gateway (AVG) answers ARP and hands out different virtual MACs, while Active Virtual Forwarders (AVFs) actually forward packets for those MACs.
HSRP and VRRP normally use one router at a time for a VLAN. GLBP can make several routers forward at once with round-robin, weighted, or host-dependent assignment.
| Role | Responsibility | Failure behavior |
|---|---|---|
| AVG | Wins group election and answers ARP for the virtual gateway IP. | Another gateway becomes AVG if hellos stop. |
| AVF | Forwards packets sent to one assigned virtual MAC. | AVG reassigns the failed AVF's MAC to a surviving forwarder. |
| Client | Uses the same gateway IP but may cache different virtual MACs. | No default-gateway reconfiguration is required. |
Minimal C Demo - GLBP ARP Distribution
5. 26.4 - keepalived
keepalived is the common Linux implementation of VRRP. It listens and sends VRRP packets with raw sockets, uses netlink to add or remove VIPs, and can call notify scripts or program LVS/IPVS when state changes.
A real configuration is built from a vrrp_instance block for ownership of VIPs, plus optional virtual_server and real_server blocks when keepalived also manages IPVS load balancing.
| Directive | Meaning | Operational risk |
|---|---|---|
| virtual_router_id | VRRP group identifier shared by peers. | Duplicate IDs on the same L2 domain can cause wrong elections. |
| priority | Election weight before tracking adjustments. | Bad priority design can promote the wrong gateway. |
| track_interface | Demotes the node when a local interface fails. | Without it, a router can own the VIP while its uplink is dead. |
| track_script | Runs a health probe that can reduce priority. | Flapping probes need hysteresis and sane weights. |
| nopreempt | Stops recovered nodes from taking back Master automatically. | Useful for stability, but may leave traffic on a lower-capacity node. |
Minimal C Demo - keepalived State Action
6. 26.5 - BFD
BFD is a protocol-agnostic liveness detector. Instead of waiting for OSPF, BGP, or VRRP timers, peers exchange tiny control packets and declare failure after a negotiated number of consecutive packets go missing.
The BFD state machine is intentionally small. Once both sides learn each other's discriminators and reach Up, a missed detection window can notify the client protocol immediately.
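Timer negotiation matters because each direction can have a different effective send interval. In asynchronous mode, a system's detection time is the negotiated receive interval (the larger of its own Required Min RX Interval and the peer's Desired Min TX Interval) multiplied by the peer's detect multiplier, so asymmetric settings can produce asymmetric failure times.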
| Mode | Packet behavior | Best fit |
|---|---|---|
| Asynchronous | Both sides send BFD control packets continuously. | Common routing and VRRP integrations. |
| Demand | Peers stop continuous control packets after session health is proven. | Lower overhead, less common operationally. |
| Echo | Echo packets loop through the forwarding plane. | Detects forwarding failure even when the control plane is alive. |
| Single-hop | TTL 255 and directly connected peers. | Fast link and first-hop failure detection. |
| Multi-hop | Routed BFD session across multiple hops. | BGP or routed adjacency health over a path. |
Minimal C Demo - BFD Timer Calculator
7. Core Mechanism Walkthrough
Background: A pair of distribution switches share gateway 10.10.10.1 for a user VLAN. R1 is Active, but its uplink to the core fails while the access-facing interface stays up.
Plan: Track the uplink or a BFD session, reduce R1's effective priority, let R2 win the election, then refresh host neighbor caches with gratuitous ARP or IPv6 neighbor advertisements.
| Step | State change | Why traffic survives |
|---|---|---|
| 1 | R1 is Active, R2 is Standby or Backup. | Hosts send to the virtual MAC, not a real router MAC. |
| 2 | R1 uplink tracking drops priority below R2. | The protocol detects a reachability failure before hosts do. |
| 3 | R2 becomes Active or Master and owns the VIP. | The default gateway IP remains unchanged. |
| 4 | R2 advertises ownership with GARP or NA. | Switch and host caches converge to the new forwarding point. |
| 5 | R1 recovers. | Preemption policy decides whether it takes the VIP back or stays stable. |
8. Source and Tooling Pointers
- `keepalived/core/` and `keepalived/vrrp/` - keepalived VRRP state and netlink actions.
- `ip addr show` - verify which Linux node currently owns the VIP.
- `tcpdump proto 112` - observe VRRP advertisements on a Linux host.
- `show standby brief`, `show vrrp brief`, `show glbp brief` - operational state on network devices.
- `show bfd neighbors detail` - negotiated timers, discriminators, and client protocols.
9. Interview Prep
Questions and concise answers
| Question | Answer |
|---|---|
| When does HSRP move from Standby to Active? | When the Active hold timer expires, or when election/preemption logic says this router should own the group. |
| What is the biggest default-behavior difference between HSRP and VRRP? | HSRP preemption is disabled by default; VRRP preemption is normally enabled. |
| Why does VRRP priority 255 matter? | It marks the router that owns the real IP address used as the virtual IP, so it should be Master when healthy. |
| How does GLBP load-balance without changing host gateway config? | The AVG answers ARP for one gateway IP with different virtual MACs, spreading hosts across AVFs. |
| What does keepalived do on a Master transition? | It adds the VIP to the interface via netlink and can run notify scripts or update IPVS state. |
| How is BFD detection time computed? | The negotiated receive interval (the larger of the local Required Min RX and the remote Desired Min TX) multiplied by the remote detect multiplier. |