Part XXI - High Availability

26. First-Hop HA

HSRP, VRRPv2/v3, GLBP, keepalived, and BFD as the practical toolkit for redundant default gateways.

1. Overview

First-hop HA hides router failure from end hosts. Hosts keep one default gateway IP, while the routers elect which node owns the virtual IP and virtual MAC. Fast designs add interface tracking or BFD so that an upstream failure moves the gateway right away instead of waiting for a long protocol timer to expire.

2. 26.1 - HSRP

HSRP is Cisco's classic active/standby first-hop protocol. The Active router owns the virtual IP and virtual MAC and sends hello packets on UDP 1985. It stays Active until a higher-priority peer preempts it, or until the Standby stops hearing its hellos and takes over.

End hosts never need a router's physical MAC address. They ARP for the gateway VIP and cache the HSRP virtual MAC, so failover changes which router answers for that MAC rather than any host configuration.

Field | HSRP v1 | HSRP v2
Multicast | 224.0.0.2 | 224.0.0.102
Transport | UDP 1985 | UDP 1985
Virtual MAC | 0000.0C07.ACxx | 0000.0C9F.Fxxx
Timers | 3s hello / 10s hold default | Supports millisecond timers and IPv6
Preemption | Disabled by default | Disabled by default

Minimal C Demo - HSRP Election and Tracking

3. 26.2 - VRRPv2 / VRRPv3

VRRP is the open-standard equivalent. A Master sends advertisements to 224.0.0.18 using IP protocol 112. The highest priority wins; priority 255 is special because it means the router owns the real address used as the virtual IP.

VRRPv3 supports IPv4 and IPv6 and expresses timers in centiseconds, making it a better fit for tuned sub-second failover. BFD can feed VRRP an immediate down signal instead of waiting for missed advertisements.

Trait | HSRP | VRRPv2 | VRRPv3
Standard | Cisco proprietary | RFC 3768 | RFC 5798
Address families | IPv4; HSRP v2 adds IPv6 support | IPv4 only | IPv4 and IPv6
Role names | Active / Standby | Master / Backup | Master / Backup
Preemption default | Off | On | On
Owner priority | No special owner rule | IP owner priority 255 | IP owner priority 255

Minimal C Demo - VRRP Master Election

4. 26.3 - GLBP

GLBP keeps a single gateway IP but avoids leaving a standby router idle. The Active Virtual Gateway (AVG) answers ARP and hands out different virtual MACs, while Active Virtual Forwarders (AVFs) actually forward packets for those MACs.

HSRP and VRRP normally use one router at a time for a VLAN. GLBP can make several routers forward at once with round-robin, weighted, or host-dependent assignment.

Role | Responsibility | Failure behavior
AVG | Wins group election and answers ARP for the virtual gateway IP. | Another gateway becomes AVG if hellos stop.
AVF | Forwards packets sent to one assigned virtual MAC. | AVG reassigns the failed AVF's MAC to a surviving forwarder.
Client | Uses the same gateway IP but may cache different virtual MACs. | No default-gateway reconfiguration is required.

Minimal C Demo - GLBP ARP Distribution

5. 26.4 - keepalived

keepalived is the common Linux implementation of VRRP. It sends and receives VRRP packets over raw sockets, uses netlink to add or remove VIPs, and can call notify scripts or program LVS/IPVS when state changes.

A real configuration is built from a vrrp_instance for ownership of VIPs, plus optional virtual_server and real_server blocks when keepalived also manages IPVS load balancing.

Directive | Meaning | Operational risk
virtual_router_id | VRRP group identifier shared by peers. | Duplicate IDs on the same L2 domain can cause wrong elections.
priority | Election weight before tracking adjustments. | Bad priority design can promote the wrong gateway.
track_interface | Demotes the node when a local interface fails. | Without it, a router can own the VIP while its uplink is dead.
track_script | Runs a health probe that can reduce priority. | Flapping probes need hysteresis and sane weights.
nopreempt | Stops recovered nodes from taking back Master automatically. | Useful for stability, but may leave traffic on a lower-capacity node.

Minimal C Demo - keepalived State Action

6. 26.5 - BFD

BFD is a protocol-agnostic liveness detector. Instead of waiting for OSPF, BGP, or VRRP timers, peers exchange tiny control packets and declare failure when no packet arrives within the negotiated detection time.

The BFD state machine is intentionally small. Once both sides learn each other's discriminators and reach Up, an expired detection timer takes the session Down and the client protocol is notified immediately.

Timer negotiation matters because each direction can have a different effective send interval. In asynchronous mode, each side's detection time is its negotiated receive interval (the larger of its own required minimum RX interval and the peer's desired minimum TX interval) multiplied by the detect multiplier the peer advertises, so asymmetric settings can produce asymmetric failure times.

Mode | Packet behavior | Best fit
Asynchronous | Both sides send BFD control packets continuously. | Common routing and VRRP integrations.
Demand | Peers stop continuous control packets after session health is proven. | Lower overhead, less common operationally.
Echo | Echo packets loop through the forwarding plane. | Detects forwarding failure even when the control plane is alive.
Single-hop | TTL 255 and directly connected peers. | Fast link and first-hop failure detection.
Multi-hop | Routed BFD session across multiple hops. | BGP or routed adjacency health over a path.

Minimal C Demo - BFD Timer Calculator

7. Core Mechanism Walkthrough

Background: A pair of distribution switches share gateway 10.10.10.1 for a user VLAN. R1 is Active, but its uplink to the core fails while the access-facing interface stays up.

Plan: Track the uplink or a BFD session, reduce R1's effective priority, let R2 win the election, then refresh host neighbor caches with gratuitous ARP or IPv6 neighbor advertisements.

Step | State change | Why traffic survives
1 | R1 is Active, R2 is Standby or Backup. | Hosts send to the virtual MAC, not a real router MAC.
2 | R1 uplink tracking drops priority below R2. | The protocol detects a reachability failure before hosts do.
3 | R2 becomes Active or Master and owns the VIP. | The default gateway IP remains unchanged.
4 | R2 advertises ownership with GARP or NA. | Switch and host caches converge to the new forwarding point.
5 | R1 recovers. | Preemption policy decides whether it takes the VIP back or stays stable.

8. Source and Tooling Pointers

  • keepalived/core/ and keepalived/vrrp/ - keepalived VRRP state and netlink actions.
  • ip addr show - verify which Linux node currently owns the VIP.
  • tcpdump proto 112 - observe VRRP advertisements on a Linux host.
  • show standby brief, show vrrp brief, show glbp brief - operational state on network devices.
  • show bfd neighbors detail - negotiated timers, discriminators, and client protocols.

9. Interview Prep

Questions and concise answers

Question | Concise answer
When does HSRP move from Standby to Active? | When the Active hold timer expires, or when election/preemption logic says this router should own the group.
What is the biggest default-behavior difference between HSRP and VRRP? | HSRP preemption is disabled by default; VRRP preemption is enabled by default.
Why does VRRP priority 255 matter? | It marks the router that owns the real IP address used as the virtual IP, so it should be Master when healthy.
How does GLBP load-balance without changing host gateway config? | The AVG answers ARP for one gateway IP with different virtual MACs, spreading hosts across AVFs.
What does keepalived do on a Master transition? | It adds the VIP to the interface via netlink and can run notify scripts or update IPVS state.
How is BFD detection time computed? | The negotiated receive interval multiplied by the detect multiplier advertised by the remote peer, after peers negotiate TX/RX constraints.