FortiGate High Availability (HA) – Internal Mechanism & Timing Breakdown
1️⃣ HA Cluster Initialization (Boot-up Phase)
- When FortiGates boot, they check HA settings (enabled, group name, group ID, etc.).
- Devices begin FGCP discovery via heartbeat interfaces.
- FGCP heartbeat packets are exchanged to discover peers and form a cluster.
- A role election is performed.
⏱️ Time to form cluster: Usually within 5–10 seconds after both devices finish booting and
interfaces come up.
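The peer-matching step during discovery can be pictured with a minimal Python sketch. It assumes that two units accept each other only when HA is enabled on both and the group name and group ID match. The `HaSettings` class, the `can_form_cluster` function, and the sample values are hypothetical, and additional checks (for example the HA password) are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HaSettings:
    """Hypothetical subset of a unit's HA settings used for peer matching."""
    enabled: bool
    group_name: str
    group_id: int


def can_form_cluster(local: HaSettings, peer: HaSettings) -> bool:
    """Return True when two units would accept each other as cluster peers.

    Assumption: HA must be enabled on both units and the group name and
    group ID must be identical. Other checks are omitted from this sketch.
    """
    return (
        local.enabled
        and peer.enabled
        and local.group_name == peer.group_name
        and local.group_id == peer.group_id
    )


fgt_a = HaSettings(enabled=True, group_name="dc1-ha", group_id=10)
fgt_b = HaSettings(enabled=True, group_name="dc1-ha", group_id=10)
fgt_c = HaSettings(enabled=True, group_name="dc1-ha", group_id=11)

print(can_form_cluster(fgt_a, fgt_b))  # True: settings match
print(can_form_cluster(fgt_a, fgt_c))  # False: group IDs differ
```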
2️⃣ FGCP Heartbeat Process (Ongoing Operation)
- FortiGates in an HA cluster communicate using FGCP over their configured heartbeat interfaces.
- Heartbeat packets are multicast or broadcast packets exchanged between HA members.
Heartbeat Characteristics:
- Interval: Every 200 milliseconds (default)
- Timeout: No heartbeat for 3 seconds triggers failover
- Packet Size: ~64 bytes
- Interface: Uses the interfaces configured as heartbeat interfaces (dedicated HA ports are recommended)
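⛔ Failover Trigger: If a member misses 15 consecutive heartbeats (200 ms × 15 = 3 seconds),
it assumes the peer is dead and initiates failover. Both values are configurable (hb-interval
and hb-lost-threshold under config system ha), so the effective timeout is the interval
multiplied by the lost threshold.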
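The timeout arithmetic and the missed-heartbeat counter can be sketched in Python as follows. This is an illustration of the counting logic only, not FortiOS code: the `HeartbeatMonitor` class is hypothetical, and the 200 ms interval and threshold of 15 are the figures quoted in this section.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Counts consecutive missed heartbeats from one peer (illustrative only)."""
    interval_ms: int = 200      # heartbeat interval quoted in this section
    lost_threshold: int = 15    # missed heartbeats before failover (this section's figure)
    missed: int = 0

    @property
    def timeout_ms(self) -> int:
        # Effective timeout is the interval multiplied by the lost threshold.
        return self.interval_ms * self.lost_threshold

    def tick(self, heartbeat_received: bool) -> bool:
        """Advance one interval; return True when the peer is declared dead."""
        if heartbeat_received:
            self.missed = 0     # any heartbeat resets the counter
        else:
            self.missed += 1
        return self.missed >= self.lost_threshold


monitor = HeartbeatMonitor()
print(monitor.timeout_ms)  # 3000

# Simulate a peer that goes silent: 14 missed intervals are tolerated, the 15th trips.
results = [monitor.tick(heartbeat_received=False) for _ in range(15)]
print(results.index(True) + 1)  # 15
```

Because a single received heartbeat resets the counter, only an uninterrupted run of misses leads to failover.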
3️⃣ Role Election Logic (Primary vs Secondary)
Role selection priority:
1. Hard disk health
2. Number of monitored interfaces UP
3. HA priority (higher value is preferred)
4. Uptime (longer wins; when override is disabled, uptime is compared before priority)
5. Serial number (final tiebreaker)
⏱️ Election process completes in about 1–2 seconds after cluster discovery.
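An ordered comparison like this maps naturally onto tuple comparison. The Python sketch below follows the order listed above and assumes that a higher priority value wins (which is how FGCP treats the priority setting) and that the higher serial number wins the final tiebreak. The `Member` class, its field names, and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    """Hypothetical snapshot of the values a unit brings to the election."""
    serial: str
    disk_healthy: bool
    monitored_up: int
    priority: int
    uptime_s: int


def election_key(m: Member) -> tuple:
    # Tuples compare element by element, so earlier fields dominate later ones.
    return (m.disk_healthy, m.monitored_up, m.priority, m.uptime_s, m.serial)


def elect_primary(members: list[Member]) -> Member:
    """Return the member with the greatest election key."""
    return max(members, key=election_key)


a = Member("FGT0001", True, 2, 128, 86_400)
b = Member("FGT0002", True, 2, 200, 3_600)
c = Member("FGT0003", True, 1, 255, 90_000)

print(elect_primary([a, b, c]).serial)  # FGT0002
```

Unit c has the highest priority and uptime but only one monitored interface up, so it loses at the earlier criterion; b then beats a on priority.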
4️⃣ Configuration Synchronization Mechanism
- Configuration is synchronized from the primary unit to the secondary units over the HA link.
- Includes firewall policies, routing, VPNs, interfaces, etc.
⏱️ Time Taken:
- First full sync: 5–15 seconds depending on config size
- Incremental sync: real-time (few milliseconds)
5️⃣ Session Synchronization Mechanism
Keeps failover stateful, so synchronized sessions continue on the new primary instead of being dropped. Types:
- L4 Session Pickup: TCP/UDP sessions
- L4 Persistence Pickup: Source IP stickiness
- L7 Persistence Pickup: Protocol-aware (e.g., HTTP)
⏱️ Session sync latency: 1–5 ms
6️⃣ Monitored Interface Failure Detection
- Specific interfaces (e.g., WAN, LAN) can be monitored.
- If a monitored interface goes down on the primary while it is still up on the secondary, failover occurs.
⏱️ Time: About 1–2 seconds after link loss
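The link-based decision can be sketched in Python as a comparison of how many monitored interfaces each unit has up. This is a simplified illustration: the `should_fail_over` function and the interface names `wan1` and `lan1` are hypothetical.

```python
def should_fail_over(primary_links: dict[str, bool], secondary_links: dict[str, bool]) -> bool:
    """Return True when the secondary has more monitored interfaces up than the primary."""
    # True counts as 1 and False as 0, so sum() yields the number of links that are up.
    return sum(secondary_links.values()) > sum(primary_links.values())


primary = {"wan1": False, "lan1": True}    # wan1 just lost link on the primary
secondary = {"wan1": True, "lan1": True}

print(should_fail_over(primary, secondary))    # True
print(should_fail_over(secondary, secondary))  # False: equal link counts, no change
```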
7️⃣ Failover Execution Process
Steps:
1. Failure detected
2. Secondary promotes to Primary
3. Sends Gratuitous ARP using vMAC
4. Switches update MAC tables (~1–2 seconds)
5. Traffic resumes
⏱️ Total failover time: about 3–5 seconds, most of it failure detection (up to 3 seconds for
heartbeat loss, less for a link failure)
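The total can be read as detection time plus execution time. A minimal Python sketch using the figures quoted in this document (3 seconds for heartbeat-loss detection, about 1 second for link-failure detection, 1–2 seconds for execution); the function name `failover_window` is illustrative.

```python
def failover_window(
    detect_s: tuple[float, float], execute_s: tuple[float, float]
) -> tuple[float, float]:
    """Add (min, max) detection time to (min, max) execution time."""
    return (detect_s[0] + execute_s[0], detect_s[1] + execute_s[1])


EXECUTION_S = (1.0, 2.0)  # role change, gratuitous ARP, switch MAC table update

heartbeat = failover_window((3.0, 3.0), EXECUTION_S)
link = failover_window((1.0, 1.0), EXECUTION_S)

print(heartbeat)  # (4.0, 5.0)
print(link)       # (2.0, 3.0)
```

With these inputs a heartbeat-detected failure lands at the upper end of the quoted range, while a link-detected failure completes sooner.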
8️⃣ Virtual MAC (vMAC) Behavior
- Each interface in the HA cluster has a virtual MAC.
- The new primary takes over the same vMAC, so hosts keep their existing ARP entries and
switches only need to relearn which port the vMAC is on.
⏱️ Sub-second convergence.
9️⃣ Failback and Override
- If override is enabled and the returning unit has a higher priority, it reclaims the primary role.
- If override is disabled, the current primary keeps its role.
⏱️ Failback time: 5–10 seconds
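The two override rules reduce to a single condition, sketched here in Python. The function name `primary_after_rejoin` and the priority values are hypothetical, and other election criteria are ignored.

```python
def primary_after_rejoin(current_priority: int, returning_priority: int, override: bool) -> str:
    """Return which unit holds the primary role once the returning unit rejoins."""
    # Failback happens only when override is enabled AND the returning unit
    # has the higher priority; in every other case the current primary stays.
    if override and returning_priority > current_priority:
        return "returning"
    return "current"


print(primary_after_rejoin(100, 200, override=True))   # returning
print(primary_after_rejoin(100, 200, override=False))  # current
print(primary_after_rejoin(200, 100, override=True))   # current
```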
🔟 System Health Monitoring (Internal)
Monitors:
- CPU usage
- Memory pressure
- Daemon crashes (sslvpnd, dnsproxy, etc.)
- Disk I/O errors
Health degradation can trigger failover.
1️⃣1️⃣ Summary: Key Timings and Triggers
| Event | Time |
|----------------------|----------------|
| Heartbeat interval | 200 ms |
| Heartbeat timeout | 3 seconds |
| Config full sync | 5–15 seconds |
| Incremental sync | ~5 ms |
| Failover detection | 3 sec (heartbeat), 1 sec (link) |
| Failover execution | 1–2 seconds |
| Total failover time | ~4–5 seconds |
| ARP convergence | <1 second |
| Failback time | 5–10 seconds |