PIX/ASA High Availability Presentation

Agenda
• Failover Models
  - Active/Standby failover
  - Active/Active failover
  - Stateful failover
• Failover Types
  - Serial cable failover (PIX only)
  - LAN-based failover (PIX and ASA)
• Failover Health Monitoring
  - Unit health monitoring
  - Interface health monitoring
  - SSM card health monitoring

Agenda – cont.
• Other failover-related features
  - Config management
  - Hitless (Zero Down Time) Upgrade
  - Management application features:
    - Auto Update
    - Remote Command Execution
    - CSM Config Rollback
  - Redundant Interface
• HA Programming
• Useful links

Failover Model
• Active/Standby Failover
  - 2 nodes: one active and one in hot standby.
  - Hardware configuration must be identical. Software licenses must be identical (ASA) or identical/compatible (PIX Failover-Only licenses).
  - Node identification: Primary or Secondary.
  - States: Active or Standby.
  - Stateless and stateful failover.
  - Single and Virtual Firewall mode support.
  - In Virtual Firewall mode, failover applies to the whole unit.
  - Routed and Transparent mode.
  - Through-the-box traffic is dropped on the standby unit. To-the-box traffic can reach both units (e.g., SSH to both units).

Active/Standby failover
- LAN-based failover, or serial cable failover (PIX only)
- Master election
  - The Primary unit (higher priority) becomes active in the initial negotiation.
- The two units' interfaces need to be on the same subnet.
- Each interface has 2 (active and standby) IP/MAC addresses. The addresses are swapped on failover. Gratuitous ARPs, or special multicast SNAP packets in transparent firewall (TFW) mode, are sent to the switch to flush its stale CAM entries.
  http://wwwineng.cisco.com/Workgroup/Eng/VLAN/Specs/ciscomcast-macda.txt@4

HA State Machine
[Figure: HA state machine diagram. Nodes include Disabled, Initialization, Negotiation, Standby Cold, Standby Config, Standby File Sys, Standby Bulk, Standby Ready, Active Fast, Active Drain, Active PreConf, Active PostConf, Active and Failed, with Reboot transitions; edges are labeled with single letters.]

Active/Standby failover

    es-RAS-ASA01(config)# show failover
    Failover On
    Failover unit Primary
    Failover LAN Interface: flink GigabitEthernet0/1 (up)
    Unit Poll frequency 1 seconds, holdtime 15 seconds
    Interface Poll frequency 5 seconds, holdtime 25 seconds
    Interface Policy 1
    Monitored Interfaces 2 of 250 maximum
    Version: Ours 7.2(3), Mate 7.2(3)
    Last Failover at: 10:06:31 JST Apr 24 2008
            This host: Primary - Active
                    Active time: 232968 (sec)
                    slot 0: ASA5520 hw/sw rev (1.0/7.2(3)) status (Up Sys)
                      Interface outside (192.168.5.101): Normal
                      Interface inside (10.87.72.5): Normal
                    slot 1: ASA-SSM-20 hw/sw rev (1.0/6.0(3)E1) status (Up/Up)
                      IPS, 6.0(3)E1, Up
            Other host: Secondary - Standby Ready
                    Active time: 459657 (sec)
                    slot 0: ASA5520 hw/sw rev (1.0/7.2(3)) status (Up Sys)
                      Interface outside (192.168.5.111): Normal
                      Interface inside (10.87.72.7): Normal
                    slot 1: ASA-SSM-20 hw/sw rev (1.0/6.0(3)E1) status (Up/Up)
                      IPS, 6.0(3)E1, Up

    Stateful Failover Logical Update Statistics
            Link : flink GigabitEthernet0/1 (up)
            Stateful Obj    xmit    xerr    rcv     rerr
            General         5       0       5       0
            sys cmd         5       0       5       0
            up time         0       0       0       0
            RPC services    0       0       0       0

Failover Model
• Active/Active Failover
  - 2 nodes: both nodes can be active and passing traffic, and each is the backup for its peer.
  - Virtual context mode only.
  - 2 failover groups.
  - Firewall contexts are assigned to a failover group.
  - Interfaces use virtual MAC addresses (one for each failover group) because a physical interface can be used by both groups (i.e., it may need 2 active MAC addresses and 2 standby MAC addresses).

Active/Active failover – cont.
- Config example:

    failover group 1
      primary
      preempt 10
    :
    failover group 2
      secondary
      preempt
    :
    context admin
      join-failover-group 1
    :
    context ctx2
      join-failover-group 2

Active/Active failover

    es-RAS-ASA01(config)# show failover
    :
    Monitored Interfaces 5 of 250 maximum
    Version: Ours 7.2(3), Mate 7.2(3)
    Group 1 last failover at: 10:12:35 JST Apr 24 2008
    Group 2 last failover at: 10:12:33 JST Apr 24 2008

      This host:    Primary
      Group 1       State:          Active
                    Active time:    208 (sec)
      Group 2       State:          Standby Ready
                    Active time:    0 (sec)

                    slot 0: ASA5520 hw/sw rev (1.0/7.2(3)) status (Up Sys)
                      admin Interface outside (192.168.5.101): Normal
                      admin Interface inside (10.87.72.5): Normal
                      admin Interface manage (192.168.0.5): Link Down (Waiting)
                      ctx2 Interface outside (10.0.100.111): Normal
                      ctx2 Interface inside (10.0.3.111): Link Down (Waiting)
                    slot 1: ASA-SSM-20 hw/sw rev (1.0/6.0(3)E1) status (Up/Up)
                      IPS, 6.0(3)E1, Up

      Other host:   Secondary
      Group 1       State:          Standby Ready
                    Active time:    0 (sec)
      Group 2       State:          Active
                    Active time:    211 (sec)

                    slot 0: ASA5520 hw/sw rev (1.0/7.2(3)) status (Up Sys)
                      admin Interface outside (192.168.5.111): Normal
                      admin Interface inside (10.87.72.7): Normal
                      admin Interface manage (0.0.0.0): Link Down (Waiting)
                      ctx2 Interface outside (10.0.100.1): Normal
                      ctx2 Interface inside (10.0.3.101): Unknown (Waiting)
    :

Stateful Failover
• Application states are replicated to the standby unit, so when the active unit goes down and the standby unit takes over, client sessions are not affected.
• Supported on all platforms except ASA 5505.
• Works with both Active/Standby and Active/Active failover.
• Features/services that support Stateful Failover:
  - TCP and UDP connections
  - GTP and SIP signaling states
  - IPSec and SSL VPN
• The stateful link needs to match the fastest data interface on all platforms except ASA 5510, 5580 and PIX 525, where the bottleneck is the CPU.
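The replication idea behind stateful failover can be sketched in a few lines of C. This is a toy model under stated assumptions, not the PIX/ASA implementation: every name here (demo_unit_t, demo_lu_msg_t, demo_apply_lu, demo_active_update) is hypothetical, the "connection state" is reduced to a bare ID, and a direct function call stands in for the stateful failover link.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_CONNS 8   /* hypothetical table size */

typedef enum { DEMO_LU_ADD, DEMO_LU_DELETE } demo_lu_op_t;

/* One logical update sent from the active unit to the standby unit. */
typedef struct {
    demo_lu_op_t op;
    uint32_t     conn_id;
} demo_lu_msg_t;

/* Per-unit connection table (hypothetical, fixed size). */
typedef struct {
    uint32_t conn_id[DEMO_MAX_CONNS];
    bool     in_use[DEMO_MAX_CONNS];
} demo_unit_t;

/* Return true if the unit currently knows the connection. */
bool demo_has_conn(const demo_unit_t *u, uint32_t id)
{
    assert(u != NULL);
    for (size_t i = 0; i < DEMO_MAX_CONNS; i++) {
        if (u->in_use[i] && u->conn_id[i] == id) {
            return true;
        }
    }
    return false;
}

/* Apply one update to a unit's table.
 * Returns false only when an ADD finds no free slot. */
bool demo_apply_lu(demo_unit_t *u, demo_lu_msg_t m)
{
    assert(u != NULL);
    if (m.op == DEMO_LU_DELETE) {
        for (size_t i = 0; i < DEMO_MAX_CONNS; i++) {
            if (u->in_use[i] && u->conn_id[i] == m.conn_id) {
                u->in_use[i] = false;
            }
        }
        return true;
    }
    if (demo_has_conn(u, m.conn_id)) {
        return true;               /* ADD is idempotent */
    }
    for (size_t i = 0; i < DEMO_MAX_CONNS; i++) {
        if (!u->in_use[i]) {
            u->in_use[i] = true;
            u->conn_id[i] = m.conn_id;
            return true;
        }
    }
    return false;
}

/* Active-side change: update the local table, then replicate the same
 * update to the standby unit. */
bool demo_active_update(demo_unit_t *active, demo_unit_t *standby,
                        demo_lu_msg_t m)
{
    return demo_apply_lu(active, m) && demo_apply_lu(standby, m);
}
```

Because the standby table mirrors every add and delete, a lookup on the standby unit after a switchover finds the same connections the active unit had, which is why client sessions survive the failover.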
Failover Types
• Serial cable failover
  - PIX platform only.
  - Cisco proprietary serial cable (6 feet) as the failover command link.
  - The cable connector determines the Primary or Secondary unit.
  - Minimum config required (only the failover command is needed).
  - Reliable detection of peer power-down, crash or reload through a hardware signal change on the serial cable.
  - Config is the same on both units.
  - Speed is up to 115200 bps, so syncing a large config takes a long time.

[Figures: PIX serial failover cable; PIX serial failover]

Failover Types
• LAN-based failover
  - PIX and ASA platforms.
  - Uses an Ethernet interface as the failover command link.
  - Higher throughput.
  - Longer distance between HA units. Can be used to deploy long-distance failover (e.g., data centers in NY and NJ) over a transport such as Metro Ethernet.
  - Vulnerable to false alarms caused by CPU hogging when the failover holdtime is configured low.
  - Requires a bootstrap config for the failover command link:

      int g0/3
        no shut
      failover lan unit primary/secondary
      failover lan int flink g0/3
      failover int ip flink 10.0.3.1 255.255.255.0 standby 10.0.3.11
      failover

  - Config differs by one line (primary or secondary unit).

Health Monitoring
• Unit Health Monitoring:
  - Serial failover: power failure or reload (crash) detection using the serial cable.
  - Keep-alive packets on the failover command link.
  - Additional tests on data interfaces to guard against failover command link failure.
• Data Interface Health Monitoring:
  - Physical link (up or down).
  - Keep-alive packets (end-to-end connectivity).
  - Additional tests when keep-alive packets are missed:
    - Traffic, ARP and broadcast ping tests.
• ASA SSM Card Health Monitoring:
  - Keep-alive packets on both control and data planes. The SSM card simply loops those keep-alive packets back.

Other HA features
• Configuration Management:
  - Configuration is synchronized from the active to the standby unit when the HA cluster is up.
  - Configuration commands entered on the active unit are replicated to the standby unit.
• Hitless (Zero Down Time) upgrade
  - The 2 units can run different image versions.
  - Load the new image on the standby unit and reload it.
  - Switch over to the standby unit running the new image after connection states are replicated.
  - Load the new image on the (old) active unit and reload it.
  - Hitless downgrade is not supported.
• Management application features:
  - Auto Update: CSM server deploys config or images from the auto-update server.
  - CSM Configuration Rollback (transaction config update model).
  - Remote Command Execution (failover exec …), similar to the Unix rsh command.

Redundant Interface
• A logical interface consists of an active and a standby physical interface.
• When the active interface fails, the standby interface becomes active and starts passing traffic.
• This feature is separate from device-level failover, but you can configure redundant interfaces in a failover setup if desired. Redundant interface failover takes precedence over device failover.
• You can configure up to 8 redundant interface pairs.

HA Programming
• Code needs to consider 3 scenarios: 1. stand-alone, 2. HA active unit and 3. HA standby unit. For example, the DHCP server should discard client requests when in standby state:

    dhcp_server_input_proc()
    {
        if (ha_my_ctx_is_active()) {
            process_input_pkt();
        } else {
            drop_input_pkt();    /* standby: discard client requests */
        }
    }

• Two ways to interact with the HA subsystem:
  1. Use the HA exported APIs (fover/ha.h).
  2. Register as an HA client and receive HA state callbacks. Most stateful failover features, such as VPN and SIP stateful failover, are done this way. Examples: sip_ha_main.c, vpnfol_ha.c.

HA programming
• When replicating data to the standby unit, beware that some data may differ on the standby unit, such as vcid and vpif#:

    :
    Sending side:
        vcidvlan = makeVcidVlanFromVpifNum(vPifNum);
    Receiving side:
        vPifNum = convertPeerVcidVlanToMyVpifNum(vcidvlan);
    :

• Consider the hitless upgrade scenario:
  - The new version needs to consider that a peer running an older version may not support a particular feature.
  - If the CLI syntax is modified, the new version still needs to be able to accept the old version's CLI from the peer.

HA Programming
• For a replicated object, always add new data or fields at the end. If a field is no longer needed, do not remove it; just leave it there.
• Beware that each interface has 2 IP addresses, and the standby unit/context needs to check for the standby IP address for to-the-box traffic. vPif_getIpAddr() returns the IP address of the interface. vPif_getSysIpAddr() returns the active IP address of the interface.
• Remember to add the PRIV_REP option in the parser file for config mode commands. This is not needed for EXEC mode commands, as they are not replicated.

HA Programming

    /*
     * SNP_FLOW_LU_{ADD|UPDATE|DELETE}
     * !!!!!!!!!!!!!!!!!!!!! WARNING !!!!!!!!!!!!!!!!!!!!!!!!!!
     * Making changes to this structure (like adding/removing
     * parameters) will cause hitless upgrade to fail in some cases.
     * Make sure you add a flag to indicate this parameter has been added to the
     * structure (refer SNP_LU_FLAGS_XLATE_PTO_PRESENT flag below).
     * Add any new parameters just before the last field (message).
     * structure used for flow stateful update
     */
    typedef struct snp_flow_lu_t_ {
        uint8_t   dscp PAK;                  /* ...ing size field */
        uint16_t  lu_flags PAK;
        :
        uint32_t  forward_xlate_timeout PAK;
        uint32_t  reverse_xlate_timeout PAK;
        uchar     message[0];                /* must be last field */
    } snp_flow_lu_t;

    /*
     * flag bit for lu_flags
     */
    #define SNP_LU_FLAGS_ASR_MAC         0x0001
    #define SNP_LU_FLAGS_AAA_AUTH        0x0002
    #define SNP_LU_FLAGS_INSPECT_RTP     0x0004
    #define SNP_LU_FLAGS_INSPECT_RTCP    0x0008

HA Programming

    static inline boolean
    extract_direct_parent_flow (snp_flow_lu_t *lu,
                                snp_flow_id_t *parent_flow_id,
                                snp_ifc_t **parent_ifc_in,
                                snp_time_t *econn_timeout)
    {
        parent_flow_id->protocol = lu->parent_prot;
        if (lu->lu_flags & SNP_LU_FLAGS_PINHOLE_INFO_PRESENT) {
            /*
             * In earlier versions parent sip and parent dip were not part
             * of the LU message. SIP and DIP between parent and child will be
             * different in case of indirect secondary connections. Check the flag
             * to see if the extra information is present.
             */
            parent_flow_id->sip = lu->parent_sip;
            parent_flow_id->dip = lu->parent_dip;
            :

Useful Links
• http://cisco.com/en/US/partner/docs/security/asa/asa80/configuration/guide/failover.html
• http://cisco.com/en/US/partner/docs/security/asa/asa80/command/reference/ef.html#wp1883939
• http://wwwineng.cisco.com/Eng/VSec/FormulaOne/SW_Func_Specs/f1ha_82.doc@20
• http://wwwineng.cisco.com/Eng/VSec/FormulaOne/Design_Specs/f1-aa-designspec.doc@7
• http://wwwineng.cisco.com/Eng/VSec/FormulaOne/Design_Specs/f1-ha-designspec.doc@5
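The HA Programming rule for replicated objects (append new fields at the end, and add a flag that marks a new field as present) can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the snp_flow_lu_t code: the demo_lu_t record, its flag, the default value and demo_lu_decode() are all hypothetical, and the sketch assumes both peers share the same byte order and struct layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bit: set by a sender that includes `timeout`. */
#define DEMO_LU_FLAG_TIMEOUT_PRESENT 0x0001u

/* Hypothetical default used when the peer did not send `timeout`. */
#define DEMO_DEFAULT_TIMEOUT 30u

/* Hypothetical replicated record. Version 1 peers send only lu_flags
 * and port; version 2 appends timeout at the end and sets the flag. */
typedef struct demo_lu_ {
    uint16_t lu_flags;
    uint16_t port;
    uint32_t timeout;   /* added in v2, appended after the v1 fields */
} demo_lu_t;

/* Length of the record as sent by a v1 peer. */
#define DEMO_LU_V1_LEN offsetof(demo_lu_t, timeout)

/* Decode a record received from the peer, whatever its version.
 * Returns 0 on success, -1 if the buffer is shorter than the v1 layout. */
int demo_lu_decode(const uint8_t *buf, size_t len, demo_lu_t *out)
{
    assert(buf != NULL && out != NULL);
    if (len < DEMO_LU_V1_LEN) {
        return -1;
    }
    memset(out, 0, sizeof(*out));
    memcpy(out, buf, len < sizeof(*out) ? len : sizeof(*out));

    /* Trust the appended field only if the sender flagged it and
     * actually sent enough bytes to contain it. */
    if (!(out->lu_flags & DEMO_LU_FLAG_TIMEOUT_PRESENT) ||
        len < sizeof(*out)) {
        out->timeout = DEMO_DEFAULT_TIMEOUT;
        out->lu_flags =
            (uint16_t)(out->lu_flags & ~DEMO_LU_FLAG_TIMEOUT_PRESENT);
    }
    return 0;
}
```

Because the v1 fields never move, an older peer's shorter message still decodes correctly, and the flag check keeps the newer unit from reading a field the older peer never sent. That is the property hitless upgrade depends on while the two units run different versions.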