
2008-05-06-failover-tom

PIX/ASA High Availability
Presentation
Agenda
• Failover Models
- Active/Standby failover
- Active/Active failover
- Stateful failover
• Failover Types
- Serial cable failover (PIX only)
- LAN-based failover (PIX and ASA)
• Failover Health Monitoring
- Unit health monitoring
- Interface health monitoring
- SSM card health monitoring
Agenda – cont.
• Other failover-related features
- Config management
- Hitless (Zero Down Time) Upgrade
- Management application features:
  - Auto Update
  - Remote Command Execution
  - CSM Config Rollback
- Redundant Interface
• HA Programming
• Useful links
Failover Model
• Active/Standby Failover
- 2 nodes: one active and one in hot standby.
- Hardware config must be identical. Software licenses must be identical (ASA)
or identical/compatible (PIX Failover-Only licenses).
- Node identification: Primary or Secondary.
- States: Active or Standby.
- Stateless and stateful failover.
- Single and virtual firewall (multiple context) mode support.
- In virtual firewall mode, failover applies to the whole unit.
- Routed and transparent mode.
- Through-the-box traffic is dropped on the standby unit. To-the-box traffic can
reach both units (e.g., SSH to both units).
Active/Standby failover
- LAN-based or serial cable failover (PIX only).
- Master election: the Primary unit (higher priority) becomes
active in the initial negotiation.
- Both units' interfaces need to be on the same subnet.
- Each interface has 2 (active and standby) IP/MAC
addresses. The addresses are swapped on failover.
Gratuitous ARPs (or, in transparent firewall (TFW) mode, special
multicast SNAP packets) are sent so that the switch flushes its
stale CAM entries.
http://wwwineng.cisco.com/Workgroup/Eng/VLAN/Specs/[email protected]
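The election and address-swap rules above can be sketched in C as follows. This is a minimal model, not the PIX/ASA implementation: the names `ha_ifc_t`, `ha_elect_active` and `ha_swap_addresses` are hypothetical, each unit is assumed to keep its own and its mate's IP/MAC pair per interface, and "go active if no mate is found" is a simplification.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-unit view of one data interface: this unit answers on
 * the "own" addresses, the mate answers on the "mate" addresses. */
typedef struct {
    uint32_t own_ip;
    uint32_t mate_ip;
    uint8_t  own_mac[6];
    uint8_t  mate_mac[6];
} ha_ifc_t;

typedef enum { HA_PRIMARY, HA_SECONDARY } ha_unit_id_t;

/* Simplified initial negotiation: with both units present, the Primary
 * (higher priority) goes active; a unit that finds no mate goes active
 * on its own. Returns true if the local unit should become active. */
bool ha_elect_active(ha_unit_id_t me, bool mate_present)
{
    if (!mate_present) {
        return true;
    }
    return me == HA_PRIMARY;
}

/* On failover each unit swaps its own and mate IP/MAC pairs, so the new
 * active unit takes over the active addresses hosts already know.
 * Sending gratuitous ARP / SNAP frames is not modelled here. */
void ha_swap_addresses(ha_ifc_t *ifc)
{
    uint32_t ip = ifc->own_ip;
    uint8_t mac[6];

    ifc->own_ip = ifc->mate_ip;
    ifc->mate_ip = ip;

    memcpy(mac, ifc->own_mac, sizeof mac);
    memcpy(ifc->own_mac, ifc->mate_mac, sizeof mac);
    memcpy(ifc->mate_mac, mac, sizeof mac);
}
```

For example, the Secondary's inside interface starts with own address 10.87.72.7 (standby) and mate address 10.87.72.5 (active); after `ha_swap_addresses` it answers on 10.87.72.5.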
HA State Machine
[Figure: HA state machine with states Disabled, Initialization, Negotiation, Reboot, Standby Cold, Standby Config, Standby File Sys, Standby Bulk, Standby Ready, Active Fast, Active Drain, Active PreConf, Active PostConf, Active and Failed; transitions are labeled a to o.]
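A minimal C sketch of the state machine's normal path, assuming the progression suggested by the state names: a standby unit moves from Standby Cold through Config, File Sys and Bulk sync to Standby Ready, while an active unit moves from Active Fast through Drain, PreConf and PostConf to Active. The lettered transitions in the figure are not recoverable, so `ha_state_t` and `ha_next_state` are hypothetical and all failure, reboot and failover edges are left out.

```c
#include <stdbool.h>

/* States named after the figure above. */
typedef enum {
    HA_DISABLED,
    HA_INIT,
    HA_NEGOTIATION,
    HA_STANDBY_COLD,
    HA_STANDBY_CONFIG,
    HA_STANDBY_FILE_SYS,
    HA_STANDBY_BULK,
    HA_STANDBY_READY,
    HA_ACTIVE_FAST,
    HA_ACTIVE_DRAIN,
    HA_ACTIVE_PRECONF,
    HA_ACTIVE_POSTCONF,
    HA_ACTIVE,
    HA_FAILED,
    HA_REBOOT
} ha_state_t;

/* Assumed normal-path transitions only. */
ha_state_t ha_next_state(ha_state_t s, bool won_negotiation)
{
    switch (s) {
    case HA_INIT:             return HA_NEGOTIATION;
    case HA_NEGOTIATION:      return won_negotiation ? HA_ACTIVE_FAST
                                                     : HA_STANDBY_COLD;
    /* Standby side: sync config, file system, then bulk state. */
    case HA_STANDBY_COLD:     return HA_STANDBY_CONFIG;
    case HA_STANDBY_CONFIG:   return HA_STANDBY_FILE_SYS;
    case HA_STANDBY_FILE_SYS: return HA_STANDBY_BULK;
    case HA_STANDBY_BULK:     return HA_STANDBY_READY;
    /* Active side: drain, apply config, then run. */
    case HA_ACTIVE_FAST:      return HA_ACTIVE_DRAIN;
    case HA_ACTIVE_DRAIN:     return HA_ACTIVE_PRECONF;
    case HA_ACTIVE_PRECONF:   return HA_ACTIVE_POSTCONF;
    case HA_ACTIVE_POSTCONF:  return HA_ACTIVE;
    default:                  return s;  /* steady state or not modelled */
    }
}
```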
Active/Standby failover
es-RAS-ASA01(config)# show failover
Failover On
Failover unit Primary
Failover LAN Interface: flink GigabitEthernet0/1 (up)
Unit Poll frequency 1 seconds, holdtime 15 seconds
Interface Poll frequency 5 seconds, holdtime 25 seconds
Interface Policy 1
Monitored Interfaces 2 of 250 maximum
Version: Ours 7.2(3), Mate 7.2(3)
Last Failover at: 10:06:31 JST Apr 24 2008
This host: Primary - Active
Active time: 232968 (sec)
slot 0: ASA5520 hw/sw rev (1.0/7.2(3)) status (Up Sys)
Interface outside (192.168.5.101): Normal
Interface inside (10.87.72.5): Normal
slot 1: ASA-SSM-20 hw/sw rev (1.0/6.0(3)E1) status (Up/Up)
IPS, 6.0(3)E1, Up
Other host: Secondary - Standby Ready
Active time: 459657 (sec)
slot 0: ASA5520 hw/sw rev (1.0/7.2(3)) status (Up Sys)
Interface outside (192.168.5.111): Normal
Interface inside (10.87.72.7): Normal
slot 1: ASA-SSM-20 hw/sw rev (1.0/6.0(3)E1) status (Up/Up)
IPS, 6.0(3)E1, Up
Stateful Failover Logical Update Statistics
Link : flink GigabitEthernet0/1 (up)
Stateful Obj      xmit       xerr       rcv        rerr
General           5          0          5          0
sys cmd           5          0          5          0
up time           0          0          0          0
RPC services      0          0          0          0
Failover Model
• Active/Active Failover
- 2 nodes: both nodes can actively pass traffic and act as
backup for the peer.
- Virtual context mode only.
- 2 failover groups.
- Firewall contexts are assigned to a failover group.
- Interfaces use virtual MAC addresses (one set per failover
group) because a physical interface can be used by
both groups (i.e., it may need 2 active MAC addresses and 2
standby MAC addresses).
Active/Active failover – cont.
- Config example:
failover group 1
  primary
  preempt 10
:
failover group 2
  secondary
  preempt
:
context admin
  join-failover-group 1
:
context ctx2
  join-failover-group 2
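To make the group/context relationship concrete, here is a small hypothetical C model: each context carries the group number from `join-failover-group`, each unit records whether it is active per group, and the MAC for a context's interface is picked from that group's virtual MAC pair. None of these types (`ha_context_t`, `ha_unit_t`, `ha_vmac_pair_t`) come from the ASA source.

```c
#include <stdbool.h>
#include <stddef.h>

#define HA_NUM_GROUPS 2

/* Context membership, as set by "join-failover-group N". */
typedef struct {
    const char *name;
    int group;                         /* 1 or 2 */
} ha_context_t;

/* Per-unit role for each failover group; index 0 is unused. */
typedef struct {
    bool active[HA_NUM_GROUPS + 1];
} ha_unit_t;

/* Virtual MAC pair used by one failover group on a shared interface. */
typedef struct {
    unsigned char active_mac[6];
    unsigned char standby_mac[6];
} ha_vmac_pair_t;

/* A unit is active for a context iff it is active for the context's group. */
bool ha_ctx_is_active(const ha_unit_t *unit, const ha_context_t *ctx)
{
    if (ctx->group < 1 || ctx->group > HA_NUM_GROUPS) {
        return false;
    }
    return unit->active[ctx->group];
}

/* MAC this unit should use for a context's interface: the group's active
 * virtual MAC if the unit is active for that group, else the standby one.
 * Returns NULL for an invalid group. */
const unsigned char *ha_ctx_mac(const ha_unit_t *unit,
                                const ha_context_t *ctx,
                                const ha_vmac_pair_t pairs[HA_NUM_GROUPS + 1])
{
    if (ctx->group < 1 || ctx->group > HA_NUM_GROUPS) {
        return NULL;
    }
    return ha_ctx_is_active(unit, ctx) ? pairs[ctx->group].active_mac
                                       : pairs[ctx->group].standby_mac;
}
```

With the roles in the `show failover` output that follows (Primary active for group 1, standby for group 2), the Primary would serve `admin` on group 1's active MAC and use group 2's standby MAC for `ctx2`.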
Active/Active failover
es-RAS-ASA01(config)# show failover
:
Monitored Interfaces 5 of 250 maximum
Version: Ours 7.2(3), Mate 7.2(3)
Group 1 last failover at: 10:12:35 JST Apr 24 2008
Group 2 last failover at: 10:12:33 JST Apr 24 2008
This host: Primary
Group 1       State:          Active
              Active time:    208 (sec)
Group 2       State:          Standby Ready
              Active time:    0 (sec)
slot 0: ASA5520 hw/sw rev (1.0/7.2(3)) status (Up Sys)
admin Interface outside (192.168.5.101): Normal
admin Interface inside (10.87.72.5): Normal
admin Interface manage (192.168.0.5): Link Down (Waiting)
ctx2 Interface outside (10.0.100.111): Normal
ctx2 Interface inside (10.0.3.111): Link Down (Waiting)
slot 1: ASA-SSM-20 hw/sw rev (1.0/6.0(3)E1) status (Up/Up)
IPS, 6.0(3)E1, Up
Other host: Secondary
Group 1       State:          Standby Ready
              Active time:    0 (sec)
Group 2       State:          Active
              Active time:    211 (sec)
slot 0: ASA5520 hw/sw rev (1.0/7.2(3)) status (Up Sys)
admin Interface outside (192.168.5.111): Normal
admin Interface inside (10.87.72.7): Normal
admin Interface manage (0.0.0.0): Link Down (Waiting)
ctx2 Interface outside (10.0.100.1): Normal
ctx2 Interface inside (10.0.3.101): Unknown (Waiting)
:
Stateful Failover
• Application states are replicated to the standby unit, so when the
active unit goes down and the standby unit takes over, client
sessions are not affected.
• Supported on all platforms except the ASA 5505.
• Active/Standby and Active/Active failover.
• Features/services that support stateful failover:
- TCP and UDP connections
- GTP and SIP signaling states
- IPsec and SSL VPN
• The stateful failover link needs to match the speed of the fastest data
interface on all platforms except the ASA 5510, ASA 5580 and PIX 525,
where the CPU rather than the link is the bottleneck.
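Stateful failover comes down to the active unit sending state updates that the standby applies to its own tables. A minimal sketch for TCP/UDP connections, assuming a simple add/delete message and a fixed-size table; `conn_update_t`, `conn_apply_update` and the other names are illustrative, not the real logical-update (LU) format.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { LU_ADD, LU_DELETE } lu_op_t;

/* 5-tuple identifying a TCP or UDP connection. */
typedef struct {
    uint32_t sip, dip;
    uint16_t sport, dport;
    uint8_t  proto;                    /* 6 = TCP, 17 = UDP */
} conn_key_t;

/* One replicated state change sent by the active unit. */
typedef struct {
    lu_op_t    op;
    conn_key_t key;
} conn_update_t;

#define CONN_TABLE_SIZE 64

/* Standby unit's copy of the connection table (fixed size for brevity). */
typedef struct {
    conn_key_t entries[CONN_TABLE_SIZE];
    bool       used[CONN_TABLE_SIZE];
} conn_table_t;

/* Field-by-field compare, so struct padding never matters. */
static bool conn_key_eq(const conn_key_t *a, const conn_key_t *b)
{
    return a->sip == b->sip && a->dip == b->dip && a->sport == b->sport &&
           a->dport == b->dport && a->proto == b->proto;
}

/* Returns the slot holding key, or -1 if absent. */
int conn_find(const conn_table_t *t, const conn_key_t *key)
{
    for (int i = 0; i < CONN_TABLE_SIZE; i++) {
        if (t->used[i] && conn_key_eq(&t->entries[i], key)) {
            return i;
        }
    }
    return -1;
}

/* Applies one update on the standby. Adds are idempotent; returns false
 * if the table is full or a delete names an unknown connection. */
bool conn_apply_update(conn_table_t *t, const conn_update_t *u)
{
    int slot = conn_find(t, &u->key);

    if (u->op == LU_ADD) {
        if (slot >= 0) {
            return true;
        }
        for (int i = 0; i < CONN_TABLE_SIZE; i++) {
            if (!t->used[i]) {
                t->entries[i] = u->key;
                t->used[i] = true;
                return true;
            }
        }
        return false;
    }
    if (slot < 0) {
        return false;
    }
    t->used[slot] = false;
    return true;
}
```

Because the standby already holds the entry when it takes over, packets for that connection can be forwarded without the client reconnecting.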
Failover Types
• Serial cable failover
- PIX platform only.
- Cisco proprietary serial cable (6 feet) used as the failover
command link.
- The cable connector determines which unit is Primary and which is
Secondary.
- Minimal config required (only the failover command is
needed).
- Reliable detection of peer power-down, crash or reload
through a hardware signal change on the serial cable.
- Config is the same on both units.
- Speed is limited to 115200 bps, so syncing a large
config takes a long time.
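The serial-link sync delay is easy to estimate. This C helper assumes 8N1 framing (10 line bits per byte) and ignores protocol overhead, so it gives a lower bound; `serial_sync_seconds` is a hypothetical name.

```c
#include <stdint.h>

/* Assumption: asynchronous 8N1 framing, i.e. 1 start + 8 data + 1 stop
 * bit = 10 line bits per byte, with no protocol overhead or retries.
 * The result is therefore a lower bound on the real sync time. */
#define SERIAL_BITS_PER_BYTE 10u

double serial_sync_seconds(uint64_t config_bytes, uint32_t bits_per_second)
{
    return (double)config_bytes * SERIAL_BITS_PER_BYTE /
           (double)bits_per_second;
}
```

At 115200 bps, a 1,000,000-byte configuration needs at least 10,000,000 / 115200 ≈ 86.8 seconds to cross the cable.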
[Figure: PIX serial failover cable]
[Figure: PIX serial failover]
Failover Types
• LAN-based failover
- PIX and ASA platforms.
- Uses an Ethernet interface as the failover command link.
- Higher throughput.
- Longer distance between HA units. Can be used to deploy long-distance
failover (e.g., data centers in NY and NJ) over a transport such as Metro Ethernet.
- Vulnerable to false alarms caused by CPU hogging when the failover holdtime
is configured low.
- Requires a bootstrap config for the failover command link:
int g0/3
 no shut
failover lan unit primary/secondary
failover lan int flink g0/3
failover int ip flink 10.0.3.1 255.255.255.0 standby 10.0.3.11
failover
- The config differs by one line between the units (primary or secondary).
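The one-line difference can be checked mechanically. This sketch renders the bootstrap config above for either role (abbreviations such as `int g0/3` expanded to full keywords) and counts differing lines; `failover_bootstrap` and `config_line_diff` are hypothetical helpers, not part of any Cisco tool.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef enum { UNIT_PRIMARY, UNIT_SECONDARY } unit_role_t;

/* Renders the LAN failover bootstrap config from the slide.
 * Returns the snprintf result: the length needed, or negative on error. */
int failover_bootstrap(char *buf, size_t len, unit_role_t role)
{
    return snprintf(buf, len,
                    "interface GigabitEthernet0/3\n"
                    " no shutdown\n"
                    "failover lan unit %s\n"
                    "failover lan interface flink GigabitEthernet0/3\n"
                    "failover interface ip flink 10.0.3.1 255.255.255.0 "
                    "standby 10.0.3.11\n"
                    "failover\n",
                    role == UNIT_PRIMARY ? "primary" : "secondary");
}

/* Counts lines that differ between two newline-separated configs. */
int config_line_diff(const char *a, const char *b)
{
    int diffs = 0;

    while (*a != '\0' || *b != '\0') {
        const char *ea = strchr(a, '\n');
        const char *eb = strchr(b, '\n');
        size_t la = ea ? (size_t)(ea - a) : strlen(a);
        size_t lb = eb ? (size_t)(eb - b) : strlen(b);

        if (la != lb || strncmp(a, b, la) != 0) {
            diffs++;
        }
        a += la + (ea ? 1 : 0);   /* skip the line and its newline */
        b += lb + (eb ? 1 : 0);
    }
    return diffs;
}
```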
Health Monitoring
• Unit health monitoring:
- Serial failover: power failure or reload (crash) detection using the serial
cable.
- Keep-alive packets on the failover command link.
- Additional tests on the data interfaces to guard against a failover
command link failure.
• Data interface health monitoring:
- Physical link (up or down).
- Keep-alive packets (end-to-end connectivity).
- Additional tests when keep-alive packets are missed:
  - Traffic, ARP and broadcast ping tests.
• ASA SSM card health monitoring:
- Keep-alive packets on both the control and data planes. The SSM card
simply loops those keep-alive packets back.
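The unit keep-alive logic can be sketched against the poll and holdtime values shown earlier by `show failover` (1 second poll, 15 second holdtime). The intermediate "suspect" stage, where the additional data-interface tests would start, is an assumption about ordering, and all names here are hypothetical.

```c
#include <stdint.h>

typedef enum { PEER_OK, PEER_SUSPECT, PEER_FAILED } peer_status_t;

/* Unit monitoring parameters, e.g. "Unit Poll frequency 1 seconds,
 * holdtime 15 seconds". */
typedef struct {
    uint32_t poll_s;
    uint32_t hold_s;
    uint64_t last_hello_s;   /* when the last keep-alive arrived */
} ha_peer_monitor_t;

void ha_hello_received(ha_peer_monitor_t *m, uint64_t now_s)
{
    m->last_hello_s = now_s;
}

/* Simplified model: after more than one poll interval of silence the
 * mate is suspect; once the holdtime expires it is declared failed.
 * The extra interface tests themselves are not modelled. */
peer_status_t ha_peer_status(const ha_peer_monitor_t *m, uint64_t now_s)
{
    if (now_s <= m->last_hello_s) {
        return PEER_OK;
    }
    uint64_t silent_s = now_s - m->last_hello_s;
    if (silent_s >= m->hold_s) {
        return PEER_FAILED;
    }
    if (silent_s > m->poll_s) {
        return PEER_SUSPECT;
    }
    return PEER_OK;
}
```

A short holdtime shrinks the window before `PEER_FAILED`, which is why a CPU-hogging process that delays keep-alives can cause the false alarms mentioned for LAN-based failover.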
Other HA features
• Configuration management:
- The configuration is synchronized from the active to the standby unit when the HA cluster comes up.
- Configuration commands entered on the active unit are replicated to the standby unit.
• Hitless (Zero Down Time) upgrade:
- The 2 units can run different image versions.
- Load the new image on the standby unit and reload it.
- Switch over to the standby unit running the new image after connection states are replicated.
- Load the new image on the (old) active unit and reload it.
- Hitless downgrade is not supported.
• Management application features:
- Auto Update: the CSM server deploys configs or images from the auto-update server.
- CSM configuration rollback (transactional config update model).
- Remote Command Execution (failover exec …), similar to the Unix rsh command.
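The hitless upgrade steps above, as a hypothetical C sketch operating on two unit records. Only the no-downgrade rule comes from the slide; the version comparison and the `ha_node_t` / `hitless_upgrade` names are assumptions, and waiting for state replication before the switchover is only a comment.

```c
#include <stdbool.h>

typedef struct { int major, minor, maint; } img_version_t;

typedef struct {
    img_version_t ver;
    bool active;
} ha_node_t;

/* Returns <0, 0 or >0 like strcmp. */
int version_cmp(img_version_t a, img_version_t b)
{
    if (a.major != b.major) return a.major < b.major ? -1 : 1;
    if (a.minor != b.minor) return a.minor < b.minor ? -1 : 1;
    if (a.maint != b.maint) return a.maint < b.maint ? -1 : 1;
    return 0;
}

/* The only rule taken from the slide: hitless downgrade is unsupported.
 * Real releases may impose further compatibility limits. */
bool hitless_supported(img_version_t running, img_version_t target)
{
    return version_cmp(target, running) >= 0;
}

bool hitless_upgrade(ha_node_t *n1, ha_node_t *n2, img_version_t target)
{
    ha_node_t *act = n1->active ? n1 : n2;
    ha_node_t *stby = n1->active ? n2 : n1;

    if (!hitless_supported(act->ver, target)) {
        return false;
    }
    stby->ver = target;     /* 1. load new image on standby and reload */
    /* 2. wait until connection state is replicated (not modelled) */
    act->active = false;    /* 3. switch over to the upgraded unit */
    stby->active = true;
    act->ver = target;      /* 4. load new image on the old active, reload */
    return true;
}
```

Note that the roles end up swapped: the unit that was standby is active afterwards.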
Redundant Interface
• A logical interface consists of an active and a
standby physical interface.
• When the active interface fails, the standby
interface becomes active and starts passing
traffic.
• This feature is separate from device-level
failover, but you can configure redundant
interfaces in a failover setup if desired. Redundant
interface failover takes precedence over device failover.
• You can configure up to 8 redundant interface
pairs.
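A redundant interface pair behaves like a small failover domain of its own. This sketch assumes no preemption (traffic stays on the surviving member after the original one recovers); `redundant_ifc_t` and `redundant_ifc_link_change` are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical model of one redundant interface pair. */
typedef struct {
    const char *member[2];   /* e.g. "GigabitEthernet0/0", "GigabitEthernet0/1" */
    bool link_up[2];
    int active;              /* index of the member passing traffic */
} redundant_ifc_t;

/* Handles a link-state change and returns the active member index.
 * Assumption: no preemption back to the original member. */
int redundant_ifc_link_change(redundant_ifc_t *r, int member, bool up)
{
    if (member == 0 || member == 1) {
        r->link_up[member] = up;
    }
    int other = 1 - r->active;
    if (!r->link_up[r->active] && r->link_up[other]) {
        r->active = other;
    }
    return r->active;
}
```

Because the switch happens inside one unit, a single link failure is absorbed here without triggering device-level failover.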
HA Programming
• Code needs to consider 3 scenarios: 1. stand-alone, 2. HA active unit and 3.
HA standby unit. For example, the DHCP server should discard client requests when in
the standby state.
void dhcp_server_input_proc (void)
{
    /* assumed to be TRUE for a stand-alone unit as well */
    if (ha_my_ctx_is_active()) {
        process_input_pkt();
    } else {
        drop_input_pkt();   /* standby: let the active unit answer */
    }
}
• Two ways to interact with the HA subsystem:
1. Use the HA exported APIs (fover/ha.h).
2. Register as an HA client and receive HA state callbacks. Most stateful failover
features, such as VPN and SIP stateful failover, are done this way. Examples: sip_ha_main.c,
vpnfol_ha.c.
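The callback style of integration (option 2) can be sketched as a small client registry. The names `ha_client_register`, `ha_notify_role` and the demo client are hypothetical and only illustrate the pattern; they are not the fover/ha.h API.

```c
#include <stddef.h>

typedef enum { HA_ROLE_STANDALONE, HA_ROLE_ACTIVE, HA_ROLE_STANDBY } ha_role_t;

typedef void (*ha_state_cb_t)(ha_role_t new_role, void *cookie);

#define HA_MAX_CLIENTS 8

typedef struct {
    const char   *name;
    ha_state_cb_t cb;
    void         *cookie;
} ha_client_t;

static ha_client_t ha_clients[HA_MAX_CLIENTS];
static size_t ha_num_clients;

/* Returns 0 on success, -1 if the registry is full or cb is NULL. */
int ha_client_register(const char *name, ha_state_cb_t cb, void *cookie)
{
    if (cb == NULL || ha_num_clients == HA_MAX_CLIENTS) {
        return -1;
    }
    ha_clients[ha_num_clients++] = (ha_client_t){ name, cb, cookie };
    return 0;
}

/* Called by the HA core on every role change. */
void ha_notify_role(ha_role_t role)
{
    for (size_t i = 0; i < ha_num_clients; i++) {
        ha_clients[i].cb(role, ha_clients[i].cookie);
    }
}

/* Example client: records the latest role, e.g. so a feature can stop
 * answering requests while standby. */
typedef struct {
    ha_role_t role;
    int       changes;
} demo_client_t;

void demo_client_cb(ha_role_t new_role, void *cookie)
{
    demo_client_t *c = cookie;
    c->role = new_role;
    c->changes++;
}
```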
HA programming
• If you replicate data to the standby unit, beware that some data may differ
on the standby unit, such as vcid and vpif#.
:
• Sending side:
vcidvlan = makeVcidVlanFromVpifNum(vPifNum);
• Receiving side:
vPifNum = convertPeerVcidVlanToMyVpifNum(vcidvlan);
:
• Consider the hitless upgrade scenario:
- The new version needs to consider that a peer running an older version may
not support a particular feature.
• If you modify CLI syntax, the new version still needs to accept the old-version
CLI from the peer.
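The vcid/vpif translation above follows a general rule: never send a locally assigned number to the mate; send a key both units can interpret and translate it on receipt. A hypothetical sketch, assuming the (vcid, VLAN) pair identifies the interface and that a peer-to-local vcid table is maintained elsewhere; `make_vcid_vlan` and `peer_vcid_vlan_to_my_vpif` only mimic the names on the slide.

```c
#include <stdint.h>

#define MAX_VPIF 16
#define MAX_VCID 8
#define VCID_NONE UINT16_MAX

/* Hypothetical interface key carried in replicated messages. */
typedef struct {
    uint16_t vcid;
    uint16_t vlan;
} vcid_vlan_t;

/* Hypothetical local interface database. vpif numbers and context ids
 * are assigned locally and may differ on the mate. */
typedef struct {
    vcid_vlan_t vpif_map[MAX_VPIF];        /* local vpif -> (vcid, vlan) */
    int         vpif_count;
    uint16_t    peer_to_my_vcid[MAX_VCID]; /* filled in elsewhere */
} ifc_db_t;

/* Sending side: never put a raw vpif number on the wire. */
vcid_vlan_t make_vcid_vlan(const ifc_db_t *db, int vpif)
{
    if (vpif < 0 || vpif >= db->vpif_count) {
        return (vcid_vlan_t){ VCID_NONE, 0 };
    }
    return db->vpif_map[vpif];
}

/* Receiving side: translate the mate's vcid, then find the local vpif
 * with the same (vcid, vlan). Returns -1 if there is no match. */
int peer_vcid_vlan_to_my_vpif(const ifc_db_t *db, vcid_vlan_t peer)
{
    if (peer.vcid >= MAX_VCID) {
        return -1;
    }
    uint16_t my_vcid = db->peer_to_my_vcid[peer.vcid];
    for (int i = 0; i < db->vpif_count; i++) {
        if (db->vpif_map[i].vcid == my_vcid &&
            db->vpif_map[i].vlan == peer.vlan) {
            return i;
        }
    }
    return -1;
}
```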
HA Programming
• For replicated objects, always add new data or fields at the
end. If a field is no longer needed, do not remove it; just
leave it there.
• Beware that each interface has 2 IP addresses, and the standby
unit/context needs to check its standby IP addresses for
to-the-box traffic. vPif_getIpAddr() returns the IP address
of the interface; vPif_getSysIpAddr() returns the active IP
address of the interface.
• Remember to add the PRIV_REP option in the parser file for
config mode commands. This is not needed for EXEC mode
commands, as they are not replicated.
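A sketch of the to-the-box check with both addresses in play. `ifc_addrs_t`, `my_ifc_ip` and `is_to_the_box` are hypothetical and only loosely analogous to the role-dependent address lookup described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical interface address pair ("ip address A mask standby B"). */
typedef struct {
    uint32_t active_ip;
    uint32_t standby_ip;
} ifc_addrs_t;

/* Address this unit (or context) currently owns on the interface. */
uint32_t my_ifc_ip(const ifc_addrs_t *a, bool unit_is_active)
{
    return unit_is_active ? a->active_ip : a->standby_ip;
}

/* To-the-box check: comparing only against the active address would make
 * the standby unit ignore traffic addressed to it (e.g. SSH). */
bool is_to_the_box(const ifc_addrs_t *a, bool unit_is_active, uint32_t dst)
{
    return dst == my_ifc_ip(a, unit_is_active);
}
```

With the inside addresses from the earlier `show failover` output, the standby unit accepts SSH to 10.87.72.7 but not to 10.87.72.5.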
HA Programming
/*
 * SNP_FLOW_LU_{ADD|UPDATE|DELETE}
 * !!!!!!!!!!!!!!!!!!!!! WARNING !!!!!!!!!!!!!!!!!!!!!
 * Making changes to this structure (like adding/removing
 * parameters) will cause hitless upgrade to fail in some cases.
 * Make sure you add a flag to indicate this parameter has been added to the
 * structure (refer SNP_LU_FLAGS_XLATE_PTO_PRESENT flag below).
 * Add any new parameters just before the last field (message).
 * structure used for flow stateful update
 */
typedef struct snp_flow_lu_t_ {
    uint8_t   dscp PAK;                  /* ...ing size field */
    uint16_t  lu_flags PAK;
    :
    uint32_t  forward_xlate_timeout PAK;
    uint32_t  reverse_xlate_timeout PAK;
    uchar     message[0];                /* must be last field */
} snp_flow_lu_t;
/*
 * flag bit for lu_flags
 */
#define SNP_LU_FLAGS_ASR_MAC        0x0001
#define SNP_LU_FLAGS_AAA_AUTH       0x0002
#define SNP_LU_FLAGS_INSPECT_RTP    0x0004
#define SNP_LU_FLAGS_INSPECT_RTCP   0x0008
HA Programming
static inline boolean
extract_direct_parent_flow (snp_flow_lu_t *lu, snp_flow_id_t *parent_flow_id,
                            snp_ifc_t **parent_ifc_in,
                            snp_time_t *econn_timeout)
{
    parent_flow_id->protocol = lu->parent_prot;
    if (lu->lu_flags & SNP_LU_FLAGS_PINHOLE_INFO_PRESENT) {
        /*
         * In earlier versions, parent sip and parent dip were not part
         * of the LU message. SIP and DIP between parent and child will be
         * different in case of indirect secondary connections. Check the flag
         * to see if the extra information is present.
         */
        parent_flow_id->sip = lu->parent_sip;
        parent_flow_id->dip = lu->parent_dip;
    }
    :
}
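The append-plus-flag rule from the struct comment and `extract_direct_parent_flow` can be shown end to end. This self-contained sketch serializes a hypothetical two-version message: the newer sender appends two fields and sets `LU_FLAG_PARENT_ADDRS_PRESENT`, and the decoder reads them only when the flag is set, so messages from an older peer still decode. The layout and names are illustrative, not the real `snp_flow_lu_t` wire format; new fields go simply at the end here because this message has no trailing variable-length `message` field.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LU_FLAG_PARENT_ADDRS_PRESENT 0x0001u   /* hypothetical flag */

/* Decoded form. Hypothetical little-endian wire layout:
 *   old: flags(2) timeout(4)                              = 6 bytes
 *   new: flags(2) timeout(4) parent_sip(4) parent_dip(4)  = 14 bytes */
typedef struct {
    uint16_t flags;
    uint32_t timeout;
    uint32_t parent_sip, parent_dip;   /* valid only if the flag is set */
} lu_msg_t;

static void put16(uint8_t *p, uint16_t v) { p[0] = (uint8_t)v; p[1] = (uint8_t)(v >> 8); }
static void put32(uint8_t *p, uint32_t v) { for (int i = 0; i < 4; i++) p[i] = (uint8_t)(v >> (8 * i)); }
static uint16_t get16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t get32(const uint8_t *p)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++) v |= (uint32_t)p[i] << (8 * i);
    return v;
}

/* An "old" sender (send_parent == false) omits the appended fields and
 * leaves the flag clear. buf must hold at least 14 bytes. */
size_t lu_encode(const lu_msg_t *m, bool send_parent, uint8_t *buf)
{
    uint16_t flags = m->flags & (uint16_t)~LU_FLAG_PARENT_ADDRS_PRESENT;
    if (send_parent) {
        flags |= LU_FLAG_PARENT_ADDRS_PRESENT;
    }
    put16(buf, flags);
    put32(buf + 2, m->timeout);
    if (!send_parent) {
        return 6;
    }
    put32(buf + 6, m->parent_sip);
    put32(buf + 10, m->parent_dip);
    return 14;
}

/* Reads the appended fields only when the flag says they are present. */
bool lu_decode(const uint8_t *buf, size_t len, lu_msg_t *out)
{
    if (len < 6) {
        return false;
    }
    out->flags = get16(buf);
    out->timeout = get32(buf + 2);
    out->parent_sip = out->parent_dip = 0;
    if (out->flags & LU_FLAG_PARENT_ADDRS_PRESENT) {
        if (len < 14) {
            return false;
        }
        out->parent_sip = get32(buf + 6);
        out->parent_dip = get32(buf + 10);
    }
    return true;
}
```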
Useful Links
• http://cisco.com/en/US/partner/docs/security/asa/asa80/configuration/guide/failover.html
• http://cisco.com/en/US/partner/docs/security/asa/asa80/command/reference/ef.html#wp1883939
• http://wwwineng.cisco.com/Eng/VSec/FormulaOne/SW_Func_Specs/[email protected]
• http://wwwineng.cisco.com/Eng/VSec/FormulaOne/Design_Specs/[email protected]
• http://wwwineng.cisco.com/Eng/VSec/FormulaOne/Design_Specs/[email protected]