Choosing Tap or SPAN for Data Center Monitoring
Technical Brief

Key Points
▪ Taps are passive, silent, and deliver a perfect record of link traffic, but require additional hardware and create a point of failure.
▪ SPAN ports are configurable for specific data, can capture intra-switch traffic, and create no additional expense, but may drop packets randomly and will not transmit errored packets.
▪ Choose SPAN or tap resources based on your particular monitoring needs. A mix of SPAN and tap is often superior to using one or the other exclusively.

In network and security monitoring, there's an ongoing debate about the best data access method for delivering copied network traffic to monitoring tools. The debate comes down to taps versus port mirroring (SPAN) technology, and there are good points for both methods. There is no objectively correct answer; the best practice must be decided for each data source in each network. However, because the two technologies have different characteristics, a general guideline can support a sensible decision for each monitoring scenario, requirement, capture location, or project. The pros and cons of taps versus SPAN ports come down to a few key points, summarized below.

Taps – Pro and Con
Benefits of taps include:
▪ Taps are completely passive, purely optical splitters and do not need power or IP configuration.
▪ Taps are not addressable network devices and therefore cannot be hacked.
▪ Taps are failsafe, especially when placed in the aggregation layers where network redundancy is already established.
▪ Taps provide total visibility into full-duplex networks and eliminate the risk of dropped packets, regardless of the bandwidth.
▪ With taps, monitoring devices receive all packets, including packets with physical errors. Taps do not groom data in any way. This is particularly helpful in troubleshooting common physical layer problems, including bad frames that can be caused by a faulty NIC or cable.
▪ Taps do not alter the time relationships of frames. This time relationship is critical for certain latency-sensitive measurements. Taps do not introduce any additional jitter or distortion, which is important in VoIP and video signal analysis.
▪ Taps can monitor both sides of a full-duplex link individually.
▪ Taps do not behave differently if the traffic is IPv4 or IPv6; they pass all traffic through unaltered.

SPAN Ports – Pro and Con
Benefits of SPAN ports include:
▪ No additional cost to create a SPAN port.
▪ SPAN ports are remotely configurable from any management station that can access the configuration of the switch.
▪ SPAN ports are capable of capturing intra-switch traffic.

[Figure: switches (Catalyst 2960-S Series shown) with ingress and egress traffic delivering copies through SPAN ports, and an APCON tap chassis with 16 passive taps, feeding RMON, analyzer, forensic, and IDS tools.]

Challenges with taps include:
▪ Each analysis device may need to budget 2 capture interfaces to receive both sides of a tapped link.
▪ There is an additional cost for tap hardware.
▪ Taps create an additional potential point of failure.
▪ Taps create additional deployment complexity:
  • Split ratio and light budget loss calculation.
  • Disruption of the production network for tap insertion.

Challenges with SPAN ports include:
▪ SPAN ports cannot handle heavily utilized full-duplex links without dropping packets. If the combined TX and RX throughput is higher than the SPAN port line rate, the SPAN port drops frames randomly. To completely capture bidirectional traffic from a 10G link, a SPAN port would need up to 20G of capacity.
▪ SPAN ports drop all packets that are corrupt, oversized, or undersized, which hampers some physical layer analysis.
▪ SPAN ports place a burden on a switch's CPU and fabric channels to copy all data passing through ports. This potentially affects the performance of production traffic. For example, Centralized Replication in certain switches can reduce performance. Some SPAN ports require you to monitor these factors to avoid issues:
  • SPAN Destination
  • Switch Fabric
  • Replication Engine
  • Forwarding Engine
▪ SPAN ports can change the timing of frame interaction, altering measured response times.
▪ Switches prioritize SPAN port data lower than regular port-to-port data. If replicating a frame becomes an issue, the hardware will temporarily drop the SPAN process and therefore stop the data flow to the SPAN port. The more SPAN sessions that are configured, the easier it is to reach this threshold.
▪ RSPAN/ERSPAN ports put the monitoring traffic into the production network, which reduces the amount of throughput available for user traffic.
▪ Without special configuration, VLAN tags are not normally passed through a SPAN port. This can lead to false VLAN issues and difficulty in finding actual VLAN issues.

Choosing SPAN or Tap – Production Network Impact
The integrity of traffic forwarded to the monitoring tools is critical to provide accurate monitoring and troubleshooting results.
However, the greater concern is that the data access method chosen may affect the performance of the actual production network traffic. In general, taps, and optical fiber taps in particular, are totally passive and do not impact production traffic. SPAN ports, by contrast, can affect production network traffic.

SPAN EXAMPLE: CISCO 6500 IOS RELEASE 12.2SX
On this switch, SXF7 code configures Rx SPAN in Distributed Mode, but Tx SPAN is configured in Centralized Mode. In contrast, SXI3 configures both Tx and Rx SPAN in Distributed Mode. In Distributed Mode, the packets can be replicated between the source and destination modules/interfaces without supervisor intervention. In Centralized Replication Mode, packets go from the source module/interface to the replication engine on the supervisor and are replicated to the destination module/interface. All the replicated SPAN traffic must traverse the backplane fabric, increasing backplane fabric utilization. Data centers are advised to upgrade to SXI3 on systems where Tx SPAN is required. However, regardless of SXF or SXI, Distributed Mode is supported only on modules with a local replication engine (for example, DFC-based modules). None of the classic line cards support Distributed Replication.

There are 4 key pieces involved with SPAN:
1. SPAN destination port
2. Fabric Channel
3. Replication Engine
4. Forwarding Engine

Any of the 4 pieces above may become oversubscribed depending on other traffic flowing through the system, the number of replication sessions configured, the types of source and destination line cards, available buffer, forwarding engine capacity, and other factors. It is therefore important that these four areas be well understood to avoid any adverse effects on production traffic. To avoid oversubscription issues, Cisco recommends using Cisco EEM (Embedded Event Manager).
The Embedded Event Manager is made up of TCL scripts embedded in the IOS to run commands for Replication Engine monitoring. Additionally, Cisco recommends that users continuously monitor fabric utilization. If the SPAN source interface is a VLAN, users are advised to be cautious, as fabric utilization can easily rise.

SPAN Oversubscription Point Monitoring Options
To monitor your network using SPAN ports without risking oversubscription on the Cisco Nexus line of switches, consider the following options:
1. Platform SNMP MIB – Supported as part of CISCO-SWITCH-ENGINE-MIB and CISCO-SWITCH-FABRIC-MIB in 5.2
2. XML API – XML version of internal show commands to monitor oversubscription in 5.2
3. EEM/TCL – Supported in 5.2
4. CLI – Available in 4.2.x

The following Cisco command sets may be used to monitor different points of oversubscription on switches running NX-OS:
1. Replication engine utilization
   show hardware internal statistics device rewrite congestion asic-all | i error
2. Forwarding engine throughput
   show hardware internal forwarding statistics L3
   show hardware internal forwarding engine usage
   show hardware capacity forwarding
3. Fabric VQI utilization

EXAMPLE: CISCO NEXUS 5000 NX-OS 4.2.6
Oversubscribing the SPAN can impact production traffic. Consider the following:
1. Resource contention to the replication engine. For example, multicast packets use the same replication engine that is used to replicate SPAN packets.
2. Resource contention to the forwarding engine (60 MPPS limit on M1). For example, SPAN traffic needs more forwarding engine lookups. A Tx/Rx SPAN port requires 3 lookups in the forwarding engine compared to just one for non-SPAN traffic.
3. Fabric Virtual Output Queuing oversubscription.
Spanned traffic drop at the destination is of minimal concern. The impact to the production traffic and system resources is the main concern.
Cisco recommends against implementing continuous SPAN until you are able to monitor the adverse impacts, arrange notification, and be ready to respond to those notifications. Unfortunately, such monitoring can be accomplished only through the Cisco command line interface unless users upgrade their software to NX-OS version 5.2 and then design and test a solution for monitoring SPAN oversubscription with the XML API or EEM/TCL scripting.

The command for monitoring fabric VQI utilization is:
   show hardware fabric-utilization detail

To summarize the potential impact of a continuous SPAN setup, users are advised to monitor the switch's internal resource utilization after creating the SPAN. If the utilization threshold is exceeded, users are advised to turn off the SPAN to prevent any adverse impact on the production network. Monitoring a continuous SPAN setup can be quite involved and challenging. More importantly, if the SPAN port must be turned off, the monitoring tool will no longer receive its data.

Tap Versus SPAN – The Bottom Line
When you are deciding whether to use tap or SPAN in your network monitoring system, the two primary factors on which to base your decision are the type of analysis you plan to perform and the amount of bandwidth that analysis will require.

Taps are ideal when analysis requires seeing all traffic, including physical layer errors. Taps are required if your network utilization is moderate to heavy. When it comes to aggregation layer monitoring, taps are often used to ensure that the performance of production network traffic is not being impacted by a SPAN. In a latency measurement environment, taps are highly recommended to avoid the inconsistent queuing delay from a SPAN port.

SPAN ports perform well on networks with lower utilization, or when analysis is not affected by dropped packets. SPAN ports on the access layer are suitable and are often used for on-demand, short-term network and application troubleshooting.
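
The bandwidth factor can be checked with simple arithmetic before a SPAN session is created. The following is a minimal Python sketch of the 10G-link, 20G-SPAN point made in the SPAN challenges, not a vendor tool. The function names are hypothetical, and the model is an assumption: it treats traffic as steady rates and ignores buffering, bursts, and the switch-internal limits (replication engine, forwarding engine, fabric) discussed earlier.

```python
def span_capacity_needed_gbps(link_gbps: float) -> float:
    """Worst-case mirror rate for one full-duplex link.

    Assumes TX and RX can each run at line rate at the same time,
    so the SPAN destination must carry both directions.
    """
    if link_gbps < 0:
        raise ValueError("link_gbps must be non-negative")
    return 2 * link_gbps


def span_drop_fraction(tx_gbps: float, rx_gbps: float, span_port_gbps: float) -> float:
    """Fraction of mirrored traffic that cannot fit on the SPAN port.

    Returns 0.0 when the combined TX + RX rate fits within the
    SPAN destination port's line rate (steady-state model only).
    """
    if min(tx_gbps, rx_gbps) < 0 or span_port_gbps <= 0:
        raise ValueError("rates must be non-negative and the SPAN port rate positive")
    offered = tx_gbps + rx_gbps
    if offered <= span_port_gbps:
        return 0.0
    return (offered - span_port_gbps) / offered


if __name__ == "__main__":
    # A fully utilized 10G full-duplex link needs up to 20G of SPAN capacity.
    print(span_capacity_needed_gbps(10))
    # Lightly loaded link: 3G + 4G fits on a 10G SPAN port, nothing is dropped.
    print(span_drop_fraction(3, 4, 10))
    # Heavily loaded link: 8G + 7G offered to a 10G SPAN port, a third is dropped.
    print(span_drop_fraction(8, 7, 10))
```

A nonzero result in this model suggests the link is a candidate for a tap rather than a SPAN port. A zero result does not guarantee lossless capture, because the sketch does not account for bursts or for SPAN being deprioritized inside the switch.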
[Figure: network diagram. Recoverable labels: Internet, External Aggregation, External Firewalls, DMZ, Server Switches, Internal Firewalls, Internal Aggregation, and Corp. Intranet, with taps (AT) at each layer feeding an APCON INTELLAFLEX chassis (Packet Aggregator, Aggregator Plus Time Stamping, and Packet Controller blades) connected to Analyzer, IDS, and Forensic tools.]

Reference:
http://www.cisco.com/c/en/us/td/docs/switches/lan/catalyst6500/ios/12-2SX/configuration/guide/book/span.html
http://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus5000/sw/configuration/guide/cli/CLIConfigurationGuide/Span.html
http://www.cisco.com/c/en/us/td/docs/ios/netmgmt/command/reference/nm_book/nm_15.html

Contact Us
Please email sales@apcon.com or call 503–682–4050 if you have any questions.
APCON, Inc. ▪ apcon.com ▪ +1 503–682–4050 ▪ 800–624–6808
@APCON ▪ company/APCON
© 2014 APCON, Inc. All Rights Reserved. APCON is an Equal Opportunity Employer – MFDV. 14025-R1-0414

ABOUT APCON
APCON develops innovative, scalable technology solutions to enhance network monitoring, support IT traffic analysis, and streamline IT network management and security.
APCON is the industry leader for state-of-the-art IT data aggregation, filtering, and network switching products, as well as leading-edge management software support. Organizations in over 40 countries depend on APCON network infrastructure solutions. Customers include Global Fortune 500 companies, banks and financial services institutions, telecommunication service providers, government and military, and computer equipment manufacturers.