Choosing Tap or SPAN for Data Center Monitoring

Technical Brief
Key Points
▪ Taps are passive, silent, and deliver a
perfect record of link traffic, but require
additional hardware and create a point
of failure.
▪ SPAN ports are configurable for specific
data, can capture intra-switch traffic,
and create no additional expense, but
may drop packets randomly and will not
transmit errored packets.
▪ Choose SPAN or tap resources based
on your particular monitoring needs. A
mix of SPAN and tap is often superior to
using one or the other exclusively.
In network and security monitoring, there is an ongoing debate
about the best data access method for delivering copied network
traffic to monitoring tools. The debate comes down to taps versus
port mirroring (SPAN), and there are good arguments for both
methods.
There is no objectively correct answer to this debate; the best
practice must be decided for each data source in each network.
However, because the two technologies have different
characteristics, it is possible to arrive at a general guideline
for making a sensible decision based on the monitoring scenario,
requirements, capture location, or project.
The pros and cons of taps and SPAN ports come down to a few key
points, summarized below.
Taps – Pro and Con
Benefits of taps include:
▪ Taps are completely passive, purely optical splitters and do not
need power or IP configuration.
▪ Taps are not addressable network devices and therefore cannot
be hacked.
Technical Brief – Taps vs SPAN in Network Monitoring
▪ Taps are failsafe, especially when placed in the aggregation
layers where network redundancy is already established.
▪ Taps provide total visibility into full-duplex networks and
eliminate the risk of dropped packets, regardless of the
bandwidth.
▪ With taps, monitoring devices receive all packets, including
packets with physical errors. Taps do not groom data in any
way. This is particularly helpful in troubleshooting common
physical layer problems, including bad frames that can be
caused by a faulty NIC or cable.
▪ Taps do not alter the time relationships of frames. This
time relationship is critical for certain latency-sensitive
measurements. Taps do not introduce any additional jitter or
distortion, which is important in VoIP and video signal analysis.
▪ Taps can monitor both sides of a full-duplex link individually.
▪ Taps do not behave differently if the traffic is IPv4 or IPv6;
they pass all traffic through unaltered.
Challenges with taps include:
▪ Each analysis device may need to budget two capture
interfaces to receive both sides of a tapped link.
▪ There is an additional cost for tap hardware.
▪ Taps create an additional potential point of failure.
▪ Taps create additional deployment complexity:
• Split ratio and light budget loss calculation.
• Disruption of the production network for tap insertion.
[Figure: network diagram. Labels: Switches, Switch Ingress Traffic, Egress Traffic, SPAN Ports, APCON TAP Chassis with 16 Passive Taps, and monitoring tools (RMON, Analyzer, Forensic, IDS).]
SPAN Ports – Pro and Con
Benefits of SPAN ports include:
▪ No additional cost to create a SPAN port.
▪ SPAN ports are remotely configurable from any management
station that can access the configuration of the switch.
▪ SPAN ports are capable of capturing intra-switch traffic.
Challenges with SPAN ports include:
▪ SPAN ports cannot handle heavily utilized full-duplex links
without dropping packets. If the combined throughput of TX and
RX traffic is higher than the SPAN port line rate, frames are
dropped randomly by the SPAN port. To completely capture
bidirectional traffic from a 10G link, a SPAN port would need
up to 20G of capacity.
▪ SPAN ports drop all packets that are corrupt or that are
over- or under-sized, thus hampering some physical layer
analysis.
▪ SPAN ports place a burden on a switch's CPU and fabric
channels to copy all data passing through ports. This
potentially affects the performance of production traffic.
For example, Centralized Replication in certain switches
can reduce performance. Some SPAN implementations require
you to monitor the following components to avoid issues:
• SPAN Destination
• Switch Fabric
• Replication Engine
• Forwarding Engine
▪ SPAN ports can change the timing of frame interaction,
altering measured response times.
▪ Switches prioritize SPAN port data lower than regular
port-to-port data. If replicating a frame becomes an issue, the
hardware will temporarily drop the SPAN process and therefore
stop the data flow to the SPAN port. The more SPAN sessions
that are configured, the easier it is to reach this threshold.
▪ RSPAN/ERSPAN ports put the monitoring traffic into
the production network, which reduces the amount of
throughput available for user traffic.
▪ Without special configuration, VLAN tags are not normally
passed through a SPAN port. This can produce false VLAN
issues and make actual VLAN issues difficult to find.
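The bandwidth arithmetic behind the first SPAN challenge (a 10G full-duplex link needing up to 20G of SPAN capacity) can be sketched in a few lines. This is only an illustration: the function names and the idea of expressing utilization as a fraction of line rate are assumptions made for the example, not part of any switch software.

```python
def span_load_gbps(link_gbps, tx_util, rx_util):
    """Traffic a SPAN port must carry to mirror both directions of a link.

    tx_util and rx_util are fractions of line rate (0.0 to 1.0) for each
    direction of the full-duplex source link.
    """
    for util in (tx_util, rx_util):
        if not 0.0 <= util <= 1.0:
            raise ValueError("utilization must be between 0.0 and 1.0")
    return link_gbps * (tx_util + rx_util)


def span_oversubscribed(link_gbps, tx_util, rx_util, span_port_gbps):
    """True if the mirrored TX + RX traffic exceeds the SPAN port line rate."""
    return span_load_gbps(link_gbps, tx_util, rx_util) > span_port_gbps


# A fully loaded 10G link needs 20G of SPAN capacity.
print(span_load_gbps(10, 1.0, 1.0))

# 60% TX plus 50% RX on a 10G link already exceeds a 10G SPAN port.
print(span_oversubscribed(10, 0.6, 0.5, 10))
```

The second call shows why a SPAN port can start dropping frames well before either direction of the source link is saturated.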
Choosing SPAN or Tap – Production Network Impact
The integrity of the traffic forwarded to the monitoring tools is
critical for accurate monitoring and troubleshooting results. The
greater concern, however, is that the chosen data access method may
affect the performance of the actual production network traffic.
SPAN EXAMPLE: CISCO 6500 IOS RELEASE 12.2SX
On this switch, SXF7 code configures Rx
SPAN in Distributed Mode, but Tx SPAN
is configured in Centralized Mode. In
contrast, SXI3 configures both Tx and Rx
SPAN in Distributed Mode.
In Distributed Mode, the packets can
be replicated between the source and
destination modules/interfaces without
supervisor intervention. In Centralized
Replication Mode, packets go from
the source module/interface to the
replication engine on the supervisor
and are replicated to the destination
module/interface. All the replicated
SPAN traffic must traverse the
backplane fabric, increasing backplane
fabric utilization.
Data centers are advised to upgrade
to SXI3 on systems where Tx SPAN is
required. However, regardless of SXF
or SXI, Distributed Mode is supported
only on modules with a local replication
engine (for example, DFC based
modules). None of the classic line cards
support Distributed Replication.
Taps, and optical fiber taps in particular, are passive and generally
do not affect production traffic at all. SPAN ports, by contrast, can
affect production network traffic.
Four key components are involved in SPAN:
1. SPAN destination port
2. Fabric Channel
3. Replication Engine
4. Forwarding Engine
Any of these four components may become oversubscribed depending
on other traffic flowing through the system, the number of replication
sessions configured, types of source and destination line cards, available
buffer, forwarding engine capacity, and other factors. It is therefore
important to understand all four well in order to avoid adverse effects
on production traffic.
To avoid oversubscription issues, Cisco recommends using Cisco EEM
(Embedded Event Manager). EEM runs policies, including TCL scripts,
within IOS; these can issue commands for Replication Engine
monitoring. Additionally, Cisco recommends that users continuously
monitor fabric utilization. If the SPAN source is a VLAN, users are
advised to be cautious, as fabric utilization can rise quickly.
SPAN Oversubscription Point Monitoring Options
To monitor your network using SPAN ports without risking
oversubscription on the Cisco Nexus line of switches, consider the
following options:
1. Platform SNMP MIB – Supported as part of CISCO-SWITCH-ENGINE-MIB and CISCO-SWITCH-FABRIC-MIB in 5.2
2. XML API – XML version of internal show commands to monitor
oversubscription in 5.2
3. EEM/TCL – Supported in 5.2
4. CLI – Available in 4.2.x
The following Cisco commands may be used to monitor
different points of oversubscription on switches running NX-OS:
1. Replication engine utilization
   show hardware internal statistics device rewrite congestion asic-all | i error
2. Forwarding engine throughput
   show hardware internal forwarding statistics L3
   show hardware internal forwarding engine usage
   show hardware capacity forwarding
3. Fabric VQI utilization
   show hardware fabric-utilization detail
EXAMPLE: CISCO NEXUS 5000 NX-OS 4.2.6
Oversubscribing the SPAN can impact production
traffic. Consider the following:
1. Resource contention for the replication engine.
For example, multicast packets use the same
replication engine that replicates SPAN packets.
2. Resource contention for the forwarding engine
(60 MPPS limit on M1). For example, SPAN traffic
needs more forwarding engine lookups.
A Tx/Rx SPAN port requires 3 lookups in the
forwarding engine compared to just one for
non-SPAN traffic.
3. Fabric Virtual Output Queuing oversubscription.
Spanned traffic drop at the destination is of
minimal concern. The impact on production
traffic and system resources is the main concern.
Cisco recommends against implementing
continuous SPAN until you are able to monitor the
adverse impacts, arrange notification, and be ready
to respond to those notifications. Unfortunately,
such monitoring can be accomplished only
through the Cisco command line interface unless
users upgrade their software to NX-OS version
5.2 and then design and test a solution for
monitoring SPAN oversubscription with the XML API
or EEM/TCL scripting.
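The forwarding engine contention described in the Nexus example can be put into rough numbers. The sketch below takes the two figures quoted there (a 60 MPPS limit and 3 lookups for a Tx/Rx spanned packet versus 1 otherwise) and assumes, purely for illustration, that the limit behaves as a simple budget of lookups per second. Real hardware is more complicated, and the function names are hypothetical.

```python
def lookups_per_packet(spanned_fraction, lookups_spanned=3, lookups_normal=1):
    """Average forwarding engine lookups per packet.

    spanned_fraction is the share of packets (0.0 to 1.0) that are subject
    to Tx/Rx SPAN. The defaults come from the brief; treating the engine
    limit as a plain lookup budget is an assumption of this sketch.
    """
    if not 0.0 <= spanned_fraction <= 1.0:
        raise ValueError("spanned_fraction must be between 0.0 and 1.0")
    return (spanned_fraction * lookups_spanned
            + (1.0 - spanned_fraction) * lookups_normal)


def max_traffic_mpps(engine_limit_mpps, spanned_fraction):
    """Packet rate the engine can sustain under the lookup-budget model."""
    return engine_limit_mpps / lookups_per_packet(spanned_fraction)


# With no SPAN, the full 60 Mpps budget is available to production traffic.
print(max_traffic_mpps(60, 0.0))

# If every packet is spanned in both directions, the same budget is
# consumed three times as fast.
print(max_traffic_mpps(60, 1.0))
```

Under this simplified model, spanning half of the packets already halves the sustainable packet rate, which is why the main concern is the impact on production traffic rather than drops at the SPAN destination.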
To summarize the potential impact of the continuous
SPAN setup, users are advised to monitor the switch
internal resource utilization after creating the SPAN. If
the utilization threshold is exceeded, users are advised
to turn off the SPAN to prevent any adverse impact to the
production network. Obviously, monitoring a continuous
SPAN setup can be quite involved and challenging. More
importantly, if the SPAN port must be turned off, the
monitoring tool will no longer receive its data.
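The monitor-then-disable policy summarized above could look like the following sketch. It deliberately leaves out how the utilization samples are collected (CLI, SNMP, XML API, or EEM), and the requirement of several consecutive samples over the threshold is an assumption added here to avoid reacting to a single spike; the brief itself only says to turn off the SPAN when the threshold is exceeded. All names are hypothetical.

```python
def span_action(samples_pct, threshold_pct, consecutive=3):
    """Decide whether a SPAN session should stay enabled.

    samples_pct: utilization readings in percent, oldest first.
    threshold_pct: utilization level considered unsafe for production traffic.
    consecutive: how many of the most recent samples must all exceed the
        threshold before recommending shutdown (an assumption of this sketch).
    """
    if consecutive < 1:
        raise ValueError("consecutive must be at least 1")
    recent = samples_pct[-consecutive:]
    if len(recent) == consecutive and all(s > threshold_pct for s in recent):
        return "disable_span"
    return "keep_span"


# Three readings in a row above 80% trigger the shutdown recommendation.
print(span_action([40, 85, 90, 95], threshold_pct=80))

# A spike followed by a normal reading does not.
print(span_action([85, 90, 40], threshold_pct=80))
```

Whatever policy is used, the trade-off stated above still applies: once the SPAN is disabled, the monitoring tool is blind until it is re-enabled.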
Tap Versus SPAN – The Bottom Line
When you are deciding whether to use tap or SPAN in your
network monitoring system, the two primary factors on
which to base your decision are the type of analysis you
plan to perform and the amount of bandwidth that
analysis will require.
Taps are ideal when analysis requires seeing all traffic,
including physical layer errors. Taps are required if your
network utilization is moderate to heavy. When it comes
to aggregation layer monitoring, taps are often used to
ensure that the performance of production network traffic
is not being impacted by a SPAN. In a latency measurement
environment, taps are highly recommended to avoid the
inconsistent queuing delay from a SPAN port.
SPAN ports perform well on networks with lower utilization, or
when analysis is not affected by dropped packets. SPAN ports
on the access layer are suitable and are often used for
on-demand, short-term network and application troubleshooting.
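As a rough illustration, the guidelines in this section can be written down as a small decision helper. The field names, the three utilization labels, and the rule that conflicting needs yield a mix are assumptions made for this sketch; it encodes the brief's guidance and is not a substitute for evaluating each capture point.

```python
from dataclasses import dataclass


@dataclass
class MonitoringNeed:
    """Hypothetical description of one capture point's requirements."""
    needs_physical_layer_errors: bool = False
    utilization: str = "low"          # "low", "moderate", or "heavy"
    latency_sensitive: bool = False
    intra_switch_traffic: bool = False


def recommend_access(need):
    """Return (choice, reasons), where choice is 'tap', 'span', or 'mix'."""
    tap_reasons = []
    span_reasons = []
    if need.needs_physical_layer_errors:
        tap_reasons.append("analysis must see physical layer errors")
    if need.utilization in ("moderate", "heavy"):
        tap_reasons.append("network utilization is moderate to heavy")
    if need.latency_sensitive:
        tap_reasons.append("latency measurement needs consistent timing")
    if need.intra_switch_traffic:
        span_reasons.append("intra-switch traffic is only visible via SPAN")

    if tap_reasons and span_reasons:
        choice = "mix"
    elif tap_reasons:
        choice = "tap"
    else:
        choice = "span"
        if not span_reasons:
            span_reasons.append("low utilization, tolerant of dropped packets")
    return choice, tap_reasons + span_reasons


choice, reasons = recommend_access(
    MonitoringNeed(utilization="heavy", intra_switch_traffic=True)
)
print(choice, reasons)
```

The "mix" outcome reflects the key point made at the start of the brief: combining SPAN and tap resources is often better than using one exclusively.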
[Figure: network diagram. Labels: Internet, External Aggregation, External Firewalls, DMZ, Internal Firewalls, Internal Aggregation, Server Switches, Corp. Intranet, TAPs, APCON INTELLAFLEX blades (Packet Aggregator, Aggregator Plus Time Stamping, Packet Controller), and monitoring tools (Analyzer, IDS, Forensic, Probe).]
ABOUT APCON
APCON develops innovative, scalable
technology solutions to enhance
network monitoring, support IT traffic
analysis, and streamline IT network
management and security. APCON
is the industry leader for state-of-the-art
IT data aggregation, filtering,
and network switching products, as
well as leading-edge management
software support. Organizations
in over 40 countries depend on
APCON network infrastructure
solutions. Customers include Global
Fortune 500 companies, banks
and financial services institutions,
telecommunication service providers,
government and military, and
computer equipment manufacturers.
Contact Us
Please email sales@apcon.com
or call 503–682–4050 if you have
any questions.
APCON, Inc. ▪ apcon.com ▪ +1 503–682–4050 ▪ 800–624–6808
© 2014 APCON, Inc. All Rights Reserved.
APCON is an Equal Opportunity Employer – MFDV
14025-R1-0414