Network Health Assessment

LEA Name: Brunswick County Schools
Primary POC
Leonard Jenkins
Director, Technology Department
Brunswick County Schools
35 Referendum Drive
Bolivia, NC 28422
910-253-2931
ljenkins@bcswan.net
Technical POC
Mike Crawford
Wide Area Network (WAN) Manager
910-253-2996
mcrawford@bcswan.net
Date Data Collected
06/17/2008 - 06/19/2008
Network Description
Wide Area Network
The Brunswick County Schools’ (BCS) wide area network (WAN) utilizes fiber
infrastructure constructed and maintained in partnership with Atlantic Telecom
Membership Corporation (ATMC). The network topology is a distributed star network
with the central hubs of the stars owned and maintained by ATMC. ATMC has network
hubs in four locations across the county. These locations are connected to each other via
gigabit Ethernet. The BCS schools are connected to switches owned by ATMC at these
locations, or to other BCS schools utilizing dark fiber provided by ATMC. The furthest
school (JMM) from the BCS core is six “switch hops” away. The network core at the
BCS central office at 35 Referendum Drive is connected to ATMC’s network at the
Bolivia-CO.
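The "switch hops" figure mentioned above can be derived from any adjacency map of the network with a breadth-first search. The following is a minimal sketch only; the topology and node names are hypothetical placeholders and do not represent the actual BCS or ATMC layout.

```python
from collections import deque

def switch_hops(topology, source):
    """Return the minimum hop count from source to every reachable node."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in topology.get(node, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops

# Hypothetical topology: names are illustrative, not real device names.
EXAMPLE_TOPOLOGY = {
    "core": ["hub-a"],
    "hub-a": ["core", "hub-b", "school-1"],
    "hub-b": ["hub-a", "school-2"],
    "school-1": ["hub-a"],
    "school-2": ["hub-b", "school-3"],
    "school-3": ["school-2"],
}

hops = switch_hops(EXAMPLE_TOPOLOGY, "core")
furthest = max(hops, key=hops.get)
print(furthest, hops[furthest])
```

Feeding the function an adjacency map built from CDP neighbor data would give the hop count for each school relative to the core.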
All school facilities are connected via gigabit Ethernet. Single mode 1000base-lx and
1000base-zx modules are used to light the fiber between the schools. Every connection
between switches is set up as an 802.1Q trunk, which allows several VLANs to communicate.
There is a core Cisco 6509 series switch acting as the VLAN controller and main router
for the district. All traffic between VLANs is routed via this device.
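For readers unfamiliar with how an 802.1Q trunk carries several VLANs over one link, the sketch below decodes the 4-byte VLAN tag that follows the source MAC address in a tagged Ethernet frame. It is illustrative only: the example frame is made up, and the function name is an assumption introduced here.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q-tagged frame

def parse_vlan_tag(frame: bytes):
    """Return (priority, dei, vlan_id) for a tagged frame, or None if untagged."""
    if len(frame) < 16:
        return None
    # Bytes 0-11 are destination and source MAC; the tag occupies bytes 12-15.
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != TPID_8021Q:
        return None
    priority = tci >> 13          # 3-bit priority code point
    dei = (tci >> 12) & 0x1       # 1-bit drop eligible indicator
    vlan_id = tci & 0x0FFF        # 12-bit VLAN identifier
    return priority, dei, vlan_id

# Made-up frame: zeroed MAC addresses, a tag for VLAN 35, then an IPv4 EtherType.
example = bytes(12) + struct.pack("!HH", TPID_8021Q, 35) + b"\x08\x00"
print(parse_vlan_tag(example))
```

The 12-bit VLAN identifier is what lets a single trunk port distinguish traffic belonging to different VLANs.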
An updated network diagram of the BCS wide area network follows:
This diagram can be found in Visio format in Appendix A.
Note: The above diagram differs slightly from the network diagram provided by BCS at
the start of the network assessment. The network layout detailed in the above diagram
was validated with both the SolarWinds LANsurveyor auto-discovery tool and a review
of data obtained through the use of the Cisco Discovery Protocol (CDP).
The core 6509 switch is split into a L2 switch and a L3 router. The 6509L3 is the main
router for the district, and is routing IP and IPX. The Layer 2 switch is the main switch
for the Central Office as well as the VLAN controller for the district. 802.1Q VLAN
trunk encapsulation is used on all WAN connections. Cisco VTP is used to control and
configure VLANs across the network. The Layer 3 switch is running IOS and
the Layer 2 switch is running CatOS. Subnet masks and IP address schemes are
consistent with best practices.
Every school has a single VLAN set up for traffic. The logical network is fairly flat, as
the 6509 routes all internet traffic between the multiple VLANs.
Local Area Network
The local area networks utilize Cisco and Dell brand switches and routers. These
include full gigabit switches and 10/100 Mbps switches with and without gigabit uplink
capability. Cisco model numbers include: 2950, 2950G, 2924M, 3512, 3508, 3550,
3500XL, 4006, and 6509. Many of the Cisco switches have been designated end-of-life
(EoL) by Cisco. All the Dell switches are 3024s. Unmanaged switches are also widely
deployed throughout the district.
Internet Access
BCS contracts with ATMC for 22 Mbps of Internet access capacity.
BCS’s public IP address space is provided by ATMC.
216.99.112.128 /27
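As a quick illustration of what a /27 allocation provides, the following sketch uses Python's standard ipaddress module to list the size and boundaries of the block given above. It only restates arithmetic implied by the prefix length; how BCS actually assigns the addresses is not covered here.

```python
import ipaddress

block = ipaddress.ip_network("216.99.112.128/27")

# Total addresses in the block, including network and broadcast addresses.
print(block.num_addresses)

# First (network) and last (broadcast) address of the block.
print(block.network_address, block.broadcast_address)

# Addresses that can be assigned to hosts.
usable = list(block.hosts())
print(len(usable), usable[0], usable[-1])
```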
Security and WAN Optimization
The BCS security and WAN optimization infrastructure is comprised of the following:
• Cisco ASA 5520 Firewall
• SmoothWall School Guardian Content/URL Filter
• Packeteer PacketShaper 6500
Data Collection and Testing Process Summary
Data collection and testing focused on three key areas as follows:
Physical-layer analysis:
1. Physical inventory of all network infrastructure and LAN cabling
2. Automated network discovery and mapping using SolarWinds LANsurveyor
3. Packet capture and analysis at the core switch at each school using Wireshark. The
physical layer was inspected by capturing packets at each location and analyzing them
for problems and anomalies.
4. Analysis of switch logs and port errors.
Configuration analysis:
1. Analysis of core switch configurations – configurations compared against best
practices published by major switch manufacturers.
Network Performance analysis:
1. Network utilization analysis – examine network utilization for WAN and Internet
access connections using Cacti.
2. Network latency analysis – examine network latency across the WAN using
SmokePing.
3. Throughput analysis – examine link and end-to-end throughput using the Network
Diagnostic Tool (NDT) developed by Internet2 (I2).
Results & Observations
Physical Layer Analysis:
1. Physical Inventory and Cabling:
Varying levels of craftsmanship are evident throughout the district’s network
infrastructure. In assessing the craftsmanship, we try to distinguish between best
practices related directly to the reliability and performance of the network, and those
related to network maintainability. Our comments are focused on issues that are or may
be contributing to degradation in network reliability and performance.
Of particular significance and concern are issues related to copper and fiber patch cables.
BCS uses a mixture of manufactured and field-terminated copper Ethernet patch cables.
Care must be taken to ensure field-terminated cables meet industry standards.
When creating Ethernet patch cables by hand, the jacket of the cable needs to extend all the
way into the crimp connector. Also, the TIA/EIA 568-A or TIA/EIA 568-B standard needs to be
followed to minimize crosstalk.
Virginia Williamson – New Wing
Sometimes, being too neat and orderly can cause problems as well. This rack is very well
organized and maintained; however, the two most important cables in the rack are bent to
the point of creasing. This can cause packet loss and CRC errors.
Bolivia Elementary IDF - 200 Wing
BCS also uses both manufactured and field-terminated fiber optic patch cables. In many
cases the fiber connectors do not have strain relief and/or the protective layers of
shielding have been stripped away from the fiber itself. In addition, adequate protection
is often not provided for the fiber optic cables.
The following pictures were taken at the BCS central office main server room.
Server Room – Central Office
The quality of the fiber installation is poor with fiber jumper cables routed
in/around/between OSP cables, and field-terminated fiber cables fully exposed under the
floor.
Shown below are examples of problems with the fiber optic infrastructure that need to be
addressed. Many of the fiber optic cables have bends and angles that exceed
recommended standards. These bends cause signal loss and refraction, resulting in an
unstable network. Unused terminated fiber cables should have covers to protect the ends.
Bolivia Elementary– Room 405
Jessie Mae Monroe – Server Room
Union Elementary – Room 501
Pictures of all the wiring closets and switch locations located throughout the Brunswick
County Schools System are included in the assessment packet.
2. SolarWinds LANsurveyor Network Discovery
As noted earlier, results from the SolarWinds LANsurveyor auto-discovery tool were
used to develop the network diagram included in Appendix A. The raw data generated
by the network discovery tool is included in Appendix B. All network switches are
included in the map and it can be used to see the hierarchical design if needed. Some
parts of the auto discovery did not complete correctly due to ATMC equipment not being
accessible to the discovery agent.
3. Packet Capture using Wireshark
Packet captures of the broadcast traffic show normal network broadcast traffic protocols
which include: ARP, DHCP, IPX, Spanning-Tree, and NetBIOS. EIGRP is leaking out
into the LANs of the school from the Core 6509. There doesn’t seem to be an overload
of one type of traffic, and the amount of broadcast traffic is well within normal
operations.
Packet captures of all traffic using port mirroring show much the same. HTTP and
TCP requests to and from the proxy server at the district make up the majority of the
traffic. There are some NCP file access requests to 10.1.10.5. Packet captures at NBHS
show a possible switch loop.
4. Analysis of Logs and Port Errors
Below are the problems we identified while analyzing the logs and port errors of all the
switches. The majority of the errors are CRC or Collision type errors. CRC errors are
generally cable based, while Collision errors generally mean there is a duplex mismatch
issue.
Although we listed all the port error statistics for every school, many of the switches were
restarted less than 4 days prior to our arrival. In addition, network port usage was low at
the time of our assessment since school was not in session. Consequently, it is likely
there are additional ports with errors.
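The rule of thumb above (CRC errors point to cabling, collisions point to a duplex mismatch) can be expressed as a simple triage function. This is a minimal sketch: the counter names and the sample values are hypothetical, and real counters would come from the switch interface statistics.

```python
def triage_port(counters):
    """Map raw interface counters to the likely causes described in this section."""
    findings = []
    if counters.get("crc", 0) > 0:
        findings.append("CRC errors: check cable / environment")
    if counters.get("collisions", 0) > 0:
        findings.append("collisions: check for duplex mismatch")
    return findings

# Hypothetical counters for three ports (values are made up).
ports = {
    "fa0/2": {"crc": 12, "collisions": 0},
    "fa0/8": {"crc": 3, "collisions": 41},
    "fa0/9": {"crc": 0, "collisions": 0},
}

for name, counters in ports.items():
    print(name, triage_port(counters))
```

Because many switches had been restarted shortly before the assessment, a clean result from such a check on recently cleared counters should not be read as proof that a port is healthy.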
Central Office 6509
4/1 – duplex mismatch 100-half collisions (Packeteer)
4/5 – duplex mismatch 100-half collisions (NovaNet)
6/3 – CRC / TCP Runts – Possible bad handmade cable
6/36 – Native VLAN mismatch
2/4 – Native VLAN mismatch
Bolivia – 10.4.0.0
10.4.1.4 – Fa0/16 – CRC errors
- CRC errors
10.4.1.5 – Fa0/2 CRC errors
10.4.1.11 – VLAN 1 input errors – CRC – on virtual interface – this should not happen
- Possible issue with the switch
10.4.1.13 – fan fault – replace fan or switch
BCA – 10.8.0.0
10.0.0.8 – fa0/24 – CRC error
10.8.1.3 – fa0/23 – CRC error
- fa0/27 – CRC error
10.8.1.4 – fa0/48 – collisions – duplex mismatch
10.8.1.5 – fa0/18 – CRC error
10.8.1.6 – fa0/18 – CRC & VLAN mismatch
Jessie Mae Monroe – 10.10.0.0
** Entire school needs to have the 1000BaseCX modules replaced due to collisions
** Switch response is slower due to dropped packets and retransmits
Leland Middle – 10.16.0.0
10.16.1.38 – fa0/21 – CRC error
- fa0/22 – CRC error
10.16.1.39 – fa0/2 – CRC error & interface flapping up/down
- fa0/3 – CRC error
- fa0/19 – CRC error
10.16.1.40 – fa0/2 – CRC error
- fa0/2 – CRC error & interface flapping up/down
- fa0/9 – CRC error & interface flapping up/down
- fa0/19 – CRC error
- fa0/23 – CRC error
10.16.1.41 – fa0/10 – CRC error & interface flapping up/down
- fa0/23 – CRC error
10.16.1.42 – fa0/5 – CRC error & interface flapping up/down
- fa0/13 – CRC error & interface flapping up/down
- fa0/22 – CRC error
10.16.1.43 – fa0/12 – CRC error & interface flapping up/down
- fa0/22 – interface flapping up/down
- fa0/24 – CRC error
10.16.1.46 – fa0/22 – collisions – duplex mismatch ( Access Point )
Leland Elementary – 10.20.0.0
10.0.0.20 – gi0/1 – high number of ignored packets (might not be a problem)
10.20.1.3 – fa0/8 – CRC errors & collisions – duplex mismatch
10.20.1.6 – fa0/18 – collisions – duplex mismatch
10.20.1.10 – switch not accessible
10.20.1.13 – fa0/13 – CRC error
10.20.1.14 – fa0/1 – CRC error
- fa0/3 – CRC error
NBHS – 10.26.0.0
10.0.0.26 – 3/1 – xmit errors – 100/full
- 3/22 – xmit errors – 100/full
- 3/46 – xmit errors – 100/full
- 2/2, 2/3, & 2/4 – Native VLAN mismatch
10.26.1.20 – fa0/28 – duplex mismatch 100-half
10.26.1.50 – fa0/7 – duplex mismatch – collisions
10.26.1.247 – Gi0/3 – duplex mismatch – collisions 100-half
- Gi0/4 – CRC errors – check cable / environment
- Gi0/7 – CRC errors – check cable / environment
- Gi0/8 – CRC errors – check cable / environment
10.26.1.253 – IOS 11.2 needs to be upgraded
- Fa0/12 – CRC errors – check cable / environment
10.26.1.254 – IOS 11.2 needs to be upgraded
Shallotte Middle – 10.32.0.0
10.0.0.32 – fa0/1 – CRC error
10.32.1.10 – ALL ports except (fa0/1, fa0/16, fa0/21, and fa0/24) CRC errors
- fa0/12 has the most CRC errors
10.32.1.11 – fa0/3 – CRC error
- fa0/4 – CRC error
- fa0/6 – CRC error
- fa0/7 – CRC error
- fa0/8 – CRC error
- fa0/10 – CRC error
- fa0/12 – CRC error
10.32.1.13 – fa0/4 – CRC error
- fa0/5 – CRC error
- fa0/15 – CRC error
10.32.1.17 – fa0/13 – CRC error
- fa0/18 – CRC error
10.32.1.21 – fa0/7 – CRC error
- fa0/10 – CRC error
SBHS – 10.34.0.0
10.0.0.34 – gi3/2 – CRC error
- gi3/4 – CRC error
SBMS – 10.35.0.0
10.35.1.24 – fa0/3 – CRC error
10.35.1.26 – fa0/9 – CRC error
Configuration analysis:
1. Switch configuration analysis:
Switch configurations at the school system follow standard best practices. On a few rare
occasions, VLAN uniformity is not followed. These are listed under the Port Error
section of the assessment as Native VLAN Mismatches.
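A native VLAN mismatch means the two ends of a trunk disagree on which VLAN carries untagged frames. A check for this can be sketched as below, assuming the native VLAN of each trunk end has already been collected; the link names, field names, and values are hypothetical.

```python
def native_vlan_mismatches(trunks):
    """Return trunks whose two ends disagree on the native (untagged) VLAN."""
    return [t for t in trunks if t["local_native"] != t["remote_native"]]

# Hypothetical trunk inventory (names and VLAN numbers are made up).
trunks = [
    {"link": "uplink-1", "local_native": 1, "remote_native": 1},
    {"link": "uplink-2", "local_native": 1, "remote_native": 35},
]

mismatched = native_vlan_mismatches(trunks)
print([t["link"] for t in mismatched])
```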
BCS should consider hard-coding the speed and duplex on all ports that connect a switch
to another switch. After power outages there is a possibility that one of the ports will not
auto-negotiate correctly and cause latency and packet loss.
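To illustrate what hard-coding looks like in practice, the sketch below generates IOS-style interface configuration lines for a switch-to-switch port. It is an assumption-laden example: the interface name and settings are hypothetical, the syntax shown is the common IOS form, and the exact commands vary by platform and module type (CatOS, for instance, uses different commands), so the output should be checked against the documentation for each switch.

```python
def hardcode_port_config(interface, speed, duplex):
    """Build IOS-style lines that fix speed and duplex on one interface."""
    if duplex not in ("full", "half"):
        raise ValueError("duplex must be 'full' or 'half'")
    return [
        f"interface {interface}",
        f" speed {speed}",
        f" duplex {duplex}",
    ]

# Hypothetical uplink port; both ends of the link need matching settings.
lines = hardcode_port_config("FastEthernet0/24", 100, "full")
print("\n".join(lines))
```

Both ends of a link must be set to the same values; hard-coding only one side while the other auto-negotiates is itself a common cause of duplex mismatches.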
Network Performance Analysis
1. Network Utilization Analysis (Appendix C)
Bandwidth usage was very low during our test period, as school was out of session. All
traffic from the schools flows through the ATMC Bolivia-CO. Traffic at the time of the
assessment was well within limits. There were some large spikes of traffic during the
morning hours destined for NBHS. These spikes were to the local servers at the Central
Office.
ATMC Bolivia-CO
NBHS
2. Network Latency Analysis (Appendix D)
Network latency is very low across the entire wide area network. Average ping times
range between 1ms and 3ms. As the number of “switch hops” between the school and the
district increases, so does the average ping time to the school.
There are a few schools that have packet loss issues. These schools include Jessie Mae
Monroe, SBMS and SBHS. SBHS gets its connectivity from SBMS, so we assume
that if the packet loss to SBMS were fixed, the loss to SBHS would be fixed as well. The rest
of the schools show low loss and low jitter.
Jessie Mae Monroe
Packet Loss Average – 0.12%
Packet Loss Max – 35.00%
Test points for Jessie Mae Monroe are showing packet loss spikes in the 35% to 50%
range.
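A low average combined with a high maximum is what intermittent loss looks like: most probe rounds lose nothing, while a few lose a large share. The sketch below shows how both figures fall out of per-round probe counts. The probe counts are made up for illustration and are not the measured data.

```python
def loss_summary(rounds):
    """rounds: list of (sent, received) per probe round; returns (average %, max %)."""
    percents = [100.0 * (sent - received) / sent for sent, received in rounds]
    return sum(percents) / len(percents), max(percents)

# Made-up probe rounds: three clean rounds and one round with a burst of loss.
rounds = [(20, 20), (20, 20), (20, 13), (20, 20)]
average, worst = loss_summary(rounds)
print(f"average {average:.2f}%  max {worst:.2f}%")
```

Over a longer collection period the clean rounds dominate, which pulls the average down while the maximum still records the worst burst.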
South Brunswick Middle
Packet Loss Average – 2.75%
Packet Loss Max – 100.00%
There is a steady 2% to 3% packet loss between the core 6509 and SBMS.
Sbms_3508#sh interfaces status
Port    Name               Status       Vlan     Duplex Speed   Type
------- ------------------ ------------ -------- ------ ------- ----------
Gi0/1                      notconnect   35       Auto   1000    Missing
Gi0/2                      connected    35       A-Full 1000    1000BaseSX
Gi0/3                      connected    35       A-Full 1000    1000BaseSX
Gi0/4                      connected    35       A-Full 1000    1000BaseSX
Gi0/5                      connected    35       A-Full 1000    1000BaseSX
Gi0/6                      connected    35       A-Full 1000    1000BaseSX
Gi0/7   Connection LH to S connected    trunk    A-Full 1000    1000BaseLX
Gi0/8   Connection ZX to B connected    trunk    A-Full 1000    1000BaseLX
If the distance is greater than 7 km between SBMS and ATMC’s Bolivia-CO, then there
is a very high possibility that the packet loss is due to the wrong module being installed at SBMS.
Gigabit 0/8 in the SBMS_3508 switch should be a 1000BaseZX module. A review of the
OTDR traces taken during the fiber optic cable installation should be done to determine if
the installed fiber optic modules are sufficient to support the distance between schools.
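The distance check described above can be sketched as a lookup of each module type's rated reach against the measured fiber length. The 7 km figure for the LX module is taken from this report; the reach for other module types is deliberately left out and would need to be filled in from the manufacturer's data sheet. The distances in the example are made up.

```python
# Rated reach per module type, in km. Only the LX figure comes from this report.
RATED_REACH_KM = {"1000BaseLX": 7.0}

def module_supports(module_type, distance_km, rated_reach_km=RATED_REACH_KM):
    """Return True/False if the reach is known, or None if the type is not listed."""
    reach = rated_reach_km.get(module_type)
    if reach is None:
        return None
    return distance_km <= reach

# Made-up distances for illustration only.
print(module_supports("1000BaseLX", 5.0))
print(module_supports("1000BaseLX", 9.5))
print(module_supports("1000BaseZX", 9.5))
```

Rated reach is only a first check; the OTDR traces remain the authoritative source for actual span length and loss.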
3. Throughput Analysis
The data collected using the Network Diagnostic Tool (NDT) was incomplete and
consequently inconclusive. We are working to modify the tool and associated test
procedures to improve future data collection.
Recommendations
High Priority Recommendations:
• SBMS – the fiber module installed to deliver the signal to ATMC Bolivia-CO is
of the LX variety and is only rated for 7 km of distance. Suggest replacing with a
ZX module.
• Jessie Mae Monroe – The fiber module installed at JMM is showing as unknown
to the switch. Suggest replacing this module or the switch.
• Replace all 1000Base-CX (Cisco Gigastack) modules at Jessie Mae. Switch to
switch connections inside the school are using Cisco 1000_CX_Gigastack
modules which run at half duplex. The industry has abandoned these modules
due to the issues inherent with half-duplex connections.
Other Recommendations:
• Acquire a network cabling certifier. Due to the high number of field-terminated
cables, a network cable certifier will ensure that each cable meets specifications
and is not introducing instability into the network. The tester should be able to
certify copper and fiber based cables.
• Test and, if necessary, replace cables (closet-to-computer) that are plugged into the
ports listed with CRC errors.
• Hard code speed and duplex settings for all servers and switches. Any device that
is a part of the network infrastructure should not be using auto-negotiate.
• Investigate and check port speed and duplex settings for all ports listed as having
collisions.
• Review OTDR traces taken during the installation of the fiber optic cable plant.
If they are not available, hire a contractor to test all long-haul fiber connections
with an OTDR to verify that the connections between the schools meet standards
and loss thresholds.
• Move from a proxy-based filter system to a pass-by or in-line appliance. Proxy-based
filters add latency to network requests. Proxy-based systems do not scale
well when speeds of 100 Mbit/s are being utilized.
• Review the switch configurations for ATMC-owned 4000 series switches.
• Develop and implement a plan for improving the level of craftsmanship at schools
where appropriate. The plan should include network rack cable management,
installing patch panels where appropriate, protecting and securing fiber optic
cables, cleaning and covering unused fiber optic cables, and re-routing the
network patch cables.
• Develop and implement a multi-year plan/schedule to replace end-of-life Cisco
switches. The plan should ensure product standardization across the district.
(Minimize/eliminate the use of non-standard products, e.g. Dell switches.) Work
one school at a time: replace every switch in the school, and use the ones taken
out as spares for the other schools. Appendix E includes the Cisco schedule for
switches designated EoL.
• Upgrade the IOS on switches where appropriate to a newer version with bug
fixes.
• Remove IPX routing on the WAN. Move to an IP-only environment.
• Consider redesigning the WAN infrastructure to have fewer switch hops between
the Central Office and the school edge.