Network Health Assessment

LEA Name: Brunswick County Schools

Primary POC
Leonard Jenkins
Director, Technology Department
Brunswick County Schools
35 Referendum Drive
Bolivia, NC 28422
910-253-2931
ljenkins@bcswan.net

Technical POC
Mike Crawford
Wide Area Network (WAN) Manager
910-253-2996
mcrawford@bcswan.net

Date Data Collected
06/17/2008 - 06/19/2008

Network Description

Wide Area Network

The Brunswick County Schools’ (BCS) wide area network (WAN) uses fiber infrastructure constructed and maintained in partnership with Atlantic Telecom Membership Corporation (ATMC). The network topology is a distributed star, with the central hubs of the stars owned and maintained by ATMC. ATMC has network hubs in four locations across the county, connected to each other via gigabit Ethernet. The BCS schools connect either to ATMC-owned switches at these locations or to other BCS schools over dark fiber provided by ATMC. The school furthest from the BCS core, Jessie Mae Monroe (JMM), is six “switch hops” away. The network core at the BCS central office at 35 Referendum Drive is connected to ATMC’s network at the Bolivia-CO.

All school facilities are connected via gigabit Ethernet. Single-mode 1000BASE-LX and 1000BASE-ZX modules are used to light the fiber between the schools. Every connection between switches is set up as an 802.1Q trunk, which allows multiple VLANs to share the link. A core Cisco 6509 series switch acts as the VLAN controller and main router for the district. All traffic between VLANs is routed via this device.

An updated network diagram of the BCS wide area network follows. This diagram can be found in Visio format in Appendix A.

Note: The above diagram differs slightly from the network diagram provided by BCS at the start of the network assessment.
The network layout detailed in the above diagram was validated with both the SolarWinds LANsurveyor auto-discovery tool and a review of data obtained through the Cisco Discovery Protocol (CDP).

The core 6509 switch is split into a Layer 2 switch and a Layer 3 router. The Layer 3 router (6509L3) is the main router for the district and routes both IP and IPX. The Layer 2 switch is the main switch for the Central Office as well as the VLAN controller for the district. 802.1Q VLAN trunk encapsulation is used on all WAN connections. Cisco VTP is used to control and distribute VLAN configurations across the network. The Layer 3 router runs IOS and the Layer 2 switch runs CatOS.

Subnet masks and IP address schemes are consistent with best practices. Every school has a single VLAN set up for traffic. The logical network is fairly flat, as the 6509 routes all traffic between the VLANs.

Local Area Network

The local area networks use Cisco and Dell switches and routers. These include full gigabit switches and 10/100 Mbps switches with and without gigabit uplink capability. Cisco model numbers include: 2950, 2950G, 2924M, 3512, 3508, 3550, 3500XL, 4006, and 6509. Many of the Cisco switches have been designated end-of-life (EoL) by Cisco. All the Dell switches are 3024s. Unmanaged switches are also widely deployed throughout the district.

Internet Access

BCS contracts with ATMC for 22 Mbps of Internet access capacity. BCS’s public IP address space, 216.99.112.128/27, is provided by ATMC.

Security and WAN Optimization

The BCS security and WAN optimization infrastructure consists of the following:
Cisco ASA 5520 Firewall
SmoothWall School Guardian Content/URL Filter
Packeteer PacketShaper 6500

Data Collection and Testing Process Summary

Data collection and testing focused on three key areas as follows:

Physical-layer analysis:
1. Physical inventory of all network infrastructure and LAN cabling
2.
Automated network discovery and mapping using SolarWinds LANsurveyor
3. Packet capture and analysis at the core switch at each school using Wireshark. Packets were captured at each location and analyzed for problems and anomalies.
4. Analysis of switch logs and port errors.

Configuration analysis:
1. Analysis of core switch configurations – configurations compared against best practices published by major switch manufacturers.

Network Performance analysis:
1. Network utilization analysis – examine network utilization for WAN and Internet access connections, using Cacti for WAN connections.
2. Network latency analysis – examine network latency across the WAN using SmokePing.
3. Throughput analysis – examine link and end-to-end throughput using the Network Diagnostic Tool (NDT) developed by Internet2 (I2).

Results & Observations

Physical Layer Analysis:

1. Physical Inventory and Cabling:

Varying levels of craftsmanship are evident throughout the district’s network infrastructure. In assessing the craftsmanship, we try to distinguish between best practices related directly to the reliability and performance of the network and those related to network maintainability. Our comments focus on issues that are, or may be, degrading network reliability and performance. Of particular concern are issues related to copper and fiber patch cables.

BCS uses a mixture of manufactured and field-terminated copper Ethernet patch cables. Care must be taken to ensure field-terminated cables meet industry standards. When Ethernet patch cables are made by hand, the cable jacket must extend all the way into the crimp connector. The TIA/EIA 568-A or TIA/EIA 568-B wiring standard must also be followed to keep crosstalk within limits.

Virginia Williamson – New Wing

Sometimes, being too neat and orderly can cause problems as well.
This rack is well organized and maintained; however, the two most important cables in the rack are bent to the point of creasing. This can cause packet loss and CRC errors.

Bolivia Elementary IDF - 200 Wing
Bolivia Elementary IDF - 200 Wing

BCS also uses both manufactured and field-terminated fiber optic patch cables. In many cases the fiber connectors lack strain relief and/or the protective layers of shielding have been stripped away from the fiber itself. In addition, adequate protection is often not provided for the fiber optic cables. The following pictures were taken at the BCS central office main server room.

Server Room – Central Office
Server Room – Central Office

The quality of the fiber installation is poor, with fiber jumper cables routed in, around, and between OSP cables, and field-terminated fiber cables fully exposed under the floor.

Shown below are examples of problems with the fiber optic infrastructure that need to be addressed. Many of the fiber optic cables have bends and angles that exceed recommended standards. These bends cause signal loss and refraction, resulting in an unstable network. Unused terminated fiber cables should have covers to protect the ends.

Bolivia Elementary – Room 405
Jessie Mae Monroe – Server Room
Union Elementary – Room 501

Pictures of all the wiring closets and switch locations throughout the Brunswick County Schools system are included in the assessment packet.

2. SolarWinds LANsurveyor Network Discovery

As noted earlier, results from the SolarWinds LANsurveyor auto-discovery tool were used to develop the network diagram included in Appendix A. The raw data generated by the network discovery tool is included in Appendix B. All network switches are included in the map, and it can be used to see the hierarchical design if needed. Some parts of the auto-discovery did not complete correctly because ATMC equipment was not accessible to the discovery agent.

3.
Packet Capture using Wireshark

Packet captures of the broadcast traffic show normal network broadcast protocols, including ARP, DHCP, IPX, Spanning-Tree, and NetBIOS. EIGRP traffic from the core 6509 is leaking into the school LANs. No single type of traffic appears to be overloading the network, and the amount of broadcast traffic is well within normal operating levels.

Packet captures of all traffic using port mirroring show much the same. HTTP and TCP requests to and from the proxy server at the district make up the majority of the traffic. There are some NCP file access requests to 10.1.10.5. Packet captures at NBHS show a possible switch loop.

4. Analysis of Logs and Port Errors

Below are the problems we identified while analyzing the logs and port errors of all the switches. The majority of the errors are CRC or collision errors. CRC errors are generally cable related, while collision errors generally indicate a duplex mismatch. Although we list all the port error statistics for every school, many of the switches had been restarted less than 4 days prior to our arrival. In addition, network port usage was low at the time of our assessment since school was not in session. Consequently, it is likely there are additional ports with errors.
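The two rules of thumb stated above (CRC errors point to cabling, collisions point to a duplex mismatch) can be applied mechanically when reviewing port counters. The following is a minimal Python sketch of that triage. The function name, the counter keys, and the sample counter values are hypothetical and are not taken from the assessment data.

```python
from typing import Dict, List


def classify_port(counters: Dict[str, int]) -> List[str]:
    """Return likely causes for one port, using the report's rules of thumb."""
    findings = []
    if counters.get("crc", 0) > 0:
        # CRC errors are generally cable related.
        findings.append("CRC errors: check cable / environment")
    if counters.get("collisions", 0) > 0:
        # Collisions generally indicate a duplex mismatch.
        findings.append("collisions: check for duplex mismatch")
    return findings


# Hypothetical counter values, for illustration only.
ports = {
    "Fa0/16": {"crc": 12, "collisions": 0},
    "Fa0/48": {"crc": 0, "collisions": 340},
    "Fa0/1": {"crc": 0, "collisions": 0},
}

# Keep only the ports that have at least one finding.
report = {name: classify_port(c) for name, c in ports.items() if classify_port(c)}
for name, findings in report.items():
    print(name, "->", "; ".join(findings))
```

Because many switches had been restarted shortly before the visit, counters of zero in a check like this do not prove a port is clean; they only mean no errors had accumulated since the restart.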

Central Office 6509
4/1 – duplex mismatch 100-half collisions (Packeteer)
4/5 – duplex mismatch 100-half collisions (NovaNet)
6/3 – CRC / TCP Runts – Possible bad handmade cable
6/36 – Native VLAN mismatch
2/4 – Native VLAN mismatch

Bolivia – 10.4.0.0
10.4.1.4 – Fa0/16 – CRC errors
10.4.1.5 – Fa0/2 – CRC errors
10.4.1.11 – VLAN 1 input errors – CRC – on virtual interface – this should not happen
    - Possible issue with the switch
10.4.1.13 – fan fault – replace fan or switch

BCA – 10.8.0.0
10.0.0.8 – fa0/24 – CRC error
10.8.1.3 – fa0/23 – CRC error
    - fa0/27 – CRC error
10.8.1.4 – fa0/48 – collisions – duplex mismatch
10.8.1.5 – fa0/18 – CRC error
10.8.1.6 – fa0/18 – CRC & VLAN mismatch

Jessie Mae Monroe – 10.10.0.0
** Entire school needs to have the 1000BaseCX modules replaced due to collisions
** Switch response is slower due to dropped packets and retransmits

Leland Middle – 10.16.0.0
10.16.1.38 – fa0/21 – CRC error
    - fa0/22 – CRC error
10.16.1.39 – fa0/2 – CRC error & interface flapping up/down
    - fa0/3 – CRC error
    - fa0/19 – CRC error
10.16.1.40 – fa0/2 – CRC error
    - fa0/2 – CRC error & interface flapping up/down
    - fa0/9 – CRC error & interface flapping up/down
    - fa0/19 – CRC error
    - fa0/23 – CRC error
10.16.1.41 – fa0/10 – CRC error & interface flapping up/down
    - fa0/23 – CRC error
10.16.1.42 – fa0/5 – CRC error & interface flapping up/down
    - fa0/13 – CRC error & interface flapping up/down
    - fa0/22 – CRC error
10.16.1.43 – fa0/12 – CRC error & interface flapping up/down
    - fa0/22 – interface flapping up/down
    - fa0/24 – CRC error
10.16.1.46 – fa0/22 – collisions – duplex mismatch (Access Point)

Leland Elementary – 10.20.0.0
10.0.0.20 – gi0/1 – high number of ignored packets (might not be a problem)
10.20.1.3 – fa0/8 – CRC errors & collisions – duplex mismatch
10.20.1.6 – fa0/18 – collisions – duplex mismatch
10.20.1.10 – switch not accessible
10.20.1.13 – fa0/13 – CRC error
10.20.1.14 – fa0/1 – CRC error
    - fa0/3 – CRC error

NBHS – 10.26.0.0
10.0.0.26 -
3/1 – xmit errors – 100/full
    - 3/22 – xmit errors – 100/full
    - 3/46 – xmit errors – 100/full
    - 2/2, 2/3, & 2/4 – Native VLAN mismatch
10.26.1.20 – fa0/28 – duplex mismatch 100-half
10.26.1.50 – fa0/7 – duplex mismatch – collisions
10.26.1.247 – Gi0/3 – duplex mismatch – collisions 100-half
    - Gi0/4 – CRC errors – check cable / environment
    - Gi0/7 – CRC errors – check cable / environment
    - Gi0/8 – CRC errors – check cable / environment
10.26.1.253 – IOS 11.2 needs to be upgraded
    - Fa0/12 – CRC errors – check cable / environment
10.26.1.254 – IOS 11.2 needs to be upgraded

Shallotte Middle – 10.32.0.0
10.0.0.32 – fa0/1 – CRC error
10.32.1.10 – ALL ports except (fa0/1, fa0/16, fa0/21, and fa0/24) CRC errors
    - fa0/12 has the most CRC errors
10.32.1.11 – fa0/3 – CRC error
    - fa0/4 – CRC error
    - fa0/6 – CRC error
    - fa0/7 – CRC error
    - fa0/8 – CRC error
    - fa0/10 – CRC error
    - fa0/12 – CRC error
10.32.1.13 – fa0/4 – CRC error
    - fa0/5 – CRC error
    - fa0/15 – CRC error
10.32.1.17 – fa0/13 – CRC error
    - fa0/18 – CRC error
10.32.1.21 – fa0/7 – CRC error
    - fa0/10 – CRC error

SBHS – 10.34.0.0
10.0.0.34 – gi3/2 – CRC error
    - gi3/4 – CRC error

SBMS – 10.35.0.0
10.35.1.24 – fa0/3 – CRC error
10.25.1.26 – fa0/9 – CRC error

Configuration analysis:

1. Switch configuration analysis:

Switch configurations at the school system follow standard best practices. In a few cases, VLAN configuration is not uniform. These are listed under the Port Error section of the assessment as Native VLAN mismatches. BCS should consider hard-coding the speed and duplex on all ports that connect one switch to another. After a power outage, a port may fail to auto-negotiate correctly, causing latency and packet loss.

Network Performance Analysis

1. Network Utilization Analysis (Appendix C)

Bandwidth usage was very low during our test period, as school was out of session. All traffic from the schools flows through the ATMC Bolivia-CO.
Traffic at the time of the assessment was well within limits. There were some large spikes of traffic during the morning hours destined to NBHS. These spikes were to the local servers at the Central Office.

ATMC Bolivia-CO
NBHS

2. Network Latency Analysis (Appendix D)

Network latency is very low across the entire wide area network. Average ping times range between 1 ms and 3 ms. As the number of “switch hops” between the school and the district increases, so does the average ping time to the school. A few schools have packet loss issues: Jessie Mae Monroe, SBMS, and SBHS. SBHS gets its connectivity from SBMS, so we assume that fixing the packet loss to SBMS would fix SBHS as well. The rest of the schools show low loss and low jitter.

Jessie Mae Monroe
Packet Loss Average – 0.12%
Packet Loss Max – 35.00%

Test points for Jessie Mae Monroe are showing packet loss spikes in the 35% to 50% range.

South Brunswick Middle
Packet Loss Average – 2.75%
Packet Loss Max – 100.00%

There is a steady 2% to 3% packet loss between the core 6509 and SBMS.

Sbms_3508#sh interfaces status

Port    Name               Status       Vlan     Duplex Speed   Type
------- ------------------ ------------ -------- ------ ------- ----
Gi0/1                      notconnect   35       Auto   1000    Missing
Gi0/2                      connected    35       A-Full 1000    1000BaseSX
Gi0/3                      connected    35       A-Full 1000    1000BaseSX
Gi0/4                      connected    35       A-Full 1000    1000BaseSX
Gi0/5                      connected    35       A-Full 1000    1000BaseSX
Gi0/6                      connected    35       A-Full 1000    1000BaseSX
Gi0/7   Connection LH to S connected    trunk    A-Full 1000    1000BaseLX
Gi0/8   Connection ZX to B connected    trunk    A-Full 1000    1000BaseLX

If the distance between SBMS and ATMC’s Bolivia-CO is greater than 7 km, there is a very high possibility that the packet loss is due to the wrong module being installed at SBMS. The port description on Gigabit 0/8 indicates a ZX connection, but the switch reports a 1000BaseLX module. Gigabit 0/8 in the SBMS_3508 switch should be a 1000BaseZX module.
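The mismatch on Gi0/8 was spotted by comparing the port description against the reported module type. A check along these lines could be scripted across all switches. The sketch below is illustrative only: it parses a few rows copied from the listing above, and the parsing rule (the last five whitespace-separated fields are Status, Vlan, Duplex, Speed, Type) and the treatment of "LH" as equivalent to "LX" are assumptions made for this example. Rows whose Type contains a space would need different handling.

```python
# Rows copied from the SBMS 3508 listing in this report.
ROWS = [
    "Gi0/1 notconnect 35 Auto 1000 Missing",
    "Gi0/2 connected 35 A-Full 1000 1000BaseSX",
    "Gi0/7 Connection LH to S connected trunk A-Full 1000 1000BaseLX",
    "Gi0/8 Connection ZX to B connected trunk A-Full 1000 1000BaseLX",
]

# Assumption: a description saying "LH" is satisfied by an LX module.
ALIASES = {"LH": "LX"}
OPTICS = {"SX", "LX", "LH", "ZX"}


def parse_row(row):
    """Split one status row; the port name may be empty or contain spaces."""
    tokens = row.split()
    status, vlan, duplex, speed, media = tokens[-5:]
    return {
        "port": tokens[0],
        "name": " ".join(tokens[1:-5]),
        "status": status,
        "vlan": vlan,
        "duplex": duplex,
        "speed": speed,
        "type": media,
    }


def label_mismatch(entry):
    """Return the optic named in the description if the installed type differs."""
    for word in entry["name"].split():
        if word in OPTICS:
            expected = ALIASES.get(word, word)
            if not entry["type"].endswith(expected):
                return expected
    return None


entries = [parse_row(r) for r in ROWS]
mismatches = {e["port"]: label_mismatch(e) for e in entries if label_mismatch(e)}
print(mismatches)
```

A flagged port is only a prompt for inspection, since descriptions can be stale; the installed module and the fiber distance still need to be confirmed on site.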
The OTDR traces taken during the fiber optic cable installation should be reviewed to determine whether the installed fiber optic modules are sufficient to support the distances between schools.

3. Throughput Analysis

The data collected using the Network Diagnostic Tool (NDT) was incomplete and consequently inconclusive. We are working to modify the tool and associated test procedures to improve future data collection.

Recommendations

High Priority Recommendations:

SBMS – The fiber module installed to deliver the signal to ATMC Bolivia-CO is of the LX variety and is only rated for 7 km of distance. Suggest replacing it with a ZX module.

Jessie Mae Monroe – The fiber module installed at JMM is showing as unknown to the switch. Suggest replacing this module or the switch.

Replace all 1000Base-CX (Cisco GigaStack) modules at Jessie Mae Monroe. Switch-to-switch connections inside the school use Cisco 1000_CX_Gigastack modules, which run at half duplex. The industry has abandoned these modules due to the issues inherent in half-duplex connections.

Other Recommendations:

Acquire a network cabling certifier. Given the high number of field-terminated cables, a certifier will help ensure that each cable meets specifications and is not introducing instability into the network. The tester should be able to certify both copper and fiber cables.

Test, and replace if necessary, the cables (closet-to-computer) plugged into the ports listed with CRC errors.

Hard-code speed and duplex settings for all servers and switches. Any device that is part of the network infrastructure should not be using auto-negotiation.

Investigate and check port speed and duplex settings for all ports listed as having collisions.

Review OTDR traces taken during the installation of the fiber optic cable plant. If they are not available, hire a contractor to test all long-haul fiber connections with an OTDR to verify that the connections between the schools meet standards and loss thresholds.

Move from a proxy-based filter system to a pass-by or in-line appliance. Proxy-based filters add latency to network requests and do not scale well at speeds of 100 Mbps.

Review the switch configurations for ATMC-owned 4000 series switches.

Develop and implement a plan for improving the level of craftsmanship at schools where appropriate. The plan should include network rack cable management, installing patch panels where appropriate, protecting and securing fiber optic cables, cleaning and covering unused fiber optic cables, and re-routing the network patch cables.

Develop and implement a multi-year plan/schedule to replace end-of-life Cisco switches. The plan should ensure product standardization across the district (minimize or eliminate the use of non-standard products, e.g. Dell switches). Work one school at a time: replace every switch in the school, and keep the removed switches as spares for the other schools. Appendix E includes the Cisco schedule for switches designated EoL.

Upgrade the IOS on switches where appropriate to a newer version with bug fixes.

Remove IPX routing on the WAN. Move to an IP-only environment.

Consider redesigning the WAN infrastructure to have fewer switch hops between the Central Office and the school edge.
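
When evaluating a WAN redesign, it can help to compute the hop count from the core to every site for each candidate layout. The following Python sketch does this with a breadth-first search over a switch adjacency map, such as one built from CDP neighbor data. The topology and node names shown are hypothetical placeholders, not the BCS network, and a "hop" here is counted as one switch-to-switch link, which is an assumption about how the term is used in this report.

```python
from collections import deque

# Hypothetical topology for illustration only; not the BCS network.
LINKS = {
    "core": ["hub-a"],
    "hub-a": ["core", "hub-b", "school-1"],
    "hub-b": ["hub-a", "school-2"],
    "school-1": ["hub-a"],
    "school-2": ["hub-b", "school-3"],
    "school-3": ["school-2"],
}


def hops_from(root, links):
    """Breadth-first search returning the link count from root to each node."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in links.get(node, []):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


hops = hops_from("core", LINKS)
worst = max(hops, key=hops.get)
print(worst, hops[worst])
```

Running the same function against the current layout and a proposed layout gives a direct comparison of the worst-case and average hop counts.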