NOC Services and Applications
Sunday Folayan
Nishal Goburdhan
Isatou Jah

What is Network Management?
“In order to operate a reliable service, the network must be managed according to a determined discipline, using a coherent structure of information management.”
Geoff Huston, ISP Survival Guide

What is a NOC?
Network Operations Centre (NOC)
Monitors and manages a service provider’s network
• Information about current, historical and planned availability of systems
• Network status and operational statistics
• Fault monitoring and management
Engineers can coordinate their work through the NOC

Network Management - Components
Parts of Network Management
• Configuration/Change management
• Performance/Accounting management
• Fault management
• Security management

Configuration Management
Maintaining information relating to the design of the network and its current configuration
Network State
• Record of network topology
  – Static
      what is deployed
      where it is deployed
      how it is attached
      who is responsible for it
      how do I contact them
  – Dynamic
      operational status of the network elements

Configuration Management
inventory management
• database of network elements
• history of changes & problems
directory maintenance
• all hosts & applications
• nameserver database
host and service naming coordination
• "Information is not information if you can't find it"

Configuration Management
Operational Control of network
Start/stop individual components
Alter configuration of devices
Load and save config versions
Hardware/Software upgrades
Methods of access
• SNMPGet / SNMPSet
• Out-of-Band access

RANCID
RANCID - Really Awesome New Cisco confIg Differ
Also works for IOS/CatOS/JunOS/...
Open Source
Runs on FreeBSD, Linux, OSX, even MSWindows
http://www.shrubbery.net/ (lots of other useful tools here too!)

RANCID
Collections of scripts that run from cron and automate
• logging into routers
• capturing configuration
• highlighting configuration ‘differences’
• emailing the ‘diffs’ to a mail list
• installing ‘diffs’ into CVS

RANCID
• Track config changes
  – Normal day-to-day
• Track hardware changes
  – Where’s that spare…?
• Track (I)OS changes
• Malicious changes?
  – What did your NOC do last night?
• Retrieve dead router configs.
• Track router crashes!!

RANCID aka Big Brother
• Announce changes to entire team
    everybody starts looking out for anyone making random changes!
• If it’s broken, what’s changed?
• Make it user friendly - CVSWeb

RANCID Sample Output
  !Slot 2/MBUS: hvers 1.1
  !Slot 2/MBUS: software 01.36 (RAM) (ROM version is 01.33)
  !Slot 2/MBUS: 128 Mbytes DRAM, 16384 Kbytes SDRAM
  !
- !Slot 6: 1 Port Gigabit Ethernet
- !Slot 6/PCA: part 73-3302-03 rev C0 ver 3, serial CAB031216OL
- !Slot 6/PCA: hvers 1.1
- !Slot 6/MBUS: part 73-2146-07 rev B0 dev 0, serial CAB031112SB
- !Slot 6/MBUS: hvers 1.2
- !Slot 6/MBUS: software 01.36 (RAM) (ROM version is 01.33)
  !Slot 7: Route Processor
  !Slot 7/PCA: part 73-2170-03 rev B0 ver 3, serial CAB024901SI
  !Slot 7/PCA: hvers 1.4
  !Slot 7/MBUS: part 73-2146-06 rev A0 dev 0, serial CAB02060044

RANCID Demo
Demo of live RANCID system

RANCID Re-use
More than configuration management.
• Cheap Asset Tracker/NMS
UNIX script - easily extendible to other applications.
• Re-use login scripts
• Manage configuration changes
Correlate syslog and RANCID using Simple Event Correlator (SEC)
  – http://threebit.net/mail-archive/cisconsp/msg00053.html

RANCID - Even More Uses
Looking Glass software
See Joe Abley and Stephen Stuart NANOG presentation:
• http://www.nanog.org/mtg-0210/abley.html
• Consistency/Audit checks
• Generate DNS zone files
• Create Topographic maps

What is SNMP?
Simple Network Management Protocol
query - response system
• can obtain status from a device
• standard queries
• enterprise specific
uses database defined in MIB
• management information base

What do we use SNMP for?
query routers for:
• in and out bytes per second
• CPU load
• uptime
• BGP peer session status
query hosts for:
• network status
• Message queues
• Web traffic
• Squid proxy load

SNMP Exercise

Configuration Management
SNMP driven display
[Figure: SNMP-driven network map showing monitored nodes such as wjh12, mghgw, husc6, harvard, talcott, wjhgw1, nngw and oitgw1]

Performance Management
A Consistent level of network performance
Data collection
  – interface stats
  – throughput
  – error rates
  – usage
  – percent availability
Data analysis for performance metrics and trends
Establishment of performance thresholds
Capacity planning and deployment

Importance of Network Statistics
Accounting
Troubleshooting
Long-term trend analysis
Capacity Planning
Two different types
• active measurement
• passive measurement
Management Tools have statistical functionality

MRTG
System:      bb-rtr.ws.afnog.org in
Maintainer:
Description: FastEthernet0/0.67-802.1Q-vLAN-subif Upstream Link
ifType:      Layer 2 Virtual LAN using 802.1Q (135)
ifName:      Fa0/0.67
Max Speed:   12.5 MBytes/s
Ip:          196.216.67.254 ()

MRTG and MRTG Exercise

Netflow
Cisco developed - 1996
Initially a mechanism for forwarding packets
No longer - Now, primarily used for
• Accounting/Billing
• Network planning
• Peering arrangements
• Traffic engineering
• Security monitoring

Netflow
Netflow packet typically contains
• IP SRC+DST
• Port SRC+DST
• Protocol information
• TOS byte (DSCP)
• Input logical interface (ifIndex)
Extendible (IOS capable)
• AS / VRF / ...

Netflow
Uses CPU and memory!
Export Netflow to external collector (or use online on router)
• http://www.splintered.net/sw/flow-tools/
Router summarisation possible
Netflow V5 is most commonly used
http://www.cisco.com/go/netflow

Netflow
Only works on inbound traffic
Unidirectional flow
Shows transit (traffic through) and to the router.
Enabled by:
• ip route-cache flow
• ip flow ingress (new syntax)
Output seen with:
• show ip cache [verbose] flow

Netflow Example
From your workstation:
    ping 196.200.220.1
On your router:
    router# conf t
    router(config)# int fa0/0
    router(config-if)# ip flow ingress
    router(config-if)# end
    router# show ip cache flow

Netflow Example (cont.)
What’s missing? (Why are the flows only in 1 direction?)
How do you fix it?
Now repeat the BCP38 packet spoofing exercise, but track the bogus packets with Netflow.
Pay attention to what happens when uRPF is enabled.
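
The fields listed on the Netflow slides map directly onto the NetFlow v5 export format, which is simple enough to decode by hand on a collector. What follows is a minimal Python sketch under stated assumptions, not the flow-tools implementation: it unpacks one v5 export datagram (a 24-byte header followed by 48-byte flow records) into the fields named on the slides. The `Flow` tuple, the `parse_v5` function and the workstation address 196.200.220.10 used in the demo are hypothetical names and values chosen for illustration.

```python
import socket
import struct
from typing import List, NamedTuple

# NetFlow v5 export datagram: one 24-byte header, then `count` 48-byte records.
HEADER = struct.Struct("!HHIIIIBBH")
# srcaddr, dstaddr, nexthop, input, output, dPkts, dOctets, first, last,
# srcport, dstport, (pad), tcp_flags, prot, tos, src_as, dst_as,
# src_mask, dst_mask, (2 pad bytes)
RECORD = struct.Struct("!4s4s4sHHIIIIHHxBBBHHBBxx")


class Flow(NamedTuple):
    """Hypothetical container for the fields listed on the slides."""
    src: str
    dst: str
    src_port: int
    dst_port: int
    protocol: int
    tos: int
    input_ifindex: int
    packets: int
    octets: int
    src_as: int
    dst_as: int


def parse_v5(datagram: bytes) -> List[Flow]:
    """Decode every flow record in a single NetFlow v5 export datagram."""
    version, count = HEADER.unpack_from(datagram, 0)[:2]
    if version != 5:
        raise ValueError(f"not a NetFlow v5 datagram (version {version})")
    flows = []
    for i in range(count):
        (src, dst, _nexthop, in_if, _out_if, pkts, octets, _first, _last,
         sport, dport, _flags, proto, tos, src_as, dst_as,
         _src_mask, _dst_mask) = RECORD.unpack_from(
            datagram, HEADER.size + i * RECORD.size)
        flows.append(Flow(socket.inet_ntoa(src), socket.inet_ntoa(dst),
                          sport, dport, proto, tos, in_if, pkts, octets,
                          src_as, dst_as))
    return flows


if __name__ == "__main__":
    # Synthetic datagram standing in for a real export: five ICMP echo
    # requests (protocol 1) from an assumed workstation to 196.200.220.1.
    # 2048 is ICMP type 8, code 0 as encoded in the destination port field.
    header = HEADER.pack(5, 1, 0, 0, 0, 0, 0, 0, 0)
    record = RECORD.pack(socket.inet_aton("196.200.220.10"),
                         socket.inet_aton("196.200.220.1"),
                         socket.inet_aton("0.0.0.0"),
                         2, 0, 5, 420, 0, 0, 0, 2048, 0, 1, 0, 0, 0, 24, 32)
    for flow in parse_v5(header + record):
        print(flow)
```

Each record describes traffic in one direction only, so the reply to a ping appears as a separate record with source and destination swapped, and only if the interface on which it arrives is also collecting flows.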

Netflow examples
Top ten lists (or top five)

##### Top 5 AS's based on number of bytes #####
srcAS  dstAS  pkts      bytes
6461   237    4473872   3808572766
237    237    22977795  3180337999
3549   237    6457673   2816009078
2548   237    5215912   2457515319

##### Top 5 Nets based on number of bytes #####
Net Matrix
----------
number of net entries: 931777
SRCNET/MASK       DSTNET/MASK      PKTS    BYTES
165.123.0.0/16    35.8.0.0/13      745858  1036296098
207.126.96.0/19   198.108.98.0/24  708205  907577874
206.183.224.0/19  198.108.16.0/22  740218  861538792
35.8.0.0/13       128.32.0.0/16    671980  467274801

##### Top 10 Ports #####
      input                  output
port  packets   bytes        packets   bytes
119   10863322  2808194019   5712783   427304556
80    36073210  862839291    17312202  1387817094
20    1079075   1100961902   614910    62754268
7648  1146864   419882753    1147081   414663212
25    1532439   97294492     2158042   722584770

Accounting Management
What do you account for?
• Use of the network and the services it provides
Types of accounting data
• RADIUS/TACACS accounting data from Access servers
• Interface statistics
• Protocol statistics
Accounting Data affects Business Models
• Bill on usage?
• Flat-rate billing?

Fault Management
Identify the fault
• Regular polling of network elements
Isolate the fault
• Diagnosis of the network components
Respond to the fault
• Allocate resources to resolve the fault
• Priority scheduling
• Technical/management escalation
Resolve the fault
• notification

Fault Management - systems
reporting mechanism
• link to NOC
• notify on-call personnel
setup & control alarm procedures
repair/recovery procedures
ticket system

Fault Management - Fault Detection
Who notices a problem with the network?
• Network Operations Center w/ 24x7 operations staff
  – open trouble ticket to track problem
  – preliminary troubleshooting
  – assign engineer to problem or escalate ticket status
• Customer call
• Other ISPs

Fault Management - Fault Detection (cont.)
How can you tell if there is a problem with the network?
• Network Monitoring Tools
  – common utilities
      ping
      traceroute
      Ethereal
      SNMP
  – Monitoring Systems
      NOCol
      Big Brother
      Nagios
      HP Openview, etc…
• Report state or unreachability
  – detect node down
  – routing problems

Fault Management - Ticket System
Very Important!
Need mechanism to track:
• failures
• current status of outage
• carrier tickets

Fault Management - Ticket System
system provides for:
• short term memory & communication
• scheduling and work assignment
• referrals and dispatching
• oversight
• statistical analysis
• long term accountability

Fault Management - Ticket Usage
create a ticket on ALL calls
create a ticket on ALL problems
create a ticket for ALL scheduled events
copy of ticket mailed to reporter and mailing list(s)
all milestones in resolution of problem maintain the same ticket #
ticket stays "open" until problem resolved
Ticket reporter determines that ticket should be closed.
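
The ticket usage rules above amount to a small set of invariants: one ticket number for the whole life of a problem, every milestone logged against it, and closure decided by the reporter. The following is a minimal Python sketch of those rules only. It is not the data model or API of RT or any other ticket system; the `Ticket` class, its fields and its method names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Tuple


@dataclass
class Ticket:
    """Hypothetical ticket record illustrating the usage rules above."""
    number: int          # assigned once; every milestone keeps the same number
    subject: str
    reporter: str        # whoever reported the call, problem or scheduled event
    status: str = "open"
    history: List[Tuple[datetime, str, str]] = field(default_factory=list)

    def add_milestone(self, who: str, note: str) -> None:
        """Log progress against this ticket instead of opening a new one."""
        if self.status != "open":
            raise ValueError(f"ticket #{self.number} is already {self.status}")
        self.history.append((datetime.now(timezone.utc), who, note))

    def close(self, who: str) -> None:
        """The ticket stays open until the reporter decides it is resolved."""
        if who != self.reporter:
            raise PermissionError("only the ticket reporter may close the ticket")
        self.add_milestone(who, "reporter confirmed resolution")
        self.status = "resolved"


if __name__ == "__main__":
    ticket = Ticket(number=1, subject="serial link down", reporter="customer")
    ticket.add_milestone("noc", "called telco")
    ticket.close("customer")
    print(ticket.number, ticket.status, len(ticket.history))
```

In a real deployment the mailing of ticket copies to the reporter and mailing lists would hang off `add_milestone`; that part is left out here because it depends entirely on the ticket system in use.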

Fault Management - Ticket Example
Sample opening ticket
Subject:            Fix sshd on E2 instructor machines
Serial Number:      6
Area:               none
Queue:              afnog-noc
Requestors:         pfs@cisco.com
Owner:              inst
Status:             resolved
Last User Contact:  Wed May 10 17:02:21 2006 (12 hr ago)
Current Priority:   1
Final Priority:     1
Due:                No date assigned
Last Action:        Wed May 10 17:02:21 2006 (12 hr ago)
Created:            Mon May 8 14:08:08 2006 (2 days ago)

Exercise: Ticket System
• RT is already installed on http://e2-noc.ws.afnog.org
• Create tickets to track network occurrences as they occur - network failures will be provided ;-)

Fault Management - typical failures
• Node unpingable
• no ip connectivity to router
• possible reasons:
  – serial link down
      call telco
  – router down/hardware problem
      call engineer
  – routing problem
      troubleshoot with traceroute
      routeviews machine

Security Management: Do’s & Don’ts
Don’t leave things that are likely to be interesting to mice lying on the kitchen table overnight
Plug the holes that mice are using to get into the house
Don’t provide places within the house for mice to build nests
Set traps along walls where you often see mice out of the corner of your eye
Check the traps daily to rebait them and to dispose of squashed mice. Full traps don’t catch mice, and they smell
Avoid using commercial bait-and-kill poisons. Traditional snap traps are best.
Get a cat!

Security Management - Tools
security tools
• cops - host configuration checker (www.cert.org)
• swatch - email reports of activity on machine
• Tcpwrappers - log connections, restrict access
• ssh/skey - crypto authentication and communications
• Tripwire - monitor changes to system files
Keep up to date with security information
• bug reports
  – CERT advisories mailing list: http://www.cert.org./contact_cert/certmaillist.html
• bug fixes
• intruder alerts

Security Management - Good Practice
reporting procedure for security events
• e.g. break-ins
• abuse email address for customers to report complaints (abuse@your-isp.net)
control internal and external gateways
• control firewalls (external and internal)
security log management
• centralized logging host
• Stealth logger, so it cannot be compromised

How do I manage my network?
Which tools should I use?
What do I really need?
• Keep it simple!
• Need to consider engineers working remotely
• Don’t want to spend too much time maintaining the tool (it should be helping you!)
• Different tools for NOC and engineers
• Different tools for statistics
• RELIABILITY!

References
http://www.merit.edu/ipma/docs/isp.html
http://www.nanog.org
http://www.caida.org
http://www.nlanr.net
http://www.cisco.com
http://www.amazing.com/internet/
http://www.isp-resource.com/
http://www.merit.edu/ipma
http://www.ripe.net

More Tools!
http://www.caida.org/Tools/
• OC3Mon/Coral
http://www.merit.edu/~ipma
  – RouteTracker
  – IRRj
  – ASExplorer
http://www.geektools.com/
http://www.merit.edu/ipma/tools/other.html

SNMP Tool references
• MON - http://www.kernel.org/software/mon/
• NOCol - ftp://ftp.navya.com/pub/vikas/nocol.tar.gz
• Sysmon - ftp://puck.nether.net/pub/jared
• Rover - http://www.merit.edu/~rover
• Concord - http://www.concord.com
• http://www.merit.net/~netscarf