NOC Services and Management - (ISOC) Workshop Resource Centre

NOC Services and Applications
Sunday Folayan
Nishal Goburdhan
Isatou Jah
What is Network Management?
“In order to operate a reliable service, the network
must be managed according to a determined
discipline, using a coherent structure of information
management.”
Geoff Huston, ISP Survival Guide
What is a NOC?
Network Operations Centre (NOC)
Monitors and manages a service provider’s network
• Information about current, historical and planned
availability of systems
• Network status and operational statistics
• Fault monitoring and management
Engineers can coordinate their work through the NOC
Network Management - Components
Parts of Network Management
• Configuration/Change management
• Performance/Accounting management
• Fault management
• Security management
Configuration Management
Maintaining information relating to the design of the
network and its current configuration

Network State
• Record of network topology
– Static: what is deployed, where it is deployed, how it is attached, who is responsible for it, how to contact them
– Dynamic: operational status of the network elements
Configuration Management

inventory management
• database of network elements
• history of changes & problems

directory maintenance
• all hosts & applications
• nameserver database

host and service naming coordination
• "Information is not information if you can't find it"
Configuration Management
Operational Control of network
Start/stop individual components
Alter configuration of devices
Load and save config versions
Hardware/Software upgrades
Methods of access
• SNMPGet / SNMPSet
• Out-of-Band access
RANCID
RANCID - Really Awesome New Cisco confIg Differ
Also works for IOS/CatOS/JunOS/...
Open Source
Runs on FreeBSD, Linux, OSX, even MSWindows
http://www.shrubbery.net/
(lots of other useful tools here too!)
RANCID

Collection of scripts that run from cron and automate:
• logging into routers
• capturing configuration
• highlighting configuration ‘differences’
• emailing the ‘diffs’ to a mail list
• installing ‘diffs’ into CVS
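The "capture, then highlight differences" step can be sketched in a few lines. This is not RANCID's own code (RANCID is a set of scripts around a login tool and CVS); it is a minimal illustration using Python's standard difflib, and the function name and sample configurations are assumptions made up for the example.

```python
import difflib


def config_diff(old_config: str, new_config: str, name: str = "router") -> str:
    """Return a unified diff between two saved configuration snapshots."""
    diff = difflib.unified_diff(
        old_config.splitlines(),
        new_config.splitlines(),
        fromfile=f"{name} (previous)",
        tofile=f"{name} (current)",
        lineterm="",
    )
    return "\n".join(diff)


# Hypothetical snapshots captured on two consecutive cron runs.
previous = "hostname core1\ninterface Fa0/0\n ip address 10.0.0.1 255.255.255.0\n"
current = "hostname core1\ninterface Fa0/0\n ip address 10.0.0.2 255.255.255.0\n"

report = config_diff(previous, current, name="core1")
if report:
    # In a real setup this text would be mailed to the team list.
    print(report)
```

An empty result means nothing changed, so no mail needs to be sent for that device.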
RANCID
• Track config changes
– Normal day-to-day
• Track hardware changes
– Where’s that spare…?
• Track (I)OS changes
• Malicious changes?
– What did your NOC do last night?
• Retrieve dead router configs.
• Track router crashes!!
RANCID aka Big Brother
• Announce changes to the entire team: everybody starts looking out for anyone making random changes!
• If it’s broken, what’s changed?
• Make it user friendly - CVSWeb
RANCID Sample Output
!Slot 2/MBUS: hvers 1.1
!Slot 2/MBUS: software 01.36 (RAM) (ROM version is 01.33)
!Slot 2/MBUS: 128 Mbytes DRAM, 16384 Kbytes SDRAM
!
- !Slot 6: 1 Port Gigabit Ethernet
- !Slot 6/PCA: part 73-3302-03 rev C0 ver 3, serial CAB031216OL
- !Slot 6/PCA: hvers 1.1
- !Slot 6/MBUS: part 73-2146-07 rev B0 dev 0, serial CAB031112SB
- !Slot 6/MBUS: hvers 1.2
- !Slot 6/MBUS: software 01.36 (RAM) (ROM version is 01.33)
!Slot 7: Route Processor
!Slot 7/PCA: part 73-2170-03 rev B0 ver 3, serial CAB024901SI
!Slot 7/PCA: hvers 1.4
!Slot 7/MBUS: part 73-2146-06 rev A0 dev 0, serial CAB02060044
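Output like the sample above can be scanned for hardware changes, for example to notice that a line card was pulled. The sketch below assumes the diff marker ("-" or "+") comes first on the line, as in the sample; the regular expression and function name are illustrative assumptions, not part of RANCID.

```python
import re

# Matches a diff line that adds or removes a whole slot entry,
# e.g. "- !Slot 6: 1 Port Gigabit Ethernet".
SLOT_LINE = re.compile(r"^([+-])\s*!Slot (\d+): (.+)$")


def hardware_changes(diff_text):
    """Return a list of (action, slot_number, description) tuples."""
    changes = []
    for line in diff_text.splitlines():
        match = SLOT_LINE.match(line.strip())
        if match:
            action = "removed" if match.group(1) == "-" else "added"
            changes.append((action, int(match.group(2)), match.group(3)))
    return changes


sample = """\
!Slot 2/MBUS: 128 Mbytes DRAM, 16384 Kbytes SDRAM
!
- !Slot 6: 1 Port Gigabit Ethernet
- !Slot 6/PCA: hvers 1.1
!Slot 7: Route Processor
"""

print(hardware_changes(sample))
```

Only the top-level slot line is reported; the per-component detail lines (PCA, MBUS) are skipped to keep the summary short.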
RANCID Demo
Demo of live RANCID system
RANCID Re-use
More than configuration management.
• Cheap Asset Tracker/NMS
UNIX script - easily extendible to other applications.
• Re-use login scripts
• Manage configuration changes
Correlate syslog and RANCID using Simple Event Correlator (SEC)
– http://threebit.net/mail-archive/cisconsp/msg00053.html
RANCID - Even More Uses
• Looking Glass software
– See Joe Abley and Stephen Stuart NANOG presentation: http://www.nanog.org/mtg-0210/abley.html
• Consistency/Audit checks
• Generate DNS zone files
• Create Topographic maps
What is SNMP?
Simple Network Management Protocol
query - response system
• can obtain status from a device
• standard queries
• enterprise specific
uses database defined in MIB
• management information base
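To make the MIB idea concrete: every value a device exposes is named by a numeric object identifier (OID), and the MIB is what maps those numbers to names. The sketch below hard-codes a handful of well-known MIB-II OIDs and resolves a numeric OID by longest prefix. The table is a tiny hand-picked subset and the function is illustrative only; it does not talk to any device.

```python
# A small subset of standard MIB-II objects (name lookup only).
MIB_SUBSET = {
    "1.3.6.1.2.1.1.1": "sysDescr",
    "1.3.6.1.2.1.1.3": "sysUpTime",
    "1.3.6.1.2.1.2.2.1.10": "ifInOctets",
    "1.3.6.1.2.1.2.2.1.16": "ifOutOctets",
}

# Vendor ("enterprise specific") objects live under this subtree.
ENTERPRISES_PREFIX = "1.3.6.1.4.1"


def describe_oid(oid: str) -> str:
    """Translate a numeric OID to 'name.instance' using the longest known prefix."""
    parts = oid.strip(".").split(".")
    for end in range(len(parts), 0, -1):
        prefix = ".".join(parts[:end])
        if prefix in MIB_SUBSET:
            instance = ".".join(parts[end:])
            name = MIB_SUBSET[prefix]
            return f"{name}.{instance}" if instance else name
    if len(parts) > 6 and ".".join(parts[:6]) == ENTERPRISES_PREFIX:
        return f"enterprise-specific (enterprise number {parts[6]})"
    return "unknown"


print(describe_oid("1.3.6.1.2.1.1.3.0"))       # standard query
print(describe_oid("1.3.6.1.2.1.2.2.1.10.3"))  # per-interface counter, ifIndex 3
print(describe_oid("1.3.6.1.4.1.9.2.1.58.0"))  # vendor subtree (9 is Cisco)
```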
What do we use SNMP for?

query routers for:
• in and out bytes per second
• CPU load
• uptime
• BGP peer session status
query hosts for:
• network status
• Message queues
• Web traffic
• Squid proxy load
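"Bytes per second" is not something the device reports directly: interface counters such as ifInOctets only ever increase, so the rate is derived from two samples. A minimal sketch, assuming a 32-bit counter that wraps at most once between polls (the function name and sample values are made up for illustration):

```python
COUNTER32_MODULUS = 2 ** 32


def bytes_per_second(prev_count, curr_count, interval_seconds,
                     modulus=COUNTER32_MODULUS):
    """Rate between two counter samples, tolerating a single counter wrap."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    # Modular subtraction gives the right delta even if the counter wrapped.
    delta = (curr_count - prev_count) % modulus
    return delta / interval_seconds


# Two polls taken 300 seconds (5 minutes) apart.
print(bytes_per_second(1_000, 301_000, 300))
# Counter wrapped past 2**32 between the polls.
print(bytes_per_second(2 ** 32 - 100, 200, 300))
```

On fast links a 32-bit counter can wrap more than once per polling interval, in which case the result is silently too low; shorter intervals or 64-bit counters avoid that.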
SNMP Exercise
Configuration Management
SNMP driven display
[Figure: SNMP-driven network status display; node labels include wjh12, mghgw, generali, husc6, harvard, talcott, wjhgw1, harvisr, huelings, geo, pitirium, nnhvd, nngw, oitgw1, sphgw1, lmagw1, dfch, tch]
Performance Management
A consistent level of network performance
Data collection
– interface stats
– throughput
– error rates
– usage
– percent availability
Data analysis for performance metrics and trends
Establishment of performance thresholds
Capacity planning and deployment
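Two of the metrics above reduce to simple arithmetic once the data is collected. The sketch below computes percent availability from poll results and flags utilisation samples above a threshold. All names, the 80% threshold, and the sample figures are assumptions chosen for illustration.

```python
def percent_availability(poll_results):
    """poll_results: iterable of booleans, True when the device answered a poll."""
    results = list(poll_results)
    if not results:
        raise ValueError("no poll results")
    return 100.0 * sum(results) / len(results)


def utilisation_percent(bytes_per_second, link_speed_bytes_per_second):
    """Share of the link capacity in use, as a percentage."""
    return 100.0 * bytes_per_second / link_speed_bytes_per_second


def over_threshold(samples, threshold_percent):
    """Return the utilisation samples that exceed the agreed threshold."""
    return [sample for sample in samples if sample > threshold_percent]


print(percent_availability([True, True, True, False]))
print(utilisation_percent(6.25e6, 12.5e6))
print(over_threshold([10.0, 85.5, 92.0], threshold_percent=80.0))
```

Samples that stay over the threshold for weeks are the usual input to capacity planning.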

Importance of Network Statistics
Accounting
Troubleshooting
Long-term trend analysis
Capacity Planning
Two different types
• active measurement
• passive measurement
Management Tools have statistical functionality
MRTG
System: bb-rtr.ws.afnog.org in
Maintainer:
Description: FastEthernet0/0.67-802.1Q-vLAN-subif Upstream Link
ifType: Layer 2 Virtual LAN using 802.1Q (135)
ifName: Fa0/0.67
Max Speed: 12.5 MBytes/s
Ip: 196.216.67.254 ()
MRTG and MRTG Exercise
Netflow
Developed by Cisco in 1996
Initially a mechanism for forwarding packets
That is no longer its role; it is now primarily used for:
• Accounting/Billing
• Network planning
• Peering arrangements
• Traffic engineering
• Security monitoring
Netflow

Netflow packet typically contains
• IP SRC+DST
• Port SRC+DST
• Protocol information
• TOS byte (DSCP)
• Input logical interface (ifIndex)

Extendible (IOS capable)
• AS / VRF / ...
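The fields listed above act as the key that defines a flow: packets sharing the same values are counted together. The sketch below shows that grouping idea. It illustrates the concept only and is not the router's implementation or the NetFlow export format; the class, function and sample addresses are assumptions.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowKey(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int       # IP protocol number, e.g. 6 for TCP
    tos: int            # TOS byte (DSCP)
    input_ifindex: int  # input logical interface


def aggregate_flows(packets):
    """packets: iterable of (FlowKey, size_in_bytes) -> {key: [packets, bytes]}."""
    flows = defaultdict(lambda: [0, 0])
    for key, size in packets:
        flows[key][0] += 1
        flows[key][1] += size
    return dict(flows)


request = FlowKey("192.0.2.10", "198.51.100.5", 40000, 80, 6, 0, 1)
# The reply has source and destination swapped, so it is a separate flow.
reply = FlowKey("198.51.100.5", "192.0.2.10", 80, 40000, 6, 0, 2)

flows = aggregate_flows([(request, 100), (request, 1400), (reply, 1500)])
for key, (packet_count, byte_count) in flows.items():
    print(key.src_ip, "->", key.dst_ip, packet_count, "pkts", byte_count, "bytes")
```

Because the key includes direction, one conversation always shows up as two flows, one per direction.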
Netflow
Uses CPU and memory!
Export Netflow to external collector (or use online on router)
• http://www.splintered.net/sw/flow-tools/
Router summarisation possible
Netflow V5 is most commonly used
http://www.cisco.com/go/netflow
Netflow
Only works on inbound traffic
Unidirectional flow
Shows transit traffic (through the router) and traffic to the router.
Enabled by:
• ip route-cache flow
• ip flow ingress (new syntax)
Output seen with:
• show ip cache [verbose] flow
Netflow Example
From your workstation:
ping 196.200.220.1
On your router:
router# conf t
router(config)# int fa0/0
router(config-if)# ip flow ingress
router(config-if)# end
router# show ip cache flow
Netflow Example (cont).
What’s missing?
(Why are the flows only in 1 direction?)
How do you fix it?
Now repeat the BCP38 packet spoofing
exercise, but track the bogus packets with
Netflow. Pay attention to what happens
when uRPF is enabled.
Netflow examples

Top ten lists (or top five)

##### Top 5 AS's based on number of bytes #######
srcAS  dstAS  pkts      bytes
6461   237    4473872   3808572766
237    237    22977795  3180337999
3549   237    6457673   2816009078
2548   237    5215912   2457515319

##### Top 5 Nets based on number of bytes ######
Net Matrix
----------
number of net entries: 931777
SRCNET/MASK       DSTNET/MASK      PKTS    BYTES
165.123.0.0/16    35.8.0.0/13      745858  1036296098
207.126.96.0/19   198.108.98.0/24  708205  907577874
206.183.224.0/19  198.108.16.0/22  740218  861538792
35.8.0.0/13       128.32.0.0/16    671980  467274801

##### Top 10 Ports #######
      input                  output
port  packets   bytes        packets   bytes
119   10863322  2808194019   5712783   427304556
80    36073210  862839291    17312202  1387817094
20    1079075   1100961902   614910    62754268
7648  1146864   419882753    1147081   414663212
25    1532439   97294492     2158042   722584770
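Reports like these are produced by summing exported flow records per key and sorting. A minimal sketch of the "top AS pairs by bytes" report, using the AS figures from the sample output above as input (the function name and record layout are assumptions; real collectors such as flow-tools have their own report commands):

```python
from collections import Counter


def top_talkers(records, n=5):
    """records: iterable of (src_as, dst_as, packets, byte_count) tuples."""
    totals = Counter()
    for src_as, dst_as, _packets, byte_count in records:
        totals[(src_as, dst_as)] += byte_count
    return totals.most_common(n)


records = [
    (3549, 237, 6457673, 2816009078),
    (6461, 237, 4473872, 3808572766),
    (2548, 237, 5215912, 2457515319),
    (237, 237, 22977795, 3180337999),
]

for (src_as, dst_as), byte_count in top_talkers(records, n=2):
    print(src_as, dst_as, byte_count)
```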
Accounting Management

What do you account for?
• Use of the network and the services it provides

Types of accounting data
• RADIUS/TACACS accounting data from Access servers
• Interface statistics
• Protocol statistics

Accounting Data affects Business Models
• Bill on usage?
• Flat-rate billing?
Fault Management

Identify the fault
• Regular polling of network elements

Isolate the fault
• Diagnosis of the network components

Respond to the fault
• Allocate resources to resolve the fault
• Priority scheduling
• Technical/management escalation

Resolve the fault
• notification
Fault Management - systems

reporting mechanism
• link to NOC
• notify on-call personnel
setup & control alarm procedures
repair/recovery procedures
ticket system

Fault Management - Fault Detection
Who notices a problem with the network?
• Network Operations Center w/ 24x7 operations staff
– open trouble ticket to track problem
– preliminary troubleshooting
– Assign engineer to problem or escalate ticket status
• Customer call
• Other ISPs
Fault Management - Fault Detection (cont.)
How can you tell if there is a problem with the network?
• Network Monitoring Tools
– common utilities: ping, traceroute, Ethereal, SNMP
– monitoring systems: NOCol, Big Brother, Nagios, HP OpenView, etc.
• Report state or unreachability
– detect node down
– routing problems
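A monitoring system usually does not raise an alarm on the first missed poll, since a single lost ping is common. The sketch below declares a node down only after several consecutive failures and reports recovery when it answers again. The class, the threshold of three, and the injected probe function are assumptions for illustration; in practice the probe would be a ping or an SNMP query.

```python
class ReachabilityMonitor:
    def __init__(self, probe, failures_before_down=3):
        self.probe = probe  # callable: node -> True if reachable
        self.failures_before_down = failures_before_down
        self.consecutive_failures = {}
        self.down = set()

    def poll(self, node):
        """Probe once; return 'down', 'recovered', or None if state is unchanged."""
        if self.probe(node):
            self.consecutive_failures[node] = 0
            if node in self.down:
                self.down.discard(node)
                return "recovered"
            return None
        count = self.consecutive_failures.get(node, 0) + 1
        self.consecutive_failures[node] = count
        if count >= self.failures_before_down and node not in self.down:
            self.down.add(node)
            return "down"
        return None


# Simulated probe answers for five polling cycles of one router.
answers = iter([False, False, False, False, True])
monitor = ReachabilityMonitor(lambda node: next(answers))
events = [monitor.poll("router1") for _ in range(5)]
print(events)
```

Each "down" or "recovered" event is a natural point to open or update a trouble ticket.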
Fault Management - Ticket System
Very Important!
Need mechanism to track:

• failures
• current status of outage
• carrier tickets
Fault Management - Ticket System
system provides for:
• short term memory & communication
• scheduling and work assignment
• referrals and dispatching
• oversight
• statistical analysis
• long term accountability
Fault Management - Ticket Usage
create a ticket on ALL calls
create a ticket on ALL problems
create a ticket for ALL scheduled events
copy of ticket mailed to reporter and mailing list(s)
all milestones in resolution of problem maintain the same ticket #
ticket stays "open" until problem resolved
ticket reporter determines that ticket should be closed
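These rules are simple enough to express as a data structure. The sketch below models two of them: every milestone is recorded against the same ticket number, and only the reporter can close the ticket. It is a hypothetical model for illustration, not the data model of RT or any other ticket system.

```python
from dataclasses import dataclass, field
from itertools import count

_ticket_numbers = count(1)  # simple in-memory ticket number generator


@dataclass
class Ticket:
    subject: str
    reporter: str
    number: int = field(default_factory=lambda: next(_ticket_numbers))
    status: str = "open"
    history: list = field(default_factory=list)

    def add_milestone(self, note: str) -> None:
        """Record progress on the same ticket number."""
        self.history.append(note)

    def close(self, requested_by: str) -> None:
        """Only the ticket reporter decides that the ticket should be closed."""
        if requested_by != self.reporter:
            raise PermissionError("only the reporter may close this ticket")
        self.status = "resolved"


ticket = Ticket("Link to upstream down", reporter="customer@example.net")
ticket.add_milestone("Telco contacted, carrier ticket opened")
ticket.add_milestone("Link restored, monitoring")
ticket.close(requested_by="customer@example.net")
print(ticket.number, ticket.status, len(ticket.history))
```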

Fault Management - Ticket Example
Sample opening ticket
Subject: Fix sshd on E2 instructor machines
Serial Number: 6
Area: none
Queue: afnog-noc
Requestors: pfs@cisco.com
Owner: inst
Status: resolved
Last User Contact: Wed May 10 17:02:21 2006 (12 hr ago)
Current Priority: 1
Final Priority: 1
Due: No date assigned
Last Action: Wed May 10 17:02:21 2006 (12 hr ago)
Created: Mon May 8 14:08:08 2006 (2 days ago)
Exercise: Ticket System
• RT is already installed on http://e2-noc.ws.afnog.org
• Create tickets to track network occurrences as they occur - network failures will be provided ;-)
Fault Management - typical failures
• Node unpingable
• no ip connectivity to router
• possible reasons:
– serial link down: call telco
– router down/hardware problem: call engineer
– routing problem: troubleshoot with traceroute, routeviews machine
Security Management: Do’s & Don’ts
• Don’t leave things that are likely to be interesting to mice lying on the kitchen table overnight
• Plug the holes that mice are using to get into the house
• Don’t provide places within the house for mice to build nests
• Set traps along walls where you often see mice out of the corner of your eye
• Check the traps daily to rebait them and to dispose of squashed mice. Full traps don’t catch mice, and they smell
• Avoid using commercial bait-and-kill poisons. Traditional snap traps are best.
• Get a cat!
Security Management - Tools
security tools
• cops - host configuration checker (www.cert.org)
• swatch - email reports of activity on machine
• Tcpwrappers - log connections, restrict access
• ssh/skey - crypto authentication and communications
• Tripwire - monitor changes to system files
Keep up to date with security information
• bug reports
– CERT advisories mailing list:
http://www.cert.org./contact_cert/certmaillist.html
• bug fixes
• intruder alerts
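The idea behind monitoring changes to system files can be shown with checksums: record a fingerprint of each file while the system is known to be good, then compare later. This is a simplified sketch of the concept, not how Tripwire itself is implemented or configured; the function names and the sample file are assumptions.

```python
import hashlib
import tempfile
from pathlib import Path


def fingerprint(paths):
    """Map each file path to the SHA-256 digest of its current contents."""
    return {str(p): hashlib.sha256(Path(p).read_bytes()).hexdigest() for p in paths}


def changed_files(baseline, current):
    """Return baseline paths whose digest differs or that have disappeared."""
    return sorted(p for p in baseline if current.get(p) != baseline[p])


# Demonstration on a throwaway file instead of a real system file.
with tempfile.TemporaryDirectory() as workdir:
    watched = Path(workdir) / "sshd_config"
    watched.write_text("PermitRootLogin no\n")
    baseline = fingerprint([watched])

    watched.write_text("PermitRootLogin yes\n")  # simulated tampering
    changed = changed_files(baseline, fingerprint([watched]))

print(changed)
```

The baseline is only useful if an intruder cannot rewrite it, so it is normally kept off the monitored host or on read-only media.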
Security Management – Good Practice
reporting procedure for security events
• e.g. break-ins
• abuse email address for customers to report complaints (abuse@your-isp.net)
control internal and external gateways
• control firewalls (external and internal)
security log management
• centralized logging host
• Stealth logger, so it cannot be compromised
How do I manage my network?
Which tools should I use? What do I really need?
• Keep it simple!
• Need to consider engineers working remotely
• Don’t want to spend too much time maintaining the tool (it should be helping you!)
• Different tools for NOC and engineers
• Different tools for statistics
• RELIABILITY!
References
http://www.merit.edu/ipma/docs/isp.html
http://www.nanog.org
http://www.caida.org
http://www.nlanr.net
http://www.cisco.com
http://www.amazing.com/internet/
http://www.isp-resource.com/
http://www.merit.edu/ipma
http://www.ripe.net
More Tools!
http://www.caida.org/Tools/
• OC3Mon/Coral
http://www.merit.edu/~ipma
• RouteTracker
• IRRj
• ASExplorer
http://www.geektools.com/
http://www.merit.edu/ipma/tools/other.html
SNMP Tool references
• MON - http://www.kernel.org/software/mon/
• NOCol - ftp://ftp.navya.com/pub/vikas/nocol.tar.gz
• Sysmon - ftp://puck.nether.net/pub/jared
• Rover - http://www.merit.edu/~rover
• Concord - http://www.concord.com
• http://www.merit.net/~netscarf