RedIRIS monitoring and
operational procedures
RedIRIS – Alberto Escolano Sánchez
alberto.escolano@rediris.es
Agenda
• Part I: Monitoring
• Concepts
• SNMP
• Hardware
• Tools
• Active Monitoring
Concepts
• SNMP (Simple Network Management Protocol)
– RFC 1157
– Protocol developed to manage nodes of an IP network
• UDP (User Datagram Protocol)
– RFC 768
– Most commonly used transport protocol for SNMP
• SMI (Structure of Management Information)
– RFC 1155
– RFC 2578 (version 2)
– Contains the definitions for the structure and
identification of management information for the Internet
Concepts
• MIB (Management Information Base)
– RFC 1156 (MIB-I)
– RFC 1213 (MIB-II)
– Together with SNMP and SMI, provides the architecture
for managing the Internet
• OID (Object Identifier)
– List of numbers separated by dots which identifies an
exact parameter
• NMS (Network Management System)
– Set of applications that monitor and control managed
devices
– Can be standard or vendor specific
SNMP
• Protocol used to manage network devices such as
switches, routers and servers
• Components
– NMS: Software used to monitor and control managed
devices
– SNMP agent: Management software running in the
managed device
– Network device: Network node to be managed
• SNMP uses the information provided by MIBs
• MIBs describe the structure of the management data of a
network device in a hierarchical way using OIDs
• OIDs identify variables or elements that can be read or
written via SNMP
• Network devices generate and send SNMP traps to the
management system
SNMP
• SNMP versions
– SNMPv1: Basic operations and features
– Simplicity
– Lack of security
– RFC 1157
– SNMPv2: Additional operations and features
– Several versions (SNMPv2p, SNMPv2c, SNMPv2u,
SNMPv2*)
– Improved security
– Difficult choice between versions
– e.g. SNMPv2c – RFC 1901
– SNMPv3: Security enhancement
– Uses features from several SNMPv2 versions
– Flexible way to define security methods and
parameters
– RFC 2570
SNMP
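• SNMP architecture
– NMS: runs the SNMP Manager and holds the MIBs
– L2 Switch and L3 Router: each runs an SNMP Agent with its MIBs
– SNMP Request: sent by the SNMP Manager to the SNMP Agent (UDP port 161)
– SNMP Response: returned by the SNMP Agent to the SNMP Manager (UDP port 161)
– SNMP Trap: sent by the SNMP Agent to the SNMP Manager (UDP port 162)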
SNMP
• MIB Tree structure
– Each SNMP OID represents an individual object of the
MIB
– The MIB can be broken down into a tree structure
where OIDs are leaves on the tree
root
  ccitt (0)
  iso (1)
    standard (0)
    identified organization (3)
      dod (6)
        internet (1)
          directory (1)
          mgmt (2)
            mib-II (1)
              interfaces (2)
          experimental (3)
          private (4)
          security (5)
          snmpv2 (6)
          …
      …
  joint-iso-ccitt (2)
SNMP
• First approach: How do all these things work?
– Query for inbound octets passed through an interface of
a switch in the network
– Let’s assume all the SNMP stuff is configured and
running properly
– We’ll need the MIB and OID for the SNMP query in
the hierarchy of the OIDs tree
– 1.3.6.1.2.1.2 is the OID for the interfaces related
data
– 1.3.6.1.2.1.2.2.1.10 is the OID for the ifInOctets
parameter value
– Now we need the interface index to refer to it.
Let’s assume it is 65.
– The full OID is 1.3.6.1.2.1.2.2.1.10.65
– OID translation:
– .iso.org.dod.internet.mgmt.mib-2.interfaces.ifTable.ifEntry.ifInOctets.65
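The OID translation can be sketched in a few lines of Python. This is only an illustration: the lookup table is written by hand from the tree shown in these slides (it is not a MIB parser), and the function name `translate_oid` is hypothetical.

```python
# Hand-written subset of the OID tree (an assumption for illustration;
# a real NMS loads these names from MIB files such as IF-MIB).
OID_NAMES = {
    "1": "iso",
    "1.3": "org",
    "1.3.6": "dod",
    "1.3.6.1": "internet",
    "1.3.6.1.2": "mgmt",
    "1.3.6.1.2.1": "mib-2",
    "1.3.6.1.2.1.2": "interfaces",
    "1.3.6.1.2.1.2.2": "ifTable",
    "1.3.6.1.2.1.2.2.1": "ifEntry",
    "1.3.6.1.2.1.2.2.1.10": "ifInOctets",
}


def translate_oid(numeric_oid: str) -> str:
    """Replace every known numeric prefix with its name; keep the rest numeric."""
    parts = numeric_oid.strip(".").split(".")
    labels = []
    for depth in range(1, len(parts) + 1):
        prefix = ".".join(parts[:depth])
        # Unknown sub-identifiers (for example the interface index) stay as numbers.
        labels.append(OID_NAMES.get(prefix, parts[depth - 1]))
    return "." + ".".join(labels)


print(translate_oid("1.3.6.1.2.1.2.2.1.10.65"))
```

The trailing 65 has no entry in the table, so it is passed through unchanged as the interface index.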
SNMP
• Second approach: Numeric OID conversion
– 1.3.6.1.2.1.2.2.1.10.65 is converted using IF-MIB
– IF-MIB partially detailed:
IF-MIB DEFINITIONS ::= BEGIN
IMPORTS
    MODULE-IDENTITY, OBJECT-TYPE, Counter32, Gauge32, Counter64,
    Integer32, TimeTicks, mib-2,
    NOTIFICATION-TYPE
        FROM SNMPv2-SMI
…
ifMIB MODULE-IDENTITY
    LAST-UPDATED "200006140000Z"
    ORGANIZATION "IETF Interfaces MIB Working Group"
    CONTACT-INFO
…
ifEntry OBJECT-TYPE
    SYNTAX      IfEntry
    MAX-ACCESS  not-accessible
    STATUS      current
    DESCRIPTION
        "An entry containing management information applicable to a
        particular interface."
    INDEX   { ifIndex }
    ::= { ifTable 1 }
SNMP
– IF-MIB partially detailed (cont.):
IfEntry ::=
    SEQUENCE {
        ifIndex         InterfaceIndex,
        ifDescr         DisplayString,
        ifType          IANAifType,
        ifMtu           Integer32,
        ifSpeed         Gauge32,
        ifPhysAddress   PhysAddress,
        ifAdminStatus   INTEGER,
        ifOperStatus    INTEGER,
        ifLastChange    TimeTicks,
        ifInOctets      Counter32,
        ifInUcastPkts   Counter32,
        …
ifInOctets OBJECT-TYPE
    SYNTAX      Counter32
    MAX-ACCESS  read-only
    STATUS      current
    DESCRIPTION
        "The total number of octets received on the interface,
        including framing characters.
        Discontinuities in the value of this counter can occur at
        re-initialization of the management system, and at other
        times as indicated by the value of
        ifCounterDiscontinuityTime."
    ::= { ifEntry 10 }
SNMP
• Result of the SNMP query
– The OID has a Counter32 variable, so the result of the
query is a 32-bit value stored in that variable
– e.g. a real query made to a Cisco switch:
– .1.3.6.1.2.1.2.2.1.10.65 = Counter32: 36307165
– That result translated into text using IF-MIB
– .iso.org.dod.internet.mgmt.mib-2.interfaces.ifTable.ifEntry.ifInOctets.65 =
Counter32: 36307165
• Conclusion of the results obtained
– A total of 36307165 inbound octets had passed through
the interface with index 65 of the queried network
equipment at the time of the query
– To obtain results in bps, the counter must be polled
periodically and the delta between consecutive samples
divided by the polling interval
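As a minimal sketch of that delta calculation, assuming two ifInOctets samples and a known polling interval: the function name `rate_bps` and the second sample value are invented for illustration, and only a single Counter32 wrap between samples is handled.

```python
COUNTER32_MODULUS = 2 ** 32  # a Counter32 wraps back to 0 after 2^32 - 1


def rate_bps(prev_octets: int, curr_octets: int, interval_s: float) -> float:
    """Average bits per second between two ifInOctets samples."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # The modulo absorbs one counter wrap between the two samples.
    delta_octets = (curr_octets - prev_octets) % COUNTER32_MODULUS
    return delta_octets * 8 / interval_s


# Hypothetical second sample taken 300 s after the value shown on the slide.
first_sample = 36307165
second_sample = first_sample + 3_750_000
print(rate_bps(first_sample, second_sample, 300))  # 100000.0
```

On fast links a 32-bit counter can wrap more than once within a polling interval, in which case this calculation underestimates the rate.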
Hardware
• The hardware involved in SNMP monitoring comprises all the network equipment
and servers
• RedIRIS core network
– Layer 2 switches
– Nortel MERS 8610
– Cisco Catalyst 6500
– Layer 3 routers
– Juniper T-320, M-320
– Juniper MX-480, MX-960
– Juniper M120, M40e, M20, M10i
• RedIRIS access network
– Layer 2 switches
– Juniper EX-4200
– Cisco Catalyst 2960
– Layer 3 routers
– Juniper M7i
• RedIRIS servers
– Red Hat Enterprise Linux 4.x and 5.x
– Solaris 8 and Solaris 10
Hardware
• SNMP configuration
– Network equipment (L2, L3)
– General config parameters
– SNMP version
– SNMP communities (RO, RW)
– SNMP clients
– TRAPs to send to the SNMP manager
– Source address to bind TRAP packets
– Location and contact details
– TRAP details
– Vendor specific
– Vendor MIBs in SNMP manager
– Categories
– Authentication
– Chassis
– Link
– VLANs
– Configuration
– Routing
– STP
– …
Hardware
• SNMP configuration
– Cisco IOS
– Parameters configured globally
snmp-server community public RO
snmp-server community private RW
snmp-server trap-source Vlan40
snmp-server location RedIRIS NOC; Ed. BRONCE, Pza. Manuel Gomez Moreno, s/n, 28020-Madrid
snmp-server contact RedIRIS NOC; +34 91 2127620; <noc@rediris.es>
snmp-server enable traps snmp authentication linkdown linkup coldstart warmstart
snmp-server enable traps vlancreate
snmp-server enable traps vlandelete
snmp-server enable traps config
snmp-server enable traps bridge newroot topologychange
snmp-server enable traps syslog
snmp-server host 130.206.1.39 version 2c community
snmp-server tftp-server-list 80
snmp-server chassis-id number
Hardware
• SNMP configuration
– Juniper JUNOS
– Configured in the dedicated snmp section of the configuration
snmp {
    location "Centro de Gestion de RedIRIS, C/ Serrano 142 (28006-Madrid)";
    contact "RedIRIS NOC; +34 912127620; +34 629148201; <noc@rediris.es>";
    community <community> {
        authorization read-only;
        clients {
            130.206.1.39/32;
            130.206.1.40/32;
        }
    }
    trap-options {
        source-address lo0;
    }
    /* Notifications */
    trap-group <trap-group-name> {
        version v2;
        categories {
            authentication;
            chassis;
            link;
            remote-operations;
            routing;
            startup;
            rmon-alarm;
        }
        targets {
            130.206.1.39;
        }
    }
}
Hardware
• SNMP configuration
– Servers (Solaris, Linux)
– SNMP manager used in RedIRIS (NET-SNMP)
– Both client and server features
– Used for Solaris and Linux systems
– Available for free (http://www.net-snmp.org/)
– SNMP config files
– /etc/snmp/snmpd.conf
– SNMP daemon config file
– Listening UDP port 161
#ACL
com2sec local 127.0.0.1/32 <community>
com2sec myLAN 192.168.1.0/24 <community>
#ACL assignment for RW and RO groups
group MyRWGroup v1 local
group MyRWGroup v2c local
group MyROGroup v1 myLAN
group MyROGroup v2c myLAN
# MIB tree to be queried
## name incl/excl subtree mask(optional)
view all included .1 80
#group context sec.model sec.level prefix read write notif
access MyROGroup "" any noauth exact all none none
access MyRWGroup "" any noauth exact all all all
# Contact Information
syslocation RedIRIS NOC; Ed. BRONCE, Pza. Manuel Gomez Moreno, s/n, 28020-Madrid
syscontact RedIRIS NOC; +34 91 2127620; noc@rediris.es
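The way the com2sec lines map a client address and community to a security name can be approximated as follows. This is a simplified model for illustration, not how snmpd is implemented: the sample text is modelled on the ACL lines in this slide, "secret" stands in for <community>, and both function names are hypothetical.

```python
import ipaddress

# Sample lines modelled on the slide; "secret" is a placeholder community.
SNMPD_CONF = """
com2sec local 127.0.0.1/32 secret
com2sec myLAN 192.168.1.0/24 secret
"""


def parse_com2sec(conf_text):
    """Return (security_name, network, community) tuples from com2sec lines."""
    entries = []
    for line in conf_text.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[0] == "com2sec":
            _, sec_name, source, community = fields
            entries.append((sec_name, ipaddress.ip_network(source), community))
    return entries


def security_name_for(entries, client_ip, community):
    """First security name whose source network and community both match."""
    address = ipaddress.ip_address(client_ip)
    for sec_name, network, expected in entries:
        if address in network and community == expected:
            return sec_name
    return None


entries = parse_com2sec(SNMPD_CONF)
print(security_name_for(entries, "192.168.1.20", "secret"))  # myLAN
print(security_name_for(entries, "10.0.0.1", "secret"))      # None
```

The security name returned here is what the group and access lines then use to decide between read-only and read-write views.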
Hardware
• SNMP configuration
– Servers (Solaris, Linux)
– SNMP manager used in RedIRIS (NET-SNMP)
– Both client and server features
– Used for Solaris and Linux systems
– Available for free (http://www.net-snmp.org/)
– SNMP config files
– /etc/snmp/snmptrapd.conf
– TRAP receiver daemon config file
– Listening UDP port 162
# --== SONET/SDH Alarms ==--
traphandle JUNIPER-SONET-MIB::jnxSonetAlarmSet /usr/local/bin/traptoemail -s chico.rediris.es -f monitor-red@rediris.es ops@rediris.es
traphandle JUNIPER-SONET-MIB::jnxSonetAlarmCleared /usr/local/bin/traptoemail -s chico.rediris.es -f monitor-red@rediris.es ops@rediris.es
# --== Links ==--
traphandle IF-MIB::linkUp /usr/local/bin/traptoemail -s chico.rediris.es -f monitor-red@rediris.es ops@rediris.es
traphandle IF-MIB::linkDown /usr/local/bin/traptoemail -s chico.rediris.es -f monitor-red@rediris.es ops@rediris.es
# --== BGP ==--
traphandle BGP4-MIB::bgpEstablished /usr/local/bin/traptoemail -s chico.rediris.es -f monitor-red@rediris.es ops@rediris.es
traphandle BGP4-MIB::bgpBackwardTransition /usr/local/bin/traptoemail -s chico.rediris.es -f monitor-red@rediris.es ops@rediris.es
– traphandle is used to execute a script (traptoemail)
– traptoemail is a script that processes traps and sends them
in a user-friendly format by e-mail to the RedIRIS NOC
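To give an idea of what such a handler does, here is a rough Python sketch. It is not the traptoemail script itself. It assumes the input layout described in the snmptrapd.conf manual (host name, then sender address, then one "OID value" pair per line on standard input); the function names are hypothetical and sending the mail is left out.

```python
import io


def parse_trap(stream):
    """Parse the text a traphandle program receives on standard input."""
    lines = [line.rstrip("\n") for line in stream if line.strip()]
    host, address, varbind_lines = lines[0], lines[1], lines[2:]
    varbinds = []
    for line in varbind_lines:
        # Each varbind line is "OID value"; the value may contain spaces.
        oid, _, value = line.partition(" ")
        varbinds.append((oid, value.strip()))
    return host, address, varbinds


def format_mail_body(host, address, varbinds):
    """Build a readable message body from the parsed trap."""
    rows = [f"Host: {host} ({address})"]
    rows += [f"{oid} {value}" for oid, value in varbinds]
    return "\n".join(rows)


# Sample input modelled on a linkUp trap from a RedIRIS router.
sample = io.StringIO(
    "EB-Santiago0\n"
    "130.206.204.254\n"
    "SNMPv2-MIB::snmpTrapOID.0 IF-MIB::linkUp\n"
    "IF-MIB::ifName.121 so-3/0/0\n"
)
print(format_mail_body(*parse_trap(sample)))
```

In a real handler the stream would be `sys.stdin` and the body would be handed to an SMTP client.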
Hardware
• SNMP configuration
– Servers (Solaris, Linux)
– SNMP daemons
– /etc/init.d/snmpd
– /etc/init.d/snmptrapd
– Launching options
– start
– status (for snmpd)
– stop
– restart
– reload
– Options in daemon:
– OPTIONS="-c /etc/snmp/snmptrapd.conf -o
/var/log/snmptrap.log -u /var/run/snmptrapd.pid -M
/usr/local/share/snmp/mibs/ -m ALL"
– This uses snmptrapd.conf as the config file for the
daemon, writes the log to /var/log/snmptrap.log and the PID to
/var/run/snmptrapd.pid, and loads ALL the MIBs found in the defined
path
Tools
• trap2email (the traptoemail script)
– Perl script used as an SNMP trap handler to convert SNMP traps
to e-mail messages
– Should be launched by snmptrapd as a trap handler, not run directly by a regular user
– Options
– -s smtpserver
– -f fromaddress
– toaddress
– Line in /etc/snmp/snmptrapd.conf file
traphandle IF-MIB::linkUp /usr/local/bin/traptoemail -s chico.rediris.es -f monitor-red@rediris.es ops@rediris.es
– Results
Host: EB-Santiago0 (130.206.204.254)
SNMPv2-MIB::sysUpTime.0 112:4:13:18.95
SNMPv2-MIB::snmpTrapOID.0 IF-MIB::linkUp
IF-MIB::ifIndex.121 121
IF-MIB::ifAdminStatus.121 up
IF-MIB::ifOperStatus.121 up
IF-MIB::ifName.121 so-3/0/0
SNMPv2-MIB::snmpTrapEnterprise.0 JUNIPER-CHASSIS-DEFINES-MIB::jnxProductNameM40e
Interfaz: so-3/0/0
Descripcion del interfaz: -- Conexion RedIRIS-FCCN I - Num. Adm. 1530000-1022512
Tools
• MRTG (The Multi Router Traffic Grapher)
– Tool written in Perl, licensed under the GPL and free to download from the MRTG
web site (http://oss.oetiker.ch/mrtg/)
– The tool uses SNMP to query network devices and get information from
them
– The results of the queries are stored (log or RRD)
– Those files are processed and included in an HTML file with PNG graphs
– RedIRIS uses the RRD (Round Robin Database) format to store the data collected
– Example of a graph generated with MRTG and RRD data
Tools
• MRTG basic components
– mrtg: main program
– cfgmaker: script used to generate the .cfg files that the main program needs to
generate graphs
– RRDtool: only if required. RedIRIS stores the information in RRD database
format, so RRDtool is needed
– RRDtool is a free open-source tool licensed under the GPL
– Downloadable (http://oss.oetiker.ch/rrdtool/)
• MRTG configuration
– MRTG needs .cfg files to generate the HTML web pages where the information is
displayed
– cfgmaker [options] [community@]router [[options] [community@]router ...]
– Some options available:
– --ifref=nr       interface references by Interface Number (default)
– --ifref=ip       ... by Ip Address
– --ifref=eth      ... by Ethernet Number
– --ifref=descr    ... by Interface Description
– --ifref=name     ... by Interface Name
– --ifref=type     ... by Interface Type
– --ifdesc=nr      interface description uses Interface Number (default)
– --ifdesc=ip      ... uses Ip Address
– --ifdesc=descr   ... uses Interface Description
– --ifdesc=name    ... uses Interface Name
– --ifdesc=alias   ... uses Interface Alias
– --ifdesc=type    ... uses Interface Type
Tools
• MRTG configuration
– Command used in RedIRIS
– ./cfgmaker --global "HtmlDir: /home/mrtg/datos/GAL/html" --global
"ImageDir: /home/mrtg/datos/GAL/html/image" --global "LogDir:
/home/mrtg/datos/GAL/html/log" --global "LogFormat: rrdtool" --global
"PathAdd: /usr/bin/" --global "Options[_]: growright, bits" --snmpoptions=:::::2 <community>@eb-santiago0
HtmlDir: /home/mrtg/datos/GAL/html
ImageDir: /home/mrtg/datos/GAL/html/images
LogDir: /home/mrtg/datos/GAL/html/log
LogFormat: rrdtool
PathAdd:/usr/bin/
#WorkDir:/home/noc/mrtg/html/GAL
Refresh:300
Language: Spanish
Forks: 4
RunAsDaemon:Yes
Interval:5
Background[_]: #e8e7dc
#---------------------------------------------------------------
YLegend[cesga]: Bits por segundo
Options[cesga]: growright, bits
Target[cesga]: /130.206.204.21:<community>@eb-santiago0.rediris.es:::::2
MaxBytes[cesga]: 312500000
Title[cesga]: Línea de acceso CESGA
PageTop[cesga]:
<TABLE>
<TR><TD>Línea:</TD><TD>GigabitEthernet 1000 Mbps</TD></TR>
<TR><TD>Sistema:</TD><TD>EB-Santiago0</TD></TR>
<TR><TD>Administrador:</TD><TD>NOC de RedIRIS; +34-91 212 76 20/25; <noc@rediris.es></TD></TR>
</TABLE>
#---------------------------------------------------------------
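The Target line packs several fields into one string. The sketch below splits it apart, assuming the layout `/interface-ip:community@router:port:timeout:retries:backoff:version`, which is how the trailing `:::::2` (empty defaults, SNMP version 2) is read here. The function name `parse_target` is hypothetical and "secret" stands in for <community>.

```python
def parse_target(target: str) -> dict:
    """Split an MRTG Target string into its named parts."""
    interface, _, rest = target.partition(":")
    community, _, router_part = rest.partition("@")
    router, *options = router_part.split(":")
    names = ["port", "timeout", "retries", "backoff", "version"]
    # Missing or empty option fields are reported as None (use the default).
    options += [""] * (len(names) - len(options))
    result = {
        "interface_ip": interface.lstrip("/"),
        "community": community,
        "router": router,
    }
    result.update({name: (value or None) for name, value in zip(names, options)})
    return result


parsed = parse_target("/130.206.204.21:secret@eb-santiago0.rediris.es:::::2")
print(parsed["interface_ip"], parsed["router"], parsed["version"])
```

The leading `/130.206.204.21` selects the interface by its IP address rather than by its interface number.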
Tools
• MRTG results
Tools
• MRTG organization in RedIRIS
– Each RedIRIS node has its own cfg file
– MRTG statistics divided in several groups
– RedIRIS10 links
– External links
– Multicast statistics
– BGP peerings
– Monthly statistics
– Yearly statistics
– RedIRIS Central Services
– Special Projects links
– Access statistics
– Alphabetically ordered by Institution
Tools
• Weathermap
– Combination of several files to generate the map
– SVG map for output
– XML file with the status of the network
– PNG files to display in a web page
Tools
• Nagios
– Open Source monitoring tool licensed under GPL
– Free to download (http://www.nagios.org/)
– Prerequisites needed to install the tool
– HTTP server (Apache)
– GCC compiler to build the binaries from source
– GD development libraries
– On Fedora Linux, for example, all packages can be installed with
yum
yum install httpd
yum install gcc
yum install glibc glibc-common
yum install gd gd-devel
– Download and install Nagios and Nagios Plugins
– Nagios Plugins are needed to check the status of hosts and services
– HTTP, POP3, FTP, SSH, NTP…
– CPU Load, Disk Usage, Memory Usage, Users…
– Servers and Hosts (Unix/Linux, Windows)
– Routers, Switches
– …
Tools
• Nagios configuration
– Main Configuration File
– /usr/local/nagios/etc/nagios.cfg
– File read by daemon and CGIs
– Default file OK for starting
– Resource Files
– Used to store user defined macros
– Referenced in nagios.cfg
– Object Definition Files
– Used to define hosts, services and
everything to be monitored
– Used to define HOW hosts are
monitored
– Referenced in nagios.cfg
– CGI Configuration File
– Used to define directives that affect
the operation of CGIs
– Referenced in nagios.cfg
Tools
• Nagios configuration examples
– Main Configuration File – nagios.cfg
– The default file after installation is fine for getting started with the tool
– Resource Files
– Optional, and useful to store usernames, passwords or paths
– See resource.cfg file in the sample-config directory of the Nagios
installation package
– Object Definition Files
– Defined in nagios.cfg: cfg_file=<file_name>
cfg_file=/usr/local/nagios/etc/hosts.cfg
cfg_file=/usr/local/nagios/etc/services.cfg
cfg_file=/usr/local/nagios/etc/commands.cfg
– Example hosts.cfg file
define host{
    use                     generic-host
    host_name               chico.rediris.es
    alias                   Chico
    address                 130.206.1.3
    check_command           check-host-alive
    max_check_attempts      10
    notification_interval   120
    notification_period     24x7
    notification_options    d,u,r
}
– CGI Configuration File
– cgi.cfg file located in the config directory
authorized_for_system_information=nagiosadmin
authorized_for_configuration_information=nagiosadmin
authorized_for_system_commands=nagiosadmin
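The structure of an object definition such as the hosts.cfg example can be made concrete with a small parser. This is only a sketch for reading simple `define host{...}` blocks; it is not how Nagios itself loads its configuration, and `parse_hosts` is a hypothetical helper. The notification option letters are expanded as d = down, u = unreachable, r = recovery.

```python
import re

HOSTS_CFG = """
define host{
    use                     generic-host
    host_name               chico.rediris.es
    alias                   Chico
    address                 130.206.1.3
    check_command           check-host-alive
    max_check_attempts      10
    notification_interval   120
    notification_period     24x7
    notification_options    d,u,r
}
"""

# Meaning of the host notification option letters used above.
HOST_NOTIFICATION_STATES = {"d": "down", "u": "unreachable", "r": "recovery"}


def parse_hosts(cfg_text):
    """Return one dict of directive -> value per define host{...} block."""
    hosts = []
    for body in re.findall(r"define\s+host\s*\{(.*?)\}", cfg_text, flags=re.DOTALL):
        host = {}
        for line in body.splitlines():
            line = line.split(";", 1)[0].strip()  # drop trailing ';' comments
            if line:
                parts = line.split(None, 1)
                host[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
        hosts.append(host)
    return hosts


host = parse_hosts(HOSTS_CFG)[0]
states = [HOST_NOTIFICATION_STATES[o] for o in host["notification_options"].split(",")]
print(host["host_name"], host["address"], states)
```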
Tools
• Nagios running
Tools
• NagVis
– NagVis is a visualization addon for Nagios
– Free GPL software (http://www.nagvis.org/)
– Objects placed in maps updated periodically
– Maps organized:
– Geographically
– Physically
– Logically
– By processes
– NagVis collects the information from backends
– Default backend delivered with NagVis: NDO (Nagios Data Out) MySQL Backend
– All objects from Nagios can be added to NagVis
– Each map has its own configuration file
Tools
• NagVis deployment in RedIRIS
Active Monitoring
• Until now, all the monitoring covered has been passive monitoring
– Passive monitoring means devices are periodically
polled to collect data
• Active Monitoring – What is it?
– Active requires “action”
– Active monitoring means injecting packets into the
network to run tests and get results
– Throughput
– Delay
• Active Monitoring – How to do it?
– In RedIRIS we are currently deploying perfSONAR
(PERFormance Service Oriented Network monitoring
ARchitecture)
– Information and downloads (http://www.perfsonar.net/)
– DANTE vs Internet2 version
– JAVA vs Perl
Active Monitoring
• perfSONAR components
– Client / Server application
– Client-side - perfSONAR UI (User Interface)
– Server-side
– 1 Linux box for throughput measurements (BWCTL)
– 1 Linux box for delay measurements (OWAMP)
– Server installation
– Red Hat Enterprise Linux 5.3 recommended
– May run on any Linux distribution
– Tested by RedIRIS on CentOS Linux 5.3
– Set of tools available as RPM binaries and TGZ sources
– Some dependencies not resolved
– It is not expensive, but it is hard to deploy
– Client installation
– Multi-platform Java graphical client available
Active Monitoring
• perfSONAR UI in action
Active Monitoring
• perfSONAR services
– Measurement Point Service
– It creates and/or publishes monitoring information related to active or
passive measurements
– Measurement Archive Service
– It stores and publishes the information received from Measurement Point
Services
– Transformation Service
– It provides the capability to manipulate the stored data of the
measurements performed
– Lookup Service
– Used to discover services and other LS
– Topology Service
– Makes network topology information available to other
services
– Finds closest MP
– Provides information of network topology to the visualization tools
– Authentication Service
– Controls access to services
Active Monitoring
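• perfSONAR Client interaction
– Diagram elements: Client, gLS, Network A (nodes a, b, c; LS A; MA A), Network B (nodes d, e, f; LS B; MA B)
– Client to gLS: Where to get info from Networks A and B?
– gLS to Client: LS A, LS B
– Client to LS A: Link utilization – IPs a, b, c?
– LS A to Client: a, b, c: Net A, MA A
– Client to MA A: Get link a, b, c utilization
– MA A to Client: Response
– Client: Graph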
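The lookup sequence in this slide (ask the gLS for the right LS, ask that LS for the right MA, then fetch the data from the MA) can be imitated with plain dictionaries. Everything in this sketch is made up for illustration: the registries, the utilization values and the function name do not correspond to any real perfSONAR API.

```python
# Toy registries standing in for the services in the diagram; contents are invented.
GLOBAL_LS = {"Network A": "LS A", "Network B": "LS B"}
LOOKUP_SERVICES = {
    "LS A": {"a": "MA A", "b": "MA A", "c": "MA A"},
    "LS B": {"d": "MA B", "e": "MA B", "f": "MA B"},
}
ARCHIVES = {
    "MA A": {"a": 0.42, "b": 0.10, "c": 0.73},  # hypothetical utilization values
    "MA B": {"d": 0.05, "e": 0.31, "f": 0.64},
}


def link_utilization(network, addresses):
    """Follow the gLS -> LS -> MA chain and return utilization per address."""
    ls_name = GLOBAL_LS[network]            # 1. gLS says which LS covers the network
    lookup_service = LOOKUP_SERVICES[ls_name]
    result = {}
    for address in addresses:
        ma_name = lookup_service[address]   # 2. LS says which MA holds the data
        result[address] = ARCHIVES[ma_name][address]  # 3. MA returns the measurement
    return result


print(link_utilization("Network A", ["a", "b", "c"]))
```

The client would then graph the returned values.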
Active Monitoring
• perfSONAR tools
– OWAMP (One Way Active Measurement Protocol)
– Daemon that runs one-way latency tests
– Provides:
– More accurate picture of the performance degradation
(shows the direction of the degradation, is more sensitive to jitter)
– Vision of the routing (hops, one-way latency)
– Availability information
– Temporal reference about problems
– BWCTL (BandWidth test ConTroLler)
– Daemon that runs iperf tests with multiple instances support
– Provides:
– Troubleshooting tool, because it uses the network the
same way a user would
– Archiving of the tests performed, with the traffic limit
reached
– More tools
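To illustrate the kind of figures a one-way latency test yields, the sketch below summarises delay, jitter and loss from per-packet timestamps. It is not OWAMP itself: it assumes both hosts have synchronised clocks, timestamps in milliseconds, and a lost packet recorded as None. The jitter definition used (mean absolute difference between consecutive delays) is one simple choice among several.

```python
from statistics import mean


def one_way_stats(send_times_ms, recv_times_ms):
    """Summarise one-way delay from per-packet send/receive timestamps."""
    delays = [r - s for s, r in zip(send_times_ms, recv_times_ms) if r is not None]
    lost = sum(1 for r in recv_times_ms if r is None)
    if not delays:
        return {"lost": lost}
    if len(delays) > 1:
        # Mean absolute difference between consecutive delays.
        jitter = mean(abs(b - a) for a, b in zip(delays, delays[1:]))
    else:
        jitter = 0.0
    return {
        "min": min(delays),
        "mean": mean(delays),
        "max": max(delays),
        "jitter": jitter,
        "lost": lost,
    }


# Four probe packets sent one second apart; the third one never arrives.
print(one_way_stats([0, 1000, 2000, 3000], [10, 1012, None, 3011]))
```

Running the same measurement in the opposite direction gives the second one-way figure, which is what lets the direction of a degradation be identified.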
Active Monitoring
• Spanish LHC architecture
Active Monitoring
• perfSONAR web-services (LS web admin interface)
Active Monitoring
• perfSONAR web-services (LS Basic Configuration)
Agenda
• Part II: Operational Procedures
• Organization
• Incidents
• Maintenance works
• 24x7
• SLAs
• Procedure
Organization
• RedIRIS NOC is structured in levels
– Level 1
– Initial response team
– Monitoring network devices in real time
– Answering ops mailbox and level 1 queue
– Answering customer phone calls
– First approach to solve problems
– Dealing with carriers directly
– External company support
– Level 2
– Second level response team
– Answering noc mailbox and level 2 queue
– Supporting more complex network problems
– Dealing with vendors
– RedIRIS people
– External company support
Incidents
• Incidents are reported in several ways
– Tickets tool
– Web interface tool where all incidents are queued
– Main level 1 and level 2 team support tool
– e-mail
– RedIRIS ops and noc mailboxes
– Customer support mailboxes
– Problem reports from network devices
– Telephone
– Customers also contact level 1 by phone
– Monitoring tools
– The whole monitoring platform reports incidents in the network
– Level 1 continuously checks the monitoring tools
– Logs
– The logs of all machines are stored and processed when
problems are detected
Maintenance works
• Different possibilities
– Work scheduled by the network operator
– Notified 15 days in advance
– Accepted by RedIRIS
– Work scheduled by RedIRIS
– Engineering tasks
– Maintenance tasks
– New service configuration
– Unscheduled work
– Due to unexpected problems
– Network links (fiber cuts, etc.)
– Network equipment (hardware problems)
• Ticket system notification for all Institutions connected to RedIRIS
– Web based tool used to notify and update information about
network problems
– Notifications via e-mail
24x7
• External company 24x7x365 monitoring
– Support when RedIRIS staff are not in the office
– Procedures to monitor all RedIRIS equipment
– Procedures to open/close RMAs
– Hardware replacement procedures established
– Interaction with network operators and hardware vendors
• What they can do on the equipment
– Execute “show” commands for monitoring
– Receive SNMP trap notifications
– Console login for hardware replacements
• What they can NOT do on the equipment
– Execute “config” commands
– Modify running configuration
– Configure new services
SLAs
• Network Operators SLA
– Maintenance work MUST be notified 15 days in advance
– If this is not done then a penalty is applied
– The stability and quality of the links must be guaranteed
– No degradation
– No outages
– There is a penalty for link failures longer than 10 seconds
– There is a maximum incident response time established
– Incremental penalty for repeated failures of the same link
• External company SLA
– Dedicated people guaranteed
– Maximum incident response time
– Hardware stock available
• Hardware vendor SLA
– 4-hour hardware replacement guaranteed
– Engineering support
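Two of the network operator conditions are simple enough to check automatically from ticket timestamps. The sketch below only encodes the two thresholds stated in this slide (15 days of notice, failures longer than 10 seconds); the function names and sample dates are hypothetical and no penalty amounts are modelled.

```python
from datetime import datetime, timedelta

NOTICE_PERIOD = timedelta(days=15)        # advance notice required for maintenance
OUTAGE_THRESHOLD = timedelta(seconds=10)  # failures longer than this are penalised


def maintenance_notice_ok(notified_at, work_starts_at):
    """True when the work was announced at least 15 days in advance."""
    return work_starts_at - notified_at >= NOTICE_PERIOD


def outage_penalised(link_down_at, link_up_at):
    """True when the link failure lasted longer than 10 seconds."""
    return link_up_at - link_down_at > OUTAGE_THRESHOLD


# Invented sample timestamps.
print(maintenance_notice_ok(datetime(2010, 3, 1), datetime(2010, 3, 20)))
print(outage_penalised(datetime(2010, 3, 20, 2, 0, 0), datetime(2010, 3, 20, 2, 0, 8)))
```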
Procedure
• Incidents reported via Trouble Ticket tool
Procedure
• Web or e-mail managed incidents
Procedure
• New ticket creation – can also be done by e-mail
Procedure
• All new incidents are included in the Trouble Ticket system
– e-mail notifications
– phone calls
– Incidents reported by monitoring tools
– New service deployment
• All incidents are stored in a MySQL database
– Reports
– Statistics
– Tracing
• Level 1 to Level 2 escalation
Procedure
• Network outage notifications
– Same tool used
Procedure
• Results – Network tickets opened
Procedure
• Results – Network ticket tracing
Questions?