Ethernet OAM

Victor Olifer (JANET/GEANT JRA1 Task 1)
JRA1/TERENA workshop, Copenhagen, 20 November 2012
connect • communicate • collaborate
Agenda
Ethernet Service Assurance & Monitoring overview
- Monitoring standards
- Service assurance standards
Service assurance lab trials
CFM/Y.1731 trial
- Multi-domain testbed
- OAM agent boxes
- CyPortal
JRA1 & JRA2 trial (Year 4 extension)
- Multi-segment connections
- Diverse equipment
- perfSONAR extensions
Wide-area point-to-point Ethernet connections
Ethernet
Ethernet over MPLS
Ethernet over Transport
Multi-segment multi-domain connection with:
- Ethernet UNI (a must);
- segments of pure Ethernet (optional);
- segments where Ethernet is tunnelled over some other technology, e.g. TDM (SDH, OTN) or MPLS (optional)
Where can we find such connections?
- GEANT Plus, JANET Lightpath: demand is from big projects, large scientific centres
- Inter-router connections
- An offer from commercial providers: they had 20% revenue growth in 2010 over 2009. Mobile
backhaul and multi-site corporates are major users; the reasons – price and flexibility
- New demand for academic providers might arise from such areas as cloud services, data centres,
HD videoconferences, multi-site university connections
Problems with managing Ethernet connections
Until recently Ethernet had no OAM tools (hence the cheapest equipment) -> no way to check, monitor and troubleshoot connectivity and performance end-to-end (a customer view) or within a domain (a provider view).
E.g. compared with IP: no ping, traceroute or ICMP diagnostic messages available.
Partial solution: we can use MPLS or SDH/OTN OAM to manage tunnels
Good news: Ethernet OAM functions have been developed and implemented in equipment since 2007-8
Bad news: We (JANET) don’t have much experience in Ethernet OAM use. The same situation applies in other NRENs (as far as I know from GEANT3 participants).
Three areas of emerging Ethernet OAM standards
• Service assurance: checks whether a connection performs to its specs, e.g. up to CIR and EIR, after service configuration and activation.
• Service monitoring: periodic checks of connection connectivity (continuity) and performance (delay, loss, throughput, availability).
• Service troubleshooting: when monitoring shows a fault, one needs to locate the faulty point along the path and the possible reason(s) for the failure.
Service Assurance (1)
1. Service definitions
(topology: e.g. point-to-point, bandwidth
profile: CIR, EIR for several CoS):
• MEF 10.2
• ITU-T G.8011
Very important, as this is often a cause of confusion:
e.g. CIR might be measured for UDP payload or for Ethernet frames,
which gives very different figures for the same data flow
2. Service performance parameters (delay, loss,
throughput, availability):
• MEF 10.2.1
• Y.1563
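The CIR confusion noted under item 1 can be made concrete with a small calculation. This is a minimal sketch, assuming untagged Ethernet frames carrying UDP over IPv4 without options; the function name is illustrative and is not taken from MEF 10.2 or G.8011.

```python
# Per-frame overheads in bytes (standard header sizes).
ETH_HEADER = 14    # destination MAC, source MAC, EtherType
ETH_FCS = 4        # frame check sequence
IPV4_HEADER = 20   # IPv4 header without options
UDP_HEADER = 8


def udp_payload_rate(frame_rate_bps: float, frame_size: int) -> float:
    """UDP payload rate of a flow whose rate is given in Ethernet frame bits.

    frame_size is the full frame length in bytes, MAC header to FCS.
    """
    payload = frame_size - ETH_HEADER - ETH_FCS - IPV4_HEADER - UDP_HEADER
    if payload <= 0:
        raise ValueError("frame too small to carry any UDP payload")
    return frame_rate_bps * payload / frame_size


# The same 100 Mbit/s flow of Ethernet frames, expressed as UDP payload.
for size in (64, 512, 1518):
    rate = udp_payload_rate(100e6, size)
    print(f"{size:5d}-byte frames: {rate / 1e6:5.1f} Mbit/s of UDP payload")
```

With 64-byte frames only 18 of every 64 bytes are UDP payload, so the two ways of quoting CIR differ by more than a factor of three for the same flow.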
Service Assurance (2)
3. Service verification
Relatively new (summer 2011) ITU-T spec Y.1564
“Ethernet service activation test methodology”
• Defines a simple, disruptive, on-demand procedure
that tests connectivity and throughput up to CIR, EIR and the policing limit
by injecting traffic into a connection
• More suitable for Ethernet than the complex and IP-centric RFC 2544;
implemented in many traffic generators and boxes
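The load ramp that such an activation test walks through can be sketched as a list of offered rates. This is a minimal sketch only: the 25/50/75/100% CIR steps and the 125% EIR overshoot used for the policing step are assumed here as typical tester settings and should be checked against Y.1564 and the tester in use. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    name: str
    offered_mbps: float
    loss_tolerated: bool  # True where dropped frames do not fail the step


def activation_test_plan(cir: float, eir: float) -> List[Step]:
    """Build an illustrative step-load plan for one service (rates in Mbit/s)."""
    # Ramp up to CIR: committed traffic must pass without loss.
    steps = [Step(f"CIR {pct}%", cir * pct / 100, False) for pct in (25, 50, 75, 100)]
    # Excess traffic between CIR and CIR+EIR carries no delivery guarantee.
    steps.append(Step("CIR+EIR", cir + eir, True))
    # Overshoot: the policer is expected to clip the flow to about CIR+EIR.
    steps.append(Step("policing", cir + 1.25 * eir, True))
    return steps


for step in activation_test_plan(cir=100.0, eir=50.0):
    print(f"{step.name:10s} offer {step.offered_mbps:6.1f} Mbit/s, "
          f"loss tolerated: {step.loss_tolerated}")
```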
Service Assurance trials
JANET lab trial of SunRise RxT tester
Positive impression: works according to the standard,
looks worth trying in wide-area tests
Tester PIR
Box PIR=CIR+EIR
CIR
Just one problem: Y.1564 doesn’t give an opportunity
to detect the situation when the real PIR value is set lower
than expected (not a box bug, just the standard’s intention)
Service Monitoring
• IEEE 802.1ag Connectivity Fault Management (CFM) (ratified in 2007):
- Hierarchical sessions of heartbeat messages
(Continuity Check Messages, CCM) -> up/down status check
- VLAN-aware
- MEP (End) and MIP (Intermediate) maintenance points
• ITU-T Y.1731 (ratified in 2008):
Same as CFM + performance monitoring (delay, loss, throughput)
Customer maintenance session level 7
Service provider maintenance session level 5
Operator maintenance sessions level 3
Service Troubleshooting
• CFM:
- Linktrace (analogous to IP traceroute)
- Loopback (analogous to IP ping)
- RDI (Remote Defect Indication)
• Y.1731:
- same as CFM + a richer set of diagnostic messages + performance
monitoring (loss, delay, throughput):
- Alarm Indication Signal (AIS)
- Lock Signal
- …
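For orientation, the Loopback request behind the "ping" analogy is a small PDU carried in an Ethernet frame with EtherType 0x8902. Below is a minimal sketch of packing a Loopback Message (LBM) payload following the 802.1ag common header layout. The function name and the chosen values are illustrative, and nothing is sent on the wire.

```python
import struct

CFM_ETHERTYPE = 0x8902  # EtherType that carries CFM and Y.1731 PDUs
OPCODE_LBM = 3          # Loopback Message; the reply (LBR) uses opcode 2


def build_lbm(md_level: int, transaction_id: int) -> bytes:
    """Pack a CFM Loopback Message PDU (without the Ethernet header)."""
    if not 0 <= md_level <= 7:
        raise ValueError("MD level must be in the range 0..7")
    version = 0
    first_octet = (md_level << 5) | version  # 3-bit level, 5-bit version
    flags = 0
    first_tlv_offset = 4                     # TLVs start after the 4-byte transaction ID
    header = struct.pack("!BBBB", first_octet, OPCODE_LBM, flags, first_tlv_offset)
    end_tlv = b"\x00"                        # End TLV closes the PDU
    return header + struct.pack("!I", transaction_id) + end_tlv


pdu = build_lbm(md_level=5, transaction_id=1)
print(f"EtherType 0x{CFM_ETHERTYPE:04x}, LBM payload {pdu.hex()}")
```

The maintenance level sits in the top three bits of the first octet, which is what lets each box decide cheaply whether a CFM frame belongs to its own level.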
Service monitoring trials
JRA1 Task 1 Ethernet OAM trial (2011):
- 5 NRENs, 5 connections monitored for 6 months
- Small Y.1731 agent boxes from Overture
- CyPortal from Cyan Optics for storing and visualising monitoring data
Positive results, but only for single-segment connections
Combined JRA1 Task 1 & JRA2 Task 3 Service Assurance & Monitoring trial,
GN3 Year 4 (2012-2013): ongoing
JRA1 Ethernet OAM trial (2011) objectives
• Test CFM/Y.1731 functions in a multi-domain and multi-vendor environment
(5 connections)
• Evaluate Y.1731 agent boxes
• Evaluate OAM data visualisation system (CyPortal)
[Figure: trial testbed. Sites: Essex Uni, JANET LH, NORDUnet, SURFnet, CESNET, PIONIER (PSNC). OAM data from a Collector feeds the Cyan OAM portal (cloud service). Legend: equipment under test; OAM agent (Overture ISG24); monitored VLAN connections.]
OAM agent options
Dedicated extra network switch with advanced OAM capabilities
• Pros: uniform, rich OAM functionality and a consistent source of monitoring data
• Cons: overheads of extra boxes (added complexity, cost, especially for high-speed
links, maintenance etc.)
OAM capabilities of existing network boxes: routers, switches, muxes
• Pros: no extra equipment, ability to test internal segments
• Cons: some vendor-specific features, e.g. in CFM MIBs, giving a diverse environment
with possible incompatibilities
Software OAM agent on a dedicated server (e.g. ‘dot1ag-utils’ developed by
SARA and presented by Ronald van der Pol at NORDUnet 2011)
• Pros: end users can ping and trace network elements; no switches needed
• Cons: currently limited to down MEP functionality, performance depends on
server performance, time precision might be an issue
ISG24 OAM agent box trial
• Compact 4-port GE demarcation box, low cost (~$1000)
• 2 copper GE and 2 SFP ports (there is a 10GE version)
• Web GUI
• OAM functions:
- CFM
- Y.1731 D(elay)MM and L(oss)M
- RFC 2544
- PAA: proprietary analogue of Y.1731
- Ethernet in the First Mile, 802.3ah
ISG24 CCM (continuity) tests
• Positive results: properly detected the Up/Down state of all 5 connections
by permanent monitoring over 6 months
• Compact web form
• Detailed web form
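The Up/Down decision behind these tests rests on a timeout: in 802.1ag a remote MEP is considered lost when no CCM has arrived from it within 3.5 transmission intervals. Below is a minimal sketch of that rule. The class and method names are hypothetical and do not describe the ISG24 implementation.

```python
from typing import Optional


class RemoteMepMonitor:
    """Tracks one remote MEP and reports loss of continuity."""

    LOSS_MULTIPLIER = 3.5  # CCM intervals without a CCM before declaring loss

    def __init__(self, ccm_interval_s: float) -> None:
        self.timeout_s = self.LOSS_MULTIPLIER * ccm_interval_s
        self.last_seen: Optional[float] = None

    def ccm_received(self, now: float) -> None:
        """Record the arrival time of a CCM from the remote MEP."""
        self.last_seen = now

    def is_up(self, now: float) -> bool:
        """True while the last CCM is younger than the loss timeout."""
        if self.last_seen is None:
            return False  # nothing heard yet
        return (now - self.last_seen) < self.timeout_s


monitor = RemoteMepMonitor(ccm_interval_s=1.0)
monitor.ccm_received(now=10.0)
print(monitor.is_up(now=12.0))  # 2.0 s since the last CCM
print(monitor.is_up(now=14.0))  # 4.0 s since the last CCM
```

A shorter CCM interval detects failures faster at the price of more OAM traffic per session.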
ISG24 DMM (performance) tests
• Mostly positive results: CFM and PAA Delay Measurement sessions
showed stable one-way and two-way delay and jitter results,
close to those expected from other sources
Janet – NORDUnet PAA results:
PSNC – CESNET CFM DMM results:
• We experienced some problems with CFM one-way delay measurements
on two connections; these are covered after the CyPortal slides
CyPortal: monitoring data storage and visualisation
• Detailed monitoring data are collected from ISG24 agent boxes and stored
in a cloud-based database
• Web GUI provides a map of all services, showing in red those whose current
parameters violate the SLD
CyPortal: per-service data
• Historical graphical presentation of all parameters under monitoring
• Zooming of a selected time period
• Setting of SLA limits
• Flexible reports
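Setting SLA limits amounts to comparing each monitored sample against per-service thresholds. Below is a minimal sketch with hypothetical parameter names and limit values. It does not reflect CyPortal's data model or API.

```python
from typing import Dict, List


def violations(sample: Dict[str, float], limits: Dict[str, float]) -> List[str]:
    """Return the names of parameters whose value exceeds its configured limit.

    Parameters missing from the sample are treated as not violating.
    """
    return sorted(
        name for name, limit in limits.items()
        if name in sample and sample[name] > limit
    )


# Hypothetical limits and one hypothetical measurement for a single service.
limits = {"two_way_delay_ms": 30.0, "jitter_ms": 2.0, "frame_loss_pct": 0.1}
sample = {"two_way_delay_ms": 23.0, "jitter_ms": 3.5, "frame_loss_pct": 0.0}

bad = violations(sample, limits)
print("RED" if bad else "OK", bad)
```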
Problems encountered
1. Saw-tooth shape of delay between JANET LH and Essex Uni
Level 5 DMM session
• There was no reason for the saw-tooth shape of the two-way delay,
with peaks of about 1 sec, shown by the Level 5 MEP (ISG24 box)
Level 3 DMM session
• Capturing and analysing traffic before and after the Level 3 MEP
(Ciena 311v box) identified the ‘guilty’ box:
• The Level 3 MEP time-stamped the packets of the Level 5 MEP instead of
forwarding them transparently: definitely a bug in the box software
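The behaviour expected from the Level 3 box follows the CFM level rule: a MEP processes frames of its own level, passes higher-level frames through untouched, and blocks lower-level ones. Below is a minimal sketch of that rule. The function name and return strings are illustrative.

```python
def mep_action(mep_level: int, frame_level: int) -> str:
    """Decide what a MEP at mep_level should do with a CFM frame at frame_level."""
    if frame_level > mep_level:
        return "forward unchanged"  # belongs to an outer (higher) session
    if frame_level == mep_level:
        return "process"            # this MEP's own session
    return "drop"                   # lower-level frames must not leak through


# The case from the trial: a Level 5 DMM frame crossing a Level 3 MEP.
print(mep_action(mep_level=3, frame_level=5))
```

Writing a timestamp into the Level 5 frame counts as processing it, which is why the Level 5 delay figures were distorted.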
Problems encountered (cont.)
2. Inability of ISG boxes to measure CFM one-way delay on some
connections (LH-Copenhagen, LH-Essex)
PAA: OWD = 10.903, TWD = 23.004
CFM DMM: OWD = ----, TWD = 23.004
ISG vendor’s explanation: synchronization too poor to calculate CFM OWD
This seems not to be true: why is it good enough for the proprietary PAA?
Needs further investigation!
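Why synchronization matters for one-way but not for two-way delay can be seen from the four timestamps of a DMM/DMR exchange. Below is a minimal sketch using the usual Y.1731 timestamp roles. The numbers are invented for illustration and are not trial data, and the sketch does not settle the vendor question above.

```python
def two_way_delay(t1: float, t2: float, t3: float, t4: float) -> float:
    """Round-trip delay minus responder processing time.

    t1: DMM sent (initiator clock)   t2: DMM received (responder clock)
    t3: DMR sent (responder clock)   t4: DMR received (initiator clock)
    Each difference uses a single clock, so a clock offset cancels out.
    """
    return (t4 - t1) - (t3 - t2)


def one_way_delay_forward(t1: float, t2: float) -> float:
    """Forward delay; mixes two clocks, so it includes any clock offset."""
    return t2 - t1


# Invented example in ms: 11.5 each way, 0.2 processing at the responder,
# responder clock running 4.0 ahead of the initiator clock.
offset = 4.0
t1 = 1000.0
t2 = t1 + 11.5 + offset
t3 = t2 + 0.2
t4 = (t3 - offset) + 11.5

print(f"two-way delay: {two_way_delay(t1, t2, t3, t4):.1f} (true value 23.0)")
print(f"one-way delay: {one_way_delay_forward(t1, t2):.1f} (true value 11.5)")
```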
JRA1 Ethernet OAM trial conclusions
• Ethernet OAM functions embedded in carrier-grade Ethernet
equipment are mature enough to be used for effective monitoring of
the health and performance of wide-area Ethernet services from both
customer and provider perspectives
• The use of dedicated Ethernet demarcation boxes with a rich set of
OAM functions (Overture ISG and Accedian MetroNID) proved to be an
effective way of monitoring Ethernet services end-to-end
• Visualization and data storage software like CyPortal is a very useful
element for providing managed Ethernet services
• We managed to monitor only single-segment connections end-to-end; there is still more to try
Year 4 JRA1/JRA2 Service Assurance & Monitoring trial
Trial objectives:
• To continue the previous trial, extending the investigation to:
- multi-segment connections with hierarchical monitoring
- troubleshooting
- use of embedded CFM/Y.1731 functions in carrier-class equipment
(such as Cisco, Juniper, Extreme, Brocade, Alcatel etc.)
• To support new Ethernet OAM functionality in perfSONAR software:
- perfSONAR protocol and topology extensions to support Eth OAM data
(data storing, searching and fetching)
- use of the existing GN3 perfSONAR implementation (perfSONAR MDM) with
the needed changes
- standardization under the OGF NMC/NM umbrella
Trial term: 1 year, ending in March 2013
GN3 Year 4 testbed
[Figure: GN3 Year 4 testbed. Labels: Bristol Uni, NORDUnet core, JANET LH, Collector, GEYSERS, NORDUnet testbed, PSNC, 1000, TNO (NL), SARA, SURFnet, CESNET.]
Legend:
- Y.1731 agent box (ISG24 from Overture)
- Y.1731-enabled equipment of the trial participants
- non-Y.1731-enabled equipment of the trial participants
Multi-segment Janet – NORDUnet service
[Figure: devices shown: Janet ISG24 (193.63.63.133), Janet testbed Ciena 5305, NORDUnet testbed ALU 1850 TSS, NORDUnet ISG24 (109.105.113.183)]
Maintenance sessions shown:
- Customer, level 5, MA/MEG=“jan-nor-400”
- Provider, level 4, MA/MEG=“jan-nor-400-4” (doesn’t exist yet)
- Operator, level 2, MA/MEG=“janet-400-2”
- Operator, level 2, MA/MEG=“nor-400-2” (doesn’t exist yet)
- Inter-node, level 0, MA/MEG=“isg-ciena-400-0”
- Inter-operator, level 0, MA/MEG=“jan-nor-400-0” (doesn’t exist yet)
- Inter-node, level 0, MA/MEG=“tss-isg”
Numeric labels in the figure: 134, 314, 144, 1152, 1153, 1101, 1102, 344, 1, 3, 2, 1
Multi-segment tests
• Evaluation of different hierarchical schemes:
- Shared levels (same VLAN ID for domains)
- Independent levels (C-VID, S-VID)
• Testing different ways of visualizing the hierarchical monitoring
information for different types of users: NOC engineers, end users.
• Location of a failure by:
- using a hierarchy of CCM sessions;
- using Linktrace protocol and MIPs
Different types of faults should be emulated:
• Link failure
• Port failure
• Route loops
• VLAN mismatch
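Locating a failure with a hierarchy of CCM sessions can be sketched as follows: when the end-to-end session is down, the lower-level sessions that are also down point at the faulty segment. This is a minimal sketch. The session names reuse MA/MEG labels from the Janet – NORDUnet service slide, the Up/Down states are invented, and the function name is hypothetical.

```python
from typing import Dict, List


def suspect_segments(end_to_end_up: bool, segment_up: Dict[str, bool]) -> List[str]:
    """Return the lower-level sessions that explain an end-to-end failure."""
    if end_to_end_up:
        return []
    down = [name for name, up in segment_up.items() if not up]
    # If every monitored segment is up, the fault lies where no session looks.
    return down or ["unmonitored gap between segments"]


# Invented states: both operator sessions up, the inter-operator session down.
status = {"janet-400-2": True, "jan-nor-400-0": False, "nor-400-2": True}
print(suspect_segments(end_to_end_up=False, segment_up=status))
```

The fallback case shows why the sessions marked as not existing yet matter: a segment without its own session cannot be singled out this way.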
Year 4 trial team
• JRA1 Task 1:
- Alberto Colmenero, NORDUnet
- Victor Olifer, Janet
- Marcin Garstka, PSNC
- Jan Radil, CESNET
- Michal Hazlinsky, CESNET
- Mayur Channegowda, Essex Uni
• JRA2 Task 3:
- Roman Lapacz, PSNC
- Jakub Gutkowski, PSNC
- Freek Dijkstra, SARA
- Ronald van der Pol, SARA
- Richa Malhotra, SURFnet
- Borgert van der Kluit, TNO
- Rob Smets, TNO
- Piotr Zuraniewski, TNO
- Otto Baijer, TNO
Questions?
Year 3 Partner testbed example PSNC testbed