Practical use of Ethernet OAM
Joerg Ammon (jammon@brocade.com)
Systems Engineer Service Provider
May 2011
© 2011 Brocade Communications Systems, Inc. Company Proprietary Information

Overview
• A variety of Operations, Administration, and Management (OAM) protocols and tools have been developed in recent years for MPLS, IP, and Ethernet networks.
• These tools let an operator proactively manage networks and customer Service Level Agreements (SLAs).
• This session reviews the OAM tools available at each layer of the stack in MPLS, IP, and Ethernet networks and recommends best practices for choosing the right OAM protocol for a network.

OAM Tools
Scope of this presentation
• In scope: OAM tools across network elements, that is, the network plane only.
• Out of scope: the management plane.
[Figure: Management Plane (NMS, EMS); OAM&P; Network Plane (Network Elements)]

OAM Layering
• OAM is layered:
  • Service Layer OAM
  • Network Layer OAM
  • Transport Layer OAM
• ... and hierarchical:
  • For example, the service layer for Operator A is the transport layer for the service provider.
• Each layer supports its own OAM mechanisms:
  • Operator A has an MPLS network and uses MPLS OAM tools.
  • Operator B has an Ethernet network and uses Ethernet OAM tools.
[Figure: a service provider connects Customer Location 1 and Customer Location 2 across Operator A's MPLS network and Operator B's Ethernet network; Service OAM spans the customer networks end to end]
[Figure labels, continued: MPLS OAM (Operator A), Ethernet OAM (Operator B), Link OAM on the individual links]

OAM Tools
Each layer has its own best-suited OAM tools
• Layer 3 VPN: VPN VRF Ping and Traceroute
• Layer 2 VPN: 802.1ag CFM for VPLS/VLL, Y.1731 PM for VPLS/VLL
• IP: IP Ping and Traceroute, BFD for OSPF and IS-IS
• MPLS: LSP Ping and Traceroute, BFD for RSVP-TE LSPs
• Layer 2: Layer 2 Trace, Port Loop Detection, UDLD, Single-link LACP Keep-alive, 802.1ag CFM/Y.1731 PM, 802.3ah EFM OAM
Business Problem
• Fault detection, verification, and isolation at every level
• Proactive detection of service degradation
• Performance Monitoring (PM) and SLA verification
Brocade Solution
• Standards-based, end-to-end OAM
• Comprehensive/scalable MPLS, IP, and Ethernet OAM tools

Layer 2 OAM + Layer 2 VPN CFM/PM: 802.1ag CFM, Y.1731 PM
IEEE 802.1ag CFM
Connectivity Fault Management (CFM)
• Facilitates:
  • Path discovery
  • Fault detection
  • Fault verification and isolation
  • Fault notification
  • Fault recovery
• Supports:
  • Continuity Check Messages (CCMs)
  • LinkTrace
  • Loopback messages
Brocade Implementation
• Support for the minimum CCM timer (3.3 ms) using hardware offload
• CCM intervals: 3.3 ms, 10 ms, 100 ms, 1 s, 1 min, 10 min
[Figure: Customer location 1 and Customer location 2 connected across Operator A and Operator B networks; nested Customer CFM, Service Provider CFM, Operator A CFM, and Operator B CFM domains, with MEPs at the domain edges and MIPs inside]

IEEE 802.1ag CFM
Terminology
• MD (Maintenance Domain)
  • The part of a network for which faults in Layer 2 connectivity can be managed
• MD Level
  • An integer from 0 to 7 in a field in a CFM PDU that is used, along with the VLAN ID, to identify which MIPs/MEPs would be interested in the contents of a CFM PDU
• MA (Maintenance Association)
  • A set of MEPs established to verify the integrity of a single service instance (a VLAN or a VPLS)
• ME (Maintenance Entity)
  • A point-to-point relationship between two MEPs within a single MA
• MEP (Maintenance End Point)
  • A Maintenance Point (MP) at the edge of a domain that actively sources CFM messages
  • Two types: up (inward*) MEP or down (outward) MEP
• MIP (Maintenance Intermediate Point)
  • A maintenance point internal to a domain that only responds when triggered by certain CFM messages
(*) "inward" with respect to the device
[Figure: Customer MA at MD level 5 (7, 6, or 5); Service Provider MA at MD level 3 (4 or 3); Operator A MA and Operator B MA at MD level 1 (2, 1, or 0); each MA is made up of MEs between MEPs, with MIPs in between; up MEP and down MEP shown at Customer location 1 and Customer location 2]
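The role of the MD level can be illustrated with a minimal sketch. It assumes the commonly described 802.1ag behavior in which a MEP processes CFM PDUs at its own level, passes PDUs from an enclosing (higher-level) domain through untouched, and discards PDUs that leak out of a nested (lower-level) domain. The names `Action` and `mep_action` are hypothetical, and VLAN matching is left out for brevity.

```python
from enum import Enum


class Action(Enum):
    PROCESS = "process"  # PDU is at this MEP's own MD level
    FORWARD = "forward"  # PDU belongs to an enclosing, higher-level domain
    DROP = "drop"        # PDU leaked from a nested, lower-level domain


def mep_action(mep_level: int, pdu_level: int) -> Action:
    """Decide what a MEP at mep_level does with a CFM PDU at pdu_level.

    Simplified assumption, not taken from the slides: only the MD level is
    compared; the VLAN ID check mentioned in the terminology is omitted.
    """
    for level in (mep_level, pdu_level):
        if not 0 <= level <= 7:
            raise ValueError(f"MD level must be in 0..7, got {level}")
    if pdu_level == mep_level:
        return Action.PROCESS
    if pdu_level > mep_level:
        return Action.FORWARD
    return Action.DROP


if __name__ == "__main__":
    # A service-provider MEP at level 3 seeing customer (5), provider (3),
    # and operator (1) PDUs, using the level ranges from the figure.
    for pdu_level in (5, 3, 1):
        print(pdu_level, mep_action(3, pdu_level).value)
```

With this rule a customer's level-5 CCMs cross the provider and operator domains transparently, while operator-level PDUs never escape into the customer network.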
IEEE 802.1ag CFM
Continuity Check, LinkTrace, and Loopback Messages
• Continuity Check Message (CCM)
  • A periodic hello message multicast by a MEP within the maintenance domain
• LinkTrace Message (LTM)
  • A multicast message used by a source MEP to trace the path to other MEPs and MIPs in the same domain
  • All reachable MIPs and MEPs respond with a unicast LinkTrace Reply (LTR)
  • The originating MEP can then determine the MAC addresses of all MIPs and MEPs belonging to the same maintenance domain
• Loopback Message (LBM)
  • Used to verify the connectivity between a MEP and a peer MEP or MIP
  • A MEP initiates a loopback message with the destination MAC address set to the desired destination MEP or MIP (unicast)
  • The receiving MIP or MEP responds to the loopback message with a unicast Loopback Reply (LBR)
  • A loopback message helps a MEP identify the precise location of a fault along a given path
[Figure: MEPs exchange periodic multicast CCMs; a MEP sends a multicast LTM, and the MIP and the remote MEP each return a unicast LTR]
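How a MEP turns missing CCMs into a fault can be sketched as a simple timeout check. The 3.5 multiplier below is the loss threshold usually cited from IEEE 802.1ag; it is not stated on the slides, so treat it as an assumption. The function names are hypothetical.

```python
# Assumed from IEEE 802.1ag, not from the slides: a remote MEP is declared
# lost when no CCM has arrived for 3.5 times the configured CCM interval.
CCM_LOSS_MULTIPLIER = 3.5


def loss_timeout_ms(ccm_interval_ms: float) -> float:
    """Time without CCMs after which the remote MEP is considered down."""
    return CCM_LOSS_MULTIPLIER * ccm_interval_ms


def remote_mep_down(last_ccm_ms: float, now_ms: float, ccm_interval_ms: float) -> bool:
    """True if the gap since the last received CCM exceeds the loss timeout."""
    return (now_ms - last_ccm_ms) > loss_timeout_ms(ccm_interval_ms)


if __name__ == "__main__":
    # Detection time shrinks with the CCM interval, which is why short,
    # hardware-assisted intervals matter for fast fault detection.
    for interval_ms in (10, 100, 1000):
        print(f"interval {interval_ms} ms -> fault declared after {loss_timeout_ms(interval_ms)} ms")
```

Under this assumption, a 10 ms interval yields detection within tens of milliseconds, while a 1 s interval takes several seconds.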
[Figure labels, continued: a MEP sends a unicast LBM and the peer MEP returns an LBR]

Hierarchical Fault Detection
Example: fault in Operator B network (an MPLS network)
• The customer detects the fault using Continuity Check and locates it using Link Trace.
• Provider A detects the fault using Continuity Check and locates it using Link Trace.
• Provider B detects the fault using Continuity Check, but isolates it using MPLS OAM (see the MPLS OAM section).
• A service provider (not shown) would detect this fault in a similar way, using Continuity Check and Link Trace from the CPEs (Customer Premises Equipment).
Sequence shown in the figure:
1: Customer Continuity Check detects end-to-end fault
2: Customer Link Traces isolate fault past customer MIPs
3: Provider A's Continuity Check detects end-to-end fault
4: Provider A Link Traces isolate fault inside Provider B's network
5: Provider B's Continuity Check detects service fault
[Figure: MIPs and MEPs at VPLS/VLL endpoints; Operator B's MPLS network with PE (VPLS/VLL), P, and PE routers; the fault is localized inside Operator B; path shown from Customer Network (Site 1) through Operator A (Location A1) to Operator B]
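The Link Trace isolation steps above amount to finding the last hop that answered and the first that did not. A minimal sketch, assuming the expected hop order is known (for instance from a trace taken while the service was healthy); the hop names and the function `localize_fault` are hypothetical.

```python
from typing import List, Optional, Set, Tuple


def localize_fault(expected_path: List[str], replied: Set[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (last hop that sent an LTR, first hop that did not).

    The fault lies between the two. The second element is None when every
    hop on the expected path replied, meaning no fault was found.
    """
    last_ok: Optional[str] = None
    for hop in expected_path:
        if hop not in replied:
            return last_ok, hop
        last_ok = hop
    return last_ok, None


if __name__ == "__main__":
    # Hypothetical maintenance points seen from the customer MEP at Site 1.
    path = ["A1-MIP", "B-PE1-MIP", "B-PE2-MIP", "A2-MIP", "Site2-MEP"]
    ltr_from = {"A1-MIP", "B-PE1-MIP"}
    print(localize_fault(path, ltr_from))
```

In this run the fault is bracketed between the two Operator B endpoints, which matches step 4 of the sequence: the outer domain can only say "inside Operator B", and the operator's own tools take over from there.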
[Figure labels, continued: Operator A (Location A2), Customer Network (Site 2)]

IEEE 802.1ag Configuration Example
To verify end-to-end connectivity between CE1 and CE2
[Figure: CE1 (port 1/1) connects to PE1, a VLL crosses the MPLS network between PE1 and PE2, and PE2 connects to CE2 (port 2/1); MD level 7 is marked at the CE and PE ports]

Configure a down MEP on CE1
  CE1(config)#cfm-enable
  CE1(config-cfm)#domain-name CUST_1 level 7
  CE1(config-cfm-md-CUST_1)#ma-name ma_5 vlan-id 30 priority 3
  CE1(config-cfm-md-CUST_1-ma-ma_5)#ccm-interval 10-second
  CE1(config-cfm-md-CUST_1-ma-ma_5)#mep 1 down vlan 30 port ethe 1/1
  CE1(config-cfm-md-CUST_1-ma-ma_5)#remote-mep 2 to 2

Create a VLL instance (PE1)
  PE1(config)#router mpls
  PE1(config-mpls)#vll pe1-to-pe2 30
  PE1(config-mpls-vll)#vll-peer 1.1.1.2
  PE1(config-mpls-vll)#untagged ethe 1/1
  PE1(config-mpls-vll)#vlan 30
  PE1(config-mpls-vll-vlan)#tagged ethe 1/1

Create a VLL instance (PE2)
  PE2(config)#router mpls
  PE2(config-mpls)#vll pe2-to-pe1 30
  PE2(config-mpls-vll)#vll-peer 1.1.1.1
  PE2(config-mpls-vll)#untagged ethe 2/1
  PE2(config-mpls-vll)#vlan 30
  PE2(config-mpls-vll-vlan)#tagged ethe 2/1

Configure CFM on PE1
  PE1(config)#cfm-enable
  PE1(config-cfm)#domain-name CUST_1 level 7
  PE1(config-cfm-md-CUST_1)#ma-name ma_5 vll-id 30 priority 3
  PE1(config-cfm-md-CUST_1-ma-ma_5)#ccm-interval 10-second
In the above configuration, a MIP is created by default on the VLL port.

Configure CFM on PE2
  PE2(config)#cfm-enable
  PE2(config-cfm)#domain-name CUST_1 level 7
  PE2(config-cfm-md-CUST_1)#ma-name ma_5 vll-id 30 priority 3
  PE2(config-cfm-md-CUST_1-ma-ma_5)#ccm-interval 10-second
In the above configuration, a MIP is created by default on the VLL endpoint.
Configure a down MEP on CE2
  CE2(config)#cfm-enable
  CE2(config-cfm)#domain-name CUST_1 level 7
  CE2(config-cfm-md-CUST_1)#ma-name ma_5 vlan-id 30 priority 3
  CE2(config-cfm-md-CUST_1-ma-ma_5)#ccm-interval 10-second
  CE2(config-cfm-md-CUST_1-ma-ma_5)#mep 2 down vlan 30 port ethe 2/1
  CE2(config-cfm-md-CUST_1-ma-ma_5)#remote-mep 1 to 1
LSP ping and LSP traceroute would be used inside the MPLS network to detect and diagnose LSP failures.

ITU-T Y.1731 Performance Management
• Standards-based performance management for Ethernet networks
• Interoperates in a multivendor environment
• Supports high-precision, on-demand measurement of round-trip SLA parameters
  • Frame Delay (FD)
  • Frame Delay Variation (FDV)
• Measurements are made between MEPs
[Figure: two Brocade MLX routers, each hosting a MEP, exchange ETH-DM frames to measure Frame Delay and Frame Delay Variation. MEP: Maintenance End Point. ETH-DM: Ethernet Delay Measurement]
Benefits
• SLA monitoring and verification
Applicability
• Aggregation, metro, and core networks
• Delay-sensitive applications, such as voice
• Differentiated services with SLA guarantees
Brocade differentiation
• Hardware-based time-stamping mechanism
• Measurements with microsecond granularity
• Y.1731 PM for VPLS/VLL

ITU-T Y.1731 Performance Management
Example
[Figure: ETH-DM between MEP 3 and MEP 2, each on a Brocade MLX]
  NetIron# cfm delay_measurement domain md2 ma ma2 src-mep 3 target-mep 2
  Y1731: Sending 10 delay_measurement to 0012.f2f7.3931, timeout 1000 msec
  Type Control-c to abort
  Reply from 0012.f2f7.3931: time= 32.131 us
  Reply from 0012.f2f7.3931: time= 31.637 us
  Reply from 0012.f2f7.3931: time= 32.566 us
  Reply from 0012.f2f7.3931: time= 34.052 us
  Reply from 0012.f2f7.3931: time= 33.376 us
  Reply from 0012.f2f7.3931: time= 31.501 us
  Reply from 0012.f2f7.3931: time= 33.016 us
  Reply from 0012.f2f7.3931: time= 32.537 us
  Reply from 0012.f2f7.3931: time= 32.492 us
  Reply from 0012.f2f7.3931: time= 32.552 us
  sent = 10 number = 10
  A total of 10 delay measurement replies received.
  Success rate is 100 percent (10/10)
  ====================================================================
  Round Trip Frame Delay Time : min = 31.501 us avg = 32.586 us max = 34.052 us
  Round Trip Frame Delay Variation : min = 45 ns avg = 839 ns max = 1.875 us
  ====================================================================

Link OAM
IEEE 802.3ah Ethernet in the First Mile (EFM) OAM
• Supports point-to-point (single) link OAM
• Monitors individual links and supports troubleshooting them
• Standards-based for Ethernet networks
• Interoperates in a multivendor environment
• Supports:
  • Fault detection and notification (alarms)
  • Discovery
  • Remote failure indication
  • Loopback testing
[Figure: 802.3ah OAM running at both ends of a single link]
  NetIron#show link-oam info detail ethernet 1/1
  OAM information for Ethernet port: 1/1
    link-oam mode: active
    link status: up
    oam status: up
  Local information
    multiplexer action: forward
    parse action: forward
    stable: satisfied
    state: up
    loopback state: disabled
    dying-gasp: false
    critical-event: false
    link-fault: false
  Remote information
    multiplexer action: forward
    parse action: forward
    stable: satisfied
    loopback support: disabled
    dying-gasp: false
    critical-event: false
    link-fault: false

Layer 2 OAM Summary
• Layer 2 Trace
  • Intended application: Layer 2 network troubleshooting, detection of misconfiguration
  • Supports: Layer 2 topology discovery, Layer 2 loop detection
  • Generation: Manual
  • Standard: No
• Port Loop Detection
  • Intended application: Layer 2 network troubleshooting, detection of misconfiguration
  • Supports: Layer 2 loop detection
  • Generation: Automatic
  • Standard: No
• UDLD
  • Intended application: Link keep-alive
  • Supports: Link keep-alive
  • Generation: Automatic
  • Standard: No
• Single-Link Keep-Alive
  • Intended application: Single-link keep-alive
  • Supports: Single-link keep-alive
  • Generation: Automatic
  • Standard: Yes
• 802.1ag CFM
  • Intended application: Service verification
  • Supports: Layer 2 Continuity Check, Link Trace, Loopback
  • Generation: CC: automatic; LT, LB: manual
  • Standard: Yes
• Y.1731 PM
  • Intended application: Performance (SLA) verification
  • Supports: One-way delay and delay variation
  • Generation: Manual
  • Standard: Yes
• 802.3ah EFM OAM
  • Intended application: Customer access verification
  • Supports: Single-link OAM: fault detection, discovery, loopback, and so on
  • Generation: Automatic, manual (LB)
  • Standard: Yes
Remember: OAM is layered and hierarchical (service OAM for an operator is transport OAM for a service provider).

MPLS OAM

LSP Ping and LSP Traceroute
MPLS OAM tools
• LSP Ping and LSP Traceroute provide OAM functionality for MPLS networks based on RFC 4379.
• Both tools provide a mechanism to detect MPLS data plane failures.
• MPLS echo requests follow the same data path that normal MPLS packets would traverse.
• LSP Ping is used to detect data plane failure and to check the consistency between the data plane and the control plane.
• LSP Traceroute is used to isolate the data plane failure to a particular router and to provide LSP path tracing.

LSP Ping
• The basic idea is to verify that packets that belong to a particular Forwarding Equivalence Class (FEC) actually end their MPLS path on a Label Switching Router (LSR) that is an egress for that FEC.
• LDP LSP Ping and RSVP LSP Ping are supported.
[Figure: MPLS network with PE (LER), P (LSR), and PE (LER); an Echo Request travels along the LSP and an Echo Reply returns]
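The check that LSP Ping performs can be sketched with a toy model: follow the data plane hop by hop for one FEC and ask whether the router where the path ends is one that the control plane regards as an egress for that FEC. This is an illustrative simplification, not the RFC 4379 packet exchange; the type aliases, router names, and the function `lsp_ping` are hypothetical.

```python
from typing import Dict, Optional, Set

# Data plane: per-router next hop for a FEC. A missing entry or None means
# the MPLS path for that FEC ends on this router.
DataPlane = Dict[str, Dict[str, Optional[str]]]
# Control plane: routers that consider themselves an egress for each FEC.
Egresses = Dict[str, Set[str]]


def lsp_ping(ingress: str, fec: str, data_plane: DataPlane,
             egresses: Egresses, max_hops: int = 30) -> bool:
    """True if traffic for fec, entering at ingress, ends on an egress for fec."""
    router = ingress
    for _ in range(max_hops):
        next_hop = data_plane.get(router, {}).get(fec)
        if next_hop is None:
            # Path ends here: success only if this router is a valid egress.
            return router in egresses.get(fec, set())
        router = next_hop
    return False  # hop limit reached, for example because of a forwarding loop


if __name__ == "__main__":
    fec = "22.22.22.22/32"
    egress_view = {fec: {"PE2"}}
    healthy = {"PE1": {fec: "P"}, "P": {fec: "PE2"}, "PE2": {fec: None}}
    broken = {"PE1": {fec: "P"}, "P": {}, "PE2": {fec: None}}  # P lost its entry
    print(lsp_ping("PE1", fec, healthy, egress_view))
    print(lsp_ping("PE1", fec, broken, egress_view))
```

The broken case models a data plane that silently drops traffic at the P router even though the control plane still lists PE2 as the egress, which is the kind of inconsistency LSP Ping is meant to expose.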
LSP Ping examples

LDP LSP Ping
  NetIron# ping mpls ldp 22.22.22.22
  Send 5 80-byte MPLS Echo Requests for LDP FEC 22.22.22.22/32, timeout 5000 msec
  Type Control-c to abort
  !!!!!
  Success rate is 100 percent (5/5), round-trip min/avg/max=0/1/1 ms.
Syntax: ping mpls ldp <ip-address | ip-address/mask-length> ... options

RSVP LSP Ping
  NetIron# ping mpls rsvp lsp toxmr2frr-18
  Send 5 92-byte MPLS Echo Requests over RSVP LSP toxmr2frr-18, timeout 5000 msec
  Type Control-c to abort
  !!!!!
  Success rate is 100 percent (5/5), round-trip min/avg/max=0/1/5 ms.
Syntax: ping mpls rsvp lsp <lsp-name> | session <tunnel-source-address> <tunnel-destination-address> <tunnel-id> ... options

LSP Traceroute
• With LSP traceroute, an echo request packet is sent to the control plane of each transit LSR, which confirms that it is a transit LSR for this path.
• Transit LSRs return echo replies.
• LDP LSP Traceroute and RSVP LSP Traceroute are supported.
[Figure: MPLS network with PE (LER), P (LSR), and PE (LER); an Echo Request is sent along the LSP and Echo Replies return from the routers on the path]

LDP LSP Traceroute
  NetIron# traceroute mpls ldp 22.22.22.22
  Trace LDP LSP to 22.22.22.22/32, timeout 5000 msec, TTL 1 to 30
  Type Control-c to abort
  1 10ms 22.22.22.22 return code 3(Egress)
Syntax: traceroute mpls ldp <ip-address | ip-address/mask-length> ... options

RSVP LSP Traceroute
  NetIron# traceroute mpls rsvp lsp toxmr2frr-18
  Trace RSVP LSP toxmr2frr-18, timeout 5000 msec, TTL 1 to 30
  Type Control-c to abort
  1 1ms 22.22.22.22 return code 3(Egress)
Syntax: traceroute mpls rsvp lsp <lsp-name> | session <tunnel-source-address> <tunnel-destination-address> <tunnel-id> ... options
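The "TTL 1 to 30" behavior in the traceroute output can be sketched as a loop that probes one hop further each round until an egress answers. This is a toy model under stated assumptions: the path is given as a plain list, every LSR always replies, and the return codes 3 (egress) and 8 (label switched, i.e. transit) are taken from RFC 4379 rather than from the slides. The function name is hypothetical.

```python
from typing import List, Tuple

RC_EGRESS = 3          # RFC 4379: replying router is an egress for the FEC
RC_LABEL_SWITCHED = 8  # RFC 4379: label switched at stack depth (transit LSR)


def lsp_traceroute(path: List[str], max_ttl: int = 30) -> List[Tuple[int, str, int]]:
    """Probe with TTL 1, 2, ... and collect (ttl, replying LSR, return code).

    path lists the LSRs after the ingress, in order; the last one is the egress.
    """
    hops: List[Tuple[int, str, int]] = []
    for ttl in range(1, max_ttl + 1):
        if ttl > len(path):
            break  # nobody is left to answer; a real tool would time out here
        lsr = path[ttl - 1]  # the echo request's TTL expires on this LSR
        code = RC_EGRESS if ttl == len(path) else RC_LABEL_SWITCHED
        hops.append((ttl, lsr, code))
        if code == RC_EGRESS:
            break  # the egress replied, so the trace is complete
    return hops


if __name__ == "__main__":
    # Single-hop LSP, as in the sample output (egress reached at TTL 1).
    print(lsp_traceroute(["22.22.22.22"]))
    # Hypothetical two-hop LSP through a transit P router.
    print(lsp_traceroute(["P", "PE2"]))
```

When a transit LSR stops replying in a real network, the last hop that did reply bounds the failure, which is how the tool isolates a data plane fault to a particular router.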
MPLS OAM Summary
• LSP Ping
  • Intended application: To detect data plane failure and to check the consistency between the data plane and the control plane
  • Supports: Connectivity verification
  • Generation: Manual
  • Standard: Yes
• LSP Traceroute
  • Intended application: To isolate the data plane failure to a particular router and to provide LSP path tracing
  • Supports: Connectivity troubleshooting, fault localization
  • Generation: Manual
  • Standard: Yes
• BFD for RSVP-TE LSPs
  • Intended application: Fast data plane failure detection for RSVP LSPs
  • Supports: Fast data plane failure detection (link may be up, but data path is down)
  • Generation: Automatic
  • Standard: Yes

Observation
• ICMP
  • Operates at: Layer 3
  • Specification: RFC 792
  • Published: Sept 1981
• Ping
  • Operates at: Layer 3
  • Specification: RFC 1208 (RFC 1983)
  • Published: March 1991 (Aug 1996)
• CFM
  • Operates at: Layer 2
  • Specification: 802.1ag
  • Published: Dec 2007
[The slide also shows the date July 1983 near the Ping entry]
It took 26 years of work to go down one layer of OAM.

Thank You