Evolution of IP/OL Performance Management
Robert Doverspike, Jennifer Yates, Jorge Pastor, Martin Birk
AT&T Labs Research

Outline
• Key Takeaways
  – Performance Management must consider interlayer (focus IP)
• Evolution story for IP/OL
  – Architecture for Long Haul Networks
• Example problems
• Next chapter in evolution
  – Let’s get it right this time

Key Takeaways
• Optical PM goals should focus on use in IP layer
  – Links in the IP layer form connections in the optical layer
  – Virtually all high-rate connections are IP links (between either routers or Ethernet switches)
• Perfect optical layer detection is a lofty goal, but will fall short if architected in isolation
  – E.g., need to have strong inter-layer coordination
• Why do we stress this for OL?
  – Inter-layer fault management has many flaws in practice, even after 15 years of perfecting SONET
  – Need adequate mechanisms across layers to handle scenarios when things go wrong or confusion reigns

Evolution Story for Long Haul Networks
[Figure, built up over four slides: layered long-haul network architecture.
 Layers shown: IP Layer, SONET Ring Layer, DCS Layer, ULH/WXC Layer, Pt-Pt WDM Layer.
 Generations labeled: 1st, 2nd, 3rd.
 Legend: Router, DCS/Intelligent Optical Switch, ADM, Degree-n OADM/WXC, WDM Terminal.]

Some of the problems we’ve encountered
[Figure: IP Layer over SONET Ring Layer, with a point on the ring marked X]
Ring switching impact on higher layers
• Upper layer has timer: waits for lower layer to restore. Done!
• Wrong! Not a simple decision on when to take IP link up and down

Some of the problems we’ve encountered
1st Generation of IP/OL
[Figure: IP Layer over SONET Ring Layer, with a point marked X. Labels: PPP ACK, OSPF ping, AIS-P, BER-P, CLR, LOS-L]
• SONET alarms received by upper layer are ambiguous and conflicting
  – Many error types in SONET: BER, AIS, P-LOS, clear during protection switching
  – Arrive at different times
• Software bugs: routers don’t behave as expected
• Inconsistencies in calculation of BER and IP layer holddown timer

What is the source of these problems?
• No standards for inter-layer interaction
  – Physical layer: testers need requirement scripts to test; no standard, no script
  – No industry requirement often means no testing, no sharing of behavior
  – Historically, L1 and L3 labs have been separate
      • Some members of Telecom community have integrated their labs
  – Software bugs: routers don’t behave as expected
  – No specification of common parameters and metrics
      • Example: Router measures BER in fixed timer intervals
      • Router takes link down upon TCA (threshold exceeded)
      • Protection switching results in a VERY short but high burst of errors, which crosses the router threshold even though it lasts << 10 ms!

What is the source of these problems?
• Shared Risk Groups still not well modeled
  – Single failure at lower layer results in multiple, scattered link failures at higher layer; network unprepared to restore
  – Example: portions of dual IP access links routed over same ring; both links taken down due to previous confusion
[Figure: IP (logical) layer and physical (fibre) layer, each showing LA, SF, NY, and Washington, with a common SRLG marked]

Identity Crisis
2nd Generation of IP/OL
• High-speed (2.5/10/40 Gb/s) IP links skip SONET ring/cross-connect layer and instead route over long sequences of point-to-point WDM systems, interconnected by O/E/O optical transponders
  – Should the optical path pretend to be transparent (like dark fiber)?
      • E.g., no AIS/BER TCA; re-transmit all LOS/LOP to Path Termination Points
      • How does one isolate faults for repair (OTs, amplifiers, WDM Terminals)?
  – OR: Should it display characteristics of SONET Section/Line/Path Fault Management Architecture?
      • However: then similar 1st Gen IP/SONET ring confusion occurs
      • Practicality dictated that industry implemented a combination of both approaches

ULH/WXC: The Final Solution?
3rd Generation of IP/OL
• Use long-term model of all-optical path to IP layer link
• Two major issues to resolve
  – What if intermediate OEO exists in near-term?
  – How do we model restoration at OL and how does IP layer interact?
• IP layer responsible for deciding link health
  – Fast link layer detection (LOS)
  – GigE and other signals are going to be transported over the 3rd Gen OL
      • Is the set of PM alarms and TCAs we inherit from SONET appropriate for 3rd Gen OL?
      • If not, which ones or new ones should we define?

Some potential approaches
• OL only passes simple alarms to the upper layer, e.g., LOS.
  – Upper layer makes its own assessments of BER, packet coding violations, ACK failures
  – OL still does fault isolation for OEO components or amplifiers (e.g., to WDM Terminal, EMS, or Fault OSS), but this is NOT passed up to IP layer
  – Where/how do we do this? Standards, fora, vendor interactions, carrier requirements?
• Repair process:
  – Need to correlate what fails in the OL with what fails in the IP layer (1-to-many map)
  – Network discovery of IP/OL relationships (e.g., SRLG) across layers would facilitate fault correlation process
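
The 1-to-many correlation described under the repair process could be sketched roughly as follows. This is only an illustration under stated assumptions, not a method from the slides: the component names, the IP link names, and the `OL_TO_IP_LINKS` table are all hypothetical, and in practice such a map would presumably come from network discovery of IP/OL relationships.

```python
from collections import defaultdict

# Hypothetical IP/OL map: each optical-layer (OL) component lists the IP
# links that ride over it. Every name here is made up for illustration.
OL_TO_IP_LINKS = {
    "fiber-span-A": {"LA-SF", "SF-NY"},
    "amplifier-7": {"SF-NY"},
    "wdm-term-3": {"NY-Washington", "LA-NY"},
}


def affected_ip_links(failed_components):
    """Return every IP link that rides over at least one failed OL component."""
    links = set()
    for comp in failed_components:
        links |= OL_TO_IP_LINKS.get(comp, set())
    return links


def candidate_ol_causes(down_links):
    """Return OL components whose full set of IP links is down.

    A component is kept as a candidate only if all of its IP links are
    down; if any link riding over it is still up, it is ruled out.
    """
    down = set(down_links)
    return sorted(
        comp for comp, links in OL_TO_IP_LINKS.items()
        if links and links <= down
    )


def shared_risk_groups():
    """Invert the map: for each IP link, the OL components it depends on."""
    srlg = defaultdict(set)
    for comp, links in OL_TO_IP_LINKS.items():
        for link in links:
            srlg[link].add(comp)
    return dict(srlg)
```

With this toy table, a failure of `fiber-span-A` takes down both `LA-SF` and `SF-NY`. Going the other way, seeing those two links down leaves both `fiber-span-A` and `amplifier-7` as candidates, which is the kind of ambiguity that OL fault isolation would still have to resolve.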