TraceFlow (draft-zinjuvadia-traceflow-00) Arun Viswanathan Subi Krishnamurthy Vishal Zinjuvadia Rajeev Manur --Force10 Networks Agenda… Motivation Problem Statement Use-case scenario Protocol Operation Legacy (non-compliant devices) support Encapsulation Open Issues Q&A Motivation… Today’s networks have evolved into more complex heterogeneous entities LAG & ECMP are commonly used for redundancy and loadbalancing E2E paths are typically an intermix of L2/L3/MPLS hops OAM tools have not kept pace with the OAM reqts of evolving networks Destination addr based Ping/Traceroute do not provide extra level of info needed in networks using LAG/ECMP/flow based fwding func like ACL/PBR Need a mechanism for collecting specific data along the flow path to help diagnose the network problem better. Many of our customers (SPs, Operators of large Datacenters) want to get a solution for this problem. Problem Statement… Need to extend ping/traceroute to have, Ability to trace 3/5 tuple user defined flow to exercise the same ECMP path or component link in a LAG as that taken by corresponding data packets Ability to selectively collect relevant diagnostic data along the flow path. Increased security built into the protocol for tracing intra-area and inter-area E2E paths Extend this trace functionality to L2/MPLS hops along the way by snooping Use-case scenario… Soft Failure LAG-1 10.10.10.1/30 LAG-2 Data-flow 10.10.10.2/30 RT1 10.11.10.2/30 X RT2 10.20.10.1/30 10.11.10.1/30 Current Ping/Traceroute RT3 10.21.10.2/30 ECMP 10.21.10.1/30 10.20.10.2/30 Current Ping/Traceroute RT4 Protocol Operation -- 1 Data-flow TFRSP TFRSP TFRQ TFRQ 10.10.10.1/30 10.11.10.2/30 RT1 RT2 10.11.10.1/30 10.10.10.2/30 10.20.10.1/30 LAG-1 ECMP LAG-2 10.21.10.1/30 10.20.10.2/30 RT4 Current Ping/Traceroute RT3 10.21.10.2/30 Protocol Operation (Fan-out) – 2 Data-flow TFRSP TFRSP TFRQ TFRQ 10.11.10.2/30 RT1 10.10.10.1/30 10.20.10.1/30 RT2 10.11.10.1/30 10.10.10.2/30 LAG-1 LAG-2 ECMP 10.21.10.1/30 10.20.10.2/30 TFRSP RT4 Current Ping/Traceroute RT3 10.21.10.2/30 Protocol Operation (TLVs) -- 3 Request TLVs Flow Descriptor (first 256 bytes of user data pkt) Flow Information (MTU, frag etc) Orig address Termination Requested info Auth Response TLVs Flow Descriptor (first 256 bytes of user data pkt) Encapsulated Packet Mask Record-Route Interface Info TLV(incomin/outgoing) Result TLV Legacy (non-compliant devices) support… Non-compliant nodes likely to forward TraceFlow different from data packets Can render some support but complicates the protocol Potential solution Probes packets must look like data packets Use TTL expiry like traceroute Use it in conjunction with regular TraceFlow in legacy segments Issues Probes are indistinguishable from data We run the risk of leaking spoofed probe packets into app receivers Encapsulation… We started with UDP, but see issues with hop-by-hop termination Other options New IP Proto? Or New ICMP message types? Not sure if all hardware can look into icmp subtypes for punting? Open Issues… Legacy support – Do we need it? Is there enough interest in WG to continue this work? Questions ? Thank You