opsawg-0

advertisement
TraceFlow
(draft-zinjuvadia-traceflow-00)
Arun Viswanathan
Subi Krishnamurthy
Vishal Zinjuvadia
Rajeev Manur
--Force10 Networks
Agenda…
 Motivation
 Problem Statement
 Use-case scenario
 Protocol Operation
 Legacy (non-compliant devices) support
 Encapsulation
 Open Issues
 Q&A
Motivation…
 Today’s networks have evolved into more complex heterogeneous
entities
 LAG & ECMP are commonly used for redundancy and loadbalancing
 E2E paths are typically an intermix of L2/L3/MPLS hops
 OAM tools have not kept pace with the OAM reqts of evolving networks
 Destination addr based Ping/Traceroute do not provide extra level of
info needed in networks using LAG/ECMP/flow based fwding func
like ACL/PBR
 Need a mechanism for collecting specific data along the flow path to
help diagnose the network problem better.
 Many of our customers (SPs, Operators of large Datacenters) want to
get a solution for this problem.
Problem Statement…
 Need to extend ping/traceroute to have,


Ability to trace 3/5 tuple user defined flow to
exercise the same ECMP path or component link
in a LAG as that taken by corresponding data
packets
Ability to selectively collect relevant diagnostic
data along the flow path.
 Increased security built into the protocol for tracing
intra-area and inter-area E2E paths
 Extend this trace functionality to L2/MPLS hops along
the way by snooping
Use-case scenario…
Soft Failure
LAG-1
10.10.10.1/30
LAG-2
Data-flow
10.10.10.2/30
RT1
10.11.10.2/30
X
RT2
10.20.10.1/30
10.11.10.1/30
Current Ping/Traceroute
RT3
10.21.10.2/30
ECMP
10.21.10.1/30
10.20.10.2/30
Current Ping/Traceroute
RT4
Protocol Operation -- 1
Data-flow
TFRSP
TFRSP
TFRQ
TFRQ
10.10.10.1/30
10.11.10.2/30
RT1
RT2
10.11.10.1/30
10.10.10.2/30
10.20.10.1/30
LAG-1
ECMP
LAG-2
10.21.10.1/30
10.20.10.2/30
RT4
Current Ping/Traceroute
RT3
10.21.10.2/30
Protocol Operation (Fan-out) – 2
Data-flow
TFRSP
TFRSP
TFRQ
TFRQ
10.11.10.2/30
RT1
10.10.10.1/30
10.20.10.1/30
RT2
10.11.10.1/30
10.10.10.2/30
LAG-1
LAG-2
ECMP
10.21.10.1/30
10.20.10.2/30
TFRSP
RT4
Current Ping/Traceroute
RT3
10.21.10.2/30
Protocol Operation (TLVs) -- 3
 Request TLVs
Flow Descriptor (first 256 bytes of user data pkt)
 Flow Information (MTU, frag etc)
 Orig address
 Termination
 Requested info
 Auth
 Response TLVs
 Flow Descriptor (first 256 bytes of user data pkt)
 Encapsulated Packet Mask
 Record-Route
 Interface Info TLV(incomin/outgoing)
 Result TLV

Legacy (non-compliant devices)
support…
 Non-compliant nodes likely to forward TraceFlow different
from data packets
 Can render some support but complicates the protocol
 Potential solution
 Probes packets must look like data packets
 Use TTL expiry like traceroute
 Use it in conjunction with regular TraceFlow in legacy
segments
 Issues
 Probes are indistinguishable from data
 We run the risk of leaking spoofed probe packets into
app receivers
Encapsulation…
 We started with UDP, but see issues with
hop-by-hop termination
 Other options


New IP Proto? Or
New ICMP message types?

Not sure if all hardware can look into icmp
subtypes for punting?
Open Issues…
 Legacy support – Do we need it?
 Is there enough interest in WG to continue
this work?
Questions ?
Thank You
Download