Link Failure Monitoring Using Network Coding
Hamed Firooz, Sumit Roy, Linda Bai
{firooz, sroy, lyb3}@u.washington.edu
Fundamentals of Networking Lab (FunLab)

Outline
- Network Tomography
  - Introduction (network monitoring)
  - Approaches: deterministic vs. stochastic, active vs. passive
  - Challenges: overhead, identifiability
- Network Coding
  - Application to network monitoring: new method
  - Optimization: speed/complexity tradeoffs
- OPNET implementation

Network Tomography
- Networks: a set of nodes and links, modeled as a graph G(V,E)
- Network monitoring (node or network)
  - Involves collecting network performance statistics (link delay, link loss, or failure status)
  - Important for QoS guarantees (media streaming, interactive video applications)
- Challenges: choice of an appropriate measurement technique and algorithms
[Figure: a logical network G(V,E) with nodes 1 to 8]

Measurement Methods
- Node-oriented: based on cooperation among network nodes, e.g. ping or traceroute
- With ping, the round-trip delay to every node can be measured
- Uses Internet Control Message Protocol (ICMP) packets
- Many routers do NOT respond to these packets
- Many service providers do not own the entire network
[Figure: node-oriented measurement over links l1 and l2]

Measurement Methods
- Edge-oriented: access is available only to nodes at the edge (and not to any in the interior)
- Does not require exchanging special control messages between interior nodes
- Inverse problem: estimate link-level status from end-to-end (path-level) measurements
[Figure: end nodes S at the edge of a network with unknown interior]

Measurement Methods
- Active (sending probe packets): adds overhead to normal data traffic by introducing new control packets
- Passive (in situ traffic analysis): no overhead, but temporal and spatial dependence might bias measurements
- Our method: edge-oriented, active network tomography
- Given a network and a limited number of end hosts, when can we infer the failure status of the links?
End-to-End Probing
- Probes are inserted into a data stream, and end-to-end properties on that route are measured.
- Probes are exchanged between end nodes according to the routing matrix of the graph.
[Figure: End1 attaches to router1 over link1; router1 attaches to End2 over link2 and to End3 over link3]

Routing matrix A:

  Path             link 1  link 2  link 3
  End 1 → End 2      1       1       0
  End 1 → End 3      1       0       1
  End 2 → End 3      0       1       1

End-to-End Probes
- The routing matrix relates link attributes to route attributes.
- For some parameters, such as delay or path loss, this relation is linear under some assumptions:

$$\begin{bmatrix} D_{End1 \to End2} \\ D_{End1 \to End3} \\ D_{End2 \to End3} \end{bmatrix} = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} D_{l_1} \\ D_{l_2} \\ D_{l_3} \end{bmatrix}$$

Deterministic
- Link attributes (e.g. delay) are treated as unknown constants
- Goal: estimate these constants
- Link attributes are typically time varying, so the method is suitable for periods of local 'stationarity'

Stochastic
- Each link attribute is specified by a suitable probability distribution, e.g. link delay follows a Gaussian distribution
- Estimation problem: recover the unknown model parameters from path observations in the presence of additive noise

Deterministic vs. Stochastic Methods
- Stochastic
  - Bayesian: requires a prior distribution; an incorrect choice leads to biased estimates
  - More computationally intensive
- Deterministic
  - Lower complexity, but suffers from generic non-identifiability

Link Failure Model
[Figure: End1 - l1 - R1 - l2 - R2 - l3 - End2]
- Define an indicator function for the status of each link and of the end-to-end path:

$$x_{l_i} = \begin{cases} 0 & \text{if } l_i \text{ is ok} \\ 1 & \text{if } l_i \text{ is congested} \end{cases} \qquad y_{end1 \to end2} = \begin{cases} 0 & \text{if all of } l_1, l_2, l_3 \text{ are ok} \\ 1 & \text{o.w.} \end{cases}$$
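As a rough illustration of the routing-matrix and indicator models on these slides, the following Python sketch builds the routing matrix of the three-end-node network, applies the additive delay relation, and evaluates the path indicator. The link delay values and the function names (`path_delays`, `star_link_delays`, `path_status`) are hypothetical and chosen only for this example; the closed-form inversion applies to this particular 3-by-3 matrix, not to routing matrices in general.

```python
# Routing matrix of the three-end-node network from the slides:
# rows are paths, columns are links 1..3.
A = [
    [1, 1, 0],  # End 1 -> End 2 uses link 1 and link 2
    [1, 0, 1],  # End 1 -> End 3 uses link 1 and link 3
    [0, 1, 1],  # End 2 -> End 3 uses link 2 and link 3
]

def path_delays(routing, link_delays):
    """Additive model: each path delay is the sum of the delays of its links."""
    return [sum(a * d for a, d in zip(row, link_delays)) for row in routing]

def star_link_delays(d12, d13, d23):
    """Invert the 3x3 system above in closed form (this A is invertible)."""
    return [(d12 + d13 - d23) / 2,
            (d12 - d13 + d23) / 2,
            (-d12 + d13 + d23) / 2]

def path_status(link_status):
    """Indicator model: a path is 0 (ok) only if every link on it is 0 (ok)."""
    return 1 if any(link_status) else 0

link_delays = [2.0, 5.0, 1.0]            # hypothetical link delays
measured = path_delays(A, link_delays)   # end-to-end measurements
recovered = star_link_delays(*measured)  # link delays inferred from the edge

print(measured, recovered)
print(path_status([0, 0, 0]), path_status([0, 1, 0]))
```

With these assumed delays the three path measurements are 7, 3 and 6, and the inversion returns the original link delays. The binary indicator, by contrast, only reports that some link on the path is congested, not which one.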
Binary Deterministic Model
[Figure: End1 - l1 - R1 - l2 - R2 - l3 - End2]
- $y_{end1 \to end2} = x_{l_1} \lor x_{l_2} \lor x_{l_3}$
- $y = Ax$, where
  - A: N-by-M binary routing matrix
  - x: M-by-1 binary vector, the status of each link
  - y: N-by-1 binary vector, the status of each path (measurements)

Failure Monitoring
- Network G(V,E) with set of paths P
- $x \in \{0,1\}^{|E|}$, $y \in \{0,1\}^{|P|}$: x and y are binary vectors
- A path is congested if at least one of its links is congested
[Figure: End1, End2 and End3 attached to a router by links l1, l2, l3, with $x_{l_i} \in \{0,1\}$]

$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} x_{l_1} \\ x_{l_2} \\ x_{l_3} \end{bmatrix}, \qquad \begin{aligned} y_1 &= x_{l_1} \lor x_{l_2} \\ y_2 &= x_{l_1} \lor x_{l_3} \\ y_3 &= x_{l_2} \lor x_{l_3} \end{aligned}$$

Here y1, y2, y3 are the paths End1 → End2, End1 → End3, End2 → End3, and the matrix product is Boolean (sums are ORs).

Identifiability
- $y = Ax$. Problem: estimate x from y, with
  - A (N-by-M): binary routing matrix
  - x (M-by-1): binary link failure status
  - y (N-by-1): end-to-end measurements
- Example: 6 links and 3 end-to-end routes give N=3, M=6
- Identifiability: a network is identifiable if $y = Ax$ has a unique solution
- Usually M (# of links in the network) >> N (# of measurements), so the network is generically NOT identifiable

Identifiability: Binary Model
- Solution: limit the (maximum) number of failed links inside the network
- Suppose at most k links can fail simultaneously
- Defn (k-identifiability): a network is k-identifiable if

$$\forall\, x_1, x_2 \in \{0,1\}^{|E| \times 1} \ \text{s.t.}\ \|x_1\|_0 \le k,\ \|x_2\|_0 \le k: \quad x_1 \ne x_2 \ \Rightarrow\ A x_1 \ne A x_2$$

- For example, $\|x\|_0 \le 1$ means only one link can be congested
- From end-to-end observations it is then possible to uniquely identify up to k congested links

Example of 1-identifiability
- $x \in \{0,1\}^{6 \times 1}$, $\|x\|_0 \le 1$
[Figure: network with three end nodes and links l1 to l6, its 3-by-6 routing matrix A, and a table of the observations $(y_{1 \to 2}, y_{1 \to 3}, y_{2 \to 3})$ with no congested link, where all three are 0, and with each single link l1, ..., l6 congested]

Example: k=2 identifiability
- $x \in \{0,1\}^{6 \times 1}$, $\|x\|_0 \le 2$
- Ambiguity: two different sets of congested links give the same observation $(y_{1 \to 2}, y_{1 \to 3}, y_{2 \to 3}) = (1, 1, 0)$
[Figure: same network and routing matrix A as in the 1-identifiability example]

1-Identifiability
- A network with an intermediate node of degree two is not 1-identifiable
[Figure: End1 - l1 - intermediate node - l2 - End2]
- If path End1 → End2 is congested, it is impossible to determine which of l1 and l2 is congested:
  $x_{l_1} = 1 \Rightarrow y_{End1 \to End2} = 1$ and $x_{l_2} = 1 \Rightarrow y_{End1 \to End2} = 1$
- Having no such node is necessary but not sufficient for 1-identifiability!

k=1 Identifiability
- 1-identifiability Theorem: end-to-end probe-based measurements can detect a unique congested link in a network if and only if there are no two identical columns in the network routing matrix
[Figure: example binary routing matrix with rows labeled by paths]

k-identifiability
- k-identifiability Theorem: end-to-end probe-based measurements can uniquely identify up to k congested links in a network only if there are no k+1 dependent columns in the network routing matrix
[Figure: example binary routing matrix]

Shortest Path Routing Revisited
- Packets are sent on the shortest path between two end nodes, so each sub-graph is a tree starting from a boundary (source) node
- Node 4 has degree
two in all of these sub-graphs, but node 4 has degree four in the original network.

Revisiting Shortest Path Routing
- What if we could change the routing matrix?
- Example: in place of shortest path routing, route packets through longer paths, e.g. n1 → l2 → l4 → n2
- Now the network is 1-identifiable!
- Intrinsic limitation of end-to-end measurement methods based on shortest path routes: probes transmitted along such paths carry only minimal information

Solution
- Exchange probes between boundary nodes via other (non-shortest) paths?
- Changing the routing tables violates the tomography assumption
- Use network coding: by exploiting its broadcast nature, a transmitted probe will traverse almost every path between two boundary nodes

Network Coding: Short Review
- Present: routers just forward incoming packets, i.e. copy the packets on an input link onto the output links
- Proposed: what if each node in the network performs some computation on received data prior to forwarding?
[Figure: a forwarding node copies its inputs y1, y2 to its outputs; a coding node receives y1, y2, y3 and outputs f1(y1,y2,y3) and f2(y1,y2,y3)]

How does NC work?
(1) "Butterfly" network
[Figure: sender s, intermediate nodes A, B, C, D, receivers t1 and t2]
- All links have the same capacity, 1 b/s
- s wants to send data bits a and b to both t1 and t2
- The bottleneck is link CD

How does NC work? (2)
- Node C XORs the messages received on its incoming links
[Figure: s sends a and b on separate branches; the XOR output a+b is sent toward D]

How does NC work? (3)
[Figure: D delivers a+b to both receivers; one receiver also gets a directly and the other gets b]
- t1 and t2 both know a and b
- Now s can send data at rate 2 b/s per receiver

Linear Network Coding
- Network coding is coding at layer three
- The coding is conducted over the finite field $F_u$, $u = 2^q$, so each coded symbol can be represented by q bits within an IP-layer frame
- The signal Y(j) on an outgoing link j of node v is a linear combination of the signals Y(l) on the incoming links l of v (we assume no process is generated at node v):

$$Y(j) = \sum_{l \in \{l \,:\, d(l) = v\}} \gamma_l \, Y(l)$$

Received Symbols
- $P_i$: i-th route from source to destination
- The source sends $\alpha$ over $P_i$:

$$y = \alpha \prod_{l \in P_i} \gamma_l = \alpha\, \beta_i(G), \qquad \gamma_l \in F_{2^q}$$

- Path NC coefficient: $\beta_i = \prod_{l \in P_i} \gamma_l$
- $\beta_i$ depends on the topology G, hence $\beta_i(G)$
- Example: $y = \gamma_1 \gamma_2\, \alpha = \beta_1(G)\, \alpha$
[Figure: network between S and D with link coefficients $\gamma_1, \dots, \gamma_5$]

Received Symbols: Linear Model
- $e_k$: one of the source's outgoing links
- $P_{e_k}$: collection of all paths between source and destination that start at $e_k$
- The source sends $\alpha_k$ over $e_k$. By superposition, the destination receives

$$y = \alpha_k \sum_{i=1}^{|P_{e_k}|} \prod_{l \in P_i} \gamma_l = \alpha_k \sum_{i=1}^{|P_{e_k}|} \beta_{i,e_k}(G)$$

- Example: $y = \alpha_1 (\gamma_1 \gamma_2 + \gamma_1 \gamma_3 \gamma_5) = \alpha_1 (\beta_{1,e_1} + \beta_{2,e_1})$
[Figure: paths $P_{e_1}$ from S to D through links with coefficients $\gamma_1, \dots, \gamma_5$]

Received Symbols: Linear Model
- The source sends symbols $\alpha_k$ over each $e_k$; using superposition once more,

$$y = \sum_{k=1}^{K} \alpha_k \sum_{i=1}^{|P_{e_k}|} \beta_{i,e_k}(G)$$

- In vector format: $y = \alpha^t \beta(G)$
- $\beta(G)$ is the total network coding vector

Received Symbols: Linear Model
- The source sends symbols in M successive
time slots:

$$y_{M \times 1} = A_{M \times N}\, \beta(G)_{N \times 1}, \qquad \beta(G)_{N \times 1} = \big[\, \underbrace{\beta_{1,e_1}, \beta_{2,e_1}, \dots, \beta_{N_1,e_1}}_{P_{e_1}},\ \underbrace{\beta_{1,e_2}, \dots, \beta_{N_2,e_2}}_{P_{e_2}},\ \dots,\ \underbrace{\dots, \beta_{N_K,e_K}}_{P_{e_K}} \,\big]^t$$

Link Failure Model
- If a link is severely congested, packets are significantly delayed and assumed lost at the destination
- We model the network with link l in the congestion state by its edge-deleted subgraph, denoted $G_l(V, E_l)$
[Figure: network between S and D with link coefficients $\gamma_1, \dots, \gamma_5$]

Link Failure Model
- The total network coding vector of $G_l(V, E_l)$, $\beta(G_l)$, is different from $\beta(G)$:

$$\beta_{i,e_k}(G_l) = \begin{cases} \beta_{i,e_k}(G) & \text{if } l \notin P_i \\ 0 & \text{o.w.} \end{cases}$$

- If the congested link does not belong to the i-th path from source to destination, $P_i$, it does not affect packets going through that path; otherwise the path coefficient is zero
- Example: $\beta_1(G) = \gamma_1 \gamma_2$, $\beta_2(G) = \gamma_4 \gamma_5$; with $l_1$ congested, $\beta_1(G_{l_1}) = 0$ and $\beta_2(G_{l_1}) = \beta_2(G)$
[Figure: S reaches D over source links e1 and e2; the congested link l1 lies on the first path]

Link Failure Model
- The training sequence is A
- $y^l$: vector of symbols observed at the destination in M time slots with link l congested:

$$y^{l}_{M \times 1} = A_{M \times N}\, \beta(G_l)_{N \times 1}$$

- Potential for identifying: the received symbols change uniquely in response to congestion of link l:

$$y_{M \times 1} \ne y^{l}_{M \times 1}, \qquad y^{l_1}_{M \times 1} \ne y^{l_2}_{M \times 1}$$

Example
[Figure: network with source links e1, e2 and links l1, l2, l3 between S and D, with its coding vector $\beta(G)$ and training matrix A]

Received symbols per time slot:

                   --   e1   e2   l1   l2   l3
  1st time slot     0    2    2    3    1    1
  2nd time slot     2    3    1    0    1    3

Theorem 1: Sufficient Conditions
- If Rank(A) = deg(S), and for every set $P_{e_k}$ of paths between source and destination starting at $e_k$

$$\sum_{j=1}^{|P_{e_k}|} c_j\, \beta_{j,e_k} = 0 \ \Rightarrow\ c_j = 0 \quad \forall j$$

  then

$$A \beta(G) \ne A \beta(G_l) \quad \forall\, l \in E, \qquad A \beta(G_{l_1}) \ne A \beta(G_{l_2}) \quad \forall\, l_1 \ne l_2 \in E$$

Theorem 1 (continued)
- The condition $\sum_{j=1}^{|P_{e_k}|} c_j\, \beta_{j,e_k} = 0 \Rightarrow c_j = 0 \ \forall j$ means: for a set of paths having $e_k$ in common, $P_{e_k}$, the NC coefficients of the paths are independent!
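The link failure model and the distinctness requirement of Theorem 1 can be sketched numerically. The Python example below is only an illustration under stated assumptions: the topology is inferred from the path products on the slides (paths $\gamma_1\gamma_2$, $\gamma_1\gamma_3\gamma_5$ and $\gamma_4\gamma_5$, with links 1 and 4 standing in for e1 and e2), while the field size q=3, the reduction polynomial, the link coefficients in `GAMMA` and the training matrix `A` are all hypothetical choices, not values from the slides.

```python
Q = 3               # bits per symbol (assumed)
MOD_POLY = 0b1011   # x^3 + x + 1, an irreducible polynomial for GF(2^3)

def gf_mul(a, b):
    """Multiply two elements of GF(2^Q) by shift-and-add with reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> Q:          # degree reached Q, reduce modulo MOD_POLY
            a ^= MOD_POLY
    return result

# Assumed topology: links 1 (e1) and 4 (e2) leave the source.
# Each path is the tuple of link ids it traverses.
PATHS = [(1, 2), (1, 3, 5), (4, 5)]
GAMMA = {1: 2, 2: 3, 3: 4, 4: 5, 5: 6}   # hypothetical link coefficients
# Hypothetical training matrix (M = 2 slots, N = 3 paths):
# slot 1 sends symbol 1 on e1 only, slot 2 sends symbol 1 on e2 only.
A = [[1, 1, 0],
     [0, 0, 1]]

def coding_vector(failed=None):
    """beta(G_l): a path coefficient is zero when the failed link is on the path."""
    beta = []
    for path in PATHS:
        coeff = 0 if failed in path else 1
        for link in path:
            coeff = gf_mul(coeff, GAMMA[link])
        beta.append(coeff)
    return beta

def signature(failed=None):
    """Received vector y = A * beta(G_l) over GF(2^Q); field addition is XOR."""
    beta = coding_vector(failed)
    received = []
    for row in A:
        acc = 0
        for a, b in zip(row, beta):
            acc ^= gf_mul(a, b)
        received.append(acc)
    return tuple(received)

# One signature for the healthy network (None) and one per congested link.
signatures = {link: signature(link) for link in [None, 1, 2, 3, 4, 5]}
one_identifiable = len(set(signatures.values())) == len(signatures)
print(signatures, one_identifiable)
```

With these assumed values the path coefficients are (6, 1, 3), the healthy network yields (7, 3), and every single congested link yields a different received vector, so the destination can tell which link failed. If the two coefficients of the paths sharing e1 were equal, congestion of link 1 and the healthy network would give different first symbols but links 2 and 3 would collide, which is the situation the independence condition rules out.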
[Figure: $\beta(G)$ partitioned into blocks $P_{e_1}, P_{e_2}, \dots, P_{e_K}$; the coefficients within each block are independent]
- Example: $\beta_{1,e_1} = \gamma_1 \gamma_2$ and $\beta_{2,e_1} = \gamma_1 \gamma_3 \gamma_5$ are independent; $\beta_{1,e_2} = \gamma_4 \gamma_5$
[Figure: paths $P_{e_1}$ and $P_{e_2}$ from S to D with link coefficients $\gamma_1, \dots, \gamma_5$]

Example
- Rank(A) = 2 = deg(S)
- The coefficients $\beta_{1,e_1}$ and $\beta_{2,e_1}$ of the paths sharing e1 are independent
[Figure: network with source links e1, e2 and links l1, l2, l3 between S and D, with its coding vector $\beta(G)$ and training matrix A]

Received symbols per time slot:

                   --   e1   e2   l1   l2   l3
  1st time slot     0    2    2    3    1    1
  2nd time slot     2    3    1    0    1    3

Complexity/Speed
- First condition of Theorem 1: Rank($A_{M \times N}$) = deg(S) implies $M \ge \deg(S)$
- In the previous example, M = 2 = deg(S)
- Number of time slots: at least the number of outgoing links of the source
- Is it possible to decrease the number of time slots, for faster monitoring?
- Possible by increasing the number of bits in the LNC coefficients, at the cost of more complexity

Example
- q = 3, A = [1 1 4], one time slot
[Figure: same network between S and D, with source links e1, e2 and links l1, l2, l3]

Received symbols:

                   --   e1   e2   l1   l2   l3
  1st time slot     6    4    2    5    7    1

Theorem 2: Complexity/Speed tradeoff
- $N_i = |P_{e_i}|$
- q bits per symbol are used in network coding
- M: number of (desired) time slots
- K: degree of the source (number of its outgoing links); let Z = {1, 2, ..., K}
- $Z_M$: collection of all partitions of Z of size M:

$$Z_M = \Big\{ \{H_1, H_2, \dots, H_M\} \;\Big|\; \bigcup_{i=1}^{M} H_i = Z,\ H_i \cap H_j = \emptyset \ (i \ne j) \Big\}$$

- Example: K=3, M=2, Z={1,2,3}: $Z_M$ = { {{1,2},{3}}, {{1,3},{2}}, {{2,3},{1}} }

Theorem 2: Complexity/speed tradeoff
- The network is 1-identifiable if Rank(A) = M and

$$q \ \ge\ \min_{\{H_i,\ i = 1, \dots, M\} \in Z_M}\ \max_{i}\ \sum_{j \in H_i} N_j$$

Theorem 3: Random LNC
- Random linear network coding is a distributed approach that achieves capacity asymptotically
- Intermediate nodes choose their NC coefficients uniformly from the elements of $F_u$ ($u = 2^q$)

$$P(G \text{ is 1-identifiable}) \ \ge\ 1 - |E|\,(|E| + 1) \left( \frac{1}{2^q} \right)^{M}$$

- Exponential increase with q (number of bits) and M (number of time slots)
- Quadratic decrease with the size of the network

Multi-source Multi-destination
- So far, we considered only a single source and a single destination
- Easily extendable to
multi-source, multi-destination

Simulation
- Simulation environment
  - OPNET 14.5
  - MATLAB 7.1 (finite field operations)
- Evaluation: University of Washington's Electrical Engineering network
  - Thirteen subnets
  - 3 backbone routers
  - Full-duplex Ethernet links

Simulation Set-Up
- Implementation of network coding (NC) within OPNET
- We employ network coding at the transport layer (instead of the IP layer), which is easier to implement
- The router model is modified to distinguish between non-NC and NC packets through a flag bit within the UDP header
  - NC packets are sent for separate processing
  - Non-NC packets are processed normally
- We assign a q-bit field, called the LNC field, within the TCP/UDP header for linear network coding
[Figure: UDP packet with the flag bit set to 1 followed by the LNC field]

RECEIVE/SEND interface
- Network coding inherently operates on unidirectional links
- Each interface within a router model is designated as a SEND or RECEIVE interface for network-coded packets only; it operates normally for non-network-coded packets
- Finite field operations are done in MATLAB, using the MATLAB API within OPNET
[Figure: RECEIVE/SEND interfaces]

Evaluation
[Figure: UW EE network]
[Figure: UW EE network lookup table]

Network Tomography: A Stochastic Model [1]
- The passage of probes can be modeled as two stochastic processes, $\{X_l(i)\}$ and $\{Z_l(i)\}$, for each node k
- $Z_l(i)$: time delay process of link k
- $X_l(i)$: the bookkeeping process, i.e. the cumulative probe delay from the root to k

[1] V. Arya, N. Duffield, D.
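Veitch, "Temporal Delay Tomography", IEEE INFOCOM 2008

Network Tomography: Stochastic Method
[Figure: link l' enters node k-1 and link l connects node k-1 to node k; $X_{l'}(i)$ is the cumulative delay up to k-1, $Z_l(i)$ the delay on l, and $X_l(i)$ the cumulative delay up to k]
- Discretize delay: D = {0, b, 2b, …, mb, ∞}, where mb is the delay threshold
- $X_l(i) = X_{l'}(i) + Z_l(i)$

$$\Pr[X_l(i) = d \mid X_{l'}(i) = v] = \begin{cases} 0 & \text{if } d < v \\ 1 & \text{if } d = v = \infty \\ \Pr[Z_l(i) = d - v] & \text{o.w.} \end{cases}$$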
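The bookkeeping recursion $X_l(i) = X_{l'}(i) + Z_l(i)$ on the discretized delay set can be sketched as follows. This is a minimal Python illustration, not the method of [1]: the function name `bookkeeping`, the sample delays, and the rule that any cumulative delay above the threshold mb is mapped to ∞ are assumptions made for the example.

```python
import math

def bookkeeping(link_delays, b, m):
    """Cumulative delay X at each node along one root-to-leaf path.

    link_delays: per-link delays Z, given as multiples of b (or math.inf).
    Cumulative delays above the threshold m*b are mapped to infinity
    (an assumed reading of the discretized set {0, b, ..., m*b, inf}).
    """
    threshold = m * b
    cumulative = 0
    trace = []
    for z in link_delays:
        cumulative = cumulative + z      # X_l(i) = X_l'(i) + Z_l(i)
        if cumulative > threshold:
            cumulative = math.inf        # beyond m*b: treated as infinite delay
        trace.append(cumulative)
    return trace

# Hypothetical probe crossing four links with bin width b = 10 and m = 3.
print(bookkeeping([10, 0, 20, 10], b=10, m=3))   # prints [10, 10, 30, inf]
```

Once the cumulative delay reaches ∞ it stays there for all downstream nodes, which matches the case d = v = ∞ having probability one.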