UNSW School of Electrical Engineering and Telecommunications

Opportunistic Flow-Level Latency Estimation Using Consistent NetFlow

Group 4: Garnsey, Dennis; Kang, Kang; Liu, Weiming; Xu, Yang; Lin, Shijie; Chen, Zhouyuan

Summary

This paper presents a study of the use of NetFlow timestamps to estimate network latency and discusses ways to retrofit latency measurement to existing networks using NetFlow. Points covered include:
• Hash-based sampling to provide consistent NetFlow
• Opportunistic latency estimation: using shorter flows to estimate the average latency and standard deviation of longer flows
• NetFlow is used for fault, performance and security management

What is latency and why is it important?

• Delay in data propagation introduced by links and network devices
• Caused by
  – The speed of light
  – Switching and processing
  – Queuing and shaping
• Typical latencies
  – Within Sydney: 10 milliseconds
  – Within Australia: 30-100 milliseconds
  – Australia to the US: 200 milliseconds
From http://www.akamai.com/html/technology/dataviz2.html

In 2009 Google announced that the next major update to the PageRank algorithm (search result indexing) would start taking page load (response) time into account.

What is a NetFlow record and why is it useful?

• NetFlow was originally a routing technology
  – Routers swap the destination MAC address on packets and forward them to the egress port
  – NetFlow cached the destination MAC and egress port to speed up routing
  – Superseded by other routing technologies (hardware rather than CPU based)
  – Still needed when the routing decision requires the CPU (e.g.
ACLs)
• NetFlow records are still kept in the router
• Exporting them to a collector provides valuable information about network traffic patterns
• Records contain network and application information that would otherwise require RMON

NetFlow V5 record fields: Source IP address; Destination IP address; IP address of next hop router; SNMP index of input interface; SNMP index of output interface; Packets in the flow; Bytes in the packets of the flow; SysUptime at start of flow; SysUptime at the time the last packet of the flow was received; Source Port; Destination Port; Unused (zero); TCP flags; IP protocol type; ToS; Source AS; Destination AS; Src. Mask; Dst. Mask; Unused (zero)

What is consistent NetFlow?

• Problem
  – How do we correlate a NetFlow record from one router with a NetFlow record from another router?
• Issues
  – Time synchronisation of routers may not be accurate, so timestamps do not match
  – Packet loss
  – Cache expiry due to load
  – Different sampling across routers (may be random)
• Solution
  – Hash-based sampling

Consistent NetFlow

• Sample packets at every link
  – Pseudo-random sampling (e.g., 1-out-of-100)
  – Compute a hash over the invariant fields (same on each hop) of the packet
  – A packet is selected for reporting if the hash falls within a given range
  – All routers use the same hash, input fields and selection range
    » Result is consistent flow selection
• Details of consistent sampling
  – x: subset of invariant bits in the packet
  – Hash function: h(x) = x mod A
  – Sample if h(x) < r, where r/A is the thinning factor

People and Standards

• Nick Duffield (AT&T Labs)
  – 2000-2002: Trajectory Sampling (hash-based sampling)
  – 2009: Co-author, RFC 5474
    PSAMP
  – 2012: Co-author, Opportunistic Flow-Level Latency Estimation Using Consistent NetFlow
• PSAMP/IPFIX (some overlap, but complementary)
  – IPFIX is the standardisation track for NetFlow export
    » Describes how IP flow information is to be formatted and transferred from an exporter to a collector
  – PSAMP is the standardisation track for packet sampling
    » Describes how network elements select subsets of packets by statistical and other methods, and export a stream of reports on the selected packets to a collector

“PSAMP selection operations include random selection, deterministic selection (Filtering), and deterministic approximations to random selection (Hash-based Selection).” - RFC 5474

Opportunistic latency estimation

• NetFlow records have timestamps: start of flow and end of flow
• Can we estimate average latency and standard deviation from flow timestamps?
• Opportunistic: measure the latency of shorter flows that occur during the same time frame as a longer flow, and interpolate the packet delay within the longer flow

Basic knowledge

Prerequisites: this approach relies on two basic assumptions:
1. Time synchronisation: the fundamental requirement for accurate one-way delay measurements.
2. Packet forwarding order: the stream of packets follows a serial (FIFO) order.

Flow correlation: two approaches to associating flow records:
• Mapping a packet label to a timestamp
• Timing checks to eliminate inconsistencies

Delay correlation (central premise): when two packets traverse a link closely separated in time, the queuing delays they experience are positively correlated.
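
The selection rule from the slides (h(x) = x mod A, sample if h(x) < r) and the flow-correlation step with its timing check can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the choice of invariant fields, the use of CRC32 to turn them into an integer x, the values of A and R, and the key-to-timestamp record layout are all hypothetical.

```python
import zlib

A = 1000  # hash modulus (assumed value)
R = 10    # selection threshold; R / A = 1% thinning factor (assumed value)


def invariant_key(src_ip, dst_ip, src_port, dst_port, proto, ip_id):
    """Build the integer x from fields that stay the same on every hop.

    The field choice and the CRC32 digest are assumptions for illustration.
    """
    text = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}|{ip_id}"
    return zlib.crc32(text.encode("ascii"))


def is_sampled(x, a=A, r=R):
    """Slide rule: h(x) = x mod A; select the packet if h(x) < r."""
    return (x % a) < r


def one_way_delays(router1, router2):
    """Match records seen at both routers and subtract their timestamps.

    router1 and router2 map key -> timestamp in seconds (hypothetical layout).
    Assumes synchronised clocks and that router2 is downstream of router1.
    """
    delays = {}
    for key, t1 in router1.items():
        t2 = router2.get(key)
        # Timing check: drop pairs whose order contradicts the path direction.
        if t2 is not None and t2 >= t1:
            delays[key] = t2 - t1
    return delays
```

Because every router evaluates the same function on the same invariant fields with the same A and R, a packet selected at one hop is selected at every hop, which is what makes the records comparable across routers.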
Latency Estimation

Variance and Its Estimation

Interpolation of Packet Delays
Terms in the interpolation formula: the delay we estimate; the closest delay in the past; the delay difference of two known packets; the time difference of two known packets.

Evaluation
• Estimator accuracy
  – Comparison to active probes
  – Accuracy with respect to flow duration
  – Comparison to interpolation and trajectory sampling

Sampling and Loss Rate Variation
Two variables control the effective number of sampled packets: the packet sampling rate and the loss rate.

1. Impact of sampling rate
As shown above, relative errors decrease as the packet sampling rate increases.

2. Impact of packet loss rate
Relative error decreases as the packet loss rate increases when using the Multiflow estimator and the WISC traces. The three traces WISC-R1, -R2 and -R3 have small (0.01%), medium (0.12%) and high (4.59%) packet loss rates, respectively.

Accuracy of Standard Deviation Estimates
An increase in the packet loss rate reduces the relative error of the standard deviation of flow-level latency. However, the Endpoint estimates cannot be trusted as much as the Multiflow estimates because of their poor accuracy. The WISC traces show the same trend, but the improvement in the accuracy of the standard deviation estimates across traces is smaller than the improvement in mean estimation accuracy.
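
The formula on the interpolation slide did not survive in the text, but its labels (closest delay in the past, delay difference and time difference of two known packets) are consistent with plain linear interpolation. The sketch below is one plausible reading under that assumption, not the paper's estimator; the function names and the (time, delay) sample layout are hypothetical. It also shows how a flow-level mean, standard deviation and relative error could be computed from the interpolated values.

```python
from bisect import bisect_right
from statistics import fmean, pstdev


def interpolate_delay(t, t_prev, d_prev, t_next, d_next):
    """Linear interpolation between two known (time, delay) samples."""
    if t_next == t_prev:
        return d_prev
    slope = (d_next - d_prev) / (t_next - t_prev)  # delay diff / time diff
    return d_prev + slope * (t - t_prev)           # closest past delay + change


def flow_delay_stats(packet_times, samples):
    """Estimate mean and standard deviation of delay for one longer flow.

    samples: (time, delay) pairs from shorter reference flows, sorted by time.
    Packets before the first sample, or at or after the last one, are skipped.
    Returns None when no packet can be bracketed by two samples.
    """
    times = [s[0] for s in samples]
    estimates = []
    for t in packet_times:
        i = bisect_right(times, t)  # index of first sample strictly after t
        if i == 0 or i == len(samples):
            continue
        (t0, d0), (t1, d1) = samples[i - 1], samples[i]
        estimates.append(interpolate_delay(t, t0, d0, t1, d1))
    if not estimates:
        return None
    return fmean(estimates), pstdev(estimates)


def relative_error(estimate, truth):
    """Relative error of an estimate against a known true value."""
    return abs(estimate - truth) / abs(truth)
```

With denser samples (a higher sampling rate), each packet is bracketed by reference delays that are closer in time, which is the intuition behind relying on delay correlation.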
Conclusion

• Problem being solved
  – NetFlow timestamps should be usable as measurement data for network latency
• Proposal
  – Use hash-based sampling for consistent NetFlow
  – Opportunistic latency estimation: use timestamps from shorter flows to estimate the average and standard deviation of latency within a longer flow
• Experimental evaluation
  – Uses real and synthetic data, and real and theoretical delay modelling
  – Checks the accuracy of hash-based NetFlow
  – Checks the accuracy of the estimators: endpoint, multiflow and hybrid
  – Compares with real data and alternative estimators (trajectory sampling)

Results

• Estimator accuracy
  – For packet sampling, the multiflow estimator was more accurate than either the endpoint estimator or active probes.
  – For flow sampling, the endpoint estimator was more accurate.
  – Over a range of flow sizes, endpoint performs better up to about size 3-4, and then its accuracy decreases.
  – The flow-sampled data contained a large number of small flows, which is why the endpoint estimator was more accurate there.

Criticism

• NetFlow (NF) records are useful for network management, but problems not addressed here are
  – NF is resource intensive
  – Resources used by NF could be needed for data traffic
• Some network management systems (Riverbed, Tenable) correlate NF records, but not in the deterministic manner proposed here.
• Given that this approach still relies on sampling, NF will not replace Wireshark and network sniffing as the network management tool for packet capture, and SNMP will still be used for lower-level utilisation reporting. NF sits midway between the two.
• PSAMP is not commercially available yet (to my knowledge), so this approach is still evolving
• The utility of per-flow average delays and standard deviations is not clear and may not be known until it becomes commercially available (if ever).

Supplementary slides

Introduction: gathering information from the network

Network monitoring by layer:
• Switch layer: SNMP packet counters on interfaces
• IP layer: NetFlow tables in CPU
• Applications: SNMP tcp conn entries on end hosts

SNMP interface counters
Interface | Packets In | Packets Out | Bytes In | Bytes Out
E1 | 36787 | 47856 | 786302 | 789309

SNMP tcpConnEntry
tcpConnState | tcpConnLocAddr | tcpConnLocPort | tcpConnRmtAddr | tcpConnRmtPort
estab | 167.8.15.92 | 227 | 176.15.53.216 | 228
estab | 167.8.15.92 | 235 | 176.15.53.216 | 240
closing | 167.8.15.92 | 236 | 178.67.124.15 | 196
estab | 167.8.15.92 | 244 | 181.33.16.4 | 227

NetFlow records
Key fields (packet 2):
• Source IP: 167.8.15.92
• Destination IP: 176.15.53.216
• Source port: 227
• Destination port: 228
• Layer 3 protocol: TCP (6)
• TOS byte: 0
• Input interface: Ethernet 0

Source IP | Dest IP | Dest I/F | Protocol | TOS | … | Pkts
167.8.15.92 | 176.15.53.216 | E1 | 6 | 0 | … | 11000
167.8.15.92 | 176.15.53.216 | E1 | 6 | 0 | … | 11000

NetFlow

• NetFlow n-tuple may include
  – Flow usage counters
  – Start time and end time
  – Interfaces used
  – QoS flags
  – IP addresses
  – Application ports
  – Routing information

NetFlow V5 Header (bit columns 0-31)
Bytes 0-3: NetFlow Version | Flow Record Count (1-30)
Bytes 4-7: SysUptime of the export device booted
Bytes 8-11: Current count of seconds since 0000 UTC 1970
Bytes 12-15:
Residual nanoseconds since 0000 UTC 1970
Bytes 16-19: Sequence counter of total flows seen
Bytes 20-23: engine_type | engine_id | Unused (zero)
Format of NetFlow V.5 Header. From http://www.plixer.com/support/netflow_v5.html

NetFlow V5 Flow Record (bit columns 0-31)
Bytes 0-3: Source IP address
Bytes 4-7: Destination IP address
Bytes 8-11: IP address of next hop router
Bytes 12-15: SNMP index of input interface | SNMP index of output interface
Bytes 16-19: Packets in the flow
Bytes 20-23: Bytes in the packets of the flow
Bytes 24-27: SysUptime at start of flow
Bytes 28-31: SysUptime at the time the last packet of the flow was received
Bytes 32-35: Source Port | Destination Port
Bytes 36-39: Unused (zero) | TCP flags | IP protocol type | ToS
Bytes 40-43: Source AS | Destination AS
Bytes 44-47: Src. Mask | Dst. Mask | Unused (zero)
Format of NetFlow V.5 Flow Record. See http://www.plixer.com/support/netflow_v5.html

NetFlow V9 Template

• The distinguishing feature of the NetFlow Version 9 format is that it is template based. Templates provide an extensible design to the record format to allow future enhancements to NetFlow services without requiring changes to the basic flow-record format.

Export packet layout: Packet Header | Template FlowSet | Data FlowSet | Data FlowSet | ... | Template FlowSet | Data FlowSet | ...

Template FlowSet (each field occupies bits 0-15):
flowset_id = 0
length
template_id
field_count
field_1_type
field_1_length
field_2_type
field_2_length
field_3_type
field_3_length
...
field_N_type
field_N_length
template_id
field_count
field_1_type
field_1_length
...
field_N_type
field_N_length
Format of NetFlow V.9 Template

NetFlow V9 Header (bit columns 0-31)
Bytes 0-3: NetFlow Version | Flow Record Count (1-30)
Bytes 4-7: SysUptime (time since the export device booted)
Bytes 8-11: Current count of seconds since 0000 UTC 1970
Bytes 12-15: Sequence counter of all export packets sent by the export device.
Note: This is a change from the Version 5 and Version 8 headers, where this number represented “total flows.”
Bytes 16-19: A 32-bit value that is used to guarantee uniqueness for all flows exported from a particular device. (The Source ID field is the equivalent of the engine type and engine ID fields found in the NetFlow Version 5 and Version 8 headers.)
Format of NetFlow V.9 Header. From http://www.plixer.com/support/netflow_v9.html

NetFlow V9 Flow Record

• 87 fields possible, too many to fit on a slide

Field Type | Value | Length (bytes) | Description
IN_BYTES | 1 | N (default is 4) | Incoming counter with length N x 8 bits for the number of bytes associated with an IP Flow
IN_PKTS | 2 | N (default is 4) | Incoming counter with length N x 8 bits for the number of packets associated with an IP Flow
FLOWS | 3 | N (default is 4) | Number of flows that were aggregated
... | ... | ... | ...
LAST_SWITCHED | 21 | 4 | System uptime at which the last packet of this flow was switched
FIRST_SWITCHED | 22 | 4 | System uptime at which the first packet of this flow was switched
... | ... | ... | ...
Partial format of NetFlow V.9 Flow Record. From http://www.plixer.com/support/netflow_v9.html

Consistent NetFlow

Router 1 Record and Router 2 Record (both records carry the same fields; slide annotations are shown in brackets):
• source IP address [Correlate]
• destination IP address
• source TCP/UDP application port
• destination TCP/UDP application port
• next hop router IP address
• input physical interface index
• output physical interface index
• packet count for this flow
• byte count for this flow
• start of flow timestamp [Synchronise]
• end of flow timestamp
• IP Protocol (for example, TCP=6; UDP=17)
• Type of Service (ToS) byte
• TCP Flags (cumulative OR of TCP flags)
• source AS number
• destination AS number
• source subnet mask
• destination subnet mask
• flags (indicates, among other things, which flows are invalid)
• shortcut router IP address
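
The Router 1 / Router 2 correlation can be sketched for NetFlow V5 records using Python's standard `struct` module. This is an illustrative sketch only: the `FlowRecord` field names, the choice of fields in `flow_key`, and the helper `absolute_start` are assumptions made for the example, not something the slides prescribe. The 48-byte layout follows the V5 flow record format in the supplementary slides. Because the record timestamps are SysUptime values relative to each router's boot, they are converted to wall-clock time with the export header's SysUptime and seconds fields before records from two routers are compared.

```python
import socket
import struct
from collections import namedtuple

# 48-byte NetFlow V5 flow record in network byte order ("x" = unused pad byte).
V5_RECORD = struct.Struct("!4s4s4sHHIIIIHHxBBBHHBBxx")

FlowRecord = namedtuple(
    "FlowRecord",
    "src_ip dst_ip next_hop in_if out_if packets octets first_ms last_ms "
    "src_port dst_port tcp_flags proto tos src_as dst_as src_mask dst_mask",
)


def parse_v5_record(data):
    """Decode one 48-byte V5 flow record into a FlowRecord."""
    fields = list(V5_RECORD.unpack(data))
    for i in range(3):  # the first three fields are packed IPv4 addresses
        fields[i] = socket.inet_ntoa(fields[i])
    return FlowRecord(*fields)


def flow_key(rec):
    """Fields expected to match at both routers (assumed choice of key)."""
    return (rec.src_ip, rec.dst_ip, rec.src_port, rec.dst_port, rec.proto, rec.tos)


def absolute_start(rec, hdr_uptime_ms, hdr_unix_secs, hdr_unix_nsecs=0):
    """Convert the SysUptime start stamp to seconds since 0000 UTC 1970."""
    boot_time = hdr_unix_secs + hdr_unix_nsecs / 1e9 - hdr_uptime_ms / 1000.0
    return boot_time + rec.first_ms / 1000.0
```

Two records with equal `flow_key` values, one from each router, can then be compared by subtracting their `absolute_start` times, subject to the clock synchronisation assumption stated earlier in the slides.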