Resource Optimization in Hybrid Core Networks with 100G Links
Malathi Veeraraghavan and Zhenzhen Yan, University of Virginia
(Collaborator: Admela Jukan)
Date: Oct. 5-6, 2010
Sponsored by DOE ASCR grant DE-SC0002350

• Outline
  – HNTES System Architecture
  – HNTES Demo on Tabletop testbed
  – Details of OFAT: Offline Flow Analysis Tool
  – PerfSONAR OWAMP data analysis
  – Simulation study
  – Special issue of IEEE Communications Magazine
  – Summary

Two web sites
• External project web site:
  – http://www.ece.virginia.edu/mv/research/DOE09/index.html
  – Software, documents, links to papers posted
• Collaboration web site:
  – https://collab.itc.virginia.edu/portal/site/e121f110-7b37-4021-8ac14d61197c067a

Hybrid Network Traffic Engineering Software (HNTES)
• MFDB: Monitored Flow Data Base
• Some components can be centralized and the rest distributed

Components
• Offline Flow Analysis Tool (OFAT): statistical R programs to identify heavy-hitter flows
  – Most challenging component
  – Leverage “human knowledge” about large file transfer servers and applications
  – Populate MFDB
• Monitored flow data base (MFDB)
  – 5-tuple of monitored flow (not all fields are required for each flow): source IP address, destination IP address, protocol, source port, destination port
  – Status: Monitored, Redirected, Disabled
  – Bypass circuit endpoints: IP addresses and VLAN IDs
  – Circuit duration
  – Circuit rate

Components contd.
• Flow Monitoring Module
  – receives packets for flows in MFDB mirrored to it from the routers (mirroring pre-configured for MFDB flows)
  – initiates the reservation and provisioning of a circuit
  – initiates circuit release when packet flow “ends”
• IDC interface
  – interfaces with Inter-Domain Controller (IDC)
• IDC (not part of HNTES software)
  – sets up circuit on request
  – sets PBR route in IP router to redirect packets from default IP-routed path to newly established circuit
  – removes PBR entry when circuit is released

Components contd.
• MFDB-interface module
  – supports human and programmatic interface to MFDB
• Router control interface module
  – configures mirroring for MFDB flows

Four video files recording experiments
• Demonstration of the MFDB in MySQL and MFDB-interface modules
• Demonstration of the flow monitoring module (FMM)
  – Flow gets redirected
  – Deactivate interface to “prove” redirection
• IDCIM

OFAT output file
• After analysis of Netflow data, an ASCII file is output
• Format:
  srcIP          dstIP           srcport  dstport  prot
  140.90.192.0   129.186.248.0   20       -1       6
  165.112.0.0    128.193.216.0   20       -1       6
  165.112.0.0    169.230.88.0    20       -1       6
• -1 is used to indicate “don’t match”

MFDB and MFDB interface module on tabletop testbed
• Monitored Flow DataBase (MFDB) uses MySQL
• MySQL was installed by the ESnet team on Diskpt-1
• We implemented and tested MFDB on the Diskpt-1 installation
• FIRST DEMO VIDEO

FMM demo on the ANI Tabletop Testbed
[Figure: hosts Diskpt-1 and Diskpt-2, routers North-rt1 and South-rt1; components shown: BWdetail send, BWdetail recv, MFDB, FMM, tcpdump, interface A]
• Step 1a: Configure router to mirror packets destined to Diskpt-2 to interface A
• Step 1b: Enter BWdetail flow 5-tuple in MFDB
• Step 2: Start BWdetail (extended version of iperf)
• Step 3: When FMM receives packets, it checks the 5-tuple against the MFDB
• FMM DEMO VIDEO 1

Tabletop experiment
[Figure: testbed topology with hosts Diskpt-1 and Diskpt-2 and routers North-rt1 and South-rt1; components shown: BWdetail send, BWdetail recv, MFDB, tcpdump, FMM. Interface labels in the figure:]
  eth6      192.168.100.50
  xe-0/0/1  192.168.100.49
  eth1      192.168.255.50
  lo        172.16.0.17
  eth2      192.168.100.54
  xe-0/0/2  192.168.100.17
  xe-1/2/0  192.168.100.33
  xe-0/0/0  192.168.100.18
  xe-1/2/0  192.168.100.34
  ge-1/0/2  192.168.100.53
  GRE Tunnel endpoints: Diskpt-1:gtun(192.168.100.78), North-rt1:gr-0/3/0.0(192.168.100.77)
• FMM DEMO VIDEO 2

Four video files recording experiments
• Demonstration of the MFDB in MySQL and MFDB-interface modules
• Demonstration of the flow monitoring module (FMM)
  – Flow gets redirected
  – Deactivate interface to “prove” redirection
• IDCIM

IDCIM demo
• IDC Interface Module (IDCIM) runs on host zelda2 at UVA
• Started with OSCARS Java Client
• Connect to OSCARS test IDC server and subscribe for notifications
  – ./run.sh subscribe -repo conf/axis-tomcat -url https://oscarsdev.es.net/axis2/services/OSCARSNotify producer https://oscarsdev.es.net/axis2/services/OSCARS -consumer http://128.143.10.221:8070 -topics idc:INFO
• Automatic signaling

IDC-Interface Module (IDCIM) interaction with FMM and IDC
[Figure]

Threaded design
[Figure; labels: (C++), (Java)]

IDCIM demo
• Run client (submodule to be integrated into FMM)
• Send request for circuit from client to IDCIM
• IDCIM (Java) packages the request per IDCP and sends it to the IDC with a “now” request and automatic signaling
• When PathSetup is confirmed, IDCIM notifies client (FMM)
• IDCIM DEMO VIDEO

Outline check
• HNTES System Architecture
• HNTES Demo on Tabletop testbed
• Details of OFAT: Offline Flow Analysis Tool
• Discussion of what flows to redirect to circuits
• PerfSONAR OWAMP data analysis
• Elephant-vs-mice simulation study
• Special issue of IEEE Communications Magazine
• Summary

OFAT demonstration
• Not yet terminated
• http://www.ece.virginia.edu/mv/research/DOE09/index.html
• http://www.ece.virginia.edu/mv/research/DOE09/software/software.html

OFAT: Offline Flow Analysis Tool
• Design:
  – Download Netflow data from router
  – Use flow-export tools to get ASCII file
  – Shows 5-tuple, bytes, timestamps of first and last packet in flow
  – Statistical package R programs:
    • Find flow lengths and isolate flows of length 59 sec from each 5-min file (FindLongFlow.R)
    • Concatenate flows from all 5-minute files in one day (one week) (Concatenate.R)
  – Gaps (1-in-100 sampling): 5-minute gaps acceptable
  – “Definition” of “long flow”: >= 10 minutes
  – Output: all flows longer than 10 minutes

Methodology contd.
• Sort long flows by protocol number and save only TCP, GRE, ESP, and AH flows (ICMP and UDP removed), and print statistics (Protocol.R)
• Sort on IP protocol field and src and dst ports, and separate flows for different applications into different files (Port.R)
• Match long flow IDs and short flow IDs (LongShortMatch1.R and *2.R)
• Identify long flows that did not occur as short flows (AddFlowLength.R)
• Output: MFDB_Long_Only_Flowlength (5-tuple and duration; use -1 for don’t-care fields)

Data files and R files
• Data files from KANS I2 router, July 7, 2009: to demonstrate
  – AllLongFlow.txt
  – Statistics.txt
  – One example: rsync.txt
  – File used to populate MFDB: MFDB_Long_Only_Flowlength
• Example R files: FindLongFlow.R, Concatenate.R

Three-track approach to understanding long flows
• I. Netflow data (analysis with R programs)
  – Long flows separated by apps
• II. Network requirements workshop reports (“human knowledge” mining)
  – IP addresses for scientific computing (data transfers) servers
• III. Understanding applications (with tcpdump, talking to developers: SCP, SFTP, GridFTP, BBCP)
• Goal: Identify suitable candidate flows for the MFDB

Track I: Netflow analysis
• CHIC and LOSA routers of Internet2
• One-day data analysis
• 5-day (Mon-Fri) analysis

Unidata one-day (HYNES)
  srcIP          dstIP     srcPort  dstPort  protocol  firstunix   lastunix    flowlength
  128.117.136.0  35.8.8.0  388      38650    6         1246923494  1246923553  59.00099993
  128.117.136.0  35.8.8.0  388      38650    6         1246923734  1246923794  59.53600001
  128.117.136.0  35.8.8.0  388      38650    6         1246923794  1246923854  59.56599998
  128.117.136.0  35.8.8.0  388      38650    6         1246923854  1246923914  59.69000006
  128.117.136.0  35.8.8.0  388      38650    6         1246923914  1246923973  59.398
  128.117.136.0  35.8.8.0  388      38650    6         1246923974  1246924034  59.648
  128.117.136.0  35.8.8.0  388      38650    6         1246924154  1246924214  59.80200005
  128.117.136.0  35.8.8.0  388      38650    6         1246924214  1246924274  59.68599987
  128.117.136.0  35.8.8.0  388      38650    6         1246924274  1246924334  59.61099982
  128.117.136.0  35.8.8.0  388      38650    6         1246924334  1246924394  59.45600009
  128.117.136.0  35.8.8.0  388      38650    6         1246924394  1246924454  59.48799992
  128.117.136.0  35.8.8.0  388      38650    6         1246924455  1246924514  59.08899999
  128.117.136.0  35.8.8.0  388      38650    6         1246924514  1246924573  59.00699997
  128.117.136.0  35.8.8.0  388      38650    6         1246924574  1246924633  59.36100006
• 14 minutes
• Between NCAR and Michigan State University

Top ten fat flows in one day
  bytes      srcIP          dstIP          srcport  dstport  protocol  firstunix   lastunix    flow length
  542582720  198.108.24.0   204.228.64.0   0        0        50        1246879655  1246891060  11405.383
  210132404  131.225.192.0  129.114.48.0   45677    22       6         1246879655  1246890520  10865.413
  186519604  128.135.64.0   131.142.152.0  22       58942    6         1246891541  1246893460  1919.008
  165567747  198.32.8.0     198.32.8.0     0        0        47        1246874013  1246874133  119.8869998
  146416660  208.100.88.0   141.142.24.0   43094    22       6         1246861049  1246868372  7322.738
  127799228  208.100.88.0   141.142.24.0   43094    22       6         1246882716  1246890100  7384.865
  113470332  128.117.136.0  128.255.24.0   388      42707    6         1246861049  1246879356  18306.468
  106577448  198.108.24.0   204.228.64.0   0        0        50        1246921790  1246923291  1500.479
  101287624  131.225.192.0  129.114.48.0   45677    22       6         1246873833  1246878456  4622.681
  91662040   152.46.0.0     128.255.56.0   873      1934     6         1246861049  1246869092  8042.826
• encapsulated: 3; ssh: 5; Unidata: 1; rsync: 1
• 2 long ssh flows to the Texas Advanced Computing Center at the University of Texas at Austin (129.114.48.0) from Fermilab (131.225.192.0).
• 141.142.24.0 corresponds to NCSA (National Center for Supercomputing Applications) for two other ssh flows.
• The Unidata LDM flow is from NCAR (National Center for Atmospheric Research) with address 128.117.136.0.
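
The concatenation step described for OFAT (59-second records from successive 5-minute files joined into one flow, gaps of up to 5 minutes tolerated, "long flow" meaning at least 10 minutes) can be sketched roughly as below. This is a minimal Python sketch under stated assumptions, not the project's R code (Concatenate.R): the function name `concatenate_flows`, the record layout, and the choice to measure duration as last timestamp minus first timestamp are all hypothetical. The sample data are the 14 Unidata records listed above.

```python
from collections import defaultdict

GAP_THRESHOLD = 5 * 60   # seconds: gaps of up to 5 minutes are tolerated
LONG_FLOW_MIN = 10 * 60  # seconds: a "long flow" lasts at least 10 minutes


def concatenate_flows(records, gap=GAP_THRESHOLD, min_len=LONG_FLOW_MIN):
    """Join records sharing a 5-tuple and return only the long flows.

    records: iterable of (five_tuple, first_unix, last_unix).
    Returns a list of (five_tuple, first_unix, last_unix, duration_sec).
    Assumption: duration is last timestamp minus first timestamp.
    """
    by_flow = defaultdict(list)
    for five_tuple, first, last in records:
        by_flow[five_tuple].append((first, last))

    long_flows = []
    for five_tuple, spans in by_flow.items():
        spans.sort()
        merged = []
        start, end = spans[0]
        for first, last in spans[1:]:
            if first - end <= gap:
                end = max(end, last)         # same flow continues
            else:
                merged.append((start, end))  # gap too large: close the flow
                start, end = first, last
        merged.append((start, end))
        for start, end in merged:
            if end - start >= min_len:
                long_flows.append((five_tuple, start, end, end - start))
    return long_flows


# 5-tuple and timestamps copied from the Unidata one-day table
UNIDATA = ("128.117.136.0", "35.8.8.0", 388, 38650, 6)
TIMES = [
    (1246923494, 1246923553), (1246923734, 1246923794),
    (1246923794, 1246923854), (1246923854, 1246923914),
    (1246923914, 1246923973), (1246923974, 1246924034),
    (1246924154, 1246924214), (1246924214, 1246924274),
    (1246924274, 1246924334), (1246924334, 1246924394),
    (1246924394, 1246924454), (1246924455, 1246924514),
    (1246924514, 1246924573), (1246924574, 1246924633),
]
records = [(UNIDATA, first, last) for first, last in TIMES]
flows = concatenate_flows(records)
for five_tuple, first, last, duration in flows:
    print(five_tuple, first, last, duration)
```

With these records the sketch reports a single flow spanning 1139 seconds from first to last timestamp. The slide's "14 minutes" figure may be computed differently (for example, by counting the one-minute records), so the duration definition here should be treated as an assumption.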

Per-day data for a five-weekday period (July 6-10, 2009)
  CHIC
  date     longest flow               fattest flow
  July 06  18306 (113470332 bytes)    542582720 (11405 seconds)
  July 07  25326 (112768756 bytes)    867174480 (10443 seconds)
  July 08  21307 (80492044 bytes)     185912032 (4562 seconds)
  July 09  15364 (30196708 bytes)     241363080 (3360 seconds)
  July 10  23825 (1544080 bytes)      310908448 (779 seconds)

  LOSA
  date     longest flow               fattest flow
  July 06  26882 (58425675 bytes)     187357504 (20402 seconds)
  July 07  23220 (216722816 bytes)    216722816 (23220 seconds)
  July 08  18004 (26475856 bytes)     349049492 (5223 seconds)
  July 09  19986 (40811386 bytes)     385387604 (7201 seconds)
  July 10  15964 (142172096 bytes)    164962944 (2041 seconds)
• Fattest data: remember it is 1-in-100 sampled data
• 26882 sec = 7.5 hours

Data for a five-weekday period (July 6-10, 2009)
• CHIC
  – Number of flows: 841062272
  – Number (%) of long flows: 35632 (0.00424%)
  – Total number of bytes: 3.11946E+12
  – Number and % of bytes in long flows: 1.111436E+11 (3.563%)
  – Number of long flows of different types: 33241 (TCP), 0 (IP), 260 (GRE), 211 (ESP), 0 (AH)
  – Number of long flows of different applications (without counting long ACK flows): 447 (20: ftp), 3023 (22: ssh), 4 (25: smtp), 3078 (80: http), 63 (119: nntp), 1 (143: imap), 1690 (388: unidata), 142 (443: https), 25 (554: rtsp), 560 (873: rsync), 2545 (unassigned), 1134 (dynamic-and-private), SUM = 12712
  – Fattest flow (bytes): 867174480 (10443 seconds)
  – Longest flow (seconds): 25326 (112768756 bytes)
• LOSA
  – Number of flows: 268933244
  – Number (%) of long flows: 32660 (0.012%)
  – Total number of bytes: 1.17578E+12
  – Number and % of bytes in long flows: 101783460037 (8.66%)
  – Number of long flows of different types: 26618 (TCP), 67 (IP), 206 (GRE), 320 (ESP), 7 (AH)
  – Number of long flows of different applications (without counting long ACK flows): 1202 (20: ftp), 3109 (22: ssh), 8 (25: smtp), 3528 (80: http), 1 (119: nntp), 0 (143: imap), 426 (388: unidata), 307 (443: https), 15 (554: rtsp), 570 (873: rsync), 1068 (unassigned), 574 (dynamic-and-private), SUM = 10808
  – Fattest flow (bytes): 1779002612 (7381 seconds)
  – Longest flow (seconds): 26882 (58425675 bytes)

Repeat customers: ssh long flows
• Number of ssh long flows that occurred on multiple days in the 5-day period (candidates for MFDB):
  Router  2 days               3 days           4 days      5 days
  LOSA    59+60+55+55 = 229    38+42+37 = 117   32+30 = 62  28
  CHIC    62+67+54+53 = 236    41+32+33 = 106   20+23 = 43  17

Findings from track II
• Teragrid servers: 15 sites (server names and IP addresses found)
• ESG data grid servers found
• So far: NP and BES reports studied
• Number of servers found so far: 51 (BES: single) + 15 (BES: ranges) + 39 (NP)
• Some IP address ranges used for participating institutions

“Match” rate between Track I and Track II
        CHIC             LOSA
  NP    10060 (29.84%)   1128 (4.14%)
  BES   3706 (9.12%)     1254 (4.6%)
• Percent of flows for which the src or dst IP address matches one of the server addresses found from the Track II study of science projects
• Number of long flows in CHIC is 33717
• Number of long flows in LOSA is 27279

ESnet Netflow data analysis
• Sent OFAT R programs to Chris Tracy
• Chris ran these programs on one day’s data from a “busy” ESnet router
• Here are preliminary results

Preliminary ESnet results
• Took 1.5 hours to analyze one day’s data from busy ESnet router on Aug. 17, 2010
• Used gap threshold of 5 minutes and 10 minutes as “long flow” definition
• LongFlow.txt: 69403 (59 sec: before concatenation)
• All_Long_Flows: 157 (after concatenation)
• MFDB_Long_Only_Flowlength: 24
• Duration in this “Long Flow Only” file:
  – 659 (min) to 9723 (max)
• Can use All Long Flows and have HNTES wait to see multiple packets before triggering circuit setup (to check that it is not a short flow)

ESnet busy router one-day Statistics
• Number of long flows: 193 (157+36 non-terminated)
• Bytes in long flows: 1101995002 (~1TB)
• Number of flows: 3339659
• Bytes in all flows: 57901985032 (~50TB)
• Protocol Statistics: TCP (155), UDP (2), IP, GRE, AH, ESP (all 0)
• Port Statistics: FTP, SMTP, HTTPS, NNTP, IMAP, RTSP, rsync (all 0), ssh (32), HTTP (2), Unidata (5), Unassigned (9), Dynamic_private (4)

Question: “What kinds of flows should we want to move to SDN”
• Joe Metzger: “For the most part, it isn't worth it to us to touch anything less than 100Mbps, or possibly even less than 500Mbps. To some extent, it depends on duration. If somebody is going to nail up 100Mbps flows for months, yes, we would probably want to move that to a circuit. But if it is a 100Mbps flow that lasts a few hours, it isn't a big concern. However, if it is 100 100Mbps parallel flows between a group of hosts in one location, and a group of hosts at another location -- then yes, we would want to put that traffic onto an SDN circuit. Most of the very large flows that have already been moved to SDN are in the 1-10Gbps range and lasting for hours, for example:”
  – https://stats1.es.net/graphite/render/?width=586&height=303&_salt=1282161064.328&target=fnal-mr2.interface.xe7_0_0%403503.out&from=10%3A30_20100817&until=20%3A30_20100817
• Chris or Joe: Think of a curve where one axis is bandwidth and the other axis is time. We could define points, such that if a flow (or group of flows) falls below it -- we don't worry about moving it to SDN.
  But if it is above some threshold, we do want to move it...

Next steps
• Rethinking definition of heavy-hitter
  – Our focus was on duration owing to long circuit setup delay
• Lan and Heidemann 2006 paper:
  – Elephants vs mice (Size: number of bytes)
  – Cheetah vs snail (Rate)
  – Tortoises vs dragonfly (Duration)
  – Porcupine vs stingrays (Burstiness)
• Two dimensions
  – Size, rate, burstiness dimension of “heavy hitter”
  – Group flows together instead of single flows

PerfSONAR OWAMP data analysis

PerfSONAR OWAMP data analysis
• One-way Active Measurement Protocol (OWAMP)
• Packet interval: 0.1 sec
  – 10 packets per sec
  – 600 packets per minute
• Use perl programs provided by perfSONAR
• Sample columns of the OWAMP data file:
  – endTime loss maxError max_delay min_delay sent startTime

PerfSONAR OWAMP data analysis
• One day’s data from 23:00 Sep 14 to 23:00 Sep 15, 2010 for ELPA to BOIS
• minDelay ≈ 15ms
• Inter-Quartile Range (IQR) of maxDelay:

PerfSONAR OWAMP data analysis
• Max delay plot:
  – ELPA-BOIS
  – ALBU-DENV
• Overlapping paths
• Appears to be due to traffic, not host issues

Simulation study of heavy hitter flows
• ns2 simulations ongoing
• Create background load at varying levels of utilization
• Run heavy-hitter flow (size, rate, burstiness)
• Experiment 1: Delay impact
  – Run RTP/UDP flow
  – Measure and quantify delay impact on RTP/UDP flow at different levels of utilization
• Experiment 2: Fairness impact
  – Run mice transfers (size)
  – Raj Jain fairness ratio: throughput, response time/service time

Special issue of IEEE Communications Magazine
• Topic:
  – Hybrid Networking: Evolution Towards Combined IP Services and Dynamic Circuit Network Capabilities
• Tentative schedule:
  – Manuscripts due: Nov 1, 2010
  – Acceptance notification: Jan 15, 2011
  – Tentative Issue of the Feature Topic: May 2011
• Guest editors
  – Admela Jukan, Technische Universität Carolo-Wilhelmina zu Braunschweig
  – Malathi Veeraraghavan, University of Virginia
  – Masum Hasan, Cisco Systems
• http://dl.comsoc.org/ci1/info/cfp/cfpcommag0511.htm

Summary
• Project has several parallel tracks
  – HNTES software development
  – Testing software on ANI Tabletop testbed
  – Netflow data analysis to quantify “value” of redirection
  – Simulation study to identify best candidate flows for redirection
  – PerfSONAR OWAMP data analysis to characterize delay distribution across ESnet paths
  – IEEE Communications Magazine special issue
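
The "Raj Jain fairness ratio" named for Experiment 2 of the simulation study is commonly computed as Jain's fairness index, J = (sum of x_i)^2 / (n * sum of x_i^2), which equals 1 when all n flows receive equal allocations and 1/n when a single flow receives everything. The Python sketch below illustrates that formula only; the function name and the sample throughput values are hypothetical and are not results from the ns2 study.

```python
def jain_fairness_index(values):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), in the range (0, 1]."""
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    total = sum(values)
    squares = sum(v * v for v in values)
    if squares == 0:
        raise ValueError("all values are zero")
    return total * total / (len(values) * squares)


# Hypothetical per-flow throughputs in Mbps (illustrative values only)
equal_share = [10.0, 10.0, 10.0, 10.0]
one_hog = [40.0, 0.0, 0.0, 0.0]

print(jain_fairness_index(equal_share))  # 1.0
print(jain_fairness_index(one_hog))      # 0.25
```

The same function could be applied to either metric the slide lists (throughput, or the ratio of response time to service time), one value per mice transfer, at each background utilization level.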