Faithful Reproduction of Network Experiments
Dimosthenis Pediaditakis, Charalampos Rotsos, Andrew W. Moore
firstname.lastname@cl.cam.ac.uk
Computer Laboratory, Systems Research Group
University of Cambridge, UK
http://selena-project.github.io
ANCS 2014, Marina del Rey, California, USA

Research on networked systems: Yesterday
[Figure: network topology with 1 GbE and 100 Mbps links]

Research on networked systems: Modern era
[Figure: network topology with a 40++ Gbps WAN link and 10 GbE / 1 GbE links]

Simulation (ns3): Too much abstraction
• Fat-Tree
• 8x clients, 12x switches
• 1 GbE links, 8 Gbps aggregate
• Ns3
  – Flat model
  – 2.75x lower throughput

Emulation (MiniNet): Poor scalability
• Identical experiment setup
• MiniNet
  – Out of CPU cycles
• 4.5x lower throughput
• Performance artifacts

Everything is a trade-off
Fidelity, Reproducibility, Scalability
• Simulation: Sacrifice fidelity
• Emulation: Sacrifice scalability
Reproducibility
• Natural for simulation
• Emulation
  – MiniNet is the pioneer
  – How to maintain across different platforms?

SELENA: Standing on the shoulders of giants
Table columns: SIMULATION, EMULATION, SELENA, HYBRID, TESTBEDS
Table rows: Reproducibility, Real Net Stacks, Unmodified App, Hardware Req., Scalability, Fidelity, Exec.
speed
• Fidelity: Emulation, Xen, real OS components
• Reproducibility: MiniNet approach
• Scalability: Time dilation (DieCast approach)
Full user control: Trade execution speed for fidelity and scalability

API and experimental workflow
[Figure: Experiment description, Python API, Selena compiler]

SELENA’s Emulation model over Xen
[Figure: Xen emulation model with Bridge and OVS components]

The concept of Time-Dilation
Real time: 1 tick = (1/C_Hz) seconds
10 Mbits of data over 6 ticks: rate_REAL = 10 / (6 * (1/C_Hz)) Mbps
2x dilated time (TDF = 2): (tick rate)/2 at C_Hz, OR tick rate at 2*C_Hz
Virtual time, 10 Mbits of data over 3 ticks: rate_VIRT = 10 / (3 * (1/C_Hz)) Mbps = 2 * rate_REAL
I command you to slow down

Scaling resources via Time Dilation
• STEP 1: Create a scenario
• STEP 2: Choose a time dilation factor (TDF)
  – Linear and symmetric scaling of all resources
    • Network, CPU, RAM BW, disk I/O
• STEP 3: Control independently the “perceived” available resources
  – Configure via SELENA’s API independently
    • CPU (Xen Credit2)
    • Network (Xen VIF QoS, netem)
    • Disk I/O (in guests via cgroups)

Xen PV-guest Time-Keeping
• Time
  – Wall clock time (epoch)
  – System time (boot)
  – Independent mode
• rdtsc modes of operation
  – Native
  – Emulated
• Scheduled timers
• Periodic timers
• Loop delays
[Figure labels: XEN Clock Source, TSC value, rdtsc, XEN VIRQ, VIRQ_TIMER, Hypervisor_set_timer_op (set next event), XEN Hypervisor]

Implementing Time-Dilation
[Figure labels: Linux Guest, Xen Hypervisor, TSC value]
• Trap – Emulate – scale “rdtsc”
• Native “rdtsc” (constant, invariant)
  - Start-of-day: dilated wallclock time
  - VCPU time:
_u.tsc_timestamp = tsc_stamp;
_u.system_time = system_time;
_u.tsc_to_system_mul = tsc_to_system_mul;
VCPUOP_set_singleshot_timer
set_timer(&v->singleshot_timer, dilatedTimeout);
Periodic VIRQ_TIMER implemented (but is not really used)

Summarizing the elements of Fidelity
• Resource scaling via time dilation
• Real Stacks and other OS components
• Real Applications
  – Including SDN controllers
• Realistic SDN switch models
  – Why is it important?
  – How does it affect performance?

OpenFlow Switch X-Ray
[Figure: Control App, Control App, Network OS, Control Channel, OF Agent, ASIC]
• Control application complexity
• Available capacity, synchronicity
• Scarce co-processor resources
• Switch OS scheduling is non-trivial
• ASIC driver affects how fast the policy is configured in the ASIC
• PCI bus capacity is limited in comparison to data plane
Control plane performance is critical for the data plane

Building an OpenFlow switch model
• Pica8 P-3290 switch
  – Measure message processing performance (OFLOPS)
  – Extract latency characteristics of:
    • flow table management
    • the packet interception / injection mechanism
    • counters extraction
• Configurable switch model
  – Replicate latency and loss characteristics
  – Implementation: Mirage-OS based switch
    • Flexible, functional, non-bloated code
    • Performance: uni-kernel
    • Small footprint: scalable emulations

Evaluation methodology
1. Run experiment on real hardware
2. Reproduce results in:
   1. MiniNet
   2. NS3
   3. SELENA (for various TDF)
3.
Compare against “real”

Throughput fidelity
MiniNet and Ns3
- 2.7Gbps and 5.2Gbps
SELENA
- 10x dilation: 99.5% accuracy
- executes 9x faster than Ns3

Latency fidelity
Setup
- 18 nodes, 1Gbps links, 10000 flows
MiniNet & Ns3 accuracy: 32% and 44%
Selena accuracy:
- 71% with 5x dilation
- 98.7% with 20x dilation

SDN Control plane Fidelity
[Figure: 1Mb TCP flows completion time, exponential arrival λ = 0.02]
Stepping behavior:
- TCP SYN & SYNACK loss
Mininet switch model:
- does not capture this throttling effect
The model is not capable of capturing transient switch OS scheduling effects.

Application fidelity (LAMP)
• Fat-Tree CLOS
  – 1 Gbps links
  – 10x switches
  – 4x Clients
  – 4x WebServers: Apache2, PHP, MySQL, Redis, Wordpress

A layered SDN controller hierarchy
[Figure: the layered control-plane architecture, with 1st Layer Controller and 2nd Layer Controller; other labels: How, Resilience]
4 pod, Fat-Tree topology, 1GbE links, 32 Gbps aggregate traffic
• More layers
  – Control decisions taken higher in the hierarchy
  – Flow setup latency increases
    • Network, Request pipelining, CPU load
Question: does a layered controller hierarchy affect performance?

Scalability analysis
[Figure: Bridge, OVS, Bridge]
• Fat-Tree topology, 1 GbE links, multi Gbit sink link
• Domain-0 is allocated 4 cores
  – Why does it top out at 250% CPU utilisation?
• Near linear scalability

How to (not) use SELENA
• SELENA is primarily a NETWORK emulation framework
  – Perfect match: network bound applications
  – Provides tuning knobs to experiment with:
    • CPU, disk I/O and Network relative performance
    • Real applications / SDN controllers / network stacks
• Time dilation is not a panacea
  – Device-specific Disk IO performance
  – Cache thrashing and data locality
  – Multi-core effects (e.g. per-core lock contention)
  – Hardware features (e.g. Intel DDIO)
  – Scheduling effects of Xen at scale (100s of VMs)
• Rule of thumb for choosing TDF
  – Low Dom-0 and Dom-U utilisation
  – Observation time-scales matter

Work in progress
• API compatibility with MiniNet
• Further improve scalability
  - Multi-machine emulation
  - Optimize guest-2-guest Xen communications
• Features and use cases
  – SDN coupling with workload consolidation
  – Emulation of live VM migration
  – Incorporate energy models

SELENA is free and open. Give it a try:
http://selena-project.github.io

Research on networked systems: past, present, future
• Animation: 3 examples of networks.
Examples will show the evolution of “network-characteristics” on which research is conducted:
  – Past: 2-3 Layers, Hierarchical, TOR, 100Mbps, bare metal OS
  – Present: Fat-tree, 1Gbps links, Virtualization, WAN links
  – Near future: Flexible architectures, 10Gbps, Elastic resource management, SDN controllers, OF switches, large scale (DC)
• The point of this slide is that real-world systems progress at a fast pace (complexity, size) but common tools have not kept up with this pace
• I will challenge the audience to think:
  – Which of the 3 examples of illustrated networks they believe they can model with existing tools
  – What level of fidelity (incl. Protocols, SDN, Apps, Net emulation)
  – What are the common sizes and link speeds they can model

A simple example with NS-3
• Here I will assume a simple star-topology
• 10x clients, 1x server, 1x switch (10Gbps aggregate)
• I will provide the throughput plot and explain why performance sucks
• Point out that NS3 is not appropriate for faster networks
• Simplicity of models + non real applications
• Using DCE: even slower, not fully POSIX-compliant

A simple example with MiniNet
• Same as before
• Throughput plot
• Better fidelity in terms of protocols, applications etc
  – Penalty in performance
• Explain what the bottleneck is, especially in relation to MiniNet’s implementation

Everything is a trade-off
• Nothing comes for free when it comes to modelling and the 3 key experimentation properties
• MiniNet aims for fidelity
  – Sacrifices scalability
• NS-3 aims for scalability (many abstractions)
  – Sacrifices fidelity, + scalability limitations
• The importance of Reproducibility
  – MiniNet is a pioneer
  – difficult to maintain from machine to machine
• MiniNet cannot guarantee that at the level of
performance, only at the level of configuration
[Figure: Fidelity, Reproducibility, Scalability]

SELENA: Standing on the shoulders of giants
• Fidelity: use Emulation
  – Unmodified apps and protocols: fidelity + usability
  – XEN: Support for common OS, good scalability, great control on resources
• Reproducible experiments
  – MiniNet approach, high-level experiment descriptions, automation
• Maintain fidelity under scale
  – DieCast approach: time dilation (will talk more later on that)
• The user is the MASTER:
  – Tuning knob: Experiment Execution speed

SELENA Architecture
• Animation here: 3 steps show how an experiment is
  – Specified (python API)
  – compiled
  – deployed
• Explain mappings of network entities-features to Xen emulation components
• Give hints of optimization tweaks we use under the hood
[Figure: Experiment description, Python API, Selena compiler]

Time Dilation and Reproducibility
• Explain how time dilation also FACILITATES reproducibility across different platforms
• Reproducibility
  – Replication of configuration
    • Network architecture, links, protocols
    • Applications
    • Traffic / workloads
    • How we do it in SELENA: Python API, XEN API
  – Reproduction of results and observed performance
    • Each platform should have enough resources to run the experiment faithfully
    • How we do it in SELENA: time dilation
  – An older platform/hardware will require a different minimum TDF to reproduce the same results

Demystifying Time-Dilation 1/3
• Explain the concept in high-level terms
  – Give a solid example with a timeline
    • Similar to slide 8: http://sysnet.ucsd.edu/projects/timedilation/nsdi06-tdf-talk.pdf
• Explain that everything happens at the H/V level
  – Guest time sandboxing
(experiment VMs)
  – Common time for kernel + user space
  – No modifications for PV guests
    • Linux, FreeBSD, ClickOS, OSv, Mirage

Demystifying Time-Dilation 2/3
• Here we explain the low-level stuff
• Give credits to DieCast, but also explain the incremental work we did
• Best to show/explain with an animation

Demystifying Time-Dilation 3/3
• Resources scaling
  – Linear and symmetric scaling for Network, CPU, RAM BW, disk I/O
  – TDF only increases the perceived performance headroom of the above
  – SELENA allows for configuring independently the perceived speeds of
    • CPU
    • Network
    • Disk I/O (from within the guests at the moment, via cgroups)
• Typical workflow
  1. Create a scenario
  2. Decide the minimum necessary TDF for supporting the desired (will see more later on that)
  3. Independently scale resources, based on the requirements of the users and the focus of their studies

Summarizing the elements of Fidelity
• Resource scaling via time dilation (already covered)
• Real Stacks and other OS components
• Real Applications
  – Including SDN controllers
• Realistic SDN switch models
  – Why is it important
  – How much can it affect observed behaviours

Inside an OF switch
• Present a model of an OF switch internals
  – Show components
  – Show paths / interactions which affect performance
    • Data plane (we do not model that currently)
    • Control plane
Random image from the web.
Just a placeholder

Building a realistic OF switch model
• Methodology for constructing an empirical model
  – PICA-8
  – OFLOPS measurements
    • Collect, analyze, extract trends
    • Stochastic model
  – Use a mirage-switch to implement the model
    • Flexible, functional, non-bloated code
    • Performant: uni-kernel, no context switches
    • Small footprint: scalable emulations

Evaluation methodology
1. Run experiment on real hardware
2. Reproduce results in:
   1. MiniNet
   2. NS3
   3. SELENA (for various TDF)
3. Compare each one against “real”
• We evaluate multiple aspects of fidelity:
  – Data-Plane
  – Flow-level
  – SDN Control
  – Application

Data-Plane fidelity
• Figure from paper
• Explain Star-topology
• Show comparison of MiniNet + NS3
  – Same figures from slides 2+3 but now compared against Selena + real
• Point out how increasing TDF affects fidelity

Flow-Level fidelity
• Figure from paper
• Explain Fat-tree topology

Execution Speed
• Compare against NS3, MiniNet
• Point out that SELENA executes faster than NS3
  – NS3 however replicates only half speed network
    • Therefore difference is even bigger

SDN Control plane Fidelity
• Figure from paper
• Explain experiment setup
• Point out shortcomings of MiniNet
  – As good as OVS is
• Point out terrible support for SDN by NS3

Application level fidelity
• Figure from paper
• Explain the experiment setup
• Latency aspect
• Show how CPU utilisation matters for fidelity
  – Open the dialogue for the
performance bottlenecks and limitations and make a smooth transition to the next slide

Near-linear Scalability
• Figure from paper
• Explain how scalability is determined for a given TDF

Limitations discussion
• Explain the effects of running on Xen
• Explain what happens if TDF is low and utilisation is high
• Explain that insufficient CPU compromises
  – Emulated network speeds
  – Capability of guests to utilise the available bandwidth
  – Skews the performance of networked applications
  – Adds excessive latency
• Scheduling also contributes

A more complicated example
• Showcase the power of SELENA :P
• Use the MRC2 experiment
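
Worked example: time-dilation arithmetic
The tick example from “The concept of Time-Dilation” and the resource scaling described in “Scaling resources via Time Dilation” can be sketched in a few lines of Python. This is an illustrative sketch only, not SELENA's API: the function names and the value chosen for C_HZ are hypothetical, and the 6-tick transfer window is read off the slide's formula. It shows that a guest dilated by TDF perceives TDF times the real data rate, and conversely that a target perceived rate needs only 1/TDF of that rate from the physical host.

```python
from fractions import Fraction

def dilated_elapsed(real_seconds, tdf):
    """Seconds the guest believes have passed while real_seconds pass on the host."""
    return Fraction(real_seconds) / tdf

def perceived_rate(data_mbits, real_seconds, tdf):
    """Data rate (Mbps) as observed from inside a guest dilated by tdf."""
    return Fraction(data_mbits) / dilated_elapsed(real_seconds, tdf)

def physical_rate_for(target_mbps, tdf):
    """Host-side rate (Mbps) needed so the guest perceives target_mbps."""
    return Fraction(target_mbps) / tdf

# Assumption: any tick frequency works; 1000 Hz is picked for illustration.
C_HZ = 1000

# Slide example: 10 Mbits transferred over 6 real ticks of (1/C_HZ) seconds each.
real_seconds = Fraction(6, C_HZ)
rate_real = perceived_rate(10, real_seconds, tdf=1)  # no dilation
rate_virt = perceived_rate(10, real_seconds, tdf=2)  # guest counts only 3 ticks

print(f"rate_REAL = {float(rate_real):.1f} Mbps")
print(f"rate_VIRT = {float(rate_virt):.1f} Mbps")

# A 1 GbE link perceived by a guest at TDF = 10 needs 100 Mbps on the host.
print(f"host rate for 1 GbE at TDF 10: {physical_rate_for(1000, 10)} Mbps")
```

Exact fractions are used instead of floats so that the relation rate_VIRT = 2 * rate_REAL holds exactly rather than up to rounding.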