Selena_ANCS_presentation

Faithful Reproduction of Network Experiments
Dimosthenis Pediaditakis, Charalampos Rotsos, Andrew W. Moore
firstname.lastname@cl.cam.ac.uk
Computer Laboratory, Systems Research Group
University of Cambridge, UK
http://selena-project.github.io
Research on networked systems: Yesterday
[Figure: yesterday's hierarchical networks, 100 Mbps access links, 1 GbE uplinks]
Research on networked systems: Modern era
[Figure: modern networks, 1 GbE edge links, 10 GbE aggregation, 40+ Gbps WAN links]
Simulation (ns-3): Too much abstraction
• Fat-Tree topology
  – 8x clients, 12x switches
  – 1 GbE links, 8 Gbps aggregate
• ns-3
  – Flat model
  – 2.75x lower throughput
Emulation (MiniNet): Poor scalability
• Identical experiment setup
• MiniNet
  – Runs out of CPU cycles
  – 4.5x lower throughput
  – Performance artifacts
Everything is a trade-off
[Diagram: trade-off triangle of Fidelity, Reproducibility, Scalability]
• Simulation: sacrifices fidelity
• Emulation: sacrifices scalability
• Reproducibility
  – Natural for simulation
  – In emulation, MiniNet is the pioneer
  – But how do we maintain it across different platforms?
SELENA: Standing on the shoulders of giants
[Table: SELENA vs. simulation, emulation, and hybrid testbeds, compared on reproducibility, real network stacks, unmodified applications, hardware requirements, scalability, fidelity, and execution speed]
• Fidelity: emulation, Xen, real OS components
• Reproducibility: the MiniNet approach
• Scalability: time dilation (the DieCast approach)
• Full user control: trade execution speed for fidelity and scalability
API and experimental workflow
[Diagram: an experiment description, written against the Python API, goes through the SELENA compiler and is deployed onto Xen]
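As a rough illustration of this workflow, an experiment description could be organised as below. This is a hypothetical sketch with invented names, not SELENA's actual Python API:

    # Hypothetical sketch of an experiment description; SELENA's real
    # Python API uses its own classes and calls.
    def star_topology(n_clients, link_mbps=1000, tdf=10):
        """A star: n clients and one server behind a single switch."""
        nodes = ["server", "sw0"] + [f"client{i}" for i in range(n_clients)]
        links = [("server", "sw0", link_mbps)] + \
                [(f"client{i}", "sw0", link_mbps) for i in range(n_clients)]
        return {"tdf": tdf, "nodes": nodes, "links": links}

    # The SELENA compiler would translate such a description into Xen
    # guests, VIFs and bridges, and deploy it.
    experiment = star_topology(n_clients=10)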
SELENA’s Emulation model over Xen
[Diagram: guest domains attached via Linux bridges / Open vSwitch in Dom-0]
The concept of Time-Dilation
• Real time: 1 tick = 1/C_Hz seconds
  – 10 Mbit transferred over 6 ticks: rate_real = 10 / (6 / C_Hz) Mbps
• 2x dilated time (TDF = 2): either halve the tick rate at a constant C_Hz, or keep the tick rate and report the clock as 2 x C_Hz
  – The same 10 Mbit now spans only 3 virtual ticks: rate_virt = 10 / (3 / C_Hz) Mbps = 2 x rate_real
[Cartoon: the hypervisor tells the guest "I command you to slow down"]
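To make the arithmetic concrete, here is a small worked example in Python (the 1 MHz clock is an arbitrary value chosen for illustration):

    import math

    C_HZ = 1_000_000      # example clock: 1 MHz, so 1 tick = 1 microsecond
    DATA_MBIT = 10        # the 10 Mbit transfer from the slide
    REAL_TICKS = 6        # the transfer spans 6 real ticks
    TDF = 2               # time-dilation factor

    rate_real = DATA_MBIT / (REAL_TICKS / C_HZ)    # Mbps seen in real time
    virt_ticks = REAL_TICKS / TDF                  # guest clock ticks 2x slower
    rate_virt = DATA_MBIT / (virt_ticks / C_HZ)    # Mbps seen by the guest

    assert math.isclose(rate_virt, TDF * rate_real)   # guest perceives 2x the rate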
Scaling resources via Time Dilation
• STEP 1: Create a scenario
• STEP 2: Choose a time-dilation factor (TDF)
  – Linear, symmetric scaling of all resources: network, CPU, RAM bandwidth, disk I/O
• STEP 3: Independently control the "perceived" available resources, each configured via SELENA's API (see the sketch below):
  – CPU (Xen Credit2 scheduler)
  – Network (Xen VIF QoS, netem)
  – Disk I/O (inside guests, via cgroups)
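The sketch below illustrates the kind of host-side knobs this maps to. It is an assumption-laden illustration rather than SELENA code: `xl sched-credit` caps a domain's CPU (exact options vary across Xen versions and schedulers), `tc ... netem` shapes an emulated link, and cgroup-v1 blkio throttles disk I/O from inside a guest.

    # Illustrative host-side resource knobs (not SELENA's own code).
    import subprocess

    def cap_vcpu(domain, cap_pct):
        # Cap a domain's CPU via the Xen credit scheduler
        # (option names vary by Xen version / scheduler).
        subprocess.run(["xl", "sched-credit", "-d", domain, "-c", str(cap_pct)],
                       check=True)

    def shape_link(vif, delay_ms, loss_pct):
        # Add latency and loss on an emulated link with netem.
        subprocess.run(["tc", "qdisc", "add", "dev", vif, "root", "netem",
                        "delay", f"{delay_ms}ms", "loss", f"{loss_pct}%"],
                       check=True)

    def throttle_disk_read(cgroup, dev_major_minor, bytes_per_sec):
        # cgroup-v1 blkio throttling, run from inside the guest.
        path = f"/sys/fs/cgroup/blkio/{cgroup}/blkio.throttle.read_bps_device"
        with open(path, "w") as f:
            f.write(f"{dev_major_minor} {bytes_per_sec}\n")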
Xen PV-guest Time-Keeping
• Time in a PV guest
  – Wall-clock time (since epoch)
  – System time (since boot)
  – Independent mode
• rdtsc modes of operation
  – Native
  – Emulated
• Timers
  – Scheduled one-shot timers: HYPERVISOR_set_timer_op sets the next event
  – Periodic timers
  – Loop delays
[Diagram: the guest reads TSC values through the Xen clock source; the Xen hypervisor delivers VIRQ_TIMER events]
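For reference, this is what a PV guest effectively computes when it turns a raw rdtsc reading into system time from the Xen-provided vcpu_time_info fields; the standard pvclock arithmetic, sketched here in Python for clarity:

    # Standard pvclock arithmetic: convert a raw TSC reading to guest
    # system time using the vcpu_time_info fields Xen publishes.
    def pvclock_ns(tsc, tsc_timestamp, system_time_ns,
                   tsc_to_system_mul, tsc_shift):
        delta = tsc - tsc_timestamp          # cycles since the last Xen update
        if tsc_shift >= 0:
            delta <<= tsc_shift
        else:
            delta >>= -tsc_shift
        # tsc_to_system_mul is a 32.32 fixed-point cycles-to-ns multiplier
        return system_time_ns + ((delta * tsc_to_system_mul) >> 32)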
Implementing Time-Dilation
Linux guest and Xen hypervisor exchange TSC values.
• Trap, emulate, and scale "rdtsc"
• Native "rdtsc" (constant, invariant)
• Start-of-day: dilated wall-clock time
• The per-VCPU time info is populated with dilated values:

    _u.tsc_timestamp     = tsc_stamp;
    _u.system_time       = system_time;
    _u.tsc_to_system_mul = tsc_to_system_mul;

• One-shot timers are stretched accordingly:

    VCPUOP_set_singleshot_timer:
        set_timer(&v->singleshot_timer, dilatedTimeout);

• A periodic VIRQ_TIMER is implemented, but not really used
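A minimal sketch of the trap-and-scale idea, assuming virtual time simply advances at 1/TDF of real time from a dilation epoch (variable names are illustrative; the hypervisor does this in C):

    # Trap-and-scale "rdtsc": the hypervisor intercepts the guest's rdtsc
    # and returns a TSC that advances TDF times slower than the host's.
    def dilated_tsc(host_tsc, epoch_host_tsc, epoch_guest_tsc, tdf):
        elapsed = host_tsc - epoch_host_tsc    # real cycles since the epoch
        return epoch_guest_tsc + elapsed // tdf

    # Correspondingly, a guest timeout of T virtual ns must be armed as
    # T * TDF real ns (cf. dilatedTimeout above).
    def dilated_timeout_ns(guest_timeout_ns, tdf):
        return guest_timeout_ns * tdf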
Summarizing the elements of Fidelity
• Resource scaling via time dilation
• Real network stacks and other OS components
• Real applications
  – Including SDN controllers
• Realistic SDN switch models
  – Why are they important?
  – How do they affect performance?
OpenFlow Switch X-Ray
[Diagram: control apps running on a network OS, a control channel down to the switch's OF agent, and the ASIC below]
• Control-application complexity
• Available capacity and synchronicity
  – Scarce co-processor resources
  – Switch-OS scheduling is non-trivial
• The ASIC driver determines how fast policy is configured in the ASIC
• PCI bus capacity is limited in comparison to the data plane
Control-plane performance is critical for the data plane.
Building an OpenFlow switch model
• Pica8 P-3290 switch
  – Measure message-processing performance (OFLOPS)
  – Extract latency characteristics of:
    • flow-table management
    • the packet interception / injection mechanism
    • counter extraction
• Configurable switch model (see the sketch below)
  – Replicates the measured latency and loss characteristics
  – Implementation: a MirageOS-based switch
    • Flexible, functional, lean code
    • Performance: unikernel
    • Small footprint: scalable emulations
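As a flavour of what "replicate latency characteristics" can mean in practice, the sketch below samples flow_mod handling latency by inverse-CDF lookup over an empirical distribution. The percentile numbers are invented placeholders, not the Pica8 measurements:

    # Sample switch flow_mod latency from a measured CDF (values are
    # made-up placeholders, not the OFLOPS measurements from the paper).
    import bisect, random

    CDF = [(0.50, 1.2), (0.90, 3.5), (0.99, 9.0), (1.00, 25.0)]  # (prob, ms)
    PROBS = [p for p, _ in CDF]

    def sample_flow_mod_latency_ms():
        u = random.random()                  # uniform in [0, 1)
        i = bisect.bisect_left(PROBS, u)     # inverse-CDF lookup
        return CDF[min(i, len(CDF) - 1)][1]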
Evaluation methodology
1. Run the experiment on real hardware
2. Reproduce the results in:
   1. MiniNet
   2. ns-3
   3. SELENA (for various TDFs)
3. Compare against "real"
Throughput fidelity
• MiniNet and ns-3: 2.7 Gbps and 5.2 Gbps
• SELENA
  – 10x dilation: 99.5% accuracy
  – Executes 9x faster than ns-3
Latency fidelity
• Setup: 18 nodes, 1 Gbps links, 10,000 flows
• MiniNet and ns-3 accuracy: 32% and 44%
• SELENA accuracy
  – 71% with 5x dilation
  – 98.7% with 20x dilation
SDN Control plane Fidelity
[Figure: completion times of 1 Mb TCP flows, exponential arrivals (λ = 0.02)]
• Stepping behavior: caused by TCP SYN and SYN-ACK losses
• MiniNet's switch model does not capture this throttling effect
  – The model cannot capture transient switch-OS scheduling effects
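The workload on this slide is easy to regenerate; for instance, flow arrivals with exponentially distributed inter-arrival times at rate λ = 0.02 can be drawn as follows (a sketch; the actual traffic generator is part of the experiment setup):

    # Generate flow arrival times with exponential inter-arrivals,
    # rate lambda = 0.02 (mean gap = 50 time units), as on the slide.
    import random

    LAM = 0.02
    t, arrivals = 0.0, []
    for _ in range(1000):              # 1000 flow arrivals
        t += random.expovariate(LAM)
        arrivals.append(t)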
Application fidelity (LAMP)
• Fat-Tree (CLOS) topology
  – 1 Gbps links
  – 10x switches
  – 4x clients
  – 4x web servers: Apache2, PHP, MySQL, Redis, WordPress
A layered SDN controller hierarchy
[Diagram: 1st- and 2nd-layer controllers over a 4-pod Fat-Tree topology, 1 GbE links, 32 Gbps aggregate traffic]
• Question: how does a layered controller hierarchy affect performance?
• More layers mean control decisions are taken higher in the hierarchy
  – Flow-setup latency increases: network distance, request pipelining, CPU load
  – Resilience
Scalability analysis
[Figure: CPU utilisation of Dom-0 bridges / OVS as the emulation scales]
• Fat-Tree topology, 1 GbE links, multi-Gbit sink link
• Domain-0 is allocated 4 cores
  – Why does it top out at 250% CPU utilisation?
• Near-linear scalability
How to (not) use SELENA
• SELENA is primarily a NETWORK emulation framework
  – Perfect match: network-bound applications
  – Provides tuning knobs to experiment with:
    • relative CPU, disk I/O and network performance
    • real applications / SDN controllers / network stacks
• Time dilation is not a panacea; it cannot mask:
  – Device-specific disk I/O performance
  – Cache thrashing and data locality
  – Multi-core effects (e.g., per-core lock contention)
  – Hardware features (e.g., Intel DDIO)
  – Xen scheduling effects at scale (100s of VMs)
• Rule of thumb for choosing the TDF (see the sketch below)
  – Keep Dom-0 and Dom-U utilisation low
  – Observation time-scales matter
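One possible way to operationalise this rule of thumb, purely illustrative; measure_peak_util is a hypothetical probe that runs a calibration experiment and reports peak Dom-0/Dom-U utilisation:

    # Hypothetical TDF search: double the TDF until a calibration run
    # keeps peak Dom-0/Dom-U utilisation below a safety threshold.
    def choose_min_tdf(measure_peak_util, util_target=0.5, max_tdf=64):
        tdf = 1
        while tdf <= max_tdf:
            if measure_peak_util(tdf) < util_target:
                return tdf             # lowest TDF that leaves headroom
            tdf *= 2
        raise RuntimeError("experiment too heavy even at the maximum TDF")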
Work in progress
• API compatibility with MiniNet
• Further improve scalability
  – Multi-machine emulation
  – Optimized guest-to-guest Xen communication
• Features and use cases
  – SDN coupling with workload consolidation
  – Emulation of live VM migration
  – Incorporation of energy models
SELENA is free and open.
Give it a try: http://selena-project.github.io
Research on networked systems: past, present, future
• Animation: 3 example networks, showing the evolution of the network characteristics on which research is conducted:
  – Past: 2-3 layers, hierarchical, ToR, 100 Mbps, bare-metal OS
  – Present: Fat-tree, 1 Gbps links, virtualization, WAN links
  – Near future: flexible architectures, 10 Gbps, elastic resource management, SDN controllers, OF switches, large scale (DC)
• The point of this slide: real-world systems progress at a fast pace (complexity, size), but common tools have not kept up
• I will challenge the audience to think:
  – Which of the 3 illustrated networks they believe they can model with existing tools
  – At what level of fidelity (incl. protocols, SDN, apps, network emulation)
  – What common sizes and link speeds they can model
A simple example with NS-3
• Here I will assume a simple star topology
• 10x clients, 1x server, 1x switch (10 Gbps aggregate)
• I will provide the throughput plot and explain why performance is poor
• Point out that ns-3 is not appropriate for faster networks
  – Simplistic models + non-real applications
  – Using DCE: even slower, and not fully POSIX-compliant
A simple example with MiniNet
• Same setup as before
• Throughput plot
• Better fidelity in terms of protocols, applications, etc.
  – At a penalty in performance
• Explain the bottleneck, especially in relation to MiniNet's implementation
Everything is a trade-off
• Nothing comes for free in modelling: the 3 key experimentation properties trade off against each other
• MiniNet aims for fidelity
  – Sacrifices scalability
• ns-3 aims for scalability (many abstractions)
  – Sacrifices fidelity, and has scalability limitations of its own
• The importance of reproducibility
  – MiniNet is a pioneer
  – Difficult to maintain from machine to machine
  – MiniNet can only guarantee it at the level of configuration, not performance
[Diagram: Fidelity / Reproducibility / Scalability triangle]
SELENA: Standing on the shoulders of giants
• Fidelity: use emulation
  – Unmodified apps and protocols: fidelity + usability
  – Xen: support for common OSes, good scalability, great control over resources
• Reproducible experiments
  – The MiniNet approach: high-level experiment descriptions, automation
• Maintain fidelity under scale
  – The DieCast approach: time dilation (more on that later)
• The user is the MASTER
  – Tuning knob: experiment execution speed
SELENA Architecture
• Animation here: 3 steps show how an experiment is
  – specified (Python API)
  – compiled
  – deployed
• Explain the mapping of network entities and features to Xen emulation components
• Give hints of the optimization tweaks we use under the hood
[Diagram: experiment description, Python API, SELENA compiler]
Time Dilation and Reproducibility
• Explain how time dilation also FACILITATES reproducibility across different platforms
• Reproducibility
  – Replication of configuration
    • Network architecture, links, protocols
    • Applications
    • Traffic / workloads
    • How we do it in SELENA: Python API, Xen API
  – Reproduction of results and observed performance
    • Each platform should have enough resources to run the experiment faithfully
    • How we do it in SELENA: time dilation
    • An older platform/hardware will require a different minimum TDF to reproduce the same results
Demystifying Time-Dilation 1/3
• Explain the concept in high-level terms
  – Give a solid example with a timeline
  – Similar to slide 8 of http://sysnet.ucsd.edu/projects/timedilation/nsdi06-tdf-talk.pdf
• Explain that everything happens at the hypervisor level
  – Guest time sandboxing (experiment VMs)
  – Common time for kernel + user space
  – No modifications for PV guests
    • Linux, FreeBSD, ClickOS, OSv, Mirage
Demystifying Time-Dilation 2/3
• Here we explain the low-level details
• Give credit to DieCast, but also explain the incremental work we did
• Best shown/explained with an animation
Demystifying Time-Dilation 3/3
• Resource scaling
  – Linear, symmetric scaling for network, CPU, RAM bandwidth, disk I/O
  – TDF only increases the perceived performance headroom of the above
  – SELENA allows independently configuring the perceived speeds of:
    • CPU
    • Network
    • Disk I/O (from within the guests at the moment -- cgroups)
• Typical workflow
  1. Create a scenario
  2. Decide the minimum TDF necessary to support the desired fidelity (more on that later)
  3. Independently scale resources, based on the users' requirements and the focus of their studies
Summarizing the elements of Fidelity
• Resource scaling via time dilation (already covered)
• Real stacks and other OS components
• Real applications
  – Including SDN controllers
• Realistic SDN switch models
  – Why are they important?
  – How much can they affect observed behaviours?
Inside an OF switch
• Present a model of an OF switch's internals
  – Show components
  – Show the paths / interactions that affect performance
    • Data plane (we do not currently model that)
    • Control plane
[Placeholder: random image from the web]
Building a realistic OF switch model
• Methodology for constructing an empirical model
  – Pica8
  – OFLOPS measurements
    • Collect, analyze, extract trends
    • Stochastic model
  – Use a Mirage switch to implement the model
    • Flexible, functional, lean code
    • Performant: unikernel, no context switches
    • Small footprint: scalable emulations
Evaluation methodology
1. Run the experiment on real hardware
2. Reproduce the results in:
   1. MiniNet
   2. ns-3
   3. SELENA (for various TDFs)
3. Compare each one against "real"
• We evaluate multiple aspects of fidelity:
  – Data plane
  – Flow level
  – SDN control
  – Application
Data-Plane fidelity
• Figure from the paper
• Explain the star topology
• Show the comparison with MiniNet + ns-3
  – Same figures as slides 2+3, but now compared against SELENA + real
• Point out how increasing the TDF affects fidelity
Flow-Level fidelity
• Figure from the paper
• Explain the Fat-tree topology
Execution Speed
• Compare against ns-3 and MiniNet
• Point out that SELENA executes faster than ns-3
  – ns-3, however, replicates only a half-speed network
  – So the difference is even bigger
SDN Control plane Fidelity
• Figure from the paper
• Explain the experiment setup
• Point out the shortcomings of MiniNet
  – It is only as good as OVS
• Point out ns-3's terrible support for SDN
Application level fidelity
• Figure from the paper
• Explain the experiment setup
• The latency aspect
• Show how CPU utilisation matters for fidelity
  – Open the dialogue on performance bottlenecks and limitations, and make a smooth transition to the next slide
Near-linear Scalability
• Figure from the paper
• Explain how scalability is determined for a given TDF
Limitations discussion
• Explain the effects of running on Xen
• Explain what happens if the TDF is low and utilisation is high
• Explain that insufficient CPU
  – compromises emulated network speeds
  – limits the guests' ability to utilise the available bandwidth
  – skews the performance of networked applications
  – adds excessive latency
• Scheduling also contributes
A more complicated example
• Showcase the power of SELENA :P
• Use the MRC2 experiment