VMTorrent: Scalable P2P Virtual Machine Streaming
Joshua Reich, Oren Laadan, Eli Brosh,
Alex Sherman, Vishal Misra,
Jason Nieh, and Dan Rubenstein
1
VM Basics
• VM: software implementation of a computer
• Implementation stored in VM image
• VM runs on VMM
– Virtualizes HW
– Accesses image
[Diagram: VM running on VMM, backed by a VM image]
2
Where is Image Stored?
[Diagram: VM and VMM, with the VM image's location in question]
3
Traditionally: Local Storage
[Diagram: VM image on local storage, accessed by the VMM]
4
IaaS Cloud: on Network Storage
[Diagram: VM image on network storage, accessed by the VMM]
5
Can Be Primary
[Diagram: VM image on network storage, accessed by the VMM over NFS/iSCSI]
e.g., OpenStack Glance, Amazon EC2/S3, vSphere network storage
6
Or Secondary
[Diagram: VM image copied from network storage to local storage, then accessed by the VMM]
e.g., Amazon EC2/EBS, vSphere local storage
7
Either Way, No Problem Here
[Diagram: a single host fetching its VM image from network storage]
8
Here?
[Diagram: many hosts fetching VM images from network storage]
Bottleneck!
9
Lots of Unique VM Images
[Diagram: network storage holding many VM images]
54,784 unique images on EC2 alone*
*http://thecloudmarket.com/stats#/totals, 06 Dec 2012
10
Unpredictable Demand
• Lots of customers
• Spot-pricing
• Cloud-bursting
11
Don’t Just Take My Word
• “The challenge for IT teams will be finding a way to deal with the bandwidth strain during peak demand - for instance when hundreds or thousands of users log on to a virtual desktop at the start of the day - while staying within an acceptable budget” 1
• “scale limits are due to simultaneous loading rather than total number of nodes” 2
• Developer proposals to replace or supplement VM launch architecture for greater scalability 3
1. http://www.zdnet.com/why-so-many-businesses-arent-ready-for-virtual-desktops7000008229/?s_cid=e539
2. http://www.openstack.org/blog/2011/12/openstack-deployments-abound-at-austin-meetup129
3. https://blueprints.launchpad.net/nova/+spec/xenserver-bittorrent-images
12
Challenge: VM Launch in IaaS
• Minimize delay in VM execution
• Starting from the time the launch request arrives
• For lots of instances (scale!)
13
Naive Scaling Approaches
• Multicast
– Setup, configuration, maintenance, etc. 1
– ACK implosion
– “multicast traffic saturated the CPU on [Etsy]
core switches causing all of Etsy to be
unreachable“ 2
1. [El-Sayed et al., 2003; Hosseini et al., 2007]
2. http://codeascraft.etsy.com/2012/01/23/solr-bittorrent-index-replication
14
Naive Scaling Approaches
• P2P bulk data download (e.g., BitTorrent)
– Files are big (waste bandwidth)
– Must wait until whole file available (waste time)
– Network storage primary? Must store GB image in RAM!
15
Both Miss Big Opportunity
VM image access is
• Sparse: most of image doesn’t need to be transferred
• Gradual: can start w/ just a couple of blocks
16
VMTorrent Contributions
• Architecture
– Make (scalable) streaming possible:
Decouple data delivery from presentation
– Make scalable streaming effective:
Profile-based image streaming techniques
• Understanding / Validation
– Modeling for VM image streaming
– Prototype & evaluation (not highly optimized)
17
Talk
• Make (scalable) streaming possible:
Decouple data delivery from presentation
• Make scalable streaming effective:
Profile-based image streaming techniques
• VMTorrent Prototype & Evaluation
(Modeling along the way)
18
Decoupling Data Delivery
from Presentation
(Making Streaming Possible)
19
Generic Virtualization Architecture
• Virtual Machine Monitor virtualizes hardware
• Conducts I/O to image through file system
[Diagram: VM on VMM on hardware; VMM accesses the VM image through the host FS]
20
Cloud Virtualization Architecture
Network backend used
• Either to download image
• Or to access via remote FS
[Diagram: VMM accesses the VM image through the FS, which uses a network backend]
21
VMTorrent Virtualization Architecture
• Introduce custom file system
• Divide image into pieces
• But provide appearance
of complete image to VMM
[Diagram: VMM accesses the VM image through the custom FS, which sits in front of the network backend]
22
Decoupling Delivery from Presentation
VMM attempts to read piece 1
Piece 1 is present, read completes
[Diagram: VM on VMM on hardware; custom FS piece map (0–8) backed by the network backend]
23
Decoupling Delivery from Presentation
VMM attempts to read piece 0
Piece 0 isn’t local, read stalls
VMM waits for I/O to complete
VM stalls
[Diagram: VM on VMM on hardware; custom FS piece map (0–8) backed by the network backend]
24
Decoupling Delivery from Presentation
FS requests piece from backend
Backend requests from network
[Diagram: VM on VMM on hardware; custom FS piece map (0–8) backed by the network backend]
25
Decoupling Delivery from Presentation
Later, network delivers piece 0
Custom FS receives, updates piece
Read completes
VMM resumes VM’s execution
[Diagram: VM on VMM on hardware; custom FS piece map (0–8) backed by the network backend]
26
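Below is a minimal sketch of the decoupled read path just illustrated; the piece size and helper structure are assumptions, not the actual VMTorrent code. The custom FS serves reads from pieces already local, and a read that hits a missing piece simply blocks until the backend delivers it, so the VMM only ever sees an ordinary (if sometimes slow) file read.

```cpp
// Sketch of the decoupled read path (assumed structure, not the prototype's code).
// Reads that hit a missing piece block until the network backend delivers it;
// the VMM just sees an ordinary file read that sometimes takes longer.
#include <algorithm>
#include <condition_variable>
#include <cstdint>
#include <cstring>
#include <mutex>
#include <vector>

constexpr size_t kPieceSize = 256 * 1024;      // assumed piece size

struct PieceStore {
    std::vector<std::vector<uint8_t>> pieces;  // piece payloads (full kPieceSize each)
    std::vector<bool> present;                 // which pieces are local
    std::mutex m;
    std::condition_variable cv;

    explicit PieceStore(size_t n) : pieces(n), present(n, false) {}

    // Called by the network backend when a piece arrives over the network.
    void deliver(size_t idx, std::vector<uint8_t> data) {
        std::lock_guard<std::mutex> lk(m);
        pieces[idx] = std::move(data);
        present[idx] = true;
        cv.notify_all();                       // wake any stalled reads
    }

    // Called from the FS read path; in the real system a demand request would
    // also be sent to the backend here. Blocks until the piece is local.
    const std::vector<uint8_t>& wait_for(size_t idx) {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&] { return present[idx]; });
        return pieces[idx];
    }
};

// Read `len` bytes at `offset` of the virtual image into `buf` (FUSE-style).
size_t image_read(PieceStore& store, uint8_t* buf, size_t len, uint64_t offset) {
    size_t copied = 0;
    while (copied < len) {
        uint64_t pos   = offset + copied;
        size_t   piece = static_cast<size_t>(pos / kPieceSize);
        size_t   inoff = static_cast<size_t>(pos % kPieceSize);
        const auto& p  = store.wait_for(piece);          // stalls if not yet delivered
        size_t chunk   = std::min(len - copied, kPieceSize - inoff);
        std::memcpy(buf + copied, p.data() + inoff, chunk);
        copied += chunk;
    }
    return copied;
}
```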
Decoupling Improves Performance
Primary Storage
No waiting for image
download to complete
[Diagram: VM on VMM on hardware; custom FS piece map (0–8) backed by the network backend]
27
Decoupling Improves Performance
Secondary Storage
No more writes or re-reads
over network w/ remote FS
[Diagram: custom FS piece map (0–8); remote-FS writes and re-reads over the network crossed out]
28
But Doesn’t Scale
Assuming a single server,
the time to download a single piece is
t = W + S / (r_net / n)
• W: wait time for first bit
• r_net: network speed
• S: piece size
• n: # of clients
Transfer time: each client gets r_net / n of server BW
29
Read Time Grows Linearly w/ n
Assuming a single server,
the time to download a single piece is
t = W + n * S / r_net
• W: wait time for first bit
• r_net: network speed
• S: piece size
• n: # of clients
Transfer time linear w/ n
30
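For intuition, here is a worked instance of that formula with illustrative numbers that are assumptions, not figures from the talk: a 100 Mbps server link and 256 KB pieces.

```latex
% Illustrative numbers (assumed): S = 256 KB ~= 2 Mbit, r_net = 100 Mbps
\begin{align*}
n = 1:\quad   t &= W + \frac{S}{r_{net}} \approx W + 0.02\ \text{s per piece}\\
n = 100:\quad t &= W + 100\cdot\frac{S}{r_{net}} \approx W + 2\ \text{s per piece}
\end{align*}
% Even though a launch touches only a fraction of the image, a few hundred
% demanded pieces at ~2 s each already add minutes of stalling per client.
```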
This Scenario
(scenario: csd)
[Diagram: VMM demand-fetches pieces from network storage through the custom FS and network backend]
31
Decoupling Enables P2P Backend
Alleviate network storage bottleneck
• Exchange pieces w/ swarm
P2P copy must remain pristine
[Diagram: custom FS piece map backed by a P2P manager that exchanges pieces with the swarm]
32
Space Efficient
FS uses pointers to P2P image
FS does copy-on-write
[Diagram: custom FS piece map points into the P2P manager’s pristine image]
33
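A sketch of this space-efficient arrangement follows; the field names and structure are assumptions, not the prototype's code. The FS keeps per-piece pointers into the P2P manager's pristine image and allocates a private copy of a piece only on the first write, so the pristine copy stays shareable with the swarm.

```cpp
// Sketch of copy-on-write over the P2P manager's pristine image
// (assumed structure, not the prototype's code).
#include <cstdint>
#include <cstring>
#include <memory>
#include <vector>

constexpr size_t kPieceSize = 256 * 1024;   // assumed piece size

struct ImageView {
    std::vector<const uint8_t*>             pristine;  // pointers into the P2P manager's copy
    std::vector<std::unique_ptr<uint8_t[]>> priv;      // private piece copies, allocated lazily

    explicit ImageView(std::vector<const uint8_t*> p2p_pieces)
        : pristine(std::move(p2p_pieces)), priv(pristine.size()) {}

    // Reads go to the private copy if one exists, otherwise to the pristine piece.
    const uint8_t* read_piece(size_t idx) const {
        return priv[idx] ? priv[idx].get() : pristine[idx];
    }

    // The first write to a piece triggers copy-on-write; later writes hit the copy,
    // and the pristine P2P piece is never modified.
    void write(size_t idx, size_t off, const uint8_t* data, size_t len) {
        if (!priv[idx]) {
            priv[idx] = std::make_unique<uint8_t[]>(kPieceSize);
            std::memcpy(priv[idx].get(), pristine[idx], kPieceSize);
        }
        std::memcpy(priv[idx].get() + off, data, len);
    }
};
```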
Minimizing Stall Time
Non-local piece accesses trigger high priority requests
[Diagram: a miss on piece 4 makes the FS ask the P2P manager, which requests piece 4 from the swarm at high priority]
34
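One simple way to realize this is sketched below with an assumed two-level priority scheme (not the prototype's actual interface): the P2P manager pulls requests in priority order, so a demand miss reported by the FS jumps ahead of any queued prefetch requests.

```cpp
// Sketch of demand-over-prefetch prioritization (assumed interface).
#include <cstddef>
#include <queue>

enum class Priority { Prefetch = 0, Demand = 1 };   // assumed two-level scheme

struct PieceRequest {
    size_t   piece;
    Priority prio;
    bool operator<(const PieceRequest& o) const {   // max-heap: Demand drains first
        return static_cast<int>(prio) < static_cast<int>(o.prio);
    }
};

class RequestQueue {
    std::priority_queue<PieceRequest> q_;
public:
    void enqueue_prefetch(size_t piece) { q_.push({piece, Priority::Prefetch}); }
    void enqueue_demand(size_t piece)   { q_.push({piece, Priority::Demand}); }  // FS miss

    // Next piece the P2P manager should request from the swarm.
    bool next(size_t& piece) {
        if (q_.empty()) return false;
        piece = q_.top().piece;
        q_.pop();
        return true;
    }
};
```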
P2P Helps
Now, the time to download a single piece is
t = W(d) + S / r_net
Transfer time independent of n; wait is a function of diversity
• W(d): wait time for first bit, as a function of d
• d: piece diversity
• r_net: network speed
• S: piece size
• n: # of peers
35
High Diversity → Swarm Efficiency
36
Low Diversity → Little Benefit
(Nothing to share)
37
P2P Helps, But Not Enough
All peers request same pieces at same time
t = W(d) + S / r_net
→ Low piece diversity
→ Long wait (gets worse as n grows)
→ Long download times
38
This Scenario
(scenario: p2pd)
[Diagram: demand fetch through the custom FS and P2P manager from the swarm]
39
Profile-based Image
Streaming Techniques
(Making Streaming Effective)
40
How to Increase Diversity?
Need to fetch pieces that are
• Rare: not yet demanded by many peers
• Useful: likely to be used by some peer
41
Profiling
• Need useful pieces
• But only small % of VM image accessed
• We need to know which pieces accessed
• Also, when (need later for piece selection)
42
Build Profile
• One profile for each VM/workload
• Run one or more times (even online)
• Use FS to track
– Which pieces accessed
– When pieces accessed
• Entries w/ average appearance time, piece index, and frequency (see the sketch below)
43
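The sketch below shows what a profile entry and its construction could look like based on the bullets above (piece index, frequency, average appearance time); the field and function names are assumptions, not the prototype's.

```cpp
// Sketch (assumed field and function names): building a profile from one or
// more recorded runs. Each entry carries the piece index, how often the piece
// was accessed across runs, and its average first-access ("appearance") time.
#include <algorithm>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

struct Access { size_t piece; double t_sec; };   // one FS-observed access

struct ProfileEntry {
    size_t piece;        // piece index
    double avg_time;     // average appearance time across runs (seconds)
    size_t frequency;    // number of runs in which the piece was accessed
};

// Merge per-run access traces (earliest access per piece per run) into a
// profile, sorted by average appearance time so it can drive the playback window.
std::vector<ProfileEntry> build_profile(const std::vector<std::vector<Access>>& runs) {
    std::map<size_t, std::pair<double, size_t>> acc;   // piece -> (sum of times, count)
    for (const auto& run : runs) {
        std::map<size_t, double> first_access;
        for (const auto& a : run) {                     // keep earliest access per piece
            auto it = first_access.find(a.piece);
            if (it == first_access.end() || a.t_sec < it->second)
                first_access[a.piece] = a.t_sec;
        }
        for (const auto& [piece, t] : first_access) {
            acc[piece].first  += t;
            acc[piece].second += 1;
        }
    }
    std::vector<ProfileEntry> profile;
    for (const auto& [piece, st] : acc)
        profile.push_back({piece, st.first / st.second, st.second});
    std::sort(profile.begin(), profile.end(),
              [](const ProfileEntry& a, const ProfileEntry& b) { return a.avg_time < b.avg_time; });
    return profile;
}
```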
Piece Selection
• Want pieces not yet demanded by many
• Don’t know piece distribution in swarm
• Guess: others behave like self
• Gives estimate of when pieces are likely needed
44
Piece Selection Heuristic
• Randomly (rarest first) pick one of first k pieces in predicted playback window
• Fetch w/ medium priority (demand wins); see the sketch below
45
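A sketch of this heuristic follows, under assumed parameters and inputs (window size k, a profile sorted by appearance time, and a per-piece rarity estimate that, per the previous slide, is really just a guess that others look like self); this is not the prototype's code.

```cpp
// Sketch (assumed parameters and helpers): pick the next piece to prefetch by
// looking at the first k not-yet-local pieces whose profiled appearance time is
// at or after the current playback position, preferring the rarest ones and
// breaking ties at random to keep different peers fetching different pieces.
#include <algorithm>
#include <cstddef>
#include <optional>
#include <random>
#include <vector>

struct ProfileEntry { size_t piece; double avg_time; size_t frequency; };

std::optional<size_t> select_prefetch_piece(
        const std::vector<ProfileEntry>& profile,   // sorted by avg_time
        double elapsed_sec,                         // time since VM launch
        const std::vector<bool>& have,              // locally present pieces
        const std::vector<size_t>& swarm_count,     // estimated copies in swarm (a guess)
        size_t k,                                   // window size (assumed parameter)
        std::mt19937& rng) {
    std::vector<size_t> window;
    for (const auto& e : profile) {
        if (e.avg_time < elapsed_sec || have[e.piece]) continue;  // past or already local
        window.push_back(e.piece);
        if (window.size() == k) break;              // only the first k upcoming pieces
    }
    if (window.empty()) return std::nullopt;
    // "Rarest first": keep the pieces with the fewest known copies, then choose
    // uniformly at random among them.
    size_t rarest = *std::min_element(window.begin(), window.end(),
        [&](size_t a, size_t b) { return swarm_count[a] < swarm_count[b]; });
    std::vector<size_t> cands;
    for (size_t p : window)
        if (swarm_count[p] == swarm_count[rarest]) cands.push_back(p);
    std::uniform_int_distribution<size_t> pick(0, cands.size() - 1);
    return cands[pick(rng)];                        // fetched at medium priority elsewhere
}
```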
Profile-based Prefetching
• Increases diversity
• Helps even w/ no peers
(when ideal access exceeds network rate)
46
Obtain Full P2P Benefit
Profile-based window-randomized prefetch
t = W(d) + S / r_net
→ High piece diversity
→ Short wait (shouldn’t grow much w/ n)
→ Quick piece download
47
Full VMTorrent Architecture
(scenario: p2pp)
[Diagram: custom FS plus profile-driven P2P manager exchanging pieces with the swarm]
48
Prototype
49
VMTorrent Prototype
• Custom FS: custom C, using FUSE
• P2P manager: custom C++ & libtorrent, exchanging pieces with a BT swarm, driven by the profile
[Diagram: VM, hardware, custom FS piece map, profile, P2P manager, BT swarm]
50
Evaluation Setup
51
Testbeds
• Emulab [White et al., 2002]
– Instances on 100 dedicated hardware nodes
– 100 Mbps LAN
• VICCI [Peterson et al., 2011]
– Instances on 64 vserver hardware node slices
– 1 Gbps LAN
52
VMs
53
Workloads
• Short VDI-like tasks
• Some CPU-intensive, some I/O-intensive
54
Assessment
• Measured total runtime
– Launch through shutdown
– (Easy to measure)
• Normalized against
memory-cached execution
– Ideal runtime for that set of hardware
– Allows easy cross-comparison
• Different VM/workload combinations
• Different hardware platforms
55
Evaluation
56
100 Mbps Scaling
[Scaling plot: normalized runtimes “starting to increase” at larger scale]
57
Due to Decreased Diversity
# peers increases → more demand requests to seed → less opportunity to build diversity → longer to reach max swarming efficiency + lower max
We optimized too much for single instance!
(choosing to let demand requests take precedence)
60
(Some) Future Work
• Piece selection for better diversity
• Improved profiling
• DC-specific optimizations
(Current work is orders of magnitude better than state-of-the-art)
61
Demo
(video omitted for space)
62
See Paper for More Details
• Modeling
– Playback process dynamics
– Buffering (for prefetch)
– Full characterization of r incorporating impact
of centralized and distributed models on W
– Other elided details
• Plus
– More architectural discussion!
– Lots more experimental results!
63
Summary
• Scalable VM launching needed
• VMTorrent addresses this by
– Decoupling data presentation from streaming
– Profile-based VM image streaming
• Straightforward techniques, implementation,
no special optimizations for DC
• Performance much better than state-of-the-art
– Hardware evaluation on multiple testbeds
– As predicted by modeling
64