Xiaowei Yang (Duke University)
News: Buffalo as Data Center Mecca
• $1.9 billion, at least 200 employees
• Low-cost electric power, tax incentives, plenty of shovel-ready sites, cool climate
• Cloud Computing
– Elasticity
– Pay-as-you-go
• Challenges
– Security: co-residence, inference
– Performance
• Coarse-grained sharing
• Lack of virtualized interface for specialized hardware
• Cloud Applications
– Execution augmentation for mobile devices
– Energy saving for mobile
– Energy saving for desktops
– Disaster recovery
The Case for Energy-Oriented Partial Desktop Migration
Nilton Bila†, Eyal de Lara†, Matti Hiltunen, Kaustubh Joshi, H. Andrés Lagar-Cavilla, and M. Satyanarayanan
• Offices and homes have many PCs
• But they are often left running idle
– PCs idle on average 12 hours a day
• “Skilled in the art of being idle” by Nedevschi et al. in NSDI 2009
– 60% of desktops remain powered overnight
• “After-hours power status of office equipment in the USA” by Webber, in Energy 2006
• Dell Optiplex 745 Desktop
• Peak power: 280W
• Idle power: 102.1W
• Sleep power: 1.2W
• If we put one to sleep whenever it is idle, the saving is (102.1 - 1.2) W = 100.9 W.
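To put the 100.9 W figure in perspective, the arithmetic can be extended to energy per day and per year. This is a rough sketch: it takes the idle and sleep power from the slide and the 12 idle hours per day from the Nedevschi et al. figure, and it assumes (optimistically) that the PC sleeps for the entire idle time with no transition cost. The function name is made up for illustration.

```python
# Power figures from the slide (Dell Optiplex 745), in watts.
IDLE_W = 102.1
SLEEP_W = 1.2


def sleep_savings_kwh(idle_hours_per_day: float, days: int = 365) -> float:
    """Energy saved (kWh) by sleeping instead of idling.

    Assumption: the machine sleeps for every idle hour and the cost of
    suspending and resuming is ignored.
    """
    saved_watts = IDLE_W - SLEEP_W
    return saved_watts * idle_hours_per_day * days / 1000.0


daily = sleep_savings_kwh(12, days=1)
yearly = sleep_savings_kwh(12)
print(f"{daily:.2f} kWh/day, {yearly:.0f} kWh/year")
# prints "1.21 kWh/day, 442 kWh/year"
```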
• Applications with always on semantics
– Skype, IM, email, personal media sharing
• Interspersed activities with idle periods
– Lunch break
– Chatting with colleagues
[Figure: Xen hosting the User0, User1, and Dom0 domains]
• Full VM migration
– LiteGreen, USENIX 2010 best paper
– Encapsulate user session in VM
– When idle, migrate VM to consolidation server and power down PC
– When busy, migrate back to user’s PC
• An idle VM accesses only part of its memory and disk state (its working set)
• Migrate only the working set to a server
– Potentially a cloud server
– Cloud provider can further aggregate
• Small migration footprint
• Client
– Fast migration
– Low energy cost
• Network
– Reduce bandwidth demand
• Server
– More VMs per server
• Can a desktop save energy by sleeping while its VM runs in the cloud?
• Does the system as a whole save energy when idle sessions are migrated and desktops sleep?
• Prototyped simple on-demand migration approach with SnowFlock
– Prepared a VM image and ran the VM
– After five minutes, used SnowFlock to clone the VM
– Monitored memory and disk page migration to the clone VM
• Dell Optiplex 745 Desktop
– 4GB RAM, 2.66GHz Intel C2D
– Peak power: 280W
– Idle power: 102.1W
– Sleep power: 1.2W
• VM Image:
– Debian Linux 5
– 1GB RAM
– 12 GB disk
• Spatial locality
– Pre-fetching
• 98% of requests arrive in close succession
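The spatial-locality observation suggests fetching neighboring pages along with the one that faulted. The sketch below illustrates that idea only; it is not SnowFlock's actual mechanism. The class, the `fetch_page` callback, and the `prefetch_window` parameter are all hypothetical names chosen for this example.

```python
class OnDemandMemory:
    """Clone-side page store that pulls missing pages from the desktop.

    Assumption: fetch_page is a hypothetical callback that takes a page
    number and returns that page's contents from the source machine.
    """

    def __init__(self, fetch_page, num_pages, prefetch_window=4):
        self.fetch_page = fetch_page
        self.num_pages = num_pages
        self.prefetch_window = prefetch_window
        self.pages = {}   # page number -> contents already migrated
        self.faults = 0   # number of reads that had to go to the source

    def read(self, page_no):
        if page_no not in self.pages:
            self.faults += 1
            # Fetch the faulting page plus the next few pages after it.
            last = min(page_no + self.prefetch_window, self.num_pages - 1)
            for p in range(page_no, last + 1):
                if p not in self.pages:
                    self.pages[p] = self.fetch_page(p)
        return self.pages[page_no]


# A sequential scan of 16 pages with a window of 3 faults only 4 times.
mem = OnDemandMemory(lambda p: bytes([p]), num_pages=16, prefetch_window=3)
for page in range(16):
    mem.read(page)
print(mem.faults)  # prints 4
```

With `prefetch_window=0` the same scan would fault once per page, which is the round-trip cost that prefetching is meant to avoid.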
Energy Savings: an hour-long trace
Hourly Energy Savings: an overnight session
• Saves 69% of energy
• A cloud node with 4GB of RAM can run ~30 VMs
• No partial migration
• V = 23
• Can it save cost?
– Network
– Cloud Rental
• Frequent power cycling reduces hardware life expectancy and limits power savings
– Reduce number of sleep cycles and increase sleep duration
– Predict page access patterns and prefetch
– Leverage content addressable memory
• Fast reintegration
– Big Q: Can it be fast enough so that a user does not suffer a long delay?
• Policies
– When to migrate/re-integrate?
– When does the desktop go to sleep?
– On re-integration, should state be maintained in the cloud? For how long?
Disaster Recovery as a Cloud Service: Economic Benefits & Deployment Challenges
Timothy Wood and Emmanuel Cecchet, University of Massachusetts Amherst; K.K. Ramakrishnan, AT&T Labs—Research; Prashant Shenoy, University of Massachusetts Amherst; Jacobus van der Merwe, AT&T Labs—Research; Arun Venkataramani, University of Massachusetts Amherst
• Disasters cause expensive application downtime
• Truck crash shuts down Amazon EC2 site (May 2010)
• Lightning strikes EC2 data center (May 2009)
• Comcast Down: Hunter shoots cable (2008)
• Squirrels bring down NASDAQ exchange (1987 and 1994)
• Customer: pay-as-you-go and elasticity
– Normal operation is cheap (backup needs fewer resources than running the application)
– Rapidly scale up resources after disaster is detected
• Provider: high degree of multiplexing
– Customers are unlikely to all fail at once
– Can offer extra services like disaster detection
• Use DR services to prevent lengthy service disruptions
• Data backups + failover mechanism
– Periodically replicate state
– Switch to backup site after disaster
• Recovery Point Objective (RPO): the point in time of the most recent backup prior to a failure; it bounds how much data can be lost
• Recovery Time Objective (RTO): how long it can take for an application to come back online after a failure occurs
– Time to detect failure
– Provision servers
– Initialize applications
– Configure networks to connect
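The recovery-time components listed above can be added up to check a plan against its RTO. This is a minimal sketch that assumes the four steps run one after another; the class name, field names, and the durations in the example are invented for illustration and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class RecoverySteps:
    """Durations (seconds) of the recovery steps, assumed to run in sequence."""
    detect_s: float       # time to detect the failure
    provision_s: float    # time to provision servers
    init_apps_s: float    # time to initialize applications
    network_s: float      # time to configure networks to connect

    def recovery_time(self) -> float:
        return self.detect_s + self.provision_s + self.init_apps_s + self.network_s

    def meets_rto(self, rto_s: float) -> bool:
        return self.recovery_time() <= rto_s


# Hypothetical durations, chosen only to exercise the arithmetic.
plan = RecoverySteps(detect_s=30.0, provision_s=120.0, init_apps_s=60.0, network_s=30.0)
print(plan.recovery_time())   # prints 240.0
print(plan.meets_rto(300.0))  # prints True
```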
• Performance
– Have a minimal impact on the performance of each application being protected under failure-free operation
– How can DR impact performance?
• Consistency
– The application can be restored to a consistent state
• Geographic separation
– Challenge: increasing network latency
• Hot Backup Site
– Provides a set of mirrored stand-by servers that are always available
– Minimal RTO and RPO
– Use synchronous replication to prevent any data loss
• Cheaply synchronize state during normal operations
• Obtain resources on demand after failure
• Short delay to provision resources and start applications
• Compare DR in a colocation center to DR in the cloud
• Colocation
– Pays for servers and space at all times
• Cloud DR
– Pays for resources as they are used
• RUBiS: an eBay-like multi-tier web application
– Three front ends
– One database server
– Only database state is replicated
• Cost at 99% uptime (about 3 days of disaster per year)
• Post-disaster mode is expensive due to high-powered VM instances
• Overall cheaper because normal mode applies 99% of the time
• Flexible
• Colo has a fixed cost regardless of RPO requirements
• Cloud DR’s benefits depend on
– Type of resources to run application
– Variation between normal and post-disaster costs
– RPO and RTO requirements
– Uptime
• Cloud is better if post-disaster costs are much higher than normal-mode costs
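The dependence on uptime and on the gap between normal and post-disaster costs can be made concrete with a simple yearly cost model. This is a sketch under stated assumptions: the cloud customer pays a low hourly rate in normal mode and a high rate only while a disaster is in progress, while the colocation customer pays a fixed rate all year. The function names and all prices below are hypothetical, not figures from the case study.

```python
HOURS_PER_YEAR = 365 * 24


def cloud_dr_yearly_cost(normal_per_hour, disaster_per_hour, uptime=0.99):
    """Pay the normal rate while up and the post-disaster rate otherwise."""
    normal_hours = HOURS_PER_YEAR * uptime
    disaster_hours = HOURS_PER_YEAR * (1 - uptime)
    return normal_hours * normal_per_hour + disaster_hours * disaster_per_hour


def colo_yearly_cost(per_hour):
    """Colocation pays for servers and space at all times."""
    return HOURS_PER_YEAR * per_hour


# Hypothetical prices: post-disaster mode costs 10x normal mode in the cloud,
# and colocation always pays the equivalent of the post-disaster rate.
cloud = cloud_dr_yearly_cost(normal_per_hour=0.10, disaster_per_hour=1.00)
colo = colo_yearly_cost(per_hour=1.00)
print(f"cloud: {cloud:.2f}, colo: {colo:.2f}, cloud cheaper: {cloud < colo}")
```

If the normal and post-disaster rates are equal, the two functions return the same cost, which matches the point that the cloud only wins when post-disaster costs are much higher than normal-mode costs.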
• How to maximize revenue?
– Makes money from storage in normal case
– But must pay for servers and keep them available for DR
– Possible solutions
• Spot instances (EC2 uses them)
• Higher prices for higher priority resources
• Correlated failures
– Large disasters may affect many customers at once
– Possible solutions
• Decide provisioning using a risk model
• Spread out customers
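One very simple form a risk model could take is a binomial calculation: given n customers that each fail with probability p, how many standby slots keep the chance of running out below a target risk? The sketch below assumes failures are independent, which is exactly what a large correlated disaster violates, so it is a lower bound on what a real provider would need. The function names and the numbers in the example are hypothetical.

```python
from math import comb


def prob_more_than(k, n, p):
    """P(more than k of n customers are in disaster at once).

    Assumption: customers fail independently with the same probability p.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1, n + 1))


def slots_needed(n, p, risk):
    """Smallest number of standby slots whose overflow probability is <= risk."""
    for k in range(n + 1):
        if prob_more_than(k, n, p) <= risk:
            return k
    return n


# Hypothetical example: 100 customers, each in disaster 1% of the time.
print(slots_needed(100, 0.01, 0.001))
```

Spreading customers across regions is what makes the independence assumption more plausible; when many customers share one site, the provider would have to provision closer to n slots.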
Mechanisms Needed for Cloud DR
• Network reconfiguration
– Application must be brought back online after being moved to a backup site
– May require setting up a private business network
• Security and Isolation
• VM migration and cloning
– Restore an application after a disaster is handled
– Cloud providers do not yet support VM migration into and out of the cloud
• Cloud based disaster recovery
– Can reduce cost
• Up to 85% from a case study
– Flexible tradeoff between cost and RPO
• Next lecture
– Another cloud application for group collaboration
• Monday is fall break
• Next Wednesday
– Midterm
– http://www.cs.duke.edu/courses/fall10/cps296.2/syllabus.html