DOLLY: Virtualization-Driven Database Provisioning for the Cloud Emmanuel Cecchet Joint work with Rahul Singh, Upendra Sharma and Prashant Shenoy THE CLOUD Virtualization Pay as you go Elasticity Internet Frontend/ Load balancer App. Servers VEE – Mar 10, 2011 – cecchet@cs.umass.edu Databases 2 PROVISIONING IN THE CLOUD Based on request volume and resource usage Reactions based on thresholds Works for stateless tiers Internet Frontend/ Load balancer App. Servers Provisioning logic VEE – Mar 10, 2011 – cecchet@cs.umass.edu Databases 3 5pm Replay updates MySQL backup MySQL restore New replica Replica ready VEE – Mar 10, 2011 – cecchet@cs.umass.edu WHY IS IT HARD TO ADD A DB REPLICA? snapshot 4 2pm VEE – Mar 10, 2011 – cecchet@cs.umass.edu WHY IS IT HARD TO ADD A DB REPLICA? 5pm Replay updates 2pm snapshot MySQL restore New replica Replica ready 5 RUBiS x users MySQL backup MySQL snapshot restore RUBiS New replica 14min Queries are slow… Let’s improve this! CREATE INDEX i1 on Table1 ; CREATE INDEX i2 on Table 2 RUBiS x users with indices MySQL backup MySQL snapshot restore RUBiS New replica 1H36min VEE – Mar 10, 2011 – cecchet@cs.umass.edu WHY IS IT HARD TO ADD A DB REPLICA? 6 Warehouse 1GB Warehouse 10GB PostgreSQL PostgresSQL backup snapshot restore PostgreSQL PostgresSQL backup snapshot restore Warehouse 1GB 24min Warehouse 10GB 1H30min apt-get update postgresql echo 1 > /proc/sys/magic_options CREATE USER x GRANT PRIVILEGES TO y VEE – Mar 10, 2011 – cecchet@cs.umass.edu WHY IS IT HARD TO ADD A DB REPLICA? 7 When to start replica spawning? How to predict replica spawning time? How to make replica spawning platform independent? When to generate new snapshots? How can we minimize resource usage? Power/cooling in private cloud $ cost in public cloud VEE – Mar 10, 2011 – cecchet@cs.umass.edu WHAT ARE THE MAIN PROBLEMS ? 8 IN CONSTANT TIME Filesystem snapshot/copy is OS & DB agnostic Only depends on VM size DB Dolly Dolly DB size Database Backup 4GB VM 16GB VM on disk Restore cloning cloning RUBiS –c–i 1022MB 843s 281s 899s RUBiS +c+bi 1.4GB 5761s 282s 900s RUBiS +c+fi 1.5GB 6017s 280s 900s TPC-W 684MB 288s 275s 905s TPC-H 1GB 1.8GB 1477s 271s 918s TPC-H 10GB 12GB 5573s n/a 911s VEE – Mar 10, 2011 – cecchet@cs.umass.edu VM CLONING: BACKUP/RESTORE 9 Dolly Database replication in the Cloud Provisioning with Dolly Prototype & Evaluation SPAWNING A REPLICA WITH CLONING Backup & Restore replace by VM cloning Client SQL requests Replication middleware 1 Load balancer Transactional log Management console add replica 4 resynchronize 2 DB1 DB2 OS VM1 OS VM2 clone DB2 OS VM3 3 clone DB23 OS VM3 VEE – Mar 10, 2011 – cecchet@cs.umass.edu 11 SPAWNING IN A PRIVATE CLOUD Clone entire virtual machine for backup/restore Backup server is optional DB1 OS VM1 stop DB1 clone OS VM1 1 1 DB2 B OS VM2 resume DB2 OS VM4 start 3 clone DB2 DB2 OS VM4 OS VM3 3 start 2 DB2 OS VM3 VEE – Mar 10, 2011 – cecchet@cs.umass.edu 2 12 SPAWNING IN A PUBLIC CLOUD Storage decoupled from computing resource Starting a new instance clones the volume DB1 stop DB1 snapshot OS Vol1 OS Vol1 DB2 register OS Vol2 start restart DB1 DB2 DB2 OS Vol1 OS Vol4 OS Vol3 DB2 OS Vol2 VEE – Mar 10, 2011 – cecchet@cs.umass.edu 13 Dolly Database replication in the Cloud Provisioning with Dolly Prototype & Evaluation Predictable backup and restore times are required Replay time can be estimated from write throughput wt : current workload write throughput wmax : replay speed of the spawning replica replica spawning time backup bi restore ri replay wt bi ri . wmax time wt . wmax wt . wmax VEE – Mar 10, 2011 – cecchet@cs.umass.edu MODELING SPAWNING TIME updates 15 Time to spawn from a live replica wmax s bi ri wmax wt Time to spawn from an existing snapshot wmax s ri replayi wmax wt Faster to take a new snapshot j to spawn a new replica than using old snapshot i if: backupj+restorej < restorei+replayi VEE – Mar 10, 2011 – cecchet@cs.umass.edu WHEN TO SNAPSHOT? 16 Input capacity prediction write prediction write predictions capacity predictions Capacity Provisioning schedule of snapshots schedule of replica spawning admission control if needed Snapshot scheduler HA adjuster Spawning options Write throttling Output Dolly Paused pool cleaner Scheduler Predictors write throttling/ read throttling Admission Control start/stop clone/ snapshot delete VM/ snapshot Management API VEE – Mar 10, 2011 – cecchet@cs.umass.edu DOLLY OVERVIEW Monitoring reclaim Free pool Manager 17 PROVISIONING REPLICAS Dolly does not provide predictors Dolly can work with any predictor (see [Eurosys09]) Workload prediction Capacity prediction VEE – Mar 10, 2011 – cecchet@cs.umass.edu Write prediction 18 CLOUD COST FUNCTIONS Adapt the provisioning decisions to the cloud platform specifics Cost can be $ on public cloud or time on private cloud Cost function name Definition pause_cost(VM, t) cost of pausing VM at time t spawn_cost(s, t, d) cost to spawn a replica from snapshot s at time t to meet deadline d spawn_cost(VM, t, d) cost to spawn a replica from a paused VM at time t to meet deadline d running_cost(VM,t1,t2) cost to run a VM from time t1 to time t2 pause_resume_cost(VM, t1, t2) cost to pause a VM at time t1 and resume it at time t2 backup_paused_cost(VM) cost to backup a paused VM backup_live_cost(VM, t) cost to backup an active VM at time t VEE – Mar 10, 2011 – cecchet@cs.umass.edu 19 Parse capacity provisioning predictions Decrease capacity by pausing VMs Increasing capacity Check if we can reuse a paused VM Check if we can spawn from an existing snapshot Choose cheapest options according to spawn_cost function Perform admission control if all replicas cannot be provisioned in time VEE – Mar 10, 2011 – cecchet@cs.umass.edu PROVISIONING REPLICAS 20 SNAPSHOT SCHEDULING to snapshot? Clone a paused VM Pause an active VM to clone it When to snapshot? At time j when backupj+restorej<restorei +replayi If new snapshot is scheduled, re-run capacity provisioning Prediction window must have minimum size VEE – Mar 10, 2011 – cecchet@cs.umass.edu How 21 Dolly Database replication in the Cloud Provisioning with Dolly Prototype & Evaluation IMPLEMENTATION admission control C-JDBC/Sequoia replication middleware OpenNebula Cloud management middleware Cost functions private cloud: minimize resource utilization time Amazon EC2: minimize cost predictions Sequoia driver Dolly Private EC2 write throttling SQL requests add/remove replica snapshot/pause/… Sequoia controller JMX Management API Scheduler Backupers Recovery Log Dolly OpenNebula Load balancer Dump table Log table New replica New replica OS VM4 OS VM5 start/stop/ clone/… DB1 DB2 DB3 OS VM1 OS VM2 OS VM3 VEE – Mar 10, 2011 – cecchet@cs.umass.edu TPC-W load injector OpenNebula clone clone DB3 Backup server or NAS snapshot OS VMclone 23 Private cloud: minimize resource utilization Amazon EC2: minimize cost Cost function name Private Cloud EC2 pause_cost(VM, t) return 1/VM->machine->temp return 60-((t-VM->start)%60) spawn_cost(s, t, d) return d-t comp$=(d-t)/60*hour$ io$=EBS_storage$*s->size + EBS_io$* (s->restore_io+s->replay_io) return comp$+io$ spawn_cost(VM, t, d) return d-t comp$=(d-t)/60*hour$ io$= EBS_io$* (s->resume_io+s->replay_io) return comp$+io$ running_cost(VM,t1,t2) return 1 (t2-t1)/60*hour$ pause_resume_cost(VM, t1, t2) if (t2-t1 > VM->pause + VM->resume) return 0 else return 2 io$= EBS_io$* (VM->pause_io+VM->resume_io) comp$=(60-(VM->stop-VM->start) %60)/60*hour$ return io$+ comp$ backup_paused_cost(VM) return backup_time return S3_storage$*s->size backup_live_cost(VM, t) return VM->pause + backup_time + VM->resume return pause_cost(VM, t)$+ S3_storage$*s->size + (VM->stop_io+VM->start_io)* EBS_io$ VEE – Mar 10, 2011 – cecchet@cs.umass.edu IMPLEMENTATION – COST FUNCTIONS 24 Multi-tier online bookstore benchmark 4GB Xen VM for the database Large EC2 instances from EBS volumes with CloudWatch Operation Private Cloud Public Cloud (EC2) start VM pause VM 42s 26s 220s 30s resume VM backup (stop/clone) restore (clone/start) 42s 150s 165s 30s 320s 220s 149 writes/sec 15 197 writes/sec 13 wmax Avg IOs per write VEE – Mar 10, 2011 – cecchet@cs.umass.edu TPC-W EVALUATION 25 WORKLOAD DESCRIPTION Snapshot s0 available at t0 VEE – Mar 10, 2011 – cecchet@cs.umass.edu 26 Cost MRM Private cloud 720m 0 Amazon EC2 $8.39 0 VEE – Mar 10, 2011 – cecchet@cs.umass.edu Overprovisioning with 6 replicas – 1h snapshot 27 Cost MRM Private cloud 410m 42.1 Amazon EC2 $4.61 41.5 replicas available replica spawning triggered here VEE – Mar 10, 2011 – cecchet@cs.umass.edu Reactive provisioning 28 Cost MRM Private cloud 381m42s 17.5 Amazon EC2 $18.29 27.2 VEE – Mar 10, 2011 – cecchet@cs.umass.edu Reactive provisioning – 15m snapshot 29 Cost MRM Private cloud 352m 0 Amazon EC2 $3.73 0 Private Cloud Amazon EC2 s1 s2 VEE – Mar 10, 2011 – cecchet@cs.umass.edu Dolly – 30m Prediction Window cheaper to leave instances online 30 VM cloning Solves administration issues by blackboxing the database Constant time backup/restore needed to predict replica spawning time New provisioning algorithm Decouples capacity provisioning from snapshot scheduling Cost functions to optimize for cloud platform specifics VEE – Mar 10, 2011 – cecchet@cs.umass.edu CONCLUSION 31 BONUS SLIDES Private Cloud s1 Cost MRM Private cloud 381m54s 0 Amazon EC2 $7.16 0 Amazon EC2 s2 s1 VEE – Mar 10, 2011 – cecchet@cs.umass.edu Dolly – 10m Prediction Window s2 33 Cost MRM Private cloud 360m30s 25.8 Amazon EC2 $5.00 33.7 VEE – Mar 10, 2011 – cecchet@cs.umass.edu Reactive provisioning – 1h snapshot 34 Database native tools Vendor specific or 3rd party ETL Understand database semantics Filesystem copy Low-level data copy Need to know what to copy VM cloning Copies database content + configuration + OS Unused space can be compressed VEE – Mar 10, 2011 – cecchet@cs.umass.edu BACKUP/RESTORE TECHNIQUES 35 Benchmark RUBiS TPC-W TPC-H scale 1(GB) TPC-H scale 10(GB) DB size MyISAM no constraint 836MB MyISAM w/ constraints 1.1GB MyISAM w/ constraint & index 1.2GB InnoDB no constraint 1022MB InnoDB w/ constraints 1.4GB InnoDB w/ constraint & index 1.5GB PostgreSQL binary dump PostgreSQL sql dump 684MB PostgreSQL binary dump PostgreSQL sql dump VM size 844MB 4.1GB 210MB 314MB 307MB 1.8GB PostgreSQL binary dump PostgreSQL sql dump Snapshot size 1.2GB 2.1GB 1.1GB (OS) + 2.1GB (data) VEE – Mar 10, 2011 – cecchet@cs.umass.edu DATABASE SIZES 2.0GB 12GB 7.3GB 16GB 36 Performance depends on database content 7000 VM shutdown VM copy VM cloning VM boot MySQL backup MySQL restore 6000 5000 Time in seconds 4000 6017 5761 3000 2000 826 1000 1141 899 843 292 290 293 VEE – Mar 10, 2011 – cecchet@cs.umass.edu BACKUP/RESTORE PERFORMANCE (1/3) 0 VM cloning MySQL MyISAM MySQL InnoDB RUBiS no constraint VM cloning MySQL MyISAM MySQL InnoDB RUBiS w/ constraints & basic index VM cloning MySQL MyISAM MySQL InnoDB RUBiS w/ constraints & full text index 37 File copy is the most effective for small databases 350 300 250 Time in seconds 200 VM shutdown VM/File copy VM cloning VM boot PostgreSQL stop/start PG bin backup PG bin restore PG sql backup PG sql restore 150 288 275 177 126 100 65 VEE – Mar 10, 2011 – cecchet@cs.umass.edu BACKUP/RESTORE PERFORMANCE (2/3) 50 0 File copy VM cloning direct VM cloning & copy PG bin PG sql TPC-W 38 VM cloning most effective on large databases 6000 5573 5000 Time in seconds 5256 VM shutdown VM/File copy VM cloning VM boot PostgreSQL stop/start PG bin backup PG bin restore PG sql backup PG sql restore 4000 3000 2000 1477 1468 1215 937 1000 699 147 189 273 File copy VM cloning direct VM cloning & copy VEE – Mar 10, 2011 – cecchet@cs.umass.edu BACKUP/RESTORE PERFORMANCE (3/3) 0 TPC-H 1GB PG bin PG sql File copy VM cloning direct VM cloning & copy PG bin TPC-H 10GB PG sql 39 DB Backup/ Restore Filesystem Copy VM Cloning Medium Very high None Performance Slow Fastest Fast Snapshot size Small DB size VM size Spawning time predictability Hard Moderate Easy Moderate Moderate None Database configuration Hard Hard None Missing data in transfer Possible Unlikely No Spawning atomicity No No Yes Resynchronization limitations Yes Yes Yes Feature Database specific knowledge Database installation VEE – Mar 10, 2011 – cecchet@cs.umass.edu BACKUP/RESTORE SUMMARY 40 Capacity provisioning depends on available snapshots Snapshots scheduled according to capacity demand Decouple capacity provisioning from snapshot scheduling if (predictor.capacity_changes || predictor.write_workload_changes) { do { schedule = capacity_provisioning(predictions) snapshot_schedule = snapshot_scheduling(predictions) } while (snapshot_schedule schedules new snapshots) scheduler.schedule(snapshot_schedule) scheduler.schedule(capacity_schedule) } if (time since last operation > threshold) { paused_pool_cleaner.release_old_paused_vms(); paused_pool_cleaner.delete_old_snapshots(); } VEE – Mar 10, 2011 – cecchet@cs.umass.edu DOLLY MAIN ALGORITHM 41 Paused VMs Snapshots VM never re-used if cost to resume > cost to spawn from last snapshot Old snapshots can be released based on cost to keep them around Free server pool Can reclaim servers with paused VMs when pool is empty VEE – Mar 10, 2011 – cecchet@cs.umass.edu RELEASING RESOURCES 42