DOLLY: Virtualization-Driven Database Provisioning for the Cloud Emmanuel Cecchet

advertisement
DOLLY: Virtualization-Driven
Database Provisioning for the Cloud
Emmanuel Cecchet
Joint work with Rahul Singh, Upendra
Sharma and Prashant Shenoy
THE CLOUD
Virtualization
 Pay as you go
 Elasticity
Internet
Frontend/
Load balancer
App.
Servers
VEE – Mar 10, 2011 – cecchet@cs.umass.edu

Databases
2
PROVISIONING IN THE CLOUD
Based on request volume and resource usage
 Reactions based on thresholds
 Works for stateless tiers
Internet
Frontend/
Load balancer
App.
Servers
Provisioning
logic
VEE – Mar 10, 2011 – cecchet@cs.umass.edu

Databases
3
5pm
Replay
updates
MySQL
backup
MySQL
restore
New
replica
Replica
ready
VEE – Mar 10, 2011 – cecchet@cs.umass.edu
WHY IS IT HARD TO ADD A DB REPLICA?
snapshot
4
2pm
VEE – Mar 10, 2011 – cecchet@cs.umass.edu
WHY IS IT HARD TO ADD A DB REPLICA?
5pm
Replay
updates
2pm
snapshot
MySQL
restore
New
replica
Replica
ready
5
RUBiS
x users
MySQL
backup
MySQL
snapshot restore
RUBiS
New replica
14min
Queries are slow… Let’s improve this!
CREATE INDEX i1 on Table1 ; CREATE INDEX i2 on Table 2
RUBiS
x users
with indices
MySQL
backup
MySQL
snapshot restore
RUBiS
New replica
1H36min
VEE – Mar 10, 2011 – cecchet@cs.umass.edu
WHY IS IT HARD TO ADD A DB REPLICA?
6
Warehouse
1GB
Warehouse
10GB
PostgreSQL
PostgresSQL
backup snapshot restore
PostgreSQL
PostgresSQL
backup snapshot restore




Warehouse
1GB
24min
Warehouse
10GB
1H30min
apt-get update postgresql
echo 1 > /proc/sys/magic_options
CREATE USER x
GRANT PRIVILEGES TO y
VEE – Mar 10, 2011 – cecchet@cs.umass.edu
WHY IS IT HARD TO ADD A DB REPLICA?
7
 When


to start replica spawning?
How to predict replica spawning time?
How to make replica spawning platform
independent?
 When
to generate new snapshots?
 How can we minimize resource usage?


Power/cooling in private cloud
$ cost in public cloud
VEE – Mar 10, 2011 – cecchet@cs.umass.edu
WHAT ARE THE MAIN PROBLEMS ?
8
IN CONSTANT TIME
Filesystem snapshot/copy is OS & DB agnostic
 Only depends on VM size

DB
Dolly
Dolly
DB size
Database
Backup 4GB VM 16GB VM
on disk
Restore cloning cloning
RUBiS –c–i 1022MB
843s
281s
899s
RUBiS +c+bi 1.4GB
5761s
282s
900s
RUBiS +c+fi 1.5GB
6017s
280s
900s
TPC-W
684MB
288s
275s
905s
TPC-H 1GB
1.8GB
1477s
271s
918s
TPC-H 10GB 12GB
5573s
n/a
911s
VEE – Mar 10, 2011 – cecchet@cs.umass.edu
VM CLONING:
BACKUP/RESTORE
9
Dolly
 Database
replication in the Cloud
 Provisioning with Dolly
 Prototype & Evaluation
SPAWNING A REPLICA WITH CLONING
Backup & Restore replace by VM cloning
Client SQL requests
Replication middleware
1
Load balancer
Transactional
log
Management
console
add
replica
4
resynchronize
2
DB1
DB2
OS
VM1
OS
VM2
clone
DB2
OS
VM3
3
clone
DB23
OS
VM3
VEE – Mar 10, 2011 – cecchet@cs.umass.edu

11
SPAWNING IN A PRIVATE CLOUD
Clone entire virtual machine for backup/restore
 Backup server is optional
DB1
OS
VM1
stop
DB1
clone
OS
VM1
1
1
DB2
B
OS
VM2
resume
DB2
OS
VM4
start
3
clone
DB2
DB2
OS
VM4
OS
VM3
3
start
2
DB2
OS
VM3
VEE – Mar 10, 2011 – cecchet@cs.umass.edu

2
12
SPAWNING IN A PUBLIC CLOUD
Storage decoupled from computing resource
 Starting a new instance clones the volume
DB1
stop
DB1
snapshot
OS
Vol1
OS
Vol1
DB2
register
OS
Vol2
start
restart
DB1
DB2
DB2
OS
Vol1
OS
Vol4
OS
Vol3
DB2
OS
Vol2
VEE – Mar 10, 2011 – cecchet@cs.umass.edu

13
Dolly
 Database
replication in the Cloud
 Provisioning with Dolly
 Prototype & Evaluation
Predictable backup and restore times are required
 Replay time can be estimated from write throughput

wt : current workload write throughput
 wmax : replay speed of the spawning replica

replica spawning time
backup
bi
restore
ri
replay

wt
   bi  ri  .
wmax

time
 wt
.
 wmax
 wt
 .
 wmax
VEE – Mar 10, 2011 – cecchet@cs.umass.edu
MODELING SPAWNING TIME
updates
15


Time to spawn from a live replica
wmax
s  bi  ri 
wmax  wt
Time to spawn from an existing snapshot
wmax
s  ri  replayi 
wmax  wt
 Faster
to take a new snapshot j to spawn a
new replica than using old snapshot i if:
backupj+restorej < restorei+replayi
VEE – Mar 10, 2011 – cecchet@cs.umass.edu
WHEN TO SNAPSHOT?
16

Input
capacity prediction
 write prediction


write predictions
capacity predictions
Capacity Provisioning


schedule of snapshots
schedule of replica
spawning
admission control if
needed
Snapshot scheduler
HA adjuster
Spawning options
Write throttling
Output

Dolly
Paused pool
cleaner
Scheduler
Predictors
write throttling/ read
throttling
Admission Control
start/stop
clone/ snapshot
delete VM/
snapshot
Management API
VEE – Mar 10, 2011 – cecchet@cs.umass.edu
DOLLY OVERVIEW
Monitoring
reclaim
Free pool
Manager
17
PROVISIONING REPLICAS
Dolly does not provide predictors
 Dolly can work with any predictor (see [Eurosys09])
Workload
prediction
Capacity
prediction
VEE – Mar 10, 2011 – cecchet@cs.umass.edu

Write
prediction
18
CLOUD COST FUNCTIONS
Adapt the provisioning decisions to the cloud
platform specifics
 Cost can be $ on public cloud or time on private
cloud
Cost function name
Definition
pause_cost(VM, t)
cost of pausing VM at time t
spawn_cost(s, t, d)
cost to spawn a replica from snapshot s at time t to
meet deadline d
spawn_cost(VM, t, d)
cost to spawn a replica from a paused VM at time t to
meet deadline d
running_cost(VM,t1,t2)
cost to run a VM from time t1 to time t2
pause_resume_cost(VM, t1, t2)
cost to pause a VM at time t1 and resume it at time t2
backup_paused_cost(VM)
cost to backup a paused VM
backup_live_cost(VM, t)
cost to backup an active VM at time t
VEE – Mar 10, 2011 – cecchet@cs.umass.edu

19
 Parse
capacity provisioning predictions
 Decrease capacity by pausing VMs
 Increasing capacity




Check if we can reuse a paused VM
Check if we can spawn from an existing snapshot
Choose cheapest options according to
spawn_cost function
Perform admission control if all replicas cannot be
provisioned in time
VEE – Mar 10, 2011 – cecchet@cs.umass.edu
PROVISIONING REPLICAS
20
SNAPSHOT SCHEDULING
to snapshot?
Clone a paused VM
 Pause an active VM to clone it

 When
to snapshot?

At time j when backupj+restorej<restorei +replayi

If new snapshot is scheduled, re-run capacity provisioning
 Prediction
window must have minimum size
VEE – Mar 10, 2011 – cecchet@cs.umass.edu
 How
21
Dolly
 Database
replication in the Cloud
 Provisioning with Dolly
 Prototype & Evaluation
IMPLEMENTATION
admission control



C-JDBC/Sequoia replication
middleware
OpenNebula Cloud
management middleware
Cost functions
private cloud: minimize
resource utilization time
 Amazon EC2: minimize cost

predictions
Sequoia driver
Dolly
Private
EC2
write throttling
SQL requests
add/remove replica
snapshot/pause/…
Sequoia controller
JMX Management API
Scheduler
Backupers
Recovery Log
Dolly
OpenNebula
Load
balancer
Dump
table
Log
table
New
replica
New
replica
OS
VM4
OS
VM5
start/stop/
clone/…
DB1
DB2
DB3
OS
VM1
OS
VM2
OS
VM3
VEE – Mar 10, 2011 – cecchet@cs.umass.edu
TPC-W
load injector
OpenNebula
clone
clone
DB3
Backup server
or NAS
snapshot
OS
VMclone
23


Private cloud: minimize resource utilization
Amazon EC2: minimize cost
Cost function name
Private Cloud
EC2
pause_cost(VM, t)
return 1/VM->machine->temp
return 60-((t-VM->start)%60)
spawn_cost(s, t, d)
return d-t
comp$=(d-t)/60*hour$
io$=EBS_storage$*s->size +
EBS_io$*
(s->restore_io+s->replay_io)
return comp$+io$
spawn_cost(VM, t, d)
return d-t
comp$=(d-t)/60*hour$
io$= EBS_io$*
(s->resume_io+s->replay_io)
return comp$+io$
running_cost(VM,t1,t2)
return 1
(t2-t1)/60*hour$
pause_resume_cost(VM,
t1, t2)
if (t2-t1 >
VM->pause + VM->resume)
return 0
else return 2
io$= EBS_io$*
(VM->pause_io+VM->resume_io)
comp$=(60-(VM->stop-VM->start)
%60)/60*hour$
return io$+ comp$
backup_paused_cost(VM)
return backup_time
return S3_storage$*s->size
backup_live_cost(VM, t)
return VM->pause +
backup_time + VM->resume
return pause_cost(VM, t)$+
S3_storage$*s->size +
(VM->stop_io+VM->start_io)*
EBS_io$
VEE – Mar 10, 2011 – cecchet@cs.umass.edu
IMPLEMENTATION – COST FUNCTIONS
24



Multi-tier online bookstore benchmark
4GB Xen VM for the database
Large EC2 instances from EBS volumes with CloudWatch
Operation
Private Cloud Public Cloud (EC2)
start VM
pause VM
42s
26s
220s
30s
resume VM
backup (stop/clone)
restore (clone/start)
42s
150s
165s
30s
320s
220s
149 writes/sec
15
197 writes/sec
13
wmax
Avg IOs per write
VEE – Mar 10, 2011 – cecchet@cs.umass.edu
TPC-W EVALUATION
25
WORKLOAD DESCRIPTION
Snapshot s0 available at t0
VEE – Mar 10, 2011 – cecchet@cs.umass.edu

26
Cost
MRM
Private cloud
720m
0
Amazon EC2
$8.39
0
VEE – Mar 10, 2011 – cecchet@cs.umass.edu
Overprovisioning with 6 replicas – 1h snapshot
27
Cost
MRM
Private cloud
410m
42.1
Amazon EC2
$4.61
41.5
replicas
available
replica spawning
triggered here
VEE – Mar 10, 2011 – cecchet@cs.umass.edu
Reactive provisioning
28
Cost
MRM
Private cloud
381m42s
17.5
Amazon EC2
$18.29
27.2
VEE – Mar 10, 2011 – cecchet@cs.umass.edu
Reactive provisioning – 15m snapshot
29
Cost
MRM
Private cloud
352m
0
Amazon EC2
$3.73
0
Private Cloud
Amazon EC2
s1
s2
VEE – Mar 10, 2011 – cecchet@cs.umass.edu
Dolly – 30m Prediction Window
cheaper to leave instances online
30
 VM
cloning
Solves administration issues by blackboxing
the database
 Constant time backup/restore needed to
predict replica spawning time

 New
provisioning algorithm
Decouples capacity provisioning from
snapshot scheduling
 Cost functions to optimize for cloud platform
specifics

VEE – Mar 10, 2011 – cecchet@cs.umass.edu
CONCLUSION
31
BONUS SLIDES
Private Cloud
s1
Cost
MRM
Private cloud
381m54s
0
Amazon EC2
$7.16
0
Amazon EC2
s2
s1
VEE – Mar 10, 2011 – cecchet@cs.umass.edu
Dolly – 10m Prediction Window
s2
33
Cost
MRM
Private cloud
360m30s
25.8
Amazon EC2
$5.00
33.7
VEE – Mar 10, 2011 – cecchet@cs.umass.edu
Reactive provisioning – 1h snapshot
34
 Database
native tools
Vendor specific or 3rd party ETL
 Understand database semantics

 Filesystem
copy
Low-level data copy
 Need to know what to copy

 VM
cloning
Copies database content + configuration +
OS
 Unused space can be compressed

VEE – Mar 10, 2011 – cecchet@cs.umass.edu
BACKUP/RESTORE TECHNIQUES
35
Benchmark
RUBiS
TPC-W
TPC-H
scale
1(GB)
TPC-H
scale
10(GB)
DB size
MyISAM no constraint
836MB
MyISAM w/ constraints
1.1GB
MyISAM w/ constraint &
index
1.2GB
InnoDB no constraint
1022MB
InnoDB w/ constraints
1.4GB
InnoDB w/ constraint &
index
1.5GB
PostgreSQL binary dump
PostgreSQL sql dump
684MB
PostgreSQL binary dump
PostgreSQL sql dump
VM size
844MB
4.1GB
210MB
314MB
307MB
1.8GB
PostgreSQL binary dump
PostgreSQL sql dump
Snapshot size
1.2GB
2.1GB
1.1GB
(OS) +
2.1GB
(data)
VEE – Mar 10, 2011 – cecchet@cs.umass.edu
DATABASE SIZES
2.0GB
12GB
7.3GB
16GB
36
Performance depends on database content
7000
VM shutdown
VM copy
VM cloning
VM boot
MySQL backup
MySQL restore
6000
5000
Time in seconds

4000
6017
5761
3000
2000
826
1000
1141
899
843
292
290
293
VEE – Mar 10, 2011 – cecchet@cs.umass.edu
BACKUP/RESTORE PERFORMANCE (1/3)
0
VM
cloning
MySQL
MyISAM
MySQL
InnoDB
RUBiS no constraint
VM
cloning
MySQL
MyISAM
MySQL
InnoDB
RUBiS w/ constraints & basic
index
VM
cloning
MySQL
MyISAM
MySQL
InnoDB
RUBiS w/ constraints & full text
index
37
File copy is the most effective for small databases
350
300
250
Time in seconds

200
VM shutdown
VM/File copy
VM cloning
VM boot
PostgreSQL stop/start
PG bin backup
PG bin restore
PG sql backup
PG sql restore
150
288
275
177
126
100
65
VEE – Mar 10, 2011 – cecchet@cs.umass.edu
BACKUP/RESTORE PERFORMANCE (2/3)
50
0
File copy
VM cloning direct VM cloning & copy
PG bin
PG sql
TPC-W
38

VM cloning most effective on large databases
6000
5573
5000
Time in seconds
5256
VM shutdown
VM/File copy
VM cloning
VM boot
PostgreSQL stop/start
PG bin backup
PG bin restore
PG sql backup
PG sql restore
4000
3000
2000
1477
1468
1215
937
1000
699
147
189
273
File copy
VM
cloning
direct
VM
cloning
& copy
VEE – Mar 10, 2011 – cecchet@cs.umass.edu
BACKUP/RESTORE PERFORMANCE (3/3)
0
TPC-H 1GB
PG bin
PG sql File copy
VM
cloning
direct
VM
cloning
& copy
PG bin
TPC-H 10GB
PG sql
39
DB Backup/
Restore
Filesystem
Copy
VM
Cloning
Medium
Very high
None
Performance
Slow
Fastest
Fast
Snapshot size
Small
DB size
VM size
Spawning time
predictability
Hard
Moderate
Easy
Moderate
Moderate
None
Database configuration
Hard
Hard
None
Missing data in transfer
Possible
Unlikely
No
Spawning atomicity
No
No
Yes
Resynchronization
limitations
Yes
Yes
Yes
Feature
Database specific
knowledge
Database installation
VEE – Mar 10, 2011 – cecchet@cs.umass.edu
BACKUP/RESTORE SUMMARY
40



Capacity provisioning depends on available snapshots
Snapshots scheduled according to capacity demand
Decouple capacity provisioning from snapshot scheduling
if (predictor.capacity_changes ||
predictor.write_workload_changes) {
do {
schedule = capacity_provisioning(predictions)
snapshot_schedule = snapshot_scheduling(predictions)
} while (snapshot_schedule schedules new snapshots)
scheduler.schedule(snapshot_schedule)
scheduler.schedule(capacity_schedule)
}
if (time since last operation > threshold) {
paused_pool_cleaner.release_old_paused_vms();
paused_pool_cleaner.delete_old_snapshots();
}
VEE – Mar 10, 2011 – cecchet@cs.umass.edu
DOLLY MAIN ALGORITHM
41

Paused VMs


Snapshots


VM never re-used if cost to resume > cost to spawn
from last snapshot
Old snapshots can be released based on cost to keep
them around
Free server pool

Can reclaim servers with paused VMs when pool is
empty
VEE – Mar 10, 2011 – cecchet@cs.umass.edu
RELEASING RESOURCES
42
Download