Zephyr: Live Migration in Shared Nothing Databases for Elastic Cloud Platforms
Aaron J. Elmore, Sudipto Das, Divyakant Agrawal, Amr El Abbadi
Distributed Systems Lab
University of California, Santa Barbara
Cloud platforms serve thousands of applications (tenants)
◦ AppEngine, Azure, Force.com
Tenants are (typically)
◦ Small
◦ SLA sensitive
◦ Erratic load patterns
◦ Subject to flash crowds
e.g., the Fark, Digg, Slashdot, Reddit effect (for now)
Support for multitenancy is critical
Our focus: DBMSs serving these platforms
What the tenant wants…
What the service provider wants…
Static provisioning for peak is inelastic
[Figure: capacity vs. demand over time. Traditional infrastructures provision for peak, leaving unused resources; cloud deployments grow and shrink capacity with demand. Slide credits: Berkeley RAD Lab]
[Figure: multitenant platform architecture — load balancer in front of the application/web/caching tier and the database tier.]
Migrate a tenant’s database in a live system
◦ A critical operation to support elasticity
Different from
◦ Migration between software versions
◦ Migration in case of schema evolution
VM migration
[Clark et al., NSDI 2005]
One tenant per VM
◦ Pros: allows fine-grained load balancing
◦ Cons
Performance overhead
Poor consolidation ratio [Curino et al., CIDR 2011]
Multiple tenants in a VM
◦ Pros: good performance
◦ Cons: must migrate all tenants together ⇒ coarse-grained load balancing
Multiple tenants share the same database process
◦ Shared process multitenancy
◦ Example systems: SQL Azure, ElasTraS, RelationalCloud, and many more
Migrate individual tenants
VM migration cannot be used for fine-grained migration
Target architecture: Shared Nothing
◦ Shared storage architectures: see our VLDB 2011 paper
How to ensure no downtime?
◦ Need to migrate the persistent database image (tens of MBs to GBs)
How to guarantee correctness during failures?
◦ Nodes can fail during migration
◦ How to ensure transaction atomicity and durability?
How to recover migration state after failure?
◦ Nodes recover after a failure
How to guarantee serializability?
◦ Transaction correctness equivalent to normal operation
How to minimize migration cost? …
Downtime
◦ Time tenant is unavailable
Service Interruption
◦ Number of operations failing/transactions aborting
Migration Overhead/Performance impact
◦ During normal operation, migration, and after migration
Additional Data Transferred
◦ Data transferred in addition to DB’s persistent image
Migration executed in phases
◦ Starts with transfer of minimal information (the “wireframe”) to the destination
◦ Source and destination concurrently execute transactions in one migration phase
Database pages used as the granule of migration
◦ Pages “pulled” by the destination on demand
Minimal transaction synchronization
◦ A page is uniquely owned by either the source or the destination (see the sketch below)
◦ Leverages page-level locking
Logging and handshaking protocols to tolerate failures
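As a rough illustration of the ownership-and-pull idea, here is a minimal destination-side sketch; all names (DestinationPageStore, SourceStub, pullPage) are invented for illustration, not the prototype's actual code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of destination-side page ownership with on-demand pulls. */
class DestinationPageStore {
    enum Owner { SOURCE, DESTINATION }

    // Every page has exactly one owner at any point in time.
    private final Map<Integer, Owner> owner = new ConcurrentHashMap<>();
    private final Map<Integer, byte[]> pages = new ConcurrentHashMap<>();
    private final SourceStub source; // assumed RPC stub to the source node

    DestinationPageStore(SourceStub source, int numPages) {
        this.source = source;
        // After the wireframe transfer, the destination knows the page ids,
        // but the source still owns every page.
        for (int p = 0; p < numPages; p++) owner.put(p, Owner.SOURCE);
    }

    /** Called by destination transactions; blocks while a pull is in flight. */
    byte[] readPage(int pageId) {
        if (owner.get(pageId) == Owner.SOURCE) {
            synchronized (this) { // double-checked: one pull per page
                if (owner.get(pageId) == Owner.SOURCE) {
                    // Ownership transfers with the pull; the source will abort
                    // its transactions that later touch this page.
                    pages.put(pageId, source.pullPage(pageId));
                    owner.put(pageId, Owner.DESTINATION);
                }
            }
        }
        return pages.get(pageId);
    }
}

interface SourceStub { byte[] pullPage(int pageId); } // assumed network call
```

The double-checked block preserves the pull-once invariant: a page changes owner exactly once and is served locally afterwards.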
Simplifying assumptions
◦ Small tenants, i.e., not sharded across nodes
◦ No replication
◦ No structural changes to indices
Extensions in the paper
◦ Relax these assumptions
[Figure: normal operation before migration — the source owns all pages P1 … Pn and executes active transactions T_S1, …, T_Sk.]
Freeze index wireframe and migrate
[Figure: init mode — the source owns all pages P1 … Pn and continues executing transactions T_S1, …, T_Sk; the destination receives the frozen index wireframe with all pages un-owned.]
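Since the wireframe stays frozen until migration completes, structural index changes (e.g., B+-tree node splits) must abort. A minimal guard, with hypothetical names rather than H2's actual index code, might look like:

```java
/** Hypothetical guard for a frozen index wireframe. */
class MigratableIndex {
    private volatile boolean wireframeFrozen = false;

    void freeze()   { wireframeFrozen = true; }  // set before copying the wireframe
    void unfreeze() { wireframeFrozen = false; } // set once migration completes

    /** Invoked before any node split/merge that would alter the wireframe. */
    void beforeStructuralChange() {
        if (wireframeFrozen)
            throw new IllegalStateException("abort: index wireframe is frozen during migration");
    }
}
```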
Dual mode: both source and destination execute transactions
◦ Requests for un-owned pages can block
◦ Index wireframes remain frozen
[Figure: dual mode — old, still-active transactions T_Sk+1, …, T_Sl run at the source; new transactions T_D1, …, T_Dm run at the destination; when T_Di accesses un-owned page P3, the destination pulls P3 from the source.]
Finish mode: pages can still be pulled by the destination, if needed
[Figure: finish mode — transactions at the source have completed; the source pushes the remaining pages P1, P2, …; the destination executes transactions T_Dm+1, …, T_Dn.]
Normal operation resumes at the destination: index wireframe un-frozen
[Figure: after migration — the destination owns all pages P1 … Pn and executes transactions T_Dn+1, …, T_Dp.]
Once migrated, pages are never pulled back by the source (see the sketch after this list)
◦ Transactions at the source accessing migrated pages are aborted
No structural changes to indices during migration
◦ Transactions (at either node) that make structural changes to indices abort
Destination “pulls” pages on demand
◦ Transactions at the destination experience higher latency compared to normal operation
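A sketch of the source-side rule that migrated pages never return; names are hypothetical (the prototype tracks migration status in the index itself):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical source-side check: pages pulled by the destination are never
 *  returned, so source transactions touching them must abort. */
class SourceAccessGuard {
    private final Set<Integer> migrated = ConcurrentHashMap.newKeySet();

    /** Recorded when the destination pulls a page. */
    void onPagePulled(int pageId) { migrated.add(pageId); }

    /** Called on every page access by a source-side transaction. */
    void checkAccess(int pageId) {
        if (migrated.contains(pageId))
            throw new IllegalStateException(
                "abort: page " + pageId + " already migrated to the destination");
    }
}
```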
Only concern is the “dual mode”
◦ Init and Finish: only one node is executing transactions
Local predicate locking of internal index structures and exclusive page-level locking between nodes ⇒ no phantoms
Strict 2PL ⇒ transactions are locally serializable
Pages transferred only once
◦ No T_dest → T_source conflict dependency
⇒ Guaranteed serializability
Transaction recovery
◦ For every database page, transactions at the source are ordered before transactions at the destination
◦ After a failure, conflicting transactions are replayed in the same order
Migration recovery
◦ Atomic transitions between migration modes
Logging and handshake protocols
◦ Every page has exactly one owner
Bookkeeping at the index level
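A simplified sketch of rebuilding migration state from the log after a crash; the record formats ("MODE …", "PULL …") and names are illustrative only, not the prototype's actual log:

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of log-based recovery of migration state. */
class MigrationRecovery {
    enum Mode { NORMAL, INIT, DUAL, FINISH }

    static Mode recover(List<String> log, Set<Integer> migratedPages) {
        Mode mode = Mode.NORMAL;
        for (String rec : log) {
            if (rec.startsWith("MODE ")) {
                // Mode transitions are logged and acknowledged by the peer
                // (handshake) before either node acts on the new mode.
                mode = Mode.valueOf(rec.substring(5));
            } else if (rec.startsWith("PULL ")) {
                // Each ownership transfer is logged, so after recovery every
                // page still has exactly one owner.
                migratedPages.add(Integer.parseInt(rec.substring(5)));
            }
        }
        return mode;
    }
}
```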
In the presence of arbitrary repeated failures, Zephyr ensures:
◦ Updates made to database pages are consistent
◦ A failure does not leave a page without an owner
◦ Both source and destination are in the same migration mode
Guaranteed termination and starvation freedom
Replicated Tenants
Sharded Tenants
Allow structural changes to the indices
◦ Using shared lock managers in the dual mode
Prototyped using H2, an open source OLTP database
◦ Supports standard SQL/JDBC API
◦ Serializable isolation level
◦ Tree Indices
◦ Relational data model
Modified the database engine
◦ Added support for freezing indices
◦ Page migration status maintained using index
◦ Details in the paper…
Tungsten SQL Router migrates JDBC connections during migration
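For context, tenants talk to the database through standard JDBC, so a router can hand new connections to whichever node currently serves the tenant. A hypothetical client follows; host, port, database name, and credentials are placeholders, and this does not show the Tungsten API:

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class ClientConnect {
    public static void main(String[] args) throws Exception {
        // Clients see one stable endpoint; the SQL router redirects new
        // connections to the node currently serving the tenant.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:h2:tcp://router-host:9092/~/tenant", "sa", "")) {
            conn.setAutoCommit(false); // transactional workload
            conn.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
            // ... execute the tenant's reads and writes ...
            conn.commit();
        }
    }
}
```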
Two database nodes, each with a DB instance running
Synthetic benchmark as load generator
◦ Modified YCSB to add transactions
Small read/write transactions
Compared against Stop and Copy (S&C)
[Figure: experimental setup — a system controller initiates migration and maintains metadata.]
Default transaction parameters (see sketch below):
◦ 10 operations per transaction: 80% reads, 15% updates, 5% inserts
Workload: 60 sessions, 100 transactions per session
Hardware: 2.4 GHz Intel Core 2 Quad, 8 GB RAM, 7200 RPM SATA HDs with 32 MB cache, Gigabit Ethernet
Default DB size: 100k rows (~250 MB)
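A sketch of the workload mix implied by these parameters; this is illustrative code, not the modified YCSB driver:

```java
import java.util.Random;

/** Illustrative workload session: 10 operations per transaction,
 *  80% reads, 15% updates, 5% inserts, over a 100k-row table. */
class WorkloadSession {
    private static final int OPS_PER_TXN = 10;
    private static final int TABLE_ROWS = 100_000;
    private final Random rnd = new Random();

    void run(int txnsPerSession) { // 100 per session, 60 sessions in the paper
        for (int t = 0; t < txnsPerSession; t++) {
            for (int op = 0; op < OPS_PER_TXN; op++) {
                double d = rnd.nextDouble();
                if (d < 0.80)      read(rnd.nextInt(TABLE_ROWS));   // 80% reads
                else if (d < 0.95) update(rnd.nextInt(TABLE_ROWS)); // 15% updates
                else               insert();                        // 5% inserts
            }
            // commit() here; aborts during migration count as failed operations
        }
    }

    void read(int key)   { /* SELECT ... WHERE id = key */ }
    void update(int key) { /* UPDATE ... WHERE id = key */ }
    void insert()        { /* INSERT a new row */ }
}
```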
Downtime (tenant unavailability)
◦ S&C: 3 – 8 seconds (needed to migrate, unavailable for updates)
◦ Zephyr: No downtime. Either source or destination is available
Service interruption (failed operations)
◦ S&C: ~100s – 1,000s of failed operations; all transactions with updates are aborted
◦ Zephyr: ~10s – 100s of failed operations; orders of magnitude less interruption
Average increase in transaction latency (compared to the 6,000-transaction workload without migration)
◦ S&C: 10 – 15%. Cold cache at the destination
◦ Zephyr: 10 – 20%. Pages fetched on demand
Data transfer
◦ S&C: persistent database image only
◦ Zephyr: 2 – 3% additional data transferred (messaging overhead)
Total time taken to migrate
◦ S&C: 3 – 8 seconds, unavailable for any writes
◦ Zephyr: 10 – 18 seconds, no unavailability
Orders of magnitude fewer failed operations
Proposed Zephyr, a live database migration technique with no downtime for shared nothing architectures
◦ The first end-to-end solution with safety, correctness, and liveness guarantees
Prototype implementation on a relational OLTP database
Low cost on a variety of workloads
Either source or destination is serving the tenant
◦ No downtime
Serializable transaction execution
◦ Unique page ownership
◦ Local multi-granularity locking
Safety in the presence of failures
◦ Transactions are atomic and durable
◦ Migration state is recovered from log
Ensure consistency of the database state
Wireframe copy
◦ Typically orders of magnitude smaller than the data
Operational overhead during migration
◦ Extra data (in addition to database pages) transferred
◦ Transactions aborted during migration
Failures due to attempted modifications of the index structure
Only committed transactions are reported
Loss of cache for both migration types
◦ In Zephyr, a cache miss results in a remote fetch