Zephyr: Live Migration in Shared Nothing Databases for Elastic Cloud Platforms


Aaron J. Elmore, Sudipto Das, Divyakant Agrawal, Amr El Abbadi

Distributed Systems Lab

University of California Santa Barbara

Cloud application platforms serve thousands of applications (tenants)

◦ AppEngine, Azure, Force.com

Tenants are (typically)

◦ Small

◦ SLA sensitive

◦ Erratic load patterns

◦ Subject to flash crowds

▪ e.g., the Fark, Digg, Slashdot, Reddit effect (for now)

Support for Multitenancy is critical

Our focus: DBMSs serving these platforms


What the tenant wants…

What the service provider wants…


Static provisioning for peak is inelastic

[Figure: capacity vs. demand over time. Traditional infrastructures provision statically for peak, leaving unused resources; deployment in the cloud tracks demand. Slide credits: Berkeley RAD Lab]


[Figure: elastic multitier cloud architecture. A load balancer routes requests to an application/web/caching tier backed by a database tier.]

Migrate a tenant’s database in a live system

◦ A critical operation to support elasticity

Different from

◦ Migration between software versions

◦ Migration in case of schema evolution


VM migration

[Clark et al., NSDI 2005]

One tenant per VM

◦ Pros: allows fine-grained load balancing

◦ Cons

▪ Performance overhead

▪ Poor consolidation ratio [Curino et al., CIDR 2011]

Multiple tenants in a VM

◦ Pros: good performance

◦ Cons: migrating means moving all tenants → coarse-grained load balancing


Multiple tenants share the same database process

◦ Shared process multitenancy

◦ Example systems: SQL Azure, ElasTraS, RelationalCloud, and many more

Migrate individual tenants

VM migration cannot be used for fine-grained migration

Target architecture: Shared Nothing

◦ Shared storage architectures: see our VLDB 2011 paper


How to ensure no downtime?

◦ Need to migrate the persistent database image (tens of MBs to GBs)

How to guarantee correctness during failures?

◦ Nodes can fail during migration

▪ How to ensure transaction atomicity and durability?

▪ How to recover migration state after failure?

◦ Nodes recover after a failure

How to guarantee serializability?

◦ Transaction correctness equivalent to normal operation

How to minimize migration cost?


Downtime

◦ Time tenant is unavailable

Service Interruption

◦ Number of operations failing/transactions aborting

Migration Overhead/Performance impact

◦ During normal operation, migration, and after migration

Additional Data Transferred

◦ Data transferred in addition to DB’s persistent image


Migration executed in phases

◦ Starts with transfer of minimal information to the destination (the “wireframe”)

Source and destination concurrently execute transactions in one migration phase

Database pages used as the granule of migration

◦ Pages “pulled” by the destination on-demand (see the sketch below)

Minimal transaction synchronization

◦ A page is uniquely owned by either source or destination

◦ Leverage page-level locking

Logging and handshaking protocols to tolerate failures
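A minimal sketch of the on-demand pull with unique page ownership, assuming a hypothetical fetch_from_source RPC; the names are illustrative, not the Zephyr implementation:

import threading

class MigratingTenant:
    """Destination-side view of a tenant mid-migration (illustrative)."""

    def __init__(self, page_ids, fetch_from_source):
        # pid -> page bytes; None means the source still owns the page
        self.pages = {pid: None for pid in page_ids}
        self.fetch_from_source = fetch_from_source  # hypothetical RPC: pid -> bytes
        self.locks = {pid: threading.Lock() for pid in page_ids}

    def read_page(self, pid):
        with self.locks[pid]:  # page-level locking is the only cross-node sync
            if self.pages[pid] is None:
                # First access at the destination: pull the page exactly once.
                # The source thereafter aborts its transactions touching pid.
                self.pages[pid] = self.fetch_from_source(pid)
            return self.pages[pid]

Usage would be something like MigratingTenant(range(n_pages), rpc_fetch), where rpc_fetch is whatever page-transfer call the system provides.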


For this talk

◦ Small tenants

▪ i.e., not sharded across nodes

◦ No replication

◦ No structural changes to indices

Extensions in the paper

◦ Relaxes these assumptions


[Figure: normal operation at the source. The source owns all pages P1, P2, P3, …, Pn and executes the active transactions TS1, …, TSk; the destination is not yet involved. Legend: pages are drawn as owned or not owned by a node.]

Freeze index wireframe and migrate

[Figure: init mode. The index wireframe is frozen and copied to the destination. The source still owns all pages P1, …, Pn and continues executing TS1, …, TSk; the destination holds only the wireframe, with every page un-owned.]


Requests for un-owned pages can block

Index wireframes remain frozen

[Figure: dual mode. Old, still-active transactions TSk+1, …, TSl run at the source while new transactions TD1, …, TDm run at the destination. When a destination transaction TDi accesses P3, the page is pulled from the source and its ownership moves to the destination.]

Pages can be pulled by the destination, if needed

[Figure: finish mode. Transactions at the source have completed; the remaining pages P1, P2, …, Pn are pushed from the source while the destination executes TDm+1, …, TDn.]

Index wireframe un-frozen

[Figure: normal operation at the destination. The destination owns all pages P1, …, Pn and executes transactions TDn+1, …, TDp; the source is no longer involved.]
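Putting the five stages together, a minimal sketch of the mode progression; the enum and helper names are illustrative, only the order of stages comes from the slides:

from enum import Enum, auto

class MigrationMode(Enum):
    NORMAL_SOURCE = auto()  # source owns all pages and executes all transactions
    INIT = auto()           # wireframe frozen and copied; source still executes
    DUAL = auto()           # both nodes execute; destination pulls pages on demand
    FINISH = auto()         # remaining pages pushed; only destination executes
    NORMAL_DEST = auto()    # destination owns all pages

# Transitions only move forward; each one is made atomic with logging and a
# source/destination handshake (protocol details are in the paper).
NEXT = {
    MigrationMode.NORMAL_SOURCE: MigrationMode.INIT,
    MigrationMode.INIT: MigrationMode.DUAL,
    MigrationMode.DUAL: MigrationMode.FINISH,
    MigrationMode.FINISH: MigrationMode.NORMAL_DEST,
}

def advance(mode):
    """Move to the next migration mode; modes never move backwards."""
    return NEXT[mode]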

Once migrated, pages are never pulled back by the source (see the sketch below)

◦ Transactions at the source accessing migrated pages are aborted

No structural changes to indices during migration

◦ Transactions (at both nodes) that make structural changes to indices abort

Destination “pulls” pages on-demand

◦ Transactions at the destination experience higher latency compared to normal operation
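The source-side counterpart of the earlier pull sketch, again with illustrative names (PageMigrated and relinquish are mine, not the paper’s):

class PageMigrated(Exception):
    """Aborts a source transaction touching a page the destination now owns."""

class SourceTenant:
    def __init__(self, pages):
        self.pages = dict(pages)  # pid -> page bytes while owned at the source
        self.migrated = set()     # pids whose ownership moved to the destination

    def read_page(self, pid, txn_id):
        if pid in self.migrated:
            # Pages are never pulled back: abort the transaction instead.
            raise PageMigrated(f"txn {txn_id}: page {pid} is owned by the destination")
        return self.pages[pid]

    def relinquish(self, pid):
        """Called when the destination pulls pid; ownership transfers exactly once."""
        page = self.pages.pop(pid)
        self.migrated.add(pid)
        return page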


Only concern is “dual mode”

◦ Init and Finish: only one node is executing transactions

Local predicate locking of internal index and exclusive page-level locking between nodes → no phantoms

Strict 2PL → transactions are locally serializable

Pages transferred only once

◦ No T_dest → T_source conflict dependency

Guaranteed serializability
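The same argument in serialization-graph form (my restatement of the bullets above, not notation from the paper):

\[
\text{per-node conflict edges acyclic (strict 2PL)} \;\wedge\; \text{every cross-node edge is } T_{\mathrm{source}} \rightarrow T_{\mathrm{dest}} \;\Rightarrow\; \text{the global serialization graph is acyclic.}
\]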


Transaction recovery

◦ For every database page, transactions at source ordered before transactions at destination

◦ After failure, conflicting transactions replayed in the same order

Migration recovery (see the sketch below)

◦ Atomic transitions between migration modes

▪ Logging and handshake protocols

◦ Every page has exactly one owner

▪ Bookkeeping at the index level
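A minimal sketch of how a mode transition might be logged so it survives a crash, assuming a hypothetical JSON log format; the real protocol’s records and handshake are in the paper:

import json
import os

class MigrationLog:
    """Write-ahead log of migration mode transitions (illustrative only).

    A node force-writes BEGIN before changing modes and END after the
    handshake completes; on recovery, a BEGIN without an END tells the
    node to re-drive the handshake with its peer.
    """

    def __init__(self, path):
        self.path = path

    def _append(self, record):
        with open(self.path, "a") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())  # durable before the transition takes effect

    def begin(self, old_mode, new_mode):
        self._append({"rec": "BEGIN", "from": old_mode, "to": new_mode})

    def end(self, new_mode):
        self._append({"rec": "END", "mode": new_mode})

    def recovered_state(self, default_mode):
        """Return the last completed mode, plus any unfinished transition."""
        mode, pending = default_mode, None
        if os.path.exists(self.path):
            with open(self.path) as f:
                for line in f:
                    rec = json.loads(line)
                    if rec["rec"] == "END":
                        mode, pending = rec["mode"], None
                    else:
                        pending = rec["to"]
        return mode, pending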


In the presence of arbitrary repeated failures, Zephyr ensures:

◦ Updates made to database pages are consistent

◦ A failure does not leave a page without an owner

◦ Both source and destination are in the same migration mode

Guaranteed termination and starvation freedom


Replicated Tenants

Sharded Tenants

Allow structural changes to the indices

◦ Using shared lock managers in the dual mode


Prototyped using H2, an open-source OLTP database

◦ Supports standard SQL/JDBC API

◦ Serializable isolation level

◦ Tree Indices

◦ Relational data model

Modified the database engine

◦ Added support for freezing indices (see the sketch after this list)

◦ Page migration status maintained in the index

◦ Details in the paper…

Tungsten SQL Router migrates JDBC connections during migration
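What “freezing” an index might look like in code, sketched over a stand-in structure; FreezableIndex and causes_split are hypothetical, not H2 APIs:

class FrozenIndexError(Exception):
    """Aborts a transaction needing a structural index change mid-migration."""

class FreezableIndex:
    """Stand-in for a tree index whose wireframe can be frozen.

    While frozen, in-place updates proceed, but anything that would change
    the index structure (e.g., a node split) aborts the transaction.
    """

    def __init__(self):
        self.frozen = False
        self.entries = {}  # stand-in for real tree nodes

    def freeze(self):
        self.frozen = True  # the wireframe can now be copied safely

    def unfreeze(self):
        self.frozen = False

    def insert(self, key, value, causes_split=False):
        if self.frozen and causes_split:
            raise FrozenIndexError(f"insert({key!r}) would split a node while frozen")
        self.entries[key] = value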


Two database nodes, each with a DB instance running

Synthetic benchmark as load generator

◦ Modified YCSB to add transactions

▪ Small read/write transactions

Compared against stop-and-copy (S&C)


[Figure: experimental setup. A system controller initiates the migration and maintains the metadata that routes clients to the two database nodes.]

Default transaction parameters: 10 operations per transaction; 80% reads, 15% updates, 5% inserts

Workload: 60 sessions, 100 transactions per session

Hardware: 2.4 GHz Intel Core 2 Quads, 8 GB RAM, 7200 RPM SATA HDs with 32 MB cache, Gigabit Ethernet

Default DB size: 100k rows (~250 MB)


Downtime (tenant unavailability)

S&C: 3 – 8 seconds (needed to migrate, unavailable for updates)

Zephyr: No downtime. Either source or destination is available

Service interruption (failed operations)

S&C: ~100s to 1,000s of operations fail; all transactions issuing updates are aborted

Zephyr: ~10s to 100s of operations fail. Orders of magnitude less interruption


Average increase in transaction latency

(compared to the 6,000 transaction workload without migration)

S&C: 10 – 15%. Cold cache at destination

Zephyr: 10 – 20%. Pages fetched on-demand

Data transfer

S&C: Persistent database image

Zephyr: 2 – 3% additional data transfer (messaging overhead)

Total time taken to migrate

S&C: 3 – 8 seconds. Unavailable for any writes

Zephyr: 10 – 18 seconds. No unavailability


Orders of magnitude fewer failed operations


Proposed Zephyr, a live database migration technique with no downtime for shared nothing architectures

◦ The first end-to-end solution with safety, correctness, and liveness guarantees

Prototype implementation on a relational OLTP database

Low cost on a variety of workloads



Either source or destination is serving the tenant

◦ No downtime

Serializable transaction execution

◦ Unique page ownership

◦ Local multi-granularity locking

Safety in the presence of failures

◦ Transactions are atomic and durable

◦ Migration state is recovered from the log

▪ Ensures consistency of the database state


Wireframe copy

◦ Typically orders of magnitude smaller than data

Operational overhead during migration

◦ Extra data (in addition to database pages) transferred

Transactions aborted during migration


Failures due to attempted modification of index structure


Only committed transactions reported

Loss of cache for both migration types

Zephyr results in a remote fetch

