Megastore: Providing Scalable, Highly Available Storage for Interactive Services
Jason Baker, Chris Bond, James C. Corbett, JJ Furman, Andrey Khorlin, James Larson, Jean-Michel Léon, Yawei Li, Alexander Lloyd, Vadim Yushprakh

Presented by: Hamid Seyedmoradi, Ayoub Hamidi, Ehsan Mohamad Nezamian
Advanced Database Systems
SRBIAU, Kurdistan Campus
10 May 2012

Megastore
• Motivation
• Introduction
• NoSQL & RDBMS
• Megastore
• Paxos

Megastore
More than 3 billion write and 20 billion read transactions daily.
Key Contributions
• Data Model and Storage System
• Paxos Replication
• Report on Experience

AVAILABILITY & SCALE
• Replication: for availability, we implemented a synchronous, fault-tolerant log replicator optimized for long-distance links
• Partitioning and Locality: for scale, we partitioned data into a vast space of small databases

AVAILABILITY & SCALE
Replication Strategies
• Asynchronous Master/Slave
• Synchronous Master/Slave
• Optimistic Replication
We decided to use Paxos.

Technology Options

AVAILABILITY & SCALE
Partitioning and Locality
Entity Groups
• Entity groups partition the datastore
• ACID semantics within an entity group
• Looser consistency across entity groups
• Each entity group is synchronously replicated across datacenters
• Entity group data and replication metadata are stored in scalable NoSQL datastores

AVAILABILITY & SCALE
Partitioning and Locality
Operations
• Entities are the units of data
• Most transactions are within a single entity group
• Cross-entity-group transactions are supported via two-phase commit
• Asynchronous communication between entity groups is supported by queues
• Global indexes span entity groups but have weaker consistency
Figure: Entity Group 1 and Entity Group 2, each with a local index, connected by send and receive queues.

AVAILABILITY & SCALE
Partitioning and Locality
• Entity Groups
• Selecting Entity Group Boundaries
  Example
  • Email
  • Blogs
• Physical Layout

Megastore
• API Design Philosophy
• Data Model
  • Pre-Joining with Keys
  • SCATTER
  • Indexes
    • Storing Clause
    • Repeated Indexes
    • Inline Indexes
  • Mapping to Bigtable

Megastore
Transactions and Concurrency Control
Reads
• Current
• Snapshot
• Inconsistent
Transaction Lifecycle
1-Read
2-Application logic
3-Commit
4-Apply
5-Clean up

Megastore
• Queues
• Two-Phase Commit

REPLICATION
• Brief Summary of Paxos
• Megastore's Approach
  • Fast Reads
  • Fast Writes
  • Replica Types
  • Witness Replica
• Architecture

Architecture

Data Structures and Algorithms
Replicated Logs

Data Structures and Algorithms
Reads
• Query Local
• Find Position
  • Local read
  • Majority read
• Catchup
• Validate
• Query Data

Data Structures and Algorithms
Writes
• Accept Leader
• Prepare
• Accept
• Invalidate
• Apply

Feedback

END
With Thanks
Questions?
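
Backup: Transaction Lifecycle sketch
The five lifecycle steps listed earlier (Read, Application logic, Commit, Apply, Clean up) can be illustrated with a small sketch. This is not Megastore's implementation: `EntityGroup`, `ConflictError`, and `run_transaction` are hypothetical names, the entity group lives in memory, and the commit step replaces the Paxos consensus round with a plain comparison of log positions. It only shows the optimistic pattern in which a transaction commits at the log position it read, and a concurrent writer that read the same position loses.

```python
from dataclasses import dataclass, field


class ConflictError(Exception):
    """Raised when another transaction already claimed the log position."""


@dataclass
class EntityGroup:
    """Hypothetical in-memory stand-in for one entity group."""
    log: list = field(default_factory=list)   # committed log entries, in order
    data: dict = field(default_factory=dict)  # applied entity state

    def read_position(self):
        # 1-Read: position that the next log entry would occupy.
        return len(self.log)

    def commit(self, read_pos, mutations):
        # 3-Commit: the entry must land exactly at read_pos. A real system
        # would run a consensus round here; this sketch (an assumption)
        # only compares positions.
        if len(self.log) != read_pos:
            raise ConflictError(f"log advanced past position {read_pos}")
        self.log.append(dict(mutations))
        return read_pos

    def apply(self, pos):
        # 4-Apply: write the committed mutations into the entity state.
        self.data.update(self.log[pos])


def run_transaction(group, logic):
    pos = group.read_position()               # 1-Read
    snapshot = dict(group.data)
    mutations = logic(snapshot)               # 2-Application logic
    committed = group.commit(pos, mutations)  # 3-Commit
    group.apply(committed)                    # 4-Apply
    snapshot.clear()                          # 5-Clean up
    return committed


# Usage: one transaction succeeds, then a writer holding a stale position fails.
group = EntityGroup()
first = run_transaction(group, lambda snap: {"user:1/name": "Ada"})

stale_pos = 0  # a second writer that also read position 0 before `first` committed
try:
    group.commit(stale_pos, {"user:1/name": "Bob"})
    conflict = False
except ConflictError:
    conflict = True
```

In this sketch the losing writer would be expected to retry from step 1 against the advanced log.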