CS 440
Database Management Systems
NoSQL & NewSQL, Cont’d.
Some slides due to Magda Balazinska
Scaling by partitioning & replication
• Partition the data across machines
• Replicate the partitions
– Good:
• spread read queries across replicas
– Bad:
• must keep replicas consistent after write queries
– Ugly:
• difficult to scale transactions
– two-phase commit is expensive (sketched below)
• difficult to scale complex operations
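
A minimal sketch of why two-phase commit is expensive (the classes and method names are hypothetical, not from any real system): every transaction costs two blocking network round trips, and the coordinator stalls on the slowest participant.

# Hypothetical two-phase commit coordinator. The point: two blocking
# rounds of messages per transaction, gated on the slowest replica.
class Participant:
    def prepare(self, txn):  # force-log the transaction, then vote
        return True
    def commit(self, txn): pass
    def abort(self, txn): pass

def two_phase_commit(participants, txn):
    # Round 1: every replica must vote before anyone may commit.
    votes = [p.prepare(txn) for p in participants]
    if all(votes):
        # Round 2: the commit decision must reach every replica.
        for p in participants:
            p.commit(txn)
        return "committed"
    for p in participants:
        p.abort(txn)
    return "aborted"

print(two_phase_commit([Participant(), Participant()], txn="T1"))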
NoSQL: Not Only SQL / not relational
• Goals
– highly scalable data management system
– flexible data model: records may come from different schemas
• They are willing to give up
– Complex queries
• e.g. no join
– ACID guarantees
• weaker versions, e.g. eventual consistency
– Multi-object transactions
• Not all NoSQL systems give up all these properties
NoSQL key features
• Scale simple operations horizontally
– key lookups
– reads and writes of one record or a small number of records
– simple selections
• Replicate/distribute data over many servers
• Simple call level interface (contrast w/ SQL)
• Weaker concurrency model than ACID
• Efficient use of distributed indexes and RAM
• Flexible schema
Different types of NoSQL
Taxonomy based on the data models:
• Key-value stores
– e.g., Dynamo, Project Voldemort, Memcached
• Document stores
– e.g., SimpleDB, CouchDB, MongoDB
• Extensible Record stores
– e.g., Bigtable, HBase, Cassandra
• NewSQL: new type of RDBMSs
– e.g., Megastore, VoltDB
Key-Value stores features
• Data model: (key, value) pairs
– values are binary objects
– no further schema
• Operations
– insert, delete, and lookup operations on keys
– no operation across multiple data items
• Consistency
– replication with eventual consistency
• e.g., vector clocks in Dynamo (sketched below)
– goal: NEVER reject a write (rejecting writes is bad for business!)
– multiple versions with conflict resolution during reads
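
A minimal sketch of vector-clock conflict detection in the Dynamo style (the dict-based representation is an assumption for illustration): if neither version's clock dominates the other, the writes were concurrent and must be reconciled during the read.

# Vector clocks map node id -> update counter for one object version.
def compare(vc_a, vc_b):
    """Return which version supersedes the other, or 'concurrent'."""
    nodes = set(vc_a) | set(vc_b)
    a_le_b = all(vc_a.get(n, 0) <= vc_b.get(n, 0) for n in nodes)
    b_le_a = all(vc_b.get(n, 0) <= vc_a.get(n, 0) for n in nodes)
    if a_le_b and not b_le_a:
        return "b supersedes a"
    if b_le_a and not a_le_b:
        return "a supersedes b"
    if a_le_b and b_le_a:
        return "equal"
    return "concurrent"     # conflict: resolve at read time

# Two clients updated different replicas independently -> conflict.
print(compare({"n1": 2, "n2": 1}, {"n1": 1, "n2": 3}))  # concurrent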
Key-Value stores features (cont'd)
• Use replication to provide fault-tolerance
• Quorum replication in Dynamo
– Each update creates a new version of an object
– Vector clocks track causality between versions
– Parameters:
• N = number of copies (replicas) of each object
• R = minimum number of nodes that must participate in a successful read
• W = minimum number of nodes that must participate in a successful write
• Quorum: R+W > N (sketched below)
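
A toy quorum read/write sketch (hypothetical API; real Dynamo adds hinted handoff, vector clocks, and asynchronous repair): because R + W > N, every read quorum overlaps every write quorum, so a read always contacts at least one replica that saw the latest write.

N, R, W = 3, 2, 2
assert R + W > N            # the quorum condition

replicas = [dict() for _ in range(N)]   # N copies of each object

def put(key, version, value):
    # Write returns as soon as W replicas acknowledge.
    for acks, rep in enumerate(replicas, start=1):
        rep[key] = (version, value)
        if acks >= W:
            return

def get(key):
    # Read R replicas; the highest version number wins.
    answers = [rep[key] for rep in replicas[-R:] if key in rep]
    return max(answers)[1] if answers else None

put("cart:42", 7, ["book"])
print(get("cart:42"))       # ['book'] -- read overlaps the write quorum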
Key-Value stores internals
• Only primary index: lookup by key
– No secondary indexes!
• Data remains in main memory
• Most systems also offer a persistence option
• Some offer ACID transactions, others do not
– Multiversion concurrency control or locking
Multiversion Concurrency Control
• Idea: Let writers make a “new” copy while
readers use an appropriate “old” copy:
[Figure: the MAIN SEGMENT holds the current versions of DB objects (O, O', O''); the VERSION POOL holds older versions that may be useful for some active readers.]
Readers are always allowed to proceed.
– But may be blocked until writer commits.
Multiversion CC (Contd.)
• Each version of an object has its writer’s TS as
its WTS, and the TS of the Xact that most
recently read this version as its RTS.
• Versions are chained backward; we can discard
versions that are “too old to be of interest”.
• Each Xact is classified as Reader or Writer.
– Writer may write some object; Reader never will.
– Xact declares whether it is a Reader when it begins.
[Figure: WTS timeline, versions ordered old → new.]
Reader Xact
• For each object to be read, Xact T:
– Finds newest version with WTS < TS(T).
(Starts with the current version in the main segment
and chains backward through earlier versions.)
• Assuming that some version of every object
exists from the beginning of time, Reader
Xacts are never restarted.
– However, might block until writer of the
appropriate version commits.
Writer Xact
• To read an object, follows reader protocol.
• To write an object:
– Finds newest version V s.t. WTS(V) < TS(T).
– If RTS(V) < TS(T), T makes a copy CV of V,
with a pointer to V, with WTS(CV) = TS(T),
RTS(CV) = TS(T). (Write is buffered until T
commits; other Xacts can see TS values but can’t
read version CV.)
– Else, reject write.
[Figure: WTS timeline (old → new) showing the new copy CV chained after V, with RTS(V) and T marked.]
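
A minimal in-memory sketch of the reader/writer protocols above (class and function names are invented): readers take the newest version with WTS < TS(T); a writer additionally checks RTS(V) and either buffers a new copy CV or is rejected.

class Version:
    def __init__(self, value, wts):
        self.value = value
        self.wts = wts        # writer's timestamp
        self.rts = wts        # timestamp of the most recent reader

def read(chain, ts):
    # Reader protocol: newest version with WTS < TS(T).
    v = max((v for v in chain if v.wts < ts), key=lambda v: v.wts)
    v.rts = max(v.rts, ts)
    return v.value

def write(chain, ts, value):
    # Writer protocol: locate V, reject if a later Xact already read it.
    v = max((v for v in chain if v.wts < ts), key=lambda v: v.wts)
    if v.rts > ts:
        raise RuntimeError("rejected: a later reader saw V")
    chain.append(Version(value, wts=ts))   # the new copy CV

chain = [Version("v0", wts=0)]   # some version exists "from time 0"
write(chain, ts=5, value="v1")
print(read(chain, ts=7))         # 'v1'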
Check out DynamoDB!
http://aws.amazon.com/dynamodb/
Different types of NoSQL
Taxonomy based on the data models:
• Key-value stores
– e.g., Dynamo, Project Voldemort, Memcached
• Document stores
– e.g., SimpleDB, CouchDB, MongoDB
• Extensible Record stores
– e.g., Bigtable, HBase, Cassandra
• NewSQL: new type of RDBMSs
Document stores
• A "document” is a pointer-less object
– e.g., XML, JSON,
– nested or not
– schema-less
• relational vs. semi-structured (document) data model?
• They may have secondary indexes.
• Scalability
– Replication (e.g. SimpleDB, CouchDB; the entire db is replicated)
– Sharding (MongoDB)
– Both
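
A hypothetical JSON document (field names invented) showing what "nested" and "schema-less" mean in practice; another document in the same store may have entirely different fields:

{
  "customer_id": 17,
  "name": "Ann Smith",
  "address": { "city": "Corvallis", "state": "OR" },
  "open_bids": [ { "item": "B-101", "amount": 12.50 } ]
}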
Amazon SimpleDB (1/3)
• Partitioning
– Data partitioned into domains: queries run within a domain
– Domains seem to be the unit of replication; limit 10 GB each
– Can use domains to manually create parallelism
• Data Model/ Schema
– No fixed schema
– Objects are defined with attribute-value pairs
Amazon SimpleDB (2/3)
• Indexing
– Automatically indexes all attributes
• Support for writing
– PUT and DELETE items in a domain
• Support for querying
– GET by key
– Selection + sort:
SELECT output_list FROM domain_name
[where expression] [sort_instructions]
[limit limit]
– A simple form of aggregation: count
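
A hypothetical query in the syntax above (domain and attribute names invented): a selection plus sort within one domain, with a row limit.

SELECT name, city FROM users
WHERE state = 'OR'
ORDER BY name
LIMIT 10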
Amazon SimpleDB (3/3)
• Availability and consistency
– Data is stored redundantly across multiple servers
– Takes time for the update to propagate to all locations
• Eventually consistent, but an immediate read might
not show the change
– Choose between consistent or eventually consistent
read
Different types of NoSQL
Taxonomy based on the data models:
• Key-value stores
– e.g., Dynamo, Project Voldemort, Memcached
• Document stores
– e.g., SimpleDB, CouchDB, MongoDB
• Extensible record stores
– e.g., Bigtable, HBase, Cassandra
• NewSQL: new type of RDBMSs
Extensible record stores
• Data model is rows and columns
• Typical Access: Row ID, Column ID, Timestamp
• Scalability by splitting rows and columns over nodes
– Rows: sharding on primary key
– Columns: "column groups" = indication for which columns
to be stored together (e.g. customer name/address group,
financial info group, login info group)
Google Bigtable
• Distributed storage system
• Designed to store structured data
• Scales to thousands of servers
• Stores up to several hundred terabytes (maybe even petabytes)
• Perform backend bulk processing
• Perform real-time data serving
• To scale, Bigtable has a limited set of features
Bigtable data model
• Sparse, multidimensional sorted map
(row:string, column:string, time:int64) → string
Columns are grouped into families
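
A toy in-memory rendering of this map (a real Bigtable sorts by row key and splits the table into tablets; the web-crawl cell values below follow the Bigtable paper's running example):

table = {}   # sparse map: (row, column, timestamp) -> string

def put(row, col, ts, value):
    table[(row, col, ts)] = value

def get_latest(row, col):
    # Newest version of one cell, or None if the cell is absent.
    hits = [(ts, v) for (r, c, ts), v in table.items()
            if (r, c) == (row, col)]
    return max(hits)[1] if hits else None

put("com.cnn.www", "anchor:cnnsi.com", 3, "CNN")
put("com.cnn.www", "contents:", 5, "<html>...")
print(get_latest("com.cnn.www", "contents:"))   # '<html>...'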
Bigtable key features
• Reads/writes of data under a single row key are atomic
– Only single-row transactions!
• Data is stored in lexicographical order
– Improves data access locality
– Horizontally partitioned into tablets
– Tablets are the unit of distribution and load balancing
• Column families are the unit of access control
• Data is versioned (old versions garbage collected)
– Ex: most recent three crawls of each page, with times
Bigtable API
• Data definition
– Creating/deleting tables or column families
– Changing access control rights
• Data manipulation
– Writing or deleting values
– Looking up values from individual rows
– Iterating over subset of data in the table
• Can select on rows, columns, and timestamps
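
Continuing the toy table above, a sketch of the iteration API (the predicate names are invented): scan a subset selected by row prefix, column prefix, and minimum timestamp, in sorted order.

def scan(table, row_prefix="", col_prefix="", min_ts=0):
    # Yield matching cells in sorted (row, column, timestamp) order.
    for (row, col, ts) in sorted(table):
        if (row.startswith(row_prefix) and col.startswith(col_prefix)
                and ts >= min_ts):
            yield row, col, ts, table[(row, col, ts)]

for cell in scan(table, row_prefix="com.cnn", min_ts=4):
    print(cell)   # ('com.cnn.www', 'contents:', 5, '<html>...')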
HBase
• Open source implementation of Bigtable
http://hbase.apache.org/
Different types of NoSQL
Taxonomy based on the data models:
• Key-value stores
– e.g., Dynamo, Project Voldemort, Memcached
• Document stores
– e.g., SimpleDB, CouchDB, MongoDB
• Extensible record stores
– e.g., Bigtable, HBase, Cassandra
• NewSQL: new type of RDBMSs
Scalable RDBMS: NewSQL
• Means RDBMSs that offer sharding
• Key difference:
– NoSQL systems make it difficult or impossible to perform
large-scope operations and transactions (to ensure performance),
while scalable RDBMSs do not preclude these operations;
users pay the price only when they need them.
• Megastore, VoltDB, MySQL Cluster, Clustrix, ScaleDB
Megastore
• Implemented over Bigtable, used within Google
• Megastore is a layer on top of Bigtable
– Transactions that span nodes
– A database schema defined in a SQL-like language
– Hierarchical paths that allow some limited joins
• Megastore is made available through the Google App
Engine Datastore
VoltDB
• Main-memory RDBMS: no disk I/O, no buffer management!
• Sharded across a shared-nothing cluster
– One transaction = one stored procedure
– So both the data and processing are partitioned
• Transaction processing
– SQL execution is single-threaded within each shard
– Avoids all locking and latching overheads
• Synchronous multi-master replication for HA
– Multiple nodes may propagate updates
– Different from master/slave replication
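
A single-process sketch of the execution model (all names hypothetical): each shard owns one partition of the data plus a queue of stored procedures, and a dedicated worker runs them to completion one at a time, so no locks or latches are ever taken.

from queue import Queue
from threading import Thread

def shard_worker(data, q):
    while True:
        procedure = q.get()   # one whole transaction at a time
        procedure(data)       # runs to completion: no locks, no latches

shards = [({}, Queue()) for _ in range(2)]   # data partitioned 2 ways
for data, q in shards:
    Thread(target=shard_worker, args=(data, q), daemon=True).start()

def deposit(account, amount):
    # A "stored procedure": routed to the single shard owning the key.
    def txn(data):
        data[account] = data.get(account, 0) + amount
    _, q = shards[hash(account) % len(shards)]
    q.put(txn)

deposit("alice", 100)   # alice's shard applies this serially
# (in a real run you would wait for the queue to drain before exiting)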
Application 1
• Web application that needs to display lots of customer
information; the user data is rarely updated, and
when it is, you know when it changes because
updates go through the same interface.
Application 2
• Department of Motor Vehicles: lookup objects by
multiple fields (driver's name, license number, birth
date, etc); "eventual consistency" is ok, since updates
are usually performed at a single location.
Application 3
• eBay-style application. Cluster customers by country;
separate the rarely changed "core" customer
information (address, email) from frequently-updated
info (current bids).
Application 4
• Everything else (e.g. a serious DMV application)
Criticism (from Stonebraker, CACM2011)
• No ACID = no interest in enterprises
– Screwing up mission-critical data is a no-no
• Low-level query language is death
– a throwback to the days before SQL
• NoSQL means No Standards
– One (typical) large enterprise has 10,000 databases. These
need accepted standards