NoSQL
HBase, Hive and Pig, etc.
Adapted from slides by Ruoming Jin,
Perry Hoekstra, Jiaheng Lu, Avinash
Lakshman, Prashant Malik, and Jimmy Lin
Outline
• Column Store
• Why NoSQL
• Basic Concept
– Distributed Transaction Processing
– CAP
• NoSQL systems
• Summary
An invention of the 2000s:
Column Stores for OLAP
Row Store and Column Store
• In a row store, data is stored on disk tuple by tuple.
• In a column store, data is stored on disk column by column.
Row Store vs Column Store
Example records: (IBM, 60.25, 10,000, 1/15/2006) and (MSFT, 60.53, 12,500, 1/15/2006)
Column Store layout (values grouped by column):
IBM, MSFT | 60.25, 60.53 | 10,000, 12,500 | 1/15/2006, 1/15/2006
Used in: Sybase IQ, Vertica
Row Store layout (values grouped by tuple):
IBM, 60.25, 10,000, 1/15/2006 | MSFT, 60.53, 12,500, 1/15/2006
Used in: Oracle, SQL Server, DB2, Netezza, …
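A minimal Python sketch of the two layouts (illustrative only; the field names are invented for this example):

# Two stock-quote records: (symbol, price, volume, date)
records = [
    ("IBM",  60.25, 10_000, "1/15/2006"),
    ("MSFT", 60.53, 12_500, "1/15/2006"),
]

# Row store: each tuple is stored contiguously, one after another.
row_store = [field for record in records for field in record]

# Column store: all values of one column are stored contiguously.
column_store = {
    "symbol": [r[0] for r in records],
    "price":  [r[1] for r in records],
    "volume": [r[2] for r in records],
    "date":   [r[3] for r in records],
}

# A query that only needs volumes touches one column, not every field.
print(sum(column_store["volume"]))  # 22500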
Row Store and Column Store
For example, consider the query:
SELECT account.account_number,
sum (usage.toll_airtime),
sum (usage.toll_price)
FROM usage, toll, source, account
WHERE usage.toll_id = toll.toll_id
AND usage.source_id = source.source_id
AND usage.account_id = account.account_id
AND toll.type_ind IN ('AE', 'AA')
AND usage.toll_price > 0
AND source.type != 'CIBER'
AND toll.rating_method = 'IS'
AND usage.invoice_date = 20051013
GROUP BY account.account_number
Row-store: one row = 212 columns!
Column-store: 7 attributes
Row Store and Column Store
Row Store:
(+) Easy to add/modify a record
(-) Might read in unnecessary data
Column Store:
(+) Only need to read in relevant data
(-) Tuple writes require multiple accesses
• So column stores are suitable for read-mostly, read-intensive, large data repositories
Column Stores: High Level
• Read only what you need
– "Fat" fact tables are typical
– Analytics read only a few columns
• Better compression
• Execute on compressed data
• Materialized views help row stores and column stores about equally
Data model (Vertica/C-Store)
• Same as relational data model
– Tables, rows, columns
– Primary keys and foreign keys
– Projections
• From a single table
• Multiple joined tables
• Example
Normal relational model:
EMP(name, age, dept, salary)
DEPT(dname, floor)
Possible C-Store model:
EMP1(name, age)
EMP2(dept, age, DEPT.floor)
EMP3(name, salary)
DEPT1(dname, floor)
C-Store/Vertica Architecture
[Figure: architecture diagram, from the Vertica Technical Overview White Paper]
Summary: the performance gain
• Column representation – avoids reads of unused
attributes
• Storing overlapping projections – multiple orderings of
a column, more choices for query optimization
• Compression of data – more orderings of a column in
the same amount of space
• Query operators operate on compressed representation
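To illustrate the last two points, a toy Python sketch (assumptions: a sorted string column and simple run-length encoding; not how any particular product implements it):

from itertools import groupby

# A sorted column compresses well with run-length encoding (RLE).
column = ["AA", "AA", "AA", "AE", "AE", "IS", "IS", "IS", "IS"]

# RLE: store each distinct value once with its run length.
rle = [(value, len(list(run))) for value, run in groupby(column)]
# -> [("AA", 3), ("AE", 2), ("IS", 4)]

# Operators can work on the compressed form directly: counting the
# rows matching a predicate never expands the runs back into rows.
matches = sum(length for value, length in rle if value in ("AE", "AA"))
print(matches)  # 5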
List of Column Databases
• Vertica/C-Store
• Sybase IQ
• MonetDB
• LucidDB
• HANA
• Google's Dremel
• ParAccel → Amazon Redshift (another cloud DB service)
Outline
• Column Store
• Why NoSQL
• Basic Concept
– Distributed Transaction Processing
– CAP
• NoSQL systems
• Summary
Scaling Up
• Issues with scaling up when the dataset is just too big
• RDBMSs were not designed to be distributed
• Began to look at multi-node database solutions
• Known as 'scaling out' or 'horizontal scaling'
Scaling RDBMS – Master/Slave
• Master-Slave
– All writes are written to the master. All reads performed against
the replicated slave databases
– Critical reads may be incorrect as writes may not have been
propagated down
– Large data sets can pose problems as master needs to duplicate
data to slaves
What is NoSQL?
• Stands for Not Only SQL
• Class of non-relational data storage systems
• Usually do not require a fixed table schema nor do they use
the concept of joins
• All NoSQL offerings relax one or more of the ACID properties
(will talk about the CAP theorem)
Dynamo and BigTable
• Three major papers were the seeds of the NoSQL movement
– BigTable (Google)
– Dynamo (Amazon)
• Gossip protocol (discovery and error detection)
• Distributed key-value data store
• Eventual consistency
– CAP Theorem (discussed in a sec …)
CAP Theorem
• Suppose three properties of a system
– Consistency (all copies have same value)
– Availability (system can run even if parts have failed)
– Partitions (network can break into two or more parts, each with active
systems that cannot influence other parts)
• Brewer’s CAP “Theorem”: for any system sharing data it is
impossible to guarantee simultaneously all of these three
properties
• Very large systems will partition at some point
– it is necessary to decide between C and A
– traditional DBMS prefer C over A and P
– most Web applications choose A (except in specific applications such
as order processing)
ACID Transactions
• A DBMS is expected to support “ACID
transactions,” processes that are:
– Atomic : Either the whole process is done or none
is.
– Consistent : Database constraints are preserved.
– Isolated : It appears to the user as if only one
process executes at a time.
– Durable : Effects of a process do not get lost if the
system crashes.
Consistency
• Two kinds of consistency:
– strong consistency – ACID(Atomicity Consistency Isolation
Durability)
– weak consistency – BASE(Basically Available Soft-state
Eventual consistency )
Consistency Model
• A consistency model determines rules for visibility and apparent order of updates.
• For example:
– Row X is replicated on nodes M and N
– Client A writes row X to node N
– Some period of time t elapses
– Client B reads row X from node M
– Does client B see the write from client A?
– Consistency is a continuum with tradeoffs
– For NoSQL, the answer would be: maybe
– CAP Theorem states: strict consistency can't be achieved at the same time as availability and partition-tolerance.
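The scenario can be simulated with a toy Python sketch (the delayed propagation is simulated in-process; node and key names follow the slide):

# Toy simulation: row X replicated on M and N, with asynchronous
# (delayed) replication from the written node to the other one.
replicas = {"M": {}, "N": {}}
pending = []  # replication messages not yet delivered

def write(node, key, value):
    replicas[node][key] = value
    # Replication to the other node is queued, not applied immediately.
    other = "M" if node == "N" else "N"
    pending.append((other, key, value))

def read(node, key):
    return replicas[node].get(key)

def propagate():
    # Deliver queued replication messages (time t elapses).
    while pending:
        node, key, value = pending.pop(0)
        replicas[node][key] = value

write("N", "X", 42)    # client A writes row X to node N
print(read("M", "X"))  # client B reads X from M: None (stale!)
propagate()            # after some period of time t ...
print(read("M", "X"))  # ... B sees 42: eventual consistency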
Two-Phase Commit (2PC)
[Diagram: the coordinator (TM/C) and participants (RM/P) exchange three rounds:
"commit or abort?" (precommit or prepare), "here's my vote" (vote), and
"commit/abort!" (decide, notify). RMs validate the Tx and prepare by logging
their local updates and decisions; the TM logs commit/abort (the commit point).
If the votes are unanimous to commit, the decision is commit; else abort.]
2PC: Phase 1
✓ 1. Tx requests commit, by notifying coordinator (C)
• C must know the list of participating sites/RMs.
✓ 2. Coordinator C requests each participant (P) to prepare.
✓ 3. Participants (RMs) validate, prepare, and vote.
• Each P validates the request, logs its updates locally, and responds to C with its vote to commit or abort.
• If P votes to commit, Tx is said to be “prepared” at P.
2PC: Phase 2
✓ 4. Coordinator (TM) commits.
• Iff all P votes are unanimous to commit
– C writes a commit record to its log
– Tx is committed.
• Else abort.
✓ 5. Coordinator notifies participants.
• C asynchronously notifies each P of the outcome for Tx.
• Each P logs the outcome locally
• Each P releases any resources held for Tx.
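A minimal in-process Python sketch of the protocol above (illustrative class and method names; a real implementation adds durable logging, timeouts, and recovery):

class Participant:
    """A resource manager (RM/P) that votes and applies the outcome."""
    def __init__(self, name, healthy=True):
        self.name, self.healthy, self.committed = name, healthy, False

    def prepare(self):
        # Phase 1: validate and log local updates, then vote.
        return self.healthy  # True = vote commit, False = vote abort

    def commit(self):
        self.committed = True  # Phase 2: apply and release resources

    def abort(self):
        self.committed = False

def two_phase_commit(participants):
    # Phase 1: coordinator (TM/C) collects votes from every participant.
    votes = [p.prepare() for p in participants]
    decision = all(votes)  # commit iff the votes are unanimous
    # (The coordinator would log the decision here: the commit point.)
    # Phase 2: notify every participant of the outcome.
    for p in participants:
        p.commit() if decision else p.abort()
    return decision

print(two_phase_commit([Participant("A"), Participant("B")]))          # True
print(two_phase_commit([Participant("A"), Participant("B", False)]))  # False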
Eventual Consistency
• When no updates occur for a long period of time,
eventually all updates will propagate through the
system and all the nodes will be consistent
• For a given accepted update and a given node,
eventually either the update reaches the node or the
node is removed from service
• Known as BASE (Basically Available, Soft state,
Eventual consistency), as opposed to ACID
CAP Theorem
• Consistency: every node in the system contains the same data (e.g., replicas are never out of date)
• Availability: every request to a non-failing node in the system returns a response
• Partition Tolerance: system properties (consistency and/or availability) hold even when the system is partitioned (communication lost) and data is lost (node lost)
The CAP Theorem
[Figure: Venn diagram of Consistency, Availability, and Partition tolerance, with a network partition illustrated]
Theorem: You can have at most two of these properties for any shared-data system
CAP Theorem
• Three properties of a system: consistency, availability and
partitions
• You can have at most two of these three properties for any
shared-data system
• To scale out, you have to partition. That leaves either
consistency or availability to choose from
– In almost all cases, you would choose availability over
consistency
Outline
• Column Store
• Why NoSQL
• Basic Concept
– Distributed Transaction Processing
– CAP
• NoSQL systems
• Summary
What kinds of NoSQL
• NoSQL solutions fall into two major areas:
– Key/Value or 'the big hash table'.
• Amazon S3 (Dynamo)
• Voldemort
• Scalaris
• Memcached (in-memory key/value store)
• Redis
– Schema-less, which comes in multiple flavors: column-based, document-based or graph-based.
• Cassandra (column-based)
• CouchDB (document-based)
• MongoDB (document-based)
• Neo4J (graph-based)
• HBase (column-based)
Four Categories: key/value, column-based, document-based, and graph-based
Key/Value
Pros:
– very fast
– very scalable
– simple model
– able to distribute horizontally
Cons:
– many data structures (objects) can't be easily modeled as key-value pairs
Schema-Less
Pros:
- Schema-less data model is richer than key/value pairs
- eventual consistency
- many are distributed
- still provide excellent performance and scalability
Cons:
- typically no ACID transactions or joins
Common Advantages
• Cheap, easy to implement (open source)
• Data are replicated to multiple nodes (therefore identical and fault-tolerant) and can be partitioned
– Down nodes easily replaced
– No single point of failure
• Easy to distribute
• Don't require a schema
• Can scale up and down
• Relax the data consistency requirement (CAP)
What is given up?
• joins
• group by
• order by
• ACID transactions
• SQL as a sometimes frustrating but still powerful query language
• easy integration with other applications that support SQL
Key-Value Stores
Extremely simple interface
 Data model: (key, value) pairs
 Operations: Insert(key, value), Fetch(key), Update(key), Delete(key)
Implementation: efficiency, scalability, fault-tolerance
 Records distributed to nodes based on key
 Replication
 Single-record transactions, "eventual consistency"
Use Case: Shopping Cart Data
• As we want the shopping carts to be available all the time, across browsers, machines, and sessions, all the shopping information can be put into the value, where the key is the userID, as in the sketch below.
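A hedged sketch with Redis via the redis-py client (assumes a local Redis server; the key naming scheme cart:user:<id> is invented for this example):

import json
import redis  # assumes the redis-py client and a local Redis server

r = redis.Redis(host="localhost", port=6379)

# The entire cart is one opaque value, keyed by user ID.
cart = {"items": [{"sku": "book-123", "qty": 2}], "currency": "USD"}
r.set("cart:user:42", json.dumps(cart))

# Any browser/machine/session can fetch the cart with a single lookup.
restored = json.loads(r.get("cart:user:42"))
print(restored["items"][0]["sku"])  # book-123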
Not to Use
• Relationships among data
• Multi-operation transactions
• Query by data
• Operations by sets: since operations are limited to one key at a time, there is no way to operate upon multiple keys at the same time. If you need to operate upon multiple keys, you have to handle this from the client side.
Document Stores
Like key-value stores, except the value is a document
 Data model: (key, document) pairs
 Document: JSON, XML, other semistructured formats
 Basic operations: Insert(key, document), Fetch(key), Update(key), Delete(key)
 Also Fetch based on document contents
Example systems
 CouchDB, MongoDB, SimpleDB, etc.
Document-Based
• based on the JSON format: a data model which supports lists, maps, dates, and Booleans, with nesting
• Really: indexed semistructured documents
• Example: Mongo
– { Name: "Jaroslav",
Address: "Malostranske nám. 25, 118 00 Praha 1",
Grandchildren: { Claire: "7", Barbara: "6", Magda: "3", Kirsten: "1", Otis: "3", Richard: "1" }
}
MongoDB CRUD operations
• CRUD stands for create, read, update, and
delete
• MongoDB stores data in the form of
documents, which are JSON-like field and
value pairs.
[Figure: a collection of MongoDB documents]
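A hedged sketch of the four CRUD operations using the pymongo driver (assumes a local mongod; database, collection, and field names are invented):

from pymongo import MongoClient  # assumes a mongod on localhost

db = MongoClient("mongodb://localhost:27017")["demo"]

# Create: insert a JSON-like document.
db.people.insert_one({"name": "Jaroslav", "city": "Praha"})

# Read: query by field value.
doc = db.people.find_one({"name": "Jaroslav"})
print(doc["city"])  # Praha

# Update: modify fields in place with an update operator.
db.people.update_one({"name": "Jaroslav"}, {"$set": {"city": "Brno"}})

# Delete: remove the document.
db.people.delete_one({"name": "Jaroslav"})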
Suitable Use Cases
• Event Logging
• Content Management Systems
• Web Analytics or Real-time Analysis
• E-commerce Applications
Not to use
• Complex Transactions Spanning Different Operations
• Queries against Varying Aggregate Structure
HBase is an open-source,
distributed, column-oriented
database built on top of HDFS,
modeled after BigTable!
BigTable Data Model
• A table in Bigtable is a sparse, distributed,
persistent multidimensional sorted map
• Map indexed by a row key, column key, and a
timestamp
– (row:string, column:string, time:int64) → uninterpreted byte array
• Supports lookups, inserts, deletes
– Single row transactions only
Image Source: Chang et al., OSDI 2006
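The map can be mimicked with an ordinary Python dictionary (a toy sketch; the row key and column name echo the Bigtable paper's webtable example):

# Toy version of the Bigtable map:
# (row:str, column:str, time:int) -> uninterpreted bytes
table = {}

def put(row, column, ts, value: bytes):
    table[(row, column, ts)] = value

def lookup(row, column):
    # Return the most recent version of one cell.
    versions = [(ts, v) for (r, c, ts), v in table.items()
                if r == row and c == column]
    return max(versions)[1] if versions else None

put("com.cnn.www", "contents:", 1, b"<html>v1")
put("com.cnn.www", "contents:", 2, b"<html>v2")
print(lookup("com.cnn.www", "contents:"))  # b'<html>v2'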
BigTable Applications
• Data source and data sink for MapReduce
• Google's web crawl
• Google Earth
• Google Analytics
HBase is ..
• A distributed data store that can scale horizontally to
1,000s of commodity servers and petabytes of
indexed storage.
• Designed to operate on top of the Hadoop
distributed file system (HDFS) or Kosmos File System
(KFS, aka Cloudstore) for scalability, fault tolerance,
and high availability.
Benefits
• Distributed storage
• Table-like in data structure
– multi-dimensional map
• High scalability
• High availability
• High performance
HBase benefits over RDBMS
• No real indexes
• Automatic partitioning
• Scale linearly and automatically with new
nodes
• Commodity hardware
• Fault tolerance
• Batch processing
Cassandra
Structured Storage System over a P2P Network
Why Cassandra?
• Lots of data
– Copies of messages, reverse indices of messages,
per user data.
• Many incoming requests resulting in a lot of
random reads and random writes.
• No existing production-ready solutions in the market meet these requirements.
Design Goals
• High availability
• Eventual consistency
– trade-off strong consistency in favor of high availability
• Incremental scalability
• Optimistic Replication
• “Knobs” to tune tradeoffs between consistency,
durability and latency
• Low total cost of ownership
• Minimal administration
innovation at scale
• google bigtable (2006)
– consistency model: strong
– data model: sparse map
– clones: hbase, hypertable
• amazon dynamo (2007)
– O(1) dht
– consistency model: client tune-able
– clones: riak, voldemort
cassandra ~= bigtable + dynamo
Cassandra
Suitable Use Cases
• Event Logging
• Content Management Systems, blogging platforms
Not to use
• There are problems for which column-family
databases are not best solutions, such as
systems that require ACID transactions for
writes and reads.
• If you need the database to aggregate the
data using queries (such as SUM or AVG), you
have to do this on the client side using data
retrieved by the client from all the rows.
Hive and Pig
Need for High-Level Languages
• Hadoop is great for large-data processing!
– But writing Java programs for everything is
verbose and slow
– Not everyone wants to (or can) write Java code
• Solution: develop higher-level data processing
languages
– Hive: HQL is like SQL
– Pig: Pig Latin is a bit like Perl
Hive and Pig
• Hive: data warehousing application in Hadoop
– Query language is HQL, variant of SQL
– Tables stored on HDFS as flat files
– Developed by Facebook, now open source
• Pig: large-scale data processing system
– Scripts are written in Pig Latin, a dataflow language
– Developed by Yahoo!, now open source
– Roughly 1/3 of all Yahoo! internal jobs
• Common idea:
– Provide higher-level language to facilitate large-data
processing
– Higher-level language “compiles down” to Hadoop jobs
Hive Components
• Shell: allows interactive queries
• Driver: session handles, fetch, execute
• Compiler: parse, plan, optimize
• Execution engine: DAG of stages (MR, HDFS, metadata)
• Metastore: schema, location in HDFS, SerDe
Source: cc-licensed slide by Cloudera
Data Model
• Tables
– Typed columns (int, float, string, boolean)
– Also, list: map (for JSON-like data)
• Partitions
– For example, range-partition tables by date
• Buckets
– Hash partitions within ranges (useful for sampling,
join optimization)
Source: cc-licensed slide by Cloudera
Physical Layout
• Warehouse directory in HDFS
– E.g., /user/hive/warehouse
• Tables stored in subdirectories of warehouse
– Partitions form subdirectories of tables
• Actual data stored in flat files
– Control char-delimited text, or SequenceFiles
– With custom SerDe, can use arbitrary format
Source: cc-licensed slide by Cloudera
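A toy Python sketch of how a row maps to a file under this layout (the table and column names are invented; real Hive computes bucket numbers with its own hash function):

# Where would a row land in a date-partitioned, hash-bucketed table?
NUM_BUCKETS = 4
WAREHOUSE = "/user/hive/warehouse"

def locate(table, partition_date, user_id):
    # Partitions form subdirectories of the table's directory ...
    partition_dir = f"{WAREHOUSE}/{table}/dt={partition_date}"
    # ... and buckets hash-partition the rows within a partition.
    bucket = hash(user_id) % NUM_BUCKETS
    return f"{partition_dir}/bucket_{bucket:05d}"

print(locate("clicks", "2006-01-15", "user42"))
# e.g. /user/hive/warehouse/clicks/dt=2006-01-15/bucket_00003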
Hive: Example
• Hive looks similar to an SQL database
• Relational join on two tables:
– Table of word counts from the Shakespeare collection
– Table of word counts from the Bible

SELECT s.word, s.freq, k.freq FROM shakespeare s
JOIN bible k ON (s.word = k.word) WHERE s.freq >= 1 AND k.freq >= 1
ORDER BY s.freq DESC LIMIT 10;

word   s.freq   k.freq
the     25848    62394
I       23031     8854
and     19671    38985
to      18038    13526
of      16700    34654
a       14170     8057
you     12702     2720
my      11297     4135
in      10797    12445
is       8882     6884

Source: Material drawn from Cloudera training VM
Graph Databases
NEO4J (Graphbase)
• A graph is a collection of nodes (things) and edges (relationships) that connect pairs of nodes.
• Attach properties (key-value pairs) to nodes and relationships.
• Relationships connect two nodes, and both nodes and relationships can hold an arbitrary number of key-value pairs.
• A graph database can be thought of as a key-value store, with full support for relationships (a toy sketch follows below).
• http://neo4j.org/
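A toy property graph in plain Python (node IDs, properties, and the KNOWS relationship are invented for illustration; Neo4j's actual API differs):

# Nodes and relationships, each carrying arbitrary key-value properties.
nodes = {
    1: {"name": "Alice"},
    2: {"name": "Bob"},
}
edges = [
    {"from": 1, "to": 2, "type": "KNOWS", "props": {"since": 2006}},
]

def neighbors(node_id, rel_type):
    """Follow outgoing relationships of one type from a node."""
    return [nodes[e["to"]] for e in edges
            if e["from"] == node_id and e["type"] == rel_type]

print(neighbors(1, "KNOWS"))  # [{'name': 'Bob'}]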
[Figures: a series of NEO4J example slides illustrating nodes, relationships, and properties]
NEO4J Features
• Dual license: open source and commercial
• Well suited for many web use cases such as tagging, metadata annotations, social networks, wikis and other network-shaped or hierarchical data sets
• Intuitive graph-oriented model for data representation. Instead of static and rigid tables, rows and columns, you work with a flexible graph network consisting of nodes, relationships and properties.
• Neo4j offers performance improvements on the order of 1000x or more compared to relational DBs.
• A disk-based, native storage manager completely optimized for storing graph structures for maximum performance and scalability
• Massive scalability. Neo4j can handle graphs of several billion nodes/relationships/properties on a single machine and can be sharded to scale out across multiple machines
• Fully transactional like a real database
• Neo4j traverses depths of 1000 levels and beyond at millisecond speed (many orders of magnitude faster than relational systems)
Summary