From RDBMS to NoSQL

advertisement
From SQL to NoSQL
Xiao Yu
Mar 2012
1
Data Mining Meeting
Mar, 2012
Data and Information Systems Laboratory
University of Illinois Urbana-Champaign
RDBMS
• The predominant choice in storing data
› Not so true for data mining PhDs since we
put everything in txt files.
• First formulated in 1969 by Codd
› We are using RDBMS everywhere
2
Data Mining Meeting
Mar, 2012
Data and Information Systems Laboratory
University of Illinois Urbana-Champaign
Slide from neo technology, “A NoSQL Overview and the Benefits of Graph Databases"
3
Data Mining Meeting
Mar, 2012
Data and Information Systems Laboratory
University of Illinois Urbana-Champaign
When RDBMS met Web 2.0
Slide from Lorenzo Alberton, "NoSQL Databases: Why, what and when"
4
Data Mining Meeting
Mar, 2012
Data and Information Systems Laboratory
University of Illinois Urbana-Champaign
What’s Wrong with Relational DB?
• Nothing is wrong. You just need to use
the right tool.
• Relational is hard to scale.
› Easy to scale reads
› Hard to scale writes
5
Data Mining Meeting
Mar, 2012
Data and Information Systems Laboratory
University of Illinois Urbana-Champaign
The Death of RDBMS?
6
Data Mining Meeting
Mar, 2012
Data and Information Systems Laboratory
University of Illinois Urbana-Champaign
What’s NoSQL?
• The misleading term “NoSQL” is short for
“Not Only SQL”.
• non-relational, schema-free, non-(quite)acid
• horizontally scalable, distributed, easy
replication support
• simple API
7
Data Mining Meeting
Mar, 2012
Data and Information Systems Laboratory
University of Illinois Urbana-Champaign
Four (emerging) NoSQL Categories
• Key-value stores
› Based on DHTs/ Amazon’s Dynamo paper *
› Data model: (global) collection of K-V pairs
› Example: Voldemort
• Column Families
› BigTable clones **
› Data model: big table, column families
› Example: HBase, Cassandra, Hypertable
*G DeCandia et al, Dynamo: Amazon's Highly Available Key-value Store, SOSP 07
** F Chang et al, Bigtable: A Distributed Storage System for Structured Data, OSDI 06
8
Data Mining Meeting
Mar, 2012
Data and Information Systems Laboratory
University of Illinois Urbana-Champaign
Four (emerging) NoSQL Categories
• Document databases
› Inspired by Lotus Notes
› Data model: collections of K-V Collections
› Example: CouchDB, MongoDB
• Graph databases
› Inspired by Euler & graph theory
› Data model: nodes, rels, K-V on both
› Example: AllegroGraph, VertexDB, Neo4j
9
Data Mining Meeting
Mar, 2012
Data and Information Systems Laboratory
University of Illinois Urbana-Champaign
Focus of Different Data Models
Slide from neo technology, “A NoSQL Overview and the Benefits of Graph Databases"
10
Data Mining Meeting
Mar, 2012
Data and Information Systems Laboratory
University of Illinois Urbana-Champaign
CAP theorem
RDBMS
NoSQL (most)
Partition
Tolerance
Consistency
Availability
11
Data Mining Meeting
Mar, 2012
Data and Information Systems Laboratory
University of Illinois Urbana-Champaign
When to use NoSQL?
• Bigness
• Massive write performance
› Twitter generates 7TB / per day (2010)
•
•
•
•
Fast key-value access
Flexible schema or data types
Schema migration
Write availability
› Writes need to succeed no matter what (CAP, partitioning)
•
•
•
•
•
•
•
•
12
Easier maintainability, administration and operations
No single point of failure
Generally available parallel computing
Programmer ease of use
Use the right data model for the right problem
Avoid hitting the wall
Distributed systems support
from http://highscalability.com/
Tunable CAP tradeoffs
Data Mining Meeting
Mar, 2012
Data and Information Systems Laboratory
University of Illinois Urbana-Champaign
Key-Value Stores
id
hair_color
age
height
1923
Red
18
6’0”
3371
Blue
34
NA
…
…
…
…
Table in relational db
Store/Domain in Key-Value db
Find users whose age is above 18?
Find all attributes of user 1923?
Find users whose hair color is Red and age is 19?
(Join operation) Calculate average age of all grad students?
13
Data Mining Meeting
Mar, 2012
Data and Information Systems Laboratory
University of Illinois Urbana-Champaign
Example of Voldemort
14
Data Mining Meeting
Mar, 2012
Data and Information Systems Laboratory
University of Illinois Urbana-Champaign
Voldemort in LinkedIn
Sid Anand, LinkedIn Data Infrastructure (QCon London 2012)
15
Data Mining Meeting
Mar, 2012
Data and Information Systems Laboratory
University of Illinois Urbana-Champaign
RO Store Usage Pattern
Sid Anand, LinkedIn Data Infrastructure (QCon London 2012)
16
Data Mining Meeting
Mar, 2012
Data and Information Systems Laboratory
University of Illinois Urbana-Champaign
Voldemort vs MySQL
Sid Anand, LinkedIn Data Infrastructure (QCon London 2012)
17
Data Mining Meeting
Mar, 2012
Data and Information Systems Laboratory
University of Illinois Urbana-Champaign
Column Families – BigTable Alike
F Chang, et al, Bigtable: A Distributed Storage System for Structured Data, osdi 06
18
Data Mining Meeting
Mar, 2012
Data and Information Systems Laboratory
University of Illinois Urbana-Champaign
BigTable Data Model
The row name is a reversed URL. The contents column family contains the page
contents, and the anchor column family contains the text of any anchors that
reference the page.
19
Data Mining Meeting
Mar, 2012
Data and Information Systems Laboratory
University of Illinois Urbana-Champaign
More on Row and Column
•
•
•
•
•
•
Rows stored in lexicographic order by row key
Table dynamically split into “Tablets”
Each tablet contains key [startKey, endKey)
Tablets are distributed on different nodes
All date in the same CF are usually same type
Data in same CF are compressed and stored
together
• CF in a specific row is sorted
20
Data Mining Meeting
Mar, 2012
Data and Information Systems Laboratory
University of Illinois Urbana-Champaign
BigTable API Examples
adds one anchor
to www.cnn.com and deletes a
different anchor
uses a Scanner abstraction to
iterate over all anchors in a
particular row
21
Data Mining Meeting
Mar, 2012
Data and Information Systems Laboratory
University of Illinois Urbana-Champaign
BigTable Performance
22
Data Mining Meeting
Mar, 2012
Data and Information Systems Laboratory
University of Illinois Urbana-Champaign
Document Database - mongoDB
Table in relational db
Documents in a collection
Open source, document db
Json-like document with dynamic schema
Initial release 2009
23
Data Mining Meeting
Mar, 2012
Data and Information Systems Laboratory
University of Illinois Urbana-Champaign
mongoDB Product Deployment
And much more…
24
Data Mining Meeting
Mar, 2012
Data and Information Systems Laboratory
University of Illinois Urbana-Champaign
mongoDB Features
•
•
•
•
•
•
•
•
•
25
Document-oriented storage
Full Index Support
Replication & High Availability
Auto-Sharding
Querying
Fast In-Place Updates ?
Map/Reduce
GridFS
Commercial Support
From http://www.mongodb.org/
Data Mining Meeting
Mar, 2012
Data and Information Systems Laboratory
University of Illinois Urbana-Champaign
sum(checkout)
From Gabriele Lana, CouchDB Vs MongoDB
26
Data Mining Meeting
Mar, 2012
Data and Information Systems Laboratory
University of Illinois Urbana-Champaign
And mongoDB is fast
100000 Indexed Queries
avg
med
dev
total
mongoDB
0.00025
0.00021
0.00019
24.99955
mySQL
0.00199
0.00032
0.00518
199.32546
100 Non-Indexed Queries
avg
med
dev
total
mongoDB
0.05662
0.03704
0.19267
5.66164
mySQL
1.99975
1.68468
1.99266
199.97469
http://www.idiotsabound.com/did-i-mention-mongodb-is-fast-way-to-go-mongo
27
Data Mining Meeting
Mar, 2012
Data and Information Systems Laboratory
University of Illinois Urbana-Champaign
Graph Database
Data Model Abstraction:
•Nodes
•Relations
•Properties
28
Data Mining Meeting
Mar, 2012
Data and Information Systems Laboratory
University of Illinois Urbana-Champaign
Neo4j - Build a Graph
Slide from neo technology, “A NoSQL Overview and the Benefits of Graph Databases"
29
Data Mining Meeting
Mar, 2012
Data and Information Systems Laboratory
University of Illinois Urbana-Champaign
Neo4j – Traverse a Graph
Slide from neo technology, “A NoSQL Overview and the Benefits of Graph Databases"
30
Data Mining Meeting
Mar, 2012
Data and Information Systems Laboratory
University of Illinois Urbana-Champaign
A Debatable Performance Evaluation
Comparing Apple to Orange
31
Data Mining Meeting
Mar, 2012
Data and Information Systems Laboratory
University of Illinois Urbana-Champaign
Conclusion
• Use the right data model for the right
problem
32
Data Mining Meeting
Mar, 2012
Data and Information Systems Laboratory
University of Illinois Urbana-Champaign
Download