L4-1 NoSQL

Big Data and NoSQL
What and Why?
Motivation: Size
• The WWW has spawned a new era of applications
that need to store and query very large data sets
– Facebook / 1.1 billion users (Sep 2012) [Yahoo Finance]
– Twitter / 500 million users (Jul 2012) [techcrunch.com]
– Netflix / 30 million users (Oct 2012) [Yahoo Finance]
– Google / index of ~45 billion pages (Nov 2012)
• [http://www.worldwidewebsize.com/]
• So – how do we store and query this type of data set?
Motivation: Scalability
• Another characteristic is the potentially rapidly
expanding market
– Facebook (end-of-year user counts):
• 2004: 1 million
• 2006: 12 million
• 2008: 150 million
• 2010: 600 million
• 2012: 1,100 million
• How do we scale fast enough?
Motivation: Availability
• Bad user experience
– Down time
– Long waits
• … equals lost users
• How do we ensure a good user experience?
From Alvin Richards' GOTO presentation
Requirements:
• So the QA requirements are
– Performance:
• Time dimension: fast storage and retrieval of data
• Space dimension: large data sets
– Availability: of the data service
The players
• The old workhorse
– RDBMS: Relational DataBase Management System
• The hyped and forgotten (?)
– OODB
– XMLDB
• BaseX
• The hyped and uncertain
– NoSQL
RDBMS
• Performance:
– Time
• Queries do not scale linearly in the size of the data set:
– 10,000 rows ~ 1 second
– 1 billion rows ~ 24 hours
[Jacobs (2009)]
RDBMS
• Performance:
– Space
• Generally, an RDBMS is rather monolithic:
one RDBMS = one machine
• Limitations on disk size, RAM, etc.
– (There are 'tricks' to distribute data over many
machines, but it is generally an 'after-the-fact' design,
not something considered a premise...)
RDBMS
• Availability:
– OK, I think...
• Redundancy, redundant disk arrays, the usual bag of tricks
RDBMS Performance
Starting with time behaviour
RDBMS Space tricks
• Sharding
[Wikipedia]
• Exercise: Give examples from WoW …
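To make the idea concrete, here is a minimal Java sketch of hash-based sharding (class, method, and host names are made up): each row is routed to one of several database machines by hashing its primary key.

    import java.util.List;

    // Sketch: route each row to one of N servers by hashing its primary key.
    class ShardRouter {
        private final List<String> shardHosts;

        ShardRouter(List<String> shardHosts) { this.shardHosts = shardHosts; }

        String hostFor(String primaryKey) {
            // floorMod keeps the bucket non-negative even for negative hash codes
            int bucket = Math.floorMod(primaryKey.hashCode(), shardHosts.size());
            return shardHosts.get(bucket);
        }
    }

    public class ShardingDemo {
        public static void main(String[] args) {
            ShardRouter router = new ShardRouter(
                List.of("db0.example.org", "db1.example.org", "db2.example.org"));
            // The same key always maps to the same machine:
            System.out.println(router.hostFor("user:42"));
        }
    }

Range-based sharding (used by e.g. MongoDB, see later) instead assigns contiguous key intervals to machines, which preserves locality for range queries.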
Why do RDBMSs become slow?
• Jacobs' paper explains the details.
• It boils down to:
– The memory hierarchy
• Cache, RAM, disk, (tape)
• Each step down has a big performance penalty
– The reading pattern
• Sequential reads are faster than random reads
• Tape: obvious, but even for RAM this is true! (see the sketch below)
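To see this for RAM, here is a runnable Java sketch (all names made up) that sums the same array twice, once sequentially and once in shuffled order; the random pass is typically several times slower because it defeats CPU caches and prefetching.

    import java.util.Random;

    // Sketch: sequential vs. random reads of the same in-memory array.
    public class ReadPattern {
        public static void main(String[] args) {
            int n = 20_000_000;
            int[] data = new int[n];
            int[] order = new int[n];
            for (int i = 0; i < n; i++) { data[i] = i; order[i] = i; }
            Random rnd = new Random(42);
            for (int i = n - 1; i > 0; i--) {          // Fisher-Yates shuffle
                int j = rnd.nextInt(i + 1);
                int tmp = order[i]; order[i] = order[j]; order[j] = tmp;
            }
            long sum = 0;
            long t0 = System.nanoTime();
            for (int i = 0; i < n; i++) sum += data[i];         // sequential pass
            long t1 = System.nanoTime();
            for (int i = 0; i < n; i++) sum += data[order[i]];  // random pass
            long t2 = System.nanoTime();
            System.out.printf("sequential: %d ms, random: %d ms (sum=%d)%n",
                    (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, sum);
        }
    }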
Issue 1
• Large data sets cannot reside in RAM
• Thus read/write times are slow
Issue 2
• Often one domain type is fixed at "large"
– Number of customers/users (< 7 billion)
– Number of sensors
• While other domains increase indefinitely
– Observations over time: each page visit, transaction, sensor reading
– Data production: tweets, Facebook updates, log entries
But...
The relational database model explicitly ignores the ordering of rows in tables.
Example
• Transactions recorded by a web server
– The transaction log is inherently time ordered
– Queries are usually about time as well
• How many visits in the last day?
• Who was the most frequent visitor last week?
• However, such queries result in random visits to the disk!
Normalization
• It becomes even worse due to the classic RDBMS
virtue of normalization:
• A user table and a transaction table
Jacobs’ statement
Denormalization technique
• Denormalization:
– Make aggregate objects
– Allows much faster access
– Liability: much more space intensive
• And!
– Store data in the sequence in which it is to be queried… (see the sketch below)
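As a sketch (hypothetical Java 16+ record types): the normalized design keeps users and transactions in separate tables and joins them at query time, while the denormalized aggregate stores each user's transactions inline, already in the order they will be read.

    import java.util.List;

    // Normalized: two separate 'tables'; pairing them requires a join on userId.
    record User(String userId, String name) {}
    record Transaction(String userId, long timestamp, double amount) {}

    // Denormalized aggregate: the user carries its transactions, in time order,
    // so "all of Alice's transactions" is one sequential read, at the cost of space.
    record UserAccount(String userId, String name, List<Transaction> transactions) {}

    public class DenormalizationSketch {
        public static void main(String[] args) {
            UserAccount alice = new UserAccount("u1", "Alice",
                List.of(new Transaction("u1", 1354531500L, 9.95),
                        new Transaction("u1", 1354535100L, 4.50)));
            System.out.println(alice);
        }
    }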
NoSQL Databases
A new take on storing and querying big data
Key features
• NoSQL: "Not Only SQL" / not relational
– Horizontal scaling of simple operations over many
servers
– Replication and partitioning of data over many servers
– Simple call-level interface
– Weaker concurrency model than ACID
– Efficient use of RAM and distributed indexes for storage
– Ability to dynamically add new attributes to records
• Architectural drivers:
– Performance
– Scalability
[Cattell (2010)]
Clarifications
• NoSQL focuses on
– Simple operations
• Key lookup, read/write of one or a few records
• Opposite: complex joins over many tables (SQL)
• (NoSQL generally denormalizes and thus does not require joins)
– Horizontal scaling
• Many servers sharing neither RAM nor disk
• Commodity hardware
– Cheap, but more prone to failures
NoSQL DB Types
Basic types
• Four types
– Key-value stores
• memcached, Riak, Redis
– Document stores
• MongoDB
– Extensible record (Fowler: column-family)
• Cassandra, Google BigTable
– Graph stores
• Neo4j
Key-value
• Basically the Java Map<KeyType, ValueType>
data structure (made runnable in the sketch below):
– store.put("pid01-2012-12-03-11-45", measurement);
– m = store.get("pid01-2012-12-03-11-45");
• Schema: any value under any key…
• Supports:
– Automatic replication
– Automatic sharding
– Both using hashing on the key
• Only lookup using the (primary) key
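A runnable version of the Map analogy above (the key scheme and values are made up); real stores such as Riak, Redis, or memcached add replication and sharding by hashing this key across servers.

    import java.util.HashMap;
    import java.util.Map;

    // Sketch: the key-value model is essentially a (distributed) Map.
    public class KeyValueDemo {
        public static void main(String[] args) {
            Map<String, Double> store = new HashMap<>();
            store.put("pid01-2012-12-03-11-45", 21.7);                 // write by key
            Double measurement = store.get("pid01-2012-12-03-11-45");  // read by key
            System.out.println("measurement = " + measurement);
            // Note: lookup by primary key only; no queries on the values themselves.
        }
    }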
Document
• Stores "documents"
– MongoDB: JSON objects
– Stronger queries, also on document contents (see the driver sketch below)
– Schema: any JSON object may be stored!
– Atomic updates, otherwise no concurrency control
• Supports
– Master-slave replication, automatic failover and
recovery
– Automatic sharding
• Range-based, on a shard key (like zip code, CPR number, etc.)
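A sketch using the MongoDB Java driver (assumes the mongodb-driver-sync artifact and a server on localhost:27017; database, collection, and field names are made up). Note how the query reaches into document contents, not just a primary key.

    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.model.Filters;
    import org.bson.Document;

    // Sketch: store and query JSON-like documents in MongoDB.
    public class DocumentStoreDemo {
        public static void main(String[] args) {
            try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
                MongoCollection<Document> users =
                    client.getDatabase("demo").getCollection("users");
                users.insertOne(new Document("name", "Alice")
                        .append("zip", "8000")
                        .append("visits", 42));
                // Query on a field inside the document:
                Document found = users.find(Filters.eq("zip", "8000")).first();
                System.out.println(found);
            }
        }
    }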
Extensible Record/Column
• First kid on the block: Google BigTable
• Data model
– Table of rows and columns
• Scalability model: splitting both on rows and columns
• Row: (range) sharding on the primary key
• Column: column groups (domain-defined clustering)
– No schema on the columns; change as you go (sketched below)
– Generally memory-based with periodic flushes to disk
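A minimal sketch of the data model as nested maps (all row, family, and column names are made up): row key -> column family -> column -> value, where columns can be added per row without any schema change.

    import java.util.HashMap;
    import java.util.Map;

    // Sketch: extensible-record table as row -> column family -> column -> value.
    public class ColumnFamilySketch {
        public static void main(String[] args) {
            Map<String, Map<String, Map<String, String>>> table = new HashMap<>();
            table.computeIfAbsent("user:42", row -> new HashMap<>())
                 .computeIfAbsent("profile", family -> new HashMap<>())
                 .put("name", "Alice");
            // Add a brand-new column to just this row; no schema change needed:
            table.get("user:42").get("profile").put("nickname", "Al");
            System.out.println(table);
        }
    }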
(Graph)
• Neo4j
CAP Theorem
Reviewing ACID
• Basic RDBMS teaching talks about ACID
• Atomicity
– Transaction: all or none succeed
• Consistency
– The DB is in a valid state before and after each transaction
• Isolation
– N transactions executed concurrently = N executed in
sequence
• Durability
– Once a transaction has been committed, it will remain
so (even in the event of power loss or a crash)
CAP
• Eric Brewer: you can only get two of the three:
• Consistency
– A set of operations appears to have occurred all at once (client view)
• Availability
– Operations will terminate in the intended response
• Partition tolerance
– Operations will complete, even if some components are
unavailable
Horizontal Scaling: P taken
• We have already taken P, so we have to relax
either
– Consistency, or
– Availability
• RDBMSs prefer consistency over availability
• NoSQL prefers availability over consistency
– replacing it with eventual consistency
Achieving ACID
• Two-phase commit
1. All partitions pre-commit and report the result to the master
2. On success, the master tells each partition to commit; otherwise roll back
• Guarantees consistency, but availability suffers
• Example (the arithmetic is sketched below)
– Two partitions, each 99.9% available:
• 0.999^2 ≈ 99.8% (≈ 43 extra minutes of downtime every month)
– Five partitions:
• 0.999^5 ≈ 99.5% (≈ 44 hours of downtime per year)
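The arithmetic behind these numbers, as a small Java sketch (the 99.9% per-partition figure is taken from the example above):

    // Sketch: compound availability when a commit needs all n partitions up.
    public class AvailabilityMath {
        public static void main(String[] args) {
            double perPartition = 0.999; // each partition is up 99.9% of the time
            for (int n : new int[] {1, 2, 5}) {
                double total = Math.pow(perPartition, n);
                double downMinPerMonth = (1 - total) * 30 * 24 * 60;
                System.out.printf("n=%d: availability=%.3f%%, downtime=%.0f min/month%n",
                        n, total * 100, downMinPerMonth);
            }
        }
    }

For n=2 this prints roughly 86 minutes per month (43 more than a single partition); for n=5 roughly 216 minutes per month, on the order of 44 hours per year.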
BASE
• Replace ACID with BASE
– BA: Basically Available
– S: Soft state
– E: Eventually consistent
• Availability is achieved by partial failures not
leading to system failure
– In two-phase commit, what would the master do if one
partition does not respond?
Eventual Consistency
• So what does this mean?
– Upon a write, an immediate read may retrieve the old
value, or not see the newly added item!
• Why? The read gets data from a replica that is not yet updated…
– Eventual consistency:
• Given a sufficiently long period in which no updates occur, all
replicas become consistent, and thus all reads consistently return
the same data…
• The system is always available, but its state is 'soft' /
cached (see the sketch below)
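A toy Java sketch of this anomaly (two in-memory maps stand in for replicas; all names are made up): a read served by a not-yet-updated replica returns stale data until replication catches up.

    import java.util.HashMap;
    import java.util.Map;

    // Sketch: a write lands on replica A; a read from replica B is stale
    // until the (here: manual) replication step runs.
    public class EventualConsistencyDemo {
        public static void main(String[] args) {
            Map<String, String> replicaA = new HashMap<>();
            Map<String, String> replicaB = new HashMap<>();
            replicaA.put("status", "hello world");      // write hits replica A
            System.out.println(replicaB.get("status")); // read from B: null (stale!)
            replicaB.putAll(replicaA);                  // replication eventually runs
            System.out.println(replicaB.get("status")); // now consistent
        }
    }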
Discussion
• Web applications
– Is it a problem that a Facebook update takes some
minutes to appear on my friends' pages?
Summary
• RDBMSs have issues with 'big data'
– They ignore the physical laws of the hardware (random reads)
– The relational model requires joins (random reads)
• NoSQL
– A class of alternative DB models and implementations
– Two main classes
• Key-value stores
• Document stores
• CAP
– You can only get two of the three
– NoSQL chooses A, and sacrifices C for Eventual
Consistency
CS@AU
Henrik Bærbak Christensen