Big Data and NoSQL: What and Why?

Motivation: Size
• The WWW has spawned a new era of applications that need to store and query very large data sets
  – Facebook: 1.1 billion users (Sep 2012) [Yahoo Finance]
  – Twitter: 500 million users (Jul 2012) [techcrunch.com]
  – Netflix: 30 million subscribers (Oct 2012) [Yahoo Finance]
  – Google: an index of ~45 billion pages (Nov 2012) [http://www.worldwidewebsize.com/]
• So: how do we store and query this type of data set?

Motivation: Scalability
• Another characteristic is a potentially rapidly expanding market
  – Facebook users (end-of-year figures):
    • 2004: 1 million
    • 2006: 12 million
    • 2008: 150 million
    • 2010: 600 million
    • 2012: 1,100 million
• How do we scale fast enough?

Motivation: Availability
• Bad user experience
  – Downtime
  – Long waits
• ... equals lost users
• How do we ensure a good user experience?

From Alvin Richards' GOTO presentation

Requirements
• So the QA requirements are
  – Performance:
    • Time dimension: fast storage and retrieval of data
    • Space dimension: large data sets
  – Availability: of the data service

The players
• The old workhorse
  – RDBM: Relational DataBase Management
• The hyped and forgotten (?)
  – OODB
  – XMLDB
    • BaseX
• The hyped and uncertain
  – NoSQL

RDBM
• Performance:
  – Time
    • Queries do not scale linearly in the size of the data set
      – 10,000 rows ~ 1 second
      – 1 billion rows ~ 24 hours [Jacobs (2009)]

RDBM
• Performance:
  – Space
    • Generally, RDBMs are rather monolithic: one RDBM = one machine
    • Limitations on disk size, RAM, etc.
      – (There are 'tricks' to distribute data over many machines, but that is generally an 'after-the-fact' design, not something considered a premise...)

RDBM
• Availability:
  – OK, I think...
    • Redundancy, redundant disk arrays, the usual bag of tricks

RDBM Performance
Starting with time behaviour

RDBM Space tricks
• Sharding [Wikipedia]
• Exercise: give examples from WoW ...

Why do RDBMs become slow?
• Jacobs' paper explains the details.
• It boils down to:
  – The memory hierarchy
    • Cache, RAM, disk, (tape)
    • Each step down has a big performance penalty
  – The reading pattern
    • Sequential reads are faster than random reads
    • Tape: obvious, but even for RAM this is true!

Issue 1

Thus
• Issue 1: Large data sets cannot reside in RAM; thus read/write time is slow.

Issue 2
• Issue
  – Often one domain type is fixed at "large"
    • Number of customers/users (< 7 billion)
    • Number of sensors
  – While other domains increase indefinitely
    • Observations over time: each page visit, transaction, sensor reading
    • Data production: tweets, Facebook updates, log entries

But...
• The relational database model explicitly ignores the ordering of rows in tables.

Example
• Transactions recorded by a web server
  – The transaction log is inherently time ordered
  – Queries are usually about time as well
    • How many (visits) last day?
    • Most visiting user last week?
  – However, this results in random visits to the disk!

Normalization
• It becomes even worse due to the classic RDBM virtue of normalization:
• A user table and a transaction table

Jacobs' statement

Denormalization technique
• Denormalization:
  – Make aggregate objects
  – Allows much faster access
  – Liability: much more space intensive
• And!
  – Store data in the sequence in which it is to be queried (see the sketch below)...
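A minimal Java sketch of the aggregate idea (all class and field names are invented for illustration, not taken from the slides): instead of joining a user table with a transaction table at query time, the denormalized form stores each user together with his or her transactions, already in the order they will be queried.

    import java.util.List;

    // Normalized form: two separate record types, joined on userId at
    // query time; with a disk-based RDBM, each matching transaction row
    // may cost one random disk visit.
    record User(int userId, String name) {}
    record Transaction(int userId, String page, long timestamp) {}

    // Denormalized aggregate: a user and all of his or her transactions
    // stored together, already in time order, so the typical query
    // ("what did this user do last week?") becomes one sequential read.
    record UserAggregate(int userId, String name, List<Transaction> visits) {}

    public class DenormalizationSketch {
        public static void main(String[] args) {
            UserAggregate u = new UserAggregate(1, "alice", List.of(
                    new Transaction(1, "/index.html", 1354536300L),
                    new Transaction(1, "/cart.html", 1354536360L)));
            // The whole aggregate is fetched with a single key lookup:
            System.out.println(u.name() + " made " + u.visits().size() + " visits");
        }
    }

The liability from the slide is visible in the sketch: data such as the userId is stored redundantly in every transaction, trading space for single sequential reads.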
NoSQL Databases
A new take on storing and querying big data

Key features
• NoSQL: "Not Only SQL" / not relational
  – Horizontal scaling of simple operations over many servers
  – Replication and partitioning of data over many servers
  – Simple call-level interface
  – Weaker concurrency model than ACID
  – Efficient use of RAM and distributed indexes for storage
  – Ability to dynamically add new attributes to records
• Architectural drivers:
  – Performance
  – Scalability
[Cattell, 2010]

Clarifications
• NoSQL focuses on
  – Simple operations
    • Key lookup, read/write of one or a few records
    • The opposite: complex joins over many tables (SQL)
    • (NoSQL generally denormalizes and thus does not require joins)
  – Horizontal scaling
    • Many servers sharing neither RAM nor disk
    • Commodity hardware
      – Cheap, but more prone to failures

NoSQL DB Types

Basic types
• Four types
  – Key-value stores
    • Memcached, Riak, Redis
  – Document stores
    • MongoDB
  – Extensible record (Fowler: column-family)
    • Cassandra, Google BigTable
  – Graph stores
    • Neo4J

Key-value
• Basically the Java Map<KeyType, ValueType> data structure:
  – map.put("pid01-2012-12-03-11-45", measurement);
  – m = map.get("pid01-2012-12-03-11-45");
• Schema: any value under any key...
• Supports:
  – Automatic replication
  – Automatic sharding
  – Both using hashing on the key
• Lookup only via the (primary) key

Document
• Stores "documents"
  – MongoDB: JSON objects
  – Stronger queries, also on document contents
  – Schema: any JSON object may be stored!
  – Atomic updates, otherwise no concurrency control
• Supports
  – Master-slave replication, automatic failover and recovery
  – Automatic sharding
    • Range-based, on a shard key (like zip code, CPR number, etc.)

Extensible Record / Column
• First kid on the block: Google BigTable
• Data model
  – Table of rows and columns
• Scalability model: splitting on both rows and columns
  – Rows: (range) sharding on the primary key
  – Columns: column groups, a domain-defined clustering
  – No schema on the columns; change as you go
  – Generally memory-based with periodic flushes to disk

(Graph)
• Neo4J

CAP Theorem

Reviewing ACID
• Basic RDBM teaching talks about ACID
• Atomicity
  – Transaction: all operations succeed, or none do
• Consistency
  – The DB is in a valid state before and after a transaction
• Isolation
  – N transactions executed concurrently = N transactions executed in sequence
• Durability
  – Once a transaction has been committed, it remains so (even in the event of power loss or a crash)

CAP
• Eric Brewer: you can only get two of the three:
• Consistency
  – A set of operations appears to have occurred all at once (the client view)
• Availability
  – Operations terminate with the intended response
• Partition tolerance
  – Operations complete even if individual components are unavailable

Horizontal Scaling: P taken
• We have already taken P, so we have to relax either
  – Consistency, or
  – Availability
• RDBMs prefer consistency over availability
• NoSQL prefers availability over consistency
  – replacing it with eventual consistency

Achieving ACID
• Two-phase commit
  1. All partitions pre-commit and report the result to the master
  2. On success, the master tells each partition to commit; otherwise roll back
• Guarantees consistency, but availability suffers
• Example (see the computation below)
  – Two partitions, each 99.9% available
    • => 0.999^2 ≈ 99.8% (an extra ~43 min of downtime every month)
  – Five partitions:
    • ≈ 99.5% (about 3.6 hours of downtime every month)

BASE
• Replace ACID with BASE
  – BA: Basically Available
  – S: Soft state
  – E: Eventually consistent
• Availability is achieved by partial failures not leading to total system failure
  – In two-phase commit, what should the master do if one partition does not respond?
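The downtime figures in the two-phase commit example above follow from one rule: since every partition must respond, the per-partition availabilities multiply. A small, self-contained Java computation (class name invented) reproduces the slide's numbers:

    // Two-phase commit needs every partition to respond, so the combined
    // availability is the product of the per-partition availabilities.
    // Numbers from the slide: each partition is 99.9% available.
    public class TwoPhaseCommitAvailability {
        public static void main(String[] args) {
            double perPartition = 0.999;
            double minutesPerMonth = 30 * 24 * 60;   // 43,200 minutes
            for (int n : new int[] {1, 2, 5}) {
                double combined = Math.pow(perPartition, n);
                double downMinutes = (1 - combined) * minutesPerMonth;
                // n=1: ~43 min/month; n=2: ~86 min (i.e. +43); n=5: ~216 min (~3.6 h)
                System.out.printf("%d partition(s): %.2f%% up, ~%.0f min down per month%n",
                        n, 100 * combined, downMinutes);
            }
        }
    }

Availability degrades with every partition added, which is exactly why guaranteeing consistency this way makes availability suffer.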
Eventual Consistency
• So what does this mean?
  – Upon a write, an immediate read may retrieve the old value, or not see the newest added item!
• Why?
  – The read gets its data from a replica that is not yet updated...
• Eventual consistency:
  – Given a sufficiently long period of time in which no updates occur, all replicas become consistent, and thus all reads consistently return the same data...
• The system is always available, but its state is 'soft' / cached

Discussion
• Web applications
  – Is it a problem that a Facebook update takes some minutes to appear for my friends?

Summary
• RDBMs have issues with 'big data'
  – They ignore the physical laws of hardware (random reads)
  – The RDB model requires joins (random reads)
• NoSQL
  – A class of alternative DB models and implementations
  – Two main classes
    • Key-value stores
    • Document stores
• CAP
  – You can only get two of the three
  – NoSQL chooses A, and sacrifices C in favour of eventual consistency

CS@AU Henrik Bærbak Christensen
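As a closing illustration of the eventual-consistency behaviour above, here is a toy Java model (all names invented for illustration; real stores replicate asynchronously over the network): a write goes to a primary copy, an immediate read may hit a stale replica, and only after replication do all reads agree.

    import java.util.HashMap;
    import java.util.Map;

    // Toy model of an eventually consistent store: one primary, one replica.
    // Replication is deliberately manual here, to mimic asynchronous propagation.
    public class EventualConsistencyDemo {
        private final Map<String, String> primary = new HashMap<>();
        private final Map<String, String> replica = new HashMap<>();

        public void put(String key, String value) {
            primary.put(key, value);      // the write is acknowledged immediately
        }

        public String get(String key) {
            return replica.get(key);      // reads may hit the stale replica
        }

        public void replicate() {
            replica.putAll(primary);      // asynchronous sync, running "later"
        }

        public static void main(String[] args) {
            EventualConsistencyDemo store = new EventualConsistencyDemo();
            store.put("status", "updated");
            System.out.println(store.get("status"));  // null: write not yet visible
            store.replicate();                        // after a quiet period...
            System.out.println(store.get("status"));  // "updated": replicas converged
        }
    }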