70s – Database access is hard and depends on the app
80s – Relational databases come on the scene
90s – Object oriented programming and DBs
00s – Interpreted languages, Agile
Means an app that supports millions of users
Represents relationships
Variable usage (viral apps)
Data that is important aggregated, not by itself
Time-to-market vs. proper design
Uptime (availability) vs. correctness
Ease of management vs. customization
Rejection of RDBMS as one-size-fits-all
Minimal functions and minimal admin
BASE
Basically available, Soft state, Eventually consistent
ACID
Atomic, Consistent, Isolated, Durable
Written in C
Open-source and free (no royalties)
New BSD license
By Salvatore Sanfilippo
Created for a Web analytics project
Sponsored by VMware
Used by various projects: GitHub, Craigslist, Stack Overflow, Digg
Key-value dictionary with sets, lists
Single-threaded
Delayed writes
Data needs to be kept in-memory
Simple protocol
Lack of table, schema, or database
Very basic security
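To make the "simple protocol" point concrete: a client sends each command as an array of length-prefixed strings terminated by CRLF. The sketch below is a minimal encoder for that request format; the function name `encode_command` is made up for illustration and is not taken from any client library.

```python
def encode_command(*args):
    """Encode a command as an array of bulk strings (returns bytes)."""
    # Header: *<number of arguments>\r\n
    parts = [b"*" + str(len(args)).encode() + b"\r\n"]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        # Each argument: $<length in bytes>\r\n<bytes>\r\n
        parts.append(b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n")
    return b"".join(parts)


print(encode_command("SET", "k", "v"))
```

Because lengths are sent up front, values can contain any bytes, including spaces and newlines.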
Session store
One (or more) sessions per user
Many reads, few writes
Throw-away data
Timeouts
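A session store along these lines can be sketched as a dictionary whose entries carry an expiry time. This is a toy in-memory model, not Redis itself; the class name `SessionStore` and the injectable `clock` parameter are assumptions made so the behaviour is easy to test.

```python
import time


class SessionStore:
    """Toy in-memory session store with per-session timeouts."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data = {}  # session id -> (expiry time, payload)

    def put(self, session_id, payload):
        # Every write restarts the timeout for that session.
        self._data[session_id] = (self.clock() + self.ttl, payload)

    def get(self, session_id):
        entry = self._data.get(session_id)
        if entry is None:
            return None
        expires_at, payload = entry
        if self.clock() >= expires_at:
            # Expired sessions are throw-away data: drop them on access.
            del self._data[session_id]
            return None
        return payload
```

With a real server, the same effect is usually obtained by writing the session with an expiry (for example via SETEX) and letting the server discard it.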
Logging
Rapid, low latency writes
Data you don’t care that much about
Not that much data (must be in-memory)
Low-latency write
Many reads throughout transaction
Short (less than a day)
Think a shopping cart or a file upload
Data that you don’t mind losing
Records that can be accessed by a single primary key
Schema that is either a single value or a serialized object
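As an illustration of "a single primary key plus a serialized object", the sketch below stores a whole shopping cart as one JSON string under one key. The dictionary `store` stands in for the key-value store, and the `cart:<id>` key naming is only an assumed convention.

```python
import json

store = {}  # stand-in for the key-value store


def save_cart(user_id, cart):
    # The whole object is serialized into one value under one key.
    store["cart:" + str(user_id)] = json.dumps(cart)


def load_cart(user_id):
    raw = store.get("cart:" + str(user_id))
    return json.loads(raw) if raw is not None else None


save_cart(42, {"items": ["book", "pen"], "total": 17.5})
print(load_cart(42)["items"])
```

The trade-off is that the store cannot query inside the value: changing one field means reading, modifying, and rewriting the whole object.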
Java
Jedis (GitHub)
.NET
ServiceStack (Google Code)
Ruby
redis-rb (GitHub)
SET k v
GET k
MSET k v [k2 v2 …]
MGET k [k2 …]
GETSET k v
Returns value before set, sets new value
SETNX k v (only sets if does not exist)
SETEX k n v (expires a key after n seconds)
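The semantics of these string commands can be modelled in a few lines of Python. This is a hedged sketch of the behaviour, not a client: the class name `Strings` and the injectable `clock` are assumptions for illustration.

```python
import time


class Strings:
    """Toy model of SET / GET / GETSET / SETNX / SETEX."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.values = {}
        self.expires = {}  # key -> time at which the key disappears

    def _purge(self, k):
        if k in self.expires and self.clock() >= self.expires[k]:
            self.values.pop(k, None)
            del self.expires[k]

    def set(self, k, v):
        self.values[k] = v
        self.expires.pop(k, None)  # a plain SET clears any timeout

    def get(self, k):
        self._purge(k)
        return self.values.get(k)

    def getset(self, k, v):
        old = self.get(k)  # value before the set
        self.set(k, v)
        return old

    def setnx(self, k, v):
        if self.get(k) is not None:
            return 0  # key exists: nothing written
        self.set(k, v)
        return 1

    def setex(self, k, seconds, v):
        self.set(k, v)
        self.expires[k] = self.clock() + seconds
```

SETNX returning 1 or 0 is what makes it usable as a simple "first writer wins" primitive.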
Set is unordered grouping of values
SADD k v
SCARD k – counts set
SISMEMBER k v – checks to see if v is in set
SUNION k [k2 …] – unions sets
SINTER k [k2 …] – intersects sets
SDIFF k [k2 …] – subtracts sets
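These commands map closely onto Python's built-in `set` type. The sketch below models them over a dictionary of named sets; the lowercase function names mirror the commands but are plain Python helpers written for illustration, not a client API.

```python
sets = {
    "a": {"1", "2", "3"},
    "b": {"3", "4"},
}


def sadd(k, v):
    members = sets.setdefault(k, set())
    added = v not in members
    members.add(v)
    return int(added)  # 1 if v was new, 0 if it was already a member


def scard(k):
    return len(sets.get(k, set()))


def sismember(k, v):
    return int(v in sets.get(k, set()))


def sunion(*keys):
    return set().union(*(sets.get(k, set()) for k in keys))


def sinter(*keys):
    groups = [sets.get(k, set()) for k in keys]
    return set.intersection(*groups) if groups else set()


def sdiff(first, *rest):
    # Members of the first set that appear in none of the others.
    result = set(sets.get(first, set()))
    for k in rest:
        result -= sets.get(k, set())
    return result


print(sorted(sinter("a", "b")))
```

A missing key behaves like an empty set in every helper, which matches how the commands treat keys that do not exist.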
Ordered group
LPUSH k v – prepends
LPOP k – removes and returns the 1st element
LINSERT k BEFORE|AFTER p v – inserts v before or after the first element equal to the pivot value p
RPUSH k v – appends
RPOP k – removes and returns the last element
LLEN k – number of elements
LRANGE k n m – gets range n to m inclusive
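The list commands can be modelled with a Python list per key. Two details are worth seeing in code: LINSERT finds its position by pivot value rather than by index, and LRANGE uses zero-based, inclusive bounds where negative numbers count from the end. The helper names below are illustrative only, and a real server stores lists differently from a Python list.

```python
lists = {}


def lpush(k, v):
    lists.setdefault(k, []).insert(0, v)
    return len(lists[k])


def rpush(k, v):
    lists.setdefault(k, []).append(v)
    return len(lists[k])


def lpop(k):
    items = lists.get(k)
    return items.pop(0) if items else None


def rpop(k):
    items = lists.get(k)
    return items.pop() if items else None


def llen(k):
    return len(lists.get(k, []))


def linsert(k, where, pivot, v):
    items = lists.get(k)
    if not items:
        return 0   # no such list
    if pivot not in items:
        return -1  # pivot value not found
    i = items.index(pivot)
    items.insert(i if where == "BEFORE" else i + 1, v)
    return len(items)


def lrange(k, start, stop):
    items = lists.get(k, [])
    n = len(items)
    if start < 0:
        start = max(n + start, 0)
    if stop < 0:
        stop += n
    if stop < 0:
        return []
    return items[start:stop + 1]  # stop is inclusive
```

For example, `lrange(k, 0, -1)` returns the whole list, and pairing `rpush` with `lpop` gives a simple FIFO queue.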
SLAVEOF host port
Asynchronous
Can chain together master -> slave -> slave
Cannot chain together master <-> master
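What "asynchronous" and "chained" mean in practice is that a write reaches downstream copies only after some delay, so a read from a copy can be stale. The toy model below shows this with a three-node chain. It is a simplification under stated assumptions: the `Node` class and its `sync` method are hypothetical, and here each copy pulls from its upstream, whereas a real server streams changes to its copies.

```python
class Node:
    """Toy node: the head of the chain accepts writes, the rest copy them."""

    def __init__(self, upstream=None):
        self.data = {}
        self.log = []       # writes applied on this node, in order
        self.upstream = upstream
        self.offset = 0     # how much of the upstream log has been applied

    def set(self, k, v):
        if self.upstream is not None:
            raise RuntimeError("copies are read-only in this sketch")
        self._apply(k, v)

    def _apply(self, k, v):
        self.data[k] = v
        self.log.append((k, v))

    def sync(self):
        """Apply writes from upstream that this node has not seen yet."""
        for k, v in self.upstream.log[self.offset:]:
            self._apply(k, v)
        self.offset = len(self.upstream.log)


master = Node()
first = Node(upstream=master)
second = Node(upstream=first)  # chained: copies from the first copy

master.set("k", "v")
print(first.data.get("k"))  # still None: the write has not propagated yet
first.sync()
second.sync()
print(second.data.get("k"))
```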
Sorted sets (indexed but with set operations, higher big-O complexity)
Hashes (many values for one key)
HSET k field v – sets v for field for k
HGET k field
MULTI / EXEC / DISCARD / WATCH – transactions
Message queues (Pub/Sub)
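The transaction commands implement optimistic locking: commands issued after MULTI are queued, and EXEC applies them only if no WATCHed key has changed in the meantime. The sketch below models that check with a per-key write counter. The `Store` and `Transaction` classes are hypothetical stand-ins for illustration, not a client API.

```python
class Store:
    def __init__(self):
        self.data = {}
        self.version = {}  # key -> number of writes seen

    def set(self, k, v):
        self.data[k] = v
        self.version[k] = self.version.get(k, 0) + 1


class Transaction:
    def __init__(self, store):
        self.store = store
        self.watched = {}  # key -> version observed at WATCH time
        self.queue = []

    def watch(self, k):
        self.watched[k] = self.store.version.get(k, 0)

    def set(self, k, v):
        self.queue.append((k, v))  # queued, not applied yet

    def discard(self):
        self.queue.clear()
        self.watched.clear()

    def execute(self):
        changed = any(
            self.store.version.get(k, 0) != seen
            for k, seen in self.watched.items()
        )
        queued, self.queue, self.watched = self.queue, [], {}
        if changed:
            return None  # aborted: a watched key was modified
        for k, v in queued:
            self.store.set(k, v)
        return ["OK"] * len(queued)
```

A caller that gets `None` back typically re-reads the watched keys and retries the whole transaction.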
Startup info
Client logins
Databases and number of keys
Background saves and time
Replication
Bottleneck on memory
Low CPU
Disk only on flush
Amortized
SET, GET, etc – O(1)
KEYS – O(N)
ZADD, ZREM, etc – O(log(N))
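The gap between O(1) and O(N) is easy to see in a toy model: a GET is a single hash lookup regardless of how many keys exist, while a KEYS-style pattern match has to visit every key. The sketch below uses a Python dictionary and `fnmatch` glob matching as stand-ins; the sample keys are invented for illustration.

```python
import fnmatch

db = {"user:1": "a", "user:2": "b", "cart:1": "c"}


def get(k):
    # One hash lookup, independent of the number of keys.
    return db.get(k)


def keys(pattern):
    # Visits every key in the database to test it against the pattern.
    return [k for k in db if fnmatch.fnmatchcase(k, pattern)]


print(get("user:1"))
print(sorted(keys("user:*")))
```

This is why pattern scans over the whole keyspace are best kept out of request paths on a large dataset.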
General Types of Databases
Relational Databases – Oracle, Postgres, MySQL
Object Stores – Objectivity, Caché, db4o
Key Value Stores – Berkeley DB, Riak, Cassandra
Document Stores – Mongo, Lotus, Couch
Graph Databases – Neo4j, InfoGrid
Redis is a key value store
Intentionally made for clustering
Replicas are not guaranteed to be consistent at any instant (eventually consistent)
Written in Java
Much more robust organization
Often called a column-store, though this is a misnomer
Imitates Dynamo
Written in Erlang
Robust organization
REST Api
Made for clustering, similar to Cassandra
Imitates Dynamo
Document store – can represent objects natively – understands its data
Can access by values
Much more advanced architecture
Auto-sharding