Cassandra(DHT)
Amit Mhatre
Indiana University Bloomington
aumhatre@indiana.edu
ABSTRACT
Web applications these days have different needs than the
applications that RDBMS were designed for. Some of those needs
are high availability, flexible schemas, geographic distribution,
scalability, elasticity at low cost and low and predictable response
time. Usually these web applications can do without transactions
and complex queries. It has been observed that traditional
RDBMSs do not have flexible schemas and do not scale well. This
is where Cassandra, a NoSQL solution, comes into the picture. In
this paper I present the data model and a precise description of
Cassandra, an open source distributed database management
system.
1. INTRODUCTION
Cassandra is a high-performance, eventually consistent, scalable
database that does not follow the usual principles of SQL
databases, such as schemas, tables, columns, datatypes and a query
language like SQL. Instead, it is a non-relational database similar
to Google’s BigTable. It is described as a BigTable data model
running on an Amazon Dynamo-like infrastructure.
Cassandra was developed at Facebook and released as an open
source project in 2008. It is currently being developed by Apache
committers and contributors from many companies. It is currently
in production at web giants like Digg, Twitter, Facebook, Google,
Rackspace and Amazon, which suggests that it is one of the most
popular tools in the NoSQL ecosystem.
2. MOTIVATION
We will explain the need for a non-relational database by
considering an example.
In the case of Twitter, we need to store users’ data along with
their tweets, their followers and the users they follow. If we were
modeling this in a relational database, our tables would look like
this:
CREATE TABLE user (
    id INTEGER PRIMARY KEY,
    username VARCHAR(64),
    password VARCHAR(64)
);
CREATE TABLE followers (
    user INTEGER REFERENCES user(id),
    follower INTEGER REFERENCES user(id)
);
CREATE TABLE following (
    user INTEGER REFERENCES user(id),
    followed INTEGER REFERENCES user(id)
);
CREATE TABLE tweets (
    id INTEGER,
    user INTEGER REFERENCES user(id),
    body VARCHAR(140),
    timestamp TIMESTAMP
);
If we were to store data like this, we would need to perform joins
to combine data from multiple tables, and we would need to create
indices on the appropriate attributes to make those joins efficient.
Firstly, performing joins is a time-consuming and expensive
operation. Secondly, getting distributed systems right is a big
challenge, and it comes with trade-offs. The schema above defines
no secondary indices, which makes it difficult to perform joins
efficiently.
NoSQL systems address these problems by changing the way data
is stored.
3. DATA MODELING
The Cassandra data model is designed for distributed data on a
very large scale. It trades off the ACID properties for important
advantages in performance, availability and operational
manageability.
Cassandra follows a column-oriented model. To simplify matters,
we will start with a key-value model. In a key-value model, a key
uniquely identifies a value, and this value can be structured (for
example, in a JSON format) or completely unstructured (like a
BLOB). In a column-oriented model, a value can itself be a
collection of other key-value elements.
The data model consists of the following main elements:
Column – The basic element, a tuple composed of a column name,
a column value and a timestamp. The timestamp is set by the
client.
SuperColumn – A column that consists of a dynamic list of
sub-columns.
ColumnFamily – A set of columns. It is comparable to a table in a
relational database, except that the number and names of columns
can vary from one row to another.
KeySpace – A set of ColumnFamilies. The notion of a ‘row’ does
not exist by itself; a row is in fact a list of Columns or
SuperColumns identified by a row key.
Cluster – The nodes in a Cassandra instance.
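The nesting of the data model elements listed above can be illustrated with a minimal Python sketch. This is not Cassandra's API: the names Column, ColumnFamily, Keyspace and put, the row keys and the sample values are all hypothetical, chosen only to show that a row is a keyed collection of columns and that two rows of the same ColumnFamily need not share the same columns.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Column:
    """Basic element: a name, a value and a client-supplied timestamp."""
    name: str
    value: str
    timestamp: int


# Hypothetical aliases, for illustration only (not Cassandra's API).
# A SuperColumn is a dynamic collection of sub-columns keyed by name.
SuperColumn = Dict[str, Column]
# A ColumnFamily maps a row key to the columns stored under that key.
ColumnFamily = Dict[str, Dict[str, Column]]
# A Keyspace maps ColumnFamily names to ColumnFamilies.
Keyspace = Dict[str, ColumnFamily]


def put(keyspace: Keyspace, family: str, row_key: str,
        name: str, value: str, timestamp: int) -> None:
    """Insert or overwrite one column in the row identified by row_key."""
    row = keyspace.setdefault(family, {}).setdefault(row_key, {})
    row[name] = Column(name, value, timestamp)


keyspace: Keyspace = {}
put(keyspace, "User", "row-1", "username", "alice", 1)
put(keyspace, "User", "row-1", "password", "secret", 1)
put(keyspace, "User", "row-2", "username", "bob", 2)  # no password column

# Rows in the same ColumnFamily may carry different sets of columns.
row1_names = sorted(keyspace["User"]["row-1"])
row2_names = sorted(keyspace["User"]["row-2"])
```

Nested dictionaries are used here only because they mirror the "value can be a collection of other key-value elements" idea directly; a real store would add persistence and distribution on top.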
Consider an example in which the ColumnFamily Tweets represents
tweets by users on Twitter. Each SuperColumn is identified by a
key. For the first instance, the column ‘Text’ has the value
‘Hello, World’, while the column ‘User_ID’ has the value ‘39823’.
Another example, which helps in understanding the model better,
is Twissandra, a Twitter clone whose data is represented as
follows:
<Keyspaces>
  <Keyspace Name="Twissandra">
    <ColumnFamily Name="User"/>
    <ColumnFamily Name="Username"/>
    <ColumnFamily Name="Friends"/>
    <ColumnFamily Name="Followers"/>
    <ColumnFamily Name="Tweet"/>
    <ColumnFamily Name="Userline"/>
    <ColumnFamily Name="Timeline"/>
  </Keyspace>
</Keyspaces>
The column family User contains columns for the username and
password.
The column family Username is used to map usernames to
UUID-based keys.
Friends and Followers answer questions such as who is following
user X and whom user X is following.
Tweet stores records with unique keys (UUIDs), the body of the
tweet and the timestamp.
Userline and Timeline are used to maintain a timeline of each
user’s tweets.
4. FEATURES
Total decentralization – Every node in the cluster is identical.
There are no network bottlenecks and no single point of failure.
Data is distributed across the cluster. As there is no master node,
every node can service any request.
Elasticity – Read and write throughput increases linearly as new
machines are added. The applications do not suffer from any
downtime or interruption.
Fault tolerance – Automatic data replication is supported for fault
tolerance. Failed nodes can be replaced with no downtime.
Tunable consistency – Cassandra defines different levels of
consistency. Some of them are:
One: Data is written to at least one node’s commit log and
memory table before responding to the client. During a read, the
data is returned from the first node where it is found.
Quorum: Data is written to N/2 + 1 nodes (with N/2 rounded
down) before responding to the client, where N is the replication
factor (the number of nodes the data is replicated to). For a read,
data is read from N/2 + 1 nodes.
All: Cassandra writes the data to, and reads it from, all N replica
nodes.
5. DATA PARTITIONING AND
REPLICATION
Cassandra’s ability to scale incrementally is achieved by
dynamically partitioning data over the set of nodes in the cluster.
Cassandra partitions data across the cluster using consistent
hashing, but uses an order-preserving hash function to do so.
High availability and durability are achieved using replication.
Each data item is replicated at N nodes, where N is the replication
factor. Each key is assigned to a coordinator node, which is
responsible for the replication of the data items that fall within its
range. The client is given a choice of how data is to be replicated.
A DHT-like ring and a gossip protocol are used to link nodes
together for data replication.
6. MySQL COMPARISON
Following are the results of read and write operations over more
than 50GB of data:
MySQL writes average: approx. 300ms
MySQL reads average: approx. 350ms
Cassandra writes average: 0.12ms
Cassandra reads average: 15ms
7. ADVANTAGES
Key advantages of this architecture are as follows:
Massive scalability, high availability, schema flexibility, and
support for sparse and semi-structured data.
8. DISADVANTAGES
Some of the drawbacks encountered so far are:
Limited query capabilities; eventual consistency, which makes
client applications somewhat more complicated; and portability
problems due to the lack of standardization.
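The partitioning, replication and quorum rules described under Tunable consistency and in Section 5 can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Cassandra's implementation: the Ring class, the node names and the single-letter tokens are hypothetical, keys are compared directly as strings to stand in for an order-preserving hash, and replicas are assumed to be the coordinator plus the next N - 1 nodes clockwise on the ring (the text notes that the client can choose how data is replicated).

```python
import bisect
from typing import Dict, List


class Ring:
    """Hypothetical ring of nodes, each owning the keys up to its token."""

    def __init__(self, node_tokens: Dict[str, str], replication_factor: int):
        # Sort nodes by token so the ring can be searched with bisect.
        self.ring = sorted((tok, node) for node, tok in node_tokens.items())
        self.tokens = [tok for tok, _ in self.ring]
        # Cannot replicate to more nodes than exist.
        self.n = min(replication_factor, len(self.ring))

    def replicas(self, key: str) -> List[str]:
        """Return the coordinator for key followed by its N - 1 successors."""
        # First node whose token is >= key; wrap to the start past the end.
        start = bisect.bisect_left(self.tokens, key) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1]
                for i in range(self.n)]


def quorum(replication_factor: int) -> int:
    """Number of replicas that must acknowledge at the Quorum level."""
    return replication_factor // 2 + 1


# Assumed example cluster: four nodes, replication factor N = 3.
ring = Ring({"node-a": "g", "node-b": "n", "node-c": "t", "node-d": "z"},
            replication_factor=3)

for key in ("alice", "oscar", "zoe"):
    print(key, ring.replicas(key), "quorum =", quorum(ring.n))
```

Because string order is preserved, neighbouring keys land on the same node, which is the property an order-preserving partitioning scheme is meant to provide. The quorum size N/2 + 1 also guarantees that any read quorum and any write quorum over the same N replicas share at least one node.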
9. CONCLUSION
Cassandra is a storage system that provides scalability, high
performance, fault tolerance and high availability. However, it
still lacks support for secondary indices, atomicity across keys and
compression. It should also support vector clocks, range deletion,
memory-efficient compactions and live changes to Keyspaces and
ColumnFamilies.
10. ACKNOWLEDGMENT
Many thanks to Prof. Judy Qiu and the AIs for guiding me
throughout the course, and for the guidelines given for the survey
paper.
11. REFERENCES
[1] http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf
[2] http://maxgrinev.com/2010/07/09/a-quick-introduction-tothe-cassandra-data-model/
[3] http://www.slideshare.net/cb1kenobi/cassandra-say-goodbyeto-the-relational-database-562010
[4] http://www.rackspace.com/cloud/blog/2010/05/12/cassandraby-example/
[5] http://blog.octo.com/en/nosql-lets-play-with-cassandra-part13/
[6] http://en.wikipedia.org/wiki/Apache_Cassandra
[7] http://wiki.apache.org/cassandra/ArticlesAndPresentations