Cassandra (DHT)

Amit Mhatre
Indiana University Bloomington
aumhatre@indiana.edu

ABSTRACT
Web applications today have different needs from the applications that RDBMSs were designed for. These needs include high availability, flexible schemas, geographic distribution, scalability, elasticity at low cost, and low and predictable response times. Such web applications can usually do without transactions and complex queries. Traditional RDBMSs, on the other hand, do not have flexible schemas and do not scale well. This is where Cassandra, a NoSQL solution, comes into the picture. In this paper I present the data model and a precise description of Cassandra, an open source distributed database management system.

1. INTRODUCTION
Cassandra is a high-performance, eventually consistent, scalable database. It does not follow the conventions of relational databases, such as fixed schemas, tables, columns, datatypes and the SQL query language. Instead, it is a non-relational database similar to Google's BigTable, and it is often described as a BigTable data model running on an Amazon Dynamo-like infrastructure. Cassandra was developed at Facebook and released as an open source project in 2008. It is currently being developed by Apache committers and contributors from many companies. It is reported to be in production at large web companies such as Digg, Twitter, Facebook, Google, Rackspace and Amazon, which suggests that it is one of the most popular tools in the NoSQL ecosystem.

2. MOTIVATION
We explain the need for a non-relational database by considering an example. In the case of Twitter, we need to store users' data along with their tweets, their followers and the users they are following. If we were modeling this in a relational database, our tables would look like these:

    CREATE TABLE user (
        id INTEGER PRIMARY KEY,
        username VARCHAR(64),
        password VARCHAR(64)
    );

    CREATE TABLE followers (
        user INTEGER REFERENCES user(id),
        follower INTEGER REFERENCES user(id)
    );

    CREATE TABLE following (
        user INTEGER REFERENCES user(id),
        followed INTEGER REFERENCES user(id)
    );

    CREATE TABLE tweets (
        id INTEGER,
        user INTEGER REFERENCES user(id),
        body VARCHAR(140),
        timestamp TIMESTAMP
    );

If we were to store data like this, we would need to perform joins to combine data from multiple tables, and we would need indices on the appropriate attributes to make those joins efficient. Firstly, performing joins is a time-consuming and expensive operation. Secondly, getting distributed systems right is a big challenge, and it comes with trade-offs. The schema above also defines no secondary indices, which makes it difficult to perform the joins efficiently. NoSQL overcomes these problems by changing the way data is stored.

3. DATA MODELING
The Cassandra data model is designed for distributed data on a very large scale. It trades the ACID properties for important advantages in performance, availability and operational manageability. Cassandra follows a column-oriented model. To simplify matters, we start with a key-value model. In a key-value model, a key uniquely identifies a value, and this value can be structured (for example, in a JSON format) or completely unstructured (like a BLOB). In a column-oriented model, a value can itself be a collection of other key-value elements. The data model consists of the following main elements:

Column - The basic element, a tuple composed of a timestamp, a column name and a column value. The timestamp is set by the client.
SuperColumn - A column which consists of a dynamic list of sub-columns.
ColumnFamily - A set of columns. It is comparable to a table in a relational database, except that the number and names of columns can vary from one row to another.
KeySpace - A set of ColumnFamilies.
Cluster - The nodes in a Cassandra instance.

The notion of a 'row' does not exist by itself; a row is a list of Columns or SuperColumns identified by a row key.

Consider the following example:

[Figure: example ColumnFamily Tweets]

In this case, the ColumnFamily Tweets represents tweets by users on Twitter. Each SuperColumn is identified by a key. For the first instance, the column 'Text' has the value 'Hello, World', while the column 'User_ID' has the value '39823'.

Another example that helps illustrate the model is Twissandra, a Twitter clone built on Cassandra:

    <Keyspaces>
      <Keyspace Name="Twissandra">
        <ColumnFamily Name="User"/>
        <ColumnFamily Name="Username"/>
        <ColumnFamily Name="Friends"/>
        <ColumnFamily Name="Followers"/>
        <ColumnFamily Name="Tweet"/>
        <ColumnFamily Name="Userline"/>
        <ColumnFamily Name="Timeline"/>
      </Keyspace>
    </Keyspaces>

The column family User contains columns for the username and password. The column family Username maps usernames to a UUID-based key. Friends and Followers answer the questions 'whom is user X following?' and 'who is following user X?' respectively. Tweet stores records with unique keys (UUIDs), the body of the tweet and the timestamp. Userline and Timeline are used to maintain a timeline of each user's tweets.

4. FEATURES
Total decentralization - Every node in the cluster is identical. There are no network bottlenecks and no single point of failure. Data is distributed across the cluster. As there is no master node, every node can service any request.
Elasticity - Read and write throughput increases linearly as new machines are added. Applications do not suffer any downtime or interruption.
Fault tolerance - Automatic data replication is supported for fault tolerance. Failed nodes can be replaced with no downtime.
Tunable consistency - Cassandra defines different levels of consistency. Some of them are:
    One: Data is written to at least one node's commit log and memory table before responding to the client. During a read, the data is returned from the first node where it is found.
    Quorum: Data is written to N/2 + 1 nodes (using integer division) before responding to the client, where N is the replication factor (the number of nodes the data is replicated to). For a read, data is read from N/2 + 1 nodes.
    All: Cassandra writes the data to, and reads it from, all nodes.

5. DATA PARTITIONING AND REPLICATION
Cassandra scales incrementally by dynamically partitioning data over the set of nodes in the cluster. It partitions data across the cluster using consistent hashing, but uses an order-preserving hash function to do so. High availability and durability are achieved using replication. Each data item is replicated at N nodes, where N is the replication factor. Each key is assigned to a coordinator node, which is responsible for replicating the data items that fall within its range. The client is given a choice of how data is to be replicated. The nodes form a DHT-like ring, and a gossip protocol is used to link nodes together for data replication.

6. MySQL COMPARISON
Following are the results of read and write operations over more than 50 GB of data:

                 Write (average)    Read (average)
    MySQL        approx. 300 ms     approx. 350 ms
    Cassandra    0.12 ms            15 ms

7. ADVANTAGES
Key advantages of this architecture are as follows: massive scalability, high availability, schema flexibility, and support for sparse and semi-structured data.

8. DISADVANTAGES
Some of the drawbacks encountered so far are: limited query capabilities, eventual consistency (which makes client applications somewhat more complicated), and limited portability due to a lack of standardization.

9. CONCLUSION
Cassandra is a storage system which provides scalability, high performance, fault tolerance and high availability.
However, it still lacks support for secondary indices, atomicity across keys and compression. It should also support vector clocks, range deletion, memory-efficient compactions and live Keyspace and ColumnFamily changes.

10. ACKNOWLEDGMENT
Many thanks to Prof. Judy Qiu and the AIs for guiding me throughout the course and for the guidelines given for the survey paper.

11. REFERENCES
[1] http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf
[2] http://maxgrinev.com/2010/07/09/a-quick-introduction-to-the-cassandra-data-model/
[3] http://www.slideshare.net/cb1kenobi/cassandra-say-goodbye-to-the-relational-database-562010
[4] http://www.rackspace.com/cloud/blog/2010/05/12/cassandra-by-example/
[5] http://blog.octo.com/en/nosql-lets-play-with-cassandra-part-13/
[6] http://en.wikipedia.org/wiki/Apache_Cassandra
[7] http://wiki.apache.org/cassandra/ArticlesAndPresentations
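
APPENDIX: ILLUSTRATIVE SKETCH OF PARTITIONING AND QUORUM
The replica placement described under DATA PARTITIONING AND REPLICATION and the Quorum level described under FEATURES can be sketched in a few lines of Python. This is a minimal toy model, not Cassandra's implementation. The names Ring, quorum and the node labels are hypothetical. The sketch assumes one token per node and a simple "coordinator plus the next N-1 nodes on the ring" placement policy, and it uses MD5 to place keys on the ring, whereas the paper notes that Cassandra uses an order-preserving hash function.

```python
import bisect
import hashlib


def quorum(replication_factor: int) -> int:
    """Replicas that must respond at the Quorum level: N/2 + 1 (integer division)."""
    return replication_factor // 2 + 1


class Ring:
    """Toy consistent-hashing ring in which each node owns a single token."""

    def __init__(self, nodes, replication_factor):
        if not 1 <= replication_factor <= len(nodes):
            raise ValueError("replication factor must be between 1 and the node count")
        self.replication_factor = replication_factor
        # Keep (token, node) pairs sorted by token so lookups can use bisect.
        self._ring = sorted((self._token(node), node) for node in nodes)
        self._tokens = [token for token, _ in self._ring]

    @staticmethod
    def _token(value: str) -> int:
        # Assumption of this sketch: MD5 is used only to spread values over the ring.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def replicas(self, key: str):
        """Return the coordinator for key, followed by the next N-1 nodes on the ring."""
        # The coordinator is the first node whose token is >= the key's token,
        # wrapping around to the start of the ring when no such node exists.
        start = bisect.bisect_left(self._tokens, self._token(key)) % len(self._ring)
        return [
            self._ring[(start + i) % len(self._ring)][1]
            for i in range(self.replication_factor)
        ]


ring = Ring(["node-a", "node-b", "node-c", "node-d", "node-e"], replication_factor=3)
owners = ring.replicas("user:39823")
print("replicas:", owners)
print("quorum size:", quorum(ring.replication_factor))
```

With a replication factor of 3, a Quorum read or write waits for 2 of the 3 replicas returned by replicas(). Adding a node to the ring changes ownership only for the keys whose tokens fall between the new node and its predecessor, which is what allows the cluster to grow incrementally.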