Apache Cassandra

Arvind Dwarakanath
adwaraka@indiana.edu
Department of Computer Science
Indiana University Bloomington

1. What are NoSQL and Distributed Hash Tables?

The concept of NoSQL differs from the standard relational database. Relational databases had problems with data-intensive applications and with indexing large numbers of files/documents. Many NoSQL systems have been developed to cater to these requirements.

Many of the more popular NoSQL databases have of late been distributed in nature. This type of structure means redundant storage of data on many servers, and the storing is done using a distributed hash table.

In a distributed hash table (DHT), the key of each data item is mapped into a keyspace by a hash function, typically a cryptographic hash such as SHA-1, which produces 160-bit keys. The item is then routed to, and stored on, the node that is responsible for that part of the keyspace. A keyspace partitioning scheme splits ownership of the keyspace among the participating nodes, and an overlay network connects the nodes, allowing any of them to find the owner of a given key.

A very popular NoSQL database that uses the concept of a keyspace is Apache Cassandra, the topic of this survey.

2. What is Cassandra?

Cassandra is an open-source distributed database management system. It is an Apache Software Foundation top-level project designed to handle very large amounts of data spread across many commodity servers while providing a highly available service with no single point of failure. It is a NoSQL system that was initially developed at Facebook to power its Inbox Search feature. Twissandra, a standalone demonstration clone of Twitter, has also been built on it.

Fundamentally, Cassandra is a column-oriented distributed database: data is stored in the form of columns, which are uniquely addressed within a top-level namespace called a keyspace. It can be classified as a "cloud database".

3.
Features of the Cassandra Model

The data model
A table in Cassandra is a distributed multidimensional map indexed by a key. The value is an object that may be a simple element or may be highly structured. The row key in a table is a string with no size restrictions, although it is typically 16 to 36 bytes long. Every operation under a single row key is atomic per replica, no matter how many columns are being read or written. Columns are grouped together into sets called column families, very much as in the BigTable system. Cassandra exposes two kinds of column families: Simple and Super. A super column family can be visualized as a column family within a column family. The top dimension in Cassandra is called the keyspace.

For instance, in usrs['adwaraka'], usrs is a column family of users and 'adwaraka' is the key of one row within it. In that row we can further add the columns usrs['adwaraka']['fname'], usrs['adwaraka']['lname'] and usrs['adwaraka']['gender'].

Column and Column Family
As mentioned before, the data model is columnar in nature. The column is the base of the Cassandra data model: it is the lowest and smallest increment of data. It is a tuple (triplet) that contains a name, a value and a timestamp.

Each row has multiple columns, each of which has a name, a value and a timestamp. Unlike a table in an RDBMS, different rows in the same column family do not have to share the same set of columns, and a column may be added to one or multiple rows at any time. It can be useful to distinguish between "static" column families, which contain values such as user data or other object data, and "dynamic" column families, which contain data such as precalculated query results.

Keyspaces
Keyspaces group column families together. Typically, there will be one keyspace for each application that uses a Cassandra cluster.
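The layering described in this section (keyspace, column family, row key, column) can be pictured as maps inside maps. The Python sketch below is a minimal, single-process model under that assumption; it is not Cassandra's storage engine or client API. The names `Column`, `Keyspace`, `insert`, `get` and the second row key `jdoe` are hypothetical and chosen only for illustration, while `usrs`, `adwaraka` and the column values come from the paper's usrs['adwaraka'] example. As a simplification, it also assumes that when the same column is written twice, the value with the newer timestamp is kept.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Column:
    """The smallest unit of data: a (name, value, timestamp) triplet."""
    name: str
    value: str
    timestamp: int


# Hypothetical model of one keyspace:
# column family name -> row key -> column name -> Column
Keyspace = Dict[str, Dict[str, Dict[str, Column]]]


def insert(ks: Keyspace, cf: str, row_key: str, name: str, value: str, ts: int) -> None:
    """Add or overwrite one column in one row of a column family.

    Simplification: a write whose timestamp is not newer than the stored
    column's timestamp is ignored, so the most recent timestamp wins.
    """
    row = ks.setdefault(cf, {}).setdefault(row_key, {})
    current = row.get(name)
    if current is None or ts > current.timestamp:
        row[name] = Column(name, value, ts)


def get(ks: Keyspace, cf: str, row_key: str, name: str) -> str:
    """Follow a usrs['adwaraka']['fname']-style path through the nested maps."""
    return ks[cf][row_key][name].value


keyspace1: Keyspace = {}
insert(keyspace1, "usrs", "adwaraka", "fname", "Arvind", ts=1)
insert(keyspace1, "usrs", "adwaraka", "lname", "Dwarakanath", ts=1)
insert(keyspace1, "usrs", "adwaraka", "gender", "Male", ts=1)
# Hypothetical second row: rows in one column family need not share columns.
insert(keyspace1, "usrs", "jdoe", "fname", "John", ts=1)

print(get(keyspace1, "usrs", "adwaraka", "fname"))  # Arvind
print(sorted(keyspace1["usrs"]["adwaraka"]))       # ['fname', 'gender', 'lname']
print(sorted(keyspace1["usrs"]["jdoe"]))           # ['fname']
```

Plain dictionaries mirror the "map indexed by a key" description directly; a real cluster additionally partitions row keys across nodes and replicates them, which this sketch leaves out.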
The most important settings defined at the keyspace level are the replication factor and the replica placement strategy. Thus, if you have sets of data with different requirements for these settings (such as different levels of fault tolerance), those sets of data should reside in different keyspaces. A keyspace must be selected before any client API call, for example through Thrift, is made. On the Cassandra CLI, use 'use <keyspace name>;' to select the required keyspace. The command goes like this:

    use Keyspace1;

Within a keyspace, a column family resembles a table in an RDBMS. Column families contain rows and columns, and each row is uniquely identified by a row key. For example, the row usrs['adwaraka'] looks like this in JSON notation:

    usrs['adwaraka']
    {
        fname: "Arvind",
        lname: "Dwarakanath",
        gender: "Male"
    }

Super Columns
Super columns are a type of superstructure over columns: they are a way to group multiple columns. Every super column must have a different name, just as with regular columns, but different super columns may hold subcolumns with the same name. Super columns thus add an extra map layer to the data model. They are frequently used to hold a single record in which each field is represented by a subcolumn. For example, the name of a super column might be the ID of a transaction and each subcolumn could hold some attribute of the transaction. If a transactions row like the one described had two entries, it might look like:

    {
        'trans-A': {
            'date': '01/02/2010',
            'amount': 5000,
            'timestamp': <value1>
        },
        'trans-B': {
            'date': '01/03/2010',
            'amount': 4500,
            'timestamp': <value2>
        }
    }

Decentralized
Every node in the cluster is identical. There are no hierarchies between the nodes, no network bottlenecks and no single points of failure.

Elasticity
New nodes can be added without any downtime or disruption to applications, so a user can continuously add nodes to the cluster without worrying about applications being stopped.

Durability
Durability is the property that writes, once completed, will survive permanently, even if the server is killed, crashes or loses power. This requires calling fsync to tell the OS to flush its write-behind cache to disk.

Fault Tolerant
Data is automatically replicated to multiple nodes for fault tolerance. Replication across multiple data centres is supported, and failed nodes can be replaced with no downtime.

Changeable Consistency
The ability to tune consistency levels per query is a powerful feature of Cassandra because it gives the developer complete control over the trade-off between availability and consistency. This means that Cassandra queries can be configured to exhibit strongly consistent behaviour (although there is no row-level locking) if the developer is willing to sacrifice latency. The consistency levels offered by Cassandra, which have different meanings for reads and writes, are detailed in the lists below. A "quorum" of replicas is essentially a majority of replicas: (replication factor / 2) + 1, with any resulting fraction rounded down. For example, with a replication factor of 3, a quorum is 3/2 + 1 = 2.5, rounded down to 2 replicas.

WRITE CONSISTENCY LEVELS
- ALL: All replicas must have received the write; otherwise the operation fails.
- ANY: Ensure that the write has been written to at least one node (this can include hinted-handoff recipients). Note that if all replica nodes are down at write time, an ANY write may not be readable until the nodes have recovered.
- ONE: Ensure that the write has been written to at least one replica's commit log and memory table before responding to the client.
- QUORUM: Ensure that the write has been written to a quorum of replicas before responding to the client.
- LOCAL_QUORUM: Ensure that the write has been written to a quorum of replicas in the data center local to the coordinator before responding to the client. This setting avoids the latency of inter-data-center communication.
- EACH_QUORUM: Ensure that the write has been written to a quorum of replicas in each data center in the cluster before responding to the client.

READ CONSISTENCY LEVELS
- ALL: Return the record with the most recent timestamp once all replicas have replied, failing the operation if any replica is unresponsive.
- ONE: Return the response from the closest replica, as determined by the snitch configured for the cluster. When read_repair is enabled, Cassandra may perform a consistency check in a background thread.
- QUORUM: Return the record with the most recent timestamp once a quorum of replicas has reported.
- LOCAL_QUORUM: Return the record with the most recent timestamp once a quorum of replicas in the data center local to the coordinator has reported. This setting avoids the latency of inter-data-center communication.
- EACH_QUORUM: Return the record with the most recent timestamp once a quorum of replicas in each data center in the cluster has reported.

In choosing the consistency level for particular operations, developers should consider the relative importance of consistency, latency and availability. For instance, where availability is the top priority it may make sense to choose ONE over QUORUM: if the replication factor for the cluster is 3, a QUORUM operation tolerates the loss of only one node (one copy of the data), while ONE allows the operation to complete even if two nodes are unavailable. Clusters spanning multiple data centres raise further questions about local latency and durability. Note also that reads are slower than writes in Cassandra, so it is well suited to write-intensive workloads on a massive scale, such as a blog.

4. Major Client Libraries for Cassandra

Thrift
Thrift has been mentioned earlier in this paper, so what is it? Thrift is a software framework for scalable cross-language development.
In this context, Thrift is the RPC client interface used to communicate with the Cassandra server. From a single interface definition, it generates the serialization and RPC code for a variety of languages, including C++, Java, Python, PHP, Perl and C#, to name a few. It is this mechanism that allows you to interact with Cassandra from any of these client languages. Some other clients that are used include Hector (Java), Pycassa (Python), phpcassa (PHP) and Cassandra (Ruby); these libraries are available on GitHub.

Running Cassandra and basic CLI Commands
At the time of writing, the latest version available for download is Apache Cassandra 0.7.2. The installation is simple enough. For this study, a minimal installation was done on a single local machine running Ubuntu in a virtual machine. To run Cassandra in the foreground, run the following command:

    darkprince@ubuntu:~/Desktop/apache-cassandra-0.7.2$ bin/cassandra -f

After a flurry of messages, a message that looks somewhat like this pops up:

    INFO 10:09:01,633 Listening for thrift clients...

This indicates that Cassandra is listening for Thrift clients and that clients can start their operations. In the absence of any higher-level client, Cassandra has its own command-line client, the Cassandra CLI. To run it, open another terminal and run the following command:

    darkprince@ubuntu:~/Desktop/apache-cassandra-0.7.2$ bin/cassandra-cli --host localhost

We can see the following on the screen:

    Connected to: "Test Cluster" on localhost/9160
    Welcome to cassandra CLI.

    Type 'help;' or '?' for help. Type 'quit;' or 'exit;' to quit.
    [default@unknown]

At this prompt, the user can type in the necessary commands. If in doubt, the user can type 'help;' to see the available CLI commands.
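Before starting the CLI, it can help to confirm that the node is actually waiting for Thrift clients. The Python sketch below is one hedged way to do that, assuming the default single-node setup in which the CLI reports a connection to localhost/9160. It only checks that some process accepts TCP connections on that host and port; it does not speak the Thrift protocol. The helper name `is_listening` and the one-second timeout are illustrative assumptions, not part of Cassandra or its tools.

```python
import socket

# Port taken from the CLI banner: Connected to: "Test Cluster" on localhost/9160
THRIFT_PORT = 9160


def is_listening(host: str = "localhost", port: int = THRIFT_PORT, timeout: float = 1.0) -> bool:
    """Return True if some process accepts TCP connections on host:port.

    This is only a reachability check: it does not speak the Thrift
    protocol, so it cannot tell a Cassandra node from any other listener.
    """
    try:
        # create_connection returns a connected socket; closing it right away is fine.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False


if __name__ == "__main__":
    if is_listening():
        print("localhost:9160 is accepting connections; bin/cassandra-cli --host localhost should connect.")
    else:
        print("Nothing is listening on localhost:9160; is bin/cassandra -f still running?")
```

A plain TCP probe keeps the sketch dependent only on the standard library; confirming that the listener really is Cassandra would require an actual Thrift client such as the ones listed above.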