
Survey Paper
B534 Distributed System
Apache Cassandra
Sumit Goyal {goyals}

Definition of NoSQL

The term NoSQL generally indicates a major departure from the standard relational database. The problems with relational databases include a rigid schema structure, difficulty supporting applications that handle very large volumes of data, and the cost of indexing a large number of files/documents. NoSQL systems have been developed to cater to these kinds of requirements.

Definition of Distributed Hash Tables

In a distributed hash table, each data item is mapped into a keyspace using a hash function, commonly SHA-1, and is stored on the node responsible for that portion of the keyspace. A keyspace partitioning scheme splits ownership of the keyspace among the participating nodes, and an overlay network connects the nodes, allowing them to find the owner of any given key. A very popular NoSQL database built on the concept of a keyspace is Apache Cassandra, the topic of this survey.

What is Cassandra?

Apache Cassandra is a second-generation distributed database originally developed at Facebook, where it powers the Inbox Search feature, and later open-sourced. It is a NoSQL, column-oriented database: data is stored in the form of columns and organized under keyspaces. It can be classified as a "cloud database". Apache Cassandra has a write-optimized, shared-nothing architecture, which gives it excellent performance and scalability.

Properties of the Cassandra Model

Description of the Data Model

The Cassandra data model is designed for distributed data on a very large scale. It trades ACID-compliant data practices for important advantages in performance, availability, and operational manageability. A table in Cassandra is a distributed multidimensional map indexed by a key. The value is an object that may be a single element or may be highly structured. The row key in a table is a string with no size restriction, although it is typically 16 to 36 bytes long. Every operation under a single row key is atomic per replica, no matter how many columns are being read or written. Columns are grouped together into sets called column families, very much like in the BigTable system. Cassandra exposes two kinds of column families: simple and super. Super column families can be visualized as a column family within a column family. The top-level dimension in Cassandra is called a keyspace. For instance, usrs['goyals'] refers to the row with key 'goyals' in a column family of users; within it we can further add usrs['goyals']['fname'], usrs['goyals']['lname'], and usrs['goyals']['gender'].
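The nested map structure described above can be illustrated with ordinary Python dictionaries. This is only a minimal conceptual sketch of the keyspace -> column family -> row key -> column hierarchy, not how Cassandra actually stores data; the names follow the usrs['goyals'] example, and the value for the gender column is made up for illustration.

    # Conceptual sketch only: keyspace -> column family -> row key -> columns.
    # Cassandra does not store data as Python dicts; this just mirrors the
    # usrs['goyals'] example from the text.
    keyspace = {
        'usrs': {                      # column family
            'goyals': {                # row key
                'fname': 'Sumit',      # column name -> column value
                'lname': 'Goyal',
                'gender': 'male',      # illustrative value only
            }
        }
    }

    # Looking up a single column, analogous to usrs['goyals']['fname']:
    print(keyspace['usrs']['goyals']['fname'])   # -> 'Sumit'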
Keyspaces

Cassandra is based on a key-value model. A database consists of column families. A column family is a set of key-value pairs. Drawing an analogy with relational databases, you can think of a column family as a table and a key-value pair as a record in that table. At the first level, the value of a record is in turn a sequence of key-value pairs. These nested key-value pairs are called columns, where the key is the name of the column. In other words, a record in a column family has a key and consists of columns. This level of nesting is mandatory: a record must contain at least one column (so at the first level the value of a record is an intermediate notion, since the value is actually a sequence of columns). At the second level, which is optional, the value of a nested key-value pair can itself be a sequence of key-value pairs. When the second level of nesting is present, the outer key-value pairs are called super columns, with the key being the name of the super column, and the inner key-value pairs are called columns. This is clarified below.

Column and Column Family

As mentioned before, the data model is columnar in nature. The column is the base of the Cassandra data model; it is the lowest and smallest increment of data. It is a triplet containing a name, a value, and a timestamp. Here is a column represented in JSON notation:

    {  // this is a column
       name: "firstname",
       value: "Sumit",
       timestamp: 123456789
    }

A column family resembles a table in an RDBMS. Column families contain rows and columns. Each row is uniquely identified by a row key. Each row has multiple columns, each of which has a name, a value, and a timestamp. Unlike a table in an RDBMS, different rows in the same column family do not have to share the same set of columns, and a column may be added to one or multiple rows at any time. It can be useful to distinguish between "static" column families that contain values such as user data or other object data, and "dynamic" column families that contain data such as precalculated query results.

Super Columns - How Are They Different from Columns?

Super columns are a way to group multiple columns, adding an extra map layer to the data model. Every super column must have a distinct name, just like regular columns, but different super columns may hold sub-columns with the same name. Super columns are frequently used to hold a single record where each field of the record is represented by a sub-column. For example, the name of a super column might be the ID of a transaction and each sub-column could hold some attribute of the transaction. If a transactions row like the one described had two entries, it might look like:
    {
       'trans-009123812': { 'date': '01/01/2011', 'amount': 25000 },
       'trans-009123813': { 'date': '11/02/2011', 'amount': 6500 }
    }
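To make the transactions example concrete, here is a minimal sketch of writing and reading such a row with the pycassa client mentioned later in this survey. It assumes a keyspace named Demo and a super column family named Transactions already exist in the cluster's schema; the keyspace, column family, row key, and values are hypothetical and simply follow the example above.

    # Minimal sketch, assuming pycassa is installed and a keyspace 'Demo'
    # with a super column family 'Transactions' has already been created.
    import pycassa

    pool = pycassa.ConnectionPool('Demo', ['localhost:9160'])
    transactions = pycassa.ColumnFamily(pool, 'Transactions')

    # Each super column name is a transaction ID; its sub-columns are the fields.
    transactions.insert('goyals', {
        'trans-009123812': {'date': '01/01/2011', 'amount': '25000'},
        'trans-009123813': {'date': '11/02/2011', 'amount': '6500'},
    })

    # Read the whole row back as a nested mapping of super columns.
    row = transactions.get('goyals')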
Distribution, Replication and Fault Tolerance

• Data is distributed across the nodes in the cluster using consistent hashing based on an order-preserving hash function. An order-preserving hash is used so that range scans over the data can be performed for analysis at a later point (see the sketch after this list).
• Cluster membership is maintained via a gossip-style membership algorithm, and failures of nodes within the cluster are detected using an accrual-style failure detector.
• High availability is achieved using replication, and data is actively replicated across data centres. Since eventual consistency is the mantra of the system, reads execute on the closest replica and data is repaired in the background for increased read throughput.
• The system exhibits incremental scalability: capacity can be grown as easily as adding nodes and having them automatically bootstrapped with data.
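The consistent-hashing idea in the first bullet can be sketched in a few lines of Python. This is a simplified illustration of mapping a key onto a hash ring (here with SHA-1, as in the DHT description above) and assigning it to the first node token clockwise from the key's position; it is not Cassandra's actual partitioner code, and the node names are made up. Note that hashing with SHA-1 corresponds to random partitioning; an order-preserving partitioner would place keys by their byte order instead.

    # Simplified consistent-hashing sketch; not Cassandra's partitioner.
    import hashlib
    from bisect import bisect_right

    def ring_position(key):
        """Map a key onto the hash ring using SHA-1."""
        return int(hashlib.sha1(key.encode()).hexdigest(), 16)

    # Hypothetical nodes, each owning one token on the ring.
    tokens = sorted((ring_position(n), n) for n in ['node-a', 'node-b', 'node-c'])

    def owner(key):
        """Return the node responsible for the key: the first token
        at or after the key's ring position, wrapping around the ring."""
        pos = ring_position(key)
        idx = bisect_right([t for t, _ in tokens], pos) % len(tokens)
        return tokens[idx][1]

    print(owner('goyals'))   # e.g. 'node-b' (depends on the hash values)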
Consistency - Tuneable

The ability to tune consistency levels per query is a powerful feature of Cassandra because it gives the developer complete control over the trade-off between availability and consistency. This means that Cassandra queries can be configured to exhibit strongly consistent behaviour (though there is no row-level locking) if the developer is willing to sacrifice latency. The consistency levels offered by Cassandra, which have different meanings for reads and writes, are detailed below. A "quorum" of replicas is essentially a majority of replicas, or RF / 2 + 1 with any resulting fraction rounded down.

WRITE CONSISTENCY LEVELS

- ANY - Ensure that the write has been written to at least one node (which can include hinted handoff recipients). Note that if all replica nodes are down at write time, an ANY write may not be readable until the nodes have recovered.
- ONE - Ensure that the write has been written to at least one replica's commit log and memory table before responding to the client.
- QUORUM - Ensure that the write has been written to a quorum of replicas before responding to the client.
- LOCAL_QUORUM - Ensure that the write has been written to a quorum of replicas in the datacenter local to the coordinator before responding to the client. This setting avoids the latency of inter-datacenter communication.
- EACH_QUORUM - Ensure that the write has been written to a quorum of replicas in each datacenter in the cluster before responding to the client.
- ALL - All replicas must have received the write; otherwise the operation fails.

READ CONSISTENCY LEVELS
- ONE - Returns the response from the closest replica, as determined by the snitch configured for the cluster. When read repair is enabled, Cassandra may perform a consistency check in a background thread.
- QUORUM - Returns the record with the most recent timestamp once a quorum of replicas has reported.
- LOCAL_QUORUM - Returns the record with the most recent timestamp once a quorum of replicas in the datacenter local to the coordinator has reported. This setting avoids the latency of inter-datacenter communication.
- EACH_QUORUM - Returns the record with the most recent timestamp once a quorum of replicas in each datacenter in the cluster has reported.
- ALL - Returns the record with the most recent timestamp once all replicas have replied, failing the operation if any replica is unresponsive.

In choosing the consistency level for particular operations, developers should consider the relative importance of consistency, latency, and availability. Note that read operations are slower than writes in Cassandra, so Cassandra is well suited to write-intensive workloads. Clusters spanning multiple data centres may raise further questions regarding local latency and durability.
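As an illustration of per-query tunable consistency, the sketch below uses the pycassa client (introduced in the next section) to write at QUORUM and read at ONE. The keyspace and column family names are hypothetical, and the keyword arguments follow pycassa's interface as best recalled here, so treat this as a sketch to be checked against the client version you use rather than a definitive reference.

    # Sketch of per-query tunable consistency with pycassa (hypothetical
    # keyspace 'Demo' and column family 'usrs'); verify argument names
    # against the pycassa version you use.
    import pycassa
    from pycassa.cassandra.ttypes import ConsistencyLevel

    pool = pycassa.ConnectionPool('Demo', ['localhost:9160'])
    usrs = pycassa.ColumnFamily(pool, 'usrs')

    # Strong write: wait for a majority of replicas (RF / 2 + 1).
    usrs.insert('goyals', {'fname': 'Sumit'},
                write_consistency_level=ConsistencyLevel.QUORUM)

    # Fast read: accept the response of the closest single replica.
    row = usrs.get('goyals', read_consistency_level=ConsistencyLevel.ONE)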
Client Libraries for Cassandra

Thrift is a software framework for scalable cross-language service development. It combines a software stack with a code-generation engine to build services that work efficiently and seamlessly across C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, Smalltalk, and OCaml. Thrift was also developed by Facebook and is now used as the interface to Cassandra; it can be installed easily with the easy_install thrift command. Using one of the higher-level clients is strongly preferred to raw Thrift when developing applications (the Thrift API is primarily intended for client developers). The clients listed here support Cassandra 0.7. If no high-level client exists for your environment, you may be able to update an older client; failing that, you will have to use the raw Thrift API. Some of the higher-level clients are Pycassa (Python), Hector, Pelops, and Kundera (Java), and phpcassa (PHP), to name a few.

References

1) Apache Cassandra main project site. Contains the code base and information on the data model, as well as the client options a user can use to access Cassandra; http://cassandra.apache.org/
2) Apache Cassandra wiki; http://wiki.apache.org/cassandra/
3) DataStax Cassandra documentation; http://www.datastax.com/docs/0.7/index
4) Software Solutions and Development. Discusses the installation of the Thrift clients and the data model in detail.