Workshop in Information Security – Distributed Databases Project
By: Ilia Oshmiansky, Ainat Chervin and Yosi Barad

Project Plan

Background:
The past few years have introduced large-scale databases that are distributed across multiple machines. Our project examines the security issues that arise with this database architecture, specifically how additional security comes at a price in performance. We will run our tests on the following two databases:

Cassandra: Apache Cassandra is an open source distributed database management system. It is an Apache Software Foundation top-level project designed to handle very large amounts of data spread across many commodity servers while providing a highly available service with no single point of failure. Cassandra is distributed, which means that it can run on multiple machines while appearing to users as a unified whole. Moreover, since Cassandra is decentralized, every node is identical: no node performs organizing operations that distinguish it from any other node. Instead, Cassandra features a peer-to-peer architecture and uses a gossip protocol to maintain and keep in sync a list of nodes that are alive or dead. Cassandra is used in production by some of the biggest properties on the Web, including Facebook, Twitter, Cisco, Rackspace, Digg, Cloudkick, Reddit, and more. Cassandra has become popular because of its technical features: it is durable, seamlessly scalable, and offers tunable consistency. It performs fast writes and can store hundreds of terabytes of data.

Accumulo: Apache Accumulo is a sorted, distributed key/value store based on Google's BigTable design. Accumulo has cell-level access labels and a server-side programming mechanism.

Goals:
The main goals of this project are to add support for cell-level ACLs (Access Control Lists) to Cassandra and to compare the performance of the resulting system with that of Accumulo.
We will also try to evaluate and measure the security holes, and then attempt to improve the security of both systems by increasing consistency, while measuring the resulting performance penalty.

Success Criteria:
Our success criteria follow the two main goals of the project. For the first goal, we will consider the project successful if, after adding cell-level ACLs to Cassandra, its performance measurements are as good as those measured on Accumulo. For the second goal, success means improving the security at the cost of only a reasonable decrease in performance.

Incremental Phases:
1. System set-up and initial performance measurements: First we will install the system on a single node. The system consists of the databases (Apache Cassandra and Accumulo) and the testing framework (YCSB++). Once the installation is complete, we will run a few tests to verify that it was successful. We will then extend the system to five more nodes to prepare it for the performance measurements. Next we will measure the performance of Cassandra before adding cell-level security and produce the first performance report. The scenarios for testing the performance are detailed below.

Exact science Faculty - Tel Aviv University

2. Implementation of cell-level ACLs: We will implement ACL support in Cassandra by storing the ACLs as additional attributes.
3. Performance comparison: At this stage we will compare the performance of Cassandra with the added ACL implementation to the performance we measured in phase one, without the added security. We will also compare Cassandra's performance to Accumulo's on the same tests (the tests are detailed below).
4.
Analysis of the security holes: Here we will measure the security holes that may exist due to inconsistency in the ACL configuration. This may occur, for example, when a user changes the permissions to deny access to a certain item, but the restriction has not yet propagated to all the nodes, so other users can still access the item during the inconsistency window. YCSB++ allows us to measure this inconsistency as a read-after-write latency.
5. Improving the security through stronger consistency: We will attempt to improve the security of ACLs in Cassandra by providing a solution with higher consistency guarantees, and we will measure the performance penalty (e.g. as a decrease in throughput).

Testing scenarios:
To test the performance of the databases we will use the YCSB++ framework. In this framework, a test is performed by defining a workload. A workload is a combination of a Workload Java class and a parameter file. The parameter file defines the data that will be loaded into the database during the loading phase and the operations that will be executed against the data set during the transaction phase. Additionally, we must choose appropriate runtime parameters (number of client threads, target throughput, etc.).

The YCSB++ paper describes several experiments that are closely related to the topic of our project. In these tests the authors ran the YCSB++ insert throughput benchmark on Accumulo with a varying number of ACL entries (0 – 11 entries) on two different client configurations: a single client with 100 threads, and 6 clients with 16 threads each.
Their results are shown in Figure 14 of the YCSB++ paper (the graph is not reproduced here). On these results they commented:

"Figure 14 shows the insert throughput, measured as the number of rows inserted per second, for different numbers of entries in each ACL (while the total size of the ACLs is constant). A value of zero entries means that no security was used. When the workload uses a single client with 100 threads, we observe that the throughput decreases with increasing number of entries in each ACL: in comparison to not using any access control, throughput drops by 24% with 4 entries in the ACL and by as much as 47% with an 11-entry ACL. This happens because the single YCSB++ client is running at almost 100% CPU utilization (as shown in Figure 15) and increasing the number of entries in each ACL leads to increased computation overhead. However, using six YCSB++ clients with 16 threads each reduces the insert throughput only by about 10%, even when there are 11 entries in the ACL."

In other words, the limiting factor in these tests is not always the database server; it can also be the client. The drop in performance was most significant in the single-client setup, which showed a decrease of close to 50%.

The second benchmark measured the SCAN operation with the exact same setup (the results graph from the paper is not reproduced here). In this experiment we see no significant difference between the two setups (1 client compared to 6), but we do see an immediate decrease of about 45% once the fine-grained ACL is invoked.

Our performance tests will mimic the tests described in the paper; that is, they will consist of the following:

Workloads:
1) An insert workload that writes 48 million single-cell rows into an empty table.
2) A scan workload that scans 320 million rows.

Configurations:
1) A single client with 100 threads.
2) Six clients with 16 threads each.
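The workloads and client configurations listed above combine into four benchmark runs. The following is a minimal sketch, in Java, that simply enumerates those combinations; the class TestMatrix and all of its members are hypothetical names introduced for illustration and are not part of YCSB++.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of YCSB++) that enumerates the planned benchmark runs. */
final class TestMatrix {

    /** A client configuration: number of client machines and threads per client. */
    static final class ClientConfig {
        final int clients;
        final int threadsPerClient;

        ClientConfig(int clients, int threadsPerClient) {
            this.clients = clients;
            this.threadsPerClient = threadsPerClient;
        }

        /** Total number of concurrent client threads across all client machines. */
        int totalThreads() {
            return clients * threadsPerClient;
        }
    }

    /** A workload: the operation it performs and the number of rows involved. */
    static final class Workload {
        final String operation;
        final long rows;

        Workload(String operation, long rows) {
            this.operation = operation;
            this.rows = rows;
        }
    }

    /** The two workloads named in the plan. */
    static List<Workload> workloads() {
        return Arrays.asList(
                new Workload("insert", 48_000_000L),
                new Workload("scan", 320_000_000L));
    }

    /** The two client configurations named in the plan. */
    static List<ClientConfig> configs() {
        return Arrays.asList(
                new ClientConfig(1, 100),
                new ClientConfig(6, 16));
    }

    /** Builds one description per (workload, configuration) pair. */
    static List<String> describeRuns() {
        List<String> runs = new ArrayList<>();
        for (Workload w : workloads()) {
            for (ClientConfig c : configs()) {
                runs.add(String.format("%s of %d rows: %d client(s) x %d threads = %d threads",
                        w.operation, w.rows, c.clients, c.threadsPerClient, c.totalThreads()));
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        for (String run : describeRuns()) {
            System.out.println(run);
        }
    }
}
```

Note that the two configurations use almost the same total number of client threads (100 versus 96), so a difference between them mainly reflects how the client-side load is spread across machines rather than a change in concurrency.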
Cycles:
We will run each test three times to reduce the effect of run-to-run variation on the results. We will monitor the performance of these tests using Otus, the custom monitoring tool of YCSB++, which allows us to process and analyze the collected data using a tailored web-based visualization system.

Milestones:

Milestone 1: Completing installations and running initial performance tests. This step includes the following:
- Installing and running Cassandra.
- Installing and running YCSB++.
- Running some initial manual tests of Cassandra (creating accounts, basic inserts, scans, etc.).
- Connecting YCSB++ to Cassandra and running benchmark tests with the following configuration: YCSB++ with a single client with multiple threads, Cassandra running on 1 cluster (single PC or 2 PCs).
- Installing Accumulo.

Milestone 1.5: Starting the implementation of the cell-level ACL for Cassandra. This will include the following:
- Examining the Cassandra source code and researching different implementation options.
- Writing some initial code.
- Testing our implementation using YCSB++ with a basic configuration (YCSB++ with a single client with multiple threads, Cassandra running on 1 cluster) and reaching conclusions regarding the feasibility of a better implementation.
- Comparing the test results to the results of our initial tests and to tests done by others.
- Possibly implementing several different solutions to be tested against each other in the next step.

Milestone 2: Finishing the implementation of the cell-level ACL and evaluating the performance on a more advanced configuration of Cassandra. In this step we will do the following:
- Setting up a more advanced configuration of Cassandra, perhaps with several clusters (up to 3) with several PCs in each cluster.
- Testing the performance of the advanced configuration of Cassandra with and without the added cell-level security.
- Evaluating the feasibility of further improving the implementation, and testing other implementations.
- Running more advanced set-ups of YCSB++, such as:
  a. Connecting more client nodes and running a test with more than one client with multiple threads, in order to eliminate the client CPU factor.
  b. Configuring better ACLs, experimenting with different ratios of ACL header size to cell content size, and setting up unique ACLs.
  c. Connecting to Accumulo and trying to reproduce the results shown in the paper (see Testing scenarios).
  d. Identifying the pitfalls in our testing procedure (finding limiting factors, for example) and configuring custom tests that evaluate the system performance more accurately.
- Creating a final version of the implementation based on the test results and consultation with our project advisor (Alexandra Shulman from IBM).

Milestone 2.5: Beginning to analyze the security holes and to implement a security improvement. This includes the following steps:
- Brainstorming and coming up with scenarios that may expose security risks related to the access control lists.
- Performing the tests we came up with and testing for the security holes.
- Examining different approaches to overcoming these risks while considering the performance penalty they may inflict on the database.
- Starting to implement the approach we settled on.

Milestone 3: Finishing the implementation of the security improvement and measuring the performance penalty. This includes the following steps:
- Finishing the implementation on both Cassandra and Accumulo.
- Running the tests we found to expose the security holes, this time with the added implementation.
- Measuring the performance penalty by comparing the results to those we measured without the added improvement.
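The performance penalty mentioned in the milestones above can be expressed as the relative drop in mean throughput between a baseline and a secured configuration, averaged over the three planned runs of each test. The following is a minimal sketch in Java; the class PerformancePenalty and its methods are hypothetical, and the throughput values in main are placeholders, not measurements.

```java
/** Hypothetical helper for comparing throughput with and without a security feature. */
final class PerformancePenalty {

    /** Arithmetic mean of the throughput measured in repeated runs of one test. */
    static double mean(double[] runs) {
        if (runs.length == 0) {
            throw new IllegalArgumentException("at least one run is required");
        }
        double sum = 0.0;
        for (double r : runs) {
            sum += r;
        }
        return sum / runs.length;
    }

    /** Relative throughput drop, in percent, of 'measured' compared to 'baseline'. */
    static double percentDrop(double baseline, double measured) {
        if (baseline <= 0.0) {
            throw new IllegalArgumentException("baseline throughput must be positive");
        }
        return (baseline - measured) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // Placeholder values (rows per second), NOT real measurements.
        double[] withoutAcl = {1000.0, 1020.0, 980.0};
        double[] withAcl = {760.0, 750.0, 770.0};

        double drop = percentDrop(mean(withoutAcl), mean(withAcl));
        System.out.printf("Throughput drop: %.1f%%%n", drop);
    }
}
```

A negative result from percentDrop would indicate that the secured configuration was faster than the baseline, which is possible within measurement noise and is one reason each test is repeated.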
Literature, technology and related projects:
A closely related project is described in:
YCSB++: Benchmarking and Performance Debugging Advanced Features in Scalable Table Stores. http://www.pdl.cmu.edu/PDL-FTP/Storage/socc2011.pdf

Other literature on technologies we will need to research can be found here:
1. Cassandra: http://cassandra.apache.org/
2. Accumulo: http://incubator.apache.org/accumulo/
3. YCSB++: http://www.pdl.cmu.edu/ycsb++/index.shtml
4. Eventual Consistency: http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
6. Dynamo: Amazon's Highly Available Key-value Store: http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
7. Bigtable: A Distributed Storage System for Structured Data: labs.google.com/papers/bigtable-osdi06.pdf

Requisite tools, resources and knowledge:
- Installing the necessary software infrastructure for working with Cassandra.
- Researching and reading material on the following subjects:
  1. Security holes and threats.
  2. Access control lists.
  3. Consistency models.
  4. Distributed database structure.
- Gaining a deeper understanding of the Cassandra implementation, specifically its security attributes.
- Getting familiar with the YCSB++ testing framework: https://github.com/brianfrankcooper/YCSB/wiki/
- Getting familiar with the Otus monitoring tool: https://github.com/otus/otus

Scope and implementation choices:
The project scope will depend on the complexity of the source code, the initial installation phase and the testing phase. In addition, to complete the initial installation phase (installing everything on a single machine) and the later installation phase (expanding to several clusters) we will have to rely on the assistance of the system manager (in Schrieber). Since Cassandra and the YCSB++ benchmark tool are written in Java, our choice for the implementation language is naturally Java as well.
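To illustrate the planned direction of storing an ACL as an additional attribute of a cell, the following is a minimal sketch in Java. The class AclCell and its methods are hypothetical and are not part of the Cassandra API; treating an empty ACL as "no access control" is also an assumption, chosen to mirror the zero-entry baseline in the YCSB++ experiments.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical cell that carries its ACL as an extra attribute next to the value. */
final class AclCell {
    private final String value;
    private final Set<String> readers; // the ACL: users allowed to read this cell

    AclCell(String value, String... readers) {
        this.value = value;
        this.readers = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(readers)));
    }

    /** Number of entries in this cell's ACL. */
    int aclEntries() {
        return readers.size();
    }

    /**
     * Returns the value if the user may read it.
     * Assumption: an empty ACL means that no access control is applied.
     */
    String read(String user) {
        if (!readers.isEmpty() && !readers.contains(user)) {
            throw new SecurityException("user " + user + " may not read this cell");
        }
        return value;
    }

    public static void main(String[] args) {
        AclCell cell = new AclCell("secret", "alice", "bob");
        System.out.println(cell.read("alice"));
        try {
            cell.read("eve");
        } catch (SecurityException e) {
            System.out.println("denied: " + e.getMessage());
        }
    }
}
```

In this sketch the check runs on every read and its cost depends on the ACL representation, which is the kind of per-operation overhead the performance tests are meant to quantify. In a distributed deployment the ACL attribute would also have to be replicated with the cell, which is where the inconsistency window discussed in the plan can arise.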
Risk factors and contingency plans:
1) Problems with the initial installation:
- The installation and operation of these systems (Cassandra, Accumulo and YCSB++) is highly complicated, and we do not know what technical difficulties to expect.
- We depend on the assistance of the system team, which might limit our work. For example, if we need something installed in order to continue and they cannot help us for a week, we will not be able to make much progress during that week.
Contingency plans: If we get stuck in the installation phase we can seek alternatives. For example:
- If we cannot install the systems in the lab, we can try installing them at home on a VM.
- We can try to get assistance from IBM.
- We may be able to install the systems later and continue in other directions in the meantime (focusing on developing the testing environment, etc.).
2) Problems with extending the installations: We may encounter difficulties expanding the installation to other stations and establishing communication between them.
Contingency plan: If we cannot expand to the desired number of stations, we will consider reducing the project scope and implementing only the first test configuration (one client running 100 threads).
3) The performance after adding the cell-level ACL implementation does not reach our expected goals for Milestone 2.
Contingency plan: Decrease the scope of the project and focus on improving the performance rather than moving on to Milestone 2.5.
4) We are unable to find any security holes because of an inadequate setup.
Contingency plan: If we are unable to research the security holes because of an inadequate setup (for example, if we cannot set up several clusters), we will attempt to extend the system or, alternatively, try to test it in IBM labs.
5) We are unable to find any security holes that we are able to fix.
Contingency plan: If we cannot find security holes that we are able to fix, we may move back to Milestone 2 and expand our testing phase to include a more in-depth analysis of the performance, focusing on improving that part.