Milestone 2
Workshop in Information Security – Distributed Databases Project
By: Yosi Barad, Ainat Chervin and Ilia Oshmiansky
Project web site: http://infosecdd.yolasite.com
Access Control Security vs. Performance

Milestone 2: Our Plan:
• Examine the Cassandra source code and research different implementation options
• Write some initial code
• Test our implementation using YCSB++ with basic configuration
• Compare the test results to the results of our initial tests
• Try to implement several different solutions to be tested against each other
• Set up a more advanced configuration of Cassandra
• Test the performance of Cassandra with and without the added Cell-level Security
• Evaluate how to further improve the implementation
• Run more advanced set-ups of YCSB++ (custom workloads)
• Create a final version of the implementation based on the test results

Plan Step 1: Examine the Cassandra source code and research different implementation options
Steps:
• Fully understand the Cassandra data structure
• Cassandra terminology:
  - A "Cell" is actually called a "Column". It is the lowest/smallest increment of data and is represented as a tuple (triplet) that contains a name, a value and a timestamp.
  - Next we have a "Column Family", which is a container for rows and can be thought of as a table in a relational system. Each row in a column family can be referenced by its key.
  - Finally, a "Keyspace" is a container for column families. It is roughly the same as a schema or database, that is, a logical collection of tables.
• Understand the code flow of get and set operations.
• The security logic is separate from the rest of the code and has two main interfaces:
  1) IAuthenticator – responsible for authenticating the user that logs in.
  2) IAuthority – responsible for authorizing the access of an authenticated user to a specific resource.
• It is possible to write our own code which implements these interfaces.
• The initial code had the following classes:
  "AllowAllAuthenticator.java" - implements IAuthenticator.
  "AllowAllAuthority.java" - implements IAuthority.
• With some research we found:
  "SimpleAuthenticator.java"
  "SimpleAuthority.java"
  These allow simple user authentication and a basic ACL.

Some information about these classes:
• They use two additional configuration files:
  password.properties – a list of users and their passwords.
  access.properties – contains a set of permissions.

There are several problems with this implementation:
• Very inefficient
• Needs a lot of maintenance.
Despite these issues, we decided to use this code as a starting point, with every intention of improving it.

Plan Step 2: Write some initial code
The initial code we wrote:
• Implemented the RowKey access control. At this point we could limit Read/Write access to a specific Row within a ColumnFamily.
  Keyspace1.Users.key1.<rw>=yosi
• Implemented the Column access control. This is the same as the RowKey access control, only one level lower.
  Keyspace1.Users.key1.column1<rw>=yosi
• We added a new syntax to the Cassandra client: <column value>: [<user1,user2,...> <permission>] [...].
For example:
  Set Users['key1']['City'] = 'Haifa:scott,yosi rw:Andrew ro';
This command does the following:
1. Creates a new column named "City" with value "Haifa" in column family "Users".
2. Writes the permissions to the access.control file in the correct format. In this example, the following two lines are added to the access.control file:
  Keyspace1.Users.key1.City<rw>=scott,yosi
  Keyspace1.Users.key1.City<ro>=Andrew

Plan Step 3: Test our implementation using YCSB++ with basic configuration
• We ran one basic test on it to get an idea of where we stand in terms of performance.
• The results we got were terrifying (as expected): an average of under 40 op/sec, and it got lower the more tests we ran, since every new entry meant another line in the file that we need to scan.
• Our next step was to improve the implementation so that it won't rely on configuration files.

Plan Step 4: Try to implement several different solutions to be tested against each other
• Our two implementations were:
  1) Writing the permissions to a file.
  2) Storing the permissions within the values of the columns in Cassandra.

Plan Step 5: Storing the permissions within the values of the columns in Cassandra.
This stage included the following:
• Parse the value returned from Cassandra.
• Add a new "get" function to grab and separate the ACL from the actual value of a Column.
For example, when Yosi runs a set command, it now works as follows:
  1) Get the value and check the ACL in the value
  2) Perform the validation
  3) Perform the actual insert.
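The get-validate-insert flow above can be sketched in a few lines of Python. This is only an illustration of the idea, not the project's Java code: the names `parse_cell`, `is_authorized`, `guarded_set` and the in-memory `store` dictionary are hypothetical, and the sketch assumes that values contain no ':' character and that a column with no stored value may be written by anyone.

```python
from dataclasses import dataclass, field

# Assumed mapping from the permission tokens on the slides to operations.
PERMISSIONS = {"rw": {"read", "write"}, "ro": {"read"}}


@dataclass
class Cell:
    value: str
    acl: dict = field(default_factory=dict)  # user name -> set of operations


def parse_cell(raw: str) -> Cell:
    """Split '<value>:<users> <perm>:<users> <perm>' into value and ACL."""
    value, *entries = raw.split(":")
    acl = {}
    for entry in entries:
        users, perm = entry.rsplit(" ", 1)
        for user in users.split(","):
            acl.setdefault(user.strip(), set()).update(PERMISSIONS[perm])
    return Cell(value, acl)


def is_authorized(cell: Cell, user: str, operation: str) -> bool:
    """Return True if the ACL stored with the cell grants the operation."""
    return operation in cell.acl.get(user, set())


def guarded_set(store: dict, key: str, user: str, new_raw: str) -> bool:
    """Get the stored value, validate against its ACL, then insert."""
    existing = store.get(key)  # 1) get the value
    if existing is not None:
        # 2) validate the caller against the ACL embedded in the value
        if not is_authorized(parse_cell(existing), user, "write"):
            return False
    store[key] = new_raw  # 3) perform the actual insert
    return True


if __name__ == "__main__":
    cell = parse_cell("Haifa:scott,yosi rw:Andrew ro")
    print(cell.value, is_authorized(cell, "Andrew", "write"))
```

Because the ACL travels with the value, authorization costs one extra read of the column instead of a scan over a permissions file that grows with every insert.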
Try to implement several different solutions to be tested against each other
Story to be told, version 1:
• Cool Ainat: Access and modify VerySecretValue
• Curious Ilia: Won't be able to access VerySecretValue
• Meticulous Yosi: VerySecretValue:Yosi,Ainat rw

Story to be told, version 2:
• Cool Ainat: Access and modify VerySecretValue
• Curious Ilia: Access but not modify VerySecretValue
• Meticulous Yosi: VerySecretValue:Yosi,Ainat rw:Ilia ro

Demonstration

Plan Step 5: Storing the permissions within the values of the columns in Cassandra.
In order to further enhance the performance:
• We removed the "SimpleAuthority" related functions.
• We changed the implementation of the "authorize" function in the SimpleAuthority class to read the ACL from the value within the Cassandra DB rather than from the access.properties file.

Plan Step 5: Run more advanced set-ups of YCSB++ (custom workloads)
• To test our implementation we had to:
  - Add ACL entries to the values YCSB sent to Cassandra
  - Get YCSB to log in to Cassandra with a user and password
• This included the following steps:
  - Compile YCSB (this was a challenge since this code has no documentation anywhere)
  - Edit the YCSB code to connect to Cassandra with our user.
  - Change the way YCSB generated values to fit our custom format (<val>:<users> <rw>:<users> <ro>).
  - Recompile it with a different number of ACL entries for our "increasing ACL" test.
• Now we got much better results:
Workload A: Update heavy workload - a 50/50 mix of reads and writes.
Workload B: Read mostly workload - a 95/5 read/write mix.
Workload C: Read only - this workload is 100% read.
Workload D: Read latest workload - in this workload, new records are inserted, and the most recently inserted records are the most popular.
Workload F: Read-modify-write - in this workload, the client reads a record, modifies it, and writes back the changes.

Plan Step 5: Compare the test results to the results of our initial tests
• The current results aren't satisfying, as they do not match what we expected (we expected the throughput to decrease with each additional ACL entry).

Plan Step 6: Set up a more advanced configuration of Cassandra (consisting of several nodes)
• We decided that we would rather wait for a local HD allocation before running several Cassandra nodes because:
  - A shared hard drive would be the bottleneck and would not increase the performance.
  - It is very hard to benchmark this remote storage.
  - It is time consuming to set up the clusters, and if we get the local HD allocation we might have to spend more time building this configuration again.
• We sent a request to the system admin for local HD allocations so we could install Cassandra and test performance running on a local HD.

Plan Step 7: Test the performance of the advanced configuration of Cassandra with and without the added Cell-level Security
Once we finish the more advanced set-up, we will be able to run both Cassandra implementations (with and without the added security) and thus get the desired results.

Plan Step 10: Create a final version of the implementation based on the test results
• We would like to further analyze the code and find ways to improve it (see the plans ahead).
• Furthermore, we cannot rely on the tests we ran so far, as they do not accurately assess the performance.
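For reference, the custom value format used in the YCSB runs (<val>:<users> <rw>:<users> <ro>) and the "increasing ACL" test can be sketched as a small generator. This is a Python illustration only, not the modified YCSB Java code. The function names and the user names `ycsb` and `user0`, `user1`, ... are hypothetical, and the sketch assumes that one "ACL entry" means one user in the read-only list.

```python
import random
import string


def make_acl_value(payload, writers, readers):
    """Build '<val>:<users> rw:<users> ro', omitting empty user lists."""
    parts = [payload]
    if writers:
        parts.append(",".join(writers) + " rw")
    if readers:
        parts.append(",".join(readers) + " ro")
    return ":".join(parts)


def random_payload(length, rng):
    """Random alphanumeric payload; it never contains the ':' separator."""
    alphabet = string.ascii_letters + string.digits
    return "".join(rng.choice(alphabet) for _ in range(length))


def values_for_increasing_acl(entry_counts, payload_len=16, seed=0):
    """Yield (n, value) pairs where the value carries n read-only users."""
    rng = random.Random(seed)
    for n in entry_counts:
        readers = [f"user{i}" for i in range(n)]
        payload = random_payload(payload_len, rng)
        yield n, make_acl_value(payload, ["ycsb"], readers)


if __name__ == "__main__":
    for n, value in values_for_increasing_acl([1, 10, 100]):
        print(n, len(value))
```

Generating the ACL length from a parameter like this avoids a recompile per run, which may make it easier to check whether throughput really stays flat as the ACL grows.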
Progress Compared to Plan:
Plan Step | Status
• Examine the Cassandra source code and research different implementation options
• Write some initial code
• Test our implementation using YCSB++ with basic configuration
• Compare the test results to the results of our initial tests
• Try to implement several different solutions to be tested against each other
• Set up a more advanced configuration of Cassandra (consisting of several nodes)
• Test the performance of the advanced configuration of Cassandra with and without the added Cell-level Security
• Evaluate how to further improve the implementation
• Run more advanced set-ups of YCSB++ (custom workloads)
• Create a final version of the implementation based on the test results

Overall
We completed the goal of implementing cell-level ACL security, but there is still work to be done on the performance testing, and the code can perhaps be further improved.

Plans Ahead
Expand the Cassandra setup –
• Once we receive the local HD allocation we requested, we can continue with the more advanced testing and create a setup consisting of several nodes/clusters.
Expand the tests, search for limiting factors –
• We plan on expanding our tests in several directions.
Evaluate how to further improve the implementation –
• At this point we do not see any major issues in our implementation. However, we will have to run better tests to understand the actual performance penalty of our implementation, and we may need some guidance to see where we can improve.
Start analyzing security holes due to inconsistencies –
• We need to figure out how to measure the inconsistencies using YCSB++ and assess the security threats that might arise from these inconsistencies.

Questions?