YCSB++ Tutorial Workshop in information security by Yosi Barad, Ainat Chervin and Ilia Oshmiansky

YCSB++ Tutorial

Introduction:
The YCSB++ benchmark tool extends YCSB to support the Accumulo database and adds a read-after-write measurement that tests for inconsistency between the different nodes of a database.
We will cover the following subjects regarding the YCSB++ benchmark tool:
   Installation and configuration of YCSB++.
   Example of usage: measure inconsistent state in Cassandra (ACL) using YCSB++.
   Example of usage: benchmark Accumulo using YCSB++.

Installation and configuration of YCSB++:
1. Toolchain requirements for YCSB++ are: Java (1.6 or higher), HBase, Hadoop, Zookeeper and ant. In this tutorial we use Java version 1.6.0, Hadoop version 0.20.1, HBase version 0.90.2 and Zookeeper version 3.3.3.
2. Download ant from here: http://ant.apache.org/ivy/download.cgi.
3. Install ant on your machine. You may find more information here: http://ant.apache.org/manual/index.html.
4. Download the YCSB++ files from here: https://github.com/MiloPolte/YCSB/zipball/master
5. Extract the YCSB++ files, e.g. to /specific/disk1/temp/YCSB++/.
6. Download Zookeeper from http://zookeeper.apache.org/.
7. Go to the conf folder in the zookeeper folder and create a file called zoo.cfg. Insert the following lines:
   tickTime=2000
   dataDir=/var/zookeeper
   clientPort=2181
   # change dataDir to the place where you would like the zookeeper data files to be stored,
   # e.g. dataDir=/specific/disk1/temp/zookeeper/conf/zookeeper
   Save the file and close it.
8. Copy the zookeeper .jar file from the zookeeper directory to /specific/disk1/temp/YCSB++/lib.
9. Enter the "ant" command from the YCSB++ directory to build the package.
10. Download Hadoop from http://hadoop.apache.org/ and HBase from http://hbase.apache.org/.
11.
Build the hbase database layer:
   Copy the hbase-0.90.2.jar file from the hbase directory to /specific/disk1/temp/YCSB++/db/hbase/lib/.
   Copy all the jar files from the hbase lib directory to /specific/disk1/temp/YCSB++/db/hbase/lib/.
   Go to the YCSB++ directory and enter the "ant dbcompile-hbase" command.
12. Build the hbase bulkloader:
   Copy the hbase-0.90.2.jar file from the hbase directory to /specific/disk1/temp/YCSB++/bulkloader/hbase/lib/.
   Copy all the jar files from the hbase lib directory to /specific/disk1/temp/YCSB++/bulkloader/hbase/lib/.
   Copy the hadoop-0.20.1-core.jar file from the Hadoop directory to /specific/disk1/temp/YCSB++/bulkloader/hbase/lib/.
   Copy all the jar files from the Hadoop lib directory to /specific/disk1/temp/YCSB++/bulkloader/hbase/lib/.
   Go to the YCSB++ directory and enter "ant bulkcompile-hbase".
13. Later in this tutorial we show how to benchmark the Accumulo database, so we have installed Accumulo and its prerequisites, Hadoop and Zookeeper, on the system. If you would like to download and install Accumulo you may find it here: http://accumulo.apache.org/downloads/. Once you have Accumulo on your machine, follow the readme file located in the Accumulo directory to bring it up.
14. Copy all the jar files from the Accumulo lib directory to /specific/disk1/temp/YCSB++/db/accumulo/lib/.
15. Use the "ant dbcompile-accumulo" command from the YCSB++ directory to build the Accumulo database layer.

Example of usage: measure inconsistent state in Cassandra (ACL) nodes
YCSB++ uses two processes, named consumer and producer, and synchronizes them via Zookeeper in order to perform a consistency test among the nodes. The producer process produces values and inserts them into one node of the database (in our example, Cassandra ACL). Once the values are inserted it notifies Zookeeper, which signals the consumer to start querying another node for the information.
The time that passes from the moment the producer inserts a value into one node until the consumer can read that value from another node is the inconsistency window.
1. First bring Zookeeper up. Go to the zookeeper directory and run the server with the command: "bin/zkServer.sh start". (You may stop it at any time using "bin/zkServer.sh stop".)
2. Next bring up the cluster you would like to examine. Make sure all of the nodes are running correctly. Because we are measuring inconsistency and the producer and consumer work with different nodes, the cluster needs at least 2 nodes. In our example we run 3 nodes of Cassandra ACL.
3. Create a keyspace named usertable (YCSB requires this specific keyspace) with replication factor 3 in Cassandra, so that each of the 3 nodes holds a copy of every value:
   "create keyspace usertable with replication_factor = 3 and placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy';"
4. Next create the column family data:
   "create column family data;"
5. Once Zookeeper and the Cassandra nodes are running, start the consumer, which will wait on hold:
   "java -cp /specific/scratches/parallel/yosibar1-2012-10-31/YCSB++/lib/zookeeper-3.4.3.jar:/specific/scratches/parallel/yosibar1-2012-10-31/YCSB++/build/ycsb.jar:/specific/scratches/parallel/yosibar1-2012-10-31/YCSB++/db/cassandra/cassandra-binding-0.1.4.jar com.yahoo.ycsb.Client -db com.yahoo.ycsb.db.CassandraClient10 -p hosts=172.17.136.200 -p coord-server=132.67.104.224:2181 -s -P ~/scratch/YCSB++/workloads/consumerWorkload -p cassandra.username=ilia -p coord-server-zkRoot=/ycsb110"
   Notice that:
   -p coord-server=132.67.104.224:2181 is the IP address and port of the Zookeeper server.
   -p coord-server-zkRoot=/ycsb110 is the Zookeeper entry under which the coordination information is stored.
   -p cassandra.username=ilia is the username for Cassandra ACL.
   -p hosts=172.17.136.200 is the IP address of the Cassandra node that the consumer queries.
   This should make the consumer print a "going to wait" message.
6. Finally we can run the producer to start the benchmark:
   "java -cp /specific/scratches/parallel/yosibar1-2012-10-31/zookeeper/zookeeper-3.4.3.jar:/specific/scratches/parallel/yosibar1-2012-10-31/YCSB++/build/ycsb.jar:/specific/scratches/parallel/yosibar1-2012-10-31/YCSB++/db/cassandra/cassandra-binding-0.1.4.jar com.yahoo.ycsb.Client -db com.yahoo.ycsb.db.CassandraClient10 -p hosts=fermat-11 -p coord-server=132.67.104.224:2181 -p operationcount=100000 -s -P ~/scratch/YCSB++/workloads/producerWorkload -p cassandra.acl=yosi,dan,ilia -p coord-server-zkRoot=/ycsb110"
   Notice that:
   -p coord-server=132.67.104.224:2181 is the IP address and port of the Zookeeper server.
   -p coord-server-zkRoot=/ycsb110 is the Zookeeper entry under which the coordination information is stored.
   -p hosts=fermat-11 is the host of the Cassandra node into which the producer inserts values.
   -p operationcount=100000 is the number of operations to be executed.
   -p cassandra.acl=yosi,dan,ilia is the ACL to be stored with the inserted values.
The consumer process reports the measured time lags. They represent the period during which the keys and values were in an inconsistent state between the nodes.

Example of usage: benchmark Accumulo using YCSB++
1. First we'll bring the Accumulo server up and open the client shell. Once Zookeeper and Hadoop are running correctly on the machine you may start the Accumulo server with "bin/start-all.sh" (enter the command from the Accumulo directory).
   [Screenshots: Run Zookeeper; Run Hadoop; Run Accumulo]
2. You may check that Accumulo runs correctly through the monitor page: http://localhost:50095
   [Screenshot: the Accumulo monitor page]
3.
Use the following command to open the client shell: "bin/accumulo shell -u root". Then enter the password for your Accumulo instance. In our example we set the instance name and password to accum/accum.
4. Next we'll create a new table called usertable: "createtable usertable"
5. Now we are ready to benchmark Accumulo using YCSB++. In this example we'll use workloada from the YCSB++ core workloads, which is a 50/50 workload of reads and inserts. Before we start, make sure the workload file contains the needed property values.
   [Screenshot: property values in the workload file]
   Notice that:
   accumulo.zookeper is the IP address on which Zookeeper runs.
   accumulo.instanceName is the instance name you chose during the Accumulo init.
   accumulo.password is the password you chose during the Accumulo init.
   accumulo.columnFamily is the name of the table we created.
First let's use the -load option to prepare the workload, since the values to be read must first be inserted into the Accumulo database. Afterwards we'll use the run command to perform the benchmark test of workloada.
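The check of the workload file described above can also be scripted. The following is only a sketch: missing_props is a hypothetical helper name, not a YCSB++ command, and the property keys in the usage comment are copied as spelled in the list above, so confirm them against your own workload file.

```shell
# Print every property key (arguments 2..n) that has no "key=value" line
# in the properties file given as the first argument.
# missing_props is a hypothetical helper, not part of YCSB++.
missing_props() (
  file=$1
  shift
  for key in "$@"; do
    # awk splits each line at "=", trims blanks around the left-hand side
    # and exits 0 only if some line defines exactly this key.
    awk -F= -v k="$key" '
      { gsub(/^[ \t]+|[ \t]+$/, "", $1); if ($1 == k) found = 1 }
      END { exit !found }
    ' "$file" || printf '%s\n' "$key"
  done
)

# Example usage (path and key names as used in this tutorial):
#   missing_props workloads/workloada accumulo.zookeper accumulo.instanceName \
#                 accumulo.password accumulo.columnFamily
```

An empty output means every listed key is defined. Commented-out lines (starting with #) do not count as definitions.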
Enter the following command in the command prompt (or terminal) from the YCSB++ folder location:
`java -cp build/ycsb.jar:db/accumulo/lib/accumulo-core-1.4.0.jar:db/accumulo/lib/accumulo-core-1.4.0-javadoc.jar:db/accumulo/lib/accumulo-core-1.4.0-sources.jar:db/accumulo/lib/accumulo-server-1.4.0.jar:db/accumulo/lib/accumulo-server-1.4.0-javadoc.jar:db/accumulo/lib/accumulo-server-1.4.0-sources.jar:db/accumulo/lib/accumulo-start-1.4.0.jar:db/accumulo/lib/accumulo-start-1.4.0-javadoc.jar:db/accumulo/lib/accumulo-start-1.4.0-sources.jar:db/accumulo/lib/zookeeper-3.4.3.jar:db/accumulo/lib/hadoop-0.20.2-core.jar:db/accumulo/lib/cloudtrace-1.4.0.jar:db/accumulo/lib/cloudtrace-1.4.0-javadoc.jar:db/accumulo/lib/cloudtrace-1.4.0-sources.jar:db/accumulo/lib/commons-collections-3.2.jar:db/accumulo/lib/commons-configuration-1.5.jar:db/accumulo/lib/commons-io-1.4.jar:db/accumulo/lib/commons-jci-core-1.0.jar:db/accumulo/lib/commons-jci-fam-1.0.jar:db/accumulo/lib/commons-lang-2.4.jar:db/accumulo/lib/commons-logging-1.0.4.jar:db/accumulo/lib/commons-logging-api-1.0.4.jar:db/accumulo/lib/examples-simple-1.4.0.jar:db/accumulo/lib/examples-simple-1.4.0-javadoc.jar:db/accumulo/lib/examples-simple-1.4.0-sources.jar:db/accumulo/lib/jline-0.9.94.jar:db/accumulo/lib/libthrift-0.6.1.jar:db/accumulo/lib/log4j-1.2.16.jar:db/accumulo/lib/wikisearch-ingest-1.4.0-javadoc.jar:db/accumulo/lib/wikisearch-query-1.4.0-javadoc.jar:/specific/scratches/parallel/yosibar1-2012-10-31/zookeeper/lib/slf4j-log4j12-1.6.1.jar:/specific/scratches/parallel/yosibar1-2012-10-31/zookeeper/lib/slf4j-api-1.6.1.jar com.yahoo.ycsb.Client -t -db com.yahoo.ycsb.db.AccumuloClientSecurity -p security.cell.entries=4 -p hosts=132.67.105.169 -p threadcount=1 -s -P /specific/scratches/parallel/yosibar1-2012-10-31/workloads/workloada -load >> workloada_res.txt`
Notice that we attached all the lib jar files of Accumulo, YCSB and Zookeeper to the command.
hosts=132.67.105.169 refers to the IP address on which Accumulo listens.
threadcount=1 refers to the number of threads started in the test.
-P workloads/workloada refers to the workload being used.

Next we need to run the workload using the YCSB++ run command:
`java -cp build/ycsb.jar:db/accumulo/lib/accumulo-core-1.4.0.jar:db/accumulo/lib/accumulo-core-1.4.0-javadoc.jar:db/accumulo/lib/accumulo-core-1.4.0-sources.jar:db/accumulo/lib/accumulo-server-1.4.0.jar:db/accumulo/lib/accumulo-server-1.4.0-javadoc.jar:db/accumulo/lib/accumulo-server-1.4.0-sources.jar:db/accumulo/lib/accumulo-start-1.4.0.jar:db/accumulo/lib/accumulo-start-1.4.0-javadoc.jar:db/accumulo/lib/accumulo-start-1.4.0-sources.jar:db/accumulo/lib/zookeeper-3.4.3.jar:db/accumulo/lib/hadoop-0.20.2-core.jar:db/accumulo/lib/cloudtrace-1.4.0.jar:db/accumulo/lib/cloudtrace-1.4.0-javadoc.jar:db/accumulo/lib/cloudtrace-1.4.0-sources.jar:db/accumulo/lib/commons-collections-3.2.jar:db/accumulo/lib/commons-configuration-1.5.jar:db/accumulo/lib/commons-io-1.4.jar:db/accumulo/lib/commons-jci-core-1.0.jar:db/accumulo/lib/commons-jci-fam-1.0.jar:db/accumulo/lib/commons-lang-2.4.jar:db/accumulo/lib/commons-logging-1.0.4.jar:db/accumulo/lib/commons-logging-api-1.0.4.jar:db/accumulo/lib/examples-simple-1.4.0.jar:db/accumulo/lib/examples-simple-1.4.0-javadoc.jar:db/accumulo/lib/examples-simple-1.4.0-sources.jar:db/accumulo/lib/jline-0.9.94.jar:db/accumulo/lib/libthrift-0.6.1.jar:db/accumulo/lib/log4j-1.2.16.jar:db/accumulo/lib/wikisearch-ingest-1.4.0-javadoc.jar:db/accumulo/lib/wikisearch-query-1.4.0-javadoc.jar:/specific/scratches/parallel/yosibar1-2012-10-31/zookeeper/lib/slf4j-log4j12-1.6.1.jar:/specific/scratches/parallel/yosibar1-2012-10-31/zookeeper/lib/slf4j-api-1.6.1.jar com.yahoo.ycsb.Client -t -db com.yahoo.ycsb.db.AccumuloClientSecurity -p security.cell.entries=4 -p hosts=132.67.105.169 -p threadcount=1 -s -P /specific/scratches/parallel/yosibar1-2012-10-31/workloads/workloada | grep "Throughput" >> workloada_res.txt`
This should create workloada_res.txt, which contains the information from the benchmark test. In our example we used these tests to check how Accumulo throughput changes as we increase the number of entries in the access control list. Therefore the file contains the information we gathered regarding the throughput and the ACLs.
[Screenshot: contents of workloada_res.txt]
However, if you remove the | grep "Throughput" part from the command line, the benchmark results are reported in terms of throughput, latency and run time.
You may change the operation count to 50000 by editing the workloada file or by adding -p operationcount=50000 to the command line. Likewise, you may change the number of threads that YCSB++ starts in the benchmark by adding -p threadcount=100.
Finally, you may add any other property parameters to your workload by changing the YCSB++ source code using the getProperty mechanism (check the Java files and the Javadoc for more information). After you make your changes to the code, build the source again using the "ant" command from the YCSB++ directory and pass the relevant parameter to the YCSB++ command using -p key=value.
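The two Accumulo commands above list every jar on the classpath by hand, which is easy to get wrong. As a minimal sketch, assuming a POSIX shell and that the needed jars sit in known directories, a small helper can assemble the same kind of -cp string. build_classpath is a hypothetical name introduced here for illustration and is not part of YCSB++.

```shell
# Join every .jar file found in the given directories into one
# colon-separated classpath string.
# build_classpath is a hypothetical helper, not part of YCSB++.
build_classpath() (
  cp=""
  for dir in "$@"; do
    for jar in "$dir"/*.jar; do
      [ -e "$jar" ] || continue      # directory contains no jar files
      cp="${cp:+$cp:}$jar"           # add ":" only between entries
    done
  done
  printf '%s\n' "$cp"
)

# Example usage from the YCSB++ directory (not executed here):
#   CP="build/ycsb.jar:$(build_classpath db/accumulo/lib)"
#   java -cp "$CP" com.yahoo.ycsb.Client -t \
#        -db com.yahoo.ycsb.db.AccumuloClientSecurity \
#        -P workloads/workloada -p operationcount=50000 -p threadcount=100
```

Java 6 and later also accept a quoted wildcard entry such as -cp "build/ycsb.jar:db/accumulo/lib/*", which has the same effect without a helper. The function is mainly useful when you want to inspect or log the exact list of jars.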