Benchmarking Cloud Serving Systems with YCSB
Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, Russell Sears
Yahoo! Research
Presenter: Duncan

Benchmarking Cloud Serving Systems with YCSB
• Benchmarking vs. testing: is there any difference?
• My opinion:
  – Benchmarking: performance
  – Testing: usability testing, security testing, performance testing, etc.

Motivation
• Many new cloud systems for data storage and management
  – MongoDB, MySQL, AsterixDB, etc.
• Every design involves trade-offs
  – E.g. appending updates to a sequential disk log: good for writes, bad for reads
  – Synchronous replication: copies stay up to date, but write latency is high
• How to choose?
  – Use a benchmark that models your scenario!

Evaluate Performance = ?
• Latency
  – Users don't want to wait!
• Throughput
  – Want to serve more requests!
• Inherent trade-off between latency and throughput
  – More requests => more resource contention => higher latency

Which system is better?
• "Typically application designers must decide on an acceptable latency, and provision enough servers to achieve the desired throughput"
• The better system achieves the desired latency and throughput with fewer servers
  – E.g. desired latency 0.1 sec at 100 requests/sec
  – MongoDB: 10 servers
  – AsterixDB: 15 servers
  – => MongoDB is better for this workload

What else to evaluate?
• Properties of the cloud platform itself:
  – Scalability: good scalability => performance grows in proportion to the number of servers
  – Elasticity: good elasticity => performance improves with little disruption when servers are added to a running system

A Short Summary
• Evaluating performance = evaluating latency, throughput, scalability, and elasticity
• A better system = fewer machines needed to reach the performance goal

YCSB
• Data generator
• Workload generator
• YCSB client
  – Interface to communicate with the DB

YCSB Data Generator
• A table with F fields and N records
• Each field => a random string
• E.g. 1,000-byte records: F = 10 fields, 100 bytes per field

Workload Generator
• Basic operations
  – Insert, update, read, scan
  – No joins, aggregates, etc.
• Controls the distribution of:
• Which operation to perform
  – E.g. 0.95 read, 0.05 update, 0 scan => a read-heavy workload
• Which record to read or write
  – Uniform: every record is equally likely
  – Zipfian: some records are extremely popular
  – Latest: recently inserted records are more popular
• (Illustrative sketches of a workload definition and of operation/record selection appear at the end of this deck.)

YCSB Client
• A script
  – Use the script to run the benchmark
• Workload parameter files
  – You can change the parameters
• Java program
• DB interface layer
  – You implement this interface for your own DB system (see the interface sketch at the end of this deck)

Experiments
• Experiment setup:
  – 6 servers
  – YCSB client on another server
  – Cassandra, HBase, MySQL, PNUTS
• Workloads: update-heavy, read-heavy, read-only, read-latest, and short range scans

Future Work
• Availability
  – Impact of failures on system performance
• Replication
  – Impact on performance as the replication factor increases

4 Criteria
• The authors' 4 criteria for a good benchmark:
  – Relevance to the application
  – Portability
    • Not just for one system!
  – Scalability
    • Not just for small systems and small data!
  – Simplicity

References
• Benchmarking Cloud Serving Systems with YCSB. Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, Russell Sears. SoCC 2010.
• BG: A Benchmark to Evaluate Interactive Social Networking Actions. Sumita Barahmand, Shahram Ghandeharizadeh. CIDR 2013.
• http://en.wikipedia.org/wiki/Software_testing
• http://en.wikipedia.org/wiki/Benchmark_(computing)

Thank You!
Questions?

Why a new benchmark?
• Most cloud systems do not have a SQL interface => hard to implement complex queries on them
• Existing benchmarks target specific applications
  – TPC-W for e-commerce
  – TPC-C for applications that manage, sell, or distribute a product or service
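
For concreteness, here is a minimal Java sketch of what a read-heavy workload definition could look like. The property names mimic YCSB's core workload parameters (recordcount, readproportion, requestdistribution, ...); exact keys and defaults may differ between YCSB versions, so treat them as illustrative rather than authoritative.

import java.util.Properties;

// Sketch of a read-heavy workload definition built as Java properties.
// Property names follow YCSB's core workload (recordcount, readproportion, ...);
// treat the exact keys and values as illustrative, not authoritative.
public class ReadHeavyWorkloadConfig {
    public static Properties build() {
        Properties p = new Properties();
        p.setProperty("recordcount", "1000000");          // N records loaded into the store
        p.setProperty("operationcount", "1000000");       // operations issued during the run
        p.setProperty("fieldcount", "10");                // F = 10 fields per record
        p.setProperty("fieldlength", "100");              // 100 bytes per field => ~1,000-byte records
        p.setProperty("readproportion", "0.95");          // 95% reads
        p.setProperty("updateproportion", "0.05");        // 5% updates
        p.setProperty("scanproportion", "0");             // no scans
        p.setProperty("requestdistribution", "zipfian");  // some records are extremely popular
        return p;
    }
}

When running YCSB itself, these parameters would normally live in a workload properties file that is passed to the client script, rather than being built in Java code.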
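
The following is a purely illustrative Java sketch (not YCSB's actual generator classes) of the workload generator's two choices: which operation to run next and which record to touch, drawn from the configured distributions.

import java.util.Random;

// Illustrative sketch (not YCSB's actual generator classes) of how a workload
// generator can choose the next operation and the next record to touch.
public class WorkloadSketch {
    private final Random rng = new Random();

    // 0.95 read / 0.05 update => a read-heavy workload
    String nextOperation() {
        return rng.nextDouble() < 0.95 ? "READ" : "UPDATE";
    }

    // Naive Zipf(s = 0.99) sampler over keys 0..n-1 via the inverse CDF:
    // low-numbered keys are drawn far more often than the long tail.
    // (The real YCSB generator avoids this O(n) loop; it is used here only for clarity.)
    int nextZipfianKey(int n) {
        double s = 0.99;
        double norm = 0;
        for (int i = 1; i <= n; i++) norm += 1.0 / Math.pow(i, s);
        double u = rng.nextDouble() * norm;
        double cum = 0;
        for (int i = 1; i <= n; i++) {
            cum += 1.0 / Math.pow(i, s);
            if (cum >= u) return i - 1;
        }
        return n - 1;
    }
}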
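
Finally, a hypothetical sketch of the DB interface layer that a new database binding has to implement. A real binding extends YCSB's abstract DB class; the method names below mirror the core operations, but the signatures are simplified and assumed for illustration only.

import java.util.Map;
import java.util.Set;
import java.util.Vector;

// Hypothetical, simplified shape of a DB interface layer. A real YCSB binding
// extends YCSB's abstract DB class; the signatures here are illustrative only.
public interface SimpleDbBinding {
    void init();                                   // e.g. open a connection pool, read config
    int read(String table, String key, Set<String> fields, Map<String, String> result);
    int update(String table, String key, Map<String, String> values);
    int insert(String table, String key, Map<String, String> values);
    int scan(String table, String startKey, int recordCount, Vector<Map<String, String>> result);
    void cleanup();                                // e.g. close connections
}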