Benchmarking Datacenter and Big Data Systems

Wanling Gao, Zhen Jia, Lei Wang, Yuqing Zhu, Chunjie Luo, Yingjie Shi, Yongqiang He, Shiming Gong, Xiaona Li, Shujie Zhang, Bizhu Qiu, Lixin Zhang, Jianfeng Zhan
Institute of Computing Technology
http://prof.ict.ac.cn/jfzhan

Acknowledgements
This work is supported by the Chinese 973 project (Grant No. 2011CB302502), the Hi-Tech Research and Development (863) Program of China (Grant No. 2011AA01A203, No. 2013AA01A213), the NSFC project (Grant No. 60933003, No. 61202075), the BNSF project (Grant No. 4133081), and Huawei funding.

Executive Summary
• An open-source project on datacenter and big data benchmarking: ICTBench
  http://prof.ict.ac.cn/ICTBench
• Several case studies using ICTBench

Question One: Gap between Industry and Academia
• The distance between the two keeps growing, in both:
  - Code
  - Data sets

Question Two: Different Benchmark Requirements
• Architecture communities
  - Simulation is very slow
  - Need small data and code sets
• System communities
  - Large-scale deployment is valuable; users need real-world applications
  - "There are three kinds of lies: lies, damn lies, and benchmarks"

State-of-Practice Benchmark Suites
• SPEC CPU, SPEC Web, TPC-C, HPCC, GridMix, PARSEC, YCSB

Why a New Benchmark Suite for Datacenter Computing?
• No existing benchmark suite covers the diversity of datacenter workloads
• State of the art: CloudSuite
  - Includes only six applications, selected according to their popularity

Why a New Benchmark Suite (Cont'd)
• Memory-level parallelism (MLP): the number of simultaneously outstanding cache misses
• [Chart: MLP of CloudSuite vs. our benchmark suite DCBench]

Why a New Benchmark Suite (Cont'd)
• Scale-out performance
• [Chart: speedup (1x to 6x) of the DCBench and CloudSuite data analysis benchmarks - sort, grep, wordcount, svm, kmeans, fuzzy kmeans, all-pairs, Bayes, HMM - on 1, 4, and 8 working nodes]

Outline
• Background and Motivation
• Our ICTBench
• Case studies

ICTBench Project
• ICTBench: three benchmark suites
  - DCBench: architecture research (application, OS, and VM execution)
  - BigDataBench: system research (large-scale big data applications)
  - CloudRank: cloud benchmarks (distributed management); not covered in this talk
• Project homepage: http://prof.ict.ac.cn/ICTBench
• The source code is available

DCBench
• Typical datacenter workloads
  - Different from scientific computing (FLOPS-centric)
  - Covers applications in important domains: search engines, electronic commerce, etc.
• Each benchmark = a single application
• Purposes: architecture and (small-to-medium-scale) system research

BigDataBench
• Characterizes big data applications
  - Does not include data-intensive supercomputing
• Synthetic data sets varying from 10 GB to PB scale
• Each benchmark = a single big application
• Purposes: large-scale system and architecture research

CloudRank
• Cloud computing
  - Elastic resource management
  - Consolidating different workloads
• Cloud benchmarks
  - Each benchmark = a group of consolidated datacenter workloads (services / data processing / desktop)
• Purposes: capacity planning, system evaluation and research
• Users can customize their own benchmarks

Benchmarking Methodology
• Decide and rank the main application domains according to a publicly available metric, e.g. page views and daily visitors
• Single out the main applications from those main application domains
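To make the first methodology step concrete, here is a minimal Java sketch of ranking candidate application domains by a publicly available metric. The traffic-share numbers are illustrative placeholders, not the actual Alexa statistics behind the talk:

```java
import java.util.HashMap;
import java.util.Map;

// Rank application domains by a publicly available metric (here, a
// hypothetical share of top-site traffic). Placeholder numbers only.
public class DomainRanking {
    public static void main(String[] args) {
        Map<String, Double> trafficShare = new HashMap<>();
        trafficShare.put("Search Engine", 0.40);        // hypothetical shares
        trafficShare.put("Social Network", 0.25);
        trafficShare.put("Electronic Commerce", 0.15);
        trafficShare.put("Media Streaming", 0.05);
        trafficShare.put("Others", 0.15);

        // Sort domains by descending metric value and print the ranking.
        trafficShare.entrySet().stream()
            .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
            .forEach(e -> System.out.printf("%-20s %.0f%%%n",
                    e.getKey(), e.getValue() * 100));
    }
}
```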
Top Sites on the Web
• [Pie chart of top-site categories: Search Engine 40%, Social Network 25%, Electronic Commerce 15%, Others 15%, Media Streaming 5%]
• More details at http://www.alexa.com/topsites/global;0

Main Algorithms in Search Engines
• Algorithms used in search: PageRank, graph mining, segmentation, feature reduction, grep, statistical counting, vector calculation, sort, recommendation, ...

Main Algorithms in Search Engines (Nutch)
• [Diagram: the algorithm pipeline of the Nutch search engine - word segmentation, word count, word grep, BFS, decision-tree classification, merge sort, vector calculation, PageRank, scoring and sorting]

Main Algorithms in Social Networks
• Algorithms used in social networks: recommendation, clustering, classification, graph mining, grep, feature reduction, statistical counting, vector calculation, sort, ...

Main Algorithms in Electronic Commerce
• Algorithms used in electronic commerce: recommendation, association rule mining, warehouse operations, clustering, classification, statistical counting, vector calculation, ...

Overview of DCBench

Category                | Workload                            | Programming model | Language   | Source
------------------------|-------------------------------------|-------------------|------------|-------------------------
Basic operation         | Sort                                | MapReduce         | Java       | Hadoop
Basic operation         | Wordcount                           | MapReduce         | Java       | Hadoop
Basic operation         | Grep                                | MapReduce         | Java       | Hadoop
Classification          | Naïve Bayes                         | MapReduce         | Java       | Mahout
Classification          | Support Vector Machine              | MapReduce         | Java       | Implemented by ourselves
Cluster                 | K-means                             | MapReduce / MPI   | Java / C++ | Mahout / IBM PML
Cluster                 | Fuzzy k-means                       | MapReduce / MPI   | Java / C++ | Mahout / IBM PML
Recommendation          | Item-based Collaborative Filtering  | MapReduce         | Java       | Mahout
Association rule mining | Frequent pattern growth             | MapReduce         | Java       | Mahout
Segmentation            | Hidden Markov model                 | MapReduce         | Java       | Implemented by ourselves

Overview of DCBench (Cont'd)

Category            | Workload                            | Programming model | Language | Source
--------------------|-------------------------------------|-------------------|----------|-------------------------
Warehouse operation | Database operations                 | MapReduce         | Java     | Hive-bench
Feature reduction   | Principal Component Analysis        | MPI               | C++      | IBM PML
Feature reduction   | Kernel Principal Component Analysis | MPI               | C++      | IBM PML
Vector calculation  | Paper similarity analysis           | All-Pairs         | C & C++  | Implemented by ourselves
Graph mining        | Breadth-first search                | MPI               | C++      | Graph500
Graph mining        | PageRank                            | MapReduce         | Java     | Mahout
Service             | Search engine                       | C/S               | Java     | Nutch
Service             | Auction                             | C/S               | Java     | RUBiS
Service             | Media streaming                     | C/S               | Java     | CloudSuite

Methodology of Generating Big Data
• Goal: preserve the characteristics of real-world data
• Pipeline: small-scale data -> analysis -> characteristic data -> expand -> big data
• Characteristics preserved:
  - Semantics, e.g. word frequency
  - Locality, temporal and spatial, e.g. word reuse distance and word distribution in documents
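A minimal sketch of the analyze-then-expand idea, assuming a simple unigram model: learn the word-frequency distribution of a small seed text, then sample from it to synthesize an arbitrarily larger corpus. This is an illustration only, not the BigDataBench generator; reproducing locality properties such as word reuse distance would need a richer model:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Expand a small seed corpus into a larger synthetic one while
// preserving its word-frequency (unigram) distribution.
public class FrequencyPreservingExpander {
    public static void main(String[] args) {
        String seed = "big data benchmark big data systems data";

        // 1. Analysis: empirical unigram counts of the seed corpus.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String w : seed.split("\\s+")) counts.merge(w, 1, Integer::sum);

        // 2. Expansion: weighted sampling reproduces the seed's frequencies.
        List<String> words = new ArrayList<>(counts.keySet());
        int total = counts.values().stream().mapToInt(Integer::intValue).sum();
        Random rng = new Random(42);
        StringBuilder synthetic = new StringBuilder();
        for (int i = 0; i < 1000; i++) {          // scale factor: 1000 words
            int r = rng.nextInt(total);
            for (String w : words) {
                r -= counts.get(w);
                if (r < 0) { synthetic.append(w).append(' '); break; }
            }
        }
        System.out.println(synthetic.substring(0, 60) + "...");
    }
}
```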
Workloads in BigDataBench 1.0 Beta
• Analysis workloads
  - Simple but representative operations: Sort, Grep, Wordcount
  - Highly recognized algorithms: Naïve Bayes, SVM
• Service workloads
  - Widely deployed services: Nutch server

Variety of Workloads Included
• Off-line vs. on-line: Sort, Wordcount, Grep, Naïve Bayes and SVM are off-line analysis jobs; the Nutch server is an on-line service
• Basic operations (Sort, Wordcount, Grep) vs. machine learning (Naïve Bayes, SVM)
• Resource behavior spans I/O bound, CPU bound, and hybrid workloads (detailed in the table below)

Features of Workloads

Workload     | Resource characteristic | Computing complexity                                  | Instructions
-------------|-------------------------|-------------------------------------------------------|----------------------------------------------
Sort         | I/O bound               | O(n*lg n)                                             | Integer comparison domination
Wordcount    | CPU bound               | O(n)                                                  | Integer comparison and calculation domination
Grep         | Hybrid                  | O(n)                                                  | Integer comparison domination
Naïve Bayes  | CPU bound               | O(m*n) [m: the length of the dictionary]              | Floating-point computation domination
SVM          | CPU bound               | O(M*n) [M: the number of support vectors * dimension] | Floating-point computation domination
Nutch Server | I/O & CPU bound         | /                                                     | Integer comparison domination
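For a feel of what the "simple but representative" analysis workloads look like, below is the canonical Hadoop MapReduce WordCount mapper and reducer in Java. This is a textbook sketch rather than the suite's exact code (the job driver is omitted); its integer- and string-dominated, O(n) per-record work matches the Wordcount row in the table above:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Canonical Hadoop WordCount: the map side tokenizes each input line and
// emits <word, 1>; the reduce side sums the counts per word.
public class WordCount {
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            for (String tok : value.toString().split("\\s+")) {
                if (tok.isEmpty()) continue;
                word.set(tok);
                ctx.write(word, ONE);               // emit <word, 1>
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));   // emit <word, total>
        }
    }
}
```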
Content
• Background and Motivation
• Our ICTBench
• Case studies

Use Case 1: Microarchitecture Characterization Using DCBench
• Five-node cluster: one master and four slaves (working nodes)
• [Table: per-node hardware configuration, not recovered here]

Instruction Execution Level (Kernel vs. Application)
• [Chart: kernel- vs. application-level instruction breakdown for the CloudSuite workloads (Software Testing, Media Streaming, Data Serving, Web Search, Web Serving), SPECFP, SPECINT, SPECWeb, the HPCC kernels (COMM, DGEMM, FFT, HPL, PTRANS, RandomAccess, STREAM), and the DCBench data analysis workloads (Naïve Bayes, SVM, Grep, WordCount, K-means, Fuzzy K-means, PageRank, Sort, Hive-bench, IBCF, HMM)]
• Data analysis workloads execute more application-level instructions
• Service workloads have higher percentages of kernel-level instructions

Pipeline Stalls
• DC workloads suffer severe front-end stalls (i.e. instruction fetch stalls)
• Services: more RAT (Register Allocation Table) stalls
• Data analysis: more RS (Reservation Station) full and ROB (ReOrder Buffer) full stalls
• [Chart: stall-cycle breakdown - instruction fetch, RAT, load, store, RS-full, and ROB-full stalls]

Architecture Block Diagram
• [Figure: processor architecture block diagram]

Front-End Stall Reasons
• For DC workloads, high instruction cache miss and instruction TLB miss rates make the front end inefficient
• [Chart: ITLB page walks per K-instruction and L1 I-cache misses per K-instruction]

MLC Behaviors
• DC workloads have more mid-level cache (L2) misses than HPC workloads
• Data analysis workloads have better locality (fewer L2 cache misses) than service workloads
• [Chart: L2 cache misses per K-instruction]

LLC Behaviors
• The LLC is good enough for DC workloads: most L2 cache misses can be satisfied by the LLC
• [Chart: the ratio of L2 cache misses satisfied by the L3 cache]

DTLB Behaviors
• DC workloads have more DTLB misses than HPC workloads
• Most data analysis workloads have fewer DTLB misses than services
• [Chart: DTLB page walks per K-instruction]

Branch Prediction
• Data analysis workloads show quite good branch behavior
• Services' branches are hard to predict
• [Chart: branch misprediction ratios, roughly 0% to 8%]

DC Workloads Characteristics
• Data analysis applications share many inherent characteristics that place them in a different class from desktop, HPC, traditional server, and scale-out service workloads
• More details in our IISWC 2013 paper: Characterizing Data Analysis Workloads in Data Centers. Zhen Jia, et al. 2013 IEEE International Symposium on Workload Characterization (IISWC 2013).
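The per-K-instruction metrics plotted in the preceding charts are simple derivations from raw hardware event counts. Here is a small helper showing the arithmetic; the counter values are made up for illustration, and in practice the events would be collected with a profiling tool such as perf:

```java
// Turn raw performance-counter readings into the metrics used above:
// misses per kilo-instruction (MPKI) and the branch misprediction ratio.
public class CounterMetrics {
    static double perKiloInstruction(long events, long instructions) {
        return 1000.0 * events / instructions;      // events per 1000 instructions
    }
    static double ratio(long part, long whole) {
        return (double) part / whole;
    }
    public static void main(String[] args) {
        long instructions = 2_000_000_000L;         // hypothetical retired instructions
        long l2Misses     = 90_000_000L;            // hypothetical L2 miss count
        long branches     = 400_000_000L;           // hypothetical retired branches
        long branchMisses = 6_000_000L;             // hypothetical mispredictions
        System.out.printf("L2 MPKI: %.2f%n",
                perKiloInstruction(l2Misses, instructions));
        System.out.printf("Branch misprediction ratio: %.2f%%%n",
                100 * ratio(branchMisses, branches));
    }
}
```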
Use Case 2: Architecture Research Using BigDataBench 1.0 Beta
• Data scale: 10 GB – 2 TB
• Hadoop configuration: 1 master, 14 slave nodes

Use Case 2: Findings
• Some microarchitectural events tend toward stability once the data volume grows beyond a certain point
• Cache and TLB behaviors show different trends with increasing data volume, depending on the workload
  - e.g. L1I misses per 1000 instructions increase for Sort but decrease for Grep

Search Engine Service Experiments
• The same phenomenon is observed for the service workload
  - Index size: 2 GB – 8 GB; segment size: 4.4 GB – 17.6 GB
  - Microarchitectural events tend toward stability once the index size grows to a certain extent
• Big data imposes challenges on architecture research, since large-scale simulation is time-consuming

Use Case 3: System Evaluation Using BigDataBench 1.0 Beta
• Data scale: 10 GB – 2 TB
• Hadoop configuration: 1 master, 14 slave nodes

System Evaluation
• There is a threshold for each workload (100 MB – 1 TB): the system is fully loaded when the data volume exceeds it
• Sort is an exception
  - An inflexion point (10 GB – 1 TB): the data processing rate decreases after this point
  - Global data access requirements create I/O and network bottlenecks
• System performance depends on both the application and the data volume

Conclusion
• ICTBench: an open-source project on datacenter and big data benchmarking
  - DCBench
  - BigDataBench
  - CloudRank
• http://prof.ict.ac.cn/ICTBench

Publications
• Characterizing OS Behavior of Scale-out Data Center Workloads. Chen Zheng et al. Seventh Annual Workshop on the Interaction amongst Virtualization, Operating Systems and Computer Architecture (WIVOSCA 2013), in conjunction with ISCA 2013.
• Characterization of Real Workloads of Web Search Engines. Huafeng Xi et al. 2011 IEEE International Symposium on Workload Characterization (IISWC 2011).
• The Implications of Diverse Applications and Scalable Data Sets in Benchmarking Big Data Systems. Zhen Jia et al. Second Workshop on Big Data Benchmarking (WBDB 2012, India) & Lecture Notes in Computer Science (LNCS).
• CloudRank-D: Benchmarking and Ranking Cloud Computing Systems for Data Processing Applications. Chunjie Luo et al. Frontiers of Computer Science (FCS) 2012, 6(4): 347–362.
• BigDataBench: a Big Data Benchmark Suite from Web Search Engines. Wanling Gao et al. The Third Workshop on Architectures and Systems for Big Data (ASBD 2013), in conjunction with ISCA 2013.
• Characterizing Data Analysis Workloads in Data Centers. Zhen Jia et al. 2013 IEEE International Symposium on Workload Characterization (IISWC 2013).

Thank you! Any questions?