Memory System Characterization of Big Data Workloads
Martin Dimitrov, Karthik Kumar, Patrick Lu, Vish Viswanathan, Thomas Willhalm

Agenda
→ Why big data memory characterization?
• Workloads, Methodology and Metrics
• Measurements and results
• Conclusion and outlook

Why big data memory characterization?
• Studies show exponential data growth to come.
• Big Data: extracting information from unstructured data
• Primary technologies are Hadoop and NoSQL

Why big data memory characterization?
• Power: memory consumes up to 40% of total server power.
• Performance: memory latency, capacity, and bandwidth are important, and large data volumes can put pressure on the memory subsystem.
• Optimizations trade off CPU cycles to reduce the load on memory, e.g. compression.
It is therefore important to understand the memory usage of big data workloads.

Why big data memory characterization?
• DRAM scaling is hitting limits.
• Emerging memories have higher latency.
• The focus shifts to latency-hiding optimizations.
How do latency-hiding optimizations apply to big data workloads?

Executive Summary
• Provide insight into the memory access characteristics of big data applications
• Examine implications for prefetchability, compressibility, and cacheability
• Understand the impact on memory architectures for big data usage models

Agenda
• Why big data memory characterization?
→ Workloads, Methodology and Metrics
• Measurements and results
• Conclusion and outlook

Big Data workloads
• Sort
• WordCount
• Hive Join
• Hive Aggregation
• NoSQL indexing
We analyze these workloads using hardware DIMM traces, performance counter monitoring, and performance measurements.

General Characterization
Memory footprint from the DIMM trace:
• Memory (in GB) touched at least once by the application
• The amount of memory required to keep the workload "in memory"
EMON (performance counter monitoring):
• CPI
• Cache behavior: L1, L2, and LLC misses per instruction (MPI)
• Instruction and data TLB MPI
Understand how the workloads use memory.

Cache Line Working Set Characterization
1. For each cache line, compute the number of times it is referenced.
2. Sort the cache lines by their number of references.
3. Select a footprint size, say X MB.
4. What fraction of total references is contained in the X MB of hottest cache lines?
Identifies the hot working set of the application (a sketch of this computation follows the next slide).

Cache Simulation
Run the workload through an LRU cache simulator and vary the cache size. This captures the temporal nature of accesses, not only the spatial one:
• Streaming through regions larger than the cache size
• Eviction and replacement policies impact cacheability
• Focus on smaller sub-regions
Hit rates indicate the potential for cacheability in a tiered memory architecture.
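To make the four-step working-set ranking above concrete, here is a minimal Python sketch. The trace format (a flat list of byte addresses from the DIMM trace) and the function name are illustrative assumptions; the deck does not describe the actual trace-processing tooling.

```python
from collections import Counter

CACHE_LINE = 64  # bytes

def hot_footprint_coverage(addresses, footprint_mb):
    """Fraction of all references captured by the hottest `footprint_mb` MB
    of cache lines (steps 1-4 of the working-set characterization)."""
    # Step 1: count references per 64-byte cache line
    refs = Counter(addr // CACHE_LINE for addr in addresses)
    # Step 2: sort cache lines by reference count, hottest first
    hottest = sorted(refs.values(), reverse=True)
    # Step 3: a footprint of X MB holds this many cache lines
    lines_in_footprint = (footprint_mb * 1024 * 1024) // CACHE_LINE
    # Step 4: fraction of total references contained in those lines
    return sum(hottest[:lines_in_footprint]) / sum(hottest)

# Example: fraction of references captured by the hottest 100 MB
# coverage = hot_footprint_coverage(trace_addresses, footprint_mb=100)
```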
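The cache simulation above can be approximated with a fully associative LRU model over 64-byte lines. The sketch below is an assumption about the simulator's structure; the deck only states that an LRU simulator is used with varying cache sizes, not its associativity or other details.

```python
from collections import OrderedDict

CACHE_LINE = 64  # bytes

def lru_hit_rate(addresses, cache_size_bytes):
    """Hit rate of a fully associative LRU cache of the given size,
    fed with a trace of byte addresses."""
    capacity = cache_size_bytes // CACHE_LINE
    cache = OrderedDict()  # cache-line id -> None, kept in LRU order
    hits = 0
    for addr in addresses:
        line = addr // CACHE_LINE
        if line in cache:
            hits += 1
            cache.move_to_end(line)          # mark as most recently used
        else:
            cache[line] = None
            if len(cache) > capacity:
                cache.popitem(last=False)    # evict the least recently used line
    return hits / len(addresses)

# Sweep the cache size to see how the hit rate grows:
# for size_mb in (64, 256, 1024, 4096):
#     print(size_mb, lru_hit_rate(trace_addresses, size_mb * 1024 * 1024))
```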
Entropy
• Compressibility and predictability are important.
• A signal with high information content is harder to compress and more difficult to predict.
• Entropy helps understand this behavior.
For a set of cache lines K, with p(k) the fraction of references that go to line k:
  E(K) = − Σ_{k ∈ K} p(k) · log2 p(k)
Lower entropy → more compressibility and predictability.

Entropy – example
                          (A)      (B)      (C)
Footprint                 640 B    640 B    640 B
References                100      100      100
References/line (avg.)    10       10       10
Hit rate, 64 B cache      10%      19%      91%
Hit rate, 192 B cache     30%      57%      93%
Entropy (normalized)      1        0.785    0.217
Lower entropy → more compressibility and predictability. (A sketch at the end of this deck illustrates the entropy computation.)

Correlation and Trend Analysis
Examine the trace for trends, e.g. an increasing trend in the upper physical address ranges → aggressively prefetch into an upper cache.
• With s = 64 and l = 1000, the test function f mimics an ascending stride through a memory region of 1000 cache lines.
• Negative correlation with f indicates a decreasing trend.
High correlation → strong trend → predict, prefetch. (A sketch at the end of this deck illustrates the correlation test.)

Agenda
• Why big data memory characterization?
• Big Data Workloads
• Methodology and Metrics
→ Measurements and results
• Conclusion and outlook

General Characterization
• NoSQL indexing and Sort have the highest footprints.
• Hadoop compression reduces footprints and improves execution time.

General Characterization
• Sort has the highest cache miss rates (it transforms a large data volume from one representation into another).
• Compression helps reduce LLC misses.

General Characterization
• The workloads have high peak bandwidths.
• Sort has a ~10x larger footprint than WordCount but a lower DTLB MPKI: memory references are not well contained within page granularities and are widespread.

Cache Line Working Set Characterization
• The hottest 100 MB contains 20% of all references.
• NoSQL indexing has the most spread among its cache lines.
• Sort has 60% of the references to its 120 GB footprint contained within 1 GB.

Cache Simulation
The percentage of cache hits is higher than the percentage of references from the footprint analysis → Big Data workloads operate on smaller memory regions at a time.

Entropy (metric from [Shao et al. 2013])
Big Data workloads have higher entropy (> 13) than SPEC workloads (> 7) → they are less compressible and less predictable.

Normalized Correlation
• Hive aggregation has high correlation magnitudes (+/−).
• Correlation is generally higher with the prefetchers enabled.
Potential for effective prediction and prefetching schemes for workloads like Hive aggregation.

Takeaways & Next Steps
• Big Data workloads are memory intensive.
• Latency-hiding techniques have potential to be successful, given the observed cacheability and predictability.
• A large 4th-level cache can benefit big data workloads.
Future work:
• Including more workloads in the study
• Scaling dataset sizes, etc.
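To illustrate the entropy metric from the methodology slides, here is a short Python sketch that computes the normalized Shannon entropy of a cache-line reference distribution. The two distributions below are assumptions chosen to be consistent with cases (A) and (C) of the entropy example (uniform references, and one line absorbing 91 of the 100 references); the deck does not give the actual distributions.

```python
import math

def normalized_entropy(ref_counts):
    """Shannon entropy of a cache-line reference distribution,
    normalized by log2(number of lines) so the result lies in [0, 1]."""
    total = sum(ref_counts)
    probs = [c / total for c in ref_counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return entropy / math.log2(len(ref_counts))

# Hypothetical distributions over 10 cache lines (640 B footprint, 100 references):
uniform = [10] * 10        # matches case (A): entropy = 1.0
skewed = [91] + [1] * 9    # consistent with case (C): entropy ~ 0.217

print(normalized_entropy(uniform))  # 1.0
print(normalized_entropy(skewed))   # ~0.217
```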
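To illustrate the correlation-based trend analysis from the methodology slides, here is a sketch that correlates an address trace with a test function. The exact form of f is not given in the deck, so the sawtooth below (stride s = 64 bytes over l = 1000 cache lines) is an assumption that mimics the described ascending stride; statistics.correlation requires Python 3.10+.

```python
import statistics

def trend_correlation(addresses, s=64, l=1000):
    """Pearson correlation between an address trace and an assumed sawtooth
    test function f(i) = s * (i mod l), which ramps through l cache lines
    with a stride of s bytes."""
    f = [s * (i % l) for i in range(len(addresses))]
    return statistics.correlation(list(addresses), f)

# Correlation close to +1: strong ascending trend (candidate for prefetching)
# Correlation close to -1: strong descending trend
# Correlation near 0: no linear trend over the window
```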