Memory System Characterization
of Big Data Workloads
Martin Dimitrov, Karthik Kumar, Patrick Lu,
Vish Viswanathan, Thomas Willhalm
Agenda
• Why big data memory characterization?
• Workloads, Methodology and Metrics
• Measurements and results
• Conclusion and outlook
Why big data memory characterization?
• Studies project continued exponential data growth
• Big Data: extracting information from unstructured data
• Primary technologies are Hadoop and NoSQL
Why big data memory characterization?
• Power: memory consumes up to 40% of total server power
• Performance: memory latency, capacity, and bandwidth are important
Large data volumes can put pressure on the memory subsystem
Optimizations trade off CPU cycles to reduce the load on memory, e.g. compression
Important to understand the memory usage of big data
Why big data memory characterization?
• DRAM scaling is hitting limits
• Emerging memories have higher latency
• Focus on latency-hiding optimizations
How do latency-hiding optimizations apply to big data workloads?
Executive Summary
• Provide insight into memory access
characteristics of big data applications
• Examine implications for prefetchability, compressibility, and cacheability
• Understand impact on memory
architectures for big data usage models
Agenda
• Why big data memory characterization?
• Workloads, Methodology and Metrics
• Measurements and results
• Conclusion and outlook
Big Data workloads
• Sort
• WordCount
• Hive Join
• Hive Aggregation
• NoSQL indexing
We analyze these workloads using hardware DIMM
traces, performance counter monitoring, and
performance measurements
General Characterization
Memory footprint from DIMM trace:
• Memory in GB touched at least once by the application
• Amount of memory needed to keep the workload "in memory"
EMON:
• CPI
• Cache behavior: L1, L2, LLC MPI
• Instruction and data TLB MPI
Understand how the workloads use memory (a footprint sketch follows below)
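A minimal sketch of the footprint calculation, assuming the DIMM trace has already been decoded into one physical address per line of a text file (the file name and format are illustrative, not from the study):

```python
# Sketch: memory footprint from a decoded address trace.
# Assumption: "trace.txt" holds one hexadecimal physical address per line;
# the actual study uses hardware DIMM traces with their own decoder.

CACHE_LINE = 64  # bytes

def memory_footprint_gb(trace_path):
    """GB of memory touched at least once, at cache-line granularity."""
    lines_touched = set()
    with open(trace_path) as f:
        for line in f:
            if line.strip():
                addr = int(line, 16)
                lines_touched.add(addr // CACHE_LINE)
    return len(lines_touched) * CACHE_LINE / 2**30

print(f"footprint: {memory_footprint_gb('trace.txt'):.2f} GB")
```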
Cache Line Working Set Characterization
1. For each cache line, compute the number of times it is referenced
2. Sort cache lines by their number of references
3. Select a footprint size, say X MB
4. What fraction of total references is contained in the X MB of hottest cache lines?
Identifies the hot working set of the application (see the sketch below)
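A minimal sketch of steps 1 to 4, assuming per-cache-line reference counts have already been extracted from the DIMM trace (names and the toy data are illustrative):

```python
from collections import Counter

CACHE_LINE = 64  # bytes

def hot_working_set_fraction(line_refs, footprint_mb):
    """Fraction of all references contained in the hottest footprint_mb MB."""
    total_refs = sum(line_refs.values())
    budget = int(footprint_mb * 2**20) // CACHE_LINE     # lines that fit in X MB
    hottest = sorted(line_refs.values(), reverse=True)[:budget]
    return sum(hottest) / total_refs

# Toy example: line_refs maps cache-line id -> reference count
toy_trace = [0x10, 0x10, 0x10, 0x11, 0x12, 0x10, 0x11, 0x13]
print(hot_working_set_fraction(Counter(toy_trace), footprint_mb=1))  # -> 1.0, toy set fits in 1 MB
```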
Cache Simulation
Run the workload trace through an LRU cache simulator and vary the cache size
Considers the temporal nature of accesses, not only the spatial footprint:
• Streaming through regions larger than the cache size
• Eviction and replacement policies impact cacheability
• Focus on smaller sub-regions
Hit rates indicate the potential for cacheability in a tiered memory architecture (a minimal simulator sketch follows below)
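A minimal fully-associative LRU simulator sketch in the spirit of this step; the trace and cache sizes are illustrative, and a production simulator would also model sets, ways, and replacement policies:

```python
from collections import OrderedDict

CACHE_LINE = 64  # bytes

def lru_hit_rate(addresses, cache_bytes):
    """Hit rate of a fully-associative LRU cache over an address trace."""
    capacity = cache_bytes // CACHE_LINE
    cache = OrderedDict()                      # cache-line id, ordered by recency
    hits = 0
    for addr in addresses:
        line = addr // CACHE_LINE
        if line in cache:
            hits += 1
            cache.move_to_end(line)            # most recently used
        else:
            cache[line] = None
            if len(cache) > capacity:
                cache.popitem(last=False)      # evict least recently used
    return hits / len(addresses)

# Streaming through a ~64 KB region: hit rate stays near zero until the
# whole region fits in the simulated cache, then jumps.
trace = [i * CACHE_LINE for i in range(1000)] * 10
for size in (4 << 10, 16 << 10, 64 << 10):
    print(size, lru_hit_rate(trace, size))
```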
Entropy
• Compressibility and predictability are important
• A signal with high information content is harder to compress and more difficult to predict
• Entropy helps understand this behavior. For a set of cache lines K:
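The formula itself did not survive extraction; a standard formulation consistent with the worked example on the next slide (values between 0 and 1, with 1 for a uniform reference distribution) is the normalized Shannon entropy of the cache-line reference distribution. The exact form used in the study may differ:

```latex
H(K) = -\sum_{k \in K} p_k \log p_k , \qquad
p_k = \frac{\text{references to cache line } k}{\text{total references}} , \qquad
H_{\mathrm{norm}}(K) = \frac{H(K)}{\log |K|}
```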
Lower entropy → more compressibility, predictability
Entropy - example
Three access patterns (A), (B), (C), each with the same footprint (640 B), total references (100), and average references per line (10), but different cache hit rates and entropies:

(A)  64-byte cache: 10%   192-byte cache: 30%   Entropy: 1
(B)  64-byte cache: 19%   192-byte cache: 57%   Entropy: 0.785
(C)  64-byte cache: 91%   192-byte cache: 93%   Entropy: 0.217

Lower entropy → more compressibility, predictability (see the calculation sketch below)
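A small sketch of the normalized-entropy calculation sketched above, checked against pattern (A), where 10 cache lines receive 10 references each; patterns (B) and (C) would use their skewed reference counts:

```python
import math

def normalized_entropy(ref_counts):
    """Shannon entropy of the reference distribution, normalized to [0, 1]."""
    total = sum(ref_counts)
    probs = [c / total for c in ref_counts if c > 0]
    if len(probs) < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs)
    return h / math.log(len(probs))

# Pattern (A): 640 B footprint = 10 cache lines, 100 references spread uniformly
print(normalized_entropy([10] * 10))   # -> 1.0 (up to floating-point rounding)
```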
Correlation and Trend Analysis
Examine the trace for trends
E.g., an increasing trend in upper physical address ranges → aggressively prefetch to an upper cache
• With s = 64 and l = 1000, the test function f mimics an ascending stride through a memory region of 1000 cache lines
• Negative correlation with f indicates a decreasing trend
High correlation → strong trend → predict, prefetch (a correlation sketch follows below)
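A sketch of the trend test, assuming f is a sawtooth that ascends one cache line (s = 64 bytes) per reference and wraps after l = 1000 lines; the exact definition of f in the study may differ, so treat the shape of f as an assumption:

```python
import numpy as np

S = 64      # cache-line stride in bytes (s = 64 on the slide)
L = 1000    # length of the ascending pattern in cache lines (l = 1000)

def trend_correlation(addresses):
    """Pearson correlation between an observed address trace and the
    ascending-stride test function f over the same number of references."""
    x = np.asarray(addresses, dtype=float)
    f = (np.arange(len(x)) % L) * S        # assumed sawtooth form of f
    return float(np.corrcoef(x, f)[0, 1])

# A trace that strides upward through a 1000-line region tracks f closely
ascending = [(i % L) * S for i in range(5000)]
print(trend_correlation(ascending))        # close to +1: strong increasing trend
```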
Agenda
• Why big data memory characterization?
• Big Data Workloads
• Methodology and Metrics
• Measurements and results
• Conclusion and outlook
General Characterization
• NoSQL and Sort have the highest footprints
• Hadoop compression reduces footprints and improves execution time
General Characterization
• Sort has the highest cache miss rates (it transforms a large data volume from one representation to another)
• Compression helps reduce LLC misses
General Characterization
• Workloads have high peak bandwidths
• Sort has a ~10x larger footprint than WordCount but a lower DTLB MPKI: WordCount's memory references are not well contained within page granularities and are widespread
Cache Line Working Set Characterization
• The hottest 100 MB contains 20% of all references
• NoSQL has the most spread among its cache lines
• Sort: 60% of the references to its 120 GB footprint fall within 1 GB
Cache Simulation
The percentage of cache hits is higher than the percentage of references from the footprint analysis
Big Data workloads operate on smaller memory regions at a time
Entropy
(entropy metric from [Shao et al. 2013])
Big Data workloads have higher entropy (>13) than SPEC workloads (>7):
they are less compressible and less predictable
Normalized Correlation
• Hive aggregation has high correlation magnitudes (both positive and negative)
• Enabling prefetchers yields higher correlation in general
Potential for effective prediction and prefetching schemes for
workloads like Hive aggregation
Takeaways & Next Steps
• Big Data workloads are memory intensive
• Potential for latency-hiding techniques that exploit cacheability and predictability to be successful
• A large 4th-level cache can benefit big data workloads
• Future work
• Including more workloads in the study
• Scaling dataset sizes, etc.