Database Architecture Optimized for the New Bottleneck: Memory Access

Peter Boncz
Data Distilleries B.V., Amsterdam, The Netherlands
P.Boncz@ddi.nl

Stefan Manegold, Martin Kersten
CWI, Amsterdam, The Netherlands
{S.Manegold,M.Kersten}@cwi.nl

Contents
• How Memory Access works
• Simple Scan Experiment
• Consequences for DBMS
  – Data Structures: vertical decomposition
  – Algorithms: tune random memory access
• Partitioned Join Algorithms
  – Monet Experiments
  – Accurate Cost Models
• Conclusion

CPU Speed vs. Memory Speed
Moore’s Law: CPU speed doubles every 3 years

Memory Access in Hierarchical Systems

Simple Scan Experiment

Consequences for DBMS
• Memory access is a bottleneck
• Prevent cache & TLB misses
• Cache lines must be used fully
• DBMS must optimize
  – Data structures
  – Algorithms (focus: join)

Vertical Decomposition in Monet

Partitioned Joins
• Cluster both input relations
• Create clusters that fit in the memory cache
• Join matching clusters
• Two algorithms:
  – Partitioned hash-join
  – Radix-join (partitioned nested-loop)

Partitioned Joins: Straightforward Clustering
• Problem: number of clusters exceeds number of
  – TLB entries ==> TLB thrashing
  – Cache lines ==> cache thrashing
• Solution: multi-pass radix-cluster

Partitioned Joins: Multi-Pass Radix-Cluster
• Multiple clustering passes
• Limit number of clusters per pass
• Avoid cache/TLB thrashing
• Trade memory cost for CPU cost
• Any data type (hashing)

Monet Experiments: Setup
• Platform:
  – SGI Origin2000 (MIPS R10000, 250 MHz)
• System:
  – Monet DBMS
• Data sets:
  – Integer join columns
  – Join hit-rate of 1
  – Cardinalities: 15,625 - 64,000,000
• Hardware event counters
  – to analyze cache & TLB misses

Monet Experiments: Radix-Cluster (64,000,000 tuples)

Accurate Cost Modeling: Radix-Cluster

Monet Experiments: Partitioned Hash-Join

Monet Experiments: Radix-Join

Monet Experiments: Overall Performance (64,000,000 tuples)

Conclusion
• Problem:
  – Memory access is increasingly the most important bottleneck for database performance
• Solutions:
  – Vertical decomposition improves column-wise data access
  – Radix-algorithms optimize join performance
• General:
  – Algorithms can be tuned to achieve optimal memory access
  – Detailed and accurate estimation of memory cost is possible

Monet homepage: www.cwi.nl/~monet
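
Sketch: Multi-Pass Radix-Cluster and Partitioned Hash-Join

The partitioned-join slides describe clustering both inputs in several passes, with a bounded number of clusters per pass, and then joining matching clusters. The following is a minimal Python sketch of that logic only, not Monet's implementation. The function names, the (key, payload) tuple layout, and the use of Python's built-in hash() are assumptions made for illustration. Python lists do not reproduce the cache and TLB behavior the slides are concerned with, so this shows the partitioning scheme, not its performance.

```python
from collections import defaultdict


def radix_cluster(tuples, bits_per_pass):
    """Cluster (key, payload) tuples on the low sum(bits_per_pass) bits of hash(key).

    Each entry of bits_per_pass is one pass; a pass splits every existing
    cluster into 2**bits sub-clusters, so the fan-out per pass stays bounded.
    Passes consume the radix bits from the most significant end, so the final
    cluster at index i holds exactly the tuples whose radix value equals i.
    """
    shift = sum(bits_per_pass)
    clusters = [list(tuples)]
    for bits in bits_per_pass:
        shift -= bits
        mask = (1 << bits) - 1
        refined = []
        for cluster in clusters:
            buckets = [[] for _ in range(1 << bits)]
            for key, payload in cluster:
                buckets[(hash(key) >> shift) & mask].append((key, payload))
            refined.extend(buckets)
        clusters = refined
    return clusters


def partitioned_hash_join(left, right, bits_per_pass):
    """Join two relations by clustering both and hash-joining matching clusters."""
    result = []
    left_clusters = radix_cluster(left, bits_per_pass)
    right_clusters = radix_cluster(right, bits_per_pass)
    for l_cluster, r_cluster in zip(left_clusters, right_clusters):
        # Build a small hash table per cluster; in a real system the cluster
        # size would be chosen so that this table fits in the cache.
        table = defaultdict(list)
        for key, payload in l_cluster:
            table[key].append(payload)
        for key, payload in r_cluster:
            for l_payload in table.get(key, ()):
                result.append((key, l_payload, payload))
    return result


# Hypothetical example data: integer join columns with a join hit-rate of 1.
left = [(k, "L%d" % k) for k in range(16)]
right = [(k, "R%d" % k) for k in reversed(range(16))]
joined = partitioned_hash_join(left, right, bits_per_pass=[2, 1])
```

Here bits_per_pass=[2, 1] produces 8 clusters in two passes (fan-out 4, then 2) rather than 8 in a single pass. The per-pass fan-out is the quantity the slides propose to keep below the number of TLB entries and cache lines.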