MonetDB: A column-oriented DBMS
Ryan Johnson, CSC2531

The memory wall has arrived
• CPU performance: +70%/year
• Memory performance
  – Latency: -50%/decade
  – Bandwidth: +20%/year (est.)
• Why?
  – DRAM focus on capacity (+70%/year)
  – Physical limitations (pin counts, etc.)
  – Assumption that caches "solve" the latency problem
=> A DBMS spends 95% of its time waiting for memory

The problem: data layouts
• Logical layout: a 2-D relation => unrealizable in a linear address space!
• N-ary storage model (NSM), aka "slotted pages"
  – Easy row updates, but strided access to columns
  => Low cache locality for read-intensive workloads
• "NSM layouts considered harmful"

Coping with The Wall
• Innovation: decompose all data vertically
  – Columns stored separately, rejoined at runtime
• Binary Association Table (BAT) replaces the Relation
  – List of (recordID, columnValue) pairs
  – Compression and other tricks => ~1 byte/entry
• BATs + clever algorithms => cache locality => Winner!

Exploring deeper
• Performance study (motivation)
• Physical data layouts
• Cache-optimized algorithms
• Evaluating MonetDB performance
• Implications and lingering questions

NSM: access latency over time
[Graph: latency to read one column; record size varies along the x-axis]
• Latency increases ~10x as accesses per cache line approach 1
  (slope changes at the L1/L2 line size)

Efficient physical BAT layout
• Idea #1: "virtual OID"
  – Optimizes the common case: dense, monotonic OIDs
  – All BATs sorted by OID
  – Joining two BATs on OID has O(n) cost!
  – How to handle gaps?
• Idea #2: compression
  – Exploits small domains
  – Boosts cache locality and effective memory bandwidth
  – Out-of-band values? Can't we compress NSM too?

Cache-friendly hash join
• Hash partitioning: one pass, but thrashes L1/L2
  – #clusters > #cache lines
• Recall: CPU cycles are cheap compared to memory accesses
• Radix partitioning: limit the number of active partitions by making more passes
• Great, but how well does it work?
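The two ideas above, virtual-OID rejoining and radix partitioning, can be sketched in a few lines. Below is a minimal Python illustration of the multi-pass radix-partitioned hash join: each pass fans a cluster out into at most 2**bits_per_pass sub-clusters, so the number of actively written partitions per pass stays small enough to fit in cache. This is a sketch under stated assumptions, not MonetDB's C implementation; the function names and the bit-width parameters are illustrative choices.

```python
def radix_partition(pairs, total_bits=6, bits_per_pass=3):
    """Partition (oid, value) pairs on the low-order hash bits of the value.

    Done in multiple passes so that each pass writes to at most
    2**bits_per_pass active clusters (the cache-friendliness trick).
    """
    clusters = [list(pairs)]
    shift = 0
    while shift < total_bits:
        b = min(bits_per_pass, total_bits - shift)
        mask = (1 << b) - 1
        next_clusters = []
        for cluster in clusters:
            buckets = [[] for _ in range(1 << b)]
            for oid, val in cluster:
                buckets[(hash(val) >> shift) & mask].append((oid, val))
            next_clusters.extend(buckets)
        clusters = next_clusters
        shift += b
    return clusters

def radix_join(r, s, total_bits=6, bits_per_pass=3):
    """Hash-join two BAT-like (oid, value) lists cluster by cluster.

    Matching values land in the same cluster index on both sides,
    so each per-cluster hash table stays small (cache-resident).
    """
    out = []
    for r_cl, s_cl in zip(radix_partition(r, total_bits, bits_per_pass),
                          radix_partition(s, total_bits, bits_per_pass)):
        table = {}
        for oid, val in r_cl:
            table.setdefault(val, []).append(oid)
        for s_oid, val in s_cl:
            for r_oid in table.get(val, []):
                out.append((r_oid, s_oid, val))
    return out
```

Joining r = [(0, 5), (1, 7), (2, 5)] with s = [(0, 5), (1, 9)] yields the two matches on value 5. The point of the multi-pass structure is that a single-pass partition into 2**total_bits clusters would touch more output locations than there are cache lines, exactly the thrashing the slide warns about.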
• Three metrics of interest
  – L1/L2 misses (= suffer the latency of a memory access)
  – TLB misses (even more expensive than a cache miss)
  – Query throughput (higher is better)
• Should be able to explain throughput using the other metrics
  – The given model makes very good predictions
  => Memory really is (and remains!) the bottleneck

A few graphs
• Radix-clustering behavior as cardinality varies
• Radix-clustered hash join vs. other algorithms
• Big win: stability as cardinalities vary

Implications and discussion points
• Cache-friendliness really matters (even with I/O)
  – Traditional DBMSs are memory-bound
• Vertically decomposed data: superior density
  – Data is brought into cache only if actually needed
  – Compression gives a further density boost
• Questions to consider...
  – Queries accessing many columns?
  – What about inserts/updates (touch many BATs)?
  – What about deletes/inserts (bad for compression)?
  – How to build a good query optimizer?
  – Performance of transactional workloads?
    • Update-intensive, concurrency control, ...
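The density argument above rests on the virtual-OID idea: when OIDs are dense and monotonic, a BAT need not store them at all, because the OID is simply the array index. Reconstructing a row from several columns is then a positional zip with O(n) cost and no key comparisons. The class and function names below are illustrative assumptions for a minimal Python sketch, not MonetDB's actual API.

```python
class BAT:
    """A Binary Association Table with dense "virtual" OIDs.

    Only the value column is materialized; the value at index i
    implicitly belongs to OID i, so no OID column is stored.
    """
    def __init__(self, values):
        self.values = list(values)

    def lookup(self, oid):
        # Virtual OID => point lookup is just an array index.
        return self.values[oid]

def rejoin(bat_a, bat_b):
    """Reconstruct (oid, a, b) rows from two columns of the same relation.

    Because both BATs are implicitly sorted by the same dense OIDs,
    this positional join is O(n) with perfectly sequential access.
    """
    return [(oid, a, b)
            for oid, (a, b) in enumerate(zip(bat_a.values, bat_b.values))]
```

For example, rejoining BAT(["ann", "bob"]) with BAT([31, 45]) yields [(0, "ann", 31), (1, "bob", 45)]. This also makes the discussion questions concrete: a query touching k columns performs k such positional scans, and a deleted row breaks the density that makes the OID virtual in the first place.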