HYRISE – A Main Memory Hybrid Storage Engine By: Martin Grund, Jens Krüger, Hasso Plattner, Alexander Zeier, Philippe Cudre-Mauroux, Samuel Madden, VLDB 2011 Presented by: Natasha Prokoshyna 1 Separation between transaction processing and analytics systems OLAP bulk updates and large sequential scans spanning many rows prefer narrow, vertical partitions prefer pure columnar systems (eg. MonetDB/X100) OLTP reads/writes to a few rows at a time (select, update, delete) prefer wider, horizontal partitions prefer all-row design 2 Need for ‘real-time analytics’ Need up-to-the minute reporting on business processes, ‘real-time analytics’ Available-to-promise (ATP) applications Existing databases are not optimized for mixed query workloads 3 HYRISE – A hybrid system Automatically partitions tables into vertical partitions of varying widths depending on how the columns of the table are accessed Finds partitioning that minimizes the number of cache misses for a given workload Contributions of partial projections, data alignment and query plans Scales to tables with a large number of columns 4 Architecture Receives user queries, creates a physical query plan, executes the query plan by calling Storage Manager Analyzes query workload and suggests best possible partitioning to the Storage Manager Creates and maintains the hybrid containers storing the data 5 Storage Manager Each partition is a container For relation r with eight attributes a1...a8, stored in three containers: A relation is a collection of disjoint vertical partitions of different widths Each attribute is mapped to only one container Container provides methods to access values it holds 6 Query Processor Implements selection, projection (where clause), sorting and group by operators Support for joins includes hash and nested loops join Supports both early and late materialization for most operators Cache misses are highly correlated with and are a good predictor of total CPU cycles in database access methods 7 Layout Manager Candidate Generation Candidate Merging For each relation R, recursively split the set of attributes into two subsets: accessed attributes and ignored attributes Determine all primary partitions[1] for all participating tables A set of primary partitions each contains a set of attributes that are always accessed together Inspect permutations to generate additional candidates, assuming: Merging two primary partitions advantageous for wide, random access to attributes Candidate merging has additional access overhead unless both primary partitions are perfectly aligned to cache lines Discard candidate if cost of merging primary partitions is equal to or greater than the sum of individual costs, otherwise add merged candidate to candidate set Layout Generation Generate the set of all valid layouts by exploring all possible combinations of the partitions Evaluate the cost of each layout to find the physical layout with the lowest cost [1] primary partition – largest partition that does not incur any container overhead cost 8 Hybrid Model Cost of the above algorithm is high for complex workloads Approximate algorithm: Cluster the primary partitions that are often co-accessed together Generate optimal sub-layouts for each cluster of primary partitions Combine the optimal sub-layouts Improves on the all-column layout: 1.6x faster on OLTP queries, identical performance on analytical queries Improves on all-row layouts: 4x less CPU cycles Hill-Climb Data Morphing algorithm (Hankins and Patel, 2003) performed worse on a simplified version of the benchmark: 60% worse on cache misses, and 16% worse on CPU cycles 9 Experimental Results For a table with 500 4-byte attributes, starting with a single OLTP query and iteratively given an increasing number of OLAP queries, the number of partitions converges towards “all-column” layout. 10 Benchmarks All-row (R) vs. all-column (C) vs. hybrid system (H) : Intel E5450 quad-core CPU with 32KB per core L1 data and instruction cache (8-way associative, 64 byte cache lines), a shared 6MB L2 cache (24-way associative, 64 byte cache lines), and 64 GB of PC2 5300 CL2 RAM 11 Outstanding Issues Lacks support for transactions and recovery Query execution is single-threaded and handles one operator at a time only orthogonal to the question of which physical design will perform best for a given workload Support parallel execution for multi-core processors is a work in progress Container overhead cost: loading attributes into cache that are not used by the projection (cointainer_width – num_bytes for full scan) Approximate algorithm finds sub-optimal layouts when a complex combination of partitions How does HYRISE compare to other existing hybrid systems (eg. Oracle Exadata 2) 12 Thank you! 13