C-Store: A Column-oriented DBMS
By the New England Database Group, M.I.T

Current DBMS Gold Standard
- Store the fields of one record contiguously on disk
- Use B-tree indexing
- Use small (e.g. 4K) disk blocks
- Align fields on byte or word boundaries
- Conventional (row-oriented) query optimizer and executor (technology from 1979)
- ARIES-style transactions

Terminology: "Row Store"
[Figure: Record 1, Record 2, Record 3 and Record 4 stored one after another, row by row]
- E.g. DB2, Oracle, Sybase, SQLServer, …

Row Stores are Write Optimized
- Can insert and delete a record in one physical write
- Good for on-line transaction processing (OLTP)
- But not for read-mostly applications:
  - Data warehouses
  - CRM

Elephants Have Extended Row Stores With
- Bitmap indices
- Better sequential read
- Integration of "datacube" products
- Materialized views
- But there may be a better idea…

Column Stores
[Figure: column-store layout]

At 100K Feet…
- Ad-hoc queries read 2 columns out of 20
- In a very large warehouse, the fact table is rarely clustered correctly
- So a column store reads 10% of what a row store reads

C-Store (Column Store) Project
- Brandeis/Brown/MIT/UMass-Boston project
- Usual suspects participating
- Enough coded to get performance numbers for some queries
- Complete status later

We Build on Previous Pioneering Work…
- Sybase IQ (early '90s)
- Monet (see CIDR '05 for the most recent description)

C-Store Technical Ideas
- Code the columns to save space
- No alignment
- Big disk blocks
- Only materialized views (perhaps many)
- Focus on sorting, not indexing
- Automatic physical DBMS design

C-Store (Column Store) Technical Ideas, continued
- Optimize for grid computing
- Innovative redundancy
- Xacts, but no need for Mohan (i.e. no ARIES-style recovery)
- Data ordered on anything, not just time
- Column optimizer and executor

How to Evaluate This Paper…
- None of the ideas in isolation merits publication
- Judge the complete system by its (hopefully intelligent) choice of:
  - A small collection of inter-related, powerful ideas
  - That together put performance in a new sandbox

Code the Columns
- Work hard to shrink space
- Use the extra space for multiple orders
- Fundamentally easier than in a row store
- E.g. RLE works well

No Alignment
- Densepack columns
- E.g. a 5-bit field takes 5 bits
- CPU speed is going up faster than disk bandwidth
- Faster to shift data in the CPU than to waste disk bandwidth

Big Disk Blocks
- Tunable
- Big (minimum size is 64K)

Only Materialized Views
- A projection (materialized view) is some number of columns from a fact table
- Plus columns in a dimension table, with a 1-n join between the fact and dimension tables
- Stored in order of a storage key(s)
- Several may be stored!
- With a permutation, if necessary, to map between them

Only Materialized Views, continued
- The table (as the user specified it and sees it) is not stored!
- No secondary indexes (they are a one-column sorted MV plus a permutation, if you really want one)

Example
- User view: EMP (name, age, salary, dept), Dept (dname, floor)
- Possible set of MVs:
  - MV-1 (name, dept, floor) in floor order
  - MV-2 (salary, age) in age order
  - MV-3 (dname, salary, name) in salary order

Different Indexing
- Sequential, few values: RLE encoded; conventional B-tree at the value level
- Sequential, many values: delta encoded; conventional B-tree at the block level
- Non-sequential, few values: bitmap per value
- Non-sequential, many values: conventional (gzip); conventional B-tree at the block level

Automatic Physical DBMS Design
- Not enough 4-star wizards to go around
- Accept a "training set" of queries and a space budget
- Choose the MVs auto-magically
- Re-optimize periodically based on a log of the interactions

Optimize for Grid Computing
- I.e.
  shared-nothing
- DeWitt (Gamma) was right
- Horizontal partitioning and intra-query parallelism, as in Gamma

Innovative Redundancy
- Hardly any warehouse is recovered by a redo from the log
- Takes too long!
- Store enough MVs at enough places to ensure K-safety
- Rebuild dead objects from elsewhere in the network
- K-safety is a DBMS-design problem!

XACTS: No Mohan
- Undo from a log (that does not need to be persistent)
- Redo by rebuild from elsewhere in the network

XACTS: No Mohan, continued
- Snapshot isolation (run queries as of a tunable time in the recent past) to solve read-write conflicts
- Distributed Xacts without a prepare message (no two-phase commit)

Storage (Sort) Key(s) Is Not Necessarily Time
- That would be too limiting
- So how do we do fast updates to densepack column storage that is not in entry sequence?

Solution: a Hybrid Store
- Write-optimized column store (much like Monet)
- Tuple mover (batch rebuilder)
- Read-optimized column store (what we have been talking about so far)

Column Executor
- Column operations, not row operations
- Columns remain coded, if possible
- Late materialization of columns

Column Optimizer
- Chooses the MVs on which to run the query (the most important task)
- Built for snowflake schemas, which are simple to optimize without exhaustive search
- Looking at extensions

Current Performance
- 100X faster than a popular row store, in 40% of the space
- 10X faster than a popular column store, in 70% of the space
- 7X faster than a popular row store, in 1/6th of the space
- Code available with a BSD license

Structure Going Forward
- Vertica
  - Very well financed start-up to commercialize C-Store
  - Doing the heavy lifting
- University research
  - Funded by Vertica

Vertica
- Complete alpha system in December '05
- Everything, including the DBMS designer
- With current performance!
- Looking for early customers to work with (see me if you are interested)

University Research
- Extension of algorithms to non-snowflake schemas
- Study of L2 cache performance
- Study of coding strategies
- Study of executor options
- Study of recovery tactics
- Non-cursor interface
- Study of optimizer primitives
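The "Code the Columns", "No Alignment", "Different Indexing" and "Column Executor" slides describe column coding without showing it. The Python below is a minimal, hypothetical sketch of three of those ideas, not C-Store code: run-length encoding a sorted column as (value, first position, run length) triples, answering an equality count directly on the coded runs without decoding rows, and densepacking small integers into exactly 5 bits each. The function names, the triple layout and the sample data are assumptions made for illustration.

```python
from itertools import groupby
from typing import List, Tuple

# Hypothetical RLE triple (value, first_position, run_length); an illustrative
# layout, not C-Store's actual storage format.
Run = Tuple[int, int, int]


def rle_encode(column: List[int]) -> List[Run]:
    """Run-length encode a column; compresses best when the column is sorted."""
    runs: List[Run] = []
    pos = 0
    for value, group in groupby(column):
        length = sum(1 for _ in group)
        runs.append((value, pos, length))
        pos += length
    return runs


def rle_count_equal(runs: List[Run], target: int) -> int:
    """Count rows equal to target by summing run lengths, never decoding rows."""
    return sum(length for value, _, length in runs if value == target)


def densepack(values: List[int], bits: int) -> bytes:
    """Pack unsigned ints into exactly `bits` bits each, ignoring byte boundaries."""
    acc = 0
    for i, v in enumerate(values):
        if not 0 <= v < (1 << bits):
            raise ValueError(f"{v} does not fit in {bits} bits")
        acc |= v << (i * bits)
    return acc.to_bytes((len(values) * bits + 7) // 8, "little")


def unpack(data: bytes, bits: int, count: int) -> List[int]:
    """Recover `count` values by shifting and masking in the CPU."""
    acc = int.from_bytes(data, "little")
    mask = (1 << bits) - 1
    return [(acc >> (i * bits)) & mask for i in range(count)]


if __name__ == "__main__":
    # Hypothetical floor column of MV-1, which is stored in floor order.
    floor = [1, 1, 1, 2, 2, 3, 3, 3, 3]
    runs = rle_encode(floor)
    print(runs)                      # [(1, 0, 3), (2, 3, 2), (3, 5, 4)]
    print(rle_count_equal(runs, 3))  # 4

    # Four hypothetical 5-bit codes fit in 3 bytes instead of 4 byte-aligned bytes.
    codes = [17, 31, 5, 22]
    packed = densepack(codes, 5)
    print(len(packed))               # 3
    print(unpack(packed, 5, 4))      # [17, 31, 5, 22]
```

A Python integer serves as the bit buffer here only for brevity; a real column store would pack into fixed-size blocks and decode with word-level shifts and masks, which is the CPU-for-disk-bandwidth trade the "No Alignment" slide argues for.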