C-Store - VLDB 2005

C-Store: A Column-oriented DBMS
By the New England Database Group
Current DBMS Gold Standard
Store fields in one record contiguously on disk
Use B-tree indexing
Use small (e.g., 4K) disk blocks
Align fields on byte or word boundaries
Conventional (row-oriented) query optimizer and executor (technology from 1979)
ARIES-style transactions
M.I.T
Terminology: “Row Store”
(Figure: Record 1, Record 2, Record 3, Record 4)
E.g., DB2, Oracle, Sybase, SQL Server, …
Row Stores Are Write-Optimized
Can insert and delete a record in one physical write
Good for on-line transaction processing (OLTP)
But not for read-mostly applications
  - Data warehouses
  - CRM
Elephants Have Extended Row Stores
With bitmap indices
Better sequential read
Integration of “datacube” products
Materialized views
But there may be a better idea…
Column Stores
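As a minimal sketch of the difference between the two layouts, the Python below lays out a hypothetical four-record table once record by record and once column by column. The table and its fields are illustrative assumptions, not C-Store's on-disk format.

```python
# Hypothetical 4-record table (name, age, salary), for illustration only.
records = [
    ("alice", 30, 5000),
    ("bob", 25, 4000),
    ("carol", 41, 7000),
    ("dave", 35, 5200),
]

def row_layout(rows):
    """Row store: the fields of each record sit next to each other."""
    return [value for row in rows for value in row]

def column_layout(rows):
    """Column store: the values of each field sit next to each other."""
    return [list(column) for column in zip(*rows)]

print(row_layout(records))
# ['alice', 30, 5000, 'bob', 25, 4000, 'carol', 41, 7000, 'dave', 35, 5200]
print(column_layout(records))
# [['alice', 'bob', 'carol', 'dave'], [30, 25, 41, 35], [5000, 4000, 7000, 5200]]
```

A query that needs only ages scans one contiguous list in the column layout, but must step over every field of every record in the row layout.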
At 100K Feet…
Ad-hoc queries read 2 columns out of 20
In a very large warehouse, the fact table is rarely clustered correctly
A column store reads 10% of what a row store reads
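The 10% figure follows directly from the 2-of-20 ratio. A back-of-the-envelope sketch, assuming equal-width 4-byte columns, no compression, and a full scan in both systems (since the fact table is not clustered for the query); the row count is arbitrary:

```python
def bytes_scanned(num_rows, columns_read, column_width_bytes=4):
    """Bytes touched by a full scan that reads `columns_read` fixed-width columns."""
    return num_rows * columns_read * column_width_bytes

ROWS = 1_000_000
row_store = bytes_scanned(ROWS, columns_read=20)    # whole records come off disk
column_store = bytes_scanned(ROWS, columns_read=2)  # only the 2 columns the query uses
print(column_store / row_store)  # 0.1
```

Compressing the columns would lower the column-store figure further; it is left out here to isolate the effect of the layout alone.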
C-Store (Column Store) Project
Brandeis/Brown/MIT/UMass-Boston project
  - Usual suspects participating
  - Enough coded to get performance numbers for some queries
  - Complete status later
We Build on Previous Pioneering Work…
Sybase IQ (early ’90s)
Monet (see CIDR ’05 for the most recent description)
C-Store Technical Ideas
Code the columns to save space
No alignment
Big disk blocks
Only materialized views (perhaps many)
Focus on sorting, not indexing
Automatic physical DBMS design
C-Store (Column Store) Technical Ideas
Optimize for grid computing
Innovative redundancy
Xacts – but no need for Mohan
Data ordered on anything, not just time
Column optimizer and executor
How to Evaluate This Paper…
None of the ideas in isolation merits publication
Judge the complete system by its (hopefully intelligent) choice of:
  - A small collection of inter-related, powerful ideas
  - That together put performance in a new sandbox
Code the Columns
Work hard to shrink space
  - Use extra space for multiple orders
Fundamentally easier than in a row store
  - E.g., RLE works well
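Run-length encoding pays off because a column holds values of one type side by side, and a sorted column turns them into long runs. A minimal Python sketch; the (value, start position, run length) triple is one common RLE form and is an assumption here, as is the sample column.

```python
from itertools import groupby

def rle_encode(column):
    """Encode a column as (value, start_position, run_length) triples."""
    runs = []
    position = 0
    for value, group in groupby(column):
        length = sum(1 for _ in group)
        runs.append((value, position, length))
        position += length
    return runs

def rle_decode(runs):
    """Expand (value, start_position, run_length) triples back into a column."""
    column = []
    for value, _start, length in runs:
        column.extend([value] * length)
    return column

# Hypothetical sorted "state" column: 9 values collapse into 3 runs.
state = ["CA"] * 4 + ["MA"] * 3 + ["NY"] * 2
runs = rle_encode(state)
print(runs)  # [('CA', 0, 4), ('MA', 4, 3), ('NY', 7, 2)]
```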
No Alignment
Densepack columns
  - E.g., a 5-bit field takes 5 bits
CPU speed is currently going up faster than disk bandwidth
  - Faster to shift data in the CPU than to waste disk bandwidth
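Densepacking can be sketched with shifts and masks: each value occupies exactly `bits` bits, so eight 5-bit values need 40 bits (5 bytes) instead of 8 byte-aligned bytes. The `pack` and `unpack` helpers are hypothetical and use a Python integer as the bit buffer; a real engine would work on machine words.

```python
def pack(values, bits):
    """Densepack unsigned integers using exactly `bits` bits per value."""
    buffer = 0  # Python int used as an unbounded bit buffer (illustration only)
    for i, value in enumerate(values):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
        buffer |= value << (i * bits)  # shift each value into its own bit slot
    num_bytes = (len(values) * bits + 7) // 8  # round up to whole bytes
    return buffer.to_bytes(num_bytes, "little")

def unpack(data, bits, count):
    """Recover `count` values by shifting and masking: CPU work instead of disk bytes."""
    buffer = int.from_bytes(data, "little")
    mask = (1 << bits) - 1
    return [(buffer >> (i * bits)) & mask for i in range(count)]

values = [3, 17, 31, 0, 9, 22, 5, 12]  # hypothetical data; each fits in 5 bits
data = pack(values, bits=5)
print(len(data))                      # 5 (bytes), versus 8 at one byte per value
print(unpack(data, bits=5, count=8))  # [3, 17, 31, 0, 9, 22, 5, 12]
```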
Big Disk Blocks
Tunable
Big (minimum size is 64K)
Only Materialized Views
A projection (materialized view) is some number of columns from a fact table
Plus columns in a dimension table – with a 1-n join between Fact and Dimension table
Stored in order of a storage key(s)
Several may be stored!
With a permutation, if necessary, to map between them
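A permutation between two projections of the same rows can be sketched as follows. The three rows, and the row ids used only to build the mapping, are hypothetical; the projections themselves hold just their column values, each in its own sort order.

```python
# Hypothetical rows (name, age, salary); row ids 0..2 exist only to build the mapping.
rows = [("ann", 41, 700), ("bob", 25, 900), ("cy", 33, 800)]

# Two projections of the same rows, each in its own storage-key order.
by_age = sorted(range(len(rows)), key=lambda r: rows[r][1])     # row ids in age order
by_salary = sorted(range(len(rows)), key=lambda r: rows[r][2])  # row ids in salary order
age_col = [rows[r][1] for r in by_age]        # [25, 33, 41]
salary_col = [rows[r][2] for r in by_salary]  # [700, 800, 900]

# Permutation: position i in the age projection -> position of the same row
# in the salary projection.
pos_in_salary = {row_id: pos for pos, row_id in enumerate(by_salary)}
age_to_salary = [pos_in_salary[row_id] for row_id in by_age]

# Following the permutation re-pairs values that belong to the same row.
pairs = [(age_col[i], salary_col[age_to_salary[i]]) for i in range(len(age_col))]
print(age_to_salary)  # [2, 1, 0]
print(pairs)          # [(25, 900), (33, 800), (41, 700)]
```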
Only Materialized Views
Table (as the user specified it and sees it) is not stored!
No secondary indexes (they are a one-column sorted MV plus a permutation, if you really want one)
Example
User view:
EMP (name, age, salary, dept)
Dept (dname, floor)
Possible set of MVs:
MV-1 (name, dept, floor) in floor order
MV-2 (salary, age) in age order
MV-3 (dname, salary, name) in salary order
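The example can be made concrete by building the three MVs from hypothetical EMP and Dept data. Only `mv1`, `mv2` and `mv3` would be stored; the EMP table as the user sees it is never materialized. The `projection` helper and the sample rows are illustrative assumptions, not C-Store code.

```python
# Hypothetical contents of the user-visible tables.
EMP = [  # (name, age, salary, dept)
    ("ann", 41, 700, "shoe"),
    ("bob", 25, 900, "toy"),
    ("cy", 33, 800, "shoe"),
]
DEPT = {"shoe": 2, "toy": 1}  # dname -> floor

def projection(rows, columns, sort_key):
    """Keep only `columns`, stored column by column in `sort_key` order."""
    ordered = sorted(rows, key=sort_key)
    return {col: [row[col] for row in ordered] for col in columns}

# Each EMP row joined with its one Dept row (the 1-n join from Dept to EMP).
emp = [
    {"name": n, "age": a, "salary": s, "dept": d, "dname": d, "floor": DEPT[d]}
    for n, a, s, d in EMP
]

mv1 = projection(emp, ["name", "dept", "floor"], sort_key=lambda r: r["floor"])
mv2 = projection(emp, ["salary", "age"], sort_key=lambda r: r["age"])
mv3 = projection(emp, ["dname", "salary", "name"], sort_key=lambda r: r["salary"])
print(mv1)  # {'name': ['bob', 'ann', 'cy'], 'dept': ['toy', 'shoe', 'shoe'], 'floor': [1, 2, 2]}
```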
Different Indexing
Sequential, few values: RLE encoded; conventional B-tree at the value level
Sequential, many values: delta encoded; conventional B-tree at the block level
Non-sequential, few values: bitmap per value
Non-sequential, many values: conventional Gzip; conventional B-tree at the block level
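Two of these encodings, delta coding for sequential columns with many values and one bitmap per value for non-sequential columns with few values, are sketched below. The helper names are hypothetical, bitmaps are plain Python lists rather than compressed bit vectors, and the B-tree and Gzip cases are not shown.

```python
from itertools import accumulate

def delta_encode(sorted_column):
    """Sequential, many values: keep the first value, then successive differences."""
    if not sorted_column:
        return []
    return [sorted_column[0]] + [b - a for a, b in zip(sorted_column, sorted_column[1:])]

def delta_decode(deltas):
    """A running sum restores the original sorted values."""
    return list(accumulate(deltas))

def bitmaps_per_value(column):
    """Non-sequential, few values: one 0/1 vector per distinct value."""
    bitmaps = {}
    for position, value in enumerate(column):
        bitmaps.setdefault(value, [0] * len(column))[position] = 1
    return bitmaps

print(delta_encode([1000, 1003, 1004, 1010]))   # [1000, 3, 1, 6]
print(bitmaps_per_value(["M", "F", "F", "M"]))  # {'M': [1, 0, 0, 1], 'F': [0, 1, 1, 0]}
```

Small deltas such as 3, 1 and 6 need far fewer bits than the raw values, which is what makes them worth densepacking.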
Automatic Physical DBMS Design
Not enough 4-star wizards to go around
Accept a “training set” of queries and a space budget
Choose the MVs auto-magically
Re-optimize periodically based on a log of the interactions
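The slide does not say how the designer picks MVs. As one plausible heuristic (an assumption, not C-Store's algorithm), a greedy knapsack-style choice ranks candidate MVs by estimated benefit per MB over the training set and adds them until the space budget runs out. The sizes and benefits below are made up, and the sketch ignores interactions between MVs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CandidateMV:
    name: str
    size_mb: int    # estimated storage cost (assumed > 0)
    benefit: float  # estimated time saved over the training-set queries

def choose_mvs(candidates, space_budget_mb):
    """Greedy heuristic: best benefit per MB first, skip anything that does not fit."""
    chosen, used_mb = [], 0
    for mv in sorted(candidates, key=lambda m: m.benefit / m.size_mb, reverse=True):
        if used_mb + mv.size_mb <= space_budget_mb:
            chosen.append(mv.name)
            used_mb += mv.size_mb
    return chosen, used_mb

candidates = [  # hypothetical estimates
    CandidateMV("MV-1", size_mb=400, benefit=120.0),
    CandidateMV("MV-2", size_mb=100, benefit=80.0),
    CandidateMV("MV-3", size_mb=300, benefit=60.0),
]
print(choose_mvs(candidates, space_budget_mb=500))  # (['MV-2', 'MV-1'], 500)
```

Re-optimizing periodically would then amount to re-running the choice with benefits re-estimated from the logged queries.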
Optimize for Grid Computing
I.e., shared-nothing
  - DeWitt (Gamma) was right
Horizontal partitioning and intra-query parallelism as in Gamma
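Horizontal partitioning with intra-query parallelism can be sketched as hashing rows to nodes, computing a partial aggregate on each node's own partition, and combining the partials. Threads stand in for shared-nothing machines here, and `hash_partition` and `local_sum` are hypothetical helpers.

```python
from concurrent.futures import ThreadPoolExecutor

def hash_partition(rows, key, num_nodes):
    """Horizontal partitioning: each row goes to exactly one node, chosen by hashing `key`."""
    partitions = [[] for _ in range(num_nodes)]
    for row in rows:
        partitions[hash(row[key]) % num_nodes].append(row)
    return partitions

def local_sum(partition, column):
    """Per-node work that touches only that node's own partition."""
    return sum(row[column] for row in partition)

# Hypothetical fact rows.
sales = [{"store": s, "amount": a}
         for s, a in [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50), (6, 60)]]
parts = hash_partition(sales, key="store", num_nodes=3)

# Intra-query parallelism: one query's aggregate runs on all partitions at once,
# then a coordinator adds up the partial results.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_sums = list(pool.map(lambda p: local_sum(p, "amount"), parts))
print(sum(partial_sums))  # 210
```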
Innovative Redundancy
Hardly any warehouse is recovered by a redo from the log
  - Takes too long!
Store enough MVs at enough places to ensure K-safety
Rebuild dead objects from elsewhere in the network
K-safety is a DBMS-design problem!
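One way to state the K-safety check, under a simplifying assumption: the design is K-safe if, for every choice of K failed nodes, each column is still stored by some projection on a surviving node. Real K-safety must also be able to rebuild whole projections; the node names and placement below are hypothetical.

```python
from itertools import combinations

def is_k_safe(placement, all_columns, k):
    """Simplified check: after ANY k nodes fail, every column must still be
    stored by some projection on a surviving node."""
    nodes = list(placement)
    for failed in combinations(nodes, k):
        surviving = set()
        for node in nodes:
            if node not in failed:
                for projection in placement[node]:
                    surviving |= projection
        if not all_columns <= surviving:
            return False
    return True

COLUMNS = {"name", "age", "salary", "dept"}
placement = {  # hypothetical: node -> projections (each a set of column names)
    "node1": [{"name", "dept"}, {"salary", "age"}],
    "node2": [{"name", "age", "salary", "dept"}],
    "node3": [{"salary", "age"}],
}
print(is_k_safe(placement, COLUMNS, k=1))  # True
print(is_k_safe(placement, COLUMNS, k=2))  # False: losing node1 and node2 loses name, dept
```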
XACTS – No Mohan
Undo from a log (that does not need to be persistent)
Redo by rebuild from elsewhere in the network
XACTS – No Mohan
Snapshot isolation (run queries as of a tunable time in the recent past)
  - To solve read-write conflicts
Distributed Xacts
  - Without a prepare message (no 2-phase commit)
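Snapshot isolation can be sketched with an insertion time and an optional deletion time per stored value: a query running as of time T sees a value if it was inserted at or before T and not deleted by T. The `Version` record and its timestamps are illustrative assumptions, not C-Store's storage format, and the sketch assumes the chosen time is older than any uncommitted write.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Version:
    value: int
    inserted_at: int                  # time of the inserting transaction
    deleted_at: Optional[int] = None  # time of the deleting transaction, if any

def visible(version, as_of):
    """True if the version existed at time `as_of`."""
    return version.inserted_at <= as_of and (
        version.deleted_at is None or version.deleted_at > as_of)

def snapshot_read(column, as_of):
    """A read-only query run as of a past time never waits on concurrent writers."""
    return [v.value for v in column if visible(v, as_of)]

salary = [  # hypothetical history of one column
    Version(700, inserted_at=1),
    Version(900, inserted_at=2, deleted_at=5),  # replaced by a later update
    Version(950, inserted_at=5),                # the updated value
    Version(800, inserted_at=9),                # written after both snapshots
]
print(snapshot_read(salary, as_of=4))  # [700, 900]
print(snapshot_read(salary, as_of=6))  # [700, 950]
```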
Storage (Sort) Key(s) Is Not Necessarily Time
That would be too limiting
So how to do fast updates to densepack column storage that is not in entry sequence?
Solution – a Hybrid Store
Write-optimized column store (much like Monet)
Tuple mover (batch rebuilder)
Read-optimized column store (what we have been talking about so far)
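The hybrid design can be sketched for a single column: inserts go to a write-optimized store (WS) in arrival order, queries consult both stores, and a tuple mover periodically merges WS into the sorted read-optimized store (RS) in one batch. Plain Python lists stand in for both stores, `HybridColumn` is a hypothetical class, and deletes are not modeled.

```python
import bisect
import heapq

class HybridColumn:
    """One column split into a sorted read-optimized store (RS) and a small
    write-optimized store (WS) that absorbs inserts in arrival order."""

    def __init__(self, values=()):
        self.rs = sorted(values)  # read-optimized: sorted on the storage key
        self.ws = []              # write-optimized: cheap appends

    def insert(self, value):
        self.ws.append(value)     # the write path never reorganizes RS

    def tuple_mover(self):
        """Batch rebuild: merge the sorted WS contents into a new RS."""
        self.rs = list(heapq.merge(self.rs, sorted(self.ws)))
        self.ws = []

    def count_in_range(self, low, high):
        """Until the tuple mover runs, queries must look at both stores."""
        in_rs = bisect.bisect_right(self.rs, high) - bisect.bisect_left(self.rs, low)
        in_ws = sum(1 for v in self.ws if low <= v <= high)
        return in_rs + in_ws

col = HybridColumn([10, 30, 50])
col.insert(40)
col.insert(20)
print(col.count_in_range(15, 45))  # 3 (30 from RS; 40 and 20 from WS)
col.tuple_mover()
print(col.rs)                      # [10, 20, 30, 40, 50]
```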
Column Executor
Column operations – not row operations
Columns remain coded – if possible
Late materialization of columns
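Operating on coded columns and materializing late can be sketched as follows: a predicate is evaluated once per run of an RLE-coded column, producing positions, and a second column is touched only at those positions. The projection data and helper names are hypothetical, and positions are a plain list rather than a bitmap or position ranges.

```python
def select_positions_rle(runs, predicate):
    """Evaluate `predicate` once per (value, start, length) run, returning
    matching positions without decompressing the column."""
    positions = []
    for value, start, length in runs:
        if predicate(value):
            positions.extend(range(start, start + length))
    return positions

def fetch(column, positions):
    """Late materialization: read the second column only at qualifying positions."""
    return [column[p] for p in positions]

# Hypothetical projection sorted on dept: dept is RLE-coded, salary is plain.
dept_runs = [("shoe", 0, 3), ("toy", 3, 2), ("tv", 5, 4)]
salary = [700, 800, 650, 900, 720, 500, 610, 880, 930]

positions = select_positions_rle(dept_runs, lambda d: d == "toy")
print(positions)                 # [3, 4]
print(fetch(salary, positions))  # [900, 720]
```

Here the predicate runs 3 times (once per run) rather than 9 times (once per value).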
Column Optimizer
Chooses MVs on which to run the query
  - Most important task
Built for snowflake schemas
  - Which are simple to optimize without exhaustive search
Looking at extensions
Current Performance
100X faster than a popular row store, in 40% of the space
10X faster than a popular column store, in 70% of the space
7X faster than a popular row store, in 1/6th of the space
Code available with BSD license
Structure Going Forward
Vertica
  - Very well financed start-up to commercialize C-Store
  - Doing the heavy lifting
University research
  - Funded by Vertica
Vertica
Complete alpha system in December ’05
  - Everything, including DBMS designer
  - With current performance!
  - Looking for early customers to work with (see me if you are interested)
University Research
Extension of algorithms to non-snowflake schemas
Study of L2 cache performance
Study of coding strategies
Study of executor options
Study of recovery tactics
Non-cursor interface
Study of optimizer primitives