C-Store: Data Model and Data Organization - Sun Yat

advertisement
C-Store: Data Model and
Data Organization
Jianlin Feng
School of Software
SUN YAT-SEN UNIVERSITY
May 17, 2010
Data Model:
standard relational model

A database:


A relation


A set of named attributes (columns)
Primary key


A set of named relations (tables)
A set of attributes whose values uniquely identify
a row (tuple) in a relation.
Foreign key

References a primary key in another relation.
Primary Keys and Foreign Keys
How to Implement a logical relation?


Given a logical relation R
Row Store


R has a direct physical correspondence.
Column Store (C-Store)


R may not have a direct physical correspondence.
Covered by a set of C-Store Projections.
C-Store Projection

Is anchored on a given logical table T.

Contains one or more attributes from T.

Can also contain any number of attributes
from other tables.

Attributes from other tables must be referenced by
a sequence of n:1 (i.e., foreign keys) relationships
from T.
How to Construct a Projection?


Given a logical table T
Extract the attributes of interest from T.



Retain any duplicate rows
In standard Projection, duplicates are deleted.
Obtain the attributes of interest from other
tables

Perform foreign-key joins.
Attribute from DEPT
table
Sort Key

Tuples in a projection are stored column-wise.



K attributes  K columns
Each column is sorted on the same Sort Key.
Sort Key

Can be any column or columns in a projection
Examples of Sort Key
Each
Projection
is
horizontall
y
partitioned
into
Segments
Horizontal Partition of a Projection

Each projection is horizontally partitioned into 1 or
more segments.

SID > 0, Segment identifier

Value-based partitioning on the Sort Key of a
projection


Each segment has a key range of the sort key.
The set of all key ranges partitions the whole space of sort
key.
How to Re-construct a Complete Tuple?

To answer any SQL query

There must be a covering set of projections for
every logical table T.


Every column in T is stored in at least one projection.
To re-construct a tuple, Join segments from
different projections

Using Storage Keys and Join Indexes
Storage Keys

A segment, horizontal fragment of a
projection




May involve several coulmns
Associate every data value of each column with a
Storage Key, SK.
In the same segment, data values from different
columns with matching SK belong to the same
logical tuple.
A SK is simply the ID of a logical tuple.
Architecture of C-Store (Vertica)
On a Single Node
Storage Keys in Read Store (RS)

SKs are numbered 1,2 ,3,...

SKs are not physically stored


But are inferred from a tuple’s physical position in
a column.
Why SKs can be inferred?

All the columns in the same segment are sorted
on the same Sort Key.
Storage Keys in Write Store (WS)

SKs are physically stored

SKs are represented as integers.

Each SK in WS is larger than the largest SK
for any segment in RS.
join indexes: A Mapping Table

Suppose T1 and T2 are two projections that
cover a logical table T.

For each segment of T1 , build a join index to
T2
is a Table with rows :
(s: SID in T2, k: Storage Key in Segment s)

Ordering of Data in Each Column

Self-order


the column is ordered by values in itself.
Foreign-order

the column is ordered by values in other column in
the same projection.


A projection may involve several columns
Example



Projection EMP1(name, age| age)
Column age is Self-order
Column name is foreign order
Column Type in Read Store(RS)

Type 1: Self-order, few distinct values

Type 2: Foreign-order, few distinct values

Type 3: Self-order, many distinct values

Type 4: Foreign-order, many distinct values
Compression of Columns in Type 1:
Self-order, few distinct values




Represented by a sequence of triples (v, f, n)
v: a data value stored in the column
f : the position in the column where v first appears.
n: the number of times v appears in the column

Example: 4’s appear in positions 12-18, is recorded
as (4, 12, 7)

One triple is required for each distinct value in the
column.
Fast Search over Columns in Type 1

Using Clustered B-Tree (i.e., Primary B-Tree)

Densepack the B-Tree


No on-line updates in RS
Using large disk blocks


64-128K
The height of this index can be kept small (2 or
less)
Compression of Columns in Type 2:
Foreign-order, few distinct values




Represented by a
sequence of tuples (v, b)
v: a data value stored
in the column
b: a bitmap indicating
the positions in which v
is stored.
Use Run-Length
Encoding to compress
bitmaps.
Column V=0
C1
V=1
V=2
0
0
1
1
2
1
0
2
1
0
0
1
1
0
1
0
0
1
0
0
0
0
1
0
0
1
0
1
1
0
0
0
0
1
0
0
Fast Search over Columns in Type 2


To efficiently find the i-th value in a column,
Use Offset indexes

B-Trees that map positions in a column to the
values contained in that column.
Compression of Columns in Type 3:
Self-order, many distinct values

Basic Idea



Represent each value in
the column as a delta
from the previous value.
Use block (64-128K) as
the unit of compression.
Like compressing an
Inverted Index
Column C1
Delta
representation
1
1 (starting value)
4
3
7
3 (delta)
7
0 (delta)
8
1 (delta)
12
4 (delta)
Fast Search over Columns in Type 3

Use densepacked B-Tree at the block-level

A block is viewed as a tuple.
Compression of Columns in Type 4:
Foreign-order, many distinct values

Two choices at the moment



Do not compress
Use densepacked B-Tree
Open for further research
Write Store (WS)

WS is also a Column Store

To avoid writing two query optimizers

Implements the identical physical DBMS
design as RS.

The Major Difference due to Updates in WS

Storage representation
How to Do Updates?

Update




Insert a new tuple
Modify an existing tuple
Delete an (existing) tuple
Update depends on first doing a query



Insert: need to check Primary Key Constraint etc.
Modify: need to first find the specified tuple
Delete: need to first find the specified tuple
Queries in C-Store

Use some known values of some columns as
conditions to search values on other columns

Example table:


Example query:


EMP1(name, age| age)
use age values to query name values.
Tuples in C-Store are stored column-wise


We need to keep track of all column values of the same
logical tuple
The Idea: Storage Key as the tuple ID.
How to Speed-up Queries in C-Store?

The Basic Idea:




Pre-sort every projection in some SORT KEY order
Keep several copies of the same projection, each copy is
pre-sorted in different sort key order.
For a given query, use a copy of the projection whose order
matches with the query conditions.
Therefore queries in C-Store usually equal to
searches via sort key.

Note: Each Projection in WS has only one copy
Storage Key in WS

The Storage Key, SK, for each tuple is explicitly
stored in each WS segment.



Columns in WS only keep a logical SORT KEY order via
SKs.
Delay physical sorts on SORT KEY order when tuples are
moved from WS to RS.
A unique SK is given to each insert of a logical tuple
r in a table T.


SK is a serial number generated by system.
The SK of r must be recorded in each projection that stores
data for r.
Storage Representation of Columns
in WS

Every column in a Projection

Represented as a collection of (v, sk) pairs



v : a data value in the column
sk : the storage key (explicitly stored)
Build a B-Tree over each column

Use the second field of each pair, sk, as the KEY
Sort Keys of Each Projection in WS

Represented as a collection of (s, sk) pairs



s : a sort key value
sk : the storage key describing where s first
appears.
Build a B-Tree over the (s, sk) pairs

Use the first field of each pair, s, as the KEY
Searches via Sort Key: Two Steps

1st Step,use the B-Tree over the (s, sk) pairs



Search condition:
Search result:
known sort key values
corresponding storage keys
2nd Step, use the B-Tree over the each
column (i.e., the (v, sk) pairs)


Search condition:
Step.
Search result:
storage keys found in 1st
data values in the column.
Partition of a Projection in WS

Partitioning a Projection in the same way as
in RS



Equals to partitioning the space of sort key
1:1 mapping between RS segments and WS ones.
A tuple is identified by a (sid, storage_key)
pair in either RS or WS


sid: segment ID
Storage_key: the SK of the tuple
General Picture of WS

Usually is very small, can be fully stored in
Memory

Data processing is fast in memory

Do not compress data, represent data directly

Each projection uses B-Tree indexing to
maintain a logical sort-key order.

The B-Trees are secondary B-Tree.
References
1.
2.
Mike Stonebraker, Daniel Abadi, Adam Batkin,
Xuedong Chen, Mitch Cherniack, Miguel Ferreira,
Edmond Lau, Amerson Lin, Sam Madden,
Elizabeth O'Neil, Pat O'Neil, Alex Rasin, Nga Tran
and Stan Zdonik. C-Store: A Column Oriented
DBMS , VLDB, 2005.
(http://db.csail.mit.edu/projects/cstore/vldb.pdf)
VERTICA DATABASE TECHNICAL OVERVIEW
WHITE PAPER.
http://www.vertica.com/php/pdfgateway?file=Vertic
aArchitectureWhitePaper.pdf
Download