Rows vs. Columns: Why Not Both?

advertisement
APPLIED SOLUTIONS
Rows vs. Columns:
Why Not Both?
Teradata Columnar allows row,
column or hybrid reads for
ultra-efficient queries.
by Paul Sinclair
T
eradata Columnar, introduced in
Teradata Database 14.0, offers a
new approach for organizing the
data of user-defined tables and join indexes
on disk. Teradata’s unique implementation
of columnar has the ability to partition a
table or join index by column. This column
partitioning feature can be used alone in a
single-level partitioning definition or with
row partitioning, enabling queries to access
data by column, row or both.
The hybrid row and column feature provides a new paradigm for storing data that
changes the cost-benefit trade-offs of physical
database design choices and their combinations. Teradata Columnar also benefits
businesses by reducing I/O for certain queries
while simultaneously decreasing space usage.
Divide and Conquer
Row partitioning in the Teradata Database
allows users to group rows horizontally.
Each row partition contains a subset of
the total rows in a table, such as one day of
transaction data. Users can define one or
more levels and define a partitioning expression for each level. If they have a query that
specifies a value or a range of values for the
partitioning column, fewer rows need to be
accessed, compared with a full table scan.
With column partitioning, each column
or groups of columns in a table will become
a partition containing the values of that
specific grouping. This is a simple approach
because there is no need to define partitioning expressions, and determining partition
elimination is very easy. Here is an example
that creates a column-partitioned table that
vertically partitions the data:
CREATE TABLE Sales_CP (
TxnNo
INTEGER,
TxnDate DATE,
ItemNo INTEGER,
Quantity INTEGER )
PARTITION BY COLUMN,
UNIQUE INDEX (TxnNo);
A primary index is not specified, so
this is a no primary index (NoPI) table.
Each column is its own partition, so
every partition value is just a value of
that individual column.
PAGE 1 l Teradata Magazine l Q2/2012 l ©2012 Teradata Corporation l AR-6580
Efficiencies
of Column Partitioning
The key benefit in defining row partitioning
for a table is when queries access a subset of
rows based on constraints on the partitioning columns. Using column partitioning
improves the performance of queries
accessing a subset of the columns, either for
predicates or projections. Because sets of
columns can be stored in separate partitions,
only those containing the columns referenced
by the query need to be scanned.
The advantages of column and row
partitioning can be combined in Teradata
Columnar, further reducing I/O. Fewer data
figure 1
Column and Row Partitioning
TxnNo
TxnDate
ItemNo
100
05-29-2011
756
Quantity
1
100
05-29-2011
124
3
290
05-30-2011
437
1
450
05-30-2011
110
1
530
05-30-2011
815
2
One row
partition
One row
partition
One column One column One column One column
partition
partition
partition
partition
Partitioning by rows and by columns results in less data being
accessed, which means faster query times and reduced I/O.
blocks need to be read, since more information is packed into the blocks. For example,
the following shows how to get a list of
items sold on May 29, 2011:
CREATE TABLE Sales_CPRP (
TxnNo INTEGER,
TxnDate DATE,
ItemNo INTEGER,
Quantity INTEGER )
PARTITION BY (
COLUMN,
RANGE_N(TxnDate
BETWEEN DATE ‘2011-01-01’
AND DATE ‘2011-12-31’
EACH INTERVAL ‘1’ DAY) ),
UNIQUE INDEX (TxnNo);
With both column and row partitioning
defined on a table, the query needs to access
only the column partitions containing
essential items. (See figure 1.)
Another way to look at the advantages of
partitioning is to contrast the data accessed
when different types of partitioning are
defined. For example, this table definition
has various physical database designs:
CREATE TABLE mytable (
A INT, B INT, C CHAR(100),D INT, E INT,
F INT, G INT, H INT, I INT, J INT, K INT,
L INT );
PAGE 2 l Teradata Magazine l Q2/2012 l ©2012 Teradata Corporation l AR-6580
The query below is based on the previous
table called mytable:
SELECT SUM(F) FROM mytable WHERE
B BETWEEN 4 AND 7;
In figure 2 on page 3, only columns F
and B are referenced by the query, even
though the table has 12 columns. The
figure illustrates the data that has to be
accessed when there is:
1} no partitioning
2} row partitioning on column B
3} column partitioning
4} both column and row partitioning
APPLIED SOLUTIONS
Comparison of Physical Database Design Choices
figure 2
No Partitioning
Row Partitioning
A B C D E F G H
I
F K L
1
5
a
3
9 9 4 6
2
7
4
2
9 q
5
4 6
3
8
5
1
1
2
3
1
1
1
3
4
7
8
2
9
4 8 m 7
5
d
3
3
9 4
1
4
2
8
6
f
2
2
4
7
3
1
5
7
2
6 6
r
1
8
2
8
3
4
2
5
1
7
e 0 5
1
6 4
3
9 9
7
8 4
u
9 0
1
2
7
6 6 0 3
9
d
3
5
1
2
6
2
7
3
3
A B C D E F G H
I
J
K L
8 4
u
9 0
6 6 0 3
1
5
a
3
9 9 4 6
2
7
4
5
6 6
r
1
8
4
2
5
1
Column names
5
3
2
Column names
Column
values
1
2
2
8
7
3
Column
values
g Partitioning column
8
g Primary index column
NoPl With Column Partitioning
A B C D E F G H
5
I
F K L
NoPl With Column and Row Partitioning
Column names
A B C D E F G H
I
F K L
Column names
9
9
1
8
3
6
2
Column
values
2
4
4
1
5
9
6
2
Column
values
1
2
When a table has no partitioning, as in No. 1, all data is accessed. By combining column and row partitioning,
as shown in No. 4, only the necessary data is accessed.
Column-Partitioned Characteristics
A table or join index that is partitioned by column has several key features:
> No primary index
> Column partitions composed of single or multiple columns
> Column partitions, each of which usually contains multiple physical rows
> Columnar storage and autocompression implemented with a new physical
row format column called a container
> Row-storage implementation that may have physical rows with row format called a subrow
If the table has 4 million rows, the query
reads approximately this many data blocks:
> 9,987 with no partitioning
> 4,529 with row partitioning
> 281 with column partitioning
> 171 with column and row partitioning
Decreased I/O usually comes with
higher CPU usage. Since I/O is often
relatively expensive compared with CPU,
and CPU is getting faster at a much
PAGE 3 l Teradata Magazine l Q2/2012 l ©2012 Teradata Corporation l AR-6580
higher rate than I/O, this is usually a
reasonable trade-off.
Fewer Touch Points
Teradata Columnar offers the ability to
partition a table or join index vertically by
column, horizontally by row or a hybrid of
both. To eliminate the need to access all data
in all rows and columns, Teradata Columnar
breaks the table into smaller sections so only
the individual column data for the proper
rows is scanned to complete a query. T
Paul Sinclair is an architect at Teradata
Labs. He designed partitioning features
that are the basis for Teradata Columnar.
Download