APPLIED SOLUTIONS Rows vs. Columns: Why Not Both? Teradata Columnar allows row, column or hybrid reads for ultra-efficient queries. by Paul Sinclair T eradata Columnar, introduced in Teradata Database 14.0, offers a new approach for organizing the data of user-defined tables and join indexes on disk. Teradata’s unique implementation of columnar has the ability to partition a table or join index by column. This column partitioning feature can be used alone in a single-level partitioning definition or with row partitioning, enabling queries to access data by column, row or both. The hybrid row and column feature provides a new paradigm for storing data that changes the cost-benefit trade-offs of physical database design choices and their combinations. Teradata Columnar also benefits businesses by reducing I/O for certain queries while simultaneously decreasing space usage. Divide and Conquer Row partitioning in the Teradata Database allows users to group rows horizontally. Each row partition contains a subset of the total rows in a table, such as one day of transaction data. Users can define one or more levels and define a partitioning expression for each level. If they have a query that specifies a value or a range of values for the partitioning column, fewer rows need to be accessed, compared with a full table scan. With column partitioning, each column or groups of columns in a table will become a partition containing the values of that specific grouping. This is a simple approach because there is no need to define partitioning expressions, and determining partition elimination is very easy. Here is an example that creates a column-partitioned table that vertically partitions the data: CREATE TABLE Sales_CP ( TxnNo INTEGER, TxnDate DATE, ItemNo INTEGER, Quantity INTEGER ) PARTITION BY COLUMN, UNIQUE INDEX (TxnNo); A primary index is not specified, so this is a no primary index (NoPI) table. Each column is its own partition, so every partition value is just a value of that individual column. PAGE 1 l Teradata Magazine l Q2/2012 l ©2012 Teradata Corporation l AR-6580 Efficiencies of Column Partitioning The key benefit in defining row partitioning for a table is when queries access a subset of rows based on constraints on the partitioning columns. Using column partitioning improves the performance of queries accessing a subset of the columns, either for predicates or projections. Because sets of columns can be stored in separate partitions, only those containing the columns referenced by the query need to be scanned. The advantages of column and row partitioning can be combined in Teradata Columnar, further reducing I/O. Fewer data figure 1 Column and Row Partitioning TxnNo TxnDate ItemNo 100 05-29-2011 756 Quantity 1 100 05-29-2011 124 3 290 05-30-2011 437 1 450 05-30-2011 110 1 530 05-30-2011 815 2 One row partition One row partition One column One column One column One column partition partition partition partition Partitioning by rows and by columns results in less data being accessed, which means faster query times and reduced I/O. blocks need to be read, since more information is packed into the blocks. For example, the following shows how to get a list of items sold on May 29, 2011: CREATE TABLE Sales_CPRP ( TxnNo INTEGER, TxnDate DATE, ItemNo INTEGER, Quantity INTEGER ) PARTITION BY ( COLUMN, RANGE_N(TxnDate BETWEEN DATE ‘2011-01-01’ AND DATE ‘2011-12-31’ EACH INTERVAL ‘1’ DAY) ), UNIQUE INDEX (TxnNo); With both column and row partitioning defined on a table, the query needs to access only the column partitions containing essential items. (See figure 1.) Another way to look at the advantages of partitioning is to contrast the data accessed when different types of partitioning are defined. For example, this table definition has various physical database designs: CREATE TABLE mytable ( A INT, B INT, C CHAR(100),D INT, E INT, F INT, G INT, H INT, I INT, J INT, K INT, L INT ); PAGE 2 l Teradata Magazine l Q2/2012 l ©2012 Teradata Corporation l AR-6580 The query below is based on the previous table called mytable: SELECT SUM(F) FROM mytable WHERE B BETWEEN 4 AND 7; In figure 2 on page 3, only columns F and B are referenced by the query, even though the table has 12 columns. The figure illustrates the data that has to be accessed when there is: 1} no partitioning 2} row partitioning on column B 3} column partitioning 4} both column and row partitioning APPLIED SOLUTIONS Comparison of Physical Database Design Choices figure 2 No Partitioning Row Partitioning A B C D E F G H I F K L 1 5 a 3 9 9 4 6 2 7 4 2 9 q 5 4 6 3 8 5 1 1 2 3 1 1 1 3 4 7 8 2 9 4 8 m 7 5 d 3 3 9 4 1 4 2 8 6 f 2 2 4 7 3 1 5 7 2 6 6 r 1 8 2 8 3 4 2 5 1 7 e 0 5 1 6 4 3 9 9 7 8 4 u 9 0 1 2 7 6 6 0 3 9 d 3 5 1 2 6 2 7 3 3 A B C D E F G H I J K L 8 4 u 9 0 6 6 0 3 1 5 a 3 9 9 4 6 2 7 4 5 6 6 r 1 8 4 2 5 1 Column names 5 3 2 Column names Column values 1 2 2 8 7 3 Column values g Partitioning column 8 g Primary index column NoPl With Column Partitioning A B C D E F G H 5 I F K L NoPl With Column and Row Partitioning Column names A B C D E F G H I F K L Column names 9 9 1 8 3 6 2 Column values 2 4 4 1 5 9 6 2 Column values 1 2 When a table has no partitioning, as in No. 1, all data is accessed. By combining column and row partitioning, as shown in No. 4, only the necessary data is accessed. Column-Partitioned Characteristics A table or join index that is partitioned by column has several key features: > No primary index > Column partitions composed of single or multiple columns > Column partitions, each of which usually contains multiple physical rows > Columnar storage and autocompression implemented with a new physical row format column called a container > Row-storage implementation that may have physical rows with row format called a subrow If the table has 4 million rows, the query reads approximately this many data blocks: > 9,987 with no partitioning > 4,529 with row partitioning > 281 with column partitioning > 171 with column and row partitioning Decreased I/O usually comes with higher CPU usage. Since I/O is often relatively expensive compared with CPU, and CPU is getting faster at a much PAGE 3 l Teradata Magazine l Q2/2012 l ©2012 Teradata Corporation l AR-6580 higher rate than I/O, this is usually a reasonable trade-off. Fewer Touch Points Teradata Columnar offers the ability to partition a table or join index vertically by column, horizontally by row or a hybrid of both. To eliminate the need to access all data in all rows and columns, Teradata Columnar breaks the table into smaller sections so only the individual column data for the proper rows is scanned to complete a query. T Paul Sinclair is an architect at Teradata Labs. He designed partitioning features that are the basis for Teradata Columnar.