Data Warehouse and the Star Schema
CSCI 242
©Copyright 2015, David C. Roberts, all rights reserved

Red Brick
- Invented the data warehouse; they sold a hardware product with a star schema database
- You loaded the Red Brick Warehouse and then queried it for OLAP
- It featured new optimizations for star schemas and was very fast

Enter Sybase
- Sybase learned the optimization and developed their own product
- The Sybase product was a stand-alone software data warehouse product
- It couldn’t do general-purpose database work; it was just a data warehouse
- They appear to have copied the Red Brick idea, without selling hardware

Enter Oracle
- Oracle, later, also copied the same optimization
- They added a bitmap index to their database product, and added the star schema optimization
- Now their product could do data warehouse work as well as general-purpose database work

Status Today
- Oracle dominates the field today
- IBM eventually bought Red Brick, so it still offers some sort of Red Brick product
- Sybase still offers its data warehouse product, now as an offering of SAP
- So what is this algorithm that is so copied?

THE ALGORITHM

Optimizing Star Queries
- Build a bitmap index on each foreign key column of the fact table
- The index is a 2-dimensional array of bits, with one column for each row being indexed and one row per distinct value of the indexed column
- Bitmap indexes are typically much smaller than b-tree indexes, which can be larger than the data itself

Bitmap Index Example
[Figure: bitmap index example]

Query Processing
- The typical query joins the fact table to the dimension tables on the fact table's foreign keys
- This is processed in two phases:
  1. From the fact table, retrieve all rows that are part of the result, using bitmap indexes
  2. Join the result of the step above to the dimension tables

Example Query
- Find sales and profits from the grocery departments of stores in the West and Southwest districts over the last three quarters

Example Query
SELECT store.sales_district, time.fiscal_period,
       SUM(sales.dollar_sales) revenue,
       SUM(sales.dollar_sales) - SUM(sales.dollar_cost) income
FROM sales, store, time, product
WHERE sales.store_key = store.store_key
  AND sales.time_key = time.time_key
  AND sales.product_key = product.product_key
  AND time.fiscal_period IN ('3Q95', '4Q95', '1Q96')
  AND product.department = 'Grocery'
  AND store.sales_district IN ('West', 'Southwest')
GROUP BY store.sales_district, time.fiscal_period;

Phase 1
- Finding the rows in the SALES table (using bitmap indexes):
SELECT ... FROM sales
WHERE store_key IN (SELECT store_key FROM store
                    WHERE sales_district IN ('West', 'Southwest'))
  AND time_key IN (SELECT time_key FROM time
                   WHERE fiscal_period IN ('3Q95', '4Q95', '1Q96'))
  AND product_key IN (SELECT product_key FROM product
                      WHERE department = 'Grocery');

Phase 2
- Now the fact table rows found in Phase 1 are joined to the dimension tables
- For dimension tables of small cardinality, a full-table scan may be used
- For large cardinality, a hash join could be used

The Star Transformation
- Use bitmap indexes to retrieve all relevant rows from the fact table, based on foreign key values
  – This happens very fast
- Join this result set to the dimension tables
  – If there are many values, a hash join may be used
  – If there are fewer values, a b-tree driven join may be used
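
The two phases can be tried out on a toy data set. The following is a minimal Python sketch, not how any of the products above implement it: the tables, keys, and amounts are made up for illustration, each bitmap is held in a plain Python integer (bit i set means fact row i has that key value), and dictionary lookups stand in for the hash join of Phase 2. The names build_bitmap_index, matching_rows, and star_query are hypothetical helpers.

```python
from collections import defaultdict

# Hypothetical toy dimension tables: key -> attribute value.
store = {1: "West", 2: "Southwest", 3: "East"}
time_dim = {10: "3Q95", 11: "4Q95", 12: "2Q95"}
product = {100: "Grocery", 101: "Hardware"}

# Hypothetical fact rows: (store_key, time_key, product_key, dollar_sales, dollar_cost).
sales = [
    (1, 10, 100, 50.0, 30.0),
    (3, 10, 100, 70.0, 40.0),
    (2, 11, 100, 20.0, 15.0),
    (1, 12, 100, 90.0, 60.0),
    (2, 10, 101, 10.0, 8.0),
    (1, 10, 100, 25.0, 20.0),
]


def build_bitmap_index(rows, column):
    """Map each distinct value in `column` to an int whose bit i is set when row i has that value."""
    index = defaultdict(int)
    for row_id, row in enumerate(rows):
        index[row[column]] |= 1 << row_id
    return index


def matching_rows(index, dimension, wanted):
    """OR together the bitmaps of every dimension key whose attribute is in `wanted`."""
    bits = 0
    for key, attribute in dimension.items():
        if attribute in wanted:
            bits |= index.get(key, 0)
    return bits


def star_query(rows):
    # One bitmap index per foreign key column of the fact table.
    store_idx = build_bitmap_index(rows, 0)
    time_idx = build_bitmap_index(rows, 1)
    product_idx = build_bitmap_index(rows, 2)

    # Phase 1: AND the per-dimension bitmaps to find the qualifying fact rows.
    hits = (
        matching_rows(store_idx, store, {"West", "Southwest"})
        & matching_rows(time_idx, time_dim, {"3Q95", "4Q95", "1Q96"})
        & matching_rows(product_idx, product, {"Grocery"})
    )
    row_ids = [i for i in range(len(rows)) if hits >> i & 1]

    # Phase 2: join the surviving rows back to the dimensions (dict lookups
    # stand in for a hash join) and aggregate per district and period.
    totals = defaultdict(lambda: [0.0, 0.0])
    for i in row_ids:
        store_key, time_key, _, dollar_sales, dollar_cost = rows[i]
        group = (store[store_key], time_dim[time_key])
        totals[group][0] += dollar_sales                 # revenue
        totals[group][1] += dollar_sales - dollar_cost   # income
    return row_ids, {group: tuple(v) for group, v in totals.items()}
```

Calling star_query(sales) returns the ids of the fact rows that survive Phase 1 together with the (revenue, income) totals per district and fiscal period. Only the surviving rows are ever touched in Phase 2, which is the point of doing the bitmap work first.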