Lecture 10: More OLAP - Dimensional modeling www.cl.cam.ac.uk/Teaching/current/Databases/ 1 Conceptual Modeling of Data Warehouses Modeling data warehouses: dimensions & measures – Star schema: A fact table in the middle connected to a set of dimension tables – Snowflake schema: A refinement of star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to snowflake – Fact constellations: Multiple fact tables share dimension tables, viewed as a collection of stars, therefore called galaxy schema or fact constellation 2 Star product prodId p1 p2 name price bolt 10 nut 5 sale oderId date o100 1/7/97 o102 2/7/97 105 3/8/97 customer custId 53 81 111 store storeId c1 c2 c3 custId 53 53 111 name joe fred sally prodId p1 p2 p1 storeId c1 c1 c3 address 10 main 12 main 80 willow qty 1 2 5 city nyc sfo la amt 12 11 50 city sfo sfo la 3 Star Schema product prodId name price sale orderId date custId prodId storeId qty amt customer custId name address city store storeId city 4 Terms • Fact table • Dimension tables • Measures product prodId name price sale orderId date custId prodId storeId qty amt customer custId name address city store storeId city 5 Another Star Schema time item time_key day day_of_the_week month quarter year Sales Fact Table time_key item_key branch_key branch location_key branch_key branch_name branch_type units_sold dollars_sold avg_sales item_key item_name brand type supplier_type location location_key street city province_or_street country Measures 6 Dimension Hierarchies sType store store storeId s5 s7 s9 city cityId sfo sfo la tId t1 t2 t1 mgr joe fred nancy snowflake schema constellations region sType tId t1 t2 size small large city cityId pop sfo 1M la 5M location downtown suburbs regId north south region regId name north cold region south warm region 7 Cube Fact table view: sale prodId storeId p1 c1 p2 c1 p1 c3 p2 c2 Multi-dimensional cube: amt 12 11 50 8 p1 p2 c1 12 11 c2 c3 50 8 dimensions = 2 8 3-D Cube Fact table view: sale prodId p1 p2 p1 p2 p1 p1 storeId c1 c1 c3 c2 c1 c2 Multi-dimensional cube: date 1 1 1 1 2 2 amt 12 11 50 8 44 4 day 2 day 1 p1 p2 c1 p1 12 p2 11 c1 44 c2 4 c2 c3 c3 50 8 dimensions = 3 9 Aggregates • Add up amounts for day 1 • In SQL: SELECT sum(amt) FROM SALE WHERE date = 1 sale prodId storeId p1 c1 p2 c1 p1 c3 p2 c2 p1 c1 p1 c2 date 1 1 1 1 2 2 amt 12 11 50 8 44 4 81 10 Aggregates • Add up amounts by day • In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date sale prodId storeId p1 c1 p2 c1 p1 c3 p2 c2 p1 c1 p1 c2 date 1 1 1 1 2 2 amt 12 11 50 8 44 4 ans date 1 2 sum 81 48 11 Another Example • Add up amounts by day, product • In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date, prodId sale prodId storeId p1 c1 p2 c1 p1 c3 p2 c2 p1 c1 p1 c2 date 1 1 1 1 2 2 amt 12 11 50 8 44 4 sale prodId p1 p2 p1 date 1 1 2 amt 62 19 48 rollup drill-down 12 Aggregates • Operators: sum, count, max, min, median, ave • “Having” clause • Using dimension hierarchy – average by region (within store) – maximum by month (within date) 13 Cube Aggregation day 2 day 1 p1 p2 c1 p1 12 p2 11 p1 p2 c1 56 11 c1 44 c2 4 c2 c3 Example: computing sums ... c3 50 8 c2 4 8 rollup drill-down c3 50 sum c1 67 c2 12 c3 50 129 p1 p2 sum 110 19 14 Cube Operators day 2 day 1 p1 p2 c1 p1 12 p2 11 p1 p2 c1 56 11 c1 44 c2 4 c2 c3 ... c3 50 sale(c1,*,*) 8 c2 4 8 c3 50 sale(c2,p2,*) sum c1 67 c2 12 c3 50 129 p1 p2 sum 110 19 sale(*,*,*) 15 Extended Cube c2 4 8 c312 p1 p2 c1 * 12 p1 p2 c1* 44 c1 56 11 c267 4 c2 44 c3 4 50 11 23 8 8 50 * 62 19 81 * day 2 day 1 p1 p2 * c3 50 * 50 48 48 * 110 19 129 sale(*,p2,*) 16 Aggregation Using Hierarchies day 2 day 1 p1 p2 c1 p1 12 p2 11 c1 44 c2 4 c2 c3 c3 50 customer region 8 country p1 p2 region A region B 56 54 11 8 (customer c1 in Region A; customers c2, c3 in Region B) 17 Pivoting Fact table view: sale prodId storeId p1 c1 p2 c1 p1 c3 p2 c2 p1 c1 p1 c2 Multi-dimensional cube: date 1 1 1 1 2 2 amt 12 11 50 8 44 4 day 2 day 1 p1 p2 c1 p1 12 p2 11 p1 p2 c1 56 11 c1 44 c2 4 c2 c3 c3 50 8 c2 4 8 c3 50 18