ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis)
Dimensional Modeling II
Olivia R. Liu Sheng, Ph.D.
Emma Eccles Jones Presidential Chair of Business

The Business Dimensional Lifecycle
[Figure: lifecycle diagram. Stages shown: Project Planning; Business Requirement Definition; Technical Architecture Design; Product Selection & Installation; Dimensional Modeling; Physical Design; Data Staging Design & Development; End-User Application Specification; End-User Application Development; Deployment; Maintenance and Growth; Project Management.]

Outline
• Table structure, types, characteristics and terminology
• Design steps
• Dimensional models with varying types of fact and dimension tables

Types of Facts
• Transactional facts (transactions or line items in transactions)
• Snapshots
• Factless facts

Types of Dimensions
• Role-playing dimensions
• Heterogeneous dimensions
• Slowly changing dimensions
• Large dimensions
• Many-to-many dimensions

Keys and Attributes
• Primary key – a column (or combination of columns) whose value uniquely identifies each row (record) in the table.
• Attributes – columns in a table that are not designated as the primary key.
• Foreign key – an attribute of a table that corresponds to the primary key of another table. It is usually a non-key attribute, but in a fact table the foreign keys also form part of the primary key.
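The key definitions above can be tried out with a small sketch. It uses Python's built-in sqlite3 module; the table names, the sale_id column and the sample values are made up for illustration and are not part of the course material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# customer_key is the primary key; cname and city are ordinary attributes.
conn.execute("""
    CREATE TABLE customer (
        customer_key INTEGER PRIMARY KEY,
        cname TEXT,
        city TEXT
    )""")

# customer_key in sales is a foreign key referring to customer's primary key.
# sale_id is a hypothetical key used only to keep this example small.
conn.execute("""
    CREATE TABLE sales (
        sale_id INTEGER PRIMARY KEY,
        customer_key INTEGER NOT NULL REFERENCES customer(customer_key),
        quantity INTEGER
    )""")

conn.execute("INSERT INTO customer VALUES (1, 'Acme', 'Salt Lake City')")
conn.execute("INSERT INTO sales VALUES (100, 1, 5)")  # valid: customer 1 exists


def try_insert(sql):
    """Run one INSERT and report whether a key constraint rejected it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


# A second row with primary key 1 violates uniqueness.
duplicate_pk = try_insert("INSERT INTO customer VALUES (1, 'Other', 'Provo')")
# A sale pointing at customer 99 violates the foreign key: no such customer.
dangling_fk = try_insert("INSERT INTO sales VALUES (101, 99, 2)")
print(duplicate_pk, dangling_fk)
```

Both inserts are rejected: the first because a primary key value must be unique, the second because a foreign key value must match an existing primary key in the referenced table.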

Attributes in DW tables
• Dimension table
  – One primary key
  – Dimension attributes
• Fact table
  – Primary key: a collection of primary keys from all its associated dimension tables
    • All warehouse keys in the fact table are foreign keys referring to its associated dimension tables
    • All or part of the warehouse keys in the fact table form the primary key of the fact table
  – Fact attributes

Attributes in DW tables
[Figure: star schema. SALES references TIME, CUSTOMER and PRODUCT. Key columns (marked # in the diagram) are listed first.]
TIME (TIME_KEY, ORDERDATE, DAY_OF_WEEK, DAY_NUMBER_IN_MONTH, DAY_NUMBER_IN_YEAR, WEEK_NUMBER, MONTH, QUARTER, HOLIDAY_FLAG, FISCAL_YEAR, FISCAL_QUARTER)
CUSTOMER (CUSTOMER_KEY, CID, CNAME, STATE, CITY)
PRODUCT (PRODUCT_KEY, PID, PNAME, PCNAME)
SALES (TIME_KEY, PRODUCT_KEY, CUSTOMER_KEY, PRICE, QUANTITY, SALES)
The data warehouse keys (TIME_KEY, CUSTOMER_KEY, PRODUCT_KEY) are generated by the system.

Keys and Grain
• Keys
  – Primary or natural keys (from source systems)
  – Warehouse or synthetic keys (generated by a data warehouse tool)
• Grain
  – The level of detail of fact measures described in the DW, e.g., sales transactions from order line items by order date, product and customer

Single-Fact-Table Data Warehouse Design Decisions
1. The business questions in focus and source information systems*
2. The grain of the fact table
3. The dimension tables and keys
4. The fact attributes and dimension attributes
*All DW attributes must be mapped to or derived from source attributes

Sample Business Questions
• Report Sales in terms of
  – (total) amount, (total) quantity and (average) price
• Report Sales by PRODUCT name and/or category name
• Report Sales by CUSTOMER name, city and/or state
• Report Sales by ORDER date, month, year, holiday, special event or other time constraints
• Report using a combination of the measures and constraints

Relational Schema of B.com B2B System
Orders (Order_No, SID, BID, CID, Order_date)
OrderLine (Order_No, Line_ID, PID, Actual_Del_Date, Target_Del_Date, Arrival_Date, Shipping_Fee, Tax, Quantity, Unit_Price, Defect_on_arrival)
Delivery (SID, CID, Unit_shipping_fee, UNIT_DEL_TIME)
Contract (CID, Contract_Name, Payment_term, Payment_num)
Payment (PaymentID, OrderNO, Pay_Amount, Date)
Category (CAT_ID, CAT_Name)
Product (PID, CAT_ID, P_Weight, P_Life, P_Name)
Supplier (SID, S_Name, S_City, S_State, S_Country)
Product_Supply (PID, SID, Unit_Price, Quantity_in_Stock, Production_in_Week)
Buyer (BID, B_Name, CityID, B_Type)
Buyer_City (CityID, C_Name, C_State, C_Country, C_Tax)

Grain of the Fact Table
Type of fact table: transactional facts
Potential grains: order or orderline
Constraints: order date, product, customer
Grain: sales from orderline (by order date, product, and customer)

Dimension Tables and Keys
The keys of the dimension tables jointly make up the primary key of the fact table.
[Figure: star schema. SALES references CUSTOMER, TIME and PRODUCT.]
CUSTOMER (CUSTOMER_KEY, CID, CNAME, CITY, STATE)
TIME (TIME_KEY, ORDER_DATE, DAY_OF_WEEK, DAY_NUMBER_IN_MONTH, DAY_NUMBER_IN_YEAR, WEEK_NUMBER, MONTH, QUARTER, HOLIDAY_FLAG, ...)
PRODUCT (PRODUCT_KEY, PID, PNAME, PCNAME)
SALES (TIME_KEY, CUSTOMER_KEY, PRODUCT_KEY, PRICE, QUANTITY, SALES_AMOUNT)

Determine Fact Attributes
SALES (TIME_KEY, CUSTOMER_KEY, PRODUCT_KEY, PRICE, QUANTITY, SALES_AMOUNT)
Keys: TIME_KEY, CUSTOMER_KEY, PRODUCT_KEY. Fact attributes: PRICE, QUANTITY, SALES_AMOUNT.

Types of Fact Attributes
• Additive fact attributes can be added along any dimension (e.g., QUANTITY and the sales amount in SALES).
[Figure: the SALES star schema shown above.]
• Non-additive fact attributes cannot be added along any dimension (e.g., the unit PRICE in SALES).
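The difference between additive and non-additive fact attributes can be illustrated with a small sketch. The rows below are made-up SALES records that follow the column names in the diagram; the helper total_by is a hypothetical name introduced only for this example.

```python
from collections import defaultdict

# Made-up SALES fact rows at the order-line grain (values are illustrative only).
sales = [
    {"TIME_KEY": 1, "PRODUCT_KEY": 10, "CUSTOMER_KEY": 100,
     "PRICE": 2.0, "QUANTITY": 3, "SALES_AMOUNT": 6.0},
    {"TIME_KEY": 1, "PRODUCT_KEY": 11, "CUSTOMER_KEY": 100,
     "PRICE": 5.0, "QUANTITY": 1, "SALES_AMOUNT": 5.0},
    {"TIME_KEY": 2, "PRODUCT_KEY": 10, "CUSTOMER_KEY": 101,
     "PRICE": 2.0, "QUANTITY": 10, "SALES_AMOUNT": 20.0},
]


def total_by(rows, dimension_key, measure):
    """Sum one measure for each value of one dimension key."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension_key]] += row[measure]
    return dict(totals)


# Additive: grouping by any dimension gives subtotals that add up to the same grand total.
amount_by_product = total_by(sales, "PRODUCT_KEY", "SALES_AMOUNT")    # {10: 26.0, 11: 5.0}
amount_by_customer = total_by(sales, "CUSTOMER_KEY", "SALES_AMOUNT")  # {100: 11.0, 101: 20.0}
total_amount = sum(amount_by_product.values())
total_quantity = sum(row["QUANTITY"] for row in sales)

# Non-additive: adding unit prices gives a number with no business meaning.
summed_price = sum(row["PRICE"] for row in sales)

# An average price is derived from the additive measures instead.
average_price = total_amount / total_quantity
print(total_amount, total_quantity, round(average_price, 2))
```

This is why the sample business questions ask for total amount and total quantity but only an average price: the first two can be rolled up freely, while price has to be recomputed at each level of aggregation.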

Types of Fact Attributes
• Semi-additive fact attributes can be added along some dimensions but not others.
[Figure: inventory star schema. INVENTORYFACT references INVENTORY_TIME, INVENTORY_PRODUCT and WAREHOUSE.]
INVENTORY_PRODUCT (PRODUCT_KEY)
INVENTORY_TIME (TIME_KEY)
WAREHOUSE (WAREHOUSE_KEY)
INVENTORYFACT (TIME_KEY, PRODUCT_KEY, WAREHOUSE_KEY, QUANTITY_ON_HAND)
For example, QUANTITY_ON_HAND can be summed across products or warehouses for one point in time, but not across time periods.

Time Dimension
• A data warehouse needs an explicit time dimension table instead of just a time attribute (e.g., ORDERDATE).
• Precomputed calendar attributes save computation effort and improve query performance.
• Complex calendar calculations are hidden from end users of the data warehouse.
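A minimal sketch of how one row of such a time dimension might be precomputed from an order date. The function name, the holidays parameter and the ISO conventions for weekday and week numbering are assumptions made for this example; fiscal attributes are left out because they depend on the organization's fiscal calendar.

```python
from datetime import date


def time_dimension_row(time_key, d, holidays=frozenset()):
    """Build one TIME dimension row for calendar date d (hypothetical helper)."""
    # isocalendar() returns (ISO year, ISO week number, ISO weekday).
    _iso_year, iso_week, iso_weekday = d.isocalendar()
    return {
        "TIME_KEY": time_key,                          # warehouse-generated key
        "ORDERDATE": d.isoformat(),
        "DAY_OF_WEEK": iso_weekday,                    # 1 = Monday ... 7 = Sunday (assumed convention)
        "DAY_NUMBER_IN_MONTH": d.day,
        "DAY_NUMBER_IN_YEAR": d.timetuple().tm_yday,   # 366 is possible in a leap year
        "WEEK_NUMBER": iso_week,                       # ISO weeks can run to 53
        "MONTH": d.month,
        "QUARTER": (d.month - 1) // 3 + 1,
        "HOLIDAY_FLAG": "y" if d in holidays else "n",
    }


# Example: 4 July 2024, with a made-up holiday list containing that date.
row = time_dimension_row(1, date(2024, 7, 4), holidays={date(2024, 7, 4)})
print(row)
```

With rows like this loaded once, a question such as "sales on holidays in the third quarter" becomes a simple filter on HOLIDAY_FLAG and QUARTER rather than a date calculation repeated in every query.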

Time Dimension
Besides the time attribute, the time dimension table includes the following additional attributes:
  – Day_of_week (1-7); Day_number_in_month (1-31)
  – Day_number_in_year (1-365, or 366 in a leap year)
  – Week_number (1-52, or 53 under some week-numbering conventions); Month (1-12); Quarter (1-4)
  – Holiday_flag (y/n)
  – Fiscal_quarter, Fiscal_year

Determine Dimension Attributes
[Figure: the SALES star schema. Dimension attributes follow each dimension key.]
TIME (TIME_KEY, ORDERDATE, DAY_OF_WEEK, DAY_NUMBER_IN_MONTH, DAY_NUMBER_IN_YEAR, WEEK_NUMBER, MONTH, QUARTER, HOLIDAY_FLAG, FISCAL_YEAR, FISCAL_QUARTER)
CUSTOMER (CUSTOMER_KEY, CID, CNAME, STATE, CITY)
PRODUCT (PRODUCT_KEY, PID, PNAME, PCNAME)

Avoid Snowflake Designs
Star design: the product category name (PCNAME) is kept inside the PRODUCT dimension.
PRODUCT (PRODUCT_KEY, PID, PNAME, PCNAME)

Snowflake structure: PRODUCT references a separate PRODUCT_CATEGORY table, while SALES still references TIME, CUSTOMER and PRODUCT.
PRODUCT_CATEGORY (PRODUCT_CATEGORY_KEY, PCID, PCNAME)
PRODUCT (PRODUCT_KEY, PID, PNAME, PRODUCT_CATEGORY_KEY)

Avoid Snowflake Schemas
• Tradeoff of avoiding snowflake designs
  – Advantage: better query performance and ease of understanding
  – Disadvantage: more storage space required
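The tradeoff above can be seen in a small sketch that flattens a snowflaked product dimension into a single star-style PRODUCT table. It uses Python's built-in sqlite3 module; the table name product_snowflake and all sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Snowflake form: the product table points to a separate category table.
    CREATE TABLE product_category (
        product_category_key INTEGER PRIMARY KEY,
        pcid TEXT,
        pcname TEXT
    );
    CREATE TABLE product_snowflake (
        product_key INTEGER PRIMARY KEY,
        pid TEXT,
        pname TEXT,
        product_category_key INTEGER REFERENCES product_category
    );
    INSERT INTO product_category VALUES (1, 'C1', 'Tools'), (2, 'C2', 'Paint');
    INSERT INTO product_snowflake VALUES
        (10, 'P10', 'Hammer', 1),
        (11, 'P11', 'Wrench', 1),
        (12, 'P12', 'Primer', 2);

    -- Star form: the category name is copied into every product row.
    CREATE TABLE product AS
        SELECT p.product_key, p.pid, p.pname, c.pcname
        FROM product_snowflake AS p
        JOIN product_category AS c USING (product_category_key);
""")

# Reporting by category name now needs no join beyond the PRODUCT dimension itself.
rows = conn.execute(
    "SELECT pname, pcname FROM product ORDER BY product_key"
).fetchall()
print(rows)
```

The category name 'Tools' is now stored once per product rather than once overall, which is the extra storage the slide mentions; in return, queries by category join the fact table to one dimension table instead of two.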