3/5/2015
Class 06.1
Chapter 1, Adamson & Venerable
Spring 2015

Dimensional Modeling
• Dimensional Model Basics
• Fact & Dimension Tables
• Star Schema
• Granularity
• Facts and Measures
• Multiple Processes
• Aggregation

Dimensional Modeling
• Dimensional Model Basics
  – OLTP Model for Transactions
  – Master Files vs. Transaction Files?
  – Relational Model vs. Dimensional Model
    • ERDs vs. Star Schema
    • Normalization issues
• Reports & Queries in Relational Models can be complex:
  – Especially when seeking historical comparisons
  – Business Process Measurements (Not in RDBMS)
    • Gross Margins
    • Average Aged A/R balances
    • Inventory levels compared to Sales by Warehouse Location
    • ROI, etc.

Fact & Dimension Tables
Order Entry Process – Table 1.1

  Facts or Measures      Dimensions
  -------------------    --------------------------
  Gross sales dollars    Customer name
  Total cost             Customer state, zip, etc.
  Margin dollars         Date of order
  Quantity sold          Quarter of order
                         Month of order
                         Year of order
                         Salesperson name
                         Sales territory
                         Corporate region
                         Product name
                         Product brand
                         Product category

Dimensional Modeling
• Fact & Dimension Tables
  – Use a Normalization approach to determine the appropriate dimension tables
  – Functional dependencies
    • Customer_key → name, address, city, state, zip
    • Salesperson_key → name, code, territory, region
    • Date_key → date, day_week, day_of_month, etc.
    • Order_key → product_key, order_date_key, salesperson_key, customer_key, order_dollars, extended_cost, margin_dollars, quantity_ordered, order_number, order_line

Dimensions (continued)
• Dimensions represent key characteristics of each fact or transaction
• For example, a sale “fact” would consist of an Item sold, a quantity, and a price
• A required Dimension will thus be PART, which will contain other data about the Item, such as description, weight, size, etc.
• However, we may want to know about other attributes/properties or Dimensions of the Sale, such as Customer, Rep, Location, and Time (represented as Day of Week, etc.)
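
The fact/dimension split above can be sketched as a small star schema. This is only an illustration, assuming Python's built-in sqlite3 module; the table names (customer_dim, salesperson_dim, date_dim, product_dim, order_facts), the sample rows, and the helper functions are hypothetical, while the column names follow the functional dependencies listed on the slide.

```python
import sqlite3

def build_star(conn):
    """Create four dimension tables and one fact table (hypothetical names)."""
    conn.executescript("""
        CREATE TABLE customer_dim (customer_key INTEGER PRIMARY KEY, name TEXT, state TEXT);
        CREATE TABLE salesperson_dim (salesperson_key INTEGER PRIMARY KEY, name TEXT,
                                      territory TEXT, region TEXT);
        CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, full_date TEXT,
                               month INTEGER, quarter INTEGER, year INTEGER);
        CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, name TEXT,
                                  brand TEXT, category TEXT);
        -- Grain of the fact table: one row per order line.
        CREATE TABLE order_facts (
            product_key      INTEGER REFERENCES product_dim(product_key),
            order_date_key   INTEGER REFERENCES date_dim(date_key),
            salesperson_key  INTEGER REFERENCES salesperson_dim(salesperson_key),
            customer_key     INTEGER REFERENCES customer_dim(customer_key),
            order_number     INTEGER,
            order_line       INTEGER,
            quantity_ordered INTEGER,
            order_dollars    REAL,
            extended_cost    REAL,
            margin_dollars   REAL,
            PRIMARY KEY (order_number, order_line)
        );
    """)

def load_sample(conn):
    """Insert a few made-up rows so the query below has something to summarize."""
    conn.executemany("INSERT INTO customer_dim VALUES (?,?,?)",
                     [(1, "Acme", "NY"), (2, "Bolt", "CA")])
    conn.executemany("INSERT INTO salesperson_dim VALUES (?,?,?,?)",
                     [(1, "Lee", "East-1", "East"), (2, "Kim", "West-1", "West")])
    conn.executemany("INSERT INTO date_dim VALUES (?,?,?,?,?)",
                     [(20150105, "2015-01-05", 1, 1, 2015),
                      (20150210, "2015-02-10", 2, 1, 2015)])
    conn.executemany("INSERT INTO product_dim VALUES (?,?,?,?)",
                     [(1, "Widget", "BrandA", "Hardware"),
                      (2, "Gadget", "BrandB", "Hardware")])
    conn.executemany("INSERT INTO order_facts VALUES (?,?,?,?,?,?,?,?,?,?)",
                     [(1, 20150105, 1, 1, 1001, 1, 10, 100.0, 60.0, 40.0),
                      (2, 20150105, 1, 1, 1001, 2, 5, 250.0, 200.0, 50.0),
                      (1, 20150210, 2, 2, 1002, 1, 20, 200.0, 120.0, 80.0)])

def sales_by_region_month(conn):
    """Sum the additive facts, grouped by two dimension attributes."""
    return conn.execute("""
        SELECT s.region, d.month, SUM(f.order_dollars), SUM(f.margin_dollars)
        FROM order_facts f
        JOIN salesperson_dim s ON s.salesperson_key = f.salesperson_key
        JOIN date_dim d        ON d.date_key = f.order_date_key
        GROUP BY s.region, d.month
        ORDER BY s.region, d.month
    """).fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_star(conn)
    load_sample(conn)
    for row in sales_by_region_month(conn):
        print(row)
```

Every report query has the same shape: join the fact table to the dimensions it needs, group by dimension attributes, and sum the facts.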

Star Diagram for Orders
[Figure: star diagram for the Orders process]

Star Schema: Fact Tables
• Primary Keys on Fact Table
  – Foreign Keys
    • Orders [Order_Date_key, Product_key, salesperson_key, customer_key, …]
  – Surrogate Key
    • Orders [Order_ID, Order_Date_key, Product_key, Salesperson_key, Customer_key, etc. …]
  – Avoid Keys from other Systems
• Granularity
  – What level of detail?
    • Individual Transactions (lowest “grain”)
    • Daily summaries, weekly summaries, monthly summaries, etc.
    • Must be uniform and consistent throughout data warehouse for computational purposes

Dimensional Modeling: The Star Schema
• RECALL: Data Warehouses are relatively nonvolatile
  – However, some changes may be necessary
• Most common changes are updates to the FACT tables
  – However, these are generally done in “batches”
• Even the Dimension tables require some changes: Type 1, Type 2, and Type 3 …

Dimensional Modeling
• Changing dimension tables:
  – Type 1 Change
    • Correct an error or a value that has no significance in future analysis
      – E.g., phone number change in Customer dimension
  – Type 2 Change
    • Add new row in dimension table
      – Significant change, such as a new location for customer
  – Type 3 Change
    • Add new dimension or modify schema (add columns)
      – New organization structure for firm; want to be able to compare new and old.
        E.g., International Currency Conversion

Type 1 Example
Susan's Tax Bracket attribute value changes from Medium to High
(Jukić, Vrbsky, Nestorov – Database Systems, Chapter 8)

Type 2 Example
Susan's Tax Bracket attribute value changes from Medium to High (SIMPLE CASE)
(Jukić, Vrbsky, Nestorov – Database Systems, Chapter 8 – Slide 15)

Type 3 Example A
Susan's Tax Bracket attribute value changes from Medium to High
(Jukić, Vrbsky, Nestorov – Database Systems, Chapter 8 – Slide 16)

Type 3 Example B (with timestamps)
Susan's Tax Bracket attribute value changes from Medium to High
(Jukić, Vrbsky, Nestorov – Database Systems, Chapter 8 – Slide 17)

Facts and Measures
• 3 Kinds of Measures
  – Fully additive
    • Facts that can be added equally well across any dimension
    • E.g., gross sales dollars, extended cost, margin dollars, quantity sold
  – Non-additive
    • Facts that cannot be meaningfully added. Generally ratios.
      – E.g., Margin rate
    • Need to break the ratio into its component facts:
      – Margin dollars, sales dollars
  – Semi-additive
    • Facts that can be summed across some dimensions but not others
      – Units sold of same vs. different products
    • E.g., Banking account balances can be totaled at a point in time, but not across time periods.

More Examples of Kinds of Measures
• Fully Additive
  – Most dollar amounts are fully additive.
    • Dollars of all products sold in a time period.
• Non-Additive
  – Percentages and ratios are not additive without appropriate weightings
    • A 10% gain in Investment #1 cannot be added to a 20% gain in Investment #2
• Semi-Additive
  – A bank can add all customer balances together at the end of each month, but not across time.

Dimensional Modeling
• Multiple Processes
  – E.g., Orders vs. Shipments
  – Different processes
    • Payroll vs. Purchasing
  – Different grains or granularity
    • E.g., Time dimension … Very Important!!
      – Hour (time of day?)
        » Scalar function performed on computer clock
      – Day
      – Week
      – Month
      – Year

Multiple Processes: Orders & Shipments
Data Marts are subsets of the Data Warehouse:
E.g., Orders and Shipments are separate data marts

Dimensional Modeling
• Aggregation (Σ)
  – As the size of the Data Warehouse grows, we may need to aggregate data in order to improve query performance
  – Reduces granularity!
    • E.g., Change from Daily to Monthly Summaries
      – Keep same Dimension Tables
      – Time table may be modified
      – Fact table keeps same types of attributes as before
        » BUT quantities now represent monthly summarizations of data.

Aggregate Table
[Figure callouts: Month_date_key; De-normalize from Day to Month; Adjust primary key accordingly]

Becker’s Rule #2: Materializing Object Views (MOV)
• 1. Objects
• 2. Views
• 3. Materialize Views
[Diagram labels: Reports, Forms – Layouts, formats, etc.; Tables]

Gartner Magic Quadrant: BI & Analytics Platforms
[Figure]

Summary
• Dimensional Model Basics
• Fact & Dimension Tables
• Star Schema
• Granularity
• Data Changes: Type 1, Type 2, Type 3
• Facts and Measures: Additive, Non-additive, Semi-additive
• Multiple Processes
• Aggregation
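
As a rough sketch of the Aggregation slide above, the snippet below rolls daily facts up to a monthly grain. It is an illustration only: the sample rows, the 'YYYY-MM' month key, and the function names (aggregate_monthly, margin_rate) are assumptions, not taken from the course material. It also shows why a non-additive measure such as margin rate is recomputed from its additive components after aggregation rather than averaged.

```python
from collections import defaultdict

# Hypothetical daily-grain facts:
# (date 'YYYY-MM-DD', product_key, sales_dollars, margin_dollars)
daily_facts = [
    ("2015-01-05", 1, 100.0, 40.0),
    ("2015-01-20", 1, 300.0, 60.0),
    ("2015-02-10", 1, 200.0, 80.0),
]

def aggregate_monthly(rows):
    """Sum the additive facts to a (month, product) grain."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for date, product_key, sales, margin in rows:
        month_key = date[:7]  # assumed month key format: 'YYYY-MM'
        bucket = totals[(month_key, product_key)]
        bucket[0] += sales
        bucket[1] += margin
    return {key: tuple(values) for key, values in totals.items()}

def margin_rate(sales, margin):
    """Non-additive ratio, derived from additive components."""
    return margin / sales if sales else None

if __name__ == "__main__":
    monthly = aggregate_monthly(daily_facts)
    for (month, product), (sales, margin) in sorted(monthly.items()):
        print(month, product, sales, margin, margin_rate(sales, margin))

    # Averaging the two January daily rates gives a different (misleading)
    # figure than dividing total margin by total sales.
    january = [r for r in daily_facts if r[0].startswith("2015-01")]
    naive = sum(margin_rate(r[2], r[3]) for r in january) / len(january)
    print("naive average of daily rates:", naive)
```

The monthly table keeps the same kinds of facts as the daily one; only the time key and the meaning of each quantity (now a monthly total) change.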