Tips and Tricks for Dimensional Modeling
By Shawn Jackson

Overview
• Set of techniques and concepts used in data warehouse design
• Intended to support end-user queries and is oriented around understandability and performance
• Uses the concepts of facts (measures) and dimensions (context)
• Facts are typically (but not always) numerical values that can be aggregated
• Dimensions are groups of hierarchies and descriptors that define the facts

Star Schema

Snowflake Schema

Kimball University: 10 Essential Rules of Dimensional Modeling (#1-5)
1. Load detailed atomic data into dimensional structures
   • Store data at the lowest grain
   • Use summary tables/views to improve performance as necessary
2. Structure dimensional models around business processes
   • Fact tables should be based on a business event
   • Complement single process fact tables with consolidated fact tables that combine metrics from multiple processes at the same level of detail
3. Ensure that every fact table has an associated date dimension table
4. Ensure that all facts in a single fact table are at the same grain or level of detail
5. Resolve many-to-many relationships in fact tables

Kimball University: 10 Essential Rules of Dimensional Modeling (#6-10)
6. Resolve many-to-one relationships in dimension tables
7. Store report labels and filter domain values in dimension tables
   • Don't store codes and descriptions in the fact table
   • Make sure the full description of the code is in the dimension table
8. Make certain that dimension tables use a surrogate key
9. Create conformed dimensions to integrate data across the enterprise
   • Date dimension is a common example
   • Single version of the truth
10. Continuously balance requirements and realities to deliver a DW/BI solution that's accepted by business users and that supports their decision-making

Slowly Changing Dimensions
• Type 0
• Type 1
• Type 2
• Type 3
• Type 4
• Type 6

SCD Type 0
• Rows are added but never changed
• Missing true business / natural key
• Typically are only used in derived dimensions
• Type 0 attributes are more common

Supplier Key  Supplier Name
123           Acme Supply Co
124           Acme Supply Company

SCD Type 1
• Rows can be updated or added based upon business key
• Historical information is not tracked

Supplier_Key  Supplier_Code  Supplier_Name   Supplier_State
123           ABC            Acme Suply Co   CA

Supplier_Key  Supplier_Code  Supplier_Name   Supplier_State
123           ABC            Acme Supply Co  CA

Supplier_Key  Supplier_Code  Supplier_Name   Supplier_State
123           ABC            Acme Supply Co  IL

SCD Type 2
• Rows are only added
• A version number or effective dates are used to keep track of history

Supplier Key  Supplier Code  Supplier Name   Supplier State  Start Date   End Date
123           ABC            Acme Supply Co  CA              01-Jan-2000  21-Dec-2004
124           ABC            Acme Supply Co  IL              22-Dec-2004

SCD Type 3
• Rows are updated but not added
• Historical information is preserved through extra columns

Supplier Key  Supplier Code  Supplier Name   Original / Prior Supplier State  Effective Date  Current Supplier State
123           ABC            Acme Supply Co  CA                               22-Dec-2004     IL

SCD Type 4
• Combination of type 1 and type 2 dimensions
• Rows are updated in the type 1 table and added in the type 2 table

Supplier
Supplier_Key  Supplier_Code  Supplier_Name   Supplier_State
123           ABC            Acme Supply Co  IL

Supplier History
Supplier HistKey  Supplier Key  Supplier Code  Supplier Name   Supplier State  Start Date   End Date
1001              123           ABC            Acme Supply Co  CA              01-Jan-2000  21-Dec-2004
1002              123           ABC            Acme Supply Co  IL              22-Dec-2004

SCD Type 6 / hybrid
• Combines type 1, 2 and 3 in one table

Supplier Key  Supplier Code  Supplier Name   Current State  Prior State  Start Date   End Date     Current Flag
123           ABC            Acme Supply Co  NY             CA           01-Jan-2000  21-Dec-2004  N
124           ABC            Acme Supply Co  NY             IL           22-Dec-2004  03-Feb-2008  N
125           ABC            Acme Supply Co  NY             NY           04-Feb-2008               Y

Roleplaying Dimensions
• Recycled for multiple applications within the same database
• Date dimension is commonly used (sale date, delivery date)
• Can be used to get different views of data

Roleplaying Example

Factless Fact Tables
• Tracking events
• Many to many joins
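
To make the factless fact table idea concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The student attendance scenario and every table, column, and function name in it (dim_student, dim_class, dim_date, fact_attendance, build_attendance_demo, attendance_by_class) are hypothetical and chosen only for illustration; they do not come from the slides. The fact table holds nothing but dimension keys: each row records that an event happened, and the same table resolves the many-to-many relationship between students and classes.

```python
import sqlite3


def build_attendance_demo():
    """Create an in-memory database with a hypothetical attendance model."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_student (student_key INTEGER PRIMARY KEY, student_name TEXT);
        CREATE TABLE dim_class   (class_key   INTEGER PRIMARY KEY, class_name   TEXT);
        CREATE TABLE dim_date    (date_key    INTEGER PRIMARY KEY, calendar_date TEXT);

        -- Factless fact table: foreign keys only, no numeric measures.
        -- One row means "this student attended this class on this date".
        CREATE TABLE fact_attendance (
            date_key    INTEGER REFERENCES dim_date(date_key),
            student_key INTEGER REFERENCES dim_student(student_key),
            class_key   INTEGER REFERENCES dim_class(class_key),
            PRIMARY KEY (date_key, student_key, class_key)
        );
    """)
    conn.executemany("INSERT INTO dim_student VALUES (?, ?)",
                     [(1, "Ana"), (2, "Ben")])
    conn.executemany("INSERT INTO dim_class VALUES (?, ?)",
                     [(10, "Algebra"), (20, "Biology")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?)",
                     [(20240105, "2024-01-05"), (20240108, "2024-01-08")])
    # Illustrative sample events (made-up data).
    conn.executemany("INSERT INTO fact_attendance VALUES (?, ?, ?)", [
        (20240105, 1, 10),
        (20240105, 2, 10),
        (20240105, 1, 20),
        (20240108, 1, 10),
    ])
    return conn


def attendance_by_class(conn):
    """Count attendance events per class; the 'measure' is the row count."""
    return conn.execute("""
        SELECT c.class_name, COUNT(*) AS attendance_count
        FROM fact_attendance AS f
        JOIN dim_class AS c ON c.class_key = f.class_key
        GROUP BY c.class_name
        ORDER BY c.class_name
    """).fetchall()
```

Calling attendance_by_class(build_attendance_demo()) returns one (class_name, count) pair per class. Because there is no stored measure, counting rows is the only aggregation available, which is the defining trait of a factless fact table.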