Dimensional Modeling Overview Agenda ■ Dimensional Modeling Overview ■ Dimensional Modeling Steps ■ Dimensional Modeling Framework ■ Retail Sales - Case Study ■ Dimensional Modeling Tips Dimensional Modeling Overview From Requirement to Data Design ■ The requirements definition completely drives the Data design for the DW. ■ Data design consists of putting together the data structures. ■ A group of data elements form a data structure. ■ Logical data design includes determination of the various data elements that are needed and combination of the data elements into structures of data and also establishing relationships among the data structures. Information Package Rqts. Definition Document Requirements Gathering Dimensional Model Data Design Design Decisions ■ Choosing the process. Selecting the subjects from the information packages for the first set of logical structures to be designed. ■ Choosing the grain. Determining the level of detail for the data structures. ■ Identifying and conforming the dimensions. Choosing the business dimensions (such as product, market, time, etc.) to be included in the first set of structures . ■ Choosing the facts. Selecting the metrics or units of measurements (such as product sale units, dollar sales, dollar revenue, etc.) included in set of structures. ■ Choosing the duration of the database. Determining how far back in time you should go for historical data. Dimensional Modeling Overview ■ What is Dimensional Modeling ♦ It is a logical design technique to structure the business dimensions and the metrics that are analyzed along these dimensions. ♦ A logical design technique that seeks to present the data in a standard framework that is intuitive and allows for high performance access ♦ It is inherently dimensional and adheres to a discipline that uses relational model with some important restrictions ♦ The fundamental idea of dimensional modeling is that nearly every type of business data can be represented as a kind of cube of data ♦ The model has also proved to provide high performance for queries and analysis Dimensional Modeling Overview ■ Components of Dimensional Model ♦ Fact Table ► The fact table contains facts or measurements of the business ♦ Dimension Table ► The dimension tables contain textual attributes that describe the facts Dimensional Modeling Overview Sample Report Translation Sales Rep Performance Report Central Region Jul-00 Aug-00 (Dollars) (Dollars) Chicago District 879 878 Adams 345 456 Brown 564 565 Frederickson 657 768 Minneapolis District 890 789 Andersen Smith 909 978 4244 4434 Central Region Total “Dimension” Report,row and column heading “Facts” Numeric report values Dimensional Modeling Overview ■ Dimension Tables ♦ Dimension Tables are the entry points into the data warehouse ♦ Dimension tables are designed especially for selection and grouping under a common head ♦ Determine contextual background for facts ♦ Parameters for OLAP ♦ Common Dimensions ► Date ► Product ► Location/Region ► Customers The dimensional model to represent the information contained in the information package, the data structure must be represent with Metrics, business dimensions and attributes for each business dimension. Dimensional Modeling Overview ■ Dimension Tables ♦ Dimension Table Characteristics ► Serve as report labels and query constraints • “By” words • “Where” clauses ► Provide Descriptive Information • Minimal codes • Embedded meaning as attributes ► Represent hierarchical relationships Let see Product business dimension example, When we want to analyze the fact by products. ITEM KEY Item Description Item Size Package Type Category Dimensional Modeling Overview Product Dimension Table with Sample Rows Product Key Product Desc. Brand Desc. Class Desc. Size Product Key Product Desc Brand Desc Class Size 0001 CHEERIOS 10 OZ CHEERIOS Family 10 OZ 0002 CHEERIOS 24 OZ CHEERIOS Family 24 OZ 0003 LUCKY CHARMS 10 OZ LUCKY CHARMS Kids 10 OZ Big Dimensions Retail_Fact time_Key Customer Dimension Store_Key Customer_key Customer_Key Customer_ID (natural key) Product_key Customer_name Sales_dollars Customer_address Units_sold Date_of_birth Age Gender Annual_Income Number_of_children Marital_status Other_attributes... Big Dimensions Customer Dimension Customer_key Customer_ID Customer Dimension Customer_key Customer_ID (natural key) Customer_name Customer_address Date_of_birth Customer_name Customer_address Date_of_birth Age Customer Demographics Dimension Customer_demographics_key Age_Band Gender Annual_Income Gender Income_Band Number_of_children Number_of_children Marital_status Marital_status Other_attributes... Dirty dimensions ■ A dirty dimension is the one in which data quality cannot be guaranteed ♦ Data about the same customer can appear multiple times Customer Dimension Customer_key Customer_ID (natural key) Customer_name Customer_address Date_of_birth Age Gender Annual_Income Number_of_children Marital_status Other_attributes... Hierarchies in Dimensions ■ Multiple Hierarchies ♦ Dimension tables can represent multiple hierarchies roll-ups ♦ For example ,Store Dimension could have ♦ the following hierarchies ► Physical Geography • Zip, City, County, State, Country ► Sales Organization • District, Region, Zone ► Distribution Roll-up • Distribution Center, Distribution Center Region Store Dimension Store_key Store_description Store_type Zip City State Sales_region Sales_zone Distribution Center Distribution Center Region Hierarchies in Dimensions Year Region Quarter State Month District Region Category Sales Zone Brand Product Week Date City Dimension Tables can represent multiple hierarchical roll-ups Dimensional Modeling Overview ■ Fact Table ♦ The fact table is where the numerical measurements of the business are stored ♦ Facts ► The detail information in a Fact tables • For Examples: Sales Quantity, Unit Sales Price, Sales Amount etc. ► ► ► Key performance indicators of the business Numeric in Nature Analyzed across the dimensions ♦ Multi-part key Sales Fact DATE KEY ITEM KEY STORE KEY PROMOTION KEY ► Foreign keys to dimension tables POS TRXN# ► Date is always a key Sales Quantity Unit Sales Price Sales $ Amount Dimensional Modeling Overview ■ So far we have formed fact table and dimension tables. ■ How should these tables be arranged in the dimensional model? ■ What are the relationships and how should we mark the relationships in the model? ■ The dimensional model should primarily facilitate queries and analyses. What would be the types of queries and analyses? ■ Before combining these tables in dimensional model. What are the requirements? The model should provide the best data access. The whole model must be query-centric. It must be optimized for queries and analysis. The model must show that the dimension tables interact with the fact table. It should structured in such a way that every dimension can interact equally with the fact table. The model should allow drilling down or rolling up along dimension hierarchies. With this rqts., each of the dimension tables are directly relates to fact table in the middle. ■ Such an arrangement in the dimensional model looks like a star formation, with the fact table at the core of a star and the dimension tables along the spikes of the star. ■ The dimensional model is therefore called a STAR schema. ■ See figure: Star like Model Dimension2 Dimension3 Fact Dimension1 Dimensionn STAR schema for AUTO Sales PRODUCT TIME DEALER AUTO SALES PAYMENT METHOD CUSTOMER DEMO GRAPHICS Typical Star model Product Dimension Date Dimension Sales Fact Product_key Date_key Description Product_key Brand Store_key Category Date_key Day_of_week Month Quarter Dollars_sold Store Dimension Year Units_sold Store_key Holiday_flag Dollars_cost Store_name Address Floor_plan_type Star model….. ■ It consists of sales fact table in the middle of schema diagram. It have 3 dimension tables of Date, Product and Store. ■ The user will analyze the sales using dollar sold, unit sold and dollar cost. ■ From the STAR schema structure intuitively answers the questions for a given amount of dollars, what was the product sold? Who was the customer? Which store sold the product? When was the order placed? ■ Constraints and filters of queries are easily understood by looking at the star. ■ A common type of analysis is the drilling down the summary numbers to get at the details at the lower levels by filtering queries. Snowflake Design Dim Table Dim Table Fact Table Dim Table Dim Table Low cardinality redundant attributes moved to “sub dimension” tables Snowflake Design ■ Issues ♦ Only few tools optimized for snowflake schema ♦ When the tool is not optimized for snowflake design ► Presentation more complex ► Browsing is slower ► Problems with multiple joins Dimensional Modeling Steps Snowflake Design Dimensional Modeling Steps ■ Identify the Business Process ♦ A major operational process that is supported by some kind of legacy system(s) from which data can be collected for the purpose of the data warehouse ♦ Example: orders, invoices, shipments, inventory, sales ■ Identify the Grain ♦ The fundamental lowest level of data represented in a fact table for the business process ♦ Example: individual transactions, individual daily snapshots ■ Identify the Dimensions ♦ Choose the dimensions that will apply to each fact table record ■ Identify the Facts ♦ Choose the measured facts that will populate each fact table record Dimensional Modeling Steps Key Inputs Business Requirements Data Reality Dimensional Modeling Steps 1. 2. 3. 4. Identify the business Process Identify the Grain Identify the Dimensions Identify the Facts Output Dimensional Model Resist the temptation to model data by looking copy books alone Conformed Dimension Shared Dimensions Must Conform ■ Option 1: Identical dimensions with the same keys, labels, definitions and values Sales Schema Inventory Schema Item Key DATE KEY Item Desc. ITEM KEY Brand Desc. STORE KEY Category PROMO KEY .. Sales Fact Item Key DATE KEY Item Desc. ITEM KEY Brand Desc. Category .. STORE KEY Inventory Fact Conformed Dimension ■ Option 2: “Subset” of base dimension Sales Schema Item Key DATE KEY DATE KEY Item Desc. ITEM KEY Day-of-week Brand Desc. STORE KEY Week Desc Category PROMO KEY Month Desc Desc. Sales $ .. Item key Item Desc 0001 Cheerios 10oz Forecast Schema Brand Desc Cheerios Category Desc Cereal Brand Key Month Key Month KEY Brand Desc. Brand Key Month Desc Category Estimate Desc. Sales $ .. Brand key 1001 Brand Desc Cheerios Category Desc Cereal Slowly Changing Dimensions ■ Dimension attributes evolve over time ♦ For example, customers change their names, move, have children, adjust their Incomes ■ For every dimension attribute, need to identify “Changes” strategy ♦ May use combination of strategies within same dimension table Slowly Changing Dimensions Type 1 : Overwrite the changed attributes Original record Item Key Item Desc Dept 12345 Sim City 3000 Educational S/W Updated record Item Key Item Desc Dept 12345 Sim City 3000 Strategy S/W Slowly Changing Dimensions Type 2 : Add a New Dimension Record Original record Item Key Item Desc Dept 12345 Sim City 3000 Educational S/W Additional record Item Key Item Desc Dept 12345 Sim City 3000 Strategy S/W Slowly Changing Dimensions Type 3 : Add a “Prior” Attribute Original record Item Key Item Desc Dept 12345 Sim City 3000 Educational S/W Updated record Item Key Item Desc Dept 12345 Sim City 3000 Strategy S/W Prior Dept Educational S/W Slowly Changing Dimensions ■ Data Warehouse Keys ie., STAR schema keys ♦ All tables (facts and dimensions) should use Data Warehouse generated surrogate keys ♦ It is possible that the customer no., of discontinued customers are reassigned to new customers. We will have a problem because the same customer no,. Could relate to the data for the newer customer and also to the data of the retried customer. Therefore do not use such keys as a primary keys for dimensional tables. ♦ The surrogate keys are simply system generated sequence numbers. ♦ Each row in a dimension table is identified by a unique value of an attribute designated as the primary key of the dimension. ♦ Each dimension table is in 1:M relationship with central fact table. So the primary key of each dimension table must be a foreign key in the fact table. Additive Measures ■ The ability of measures to be added across all dimensions of the fact table. ■ Measures could be fully additive, semi additive or non additive ♦ Fully Additive - The values of the attributes summed up by simple addition, Aggregation is a fully additive measures is done by simple addition. Sales Quantity, Revenue ♦ Semi Additive - Account Balance, Inventory, number of customers (Measure of Intensity, head counts) ♦ Non-Additive - Profit margin (Ratios and Percentages) i.e., Ratio or Percentages should not be added 1:3, 30%, etc. Factless Fact tables to track attendance of students Time Dimension time_key time attributes….. Student Dimension Student Attendance Fact Student_key time_Key Student attributes….. Course Dimension Student_Key Course_key Course_Key Course attributes….. Faculty_Key Facility_key Faculty Dimension Facility Dimension Faculty_key Facility_key Faculty attributes….. Facility attributes….. Factless Fact tables – Coverage tables Sales Fact (revisited) Date Dimension Product Dimension Date_key Product_key Day_of_week Month Quarter Sales Fact Description Date_key Brand Product_key Category Year Holiday_flag Promotion Dimension Promotion_key Promotion_key Store_key Store Dimension Dollars_sold Store_key Discount Units_sold Store_name Event Dollars_cost Promotion_name Address Floor_plan_type Factless Fact tables – Coverage tables Date Dimension Product Dimension Date_key Product_key Day_of_week Month Quarter Sales Fact Date_key Year Holiday_flag Promotion Dimension Promotion_key Product_key Description Brand Category Promotion_key Store_key Promotion_name Store Dimension Discount Store_key Event Store_name Address Floor_plan_type Dimensional Modeling Framework Dimensional Modeling Framework Identify Subject Area, Grain Conceptual Level Identify Major Dimension & Facts, Conform Dimensions across Facts Level of detail Detail Facts with Measures •Detail Dimensions with Hierarchies &Attributes •Slowly changing Dimensions Policies Logical Level Source-Data Model Mapping Pre-calculations, Aggregates, Indexes, Data Structures, Source-Physical Model Mapping Physical Level Dimensional Modeling Framework ■ STEP1: Choose the process ♦ Chose a process or subject area from the list of subject areas identified ♦ Examples of this could be Sales Analysis, Strategic Sourcing, Human Resources ■ STEP2 : Choose the Grain ♦ Choose the level of detail ♦ Every data mart / warehouse should be based on the most granular (atomic) data that can possibly be collected and stored. Dimensional Modeling Framework ■ STEP3 : Identifying Dimensions & Dimension Hierarchy Year Region Category Quarter State Brand Month District Product Week Date City Dimensional Modeling Framework ■ STEP4 : Choose the Measures ♦ Identify all the measures for the fact table ■ STEP5 : Conforming the dimensions ♦ Common dimensions across the Facts/ data marts have to be exactly same or subset of the main dimension table Dimensional Modeling Framework ■ STEP6 : Adding Attributes to Dimension Tables ♦ Enhance the depth of analysis ♦ Examples Customer Age, Address, Profession, Product color, flavor, product size, packaging type etc. Regional Manager Year Region Category Quarter State Brand Month Week District Product Date City Sequence CurrentFlag Phone Address Size Color Dimensional Modeling Framework ■ STEP7 : Storing Pre-calculations in the Fact table ♦ Calculated based on one or more base measures ■ STEP8 : Choosing the Duration of the Database ♦ Need for analyzing the data over a period of time Dimensional Modeling Framework ■ STEP9 : Track Slowly Changing Dimensions ♦ Certain kinds of dimension attribute changes need to be handled differently in Data Warehouse ► Type I – Overwrite ► Type II - History ► Type III – Add new column example :- Organizational changes Retail Sales – Case Study Retail Sales - Case Study ■ Background ♦ Chain consists of over 100 grocery stores in five states ♦ Stores average 60,000 SKUs in departments such as frozen foods, dairy etc. ♦ Bar codes are scanned directly into the cash registers PoS system ♦ Items are promoted via coupons, temporary price reductions, ads and in-store promotions ■ Analytical Requirements ♦ Need to know what is selling in store each day in order to evaluate product movement, as well as to see how sales are impacted by promotions ♦ Need to understand the mix of items in a consumer market basket Retail Sales - Case Study ■ Identify the Business ■ Sales Process ■ Identify the Grain ■ Identify the Dimensions ■ Transaction Item (Daily Sales) ■ Date, Location, Item, Promotion ■ Identify the Measures ■ Quantity, Price, Amount Retail Sales - Case Study DATE KEY Date Description Week Month Quarter Year STORE KEY ITEM KEY DATE KEY ITEM KEY STORE KEY PROMOTION KEY POS TRXN# Sales Quantity Unit Sales Price Sales $ Amount Store Name City District Zone Item Description Item Size Package Type Category PROMOTION KEY Promotion Description Discount Resultant Sales Schema Retail Sales - Case Study Time Dimension Date Key Date Day of Week Day Number in Month Month Quarter Year Holiday Indicator 1 1/1/1999 Friday 1 January Q1 1999 Holiday 2 1/2/1999 Saturday 2 January Q1 1999 Non-Holiday 3 1/3/1999 Sunday 3 January Q1 1999 Non-Holiday 4 1/4/1999 Monday 4 January Q1 1999 Non-Holiday SKU Number Dept Size Package Type Brand Category Item Dimension Itemkey Item Description 1 Lasagna 6 OZ 90706287103 Grocery 6 OZ Box Cold Gourmet Frozen Foods 2 Beef Stew 6 OZ 16005393282 Grocery 6 OZ Box Cold Gourmet Frozen Foods 3 Extra Nougat 2 OZ 46817560065 Grocery 6 OZ Can Chewy Candy Retail Sales - Case Study Promotion Dimension Promo Key Promo Name Price Reduction Ad Type Media Type Promo $ Begin Date End Date 1 Blue Ribbon Discounts Temporary Daily Paper Paper 2000 1/1/1999 1/15/1999 2 Red Carpet Closeout Markdown Sunday Paper Paper 1000 1/3/1999 1/10/1999 3 Ad Blitz None Paper and Radio Paper and Radio 7000 1/15/1999 1/30/1999 Sales Fact Date Key Item Key Store Key Promo Key POS Trxn # Sales Qty Unit Sales Price Sales $Amt 1 1 1 15 763457893 1 4.59 4.59 1 2 1 1 763457893 2 0.89 1.78 1 5 11 19 763457894 1 2.56 2.56 2 13 5 8 763457923 1 0.33 0.33 2 5 11 12 763457998 1 1.29 1.29 Dimensional Modeling Tips Dimensional Modeling Tips ■ Carefully choose the labels to identify data marts, dimension, attributes and facts ■ An attribute can live in one and only one dimension, whereas a fact can be repeated in multiple fact tables ■ If a single dimension appears to reside in more than one places, several roles are probably being played. Name the roles uniquely and treat them as separate dimensions ■ A single field in the underlying source data can have one or more logical columns associated with it ♦ E.g., A product attribute field may translate to product code, product short description, and product long description ■ Every fact should have a default aggregation rule (sum, min, max, latest, semi-additive, special algorithm, and not aggregatable) ♦ This will serve as a requirements list for query and report writers tools evaluations Thank You ■ References ♦ Ralph Kimball ► The Data Warehouse Toolkit ► The Data Warehouse Lifecycle Toolkit