Book of Lars Frank, Chapter 10, SCD (Slowly Changing Dimensions):
The hidden slides of this slideshow may be important. However, I will focus on learning by exercises, and therefore new concepts are often only listed in hidden slides.

Introduction to Slowly Changing Dimensions (SCD):
If the attributes of a dimension are dynamic (i.e. they may be updated), we say that they are slowly changing.
May the Branch-size of a Branch-office change, e.g. after a renovation?
May the Branch-name of a Branch-office change?
Fact table Bank accounts - Account# - Interest-last-year - Cost-last-year - Branch#
Dimension Branch-offices - Branch# - Branch-name - Branch-size

Exercise in SCD:
Suppose the attribute Branch-size is dynamic and aggregations are made to the level (Branch-size, Year) or (Branch-size, Month). Does this aggregation make sense, and how would you solve possible problems?
Fact table Bank accounts - Account# - Interest-last-year - Cost-last-year - Branch#
Dimension Branch-offices - Branch# - Branch-name - Branch-size

Exercise in SCD:
Suppose the attribute Branch-name is dynamic and aggregations are made to the level (Branch-name, Year). Does this aggregation make sense, and how would you solve possible problems?
Fact table Bank accounts - Account# - Interest-last-year - Cost-last-year - Branch#
Dimension Branch-offices - Branch# - Branch-name - Branch-size

Problems with slowly changing dimensions:
• If you do not update a dynamic attribute, the data warehouse is stale.
• If you update a dynamic attribute, the old measures may be aggregated to a wrong attribute value, e.g. a wrong Branch-office size!

Fact table - TimeID - Branch Office - ProductID - … - Amount
Time - TimeID - Dayname - Week - Month - Quarter - Year - Day no - Working day
Branch Office - Branch Office - Address - City - District - Size group - Value group
Product - ProductID - Product name - Price - Product group - Price category
Which dimension attributes and relationships may be slowly changing, and which of these give aggregation problems?
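The second problem listed above can be made concrete with a minimal Python sketch. The data are hypothetical (one branch office, one fact per year, invented amounts and sizes), and the dictionary layout is only an illustration, not a recommended warehouse design. It shows how overwriting Branch-size moves an old measure to the new size value.

```python
from collections import defaultdict

# Hypothetical data: one branch office and one fact row per year.
branch_dim = {"001": {"name": "Centre", "size": 250}}
facts = [
    {"branch": "001", "year": 2020, "amount": 2000},
    {"branch": "001", "year": 2021, "amount": 3500},
]

def totals_by_size_and_year(fact_rows, dim):
    """Aggregate amounts to the level (Branch-size, Year)."""
    totals = defaultdict(int)
    for fact in fact_rows:
        size = dim[fact["branch"]]["size"]
        totals[(size, fact["year"])] += fact["amount"]
    return dict(totals)

# Aggregation made while the branch still has size 250 (only the 2020 fact exists).
before = totals_by_size_and_year(facts[:1], branch_dim)

# The branch is renovated and the size is overwritten in the dimension.
branch_dim["001"]["size"] = 450

# The 2020 amount is now reported under size 450, although it was earned at size 250.
after = totals_by_size_and_year(facts, branch_dim)

print(before)
print(after)
```

In this sketch the 2020 amount first appears under size 250 and later under size 450, which is the wrong aggregation the slide warns about.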
Response type and evaluation criteria:

Response 1, where dimension records are overwritten
- Is historical information preserved: No
- Aggregation performance: In the evaluation, we define this solution to have average performance
- Storage consumption: Only the current dimension record version is stored. No redundant data is stored

Response 2, where new versions are created
- Is historical information preserved: Yes
- Aggregation performance: Version records make performance slower in proportion to the number of changes
- Storage consumption: All old versions of dimension records are stored, often with redundant attributes

Response 3, where only one historical version is saved
- Is historical information preserved: The current version and a single history-destroying version are saved
- Aggregation performance: No performance degradation occurs if either the current or the historical version is used in a query
- Storage consumption: Normally, only a single extra attribute version is stored

Response 4, which uses the top of a dynamic dimension hierarchy as a new static dimension
- Is historical information preserved: Yes
- Aggregation performance: Better or worse depending on whether both dimension tables are used in a query
- Storage consumption: The relatively large fact table must have an extra foreign key attribute

Response 5, with dimension data as fact data
- Is historical information preserved: Yes
- Aggregation performance: Better or worse depending on whether the new fact data are used in a query
- Storage consumption: The relatively large fact table must have an extra attribute for each dynamic dimension attribute

Response 6, which uses fine granularity in combination with response 1 or 3
- Is historical information preserved: The finer the granularity, the more historical state information is preserved
- Aggregation performance: The finer the granularity, the slower the performance
- Storage consumption: The finer the granularity, the more storage consumption

Response 7, which stores dynamic dimension data as static facts in another data mart
- Is historical information preserved: Yes
- Aggregation performance: Better or worse depending on whether both fact tables are used in a drill-across query
- Storage consumption: This is the most storage-consuming solution, as at least a new fact and a foreign key are stored in the new fact table

Kimball’s type 1 response: Overwrite the old value:
Figure 3.2
Bank account Fact - Account-ID - Time-ID - Branch-ID - Interest-last-month - Cost-last-month
Branch-office Dimension - Branch-ID - Branchname
Time Dimension - Time-ID - Monthname

Response 1 used with dimension attribute change:
Sales fact table (Bran-ID, …, Quantity) and Branch office dimension (Bran-ID, …, Br-Name) in three successive states:
- State 1: fact rows (001, …, 2000); dimension rows (001, …, Centre)
- State 2: fact rows (001, …, 2000); dimension rows (001, …, West)
- State 3: fact rows (001, …, 2000), (001, …, 3500); dimension rows (001, …, West)

In response 2 you create a new version of the changed record:
Sales fact table (Bran-ID, …, Quantity) and Branch office dimension (Bran-ID, …, Bran-Size) in three successive states:
- State 1: fact rows (001, …, 2000); dimension rows (001, …, 250)
- State 2: fact rows (001, …, 2000); dimension rows (001, …, 250), (002, …, 450)
- State 3: fact rows (001, …, 2000), (002, …, 3500); dimension rows (001, …, 250), (002, …, 450)
How is it possible to aggregate to the physical Branch office level?

Exercise in SCD:
Suppose the attributes Branch-name and Branch-size use response type 1 and 2, respectively, and are changed at the same time. In this situation, how is it possible not to preserve the historic Branch-name information, given that preserving it gives wrong aggregations at the name level?
Fact table Bank accounts - Account# - Interest-last-year - Cost-last-year - Branch#
Dimension Branch-offices - Branch# - Branch-name - Branch-size

Exercise:
Which SCD responses would you recommend for the data warehouses designed in the car rental case of slideshow 1?
Customers, Branch offices, Orders, Contracts, Pick up, Reservations, Car return, Cars, Car types, Garage services, Garages

Kimball’s 3 responses to slowly changing dimensions:
1. Overwrite the old value.
2. Create a new dimension record with the new value.
3. Create an extra attribute for the changed dimension value.

Kimball’s type 3 response: Create an extra attribute for the changed dimension relationship.
Suppose the product group of a product may be changed. Does this solution make meaningful aggregations to the two group levels?
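The response 1 and response 2 examples above can be sketched in Python as follows. This is only an illustration under stated assumptions: the dimension is a dictionary keyed by Bran-ID, the function names are invented for this sketch, and `branch_no` is a hypothetical attribute that identifies the physical branch office across versions (one possible way to approach the question about the physical level, not necessarily the intended answer).

```python
from collections import defaultdict

# Dimension rows keyed by Bran-ID. "branch_no" is a hypothetical attribute
# that stays the same for every version of the same physical branch office.
branch_dim = {"001": {"branch_no": "B1", "name": "Centre", "size": 250}}
sales = [{"bran_id": "001", "quantity": 2000}]

def respond_type1(dim, bran_id, attribute, value):
    """Response 1: overwrite the old value in place; the old value is lost."""
    dim[bran_id][attribute] = value

def respond_type2(dim, bran_id, new_id, attribute, value):
    """Response 2: keep the old record and add a new version under a new key."""
    new_row = dict(dim[bran_id])
    new_row[attribute] = value
    dim[new_id] = new_row
    return new_id

def total_by(dim, fact_rows, attribute):
    """Sum Quantity per value of the given dimension attribute."""
    totals = defaultdict(int)
    for fact in fact_rows:
        totals[dim[fact["bran_id"]][attribute]] += fact["quantity"]
    return dict(totals)

# The name changes with response 1, the size with response 2.
respond_type1(branch_dim, "001", "name", "West")
current_id = respond_type2(branch_dim, "001", "002", "size", 450)

# New sales refer to the current version of the branch record.
sales.append({"bran_id": current_id, "quantity": 3500})

by_size = total_by(branch_dim, sales, "size")         # old and new size kept apart
by_branch = total_by(branch_dim, sales, "branch_no")  # physical branch level
by_name = total_by(branch_dim, sales, "name")         # only the current name exists

print(by_size, by_branch, by_name)
```

In this sketch the name is overwritten before the new version is copied, so both versions carry the current name. If the name changed after versions already existed, response 1 would have to overwrite it in every version.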
In response 3, you create a new version attribute:
Order-line fact table (Bran-ID, …, Quantity) and Branch office dimension (Bran-ID, …, Old-Size, New-Size) in three successive states:
- State 1: fact rows (001, …, 2000); dimension rows (001, …, 250, 250)
- State 2: fact rows (001, …, 2000); dimension rows (001, …, 450, 250)
- State 3: fact rows (001, …, 2000), (001, …, 3500); dimension rows (001, …, 450, 250)
Does this solution make meaningful aggregations to the two Size levels?

Response 3 should only be used for a new grouping criterion:
Order-line fact table (Prod-ID, …, Quantity) and Product dimension (Prod-ID, …, Old-group, New-group) in three successive states:
- State 1: fact rows (001, …, 2000); dimension rows (001, …, A), with only one group value
- State 2: fact rows (001, …, 2000); dimension rows (001, …, A, B)
- State 3: fact rows (001, …, 2000), (001, …, 3500); dimension rows (001, …, A, B)
What is the difference between the grouping update and the previous Branch-size update, given that the grouping aggregations work well while the Branch-size aggregations are meaningless?

Suppose the product group of a product may be changed. How would you implement SCD response 2 in this example?
Orderdetail fact - Order-ID - Product-ID - Qty - Price
Product dimension - Product-ID - Group-ID - Product-name
Productgroup dimension - Group-ID - Group-name
Will SCD response 2 make meaningful aggregations if you want to compare product group sales over time?
Will SCD response 3 make meaningful aggregations?

Exercise in when to preserve historic information:
Exchange the Product dimension with a Branch office dimension and the Productgroup dimension with a Branch-Size dimension in the following example!
Orderdetail fact - Order-ID - Product-ID - Qty - Price
Product dimension - Product-ID - Group-ID - Product-name
Productgroup dimension - Group-ID - Group-name
Notice! Whether you want to preserve historic information or not may depend on both the attribute and the business.
Will SCD response 2 make meaningful aggregations if you want to compare the sales of the Branch-Size over time?
Will SCD response 3 make meaningful aggregations?
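The response 3 grouping example above can be sketched in Python as follows. The attribute names `old_group` and `new_group` mirror the Old-group and New-group columns; the function names and the choice to start with an empty old group are assumptions made only for this sketch.

```python
from collections import defaultdict

# Product dimension with one extra attribute for the previous grouping.
# Starting with an empty old group is an assumption of this sketch.
product_dim = {"001": {"old_group": None, "new_group": "A"}}
order_lines = [{"prod_id": "001", "quantity": 2000}]

def respond_type3(dim, prod_id, value):
    """Response 3: move the current group to the old attribute and store the new one."""
    row = dim[prod_id]
    row["old_group"] = row["new_group"]
    row["new_group"] = value

def total_by(dim, fact_rows, attribute):
    """Sum Quantity per value of the given dimension attribute."""
    totals = defaultdict(int)
    for fact in fact_rows:
        totals[dim[fact["prod_id"]][attribute]] += fact["quantity"]
    return dict(totals)

# The product is regrouped from A to B, and a new order line arrives afterwards.
respond_type3(product_dim, "001", "B")
order_lines.append({"prod_id": "001", "quantity": 3500})

by_old_group = total_by(product_dim, order_lines, "old_group")
by_new_group = total_by(product_dim, order_lines, "new_group")

print(by_old_group)
print(by_new_group)
```

In this sketch every order line, old or new, is counted under group A in the old grouping and under group B in the new grouping. Response 3 therefore offers two alternative groupings of the whole history rather than a record of which value was valid when each fact was created, which may help when comparing the grouping case with the Branch-size case.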
Suppose the product group of a product may be updated. Will response type 1 give correct aggregations to the group level if you want to compare product group sales over time?
Orderdetail fact - Order-ID - Product-ID - Qty - Price
Product dimension - Product-ID - Product-name - Group-ID - Group-name

Suppose the product group of a product may be changed. Will the solution below give correct aggregations to the group level if you want to compare product group sales over time?
Orderdetail fact - Order-ID - Product-ID - Group-ID - Qty - Price
Product dimension - Product-ID - Product-name
Productgroup dimension - Group-ID - Group-name

SCD Type 4 may be used in dynamic dimension hierarchies:
Suppose both salary group and product group are dynamic. Does this cause SCD problems?
Figure 2.1
Product Dimension - Product-ID - Product-name - Product-group-name
Order Dimension - Order-ID - Ordertype - …
Dimension Hierarchy
OrderdetailsFact - Product-ID - Order-ID - Date-ID - Salesman-ID - Qty - Price
Time Dimension - Date-ID - Date - Month - Year - Holiday indication
Salesman - Salesman-ID - Salesman-name - Salary-group-ID
Salary-Group - Salary-group-ID - Salary-name - Salary - …

The Type 4 Response: Dynamic relationships in a dimension hierarchy may be related directly to the fact table
Figure 3.1
Order Dimension - Order-ID - Ordertype - …
Product Dimension - Product-ID - Product-name
OrderdetailsFact - Product-ID - Order-ID - Date-ID - Salesman-ID - Salary-group-ID - Product-group-ID - Qty - Price
Product-group Dimension - Product-group-ID - Product-group-name
Time Dimension - Date-ID - Date - Month - Year - Holiday indication
Salesman Dimension - Salesman-ID - Salesman-name - …
Salary-Group Dimension - Salary-group-ID - Salary-name - Salary

SCD Type 5 stores dynamic attributes in the fact table:
Dimension Orders - Order# - Ordertype
Dimension Products - Product# - Product-name - Price
Fact table Orderdetails - Product# - Order# - Qty - Date# - Salesman#
Dimension Salesmen - Salesman# - Salesman-name
Dimension Time - Date# - Date-Name

SCD Type 6 Response: Use fine granularity:
Figure 3.2
Bank account Fact - Account-ID - Time-ID - Branch-ID - Interest-last-month - Cost-last-month
Branch-office Dimension - Branch-ID - Branchname
Time Dimension - Time-ID - Monthname

The Type 7 Response: Store the Dynamic Dimension Data as Static Facts in another Mart.
Example: Let us suppose a fact table stores the sale of products in a department store. In this example the department records may have an attribute with the number of salesmen as well as an attribute with the monthly costs of the departments. These attributes are dynamic!
Fact table Orderdetails - Product# - Order# - Qty
Time sheets per day per salesman per department
Products - Product# - Product-name - Price - Group#
Salesmen - Salesman# - Salesman-name
Product groups - Group# - Group-name - Department#
Departments - Department# - Department name - No. of employees - Department costs
Which response type would you recommend?

Exercise: Select responses to SCD for the Airline DW.
Airline companies, Flight routes, Airports, Subroutes, Departures, Tickets, Customers, Travel arrangement

Exercise: Select responses to SCD for the Hotel DW.
Hotel chains, Hotels, Rooms, Room reservations, Services/tours/car rentals, Customer groups, Check-in periods, Customers

Exercise: Select responses to SCD for the travel agency.
Customers, Buyer, Orders, Traveler, Bookings, Reservations, Flight routes/Room types/Car types/service types, Departures/Hotel rooms/Car rentals/etc., Product owners

Exercise: Design a data warehouse for a promotion company.
Customers, Orders, Logical promotions, Order lines, Promotion media, Physical promotions, Presentation blocks/types
How is it possible to measure the results of promotions, and where should these measures be stored in the data warehouse?

Exercise: Design a DW for a commercial TV channel.

HRM exercise: Make some requirements for an HRM system and try to group them into OLTP and OLAP requirements. Make an ER diagram for an OLTP database and one or more OLAP data marts that can fulfill the requirements.

Exercise: Design a data warehouse for a bank. It should be possible to analyze both costs and revenue for customers, households, branch offices, regions, account managers, etc.

Exercise: Design a data warehouse for a housing association that lets out flats, shops and office areas. It is possible to sign up on waiting lists for these.

Exercise: Design a data warehouse for DSB in order to diminish train delays.

Exercise: Design a data warehouse for stock exchange dealers in a bank.

Kimball’s type 2 response:
Suppose an account shifts Branch relationship in the middle of the month. Will the aggregations be correct, and how will you solve possible problems?
Figure 3.2
Bank account Fact - Account-ID - Date-ID - Branch-ID - Interest-last-month - Cost-last-month
Branch-office Dimension - Branch-ID - Branchname
Time Dimension - Date-ID - Monthname
Can you find more solutions?

Kimball’s type 2 response:
Suppose both the Branch relationship and the Branch-size are dynamic. How can aggregations be correct?
Bank account Fact - Account-ID - Date-ID - Branch-ID - Interest-last-month - Cost-last-month
Branch-office Dimension - Branch-ID - Branchname
Time Dimension - Date-ID - Monthname

End of session. Thank you!