Data Warehousing (Kimball, Ch. 2-4)
Dr. Vairam Arunachalam
School of Accountancy, MU
Sep. 9, 1999

Grocery Store case terminology
  SKUs – Stock-keeping units
  UPCs – Universal Product Codes
  POS system – Point of Sale system
  Promotions – TPRs, ads in newspapers, newspaper inserts, displays (shelf displays and end-aisle displays), coupons
  Promotion dimension – lift, baseline sales, time shifting, cannibalization, growing the market

The Grocery Store
Steps in the Design Process:
– Choose a business process to model (e.g., daily item movement)
– Choose the grain of the business process (e.g., SKU by store by promotion by day)
– Choose the dimensions applicable to the fact table (e.g., time, product, store, promotion)
– Choose the measured facts (e.g., dollar sales, unit sales, dollar cost, customer count)

Salient Principles
– The data warehouse almost always demands data expressed at the lowest possible grain of each dimension…
– A careful grain statement determines the dimensionality of the fact table.
– The number of sales transaction line items in a business can be estimated by dividing the gross revenue of the business by the average price of a sales item.

Salient Principles (contd.)
– The fact table in a dimensional schema is naturally highly normalized.
– Efforts to normalize any of the tables in a dimensional database solely in order to save disk space are a waste of time.
– The dimension tables must not be normalized but should remain as flat tables. (Because?…)

Salient Principles (contd.)
– Most data warehouses need an explicit time dimension table even though the primary time key may be an SQL date-valued object. (Because?…)
– Drilling down in a data warehouse is adding row headers from the dimension tables. Drilling up is subtracting row headers.
– The product dimension is one of the primary dimensions in nearly every data warehouse.

Normalization review
  1NF: no repeating groups; primary key defined
  2NF: non-key domains functionally dependent on entire primary key
  3NF: no dependencies between non-key domains

Other Issues
  Database sizing
  Domain transfer – design variations
  Additive vs. semi- (or non-additive) dimensions

The Warehouse
Inventory Models:
– The Inventory Snapshot model
– Delivery Status model
– Transaction model

Inventory Snapshot Model
– Fig. 3.2
– Gross Margin Return on Inventory:
  GMROI = [(Qty Ship) * (Value at LSP – Value at Cost)] / [(Daily Avg Qty) * (Value at LSP)]

Delivery Status Model
– Steps:
    Received
    Inspected
    Placed into inventory
    Authorized to sell
    Picked from inventory
    Boxed
    Shipped

Delivery Status Model (contd.)
– Exception Conditions:
    Failed inspection
    Returned to vendor
    Damaged in handling
    Lost
    Returned from customer
    Returned to inventory
    Written off
    Refunded
– Fig. 3.3

Transaction Model
– Includes:
    Receive shipment line item
    Place SKU into inspection hold
    Release SKU from inspection hold
    Place SKU into inspection failed with reason
    Mark SKU for return to vendor with reason
    Place SKU in bin
    Authorize SKU for sale
    Pick SKU from bin

Transaction Model (contd.)
– Includes:
    Package SKU for shipment
    Ship SKU to customer
    Bill customer
    Receive SKU from customer with reason
    Return SKU to inventory from customer return
    Remove SKU from inventory with reason
– Fig. 3.4

Transaction Model (contd.)
– Sample queries:
    How many times have we placed a product into an inventory bin on the same day we have picked the product from the same bin at a different time?
    What is the clustering in time of customer returns of a particular SKU?
    How many separate shipments did we receive from vendor X and when did we get them?
    On which SKUs have we had more than one round of QA inspection failures that caused the return of the product to the vendor?

Transaction Model (contd.)
– Transplant context (e.g., FedEx)
– Compare models

Salient principles
– All measures that record a static level (such as…) are inherently nonadditive across time. However, in these cases the measure may be usefully aggregated across time by averaging over the number of time periods.
– Document control numbers (such as…) usually are presented as degenerate dimensions (i.e., dimension keys with no corresponding dimension table) in fact tables where the grain of the table is the document itself or a line item in the document.

Salient principles (contd.)
– Exceptions to absolute additivity in the fact table can be made where the additive measures are more conveniently delivered in a view. Examples include computed time spans from a large number of date fields, as well as extended monetary amounts derived from unit costs and prices. In such a case, it is important to have all users access the view instead of the underlying table. (Tie to 3NF)

Shipments
  The ideal shipments fact table (Fig. 4.1)
  Typical customer ship-to dimension (Fig. 4.2)
  Typical deal dimension (Fig. 4.3)
  Typical ship mode dimension (Fig. 4.4)
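
Worked example: estimating fact table rows
The sizing rule from the Salient Principles slide (line items = gross revenue / average item price) can be sketched in a few lines of Python. The function name and the revenue and price figures below are hypothetical and chosen only to make the arithmetic easy to follow; they do not come from the grocery case.

```python
def estimate_line_items(gross_revenue: float, average_item_price: float) -> int:
    """Estimate the number of sales line items as revenue / average price."""
    if average_item_price <= 0:
        raise ValueError("average_item_price must be positive")
    return round(gross_revenue / average_item_price)


# Hypothetical figures for illustration only.
annual_revenue = 4_000_000_000.0  # assumed gross revenue in dollars
average_price = 2.0               # assumed average price of one sales item

print(estimate_line_items(annual_revenue, average_price))
```

The result is only an upper-level estimate of transaction line items; the row count of a fact table at a coarser grain (e.g., SKU by store by promotion by day) would be smaller.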
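
Worked example: drilling down by adding row headers
The drill-down principle (adding a row header from a dimension table) can be illustrated with a tiny star schema in SQLite. This is a minimal sketch, not the schema from the text: the table names, column names, and sample rows are all assumptions, and the fact table is reduced to a single measure.

```python
import sqlite3

# Hypothetical miniature star schema: one fact table, two flat dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, brand TEXT, category TEXT);
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE sales_fact  (time_key TEXT, product_key INTEGER, store_key INTEGER,
                          dollar_sales REAL);
INSERT INTO product_dim VALUES (1, 'BrandA', 'Snacks'), (2, 'BrandB', 'Snacks'),
                               (3, 'BrandC', 'Dairy');
INSERT INTO store_dim VALUES (1, 'East'), (2, 'West');
INSERT INTO sales_fact VALUES ('1999-09-09', 1, 1, 10.0), ('1999-09-09', 1, 2, 5.0),
                              ('1999-09-09', 2, 1, 7.0),  ('1999-09-09', 3, 2, 8.0);
""")

# Whitelist of dimension attributes that may be used as row headers.
ROW_HEADERS = {"category": "p.category", "brand": "p.brand", "region": "s.region"}


def sales_by(connection, *row_headers):
    """Sum dollar sales grouped by the requested dimension attributes."""
    if not row_headers:
        raise ValueError("at least one row header is required")
    cols = ", ".join(ROW_HEADERS[h] for h in row_headers)
    sql = (
        f"SELECT {cols}, SUM(f.dollar_sales) "
        "FROM sales_fact f "
        "JOIN product_dim p ON p.product_key = f.product_key "
        "JOIN store_dim s ON s.store_key = f.store_key "
        f"GROUP BY {cols} ORDER BY {cols}"
    )
    return connection.execute(sql).fetchall()


print(sales_by(conn, "category"))           # summary level
print(sales_by(conn, "category", "brand"))  # drill down: one more row header
```

Drilling up is the reverse: calling `sales_by` with one header fewer. Because the attributes live in flat dimension tables, each extra row header is just one more column in the SELECT and GROUP BY lists.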
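
Worked example: GMROI
The GMROI formula from the Inventory Snapshot Model slide can be checked with a small calculation. The sketch below assumes that "Value at LSP" and "Value at Cost" are per-unit values (latest selling price and cost); that reading, the function name, and the sample numbers are assumptions made for illustration.

```python
def gmroi(qty_shipped: float, value_at_lsp: float,
          value_at_cost: float, daily_avg_qty: float) -> float:
    """GMROI = [qty shipped * (LSP - cost)] / [daily avg qty * LSP]."""
    denominator = daily_avg_qty * value_at_lsp
    if denominator == 0:
        raise ValueError("daily_avg_qty and value_at_lsp must be non-zero")
    return qty_shipped * (value_at_lsp - value_at_cost) / denominator


# Hypothetical SKU: 1000 units shipped, sells at $2.00, costs $1.50,
# with 100 units on hand on an average day.
print(gmroi(qty_shipped=1000, value_at_lsp=2.0, value_at_cost=1.5, daily_avg_qty=100))
```

A higher value means more gross margin earned per dollar of inventory held. Note that GMROI is a ratio, so it cannot be summed across SKUs or stores; the numerator and denominator have to be aggregated separately and then divided.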
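
Worked example: averaging a static level across time
The principle that static-level measures are nonadditive across time, but can be averaged over the number of time periods, can be shown with a few inventory snapshot rows. The data and the function name below are hypothetical.

```python
from collections import defaultdict

# Hypothetical daily snapshots for one SKU: (day, store, quantity on hand).
snapshots = [
    ("1999-09-06", "store_A", 40), ("1999-09-06", "store_B", 60),
    ("1999-09-07", "store_A", 30), ("1999-09-07", "store_B", 50),
    ("1999-09-08", "store_A", 20), ("1999-09-08", "store_B", 40),
]


def average_level_over_time(rows):
    """Sum the level across stores per day, then average over the days."""
    per_day = defaultdict(int)
    for day, _store, qty in rows:
        per_day[day] += qty  # adding across stores on the same day is valid
    if not per_day:
        raise ValueError("no snapshot rows supplied")
    return sum(per_day.values()) / len(per_day)


naive_total = sum(qty for _day, _store, qty in snapshots)
print(naive_total)                         # summing across days overstates the level
print(average_level_over_time(snapshots))  # average quantity on hand per day
```

The naive total counts the same stock once for every day it sat on the shelf, which is why quantity on hand is additive across stores and products but not across time.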