Chapter 3 Conventional Data Warehouses
Copyright © 2008 Elzbieta Malinowski & Esteban Zimányi

Fig. 3.1. Notation of the MultiDim model
[Panels: (a) Level; (b) Hierarchy; (c) Cardinalities: (0,1), (1,1), (0,n), (1,n); (d) Fact relationship with measures and associated levels; (e) Types of measures: additive, semiadditive, nonadditive, derived; (f) Analysis criterion; (g) Distributing factor; (h) Exclusive relationships]

Fig. 3.2. A conceptual multidimensional schema of a sales data warehouse
[Fact relationship Sales with measures Quantity, Price, and derived /Amount; levels Product, Category, Department, Distributor, Store, Customer, Sector, Profession, and Time, the latter in the roles Order date and Payment date]

Fig. 3.3. Hierarchy classification
[Terms in the diagram: Individual, Simple, Alternative, Parallel, Independent, Dependent, Balanced, Unbalanced, Recursive, Generalized, Noncovering, Strict, Nonstrict; legend: Specialization, Aggregation, Association]
Fig. 3.4. A balanced hierarchy
[(a) Schema: hierarchy Product groups over the levels Product, Category, Department. (b) Examples of instances: department A; categories 1 and 2; products A to D]

Fig. 3.5. An unbalanced hierarchy
[(a) Schema: hierarchy Bank structure over the levels ATM, Agency, Branch, Bank. (b) Examples of instances: bank X; branches 1 to 3; agencies 11, 12, 31, 32; ATMs 111 and 112]

Fig. 3.6. Examples of recursive hierarchies
[(a) Level Employee, hierarchy Supervision, roles subordinate and supervisor. (b) Level Financial entity, hierarchy Bank structure, roles subsidiary and parent]

Fig. 3.7. Entity-relationship representation of customer types
[Entity Customer with the subtypes Person and Company]

Fig. 3.8. A generalized hierarchy
[(a) Schema: hierarchy Customer type over the levels Customer, Type, Sector, Profession, Class, Branch, Area. (b) Examples of instances]
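
The difference between the instances of Fig. 3.4(b) and Fig. 3.5(b) can be checked mechanically. The following is a minimal sketch, not the authors' method: it assumes a hierarchy instance is stored as a child-to-parent dictionary and treats a hierarchy as balanced when every leaf member sits at the same depth. The function names are hypothetical, the assignment of products to categories is read off the figure order, and the parent links of the bank instance follow the Parent fkey column of Fig. 3.35.

```python
from collections import defaultdict


def leaf_depths(parent_of):
    """Map each leaf member to its distance from the root (root = depth 0)."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        if parent is not None:
            children[parent].append(child)

    def depth(member):
        steps = 0
        while parent_of[member] is not None:
            member = parent_of[member]
            steps += 1
        return steps

    # A leaf is a member that never appears as somebody's parent.
    return {m: depth(m) for m in parent_of if m not in children}


def is_balanced(parent_of):
    """True when all leaves are at the same depth."""
    return len(set(leaf_depths(parent_of).values())) == 1


# Instances of Fig. 3.4(b); product-to-category links are an assumption.
product_groups = {
    "department A": None,
    "category 1": "department A",
    "category 2": "department A",
    "product A": "category 1",
    "product B": "category 1",
    "product C": "category 2",
    "product D": "category 2",
}

# Instances of Fig. 3.5(b); parent links as in the table of Fig. 3.35.
bank_structure = {
    "bank X": None,
    "branch 1": "bank X",
    "branch 2": "bank X",
    "branch 3": "bank X",
    "agency 11": "branch 1",
    "agency 12": "branch 1",
    "agency 31": "branch 3",
    "agency 32": "branch 3",
    "ATM 111": "agency 11",
    "ATM 112": "agency 11",
}

if __name__ == "__main__":
    print(is_balanced(product_groups))
    print(is_balanced(bank_structure))
    print(leaf_depths(bank_structure))
```

In the bank instance, branch 2 has no agencies and agencies 12, 31, and 32 have no ATMs, so the leaves end up at three different depths.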
Fig. 3.9. A generalized hierarchy without a joining level
[Schema: hierarchy Publication type; level Publication with the exclusive parent levels Journal, Book, and Proceedings; Proceedings rolls up to Conference]

Fig. 3.10. A noncovering hierarchy
[(a) Schema: hierarchy Warehouse location over the levels Warehouse, City, County, State. (b) Examples of instances: state 1; county 1; cities A to D; warehouses X, Y, Z, K]

Fig. 3.11. A balanced nonstrict hierarchy
[(a) Schema: fact relationship Sales with measure Amount; hierarchy Product groups over the levels Product, Category, Department. (b) Examples of instances: electronics; phone, PDA, MP3 player; mobile phone]

Fig. 3.12. Example of aggregation for a sales amount measure
[(a) Strict hierarchy: aggregation by category gives Phone 170, PDA 100, MP3 100; aggregation by department gives Electronics 370. (b) Nonstrict hierarchy: Phone 170, PDA 200, MP3 200; Electronics 570]

Fig. 3.13. A nonstrict hierarchy with a distributing factor
[Fact relationship Payroll with measure Salary; hierarchy Org. structure over the levels Employee, Section, Division]

Fig. 3.14. Transforming a nonstrict hierarchy into a strict hierarchy with an additional dimension
[Fact relationship Payroll with measure Salary, related to the levels Employee and Section; hierarchy Org. structure from Section to Division]

Fig. 3.15. Double-counting problem for a nonstrict hierarchy
[(a) Employees E1 to E5 assigned to sections S1 to S3. (b) No. employees by section: S1 3, S2 2, S3 2; No. employees by division: D1 7]

Fig. 3.16. Alternative hierarchies
[(a) Schema: hierarchy Calendar over the levels Time, Month, Quarter, Bimester, Year. (b) Examples of instances: one path Month, Quarter, Year (Jan 2001 to Jun 2001; Q1-2001, Q2-2001; Year 2001) and one path Month, Bimester, Year (B1-2001 to B3-2001; Year 2001)]

Fig. 3.17. Parallel independent hierarchies, composed of a balanced and a noncovering hierarchy
[Level Product with the hierarchies Product groups (Category, Department) and Distributor location (Distributor division, Distributor region)]

Fig. 3.18. Parallel dependent hierarchies, composed of two balanced hierarchies
[Level Store with the hierarchies Store location (City, State, Country) and Sales organization (Sales district, State, Sales region); State is the shared level]
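
The counts in Fig. 3.15 (3 + 2 + 2 = 7 for a division with five employees) can be reproduced with a small sketch. The figure residue does not preserve which employee works in which section, so the assignment below is a hypothetical one chosen only to match the section counts, and the equal split used as distributing factor is likewise an assumption in the spirit of Fig. 3.13.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical assignment matching the counts of Fig. 3.15 (S1: 3, S2: 2, S3: 2).
works_in = {
    "E1": ["S1"],
    "E2": ["S1"],
    "E3": ["S1", "S2"],
    "E4": ["S2", "S3"],
    "E5": ["S3"],
}
section_division = {"S1": "D1", "S2": "D1", "S3": "D1"}


def employees_by_section(works_in):
    """Count employees per section; shared employees count once per section."""
    counts = defaultdict(int)
    for sections in works_in.values():
        for section in sections:
            counts[section] += 1
    return dict(counts)


def naive_division_count(works_in, section_division):
    """Roll section counts up to divisions: this double-counts shared employees."""
    totals = defaultdict(int)
    for section, count in employees_by_section(works_in).items():
        totals[section_division[section]] += count
    return dict(totals)


def weighted_division_count(works_in, section_division):
    """Roll up with a distributing factor (assumed: equal split over sections)."""
    totals = defaultdict(Fraction)
    for sections in works_in.values():
        factor = Fraction(1, len(sections))
        for section in sections:
            totals[section_division[section]] += factor
    return dict(totals)


if __name__ == "__main__":
    print(employees_by_section(works_in))
    print(naive_division_count(works_in, section_division))
    print(weighted_division_count(works_in, section_division))
```

With factors that sum to 1 per employee, the division total equals the number of distinct employees, whereas summing the section counts gives the inflated value shown in the figure.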
Fig. 3.19. Parallel dependent hierarchies leading to different parent members of the shared level
[Level Employee with the hierarchies Live (City, State) and Work (Sales district, State); State is the shared level]

Fig. 3.20. A multidimensional schema containing several kinds of hierarchies
[Fact relationships Sales (Quantity, Amount), Payroll (Base salary, Working hours, Extra payment), and Sales incentive programs (Amount); dimensions include Product, Customer, Store, Employee, and Time]

Fig. 3.21. A generalized hierarchy with different root levels
[(a) Schema: hierarchy Store location; level Store with the exclusive parent levels Province and State. (b) Examples of instances: all; state 1; provinces A to D; stores X, Y, Z, K]

Fig. 3.22. A generalized hierarchy with three aggregation paths and different root levels
[Hierarchy Product groups over the levels Product, Brand, Category, Department]

Fig. 3.23. Example of a role-playing dimension
[Fact relationship Order line with measures Quantity and Amount; level Time in the roles Order date, Due date, Shipping date, Payment date; level Customer in the roles Ship to and Bill to; level Product]

Fig. 3.24. A schema containing a fact dimension
[Fact relationship Sales with measures Quantity and Amount; levels Store, Product, Time; fact dimension Transaction with Transaction no]

Fig. 3.25. Multidimensional schema for analysis of bank accounts
[Fact relationship Financial info with measure Balance; levels Client, Time, Account; hierarchy Bank structure from Account to Agency]

Fig. 3.26. Example of double-counting problem for a multivalued dimension

    Time  Account  Client  Balance
    T1    A1       C1      100
    T1    A1       C2      100
    T1    A1       C3      100
    T1    A2       C1      500
    T1    A2       C2      500

Fig. 3.27. Decomposition of the fact relationship in Fig. 3.25
[(a) Creating two fact relationships: Financial info (Balance) and Account holders. (b) Including a nonstrict hierarchy: hierarchy Holder between Account and Client]

Fig. 3.28. Alternative representation of the schema shown in Fig. 3.25
[Hierarchy Holder from Account to a level Client group (Group id), which relates to Client]

Fig. 3.29. Metamodel of the MultiDim model
[Classes: Dimension, Hierarchy, Level, Attribute, Fact relationship, Measure; enumerations DataType (integer, real, string, ...) and AddType (additive, semiadditive, nonadditive); legend: Generalization, Aggregation, Composition, Association, Derived attribute]
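
The table of Fig. 3.26 repeats each account balance once per account holder, so a plain sum over it overstates the total. The sketch below illustrates this with the five rows of the figure and then separates the data into a balance fact and an account-holder relation, loosely following the idea of Fig. 3.27(a). The function names and the tuple layout are assumptions made for illustration.

```python
# Rows of the table in Fig. 3.26: (time, account, client, balance).
rows = [
    ("T1", "A1", "C1", 100),
    ("T1", "A1", "C2", 100),
    ("T1", "A1", "C3", 100),
    ("T1", "A2", "C1", 500),
    ("T1", "A2", "C2", 500),
]


def naive_total(rows):
    """Sum every row: each balance is counted once per account holder."""
    return sum(balance for _time, _account, _client, balance in rows)


def total_by_account(rows):
    """Keep one balance per (time, account) pair before summing."""
    balances = {}
    for time, account, _client, balance in rows:
        balances[(time, account)] = balance
    return sum(balances.values())


def decompose(rows):
    """Split the rows into a balance fact and an account-holder relation."""
    balance_facts = {(t, a): b for t, a, _c, b in rows}
    holders = sorted({(a, c) for _t, a, c, _b in rows})
    return balance_facts, holders


if __name__ == "__main__":
    print(naive_total(rows))
    print(total_by_account(rows))
    print(decompose(rows))
```

After the decomposition the balance is stored once per account and time, and the link to clients lives in a separate relation, so aggregating the balance no longer depends on how many holders an account has.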
Fig. 3.30. Relational representation of the example in Fig. 3.2
[Tables: Sales (Product fkey, Order date fkey, Payment date fkey, Store fkey, Customer fkey, Quantity, Amount), Product, ProdDist (Product key, Distributor key, Distr. factor), Distributor, Category, Department, Store, Time, Customer, Sector, Profession]

Fig. 3.31. Relations for a balanced hierarchy
[(a) Snowflake structure: tables Product, Category, Department linked by foreign keys. (b) Flat table: Store, holding the district, region, city, and state attributes in one table]

Fig. 3.32. Transformation of the unbalanced hierarchy shown in Fig. 3.5b into a balanced one using placeholders
[Instances of the bank hierarchy with placeholder (PH) members added]

Fig. 3.33. Relational implementation of the recursive hierarchies shown in Fig. 3.6
[(a) Table Employee with Supervisor fkey. (b) Table Financial entity with Parent fkey]

Fig. 3.34. A schema with a recursive hierarchy in the Employee dimension
[Tables: Sales (Product fkey, Time fkey, Employee fkey, Quantity, Amount), Product, Time, Employee (with Supervisor fkey)]

Fig. 3.35. Parent-child relational schema of an unbalanced hierarchy with instances from Fig. 3.5b

    Financial entity
    Entity  Entity    Entity                 Money           No.   Min.     Max.          Parent
    key     name      type    Addr.  Model   capacity  Area  emp.  capital  capital  ...  fkey
    1       ATM111    ATM     …      T1      100000    null  null  null     null     …    3
    2       ATM112    ATM     …      T2      150000    null  null  null     null     …    3
    3       agency11  Agency  …      null    null      150   12    null     null     …    5
    4       agency12  Agency  …      null    null      135   9     null     null     …    5
    5       branch1   Branch  …      null    null      null  null  10000    99000    …    10
    6       branch2   Branch  …      null    null      null  null  1500     55000    …    10
    7       agency31  Agency  …      null    null      230   15    null     null     …    9
    8       agency32  Agency  …      null    null      185   12    null     null     …    9
    9       branch3   Branch  …      null    null      null  null  7500     98500    …    10
    10      bankX     Bank    …      null    null      null  null  null     null     …    null

Fig. 3.36. Relations for the generalized hierarchy shown in Fig. 3.8
[Tables: Customer (with Type fkey and Profession fkey), Type, Profession, Sector, Class, Branch, Area]

Fig. 3.37. Improved relational representation for the generalized hierarchy shown in Fig. 3.8
[Same tables as Fig. 3.36; Customer additionally carries Branch fkey and Customer type]

Fig. 3.38. Relational tables for the nonstrict hierarchy of Fig. 3.13
[Tables: Payroll (Employee fkey, ..., Salary), Employee, EmplSection (Employee fkey, Section fkey, Distributing factor), Section (with Division fkey), Division]

Fig. 3.39. Object-relational representation of the nonstrict hierarchy of Fig. 3.13
[Tables: Payroll (Employee ref, ..., Salary), Employee with the multivalued attribute Sections (1,n) composed of Section ref and Distrib. factor, Section (with Division ref), Division]

Fig. 3.40. Relations for the alternative hierarchies in Fig. 3.16
[Tables: Time (with Month fkey), Month (with Quarter fkey and Bimester fkey), Quarter (with Year fkey), Bimester (with Year fkey), Year]

Fig. 3.41. Relational schema of a set of parallel dependent hierarchies composed of two balanced hierarchies
[Tables: Store (with City fkey and Sales district fkey), City (with State fkey), Sales district (with State fkey), State (with Country fkey and Sales region fkey), Country, Sales region]
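
A parent-child table such as the one in Fig. 3.35 is usually traversed recursively. The following is a minimal sketch using Python's built-in sqlite3 module and a recursive common table expression; it is one possible way to query such a table, not a mapping prescribed by the figures. Only the columns needed for the example are loaded, the snake_case column names are assumptions, and the data values are the keys, names, types, money capacities, and parent keys of the Fig. 3.35 instances.

```python
import sqlite3

# (entity_key, entity_name, entity_type, money_capacity, parent_fkey)
ROWS = [
    (1, "ATM111", "ATM", 100000, 3),
    (2, "ATM112", "ATM", 150000, 3),
    (3, "agency11", "Agency", None, 5),
    (4, "agency12", "Agency", None, 5),
    (5, "branch1", "Branch", None, 10),
    (6, "branch2", "Branch", None, 10),
    (7, "agency31", "Agency", None, 9),
    (8, "agency32", "Agency", None, 9),
    (9, "branch3", "Branch", None, 10),
    (10, "bankX", "Bank", None, None),
]

# Collect the named entity and all of its descendants, then sum the capacity.
QUERY = """
WITH RECURSIVE subtree(entity_key) AS (
    SELECT entity_key FROM financial_entity WHERE entity_name = ?
    UNION ALL
    SELECT f.entity_key
    FROM financial_entity AS f
    JOIN subtree AS s ON f.parent_fkey = s.entity_key
)
SELECT COALESCE(SUM(f.money_capacity), 0)
FROM financial_entity AS f
JOIN subtree AS s ON f.entity_key = s.entity_key
"""


def build_db():
    """Create an in-memory database holding the parent-child table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE financial_entity (
               entity_key     INTEGER PRIMARY KEY,
               entity_name    TEXT NOT NULL,
               entity_type    TEXT NOT NULL,
               money_capacity INTEGER,
               parent_fkey    INTEGER REFERENCES financial_entity(entity_key)
           )"""
    )
    conn.executemany("INSERT INTO financial_entity VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def total_capacity(conn, entity_name):
    """Total ATM money capacity at or below the given entity."""
    return conn.execute(QUERY, (entity_name,)).fetchone()[0]


if __name__ == "__main__":
    conn = build_db()
    for name in ("bankX", "branch1", "branch3", "agency11"):
        print(name, total_capacity(conn, name))
```

Because the depth of the hierarchy is not fixed, the recursive query avoids hard-coding one join per level, which is the main practical difference from the snowflake tables of a balanced hierarchy.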