Dimensional Modeling
Position Paper
Divya Manduva
Pratima Surapaneni
2/15/2016

Introduction:

Dimensional data modeling plays a significant role in the design and development of data marts. Data modeling provides the ability to visualize what cannot otherwise be realized, and it helps ensure that all data objects required by the business are correctly identified and represented [1]. Dimensional modeling is a logical design technique for designing data marts to support end-user queries. It helps analysts understand and navigate the data structures easily and exploit the data. It is also used to improve performance, and it provides a consistent base for analysis [3]. Dimensional modeling aims to give end users the capability to "slice and dice" the data according to the analytical requirement, and it is appropriate when data needs to be aggregated or summarized.

"The process and outcome of designing logical database schemas created to support OLAP and Data Warehousing solutions is known as Dimensional Modeling" [12].

Dimensional Vs Relational Modeling:

Since dimensional modeling deals with denormalized data, it involves fewer tables and keys than ER modeling [7].
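To make the "slice and dice" idea and the small number of tables concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is only an illustration under stated assumptions: the table names (sales_fact, date_dim, product_dim), the columns, and all sample rows are hypothetical and do not come from any of the cited sources.

```python
import sqlite3


def build_star_schema() -> sqlite3.Connection:
    """Create a tiny in-memory star schema: one fact table, two dimensions.

    All names and rows are hypothetical illustration data.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE date_dim (
            date_key INTEGER PRIMARY KEY,
            year     INTEGER,
            quarter  TEXT
        );
        CREATE TABLE product_dim (
            product_key INTEGER PRIMARY KEY,
            name        TEXT,
            category    TEXT
        );
        CREATE TABLE sales_fact (
            date_key    INTEGER REFERENCES date_dim(date_key),
            product_key INTEGER REFERENCES product_dim(product_key),
            amount      REAL
        );
    """)
    conn.executemany("INSERT INTO date_dim VALUES (?, ?, ?)",
                     [(1, 2015, "Q4"), (2, 2016, "Q1")])
    conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)",
                     [(1, "Laptop", "Hardware"),
                      (2, "Mouse", "Hardware"),
                      (3, "Editor", "Software")])
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)",
                     [(1, 1, 900.0), (1, 3, 50.0),
                      (2, 1, 1000.0), (2, 2, 20.0), (2, 3, 70.0)])
    return conn


def summarize_by_category_and_year(conn: sqlite3.Connection) -> list:
    """Aggregate the measure along two dimensions at once."""
    return conn.execute("""
        SELECT p.category, d.year, SUM(f.amount)
        FROM sales_fact f
        JOIN product_dim p ON p.product_key = f.product_key
        JOIN date_dim d ON d.date_key = f.date_key
        GROUP BY p.category, d.year
        ORDER BY p.category, d.year
    """).fetchall()


def slice_by_year(conn: sqlite3.Connection, year: int) -> list:
    """Fix one value of the date dimension, then summarize by category."""
    return conn.execute("""
        SELECT p.category, SUM(f.amount)
        FROM sales_fact f
        JOIN product_dim p ON p.product_key = f.product_key
        JOIN date_dim d ON d.date_key = f.date_key
        WHERE d.year = ?
        GROUP BY p.category
        ORDER BY p.category
    """, (year,)).fetchall()


if __name__ == "__main__":
    conn = build_star_schema()
    print(summarize_by_category_and_year(conn))
    print(slice_by_year(conn, 2016))
```

Every analytical question in this sketch is answered by joining the single fact table to the dimensions it needs, which is the practical meaning of having few tables and keys.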
The table below illustrates the differences between relational databases and dimensional databases [2]:

Relational Database (OLTP) | Dimensional Database (OLAP)
Data is atomized | Data is summarized
Data is current | Data is historical
Processes one record at a time | Processes many records at a time
Process oriented | Subject oriented
Designed for highly structured repetitive processing | Designed for highly unstructured analytical processing

The table below presents the differences between relational (ER) modeling and dimensional modeling [4]:

Relational Modeling | Dimensional Modeling
Data is stored in RDBMS | Data is stored in RDBMS or multidimensional databases
Tables are units of storage | Cubes are units of storage
Data is normalized and used for OLTP. Optimized for OLTP processing | Data is denormalized and used in data warehouse and data mart. Optimized for OLAP
Several tables and chains of relationships among them | Few tables; fact tables are connected to dimensional tables
Volatile (several updates) and time variant | Non volatile and time invariant
Detailed level of transactional data | Summary of bulky transactional data (aggregates and measures) used in business decisions
SQL is used to manipulate data | MDX is used to manipulate data
Normal reports | User friendly, interactive, drag and drop multidimensional OLAP reports

Approaches for designing Data Marts:

Two different approaches to data warehousing have been proposed: Ralph Kimball's dimensional modeling techniques for building data warehouses, and Bill Inmon's Corporate Information Factory (CIF) architecture, which relies on relational modeling techniques for the data warehouse [6].

Corporate Information Factory (CIF) Architecture:

The figure below provides an overview of CIF. Both Bill Inmon and Ralph Kimball have valid but different approaches to implementing a data warehouse.

Figure 1. Basic structure of Corporate Information Factory (CIF) [8]

CIF is a conceptual or logical architecture. Its three main phases are data acquisition, primary storage, and data delivery. Data extracted from external sources flows into the system through the data acquisition applications of CIF. It can be condensed into operational reports, or it can be transformed and integrated with other data before it moves to the operational data store (ODS), which is the primary storage management of the data warehouse. The final phase, data delivery, delivers relevant data into data marts, oper marts (temporary data structures that obtain data from the ODS [9]), and the exploration warehouse.

Kimball's Dimensional Modeling:

The figure below describes Ralph Kimball's DW scenario [15]. The following figure identifies the components of dimensional modeling and illustrates the life cycle of dimensional data model design:

Figure: Dimensional Model Design Life Cycle [1]

Kimball's dimensional modeling is a top-down analysis approach. The processes involved in the design of a dimensional model are as follows [1]:

1. Identify business process requirements: Gather the requirements of the business processes, then select and prioritize the processes based on the quality of data in the source system, the significance of the business process, and the feasibility and complexity of the business process.
2. Identify the grain: Define the granularity of the selected business process so that neither the fact tables nor the dimension tables are overpopulated with data. It should also be possible to add new facts and dimensions to the existing model with few changes to front-end applications.
3. Identify the dimensions: Identify the dimensions that are valid for the grain chosen in the previous step.
4. Identify the facts: Identify the facts that are valid for the grain defined above.
5. Verify the model: It is essential to verify that the dimensional model meets the business requirements. This step may sometimes require changes to the grain.
6. Physical design considerations: Improve the performance of the designed model. This may require tuning through actions such as partitioning, indexing, creating aggregates, and data placement.

From ER Model to Dimensional Model:

The following steps produce a dimensional model from an ER model:

Step 1: Classify entities as transactional, component, or classification entities.
Step 2: Identify maximal and minimal hierarchies.
Step 3: Produce dimensional models using operators such as collapse hierarchy and aggregation.
Step 4: Evaluate and refine the dimensional models to produce the final data mart design by combining fact tables, combining dimension tables, and handling subtypes using hierarchies.

Comparison of CIF and Kimball's Dimensional Modeling:

The main goal of CIF is better business operations, improved business intelligence, and enhanced business management [10], whereas the main goal of Kimball's dimensional modeling is to represent a set of business measurements in a standard framework and to achieve a database model that answers business queries quickly and efficiently. The CIF approach requires normalized data structures before the dimensional models are loaded. Kimball's approach, by contrast, holds that the data structures required before loading the dimensional model depend on source data realities, the target data model, and anticipated transformations [13]. A second difference between the two approaches lies in how they deal with atomic data. Kimball holds that atomic data should be dimensionally structured, while Inmon's CIF model holds that it should be stored in a normalized data warehouse [13]. From an architectural standpoint, the Inmon approach is a store-and-publish architecture.
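The store-and-publish idea can be sketched in a few lines. This is a minimal illustration only, assuming a normalized store of atomic data that is periodically published into an aggregated, dimensionally keyed summary table. It uses Python's built-in sqlite3 module, and every table name, column name, and sample value is hypothetical rather than taken from Inmon's or Kimball's material.

```python
import sqlite3


def build_normalized_store() -> sqlite3.Connection:
    """'Store' step: atomic data held in normalized tables (hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            region      TEXT
        );
        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customer(customer_id),
            order_year  INTEGER
        );
        CREATE TABLE order_line (
            order_id INTEGER REFERENCES orders(order_id),
            amount   REAL
        );
    """)
    conn.executemany("INSERT INTO customer VALUES (?, ?)",
                     [(1, "East"), (2, "West")])
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(10, 1, 2015), (11, 1, 2016), (12, 2, 2016)])
    conn.executemany("INSERT INTO order_line VALUES (?, ?)",
                     [(10, 100.0), (10, 50.0), (11, 30.0),
                      (12, 200.0), (12, 25.0)])
    return conn


def publish_sales_mart(conn: sqlite3.Connection) -> None:
    """'Publish' step: rebuild an aggregated table for end-user access."""
    conn.executescript("""
        DROP TABLE IF EXISTS mart_sales_by_region_year;
        CREATE TABLE mart_sales_by_region_year AS
        SELECT c.region       AS region,
               o.order_year   AS year,
               SUM(l.amount)  AS total_amount
        FROM order_line l
        JOIN orders o   ON o.order_id = l.order_id
        JOIN customer c ON c.customer_id = o.customer_id
        GROUP BY c.region, o.order_year;
    """)


def query_mart(conn: sqlite3.Connection) -> list:
    """End users read only the published table, not the normalized store."""
    return conn.execute("""
        SELECT region, year, total_amount
        FROM mart_sales_by_region_year
        ORDER BY region, year
    """).fetchall()


if __name__ == "__main__":
    conn = build_normalized_store()
    publish_sales_mart(conn)
    print(query_mart(conn))
```

In this sketch the publication is a separate, repeatable step between the stored atomic data and the tables that reporting tools query.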
In the Inmon approach, the enterprise data warehouse (EDW) supports BI applications indirectly by publishing the data in an aggregated dimensional form for access by end users, whereas the Kimball approach does not require a publication layer because the BI layer sits directly over the EDW [14].

Conclusion:

We conclude by suggesting a hybrid model that applies ER (relational) modeling from the designer's perspective of data warehouse design and dimensional modeling at the user's end of the data warehouse design. In this way, relational modeling gives the analysts of the business a better analysis of the business model, and dimensional modeling provides better support for end-user queries. The hybrid thus serves both the designers and the users of the data warehouse. Our position is that it is a good idea to incorporate both ER modeling and dimensional modeling in the design of data warehouses.

References:

1. Chuck Ballard, Daniel M. Farrell, Amit Gupta, Carlos Mazuela, Stanislav Vohnik, Dimensional Modeling: In a Business Intelligence Environment. http://www.redbooks.ibm.com/redbooks/pdfs/sg247138.pdf
2. IBM Informix Dynamic Server Enterprise and Workgroup Edition, v10.00.xC3; IBM Informix Dynamic Server Express Edition, v10.00.xC3E; and IBM Informix Client Software Developer's Kit, v2.90.xC3. http://publib.boulder.ibm.com/infocenter/idshelp/v10/index.jsp?topic=/com.ibm.ddi.doc/ddi222.htm
3. http://www.wilshireconferences.com/EDF2002/analytics-sessions.htm
4. http://www.learndatamodeling.com/diff_r_d.htm
5. Tim Chenoweth, David Schuff, Robert St. Louis, "A method for developing Dimensional Data Marts", Communications of the ACM, Volume 46, Issue 12, December 2003.
6. Claudia Imhoff, Nicholas Galemmo, Jonathan G. Geiger, Mastering Data Warehouse Design: Relational and Dimensional Techniques. http://search.barnesandnoble.com/booksearch/isbnInquiry.asp?z=y&endeca=1&isbn=0471324213&itm=9
7. http://www.dbmsmag.com/9510d05.html
8. http://www.casact.org/newsletter/index.cfm?fa=viewart&id=5349
9. http://www.b-eye-network.com/view/410
10. http://www.dkms.com/papers/cifckf.pdf
11. Daniel L. Moody, Mark A.R. Kortink, "From Enterprise to Dimension Models: A Methodology for Data Warehouse and Data Mart Design", Proceedings of the International Workshop on Design and Management of Data Warehouses (DMDW'2000), Stockholm, Sweden, June 5-6, 2000. http://ssdi.di.fct.unl.pt/mei/bddw/material_apoio/artigos/files/2000-Moody.pdf
12. http://www.atlantamdf.com/Presentations/AtlantaMDF_091106.pdf
13. http://www.intelligententerprise.com/showArticle.jhtml;jsessionid=NKBOH2L3S2BMMQSNDLRCKH0CJUNN2JVN?articleID=17800088&pgno=2
14. http://blogs.ittoolbox.com/dw/design/archives/dimensional-modeling-fundamentals-7712
15. Wayne Little, Dimensional Modeling: A Whirlwind Tour of How and Why, October 2006.