Two different types of database technologies

Position Paper
Divya Manduva
Pratima Surapaneni
2/15/2016
Dimensional Modeling
Introduction:
Dimensional Data Modeling plays a significant role in the design and development of Data Marts. Data
modeling provides the ability to visualize what cannot otherwise be realized, and it helps ensure that
all data objects required by the business are correctly identified and represented [1]. Dimensional
Modeling is a logical design technique for designing data marts that support end-user queries. It helps
analysts understand and navigate the data structures easily and exploit the data. It also improves
performance and provides a consistent base for analysis [3]. Dimensional Modeling aims to give end users
the capability to "Slice and Dice" the data as the analytical requirement demands, and it is suited to
cases where data needs to be aggregated or summarized.
“The process and outcome of designing logical database schemas created to support OLAP and
Data Warehousing solutions is known as Dimensional Modeling [12]”
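As a minimal sketch of what "Slice and Dice" and summarization can look like in practice, the Python below works on a tiny, made-up set of sales facts. The rows, the column names (region, product, quarter, amount), and the helper names slice_rows, dice_rows, and summarize are hypothetical illustrations and are not taken from the cited sources.

```python
from collections import defaultdict

# Hypothetical fact rows: each row is one sale described by three dimensions
# (region, product, quarter) and one numeric measure (amount).
SALES = [
    {"region": "East", "product": "Widget", "quarter": "Q1", "amount": 100},
    {"region": "East", "product": "Gadget", "quarter": "Q1", "amount": 150},
    {"region": "West", "product": "Widget", "quarter": "Q1", "amount": 200},
    {"region": "West", "product": "Widget", "quarter": "Q2", "amount": 50},
    {"region": "East", "product": "Widget", "quarter": "Q2", "amount": 75},
]


def slice_rows(rows, dim, value):
    """Slice: fix one dimension to a single value."""
    return [r for r in rows if r[dim] == value]


def dice_rows(rows, **allowed):
    """Dice: keep rows whose value on each named dimension is in the allowed set."""
    return [r for r in rows if all(r[d] in vals for d, vals in allowed.items())]


def summarize(rows, *dims, measure="amount"):
    """Aggregate: group rows by the chosen dimensions and sum the measure."""
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[d] for d in dims)] += r[measure]
    return dict(totals)


# Slice on quarter = Q1, then summarize by region.
q1_by_region = summarize(slice_rows(SALES, "quarter", "Q1"), "region")

# Dice on region and product, then summarize by quarter.
east_widgets_by_quarter = summarize(
    dice_rows(SALES, region={"East"}, product={"Widget"}), "quarter"
)

print(q1_by_region)
print(east_widgets_by_quarter)
```

The same fact rows answer both questions without any change to the data; only the dimensions used for filtering and grouping differ.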
Dimensional Vs Relational Modeling:
Since dimensional modeling deals with denormalized data, it involves fewer tables and keys than ER
modeling [7].
The table below illustrates the differences between relational databases and dimensional databases [2]:
Relational Database (OLTP)                           | Dimensional Database (OLAP)
-----------------------------------------------------+------------------------------------------------------
Data is atomized                                     | Data is summarized
Data is current                                      | Data is historical
Processes one record at a time                       | Processes many records at a time
Process oriented                                     | Subject oriented
Designed for highly structured repetitive processing | Designed for highly unstructured analytical processing
The table below represents the differences between dimensional modeling and relational or ER Modeling
[4]:
Relational Modeling                                                 | Dimensional Modeling
--------------------------------------------------------------------+-----------------------------------------------------------------------------------------
Data is stored in RDBMS                                             | Data is stored in RDBMS or Multidimensional databases
Tables are units of storage                                         | Cubes are units of storage
Data is normalized and used for OLTP. Optimized for OLTP processing | Data is denormalized and used in data warehouse and data mart. Optimized for OLAP
Several tables and chains of relationships among them               | Few tables and fact tables are connected to dimensional tables
Volatile (several updates) and time variant                         | Non volatile and time invariant
Detailed level of transactional data                                | Summary of bulky transactional data (Aggregates and Measures) used in business decisions
SQL is used to manipulate data                                      | MDX is used to manipulate data
Normal Reports                                                      | User friendly, interactive, drag and drop multidimensional OLAP Reports
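To make the "few tables, with fact tables connected to dimensional tables" row concrete, here is a small sketch of a star schema built with Python's standard sqlite3 module. Since the table above notes that dimensional data may also be stored in an RDBMS, the sketch uses SQL rather than MDX. The table names (fact_sales, dim_date, dim_product), the columns, and the sample rows are all hypothetical.

```python
import sqlite3

# In-memory database holding one fact table and two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity    INTEGER,
    amount      REAL
);
INSERT INTO dim_date VALUES (1, 2015, 'Q4'), (2, 2016, 'Q1');
INSERT INTO dim_product VALUES (10, 'Widget', 'Hardware'), (11, 'Manual', 'Books');
INSERT INTO fact_sales VALUES
    (1, 10, 2, 20.0), (1, 11, 1, 5.0), (2, 10, 3, 30.0), (2, 10, 1, 10.0);
""")


def sales_by_year_and_category(connection):
    """Join the fact table to its dimensions and sum the amount measure."""
    sql = """
        SELECT d.year, p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_date AS d ON d.date_key = f.date_key
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY d.year, p.category
        ORDER BY d.year, p.category
    """
    return connection.execute(sql).fetchall()


summary = sales_by_year_and_category(conn)
print(summary)
```

Every analytical query in this layout follows the same shape: one join from the fact table to each dimension of interest, followed by a GROUP BY on dimension attributes.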
Approaches for designing Data Marts:
There are two different approaches to data warehousing: Ralph Kimball’s dimensional modeling techniques
for building data warehouses, and Bill Inmon’s Corporate Information Factory (CIF) architecture, which
relies on relational modeling techniques for the data warehouse [6]. Both Bill Inmon and Dr. Kimball have
valid but different approaches to implementing a data warehouse.
Corporate Information Factory (CIF) Architecture: The figure below provides an overview of CIF.
Figure 1. Basic structure of Corporate Information Factory (CIF) [8]
CIF is a conceptual or logical architecture with three main phases: Data Acquisition, Primary Storage,
and Data Delivery. Data extracted from external sources flows into the system through the data
acquisition applications of CIF. It can be condensed into operational reports, or it can be transformed
and integrated with other data before it moves to the operational data store (ODS), which is the primary
storage management of the data warehouse. The final phase, Data Delivery, delivers relevant data into
data marts, oper marts (temporary data structures that obtain data from the ODS [9]), and the
exploration warehouse.
Kimball’s Dimensional Modeling:
The figure below describes Ralph Kimball’s DW scenario [15]:
The following table identifies the components of Dimensional Modeling and illustrates the life cycle of
Dimensional Data Model design:
Figure: Dimensional Model Design Life Cycle [1]
Kimball’s Dimensional Modeling is a top-down analysis approach. The steps in the design life cycle of
the Dimensional model shown above are as follows [1]:
1. Identify business process requirements: Gather requirements by selecting and prioritizing business
   processes based on the quality of data in the source system, the significance of each business
   process, and its feasibility and complexity.
2. Identify the grain: Define the granularity of the selected business process so that neither fact nor
   dimension tables are overpopulated with data. It should also be possible to add new facts and
   dimensions to the existing model without many changes to front-end applications.
3. Identify the dimensions: Identify the dimensions that are valid for the grain chosen in the previous
   step.
4. Identify the facts: Identify the facts that are valid for the grain defined above.
5. Verify the model: Verify that the dimensional model meets the business requirements. This step may
   sometimes require changes to the grain.
6. Physical design considerations: Improve the performance of the designed model, which may require
   tuning through partitioning, indexing, creating aggregates, and data placement.
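Steps 2 to 5 above can be sketched in Python as a small model declaration plus a verification pass. This is only an illustration under stated assumptions: the DimensionalModel class, the verify function, and the retail sales sample rows are hypothetical, and the check assumes that the grain columns together identify exactly one fact row.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionalModel:
    """Hypothetical container for the outputs of steps 1 to 4."""
    business_process: str
    grain: tuple        # dimension keys that together identify one fact row
    dimensions: tuple   # dimensions valid for the grain
    facts: tuple        # numeric measures valid for the grain


def verify(model, rows):
    """Step 5 sketch: return a list of problems found in sample fact rows."""
    problems = []
    # Every grain column must be one of the declared dimensions.
    for key in model.grain:
        if key not in model.dimensions:
            problems.append(f"grain column {key!r} is not a declared dimension")
    # Every row must carry all declared dimensions and facts.
    required = set(model.dimensions) | set(model.facts)
    for i, row in enumerate(rows):
        missing = required - set(row)
        if missing:
            problems.append(f"row {i} is missing {sorted(missing)}")
    # Assumption: no two rows may share the same grain key.
    counts = Counter(tuple(row.get(k) for k in model.grain) for row in rows)
    for key, n in counts.items():
        if n > 1:
            problems.append(f"{n} rows share grain key {key}")
    return problems


model = DimensionalModel(
    business_process="retail sales",
    grain=("date", "store", "product"),
    dimensions=("date", "store", "product"),
    facts=("quantity", "amount"),
)
good_rows = [
    {"date": "2016-02-15", "store": "S1", "product": "P1", "quantity": 2, "amount": 20.0},
    {"date": "2016-02-15", "store": "S1", "product": "P2", "quantity": 1, "amount": 5.0},
]
bad_rows = good_rows + [dict(good_rows[0])]  # duplicate of the first row

print(verify(model, good_rows))
print(verify(model, bad_rows))
```

A duplicate grain key in the sample data is one signal that the grain may need to change, which is the kind of revision step 5 anticipates.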
From ER Model to Dimensional Model:
The following steps produce a dimensional model from an ER model:
Step 1: Classify entities as transactional, component, or classification entities
Step 2: Identify maximal and minimal hierarchies
Step 3: Produce Dimensional Models using operators such as collapse hierarchy and aggregation
Step 4: Evaluate and refine the Dimensional models to produce the final data mart design by combining
fact tables, combining dimension tables, and handling subtypes using hierarchies.
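One way to read the collapse hierarchy operator in Step 3 is as denormalization: the attributes of each higher-level classification entity are folded into the entity below it until a single flat dimension remains. The sketch below assumes that reading. The product, category, and department entities, their attributes, and the collapse helper are hypothetical.

```python
# Hypothetical normalized (ER-style) classification hierarchy:
# product -> category -> department.
DEPARTMENTS = {1: {"dept_name": "Electronics"}}
CATEGORIES = {
    10: {"category_name": "Phones", "dept_id": 1},
    11: {"category_name": "Laptops", "dept_id": 1},
}
PRODUCTS = {
    100: {"product_name": "Phone A", "category_id": 10},
    101: {"product_name": "Laptop B", "category_id": 11},
}


def collapse(children, parents, fk):
    """Fold each parent's attributes into its child rows and drop the foreign key."""
    collapsed = {}
    for child_id, child in children.items():
        merged = {k: v for k, v in child.items() if k != fk}
        merged.update(parents[child[fk]])
        collapsed[child_id] = merged
    return collapsed


# Collapse from the top of the hierarchy down to the lowest level.
categories_flat = collapse(CATEGORIES, DEPARTMENTS, "dept_id")
product_dim = collapse(PRODUCTS, categories_flat, "category_id")

print(product_dim[100])
```

The resulting product dimension repeats the department name on every product row, which is the redundancy that dimensional models accept in exchange for fewer joins.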
Comparison of CIF and Kimball’s Dimensional Modeling:
The main goal of CIF is better business operations, improved business intelligence, and enhanced business
management [10], whereas the main goal of Kimball’s Dimensional Modeling is to represent a set of
business measurements in a standard framework and to achieve a database model that answers business
queries quickly and efficiently. The first difference between the two approaches is in the data
structures that precede the dimensional model: the CIF approach requires normalized data structures
before the dimensional models are loaded, while Kimball’s approach holds that the data structures
required prior to loading the dimensional model depend on source data realities, the target data model,
and the anticipated transformations [13]. The second difference is in the way the two approaches deal
with atomic data. Kimball holds that atomic data should be dimensionally structured, whereas Inmon’s CIF
model stores it in a normalized data warehouse [13]. From an architectural standpoint, the Inmon approach
is a store-and-publish architecture: the enterprise data warehouse (EDW) supports BI applications
indirectly by publishing the data in an aggregated dimensional form for access by end users. The Kimball
approach does not require a publication layer, because the BI layer sits directly over the EDW [14].
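The architectural contrast above can be illustrated with a deliberately small Python sketch. It assumes a normalized store of customers and orders on the Inmon side and atomic, dimensionally structured fact rows on the Kimball side. The data and the function names publish_region_mart and query_direct are hypothetical.

```python
from collections import defaultdict

# Hypothetical normalized atomic store (Inmon-style EDW):
# orders reference customers by key.
CUSTOMERS = {1: {"region": "East"}, 2: {"region": "West"}}
ORDERS = [
    {"customer_id": 1, "amount": 40.0},
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 2, "amount": 25.0},
]

# Hypothetical atomic dimensional store (Kimball-style):
# the region attribute is already available on each fact row.
FACT_ORDERS = [
    {"region": "East", "amount": 40.0},
    {"region": "East", "amount": 10.0},
    {"region": "West", "amount": 25.0},
]


def publish_region_mart(customers, orders):
    """Store and publish: derive an aggregated dimensional table from the normalized store."""
    mart = defaultdict(float)
    for order in orders:
        region = customers[order["customer_id"]]["region"]
        mart[region] += order["amount"]
    return dict(mart)


def query_direct(facts, region):
    """Direct access: the BI query aggregates the atomic dimensional rows itself."""
    return sum(f["amount"] for f in facts if f["region"] == region)


region_mart = publish_region_mart(CUSTOMERS, ORDERS)
print(region_mart["East"])
print(query_direct(FACT_ORDERS, "East"))
```

Both paths return the same total; the difference is whether an intermediate published table sits between the atomic data and the end user.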
Conclusion:
We conclude by suggesting a hybrid model that applies ER or Relational Modeling from the designer’s
perspective of data warehouse design and Dimensional Modeling at the user’s end. Relational Modeling
facilitates better analysis of the business model for business analysts, and Dimensional Modeling
provides better support for end-user queries, so the hybrid serves both the designers and the users of
the data warehouse. Our position is therefore that it is a good idea to incorporate both ER modeling and
Dimensional Modeling in the design of Data Warehouses.
References:
1. Dimensional Modeling: In a Business Intelligence Environment, Chuck Ballard, Daniel M. Farrell,
   Amit Gupta, Carlos Manuela, Stanislaw Venice
   http://www.redbooks.ibm.com/redbooks/pdfs/sg247138.pdf
2. IBM Informix Dynamic Server Enterprise and Workgroup Edition, v10.00.xC3; IBM Informix
   Dynamic Server Express Edition, v10.00.xC3E; and IBM Informix Client Software Developer's
   Kit, v2.90.xC3.
   http://publib.boulder.ibm.com/infocenter/idshelp/v10/index.jsp?topic=/com.ibm.ddi.doc/ddi222.htm
3. http://www.wilshireconferences.com/EDF2002/analytics-sessions.htm
4. http://www.learndatamodeling.com/diff_r_d.htm
5. A method for developing Dimensional Data Marts, Tim Chenoweth, David Schuff, Robert St.
   Louis, Communications of ACM December, Volume 46, Issue 12 2003
6. Mastering data warehouse design : relational and dimensional techniques / Claudia Imhoff,
   Nicholas Galemmo, Jonathan G. Geiger
   http://search.barnesandnoble.com/booksearch/isbnInquiry.asp?z=y&endeca=1&isbn=0471324213&itm=9
7. http://www.dbmsmag.com/9510d05.html
8. http://www.casact.org/newsletter/index.cfm?fa=viewart&id=5349
9. http://www.b-eye-network.com/view/410
10. http://www.dkms.com/papers/cifckf.pdf
11. Daniel L. Moody, Mark A.R. Kortink, “From Enterprise to Dimension Models: A Methodology
    for Data Warehouse and Data Mart Design”, Proceedings of the International Workshop on
    Design and Management of Data Warehouses (DMDW'2000), Stockholm, Sweden, June 5-6,
    2000. (http://ssdi.di.fct.unl.pt/mei/bddw/material_apoio/artigos/files/2000-Moody.pdf)
12. http://www.atlantamdf.com/Presentations/AtlantaMDF_091106.pdf
13. http://www.intelligententerprise.com/showArticle.jhtml;jsessionid=NKBOH2L3S2BMMQSNDLRCKH0CJUNN2JVN?articleID=17800088&pgno=2
14. http://blogs.ittoolbox.com/dw/design/archives/dimensional-modeling-fundamentals-7712
15. Dimensional Modeling: A whirlwind Tour of How and Why, Wayne Little, October 2006