Data Warehousing Design
Dr. Awad Khalil
Computer Science Department, AUC
Data Warehousing Concepts, by Dr. Khalil

Content
Designing a Data Warehouse Database
Dimensional Modeling
Star Schema
Snowflake Schema
Advantages of Dimensional Modeling
Methodology for Dimensional Modeling

Designing a Data Warehouse Database
Designing a data warehouse database is highly complex.
The database component of a data warehouse is described using a technique called dimensionality modeling: “A logical design technique that aims to present the data in a standard, intuitive form that allows for high-performance access.”
Dimensionality modeling uses the concepts of Entity-Relationship (ER) modeling with some important restrictions.
Every dimensional model (DM) is composed of one table with a composite primary key, called the fact table, and a set of smaller tables called dimension tables.
Every dimension table has a simple (non-composite) primary key that corresponds exactly to one of the components of the composite key in the fact table.
This characteristic ‘star-like’ structure is called a star schema or star join.

Star Schema
A logical structure that has a fact table containing factual data in the center, surrounded by dimension tables containing reference data (which can be denormalized).
The diagram shows a star schema for property sales of a Real Estate database.

Other Schema Versions
Snowflake Schema: a variant of the star schema where dimension tables do not contain denormalized data.
Starflake Schema: a hybrid structure that contains a mixture of star and snowflake schemas.
The diagram shows part of the star schema for property sales of a Real Estate database with a normalized version of the Branch dimension table.
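The star schema structure described above (one fact table with a composite primary key, surrounded by dimension tables with simple keys) can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the column names (timeID, branchID, sellingPrice, city, year) and the sample rows are made up for the example and are not taken from the diagrams in this deck.

```python
import sqlite3


def star_join_totals():
    """Build a tiny star schema in memory and run one star join over it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    -- Dimension tables: each has a simple (non-composite) primary key.
    CREATE TABLE Time   (timeID   INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE Branch (branchID INTEGER PRIMARY KEY, city TEXT);

    -- Fact table: its composite primary key is built from the dimension keys.
    CREATE TABLE PropertySale (
        timeID       INTEGER REFERENCES Time(timeID),
        branchID     INTEGER REFERENCES Branch(branchID),
        sellingPrice REAL,
        PRIMARY KEY (timeID, branchID)
    );

    -- Hypothetical sample rows.
    INSERT INTO Time   VALUES (1, 2023, 1), (2, 2023, 2);
    INSERT INTO Branch VALUES (1, 'CityA'), (2, 'CityB');
    INSERT INTO PropertySale VALUES (1, 1, 100.0), (2, 1, 150.0), (1, 2, 200.0);
    """)
    # Star join: the fact table is joined to each dimension on its simple key,
    # and dimension attributes supply the filter and the grouping.
    rows = conn.execute("""
        SELECT b.city, SUM(f.sellingPrice)
        FROM PropertySale AS f
        JOIN Branch AS b ON b.branchID = f.branchID
        JOIN Time   AS t ON t.timeID   = f.timeID
        WHERE t.year = 2023
        GROUP BY b.city
        ORDER BY b.city
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(star_join_totals())
```

In a snowflake variant, an attribute of Branch would move into its own normalized table and the query would need one more join to reach it.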

Dimensional Model - Advantages
Efficiency – The consistency of the underlying database structure allows more efficient access to the data by various tools including report writers and query tools.
Ability to handle changing requirements – The star schema can adapt to changes in the user’s requirements, as all dimensions are equivalent in terms of providing access to the fact table.
Extensibility – The dimensional model is extensible.
Ability to model common business situations – There are a growing number of standard approaches for handling common modeling situations in the business world.
Predictable query processing – Data warehouse applications that drill down will simply be adding more dimension attributes from within a single star schema.

Database Design Methodology for Data Warehouse
Nine-Step Methodology by Kimball (1996):
1- Choosing the process
2- Choosing the grain
3- Identifying and conforming the dimensions
4- Choosing the facts
5- Storing pre-calculations in the fact table
6- Rounding out the dimension tables
7- Choosing the duration of the database
8- Tracking slowly changing dimensions
9- Deciding the query priorities and the query modes

1- Choosing the process
The process (function) refers to the subject matter of a particular data mart.
The best choice for the first data mart tends to be the one that is related to sales.

2- Choosing the grain
Choosing the grain means deciding exactly what a fact table record represents.

3- Identifying and Conforming the Dimensions
Dimensions set the context for asking questions about the facts in the fact table.
The diagram shows star schemas for property sales and property advertising with Time, PropertyForSale, Branch, and Promotion as conformed (shared) dimension tables.
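To illustrate why a conformed (shared) dimension matters, the sketch below puts two fact tables, PropertySale and Advert, on one shared Time dimension and lines up their totals by the same Time attribute. It is a simplified assumption-laden example: the columns (quarter, sellingPrice, advertCost) and the sample rows are hypothetical, and the fact tables are trimmed to the single shared key for brevity.

```python
import sqlite3


def drill_across():
    """Aggregate two fact tables by the same conformed Time attribute."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    -- One Time dimension, shared by both fact tables.
    CREATE TABLE Time (timeID INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);

    -- Simplified fact tables (only the shared dimension key is kept here).
    CREATE TABLE PropertySale (timeID INTEGER REFERENCES Time(timeID), sellingPrice REAL);
    CREATE TABLE Advert       (timeID INTEGER REFERENCES Time(timeID), advertCost REAL);

    -- Hypothetical sample rows.
    INSERT INTO Time VALUES (1, 2023, 1), (2, 2023, 2);
    INSERT INTO PropertySale VALUES (1, 100.0), (1, 50.0), (2, 200.0);
    INSERT INTO Advert VALUES (1, 10.0), (2, 5.0), (2, 15.0);
    """)
    # Each fact table is summarized separately, but grouped by the same
    # dimension attribute, so the two result sets are directly comparable.
    sales = dict(conn.execute("""
        SELECT t.quarter, SUM(f.sellingPrice)
        FROM PropertySale AS f JOIN Time AS t ON t.timeID = f.timeID
        GROUP BY t.quarter
    """))
    adverts = dict(conn.execute("""
        SELECT t.quarter, SUM(f.advertCost)
        FROM Advert AS f JOIN Time AS t ON t.timeID = f.timeID
        GROUP BY t.quarter
    """))
    conn.close()
    quarters = sorted(set(sales) | set(adverts))
    return {q: (sales.get(q, 0.0), adverts.get(q, 0.0)) for q in quarters}


if __name__ == "__main__":
    for quarter, (sold, spent) in drill_across().items():
        print(quarter, sold, spent)
```

If each fact table had its own, differently defined time table, the two summaries could not be matched row for row in this way.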

4- Choosing the Facts
The grain of the fact table determines which facts can be used in the data mart.
All the facts must be expressed at the level implied by the grain.
The diagram shows how the Lease fact table shown in the previous diagram could be corrected so that the fact table is appropriately structured.

5- Storing Pre-Calculations in the Fact Table
Once the facts have been selected, each should be re-examined to determine whether there are opportunities to use pre-calculations.
A common example of the need to store pre-calculations occurs when the facts comprise a profit and loss statement.
The diagram shows the fact table with the rentDuration, totalRent, clientAllowance, staffCommission, and totalRevenue attributes.
These types of facts are useful because they are additive quantities, from which we can derive valuable information.

6- Rounding out the Dimension Tables
In this step, we return to the dimension tables and add as many text descriptions to the dimensions as possible.
The text descriptions should be as intuitive and understandable to the users as possible.
The usefulness of a data mart is determined by the scope and nature of the attributes of the dimension tables.

7- Choosing the Duration of the Database
The duration measures how far back in time the fact table goes.
Very large fact tables raise at least two very significant design issues:
First, it is often increasingly difficult to source increasingly old data.
Second, it is mandatory that the old versions of the important dimensions be used, not the most current versions. This is known as the ‘slowly changing dimension’ problem.
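One common way to keep old facts tied to the old version of a dimension is to add a new dimension row, with its own generated key, whenever a tracked attribute changes. The sketch below is a minimal in-memory illustration of that idea, not a description of any particular product; the class names, the clientNo and city attributes, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ClientVersion:
    surrogate_key: int      # generated key, one per version of the client
    client_no: str          # business (natural) key from the source system
    city: str               # the tracked attribute in this example
    is_current: bool = True


class ClientDimension:
    """Dimension table that keeps every historical version of a client."""

    def __init__(self) -> None:
        self.rows: List[ClientVersion] = []

    def current(self, client_no: str) -> Optional[ClientVersion]:
        for row in self.rows:
            if row.client_no == client_no and row.is_current:
                return row
        return None

    def record(self, client_no: str, city: str) -> int:
        """Return the key to store in new fact rows for this client."""
        existing = self.current(client_no)
        if existing is not None:
            if existing.city == city:
                return existing.surrogate_key   # nothing changed
            existing.is_current = False         # retire the old version
        version = ClientVersion(len(self.rows) + 1, client_no, city)
        self.rows.append(version)
        return version.surrogate_key

    def lookup(self, surrogate_key: int) -> ClientVersion:
        return self.rows[surrogate_key - 1]


if __name__ == "__main__":
    dim = ClientDimension()
    old_fact = {"client_key": dim.record("C1", "OldTown"), "amount": 100}
    new_fact = {"client_key": dim.record("C1", "NewTown"), "amount": 250}
    # The old transaction still resolves to the old description of the client.
    print(dim.lookup(old_fact["client_key"]).city)
    print(dim.lookup(new_fact["client_key"]).city)
```

Because fact rows store the generated key rather than clientNo, a later change to the client never rewrites the context of earlier transactions.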

8- Tracking Slowly Changing Dimensions
The slowly changing dimension problem means, for example, that the proper description of the old client and the old branch must be used with the old transaction history.
Often, the data warehouse must assign a generalized key to these important dimensions in order to distinguish multiple snapshots of clients and branches over a period of time.
There are three basic types of slowly changing dimensions:
Type 1 – where a changed dimension attribute is overwritten;
Type 2 – where a changed dimension attribute causes a new record to be created;
Type 3 – where a changed dimension attribute causes an alternate attribute to be created so that both the old and the new values of the attribute are simultaneously accessible in the same dimension record.

9- Deciding the Query Priorities and the Query Modes
In this step we consider physical design issues.
The most critical physical design issues affecting the end-user’s perception of the data mart are the physical sort order of the fact table on disk and the presence of pre-stored summaries or aggregations.
Beyond these issues there are a host of additional physical design issues affecting administration, backup, indexing performance, and security.

Example - Dimensional Model (Fact Constellation) for a Real Estate Data Warehouse
At the end of this methodology, we have a design for a data mart that supports the requirements of a particular Real Estate business process and also allows easy integration with other related data marts to ultimately form the enterprise-wide data warehouse.
We integrate the star schemas for the business processes of the Real Estate company using the conformed dimensions. For example, all the fact tables share the Time and Branch dimensions.
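As a rough check of the statement that all the fact tables share the Time and Branch dimensions, the sketch below intersects the dimension lists of the five fact tables. The lists are transcribed as printed from the business-process table in this deck; the function name and the dictionary layout are assumptions made for the example.

```python
# Fact table -> dimension tables, transcribed from the deck's
# "Fact and Dimension Tables for each Business Process" table.
FACT_DIMENSIONS = {
    "PropertySale": {"Time", "Branch", "Staff", "PropertyForSale",
                     "Owner", "ClientBuyer", "Promotion"},
    "Lease": {"Time", "Branch", "Staff", "PropertyForSale",
              "Owner", "ClientBuyer", "Promotion"},
    "PropertyViewing": {"Time", "Branch", "Staff", "PropertyForSale",
                        "PropertyForRent", "ClientBuyer", "ClientRenter"},
    "Advert": {"Time", "Branch", "Staff", "PropertyForSale",
               "PropertyForRent", "Promotion", "Newspaper"},
    "PropertyMaintenance": {"Time", "Branch", "Staff", "PropertyForRent"},
}


def shared_by_all(fact_dimensions):
    """Return the dimensions that every fact table references."""
    return set.intersection(*fact_dimensions.values())


if __name__ == "__main__":
    print(sorted(shared_by_all(FACT_DIMENSIONS)))
```

With the lists as printed, the intersection contains Staff in addition to Time and Branch.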
A dimensional model that contains more than one fact table sharing one or more conformed dimension tables is referred to as a fact constellation.

Example - Fact and Dimension Tables for each Business Process

Business Process        Fact Table            Dimension Tables
Property Sales          PropertySale          Time, Branch, Staff, PropertyForSale, Owner, ClientBuyer, Promotion
Property Rentals        Lease                 Time, Branch, Staff, PropertyForSale, Owner, ClientBuyer, Promotion
Property Viewing        PropertyViewing       Time, Branch, Staff, PropertyForSale, PropertyForRent, ClientBuyer, ClientRenter
Property Advertising    Advert                Time, Branch, Staff, PropertyForSale, PropertyForRent, Promotion, Newspaper
Property Maintenance    PropertyMaintenance   Time, Branch, Staff, PropertyForRent

Thank you