Welcome to MIS420!
Be ready: you are entering a zone where Murphy's law works like gravity.

Module II: Designing Datamarts
• Datawarehouse & Datamart
• OLAPs vs. OLTPs
• Dimensional Modeling
• Creating Physical Design Using SQL Mgt. Studio

BI System Components
Transaction Data Source
• Flat Files
• Transactions DB (OLTP)
• XML Files
• Excel Files
• Etc.
Data Modeling Capability (AKA Datawarehousing Capability)
Module 2: Design a Datamart (Chapter 3 & 6, Larson Book)
• Requirement Analysis
• Creating a Schema
• SS DB Engine
Data Repository (OLAP System)
• Datamart
• Data Warehouse
• Multidimensional Database - Cubes
Module 4: Populate a DataMart (Chapter 7 & 8, Larson Book)
• ETL Process
• SS Integration Services
Module 3: Business Analytics (Chapter 4, 9, 10, Larson Book)
• Build an OLAP/Cube
• SS Analysis Services
Module 1: Delivering BI (Chapter 1, 2, 10, 18, Larson Book)
• Creating KPI
• Creating Reports
• Excel and Tableau
Data Analysis and Visualization
• Cube Browsing
• Reporting
• Dashboards
• Data Mining

Data Modeling Capability: Datamart component
Module 2: Design a Datamart (Chapter 3 & 6, Larson Book)
• Requirement Analysis
• Creating a Schema
• SS DB Engine: our own DBE account, KJ##NID. You will create all the Datamarts for ICA#2 and HW#2 in this account, so use the naming convention wisely!
Transaction Data Source
• Flat Files
• Transactions DB (OLTP)
• XML Files
• Excel Files
• Etc.
Data Repository (OLAP System)
• Datamart (the KJ##NID account has the datamart tables and schemas)
• Data Warehouse
• Multidimensional Database - Cubes
Module 4: Populate a DataMart (Chapter 7 & 8, Larson Book)
• ETL Process
• SS Integration Services
Module 3: Business Analytics (Chapter 4, 9, 10, Larson Book)
• Build an OLAP/Cube
• SS Analysis Services
Module 1: Delivering BI (Chapter 1, 2, 10, 18, Larson Book)
• Creating KPI
• Creating Reports
• Excel and Tableau
Data Analysis and Visualization
• Cube Browsing
• Reporting
• Dashboards
• Data Mining

Outline
• Data Warehouse Concept
• OLAPs vs. OLTPs (fundamental differences that suggest the need for different design approaches)
• Dimensional Modeling
• Creating Physical Design Using SQL Mgt. Studio

Datawarehouse & Datamart
Concept and Characteristics

Data Warehouse
• A Data Warehouse is a "central" repository for all or significant parts of the data that an enterprise's various business systems collect.
• A warehouse is a collection of data that is
  1. subject-oriented,
  2. integrated,
  3. time-variant, and
  4. non-volatile.
• Provides a consolidated view of enterprise data, optimized for reporting and analysis.
• A physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized format.
• Data Marts are smaller versions of warehouses.

OLAP vs OLTP

OLAP vs. OLTP
• Online Transaction Processing Systems (OLTP): Systems that process transactions (e.g., order processing) by inserting, updating, and deleting the appropriate records in a database at the end of each transaction.
• Online Analytical Processing Systems (OLAP): Systems that summarize and analyze a collection of transaction data.

OLTP vs OLAP
• Relationship between OLTP and OLAP?
• Structural/Design differences?
• Purpose/Function difference?
• Difference in the type of data or information stored
• Size
  • Users
  • Data stored
• Performance Metric?

OLTP vs OLAP
• Relationship between OLTP and OLAP?
  • OLTP is a data source for OLAP
• Structural/Design differences?
  • ER Modeling vs. Dimensional Modeling
  • ER design vs. star or snowflake design
  • ER design follows well-structured steps that have been used and tested for decades, whereas star and snowflake designs have been widely used for only a decade, are still unstructured, and have "rules" that are not well established
  • Application oriented vs. subject oriented

OLTP vs OLAP
• Purpose/Function difference?
  • OLTP processes transactions vs. OLAP conducts analysis (performance, gain insight)
  • OLTP focuses on transaction processing efficiency vs. OLAP eases data retrieval so that it is cognitively less overloading (allows "chunks" or "cubes" of data to be viewed)
  • OLTP processes repetitive transactions (insert, delete) and conducts simple manipulations (select, update) vs. OLAP examines (mostly read-only) many data items and complex relationships, and focuses on aggregates
  • OLTP views detailed and flat transactions vs. OLAP views multidimensional and aggregated data

OLTP vs OLAP
• Difference in the type of data or information stored
  • OLTP data is current and isolated vs. OLAP data is historic and consolidated
  • OLTP stores data specific to a transaction vs. OLAP stores data specific to performance
• Size
  • Users: OLTP has thousands of users vs. OLAP has hundreds or fewer users
  • Data stored: OLTP stores 100s of MB to GB vs. OLAP stores 100s of GB to TB
• Performance Metric?
  • OLTP transaction throughput vs. OLAP query throughput
• Data Quality
  • "Dirty" data is a major issue for OLAP

Dimensional Modeling
Modeling technique used to design data warehouses and data marts

ER Modeling vs. Dimensional Modeling
ER Modeling
• Transaction capture
• Reduces data redundancy: highly normalized tables
• Hard for end users to understand and remember
• Not query friendly
• All the attributes for an entity, categorical as well as numeric, belong to the entity table.
• Well-defined, theory-driven process
Dimensional Modeling
• Data retrieval
• Intuitive, with high query performance
• Categorical data sits in a 'dimension' entity, and the 'fact' entity has mostly numeric attributes.
• The only categorical (nonfact) fields in the fact table are the keys to the dimension tables.
• Process ill-defined…more of an art

Dimensional Modeling – Benefits
1. Produce database structures that are easy for end users to understand and write queries against.
2. Optimize query performance (as opposed to update performance).
3. Scalability: dimensional models are scalable and "easily" accommodate unexpected new data.
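To make the fact/dimension split above concrete, here is a minimal sketch. It assumes Python's built-in sqlite3 module instead of SQL Server, and the tables `dim_product` and `fact_sales`, their columns, and the sample rows are all hypothetical. The point is only that the fact table holds numeric measures plus a key, while the categorical data sits in the dimension table.

```python
import sqlite3

# Hypothetical toy data mart; all names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY,
        product_name TEXT,   -- categorical, descriptive data
        category     TEXT    -- categorical, descriptive data
    );
    CREATE TABLE fact_sales (
        product_key  INTEGER REFERENCES dim_product(product_key),
        sales_amount REAL,   -- numeric measure
        units_sold   INTEGER -- numeric measure
    );
    INSERT INTO dim_product VALUES (1, 'Road Bike', 'Bikes'),
                                   (2, 'Helmet', 'Accessories'),
                                   (3, 'Gloves', 'Accessories');
    INSERT INTO fact_sales VALUES (1, 1200.0, 1), (2, 45.0, 3),
                                  (3, 20.0, 2), (2, 30.0, 2);
""")


def sales_by_category(connection):
    """Aggregate the fact-table measures, sliced by one dimension attribute."""
    return connection.execute("""
        SELECT d.category, SUM(f.sales_amount), SUM(f.units_sold)
        FROM fact_sales AS f
        JOIN dim_product AS d ON d.product_key = f.product_key
        GROUP BY d.category
        ORDER BY d.category
    """).fetchall()


result = sales_by_category(conn)
print(result)
```

One join and one GROUP BY answer a "sales by category" question, which is the sense in which such structures are easy to write queries against.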
Designing a Data Mart
• Conceptualize: Identify the information that the decision makers need: measures, dimensions, hierarchies, and attributes. (Group Deliverable I)
• Design & Build the database structure for the data mart using either a star or snowflake schema. (Group Deliverable II)

Requirement Analysis – Decision Makers' Needs (GD#1)
• Business intelligence design must start with the decision makers.
  • What foundational and feedback information do they need?
  • How do they need that information sliced and diced for proper analysis?
• More specifically:
  • What facts, figures, statistics, and so forth do you need for effective decision making? (measures)
  • How should this information be sliced and diced for analysis? (dimensions)
  • What additional information can aid in decision making? (attributes)

Data Mart – Structure
A Data Mart's structure consists of the following two types of data objects:
• Performance Measures (also referred to as facts)
• Dimensions
  • Hierarchies
  • Attributes

Data Mart – Structure
Performance Measures: A Measure is a numeric quantity expressing some aspect of the organization's performance. The information represented by this quantity is used to support or evaluate the decision making and performance of the organization. A measure can also be called a fact. Example: Total Sales.
Information needed during the design process:
1. Name of the measure
2. What fields should be used to supply the data (source)
3. Data type (money, integer, decimal)
4. Formula used to calculate the measure (if there is one)
Measures define what the decision makers want to see.

Data Mart – Structure
Dimensions (Slicers): A Dimension is a categorization used to spread out an aggregate measure to reveal its constituent parts.
Examples: "total sales by sales person by year"
Dimension key words: "by," "for each," or "for every"
Information needed during the design process:
• Name of the dimension
• What fields should be used to supply the data (source)
• Data type of the dimension's key (the code that uniquely identifies each member of the dimension)
• Name of the parent dimension (if there is one)
The dimensions and hierarchies define how the decision maker wants to view the data.

Data Mart – Structure
Hierarchy (Slicers; Drill Down): A Hierarchy is a structure made up of two or more levels of related dimensions. A dimension at an upper level of the hierarchy completely contains one or more dimensions from the next lower level of the hierarchy.
Example: Time Dimension – Month, Quarter, Year.
Hierarchies are used to organize dimensions into various levels.
Hierarchies: "roll up cities into sales regions" or "drill down from year into quarter"

Data Mart – Structure
Attributes: An Attribute is an additional piece of information pertaining to a dimension member that is not the unique identifier or the description of the member.
Example: Regional Manager's information, Customers' gender and age.
Attributes provide more contextual information about a dimension.
Information needed during the design process:
• Name of the attribute
• What fields should be used to supply the data (source)
• Data type
• Name of the dimension to which it applies
Attributes allow decision makers to filter data.

Dimensional Design – The Schema
Key Principle: A dimensional schema physically separates the measures that quantify a subject's performance (e.g., student, business, team, process) from the descriptive elements (a.k.a. dimensions) that summarize and categorize the performance.
Two types of schema:
• A Star Schema
• A Snow Flake Schema

Data Mart's – Data Objects – Various Measures and Dimensions – how to configure?
[Figure: Dimensions, Measures, and Hierarchies]

The main idea underlying this design
[Figure: a central Measure Group (Facts) surrounded by Dim 1 through Dim 6]

The Star Schema

The Snow Flake Schema

The Tables
• Measures: All the measures are placed in a single table in the schema, called the fact table.
• The dimensions are placed in their own tables.
• In the star schema, all the information for a hierarchy is stored in the same table. The information for the parent (or grandparent or great-grandparent, and so forth) dimension is added to the table containing the dimension at the lowest level of the hierarchy.
• The snowflake schema works a bit differently. In the snowflake schema, each level in the dimensional hierarchy has its own table. The dimension tables are linked together with foreign key relationships to form the hierarchy.

A Four Step Dimensional Modeling Process
http://www.kimballgroup.com/ (Not in the book)
Step 1: Describe the Business Process that the Datamart Supports & Identify the Sources of Measurement
  Key Concept: Measurement Events
Step 2: Declare the Fact Table Grain
  Key Concept: Fact Table Data Views
Step 3: Choosing the Dimensions
  Key Concept: Cardinalities & Hierarchies
Step 4: Choosing the Facts
  Key Concept: Its relationships with the measurement events and the grain

Dimension Modeling Details - Steps and Examples
Refer to the Class Handout and LBD#1 for this section.

Converting Logical Design to Physical Design Using SQL Mgt. Studio
Refer to LBD#2 for this section.

Summary
• Overview of Data Warehouse concept: a data source for OLAPs
• OLTP vs OLAP: Compare and Contrast
• Dimensional Modeling
  • Benefits
  • Data Objects
  • Data Structures
  • Schemas: Logical and Physical
  • Process of designing these schemas

Logical Dimensional Schema
1. In a logical dimensional schema, the facts, measures, and dimensions are represented as entities and attributes that are independent of a database vendor and can be transformed to a physical dimensional schema for any database vendor (such as SQL Server 2012).
2. A logical schema conceptually separates the facts/measures and the dimensions surrounding measure events. That is, it illustrates the performance (in the form of measures and dimensions) about which an individual or organization wants to collect data, and depicts relationships among the measures and dimensions.
3. A logical schema contains representations of facts, dimensions, hierarchies, attributes, relationships, unique identifiers, and constraints between relationships.

Physical Dimensional Schema
After the logical objects and relationships are defined in a logical data model, you can use SQL Server 2012 Management Studio to transform (i.e., digitize) the logical model into a database-specific physical representation in the form of a physical schema. In a physical dimensional schema, the objects in the star or snowflake schema are actually database tables.
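
The difference between the star and snowflake layouts described in "The Tables" can be sketched as physical tables. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than SQL Server Management Studio, and the store, city, and region tables and columns are hypothetical names chosen to mirror the "roll up cities into sales regions" hierarchy.

```python
import sqlite3

# Star schema (hypothetical names): the whole store -> city -> region
# hierarchy is stored in a single dimension table.
STAR_DDL = """
CREATE TABLE dim_store (
    store_key  INTEGER PRIMARY KEY,
    store_name TEXT,
    city       TEXT,
    region     TEXT
);
CREATE TABLE fact_sales (
    store_key    INTEGER REFERENCES dim_store(store_key),
    sales_amount REAL
);
"""

# Snowflake schema (hypothetical names): each hierarchy level gets its own
# table, and the levels are linked with foreign keys.
SNOWFLAKE_DDL = """
CREATE TABLE dim_region (
    region_key  INTEGER PRIMARY KEY,
    region_name TEXT
);
CREATE TABLE dim_city (
    city_key   INTEGER PRIMARY KEY,
    city_name  TEXT,
    region_key INTEGER REFERENCES dim_region(region_key)
);
CREATE TABLE dim_store (
    store_key  INTEGER PRIMARY KEY,
    store_name TEXT,
    city_key   INTEGER REFERENCES dim_city(city_key)
);
CREATE TABLE fact_sales (
    store_key    INTEGER REFERENCES dim_store(store_key),
    sales_amount REAL
);
"""


def table_names(ddl):
    """Create the schema in a fresh in-memory database and list its tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(ddl)
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


star_tables = table_names(STAR_DDL)
snowflake_tables = table_names(SNOWFLAKE_DDL)
print(star_tables)
print(snowflake_tables)
```

Both variants keep a single fact table. The star version needs one join to reach the region, while the snowflake version needs a chain of joins through `dim_city` and `dim_region`.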