2/10/2011

The Source of Business Intelligence

Contributors to Effective Decision Making
- Specific goals
- Measures to assess progress toward goals
  - Specific, not vague
  - Able to implement
- Foundational information
- Information from feedback based on implemented decisions

Business Intelligence provides:
- Foundation information: measures used as the basis for decision making
- Feedback information: measures used to evaluate a decision made

Where Does the Information for Business Intelligence Come From?
- Customer satisfaction surveys
- Market research
- Customer behavior (e.g., buying behavior)
- Transactional data
  - Holds a wealth of information about customers

Transactional Data Definition
“Transactional data is the information stored to track the interactions, or business transactions, carried out by an organization.”
Larson, B. (2008). Delivering Business Intelligence with Microsoft SQL Server 2008. New York: McGraw-Hill Osborne.

Online Transaction Processing Definition
“Online transaction processing (OLTP) systems record business interactions as they happen. They support the day-to-day operation of an organization.”
Larson, B. (2008). Delivering Business Intelligence with Microsoft SQL Server 2008. New York: McGraw-Hill Osborne.

Why Do We Need Another System for BI if the Data Is Already in the OLTP Systems?
OLTP
- Optimized for efficiently processing and storing transactions
- Highly normalized relational designs
- Not designed to efficiently deliver aggregates of large numbers of data points; doing so can slow the system down
- Supports day-to-day business operations
- Some data is archived so that the system continues to operate efficiently
- Transactional data may not be all in one place (accounting system, inventory system, etc.)
BI
- Data is not “per transaction” data
- Aggregated, summarized
- Cumulative transaction results
- Looks over time, so needs data that coincides with periods in the past
- Needs data across functional areas

Aggregate Definition
“An aggregate is a number that is calculated from amounts in many detail records. An aggregate may be a sum of many numbers, but it can also be derived using other arithmetic operations or even from a count of the number of items in a group.”
Larson, B. (2008). Delivering Business Intelligence with Microsoft SQL Server 2008. New York: McGraw-Hill Osborne.

Solution
Extract data from OLTP systems and put it in a specially designed data mart.

Data Mart Definition
“A data mart is a body of historical data in an electronic repository that does not participate in the daily operations of the organization. Instead, this data is used to create business intelligence. The data in the data mart usually applies to a specific aspect of the business.”
Larson, B. (2008). Delivering Business Intelligence with Microsoft SQL Server 2008. New York: McGraw-Hill Osborne.
http://www.outcomesinc.com/odis_datamart

Data Mart vs. Data Warehouse
- Data marts are smaller development projects that focus on a particular aspect of a business
- A data warehouse is a one-stop repository for all the historical data of a business

Data Mart Features
- Built for speed of access
- Can be a relational database
  - Requires fewer table joins when data is retrieved for a view that is the basis of a report
  - Denormalization is OK when done to speed up retrieval
- Design is organized around fact tables and dimension tables
  - Result is a “star” schema
  - Multiple stars from “snowflakes”

How Does Data Get Into the Data Mart?
- Loaded at regular intervals
  - Based on extraction of data from OLTP systems
  - Can be from disparate OLTP systems
- Data load = periodic copy of data from the OLTP systems into the data mart
  - Automated, generally speaking
  - Run at off-peak hours
- Once in the data mart, the data is largely static
- There is some latency in the data mart data
  - The amount of latency is planned around business intelligence needs

Data Cleansing
Data Mart = OLTP + OLTP
- Data marts often contain data from multiple OLTP systems
- This enables complex measures to be calculated and compared
- Data problems must be resolved before data is entered into the data mart
Problems
- Different data types may exist across systems for the same quantities
- Different unique identifiers for the same entity
- Different time periods used

Data Cleansing Definition
“Data cleansing removes inconsistencies and errors from transactional data so it has the consistency necessary for use in a data mart.”
Larson, B. (2008). Delivering Business Intelligence with Microsoft SQL Server 2008. New York: McGraw-Hill Osborne.

Extract – Transform – Load
ETL
- Extract
- Transform (data cleansing)
- Load

ETL Definition
“The Extract, Transform, and Load (ETL) process:
1. Extracts data to copy from one or more OLTP systems,
2. Performs any required data cleansing to transform the data into a consistent format,
3. And loads the cleansed data by inserting it into the data mart.”
Larson, B. (2008). Delivering Business Intelligence with Microsoft SQL Server 2008. New York: McGraw-Hill Osborne.

Data Mart Structure
Based on four categories of data used for BI:
- Measures
- Dimensions
- Attributes
- Hierarchies

Measures Definition
“A measure is a numeric quantity expressing some aspect of the organization’s performance. The information represented by this quantity is used to support or evaluate the decision making and performance of the organization.”
Larson, B. (2008). Delivering Business Intelligence with Microsoft SQL Server 2008. New York: McGraw-Hill Osborne.
A measure is often also called a fact.

Dimensions Definition
“A dimension is a categorization used to spread out an aggregate measure to reveal its constituent parts.”
Larson, B. (2008). Delivering Business Intelligence with Microsoft SQL Server 2008. New York: McGraw-Hill Osborne.

Dimensions
- Total Sales (a fact)
- Total Sales for 2005 (breaking it down by year adds time as a dimension)
- Total Sales of Mythic World for 2005 (breaking it down by product and by year adds product as a second dimension)

Find the Total Sales by Product by Year and by Region
[Figure: cube with Product, Year, and Region axes, labeled Mythic World (Product), 2005 (Year), and South Central (Region)]

The Star Schema
- A relational database schema that holds measures and dimensions in a data mart
- Fact tables contain measures
- Dimension tables
  - One table per dimension

Star Schema
[Figure: star schema with 3 dimension tables and 1 fact table]

Fact Table
- Has a column for the measure
  - There could be multiple measures
- Has fields for the dimension keys (FKs)
- PK is a composite key made up of all the dimension FKs

Dimension Tables
- More than one, because you usually want to be able to see the data from more than one perspective
- Field for the unique identifier
- Field for the description

Attributes Definition
“An attribute is an additional piece of information pertaining to a dimension member that is not the unique identifier or the description of the member.”
Larson, B. (2008). Delivering Business Intelligence with Microsoft SQL Server 2008. New York: McGraw-Hill Osborne.
- Often included for filtering
- Become additional fields in the corresponding dimension table

Hierarchies Definition
“A hierarchy is a structure made up of two or more levels of related dimensions. A dimension at an upper level of the hierarchy completely contains one or more dimensions from the next lower level of the hierarchy.”
Larson, B. (2008). Delivering Business Intelligence with Microsoft SQL Server 2008. New York: McGraw-Hill Osborne.
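Because an upper level completely contains the members of the level below it, totals at an upper level can always be computed by summing the level beneath. Below is a minimal Python sketch of that roll-up for a Year, Quarter, Month time hierarchy. It assumes month-level sales are the lowest level stored. The sales figures and the function names `quarter_of`, `roll_up_to_quarter`, and `roll_up_to_year` are hypothetical, for illustration only.

```python
from collections import defaultdict

# Hypothetical month-level sales: (year, month) -> sales amount.
# Month is the lowest level of the Year/Quarter/Month hierarchy.
monthly_sales = {
    (2005, 1): 100.0,
    (2005, 2): 150.0,
    (2005, 3): 50.0,
    (2005, 4): 200.0,
    (2006, 1): 80.0,
}


def quarter_of(month: int) -> int:
    """Map a month (1-12) to its quarter (1-4); each month is in exactly one quarter."""
    return (month - 1) // 3 + 1


def roll_up_to_quarter(sales):
    """Sum month-level sales into (year, quarter) totals."""
    totals = defaultdict(float)
    for (year, month), amount in sales.items():
        totals[(year, quarter_of(month))] += amount
    return dict(totals)


def roll_up_to_year(quarter_totals):
    """Sum quarter-level totals into year totals."""
    totals = defaultdict(float)
    for (year, _quarter), amount in quarter_totals.items():
        totals[year] += amount
    return dict(totals)


quarters = roll_up_to_quarter(monthly_sales)
years = roll_up_to_year(quarters)
print(quarters)  # {(2005, 1): 300.0, (2005, 2): 200.0, (2006, 1): 80.0}
print(years)     # {2005: 500.0, 2006: 80.0}
```

Rolling up quarter totals to years gives the same result as summing the months directly. This equivalence is what lets upper-level totals be calculated from the lowest level instead of being stored.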
- Example: Year – Quarter – Month
- Example: ProductType – ProductSubType – Product
- Hierarchies enable drill-down and roll-up, giving access to more granularity or more aggregation as needed

Structuring Hierarchies
- PK is at the lowest level of the hierarchy
- FK in the fact table must point to the lowest level of the hierarchy
- The fact table has one row (one set of measures) for each unique combination of members at the lowest level of all the hierarchies
- Measures at higher levels of the hierarchy are calculated rather than stored
  - Aggregates: Sum, Average

Snowflake Schema
- Each level of a hierarchy is stored in a separate dimension table
- Advantages
  - Good relational design
  - Easier to maintain
- Disadvantages
  - Requires a number of table joins when performing aggregation operations
  - Performance issue in large databases

Business intelligence makes information readily available to decision makers.
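The difference between the star and snowflake designs shows up in how many joins an aggregate query needs. Below is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. All table names, columns, and rows are hypothetical, apart from the Mythic World product mentioned earlier. For simplicity, the year is stored directly in the fact table instead of in a separate time dimension table.

Both designs answer the same question: total 2005 sales by product type. The snowflake version has to join through every level of the ProductType – ProductSubType – Product hierarchy. The star version needs only a single join to its denormalized dimension table.

```python
import sqlite3

# In-memory database; every table, column, and row here is hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
-- Snowflake: one dimension table per level of ProductType - ProductSubType - Product.
CREATE TABLE dim_product_type (type_id INTEGER PRIMARY KEY, type_name TEXT);
CREATE TABLE dim_product_subtype (
    subtype_id INTEGER PRIMARY KEY, subtype_name TEXT,
    type_id INTEGER REFERENCES dim_product_type (type_id));
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY, product_name TEXT,
    subtype_id INTEGER REFERENCES dim_product_subtype (subtype_id));

-- Star: a single denormalized dimension table holds all three levels.
CREATE TABLE dim_product_star (
    product_id INTEGER PRIMARY KEY, product_name TEXT,
    subtype_name TEXT, type_name TEXT);

-- Fact table shared by both designs: keys point at the lowest level (product),
-- and the PK is the composite of the dimension keys.
CREATE TABLE fact_sales (
    product_id INTEGER, year INTEGER, sales REAL,
    PRIMARY KEY (product_id, year));
""")

cur.executemany("INSERT INTO dim_product_type VALUES (?, ?)",
                [(1, "Games"), (2, "Puzzles")])
cur.executemany("INSERT INTO dim_product_subtype VALUES (?, ?, ?)",
                [(10, "Board Games", 1), (20, "Jigsaw Puzzles", 2)])
cur.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                [(100, "Mythic World", 10), (101, "Product B", 10),
                 (200, "Product C", 20)])
cur.executemany("INSERT INTO dim_product_star VALUES (?, ?, ?, ?)",
                [(100, "Mythic World", "Board Games", "Games"),
                 (101, "Product B", "Board Games", "Games"),
                 (200, "Product C", "Jigsaw Puzzles", "Puzzles")])
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                [(100, 2005, 500.0), (101, 2005, 250.0),
                 (200, 2005, 100.0), (100, 2006, 300.0)])

# Snowflake: rolling up to ProductType needs one join per hierarchy level.
snowflake_rows = cur.execute("""
    SELECT t.type_name, SUM(f.sales)
    FROM fact_sales AS f
    JOIN dim_product AS p ON f.product_id = p.product_id
    JOIN dim_product_subtype AS s ON p.subtype_id = s.subtype_id
    JOIN dim_product_type AS t ON s.type_id = t.type_id
    WHERE f.year = 2005
    GROUP BY t.type_name
    ORDER BY t.type_name
""").fetchall()

# Star: the same roll-up needs only one join to the denormalized dimension.
star_rows = cur.execute("""
    SELECT d.type_name, SUM(f.sales)
    FROM fact_sales AS f
    JOIN dim_product_star AS d ON f.product_id = d.product_id
    WHERE f.year = 2005
    GROUP BY d.type_name
    ORDER BY d.type_name
""").fetchall()

print(snowflake_rows)  # [('Games', 750.0), ('Puzzles', 100.0)]
print(star_rows)       # [('Games', 750.0), ('Puzzles', 100.0)]
conn.close()
```

In both designs, the fact table's key points at the lowest level of the hierarchy (Product). The type-level totals are calculated at query time rather than stored. The star design trades duplicated subtype and type names in its dimension table for fewer joins.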