University of Manitoba, Asper School of Business
MIS 3500 DBMS
Bob Travica
Business Analytics and Decision Making: OLTP, OLAP & SAP
Chapter 9 & SAP Materials
Updated 2018

OLTP vs. OLAP
• Online Transaction Processing (OLTP) = relational database systems
• Online Analytical Processing (OLAP)

Category         OLTP                             OLAP
Data storage     3NF tables                       Multidimensional cubes
Indexes          Few                              Many
Joins            Many                             Minimal
Duplicated data  Normalized, limited duplication  De-normalized DB
Updates          Continuous, small data sets      At intervals, large data sets
Queries          Specific                         Ad hoc

OLAP via Data Warehousing
[Diagram labels]
• Operations' data
• Online Transaction Processing (OLTP): querying databases with 3NF tables
• Flat files
• Predefined reports
• Periodical transfers
• Online Analytical Processing (OLAP); data warehousing; data mining. Usually de-normalized data.
• Interactive data analysis

OLTP & OLAP in Enterprise Systems
• Enterprise Systems (Enterprise Resource Planning systems) support both.
• Example: An SAP-based system can be a TPS, MIS and DSS for the entire organization.
• DSS capability draws on data warehousing & cubing.
• Process approach to organization, with data flowing smoothly end-to-end.
• Processes link up horizontally (department-to-department) and vertically (process-sub-process).
• The business process* is for the most part the system process.

Data Warehousing Goals
Data warehouse (DW):
• Integrates data from different sources to get a larger picture of the business
• Yields a multidimensional view of data by creating data cubes
• Allows for statistical analysis on large data sets (testing hypotheses on relationships between pieces of data)
• Allows for discovering new relationships by querying cubes or by applying data mining software
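As a rough illustration of the OLTP and OLAP query styles contrasted in the slides above, the sketch below runs one "specific" query and one summary query against a tiny in-memory SQLite database. The tables, column names and figures are hypothetical and are not taken from the course materials. A real warehouse would typically hold the analytical data in de-normalized tables or cubes rather than in the operational tables used here.

```python
import sqlite3

# Hypothetical operational data in normalized tables (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Sale (
    SaleID     INTEGER PRIMARY KEY,
    CustomerID INTEGER REFERENCES Customer(CustomerID),
    SaleMonth  INTEGER,
    Amount     REAL
);
INSERT INTO Customer VALUES (1, 'Jones'), (2, 'Smith');
INSERT INTO Sale VALUES (1, 1, 1, 100.0), (2, 1, 2, 50.0),
                        (3, 2, 1, 75.0),  (4, 2, 2, 25.0);
""")

# OLTP style: a specific query that joins normalized tables
# and touches a single transaction.
oltp_row = conn.execute(
    """SELECT c.Name, s.Amount
       FROM Sale s JOIN Customer c ON c.CustomerID = s.CustomerID
       WHERE s.SaleID = ?""",
    (3,),
).fetchone()

# OLAP style: a summary over the whole data set along one dimension (month).
olap_rows = conn.execute(
    "SELECT SaleMonth, SUM(Amount) FROM Sale GROUP BY SaleMonth ORDER BY SaleMonth"
).fetchall()

print(oltp_row)
print(olap_rows)
```

The first query returns one customer's transaction; the second returns one total per month, which is the kind of result a cube would serve without re-scanning the transaction tables.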

Extraction, Transformation, and Loading
• Preparations performed on data: the ETL process
• Source: transaction data from diverse systems [diagram label: Customers]
• Extract
• Transform/Transport:
  - Convert "Client" to "Customer"
  - Apply standard product numbers
  - Convert currencies
  - Fix region codes
• Load
• Target: data warehouse, where all data must be consistent

Three-Dimensional View of Data: Cube
• Created in a data warehouse
• [Figure: data cube; labels: P1, P2, P3, P4, P5; Months in year; Sales at Location]
• Logic similar to a crosstab query and a pivot table

Data Hierarchy
Levels (highest to lowest): Year, Quarter, Month, Week, Day
• Roll-up: to get higher-level totals
• Drill-down: to get lower-level details

Datawarehouse Tables: Star Design
Design is:
- Hierarchical (dimension tables have no direct association)
- De-normalized (fact table): Price & Quantity inputted to Fact table; calculated fact

Dimension tables:
  Product:  ProductID, Price
  Customer: CustomerID, Detail
  Location: LocationID, Detail
  Sale:     SaleDate, Quantity, Discount, StoreID

Fact table:
  Revenue = Price * Quantity per customer, product, period
  Inputted from Product, Sale & Customer; most dimensions replicated in Fact table.

Revenue is broken down by product, sales location, and desired time period (time column/s: day of year, or even smaller; basis for rollup). New keys are usually used in the fact table (e.g., SaleTbl#-Row#).

Data Warehouse for Tyson Foods
Tyson Frozen Foods: central fact and dimensions
Dimension tables (truncated) provide inputs for "facts" (calculated attributes) in the Fact table.

CUSTOMER DIMENSION
  PK DIMID_CUSTOMER
  SID_CUSTOMER
  (Note: see drawing B)

COPATIME
  PK DIMID_COPATIME
  SID_COPATIME
  WWFWK-FISC WK
  WWFWE-FISC WK END

ORGANIZATION DIMENSION
  PK DIMID_ORGANIZATION
  SID_ORGANIZATION
  PRCTR-PROFIT CENTER
  WWPRS-PRICING SEGMENT
  WWMLG-BUSINESS DIVISION

CENTRAL FACT
  PK DIMID_PRODUCT
  PK DIMID_CUSTOMER
  PK DIMID_TIME
  PK DIMID_COPATIME
  PK DIMID_ORGANIZATION
  PK DIMID_DISTRIBUTION
  VV100-CASES
  VV101-UNIT OF MEASURE
  VV102-GROSS SALES
  VV105-PRODUCT MFG
  VV301-GEN DIV ADMIN COST
  VV403-ADVERTISING
  VV407-GUARANTEED LOANS
  VV409-INTERCO COGS
  VV412-INTERCO COST ADJ
  VV424-PRODUCT RELAT EXP
  VV426-SALES RELATED EXP
  VV433-PRICING PROMO
  VVA01-SURCHARGE
  VVA02-OFF INVOICE DISC
  VVA04-BILLBACK
  VVA06-SALES ADJUST
  VVA07-PRIMARY BROKERAGE
  VVA10-SGML EXPENSE
  VVA11-SHUTTLE FRT
  VVA13-OUTSIDE FREEZR COST
  VVA14-TYSN FREEZR CHARGE
  VVA15-DIRECT MAILING
  VVA16-SALES & MKT EXP
  VVA18-GEN AND ADMIN
  VVA19-G AND a
  VVB01-ACCRUED ADVERTISING
  VVB02-ACCRUED MKT
  VVB09-INTERCO SALES
  VVB10-STEVEDORING
  VVB12-SALES ACCRUAL
  VVB19-FREIGHT FWD
  VVB20-INCENTIVE BROKERAGE
  VVB32-GENERAL
  VVB34-R AND D ACCRUAL
  VVB35-REGULAR COOP
  VVB39-SAMS UPCHARGE
  VVB41-SPECIAL COOP CHARGES
  VVB42-SLOTTING
  VVB47-AD HOC PROGRAMS
  VVJ13-INVENTORY ADJUSTMENT
  VVJ17-CONTRACT COST DISCOUNT

PRODUCT DIMENSION
  PK DIMID_PRODUCT
  SID_PRODUCT
  WWML3-MINOR PRODUCT LINE
  (Note: see drawing A)

DISTRIBUTION DIMENSION
  PK DIMID_DISTRIBUTION
  SID_DISTRIBUTION
  VKBUR-SALES PERSON
  ZZSBK-SECONDARY BROKER
  ZZPBK-PRIMARY BROKER
  BZIRK-SALES DISTRICT

TIME DIMENSION
  PK DIMID_TIME
  SID_TIME
  0CALMONTH
  0CALQUART1
  0CALYEAR
  0FISCPER
  0FISCVARNT
  0CALWEEK

Datawarehouse Tables: Snowflake Design
Design is:
- Network-like (dimension tables can connect directly)
- Still partly normalized (Sale-Customer-City)

Fact table:
  OLAPItems: MerchTblRow, SaleTblRow, Price, Quantity

Dimension tables:
  Product:  ItemID, Description, Price, Category
  Sale:     SaleID, SaleDate, CustomerID, Discount, SalesTax
  Customer: CustomerID, Phone, FirstName, LastName, Address, ZipCode, CityID
  City:     CityID, ZipCode, City, State

Advantage: Design of Fact table simpler (Customer, City out); faster processing in a narrower
scope.

SAP Datawarehouse

Datawarehouse Cube Details
[Screenshot not reproduced; annotation: "Can also be Dimensions"]

Multidimensional View of Data: Precursors to DW (Excel Pivot Table)
• Dimension: Last Name
• Dimensions: Quarter, Month
• Facts (Key Figures, Measures): Sum of Animal, Sum of Merchandise

LastName   EmployeeID  Data                  Quarter 1  Quarter 2  Quarter 3  Quarter 4  Grand Total
Carpenter  8           Sum of Animal          1,668.91     606.97     426.39       7.20     2,709.47
                       Sum of Merchandise       324.90      78.30      99.00     128.70       630.90
Eaton      6           Sum of Animal            522.37                341.85     562.50     1,426.72
                       Sum of Merchandise        30.60                 54.90     107.10       192.60
Farris     7           Sum of Animal          5,043.36   1,059.70                796.47     6,899.53
                       Sum of Merchandise       826.92     188.10                306.00     1,321.02
Gibson     2           Sum of Animal          4,983.51   1,549.83              2,556.10     9,089.44
                       Sum of Merchandise       668.25     238.50                450.90     1,357.65
Hopkins    4           Sum of Animal          3,747.96   1,194.88     372.65     128.41     5,443.90
                       Sum of Merchandise       476.91     252.90     121.50       7.20       858.51
James      5           Sum of Animal          3,282.77   2,373.08     437.88     150.11     6,243.84
                       Sum of Merchandise       505.89     693.45      99.00      99.00     1,397.34
O'Connor   9           Sum of Animal          2,643.69     180.91     510.12                3,334.72
                       Sum of Merchandise       263.70      83.70      55.80                  403.20
Reasoner   3           Sum of Animal          4,577.43     625.74     589.68   2,500.24     8,293.09
                       Sum of Merchandise       762.30      89.10     116.80     396.90     1,365.10
Reeves     1           Sum of Animal          1,120.93                                      1,120.93
                       Sum of Merchandise       263.88                                        263.88
Shields    10          Sum of Animal          1,008.76                162.15                1,170.91
                       Sum of Merchandise        62.10                 22.50                   84.60
Total                  Sum of Animal         28,599.69   7,591.11   2,840.72   6,701.03    45,732.55
                       Sum of Merchandise     4,185.45   1,624.05     569.50   1,495.80     7,874.80

• Can place data in rows or columns.
• By grouping months, can instantly get quarterly or monthly totals.

Multidimensional View of Data: CUBE Option in SQL 99

  SELECT Category, Month, Sum,
         GROUPING (Category) AS Gc,
         GROUPING (Month) AS Gm
  FROM …
  GROUP BY CUBE (Category, Month...)

Category  Month   Amount   Gc  Gm
Bird      1        135.00  0   0
Bird      2         45.00  0   0
…
Bird      (null)    32.00  0   0
Bird      (null)   607.50  0   1
Cat       1        396.00  0   0
Cat       2        113.85  0   0
…
Cat       (null)  1293.30  0   1
(null)    1       1358.8   1   0
(null)    2       1508.94  1   0
(null)    3       2362.68  1   0
…
(null)    (null)  8451.79  1   1

GROUPING SETS: Hiding Details

  SELECT Category, Month, Sum
  FROM …
  GROUP BY GROUPING SETS ( ROLLUP (Category), ROLLUP (Month), () )

Category  Month   Amount
Bird      (null)   607.50
Cat       (null)  1293.30
…
(null)    1       1358.8
(null)    2       1508.94
(null)    3       2362.68
…
(null)    (null)  8451.79

SQL RANK Functions

  SELECT Employee, SalesValue,
         RANK() OVER (ORDER BY SalesValue DESC) AS rank,
         DENSE_RANK() OVER (ORDER BY SalesValue DESC) AS dense
  FROM Sales
  ORDER BY SalesValue DESC, Employee;

Employee  SalesValue  rank  dense
Jones     18,000      1     1
Black     16,000      2     2
Smith     16,000      2     2
White     14,000      4     3

• RANK skips the next number after a tie; DENSE_RANK does not skip numbers.
• Advances in SQL such as these motivate DBMS vendors to support OLAP and data warehousing.

Broader Data Analysis with Data Mining
Goal: To discover unknown relationships in the data that can be used to make better decisions.
• Exploratory analysis.
• A bottom-up approach that scans the data to find relationships.
• Some statistical routines are used, but they are not sufficient:
  - Statistics relies on averages.
  - Sometimes the important data lies in more detailed pairs.
• Supervised by developer vs. unsupervised (self-organizing artificial neural networks).

Common Techniques for Data Mining
1. Classification/Prediction
2. Association Rules/Market Basket Analysis
3. Clustering
Some are based heavily on classical statistics (1), others use specialized mining software (2), yet others combine statistics with specialized mining software (3). Currently, these techniques are considered part of data analytics.

1. Classification (Prediction)
Purpose: "Classify" things that are causes and those that are effects.
Examples:
• Which borrowers/loans are most likely to be successful?
• Which customers are most likely to want a new item?
• Which companies are likely to file for bankruptcy?
• Which workers are likely to quit in the next six months?
• Which startup companies are likely to succeed?
• Which tax returns are fraudulent?

Classification Process
• Clearly identify the outcome/dependent variable.
• Identify potential variables that might affect the outcome.
• Use sample data to test and validate the model.
• Run the data through the model.
• Techniques: regression/correlation analysis, decision trees & tables (below).

Income  Credit History  Job Stability  Credit Success
50000   Good            Good           Yes
75000   Mixed           Bad            No

2. Association/Market Basket
Purpose: Determine what events or items go together/co-occur.
Examples:
• What items are customers likely to buy together? (Business use: Consider putting the two together to increase cross-selling.)

Association Challenges
• If an item is rarely purchased, any other item bought with it seems important. So combine items into categories.
• Some relationships are obvious: burger and fries.
• Some relationships are puzzling/meaningless. A hardware store found that toilet rings sell well only when a new store first opens. But what does it mean?
• Caution applies to data analytics: mere relationships without background knowledge can be misleading.

3. Cluster Analysis
Purpose: Identify groups of people or other entities.
Examples:
• Are there groups of customers? (If so, we could target them; market segmentation.)
• Do the locations for our stores have elements in common? (If so, we can search for similar clusters for new locations.)
• Do employees have common characteristics? (If so, we can hire similar, or dissimilar, people.)
[Figure labels: large intercluster distance; small intracluster distance]
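
The two distances named on the cluster slide can be made concrete with a small calculation. The sketch below is only an illustration: the customer points (age, monthly spend) and their assignment to groups A and B are hypothetical, and the two measures used (mean pairwise distance within a group, distance between group centroids) are one common way to quantify intracluster and intercluster distance, not necessarily the one used in the course.

```python
from itertools import combinations
from math import dist

# Hypothetical customers as (age, monthly spend), already assigned to two groups.
clusters = {
    "A": [(25, 40), (27, 45), (24, 38)],
    "B": [(55, 200), (58, 210), (53, 190)],
}

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def mean_intracluster_distance(points):
    """Average Euclidean distance over all pairs of points in one cluster."""
    pairs = list(combinations(points, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)

# Intracluster: how tightly each group is packed.
intra = {name: mean_intracluster_distance(pts) for name, pts in clusters.items()}

# Intercluster: how far apart the two groups are, measured between centroids.
inter = dist(centroid(clusters["A"]), centroid(clusters["B"]))

print(intra)
print(inter)
```

With these made-up points the distance between the groups is far larger than the spread inside either group, which is the pattern the slide describes for well-separated clusters.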