Lecture 09: OLAP www.cl.cam.ac.uk/Teaching/current/Databases/ 1 2+2 /* Microsoft SQL Server 2005 */ /* By the way, it is just VHVyaW5nIG1hY2hpbmU= :-) */ WITH SubQuery(t, s, a, b) AS ( SELECT 0, 's', CAST ('<' AS VARCHAR(8000)), CAST ('110110' AS VARCHAR(8000)) UNION ALL David Srbecky SELECT t + 1, newS, CASE mv WHEN 'l' THEN SubString(curr.a, 1, Len(curr.a) - 1) WHEN 's' THEN SubString(curr.a, 1, Len(curr.a) - 1) + newZ WHEN 'r' THEN SubString(curr.a, 1, Len(curr.a) - 1) + newZ + Left(b + '_', 1) ELSE '?' END, CASE mv WHEN 'l' THEN newZ + b WHEN 's' THEN b WHEN 'r' THEN SubString(b, 2, ((Len(b)-1)+Abs(Len(b)-1))/2) ELSE '?' END FROM SubQuery AS curr, ( SELECT 's', '<', '1', '<', 'r' UNION ALL SELECT '1', '1', '1', '1', 'r' UNION ALL /* find 0 */ SELECT '1', '_', 'a', '0', 's' UNION ALL SELECT '1', '0', '2', '0', 's' UNION ALL SELECT '2', '0', '2', '0', 'r' UNION ALL /* find 1; left */ SELECT '2', '_', 'a', '_', 's' UNION ALL SELECT '2', '1', '3', '1', 'l' UNION ALL SELECT '3', '0', '4', '1', 's' UNION ALL /* 0 -> 1 */ SELECT '4', '1', '4', '1', 'r' UNION ALL /* find 0 or _; left */ SELECT '4', '_', '5', '_', 'l' UNION ALL SELECT '4', '0', '5', '0', 'l' UNION ALL SELECT '5', '1', '6', '0', 's' UNION ALL /* 1 -> 0 */ SELECT '6', '1', '6', '1', 'l' UNION ALL /* rewind */ SELECT '6', '0', '6', '0', 'l' UNION ALL SELECT '6', '<', 's', '<', 's' /* restart */ ) AS prog(currS, currZ, newS, newZ, mv) WHERE curr.s = currS AND Right(curr.a, 1) = currZ ) SELECT CharIndex('0', a + b) - 2 FROM SubQuery WHERE s = 'a' OPTION (MAXRECURSION 0); /* SELECT t, s, a + '.' + b FROM SubQuery OPTION (MAXRECURSION 0); */ 2 Acknowledgments • DB2/400: Mastering Data Warehousing Functions. (IBM Redbook) Chapters 1 & 2 only. http://www.redbooks.ibm.com/abstracts/sg245184.html • Data Warehousing and OLAP Hector Garcia-Molina (Stanford University) http://www.cs.uh.edu/~ceick/6340/dw-olap.ppt • Data Warehousing and OLAP Technology for Data Mining Department of Computing London Metropolitan University http://learning.unl.ac.uk/csp002n/CSP002N_wk2.ppt 3 uzz Words Buzz Words Buzz Words Buzz W • • • • • • • • • • • • • Data Warehouse (DW) Decision Support (DS) Data Marts (DM) Data Mining (DM) Enterprise Dashboard (ED) Multi-Dimensional Modeling (MDM) Online Analytic Processing (OLAP) Extract, Transform, and Load (ETL) MOLAP vs. ROLAP Three Letter Acronym (TLR) Drill Down, Roll up (DD+RU) Data vs. Knowledge (DvK) Data Cube vs. Sugar Cube (DCvSC) Don’t be surprised to see this sort of BDB (Blah-Dee-Blah) in the trade press: “The ED lets you transform enterprise data into knowledge with at-a-glance DS/DM and MDM, allowing interactive DD/RU over large DCs.” 4 OLTP vs. OLAP • • • • Database is operational Data is up-to-date Mostly updates Need to support high levels of update transactions • Normal form schemas are important • • • • Database is for analysis Data is historical Mostly reads Need to efficiently support complex queries, and only bulk loading of data • Schema optimized for query processing 5 Decision Support Systems Information Sources Data Warehouse Server (Tier 1) OLAP Servers (Tier 2) Clients (Tier 3) e.g., MOLAP Semistructured Sources Analysis serve Extract Transform Load Query/Reporting Data Warehouse serve e.g., ROLAP serve Operational DB’s Data Mining Data Marts 6 From Enrico Franconi CS 636 xOLAP • Multi-dimensional OLAP (MOLAP) – ‘A k-dimensional matrix based on a non relational storage structure.’ [Agrawal et al] • Relational OLAP (ROLAP) – ‘A relational back-end wherein operations of the data are translated to relational queries.’ [Agrawal et al] • Hybrid OLAP (HOLAP) – Integration of MOLAP with ROLAP. • Desktop OLAP (DOLAP) – Simplified versions of MOLAP or ROLAP. • ZOLAP – Speak with your chemist (normally only prescribed for death march victims) 7 Beware of Data Warehouse Death March Death March projects “use a forced march imposed upon relatively innocent victims, the outcome of which is usually a high casualty rate.” Edward Yourdon, 1997, Death March: The Complete Software Developer’s Guide to Surviving “Mission Impossible Projects” Data Warehouses and Decision Support systems are among the most complex and demanding in the IT world. Failure rates are very high…. 8 Relational data model • based on a single structure of data values in a two dimensional table CUSTOMER ORDER Cus_id Cus_name … Ord_no Ord_date Cus_id … 001 Robert … 01 02 Dec 02 002 … 002 Lyn … 02 03 Dec 02 Lyn … … … … … … … … 9 Data warehousing ___Multidimensional Data Sales volume as a function of product, month, and region Product Dimensions: Product, Location, Time Month 10 A Sample Data Cube 2Qtr 3Qtr 4Qtr sum U.S.A Canada Mexico Country TV PC VCR sum 1Qtr Date Total annual sales of TV in U.S.A. sum 11 A Concept Hierarchy for Dimension Location all all Europe region country city office Germany Frankfurt ... ... ... Spain North_America Canada Vancouver ... L. Chan ... ... Mexico Toronto M. Wind 12 Cuboids Corresponding to the Cube all 0-D(apex) cuboid product product,date date country product,country 1-D cuboids date, country 2-D cuboids 3-D(base) cuboid product, date, country 13 Multidimensional Data: A University Sample Data Cube Abraham Bridget Caroline Avg Business Computing Avg Average Mark of Abraham in Year 1. Year 1 Year 2 Year 3 Time Module Avg Students’ marks as a function of student, department, 14 Data Warehousing • “A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.” —W. H. Inmon 15 OLAP Operations • Roll up (drill-up): summarize data – by climbing up hierarchy or by dimension reduction • Drill down (roll down): reverse of roll-up – from higher level summary to lower level summary or detailed data, or introducing new dimensions • Slice and dice: – project and select • Pivot (rotate): – reorient the cube, visualization, 3D to series of 2D planes. • Other operations – drill across: involving (across) more than one fact table – drill through: through the bottom level of the cube to its backend relational tables (using SQL) 16