Data Cube and OLAP Server
Madhavi Gundavarapu

Outline
• What is Data Analysis?
• Steps in Data Analysis
• SQL-92 Aggregate Functions
• Limitations of GROUP BY
• OLAP Server
• CUBE Operator
• ROLLUP Operator

What is Data Analysis?
[Figure: query → DATA ANALYSIS → exact response]
• The user issues a query, receives a response, and formulates the next query based on that response
• This process repeats until the user obtains the required result
• Data analysis is therefore fundamentally an iterative process

Why Data Analysis?
• Search for unusual patterns in data
• Summarize data values
• Extract statistical information
• Contrast one category with another
• Provide a consolidated view of enterprise data buried in OLTP databases
  – Help decision makers understand business trends
• Derive intelligible results from ad hoc, voluminous, and scattered data

Steps in Data Analysis
• Formulate a query
• Extract aggregated data
• Visualize the results
• Analyze
[Figure: 3-D chart of values by year (1990, 1991, 1992) and color (Red, Blue, ALL), beside a cycle: Analyze & Formulate, Extract, Visualize]

Overview of SQL-92
• SQL has several aggregate operators:
  – sum(), count(), avg(), min(), max()
• The basic idea is to combine all values in a column into a single scalar value
• Syntax:
    SELECT sum(units) FROM inventory;

Overview of SQL-92 (contd.): DISTINCT Clause
• DISTINCT
  – Allows aggregation over distinct values only
  – Example:
      SELECT COUNT(DISTINCT location) FROM inventory;

Overview of SQL-92 (contd.): GROUP BY Clause
• GROUP BY allows aggregates over sub-groups of a table
• The result is a new table
• Syntax:
    SELECT location, sum(units)
    FROM inventory
    WHERE nation = 'USA'
    GROUP BY location;
• A condition on a non-grouped attribute such as nation belongs in WHERE, which filters rows before grouping. HAVING filters whole groups and may reference only grouping attributes and aggregates.
[Figure: table rows grouped by attribute value (A, B, C, D), with SUM() computed per group to give one result row each for A, B, C, D]
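The three SQL-92 forms above can be made concrete with a small sketch using Python's built-in sqlite3 module. The slides give no contents for the inventory table, so this sketch assumes a table named sales that holds the six 1990 rows of the SALES example later in these slides. The WHERE and HAVING conditions are illustrative choices only.

```python
import sqlite3

# Assumption: the six 1990 rows of the SALES example table stand in for inventory.
ROWS_1990 = [
    ("Chevy", 1990, "red", 5),
    ("Chevy", 1990, "white", 87),
    ("Chevy", 1990, "blue", 62),
    ("Ford", 1990, "red", 64),
    ("Ford", 1990, "white", 62),
    ("Ford", 1990, "blue", 63),
]


def load_sales() -> sqlite3.Connection:
    """Build an in-memory SQLite table named sales holding ROWS_1990."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (model TEXT, year INTEGER, color TEXT, sales INTEGER)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", ROWS_1990)
    return conn


def sql92_aggregates(conn: sqlite3.Connection):
    # Plain aggregate: every value in the column collapses to one scalar.
    total = conn.execute("SELECT sum(sales) FROM sales").fetchone()[0]
    # DISTINCT inside the aggregate: each color is counted once.
    n_colors = conn.execute("SELECT COUNT(DISTINCT color) FROM sales").fetchone()[0]
    # GROUP BY: one output row per model; WHERE drops rows before grouping.
    by_model = conn.execute(
        "SELECT model, sum(sales) FROM sales "
        "WHERE color <> 'white' GROUP BY model ORDER BY model"
    ).fetchall()
    # HAVING filters whole groups, so it may test aggregates such as sum().
    big_models = conn.execute(
        "SELECT model, sum(sales) FROM sales "
        "WHERE color <> 'white' GROUP BY model HAVING sum(sales) > 100"
    ).fetchall()
    return total, n_colors, by_model, big_models


if __name__ == "__main__":
    print(sql92_aggregates(load_sales()))
```

Running it prints (343, 3, [('Chevy', 67), ('Ford', 127)], [('Ford', 127)]). Each query yields either one scalar or one row per group. This is the 0-D and 1-D limit discussed on the next slide.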
Limitations of GROUP BY
• Users want cross tabs
  – GROUP BY is limited to 0-D and 1-D aggregates
  [Figure: cross tab of expense categories (AIR, HOTEL, FOOD, MISC) by day of the week (M, T, W, T, F, S, S), with row and column sums]
• Users want subtotals and totals
  – Drill-down and roll-up reports

Multidimensional Data
• Measure attributes: the values being aggregated (e.g., Number, Sales)
• Dimension attributes: the attributes along which measures are grouped and viewed (e.g., Item-name, Color, Size; Model, Year, Color)
• Examples:

  Item-name  Color   Size   Number
  Skirt      Dark    Large      10
  Skirt      Pastel  Large      20
  Skirt      White   Large      15
  …

  Model  Year  Color  Sales
  Chevy  1990  Red        5
  Chevy  1990  White     87
  Chevy  1990  Blue      62
  …

OLAP System
• On-Line Analytical Processing system
• Interactive system that permits analysts to view summaries of multidimensional data
• "On-line" indicates:
  – No long waits to see the result of a query
  – Response times within a few seconds for new summaries
• Lets analysts view data at different levels of granularity

SQL:1999 OLAP Extensions
• SQL-92 aggregation stops at GROUP BY, so cross tabs, subtotals, and totals each require separate queries
• The SQL:1999 standard defines
  – CUBE
  – ROLLUP
  as generalizations of the GROUP BY clause

CUBE: Relational Aggregate Operator
• N-dimensional generalization of simple aggregate functions
[Figure: progression from a simple Aggregate (Sum), to Group By with total (Sum by Color: RED, WHITE, BLUE), to a Cross Tab (Chevy, Ford by Color, with sums By Make and By Color), to the Data Cube and its sub-space aggregates (By Make, By Year, By Color, By Make & Year, By Make & Color, By Color & Year, Sum)]

CUBE: The Idea
• 0-dimensional aggregate (sum(), max(), ...)
  – a1, a2, ..., aN, f()
• Super-aggregates over 1-dimensional sub-cubes
  – ALL, a2, ..., aN, f()
  – a1, ALL, a3, ..., aN, f()
  – ...
  – a1, a2, ..., ALL, f()
• Super-aggregates over 2-dimensional sub-cubes
  – ALL, ALL, a3, ..., aN, f()
  – ...
  – a1, a2, ..., ALL, ALL, f()
• ALL stands for every value of its dimension. Continuing up to (ALL, ALL, ..., ALL, f()), CUBE over N attributes yields 2^N groupings.

An Example
Chevy Sales Cross Tab

  Chevy        1990  1991  1992  Total (ALL)
  black          50    85   154          289
  white          40   115   199          354
  Total (ALL)    90   200   353          643

    SELECT model, year, color, sum(sales) as sales
    FROM sales
    WHERE model in ('Chevy')
      AND year BETWEEN 1990 AND 1992
    GROUP BY CUBE (model, year, color);

CUBE Contd.
• For the query above, GROUP BY CUBE (model, year, color) computes the union of 2^3 = 8 different groupings:
  – {(model, year, color), (model, year), (model, color), (year, color), (model), (year), (color), ()}

Example Contd.
SALES

  Model  Year  Color  Sales
  Chevy  1990  red        5
  Chevy  1990  white     87
  Chevy  1990  blue      62
  Chevy  1991  red       54
  Chevy  1991  white     95
  Chevy  1991  blue      49
  Chevy  1992  red       31
  Chevy  1992  white     54
  Chevy  1992  blue      71
  Ford   1990  red       64
  Ford   1990  white     62
  Ford   1990  blue      63
  Ford   1991  red       52
  Ford   1991  white      9
  Ford   1991  blue      55
  Ford   1992  red       27
  Ford   1992  white     62
  Ford   1992  blue      39

DATA CUBE: the super-aggregate rows that CUBE (model, year, color) adds to the SALES rows (both models)

  Model  Year  Color  Sales
  ALL    ALL   ALL      942
  chevy  ALL   ALL      510
  ford   ALL   ALL      432
  ALL    1990  ALL      343
  ALL    1991  ALL      314
  ALL    1992  ALL      285
  ALL    ALL   red      165
  ALL    ALL   white    273
  ALL    ALL   blue     339
  chevy  1990  ALL      154
  chevy  1991  ALL      199
  chevy  1992  ALL      157
  ford   1990  ALL      189
  ford   1991  ALL      116
  ford   1992  ALL      128
  chevy  ALL   red       91
  chevy  ALL   white    236
  chevy  ALL   blue     183
  ford   ALL   red      144
  ford   ALL   white    133
  ford   ALL   blue     156
  ALL    1990  red       69
  ALL    1990  white    149
  ALL    1990  blue     125
  ALL    1991  red      107
  ALL    1991  white    104
  ALL    1991  blue     104
  ALL    1992  red       59
  ALL    1992  white    116
  ALL    1992  blue     110

GROUPING Function
• SQL:1999 uses NULL to represent both ALL and regular null values
• GROUPING function
  – Can be applied to an attribute
  – Returns 1 if the NULL value represents ALL
  – Returns 0 in all other cases

GROUPING Example
    SELECT model, year, color, sum(sales) as sales,
           GROUPING(model) as model_flag,
           GROUPING(year) as year_flag,
           GROUPING(color) as color_flag
    FROM sales
    WHERE model in ('Chevy')
      AND year BETWEEN 1990 AND 1992
    GROUP BY CUBE (model, year, color);

Rollup and Drill Down
• Allow analysts to view data at any desired level of granularity
• Rollup
  – Moving from finer-granularity data to a coarser granularity
• Drill down
  – Moving from coarser-granularity data to a finer granularity
  – Finer-granularity data cannot be generated from coarse-granularity data
  – It has to be computed from the original data

ROLLUP Operator
• Rollup example:
    SELECT model, year, color, sum(sales) as sales
    FROM sales
    WHERE model in ('Chevy')
      AND year BETWEEN 1990 AND 1992
    GROUP BY ROLLUP (model, year, color);
• Only 4 groupings are generated
  – {(model, year, color), (model, year), (model), ()}
  – ROLLUP over N attributes produces the N + 1 prefixes of the attribute list, rather than the 2^N groupings of CUBE

Summary
• SQL-92 has limited functionality to support OLAP operations
• SQL:1999 introduced extensions to address these limitations
  – the CUBE and ROLLUP operators and the GROUPING function

Questions
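The CUBE, ROLLUP, and GROUPING behavior described in these slides can be sketched in Python. SQLite, the engine bundled with Python's standard library, does not implement GROUP BY CUBE or ROLLUP. This sketch therefore emulates both operators:

* It enumerates the grouping sets: all 2^N subsets for CUBE and the N + 1 prefixes for ROLLUP.
* It unions one plain GROUP BY query per set.
* The *_flag columns stand in for GROUPING().

The helper names cube_sets, rollup_sets, grouping_sets_sql, and run are hypothetical, chosen for this illustration. The data is the six 1990 rows of the SALES table.

```python
import sqlite3
from itertools import combinations

# The six 1990 rows of the SALES example table.
ROWS_1990 = [
    ("Chevy", 1990, "red", 5), ("Chevy", 1990, "white", 87), ("Chevy", 1990, "blue", 62),
    ("Ford", 1990, "red", 64), ("Ford", 1990, "white", 62), ("Ford", 1990, "blue", 63),
]
DIMS = ("model", "year", "color")


def cube_sets(dims):
    """All 2^N subsets of dims, largest first: the groupings CUBE computes."""
    return [s for k in range(len(dims), -1, -1) for s in combinations(dims, k)]


def rollup_sets(dims):
    """The N + 1 prefixes of dims, longest first: the groupings ROLLUP computes."""
    return [tuple(dims[:k]) for k in range(len(dims), -1, -1)]


def grouping_sets_sql(dims, sets, measure="sales", table="sales"):
    """Build a UNION ALL of one GROUP BY query per grouping set.

    A dimension absent from a set is output as NULL (meaning ALL), and
    <dim>_flag mimics GROUPING(<dim>): 1 when that NULL means ALL, else 0.
    """
    parts = []
    for s in sets:
        cols = [d if d in s else f"NULL AS {d}" for d in dims]
        flags = [f"{0 if d in s else 1} AS {d}_flag" for d in dims]
        group_by = f" GROUP BY {', '.join(s)}" if s else ""
        parts.append(
            f"SELECT {', '.join(cols)}, sum({measure}) AS {measure}, "
            f"{', '.join(flags)} FROM {table}{group_by}"
        )
    return " UNION ALL ".join(parts)


def run(sets):
    """Load ROWS_1990 into SQLite and evaluate the emulated grouping sets."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (model TEXT, year INTEGER, color TEXT, sales INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", ROWS_1990)
    return conn.execute(grouping_sets_sql(DIMS, sets)).fetchall()


if __name__ == "__main__":
    for row in run(cube_sets(DIMS)):
        print(row)
```

cube_sets(DIMS) lists the same 8 groupings, in the same order, as the CUBE Contd. slide, and rollup_sets(DIMS) gives the 4 groupings of the ROLLUP slide. On these rows, the emulated cube agrees with the 1990 entries of the DATA CUBE table. For example, (ALL, 1990, red) sums to 69 and (chevy, 1990, ALL) sums to 154. A database that supports SQL:1999 computes the same result with a single GROUP BY CUBE or ROLLUP clause.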