Data Cube and OLAP Server
Madhavi Gundavarapu
Outline
• What is Data Analysis?
• Steps in Data Analysis
• SQL-92 Aggregate Functions
• Limitations of GROUP BY
• OLAP Server
• CUBE Operator
• ROLLUP Operator
What is Data Analysis?
[Figure: the user sends a query to DATA ANALYSIS and receives an exact response]
• User issues a query, receives a response and formulates the next query based on the response
• This process repeats until the user gets the required result
• Fundamentally an iterative process
Why Data Analysis?
• Search for unusual patterns in data
• Summarize data values
• Extract statistical information
• Contrast one category with another
• Provide a consolidated view of enterprise data buried in OLTP databases
– Help decision makers understand business trends
• Derive intelligible results from ad hoc, voluminous and scattered data
Steps in Data Analysis
• Formulate query
• Extract aggregated data
• Visualize results
• Analyze
[Figure: the analysis cycle (Analyze & Formulate, Extract, Visualize), illustrated with a bar chart of values (ranges 0-50 to 150-200) by year (1990, 1991, 1992, ALL) and color (Red, Blue)]
Overview of SQL-92
• SQL has several aggregate operators:
– sum(), count(), avg(), min(), max()
• The basic idea is to combine all values in a column into a single scalar value
[Figure: a column of values collapsed by SUM() into one value]
• Syntax
– SELECT sum(units)
  FROM inventory;
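As a rough illustration of combining all values in a column into one scalar, the Python sketch below mimics the five SQL-92 aggregates over a hypothetical `units` column (the rows are made up). It ignores SQL's NULL handling, so it is an analogy rather than an exact model of the SQL semantics.

```python
# Hypothetical inventory rows; only the `units` column is aggregated.
inventory = [
    {"location": "A", "units": 10},
    {"location": "B", "units": 25},
    {"location": "A", "units": 5},
    {"location": "C", "units": 40},
]


def aggregates(rows, column):
    """Collapse one column into scalar values, like SQL's aggregate operators."""
    values = [row[column] for row in rows]
    return {
        "sum": sum(values),
        "count": len(values),
        "avg": sum(values) / len(values),
        "min": min(values),
        "max": max(values),
    }


print(aggregates(inventory, "units"))
# {'sum': 80, 'count': 4, 'avg': 20.0, 'min': 5, 'max': 40}
```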
Overview of SQL-92 (contd.): DISTINCT Clause
• DISTINCT
– Allows aggregation over distinct values
– Example
SELECT COUNT(DISTINCT location)
FROM inventory;
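A minimal Python sketch of the difference between COUNT and COUNT(DISTINCT ...), using a hypothetical list of `location` values. As in SQL, NULLs (represented here by `None`) are skipped by both counts.

```python
# Hypothetical `location` values; None stands in for a SQL NULL.
locations = ["Seattle", "Boston", "Seattle", None, "Denver", "Boston"]

# Both COUNT(location) and COUNT(DISTINCT location) ignore NULLs.
non_null = [loc for loc in locations if loc is not None]

count_all = len(non_null)            # COUNT(location)
count_distinct = len(set(non_null))  # COUNT(DISTINCT location)

print(count_all, count_distinct)  # 5 3
```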
Overview of SQL-92 (contd.): GROUP BY Clause
• GROUP BY allows aggregates over table sub-groups
• Result is a new table
• Syntax:
SELECT location, sum(units)
FROM inventory
WHERE nation = 'USA'
GROUP BY location;
• HAVING filters the resulting groups after aggregation; a row-level predicate such as nation = 'USA' belongs in WHERE
[Figure: table rows grouped by attribute value (A, B, C, D); SUM() yields one row per group]
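The query above can be mimicked in Python to show the order of evaluation: WHERE filters rows, GROUP BY forms one group per location, SUM collapses each group, and HAVING (when present) filters the groups. The rows and the `sum(units) > 20` condition are hypothetical.

```python
from collections import defaultdict

# Hypothetical inventory rows: (location, nation, units).
inventory = [
    ("Seattle", "USA", 10),
    ("Boston", "USA", 25),
    ("Seattle", "USA", 5),
    ("Toronto", "Canada", 40),
]


def sum_units_by_location(rows, nation):
    """Mimics: SELECT location, sum(units) FROM inventory
    WHERE nation = <nation> GROUP BY location"""
    totals = defaultdict(int)
    for location, row_nation, units in rows:
        if row_nation == nation:       # WHERE filters rows before grouping
            totals[location] += units  # one running sum per location group
    return dict(totals)


result = sum_units_by_location(inventory, "USA")
print(result)  # {'Seattle': 15, 'Boston': 25}

# HAVING filters whole groups after aggregation; `sum(units) > 20` is a
# hypothetical condition.
having = {loc: total for loc, total in result.items() if total > 20}
print(having)  # {'Boston': 25}
```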
Limitations of GROUP BY
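• Users want cross-tabs
– A single GROUP BY computes aggregates for only one grouping, so a cross-tab's cells, row totals, column totals and grand total each need a separate query
[Figure: expense cross-tab with rows AIR, HOTEL, FOOD, MISC, columns for the days M to S, and sum totals]
• Users want sub-totals and totals
– drill-down & roll-up reports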
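To see why cross-tabs are awkward with one GROUP BY, this Python sketch builds the pieces of a small expense cross-tab. Each piece corresponds to a separate grouping, and hence to a separate SQL-92 query. The expense rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical expense records: (category, day, amount).
expenses = [
    ("AIR", "M", 300), ("HOTEL", "M", 120), ("FOOD", "M", 40),
    ("HOTEL", "T", 120), ("FOOD", "T", 35), ("MISC", "T", 10),
]


def group_sum(rows, key):
    """One grouping: sum the amount for each value of key(row)."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[2]
    return dict(totals)


# Four groupings are needed to fill in one cross-tab.
cells = group_sum(expenses, lambda r: (r[0], r[1]))  # by category and day
row_totals = group_sum(expenses, lambda r: r[0])     # by category
col_totals = group_sum(expenses, lambda r: r[1])     # by day
grand = group_sum(expenses, lambda r: ())            # no grouping columns

print(row_totals)  # {'AIR': 300, 'HOTEL': 240, 'FOOD': 75, 'MISC': 10}
print(col_totals)  # {'M': 460, 'T': 165}
print(grand)       # {(): 625}
```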
Multidimensional Data
• Measure attributes
– Numeric values that are aggregated (e.g., Number, Sales)
• Dimension attributes
– Attributes on which the measures are grouped and viewed (e.g., Item-name, Color, Size; Model, Year, Color)
• Example
Item-name  Color   Size   Number
Skirt      Dark    Large      10
Skirt      Pastel  Large      20
Skirt      White   Large      15
…

Model  Year  Color  Sales
Chevy  1990  Red        5
Chevy  1990  White     87
Chevy  1990  Blue      62
…
OLAP System
• On-Line Analytical Processing System
• Interactive system
• Permits analysts to view summaries of multidimensional data
• On-Line indicates
– No long waits to see the result of a query
– Response times within a few seconds for new summaries
• View data at different levels of granularity
SQL:1999 OLAP Extensions
• SQL-92 functionality was limited: a GROUP BY clause computes only one grouping
• The SQL:1999 standard defines
– CUBE
– ROLLUP
– as generalizations of the GROUP BY clause
CUBE : Relational Aggregate Operator
• N-dimensional generalization of simple aggregate functions
[Figure: from a scalar aggregate (Sum), to GROUP BY with total (sum by color: RED, WHITE, BLUE), to a cross tab (by make: Chevy, Ford, and by color), to the data cube and its sub-space aggregates (by make, by year, by color, by make & year, by make & color, by color & year)]
CUBE : The Idea
• 0-dimensional aggregate (sum(), max(), ...)
– a1, a2, ..., aN, f()
• Super-aggregate over 1-dimensional sub-cubes
– ALL, a2, ..., aN, f()
– a1, ALL, a3, ..., aN, f()
– ...
– a1, a2, ..., ALL, f()
• Super-aggregate over 2-dimensional sub-cubes
– ALL, ALL, a3, ..., aN, f()
– ...
– a1, a2, ..., ALL, ALL, f()
• ALL in position i means f() is computed over every value of attribute i
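Each position in a cube tuple holds either one of the attribute's values or ALL. The Python sketch below counts how many tuples have exactly k ALLs, assuming a dense cube (every combination of values occurs, as in the SALES example of these slides); sparse data yields fewer rows. With cardinalities C1, ..., CN, the total is the product of (Ci + 1).

```python
from itertools import combinations
from math import prod


def cube_row_counts(cardinalities):
    """Count cube rows with exactly k ALLs, for k = 0..N (dense cube assumed)."""
    n = len(cardinalities)
    counts = []
    for k in range(n + 1):
        total = 0
        # Pick the k positions holding ALL; every other position
        # ranges over that attribute's values.
        for all_positions in combinations(range(n), k):
            total += prod(c for i, c in enumerate(cardinalities)
                          if i not in all_positions)
        counts.append(total)
    return counts


# model (2 values), year (3 values), color (3 values)
counts = cube_row_counts([2, 3, 3])
print(counts)                                       # [18, 21, 8, 1]
print(sum(counts), prod(c + 1 for c in [2, 3, 3]))  # 48 48
```

The first entry is the base data (no ALL); the remaining 21 + 8 + 1 = 30 rows are the super-aggregates.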
An Example
Chevy Sales Cross Tab
Chevy        1990  1991  1992  Total (ALL)
black          50    85   154          289
white          40   115   199          354
Total (ALL)    90   200   353          643

SELECT model, year, color, sum(sales) as sales
FROM sales
WHERE model in ('Chevy')
  AND year BETWEEN 1990 AND 1992
GROUP BY CUBE (model, year, color);
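The ALL totals in the Chevy cross tab are sums of its six cells along one dimension; this short Python check recomputes them from the cell values.

```python
# Cells of the Chevy cross tab: (color, year) -> sales.
cells = {
    ("black", 1990): 50, ("black", 1991): 85, ("black", 1992): 154,
    ("white", 1990): 40, ("white", 1991): 115, ("white", 1992): 199,
}
colors = ["black", "white"]
years = [1990, 1991, 1992]

# Each total aggregates the cells along one dimension (set to ALL).
row_totals = {c: sum(cells[(c, y)] for y in years) for c in colors}  # year = ALL
col_totals = {y: sum(cells[(c, y)] for c in colors) for y in years}  # color = ALL
grand_total = sum(cells.values())                                    # both ALL

print(row_totals)   # {'black': 289, 'white': 354}
print(col_totals)   # {1990: 90, 1991: 200, 1992: 353}
print(grand_total)  # 643
```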
CUBE Contd.
SELECT model, year, color, sum(sales) as sales
FROM sales
WHERE model in ('Chevy')
  AND year BETWEEN 1990 AND 1992
GROUP BY CUBE (model, year, color);
• Computes the union of 2^3 = 8 different groupings:
– {(model, year, color), (model, year), (model, color), (year, color), (model), (year), (color), ()}
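CUBE over N columns yields one grouping per subset of those columns, 2^N in all. This Python sketch enumerates them, largest first, which reproduces the order of the list above.

```python
from itertools import combinations


def cube_groupings(columns):
    """All grouping sets of CUBE(columns): every subset, largest first."""
    return [subset
            for size in range(len(columns), -1, -1)
            for subset in combinations(columns, size)]


groupings = cube_groupings(("model", "year", "color"))
print(len(groupings))  # 8
for g in groupings:
    print(g)
```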
Example Contd.
SALES
Model  Year  Color  Sales
Chevy  1990  red        5
Chevy  1990  white     87
Chevy  1990  blue      62
Chevy  1991  red       54
Chevy  1991  white     95
Chevy  1991  blue      49
Chevy  1992  red       31
Chevy  1992  white     54
Chevy  1992  blue      71
Ford   1990  red       64
Ford   1990  white     62
Ford   1990  blue      63
Ford   1991  red       52
Ford   1991  white      9
Ford   1991  blue      55
Ford   1992  red       27
Ford   1992  white     62
Ford   1992  blue      39

CUBE (model, year, color) applied to SALES yields the super-aggregate rows below (CUBE also returns the 18 base rows, omitted here).

DATA CUBE
Model  Year  Color  Sales
ALL    ALL   ALL      941
chevy  ALL   ALL      508
ford   ALL   ALL      433
ALL    1990  ALL      343
ALL    1991  ALL      314
ALL    1992  ALL      284
ALL    ALL   red      233
ALL    ALL   white    369
ALL    ALL   blue     339
chevy  1990  ALL      154
chevy  1991  ALL      198
chevy  1992  ALL      156
ford   1990  ALL      189
ford   1991  ALL      116
ford   1992  ALL      128
chevy  ALL   red       90
chevy  ALL   white    236
chevy  ALL   blue     182
ford   ALL   red      143
ford   ALL   white    133
ford   ALL   blue     157
ALL    1990  red       69
ALL    1990  white    149
ALL    1990  blue     125
ALL    1991  red      106
ALL    1991  white    104
ALL    1991  blue     104
ALL    1992  red       58
ALL    1992  white    116
ALL    1992  blue     110
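A minimal Python sketch of what CUBE computes for this example: every SALES row is added to one aggregate per subset of (model, year, color), with the remaining positions set to the placeholder `ALL`. It is a naive illustration (not necessarily how a database engine evaluates CUBE), and it lets the DATA CUBE values be checked against the base table. Model names are lowercased as in the DATA CUBE table.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # placeholder for a dimension that has been aggregated away

# SALES base table: (model, year, color, sales).
SALES = [
    ("chevy", 1990, "red", 5), ("chevy", 1990, "white", 87), ("chevy", 1990, "blue", 62),
    ("chevy", 1991, "red", 54), ("chevy", 1991, "white", 95), ("chevy", 1991, "blue", 49),
    ("chevy", 1992, "red", 31), ("chevy", 1992, "white", 54), ("chevy", 1992, "blue", 71),
    ("ford", 1990, "red", 64), ("ford", 1990, "white", 62), ("ford", 1990, "blue", 63),
    ("ford", 1991, "red", 52), ("ford", 1991, "white", 9), ("ford", 1991, "blue", 55),
    ("ford", 1992, "red", 27), ("ford", 1992, "white", 62), ("ford", 1992, "blue", 39),
]


def data_cube(rows, n_dims):
    """Sum the measure (last field) for every grouping of the first n_dims fields.

    Each row contributes once per subset of kept dimensions; dimensions
    outside the subset become ALL, giving the 2**n_dims groupings of CUBE.
    """
    cube = defaultdict(int)
    for row in rows:
        dims, measure = row[:n_dims], row[n_dims]
        for k in range(n_dims + 1):
            for kept in combinations(range(n_dims), k):
                key = tuple(dims[i] if i in kept else ALL for i in range(n_dims))
                cube[key] += measure
    return dict(cube)


cube = data_cube(SALES, 3)
print(len(cube))                  # 48 rows: 18 base + 30 super-aggregates
print(cube[(ALL, ALL, ALL)])      # 941
print(cube[("chevy", ALL, ALL)])  # 508
print(cube[(ALL, ALL, "red")])    # 233
```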
GROUPING Function
• SQL:1999 uses NULL to represent both ALL and regular null values
• GROUPING function
– Can be applied to an attribute
– Returns 1 if the NULL value represents ALL
– Returns 0 in all other cases
GROUPING Example
SELECT model, year, color, sum(sales) as sales,
       GROUPING(model) as model_flag,
       GROUPING(year) as year_flag,
       GROUPING(color) as color_flag
FROM sales
WHERE model in ('Chevy')
  AND year BETWEEN 1990 AND 1992
GROUP BY CUBE (model, year, color);
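To show why GROUPING is needed, this Python sketch emulates CUBE output in which ALL is reported as `None` (standing in for NULL) and attaches GROUPING-style flags to each row. The rows, including one with a genuinely unknown color, are hypothetical, and `cube_with_grouping` is an illustrative name only.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical rows (model, color, sales); the last color is a real NULL.
rows = [("chevy", "red", 5), ("chevy", "white", 87), ("chevy", None, 10)]


def cube_with_grouping(rows, n_dims):
    """Emulate CUBE output rows followed by GROUPING() flags.

    ALL is reported as None, as SQL:1999 reports it as NULL; a flag of 1
    marks a None that means ALL, 0 marks an ordinary value or NULL.
    """
    totals = defaultdict(int)
    for row in rows:
        dims, measure = row[:n_dims], row[n_dims]
        for k in range(n_dims + 1):
            for kept in combinations(range(n_dims), k):
                flags = tuple(0 if i in kept else 1 for i in range(n_dims))
                values = tuple(dims[i] if i in kept else None for i in range(n_dims))
                # Keying on the flags keeps "NULL color" and "ALL colors" apart.
                totals[(values, flags)] += measure
    return [values + (total,) + flags for (values, flags), total in totals.items()]


result = cube_with_grouping(rows, 2)
for output_row in result:
    print(output_row)  # (model, color, sales, model_flag, color_flag)
```

Without the flags, `("chevy", None, 10)` (the group whose color is unknown) and `("chevy", None, 102)` (all colors) would be indistinguishable.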
Rollup and Drill down
• Allow analysts to view data at any desired level of granularity
• Rollup
– Moving from finer-granularity data to coarser-granularity data
• Drill down
– Moving from coarser-granularity data to finer-granularity data
– Cannot be generated from coarse-granularity data
– Has to be computed from the original data
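Rolling up only re-aggregates totals that already exist, as this Python sketch shows using model-by-year totals computed from the SALES table in the example above. The helper `roll_up` is a hypothetical name. The reverse is not possible: many different yearly splits add up to the same model total of 508, so drilling down must go back to the base data.

```python
from collections import defaultdict

# Model-by-year totals computed from the SALES base table.
by_model_year = {
    ("chevy", 1990): 154, ("chevy", 1991): 198, ("chevy", 1992): 156,
    ("ford", 1990): 189, ("ford", 1991): 116, ("ford", 1992): 128,
}


def roll_up(totals, keep):
    """Re-aggregate to a coarser key made of the key positions listed in `keep`."""
    coarser = defaultdict(int)
    for key, value in totals.items():
        coarser[tuple(key[i] for i in keep)] += value
    return dict(coarser)


by_model = roll_up(by_model_year, keep=[0])  # drop year
by_year = roll_up(by_model_year, keep=[1])   # drop model
print(by_model)  # {('chevy',): 508, ('ford',): 433}
print(by_year)   # {(1990,): 343, (1991,): 314, (1992,): 284}
```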
ROLLUP Operator
• Rollup example
SELECT model, year, color, sum(sales) as sales
FROM sales
WHERE model in ('Chevy')
  AND year BETWEEN 1990 AND 1992
GROUP BY ROLLUP (model, year, color);
• Only 4 groupings are generated, one per prefix of the column list:
– {(model, year, color), (model, year), (model), ()}
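ROLLUP over N columns keeps only the prefixes of the column list, giving N + 1 groupings instead of the 2^N produced by CUBE. A Python sketch of that rule:

```python
def rollup_groupings(columns):
    """Grouping sets of ROLLUP(columns): each prefix, longest first."""
    return [tuple(columns[:n]) for n in range(len(columns), -1, -1)]


groupings = rollup_groupings(["model", "year", "color"])
print(groupings)
# [('model', 'year', 'color'), ('model', 'year'), ('model',), ()]
```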
Summary
• SQL-92 has limited functionality to support OLAP operations
• SQL:1999 introduced extensions to address these limitations
– the CUBE and ROLLUP operators and the GROUPING function
Questions