The Model Clause explained

advertisement
The Model Clause explained
Tony Hasler, UKOUG Birmingham
2012
Tony Hasler, Anvil Computer Services Ltd.
Who is Tony Hasler?
Google Tony Hasler!
My blog contains all the material from this presentation on the front
page
Additional blog entry (October) on the model clause
Tony Hasler http://tonyhasler.wordpress.com
Health Warning about the model clause
No support for Active Session History (ASH) whilst on the CPU
Limited support for SQL Performance Monitor
No session statistics related to modelling
No trace events for diagnosis (that I can find)
Performance is poor when data spills to disk
Introduced in 10g but no new features in 11g
The main example in chapter 22 (10g) or chapter 23 (11g) in the
Data Warehousing Guide (calculating mortgage amortization) is
incorrect and always has been (mortgage fact ‘Payment’ should be
‘PaymentAmt’)
Perhaps this is not a flagship component!
However, the model clause:
− Enables you to do stuff that cannot otherwise be (easily) done in
SQL
Hasler http://tonyhasler.wordpress.com
− The parallelisationTony
capabilities
can be useful in some cases
What is in this talk and what is not
IN:
− What the model clause is
− What it is used for and what it isn’t
− Syntax and semantics of the main features
− Recommendations on how to use the model clause
− A real life example
Not in due to time constraints:
− An exhaustive list of all model clause features
− This can be found in the Data Warehousing Guide Chapter 22
(10g) or Chapter 23 (11g)
− No demos
Tony Hasler http://tonyhasler.wordpress.com
What the model clause is and what is it used for
Part of a query block meaning it can appear after any select
keyword.
Provides spreadsheet like capabilities inside SQL
Can perform unimplemented SQL features, e.g. moving medians.
=> Just because something is inefficient to execute doesn’t
mean you don’t need to do it!
Iteration is possible, such as with ‘Feedback loops’ (see Example
4 Calculating Using Simultaneous Equations in the Data
Warehousing guide)
A series of calculations within the model can be sequenced
manually or dependencies can be determined by the engine (only
use when necessary, not because of laziness)
Tony Hasler http://tonyhasler.wordpress.com
Model terms compared
with Excel
Rows/columns <->
Dimensions
Formulas <-> Rules
Worksheets <->
Partitions
Tony Hasler http://tonyhasler.wordpress.com
Values <->
Measures
Comparison of Model clause terms with Excel
spreadsheet terms
Model Term
Excel term
Differences
PARTITION
Worksheet
It is possible for formulas in one Excel
worksheet to reference cells in another. It
is not possible for partitions in a model
clause to reference each other.
DIMENSION
Row and column
In Excel there are always exactly two
dimensions. In a model clause there are
one or more.
MEASURE
A cell value
In Excel only one value is referenced by
the dimensions. The model clause allows
one or more measures.
RULES
Formulas
Nested cell references possible with
models
Tony Hasler http://tonyhasler.wordpress.com
Overview of a query block with no model clause
Joins and outer join predicates
Only scalar functions allowed
Logically precedes
WHERE clause: inner join predicates and
selection predicates
Only scalar functions allowed
Logically precedes
Analytic calculations
GROUP BY clause
Only scalar functions allowed
Logically precedes
HAVING clause
Scalar and aggregate functions allowed
Logically precedes
SELECT list
Logically precedes
Scalar, aggregate, and analytic functions
allowed
DISTINCT
Logically precedes
ORDER BY clause
Scalar, aggregate, and analytic clauses
allowed
Tony Hasler http://tonyhasler.wordpress.com
Overview of a query block with a model clause
Analytic calculations
Joins, join predicates, selection predicates
and GROUP BY clause
Only scalar functions allowed
Logically precedes
HAVING clause
Scalar and aggregate functions allowed
Logically precedes
Model PARTITION, DIMENSION and
MEASURES clauses
Scalar, aggregate, and analytic functions
allowed
Logically precedes
Model RULES clause
Independent calculations
Logically precedes
SELECT list
Logically precedes
Only scalar functions on model outputs
allowed
DISTINCT
Logically precedes
ORDER BY clause
Only scalar functions on model outputs
allowed
Tony Hasler http://tonyhasler.wordpress.com
Recommendations on use of the model clause
Use factored subqueries to avoid mixing aggregate functions and
analytic functions with the model clause
Use only scalar values as inputs to the model clause
Perform all calculations in the model clause so that…
The final select list is just a simple list of identifiers
Be careful using the model clause in a sub query as the CBO
assumes that the cardinality of the output of a model clause is the
same as the cardinality of the input.
Tony Hasler http://tonyhasler.wordpress.com
A simplified version of the real world problem
CREATE TABLE stock_holdings
(
customer_name
VARCHAR2 (100)
,stock_name
VARCHAR2 (100)
,business_date
DATE
,VALUE
NUMBER
);
Calculate the moving one year average (mean) value and moving
standard deviation, and zscore for each customer, stock, and
business date.
Tony Hasler http://tonyhasler.wordpress.com
Mathematical terms loosely explained
The Average (mean) is a “typical value”
The Standard Deviation is a “typical difference between a value
and the mean”
The Zscore is the number of standard deviations that a particular
value deviates from the mean. In other words it is a measure of
how unusual a value is.
Tony Hasler http://tonyhasler.wordpress.com
Problem 1: Oracle doesn’t support moving intervals of
months or years properly!!!
WITH q1
AS (
SELECT DATE '2012-01-01' + ROWNUM mydate
FROM DUAL
CONNECT BY LEVEL <= 400)
,q2
AS (SELECT mydate
,COUNT (*)
OVER (
ORDER BY mydate
RANGE BETWEEN INTERVAL '1' YEAR PRECEDING AND CURRENT ROW)
cnt
FROM q1)
SELECT *
FROM q2
WHERE mydate = DATE '2013-01-10';
Result: MYDATE
10/01/2013
CNT
367
Tony Hasler http://tonyhasler.wordpress.com
Model clause solution (1)
WITH q1
AS (
SELECT DATE '2012-01-01' + ROWNUM mydate, 0 cnt
FROM DUAL
CONNECT BY LEVEL <= 400)
SELECT mydate, cnt
FROM q1
MODEL RETURN UPDATED ROWS
DIMENSION BY (mydate)
MEASURES (cnt)
RULES
(cnt [DATE '2013-01-10'] =
COUNT (*)
[mydate BETWEEN ADD_MONTHS (CV () + 1, -12) AND CV ()]);
Tony Hasler http://tonyhasler.wordpress.com
Model clause solution (2)
CREATE TABLE business_dates
(
business_date
PRIMARY KEY NOT NULL
,business_days_in_year
NOT NULL
,first_day_in_year
NOT NULL
,business_day_number
NOT NULL
)
ORGANIZATION INDEX
AS
WITH q1 AS (
SELECT DISTINCT business_date, 0 AS business_days_in_year,
SYSDATE AS first_day_in_year, 0 AS business_day_number FROM stock_holdings
)
SELECT business_date
,business_days_in_year
,first_day_in_year
,business_day_number
FROM q1
MODEL
DIMENSION BY (business_date)
MEASURES(business_days_in_year,first_day_in_year,business_day_number)
RULES (
first_day_in_year[ANY] = ADD_MONTHS(CV(business_date)+1,-12),
business_days_in_year[ANY] =
COUNT(*)[business_date BETWEEN first_day_in_year[CV()] AND CV()],
business_day_number[ANY] = ROW_NUMBER() OVER(ORDER BY BUSINESS_DATE));
Tony Hasler http://tonyhasler.wordpress.com
Execution plan for previous statement
------------------------------------------------------| Id | Operation
| Name
|
------------------------------------------------------|
0 | CREATE TABLE STATEMENT
|
|
|
1 | LOAD AS SELECT
| BUSINESS_DATES |
|
2 |
SQL MODEL ORDERED
|
|
|
3 |
VIEW
|
|
|
4 |
HASH UNIQUE
|
|
|
5 |
TABLE ACCESS FULL
| STOCK_HOLDINGS |
|
6 |
WINDOW (IN SQL MODEL) SORT|
|
-------------------------------------------------------
Tony Hasler http://tonyhasler.wordpress.com
Problem 2: avoiding data densification
Oracle recommends the use of partitioned outer joins for data
densification to simplify analytic functions.
The model clause can also be used for data densification
In my case, however, this would have multiplied the number of rows
enormously as customers tended to hold particular stocks for about
10% of the time.
We can use the model clause to avoid this by performing multiple
calculations in sequence
Tony Hasler http://tonyhasler.wordpress.com
Model clause solution
SELECT customer_name,stock_name,business_date,first_day_in_year,VALUE,mov_avg,mov_stdd,zscore
FROM stock_holdings sh, business_dates bd
WHERE sh.business_date=bd.business_date
MODEL
PARTITION BY (customer_name,stock_name)
DIMENSION BY (sh.business_date)
MEASURES (VALUE,business_days_in_year, first_day_in_year,0 AS mov_avg,
0 AS mov_stdd, 0 AS zscore, 0 AS mov_sum, 0 AS mov_cnt)
RULES (
mov_sum[ANY] = SUM(VALUE)[business_date BETWEEN first_day_in_year[CV()]AND CV()],
mov_cnt[ANY] = COUNT(*) [business_date BETWEEN first_day_in_year[CV()] AND CV()],
mov_avg[ANY] = mov_sum[CV()]/
business_days_in_year[CV()],
mov_stdd[ANY] =
SQRT((
(VAR_POP(VALUE)[business_date BETWEEN first_day_in_year[CV()] AND CV()]
*mov_cnt[CV()])
+(mov_sum[CV()]*
(AVG(VALUE)[business_date BETWEEN first_day_in_year[CV()] AND CV()]
- mov_avg[CV()])))
/business_days_in_year[CV()]
),
zscore[ANY] = DECODE(mov_stdd[CV()],0,0,(VALUE[CV()]-mov_avg[CV()])/mov_stdd[CV()])
);
Tony Hasler http://tonyhasler.wordpress.com
Parallel execution plan
-------------------------------------------------------| Id | Operation
| Name
|
-------------------------------------------------------|
0 | SELECT STATEMENT
|
|
|
1 | PX COORDINATOR
|
|
|
2 |
PX SEND QC (RANDOM)
| :TQ10001
|
|
3 |
BUFFER SORT
|
|
|
4 |
SQL MODEL ORDERED
|
|
|
5 |
PX RECEIVE
|
|
|
6 |
PX SEND HASH
| :TQ10000
|
|
7 |
NESTED LOOPS
|
|
|
8 |
PX BLOCK ITERATOR |
|
|
9 |
TABLE ACCESS FULL| STOCK_HOLDINGS
|
|* 10 |
INDEX UNIQUE SCAN | SYS_IOT_TOP_78729 |
-------------------------------------------------------Tony Hasler http://tonyhasler.wordpress.com
Summary
The model clause allows you to build your own analytical functions
and/or your own analytical windows – amongst other things.
The model clause also allows you to parallelise calculations and
can be useful even if the calculations are supported without the
model clause
However, model clause aggregates will be slower than standard
analytic functions
Performance degrades rapidly when partitions spill to disk.
Tony Hasler http://tonyhasler.wordpress.com
Questions
Tony Hasler http://tonyhasler.wordpress.com
Download