Unit Sales - Berkeley Database Research

Modeling and Querying Multidimensional Data Sources in Siebel Analytics
Kazi A. Zaman
Kazi.Zaman@siebel.com
Donovan A. Schneider
Donovan.Schneider@siebel.com
Structure of Talk
• Challenges of federating relational and multidimensional data sources
• Overview of multidimensional data sources
• Overview of the Siebel Analytics architecture
• Our approach to solving the problem
• Issues with multi-vendor support
• Conclusions and future work
© 2005 Siebel Systems, Inc. Confidential.
Why federating multidimensional sources is important
• Enterprises have a multitude of data sources
  • Not always consolidated in a single data warehouse
• Cubes (OLAP systems) are best suited to certain applications, e.g., budgeting
• Many important business questions require information from both relational and multidimensional systems
  • Budgets vs. actuals
  • Real-time reporting: HR system data integrated with sales pipeline data
Multidimensional Data Sources
• Highly aggregated view of data, used primarily for analysis
• Provides a dimensional view of data
• Prominent examples: Microsoft Analysis Services, Hyperion, SAP/BW
• Cubes: the storage mechanism is not necessarily MOLAP
• Query language: vendor-specific interfaces, MDX
• Access mechanisms: vendor-specific interfaces (e.g., BAPI), ODBO, XMLA
Key Differences from Relational Systems
• Rich metadata exposed: dimensions, hierarchies, levels, measures
• Specialized language constructs for manipulating this metadata, e.g., Ancestors(), Descendants()
• Query results are multidimensional datasets, not rowsets
• Ability to specify complex multi-pass calculations
• Special functionality for time-series calculations
Siebel Analytics Server
• Analytics Server is a federated system
• Supports a wide range of data sources: relational (DB2, Oracle, SQL Server, Teradata), OLAP (Analysis Services, SAP/BW), XML
• Supports rich schemas (OLTP, DW)
• Executes queries specified against a logical business model containing data warehousing constructs
• Translates logical queries into queries against one or more back-end data sources
• Design goal: push as much processing as possible to the back-end data sources
• Carries out post-processing on joined query results
• Does not have its own storage layer
Query Processing Overview
Repository (metadata):
• Presentation Layer
• Business Model & Mapping Layer: dimensions, hierarchies, measures, alternative sources, partitioning, aggregation rules, time series
• Physical Layer: security, connections, DB features, schema
Query processing stages:
• Navigation: generates the access plan and initial SQL
• Optimizer/Compiler (rewrite rules): produces optimized SQL based on the target databases and DB Features tables, and performs further optimizations to improve efficiency
• Code Generator: generates physical SQL for external sources and an internal plan for operations that must be executed in the server, including parallelization, sorting, etc.
Requirements for federating multidimensional sources
• Model multidimensional data sources in the physical layer of the metadata
• Mark fragments of a federated query plan for execution at a multidimensional source, based on source capabilities
• Generate MDX from the relational query plan fragment (SQL-to-MDX translation)
• Convert multidimensional result sets into two-dimensional rowsets
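The capability-based marking step above can be pictured as a bottom-up pass over the plan tree: a node can run at the multidimensional source only if the source supports its operator, it does not read from another source, and all of its children can run there too. This is a minimal sketch under those assumptions; `PlanNode`, `mark_pushable`, `fragment_roots`, and the operator names are hypothetical and are not Siebel Analytics' internal structures.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class PlanNode:
    op: str                          # hypothetical operator name, e.g. "scan", "filter", "group_by", "join"
    children: List["PlanNode"] = field(default_factory=list)
    source: Optional[str] = None     # data source of a scan; None for non-scan operators
    pushable: bool = False           # filled in by mark_pushable


def mark_pushable(node: PlanNode, target: str, capabilities: Set[str]) -> bool:
    """Mark each node that the target source can execute, bottom-up."""
    # Recurse into every child first so all subtrees are marked,
    # even when this node itself has to run in the server.
    children_ok = [mark_pushable(c, target, capabilities) for c in node.children]
    same_source = node.source is None or node.source == target
    node.pushable = node.op in capabilities and same_source and all(children_ok)
    return node.pushable


def fragment_roots(node: PlanNode, parent_pushable: bool = False) -> List[PlanNode]:
    """Return the maximal pushable subtrees: the fragments shipped to the source."""
    roots = [node] if node.pushable and not parent_pushable else []
    for child in node.children:
        roots.extend(fragment_roots(child, node.pushable))
    return roots
```

In a plan that joins an aggregated cube scan with a relational scan, only the cube-side subtree becomes a fragment; the join and the relational scan stay outside it.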
Challenges
• SQL has a relational model; MDX has a multidimensional one
• We convert the multidimensional model to a relational one
  • We lose the full power of the multidimensional model
• SQL is open-world: Country = 'USA'
• MDX is closed-world: Geography.[USA]
  • If no such member exists, the MDX query fails, whereas the SQL query simply returns no rows
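The open-world versus closed-world difference above can be shown in a few lines. In this sketch the set of known members is assumed to come from cached cube metadata, and returning `None` so that the caller can produce an empty rowset is a hypothetical policy for preserving SQL semantics, not necessarily what Siebel Analytics does. The function names are illustrative only.

```python
from typing import Dict, List, Optional, Set


def sql_equality_filter(rows: List[Dict], column: str, value: str) -> List[Dict]:
    """Open world: a value that does not occur simply matches no rows."""
    return [r for r in rows if r[column] == value]


def mdx_member_for(value: str, known_members: Set[str],
                   dimension: str = "Geography") -> Optional[str]:
    """Closed world: only existing members can be referenced in MDX.

    Returns None when the member is unknown, so a caller can answer with an
    empty rowset instead of sending a query that the source would reject.
    """
    if value not in known_members:
        return None
    return f"{dimension}.[{value}]"
```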
Metadata Modeling: Cubetables
Cube with 2 hierarchies and 2 measures:
• Time: Year -> Quarter -> Month
• Geography: State -> City
• Measures: Profit, Sales
Cube table T(Year, Quarter, Month, State, City, Profit, Sales)
Hierarchy, level, and aggregation-rule information is preserved
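The cubetable mapping above flattens each hierarchy's levels, in order, into columns and appends the measures, while the hierarchy, level, and aggregation-rule metadata is kept on the side. This is a minimal sketch; the `Hierarchy`/`Measure` classes and `to_cube_table` are hypothetical, and the aggregation rules passed in are assumptions (the slide does not state them).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Hierarchy:
    name: str
    levels: List[str]   # ordered from coarsest to finest


@dataclass
class Measure:
    name: str
    agg_rule: str       # aggregation rule stored in the cube, e.g. "SUM"


def to_cube_table(hierarchies: List[Hierarchy], measures: List[Measure]):
    """Flatten a cube into cube-table columns plus the metadata SQL would lose."""
    columns: List[str] = []
    level_of: Dict[str, Tuple[str, int]] = {}   # level column -> (hierarchy, depth)
    agg_rule: Dict[str, str] = {}               # measure column -> aggregation rule
    for h in hierarchies:
        for depth, level in enumerate(h.levels):
            columns.append(level)
            level_of[level] = (h.name, depth)
    for m in measures:
        columns.append(m.name)
        agg_rule[m.name] = m.agg_rule
    return columns, level_of, agg_rule
```

With Time (Year, Quarter, Month), Geography (State, City), and the measures Profit and Sales, the columns come out in the same order as cube table T.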
Metadata screenshot
Rowset Creation from Multidimensional Result Sets
• MDX result sets consist of dimension members on axes and measure values in cells
Example query:
SELECT
{[Measures].[Sales]} on COLUMNS,
{Crossjoin({[Year].Members},
{[Products].[Soda].Members})} on ROWS
FROM [Sales]
Result:
              Sales
1997   Coke     100
1998   Coke     200
• Generate only two-dimensional queries
  • Measures on COLUMNS, dimensions on ROWS
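Because only two-dimensional queries are generated, turning a result into a rowset amounts to walking the row axis and appending that row's cells. This sketch assumes cells arrive in ordinal order, where the column position varies fastest (ordinal = row * number_of_columns + column), as in the XMLA/ODBO cellset convention. `flatten_cellset` is a hypothetical helper, not a driver API.

```python
from typing import Any, List, Sequence, Tuple


def flatten_cellset(row_tuples: Sequence[Tuple[str, ...]],
                    column_measures: Sequence[str],
                    cells: Sequence[Any]) -> List[Tuple[Any, ...]]:
    """Turn a 2-D MDX result (dimensions on ROWS, measures on COLUMNS) into rows.

    row_tuples:      one tuple of member names per row position
    column_measures: measure names on the column axis
    cells:           cell values by ordinal (row * ncols + col)
    """
    ncols = len(column_measures)
    if len(cells) != len(row_tuples) * ncols:
        raise ValueError("cell count does not match axis sizes")
    rows = []
    for r, members in enumerate(row_tuples):
        values = cells[r * ncols:(r + 1) * ncols]
        rows.append(tuple(members) + tuple(values))
    return rows
```

Applied to the example result, the row tuples ("1997", "Coke") and ("1998", "Coke") with cells 100 and 200 yield the rows (1997, Coke, 100) and (1998, Coke, 200).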
Transforming the Intermediate Rowset
• The intermediate rowset may need further transformation
  • The number of columns in the rowset may differ from the number of requested columns
  • The order of columns in the rowset may differ from the requested order
• Protocols for intermediate rowset transformation
  • A simple example protocol maps intermediate column indexes to columns in the final rowset:
    • (1, 2, 3): select year, product, sum(sales) from T group by year, product
    • (3, 2, 1): select sum(sales), product, year from T group by year, product
  • Different data sources and MDX generation algorithms use different protocols
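The index-mapping protocol above can be applied with a single projection per row. In this sketch, `protocol[k]` is taken to be the 1-based index of the intermediate column feeding final column k, matching the (3, 2, 1) example. Intermediate columns that the protocol does not name (for example, a helper column the generated MDX needed) are dropped. `apply_protocol` is a hypothetical name.

```python
from typing import Any, List, Sequence, Tuple


def apply_protocol(intermediate_rows: Sequence[Tuple[Any, ...]],
                   protocol: Sequence[int]) -> List[Tuple[Any, ...]]:
    """Reorder and project intermediate columns into the requested final columns.

    protocol[k] is the 1-based intermediate column index for final column k.
    """
    if any(i < 1 for i in protocol):
        raise ValueError("protocol indexes are 1-based")
    return [tuple(row[i - 1] for i in protocol) for row in intermediate_rows]
```

For an intermediate row ("1997", "Coke", 100), protocol (1, 2, 3) leaves it unchanged, while (3, 2, 1) produces (100, "Coke", "1997") for `select sum(sales), product, year`.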
MDX Code Generation
• Effectively SQL-to-MDX translation, plus the data needed by the rowset creation protocol
• Makes use of cubetable-specific metadata: hierarchies and levels
• Different code generation strategies for different SQL templates
• Support as wide a set of SQL templates as possible
• Generate efficient MDX, since multidimensional data sources lack mature optimizers
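Choosing a code generation strategy per SQL template can be sketched as a classification of the query's features, following the cases in the examples that come next: whether each aggregate matches the measure's aggregation rule, whether predicates touch GROUP BY columns or other columns, and whether a HAVING clause is present. The strategy labels and the `choose_strategies` function are hypothetical names for illustration only.

```python
from typing import Dict, List, Sequence, Tuple


def choose_strategies(group_by: Sequence[str],
                      aggregates: Sequence[Tuple[str, str]],
                      where_columns: Sequence[str],
                      has_having: bool,
                      agg_rule: Dict[str, str]) -> List[str]:
    """Classify a SELECT ... GROUP BY query into MDX generation strategies.

    aggregates holds (measure, function) pairs, e.g. ("Unit Sales", "AVG").
    """
    strategies = []
    if any(func != agg_rule[measure] for measure, func in aggregates):
        strategies.append("calculated_member_over_descendants")
    grouped, filtered = set(group_by), set(where_columns)
    if filtered & grouped:
        strategies.append("filtered_row_sets")
    if filtered - grouped:
        strategies.append("slicer_or_filter_in_member")
    if has_having:
        strategies.append("filter_on_row_set")
    # With none of the above, a plain crossjoin of level members suffices.
    return strategies or ["crossjoin_of_levels"]
```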
MDX Generation Examples
SELECT c1, c2, …, aggr(m1), aggr(m2)
FROM Table
WHERE <conditions>
GROUP BY c1, c2, …
HAVING <conditions>
Goal: translate the entire SQL template into efficient MDX
Metadata information:
• T (Store Country, Store State, Year, Quarter, Unit Sales)
• Aggregation rule: SUM
Multiple dimensions plus measure with matching aggregate rule
SQL
Select "Store Country", Year, SUM("Unit Sales")
From T
Group By "Store Country", Year
MDX
Select
{[Unit Sales]} on columns,
{nonemptycrossjoin([Store Country].members,
[Year].members)} on rows
From [Sales]
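The translation above follows a simple rule when every aggregate matches the cube's aggregation rule: measures go on COLUMNS, and the GROUP BY levels are crossjoined on ROWS. This is a minimal sketch of that rule. `crossjoin_mdx` is a hypothetical helper, and nesting binary NonEmptyCrossjoin calls for three or more levels is a design choice made here for the sketch.

```python
from typing import Sequence


def crossjoin_mdx(levels: Sequence[str], measures: Sequence[str], cube: str) -> str:
    """MDX for SELECT <levels>, agg(<measures>) ... GROUP BY <levels>,
    assuming each agg matches the measure's aggregation rule and levels is non-empty."""
    rows = f"[{levels[0]}].members"
    for level in levels[1:]:
        rows = f"NonEmptyCrossJoin({rows}, [{level}].members)"
    cols = ", ".join(f"[{m}]" for m in measures)
    return (f"SELECT {{{cols}}} ON COLUMNS, "
            f"{{{rows}}} ON ROWS "
            f"FROM [{cube}]")
```

For the levels Store Country and Year, the measure Unit Sales, and the cube Sales, this produces the same query shape as the MDX on this slide.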
Measure with non-matching aggregate rule
Select "Store Country", Year, AVG("Unit Sales")
From T
Group By "Store Country", Year
Because AVG does not match the SUM aggregation rule, the MDX recomputes AVG over the lowest-level cells (Store State × Quarter) beneath each output cell:
with
set [A] as '{[Store Country].members}'
set [B] as '{[Year].members}'
set [C] as 'nonemptycrossjoin({[A]},{[B]})'
member [measures].[MS1] as
'AVG(nonemptycrossjoin(Descendants(Store.currentmember,[Store State]),
Descendants(Time.currentmember,[Quarter])),[Unit Sales])'
select
{[MS1]} on columns,
{[C]} on rows
from [Sales]
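The calculated member descends to the leaf cells because an average of intermediate SUMs is not the SQL AVG over T's rows. This sketch reproduces the intended semantics on a plain list of non-empty leaf cells (mirroring NonEmptyCrossjoin). The dictionary keys and `avg_over_leaf_cells` are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List


def avg_over_leaf_cells(leaf_cells: List[Dict],
                        group_key: Callable[[Dict], Hashable],
                        measure: str = "Unit Sales") -> Dict[Hashable, float]:
    """AVG(measure) per output group, computed over non-empty leaf-level cells."""
    sums: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    for cell in leaf_cells:
        key = group_key(cell)
        sums[key] += cell[measure]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

With made-up leaf values CA/Q1 = 10, CA/Q2 = 20, and OR/Q1 = 30 under (USA, 1997), the leaf-level average is 20. Averaging the state-level sums (30 and 30) would give 30 instead.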
Matching aggregate rule, predicate refers to GROUP BY columns
Select "Store Country", Year, SUM("Unit Sales")
From T
Where "Store Country" In ('USA', 'India') AND Year = '1997'
Group By "Store Country", Year
with
set [A] as '{filter([Store Country].members,
Store.currentmember.name = "USA" OR
Store.currentmember.name = "India")}'
set [B] as '{filter([Year].members,
time.currentmember.name = "1997")}'
set [C] as 'nonemptycrossjoin({[A]},{[B]})'
select
{[Unit Sales]} on columns,
{[C]} on rows
from [Sales]
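The pattern above turns each IN or equality predicate on a GROUP BY column into a filtered set of that level's members, then crossjoins the sets on ROWS. This is a minimal sketch of the pattern. The helper names and the generated set names [S0], [S1], … are hypothetical, and member names are assumed to contain no quote characters because nothing is escaped.

```python
from typing import List, Sequence, Tuple


def filtered_level_set(level: str, hierarchy: str, values: Sequence[str]) -> str:
    """MDX set of the level's members whose name is one of the given values.

    Assumes member names contain no quote characters (no escaping is done).
    """
    cond = " OR ".join(f'{hierarchy}.CurrentMember.Name = "{v}"' for v in values)
    return f"{{Filter([{level}].members, {cond})}}"


def query_with_filters(filters: List[Tuple[str, str, Sequence[str]]],
                       measure: str, cube: str) -> str:
    """filters holds one (level, hierarchy, values) triple per GROUP BY column."""
    set_defs = [f"SET [S{i}] AS '{filtered_level_set(*f)}'" for i, f in enumerate(filters)]
    rows = "[S0]"
    for i in range(1, len(filters)):
        rows = f"NonEmptyCrossJoin({rows}, [S{i}])"
    return ("WITH " + " ".join(set_defs)
            + f" SELECT {{[{measure}]}} ON COLUMNS, {{{rows}}} ON ROWS FROM [{cube}]")
```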
Multiple levels of a dimension plus measure with matching aggregate rule, predicates refer to both levels
Select "Store Country", "Store State", SUM("Unit Sales")
From T
Where "Store Country" = 'USA' AND "Store State" In ('CA', 'OR')
Group By "Store Country", "Store State"
with
member [measures].[CountryAnc] as
'ancestor(Store.Currentmember,[Store Country]).name'
set [A] as 'filter({[Store Country].members},Store.currentmember.name = "USA")'
set [B] as 'Filter(Generate({[A]},Descendants([Store].currentmember,[Store].[Store State])),
[Store].currentmember.name = "CA" OR [Store].currentmember.name = "OR")'
select
{[Measures].[CountryAnc],
[Measures].[Unit Sales]} on columns,
{[B]} on rows
From
[Sales]
Multiple levels of a dimension plus measure with matching aggregate rule, predicate refers to columns not in the projection list
Select "Store Country", "Store State", SUM("Unit Sales")
From T
Where Year = '1997'
Group By "Store Country", "Store State"
Using a slicer:
with
member [measures].[CountryAnc] as
'ancestor(Store.Currentmember,[Store Country]).name'
set [A] as '{[Store State].members}'
select
{[Measures].[CountryAnc],[Unit Sales]} on columns,
{[A]} on rows
From
[Sales]
Where ([1997])
Alternative without a slicer: the predicate is applied inside a calculated member
with
member [measures].[CountryAnc] as
'ancestor(Store.Currentmember,[Store Country]).name'
member [Measures].[YearAnc] as
'ancestor([Time].Currentmember,[Time].[Year]).name'
set [A] as '{[Store State].members}'
set [B] as '{[Time].[Year].members}'
member [measures].[MS1] as
'SUM(filter(nonemptycrossjoin(Descendants(Store.currentmember,[Store State]), {[B]}),
[Time].currentmember.name = "1997"), [Unit Sales])'
select
{[Measures].[CountryAnc],
[Measures].[MS1]} on columns,
{[A]} on rows
From
[Sales]
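Deciding between the two translations above could be sketched as follows: a predicate on a non-projected column that restricts it to a single member fits in the slicer, and otherwise the generator falls back to filtering inside a calculated member. The single-member condition is a conservative rule assumed for this sketch. `slicer_clause` and its return conventions ("" for no slicer needed, `None` for fall back) are hypothetical.

```python
from typing import Dict, Optional, Sequence, Set


def slicer_clause(predicates: Dict[str, Sequence[str]], projected: Set[str]) -> Optional[str]:
    """Build an MDX WHERE (slicer) clause for predicates on non-projected columns.

    predicates maps a column to the member names it is restricted to.
    Returns "" when no slicer is needed and None when some predicate cannot
    be expressed as a single slicer member (the caller would then filter
    inside a calculated member instead).
    """
    members = []
    for column, values in predicates.items():
        if column in projected:
            continue  # predicates on projected columns become filtered row sets
        if len(values) != 1:
            return None
        members.append(f"[{values[0]}]")
    return f"WHERE ({', '.join(members)})" if members else ""
```

For the query above, `{"Year": ["1997"]}` with Store Country and Store State projected gives `WHERE ([1997])`, matching the slicer version.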
Dimension plus measure with matching aggregate rule, with HAVING clause
Select "Store Country", SUM("Unit Sales")
From T
Group By "Store Country"
Having SUM("Unit Sales") > 10000
select
{[Unit Sales]} on columns,
Filter({[Store Country].members}, 10000 < [Unit Sales])
on rows
from
[Sales]
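The HAVING translation above works because, when the aggregate matches the measure's aggregation rule, the cube's value of [Unit Sales] at each row member already equals SUM("Unit Sales") for that group. The comparison can therefore be filtered directly on the row set. This sketch encodes that condition. `having_to_filter` is hypothetical, and it writes the comparison with the measure on the left, which is equivalent to the slide's `10000 < [Unit Sales]`.

```python
from typing import Union

MDX_COMPARISONS = {">", "<", ">=", "<=", "=", "<>"}


def having_to_filter(row_set: str, measure: str, agg: str, agg_rule: str,
                     op: str, constant: Union[int, float]) -> str:
    """Translate HAVING agg(measure) <op> constant into an MDX Filter on the row set.

    Only valid when agg matches the measure's aggregation rule; otherwise the
    aggregate would first have to be recomputed in a calculated member.
    """
    if agg != agg_rule:
        raise NotImplementedError("non-matching aggregate needs a calculated member")
    if op not in MDX_COMPARISONS:
        raise ValueError(f"unsupported comparison {op!r}")
    return f"Filter({row_set}, [{measure}] {op} {constant})"
```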
Multiple Vendor Support
• MDX and XMLA support varies widely from vendor to vendor
  • Caption names vs. unique names
  • Classes of hierarchies supported
  • Treatment of properties
  • Using ancestor within a calculated member
  • Metadata returned
    • <structure>
    • Cardinality of levels
Captions vs Member Names
• Caption: USA
  Member name: PG2003012
• MDX queries use member names, not captions
• Incoming SQL uses captions, not member names
• Member names are 7-bit ASCII
• Need to convert between captions and member names
  • Solution: cache mappings between member names and captions on demand
  • Affects the class of predicates that can be pushed down (range predicates such as >, < are no longer pushed)
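The on-demand caption cache described above can be sketched as a per-level, two-way dictionary that is filled the first time a level is needed. The `fetch_level_members` callback, which would query the source's metadata and return (member name, caption) pairs, is a hypothetical stand-in. The sketch also assumes captions are unique within a level.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# fetch(level) -> iterable of (member_name, caption) pairs; hypothetical callback
Fetch = Callable[[str], Iterable[Tuple[str, str]]]


class MemberNameCache:
    """Caption <-> member-name mappings, loaded per level on first use.

    Assumes captions are unique within a level.
    """

    def __init__(self, fetch_level_members: Fetch):
        self._fetch = fetch_level_members
        self._by_caption: Dict[str, Dict[str, str]] = {}
        self._by_name: Dict[str, Dict[str, str]] = {}

    def _load(self, level: str) -> None:
        # Query the source only the first time a level is needed.
        if level not in self._by_caption:
            pairs = list(self._fetch(level))
            self._by_caption[level] = {caption: name for name, caption in pairs}
            self._by_name[level] = {name: caption for name, caption in pairs}

    def to_member_name(self, level: str, caption: str) -> Optional[str]:
        """Translate an incoming SQL literal (a caption) for use in MDX."""
        self._load(level)
        return self._by_caption[level].get(caption)

    def to_caption(self, level: str, member_name: str) -> Optional[str]:
        """Translate a member name in an MDX result back to its caption."""
        self._load(level)
        return self._by_name[level].get(member_name)
```

Equality and IN predicates map cleanly through this lookup. A range predicate on captions has no direct counterpart over member names such as PG2003012, which is consistent with the restriction noted on the slide.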
Conclusions and Future Work
• Ability to handle multidimensional and relational data in a single framework
• Generate efficient MDX queries for best performance
• Varying vendor support requires different MDX code generation and intermediate rowset processing strategies
• Future work: support for more vendors, a wider class of SQL, and parent-child hierarchies