Modeling and Querying Multidimensional Data Sources in Siebel Analytics

Kazi A. Zaman (Kazi.Zaman@siebel.com)
Donovan A. Schneider (Donovan.Schneider@siebel.com)

Structure of Talk
• Challenges of federating relational and multidimensional data sources
• Overview of multidimensional data sources
• Overview of the Siebel Analytics architecture
• Our approach to solving the problem
• Issues with multi-vendor support
• Conclusions and future work

© 2005 Siebel Systems, Inc. Confidential.

Why federating multidimensional sources is important
• Enterprises have a multitude of data sources, not always consolidated in a single data warehouse
• Cubes (OLAP systems) are best suited for certain applications, e.g. budgeting
• Many important business questions require information from both relational and multidimensional systems:
  • Budgets vs. actuals
  • Real-time reporting: HR system data integrated with sales pipeline data

Multidimensional Data Sources
• Provide a highly aggregated view of data, used primarily for analysis
• Provide a dimensional view of data
• Prominent examples: Microsoft Analysis Services, Hyperion, SAP/BW
• Cubes: the storage mechanism is not necessarily MOLAP
• Query languages: vendor-specific interfaces, MDX
• Access mechanisms: vendor-specific interfaces (e.g. BAPI), ODBO, XMLA

Key Differences from Relational Systems
• Rich metadata is exposed: dimensions, hierarchies, levels, measures
• Specialized language constructs manipulate this metadata: Ancestors(), Descendants()
• Query results are multidimensional datasets, not rowsets
• Complex multi-pass calculations can be specified
• Special functionality is provided for time-series calculations
Siebel Analytics Server
• Analytics Server is a federated system
• Supports rich data sources: relational (DB2, Oracle, SQL Server, Teradata), OLAP (Analysis Services, SAP/BW), XML
• Supports rich schemas (OLTP, DW)
• Executes queries specified against a logical business model containing data warehousing constructs
• Translates logical queries into queries against one or more back-end data sources
• Design goal: push as much processing as possible to the back-end data sources
• Carries out post-processing on joined query results
• Does not have its own storage layer

Query Processing Overview
The repository (metadata) has three layers, each feeding a stage of query processing:
• Presentation Layer → generated access plan and initial SQL
• Business Model & Mapping Layer (dimensions, hierarchies, measures, alternative sources, partitioning, aggregation rules, time-series navigation) → Optimizer/Compiler (rewrite rules): produces optimized SQL based on the target databases; the DB Features tables also drive optimizations that improve efficiency
• Physical Layer (security, connections, DB features, schema) → Code Generator: generates physical SQL for external sources, plus an internal plan for operations that must execute in the server, including parallelization, sorting, etc.

Requirements for federating multidimensional sources
• Model multidimensional data sources in the physical layer of the metadata
• Mark fragments of a federated query plan for execution at a multidimensional source, based on source capabilities
• Generate MDX from the relational query plan fragment (SQL-to-MDX translation)
• Convert the multidimensional result set into a two-dimensional rowset

Challenges
• SQL has a relational model; MDX has a multidimensional one
• We convert the multidimensional model to a relational one, and so lose the full power of the multidimensional model
• SQL is open world: Country = 'USA' simply matches no rows if there is no such country
• MDX is closed world: if Geography.[USA] names no existing member, the query fails
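A toy model makes the open-world/closed-world mismatch concrete. This is a minimal sketch under stated assumptions: relational rows are Python dicts, a level's members are a set of names, and the data is made up. `known_members_only` is a hypothetical mitigation that drops unknown values before member references are emitted. It is not necessarily what Siebel Analytics does.

```python
from typing import Dict, Iterable, List, Set


def sql_filter(rows: List[Dict[str, object]], column: str, value: str) -> List[Dict[str, object]]:
    """Open world: a predicate on a value that never occurs just matches no rows."""
    return [row for row in rows if row[column] == value]


def mdx_member(level_members: Set[str], name: str) -> str:
    """Closed world: referencing a member that does not exist is an error,
    mimicking an MDX query that names a missing member such as Geography.[USA]."""
    if name not in level_members:
        raise ValueError(f"member [{name}] does not exist")
    return f"[{name}]"


def known_members_only(level_members: Set[str], values: Iterable[str]) -> List[str]:
    """Hypothetical mitigation: keep only values that are existing members
    before emitting member references, so an SQL IN-list keeps its open-world
    meaning. An empty list means the answer is known to be empty."""
    return [mdx_member(level_members, v) for v in values if v in level_members]


# Toy data for illustration only; "Atlantis" is a deliberately missing member.
countries = {"USA", "India"}
rows = [{"Country": "USA", "Sales": 100}]
print(sql_filter(rows, "Country", "Atlantis"))             # []
print(known_members_only(countries, ["USA", "Atlantis"]))  # ['[USA]']
try:
    mdx_member(countries, "Atlantis")
except ValueError as err:
    print("closed world:", err)
```

The filtering step assumes the translator can see a level's member list cheaply. With very large levels, checking membership this way has a cost of its own.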
Metadata Modeling: Cube Tables
• Example cube with 2 hierarchies and 2 measures:
  • Time: Year -> Quarter -> Month
  • Geography: State -> City
  • Measures: Profit, Sales
• Modeled as the cube table T (Year, Quarter, Month, State, City, Profit, Sales)
• Hierarchy, level, and aggregation rule information is preserved

[Figure: metadata screenshot]

Rowset Creation from Multidimensional Result Sets
• MDX result sets consist of dimension members on axes and measures in delimited cells. For example, the query
    SELECT {[Measures].[Sales]} ON COLUMNS,
      {Crossjoin({[Year].Members}, {[Products].[Soda].Members})} ON ROWS
    FROM [Sales]
  returns:
    Year  Product  Sales
    1997  Coke     100
    1998  Coke     200
• We generate only two-dimensional queries: measures on COLUMNS, dimensions on ROWS

Transforming the Intermediate Rowset
• The intermediate rowset may need further transformation:
  • The number of columns in the rowset may differ from the number of requested columns
  • The ordering of columns in the rowset may differ from the requested order
• Protocols for intermediate rowset transformation:
  • A simple example protocol maps intermediate column indexes to columns in the final rowset. For the intermediate rowset (year, product, sales):
      (1, 2, 3): select year, product, sum(sales) from T group by year, product
      (3, 2, 1): select sum(sales), product, year from T group by year, product
  • Different data sources and MDX generation algorithms use different protocols

MDX Code Generation
• Effectively SQL-to-MDX translation, together with the rowset creation protocol data
• Makes use of cube-table-specific metadata: hierarchies and levels
• Uses different code generation strategies for different SQL templates
• Supports as wide a set of SQL templates as possible
• Generates efficient MDX, because multidimensional data sources lack mature optimizers

MDX Generation Examples
SQL template:
    SELECT c1, c2, …, aggr(m1), aggr(m2)
    FROM Table
    WHERE <conditions>
    GROUP BY c1, c2, …
    HAVING <conditions>
• Goal: translate the entire SQL template into efficient MDX
• Metadata information: T (Store Country, Store State, Year, Quarter, Unit Sales); aggregation rule: SUM

Multiple dimensions plus measure with matching aggregate rule
SQL:
    Select "Store Country", Year, SUM("Unit Sales")
    From T
    Group By "Store Country", Year
MDX:
    Select {[Unit Sales]} on columns,
      {nonemptycrossjoin([Store Country].members, [Year].members)} on rows
    From [Sales]

Measure with non-matching aggregate rule
Because AVG does not match the cube's SUM rule, the average is computed in a calculated member over the lowest-level cells (Store State × Quarter) beneath each row.
SQL:
    Select "Store Country", Year, AVG("Unit Sales")
    From T
    Group By "Store Country", Year
MDX:
    with
      set [A] as '{[Store Country].members}'
      set [B] as '{[Year].members}'
      set [C] as 'nonemptycrossjoin({[A]},{[B]})'
      member [measures].[MS1] as
        'AVG(nonemptycrossjoin(Descendants(Store.currentmember, [Store State]),
                               Descendants(Time.currentmember, [Quarter])),
             [Unit Sales])'
    select {[MS1]} on columns, {[C]} on rows
    from [Sales]

Matching aggregate rule, predicate refers to GROUP BY columns
SQL:
    Select "Store Country", Year, SUM("Unit Sales")
    From T
    Where "Store Country" In ('USA', 'India') AND Year = '1997'
    Group By "Store Country", Year
MDX:
    with
      set [A] as '{filter([Store Country].members,
                          Store.currentmember.name = "USA" OR Store.currentmember.name = "India")}'
      set [B] as '{filter([Year].members, time.currentmember.name = "1997")}'
      set [C] as 'nonemptycrossjoin({[A]},{[B]})'
    select {[Unit Sales]} on columns, {[C]} on rows
    from [Sales]
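The matching-aggregate-rule examples above follow a regular pattern. There is one named set per GROUP BY level, filtered when the WHERE clause restricts that level, and the sets are then joined with nonemptycrossjoin. The generator below is a minimal sketch of that pattern, not the product's code generator. It assumes the SQL aggregate matches the cube's rule, that every predicate is an equality or IN-list on a GROUP BY level, and that a hypothetical `dimension_of` map gives each level's dimension name. Values are not escaped.

```python
from typing import Dict, List


def level_set(level: str, dimension: str, values: List[str]) -> str:
    """Set expression for one GROUP BY level. When the WHERE clause
    restricts the level, wrap its members in filter() on member names."""
    members = f"[{level}].members"
    if not values:
        return "{" + members + "}"
    cond = " OR ".join(f'{dimension}.currentmember.name = "{v}"' for v in values)
    return "{filter(" + members + ", " + cond + ")}"


def generate_mdx(cube: str, measure: str, group_by: List[str],
                 dimension_of: Dict[str, str],
                 where: Dict[str, List[str]]) -> str:
    """Translate SELECT <group_by>, SUM(<measure>) FROM T WHERE <level IN (...)>
    GROUP BY <group_by> into MDX (sketch; matching aggregation rule assumed)."""
    names = [chr(ord("A") + i) for i in range(len(group_by))]
    lines = ["with"]
    for name, level in zip(names, group_by):
        expr = level_set(level, dimension_of[level], where.get(level, []))
        lines.append(f"set [{name}] as '{expr}'")
    if len(names) == 1:
        rows = names[0]
    else:
        # Nest pairwise so each nonemptycrossjoin combines exactly two sets.
        cross = "{[" + names[0] + "]}"
        for name in names[1:]:
            cross = "nonemptycrossjoin(" + cross + ",{[" + name + "]})"
        rows = chr(ord("A") + len(names))
        lines.append(f"set [{rows}] as '{cross}'")
    lines.append(f"select {{[{measure}]}} on columns, {{[{rows}]}} on rows")
    lines.append(f"from [{cube}]")
    return "\n".join(lines)


print(generate_mdx(
    cube="Sales",
    measure="Unit Sales",
    group_by=["Store Country", "Year"],
    dimension_of={"Store Country": "Store", "Year": "Time"},
    where={"Store Country": ["USA", "India"], "Year": ["1997"]},
))
```

For the "predicate refers to GROUP BY columns" query, this prints essentially the MDX shown above, apart from layout and the capitalization of `Time`. Because the filters compare `.name`, the sketch inherits the caption vs. member name issue discussed under multiple vendor support.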
Multiple levels of a dimension plus measure with matching aggregate rule, predicates refer to both levels
The Store Country column is returned through the calculated measure CountryAnc, the name of each state's ancestor at the Store Country level.
SQL:
    Select "Store Country", "Store State", SUM("Unit Sales")
    From T
    Where "Store Country" = 'USA' AND "Store State" In ('CA', 'OR')
    Group By "Store Country", "Store State"
MDX:
    with
      member [measures].[CountryAnc] as 'ancestor(Store.Currentmember, [Store Country]).name'
      set [A] as 'filter({[Store Country].members}, Store.currentmember.name = "USA")'
      set [B] as 'Filter(Generate({[A]}, Descendants([Store].currentmember, [Store].[Store State])),
                         [Store].currentmember.name = "CA" OR [Store].currentmember.name = "OR")'
    select {[Measures].[CountryAnc], [Measures].[Unit Sales]} on columns,
      {[B]} on rows
    From [Sales]

Multiple levels of a dimension plus measure with matching aggregate rule, predicate refers to columns not in project list
SQL:
    Select "Store Country", "Store State", SUM("Unit Sales")
    From T
    Where Year = '1997'
    Group By "Store Country", "Store State"
MDX, using a slicer:
    with
      member [measures].[CountryAnc] as 'ancestor(Store.Currentmember, [Store Country]).name'
      set [A] as '{[Store State].members}'
    select {[Measures].[CountryAnc], [Unit Sales]} on columns, {[A]} on rows
    From [Sales]
    Where ([1997])
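The ancestor queries above return states on ROWS and CountryAnc plus Unit Sales on COLUMNS. Their intermediate rowset is therefore (state, country, unit sales), while the SQL asked for ("Store Country", "Store State", SUM("Unit Sales")). The index-mapping protocol from "Transforming the Intermediate Rowset" covers exactly this reordering. The sketch below flattens a simplified two-dimensional result and applies such a mapping. `MdxResult` is a hypothetical structure, not a driver API, and the Unit Sales figures in the second example are made up.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MdxResult:
    """Simplified 2-D MDX result (hypothetical structure):
    row_members[i] holds the ROWS-axis member captions of position i,
    column_names holds the COLUMNS-axis measure names, and
    cells[i][j] is the cell at row position i, column position j."""
    row_members: List[Tuple[str, ...]]
    column_names: List[str]
    cells: List[List[object]]


def to_intermediate_rowset(result: MdxResult) -> List[tuple]:
    """Each ROWS position becomes one row: its dimension members
    followed by that position's measure cells."""
    return [
        tuple(members) + tuple(result.cells[i])
        for i, members in enumerate(result.row_members)
    ]


def apply_protocol(rowset: List[tuple], mapping: Tuple[int, ...]) -> List[tuple]:
    """Index-mapping protocol: final column k comes from intermediate
    column mapping[k] (1-based). Unmapped intermediate columns are dropped."""
    return [tuple(row[i - 1] for i in mapping) for row in rowset]


# The Year x Soda result from the rowset creation slide.
years = MdxResult([("1997", "Coke"), ("1998", "Coke")], ["Sales"], [[100], [200]])
print(apply_protocol(to_intermediate_rowset(years), (3, 2, 1)))
# -> [(100, 'Coke', '1997'), (200, 'Coke', '1998')]

# Ancestor query: states on ROWS, CountryAnc and Unit Sales on COLUMNS.
# The Unit Sales numbers are illustrative only.
states = MdxResult([("CA",), ("OR",)], ["CountryAnc", "Unit Sales"],
                   [["USA", 500], ["USA", 300]])
print(apply_protocol(to_intermediate_rowset(states), (2, 1, 3)))
# -> [('USA', 'CA', 500), ('USA', 'OR', 300)]
```

Keeping the protocol as plain index data means the code generator can emit it alongside the MDX, and rowset processing never has to parse the query.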
Multiple levels of a dimension plus measure with matching aggregate rule, predicate refers to columns not in project list (alternative without a slicer)
The Year predicate is applied inside a calculated measure instead:
    with
      member [measures].[CountryAnc] as 'ancestor(Store.Currentmember, [Store Country]).name'
      member [Measures].[YearAnc] as 'ancestor([Time].Currentmember, [Time].[Year]).name'
      set [A] as '{[Store State].members}'
      set [B] as '{[Time].[Year].members}'
      member [measures].[MS1] as
        'SUM(filter(nonemptycrossjoin(Descendants(Store.currentmember, [Store State]), {[B]}),
                    [Time].currentmember.name = "1997"),
             [Unit Sales])'
    select {[Measures].[CountryAnc], [Measures].[MS1]} on columns, {[A]} on rows
    From [Sales]

Dimension plus measure with matching aggregate rule with HAVING clause
SQL:
    Select "Store Country", SUM("Unit Sales")
    From T
    Group By "Store Country"
    Having SUM("Unit Sales") > 10000
MDX:
    select {[Unit Sales]} on columns,
      Filter({[Store Country].members}, 10000 < [Unit Sales]) on rows
    from [Sales]

Multiple Vendor Support
• MDX and XMLA support varies widely from vendor to vendor:
  • Caption names vs. unique names
  • Classes of hierarchies supported
  • Treatment of properties
  • Using ancestor within a calculated member
  • Metadata returned (<structure>)
  • Cardinality of levels

Captions vs Member Names
• Caption: USA; member name: PG2003012
• MDX queries use the member name, not the caption
• Incoming SQL uses the caption, not the member name
• Member names are 7-bit ASCII
• We need to convert between captions and member names
• Solution: cache mappings between member names and captions on demand
• This limits the class of predicates that can be pushed down: no more > or <, since the ordering of member names need not follow the ordering of captions
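The caching solution above can be sketched as a per-level, on-demand map in both directions. Incoming SQL literals go caption → member name, and MDX results go member name → caption. This is a minimal sketch, assuming a hypothetical `fetch_level` callback that returns (member name, caption) pairs for a level. In practice it might be backed by the XMLA MDSCHEMA_MEMBERS rowset, but that wiring is an assumption and is not shown. The member name PG2003013 is made up.

```python
from typing import Callable, Dict, List, Optional, Tuple


class MemberNameCache:
    """On-demand cache of caption <-> member-name mappings, one level at a time.

    fetch_level is a hypothetical callback returning (member_name, caption)
    pairs for a level; how it talks to the cube is left open here.
    """

    def __init__(self, fetch_level: Callable[[str], List[Tuple[str, str]]]):
        self._fetch_level = fetch_level
        self._name_by_caption: Dict[str, Dict[str, str]] = {}
        self._caption_by_name: Dict[str, Dict[str, str]] = {}

    def _ensure_loaded(self, level: str) -> None:
        # Fetch a level only the first time it is needed, then reuse it.
        if level not in self._name_by_caption:
            pairs = self._fetch_level(level)
            self._name_by_caption[level] = {cap: name for name, cap in pairs}
            self._caption_by_name[level] = {name: cap for name, cap in pairs}

    def name_for(self, level: str, caption: str) -> Optional[str]:
        """Caption from incoming SQL -> member name for the generated MDX."""
        self._ensure_loaded(level)
        return self._name_by_caption[level].get(caption)

    def caption_for(self, level: str, name: str) -> Optional[str]:
        """Member name from an MDX result -> caption for the final rowset."""
        self._ensure_loaded(level)
        return self._caption_by_name[level].get(name)


calls: List[str] = []


def fake_fetch(level: str) -> List[Tuple[str, str]]:
    # Stand-in for a metadata query; records each level it is asked for.
    calls.append(level)
    return [("PG2003012", "USA"), ("PG2003013", "India")]


cache = MemberNameCache(fake_fetch)
print(cache.name_for("Store Country", "USA"))           # PG2003012
print(cache.caption_for("Store Country", "PG2003013"))  # India
print(len(calls))                                       # 1
```

A mapping like this translates only individual values, so it can serve equality and IN predicates but not range comparisons on captions. That is consistent with the note that > and < are no longer pushed. A real cache would also need to handle captions that repeat within a level.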
Conclusions and Future Work
• Ability to handle multidimensional and relational data in a single framework
• Generate efficient MDX queries for best performance
• Varying vendor support requires differing MDX code generation and intermediate rowset processing strategies
• Future work: support for a larger number of vendors, a wider class of SQL, and parent-child hierarchies