Query Processing in the AquaLogic Data Services Platform

Vinayak Borkar, Michael Carey, Dmitry Lychagin, Till Westmann, Daniel Engovatov, Nicola Onose
BEA Systems
www.bea.com
(C) Copyright 2006, BEA Systems, Inc

Data Is Everywhere Today
- Relational databases made things too easy
  - Departmental vs. inter-galactic centralized databases
- Databases come in many flavors
  - Relational: Oracle, DB2(s), SQL Server, MySQL, …
  - Hangers-on: IMS, IDMS, VSAM, …
- Not all data is SQL-accessible
  - Packaged applications: SAP, PeopleSoft, Siebel, Oracle, SalesForce, …
  - Custom “homegrown” applications
  - Files of various shapes and sizes (XML, non-XML)
- And the list goes on…

Painful to Develop Applications
- No one “single view of X” for any X
  - What data do I have about X?
  - How do I stitch together the info I have?
  - What else is X related to?
- No uniformity in source model or language
  - Data about X is stored in many different formats
  - Accessing or updating X involves many different APIs
  - Manual coding of “distributed query plans”
- No reuse of artifacts
  - Different access criteria or returned data → different access plans
  - No model to help organize or find the artifacts anyway

Agenda
- Why data services?
- Overview and Example
- The Query Processor
- Work in progress at BEA
- Summary and Q&A

Overview
[Architecture diagram]
- Client access: Java/J2EE, Web service, reporting; via Java API, WSDL, JDBC/SQL
- Layers: Client API, Developer Tooling, Data Processing Engine, Connectivity
- Sources:
  - Relational: tables, views, stored procedures, SQL
  - Web Services: business partners, adapter, .Net
  - Files: XML, flat files
  - Java Functions, LDAP, J2EE, JCA, JMS
  - Excel, Custom, Access

Example: Customer Profile Data Service
[Diagram]
- Profile contents: Customer Info, Order Info, Credit Card Info, Rating Info
- Sources: CUSTOMER, ORDER; CREDIT_CARD; getRating(…)

Data Service – Design View
[Screenshot]

Data Service – “Get All” Read Method

    (::pragma function ...
    kind="read" ...::)
    declare function tns:getProfile() as element(ns0:PROFILE)* {
      for $CUSTOMER in db1:CUSTOMER()
      return
        <tns:PROFILE>
          <CID>{ fn:data($CUSTOMER/CID) }</CID>
          <LAST_NAME>{ fn:data($CUSTOMER/LAST_NAME) }</LAST_NAME>
          <ORDERS>{ db1:getORDER($CUSTOMER) }</ORDERS>
          <CREDIT_CARDS>{ db2:CREDIT_CARD()[CID eq $CUSTOMER/CID] }</CREDIT_CARDS>
          <RATING>{
            fn:data(ws1:getRating(
              <ns5:getRating>
                <ns5:lName>{ data($CUSTOMER/LAST_NAME) }</ns5:lName>
                <ns5:ssn>{ data($CUSTOMER/SSN) }</ns5:ssn>
              </ns5:getRating>
            ))
          }</RATING>
        </tns:PROFILE>
    };

Data Service – Read & Navigate Methods

    (::pragma function ... kind="read" ...::)
    declare function tns:getProfileByID($id as xs:string) as element(ns0:PROFILE)* {
      tns:getProfile()[CID eq $id]
    };
    ...
    (::pragma function ... kind="navigate" ...::)
    declare function tns:getCOMPLAINTs($arg as element(ns0:PROFILE)) as element(ns8:COMPLAINT)* {
      db3:COMPLAINT()[CID eq $arg/CID]
    };
    ...

Agenda
- Why data services?
- Overview and Example
- The Query Processor
- Work in progress at BEA
- Summary and Q&A

Query Processor Overview
[Diagram]

Efficient processing
- Avoid unnecessary work
  - Function inlining (view unfolding)
- Push work to the sources
  - Queryable sources can do some of our work

Optimization: Function Inlining
Example: This fragment

    let $x := <CUSTOMER>
                <LAST_NAME>{$name}</LAST_NAME>
                <ORDERS>…</ORDERS>
              </CUSTOMER>
    return fn:data($x/LAST_NAME)

can be replaced by

    $name

But we need to:
- maintain structural type information for compilation
- extend “preserve” mode for runtime

Pushdown: Overview
- SQL Translator tries to “swallow” as much as possible.
- Make translation easy
  - Remove unnecessary functions (by inlining)
  - Translation to joins and grouping
  - Split sorting and grouping
- Make sources do as much as possible
  - DBMS specific code
- Maximize Pushdown
  - Inverse functions

Pushdown: Preparation
- Translation to joins and grouping
- Split sorting and grouping

    for $CUSTOMER in db1:CUSTOMER()
    return
      <tns:PROFILE>
        <CID>{ fn:data($CUSTOMER/CID) }</CID>
        …
        <ORDERS>{ db1:getORDER($CUSTOMER) }</ORDERS>
        <CREDIT_CARDS>{ db2:CREDIT_CARD()[CID eq $CUSTOMER/CID] }</CREDIT_CARDS>
        …
      </tns:PROFILE>

Pushdown: Inverse Functions I
Example fragment of tns:getProfile:

    <tns:PROFILE>
      <CID>{ fn:data($CUSTOMER/CID) }</CID>
      <LAST_NAME>{ fn:data($CUSTOMER/LAST_NAME) }</LAST_NAME>
      <SINCE>{ int2date($CUSTOMER/SINCE) }</SINCE>
      ...
    </tns:PROFILE>

used in this query

    for $c in tns:getProfile()
    where $c/SINCE gt $start
    return $c

yields (after inlining)

    for $c1 in ns3:CUSTOMER()
    where int2date($c1/SINCE) gt $start
    return <tns:PROFILE> ... </tns:PROFILE>

Pushdown: Inverse Functions II
- Register inverse function date2int for int2date
- Transformation rule (gt, int2date) → gt-intfromdate with

    declare function gt-intfromdate($x1 as xs:dateTime, $x2 as xs:dateTime) as xs:boolean? {
      date2int($x1) gt date2int($x2)
    };

Now we can rewrite the query into

    for $c1 in ns3:CUSTOMER()
    where $c1/SINCE gt ns1:date2int($start)
    return <tns:PROFILE> ... </tns:PROFILE>

which can be pushed as

    SELECT * FROM "CUSTOMER" t1 WHERE (t1."SINCE" > ?)
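The inverse-function rewrite above can be illustrated outside the engine with a small Python sketch. It is not ALDSP code: the `Comparison` class, the `INVERSES` registry, and `push_down` are hypothetical names, and the sketch assumes that SINCE holds seconds since the Unix epoch so that int2date is order-preserving (without that property, moving the function across `gt` would not be safe).

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical stand-ins for the int2date / date2int pair on the slide.
# Assumption: SINCE is stored as whole seconds since the Unix epoch.
def int2date(seconds: int) -> datetime:
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

def date2int(value: datetime) -> int:
    return int(value.timestamp())

# Registry of inverse functions: function name -> callable inverse.
INVERSES = {"int2date": date2int}

# XQuery comparison operators mapped to their SQL counterparts.
SQL_OPS = {"gt": ">", "lt": "<", "eq": "="}

@dataclass
class Comparison:
    """A predicate of the form  func(column) <op> constant."""
    func: str
    column: str
    op: str
    constant: object

def push_down(pred: Comparison, table: str):
    """Rewrite func(column) op constant into column op inverse(constant).

    Returns (sql, params), or None when no inverse is registered, in which
    case the predicate would have to be evaluated in the middleware.
    Only valid for order-preserving functions when op is gt or lt.
    """
    inverse = INVERSES.get(pred.func)
    if inverse is None:
        return None
    sql = (f'SELECT * FROM "{table}" t1 '
           f'WHERE (t1."{pred.column}" {SQL_OPS[pred.op]} ?)')
    return sql, [inverse(pred.constant)]
```

Calling `push_down(Comparison("int2date", "SINCE", "gt", start), "CUSTOMER")` produces the parameterized SELECT shown on the slide, with `date2int(start)` bound to the `?` placeholder.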
Optimization: PP-k Join
- Parameter Passing in chunks of k
- Prerequisites
  - Distributed join
  - Right side is a relational source
- Idea: relational source can partition its content
- Steps
  - Read k items from the left into L
  - Select all items from the right that match any of the items in L into R
  - Join L and R in the middleware
  - Repeat until the left is exhausted
- Benefit: excellent trade-off between
  - Memory footprint in the middleware
  - Roundtrip overhead imposed by the data source
- k = 20

Example: “Get All” Read Method Revisited

    (::pragma function ... kind="read" ...::)
    declare function tns:getProfile() as element(ns0:PROFILE)* {
      for $CUSTOMER in db1:CUSTOMER()
      return
        <tns:PROFILE>
          <CID>{ fn:data($CUSTOMER/CID) }</CID>
          <LAST_NAME>{ fn:data($CUSTOMER/LAST_NAME) }</LAST_NAME>
          <ORDERS>{ db1:getORDER($CUSTOMER) }</ORDERS>
          <CREDIT_CARDS>{ db2:CREDIT_CARD()[CID eq $CUSTOMER/CID] }</CREDIT_CARDS>
          <RATING>{
            fn:data(ws1:getRating(
              <ns5:getRating>
                <ns5:lName>{ data($CUSTOMER/LAST_NAME) }</ns5:lName>
                <ns5:ssn>{ data($CUSTOMER/SSN) }</ns5:ssn>
              </ns5:getRating>
            ))
          }</RATING>
        </tns:PROFILE>
    };

Evaluation Plan, Example 1 (getProfile)
[Diagram]

Evaluation Plan, Example 2 (query getProfile)
[Diagram]

Agenda
- Why data services?
- Overview and Example
- The Query Processor
- Work in progress at BEA
- Summary and Q&A

Some ALDSP Work in Progress
- Native JDBC/SQL92 support (available on Sep 15)
  - Bilingual engine for efficient reporting/BI tool access
- Support for compensating transactions
  - Extend update facility to support safe non-XA updates (sagas)
- XQuery Update support (as well as XQueryP)
  - Goal: no Java coding for many Web service use cases

Agenda
- Why data services?
- Overview and Example
- The Query Processor
- Work in progress at BEA
- Summary and Q&A

Summary
- Covered here
  - Why Data Services?
  - How are Data Services used?
  - Some techniques for efficient evaluation
- In the paper
  - XQuery extensions
  - Runtime architecture
  - Caches
  - Updates
  - Security
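Returning to the PP-k join from the query processor section, its four steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ALDSP implementation: `ppk_join` and the `fetch_right` callback are hypothetical names, rows are modeled as dictionaries, and `fetch_right(keys)` stands in for one parameterized roundtrip to the relational source (for example a query with a `WHERE CID IN (?, ..., ?)` clause).

```python
from itertools import islice

def ppk_join(left, fetch_right, key, k=20):
    """Join left rows against a remote source, k left rows per roundtrip.

    left        -- any iterable of dict rows (the left side of the join)
    fetch_right -- hypothetical callback: takes a list of key values and
                   returns all right-side rows matching any of them
    key         -- name of the join column present on both sides
    Yields (left_row, matching_right_rows) pairs; the list is empty when
    a left row has no match.
    """
    left_iter = iter(left)
    while True:
        # Step 1: read k items from the left into L.
        chunk = list(islice(left_iter, k))
        if not chunk:
            break  # Step 4: the left side is exhausted.
        # Step 2: a single roundtrip selects everything matching L into R.
        keys = sorted({row[key] for row in chunk})
        by_key = {}
        for right_row in fetch_right(keys):
            by_key.setdefault(right_row[key], []).append(right_row)
        # Step 3: join L and R in the middleware.
        for left_row in chunk:
            yield left_row, by_key.get(left_row[key], [])
```

Only one chunk of left rows and its matches are held in memory at a time, while the number of roundtrips drops from one per left row to one per k left rows, which is the trade-off the slide describes.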