Translate Graphical XML Query Language to SQLX Wei Ni Tok Wang Ling Department of Computer Science University of Singapore {niwei, lingtw}@comp.nus.edu.sg 7/1/2016 DASFAA 2005, Beijing, China 1 Roadmap ORA-SS and overview of our project GLASS query Translation from GLASS to SQLX • What is SQLX • Preprocessing • Translation Conclusion and future works 7/1/2016 DASFAA 2005, Beijing, China 2 1. ORA-SS ORA-SS [6] (Object-Relational-Attribute model for SemiStructured data ) is a rich semantic model for semistructured data • Represents both the tree-like data structure and the relationship types contained in the data set; • Distinguishes relationship attributes from object attributes; • Uses Object ID indicating different object instances rather than element instances. It is different from ER diagram in that ORA-SS represent the hierarchical structure of the semi-structured data. 6. G. Dobbie, X. Y. Wu, T. W. Ling, M. L. Lee. ORA-SS: An Object-Relationship-Attribute Model for Semistructured Data. TR21/00, Technical Report, Department of Computer Science, National University of Singapore, December 2000. 7/1/2016 DASFAA 2005, Beijing, China 3 An XML example binary relationship type jm between project and member ternary relationship type jmp among project, member and publication project jm, 2, 1:n, 1:n J# member Jname jm jmp, 3, 0:n, 1:n M# Mname age * qualification The DTD of the example dataset job_title degree year publication job_title is an attribute of the relationship type jm university P# title <!ELEMENT project (Jname?, member+)> <!ATTLIST project project_id ID #REQUIRED> <!ELEMENT Jname PCDATA> <!ELEMENT member (Mname, age, job_title+, publication*, qualification*)> <!ATTLIST member member_id CDATA #IMPLIED > <!ELEMENT Mname PCDATA> <!ELEMENT age PCDATA> <!ELEMENT job_title PCDATA> <!ELEMENT qualification PCDATA> <!ATTLIST qualification degree CDATA #IMPLIED university CDATA #IMPLIED year CDATA #IMPLIED> <!ELEMENT publication (review*)> <!ATTLIST publication pub_id CDATA #IMPLIED title CDATA #IMPLIED> <!ELEMENT review PCDATA> * review An example ORA-SS schema Object Relations project (J#, Jname) member (M#, Mname, age qualification(degree, university, year)*) publication (P#, title, (review)*) Relationship Relations jm (J#, M#, job_title) jmp (J#, M#, P#) The ORDB storage schema of the example XML data set 7/1/2016 DASFAA 2005, Beijing, China 4 Overview of our project In our project, the XML data is stored in ORDB (e.g. Oracle 9i), where • Relationship types and object classes are separately stored; • Relationship type attributes and object attributes are differentiated; • Multi-valued attributes and composite attributes are stored as nested tables. We choose SQLX in our translation rather than XQuery because SQLX supports SQL-in-XML-out in ORDB (e.g. Oracle 9i). Meanwhile, we also have translation to XQuery. 7/1/2016 DASFAA 2005, Beijing, China 5 2. GLASS query GLASS [12] (Graphical query LAnguage for SemiStructured data) is a high expressive graphical query language for semi-structured data, which is designed on the base of ORA-SS. • Separates the complex query logic from the query graph (the query graph is the graphical part of a GLASS query). • Considers relationship types in querying XML data. 12. W. Ni, T. W. Ling. GLASS: A Graphical Query Language for Semi-Structured Data. DASFAA 2003. 7/1/2016 DASFAA 2005, Beijing, China 6 2. GLASS query (Cont.) A typical GLASS query consists of four parts (1) Left Hand Side Graph (LHS graph) – denotes the basic conditions of a query (which can be different from the structure of source schema, where we shall do view validation first [3]); (2) Right Hand Side Graph (RHS graph) – defines the output structure of the query result; (3) Link Set – specifies the bindings between the RHS graph and LHS graph. When two graph entities are linked, they are visually connected by a line, which means the data type and value of the entity in the RHS graph are from the corresponding linked entity in the LHS graph. (4) Condition Logic Window (CLW) – It is an optional part where users write conditions and constructions that are difficult to draw, which includes Logic expressions, Mathematic expressions, Comparison expressions and IF-THEN statements. 7/1/2016 DASFAA 2005, Beijing, China 7 Original Schema 2. GLASS query – example project jm, 2, 1:n, 1:n J# member Jname jm Example 1. Mname Find the member whose age is less than 35, and he either has taken part in less than 5 projects or written more than 6 publications in some of the projects he attended; display the member id and name. member :A: To group publication under each pair of member and project in the ternary relationship type jmp age age < 35 member _group :B: jm, 2 project CNT<5 _group :C: M# jmp, 3 Mname publication * qualification job_title degree year publication university P# To group project under member in the binary relationship type jm The hierarchical position of member and project is swapped in this query. jmp, 3, 0:n, 1:n M# title * review The object node with name “member” is linked with the object node with the same name in the LHS graph, which means the member in the RHS graph (the result view) is from the member in the LHS graph. CNT>6 CLW A AND (B OR C); The GLASS query of Example 1. 7/1/2016 DASFAA 2005, Beijing, China 8 Original Schema 2. GLASS query – example project jm, 2, 1:n, 1:n J# member Jname jm Example 1. jmp, 3, 0:n, 1:n M# Mname Find the member whose age is less than 35, and he either has taken part in less than 5 projects or written more than 6 publications in some of the projects he attended; display the member id and name. age :A: There are 3 condition identifiers defined in the LHS graph, “A”, “B” and “C”, each condition identifier is quoted by a pair of “:”s. age < 35 qualification job_title degree year publication university P# member * title * review member _group :B: jm, 2 project CNT<5 _group :C: M# jmp, 3 Mname publication The default logic “AND” is replaced by the logic expressions in the CLW. CNT>6 CLW A AND (B OR C); The GLASS query of Example 1. 7/1/2016 DASFAA 2005, Beijing, China 9 3. Translation from GLASS to SQLX What is SQLX [7] • SQLX (aka. SQL/XML) is an XML-related specification expanded on SQL. • SQLX combines the features in both XML document processing and the traditional SQL • Three fundamental SQLX functions: • XMLELEMENT • XMLATTRIBUTES • XMLAGG Why SQLX • An extended standard on SQL, which is supported by Oracle, IBM, Microsoft, etc. • It can be processed by ORDB systems. 7. Information technology -- Database languages -- SQL -- Part 14: XML-Related Specifications. ISO/IEC 9075-14:2003 7/1/2016 DASFAA 2005, Beijing, China 10 3. Translation from GLASS to SQLX – Preprocessing Expansion of the simple projection; Expansion of the abbreviated RHS graph; Construction of the condition tree from LHS graph and CLW. • In comparison with the condition tree in TQL [13], our condition tree • Contains the information of relationship types; • Can represent aggregation and restructuring (such as swap); • Expresses logic in CLW. 13. Y. Papakonstantinou, M. Petropoulos and V.Vassalos. QURSED: Querying and Reporting Semistructured Data. ACM SIGMOD 2002, Jun 4-6, Madison, Wisconsin, USA. 7/1/2016 DASFAA 2005, Beijing, China 11 3. Translation from GLASS to SQLX (Preprocessing) Original Schema – The Generation of condition tree project jm, 2, 1:n, 1:n J# member Jname Step 1. Initializing the condition tree. jm Link to the member in the RHS graph jmp, 3, 0:n, 1:n M# Mname age * qualification job_title degree year publication university P# member :A: member project age < 35 age < 35 CNT<5 _group :C: M# jmp, 3 _group :B: jm, 2 project CNT<5 Mname publication A AND (B OR C); The original GLASS query 7/1/2016 The condition tree is initially a copy of the LHS graph from the original GLASS query. _group :C: jmp, 3 publication CNT>6 CLW * review member :A: _group :B: jm, 2 title CNT>6 CLW A AND (B OR C); The CLW remains unchanged in this step. The condition tree after Step 1. DASFAA 2005, Beijing, China 12 3. Translation from GLASS to SQLX (Preprocessing) – The Generation of condition tree Step 2. Decomposing the LHS graph (making duplicate nodes as necessary). This is the box of the group entity that consists of “member” and “project” The red dot line means that the 3 member nodes should be the same, i.e. the same variable in SQLX or XQuery expressions. Link to the member in the RHS graph Link to the member in the RHS graph member :A: _group :B: jm, 2 project age < 35 member :A: CNT<5 _group :C: jmp, 3 publication age <35 member project member :B: _group jm, 2 project CNT<5 :C: _group jmp, 3 the box publication CNT>6 CNT>6 CLW A AND (B OR C); The condition tree after Step 1. 7/1/2016 CLW A AND (B OR C); The condition tree after Step 2. DASFAA 2005, Beijing, China 13 3. Translation from GLASS to SQLX (Preprocessing) – The Generation of condition tree Step 3. Adding the logic expressions (in CLW) into the condition tree. If there are any existential and universal quantifiers in CLW, they will be added also in this step. Clip A and the logic node over B and C with a logic node “AND” Link to the member in the RHS graph Link to the member in the RHS graph [AND] Clip B and C first with a logic node “OR” [OR] member member :B: :A: _group jm, 2 project CNT<5 age <35 project member member :B: :A: :C: _group jm, 2 _group jmp, 3 publication CNT>6 project CNT<5 age <35 project member :C: _group jmp, 3 publication CNT>6 The condition tree after Step 3. CLW A AND (B OR C); The condition tree after Step 2. 7/1/2016 member The “AND” node here is necessary to differentiate the following two different query logic: - A AND B OR C - A AND (B OR C) DASFAA 2005, Beijing, China 14 3. Translation from GLASS to SQLX (Preprocessing) – The Generation of condition tree Step 4. Eliminating universal quantifiers in the condition tree. (not applied in this example) Link to the member in the RHS graph [AND] Comments: [OR] member member project member :B: :A: :C: _group jm, 2 project CNT<5 age <35 _group jmp, 3 publication CNT>6 The condition tree is a combination of LHS graph and CLW, where we generate the WHERE clauses in the SQLX expressions directly by traversing the condition tree instead of the original LHS graph and/or CLW. The final condition tree. 7/1/2016 DASFAA 2005, Beijing, China 15 3. Translation from GLASS to SQLX – Translate Condition Tree and RHS graph into SQLX Traverse the expanded RHS graph in depth-first order and generate the select statements of the SQLX. Generate where clauses: • Render join for parent-child/ancestordescendant relations (such relations on a XPath are performed by joining a series of tables in ORDB); • Specify conditions for linked nodes by traversing the condition tree. 7/1/2016 DASFAA 2005, Beijing, China 16 3. Translation from GLASS to SQLX – Translate Condition Tree and RHS graph into SQLX [AND] [OR] member member :B: :A: _group jm, 2 project CNT<5 age <35 member Traverse the (expanded) RHS graph and obtain the query block below: project member :C: _group jmp, 3 M# Mname publication CNT>6 Preprocessed GLASS query for translation For Each Node N in DF-Traversing the expanded RHS graph The SQLX query expressions SELECT XMLELEMENT(NAME “member”, XMLATTRIBUTES (M1.M# AS “member_id”) XMLELEMENT (NAME “Mname”, M1.Mname) ) FROM member M1 WHERE … CASE (N is the root of RHS) THEN generate a block of SELECT XMLELEMENT; SELECT XMLELEMENT (NAME “N” …) FROM… WHERE… CASE (N is an attribute in XML) THEN generate a block of SELECT XMLATTRIBUTES; SELECT XMLATTRIBUTES (… AS “N” …) FROM… WHERE… CASE (other kind of N) THEN generate a block of SELECT XMLAGG(XMLELEMENT); //i.e. N is an element but not the root SELECT XMLAGG (XMLELEMENT (NAME “N” …) FROM… WHERE…) 7/1/2016 DASFAA 2005, Beijing, China 17 3. Translation from GLASS to SQLX – Translate Condition Tree and RHS graph into SQLX [AND] [OR] member member :B: :A: _group jm, 2 project CNT<5 age <35 member Traverse the (expanded) RHS graph and obtain the query block below: project member :C: _group jmp, 3 M# Mname publication CNT>6 The SQLX query expressions M1.age <35 AND ( M1.M# IN (SELECT M# FROM member WHERE (SELECT COUNT(J#) FROM jm WHERE M# = jm.M#)<5) OR M1.M# IN (SELECT M# FROM member WHERE (SELECT J# FROM project WHERE(SELECT COUNT(P#) FROM jmp WHERE jmp.J# = project.J# AND jmp.M# = member.M#)>6)) ) 7/1/2016 DASFAA 2005, Beijing, China SELECT XMLELEMENT(NAME “member”, XMLATTRIBUTES (M1.M# AS “member_id”) XMLELEMENT (NAME “Mname”, M1.Mname) ) FROM member M1 WHERE … 18 4. Conclusion & future works We present the translation from GLASS to SQLX. • GLASS supports SWAPPING and GROUPING on the base of the semantic information in ORA-SS. • By translating into SQLX, GLASS can be processed on ORDB systems such as Oracle 9i. • The translation method is simple. Compared with other XML-toSQL translation works, • The ORDB data storage preserves the semantic information in ORASS: differentiates relationship type attributes from object attributes. • GLASS query considers the relationship information in ORA-SS which are important to define the query semantic clearly. • Our translation is simpler in comparison with the literature review in [8] and supports SWAPPING, GROUPING (also Quantifiers) in GLASS. 8. R. Krishnamurthy, R. Kaushik, and J. F. Naughton XML-to-SQL Query Translation Literature: The State of the Art and Open Problems. University of Wisconsin-Madison 7/1/2016 DASFAA 2005, Beijing, China 19 4. Conclusion & future works Future works • Query optimization on the base of ORA-SS • Decrease the cost of join • Cost model and optimized query plan • Improving our case tool (so far partially developed) 7/1/2016 DASFAA 2005, Beijing, China 20 References: 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. S. Ceri, S.Comai, E. Damiani, P.Fraternali, S. Paraboschi, and L.Tanca. XML-GL: a graphical language of querying and restructuring XML documents. In Proc. WWW8, Toronto, Canada, May 1999. S. Ceri, S. Comai, E. Damiani, P. Fraternali, and L. Tanca. Complex Queries in XML-GL. SAC(2) 2000:888-893. Y. B. Chen, T. W. Ling, M. L. Lee. Designing Valid XML Views. 21st International Conference on Conceptual Modeling (ER'2002), pp 463-477, October 7-11, 2002, Tampere, Finland. S. Cohen, Y. Kanza, Y. Kogan, W. Nutt, Y. Sagiv and A. Serebrenik. Equix – Easy Querying in XML Databases. In proceedings of Webdb’98 – The Web and Database Workshop, 1998. S. Comai, E. Damiani, P. Fraternali. Computing Graphical Queries over XML Data. ACM Transactions on Information Systems, Vol. 19, No. 4, October 2001, Pages 371-430. G. Dobbie, X. Y. Wu, T. W. Ling, M. L. Lee. ORA-SS: An Object-Relationship-Attribute Model for Semistructured Data. TR21/00, Technical Report, Department of Computer Science, National University of Singapore, December 2000. Information technology -- Database languages -- SQL -- Part 14: XML-Related Specifications. ISO/IEC 9075-14:2003 R. Krishnamurthy, R. Kaushik, and J. F. Naughton XML-to-SQL Query Translation Literature: The State of the Art and Open Problems. University of Wisconsin-Madison B. Ludaescher, Y. Papakonstantinou, and P. Velikhov. Navigation-driven evaluation of virtual mediated views. In Proceedings of the sixth International Conference on Extending Database Technology (EDBT)(Konstanz, Germany, March), Lecture Notes in Computer Science, vol. 1777, Springer-Verlage, New York, 2000. L. Mark, etc. XMLApe. College of Computing, Georgia Institue of Technology. http://www.cc.gatech.edu/projects/XMLApe/ K. D. Munroe, B. Ludaescher and Y. Papakonstantinou. Blended Browsing and Querying of XML in Lazy Mediator System. Konstanz, Germany, March 2000. W. Ni, T. W. Ling. GLASS: A Graphical Query Language for Semi-Structured Data. DASFAA 2003. Y. Papakonstantinou, M. Petropoulos and V.Vassalos. QURSED: Querying and Reporting Semistructured Data. ACM SIGMOD 2002, Jun 4-6, Madison, Wisconsin, USA. XQuery 1.0: An XML Query Language. W3C Working Draft 22 August 2003 http://www.w3.org/TR/xquery/ XML Path Language (XPath) 2.0. W3C Working Draft 22 August 2003 http://www.w3.org/TR/xpath20/ XML Schema. http://www.w3.org/XML/Schema 7/1/2016 DASFAA 2005, Beijing, China 21 The End Thank you very much & wish you have wonderful time in Beijing. 7/1/2016 DASFAA 2005, Beijing, China 22