XML Structures for Relational Data

Wenyue Du, Mong Li Lee, Tok Wang Ling
Department of Computer Science, School of Computing
National University of Singapore
{duwenyue, leeml, lingtw}@comp.nus.edu.sg

Contents
1. Introduction
   – Motivation
   – Related Works
   – Our Approach
2. Background
   – XML
   – XML DTD
   – Semantic Enrichment
3. Proposed Relational to XML Translation
4. Comparison
5. Conclusion

1. Introduction
Outline
– Motivation
– Related Works
– Our Approach

Motivation
XML is emerging as a standard for information publishing on the World Wide Web. However, the underlying data is often stored in traditional relational databases. Some mechanism is needed to translate the relational data into XML data.

Motivation (cont.)
The desired mechanism:
• Generates XML structures that are able to describe the semantics and structures in the underlying relational databases.
• Obtains properly structured XML data without unnecessary redundancies and proliferation of disconnected XML elements.

Related Works
• [1, 5, 6] basically focus on single-relation translation. In order to handle a set of related relations, the relations are first denormalized into one single relation.
  – The flat XML structure does not provide a good way to show the structure of the data.
  – It causes a lot of redundancy.

Relations:
  Dept(D#, Dname)
  Employee(E#, Ename, JoinDate, D#)
Maps to:
  <!ELEMENT Results (Employee*)>
  <!ELEMENT Employee (EMPTY)>
  <!ATTLIST Employee
      E#       CDATA #REQUIRED
      Ename    CDATA #IMPLIED
      JoinDate CDATA #IMPLIED
      D#       CDATA #REQUIRED
      Dname    CDATA #IMPLIED>

Related Works (cont.)
• [7] developed a method to generate a hierarchical DTD for XML data from a relational schema.
  – It lacks semantic enrichment, so it cannot handle more complex situations.
Relations:
  Dept(D#, Dname)
  Employee(E#, Ename, JoinDate, D#)
Maps to:
  <!ELEMENT Results (Employee*)>
  <!ELEMENT Employee (Dept)>
  <!ATTLIST Employee
      E#       ID    #REQUIRED
      Ename    CDATA #IMPLIED
      JoinDate CDATA #IMPLIED>
  <!ELEMENT Dept (EMPTY)>
  <!ATTLIST Dept … >
Callout on the DTD: Is it an attribute of an object or of a relationship?

Our Approach
XML structures for relational data can be obtained by the following steps:
1. Semantic Enrichment: Relational Schema → Semantically Enriched Relational Schema
2. Translation Rules: Semantically Enriched Relational Schema → ORA-SS Schema Diagram
3. ORA-SS to XML-Schema Algorithm: ORA-SS Schema Diagram → XML Schema

2. Background
Outline
– XML
– XML DTD
– Semantic Enrichment

XML
Basic constructs of XML:
– Element
– Attribute
– Reference (link): a relationship between resources (e.g. elements). It is specified by attaching specific attributes or sub-elements.

XML DTD
A Document Type Definition (DTD) describes the structure of an XML document.

XML document:
  <RESULTS>
    <CUSTOMER CID="C980054Z">
      <CNAME>J. Tan</CNAME>
      <AGE>36</AGE>
    </CUSTOMER>
    …
  </RESULTS>

Corresponding DTD:
  <!ELEMENT RESULTS (CUSTOMER*)>
  <!ELEMENT CUSTOMER (CNAME, AGE)>
  <!ATTLIST CUSTOMER CID ID #REQUIRED>
  <!ELEMENT CNAME (#PCDATA)>
  <!ELEMENT AGE (#PCDATA)>

Semantic Enrichment
• Semantic enrichment is a process that upgrades the semantics of databases in order to explicitly express semantics that are implicit in the data, such as the various relationship types and cardinality constraints.

Extra information needed:
• Functional dependencies (FDs) and keys
• Inclusion dependencies (INDs), e.g.
    STUDENT(S#, SNAME)
    HOBBIES(S#, HOBBY)
    HOBBIES[S#] ⊆ STUDENT[S#]
• Semantic dependencies (SDs) (T.W. Ling & M.L. Lee, 1995)

Semantic Dependencies
EMPLOYEE(E#, ENAME, JOINDATE, D#)
– JOINDATE is functionally dependent on E# only.
– Assume JOINDATE refers to the date on which an employee assumes duty with the department.
We say that JOINDATE is semantically dependent on {E#, D#}.

Semantic Enrichment using SD together with FD and IND
To obtain:
• Object relations and object attributes that represent regular and weak entity types, and their properties.
• Relationship relations and relationship attributes that represent various relationship types such as binary, n-ary, recursive and ISA (inheritance), and their properties.
• Mix-type relations: we need to split them into object relations and relationship relations.
• Fragments of object relations or relationship relations that represent multi-valued attributes of entity types or relationship types.
• Cardinality constraints.

An Original Relational Schema
  COURSE(CODE, TITLE)
  DEPT(D#, DNAME)
  STUDENT(S#, SNAME)
  TUTORIAL(T#, TUTORIALTITLE)
  HOBBIES(S#, HOBBY)
  STUDENTDEPT(S#, D#)
  C_S(CODE, S#, GRADE)
  ATTEND(CODE, T#, S#)
  COURSEMEETING(CODE, S#, MEETINGHISTORY)

The Semantically Enriched Schema
Object Relations:
  COURSE(CODE, TITLE)
  DEPT(D#, DNAME)
  STUDENT(S#, SNAME)
  TUTORIAL(T#, TUTORIALTITLE)
Relationship Relations:
  STUDENTDEPT(S#, D#)
  C_S(CODE, S#, GRADE)
  ATTEND(CODE, T#, S#)
Fragment of Object Relations:
  HOBBIES(S#, HOBBY)
Fragment of Relationship Relations:
  COURSEMEETING(CODE, S#, MEETINGHISTORY)   (fragment of C_S)

3. Proposed Relational to XML Translation
Outline
– ORA-SS Model
– Relational Schema to ORA-SS Translation
– ORA-SS to XML Schema Translation

ORA-SS Model
ORA-SS (Object-Relationship-Attribute model for Semi-Structured data)
G. Dobbie, X.Y. Wu, T.W. Ling, M.L. Lee, “ORA-SS: An Object-Relationship-Attribute Model for Semi-structured Data”, TR 21/00, National Univ. of Singapore, 2001

Concepts of ORA-SS (cont.)
[Figure: annotated ORA-SS schema diagram. Labelled concepts: object class, identifier, binary relationship, ternary relationship, relationship attribute, reference. Object classes COURSE (CODE, TITLE), STUDENT (S#, SNAME) and TUTORIAL (T#, TUTORIALTITLE); binary relationship C_S (2, 1:n, 1:n) from COURSE to STUDENT1, with relationship attribute GRADE and reference C_S_Ref; ternary relationship ATTEND (3, 1:n, 1:n) to TUTORIAL1, with reference T_Ref.]

Enriched Relational Schema to ORA-SS Schema Translation
Objectives:
• Identify object classes and their attributes from object relations
• Identify relationship types and their attributes from relationship relations
• Identify hierarchical structure
• Generate ORA-SS schema

Overview of Translation Rules
1. Object relation rules: to translate object relations.
2. Relationship relation rules: to translate relationship relations.
3. Combination rule: to be applied to the result obtained from the application of the object and relationship relation rules, and to generate the final ORA-SS schema.

Object Relation Translation Rules

Rule O1: Mapping object relations
  STUDENT(S#, SNAME)
Maps to:
  object class STUDENT with attributes S# and SNAME (single-valued attributes)

Rule O2: Mapping fragment of object relations
  STUDENT(S#, SNAME)
  HOBBIES(S#, HOBBY)
Maps to:
  object class STUDENT with attributes S#, SNAME and HOBBY*, where * marks a multivalued attribute

Relationship Relation Translation Rules

Rule R1: Mapping 1-m/1-1 relationship relation
Objectives:
• Reduce disconnected elements: use parent-child structure
• Avoid unnecessary redundancies: use references
Example:
  ADVISOR(STAFF#, POSITION)   // object relation
  STUDENT(S#, SNAME)          // object relation
  STU_ADV(S#, STAFF#)         // 1-m relationship relation

Rule R1: Mapping 1-m/1-1 relationship relation (cont.)
Case 1: All the objects (instances) of STUDENT participate in the relationship type STU_ADV.
STU_ADV maps to:
  ADVISOR as parent, STUDENT as child, relationship STU_ADV (2, 0:n, 1:1)
Use parent-child structure.

Rule R1: Mapping 1-m/1-1 relationship relation (cont.)
Case 2:
1. Not all the objects of STUDENT participate in STU_ADV, or
2. STUDENT is already a child object and all the objects of ADVISOR participate in STU_ADV.
STU_ADV maps to:
  STUDENT as parent, ADVISOR as child, relationship STU_ADV (2, 0:1, 1:n)
Use parent-child structure.

Rule R1: Mapping 1-m/1-1 relationship relation (cont.)
Case 3: There exist objects of both STUDENT and ADVISOR that do not participate in STU_ADV.
STU_ADV maps to:
  STUDENT with child ADVISOR1 under STU_ADV (2, *, ?), where ADVISOR1 refers to the top-level ADVISOR through A_Ref
or
  ADVISOR with child STUDENT1 under STU_ADV (2, *, ?), where STUDENT1 refers to the top-level STUDENT through S_Ref
Use reference.

Rule R2: Mapping m-n binary relationship relation
  COURSE(CODE, TITLE)
  C_S(S#, CODE, GRADE)
  STUDENT(S#, SNAME)
Three ways to map:
1. COURSE (CODE, TITLE) with STUDENT (S#, SNAME) and GRADE nested directly beneath it.
2. Top-level COURSE (CODE, TITLE) and STUDENT (S#, SNAME); COURSE has child STUDENT1 under C_S (2, 1:n, 1:n), with reference C_S_REF and relationship attribute GRADE.
3. Top-level STUDENT (S#, SNAME) and COURSE (CODE, TITLE); STUDENT has child COURSE1 under C_S (2, 1:n, 1:n), with reference C_S_REF and relationship attribute GRADE.
The slide marks a preferred mapping among the reference-based alternatives.

Other relationship relation rules
• A fragment of a relationship relation is translated in the same way as a fragment of an object relation.
• An n-ary relationship relation is translated using reference structures. The level of each referencing object may be determined by the aggregations.
• If B ISA A, then B is mapped to a child object class (O_B) of O_A.
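
The three cases of Rule R1 above amount to a small decision procedure. The following Python fragment is only a sketch of that decision, not the authors' implementation: the function name, the enum and the boolean parameters are hypothetical, and two points are assumptions rather than statements from the slides, namely that Case 2 condition 1 presupposes full participation of ADVISOR (otherwise Case 3 applies), and that a STUDENT which is already a child falls back to a reference when ADVISOR does not fully participate.

```python
from enum import Enum


class R1Mapping(Enum):
    """Possible outcomes of Rule R1 for a 1-m relation such as STU_ADV."""
    MANY_UNDER_ONE = "Case 1: STUDENT nested under ADVISOR"
    ONE_UNDER_MANY = "Case 2: ADVISOR nested under STUDENT"
    REFERENCE = "Case 3: reference between top-level objects"


def choose_r1_mapping(all_many_participate: bool,
                      all_one_participate: bool,
                      many_already_child: bool = False) -> R1Mapping:
    """Pick a structure for a 1-m relationship relation.

    'many' is the many-side object class (STUDENT in the example) and
    'one' is the one-side object class (ADVISOR). All names here are
    hypothetical; the slides give only the three cases.
    """
    # Case 2, condition 2: STUDENT already has a parent and every
    # ADVISOR participates, so ADVISOR is nested under STUDENT.
    if many_already_child and all_one_participate:
        return R1Mapping.ONE_UNDER_MANY
    # Case 1: every STUDENT participates, so STUDENT can be nested
    # under ADVISOR without losing any STUDENT instance.
    if all_many_participate and not many_already_child:
        return R1Mapping.MANY_UNDER_ONE
    # Case 2, condition 1 (assumption: requires every ADVISOR to
    # participate, otherwise some ADVISOR instances would be lost).
    if all_one_participate and not many_already_child:
        return R1Mapping.ONE_UNDER_MANY
    # Case 3: nesting would lose instances (or, by assumption, create
    # a second parent), so keep both top-level and use a reference.
    return R1Mapping.REFERENCE


if __name__ == "__main__":
    # Some students have no advisor, but every advisor has students.
    print(choose_r1_mapping(False, True).value)
```

The order of the checks matters in this sketch: the "already a child" test comes first so that STUDENT never acquires a second parent, which is the same concern the combination rule addresses.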
Combination Rule
The combination rule is applied to the result obtained from the application of the object and relationship relation rules, and generates the final ORA-SS schema.
Example:
  PERSON(SSNO, RACE)        // object relation
  STUDENT(S#, SSNO, MAJOR)  // object relation
  DEPT(D#, DNAME)           // object relation
  STU_DEPT(S#, D#)          // relationship relation
STUDENT ISA PERSON, and one DEPT has many STUDENTs. In this case, STUDENT potentially has multiple parents (i.e., DEPT and PERSON).

Combination Rule (cont.)
Current solution: use references (K. Williams, et al., January 2001).
– It causes too many disconnected elements.
  <!ELEMENT Results (PERSON*, STUDENT*, DEPT*)>
  <!ELEMENT PERSON (EMPTY)>
  <!ATTLIST PERSON
      SSNO     ID    #REQUIRED
      RACE     CDATA #IMPLIED
      STU_REF1 IDREF #REQUIRED>
  <!ELEMENT STUDENT (EMPTY)>
  <!ATTLIST STUDENT
      S#    ID    #REQUIRED
      MAJOR CDATA #IMPLIED>
  <!ELEMENT DEPT (EMPTY)>
  <!ATTLIST DEPT
      D#       ID     #REQUIRED
      DNAME    CDATA  #IMPLIED
      STU_REF2 IDREFS #REQUIRED>

Combination Rule (cont.)
Our approach: translations are produced sequentially according to their priorities. The translation with the lowest priority is carried out last.
The priorities of translations (in descending order):
1. ISA and other semantic relationship relations, and their fragments
   // high semantic cohesion among the participating object classes
2. 1-1 and 1-m relationship relations and their fragments
   // potentially represented as a hierarchical (parent-child) structure
3. m-1 relationship relations and their fragments
   // potentially represented as a hierarchical structure; preferably viewed as 1-m
4. m-n and n-ary relationship relations and their fragments
This rule is used to avoid or reduce potential multiple parents.

Combination Rule (cont.)
[Figure: ORA-SS schema diagram. PERSON (SSNO, RACE) has child STUDENT (S#, MAJOR) under ISA, 2,1:?,1:1; DEPT (D#, DNAME) has child STUDENT1, which refers to STUDENT through D_S_REF.]

We map STUDENT to a child object class of PERSON first, then map DEPT according to the 1-m relationship relation rule. Thus, we may get the following result.
  <!ELEMENT OurSolution (PERSON*, DEPT*)>
  <!ELEMENT PERSON (STUDENT)>
  <!ATTLIST PERSON
      SSNO ID    #REQUIRED
      RACE CDATA #IMPLIED>
  <!ELEMENT STUDENT (EMPTY)>
  <!ATTLIST STUDENT
      S#    ID    #REQUIRED
      MAJOR CDATA #IMPLIED>
  <!ELEMENT DEPT (EMPTY)>
  <!ATTLIST DEPT
      D#      ID     #REQUIRED
      DNAME   CDATA  #IMPLIED
      D_S_REF IDREFS #REQUIRED>

A possible ORA-SS Schema diagram derived from university database
Object Relations:
  COURSE(CODE, TITLE)
  DEPT(D#, DNAME)
  STUDENT(S#, SNAME)
  TUTORIAL(T#, TUTORIALTITLE)
Relationship Relations:
  STUDENTDEPT(S#, D#)
  C_S(CODE, S#, GRADE)
  ATTEND(CODE, T#, S#)
Fragment of Object Relations:
  HOBBIES(S#, HOBBY)
Fragment of Relationship Relations:
  COURSEMEETING(CODE, S#, MEETINGHISTORY)   (fragment of C_S)

[Figure: ORA-SS schema diagram. Top-level object classes COURSE (CODE, TITLE), STUDENT (S#, SNAME, HOBBY*), DEPT (D#, DNAME) and TUTORIAL (T#, TUTORIALTITLE). COURSE has child STUDENT1 under C_S (2, 1:n, 1:n), with reference C_S_REF and relationship attributes GRADE and MEETINGHISTORY* (fragment of C_S); STUDENT1 has child TUTORIAL1 under ATTEND (3, 1:n, 1:n), with reference T_REF. DEPT has child STUDENT2 under STUDENTDEPT (2, 0:n, 1:1), with reference D_S_REF.]

Algorithm: Mapping ORA-SS Schema Diagram to XML DTD
Input: an ORA-SS schema diagram SD
Output: an XML DTD
Begin
Start from the top of SD and proceed downward; for each object class O encountered do:
Step 1. Sub-object classes of O:
    <!ELEMENT O (subelementsList)>
Step 2. For each attribute A of O:
  Case (1) A is a single-valued simple attribute:
    <!ATTLIST O A type>
  Case (2) A is a single-valued composite attribute: replace A with its components and add them to
    <!ATTLIST O attributename type>
  Case (3) A is a multivalued simple attribute:
    <!ELEMENT A (#PCDATA)>
  Case (4) A is a multivalued composite attribute:
    <!ELEMENT A (EMPTY)>
    A’s components: <!ATTLIST A componentName type>
Step 3.
For each relationship attribute A under O, add A to subelementsList in <!ELEMENT O (subelementsList)>.
  Case (1) A is a simple attribute:
    <!ELEMENT A (#PCDATA)>
  Case (2) A is a composite attribute:
    <!ELEMENT A (EMPTY)>
    A’s components: <!ATTLIST A componentName type>

The obtained XML structures (DTD)
  <!ELEMENT UNIVERSITY (COURSE*, STUDENT*, DEPT*, TUTORIAL*)>
  <!ELEMENT COURSE (STUDENT1*)>
  <!ATTLIST COURSE
      CODE  ID    #REQUIRED
      TITLE CDATA #IMPLIED>
  <!ELEMENT STUDENT1 (MEETINGHIS*, TUTORIAL1*)>
  <!ATTLIST STUDENT1
      C_S_REF IDREF #REQUIRED
      GRADE   CDATA #IMPLIED>
  <!ELEMENT MEETINGHIS (#PCDATA)>
  <!ELEMENT TUTORIAL1 (EMPTY)>
  <!ATTLIST TUTORIAL1 T_REF IDREF #REQUIRED>
  <!ELEMENT STUDENT (HOBBIES*)>
  <!ATTLIST STUDENT
      S#    ID    #REQUIRED
      SNAME CDATA #IMPLIED>
  <!ELEMENT HOBBIES (#PCDATA)>
  <!ELEMENT DEPT (STUDENT2*)>
  <!ATTLIST DEPT
      D#    ID    #REQUIRED
      DNAME CDATA #IMPLIED>
  <!ELEMENT STUDENT2 (EMPTY)>
  <!ATTLIST STUDENT2 D_S_REF IDREF #IMPLIED>
  <!ELEMENT TUTORIAL (EMPTY)>
  <!ATTLIST TUTORIAL
      T#             ID    #REQUIRED
      TUTORIAL_TITLE CDATA #IMPLIED>

4. Comparison
• Richly structured and represents the real world accurately
    Yes: [7], this paper
    Partially: [3]
    No: [1, 5, 6]
• Representation of various relationship types and their attributes
    Yes: this paper
    Partially: [7]
    No: [1, 3, 5, 6]
• Number of disconnected elements
    Few: [7], this paper
    Many: naïve approaches
• Unnecessary redundancies
    Avoidable: this paper
    Partially: [3, 7]
    Many: [1, 5, 6]

5. Conclusion
The method proposed in this paper makes possible:
• The generation of semantically sound XML structures for relational data.
• The generation of properly structured XML data without unnecessary redundancies and proliferation of disconnected XML elements.

References
[1] S. Banerjee, et al., “Oracle 8i – The XML Enabled Data Management System”, Proc. 16th Int’l Conf. on Data Engineering, 2000
[2] G. Dobbie, X.Y. Wu, T.W. Ling, M.L. Lee, “ORA-SS: An Object-Relationship-Attribute Model for Semi-structured Data”, TR 21/00, NUS, 2001
[3] D.W. Lee, M. Mani, F. Chiu, W.W. Chu, “Nesting-based Relational-to-XML Schema Translation”, Proc. 4th Int’l Workshop on Web and Databases, 2001
[4] T.W. Ling, M.L. Lee, “Relational to Entity-Relationship Schema Translation Using Semantic and Inclusion Dependencies”, Journal of Integrated Computer-Aided Engineering, pages 125-145, 1995
[5] SYBASE, “Using XML with the Sybase Adaptive Server SQL Databases, A Technical Whitepaper”, http://www.sybase.com, 2000
[6] V. Turau, “Making Legacy Data Accessible for XML Applications”, http://www.informatik.fh-wiesbaden.de/~turau/veroeff.html, 1999
[7] K. Williams, et al., “XML Structures for Existing Databases”, http://www106.ibm.com/developerworks/library/x-struct/, January 2001
[8] W.Y. Du, M.L. Lee, T.W. Ling, “XML Structures for Relational Data”, Proc. 2nd Int’l Conf. on Web Information Systems Engineering (WISE), IEEE Computer Society, 2001