The ORA-SS Approach for Designing Semistructured Databases
Xiaoying Wu, Tok Wang Ling, Mong Li Lee
National University of Singapore
Gillian Dobbie
University of Auckland, New Zealand

Outline
1. Motivation
2. Introduction to the ORA-SS (Object-Relationship-Attribute) Model
3. From ORA-SS to XML DTD
4. Normal form for ORA-SS schema diagram
5. Designing ORA-SS schema diagram into normal form
6. Comparison with related proposals
7. Summary

1. Motivation
Example 1.1: Redundancy in XML document

  <department>
    <name>cs</name>
    <professor>
      <staffnumber>12</staffnumber>
      <name>Smith</name>
      <course>
        <coursecode>230</coursecode>
        <title>Database</title>
      </course>
    </professor>
    <professor>
      <staffnumber>22</staffnumber>
      <name>Jones</name>
      <course>
        <coursecode>230</coursecode>
        <title>Database</title>
      </course>
    </professor>
  </department>

Example 1.1 (Cont.)
[Figure: (a) OEM database and (b) DataGuide for the department document. The course with course code 230 and title "database" appears once under each professor.]

Example 1.1 (Cont.): Corresponding ORA-SS instance diagram and schema diagram
[Figure: (a) ORA-SS instance diagram: department (name: cs) with professors (staff number: 12, name: Smith) and (staff number: 22, name: Jones), each with a nested course (course code: 230, title: Database). (b) Nested object classes in an ORA-SS schema diagram: department (name), relationship type 2, 1:n, 1:1 to professor (staff number, name), relationship type 2, 1:n, 1:n to course (course code, title).]

Example 1.1 (Cont.)
[Figure: A better designed ORA-SS schema diagram: department (name), relationship type 2, 1:n, 1:1 to professor (staff number, name), relationship type 2, 1:n, 1:n to course1, which carries only Course-Ref and references a separate object class course (course code, title).]
[Figure: ORA-SS instance diagram for the better design: department (name: C.S.) with professors (staff number: 12, name: Smith) and (staff number: 22, name: Jones); each has a course1 whose Course-Ref points to the single course (course code: 230, title: database).]

Example 1.2: Ambiguity in OEM database and its DataGuide
[Figure: An OEM database of projects (J1, J2, J3), members (M1, ...) and publications (Pub1, Pub2, Pub3), and its DataGuide: project (id, name), member (name, position), publication (number, title).]

Example 1.2 (Cont.): Ternary Relationship Type Representation
[Figure: (a) ORA-SS schema diagram: project (id, name), relationship type jm (2, +, +) to member (name, position), relationship type mp (3, 0:n, 1:m) to publication (number, title); mp is a ternary relationship type. (b) A data instance of (a) with projects j1, j2, j3, members m1, m2 and publications pub1, pub2, pub3. (c) DataGuide.]

Example 1.2 (Cont.): Binary Relationship Type Representation
[Figure: (a) ORA-SS schema diagram: project (id, name), relationship type jm (2, +, +) to member (name, position), relationship type mp (2, *, +) to publication (number, title); mp is a binary relationship type. (b) A data instance of (a). (c) DataGuide.]
Note that the DataGuide for this schema diagram is the same as for the previous schema diagram.

2. Introduction to ORA-SS Model
Four concepts:
– object classes
– relationship types
– attributes
– references
Four diagrams:
– schema diagram
– instance diagram
– functional dependency diagram
– inheritance diagram
Object Class
– attributes of object class
  • Single valued
  • Multi-valued
– ordering on object class
[Figure: Object class employee with attributes SSN, name, age and multivalued (*) hobby in an ORA-SS schema diagram.]

Relationship Type
– attributes of relationship type
  • Single valued
  • Multi-valued
– degree of n-ary relationship type
– participation constraints of objects in relationship type
– disjunctive relationship type
– recursive relationship type

Relationship type (Cont.)
[Figure: Representing a binary relationship type. (a) ORA-SS schema diagram: project (id, name), jm (2, +, +), member (name, position), mp (2, *, +), publication (number, title); mp is a binary relationship type. (b) A data instance of (a).]
[Figure: Representing a ternary relationship type. (a) ORA-SS schema diagram: project (id, name), jm (2, +, +), member (name, position), mp (3, 0:n, 1:m), publication (number, title); mp is a ternary relationship type. (b) A data instance of (a).]

Attributes
– key attribute and identifier
– composite attribute
– disjunctive attribute
– attribute with unknown structure (ANY)
– ordering on attribute
– attributes of object class / relationship type
– single-valued / multi-valued attribute
– fixed and default values of attribute
– derived attribute

Attributes (Cont.)
[Figure: Object classes with relationship type and attributes in an ORA-SS schema diagram: course and student linked by relationship type cs (2, 4:n, 3:8), with attributes including title, dept-prefix (D:CS), an ANY attribute, number, firstname, lastname, multivalued (*) hobby, and the cs attributes mark and grade.]
[Figure: Disjunctive attribute and relationship in an ORA-SS schema diagram: course (code, title, exam venue as a disjunction of lecture theatre and laboratory) with relationship type assign (2, 1:n, 1:1) to a disjunction of project and homework, with attributes topic, algorithm, number and deadline.]

References
[Figure: Referencing an object class in an ORA-SS schema diagram: course (code, title, exam venue as lecture theatre or laboratory, multivalued (+) text book), relationship type cs (2, 1:n, 1:m) to student1, which carries Student-Ref and the cs attribute grade; student1 references student (number, name (first name, last name), address).]

References (Cont.)
[Figure: Recursive relationship type in an ORA-SS schema diagram: course (code, title), relationship type cp (2, 0:5, 1:n) to prereq, which references course through course-prereq.]
[Figure: Symmetric relationship sets in an ORA-SS schema diagram: course (code, title), cs (2, +, +) to student1 (Student-Ref, cs attribute grade); student (number, name), cs (2, +, +) to course1 (Course-Ref, cs attribute grade).]

3. Mapping ORA-SS schema diagram to XML DTD
Algorithm 1: Mapping ORA-SS Schema Diagram to XML DTD
Input: an ORA-SS schema diagram SD
Output: an XML DTD
Begin
For each object class O in SD do:
Step 1. Generate <!ELEMENT O (subelementsList)>, where subelementsList lists the sub-object classes of O.
Step 2. For each attribute A of O:
  Case (1) A is a single-valued simple attribute: generate <!ATTLIST O A type>.
  Case (2) A is a single-valued composite attribute: replace A with its components and add each of them as <!ATTLIST O attributeName type>.
  Case (3) A is a multivalued simple attribute: generate <!ELEMENT A (#PCDATA)>.
  Case (4) A is a multivalued composite attribute: generate <!ELEMENT A EMPTY> and, for A's components, <!ATTLIST A componentName type>.
Step 3. For each relationship attribute A under O:
  Case (1) A is a single-valued simple attribute: generate <!ELEMENT A (#PCDATA)> and add A to O's subelementsList.
  Case (2) A is a multi-valued simple attribute: generate <!ELEMENT A (#PCDATA)> and add A to O's subelementsList.
  Case (3) A is a single-valued composite attribute: generate <!ELEMENT A (#PCDATA)> and, for A's components, <!ATTLIST A componentName type>.
  Case (4) A is a multi-valued composite attribute: generate <!ELEMENT A (#PCDATA)> and, for A's components, <!ATTLIST A componentName type>; add A to O's subelementsList.
Step 4. For each reference O-Ref:
  Case (1) O is a child object class of O1 and has no extra attributes or child object classes: generate <!ATTLIST O1 O-Ref IDREF(S)>.
  Case (2) O is a root object class, or it has nested attributes or child object classes: generate <!ATTLIST O O-Ref IDREF(S)>.
End

Example 3.1
[Figure: Referencing an object class in an ORA-SS schema diagram (the course / student1 / student diagram shown under References).]

Example 3.1 (Cont.): An XML DTD for the ORA-SS schema diagram

  <!ELEMENT course (textbook+, student1+)>
  <!ATTLIST course
      code            CDATA #REQUIRED
      title           CDATA #IMPLIED
      lecture-theater CDATA #IMPLIED
      laboratory      CDATA #IMPLIED>
  <!ELEMENT textbook (#PCDATA)>
  <!ELEMENT student1 (grade)>
  <!ATTLIST student1
      Student-Ref IDREF #REQUIRED>
  <!ELEMENT grade (#PCDATA)>
  <!ELEMENT student (name)>
  <!ATTLIST student
      number  ID    #REQUIRED
      address CDATA #IMPLIED>
  <!ELEMENT name EMPTY>
  <!ATTLIST name
      first-name CDATA #IMPLIED
      last-name  CDATA #IMPLIED>

4. Normal form for ORA-SS schema diagram
Observation: ORA-SS is similar to nested relations
– tree-like structure
– repeating groups or multiple occurrences of objects
For example, the nested relation corresponding to the following ORA-SS schema diagram is
  Dept (dept-name, course (code, title, student (number, s-name, grade)*)*)
[Figure: ORA-SS schema diagram: department (Dept name), relationship type 2, 1:n, 1:1 to course (code, title), relationship type cs (2, 1:n, 1:n) to student (number, s-name), with grade as an attribute of cs.]
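The correspondence between a nested ORA-SS schema diagram and its nested relation can be illustrated with a small sketch. This is only an illustration under simplifying assumptions, not part of the ORA-SS proposal: the class `ObjectClass` and the function `nested_relation` are hypothetical names, relationship attributes such as grade are simply listed with the child object class, and every nested object class is treated as a repeating group marked with `*`.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectClass:
    """Hypothetical, simplified stand-in for an object class in a schema tree."""
    name: str
    attributes: list                      # attribute names, in diagram order
    children: list = field(default_factory=list)  # nested ObjectClass nodes


def nested_relation(oc: ObjectClass, top: bool = True) -> str:
    """Render a schema tree in nested-relation notation.

    Each nested object class becomes a repeating group, written name(...)*.
    """
    parts = list(oc.attributes)
    parts += [nested_relation(child, top=False) for child in oc.children]
    body = f"{oc.name}({', '.join(parts)})"
    return body if top else body + "*"


# The department / course / student diagram from the observation above.
student = ObjectClass("student", ["number", "s-name", "grade"])
course = ObjectClass("course", ["code", "title"], [student])
dept = ObjectClass("Dept", ["dept-name"], [course])

print(nested_relation(dept))
# Dept(dept-name, course(code, title, student(number, s-name, grade)*)*)
```

The recursion mirrors the tree-like structure: the normal form conditions that follow are then stated on the nested relations obtained in this way.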
Objectives: To ensure that the set of nested relations corresponding to the ORA-SS schema diagram is in the normal form for sets of nested relations (NF-NR) [5,6].
We will define
– Object class normal form (O-NF)
– Relationship type normal form (R-NF)
– ORA-SS normal form schema (ORA-SS NF)

Defn: object class normal form (O-NF)
An object class O of an ORA-SS schema diagram is said to be in object class normal form (O-NF) if the nested relation constructed with O's single-valued attributes as its atomic attributes and O's multivalued attributes as its repeating groups is in NF-NR.

Example 4.1: Assume we have the following functional dependencies for the ORA-SS schema diagram: {S# → dept, dept → faculty}.
[Figure: ORA-SS schema diagram with a single object class staff (S#, dept, faculty).]
The corresponding nested relation for the schema diagram is Staff(s#, dept, faculty). It is not in 3NF, since faculty is transitively dependent on S#; hence the relation is not in NF-NR.
[Figure: A better designed ORA-SS schema diagram: faculty, relationship type 2, 1:n, 1:1 to dept, relationship type 2, 1:n, 1:1 to staff. The transitive functional dependency is removed.]

Defn: relationship type normal form (R-NF)
A relationship type R of an ORA-SS schema diagram D is said to be in relationship type normal form (R-NF) if the nested relation constructed with the identifiers of the participating object classes and R's atomic attributes as its atomic attributes, and R's multivalued attributes and composite attributes as its repeating groups, is in NF-NR.

Example 4.2: The ORA-SS schema diagram of this example attempts to show that a lecturer can teach all the courses using all the textbooks described in the curriculum, i.e., it should satisfy the MVD constraint: course-code →→ isbn | staff#.
[Figure: ORA-SS schema diagram: course (code, title), relationship type ct (2, 1:n, 1:n) to text (isbn, title), relationship type ctl (3, 1:n, 1:n) to lecturer (staff#, name, office).]
The nested relation for the relationship type ctl is ctl(course-code, isbn, staff#). It is not in 4NF, so it is not in NF-NR; hence the relationship type ctl is not in R-NF.
[Figure: A better design, in which the MVD is removed: course (code, title) with relationship type ct (2, 1:n, 1:n) to text (isbn, title) and relationship type cl (2, 1:n, 1:n) to lecturer (staff#, name, office).]

Defn: ORA-SS normal form schema
An ORA-SS schema diagram D is in normal form (NF) iff it satisfies the following conditions:
1. Every object class in D is in O-NF.
2. For every relationship type R in D:
   (a) R is in R-NF.
   (b) Case (1): R is a binary relationship type from object class A to object class B. Then B's attributes can stay with B only if R is a one-to-many or one-to-one binary relationship type from A to B. All the attributes of R (if any) should be attached to B.
       Case (2): R is an n-ary relationship type with n (n > 2) participating object classes O1, O2, …, On, and the path going downward from the top of D linking those object classes is /O1/O2/…/On. Then for each object class Oi (2 ≤ i ≤ n):
       (i) Oi should have an i-ary relationship Ri with its ancestors O1, O2, …, Oi-1.
       (ii) The attributes of Oi can stay with Oi only if the functional dependency Oi → O1, O2, …, Oi-1 can be derived from the functional dependency diagram for D. The attributes of Ri (if any) should be attached to Oi.
3. No relationship type is nested under another many-to-many or many-to-one binary or n-ary (n > 2) relationship type.
4. No relationship type can be derived from other relationship types in D.

Example 4.4: The ORA-SS schema diagram is not in NF if professor is also an employee in the department: the qualification of a professor can be derived from that of the employee, so this information will be repeated in the underlying database.
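Condition 2(b), case (1) of the definition above lends itself to a mechanical check. The sketch below is a minimal illustration rather than the authors' procedure. It assumes that a label such as "2, 1:n, 1:1" is read as degree, then the parent's participation constraint, then the child's, and that the shorthand symbols ?, * and + stand for 0:1, 0:n and 1:n. The names `BinaryRelationship`, `max_participation` and `may_keep_child_attributes` are hypothetical.

```python
from dataclasses import dataclass, field

# Assumed meaning of the shorthand participation constraints in the diagrams.
SHORTHAND = {"?": "0:1", "*": "0:n", "+": "1:n"}


def max_participation(constraint: str) -> float:
    """Upper bound of a constraint such as '1:n', '0:5' or '+'."""
    upper = SHORTHAND.get(constraint, constraint).split(":")[1]
    return float(upper) if upper.isdigit() else float("inf")


@dataclass
class BinaryRelationship:
    name: str
    parent: str
    child: str
    parent_constraint: str  # how many children each parent object has
    child_constraint: str   # how many parents each child object has
    child_attributes: list = field(default_factory=list)


def may_keep_child_attributes(rel: BinaryRelationship) -> bool:
    """Condition 2(b), case (1): the child's attributes may stay nested only
    if each child belongs to at most one parent (one-to-one or one-to-many).
    A child without attributes of its own (a pure reference) always passes."""
    return not rel.child_attributes or max_participation(rel.child_constraint) <= 1


# Example 1.1: department/professor is one-to-many; professor/course is many-to-many.
dp = BinaryRelationship("dp", "department", "professor", "1:n", "1:1",
                        ["staff number", "name"])
pc = BinaryRelationship("pc", "professor", "course", "1:n", "1:n",
                        ["course code", "title"])
pc_ref = BinaryRelationship("pc", "professor", "course1", "1:n", "1:n")

print(may_keep_child_attributes(dp))      # True
print(may_keep_child_attributes(pc))      # False
print(may_keep_child_attributes(pc_ref))  # True
```

Under these assumptions the nested professor/course design of Example 1.1 is flagged, while the version that nests only a reference object class passes.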
[Figure: Two ORA-SS schema diagrams for Example 4.4, one not in NF and one in NF. Both show department (name) with relationship types 2, 1:n, 1:1 to professor and to employee, and professor with grad student and research interests. employee has name, staff#, Qual. (degree, year) and multivalued job-history (company, j-date). In the diagram that is not in NF, professor repeats staff#, title and Qual. (degree, year); in the NF diagram, professor keeps title and a Staff-Ref to employee.]

5. Converting ORA-SS Schema Diagrams into Normal Form
Two approaches for designing semistructured databases:
Approach 1.
– Based on the users' requirements, draw up an initial ORA-SS schema diagram.
– Normalize the ORA-SS schema diagram to its normal form.
– Map it to an XML DTD or XML Schema.
Approach 2.
– Extract a schema from the instances using schema extraction techniques.
– Translate the schema into an ORA-SS schema diagram. Semantic enrichment is needed here, since not all the required semantics are available from the extracted schema.
– Convert the ORA-SS schema diagram into its normal form.
– Translate the NF ORA-SS schema diagram back to an XML DTD or XML Schema.
– Restructure the initial data instance to conform to the generated XML DTD or XML Schema.

Algorithm 2: Converting an ORA-SS schema diagram into an NF ORA-SS schema diagram
Input: an ORA-SS schema diagram SD and its functional dependency diagram.
Output: an NF ORA-SS schema diagram.
{
Step 1. Convert any non-O-NF object class to O-NF.
Step 2. Make each relationship type R satisfy R-NF.
Step 3. This step involves two sub-steps.
  (1) Construct diagrams for each object class with its attributes.
  (2) Represent each relationship type R. We make R satisfy item (b) of condition 2 as well as condition 3 of the NF definition by introducing referencing object classes, and by requiring each relationship type to start with an object class with attributes (i.e., a non-reference object class).
Step 4. Remove those relationship types, along with their associated attributes, that can be derived from other relationship types in the schema diagram, so as to satisfy condition 4 of the NF definition.
}

Example 5.1: There is a many-to-many binary relationship pc between professor and course, and a many-to-many binary relationship ct between course and textbook. The schema diagram is not in ORA-SS NF since it violates condition 3 of the NF definition.
[Figure: (a) Initial ORA-SS schema diagram: professor (staff#, name), relationship type pc (2, *, *) to course (code, title), relationship type ct (2, *, *) to textbook (ISBN, multivalued (+) author, title).]

Example 5.1 (Cont.)
Step 1. The three given object classes are already in O-NF.
Step 2. The two relationship types pc and ct are already in R-NF.
Step 3. (1) Generate three diagrams for the object classes with attributes.
[Figure: (b) Fragment diagrams for object classes: professor (staff#, name); course (code, title); textbook (ISBN, multivalued (+) author, title).]
Step 3 (Cont.). (2) Represent the binary relationship pc by creating a reference object class course1 referencing course, and nest course1 under professor.
[Figure: (c) Diagrams after representing relationship pc: professor (staff#, name), relationship type pc (2, *, *) to course1 (c-ref); course (code, title); textbook (ISBN, multivalued (+) author, title).]
Step 3 (Cont.). (2) Represent the binary relationship ct by creating a reference object class textbook1 referencing textbook, and nest textbook1 under course.
[Figure: (d) Final ORA-SS schema diagram, which is in NF: professor (staff#, name), pc (2, *, *) to course1 (c-ref); course (code, title), ct (2, *, *) to textbook1 (t-ref); textbook (ISBN, multivalued (+) author, title).]
Step 4. (Passed.) The schema generated is in NF.

Example 5.2.
There is a binary relationship cs between course and student, and a ternary relationship cst between course, student and tutor. The grade is an attribute of the binary relationship cs, and feedback is an attribute of the ternary relationship cst. The schema diagram is not in ORA-SS NF since it violates item (ii) of case (2) in condition 2(b) of the NF definition.
[Figure: (a) Initial ORA-SS schema diagram: course (cid, title), relationship type cs (2, 0:m, 0:n) to student (sid, name, age) with cs attribute grade, relationship type cst (3, 0:m, 0:n) to tutor (tid, name, interest) with cst attribute feedback. Some attributes are marked optional (?) or multivalued (*).]

Example 5.2 (Cont.)
Step 1. The three given object classes are already in O-NF.
Step 2. The two relationship types cs and cst are already in R-NF.
Step 3. (1) Generate three diagrams for the object classes with attributes.
[Figure: (b) Fragment diagrams for object classes: course (cid, title); student (sid, name, age); tutor (tid, name, interest).]
Step 3 (Cont.). (2) Represent the binary relationship cs. We create a reference object class student1 referencing student, and nest student1 under course. The relationship attribute grade is attached to student1.
[Figure: (c) Diagram representing binary relationship cs: course (cid, title), cs (2, 0:m, 0:n) to student1 (s-ref, cs attribute grade); student (sid, name, age); tutor (tid, name, interest).]
Step 3 (Cont.). (2) Represent the relationship cst. We create a reference object class tutor1 referencing tutor, and nest tutor1 under student1. The relationship attribute feedback is attached to tutor1.
[Figure: (d) Final ORA-SS schema diagram, which is in NF: course (cid, title), cs (2, 0:m, 0:n) to student1 (s-ref, cs attribute grade), cst (3, 0:m, 0:n) to tutor1 (t-ref, cst attribute feedback); student (sid, name, age); tutor (tid, name, interest).]
Step 4. (Passed.) The schema generated is in NF.

6. Comparison with Related Proposals
The first attempt to define a normal form for semistructured data [4]
– Defines a schema called S3-Graph, a labeled graph in which vertices correspond to objects and edges represent the object-subobject relationship. Its data instance is called a semistructured data graph.
– An S3-Graph cannot show the degree of an n-ary relationship type, nor can it distinguish between attributes of object classes and attributes of relationship types.
– Defines a dependency constraint called SS-dependency.
– Proposes S3-NF. An S3-Graph is in S3-NF if there is no transitive SS-dependency. Hence, only this kind of redundancy can be recognized by S3-NF.
– Presents two approaches to designing S3-NF databases:
  1. The decomposition method can remove an identified transitive SS-dependency and achieve S3-NF, but it may not be able to remove partial functional dependencies inside an entity type or object class, nor the redundancy resulting from over-nesting.
  2. The transformation of a normal form ER diagram into an S3-Graph. The result may not be unique and depends on the path constructed. Hence some results may not satisfy the application requirements or comply with the user's viewpoint.

The most recent proposal: XNF (XML Normal Form) [2]
– It mainly provides algorithms to translate a schema, represented in a conceptual model called a CM hypergraph, into a scheme-tree forest in XNF.
– A CM hypergraph has no concept of attribute (so there are too many objects) and no hierarchical structure.
– The given algorithms are non-deterministic and suffer from inefficiency.
– Adding new required information requires redesigning the schema.
– The algorithms generate a large number of solutions rather than verifying whether a semistructured schema is in normal form or not.
– ISA hierarchies are removed from the CM hypergraph before it is input to the algorithms.

The advantages of our proposal:
– 2-level design: incremental and iterative
  • First, identify the object classes and relationship types from the user requirements.
  • Then add attributes for the object classes and relationship types.
  In contrast, XNF requires all the needed information to be presented at once. Even a small change in the information requirements requires redesigning the whole schema.
– Preserves the hierarchical structure satisfying the users' requirements.
  In contrast, since a CM hypergraph has no hierarchy, XNF needs to generate many solutions. The approach fails when the user already has a hierarchical structure and wants to preserve it and verify whether the design is good or not.

7. Summary
The ORA-SS model helps to detect redundancy in semistructured data.
We need a normal form for ORA-SS, since ORA-SS schema diagrams may contain redundancies and suffer from considerable updating anomalies.
We define a normal form for ORA-SS schema diagrams. It ensures
– no unnecessary redundancy and
– no updating anomalies
for semistructured databases generated from the schema.
We present an algorithm for mapping an ORA-SS schema diagram into an XML DTD/Schema.
We give a design methodology and present a comprehensive algorithm for normalizing an ORA-SS schema diagram into its normal form. The steps presented can also be used as guidelines for designing semistructured databases using the ORA-SS model.
– As ORA-SS distinguishes objects from attributes, the design complexity is reduced.
– ORA-SS allows 2 levels of design: first object classes and relationship types, then add in attributes.
We show that the ORA-SS design approach outperforms other related proposals.

References
1. G. Dobbie, X.Y. Wu, T.W. Ling and M.L. Lee. ORA-SS: An Object-Relationship-Attribute Model for Semistructured Data. Technical Report TR21/00, School of Computing, National University of Singapore, 2000.
2. D.W. Embley and W.Y. Mok. Developing XML Documents with Guaranteed "Good" Properties. ER 2001.
3. R. Goldman and J. Widom. DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases. Proceedings of the Twenty-Third International Conference on Very Large Data Bases, pages 436-445, Athens, Greece, August 1997.
4. S.Y. Lee, M.L. Lee, T.W. Ling and L.A. Kalinichenko. Designing Good Semi-structured Databases. ER 1999: 131-145.
5. T.W. Ling. A Normal Form for Entity-Relationship Diagrams. Proc. 4th International Conference on Entity-Relationship Approach (1985).
6. T.W. Ling. A normal form for sets of not-necessarily normalized relations. In Proceedings of the 22nd Hawaii International Conference on System Sciences, pp. 578-586. United States: IEEE Computer Society Press, 1989.
7. X.Y. Wu, T.W. Ling, M.L. Lee and G. Dobbie. Designing Semistructured Databases Using ORA-SS Model. In Proceedings of the 2nd International Conference on Web Information Systems Engineering (WISE), IEEE Computer Society, Kyoto, Japan, December 2001.