Rainbow - Bridging XML and Relational Databases: Design, Implementation, and Evaluation

MQP Project Members: Tien Vu, Mirek Cymer, John Lee
MQP Advisor: Prof. Elke A. Rundensteiner, PhD
Sponsor: Verizon Laboratories Incorporated
04-19-2001


HTML vs. XML


XML Data Management by RDBMS

Microsoft, IBM, Informix, Oracle, ...

Advantages:
- Mature database tools available.
- Efficient query and analysis tools.
- Easy integration with existing business databases.

Issues:
- Mapping between XML and the relational model.
- Update propagation.
- Query translation and optimization.


Traditional System Architecture

[Diagram. Components: User, XML Query Engine, XML Manager, RDBMS. Flows: XML Query, XML, XML Data. Legend: Subsystem, Data.]


Motivation for Flexible Mapping

Query performance varies with how the data is mapped.

One table per element:

    Car
    iid  pid
    1    0
    …    …

    Make
    iid  pid  Value
    2    1    Ford
    …    …    …

    Model
    iid  pid  Value
    3    1    Mustang
    …    …    …

    Year
    iid  pid  Value
    3    1    2001
    …    …    …

Single table:

    car
    iid  pid  Make  Model    Year
    1    0    Ford  Mustang  2001
    …    …    …     …        …

Example queries:

    SELECT * FROM model;
    SELECT model FROM car WHERE make = 'Ford';


Rainbow Architecture

[Diagram. Components: User, XML Query Engine, Restructuring Subsystem, DTD Manager, XML Manager, RDBMS. Flows: XML Query, XML, DTD, XML Data. Legend: Subsystem, Data.]

Flexible mapping = fixed mapping + restructuring


Rainbow Restructuring Subsystem

[Diagram. Components: User, XML Query Engine, Restructuring Subsystem, DTD Manager, XML Manager. Flows: XML Query, XML, DTD. Legend: Subsystem, Data, Process.]


Rainbow Restructuring Subsystem

[Diagram. Same components as the previous slide, with the Restructuring Subsystem expanded into Mapping, Restructure Operator Library, and Restructurer. Legend: Subsystem, Data, Process.]


Restructuring Operator Library

The library contains the following operators:
- Pushup/Pushdown Attribute
- Pushup/Pushdown Nesting
- Rename Item/Rename Attribute
- SwitchNesting
- Split/Merge Nesting
- Reference/Dereference

An operator is composed of:
- DTD modifications
- Data changes


Pushup Attribute Operator

DTD Modifications:
[Diagram. In: A contains B, and B carries x. Out: x is pushed up from B to A.]

Data Changes:

    CREATE VIEW out.$A AS
      SELECT p.<all_columns>, c.$x
      FROM in.$A p, in.$B c
      WHERE c.pid = p.iid

    CREATE VIEW out.$B AS
      SELECT <all-columns-but-x>
      FROM in.$B


Instantiated Pushup Operator

DTD Modifications:
[Diagram. In: Car contains Model, and Model carries Value. Out: Value is pushed up from Model to Car.]

Data Changes:

    CREATE VIEW out.Car AS
      SELECT p.iid, p.pid, c.value
      FROM in.Car p, in.Model c
      WHERE c.pid = p.iid

    CREATE VIEW out.Model AS
      SELECT iid, pid
      FROM in.Model

Operator call:

    pushUpAttribute('Model', 'Value', 'Car', 'Model');


Mapping

A mapping is a sequence of instantiated operators. For example:

    1. pushUpAttribute('Model', 'Value', 'Car', 'Model');
    2. renameAttribute('Car', 'Value', 'Model');

Initial relations:

    Car
    iid  pid
    1    0
    …    …

    Model
    iid  pid  Value
    3    1    Mustang
    …    …    …

After step 1:

    Car
    iid  pid  Value
    1    0    Mustang
    …    …    …

    Model
    iid  pid
    3    1
    …    …

After step 2:

    Car
    iid  pid  Model
    1    0    Mustang
    …    …    …

    Model
    iid  pid
    3    1
    …    …


Rainbow Implementation

Development Tools:
- Java: Visual Café 4, Javadoc, JAVA2
- Oracle 8i, XML 4J, JDBC 1.2, SQL

Statistics of Class Implementation:

    Classes         Count  Share
    New (created)   17     39%
    Extended        19     43%
    Reused          8      18%
    Total           44


Screen Shots of Rainbow

[Three screenshots of Rainbow.]


Setup for Rainbow Evaluation

Experimental Database:
- Server: Oracle 8i on a PII 300MHz, 256MB, Microsoft NT Server
- Client: Pentium 233MHz, 128MB, Microsoft NT Workstation

Data:
- Designed a DTD
- Generated XML using IBM's XML-Generator

DTD content:

    <!ELEMENT one (two+)>
    <!ELEMENT two (three)>
    <!ELEMENT three (four)>
    <!ELEMENT four (five)>
    <!ELEMENT five (six)>
    <!ELEMENT six (seven)>
    <!ELEMENT seven EMPTY>
    <!ATTLIST seven attribute #REQUIRED>


Query Performance Evaluation

[Chart: Time of Join Query (data-size=22Mb). x-axis: # of pushUpAttribute (0 to 7). y-axis: Query Time (s), 0 to 40. Series: Time of Join Query.]


Overhead Cost

[Chart: Time of overhead (datasize=22Mb). x-axis: # of pushUpAttribute (0 to 7). y-axis: Restructure Time (s), 0 to 350. Series: overhead time.]


MQP Accomplishments

Technical accomplishments:
- Implemented a functional prototype system
- Confirmed the feasibility of the Rainbow architecture
- Designed an automated test bed
- Conducted preliminary experimental studies

Knowledge acquired:
- OO, Java, JDBC, SQL, RDBMS, XML, DTD
- Logistics of setting up experiments
- Teamwork, software engineering, and software reuse


Potential Future Work

- XML query translation to SQL
- Experiment with test plans and test beds to realize the full potential of the restructuring component.


Special thanks to:
- Prof. Elke A. Rundensteiner, Ph.D.
- Xin Zhang

Visit Rainbow at http://davis.wpi.edu/dsrg/TJM/

Project Members: Tien Vu, Mirek Cymer, John Lee
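
Supplementary sketch: trying out the Car/Model mapping

The Pushup Attribute and Mapping slides describe each operator's data change as a pair of view definitions, and a mapping as a chain of such operators. The script below is a minimal sketch of that idea, not the Rainbow implementation. Several things in it are assumptions made for illustration: it runs on SQLite through Python's standard sqlite3 module rather than Oracle 8i; the table-name prefixes s0, s1, s2 stand in for the in and out schemas on the slides; the functions push_up_attribute and rename_attribute and their parameters are hypothetical and do not reproduce the four-argument pushUpAttribute call shown on the slides; and the view used for renaming is a guess, since the slides do not show the SQL for renameAttribute.

```python
import sqlite3


def columns(conn, relation):
    """Return the column names of a table or view."""
    cur = conn.execute(f'SELECT * FROM "{relation}" LIMIT 0')
    return [d[0] for d in cur.description]


def push_up_attribute(conn, src, dst, parent, child, attr):
    """Move column `attr` from the child relation up to the parent relation.

    Follows the two view definitions on the Pushup Attribute slide.
    `src` and `dst` are table-name prefixes (an assumption of this sketch).
    """
    parent_cols = ", ".join(
        f'p."{c}" AS "{c}"' for c in columns(conn, f"{src}_{parent}")
    )
    child_cols = ", ".join(
        f'"{c}"' for c in columns(conn, f"{src}_{child}") if c != attr
    )
    # out.$A: all parent columns plus the attribute taken from the child
    conn.execute(
        f'CREATE VIEW "{dst}_{parent}" AS '
        f'SELECT {parent_cols}, c."{attr}" AS "{attr}" '
        f'FROM "{src}_{parent}" p, "{src}_{child}" c WHERE c.pid = p.iid'
    )
    # out.$B: all child columns except the attribute
    conn.execute(
        f'CREATE VIEW "{dst}_{child}" AS SELECT {child_cols} FROM "{src}_{child}"'
    )


def rename_attribute(conn, src, dst, relation, old, new, others=()):
    """Rename one column via a view (assumed SQL, not taken from the slides)."""
    cols = ", ".join(
        f'"{c}" AS "{new if c == old else c}"'
        for c in columns(conn, f"{src}_{relation}")
    )
    conn.execute(
        f'CREATE VIEW "{dst}_{relation}" AS SELECT {cols} FROM "{src}_{relation}"'
    )
    # Relations the operator does not touch are passed through unchanged.
    for other in others:
        conn.execute(
            f'CREATE VIEW "{dst}_{other}" AS SELECT * FROM "{src}_{other}"'
        )


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE s0_Car (iid INTEGER, pid INTEGER);
    CREATE TABLE s0_Model (iid INTEGER, pid INTEGER, Value TEXT);
    INSERT INTO s0_Car VALUES (1, 0);
    INSERT INTO s0_Model VALUES (3, 1, 'Mustang');
    """
)

# The mapping is the sequence of instantiated operators from the Mapping slide.
push_up_attribute(conn, "s0", "s1", parent="Car", child="Model", attr="Value")
rename_attribute(conn, "s1", "s2", "Car", old="Value", new="Model", others=["Model"])

car_columns = columns(conn, "s2_Car")
car_rows = conn.execute("SELECT * FROM s2_Car").fetchall()
model_rows = conn.execute("SELECT * FROM s2_Model").fetchall()
print(car_columns, car_rows, model_rows)
```

After both steps, s2_Car exposes the columns iid, pid, Model with the Mustang row, and s2_Model keeps only iid and pid, which matches the final tables on the Mapping slide. Because every step is a view over the previous one, the base tables are never rewritten; whether Rainbow materializes intermediate results is not stated on the slides.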