Design Issues in XML Databases Ref: Designing XML Databases by Mark Graves 1 Storing data in XML • XML can be stored as a flat file, in an object-oriented database or in a relational database. • Flat File Database: A flat file is a simple storage mechanism and does not provide indexed queries or easy modification of a document. • The easiest way to store XML data is a single flat file that stores the entire XML document. • The data in the file is accessible by a variety of text editors and xml tools (eg: parsers). • Flat file databases (consisting of many files) are a useful alternative. • In cases where the functionality of a traditional DBMS may not be necessary and when special-purpose operations like text search, or temporal or spatial operations are needed, flat file databases may be a better choice. 2 Flat-file Databases • Usually flat file databases will store large quantities of static data in a flat file database organized with one document per file. • Flat file databases have the following limitations: – quick access and indexing are difficult. – Long updates and recovery are not efficient. 3 Flat-file Databases • Splitting a document into several document fragments may be more efficient than storing an entire document in a file. • For example, in applications involving many transaction, each transaction can be stored in a file. • Another approach is to slice a document with an identifier for each slice. • A simple mechanism to link documents is to have a special type name, for example, “include” with attributes that provide the necessary information to link to another document. • Example: <include element=“transaction” id=“12345”/> These places where the document can be sliced into multiple document fragments are called “slice points”. When the slice points are already determined by the application, the appropriate element type may used directly, such as: <transaction id=“12345” In practice, this approach is too inefficient because of time needed to access each file individually. 4 Using a Relational Database to store XML data • Developing a relational schema for XML may be the most practical approach to integrating XML into a high-throughput enterprise. • There are three approaches that are used with the choice of a Relational Schema: • Fine-grained • Coarse-grained and • Medium-grained 5 Using a Relational Database to store XML data • Fine-grained approach: every construct in the document is given a unique identity in the relational database. • Every element, attribute and character data region can be individually accessed, modified or deleted with minimal effect on other document constructs. • This provides the most flexibility and ease of access both with the XML DBMS-specific operations as well as the traditional relational ones. • However, regenerating the entire document can be timeconsuming when large. 6 Using a Relational Database to store XML data Fine-grained Relational Schema -- Create fine-grained storage tables and constraints create table xdb_doc ( doc_id NUMBER(8) NOT NULL, name VARCHAR2(128) NOT NULL, root NUMBER(8) ); alter table xdb_doc add primary key (doc_id); create table xdb_ele ( doc_id NUMBER(8) NOT NULL, ele_id NUMBER(8) NOT NULL, parent_id NUMBER(8), tag VARCHAR2(32) NOT NULL ); alter table xdb_ele add primary key (doc_id, ele_id); create table xdb_attr ( doc_id NUMBER(8) attr_id NUMBER(8) ele_id NUMBER(8) name VARCHAR2(32) value VARCHAR2(255) ); NOT NULL, NOT NULL, NOT NULL, NOT NULL, 7 Using a Relational Database to store XML data create table xdb_child ( doc_id NUMBER(8) NOT NULL, ele_id NUMBER(8) NOT NULL, indx NUMBER(6) NOT NULL, child_class VARCHAR2(4) NOT NULL, -- ELE, STR, or TEXT child_id NUMBER(8) NOT NULL ); alter table xdb_child add primary key (doc_id, ele_id, indx); create table xdb_str ( doc_id NUMBER(8) NOT NULL, cdata_id NUMBER(8) NOT NULL, ele_id NUMBER(8) NOT NULL, value VARCHAR2(255) NOT NULL ); 8 Using a Relational Database to store XML data create table xdb_text ( doc_id NUMBER(8) NOT NULL, cdata_id NUMBER(8) NOT NULL, ele_id NUMBER(8) NOT NULL, value LONG NOT NULL ); alter table xdb_text add primary key (doc_id, cdata_id); -- Foreign Keys alter table xdb_doc add constraint fk_xdb_doc_root foreign key (doc_id, root) references xdb_ele (doc_id, ele_id); alter table xdb_ele add constraint fk_xdb_ele_doc_id foreign key (doc_id) references xdb_doc (doc_id); alter table xdb_attr add constraint fk_xdb_attr_doc_id foreign key (doc_id) references xdb_doc (doc_id); alter table xdb_attr add constraint fk_xdb_attr_ele_id foreign key (doc_id, ele_id) references xdb_ele (doc_id, ele_id); alter table xdb_child add constraint fk_xdb_child_doc_id foreign key (doc_id) references xdb_doc (doc_id); … 9 Using a Relational Database to store XML data -- Search for all elements with a given tag name. select doc_id, ele_id from xdb_ele where doc_id = 1 and tag = 'TagName'; -- Search for all elements with a given attribute name. select doc_id, attr_id from xdb_attr where doc_id = 1 and name = 'AttrName' and value = 'AttrValue'; -- Search for all elements with a given tag name that has a -- character data child consisting of a given string. select ele.doc_id, ele.ele_id from xdb_ele ele, xdb_str str where ele.doc_id = 1 and str.doc_id = ele.doc_id and ele.tag = 'TagName' and ele.ele_id = str.ele_id and str.value = 'SearchedForValue'; … 10 Using a Relational Database to store XML data Coarse-grained approach: In this approach, a document is stored in its entirety. • Even though it is similar to storing the documents in flat files, this approach has the advantage of allowing the documents to be referred to within other structures in the database. • In addition, it provides the security, recovery and other features of a DBMS in which it is stored. • Its particular usefulness is part of a hybrid representation. 11 Using a Relational Database to store XML data Coarse-grained approach: create table cgrel_doc ( doc_id NUMBER(8) NOT NULL, name VARCHAR2(128), body LONG ); alter table cgrel_doc add primary key (doc_id); 12 Using a Relational Database to store XML data Medium-grained approach: The fine-grained approach works well to perform tasks that access elements, whereas the tasks to store and retrieve an entire document are difficult. The coarse-grained approach works well in manipulating entire documents but has difficulty with the element manipulation. The medium-grained approach is a compromise between the fine and coarsegrained approaches. 13 Using a Relational Database to store XML data Medium-grained approach: The document tree can be sliced into sections where the sub-sections are stored with a coarsegrained approach. This is particularly useful if the sections are accessed individually: for example, in reference books such as dictionaries, a medium grained approach would be to store each entry separately. 14 Using a Relational Database to store XML data Medium-grained approach: Determining the slice points is a complex issue. • How many slice-points are created? • How many levels of slicing are created? • Does the slicing depend upon the element type name or the depth in the tree or the size of the document section? • Are some sections of the document sliced more finely than the other sections? 15 Using a Relational Database to store XML data Medium-grained approach: One way to approach the slicing granularity is to view slicing as a method to index the database. • An index speeds up access for a particular database request by creating an index table that provides quick navigation of the indexed information. • Slices ca be created on a element type name (s) for which frequent access is anticipated. • A combination of element type names and attribute values can also be used to drive the index slice method. • Choosing slice points based on desired indexing will reduce the data access time over the coarse-grained approach for queries or other accesses that involve the indexed tags. • Indexes can be created on a few highly requested element types. • If most of the directly accessed element type names are indexed then the access time for those queries approaches the access time under the fine-grained approach while also reducing the document regeneration time. 16 Using a Relational Database to store XML data Medium-grained approach: • Another way to approach the slicing granularity is to view the slicing method as a buffering mechanism. • Slices may be determined by physical characteristics, such as size of the slice. • The slice size can be chosen to efficiently use network communication protocols to reduce the time needed to retrieve a portion of the document when the network response is a critical factor. • Combinations of approaches may be used for particular applications or documents. 17 Using a Relational Database to store XML data Medium-grained approach: One point to be addressed is how to represent the slice points in the document. One mechanism is to create a specific element type to represent the necessary information, ensuring that the element type name is unique in the document. For example, an element type called “slice” or “proxy” could be created with attributes that contain sufficient information to reconstruct the document, namely “document-id” and “elementid”. 18 Using a Relational Database to store XML data Medium-grained approach: tables and constraints for medium-grained storage create table xdb_frag ( doc_id NUMBER(8) NOT NULL, frag_id NUMBER(8) NOT NULL, ele_id NUMBER(8) NOT NULL, body LONG NOT NULL ); alter table xdb_frag add primary key (doc_id, frag_id); create table xdb_frag_ref ( doc_id NUMBER(8) NOT NULL, frag_id NUMBER(8) NOT NULL, ele_ref NUMBER(8) NOT NULL ); alter table xdb_frag_ref add primary key (doc_id, frag_id, ele_ref); alter table xdb_frag add constraint fk_xdb_frag_doc_id foreign key (doc_id) references xdb_doc (doc_id); alter table xdb_frag add constraint fk_xdb_frag_ele_id foreign key (doc_id, ele_id) references xdb_ele (doc_id, ele_id); alter table xdb_frag_ref add constraint fk_xdb_frag_ref_doc_id foreign key (doc_id) references xdb_doc (doc_id); alter table xdb_frag_ref add constraint fk_xdb_frag_ref_ele_ref foreign key (doc_id, ele_ref) references xdb_ele (doc_id, ele_id); alter table xdb_frag_ref add constraint fk_xdb_frag_ref_frag_id foreign key (doc_id, frag_id) references xdb_frag (doc_id, frag_id); 19 Relational Data Server • A relational data server (combined with a web server) makes data from a relational DBMS available in XML. • It works by querying the database and formatting the report from the RDBMS as XML. • The basic process consists of the following steps: – The user specifies a relational query to the web browser as a URL. 20 Relational Data Server • The basic process consists of the following steps: – The user specifies a relational query to the web browser as a URL. – The web browser sends the URL to the data server. – The data server parses the URL request and creates a SQL query. – The data server passes the SQL query to the database server. – The database server executes the query. – The database server returns the relational report to the data server. – The data server formats the report as XML. – The data server returns the XML report to the Web browser. – The web browser parses the XML report and displays it to the user. When a stylesheet is used, it must be retrieved and parsed by the web browser. 21 Some Issues To be addressed • Can relational views be displayed in addition to tables? • Can data be updated? Or, is data read-only? • Can data be retrieved from multiple tables? • How are joins handled? • How complex can the mapping from relations to XML be? • Can helper tables or joining tables be recognized and handled differently? • Can foreign key constraints be followed? To what depth? • How are links between tables handled? • How expressive is the query facility? 22 • We will briefly examine a relational data server (implemented in java) called rServe. • Ref: http://www.xwdb.org/docs/XWDB_RServe_User_Guide.pdf • • Some examples of usage: http://127.0.0.1/servlets/com.xweave.xmldb.rserve.XMLServlet?tablename= customer&stylesheet=/ss/generic1.xsl – Which retrieves all the records from the customer table, then formats the data using the stylesheet http://127.0.0.1/ss/generic.xsl. 23 Creating a SQL Query • The relational data server uses the parameters of the URL to build a SQL statement. • Example: • Select * from tablename; • Much of the functionality of a SQL statement can be encloded as a URL. • The where condition is given by pre-pending the condition to the value in a name-value pair. • Example: • …&pno=‘p1’&quantity=<=20& 24 Formatting a Report as XML • One way to format data as XML is to use the table names and column names to create the element types. • Information on primary keys and foreign keys can be used to create a document hierarchy. • An issue to address about element naming is: – Special characters may be used in table and column names that are not valid as xml element names. 25 Extracting Dictionary Data • A commercial relational DBMS provides information in its catalog (Data Dictionary) consisting of system tables. • The system tables contain information about the “metadata” – primary keys, foreign keys and other integrity constraints that are defined on the data. • The data in a RDBMS can be mapped to XML as a tree of parent/child relationships (represented using primary and foreign keys). 26