Flat File Database

Design Issues in XML Databases
Ref: Designing XML Databases by Mark Graves
Storing data in XML
• XML can be stored as a flat file, in an object-oriented
database or in a relational database.
• Flat File Database: A flat file is a simple storage
mechanism and does not provide indexed queries or easy
modification of a document.
• The easiest way to store XML data is a single flat file that
stores the entire XML document.
• The data in the file is accessible by a variety of text editors
and XML tools (e.g., parsers).
• Flat file databases (consisting of many files) are a useful
alternative.
• In cases where the functionality of a traditional DBMS may
not be necessary and when special-purpose operations
like text search, or temporal or spatial operations are
needed, flat file databases may be a better choice.
Flat-file Databases
• Flat file databases usually store large
quantities of static data, organized with
one document per file.
• Flat file databases have the following
limitations:
– Quick access and indexing are difficult.
– Long updates and recovery are not efficient.
• Splitting a document into several document fragments may be more efficient
than storing an entire document in a file.
• For example, in applications involving many transactions, each transaction
can be stored in its own file.
• Another approach is to slice a document, with an identifier for each slice.
• A simple mechanism to link documents is to have a special element type name,
for example “include”, with attributes that provide the information needed to
link to another document.
• Example:
<include element="transaction" id="12345"/>
The places where the document can be sliced into multiple document
fragments are called “slice points”.
When the slice points are already determined by the application, the appropriate
element type may be used directly, such as:
<transaction id="12345"/>
In practice, this approach is too inefficient because of the time needed to access
each file individually.
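As a rough illustration of the linking mechanism above, the following Python sketch replaces each "include" element with the fragment it points to. The `FRAGMENTS` dictionary, the `ledger` document and both function names are assumptions made for this example; a real flat-file store would read each fragment from its own file.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment store standing in for one-file-per-fragment storage:
# keys are (element type name, id), values are the XML text of each fragment.
FRAGMENTS = {
    ("transaction", "12345"):
        '<transaction id="12345"><amount>10.00</amount></transaction>',
}

def load_fragment(element, frag_id):
    """Return the parsed fragment for a slice point (from a dict, not a file)."""
    return ET.fromstring(FRAGMENTS[(element, frag_id)])

def resolve_includes(parent):
    """Replace every <include element=... id=.../> child with its fragment."""
    for index, child in enumerate(list(parent)):
        if child.tag == "include":
            fragment = load_fragment(child.get("element"), child.get("id"))
            fragment.tail = child.tail  # keep any text that followed the slice point
            parent[index] = fragment
            child = fragment
        resolve_includes(child)  # fragments may contain further slice points
    return parent

ledger = ET.fromstring('<ledger><include element="transaction" id="12345"/></ledger>')
resolved = ET.tostring(resolve_includes(ledger), encoding="unicode")
```

Each call to `load_fragment` corresponds to one file access, which is where the per-file cost mentioned above comes from.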
Using a Relational Database to store XML data
• Developing a relational schema for XML
may be the most practical approach to
integrating XML into a high-throughput
enterprise.
• There are three approaches to choosing a
relational schema:
– Fine-grained
– Coarse-grained
– Medium-grained
• Fine-grained approach: every construct in the
document is given a unique identity in the relational
database.
• Every element, attribute and character data region can
be individually accessed, modified or deleted with
minimal effect on other document constructs.
• This provides the most flexibility and ease of access both
with the XML DBMS-specific operations as well as the
traditional relational ones.
• However, regenerating the entire document can be time-consuming when the document is large.
Fine-grained Relational Schema
-- Create fine-grained storage tables and constraints
create table xdb_doc (
    doc_id     NUMBER(8)      NOT NULL,
    name       VARCHAR2(128)  NOT NULL,
    root       NUMBER(8)
);
alter table xdb_doc add primary key (doc_id);
create table xdb_ele (
    doc_id     NUMBER(8)      NOT NULL,
    ele_id     NUMBER(8)      NOT NULL,
    parent_id  NUMBER(8),
    tag        VARCHAR2(32)   NOT NULL
);
alter table xdb_ele add primary key (doc_id, ele_id);
create table xdb_attr (
    doc_id     NUMBER(8)      NOT NULL,
    attr_id    NUMBER(8)      NOT NULL,
    ele_id     NUMBER(8)      NOT NULL,
    name       VARCHAR2(32)   NOT NULL,
    value      VARCHAR2(255)
);
create table xdb_child (
    doc_id       NUMBER(8)      NOT NULL,
    ele_id       NUMBER(8)      NOT NULL,
    indx         NUMBER(6)      NOT NULL,
    child_class  VARCHAR2(4)    NOT NULL, -- ELE, STR, or TEXT
    child_id     NUMBER(8)      NOT NULL
);
alter table xdb_child add primary key (doc_id, ele_id, indx);
create table xdb_str (
    doc_id       NUMBER(8)      NOT NULL,
    cdata_id     NUMBER(8)      NOT NULL,
    ele_id       NUMBER(8)      NOT NULL,
    value        VARCHAR2(255)  NOT NULL
);
create table xdb_text (
    doc_id     NUMBER(8)  NOT NULL,
    cdata_id   NUMBER(8)  NOT NULL,
    ele_id     NUMBER(8)  NOT NULL,
    value      LONG       NOT NULL
);
alter table xdb_text add primary key (doc_id, cdata_id);
-- Foreign Keys
alter table xdb_doc add constraint fk_xdb_doc_root
foreign key (doc_id, root) references xdb_ele (doc_id, ele_id);
alter table xdb_ele add constraint fk_xdb_ele_doc_id
foreign key (doc_id) references xdb_doc (doc_id);
alter table xdb_attr add constraint fk_xdb_attr_doc_id
foreign key (doc_id) references xdb_doc (doc_id);
alter table xdb_attr add constraint fk_xdb_attr_ele_id
foreign key (doc_id, ele_id) references xdb_ele (doc_id, ele_id);
alter table xdb_child add constraint fk_xdb_child_doc_id
foreign key (doc_id) references xdb_doc (doc_id);
…
-- Search for all elements with a given tag name.
select doc_id, ele_id
from xdb_ele
where doc_id = 1
and tag = 'TagName';
-- Search for all elements that have a given attribute name and value.
select doc_id, ele_id, attr_id
from xdb_attr
where doc_id = 1
and name = 'AttrName'
and value = 'AttrValue';
-- Search for all elements with a given tag name that has a
-- character data child consisting of a given string.
select ele.doc_id, ele.ele_id
from xdb_ele ele, xdb_str str
where ele.doc_id = 1
and str.doc_id = ele.doc_id
and ele.tag = 'TagName'
and ele.ele_id = str.ele_id
and str.value = 'SearchedForValue';
…
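To make the fine-grained idea concrete, here is a minimal Python sketch that shreds a small document into simplified versions of the xdb_ele, xdb_attr and xdb_str tables and then runs the tag-plus-text search. It uses an in-memory SQLite database with SQLite column types in place of the Oracle types above, keeps only the text at the start of each element, and omits xdb_child ordering; the sample order document and the `shred` helper are assumptions for illustration only.

```python
import sqlite3
import xml.etree.ElementTree as ET
from itertools import count

# Simplified SQLite stand-ins for three of the fine-grained tables.
SCHEMA = """
create table xdb_ele  (doc_id integer not null, ele_id integer not null,
                       parent_id integer, tag text not null,
                       primary key (doc_id, ele_id));
create table xdb_attr (doc_id integer not null, attr_id integer not null,
                       ele_id integer not null, name text not null, value text);
create table xdb_str  (doc_id integer not null, cdata_id integer not null,
                       ele_id integer not null, value text not null);
"""

def shred(conn, doc_id, xml_text):
    """Store each element, attribute and leading text run as its own row."""
    ids = count(1)  # one id sequence shared by all constructs of the document

    def visit(element, parent_id):
        ele_id = next(ids)
        conn.execute("insert into xdb_ele values (?, ?, ?, ?)",
                     (doc_id, ele_id, parent_id, element.tag))
        for name, value in element.attrib.items():
            conn.execute("insert into xdb_attr values (?, ?, ?, ?, ?)",
                         (doc_id, next(ids), ele_id, name, value))
        text = (element.text or "").strip()
        if text:  # text after child elements (tail text) is ignored in this sketch
            conn.execute("insert into xdb_str values (?, ?, ?, ?)",
                         (doc_id, next(ids), ele_id, text))
        for child in element:
            visit(child, ele_id)

    visit(ET.fromstring(xml_text), None)

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
shred(conn, 1, '<order id="7"><item sku="p1">bolt</item><item sku="p2">nut</item></order>')

# Same shape as the third query above: a given tag with a given text child.
rows = conn.execute(
    """select ele.doc_id, ele.ele_id
       from xdb_ele ele, xdb_str str
       where ele.doc_id = 1 and str.doc_id = ele.doc_id
         and ele.tag = 'item' and ele.ele_id = str.ele_id
         and str.value = 'nut'""").fetchall()
```

Three elements, three attributes and two text runs become eight rows, which hints at why regenerating a large document from this representation is expensive.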
Coarse-grained approach: In this approach, a
document is stored in its entirety.
• Even though it is similar to storing the documents
in flat files, this approach has the advantage of
allowing the documents to be referred to within
other structures in the database.
• In addition, it provides the security, recovery and
other features of a DBMS in which it is stored.
• It is particularly useful as part of a hybrid
representation.
Coarse-grained approach:
create table cgrel_doc (
    doc_id  NUMBER(8)      NOT NULL,
    name    VARCHAR2(128),
    body    LONG
);
alter table cgrel_doc add primary key (doc_id);
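The trade-off of the coarse-grained table can be seen in a short Python sketch: storing a document is a single insert, but any element access has to fetch and parse the whole body. SQLite and its TEXT type stand in for Oracle and LONG here, and the helper names and sample document are assumptions for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# SQLite stand-in for cgrel_doc; TEXT replaces Oracle's LONG (an assumption).
conn.execute("""create table cgrel_doc (
                    doc_id integer not null primary key,
                    name   text,
                    body   text)""")

def store_document(doc_id, name, xml_text):
    """Store the whole document as one value; no per-element rows are created."""
    conn.execute("insert into cgrel_doc values (?, ?, ?)", (doc_id, name, xml_text))

def find_text(doc_id, path):
    """Element access means fetching and parsing the entire body first."""
    (body,) = conn.execute("select body from cgrel_doc where doc_id = ?",
                           (doc_id,)).fetchone()
    return [element.text for element in ET.fromstring(body).findall(path)]

store_document(1, "order-7", "<order><item>bolt</item><item>nut</item></order>")
items = find_text(1, "item")
```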
Medium-grained approach:
The fine-grained approach works well for
tasks that access individual elements,
but storing and retrieving an entire
document is time-consuming.
The coarse-grained approach works well for
manipulating entire documents but has
difficulty with element manipulation.
The medium-grained approach is a
compromise between the fine-grained and
coarse-grained approaches.
Medium-grained approach: The document
tree can be sliced into sections, and each
section is stored with a coarse-grained
approach. This is particularly
useful if the sections are accessed
individually: for example, in reference
books such as dictionaries, a medium-grained
approach would be to store each
entry separately.
Medium-grained approach:
Determining the slice points is a complex
issue.
• How many slice-points are created?
• How many levels of slicing are created?
• Does the slicing depend upon the element
type name or the depth in the tree or the
size of the document section?
• Are some sections of the document sliced
more finely than the other sections?
Medium-grained approach:
One way to approach the slicing granularity is to view slicing as a
method to index the database.
• An index speeds up access for a particular database request by
creating an index table that provides quick navigation of the indexed
information.
• Slices can be created on the element type name(s) for which frequent
access is anticipated.
• A combination of element type names and attribute values can also
be used to drive the index slice method.
• Choosing slice points based on desired indexing will reduce the data
access time over the coarse-grained approach for queries or other
accesses that involve the indexed tags.
• Indexes can be created on a few highly requested element types.
• If most of the directly accessed element type names are indexed
then the access time for those queries approaches the access time
under the fine-grained approach while also reducing the document
regeneration time.
Medium-grained approach:
• Another way to approach the slicing granularity is
to view the slicing method as a buffering
mechanism.
• Slices may be determined by physical
characteristics, such as size of the slice.
• The slice size can be chosen to efficiently use
network communication protocols to reduce the
time needed to retrieve a portion of the
document when the network response is a
critical factor.
• Combinations of approaches may be used for
particular applications or documents.
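One possible size-driven policy, sketched in Python: walk down the tree and make a slice point of every subtree that fits within a byte budget. The budget value, the sample book document and the function names are assumptions for this example; the source does not prescribe a particular rule.

```python
import xml.etree.ElementTree as ET

def serialized_size(element):
    """Size in bytes of the element's UTF-8 serialization."""
    return len(ET.tostring(element, encoding="unicode").encode("utf-8"))

def size_slice_points(element, max_bytes):
    """Return the highest subtrees that fit in max_bytes.

    A leaf that is larger than the budget is still returned, because it
    cannot be divided any further by element boundaries.
    """
    if serialized_size(element) <= max_bytes or len(element) == 0:
        return [element]
    points = []
    for child in element:
        points.extend(size_slice_points(child, max_bytes))
    return points

book = ET.fromstring(
    "<book><chapter><p>aaaa</p><p>bbbb</p></chapter>"
    "<chapter><p>cc</p></chapter></book>")
points = size_slice_points(book, max_bytes=30)
slice_tags = [element.tag for element in points]
```

With a 30-byte budget the first chapter (41 bytes) is split into its paragraphs while the second chapter (28 bytes) is kept whole, so different parts of the document end up sliced at different depths.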
Medium-grained approach:
One point to be addressed is how to represent the
slice points in the document.
One mechanism is to create a specific element
type to represent the necessary information,
ensuring that the element type name is unique in
the document.
For example, an element type called “slice” or
“proxy” could be created with attributes that
contain sufficient information to reconstruct the
document, namely “document-id” and “element-id”.
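A minimal Python sketch of this proxy mechanism follows. It cuts out every subtree with a chosen tag, leaves a "proxy" element carrying "document-id" and "element-id" in its place, and shows that the document can be rebuilt from the skeleton plus the fragments. The in-memory `fragments` dictionary, the id numbering and the dictionary sample are assumptions for illustration; the sketch also assumes that no element in the document is already named "proxy".

```python
import xml.etree.ElementTree as ET
from itertools import count

def slice_document(root, doc_id, slice_tags):
    """Cut subtrees whose tag is in slice_tags out of root.

    Returns {element-id: fragment XML text}. Each removed subtree is
    replaced in place by <proxy document-id=... element-id=.../>.
    """
    fragments = {}
    ids = count(1)

    def visit(parent):
        for index, child in enumerate(list(parent)):
            visit(child)  # slice nested slice points first
            if child.tag in slice_tags:
                ele_id = str(next(ids))
                proxy = ET.Element("proxy", {"document-id": str(doc_id),
                                             "element-id": ele_id})
                proxy.tail, child.tail = child.tail, None
                fragments[ele_id] = ET.tostring(child, encoding="unicode")
                parent[index] = proxy

    visit(root)
    return fragments

def rebuild(root, fragments):
    """Inverse of slice_document: swap each proxy for its stored fragment."""
    for index, child in enumerate(list(root)):
        if child.tag == "proxy":
            fragment = ET.fromstring(fragments[child.get("element-id")])
            fragment.tail = child.tail
            root[index] = child = fragment
        rebuild(child, fragments)
    return root

original = ("<dictionary><entry><term>a</term></entry>"
            "<entry><term>b</term></entry></dictionary>")
skeleton = ET.fromstring(original)
fragments = slice_document(skeleton, doc_id=1, slice_tags={"entry"})
skeleton_xml = ET.tostring(skeleton, encoding="unicode")
rebuilt_xml = ET.tostring(rebuild(skeleton, fragments), encoding="unicode")
```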
Medium-grained approach:
-- Create tables and constraints for medium-grained storage
create table xdb_frag (
    doc_id   NUMBER(8)  NOT NULL,
    frag_id  NUMBER(8)  NOT NULL,
    ele_id   NUMBER(8)  NOT NULL,
    body     LONG       NOT NULL
);
alter table xdb_frag add primary key (doc_id, frag_id);
create table xdb_frag_ref (
    doc_id   NUMBER(8)  NOT NULL,
    frag_id  NUMBER(8)  NOT NULL,
    ele_ref  NUMBER(8)  NOT NULL
);
alter table xdb_frag_ref add primary key (doc_id, frag_id, ele_ref);
alter table xdb_frag add constraint fk_xdb_frag_doc_id
foreign key (doc_id) references xdb_doc (doc_id);
alter table xdb_frag add constraint fk_xdb_frag_ele_id
foreign key (doc_id, ele_id) references xdb_ele (doc_id, ele_id);
alter table xdb_frag_ref add constraint fk_xdb_frag_ref_doc_id
foreign key (doc_id) references xdb_doc (doc_id);
alter table xdb_frag_ref add constraint fk_xdb_frag_ref_ele_ref
foreign key (doc_id, ele_ref) references xdb_ele (doc_id, ele_id);
alter table xdb_frag_ref add constraint fk_xdb_frag_ref_frag_id
foreign key (doc_id, frag_id) references xdb_frag (doc_id, frag_id);
Relational Data Server
• A relational data server (combined with a
web server) makes data from a relational
DBMS available in XML.
• It works by querying the database and
formatting the report from the RDBMS as
XML.
• The basic process consists of the following steps:
– The user specifies a relational query to the web browser as a
URL.
– The web browser sends the URL to the data server.
– The data server parses the URL request and creates a SQL
query.
– The data server passes the SQL query to the database server.
– The database server executes the query.
– The database server returns the relational report to the data
server.
– The data server formats the report as XML.
– The data server returns the XML report to the Web browser.
– The web browser parses the XML report and displays it to the
user.
When a stylesheet is used, it must be retrieved and parsed by the
web browser.
Some Issues To be addressed
• Can relational views be displayed in addition to
tables?
• Can data be updated? Or, is data read-only?
• Can data be retrieved from multiple tables?
• How are joins handled?
• How complex can the mapping from relations to
XML be?
• Can helper tables or joining tables be recognized
and handled differently?
• Can foreign key constraints be followed? To what
depth?
• How are links between tables handled?
• How expressive is the query facility?
• We will briefly examine a relational data
server (implemented in Java) called
rServe.
• Ref: http://www.xwdb.org/docs/XWDB_RServe_User_Guide.pdf
• An example of usage:
http://127.0.0.1/servlets/com.xweave.xmldb.rserve.XMLServlet?tablename=customer&stylesheet=/ss/generic1.xsl
– This retrieves all the records from the customer table, then
formats the data using the stylesheet
http://127.0.0.1/ss/generic1.xsl.
Creating a SQL Query
• The relational data server uses the parameters
of the URL to build a SQL statement.
• Example:
• Select * from tablename;
• Much of the functionality of a SQL statement can
be encoded in a URL.
• A where condition is given by prepending the
comparison operator to the value in a name-value pair.
• Example:
• …&pno='p1'&quantity=<=20&
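The URL-to-SQL step can be sketched in Python as below. This is not rServe's actual parser: the set of reserved parameter names, the operator list, the `shipment` table name and the use of bind variables are all assumptions for this example.

```python
import re
from urllib.parse import parse_qsl

RESERVED = {"tablename", "stylesheet"}  # parameters that are not conditions
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")
OPERATOR = re.compile(r"(<=|>=|<>|!=|<|>|=)?(.*)\Z", re.S)

def build_query(query_string):
    """Translate URL name-value pairs into (sql, bind_values)."""
    params = parse_qsl(query_string)  # splits each pair on its first '='
    table = dict(params)["tablename"]
    if not IDENTIFIER.match(table):
        raise ValueError("invalid table name")
    conditions, values = [], []
    for name, raw in params:
        if name in RESERVED:
            continue
        if not IDENTIFIER.match(name):
            raise ValueError("invalid column name: " + name)
        # An operator prepended to the value becomes the comparison; default is '='.
        operator, value = OPERATOR.match(raw).groups()
        conditions.append(f"{name} {operator or '='} ?")
        values.append(value.strip("'"))  # drop optional SQL-style quotes
    sql = f"select * from {table}"
    if conditions:
        sql += " where " + " and ".join(conditions)
    return sql, values

sql, values = build_query("tablename=shipment&pno='p1'&quantity=<=20")
```

Values are passed as bind variables rather than pasted into the statement, and table and column names are checked against a strict pattern, since everything in the URL is supplied by the user.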
Formatting a Report as XML
• One way to format data as XML is to use
the table names and column names to
create the element types.
• Information on primary keys and foreign
keys can be used to create a document
hierarchy.
• An issue to address about element naming
is:
– Table and column names may contain special
characters that are not valid in XML
element names.
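A small Python sketch of this formatting step, including one possible way to handle invalid characters: anything outside a conservative ASCII subset of XML name characters is replaced with an underscore. The "report" wrapper element, the replacement policy and the sample table and column names are assumptions for this example.

```python
import re
import xml.etree.ElementTree as ET

def xml_name(sql_name):
    """Map a table or column name to a valid XML element name (one possible policy)."""
    name = re.sub(r"[^A-Za-z0-9_.-]", "_", sql_name)  # e.g. '$', '#', spaces
    if not re.match(r"[A-Za-z_]", name):  # must not start with a digit, '.' or '-'
        name = "_" + name
    return name

def rows_to_xml(table, columns, rows):
    """One element per row, named after the table; one child per column."""
    root = ET.Element("report")  # wrapper element name is an assumption
    for row in rows:
        record = ET.SubElement(root, xml_name(table))
        for column, value in zip(columns, row):
            field = ET.SubElement(record, xml_name(column))
            if value is not None:  # SQL NULL becomes an empty element
                field.text = str(value)
    return ET.tostring(root, encoding="unicode")

report = rows_to_xml("order$items", ["item#", "unit price"], [(1, 2.5)])
```

A policy like this can map two different column names to the same element name (for example "a b" and "a_b"), so a real server would need a rule for resolving collisions.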
Extracting Dictionary Data
• A commercial relational DBMS provides
information in its catalog (Data Dictionary)
consisting of system tables.
• The system tables contain information
about the “metadata” – primary keys,
foreign keys and other integrity constraints
that are defined on the data.
• The data in a RDBMS can be mapped to
XML as a tree of parent/child relationships
(represented using primary and foreign
keys).
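As an illustration of reading key metadata from a catalog, the Python sketch below uses SQLite's `table_info` and `foreign_key_list` pragmas as a stand-in for a commercial DBMS's data dictionary, whose system tables differ by vendor. The `orders` table and all function names are assumptions for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table customer (cust_id integer primary key, name text);
    create table orders   (order_id integer primary key,
                           cust_id integer references customer (cust_id));
""")

def primary_key(conn, table):
    """Primary-key column names, read from SQLite's table_info pragma."""
    rows = conn.execute(f"pragma table_info({table})").fetchall()
    # Row layout: (cid, name, type, notnull, dflt_value, pk); pk > 0 marks key columns.
    return [row[1] for row in sorted(rows, key=lambda r: r[5]) if row[5] > 0]

def parent_tables(conn, table):
    """(child column, parent table, parent column) for each foreign key of table."""
    rows = conn.execute(f"pragma foreign_key_list({table})").fetchall()
    # Row layout: (id, seq, table, from, to, on_update, on_delete, match).
    return [(row[3], row[2], row[4]) for row in rows]

def child_map(conn):
    """Map each parent table to the tables that reference it (the XML nesting)."""
    tables = [r[0] for r in conn.execute(
        "select name from sqlite_master where type = 'table' order by name")]
    nesting = {table: [] for table in tables}
    for table in tables:
        for _, parent, _ in parent_tables(conn, table):
            nesting[parent].append(table)
    return nesting

nesting = child_map(conn)
```

Here `nesting` says that orders rows would be nested under their customer element when the data is written out as XML.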