On utilizing variables for specifying FDs in data

Data & Knowledge Engineering 60 (2007) 494–510
www.elsevier.com/locate/datak
On utilizing variables for specifying FDs in data-centric
XML documents
Wai Yin Mok
*
Accounting and Information Systems, University of Alabama in Huntsville, Huntsville, AL 35899, United States
Received 21 June 2005; accepted 22 March 2006
Available online 21 April 2006
Abstract
We are interested in specifying functional dependencies (FDs) for data-centric XML documents (XML documents that
are used mainly for data storage). FDs are a natural constraint. Specifying FDs for XML documents is more difficult
because unlike relational databases, XML documents do not have uniform structures. This paper introduces XML Template Functional Dependencies (XTFDs), which are able to specify FDs for XML documents. This paper also presents a
necessary and sufficient condition for an XTFD to cause data redundancy in XML documents. Further, we propose Attribute Rule and Text String Rule as two procedures that can be repeatedly applied to remove redundancy caused by XTFDs.
In addition, we prove that if an XML document has data redundancy with respect to an FD specified by using the tree
tuple approach, it would have data redundancy with respect to an XTFD and show by example that XTFDs can specify
some FDs for XML documents that the tree tuple approach cannot.
2006 Elsevier B.V. All rights reserved.
Keywords: Functional dependencies; Data-centric XML documents; DTDs; Data redundancy
1. Introduction
This paper focuses on specifying functional dependencies (FDs) for data-centric XML documents. That is,
XML documents that are used mainly for data storage. FDs are a natural constraint of relational databases
[7]. Specifying FDs for XML documents, however, is more difficult because unlike relational databases, XML
documents do not have uniform structures. Inspired by Template Dependencies [10], this paper introduces
XML Template Functional Dependencies (XTFDs) that are able to specify FDs for XML documents. An
XTFD consists of a hypothesis and a conclusion where the hypothesis identifies a ‘‘pattern’’ in an XML document and the conclusion specifies two text strings/attribute values in such a pattern that must be equal. Since
XTFDs are based on simple concepts like variables (Definition 1) and functions (Definition 4), XTFDs are
highly intuitive and are easier to understand than many other proposals of the same purpose (see Section
6). Additionally, we present a necessary and sufficient condition for an XTFD to cause data redundancy in
*
Tel.: +1 256 824 6980; fax: +1 256 824 2929.
E-mail address: mokw@email.uah.edu
0169-023X/$ - see front matter 2006 Elsevier B.V. All rights reserved.
doi:10.1016/j.datak.2006.03.009
W.Y. Mok / Data & Knowledge Engineering 60 (2007) 494–510
495
XML documents. We further propose Attribute Rule and Text String Rule as two procedures that can be
repeatedly applied to remove redundancy caused by XTFDs. We also prove that if an XML document has
data redundancy with respect to an FD specified by using the tree tuple approach [1], it would have data
redundancy with respect to an XTFD. However, the converse of this statement is not true. This implies that
any redundancy removal method for XTFDs will suffice for the FDs of [1] as well.
Before we begin our presentation, we first point out the assumptions of this paper. We suppose that the
reader is familiar with XML documents up to the level of [3]. Conventionally, when we write about an
XML element, we usually ignore its end tag. As in [1,14], we assume every element (instance) in an XML document is distinct. Hence, the Personnel elements on Lines 4, 10 and 15 in Fig. 1(b) are all distinct, although
they all look alike. The same is true for all the other elements in Fig. 1(b). Attribute values, however, are not
distinct. For example, the Name-attribute values in the Employee elements on Lines 11 and 16 in Fig. 1(b) are
identical and both are equal to "Tom". Like attribute values, text strings are not unique. For example, the text
strings of the Dept elements on Lines 6 and 23 in Fig. 1(b) are identical and both are equal to "Library".
Like [1,14], we also assume an element can only have at most one text string. In other words, Mixed Content is
not allowed. For example, <E1>text1<E2/>text2</E1> is not allowed since element E1 has two text
strings "text1" and "text2". Thus, from here on, we can say ‘‘the’’ text string of an element rather than
‘‘a’’ text string of an element.
As opposed to the tree tuple approach [1], we emphasize that the definition of XTFDs does not depend on
any definition of DTDs or that of XML schema. In fact, if one wants, any definition for DTDs, as the one in
[16], or any definition for XML schema, as the one in [17], will serve our purpose. What we need are the sets of
element names and attribute names and the set of valid XML documents of a given DTD or XML schema.
XTFDs, in turn, are constructed from the variables derived from the given sets of element names and attribute
names.
This paper is organized as follows. Section 2 defines the syntax and semantics of XTFDs. Section 3 provides
and proves a necessary and sufficient condition for XTFDs to cause data redundancy in XML documents.
Attribute Rule and Text String Rule are given in Section 4 as two procedures that can be repeatedly applied
<!ELEMENT
<!ELEMENT
<!ATTLIST
<!ELEMENT
<!ELEMENT
<!ATTLIST
Example (Project, Employees)>
Project (Personnel, Project*)>
Project Name CDATA #REQUIRED>
Personnel (Employee*)>
Employee (Dept?)>
Employee EID CDATA #REQUIRED
Name CDATA #REQUIRED>
<!ELEMENT Employees (Employee*)>
<!ELEMENT Dept (#PCDATA)>
(a)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
<!DOCTYPE Example SYSTEM "project1.dtd">
<Example>
<Project Name="eLibrary">
<Personnel>
<Employee EID="E01" Name="Mary">
<Dept>Library</Dept>
</Employee>
</Personnel>
<Project Name="Web Interface">
<Personnel>
<Employee EID="E02" Name="Tom"/>
</Personnel>
</Project>
<Project Name="T3 Line">
<Personnel>
<Employee EID="E02" Name="Tom"/>
<Employee EID="E03" Name="Jack"/>
</Personnel>
</Project>
</Project>
<Employees>
<Employee EID="E01" Name="Mary">
<Dept>Library</Dept></Employee>
<Employee EID="E02" Name="Tom">
<Dept>Web</Dept></Employee>
<Employee EID="E03" Name="Jack">
<Dept>Network</Dept></Employee>
</Employees>
</Example>
(b)
Fig. 1. An XML document with redundancy.
496
W.Y. Mok / Data & Knowledge Engineering 60 (2007) 494–510
until no more data redundancy caused by XTFDs remains. Section 5 shows that if an XML document has
data redundancy with respect to the FDs of [1], it would have data redundancy with respect to XTFDs but
not the other way around. Section 6 discusses some related work and points out possible future research.
We conclude in Section 7.
2. XTFDs
2.1. Syntactic definition
Syntactically, XTFDs are constructed from the variables that are derived from the given element names and
attribute names. These variables are defined as follows.
Definition 1. Let E be a finite set of element names and A be a finite set of attribute names. For each e 2 E, we
define var(e), a set of element variables, as {e1, e2 , . . .}. Likewise, for each a 2 A, we define var(a), a set of
attribute variables, as {a1 , a2 , . . .}. We use the symbol S to denote a text string and let strvar = {S1, S2, . . .} be
a set of text string variables. We further define a special element variable: root. Finally, we let var(E) be
[e2Evar(e) [ {root} and var(A) be [a2Avar(a). Without loss of generality, we assume E \ A = ; and root
and S are not in E [ A. That is, neither root nor S may be used as an element name or an attribute name.
An important consequence of Definition 1 is that var(E), var(A), and strvar are pairwise disjoint because
E \ A = ; and root and S are not in E [ A.
Example 1. The XML document in Fig. 1(b) tells a very simple story. Mary, Tom and Jack are three
employees who are working on a project called eLibrary. eLibrary has two subprojects: Web Interface and T3
Line. Mary is from the Library Department, Tom the Web Department and Jack the Network Department.
Referring to Fig. 1(b), E = {Example, Project, Personnel, Employee, Employees, Dept} and
A = {EID, Name}. Therefore, var(E) = {root, Example1, Project1, Personnel1, Employee1, Employees1, Dept1, Example2, Project2, Personnel2, Employee2, Employees2, Dept2, . . .} and
var(A) = {EID1, Name1, EID2, Name2, . . .}.
With var(E), var(A), and strvar defined, we are ready to define the syntax of XTFDs.
Definition 2. Let E be a finite set of element names and A be a finite set of attribute names. An XTFD over E
and A consists of a hypothesis and a conclusion. An XTFD’s hypothesis is a finite (possibly empty) set of
XTFD-elements, where each XTFD-element has one of the following forms:
• s1.s2. .sm where m P 1 and si 2 var(E),
• s1.s2. .sm.aj where m P 1, si 2 var(E) and aj 2 var(A),
• s1.s2. .sm.Sj where m P 1, si 2 var(E) and Sj 2 strvar.
An XTFD’s conclusion has one of the following forms:
• ei = ej where ei, ej 2 var(e) and e 2 E,
• ai = aj where ai, aj 2 var(a) and a 2 A,
• Si = Sj where Si, Sj 2 strvar.
Note that the syntax of an XTFD’s conclusion prohibits us for using two element variables of different element names and two attribute variables of different attribute names on the two sides of the equality symbol. In
addition, since var(E), var(A), and strvar are pairwise disjoint, we can immediately discern the form of a given
XTFD-element by looking at its last variable.
Example 2. Conventionally, we write each XTFD-element of an XTFD above a horizontal line and its
conclusion below that line. Fig. 2 shows five syntactically correct XTFDs and one syntactically incorrect
XTFD. Specifically, XTFD1, XTFD2, XTFD3, XTFD4, and XTFD5 are syntactically correct and XTFD6 is
W.Y. Mok / Data & Knowledge Engineering 60 (2007) 494–510
497
Fig. 2. Some sample XTFDs.
syntactically incorrect. XTFD6 is incorrect because Name1, an attribute variable, is not the last variable of its
first XTFD-element. In addition, S1, a text string variable, is on the left-hand side of the equality
symbol = and Name2, an attribute variable, is on the right-hand side. This violates the syntax of an XTFD’s
conclusion.
2.2. Semantic definition
To formalize the meaning of an XTFD, it is necessary to define the extent of each variable in
var(E) [ var(A) [ strvar in the context of a given XML document. Such extents are defined as follows.
Definition 3. Let Doc be a valid finite XML document with respect to a DTD or an XML schema. Let E and
A be the sets of element names and attribute names in Doc respectively. For each ei 2 var(E) where e 2 E, we
define ext(ei) to be the set of e elements in Doc. For each ai 2 var(A) where a 2 A, we define ext(ai) to be the
set of a-attribute values in Doc. For each Si 2 strvar, we define ext(Si) to be the set of text strings in Doc.
ext(root) is defined to be the singleton set that only contains the root element of Doc.
Example 3. In Fig. 1(b), ext(root) = ext(Examplei) = {<Example>}—the singleton set that contains the
root element, ext(Projecti) = {<Project Name="eLibrary">, <Project Name="Web Interface">, <Project Name="T3 Line">} and ext(Personneli) = {<Personnel>—the Personnel element on Line 4, <Personnel>—the Personnel element on Line 10, <Personnel>—the Personnel
element on Line 15}. ext(Employeei), ext(Employeesi), and ext(Depti) can be derived similarly. For the
attribute variables, ext(Namei) = {"eLibrary", "Web Interface", "T3 Line", "Mary", "Tom",
"Jack"} and ext(EIDi) = {"E01", "E02", "E03"}. For the text string variables, ext(Si) = {"Library",
"Web", "Network"}.
It is now possible to define when an XML document violates an XTFD. The idea is to use a function / to
relate the XML document and the XTFD.
Definition 4. Let Doc be a valid finite XML document with respect to a DTD or an XML schema. Let E and
A be the sets of element names and attribute names in Doc respectively. Let V be var(E) [ var(A) [ strvar. A
function / : V ! [v 2V extðvÞ relates an XTFD W over E and A and Doc if
498
W.Y. Mok / Data & Knowledge Engineering 60 (2007) 494–510
1. for each variable v 2 V that appears in W, /(v) 2 ext(v),
2. for each XTFD-element in W, /(si+1) is a child element of /(si) in Doc (1 6 i < m),1
3. for each XTFD-element in W that ends with an attribute variable aj, /(aj) is the a-attribute value in the
element /(sm) in Doc,
4. for each XTFD-element in W that ends with a text string variable Sj, /(Sj) is the text string of the element /
(sm) in Doc.
Doc violates W if there is a function / that relates W and Doc such that
1. if W’s conclusion is ei = ej where ei, ej 2 var(e) and e 2 E, then /(ei) 5 /(ej);
2. if W’s conclusion is ai = aj where ai, aj 2 var(a) and a 2 A, then /(ai) 5 /(aj);
3. if W’s conclusion is Si = Sj where Si, Sj 2 strvar, then /(Si) 5 /(Sj).
Otherwise, Doc satisfies W. In addition, Doc vacuously satisfies W if there is no function that relates W and
Doc.2
Example 4. Let Doc be the XML document in Fig. 1(b). Because the ‘‘trick’’ we used for XTFD1 in Fig. 2 will
be used again in this paper (e.g., XTFD2 and XTFD5), we explain XTFD1 thoroughly. First, there are two
Employee variables Employee1 and Employee2. These two variables can be mapped to two (not necessarily distinct) Employee elements. Second, because the attribute variable EID1 appears as the last variables in
both the first and third XTFD-elements in XTFD1 (i.e., Employee1.EID1 and Employee2.EID1 respectively), for any function / that relates XTFD1 and Doc, / must map Employee1 and Employee2 to two
(not necessarily distinct) Employee elements where their EID attributes have the same value. Third, the second and fourth XTFD-elements of XTFD1 are respectively Employee1.Name1 and Employee2.Name2,
which means /(Name1) is the Name-attribute value of the element /(Employee1) and /(Name2) is the
Name-attribute value of the element /(Employee2). Fourth, XTFD1’s conclusion states that /(Name1) = /
(Name2). Collectively, XTFD1 means for any two (not necessarily distinct) Employee elements, if they have
the same EID-attribute value, they must have the same Name-attribute value. We find that this indeed is the
case for Doc and thus Doc satisfies XTFD1.
Doc, however, violates XTFD2 in Fig. 2. To see this violation, we can find a function / such that /
(Employee1) = the Employee element on Line 5, /(Employee2) = the Employee element on Line 22,
/(Dept1) = the Dept element on Line 6, and /(Dept2) = the Dept element on Line 23. Further, /
(EID1) = "E01" and /(S1) = "Library". However, /(Dept1) and /(Dept2) are not the same Dept
element. Likewise, Doc violates XTFD3 in Fig. 2. To see this violation, we simply let /(Name1) = "T3 Line"
and /(Name2) = "Tom". Concerning XTFD4 in Fig. 2, since the element variable root can only be mapped
to the root element, which cannot be a child element of any element, there is no function that can relate Doc
and XTFD4. Therefore, Doc satisfies XTFD4 vacuously. Doc also satisfies XTFD5 in Fig. 2 because for any
two Employee elements, if they have the same EID-attribute value, the text strings of their respective child
Dept elements are identical.
3. Data redundancy
In this section, we prove a necessary and sufficient condition for an XTFD to cause data redundancy in an
XML document. First, we need a formal definition of redundancy. Note that the following redundancy definition is essentially the same as the one we used in [9].
Definition 5. Let Doc be a valid finite XML document with respect to a DTD or an XML schema. Let E and A
be the sets of element names and attribute names in Doc respectively. Suppose that Doc satisfies an XTFD W
1
2
According to Definition 2, sm is the last element variable.
In other words, if we cannot find a ‘‘counter-example’’ for W in Doc, then we cannot say Doc violates W.
W.Y. Mok / Data & Knowledge Engineering 60 (2007) 494–510
499
over E and A. A text string/an attribute value str in Doc is redundant with respect to W if str is replaced by a
symbol h that does not appear in Doc, W’s conclusion forces h to be equal to str if Doc is to continue to satisfy W.
In other words, W’s conclusion can be used to recover a redundant text string/attribute value after it is
replaced by a never-before-appeared symbol.
Example 5. Let us consider the XML document in Fig. 1(b) and XTFD1 in Fig. 2. Suppose that we replace
the attribute value "Tom" in the Employee element on Line 11 by h. Let / be a function such that
/(Employee1) is equal to the Employee element on Line 11, /(Employee2) is equal to the Employee
element on Line 16, /(EID1) is equal to "E02", /(Name1) is equal to h and /(Name2) is equal to "Tom". To
satisfy XTFD1, XTFD1’s conclusion forces h to be "Tom". For XTFD5 in Fig. 2, now suppose we replace the
text string "Library" in the Dept element on Line 6 by h. Let / be a function such that /(Employee1) is
equal to the Employee element on Line 5, /(Employee2) is equal to the Employee element on Line 22,
/(Dept1) is equal to the Dept element on Line 6, and /(Dept2) is equal to the Dept element on Line 23.
Further, /(EID1) is equal to "E01", /(S1) is equal to h and /(S2) is equal to "Library". To satisfy XTFD5,
XTFD5’s conclusion forces h to be "Library".
Example 5 leads us to make several remarks. In the following, let Doc be an XML document and let W be
an XTFD.
Remark 1. If Doc has redundant text strings/attribute values with respect to W, then Doc must satisfy W, but
not vacuously. Otherwise, we cannot use W’s conclusion to recover any redundant text string/attribute value
after it is replaced by a never-before-appeared symbol. For example, XTFD2 and XTFD3 in Fig. 2 cannot be
used to recover any text string/attribute value in the XML document in Fig. 1(b) since the XML document in
Fig. 1(b) satisfies none of them. Moreover, because the XML document in Fig. 1(b) satisfies XTFD4
vacuously, we cannot use XTFD4’s conclusion to recover any redundant text string/attribute value either.
Remark 2. Note that by our assumptions in Section 1, every element in Doc is unique and therefore no element can be recovered by using the equality comparison of an XTFD’s conclusion after it is replaced by a
never-before-appeared symbol. Thus, Doc can only have redundant text strings/attribute values, but not
redundant elements. Consequently, if Doc has redundant text strings/attribute values with respect to W,
W’s conclusion cannot be of the form ei = ej.
As in the previous sections, we use typewriter style for element names and element variables. However,
we use italic style for the instances of XML elements in an XML document. For example, ei is an element
variable of an element whose name is e; but ej is a particular e-element instance in an XML document. Thus,
ej 2 ext(ei).
Theorem 1. Let Doc be a valid finite XML document with respect to a DTD or an XML schema. Let E and A be
the sets of element names and attribute names in Doc respectively. A text string/an attribute value str in Doc is
redundant with respect to an XTFD W over E and A if and only if Doc satisfies W but not vacuously, and one of
the following conditions holds:
1. If W’s conclusion has the form ai = aj where ai, aj 2 var(a) and a 2 A, then there must be a function / that
relates W and Doc such that (1) /(ai) is the a-attribute value of an element ei, (2) /(aj) is the a-attribute value
of an element ej, (3) /(ai) = /(aj) and (4) ei 5 ej.
2. If W’s conclusion has the form Si = Sj where Si, Sj 2 strvar, then there must be a function / that relates W and
Doc such that (1) /(Si) is the text string of an element ei, (2) /(Sj) is the text string of an element ej, (3) /
(Si) = /(Sj) and (4) ei 5 ej.
Proof. If Doc satisfies W but not vacuously, and either Condition 1 or Condition 2 holds, then W’s conclusion
can be used to recover a redundant text string/attribute value after it is replaced by a never-before-appeared
symbol if Doc is to continue to satisfy W. For the only-if part, by Remark 1 Doc must satisfy W but not vacuously. By Remark 2 if Doc has redundant text strings/attribute values with respect to W, W cannot have a
500
W.Y. Mok / Data & Knowledge Engineering 60 (2007) 494–510
conclusion of the form ei = ej where ei, ej 2 var(e) and e 2 E. Therefore, it suffices to consider the other two
forms. By Definition 5, if W’s conclusion can be used to recover a text string/attribute value str after it is
replaced by a never-before-appeared symbol h, and since W’s conclusion is simply an equality comparison,
it must be the case that str appears at least twice in Doc. Since we disallow Mixed Content, each element
can have at most one text string. Also the XML syntax disallows an element to have two different attributes
of the same attribute name. Hence, there must be two distinct elements ei and ej such that str is a text string or
an attribute value in them. h
4. Redundancy removal
To remove redundant text strings/attribute values caused by an XTFD in an XML document, we focus on
the two different elements ei and ej in Theorem 1 from which a function / ‘‘extracts’’ two identical text strings/
attribute values for the XTFD’s conclusion. Concerning ei and ej, two possibilities arise: (1) one of them is an
ancestor element of the other and (2) neither one is an ancestor element of the other. The principle of inheritance of object-oriented design provides a solution for Case 1. This principle states that descendant objects
inherit attributes from ancestor objects in a generalization hierarchy. In an XML document, every element—except the root element—has a unique parent element [3]. Thus, it is natural to extend this parent-child
relationship to an ancestor-descendant relationship for the elements in an XML document. Therefore, based
on the principle of inheritance, we remove the redundant text strings/attribute values from descendant elements. For Case 2, our heuristics chooses one element from the two alternatives for removing redundant text
strings/attribute values until no more can be found. In Example 6, we present a guideline as to how to make
such a choice. We first inductively define the ancestor-descendant relationship for the elements in an XML
document.
Definition 6. Let ep, eq and er be elements in an XML document. If ep is the parent of eq, then ep is an ancestor
of eq and eq is a descendant of ep. If ep is an ancestor of eq and eq is the parent of er, then ep is also an ancestor of
er and er is also a descendant of ep.
Let Doc be a valid finite XML document with respect to a DTD or an XML schema. Let E and A be the sets
of element names and attribute names in Doc respectively. Suppose that Doc satisfies an XTFD W over E and A
and Doc has redundant text strings/attribute values with respect to W. Let ei and ej be the two distinct elements
in Theorem 1. The following rules remove redundant text strings/attribute values caused by W from Doc.
Attribute Rule: Suppose that W’s conclusion has the form ai = aj where ai, aj 2 var(a) and a 2 A. For Case
1, assume ei is a descendant element of ej. Remove the attribute a along with its value from ei. For Case 2,
remove the attribute a along with its value from ei, but not the one in ej. If ei has no attributes and no child
elements and no text strings left, remove ei entirely. In order to do so, we may have to make the attribute a
an optional attribute and the e element an optional element in the corresponding DTD or XML schema of
Doc.
Text String Rule: Suppose that W’s conclusion has the form Si = Sj where Si, Sj 2 strvar. Because we disallow Mixed Content, Case 1 does not apply. For Case 2, remove the text string of ei, but not the one in ej. If
ei has no attributes left,3 remove ei entirely. In order to do so, we may have to make the e element an
optional element in the corresponding DTD or XML schema of Doc.
Theorem 2. Attribute Rule and Text String Rule are able to remove redundant text strings/attribute values from
XML documents caused by XTFDs.
Proof. Given an XTFD W, based on W’s conclusion we either apply Attribute Rule or Text String Rule. Since
the number of text strings/attribute values strictly decreases in this process, eventually this process will halt
when there are no redundant text strings/attribute values left. h
3
Since we disallow Mixed Content, ei does not have any child elements.
W.Y. Mok / Data & Knowledge Engineering 60 (2007) 494–510
<!ELEMENT
<!ELEMENT
<!ATTLIST
<!ELEMENT
<!ELEMENT
<!ATTLIST
Example (Project, Employees)>
Project (Personnel, Project*)>
Project Name CDATA #REQUIRED>
Personnel (Employee*)>
Employee (Dept?)>
Employee EID CDATA #REQUIRED
Name CDATA #IMPLIED>
<!ELEMENT Employees (Employee*)>
<!ELEMENT Dept (#PCDATA)>
(a)
501
<!DOCTYPE Example SYSTEM "project2.dtd">
<Example>
<Project Name="eLibrary">
<Personnel>
<Employee EID="E01"/>
</Personnel>
<Project Name="Web Interface">
<Personnel>
<Employee EID="E02"/>
</Personnel>
</Project>
<Project Name="T3 Line">
<Personnel>
<Employee EID="E02"/>
<Employee EID="E03"/>
</Personnel>
</Project>
</Project>
<Employees>
<Employee EID="E01" Name="Mary">
<Dept>Library</Dept></Employee>
<Employee EID="E02" Name="Tom">
<Dept>Web</Dept></Employee>
<Employee EID="E03" Name="Jack">
<Dept>Network</Dept></Employee>
</Employees>
</Example>
(b)
Fig. 3. An XML document without redundancy.
Example 6. With respect to XTFD1 and XTFD5 in Fig. 2, Fig. 3(b) shows the resulting XML document after
we apply Attribute Rule and Text String Rule to the XML document in Fig. 1(b). Note that since we have to
remove the attribute Name from several Employee elements, we have modified the DTD so that now the
attribute Name is optional in Employee elements (see Fig. 3(a)). Also note that the DTD states that an
Employee element can have an optional child Dept element. Because the child Dept element of the
Employee element on Line 5 has no attributes left after we remove its text string, we remove the Dept element entirely. For Case 2, we have chosen to keep all information of employees together. Thus, we remove
redundant text strings/attribute values from the child Employee elements of the Personnel elements
instead. Now, the child Employee elements of the Personnel elements become ‘‘pointers’’, which point
to the child Employee elements of the Employees element.
5. The tree tuple approach and XTFDs
We discuss the relationship between XTFDs and the FDs specified by using the tree tuple approach [1] in
this section. To begin with, we need to take some definitions from [1].
5.1. Basic definitions
Definition 7. Let D be a DTD and let E and A respectively be the sets of element names and attribute names
of D such that E \ A = ;. Let r 2 E be a special element name such that an r element is always the root
element and it is not a child element of any element. A path of D has one of the following three forms:
1. r.r1.r2. .rm where m P 0 and ri 2 E,4
2. r.r1.r2. .rm.k where m P 0, ri 2 E and k 2 A,
3. r.r1.r2. .rm.S where m P 0, ri 2 E and S is the same symbol in Definition 1 that denotes a text string.
4
Note that we use si’s for element variables and ri’s for element names. Also, r1, r2, . . . , rm in a path are not necessarily distinct. That is,
ri and rj in a path may denote the same element name.
502
W.Y. Mok / Data & Knowledge Engineering 60 (2007) 494–510
A path p of D is valid if there is a valid XML document Doc with respect to D such that there is a sequence of
elements e1, e2, , em+1 in Doc where e1 is the root element of Doc, ei is the parent of ei+1 (i 6 m) in Doc, and
the following conditions hold:
1. If p is of Form 1 or Form 2 or Form 3, then e1’s name is r, e2’s name is r1, . . ., and em+1’s name is rm.
2. If p is of Form 2, then additionally the element em+1 has an k attribute.
3. If p is of Form 3, then additionally the element em+1 has a text string.
Note that since an r element must be the root element and it is not a child element of any element, r is always
in the first (and not in any other) position in a valid path. Finally, we let paths(D) be the set of valid paths of D
and EPaths(D) be the set of valid paths of D of Form 1 only. Clearly, EPaths(D) paths(D).
Example 7. Let D be the DTD in Fig. 1(a) and Doc be the XML document in Figure 1(b). With respect to D
and Doc, the sets E and A of Definition 7 are the same as those in Example 1 and the special element name r of
Definition 7 is Example. Note that D is recursive because a project may have subprojects and so on. Therefore, paths(D) is infinite and as such, we cannot exhaustively list all of its members. Nevertheless, let us point
out some members of paths(D). With respect to Doc,
Example,
Example.Employees,
Example.Employees.Employee,
Example.Employees.Employee.Dept
are in EPaths(D). However,
Example.Employees.Employee.EID,
Example.Employees.Employee.Name,
Example.Employees.Employee.Dept.S
are in paths(D) EPaths(D). Note that
Example.Project.Project.Project
is a valid path in paths(D) but to show its membership, we have to find another valid XML document other
than the one in Fig. 1(b).
5.2. Tree tuples
We are now ready to define a tree tuple. Note that our definition of tree tuples is equivalent to but simpler
than the one in [1].
Definition 8. Let D be a DTD and let Doc be a valid finite XML document with respect to D. Let U be the set
of elements, attribute values and text strings in Doc. A tree tuple t is a function from paths(D) to U such that
1. t(r) = the root element of Doc (r is a path with only one element name r in it).
2. Let r.r1.r2. .rm and r.r1.r2. .rm.rm+1 be two paths in EPaths(D). If t(r.r1.r2. .rm) 5 ?,5 then
t(r.r1.r2. .rm. rm+1) is a child rm+1 element of t(r.r1.r2. .rm) in Doc if one exists. Otherwise,
t(r.r1.r2. .rm. rm+1) = ?.
3. Let r.r1.r2. .rm. k be a path in paths(D) EPaths(D). If t(r.r1.r2. .rm) 5 ? and t(r.r1.r2. .rm) has
an k attribute, then t(r.r1.r2. .rm.k) is the value of that k attribute. Otherwise, t(r.r1.r2. .rm. k) is ?.
4. Let r.r1.r2. .rm.S be a path in paths(D) EPaths(D). If t(r.r1.r2. .rm) 5 ? and t(r.r1.r2. .rm) has
a text string, then t(r.r1.r2. .rm.S) is that text string. Otherwise, t(r.r1.r2. .rm.S) is ?.
5
? denotes a null.
W.Y. Mok / Data & Knowledge Engineering 60 (2007) 494–510
503
Example 8. Let t be a tree tuple and let us continue Example 7. t assigns the following paths to the following
elements, attribute values and text strings:
t(Example) = <Example>,
t(Example.Employees) = <Employees>,
For the path Example.Employees.Employee, t has three choices: Mary’s Employee element, Tom’s
Employee element and Jack’s Employee element. For this example, let
t(Example.Employees.Employee) = <Employee EID="E01" Name="Mary">,
t(Example.Employees.Employee.EID) = "E01",
t(Example.Employees.Employee.Name) = "Mary",
t(Example.Employees.Employee.Dept) = <Dept>—the child Dept element of <Employee
EID="E01" Name="Mary">,
t(Example.Employees.Employee.Dept.S) = "Library".
Note that we have not listed all possible values of t. To show two more values of t,
t(Example.Project.Personnel.Employee.Dept.S) = "Library" and
t(Example.Project.Project.Project) = ?.
It is also illustrative to calculate the number of different tree tuples in the XML document in Fig. 1(b). Since
there are three choices for assigning values for t(Example.Employees.Employee) (the three different child
Employee elements of the Employees element) and there are three choices for assigning values for
t(Example.Project.Project.Personnel.Employee) (the three different Employee elements which are
descendants of <Project Name="eLibrary">), there are altogether nine different tree tuples in the XML
document in Fig. 1(b).
5.3. FDs based on tree tuples
FDs that are based on tree tuples are defined as follows:
Definition 9. Let D be a DTD and let Doc be a valid finite XML document with respect to D. An FD in [1] has
the form S ! p where S paths(D), S 5 ;, and p 2 paths(D).6 S ! p holds in Doc if for any two tree tuples t1
and t2, if t1(q) = t2(q) 5 ? for each path q 2 S, then it must be the case that t1(p) = t2(p) 5 ? or
t1(p) = t2(p) = ? (see in [1, p. 206]).
Example 9. Consider this FD that is based on tree tuples:
Example.Project.Project.Personnel.Employee.EID !
Example.Project.Project.Personnel.Employee.Name,
which holds in the XML document in Fig. 1(b). It means the EID-attribute values functionally determine the
Name-attribute values of the Employee elements that are descendants of <Project Name="Web Interface"> or <Project Name = "T3 Line">. Note that the XML document in Fig. 1(b) has data redundancy with respect to this FD. To see this, consider two tree tuples t1 and t2 where
t1(Example.Project.Project) = <Project Name="Web Interface"> and t2(Example.Project.Project) = <Project Name="T3 Line">. Both t1 and t2 reach "E02" and "Tom" of two different
Employee elements and thus the data redundancy. Now consider another FD that is based on tree tuples:
Example.Project.Project, Example.Project.Project.Personnel.Employee.EID !
Example.Project.Project.Personnel.Employee.Name.
6
For simplicity and without loss of generality, we assume the right-hand side of an FD in [1] is a single path.
504
W.Y. Mok / Data & Knowledge Engineering 60 (2007) 494–510
Because the path Example.Project.Project is on the left-hand side of the FD, it means for any two tree
tuples t1 and t2, they must have the same value on this path. That is, t1(Example.Project.Project) = t2(Example.Project.Project) = <Project Name="Web Interface"> or <Project
Name="T3 Line">. Now, the EID-attribute value of each Employee element of a subproject is unique.
Thus, there is no data redundancy with respect to this FD in the XML document in Fig. 1(b).
5.4. Transforming FDs based on tree tuples to XTFDs
Syntactically, it is easy to transform an FD in [1] to an XTFD. The idea is to simulate a tree tuple by a set of
carefully constructed XTFD-elements.
Algorithm 1
Input: A DTD D and an FD S ! p where S paths(D), S 5 ;, and p 2 paths(D)
Output: An XTFD
1. Let S 0 = S [ {p}.
2. For each path q 2 S 0 , encode q by an XTFD-element as follows: Transform the element name r in q to the
element variable root and transform each other element name e in q to an element variable ei such that ei
only appears once in q’s encoding. If q has an attribute name, transform that attribute name to an attribute
variable. If q has the text string symbol S, transform S to a text string variable. For any two paths
q, q 0 2 S 0 , ensure that the same variables appear precisely in the encoding of the common prefix of q
and q 0 and this condition must hold constantly throughout this algorithm. Call the resulting set of
XTFD-elements X1.
3. Repeat Step 2’s instructions for this step. However, none of the variables other than the element variable
root used in Step 2 can be used again in this step. Call the resulting set of XTFD-elements X2.
4. For each path q 2 S, let vi be the last variable in q’s encoding in X1 and let vj be the last variable in q’s encoding in X2. Replace vj by vi to simulate t1(q) = t2(q) where t1 and t2 are two tree tuples. However, as we
replace vj by vi, we have to ensure that the condition that the same variables appear precisely in the encoding of the common prefix of any two paths in S 0 continues to hold for X2.
5. Lastly, we add a conclusion vi = vj where vi is the last variable in p’s encoding in X1 and vj is the last
variable in p’s encoding in X2.
Example 10. Let us illustrate how to encode one of the FDs in Example 9: p 0 ! p where
p 0 = Example.Project.Project.Personnel.Employee.EID and
p = Example .Project.Project.Personnel.Employee. Name.
At Step 1, we set S 0 = {Example.Project. Project.Personnel.Employee.EID,
Example.Project.Project.Personnel.Employee.Name}.
At Step 2, X1 has the following XTFD-elements:
root.Project1.Project2.Personnel1.Employee1.EID1 (p 0 ’s encoding),
root.Project1.Project2.Personnel1.Employee1.Name1 (p’s encoding).
Note that we use the same variables to encode the common prefix of p 0 and p.
At Step 3, X2 has the following XTFD-elements:
root.Project3.Project4.Personnel2.Employee2.EID2 (p 0 ’s encoding),
root.Project3.Project4.Personnel2.Employee2.Name2 (p’s encoding).
Again, we use the same variables to encode the common prefix of p 0 and p and none of the variables used in
Step 2 is used again in Step 3 except the element variable root.
W.Y. Mok / Data & Knowledge Engineering 60 (2007) 494–510
505
At Step 4, we change EID2 to EID1 to simulate
t1(Example.Project.Project.Personnel.Employee.EID) =
t2(Example.Project.Project.Personnel.Employee.EID).
After Step 5, we have formed XTFD7 in Fig. 4. For the other FD in Example 9, the transformed XTFD is
XTFD8 in Fig. 4. The reason for the second Project element variable is Project2 for every XTFD-element of XTFD8 is because we changed Project4 to Project2 in the XTFD-element root.Project3.Project4 in X2. Such a change is necessary because the path Example.Project.Project is
on the left-hand side of the FD and we want to simulate the condition t1(Example.Project.Project) = t2(Example.Project.Project). After that, to maintain the requirement that the same variables
are used to encode the common prefix of any two paths, we have to change Project4 to Project2 in the
XTFD-elements
root.Project3.Project4.Personnel2.Employee2.EID1 and
root.Project3.Project4.Personnel2.Employee2.Name2
in X2 as well and the result is shown in Fig. 4.
5.5. FDs based on tree tuples and XTFDs
Even though we can syntactically transform an FD of [1] to an XTFD, we cannot say XTFDs generalize
the FDs of [1] because they denote different semantics. Example 11 demonstrates what this means.
Example 11. Consider the FD Example.Foo.A ! Example.Foo.B and XTFD9—the output of Algorithm 1
by using Example.Foo.A ! Example.Foo.B as the input.
root.Foo1.A1
root.Foo1.B1
root.Foo2.A1
root.Foo2.B2
B1 = B2
XTFD9
The XML document in Fig. 5 satisfies XTFD9 but it does not satisfy Example.Foo.A ! Example.Foo.B. To
see this, notice that the second Foo element in Fig. 5 does not have an B attribute. Thus, any function / that
relates XTFD9 and the XML document in Fig. 5 must map both Foo1 and Foo2 of XTFD9 to <Foo A="1"
B="2"/> and as such, the conclusion is trivially satisfied. For the FD Example.Foo.A ! Example.Foo.B,
however, we can have two tree tuples t1 and t2 such that
t1(Example.Foo) = <Foo A="1" B="2"/> and
t2(Example.Foo) = <Foo A="1"/>. Therefore,
t1(Example.Foo.A) = "1",
t2(Example.Foo.A) = "1",
t1(Example.Foo.B) = "2" but
t2(Example.Foo.B) = ?. Because t1(Example.Foo.B) 5 t2(Example.Foo.B), the XML document in
Fig. 5 does not satisfy Example.Foo.A ! Example.Foo.B.
Although we cannot say XTFDs generalize the FDs of [1], XTFDs and the FDs of [1] still relate in an interesting way. This relationship is shown by Theorem 3, which makes use of Lemma 1 in its proof.
Lemma 1. Let D be a DTD and let Doc be a valid finite XML document with respect to D. Suppose
S 0 paths(D) and q 2 S 0 . Let X be the resulting set of XTFD-elements of Step 2 in Algorithm 1 with S 0 as the
506
W.Y. Mok / Data & Knowledge Engineering 60 (2007) 494–510
Fig. 4. Transformed XTFDs.
<Example>
<Foo A="1" B="2"/>
<Foo A="1"/>
</Example>
Fig. 5. An XML document.
input. Let lastvar(q) be the last variable in q’s encoding. A tree tuple t assigns nonnull values to every path q 2 S 0
if and only if there is a function / that relates X and Doc such that t(q) = /(lastvar(q)).7
Proof. For each path q 2 S 0 , Step 2 encodes q by an XTFD-element such that each variable in q’s encoding is
unique. This condition makes possible the existence of a function that relates q’s encoding and Doc; for if a
variable appears more than once in q’s encoding, then since an element cannot be an ancestor of itself, it is
impossible to have a function that relates q’s encoding and Doc. Now, if there is a tree tuple t that assigns
a nonnull value to q, then by Definition 4, there exists a function / that relates q’s encoding and Doc such
that t(q) = /(lastvar(q)). On the other hand, if there is a function / that relates q’s encoding and Doc and since
? is not part of XTFDs, it is obvious that there is a tree tuple such that t(q) = /(lastvar(q)). Another condition
Step 2 ensures is that for any pair of paths q, q 0 2 S 0 , the same variables are used to encode the common prefix
of q and q 0 . This condition ensures that if a function / that relates X and Doc, then there is a tree tuple t such
that t(q) = /(lastvar(q)) for each path q 2 S 0 . The proof for the converse of this statement is clear and thus is
omitted. h
Theorem 3. Let D be a DTD and let Doc be a valid finite XML document with respect to D. If Doc has data
redundancy with respect to an FD S ! p where S paths(D), S 5 ;, and p 2 paths(D), then Doc has data redundancy with respect to XTFDSp—the output of Algorithm 1 by using S ! p as the input.
7
Technically, we have only defined how a function relates an XTFD and an XML document (see Definition 4). However, it is
straightforward to extend this concept to a set of XTFD-elements and an XML document, and similarly to a particular XTFD-element
and an XML document.
W.Y. Mok / Data & Knowledge Engineering 60 (2007) 494–510
507
Proof. If Doc has redundancy with respect to an FD S ! p, then according to Definition 5.1 in [1] (XNF’s
definition), it must be the case that there are two different tree tuples t1 and t2 in Doc such that
t1(q) = t2(q) 5 ? for each path q 2 S, t1(p) = t2(p) 5 ?, and t1(p) and t2(p) are either two attribute values
or two text strings in two different elements in Doc. Since t1 and t2 assign nonnull values to each path in S
and p, and since X1 and X2 of Algorithm 1 simulate two tree tuples, Doc has redundancy with respect to
XTFDSp. h
The converse of Theorem 3, however, is not true. Example 12 serves as a counter-example.
Example 12. Recall that the XML document in Fig. 1(b) satisfies XTFD1 in Fig. 2. In fact, the XML
document in Fig. 1(b) also has data redundancy with respect to XTFD1. Consider the two Employee
elements <Employee EID="E01" Name="Mary"> and <Employee EID="E01" Name="Mary"> where
one is a child of the first <Personnel> and the other is a child of <Employees>. A function / can map
Employee1 to one and Employee2 to the other and thus the data redundancy. However, there does not exist
any FD using the tree tuple approach for XTFD1. By Definition 9, an FD of [1] has the form S ! p where
S paths(D), S 5 ;, and p 2 paths(D), which implies it can only compare two text strings/attribute values on
the same path. These two Employee elements, however, are on different paths and therefore any FD of [1]
cannot compare.
We conclude this section by stating that:
1. If an XML document has data redundancy with respect to an FD of [1], there exists an XTFD such that the
XML document has data redundancy with respect to that XTFD as well (by Algorithm 1 and Theorem 3).
2. There are some XML documents that have data redundancy with respect to XTFDs but they have no data
redundancy with respect to any FD of [1] (see Example 12).
3. Consequently, an XML document does not have data redundancy with respect to XTFDs means it does
not have data redundancy with respect to the FDs of [1].
6. Related and future work
This section comments on some related work and presents preliminary ideas to address the deficiencies of
XTFDs. More refined results will be reported in future papers, however.
6.1. Mixed content
Many proposals for specifying FDs in XML documents have recently appeared. [1] and [14] are two papers
in this area. [2] is more concerned with defining absolute and relative keys in XML documents—a closely
related subject. However, the proposal of [2] makes use of XPath expressions [15], which cannot be simulated
by variables. [6] is another attempt to specify FDs in XML documents. Like [2], it also makes use of XPath
expressions. [4] takes a different approach. Instead of using ‘‘paths’’, it uses a subgraph-based approach which
allows them to cope with a wider class of FDs.
All of these proposals have their relative merits. Like XTFDs, neither the approach in [1] nor the one in [14]
can handle Mixed Content. The reason is that there is no mechanism in these FD proposals to distinguish
different text strings and elements nested within another element. In other words, there is no way in these proposals to specify a specific text string, or element, nested within another element. To address this issue, currently we are incorporating XPath expressions with variables to create a more expressive constraint. [8] is an
early report of our effort. As an example, consider the E1 element in Section 1, namely <E1>text1<E2/
>text2</E1>. In order to refer to the second text string text2, we adopt the syntax of XPath expressions.
Specifically, an index, or a position, can be optionally associated with each variable. Thus, E13.S[2]5—an
XTFD-element in this new syntax—is allowed. E13, as before, can be mapped to the E1 element. However,
S[2]5, as opposed to S5, can only be mapped to the second text string, but not the first one. Thus, the syntax
of XPath expressions provides a way to solve this problem.
508
W.Y. Mok / Data & Knowledge Engineering 60 (2007) 494–510
6.2. Multivalued dependencies (MVDs)
Proposals for specifying MVDs in XML documents have also appeared. Since we do not consider MVDs in
this paper, these proposals are beyond the scope of our investigation. However, we are also currently extending our approach so that we can express MVDs in XML documents as well. Notably, [5,11–13] are some significant researches in this area.
Here, we present an preliminary idea on extending XTFDs to handle MVDs, which is also inspired by [10].
We believe our proposal is more intuitive and easier to understand than the current ones in the literature. For
example, to specify that all subprojects must employ the same group of employees in the XML document in
Fig. 1(b), the following XTFD suffices:
root.Project1.Project2.Personnel1.Employee1.EID1
root.Project3.Project4
root.Project3.Project4.Personnel2.Employee2.EID1
XTFD10
Note that XTFD10’s conclusion is not an equality comparison, but it is an XTFD-element itself. What this
XTFD means is that if there is a function that relates XTFD10’s hypothesis and an XML document, then the
same function must also relate XTFD10’s conclusion and the XML document. We note that the XML document in Fig. 1(b) does not satisfy XTFD10 because two different groups of employees work on the two different subprojects.
6.3. FD implication
In the relational database theory [7], a set F of FDs implies an FD X ! Y means for any relation r that
satisfies F, r also satisfies X ! Y. This definition also immediately applies to XTFDs. Thus, a set X of XTFDs
implies an XTFD W if for any valid XML document Doc (with respect to a DTD or an XML schema) that
satisfies X, Doc also satisfies W. For example, the following XTFD is equivalent to XTFD8 in Fig. 4.
root.Project1.Project2.Personnel1.Employee1.EID1
root.Project1.Project2.Personnel1.Employee1.Name1
root.Project3.Project2.Personnel2.Employee2.EID1
root.Project3.Project2.Personnel2.Employee2.Name2
Name1 = Name2
XTFD11
This example, of course, is quite simple. However, the chase algorithm of the relational database theory [7]
also provides a hint on solving this implication problem. Basically, to test if a set X of XTFDs implies an
XTFD W, we set up a tableau T for W’s hypothesis and run the XTFDs in X on T. Then, X implies W if
and only if the two variables of W’s conclusion are made equal during the chase. We shall report more refined
results in a future paper.
6.4. Normalization
Normalization is a well-studied topic in the relational database theory. Both [1] and [14] provide normal forms based on their FD proposals. By Theorem 1, it is quite straightforward to define a normal
form based on XTFDs. The idea is to make sure the two elements ei and ej of Theorem 1 are actually
the same element. This approach is also used for XNF’s definition in [1]. An example can illustrate what
we mean by this. Given a DTD or an XML schema, for any valid XML document Doc, if XTFD1 implies
XTFD12, then Doc will have no data redundancy with respect to XTFD1, where XTFD12 is presented
below.
W.Y. Mok / Data & Knowledge Engineering 60 (2007) 494–510
509
Employee1.EID1
Employee1.Name1
Employee2.EID1
Employee2.Name2
Employee1 = Employee2
XTFD12
Of course, such an implication problem must consult the underlying DTD or the XML schema.
7. Conclusions
The contributions of this paper are summarized as follows:
• We have defined XTFDs—a new constraint for XML documents that can specify FDs in XML documents.
XTFDs are based on simple concepts like variables and functions.
• We have given a necessary and sufficient condition for an XTFD to cause data redundancy in XML
documents.
• We have presented two procedures—Attribute Rule and Text String Rule—that can be repeatedly applied
in an XML document to remove data redundancy caused by XTFDs.
• We have proved that if an XML document has data redundancy with respect to an FD of [1], it has data
redundancy with respect to an XTFD and show by example that the converse of this statement is not true.
Consequently, if an XML document does not have data redundancy with respect to XTFDs, it will not
have data redundancy with respect to the FDs of [1] either.
Acknowledgements
W.Y. Mok would like to thank the anonymous reviewers for their helpful suggestions. He was supported in
part by the Richard A. Witmondt Faculty Fellowship and a UAH Research Mini-Grant.
References
[1] Marcelo Arenas, Leonid Libkin, A normal form for XML documents, ACM Trans. Database Syst. 29 (1) (2004) 195–232.
[2] Peter Buneman, Susan B. Davidson, Wenfei Fan, Carmen S. Hara, Wang Chiew Tan, Reasoning about keys for XML, Inf. Syst. 28
(8) (2003) 1037–1063.
[3] Patrick Carey, New Perspective on XML-Comprehensive, Course Technology, 2004.
[4] Sven Hartmann, Sebastian Link, More functional dependencies for XML, in: ADBIS, vol. 2798, Lecture Notes in Computer Science,
2003, pp. 355–369.
[5] Sven Hartmann, Sebastian Link, Multi-valued dependencies in the presence of lists, in: PODS, 2004, pp. 330–341.
[6] Mong-Li Lee, Tok Wang Ling, Wai Lup Low, Designing functional dependencies for XML, in: EDBT, vol. 2287, Lecture Notes in
Computer Science, 2002, pp. 124–141.
[7] David Maier, The Theory of Relational Databases, Computer Science Press, 1983.
[8] Wai Yin Mok, XTFDs: FDs for XML documents, in: AMCIS, 2005, pp. 3054–3058.
[9] Wai Yin Mok, Yiu-Kai Ng, David W. Embley, A normal form for precisely characterizing redundancy in nested relations, ACM
Trans. Database Syst. 21 (1) (1996) 77–106.
[10] Fereidoon Sadri, Jeffrey D. Ullman, Template dependencies: a large class of dependencies in relational databases and its complete
axiomatization, J. ACM 29 (2) (1982) 363–372.
[11] Lawrence V. Saxton, Xiqun Tang, Tree multivalued dependencies for XML datasets, in: WAIM, vol. 3129, Lecture Notes in
Computer Science, 2004, pp. 357–367.
[12] Millist W. Vincent, Jixue Liu, Multivalued dependencies and a 4NF for XML, in: CAiSE, vol. 2681, Lecture Notes in Computer
Science, 2003, pp. 14–29.
[13] Millist W. Vincent, Jixue Liu, Chengfei Liu, A redundancy free 4NF for XML, in: Xsym, vol. 2824, Lecture Notes in Computer
Science, 2003, pp. 254–266.
[14] Millist W. Vincent, Jixue Liu, Chengfei Liu, Strong functional dependencies and their application to normal forms in XML, ACM
Trans. Database Syst. 29 (3) (2004) 445–462.
[15] W3C. XML path language (XPath) version 1.0. November 1999. Available from: <http://www.w3.org/TR/xpath>.
510
W.Y. Mok / Data & Knowledge Engineering 60 (2007) 494–510
[16] W3C. Extensible markup language (XML) 1.0, third edition, February 2004. Available from: <http://www.w3.org/TR/REC-xml/>.
[17] W3C. XML schema part 0: Primer, second edition, October 2004. Available from: <http://www.w3.org/TR/xmlschema-0/>.
Wai Yin Mok is an Associate Professor of Information Systems at the University of Alabama in Huntsville. He
received his B.S., M.S., and Ph.D. degrees in Computer Science from Brigham Young University in 1990, 1992,
and 1996 respectively. His papers appear in ACM Transactions on Database Systems, IEEE Transactions on
Knowledge and Data Engineering, Data & Knowledge Engineering, Decision Support Systems, Information Processing Letters, and Journal of Database Management. He is currently on the editorial review boards of Journal of
Database Management and Informing Science. He has been program committee members of several international
conferences including ER2003 and ER2006 and track chairs for several Information Systems Conferences.