XML Schema Change Detection,
Versioning and Merging
Submitted by:
Abdullah Mohammad Baqasah
M.S. (Information Technology)
B.S. (Computer Science)
A thesis submitted in total fulfilment of the requirements for the degree of
Doctor of Philosophy
School of Engineering and Mathematical Science
College of Science, Health and Engineering
La Trobe University
Bundoora, Victoria, 3086
Australia
January 2015
Content
Content
List of Figures
List of Tables
Abstract
Statement of Authorship
Contributions and Thesis Outcomes
Acknowledgements
Chapter 1: Introduction
1.1. Background
1.1.1. XML Schema Popularity
1.1.2. XML Schema and its Components
1.2. Motivation
Chapter 2: Monitoring the Design of XML Schema Versions: A Survey
Chapter 3: Conclusion
References
List of Figures
Figure 1.1 XML Schema component diagram
Figure 2.1 XML Schema trees T1 and T2
List of Tables
Table 1.1 Schema changes
Abstract
The eXtensible Markup Language (XML) is a meta-language that is widely used to
provide a non-proprietary universal format for sharing hierarchical data across different
software systems and application domains. Specifications and standards used in these
domains are defined using the XML Schema Definition Language (XSD).
XML Schema standards tend to change over time for a multitude of reasons, such as
the introduction of new requirements and the correction of errors in the initial design.
Moreover, complex structures (e.g., complexType in XML Schema) are created but not fully
configured, leaving room for future extension or restriction. As a result, each standard ends up
with several versions of the same schema. In this context, managing the different versions of a
schema and the changes between them can be critical for the optimal functioning of XML-based
applications.
This thesis addresses three topics: XML Schema change detection, versioning, and
merging. It also addresses related issues, such as the appropriate design of the change model
(delta) used to store XSD changes, the storage technique for XSD versions, and the
methodology for inserting, retrieving, comparing, and merging XSD versions.
Statement of Authorship
Except where reference is made in the text of the thesis, this thesis contains no material
published elsewhere or extracted in whole or in part from a thesis submitted for the award of
any other degree or diploma.
No other person’s work has been used without due acknowledgment in the main text of
the thesis.
This thesis has not been submitted for the award of any degree or diploma in any other
tertiary institution.
I declare that the research in this thesis is my own original work, carried out during my
PhD candidature under the supervision of the members of the advisory panel, namely
Dr. Eric Pardede (main supervisor) and Prof. Wenny Rahayu (co-supervisor), except where
otherwise acknowledged in the text.
Abdullah Baqasah
Contributions and Thesis Outcomes
Journals
1. Baqasah, A., Pardede, E., Holubova, I., and Rahayu, W. (2015) XS-Diff: XML
Schema Change Detection Algorithm, International Journal of Web and Grid
Services (IJWGS)
2. Baqasah, A., Pardede, E., and Rahayu, W. (2015) Maintaining Schema Versions
Compatibility in Cloud Applications Collaborative Framework, World Wide Web
Journal (WWWJ)
Conferences
1. Baqasah, A., Pardede, E., Holubova, I., and Rahayu, W. (2013) On Change
Detection of XML Schemas, In 2013 IEEE 12th International Conference on
Trust, Security and Privacy in Computing and Communications
(TrustCom2013), pp. 974-982
2. Baqasah, A., Pardede, E., and Rahayu, W. (2014) XSM - A Tracking System for
XML Schema Versions, In 2014 IEEE 28th International Conference on
Advanced Information Networking and Applications (AINA2014), pp. 1081-1088
3. Baqasah, A., Pardede, E., and Rahayu, W. (2014) A New Approach for
Meaningful XML Schema Merging, In the 16th International Conference on
Information Integration and Web-based Applications & Services (iiWAS2014)
Acknowledgements
First of all, praise and glory be to Allah the Almighty God who gave me the strength to
complete this research. Next, my sincere thanks go to my first supervisor, Dr. Eric Pardede,
for his unwavering support throughout my PhD candidature, and for his insightful suggestions
in shaping this thesis. My special thanks also go to my second supervisor Prof. Wenny
Rahayu, for her expert guidance and extraordinary effort in ensuring that the thesis met
appropriate scholarly standards.
Chapter 1: Introduction
Section 1.1. of this chapter gives a general background.
With the number of applications, users, and corporations in various industries rising
every day, the amount of data received, stored, processed, and interchanged grows as well.
In addition to being stored in relational database systems, this data can be represented in the
eXtensible Markup Language (XML) format [W3C, 2008]. The structure of the exchanged XML
data can differ not only between corporations, but even among departments of a particular
corporation. For instance, if one corporation receives data from its partner, it has to convert the
data to its own format. This conversion usually incurs overheads in terms of time and
resources. These overheads can be minimised by creating unified standards for the
exchange.
1.1. Background
1.1.1. XML Schema Popularity
With the growing popularity of XML technology, combined with certain shortcomings of
DTDs, a large number of alternative schema languages have been proposed, such as RELAX NG
[Clark & Makoto, 2001], Schematron [Jelliffe, 2000], and XML Schema (XSD) [W3C, 2004a].
A study conducted by [Grijzenhout & Marx, 2013] to analyse the quality of the XML web shows
that 13,950 valid XML documents reference XSDs, compared to only 2,046 valid XML
documents that reference DTDs. The study also indicates that documents referencing an XSD
are more reliable than those referencing a DTD, because 86.3% of the downloadable XSDs
compile (i.e., all referenced schemas can be retrieved and are syntactically correct),
compared to 29.8% of the downloadable DTDs. Given this comparison, XML Schema appears
to be the most widely accepted schema language.
1.1.2. XML Schema and its Components
The XML Schema 1.0 specification was published as an official recommendation by
the World Wide Web Consortium (W3C) in 2001. The specification was then revised in a
second edition in 2004, and in 2012, version 1.1 of the specification became official. This new
version contains several significant improvements as well as many small changes. The
schema specification consists of three parts. For more information, see Figure 1.1.
Figure 1.1 XML Schema component diagram
1.2. Motivation
We justify the objective of the thesis by looking at different use cases where XML
Schema change control is required. These use cases, discussed by the W3C working group¹,
are based on real examples submitted by users of XML Schema. The analyses of the use cases
below describe how XML Schema validators behave when they receive different XML Schemas
and the instances corresponding to them. In this thesis, we aim to provide a new methodology
for better control of XML Schemas and their development process. Therefore, the use cases
are devised so as to examine different issues that one may come across when versioning,
differencing, and merging XML Schemas. Each use case corresponds to one or more of
these issues (i.e., differencing, versioning, or merging) and explains its
requirements with examples. For a list of example schema changes, see Table 1.1.
¹ Available at: http://www.w3.org/XML/2005/xsd-versioning-use-cases.html
Table 1.1 Schema changes
No. | Change type | Affected node/s | Change description
Changes from Vb to V1
1 | update | unit elements under value1 and value2 elements | Remove the built-in string data type from both unit elements.
2 | insert | Two local simple types of unit elements | Insert two local simple types with restrictions under the declaration of the two unit elements.
3 | insert | Two enumerations with value ‘mmHg’ | Insert one enumeration as a restriction facet to the two inserted simple types in the previous operation.
Changes from V1 to V2
4 | update | magnitude elements under value1 and value2 elements | Remove the built-in decimal data type from both magnitude elements.
5 | insert | Two local simple types of magnitude elements | Insert two local simple types with restrictions under the declaration of the two magnitude elements.
6 | insert | minInclusive and maxInclusive facets of the first magnitude element | Insert minInclusive and maxInclusive facets with values ‘90’ and ‘140’, respectively, to the inserted simple type of the first magnitude element.
7 | insert | minInclusive and maxInclusive facets of the second magnitude element | Insert minInclusive and maxInclusive facets with values ‘60’ and ‘90’, respectively, to the inserted simple type of the second magnitude element.
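To make the change types in Table 1.1 concrete, the following Python sketch compares two small schema fragments that mimic changes 4 to 6 for a single magnitude element. The fragments V1 and V2, the helper names (local, flatten, naive_diff), and the path notation are assumptions made for illustration only; they are not taken from the schemas used in the thesis, and the comparison is a naive path-based one rather than the change detection algorithm developed in this work.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical fragment before the change: magnitude uses the built-in decimal type.
V1 = f"""<xs:schema xmlns:xs="{XS}">
  <xs:element name="magnitude" type="xs:decimal"/>
</xs:schema>"""

# Hypothetical fragment after the change: the type attribute is removed and a
# local simple type with minInclusive/maxInclusive facets is inserted.
V2 = f"""<xs:schema xmlns:xs="{XS}">
  <xs:element name="magnitude">
    <xs:simpleType>
      <xs:restriction base="xs:decimal">
        <xs:minInclusive value="90"/>
        <xs:maxInclusive value="140"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>"""


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.split("}", 1)[-1]


def flatten(node, prefix=""):
    """Map each node to a path such as '/schema/element[magnitude]'.

    Simplification: siblings with the same tag and name share one path,
    so this only works for small fragments like the ones above.
    """
    label = local(node.tag)
    if "name" in node.attrib:
        label += f"[{node.attrib['name']}]"
    path = f"{prefix}/{label}"
    nodes = {path: dict(node.attrib)}
    for child in node:
        nodes.update(flatten(child, path))
    return nodes


def naive_diff(old_xsd, new_xsd):
    """Return (change type, path) pairs using the insert/update/delete vocabulary."""
    old = flatten(ET.fromstring(old_xsd))
    new = flatten(ET.fromstring(new_xsd))
    changes = []
    for path in old:
        if path not in new:
            changes.append(("delete", path))
        elif old[path] != new[path]:
            changes.append(("update", path))  # same node, different attributes
    for path in new:
        if path not in old:
            changes.append(("insert", path))
    return changes


for kind, path in naive_diff(V1, V2):
    print(kind, path)
```

For these two fragments the sketch reports one update on the magnitude declaration (its type attribute is removed) followed by inserts for the new simple type, its restriction, and the two facets. A path-based comparison of this kind cannot recognise moved or migrated nodes, which is one reason a dedicated change model is needed.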
Chapter 2: Monitoring the Design of XML Schema Versions: A Survey
In this chapter, we conduct a thorough literature review of prior research on
topics related to this thesis. The first section presents existing work on detecting XML
Schema changes. It investigates existing efforts on change detection for
hierarchically structured data, XML documents, and XML Schemas as a
basis for controlling XML Schema versions and storing deltas. It then discusses work related to
XML Schema evolution to identify changes that need to be maintained in the schema. The next
section discusses existing work on XML Schema versioning.
First, it studies existing DBMS tools that support versioning. Then, it reviews past research
efforts to version both XML documents and XML Schemas, followed by a discussion of related
work on XML Schema quality analysis as a subtask in the versioning process. In a further
section, we study the versioning functionality in terms of the
existing approaches to merging XML versions, and related issues such as conflict resolution
and patching. The aim of this survey is to investigate the current state of research on XML
Schema versioning and merging, and to highlight the issues that remain outstanding. Finally,
we conclude the chapter in its last section.
[Figure: two XML Schema trees, T1 and T2, with numbered nodes such as schema, pOrder, POType, USAddress, Items, and item. Legend: migrated/moved node, deleted node, inserted node, updated node, migration type (local-to-global, global-to-local).]
Figure 2.1 XML Schema trees T1 and T2
Chapter 3: Conclusion
This final chapter concludes the thesis with a summary of the research conducted and a
reflection on avenues for future research. The first part of the
chapter outlines the work detailed in the previous chapters. The second part turns
to suggestions on how research in the area of XML Schema version management,
differencing, and merging can be extended.
References
1.
2.
3.