REVIEW - DBGroup

We made some minor changes (some of them to meet the reviewer's requirements).
In particular:
Introduction:
- we moved figure 1 to section 1
- page 3, annotations of the GVV: we extracted the definition of annotation
from the bullet list
Section 1
- section 1.3: the first step of the ontology development --> the first step
of the ontology building
- section 1.5: we introduced footnote 5, we changed the explanation of
schema-derived relationships (pg. 9), and we changed the representation of
the relationships produced in the example (pg. 9)
Section 3
- Supporting the evolution of an ontology represents a challenging issue (to
be faced). --> Supporting the evolution of an ontology represents a
challenging issue to be faced.
The following pages include detailed answers to the reviewer.
REVIEW
******************************
Reviewer: 3
A Reader Interest
1. Which category describes this manuscript?
Application
2. How relevant is this manuscript to the readers of this periodical?
Please explain your rating.
Relevant
B.
1. Please summarize what you view as the key point(s) of the manuscript and
the importance of this content
to the readers of this periodical.
The paper presents an approach to use annotations of local database schemas
in conjunction with a common thesaurus to generate a global virtual view along
with associated annotations and extension of a built up ontology by addition of
another source.
2. Is the manuscript technically sound? Please explain your answer.
Appears to be - but didn't check completely
C. Presentation
1. Are the title, abstract, and keywords appropriate? Please comment.
In the revised paper, they are more appropriate.
2. Does the manuscript contain sufficient and appropriate reference? Please
comment.
Important references are missing: more references are needed
3.Does the introduction state the objective of the manuscript in terms that
encourage the reader to read on?
Please explain your answer.
Yes. Better than before
4. How would you rate the organization of the manuscript?
Is it focused? Is the length appropriate?
Satisfactory
5. Please rate and comment on the readability of this manuscript.
Easy to Read
Section II. Summary and Recommendation
A. Evaluation
Fair
B. Recommendation
Please make your recommendation and explain your decision.
I would recommend a revision of the paper addressing the issues mentioned
below.
Section III. Detailed Comments
A. Public Comments (these will be made available to the author)
The authors have described a high level process of creating a virtual view
by using annotations in conjunction with a lexical thesaurus. However
important details appear to be missing:
- In the case of annotations of attributes, what about the annotations of
the values of those attributes... how will the common thesaurus help in this
case?
In your response you have mentioned that you focus only on the element
names and not the values. It is immaterial whether you have covered it in
another paper or not. The point here being, that annotations of the names and
the values will give you better help in understanding the domains and ranges of
the classes/attributes and increase the quality of the schemas generated.
If you do not believe that much value is added, please give convincing
arguments to the same.
We agree with your opinion: analyzing the values of attributes helps in
understanding the domains and ranges of the classes/attributes. For example,
the domains and ranges of attributes are properly specified in the schema of
relational databases, and these descriptions are translated into the internal
ODLI3 language and taken into account in the integration process.
We assume that the annotation phase is carried out by a source expert. For this
reason, we are considering moving the annotation phase from "a global level" to
a "local level", associating it with the wrapping phase. In this way, when a new
source has to be added, a wrapper has to be put in place to manage the source
and an expert has to annotate the source.
Preliminary work on this topic will be published in the International Journal
of Web Engineering and Technology (IJWET), ISSN 1476-1289.
Moreover, in that particular case, since we consider only HTML sources that
are translated by “commercial” wrappers into XML and DTD files (which are
directly managed by our wrapper), ranges and domains are not relevant, because
DTDs do not carry this kind of information.
Note: add the discussion about the domain expert and the annotation at wrapper
level (IJWET).
- Local schema constraints such as key and integrity constraints also can
play an important role in the process. This has not been explored.
You have explained it in the response. Please include it in the paper, it
will add to the completeness of the discussion.
DONE (see section 1.5).
- The Common Thesaurus generation process needs to be described in more
detail:
- How are the schema relationships analyzed and derived?
Your response discussed the use of primary and foreign keys to deduce
BT/NT relationships.
Actually you can deduce subclass of relationships, much stronger than
BT/NT. Please include that discussion in your paper.
We included in section 1.5 a discussion about schema-derived relationships in
XML data files.
Your observation about primary and foreign keys is correct, but it concerns
relational databases, which are not the focus of our paper. For this reason we
preferred to include this discussion as a footnote.
Regarding subclass-of relationships being much stronger than BT/NT: other
papers related to our integration methodology take into account both
intensional relationships (terminological relationships, with no implications
on the extension of the classes) and extensional relationships (with
implications on the extension of the classes). In particular, primary and
foreign keys produce BT/NT relationships that are both intensional and
extensional.
For more details see [5] and:
D. Beneventano, S. Bergamaschi, F. Mandreoli: "Extensional Knowledge for
semantic query optimization in a mediator based system", International
Workshop on Foundations of Models for Information Integration (FMII-2001),
Viterbo, Italy, 16-18 September 2001.
For the purposes of this paper the distinction is not relevant; for this reason
we preferred to omit it and considered only intensional relationships, which
the paper simply calls relationships.
- How are DLs used to infer new relationships? Do you interpret
hypernymy/hyponymy using the subclass relationship? Does it not generate
spurious relationships?
Seems to me that there are two sources of generating the BT/NT
relationships. One is the intra-schema key constraints in which case they can be
represented using the DL subsumption operation.
The other source is the lexical ontology (WordNet) itself. You have not
displayed any examples of it, but there could be BT/NT relationships thrown up
which are NOT subclass of relationships.
Do you have approaches of avoiding this happening manually? What impact
will it have on the quality of the GVV in the above case?
You need to discuss these issues clearly and convincingly.
Some relationships are derived directly from the lexical ontology (WordNet).
For example (see section 1.4), since the meaning assigned to Article is a
hyponym of the meaning assigned to Publication, the tool derives the following
lexicon-derived relationship:
UNI.Article NT CS.Publication
This relationship is inserted in the Common Thesaurus and participates in the
clustering phase even if it is not a subclass relationship; in other words,
lexicon-derived relationships are established independently of the structures
of the classes involved. In the clustering phase, the tool places the classes
UNI.Article and CS.Publication in the same cluster, and the generated Global
Class (represented in table 4) takes into account their different structures by
means of the mapping tables.
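To make the derivation of a lexicon-derived relationship concrete, the
following is a minimal sketch, not the tool's actual implementation. It assumes
that each schema element is annotated with a single meaning and that the
lexical ontology is reduced to a hand-written table of direct hypernyms. The
names annotations, hypernyms, is_hyponym and lexicon_derived are hypothetical
and do not come from the paper or from WordNet.

```python
# Hypothetical annotations: schema element -> meaning chosen by the annotator.
annotations = {
    "UNI.Article": "article",
    "CS.Publication": "publication",
}

# Hand-written stand-in for the lexical ontology (not real WordNet data):
# meaning -> set of its direct hypernyms.
hypernyms = {
    "article": {"publication"},
}


def is_hyponym(meaning, other, hypernyms):
    """Return True if `other` is reachable from `meaning` by following hypernyms."""
    stack, seen = [meaning], set()
    while stack:
        for broader in hypernyms.get(stack.pop(), ()):
            if broader == other:
                return True
            if broader not in seen:
                seen.add(broader)
                stack.append(broader)
    return False


def lexicon_derived(annotations, hypernyms):
    """Derive (a, "NT", b) whenever the meaning of a is a hyponym of the meaning of b.

    Only the annotations are inspected; the class structures play no role.
    """
    return sorted(
        (a, "NT", b)
        for a, meaning_a in annotations.items()
        for b, meaning_b in annotations.items()
        if a != b and is_hyponym(meaning_a, meaning_b, hypernyms)
    )


print(lexicon_derived(annotations, hypernyms))
# [('UNI.Article', 'NT', 'CS.Publication')]
```

The sketch looks only at the meanings, which mirrors the point above that these
relationships are established independently of the class structures.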
As we showed in [5], ODLI3 relationships are translated into OLCD descriptions
in order to perform the inference tasks typical of Description Logics. New
relationships are inferred by using the OLCD subsumption algorithm.
A further phase, relationship validation, exploits OLCD to validate the
relationships between attributes in the Common Thesaurus and to delete spurious
relationships. The validation is based on the compatibility of the domains
associated with the attributes, and it distinguishes valid from invalid
relationships.
- In the GVV generation process, what if there are multiple BT terms that
are not comparable to each other (assuming a lattice structure of the thesaurus)
Your response indicates that you chose a union of the BTs. What you need
to do is to consider the least upper bound (or most common ancestor). Explain
why choosing the union works well in most cases.
Our previous answer,
"If they are not comparable BT terms that are associated to a global class, we
consider the union of these terms (see example of global class GC1 in section
2.1)"
referred to the GCB definition in section 2.1.
In defining GCB (in order to semi-automatically associate an annotation with
the global class GC) we did not consider the least upper bound, since we want
to consider the set of "broadest" local classes that belong to GC.
Let us consider the following example:
Local Classes = {L1, L2, L3, L4}
Common Thesaurus = {(L2 NT L1), (L3 NT L2), (L4 NT L2)}
Global Classes:
GC1 = {L1,L2}
GC2 = {L3,L4}
In this case, the least upper bound of GC2 = {L3, L4} is L2, but we cannot
use the L2 annotation, because L2 belongs to a different Global Class
(GC1).
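The example can be checked with a short sketch. It is an illustration under
stated assumptions, not the definition given in section 2.1: a local class is
taken to be "broadest" in a global class when no other member of the same
global class is broader than it, and NT is followed transitively. The function
names ancestors, broadest_members and least_upper_bounds are hypothetical.

```python
def ancestors(cls, nt_pairs):
    """Return every class broader than `cls`, following NT pairs transitively."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for narrower, broader in nt_pairs:
            if narrower == current and broader not in seen:
                seen.add(broader)
                stack.append(broader)
    return seen


def broadest_members(global_class, nt_pairs):
    """Members of the global class with no broader class in the same global class."""
    return {c for c in global_class if not (ancestors(c, nt_pairs) & global_class)}


def least_upper_bounds(global_class, nt_pairs):
    """Minimal classes that are equal to or broader than every member."""
    upper = set.intersection(*({c} | ancestors(c, nt_pairs) for c in global_class))
    return {u for u in upper
            if not any(u in ancestors(other, nt_pairs) for other in upper)}


# Each pair (X, Y) encodes "X NT Y", as in the Common Thesaurus above.
common_thesaurus = [("L2", "L1"), ("L3", "L2"), ("L4", "L2")]
gc1 = {"L1", "L2"}
gc2 = {"L3", "L4"}

print(sorted(broadest_members(gc1, common_thesaurus)))    # ['L1']
print(sorted(broadest_members(gc2, common_thesaurus)))    # ['L3', 'L4']
print(sorted(least_upper_bounds(gc2, common_thesaurus)))  # ['L2']
```

The least upper bound of GC2 is L2, which is not a member of GC2, while the
set of broadest members always stays inside the global class.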
************************************
End of Review