Adaptive Reasoning for ScienceWeb

Literature Review and Dissertation Research Proposal
by
Hui Shi
Department of Computer Science
May, 2011
Table of Contents
1. Introduction .......................................................................... 4
 1.1 ScienceWeb ......................................................................... 4
 1.2 Proposed Work ...................................................................... 5
2. Objectives ............................................................................ 6
3. Literature Review ..................................................................... 7
 3.1 Related Areas ...................................................................... 7
  3.1.1 Digital Libraries ............................................................... 7
  3.1.2 Science-related Knowledge Bases ................................................. 7
  3.1.3 Ontologies ...................................................................... 8
  3.1.4 Sources of Information .......................................................... 9
 3.2 Thesis Areas ....................................................................... 9
  3.2.1 Description Logic ............................................................... 9
  3.2.2 Inference Methods .............................................................. 10
  3.2.3 Combination Method of Ontologies and Rules ..................................... 11
  3.2.4 Scalability of Semantic Store .................................................. 14
  3.2.5 Benchmarks ..................................................................... 17
  3.2.6 Summary of Major Reasoning Systems ............................................. 18
  3.2.7 Limitations of Existing Research ............................................... 19
4. Preliminary Work ..................................................................... 20
 4.1 Architecture of ScienceWeb ........................................................ 20
 4.2 Ontology Development and Synthetic Data Generator ................................. 23
  4.2.1 Ontology Development ........................................................... 23
  4.2.2 Synthetic Data Generator ....................................................... 24
 4.3 Preliminary Benchmark Study on LUBM and UOBM ...................................... 24
  4.3.1 Experiments on Two Classic Benchmarks .......................................... 24
  4.3.2 Summary ........................................................................ 26
 4.4 Performance Study ................................................................. 26
  4.4.1 Custom Rule Sets and Queries ................................................... 27
  4.4.2 Comparison on the Full Rule Sets ............................................... 27
  4.4.3 Comparison among Systems on Transitive Rule .................................... 29
  4.4.4 Summary ........................................................................ 30
 4.5 Formative Study for ScienceWeb .................................................... 30
5. Research Plan ........................................................................ 32
 5.1 Proposed Work: Adaptive Reasoning for ScienceWeb .................................. 32
  5.1.1 Architecture ................................................................... 32
  5.1.2 Adaptive Reasoning Mechanism ................................................... 38
  5.1.3 Knowledge Base ................................................................. 40
  5.1.4 Combination of DL and Rule-based Reasoners ..................................... 41
 5.2 Evaluation ........................................................................ 43
  5.2.1 Test Bed ....................................................................... 43
  5.2.2 Formative Evaluation ........................................................... 44
  5.2.3 Summative Evaluation ........................................................... 44
 5.3 Schedule .......................................................................... 46
6. Contributions ........................................................................ 46
Table of Figures
Figure 1 Architecture of ScienceWeb ...................................................................................................................... 20
Figure 2 Evolution of ScienceWeb ........................................................................................................................... 21
Figure 3 Class tree of research community ontology .............................................................................................. 23
Figure 4 Query processing time of query 1 for ScienceWeb dataset ...................................................................... 28
Figure 5 Setup time for transitive rule .................................................................................................................... 29
Figure 6 Query processing time after inference over transitive rule ...................................................................... 30
Figure 7 Architecture of an Adaptive Reasoning System ........................................................................................ 33
Table of Tables
Table 1 General comparison among ontology reasoning systems.......................................................................... 19
Table 2 Query performance on LUBM(0,1) ............................................................................................................. 25
Table 3 Query performance on UOBM (DL-1) ................................................................................ 26
Table 4 Completeness and soundness of query on UOBM (DL-1) ................................................. 26
Table 5 Caching ratios between processing time of single query and average processing time on ScienceWeb
ontology for query 1 ................................................................................................................................................ 29
Table 6 Comparison of different semantic reasoning systems ............................................................................... 30
Table 7 Comparison between two reasoning processes after changes ........................................... 37
Table 8 Schedule of tasks ........................................................................................................................................ 46
1. Introduction
Consider a potential chemistry Ph.D. student who is trying to find out which emerging areas have
good academic job prospects. What are the schools, and who are the professors, doing
groundbreaking research in such an area? What are the good funded research projects in this area? Consider
a faculty member who might ask, “Is my record good enough to be tenured at my school? At another
school?” Similarly, consider an NSF program manager who would like to identify emerging research areas
in mathematics that are not currently supported by NSF. Each of these people could
mine this information from the Web. However, doing so takes considerable effort and time, and even then
the information may be incomplete, may be partially incorrect, and would reflect an individual
perspective on qualitative judgments. Moreover, such individual efforts neither take advantage of nor
contribute to others’ efforts to reuse the data, the queries, and the methods used to find the data.
Qualitative descriptors such as “groundbreaking research in data mining” are likely to be accepted as
meaningful if they represent a consensus of an appropriate subset of the community at large. Once
accepted as meaningful, these descriptors can be realized in a system and made available for use by all
members of that community. For example, “groundbreaking” research for one segment of the
community could be work that results in many publications in refereed journals. For another segment it
could be research that leads to artifacts that make people more productive, where “more productive”
might mean spending less time finding papers in fields related to a given research problem.
Qualitative descriptors also evolve over time. For example, a community may later identify the degree
to which research is “transformative” as another factor in “good research”. In addition to qualitative
descriptors, useful queries may also be written using quantitative descriptors, for example: “What is the
ordered list of Ph.D. departments in CS when the ranking is the amount of research dollars spent in 2009?”
A number of projects (e.g., Libra [1, 2], Cimple [3], Arnetminer [4]) have built systems to capture
limited aspects of community knowledge and to respond to semantic queries. However, all of them lack
the level of community collaboration support required to build a knowledge base system that can
evolve over time, both in terms of the knowledge it represents and the semantics involved in
responding to qualitative questions. These systems are also homogeneous, in the sense that they harvest
data from one type of resource. A team at ODU is working on ScienceWeb [5, 6], which will combine
diverse resources, such as digital libraries of published papers, curricula vitae from the web, and agency
databases such as NSF’s research grant database, and which will use collaboration as the fundamental
approach to evolving its knowledge base.
1.1 ScienceWeb
In ScienceWeb the ODU team is developing a framework for a system that provides answers to
qualitative and quantitative queries of a large evolving knowledge base covering science research. The
system will support the community joining together, sharing the insights of its members, to evolve
• the type of data to be gathered and how to organize them,
• the methods for collecting the data, their sources, and the process of collecting and validating
  them, and
• the meaning of the qualitative descriptors and queries most needed and how they can be
  computationally realized.
ScienceWeb will need to scale to accommodate the substantial corpus of information about researchers,
their projects and their publications. It will need to accommodate the inherent heterogeneity of both its
information sources and of its user community. Finally, it must allow the semantic (qualitative)
descriptors to evolve with time as the knowledge of the community grows and the problems the
community researches change.
ScienceWeb will develop new tools, technologies, and a framework, allowing a community to: (a)
collaboratively develop and evolve its domain knowledge base, (b) collaboratively develop queries for
answering qualitative questions, and (c) collaboratively help in automatically harvesting and validating
information from different resources. In the long-term, this framework should prove useful in many
domains and different contexts.
1.2 Proposed Work
Collaboration is at the heart of the approach to building ScienceWeb. Such collaboration includes
building and evolving the knowledge base; building, evolving and reusing queries; and identifying and
specifying methods for harvesting raw information. The interaction between the system and people must be
effective enough to allow for collaborative development. Users are not willing to wait more than an
hour, and perhaps only a few minutes, for the response of the system when trying, for example,
to create a new query [7].
Reasoning over the knowledge base provides support for answering qualitative and quantitative
questions, and its scalability and efficiency greatly influence the response time of the system.
ScienceWeb is a system that collects various research-related information. The more complete the
knowledge base is, the more helpful the answers the system will provide. As the size of the knowledge base
increases, scalability becomes a challenge for the reasoning system, which will have to handle thousands
or even millions of reasoning items in the knowledge base. As users make changes to the basic descriptors
of the knowledge base, a fast enough response time (ranging from seconds to a few minutes) in the face of
changes is one of the core challenges for the reasoning system.
The goal of this thesis is to develop an adaptive reasoning architecture and design an adaptive reasoning
system whose scalability and efficiency meet the interaction requirements of the ScienceWeb
system when facing a large and evolving knowledge base.
2. Objectives
ScienceWeb is a platform where researchers, including faculty, Ph.D. students and program managers,
can work together collaboratively to get answers to their queries from a consensus point of view or from
their own specific point of view. It will develop new tools, technologies, and a framework, allowing a
community to: (a) collaboratively develop and evolve its domain knowledge base, (b) collaboratively
develop queries for answering qualitative questions, and (c) collaboratively help in automatically
harvesting and validating information from different resources.
As we are addressing in this thesis only a few of the challenging research issues ScienceWeb poses, we
make certain assumptions about the context for the problem we will research. We shall not address
the collaborative aspects of ScienceWeb, including collaborative querying, harvesting methods and
ontology evolution.
Instead, in this thesis we will research the issues involved in developing a reasoning architecture and
designing an adaptive reasoning system whose scalability and efficiency meet the
requirements of query answering in ScienceWeb. For evaluation purposes, we shall develop a base
query and rule set as well as instance data from a variety of sources. Specifically, the objectives are:
• Research issues and develop tools for the reasoning support of ScienceWeb to answer qualitative
  questions effectively:
  o Design a reasoning architecture that will allow real-time or near real-time inferencing.
  o Based on the architecture, design and develop a reasoning system that can adapt to
    changes in the query and rule sets as well as the ontology.
• Create a test bed from within the research community corpus of Computer Science that is
  sufficient for the evaluation of the adaptive reasoning system.
• Evaluate system performance when changes are made to rules, queries and the ontology:
  o Formative evaluation:
    ▪ Work with a research community to identify tool features, performance criteria,
      and knowledge base coverage needed for the reasoning system of ScienceWeb.
    ▪ For different components, evaluate various mechanisms that impact the
      effectiveness of the reasoning system as measured in response time.
  o Summative evaluation:
    ▪ Evaluate the system’s effectiveness in answering qualitative queries in the face of
      evolutionary changes to the ontology, custom rules, data instances and queries.
    ▪ Evaluate the scalability of the system in terms of the number of data instances.
    ▪ Evaluate the completeness and soundness of query results.
3. Literature Review
In this section, we describe the background of the ScienceWeb project, which provides the context,
and of adaptive reasoning for ScienceWeb, which is the subject of this thesis. The Related Areas section
gives a general view of work related to the project as a whole, including what other research has and
has not achieved. The Thesis Areas section focuses on the areas directly related to the work in this thesis,
including background knowledge on reasoning and what must be improved beyond existing
research to support ScienceWeb.
3.1 Related Areas
3.1.1 Digital Libraries
A significant class of digital libraries for scientific publications consists of the authoritative collections
provided by professional societies such as the IEEE [8] or ACM [9] and by commercial publishing houses
such as Springer-Verlag [10]. These libraries are limited in scope, being focused by design upon the
publications of the sponsoring organization. These organizations maintain complete digital libraries for
all their publications, but full access is typically restricted to paying members of the societies or to
subscribers. In addition to traditional search features, they often provide special features such as the
references cited by a paper, exploration of the co-author relation, and sometimes citation counts.
A contrasting class of digital libraries consists of those systems that obtain their content from the web.
The stored content is typically a metadata record of a publication with references to the location of the
actual paper. Google Scholar [11] is designed to provide a simple way to search the scholarly literature.
One can search with keywords or with common metadata of an article, such as author, publication, date, and
area. It provides a citation count, related papers and a listing of versions available for download. For
articles in professional digital libraries, it provides metadata and a link to that digital library. DBLP [12,
13] provides bibliographic information on major computer science conference proceedings and journals.
DBLP provides a faceted search on publication years, publication types, venues and authors. It also
provides the functions of “Automatic Phrases” and “Syntactic Query Expansion” to generate more
patterns and forms for entering keywords and key phrases. CiteSeerX [14, 15], Libra [1, 2] and
getCITED [16] are other systems that maintain and update their collections by continually crawling the
web. All these systems have different levels of coverage. For instance, when searching for “Smalltalk-80: The
Language and Its Implementation”, CiteSeerX does not include the book in its collection, Google Scholar
returns 979 as its citation count, Libra returns a citation count of 914, while getCITED returns 1. In
some of these systems, community members can edit and correct harvested information.
3.1.2 Science-related Knowledge Bases
There are a large number of knowledge bases [3, 17-23] for a variety of domains. As an example
consider Cimple [3], which is being used to develop DBLife, a community portal for database
researchers [19]. In the Cimple project, researchers developed a knowledge base that draws its basic
information from unstructured data on the web. It then analyzes the information and discovers relations
such as co-authorship, and it concentrates on maintaining a consistent state of the information, which it
measures using precision and recall. It uses members of the community to correct and refine extracted
information.
Another example of a sophisticated knowledge base in Computer Science is Arnetminer [4], which
provides profiles of researchers, associations between researchers, publications, co-author relationships,
courses, and topic browsing. It has the capability to rank research and papers. It is a centrally developed
system with fixed queries and schemas for data, but the knowledge base is continually growing as new
data become available.
The Mathematics Genealogy project [24] is a knowledge base that defines the transitive relation of
advisor – PhD student for the domain of Mathematics. It is maintained by a few dedicated users with
few automated tools. However, it has obtained significant coverage within its domain and has even
expanded into Computer Science. Systems in this category are typically sustained by a few people, but
most of their users are also contributors.
A number of these systems have developed interesting ways of harvesting information (converting
unstructured or semi-structured information into structured form), of turning natural language questions
into formal queries, and of enhancing precision, recall and efficiency when returning answers.
3.1.3 Ontologies
There has been increasing effort in organizing web information within knowledge bases [25-28] using
ontologies. Ontologies can model real-world situations, can incorporate semantics that can be used to
detect conflicts and resolve inconsistencies, and can be used together with a reasoning engine to infer
new relations or prove statements. The DBpedia [25] project focuses on converting Wikipedia content
into structured knowledge. YAGO [26] is automatically derived from Wikipedia [29] and WordNet [30].
SwetoDblp is a large ontology derived from bibliographic data of computer science publications from
DBLP. RKB Explorer [28] harvests information about people, projects, publications and research areas from
different types of resources. Ontology-driven applications also include data mining [31-34], software
engineering [35], general natural language query systems [36, 37], and systems that help users build
formal semantic queries [25, 38].
A number of tools exist for collaboratively designing and developing ontologies: CO-Protégé [39],
Ontolingua [40], Ontosaurus [41], OntoEdit [42], WebODE [43], Kaon2 [44], OilEd [45]. With the
exception of OilEd and Protégé, these tools provide support for collaboration and conflict resolution [46].
Changes introduced to ontologies can result in structural and semantic inconsistencies, the resolution of
which remains an open problem. Migration of instance data from one version of an ontology to the next
is a critical issue in the context of evolving ontologies. Some ontology changes, such as creating new
classes or properties, do not affect instances. When these changes occur, instances that were valid in the
old version are still valid in the new version. However, other changes may potentially affect instances.
In this latter case, some strategies can include having tools take their best guess as to how instances
should be transformed, allowing users to specify what to do for a specific class of changes, or flagging
instances that might be invalidated by changes.
3.1.4 Sources of Information
Many of the systems already discussed can serve as a source of information for the knowledge base we
intend to build. Open digital libraries such as DBLP can provide publication records for computer
science and, as DBLP provides detailed citation records, can be used to build records of faculty
and researchers and their affiliations in the knowledge base. Information on funded research can be
obtained from federal agencies in the United States, such as NSF and NIH. NIH also requires all funded
projects to publish their papers in PubMed [47], which can be used as well.
One of the greatest sources of information will be the curricula vitae researchers typically publish on
their websites. In computer science, a very high fraction of researchers publish all relevant
information about their research in their curricula vitae, which are generally available to any crawler.
More importantly, most researchers regularly update these curricula vitae.
One of the few knowledge bases that provide information on the quality of an object is the study by the
National Academy of Sciences [48], which ranks various departments and universities with respect to a
number of criteria such as research output and student funding. It is interesting to note that, in a change
from a similar study done ten years ago, this report allows users to define their own measure of quality
and dynamically obtain rankings according to that measure.
3.2 Thesis Areas
3.2.1 Description Logic
Research in the field of knowledge representation and reasoning has usually focused on methods for
providing high-level descriptions of the world that can be effectively used to build knowledge-based
systems, systems that are able to derive implicit consequences of their explicitly represented knowledge.
Thus, the approach to knowledge representation is crucial to the ability to find inferred consequences
[49]. Early knowledge representation methods such as frames [50] and semantic networks [51] lack a
well-defined syntax and a formal, unambiguous semantics, which are essential elements of a qualified
knowledge representation. Description Logic was therefore introduced into knowledge representation
systems to improve their expressive power. Description Logic [52] is a family of logic-based knowledge
representation formalisms designed to represent the terminological knowledge of an application
domain [53].
There are many varieties of Description Logics, distinguished by the operators they allow. OWL, a
standard ontology language for the Semantic Web, is a DL-based language. The adoption of a DL-based
language as an ontology language marks an important step in the application of DLs to domains
such as natural language processing, databases and biomedical ontologies. The design of OWL is based
on the SH family of DLs: OWL 2 is based on the expressive power of SROIQ(D), OWL-DL on
SHOIN(D), and OWL-Lite on SHIF(D) [54]. The expressive power of a logic is encoded in its label
using the following letters [54]:

• F: Functional properties.
• E: Full existential qualification (existential restrictions that have fillers other than owl:Thing).
• U: Concept union.
• C: Complex concept negation.
• S: An abbreviation for ALC with transitive roles.
• H: Role hierarchy (subproperties: rdfs:subPropertyOf).
• R: Limited complex role inclusion axioms; reflexivity and irreflexivity; role disjointness.
• O: Nominals (enumerated classes of object value restrictions: owl:oneOf, owl:hasValue).
• I: Inverse properties.
• N: Cardinality restrictions (owl:cardinality, owl:maxCardinality).
• Q: Qualified cardinality restrictions (available in OWL 2; cardinality restrictions that have fillers
  other than owl:Thing).
• (D): Use of datatype properties, data values or data types.
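As a small illustration, a few of these constructors written in DL syntax; the concept and role names are hypothetical examples from the research-community domain, not ScienceWeb's actual ontology:

```latex
\begin{align*}
\mathsf{Trans}(\mathit{subFieldOf})                          && \text{transitive role (the S in SHOIN)}\\
\mathit{hasAdvisor} \sqsubseteq \mathit{knows}               && \text{role hierarchy (H)}\\
\mathit{advises} \equiv \mathit{hasAdvisor}^{-}              && \text{inverse property (I)}\\
\mathit{Person} \sqsubseteq\ {\leq} 1\ \mathit{hasBirthYear} && \text{cardinality restriction (N)}
\end{align*}
```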
A DL knowledge base consists of two parts: intensional knowledge (TBox), which
represents general knowledge about a domain, and extensional knowledge (ABox), which represents
a specific state of affairs. The “T” in “TBox” denotes the terminology or taxonomy, which is built
from the properties of concepts and the subsumption relationships among them. The “A” in “ABox”
denotes assertional knowledge about the individuals of the specific domain [49, 55].
3.2.2 Inference Methods
DL Reasoning Algorithm
The main reasoning tasks for DL reasoners are verifying KB consistency and checking concept satisfiability,
concept subsumption and concept instances [53]. There are several families of algorithms for these
reasoning tasks in DLs [53]. First are structural subsumption algorithms [56, 57], which normalize the
concept descriptions and then recursively compare the syntactic structure of the normalized descriptions;
although efficient, they are incomplete for expressive DLs. Second is the resolution-based approach [58-62],
which translates DLs into first-order predicate logic and then applies appropriate first-order resolution
provers. Third is the automata-based approach [63-65], which is often more convenient than tableau-based
approaches for showing ExpTime complexity upper bounds. Fourth is the tableau-based approach [66],
currently the most widely used reasoning algorithm for DLs. Previously used for modal logics [67], it was
first applied to DLs by Schmidt-Schauß and Smolka [68] in 1991. This approach can deal with large
knowledge bases from applications and is complete for expressive DLs. Furthermore, highly optimized
tableau-based algorithms [69] have been proposed as the basis for the new Web Ontology Language OWL 2.
Inference Methods in First Order Logic
There are three kinds of inference methods in First Order Logic (FOL): forward chaining, backward
chaining and resolution [70].

Forward chaining is an example of data-driven reasoning: it starts with the known data in the KB
and applies modus ponens in the forward direction, deriving and adding new consequences until no
more inferences can be made. Rete [71] is a well-known forward-chaining algorithm. Backward
chaining is an example of goal-driven reasoning: it starts with a goal, matches it against the consequents
of rules, and recursively seeks the data that satisfy the corresponding antecedents. Resolution is a complete
theorem-proving algorithm that proves a theorem by showing that its negation contradicts the
premises [70].
Materialization and Query-rewriting
Materialization and Query-rewriting are the most popular inference strategies adopted by almost all of
the state of the art ontology reasoning systems. Materialization means pre-computation and storage of
inferred truths in a knowledge base, which is always executed during loading the data and combined
with forward-chaining techniques. Query-rewriting means expanding the queries, which is always
executed during answering the queries and combine with backward-chaining techniques.
Materialization and forward chaining are suitable for frequent, expensive computation of answers over data that are relatively static. Owlim [72, 73], Oracle 11g [74], Minerva [75] and DLDB-OWL [76] all perform materialization while loading the data. Materialization permits rapid answers to queries because all possible inferences have already been carried out. However, any change in the ontology, instances, or custom rules requires complete re-processing before any new query can be answered. Furthermore, materializing a large knowledge base may produce a large amount of redundant data, which may slow subsequent loading and querying.
Query-rewriting and backward chaining are suitable for efficient computation of answers over data that are dynamic and queries that are infrequent. Virtuoso [77], AllegroGraph [78] and Sher [79] reason dynamically only when necessary. This approach improves the performance of answering new queries after data changes and simplifies the maintenance of storage. However, frequently repeated queries under query-rewriting require repeated reasoning, which is time-consuming compared with pure lookup under materialization. Hstar [80] attempts to improve performance by materializing inferred data only partially instead of completely.
A hybrid approach may give the best of both worlds. Jena [81] supports three ways of inferencing: forward chaining, backward chaining and a hybrid of the two. A reasonable adaptive hybrid approach would combine the strong points of both patterns for better performance under changing circumstances.
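The two strategies can be contrasted with a minimal Python sketch of a single RDFS-style entailment, type propagation along subClassOf. The class hierarchy and the instance data here are invented for illustration:

```python
# Toy class hierarchy and asserted types (illustrative data).
sub_class_of = {"Professor": "Faculty", "Faculty": "Person"}
types = {("alice", "Professor"), ("bob", "Faculty")}

def superclasses(c):
    """All transitive superclasses of c, including c itself."""
    out = [c]
    while c in sub_class_of:
        c = sub_class_of[c]
        out.append(c)
    return out

# Materialization (forward chaining): at load time, store every
# inferred type; queries then reduce to pure lookup.
materialized = {(i, s) for (i, c) in types for s in superclasses(c)}

def query_materialized(cls):
    return {i for (i, c) in materialized if c == cls}

# Query-rewriting (backward chaining): at query time, expand the query
# to every subclass of the requested class; the stored data stay as asserted.
def query_rewritten(cls):
    subs = {c for c in list(sub_class_of) + [cls]
            if cls in superclasses(c)}
    return {i for (i, c) in types if c in subs}

print(sorted(query_materialized("Person")))  # ['alice', 'bob']
print(sorted(query_rewritten("Person")))     # ['alice', 'bob']
```

Both calls return the same answers, but they pay the inference cost at different times: `materialized` must be rebuilt whenever `sub_class_of` or `types` change, while `query_rewritten` re-derives the expansion on every call, which is the trade-off discussed above.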
3.2.3 Combination Method of Ontologies and Rules
Classification of combination
Ontologies are used to represent a domain of interest by defining concepts, recording relations among
them and inserting individuals. Rules are mainly based on subsets of First Order Logic (FOL) and possible extensions of them. There is a growing trend to integrate ontologies and rules in the Semantic Web.
The integration of ontologies and rules is clearly reflected in Tim Berners-Lee’s “Semantic Web Stack” diagram [82], which illustrates the hierarchy of languages that compose the architecture of the Semantic Web. There are two main reasons to integrate ontologies and rules. The first is expressive power: rules can express constructs, such as composite properties, that complement ontology languages. The second is reasoning techniques: the considerable existing research on effective reasoning support for rules provides a solid basis for ontology inference.
Antoniou et al. classify the integration of ontologies and rules according to the degree of integration [83], distinguishing between the hybrid approach and the homogeneous approach.
In the homogeneous approach, the ontologies and rules are treated as a new logical language. There is
no separation between predicates of an ontology and predicates of rules. Rules in this new language may
be used to define classes and properties of an ontology. The inference in this approach is based on the
inference of the new logical language. Examples of the homogeneous approach include [84-88]; some of these, such as DL+log [86], SWRL [87] and DLP [88], are very popular in applications.
In the hybrid approach, there is a strict separation between predicates of ontologies and predicates of rules. The predicates of ontologies are only used as constraints in rule antecedents, so the execution of rules consults the relationships defined in the ontology. Ontology reasoning and rule execution are performed by two different reasoners, and inferencing in the hybrid approach is based on the interaction between the ontology reasoner and the rule reasoner. There are many examples of the hybrid approach: AL-Log [89, 90], HD-rules [91], NLP-DL [92], CARIN [93], dl-programs [94], r-hybrid KBs [95], and DatalogDL [96].
RDF-entailment and RDFS-entailment are both examples of vocabulary entailment that capture the semantics of RDF and RDFS. Entailment rules are inference patterns that define which RDF triples [97] can be inferred from the existing knowledge [98]. Horst [99] defined “R-entailment” as an entailment over an RDF graph based on a set of rules. It is an extension of RDFS entailment that extends the meta-modeling capabilities of RDFS. Horst defined pD* semantics in [100] as a weakened variant of OWL Full; in [101] pD* semantics were extended to apply to a larger subset of the OWL vocabulary. R-entailment incorporates pD* entailment. Owlim [72, 73] is a practical reasoner based on OWL Horst [99-101]. Rule-based OWL reasoners are implemented on top of the entailment rules, and the interpretation of these entailment rules relies on the rule engine.
So far, we have introduced three kinds of semantic reasoners. The first category is the DL reasoners, such as Racer [102], Pellet [103, 104], FaCT++ [105] and HermiT [106], whose inference is implemented by the popular tableau algorithm or by hypertableau [106]. The second category contains reasoners based on the homogeneous approach, such as the rule-based reasoners Owlim [72, 73], OWLJessKB [107], BaseVISor [108], Jena [81, 109], Bossam [110] and SweetRules [111], whose inference is based on the implementation of entailments in a rule engine. For example, Jena relies on a forward-chaining RETE engine and a tabled Datalog engine [112]; Owlim [72, 73] cannot work without the support of TRREE; and BaseVISor [108] and Bossam [110] are both RETE-based reasoners. Beyond rule-based reasoners, the Datalog-driven reasoner Kaon2 [113, 114], which reduces a SHIQ(D) KB to a disjunctive Datalog program, and the F-Logic based reasoner F-OWL [115] are also examples of the homogeneous approach. The third category contains hybrid reasoners based on the hybrid approach, such as AL-Log [89, 90], HD-rules [91], NLP-DL [92], CARIN [93], dl-programs [94], r-hybrid KBs [95], and DatalogDL [96], whose inference is implemented by integrating an existing DL reasoner with an existing rule engine.
DL reasoners perform well on complex TBox reasoning, but they lack the scalable query-answering capabilities that are necessary in applications with large ABoxes [116-118]. Rule-based reasoners are based on the implementation of entailments in a rule engine. They have limited TBox reasoning completeness, because they may not implement every entailment rule or they may trade completeness for performance. In rule-based reasoners, TBox and ABox entailment rules are fired against the ontological knowledge. Hybrid reasoners that integrate a DL reasoner with an existing rule engine can combine the strong points of both sides.
There are two examples of hybrid reasoners. First, DLE [119, 120] is a mixed framework combining the DL reasoning of a DL reasoner with the forward-chaining reasoning of a rule-based reasoner. By using efficient DL algorithms, DLE disengages the manipulation of the TBox semantics from any incomplete entailment-based approach. DLE also achieves faster application of the ABox-related entailments and efficient memory usage [119]. However, there are three limitations in this framework:
1. It implements the materialization approach against the existing RDF graph based on the forward-chaining engine. With fast-changing ontologies and rules, DLE has to perform the materialization all over again, which slows some specific querying and answering.
2. It does not implement a storage schema for more scalable ABox reasoning when the size of the ABox exceeds that of main memory. It has only been tested in main memory, so its performance is limited by the scalability of rule-based reasoners.
3. It relies entirely on the performance of the rule-based reasoner in firing the entailments, without any optimization.
It has also been mentioned in [119] that the approaches described in [121] would increase the performance. Second, Minerva [75] is a scalable OWL ontology storage and inference system, which combines a DL reasoner and a rule engine for ontology inference and materializes all inferred results into a database. Like DLE, it implements the materialization strategy.
Support of Custom Rules
With custom rules, users can compose rules to incorporate their own ideas, such as the definition of a new property, into the structure of the TBox. Support of custom rules brings more flexibility into reasoning systems. The combination of ontologies and rules makes the support of custom rules easier in terms of expressive power and inference; in other words, the combination supports custom rules more naturally. SWRL is a good example. Some reasoners support custom rules by combining the logic and inferencing of SWRL to differing extents, such as Racer, Pellet, HermiT, Bossam, SweetRules, Kaon2 and Ontobroker. SWRL is a system that follows the homogeneous approach to combining ontologies and rules. Other reasoners support custom rules using their own rule formats based on an existing rule engine, such as Bossam, Owlim and Jena.
3.2.4 Scalability of Semantic Store
Storage Scheme
More and more Semantic Web applications contain large amounts of data conforming to an ontology. How to store these large amounts of data and how to reason over them has become a challenging issue for research in the fields of the Semantic Web and Artificial Intelligence, and for many Semantic Web applications. Research has been conducted on storage schemes and on improving reasoning methods for large data sizes [122].
We first discuss the storage schemes of different ontology stores and their implementations.
Generally, there are two main kinds of ontology stores: native stores (disk-based stores) and database-based stores. Native stores are built on the file system, while database-based stores use relational or object-relational databases as the backend store.
Examples of native stores are OWLIM [72, 73], AllegroGraph [78], Sesame Native [123], Jena TDB [81, 109], Hstar [80] and Virtuoso [77].
AllegroGraph is a powerful graph-oriented database that can store pure RDF as well as any graph data structure. The bulk of an AllegroGraph triple store is composed of assertions. Each assertion has five fields: subject (s), predicate (p), object (o), graph (g) and triple-id (i). The s, p, o, and g fields are strings of arbitrary size. AllegroGraph maintains a string dictionary that associates each unique string with a special number to prevent duplication. AllegroGraph creates indices which contain the assertions plus additional information to speed up queries, and it also keeps track of all deleted triples. All these features result in fast loading and updating: the loading speed of AllegroGraph for LUBM8000 [124] is 303.18 K triples per second [125].
OWLIM has two versions. The “standard” OWLIM version, referred to as SwiftOWLIM, uses in-memory reasoning and query evaluation. The persistence strategy of this version is based on N-Triples, and the indices of SwiftOWLIM are essentially hash tables. BigOWLIM is an even more scalable version that is not memory-bound: it stores the contents of the repository (including the “inferred closure”) in binary files, with no need to parse, re-load and re-infer all the knowledge from scratch. BigOWLIM uses sorted indices, which can be seen as permanently stored ordered lists; OWLIM uses B+ trees to index triples [122].
Jena TDB is a component of Jena for non-transactional native storage and query, built on the file system. A TDB store consists of three parts: the node table, the triple and quad indexes, and the prefixes table. The node table stores the representation of RDF terms and provides two mappings: one from Node to NodeId, and the other from NodeId to Node. The default storage for the NodeId-to-Node mapping is a sequential access file, and that for the Node-to-NodeId mapping is a B+ tree. Triples are used for the default graph and stored as 3-tuples of NodeIds in the triple indexes; quads are used for named graphs and stored as 4-tuples. Each index holds all the information about a triple. The prefixes table uses a node table and an index for GPU (Graph->Prefix->URI). Customized threaded B+ trees are used in the implementation of many of TDB’s persistent data structures [126].
According to the data model introduced in Hstar, OWL data are divided into three categories. The first category represents OWL Class, OWL Property and Individual Resource; the second represents relations between elements of the first category; and the third represents characteristics defined on OWL Property. Hstar designed a customized storage model following this categorization of OWL data. It defines an inner identifier for entities, the OID, and stores the mapping from the URI of an entity to its OID in global hash tables. The inheritance relation in RC is stored in a tree structure, C-Tree, with a C-index (a B+ tree structure) for faster access to the data. The equivalence relation in RC is stored in a B+ tree and maintained in memory. Individuals related to the same Cx are stored in one Individual List (I-List), whose start address is saved in Cx’s P-Node of the C-Tree. An IC-index, also a B+ tree index, is built to facilitate queries for the type of a given individual. The storage of the relationships Rp, RCP, RI and Tp related to properties employs tree structures (P-Tree, B+ tree) and lists as well. HStar thus organizes its triples according to the categorization of different relationships and characteristics: tree structures and lists are the main data structures used in the implementation of the storage, and B+ trees are also used to index triples [80].
In Virtuoso, RDF data are stored as quads, i.e., (graph, subject, predicate, object) tuples. All quads are in one table, which may require different indexing depending on the query. Virtuoso adopts bitmap indices to improve space efficiency. When the graph is known, the default index layout is GSPO (graph, subject, predicate, object) as the primary key and OPGS as a bitmap index. When the graph is left open, the recommended index layout is SPOG as the primary key, with OPGS, GPOS and POGS as bitmap indices [127, 128].
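The use of several column orderings, as in the GSPO/OPGS/SPOG layouts described for Virtuoso, can be sketched in Python: keep the same quads sorted under different column permutations and range-scan the index whose leading columns are bound in the query. This illustrates only the indexing idea, not Virtuoso’s actual implementation, and the quads are invented:

```python
# Sketch of multi-ordering quad indexing with prefix range scans.
import bisect

quads = [("g1", "s1", "p1", "o1"), ("g1", "s2", "p1", "o2")]

def build_index(order):
    """order is a permutation string like 'gspo'; returns sorted tuples."""
    pos = {"g": 0, "s": 1, "p": 2, "o": 3}
    return sorted(tuple(q[pos[c]] for c in order) for q in quads)

indexes = {o: build_index(o) for o in ("gspo", "opgs", "spog")}

def lookup(order, prefix):
    """Range-scan one index by a bound prefix of its leading columns."""
    idx = indexes[order]
    lo = bisect.bisect_left(idx, prefix)
    hi = bisect.bisect_right(idx, prefix + ("\uffff",) * (4 - len(prefix)))
    return idx[lo:hi]

# Graph known -> scan the gspo index with the graph as the prefix.
print(lookup("gspo", ("g1",)))   # both g1 quads, in gspo order
# Object known -> scan the opgs index instead.
print(lookup("opgs", ("o2",)))   # the single quad with object o2
```

The choice of index is driven by which query positions are bound, which is exactly why a store recommends different primary and secondary layouts depending on whether the graph is known.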
Representative database-based stores are Jena SDB [81], Oracle 11g R2 [74], Minerva [75], Sesame + MySQL [123], DLDB-OWL [76] and Sesame + PostgreSQL [123]. They all take advantage of existing mature database technologies for persistent storage.
Comparing native stores with database-based stores illustrates the advantages and disadvantages of each. The advantage of native stores is that they reduce the time for loading and updating data. However, they are not able to make direct use of the query optimization features of database systems, and they need to implement the functionality of a relational database, such as indexing, query optimization and access control, from scratch. As for database-based stores, the advantage is that they can make full use of mature database technologies, especially query optimization, while the disadvantage is that they may be slower in loading and updating data.
Reasoning over Large ABoxes
Due to the large size of instance data conforming to the corresponding ontology in many knowledge bases, reasoning over large ABoxes has become an issue in the fields of the Semantic Web and Description Logics. There are two main kinds of approaches to dealing with this issue. The first approach designs novel algorithms, schemes and mechanisms that enhance the reasoning ability on large and expressive knowledge bases. Compared with widely used state-of-the-art DL reasoners such as Racer, FaCT++, and Pellet, Kaon2 has been shown to have better performance on knowledge bases with large ABoxes but simple TBoxes [118]. One of the novel techniques adopted in the implementation of Kaon2 is reducing a SHIQ(D) knowledge base KB to a disjunctive Datalog program DD(KB) such that KB and DD(KB) entail the same set of ground facts. This transformation has three advantages. First, existing research on techniques and optimizations for disjunctive Datalog programs, such as the magic set transformation [129], can be reused. Second, DL-safe rules combine naturally with disjunctive programs by translating them into Datalog rules, which increases the expressive power of the logic. Third, [130] develops an algorithm for efficiently answering conjunctive queries in SHIQ(D) extended with DL-safe rules, based on the reduction to disjunctive Datalog.
A scalable ontology reasoning method based on summarization and refinement has been applied in Sher [79]. A summary of the ontology, called a “summary ABox”, is used to reduce the reasoning to a small subset of the original ABox while keeping soundness and completeness [131]. The main idea of the method is to aggregate individuals that belong to the same concepts into the summary, with all the necessary relations that preserve consistency. Queries are then performed on the summary ABox rather than on the original ABox. Another process, called refinement, is used to determine the answers of the queries by expanding the summary ABox to make it more precise. Dolby et al. claimed that the method of summarization and refinement can also be treated as an optimization that any tableau reasoner can employ to achieve scalable ABox reasoning [132].
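The core aggregation step of summarization can be sketched as follows. The ABox and concept names are invented, and the real algorithm also preserves role assertions and refines the summary when it becomes inconsistent, which this illustration omits:

```python
# Sketch: aggregate individuals asserting the same concept set into one
# summary node, so queries run over far fewer nodes.
abox_types = {
    "a1": frozenset({"Student"}),
    "a2": frozenset({"Student"}),
    "a3": frozenset({"Professor"}),
}

def summarize(types):
    """Map each distinct concept set to one summary node, recording
    which original individuals that node stands for."""
    summary = {}
    for individual, concepts in types.items():
        summary.setdefault(concepts, []).append(individual)
    return summary

summary = summarize(abox_types)
print(len(summary))                             # 2 summary nodes for 3 individuals
print(sorted(summary[frozenset({"Student"})]))  # ['a1', 'a2']
```

On realistic ABoxes, where many individuals share the same concept memberships, the summary is dramatically smaller than the original, which is what makes reasoning over it tractable; refinement then splits summary nodes back apart only where needed to answer a query precisely.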
Guo et al. [133] have proposed a method that partitions a large and expressive ABox into small and comparatively independent components based on an analysis of the TBox. Specific kinds of reasoning can then be executed separately on each component, and the final results are completed by collecting and combining the results from each small component. After the partitioning, large ABoxes can be processed by state-of-the-art in-memory reasoners.
In [134], Wandelt and Möller present a method called “role condensates”, a complete but unsound approach to answering conjunctive queries in a proxy-like manner.
The second approach adopts simplification by reducing the expressive power of the TBoxes describing large ABoxes. Calvanese et al. [135, 136] have proposed a new Description Logic, called DL-Lite, which is not only rich enough to capture basic ontology languages but also has low reasoning complexity. The expressive power of this Description Logic allows conjunctive query answering through standard database technology, so that the query optimization strategies provided by current DBMSs can be used for better performance. They concluded that this is the first result of polynomial data complexity for query answering over DL knowledge bases.
Horn-SHIQ [137, 138] is an expressive fragment lying within both SHIQ and the Horn fragment of first-order logic. It provides polynomial algorithms for satisfiability checking and conjunctive query answering on large ABoxes.
3.2.5 Benchmarks
Benchmarks evaluate and compare the performance of different reasoning systems. A number of popular RDF and OWL benchmarks exist to evaluate performance over such variables as ABox size, TBox size, TBox complexity, and query response.
The SP2Bench SPARQL Performance Benchmark [139] has a scalable RDF data generator that creates DBLP-like data and a set of related benchmark queries. SP2Bench has been applied to a number of well-known existing engines, including the Java engine ARQ on top of Jena [140], SDB, Sesame and Virtuoso.
The Berlin SPARQL Benchmark (BSBM) [141] is likewise designed for comparing the query performance of storage systems that expose SPARQL endpoints (i.e., conformant SPARQL protocol services that enable users to query a knowledge base via the SPARQL language [142, 143]). BSBM simulates realistic enterprise conditions and focuses on measuring SPARQL query performance against large amounts of RDF data. It is built around an e-commerce use case with a benchmark dataset, a set of benchmark queries and a query sequence. BSBM has been applied to a number of existing systems such as Virtuoso, Jena TDB, 4store [144], and OWLIM.
The Lehigh University Benchmark (LUBM) [124] is a widely used benchmark for the evaluation of Semantic Web repositories with different reasoning capabilities and storage mechanisms. LUBM includes an ontology for the university domain, scalable synthetic OWL data, fourteen extensional queries and a few performance metrics.
The University Ontology Benchmark (UOBM) [145] extends LUBM in terms of inference and scalability testing. UOBM adds a complete set of OWL Lite and OWL DL constructs in the ontologies for a more thorough evaluation of inference capability. UOBM also creates more effective instance links by enriching the necessary properties, and it improves the instance generation methods for better evaluation of scalability.
Both LUBM and UOBM have been widely applied to state-of-the-art reasoning systems to show their performance regarding different aspects [124, 145].
Although more and more systems treat the existing benchmarks as a standard way to evaluate and highlight their performance, some researchers have challenged the coverage of these benchmarks [146, 147]. Weithöner et al. [146] discussed some new aspects of evaluation, such as ontology serialization, TBox complexity, query caching, and dynamic ontology changes. They concluded that no current single benchmark suite covers both the traditional and the new aspects, and that no reasoner is able to deal with large and complex ABoxes in a robust manner. A set of requirements useful for future benchmarking suites is given at the end.
3.2.6 Summary of Major Reasoning Systems
Jena: Jena is a Java framework for Semantic Web applications that includes an API for RDF. It supports
in-memory and persistent storage, a SPARQL query engine and a rule-based inference engine for RDFS
and OWL. The rule-based reasoner implements both the RDFS and OWL reasoners. It provides forward
and backward chaining and a hybrid execution model. It also provides ways to combine inferencing for
RDFS/OWL with inferencing over custom rules. [81]
Pellet: Pellet is a free open-source Java-based reasoner supporting reasoning with the full expressive
power of OWL-DL (SHOIN(D) in Description Logic terminology). It is a great choice when sound and
complete OWL DL reasoning is essential. It provides inference services such as consistency checking,
concept satisfiability, classification and realization. Pellet also includes an optimized query engine
capable of answering ABox queries. DL-safe custom rules can be encoded in SWRL. [104]
Kaon2: Kaon2 is a Java reasoner (free for non-commercial use) that includes an API for programmatic management of OWL-DL, SWRL, and F-Logic ontologies. It supports all of OWL Lite and all features
of OWL-DL apart from nominals. It also has an implementation for answering conjunctive queries
formulated using SPARQL. Much, but not all, of the SPARQL specification is supported. SWRL is the
way to support custom rules in Kaon2. [114]
Oracle 11g: As an RDF store, Oracle 11g provides APIs that support the Semantic Web. It provides full native inference support in the database for RDFS, RDFS++, OWLPRIME, OWL2RL, etc. Custom rules are called user-defined rules in Oracle 11g and are saved as records in tables; Oracle uses forward chaining to perform the inference. [74]
OWLIM: OWLIM is both a scalable semantic repository and a reasoner that supports the semantics of RDFS, OWL Horst and OWL 2 RL. SwiftOWLIM is the “standard” edition of OWLIM for medium data volumes; in this edition all reasoning and querying are performed in memory. [73] Custom rules are defined via rules and axiomatic triples.
We summarize the different ontology based reasoning systems along with the features they support in
Table 1.
Table 1 General comparison among ontology reasoning systems
System     | Supports RDF(S)? | Supports OWL? | Rule Language | Supports SPARQL Queries? | Repository: Persistent (P) / In-Memory (M)
Jena       | yes              | yes           | Jena Rules    | yes                      | M
Pellet     | yes              | yes           | SWRL          | no                       | M
Kaon2      | yes              | yes           | SWRL          | yes                      | M
Oracle 11g | yes              | yes           | Owl Prime     | no                       | P
OWLIM      | yes              | yes           | Owl Horst     | yes                      | P

3.2.7
Limitations of Existing Research
In order to implement an adaptive reasoning framework suitable for ScienceWeb, an efficient, complete and scalable reasoning system that supports custom rules and can adapt to changing ontologies, rules and instances is required.
As mentioned in Section 3.2.3, DL reasoners [102, 103, 105] perform well on complex TBox reasoning, but they do not have the scalable query-answering capabilities that are necessary in applications with large ABoxes. Rule-based OWL reasoners [72, 109] are based on the implementation of entailment rules in a rule engine. They have limited TBox reasoning completeness, because they may not implement every entailment rule or they may trade completeness for performance [119]. Hybrid reasoners [119] that integrate a DL reasoner with a rule engine can combine the strong points of both sides.
Forward chaining and materialization, adopted in the reasoning systems [72-76], are suitable for frequent, expensive computation of answers over data that are relatively static. However, any change in the ontology, instances or custom rules requires complete re-processing before new queries can be answered, and a large amount of redundant data may be produced for a large knowledge base, which may slow loading and querying. Backward chaining and query-rewriting, adopted in the reasoning systems [77-79], are suitable for efficient computation of answers over data that are dynamic and queries that are infrequent. However, frequent repeated queries then require repeated reasoning, which wastes time. A reasonable adaptive hybrid approach would combine the strong points of both patterns for better performance under changing circumstances.
Persistent stores, either native or database-based, are necessary for a scalable reasoning system. Database-based stores may be slower in loading and updating data, but they are able to make full use of mature database technologies, especially query optimization. Native stores are not able to make use of the query optimization features of database systems, but they greatly reduce the time for loading and updating data.
In this thesis, we will combine both forward chaining & materialization and backward chaining & query-rewriting, with support for the integration of a DL reasoner and an existing rule engine, and develop our own persistent native store for scalable reasoning. The purpose of this combination is to develop a reasoning architecture and an adaptive system whose scalability and efficiency meet the requirements of querying and answering in ScienceWeb when facing a large knowledge base subject to frequent changes.
4. Preliminary Work
4.1 Architecture of ScienceWeb
This section describes the preliminary work we have already done to implement parts of the ScienceWeb system: the design of the architecture of ScienceWeb, the ontology in ScienceWeb and a synthetic data generator, a comparison of ontology reasoning systems, a study of selected benchmarks, and a formative study in Virginia.
ScienceWeb is a platform where researchers, including faculty, Ph.D. students and program managers, can work together collaboratively to get answers to their queries from a consensus point of view or from their own specific point of view. The collaborative aspect lies not only in the construction of queries but also in the construction of the underlying ontology, rules and instance data. The proposed architecture of ScienceWeb is shown in Figure 1.
[Figure 1 depicts four layers: collaborative clients (Query Construction, Ontology Construction, Rule Construction, and Harvesting Methods Construction), collaborative middleware, server modules (Query Evolution, Rule Evolution, Ontology Evolution, and Harvesting Methods Evolution), and a knowledge base in which a query reasoner and a harvester draw on Web and DB data sources.]
Figure 1 Architecture of ScienceWeb
A traditional data mining architecture involves harvesting from data sources to populate a knowledge
base, which in turn can then answer queries about the harvested content. We propose to enhance this
architecture by adding a layer of collaborative clients for construction of queries, rules, ontological
concepts, and harvesting methods, mediated by a layer of server functions that oversee and support the
evolution of each of those functional groups within the knowledge base.
The system is built, developed and evolved based upon users’ collaborative contributions as shown in
Figure 2. Users contribute during querying & answering, harvesting and ontology evolution. Querying is
not an ordinary job of posting, parsing and retrieving as in a conventional database. Instead, it becomes
an interactive, collaborative process. Harvesting and ontology evolution also benefit from the
information provided by the users. Thus, collaboration is critical and widely spread throughout the
system.
Queries and rules are the core of the ScienceWeb system, and how they develop through collaboration is critical to the evolution of the system. The main steps in the development of queries and rules are as follows. First, users strive for consensus on the rule expression of qualitative and quantitative descriptors, on adding rules to the rule base, and on changing the ontology. Second, when users construct queries and rules, the existing queries and rules reflecting others’ points of view should be made available for help and inspiration. Third, we want to create methods for automatic query enrichment and rule prediction based on analysis of existing queries and rules and the relationships among them, and on analysis of the behaviors and interests of specific subsets of the community.
[Figure 2 depicts the community contributing to, and getting answers from, the system as the knowledge base evolves from the initial K0 (Rules0 + Ontology0 + Instance Data0) through K1 (Rules1 + Ontology1 + Instance Data1) to Kn (Rulesn + Ontologyn + Instance Datan).]
Figure 2 Evolution of ScienceWeb
In ScienceWeb, a qualitative query is to be answered according to criteria given in the form of custom rules by member(s) of the community, which means that inference from custom rules has to be done before the results of most queries can be returned. Current ontology reasoning systems do support inference from custom rules, but the performance of such inference on large instance data cannot meet the requirement of real-time response. Experiments have been conducted on different sizes of instance data (from thousands to millions of instances) conforming to the ScienceWeb ontology. The results show explicitly that as the size of the instance data grows, the reasoning for a qualitative query with new custom rules takes minutes to hours, which is unacceptable for a real-time query-answering system. It is critical to find approaches that solve this performance problem.
21
The following basic scenarios illustrate the workings of the modules constituting ScienceWeb, as shown in Figure 1.
Query construction and evolution:
 The user interacts with the Query Construction client to present his query.
 The Query Construction client works with the Query Evolution module and looks for similar past queries in the knowledge base.
 The user chooses to select one query from the similar past queries, edit a past query, or use the original query that he posts.
 The query is passed on for query processing.
Query processing:
 The user submits a query, either a formerly constructed query or a new/modified one arising from interaction with the Query Construction and Query Evolution modules.
 The reasoner and the DB in the knowledge base work together to return the answers to the posted query.
Rule construction and evolution:
 The user constructs the rules (criteria) to express the qualitative and quantitative descriptors appearing in a new query for further inference.
 The user interacts with the Rule Construction client to introduce his rule.
 The Rule Construction client works with the Rule Evolution module to look for similar past rules that help users compose rules.
 The user chooses to edit a past rule or to construct a totally new rule.
 After the new rule is constructed, the system informs the user that the new rule has been included in the rule base.
Ontology construction and evolution:
- A user attempts to express a concept in a query or rule that is unsupported by the collection of data instances and properties in the current ontology.
- The user works with the Ontology Construction client to add, edit or delete classes or properties in the ontology.
- The user interacts with the Ontology Evolution module to make sure that the ontology remains consistent after the changes.
Harvesting method construction and evolution:
- The user interacts with the Harvesting Method Construction client to describe the resources from which new instances can be obtained.
- The harvester crawls some sample data from these resources.
- The Harvesting Method Evolution module verifies the validity of these resources.
- At a later time, the harvester obtains instances from the valid resources. The user may change the instances and/or the methods specified to extract the desired information. Following the user's changes, the changed specification becomes part of the harvesting schedule.
- The Harvesting Method Evolution module checks the consistency between the new instances and past ones.
The goal of this thesis is to develop an adaptive reasoning architecture and design an adaptive reasoning
system whose scalability and efficiency meet the requirements of the ScienceWeb system in the face of a
large and evolving knowledge base.
4.2 Ontology Development and Synthetic Data Generator

4.2.1 Ontology Development
As described in Section 3, there have been a number of studies of reasoning systems using only their
native logic. To provide credibility for our context, we use benchmark data from these studies, replicate
their results with native logic, and then extend them by adding customized rules. We use LUBM to
compare our results with earlier studies. The second set of data will emulate a future ScienceWeb.
Since ScienceWeb at this time is only a Gedanken experiment, we need to use artificial data for that
purpose. Figure 3 shows the limited ontology class tree we deem sufficient to explore the scalability
issues.
Figure 3 Class tree of research community ontology
Both the LUBM and the ScienceWeb ontologies describe concepts and relationships in a research
community. For instance, concepts such as Faculty, Publication, and Organization are included in both
ontologies, as are properties such as advisor, publicationAuthor, and worksFor. All the concepts of
LUBM can be found in the ScienceWeb ontology, although the exact names of classes and properties may
not be the same. ScienceWeb provides more detail for some classes; for example, the ScienceWeb
ontology has a finer granularity when describing the classification and properties of Publication. The
ScienceWeb ontology shown in Figure 3 represents only a small subset of the one to be used for
ScienceWeb. It was derived to be a minimal subset that is sufficient to answer a few select qualitative
queries. The queries were selected to test the full capabilities of a reasoning system and to necessitate
the addition of customized rules.
4.2.2 Synthetic Data Generator
In support of the performance analysis described above, a flexible system for generating benchmark
knowledge base instances of varying sizes has been developed [148]. The major challenge for this
generator is to not only produce ontology-conformant data sets of the desired size, but to guarantee a
plausible distribution for the many properties that relate objects across the knowledge base. Existing
generators such as LUBM [124] tend to work within an aggregation tree (e.g., for a university object
generate a number of departments, for each department generate a number of faculty, for each faculty
member generate a number of papers authored by that faculty member). Obtaining reasonable
distributions within such a tree is relatively straightforward. But when the domain expands to include
other aggregation trees (e.g., for each publisher generate several journals, for each journal generate
several papers) that must share objects with other aggregation trees, it is more of a challenge to maintain
reasonable distributions for relations that span such trees (e.g., in how many different journals will a
faculty member publish?). Such distributions are nonetheless important to the performance analysis of
reasoning systems involving such relations.
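The cross-tree distribution problem can be illustrated with a short sketch (hypothetical names throughout; this is not the actual generator of [148]): papers from the faculty aggregation tree are assigned to journals from the publisher tree, with each author restricted to a small personal set of venues and a skewed, Zipf-like preference among them.

```python
import random

def zipf_choice(items, rng, s=1.2):
    # Zipf-like weights: earlier items are chosen far more often,
    # mimicking the skew of real publication-venue preferences.
    weights = [1.0 / (rank + 1) ** s for rank in range(len(items))]
    return rng.choices(items, weights=weights, k=1)[0]

def assign_papers_to_journals(faculty_papers, journals, seed=42):
    """Link papers (from the faculty aggregation tree) to journals
    (from the publisher aggregation tree) with a plausible skew:
    each author favors a small personal subset of journals."""
    rng = random.Random(seed)
    placement = {}
    for author, papers in faculty_papers.items():
        # Each faculty member publishes in only a few journals.
        personal = rng.sample(journals, k=min(3, len(journals)))
        for paper in papers:
            placement[paper] = zipf_choice(personal, rng)
    return placement

papers = {f"author{a}": [f"p{a}_{i}" for i in range(5)] for a in range(4)}
journals = [f"journal{j}" for j in range(10)]
placement = assign_papers_to_journals(papers, journals)
# Every paper receives a journal, and each author spans at most 3 journals.
```

A real generator would of course need to control many such cross-tree relations at once, but the pattern is the same: sample a per-object subset first, then draw from it with a skewed distribution.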
4.3 Preliminary Benchmark Study on LUBM and UOBM

In this section, we present a preliminary benchmark study of two commonly used benchmarks, LUBM
and UOBM. In the study, we employ the extensional queries contained in the benchmarks themselves.

4.3.1 Experiments on Two Classic Benchmarks
In order to evaluate the ontology reasoning systems that support custom rules and to give a more
complete analysis, we also evaluate them on two commonly used benchmarks: LUBM [124] and
UOBM [145].
LUBM provides scalable synthetic OWL data; datasets of different sizes can be generated for experiments.
To identify a dataset, we use the following notation from [124] in the subsequent description:

LUBM(N, S): the dataset that contains N universities beginning at University0 and is generated using a
seed value of S.
The instance generator in UOBM can create instances according to a user-specified ontology (OWL Lite
or OWL DL). In addition, the user can specify the size of the generated instance data by setting the
number of universities to be constructed. UOBM contains 6 test datasets set up for various goals:
Lite-1, Lite-5, Lite-10, DL-1, DL-5 and DL-10. We use the following notation to identify these test
datasets in the subsequent description:

UOBM(Lite/DL-N): the dataset that contains N universities and conforms to OWL Lite/OWL DL.

Due to scalability limits, some of the reasoning systems, such as Jena and Pellet, are not able to answer
queries on large datasets. We therefore collected experimental results only on the small dataset
LUBM(0,1). These are presented in Table 2. There are 14 extensional queries involved in this
experiment.
Table 2 Query performance on LUBM(0,1)

       Jena    Pellet   Oracle   Kaon2   Owlim
q1     n/a     4.6      0.5      0.4     0.1
q2     n/a     n/a      0.9      0.5     0.1
q3     6.4     4.5      0.5      0.4     0.1
q4     354.2   4.5      1.5      0.6     0.1
q5     n/a     4.8      0.8      0.6     0.4
q6     6.9     5.3      1.5      0.6     2.9
q7     n/a     870.0    1.5      0.6     56.6
q8     n/a     684.1    2.7      1.0     7.3
q9     n/a     n/a      95.8     0.7     n/a
q10    n/a     n/a      0.8      0.6     0.1
q11    191.0   4.6      0.6      0.4     0.2
q12    6.2     5.8      0.7      0.6     0.1
q13    208.5   4.7      0.6      0.6     0.1
q14    6.8     5.3      1.4      0.4     1.7
In this experiment, a time-out (one hour) is marked with "n/a". Generally, most ontology reasoning
systems perform well in this experiment, with the exception of Jena: it could not answer 7 of the 14
queries, and its answer to query 6 is incomplete. Pellet does not return answers for queries 2 and 9.
Owlim performs poorly on query 7 and is not able to answer query 9 before the time-out. Query 9 is
more complicated than query 7: query 7 has 2 constraints, while query 9 has 3 constraints and features
the most classes and properties of all the queries. The complexity of a query therefore strongly
influences the querying performance of Owlim. Oracle shows steady performance across the queries.
UOBM is another popular benchmark, based on LUBM. The goal of our second experiment is to
compare these reasoning systems in terms of query performance and the completeness and soundness of
the results on UOBM. Again, due to scalability limits, some reasoning systems are not able to answer
queries on large datasets. We therefore only present the experimental results on the small dataset
UOBM(DL-1). These are shown in Table 3 and Table 4. 15 extensional queries are involved in this
experiment.
Table 3 Query performance on UOBM(DL-1)

       Jena   Pellet   Oracle   Kaon2   Owlim
q1     n/a    9.9      1.8      n/a     0.2
q2     n/a    9.8      2.0      n/a     0.8
q3     n/a    10.2     0.8      n/a     0.4
q4     n/a    n/a      1.1      n/a     0.5
q5     n/a    9.9      0.7      n/a     0.1
q6     n/a    10.9     0.8      n/a     0.2
q7     n/a    10.3     0.7      n/a     0.2
q8     n/a    15.2     0.8      n/a     0.2
q9     n/a    9.6      1.0      n/a     0.4
q10    n/a    9.3      0.6      n/a     0.1
q11    n/a    1446.1   6.6      n/a     2.9
q12    n/a    11.0     1.3      n/a     0.4
q13    n/a    n/a      0.5      n/a     0.3
q14    n/a    1725.7   2.4      n/a     2.2
q15    n/a    14.3     0.4      n/a     0.1
Table 4 Completeness and soundness of queries on UOBM(DL-1)

       Jena   Pellet      Oracle      Kaon2   Owlim
q1     n/a    32/32       32/32       n/a     32/32
q2     n/a    2512/2512   2512/2512   n/a     2512/2512
q3     n/a    666/666     666/666     n/a     666/666
q4     n/a    n/a         383/383     n/a     383/383
q5     n/a    0/200       200/200     n/a     200/200
q6     n/a    165/165     165/165     n/a     165/165
q7     n/a    19/19       19/19       n/a     19/19
q8     n/a    303/303     303/303     n/a     303/303
q9     n/a    0/1057      1057/1057   n/a     1057/1057
q10    n/a    24/25       25/25       n/a     25/25
q11    n/a    934/953     953/953     n/a     953/953
q12    n/a    65/65       65/65       n/a     65/65
q13    n/a    n/a         0/379       n/a     379/379
q14    n/a    0/6643      6643/6643   n/a     6643/6643
q15    n/a    0/0         0/0         n/a     0/0
In Table 3, a time-out (one hour) is marked with "n/a". Owlim is the only system that could answer all
the queries. The completeness and soundness of the results are shown in Table 4. Oracle also performs
well, succeeding on 14 out of 15 queries.
4.3.2 Summary

Benchmarks are important to the evaluation of reasoning systems. The initial work above helps us
understand the coverage of these benchmarks and how state-of-the-art systems perform on them.
4.4 Performance Study
The basic question addressed in this study is the scalability of existing reasoning systems when
answering queries with customized rules in addition to native logic. Can these systems handle millions
of triples in real time? Can they handle increasing complexity in the ABox? Can they handle long
transitive chains? To address the complexity issue, we select both LUBM and the UnivGenerator of
ScienceWeb to provide the sample data involved in querying the ontology reasoning systems.
We define 5 custom rule sets to be used. Finally, in this section we define the environment, metrics and
evaluation procedure used to evaluate a system, and we provide the results of the evaluation study. The
details of the study are in Shi et al. [5].
4.4.1 Custom Rule Sets and Queries
We now present the 5 rule sets and 3 corresponding queries that we used in the experiments. The rule
sets were defined to test basic reasoning, to allow for validation (such as requiring co-authors to be
different), and to exercise transitivity and recursion. Rule sets 1 and 2 define the co-authorship relation,
rule set 3 is used in queries for the genealogy of PhD advisors (transitive), and rule set 4 enables queries
for "good" advisors. Rule set 5 is a combination of the first 4 sets.
Rule set 1: Co-author
    authorOf(?x, ?p) ∧ authorOf(?y, ?p) ⟹ coAuthor(?x, ?y)

Rule set 2: Validated co-author
    authorOf(?x, ?p) ∧ authorOf(?y, ?p) ∧ notEqual(?x, ?y) ⟹ coAuthor(?x, ?y)

Rule set 3: Research ancestor (transitive)
    advisorOf(?x, ?y) ⟹ researchAncestor(?x, ?y)
    researchAncestor(?x, ?y) ∧ researchAncestor(?y, ?z) ⟹ researchAncestor(?x, ?z)

Rule set 4: Distinguished advisor (recursive)
    advisorOf(?x, ?y) ∧ advisorOf(?x, ?z) ∧ notEqual(?y, ?z) ∧ worksFor(?x, ?u) ⟹ distinguishAdvisor(?x, ?u)
    advisorOf(?x, ?y) ∧ distinguishAdvisor(?y, ?u) ∧ worksFor(?x, ?d) ⟹ distinguishAdvisor(?x, ?d)

Rule set 5: Combination of the above 4 rule sets.
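The semantics of rule sets 1-3 can be sketched in plain Python over (subject, property, object) triples. This is an illustration of what the rules compute, not of how any of the benchmarked reasoners implements them; rule set 3 is evaluated here by naive fixpoint iteration.

```python
def co_authors(triples):
    """Rule set 2: derive coAuthor from shared authorship, with the
    notEqual filter that rule set 1 omits."""
    by_paper = {}
    for s, p, o in triples:
        if p == "authorOf":
            by_paper.setdefault(o, set()).add(s)
    derived = set()
    for authors in by_paper.values():
        for x in authors:
            for y in authors:
                if x != y:          # notEqual(?x, ?y)
                    derived.add((x, "coAuthor", y))
    return derived

def research_ancestors(triples):
    """Rule set 3: transitive closure of advisorOf via fixpoint iteration."""
    anc = {(s, o) for s, p, o in triples if p == "advisorOf"}
    changed = True
    while changed:
        changed = False
        for (x, y) in list(anc):
            for (y2, z) in list(anc):
                if y == y2 and (x, z) not in anc:
                    anc.add((x, z))
                    changed = True
    return {(x, "researchAncestor", z) for (x, z) in anc}

abox = [("alice", "authorOf", "paper1"), ("bob", "authorOf", "paper1"),
        ("carol", "advisorOf", "alice"), ("dan", "advisorOf", "carol")]
```

On this tiny ABox, `co_authors` derives the two symmetric coAuthor triples for alice and bob, and `research_ancestors` adds the inferred pair relating dan to alice on top of the two base advisorOf pairs.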
4.4.2 Comparison on the Full Rule Sets
Setup Time
We begin by comparing setup times for the five systems under investigation. The generated data sets
contained thousands to millions of triples. The setup time can be sensitive to the rule sets used for
inferencing because some rules require filters (e.g., "notEqual") and some rules express transitivity or
recursion; these consequently involve ABox reasoning over ABoxes of different sizes and distributions.
Therefore, each system-ontology pair was examined using each of the 5 rule sets.
Jena, Pellet and Kaon2 load the entire data set into memory when performing inferencing. For these
systems, therefore, the maximum accepted size of the ABox depends on the size of memory. In our
environment, Jena and Pellet could only provide inferencing for small ABoxes (less than 600,000
triples). Kaon2 was able to return answers for ABoxes of up to 2 million triples. OWLIM and Oracle
have better scalability because they are better able to exploit external memory for inference. Both are
able to handle the largest data set posed, nearly 4 million triples. As the dataset grows, Oracle turns out
to have the best scalability.
For smaller ABoxes (less than 2 million triples), Kaon2 usually has the best performance on setup time.
However, as the size of the ABox grows past that point, only Oracle 11g and OWLIM are able to finish
the setup before time-out.
The performance of these systems varies considerably, especially as the dataset grows. OWLIM
performs better than Oracle 11g on rule sets 1, 2 and 3. However, OWLIM does poorly on rule set 4
with the ScienceWeb dataset. This appears to be because the ScienceWeb dataset involves more triples
in the ABox inference for rule set 4 than the LUBM dataset does. Thus, when large ABox inferencing is
involved in set-up, Oracle 11g performs better than OWLIM. Oracle 11g needs to set a "filter" when
inserting rule set 2 into the database to implement the "notEqual" function in this rule set. As the ABox
grows, more data has to be filtered while creating the entailment, and this filter significantly slows the
performance. For LUBM, Oracle 11g is not able to finish the setup for a one-million-triple ABox within
one hour, while OWLIM finishes in 100 seconds. Some data points are therefore missing for the larger
datasets.
Query Processing Time
Based on our results, OWLIM has the best query performance on most datasets in our experiments, but
this superiority is overtaken as the size of the dataset grows. Figure 4 shows the time required to process
query 1 on the ScienceWeb dataset.
Figure 4 Query processing time of query 1 for ScienceWeb dataset
In actual use, one might expect that some inferencing results will be requested repeatedly as component
steps in larger queries. If so, then the performance of a reasoning system may be enhanced if such results
are cached.
We therefore record the processing time for a single query and the average processing time for repeated
queries, computed by executing the query 10 consecutive times. The single-query processing time is
then divided by the average processing time to detect how caching might affect query processing; ratios
well above 1 indicate a pronounced caching effect. Table 5 shows this caching ratio on the ScienceWeb
ontology for query 1. Based on our results, OWLIM is the only system that does not show a pronounced
caching effect. The caching effect of the other systems becomes weaker as the dataset grows.
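The measurement itself can be sketched as follows. This is a simplified illustration, not the actual harness; `run_query` is a hypothetical stand-in for posing the query to a reasoning system, and the ratio is taken as the cold single-query time divided by the steady-state average, so that values above 1 indicate a caching effect, consistent with the ratios reported in Table 5.

```python
import time

def caching_ratio(run_query, repeats=10):
    """Time one cold query, then the average of `repeats` consecutive
    runs of the same query; a ratio well above 1.0 suggests the system
    caches intermediate inference results between runs."""
    start = time.perf_counter()
    run_query()
    single = time.perf_counter() - start

    start = time.perf_counter()
    for _ in range(repeats):
        run_query()
    average = (time.perf_counter() - start) / repeats
    return single / average
```

A system that re-derives everything on every run would score near 1.0 here (as OWLIM does), while a system that reuses inference results would score well above it.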
4.4.3 Comparison among Systems on Transitive Rule
We anticipate that many ScienceWeb queries will involve transitive or recursive chains of inference, so
we next examine the behavior of the systems on such chains. Because the instance data for the LUBM
and ScienceWeb ontologies may not include sufficiently long transitive chains, we created a group of
separate instance files containing different numbers of individuals related via the transitive rule in rule
set 3.
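Such chain instance files, and the reason long chains stress a reasoner, can be sketched as follows (illustrative code with hypothetical names, not the actual generator): a chain of n individuals contains only n-1 base advisorOf triples, but its transitive closure under rule set 3 relates every individual to every later one, i.e., n(n-1)/2 researchAncestor pairs.

```python
def advisor_chain(n):
    """A chain of n individuals: a0 advises a1, a1 advises a2, ...,
    yielding n-1 base advisorOf triples."""
    return [(f"a{i}", "advisorOf", f"a{i + 1}") for i in range(n - 1)]

def closure_size(n):
    """Number of researchAncestor pairs in the transitive closure of a
    length-n chain: every individual is an ancestor of every later one."""
    return n * (n - 1) // 2

# A 1000-individual chain has 999 base triples but 499,500 inferred
# pairs after closure: a quadratic blow-up in the materialized ABox.
chain = advisor_chain(1000)
```

This quadratic growth is why chain length, not just raw triple count, matters in the experiments below.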
As Figure 5 and Figure 6 show, Pellet only returns results before the time-out when the length of the
transitive chain is 100, and Jena's performance degrades badly when the length exceeds 200. Only
Kaon2, OWLIM and Oracle 11g could complete inference and querying on long transitive chains.
Kaon2 has the best setup time, with OWLIM second. For query processing, OWLIM and Oracle have
almost the same performance. When the length of the transitive chain grows from 500 to 1000, the
querying time of Kaon2 increases dramatically.
Table 5 Caching ratios between single-query processing time and average processing time on the
ScienceWeb ontology for query 1 (rows: dataset size in triples)

Triples      Jena   Pellet   Oracle   Kaon2   OWLIM
3,511        6.13   6.03     5.14     6.34    1.83
6,728        5.57   5.48     2.77     5.59    1.83
13,244       5.50   4.91     2.65     5.59    1.30
166,163      2.40   1.56     5.59     2.24    1.48
332,248      1.87   1.32     3.40     1.65    1.20
1,327,573    n/a    n/a      3.49     1.02    1.05
2,656,491    n/a    n/a      3.75     1.02    1.07
3,653,071    n/a    n/a      3.75     n/a     1.01
3,983,538    n/a    n/a      8.22     n/a     1.03
Figure 5 Setup time for transitive rule
Figure 6 Query processing time after inference over transitive rule
The comparison of different semantic reasoning systems is shown in Table 6.
Table 6 Comparison of different semantic reasoning systems

System       Supported expressive power            Reasoning algorithm    In-memory   Materialization   Open-source
SwiftOWLIM   R-entailment, OWL 2 RL                Rule-based             Yes         Yes               Yes
BigOWLIM     R-entailment, OWL 2 RL                Rule-based             No          Yes               No
Kaon2        SHIQ(D)                               Resolution & Datalog   Yes         No                No
Pellet       SROIQ(D)                              Tableau                Yes         No                Yes
Jena         varies by reasoner                    Rule-based             Yes         Yes               Yes
Oracle       OWL (union, intersection), OWL 2 RL   Rule-based             No          Yes               No
4.4.4 Summary
Based on the comparison of these state-of-the-art ontology reasoning systems on the full rule sets and
the transitive rule, we present a picture of their reasoning performance and scalability.

When models more realistic than LUBM (i.e., ScienceWeb) are used, serious performance and
scalability issues arise as the size approaches millions of triples. OWLIM and Oracle offer the best
scalability for the kinds of datasets anticipated for ScienceWeb, but both have heavy loading times and
negative implications for evolving systems. Generally, real-time queries over large triple spaces will
have to be limited in their scope, and new reasoning mechanisms for evolving systems will be introduced.
4.5 Formative Study for ScienceWeb
We undertook a survey of faculty and PhD students at several universities in Virginia. Details of
the questions and results may be found in [7]; notable results relevant to this proposal include:
- The perceived utility of a knowledge base with the proposed scope of ScienceWeb is high.
  Respondents indicated an interest in the use of such information for both research and
  hiring/promotion purposes (including, apparently, job hunting by those expecting to acquire new
  PhDs). More than 90% of respondents indicated that they would have used such a system, had it
  been available, in the past year to find information regarding hiring and promotion, with more than
  half believing they would have used such a system multiple times. Information on publications and
  projects had even higher perceived value, with 100% of respondents indicating that they would have
  used such a system multiple times in the past year.

- There is rather lower willingness to invest time in collaborative construction of such a system than
  might have been expected. Slightly over 40% of respondents would want to devote no more than a
  few minutes at a time to such activities, with another 30% willing to spend more than a few minutes
  but less than an hour. A notable exception to that rule is in updating and making corrections to
  one's own personal information in such a knowledge base, where more than 25% of respondents
  indicated they would likely devote multiple hours to validating and correcting information. This
  suggests that identification of low-effort collaborative techniques will be essential to the success of
  a ScienceWeb.

- Respondents generally believed themselves to share a consensus with both research peers and
  departmental colleagues on quality judgments pertaining to research. For example, more than 70%
  agreed with the statement "I share a consensus with the majority of my research peers on the criteria
  that define a groundbreaking researcher in my field." By contrast, they did not believe that they
  shared such a consensus on questions such as who might be good candidates for hiring or awarding
  of tenure, even with their departmental colleagues. Less than 20% agreed with the statement "I share
  a consensus with the majority of my Departmental colleagues on the criteria for hiring a new
  assistant professor". This is an interesting distinction because one of the original motivations for
  ScienceWeb came from a desire for a trusted source of information to address disputes in the latter
  area.
5. Research Plan
This section describes the proposed work for this thesis. The main components of this section are
the architecture design, the key research issues, the evaluation procedure and a schedule.
5.1 Proposed Work: Adaptive Reasoning for ScienceWeb
The survey of faculty and PhD students at several universities in Virginia revealed a rather lower
willingness to invest time in the collaborative construction of such a system than might have been
expected: slightly over 40% of respondents would want to devote no more than a few minutes at a time
to such activities, with another 30% willing to spend more than a few minutes but less than an hour.
Although the technology for storing large knowledge bases is reasonably mature, the introduction of
reasoning components into the query process poses a significant challenge. Our own preliminary study
(see Section 4.3) has shown that extant systems can provide fast responses to queries with substantial
reasoning components, but only at the cost of the extensive pre-processing required to integrate new
knowledge instances and new rules. In essence, there is a trade-off between the scalability of query
processing and support for rapid evolution of the data and of the supporting rules from which new
queries are constructed.
5.1.1 Architecture
There are two inferencing strategies a reasoning system can use. The first, Materialized Inference, fires
all the rules in the system at once and generates all inferred triples. The second, Query-invoked
Inference, fires the relevant rules (TBox inference) after a query is accepted and performs partial
inferencing (ABox inference). To improve the performance and scalability of reasoning, I introduce an
adaptive reasoning architecture with two enhanced inferencing strategies: Materialized Inference with an
incremental function and Query-invoked Inference with pre-computed TBox inferences. The adaptive
mechanism needs to determine which part of the TBox inferencing is stable and which inferred instances
should be temporary for incremental inferencing. For Query-invoked Inference, the mechanism needs to
determine which part of the TBox inferences should be pre-computed. As appropriate, the adaptive
mechanism will switch between Query-invoked Inference and Materialized Inference. In addition, we
introduce a Long Term Storage that contains stable inferred instances from incremental inference in
Materialized Inference and pre-computed TBox inferences for Query-invoked Inference. Any ABox
results of a query-invoked inference are saved in Short Term Storage. A query unaffected by changes to
the ontology, custom rules or instances can be answered by searching the long-term storage; a query
affected by such changes is answered by performing a query-invoked inference and searching the
short-term storage. In either case, a major issue is to determine whether a query is affected by a given
change.
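The affected-query check at the heart of this switch might be sketched as follows. This is a deliberate simplification under an assumption the text does not make precise: that both queries and recorded changes can be keyed by the properties they mention. The callbacks `materialized_store` and `query_invoked` are hypothetical stand-ins for the two inference paths.

```python
def affected(query_properties, changed_properties):
    """A query counts as affected if any property it mentions, directly
    or via the custom rules it uses, appears in the change history."""
    return bool(set(query_properties) & set(changed_properties))

def answer(query_properties, changed_properties,
           materialized_store, query_invoked):
    # Unaffected queries are served from long-term (materialized) storage;
    # affected queries fall back to query-invoked inference plus the
    # short-term storage of its inferred triples.
    if affected(query_properties, changed_properties):
        return query_invoked(query_properties)
    return materialized_store(query_properties)
```

Under this sketch, after a change touching publicationAuthor, a coAuthor query mentioning that property would take the query-invoked path, while an advisorOf query would still be served from the materialized triples.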
The resulting architecture of an adaptive reasoning system is presented in Figure 7. To explain
this architecture we present first a short description of the individual components and then a
series of scenarios that will illustrate the sequence of modules executed and the resulting data
flow upon various inputs from users.
[Figure 7 depicts the following components and data flow: user inputs (Query, Changes);
Query&Inference Management; Change Management; a Hybrid Reasoner comprising Materialization
Inference and Query-invoked Inference (the latter split into Pre-computed TBox Inference and
Query-invoked ABox Inference); Storage Management; Storage holding Materialized Triples, Inferred
Triples after Query-invoked Inference, Change History, and the base custom rules, ontology and
instances; and a low-level reasoning engine consisting of a DL Reasoner and a Rule-based Reasoner.]

Figure 7 Architecture of an Adaptive Reasoning System
Input from users: Query and Changes

A Query is the basic way for users to search for and retrieve the information they are interested in. For
example: "Who are the groundbreaking researchers in Digital Library?" Here "groundbreaking" is a
qualitative descriptor that has been evolved by one or more users by developing custom rules, and
researcher and digital library are classes in the ontology.
Changes refer to changes to the ontology, the custom rule set and the instances, as originally defined by
users or harvested from the web. Generally, people might not agree with the ontology as originally
designed and will make changes according to their own beliefs. Similarly, custom rules represent a
personal understanding of qualitative descriptors; as more people add their own opinions, the rules will
change and, it is to be hoped, these qualitative descriptors will evolve toward a consensus. The
collection of instances will be enriched gradually with the discovery of new sources of information by
individuals and the subsequent update of the harvesting methods. Thus, changes to the ontology, custom
rule set and instances may occur at random or periodic times with varying frequency. Changes have a
significant influence on storage and on query performance, whether or not the query involves
inferencing.
Query&Inference Management
Query&Inference Management is the component that chooses between Materialized Inference and
Query-invoked Inference based on the query and the changes under different circumstances. After a
user submits a query, this component determines whether the query has been affected by changes to the
ontology, custom rules or instances by communicating with the Change Management component.
Change Management
Change Management maintains all the changes and arrangements in the Storage and records the
change history after changes to the ontology, custom rules or instances. It not only provides
change records to Query&Inference Management for the adaptive reasoning mechanism, but also
communicates with the Storage Management module to realize the actual changes and operations
in the storage.
Hybrid Reasoner
As a central component in the adaptive reasoning system, the Hybrid reasoner is a combination
of Materialization Inference and Query-invoked Inference, and is responsible for the reasoning
task with the assistance of a DL reasoner and a Rule-based reasoner.
Materialization Inference is the component that fires all of the rules in the system and generates all
inferred triples at once. It performs all of the inference in advance of the various queries users pose
about the knowledge base. After Materialization Inference, answering a query involves no reasoning,
only simple parsing and searching.

Query-invoked Inference is the component that invokes partial inference (ABox inference) when a query
is accepted, after having fired part of the rules (TBox inference) in advance. Pre-computed TBox
Inference is responsible for the TBox inference before queries, while Query-invoked ABox Inference is
responsible for the ABox inference during query answering.
Storage Management
Storage Management updates and arranges the storage in an organized way to improve the scalability
and performance of inferencing, especially ABox inferencing. This component provides mechanisms to
group and index both the base triples obtained from the users and the harvester module and the triples
that have been inferred, such that searches and updates can be done efficiently.
Storage
There are four separate storage areas in the system: Materialized Triples, generated by Materialization
Inference; Inferred Triples after Query-invoked Inference, generated by Query-invoked Inference; the
Change History, generated by Change Management; and the base storage holding the custom rules,
ontology and instances as defined at startup of the system.
Given this architecture to support adaptive reasoning for ScienceWeb, a number of issues need
addressing. If only minor changes happen, such as renaming a concept, how can we retrieve correct
answers without large, time-consuming inferencing? How can we implement incremental inferencing
that avoids repeating inference already done, so that the reasoning system works in real time? How can
we characterize what can be done in real time? How do we make use of short-term storage to enhance
the performance of inferencing?
To illustrate how important the answers to the above questions are, let us perform a gedanken
experiment on the number of triples produced by the two ways of inferencing over the same rule sets.
For the complete rule sets and queries see [5]; for the experimental data see LUBM [124].
Take LUBM(0,1) as an example. The size of the base storage is 100,839 triples. After materialized
inference, the size of the Materialized Triples is 64,985 (RDFS+OWL), 77,434 (one rule), 80,538 (two
rules) and 80,981 (three rules). The size of the Inferred Triples after Query-invoked Inference is
12,446 (query-related first rule), 3,101 (query-related second rule) and 440 (query-related third rule).
Take LUBM(0,10) as another example. The size of the base storage is 624,828 triples. After
materialized inference, the size of the Materialized Triples is 392,865 (RDFS+OWL), 473,808 (one
rule), 493,179 (two rules) and 495,957 (three rules). The size of the Inferred Triples after Query-invoked
Inference is 80,940 (query-related first rule), 19,371 (query-related second rule) and 2,772 (query-related
third rule).
We conclude from the above that the number of Inferred Triples after Query-invoked Inference is much
smaller than the number of Materialized Triples, and that it is much more efficient to use short-term
storage rather than long-term storage.
Scenarios
Next, we discuss scenarios that illustrate how data flows through the above architecture and how the
adaptive reasoning system works when people post existing and new queries, with or without changes to
the ontology, custom rules or instances. Initially, the base ontology, custom rules and instances are
stored, grouped and indexed by Storage Management to achieve the goal of faster search and
inferencing. Materialization inferencing, with the support of the DL reasoner and the Rule-based
reasoner, is executed periodically, such as once a day or once a week. After Materialization Inferencing,
the generated materialized triples are stored in the Storage together with the base data.

Existing query: Let us assume a user enters the query "Who are the coauthors of Professor X in
University Y?" Further assume the query already exists and that the user is satisfied with the definition
of "coauthor", meaning "the people whose names have appeared on the same paper as yours". The user
accepts the definition, and the reasoning system chooses to search for the answers within the
Materialized Triples and returns the appropriate answer.
New query and qualitative descriptor: Now assume the user enters the query "Who are the research
ancestors of Professor X?". The query does not exist, nor is the user satisfied with the existing definition
of "research ancestor". The user specifies the rules for research ancestor as "your advisors, or the
advisors of your advisors". The reasoning system then chooses to execute Query-invoked Inferencing
with the support of the DL reasoner and the Rule-based reasoner. It first looks for the related rules and
triples, such as rules containing the property "advisorOf" and triples involving the property "advisorOf"
or the class "GraduateStudent", and then returns the answers by firing these related rules against the
specific triples found. The Inferred Triples after Query-invoked Inferencing are stored separately in the
Storage for answering the same query in the future.
Changes to ontology include adding, removing or modifying concepts or properties in the
ontology.
Ontology property: Suppose a user changes the domain of the property "publicationAuthor"
from the class "Publication" to "Article". The Change Management module will first apply the
change to the ontology stored in the knowledge base through Storage Management, and then
remove the triples related to the property "publicationAuthor" from the store of Inferred Triples
after Query-invoked Inference, such as triples containing the derived property "coauthor". After
the change, when a query such as "Who are the coauthors of Professor X?" is submitted, the
Query & Inference Management module will notice that the previous change affects the
submitted query, so it will choose Query-invoked Inference to answer it. If the query is "Who are
the advisors of GraduateStudent Y?", which has nothing to do with the change, the Query &
Inference Management module will choose Materialized Inference and answer the query by
searching the store of Materialized Triples.
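The invalidation step above can be sketched as a simple filter over the store of inferred triples. This is a simplified illustration with made-up triples; a real implementation would track which rules derived each triple rather than matching on a hand-written dependency set:

```python
def invalidate_inferred(inferred_triples, affected_properties):
    """Remove inferred triples whose predicate depends on a changed property.
    affected_properties is the set of predicates impacted by the ontology change."""
    return [t for t in inferred_triples
            if t[1] not in affected_properties]

inferred = [("a", "coauthor", "b"), ("s", "advisor", "t")]
# "coauthor" is derived from "publicationAuthor", so a domain change to
# publicationAuthor invalidates it (hypothetical dependency set)
kept = invalidate_inferred(inferred, {"coauthor", "publicationAuthor"})
print(kept)  # [('s', 'advisor', 't')]
```

Triples that survive invalidation can still be served from the store; the removed ones are re-derived by Query-invoked Inference on demand.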
Ontology class: Suppose a user removes the class "UnofficialPublication" from the ontology.
First, the Change Management module will apply the change to the ontology stored in the
knowledge base through Storage Management. Second, it will remove the related custom rules
and instances stored in the knowledge base. Third, it will remove the triples related to the class
"UnofficialPublication" from the store of Inferred Triples after Query-invoked Inference. After
the changes, when a query is submitted, the Query & Inference Management module will decide
which way of inferencing to use.
Changes to custom rules include adding, removing or modifying the definitions of custom rules,
such as modifying the rule that defines "groundbreaking professor".
Custom rules: A user locates the existing qualitative descriptor for "groundbreaking professor"
and enters changes to the rule definition. The Change Management module will then apply the
changes to the custom rules stored in the knowledge base through Storage Management. Next,
Change Management will remove the triples related to the property "groundbreaking professor"
from the store of Inferred Triples after Query-invoked Inference. After the change, when a query
is submitted, the Query & Inference Management module will decide which way of inferencing
to use.
Similarly, changes to instances include adding, removing or modifying instances in the storage,
such as updating the publication list of a professor.
Changes to the ontology, custom rules and instances not only modify the contents of the storage
but also affect the reasoning process and the answering of related queries. For example, if a user
introduces a new transitive rule to define "research ancestor" and then asks "Who are the
research ancestors of Professor X?", the system will choose Query-invoked Inferencing: it will
first find the related rules and triples, and then return the answers by firing those rules against the
specific triples. A comparison of the two reasoning processes, in terms of the operations
activated by changes to the ontology, custom rules and instances, is presented in Table 7.
Table 7 Comparison of the operations activated in the two reasoning processes after changes

Changes to ontology
  Both: changes to the ontology stored in the knowledge base; changes to the rules stored in the knowledge base (*); changes to the instances stored in the knowledge base (*)
  Materialized Inferencing: TBox and ABox inferencing; Materialized Triples generation
  Query-invoked Inferencing: changes to the inferred triples; TBox inferencing

Changes to custom rules
  Both: changes to the custom rules stored in the knowledge base
  Materialized Inferencing: TBox and ABox inferencing; Materialized Triples generation
  Query-invoked Inferencing: changes to the inferred triples; related TBox inferencing (*)

Changes to instances
  Both: changes to the instances stored in the knowledge base
  Materialized Inferencing: TBox and ABox inferencing; Materialized Triples generation
  Query-invoked Inferencing: changes to the inferred triples

(*) indicates an operation that may not happen
5.1.2 Adaptive Reasoning Mechanism
ScienceWeb is a collaborative platform that integrates the efforts of users to define what data are
to be obtained, in what way, how the data are to be organized, and what form the queries will
take. After a bootstrapping process has generated an initial knowledge base, we expect frequent
changes to all aspects of the knowledge base: ontology, rule set, harvesting methods, and
instance data. One of my hypotheses, to be tested later, is that changes to the ontology and rule
set will stabilize over time, whereas instance data will continue to change as well as be
periodically harvested. The Adaptive Reasoning Mechanism is designed to select the appropriate
reasoning method depending partially on the degree of change. Materialization Inference is well
suited to situations with infrequent or no updates: queries, including qualitative queries, are
executed without any inferencing, so fast query response is its merit. However, any update of the
ontology, custom rules or instances requires reloading the data and running inference all over
again, resulting in slow responses to queries affected by those changes. Query-invoked Inference
is well suited to fast-changing data because it starts from the query and then searches only for the
triples that meet its needs, avoiding an entire materialization. As only related rules and data are
involved, answers can be returned within an acceptable time period.
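The adaptive selection described above can be sketched as a simple decision function. The threshold and the three-way split are illustrative placeholders only; determining the actual switching criteria empirically is part of the proposed work:

```python
def choose_inference(changes_since_last_materialization, threshold=100):
    """Select a reasoning mode from the observed amount of change.
    The threshold value is a made-up illustration, to be tuned empirically."""
    if changes_since_last_materialization == 0:
        return "materialized"      # answer directly from Materialized Triples
    if changes_since_last_materialization < threshold:
        return "query-invoked"     # infer only what the query needs
    return "re-materialize"        # too many changes: run full materialization again

print(choose_inference(0))    # materialized
print(choose_inference(10))   # query-invoked
print(choose_inference(500))  # re-materialize
```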
5.1.2.1 Issues in Adaptive Reasoning Mechanisms
To realize an efficient adaptive reasoning system that produces most answers in real time even
for a large ScienceWeb, we need to address a number of difficult issues.

- Materialization Inference
  o When can incremental inference be invoked?
  o What part of TBox inference and the inferred triples is stable enough for incremental inference?
- Query-invoked Inference
  o When can pre-computed TBox inferences be invoked?
  o What are the pre-computed TBox inferences?
  o How can Query-invoked Inference be optimized for a large ABox?
- Adaptive mechanism
  o When can the system switch between Materialization Inference and Query-invoked Inference?
  o What are trivial changes and what are normal changes?
  o Does the system need to differentiate trivial changes from normal changes?
  o When is a query affected by a change in the ontology, the rule set, or the instance data?
5.1.2.2 Example: steps for Query-invoked Inference
We take query13 [124] (find all alumni of http://www.University0.edu) and the dataset
LUBM(0,1) from LUBM as an example to illustrate Query-invoked Inference:
1. Find the query-related rules
2. Realize the query-related rules
3. Filter out the query-related triples
4. Run inference on the query-related triples with the realized query-related rules
The query in SPARQL is:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
SELECT ?X
WHERE
{ ?X rdf:type ub:Person .
  <http://www.University0.edu> ub:hasAlumnus ?X }
The related rules to the query are:
@prefix ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
[inverseOf1: (?Q owl:inverseOf ?P) <-(?P owl:inverseOf ?Q) ]
[rdfs2: (?x rdf:type ?c) <-(?x ?p ?y), (?p rdfs:domain ?c)]
[rdfs3: (?y rdf:type ?c)<-(?x ?p ?y), (?p rdfs:range ?c) ]
[rdfs6: (?a ?q ?b)<-(?a ?p ?b), (?p rdfs:subPropertyOf ?q) ]
The pre-computed realized query-related rules to the query are:
@prefix ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
[rdfs200: (?x rdf:type ub:Person)<-(?x ub:doctoralDegreeFrom ?y)]
[rdfs201: (?x rdf:type ub:Person)<-(?x ub:degreeFrom ?y)]
[rdfs202: (?x rdf:type ub:Person)<-(?x ub:advisor ?y) ]
[rdfs203: (?x rdf:type ub:Person)<-(?x ub:title ?y) ]
[rdfs204: (?x rdf:type ub:Person)<-(?x ub:age ?y)]
[rdfs205: (?x rdf:type ub:Person)<-(?x ub:emailAddress ?y)]
[rdfs206: (?x rdf:type ub:Person)<-(?x ub:undergraduateDegreeFrom ?y) ]
[rdfs207: (?x rdf:type ub:Person)<-(?x ub:telephone ?y) ]
[rdfs208: (?x rdf:type ub:Person)<-(?x ub:mastersDegreeFrom ?y)]
[rdfs209: (?y rdf:type ub:Person)<-(?x ub:affiliateOf ?y) ]
[rdfs2010: (?y rdf:type ub:Person)<-(?x ub:hasAlumnus ?y) ]
[rdfs2011: (?y rdf:type ub:Person)<-(?x ub:member ?y) ]
[rdfs2012: (?y rdf:type ub:Person)<-(?x ub:publicationAuthor ?y) ]
[inverseOf100: (?Y ub:hasAlumnus ?X)<- (?X ub:degreeFrom ?Y) ]
[rdfs600: (?a ub:degreeFrom ?b)<-(?a ub:doctoralDegreeFrom ?b) ]
[rdfs601: (?a ub:degreeFrom ?b)<-(?a ub:mastersDegreeFrom ?b) ]
We used the backward-chaining engine in Jena to execute the inference, and the answers were
returned in 0.375 seconds. With the hybrid engine in Jena, the answers were returned in 208.50
seconds. With the forward engine in Jena, no answers were returned within one hour. These
results show the superior efficiency of Query-invoked Inference with realized related rules
compared to Materialization Inference with the full set of OWL/RDFS rules. This superiority is
helpful for answering queries affected by changes.
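Steps 3 and 4 of the procedure above can be sketched in miniature: filter the base triples down to the predicates the realized rules mention, then fire those single-premise rules to a fixpoint. The rules and the single instance triple below are made-up stand-ins for the realized rules and LUBM(0,1) data, not the actual benchmark:

```python
def filter_related(triples, properties):
    """Step 3: keep only the triples whose predicate a realized rule can use."""
    return [t for t in triples if t[1] in properties]

def fire_rules(triples, rules):
    """Step 4: fire realized single-premise rules (head <- body) to a fixpoint."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        for body_pred, make_head in rules:
            for s, p, o in list(facts):
                if p == body_pred:
                    head = make_head(s, o)
                    if head not in facts:
                        facts.add(head)
                        changed = True
    return facts

# A miniature of the alumni example with hypothetical instance data:
rules = [
    ("ub:degreeFrom",         lambda s, o: (o, "ub:hasAlumnus", s)),       # cf. inverseOf100
    ("ub:doctoralDegreeFrom", lambda s, o: (s, "ub:degreeFrom", o)),       # cf. rdfs600
    ("ub:degreeFrom",         lambda s, o: (s, "rdf:type", "ub:Person")),  # cf. rdfs201
]
base = [("alice", "ub:doctoralDegreeFrom", "University0")]
facts = fire_rules(filter_related(base, {"ub:degreeFrom", "ub:doctoralDegreeFrom"}), rules)
print(("University0", "ub:hasAlumnus", "alice") in facts)  # True
```

Because only the filtered triples participate, the fixpoint iteration stays small even when the full ABox is large, which is what makes the backward-flavored configuration fast in the measurements above.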
5.1.3 Knowledge base
To improve the scalability of ABox inferencing, native storage of the knowledge base is an
alternative. Native storage speeds up inferencing as well as search, since it reduces the time
needed for loading and updating data. The Materialized Triples and the Inferred Triples after
Query-invoked Inference are stored separately from the base storage (which includes the custom
rules, ontology and instances), but we create a correlation index between them.
When we store triples, we group and index them to improve inferencing performance. For
example, triples can be grouped by property, e.g., "publicationAuthor" or "advisor", and triples
with the same property can be stored in the same file. We will use indexing to store relationships
and enhance search performance. Avoiding keeping the entire contents of the repository in main
memory is another problem we need to tackle.
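The property-grouped layout described above can be sketched with an in-memory index keyed by predicate. This is a sketch of the design idea only; in the proposed system each property group might live in its own file rather than in one process's memory, and the triples used are hypothetical:

```python
from collections import defaultdict

class TripleStore:
    """Triples grouped and indexed by property, so that inferencing over a
    rule such as 'advisor' touches only the relevant group."""
    def __init__(self):
        self.by_property = defaultdict(list)

    def add(self, s, p, o):
        # group the triple under its predicate
        self.by_property[p].append((s, p, o))

    def triples_for(self, *properties):
        """Fetch only the groups a rule needs, e.g. the 'advisor' triples."""
        return [t for p in properties for t in self.by_property[p]]

store = TripleStore()
store.add("paper1", "publicationAuthor", "alice")
store.add("bob", "advisor", "carol")
print(store.triples_for("advisor"))  # [('bob', 'advisor', 'carol')]
```

Grouping by predicate means a rule whose body mentions only "advisor" never scans the "publicationAuthor" group, which is the intended performance benefit.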
5.1.3.1 Issues in Knowledge Base



How can the storage of base data and inferred data be separated and related?
When can temporary data become permanent?
When and how can the storage be updated when changes happen?
40




Do we need to group the storage by properties in the ongology?
How can we index the storage with whole graph of the data?
How can storage be updated incrementally and partially?
What part of data in the storage should be maintained in the main memory?
5.1.4 Combination of DL and Rule-based Reasoners
DL reasoners perform well on complex TBox reasoning, but they lack the scalable query
answering capabilities that are necessary in applications with large ABoxes. Rule-based OWL
reasoners are based on the implementation of entailment rules in a rule engine. They have limited
TBox reasoning completeness because they may not implement every entailment rule, or they
may trade completeness for performance. Hybrid reasoners that integrate a DL reasoner and a
rule engine can combine the strong points of both sides.
ScienceWeb introduces custom rules for the description of "qualitative descriptors" and
"quantitative descriptors". Thus, besides the traditional TBox and ABox reasoning, we introduce
RBox reasoning for custom-rule inferencing. The DL reasoner is responsible for TBox reasoning,
while the Rule-based Reasoner is responsible for both RBox reasoning and ABox reasoning.
These three kinds of reasoning are separable in the reasoning system, allowing independent
updates of the ontology, custom rules and instances.
The three kinds of reasoning are nevertheless not fully independent: the DL reasoner generates
specific entailment rules for the Rule-based Reasoner, and ABox or RBox reasoning is connected
with TBox reasoning. We therefore need to maintain correlations between the individual objects.
5.1.4.1 Issues in combination of reasoners




What entailment rules can be generated for the Rule-based reasoner with support of the
DL reasoner?
How can the DL reasoner be used in TBox reasoning to generate entailment rules?
When can specified entailment rules be used for Rule-based reasoning?
When can the structures in the ontology be transformed into entailment rules?
5.1.4.2 Example: Generate specified entailment rules
Let us take LUBM as an example. The rule rdfs2 [98] in Jena form is:
[rdfs2: (?x ?p ?y) (?p rdfs:domain ?c) -> (?x rdf:type ?c) ]
This form is commonly employed by rule-based reasoners for OWL inferencing. We now
specialize the entailment rule against a specific ontology.
When we specify "?c", the generated first-level specialized entailment rule is:
[rdfs200: (?x ?p ?y) (?p rdfs:domain ub:Person) -> (?x rdf:type ub:Person) ]
When we specify "?p", the generated second-level specialized entailment rules are:
[rdfs201: (?x ub:doctoralDegreeFrom ?y) (ub:doctoralDegreeFrom rdfs:domain ub:Person) -> (?x rdf:type ub:Person) ]
[rdfs202: (?x ub:degreeFrom ?y) (ub:degreeFrom rdfs:domain ub:Person) -> (?x rdf:type ub:Person) ]
[rdfs203: (?x ub:advisor ?y) (ub:advisor rdfs:domain ub:Person) -> (?x rdf:type ub:Person) ]
[rdfs204: (?x ub:title ?y) (ub:title rdfs:domain ub:Person) -> (?x rdf:type ub:Person) ]
[rdfs205: (?x ub:age ?y) (ub:age rdfs:domain ub:Person) -> (?x rdf:type ub:Person) ]
[rdfs206: (?x ub:emailAddress ?y) (ub:emailAddress rdfs:domain ub:Person) -> (?x rdf:type ub:Person) ]
When we bind the pairs of "?p" and "?c" (?p rdfs:domain ?c), the generated third-level
specialized entailment rules are:
[rdfs207: (?x ub:doctoralDegreeFrom ?y) -> (?x rdf:type ub:Person ) ]
[rdfs208: (?x ub:degreeFrom ?y) -> (?x rdf:type ub:Person)]
[rdfs209: (?x ub:advisor ?y) -> (?x rdf:type ub:Person) ]
[rdfs2010: (?x ub:title ?y) ->(?x rdf:type ub:Person) ]
[rdfs2011: (?x ub:age ?y) ->(?x rdf:type ub:Person) ]
[rdfs2012: (?x ub:emailAddress ?y) ->(?x rdf:type ub:Person) ]
These specialized entailment rules will be generated with the support of the DL reasoner,
selected by level, and sent to the Rule-based reasoner for inference.
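The third-level specialization above can be sketched as a small generator that walks the ontology's domain axioms and emits one rule per (property, class) pair. This is an illustrative sketch; the axiom list is a hand-written stand-in for what the DL reasoner would extract from the ontology:

```python
def specialize_rdfs2(domain_axioms, target_class=None):
    """Generate third-level specializations of rdfs2 from (property, domain-class)
    axioms, dropping the rdfs:domain premise entirely.
    target_class optionally restricts generation to one head class."""
    rules = []
    for prop, cls in domain_axioms:
        if target_class is None or cls == target_class:
            rules.append(f"[(?x rdf:type {cls}) <- (?x {prop} ?y)]")
    return rules

# Hypothetical domain axioms the DL reasoner might report from the TBox
axioms = [("ub:advisor", "ub:Person"), ("ub:age", "ub:Person"),
          ("ub:publicationAuthor", "ub:Publication")]
for r in specialize_rdfs2(axioms, target_class="ub:Person"):
    print(r)
```

Specializing ahead of time trades a larger rule set for premises the rule engine no longer has to match at run time, which is the point of the leveled generation.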
5.1.4.3 Example: Transform structure into entailment rule
Let us take query12 (find all the Chairs in http://www.University0.edu) and dataset LUBM(0,1)
from LUBM as an example.
The query in SPARQL is:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
SELECT ?X ?Y
WHERE
{ ?X rdf:type ub:Chair .
  ?Y rdf:type ub:Department .
  ?X ub:worksFor ?Y .
  ?Y ub:subOrganizationOf <http://www.University0.edu> }
The structure in the ontology concerning “Chair” is:
<owl:Class rdf:ID="Chair">
<rdfs:label>chair</rdfs:label>
<owl:intersectionOf rdf:parseType="Collection">
<owl:Class rdf:about="#Person" />
<owl:Restriction>
<owl:onProperty rdf:resource="#headOf" />
<owl:someValuesFrom>
<owl:Class rdf:about="#Department" />
</owl:someValuesFrom>
</owl:Restriction>
</owl:intersectionOf>
<rdfs:subClassOf rdf:resource="#Professor" />
</owl:Class>
From the query and this structure, we can transform the class definition into the following
entailment rule:
[rule00: (?x rdf:type ub:Chair) <- (?x rdf:type ub:Professor) (?x ub:headOf ?y) (?y rdf:type ub:Department) ]
This entailment rule can be sent to the Rule-based reasoner for either Materialized Inference or
Query-invoked Inference.
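The transformed rule can be checked by applying it to a handful of triples. The sketch below encodes rule00 directly as a Python predicate over a fact set; the individuals "p" and "d" are invented for the illustration:

```python
def is_chair(x, facts):
    """Apply rule00: Chair(x) <- Professor(x) and headOf(x, y) and Department(y).
    facts is a set of (subject, predicate, object) triples."""
    if (x, "rdf:type", "ub:Professor") not in facts:
        return False
    # every y that x heads; Chair holds if any such y is a Department
    heads = {o for s, p, o in facts if s == x and p == "ub:headOf"}
    return any((y, "rdf:type", "ub:Department") in facts for y in heads)

facts = {("d", "rdf:type", "ub:Department"),
         ("p", "rdf:type", "ub:Professor"),
         ("p", "ub:headOf", "d")}
print(is_chair("p", facts))  # True
```

Note that the owl:someValuesFrom restriction becomes an existential check (`any`), matching the intersection-of-Person-and-∃headOf.Department structure in the ontology fragment.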
5.2 Evaluation
In this section, we describe both the process of validating the correctness of our approach and
design, and that of evaluating the effectiveness of our system. The evaluation is geared towards
questions such as: "Is the adaptive reasoning system efficient enough to answer questions within
a time limit acceptable to users, especially under the condition of an evolving ontology, custom
rules and instances?", "Can the system handle a large, scalable knowledge base?", and "How
does the system choose a reasoning mechanism?". All of the evaluation is based on the
assumption that the query has already been formulated.
5.2.1 Test Bed
The adaptive reasoning system will use the benchmarks LUBM [124] and UOBM [145], and the
ODU benchmark generated by our generator [6]. The reasons why we chose these three
benchmarks are as follows. First, the classes and properties of the ontologies of all three
benchmarks are topically related to ScienceWeb, which is entirely about the academic products
and experience of researchers; for example, "author of the publication" appears as a property and
"FullProfessor" as a class, even though the exact names of classes and properties may differ.
This relevance makes all three benchmarks appropriate for evaluation in advance of the
experimental collaborative work on ScienceWeb by real researchers. Second, all three
benchmarks can generate instance sets of varying sizes, providing a scalable knowledge base for
testing the adaptive reasoning system with respect to scalability, one of the most crucial features
of the reasoning system. Third, although the three benchmarks are similar in subject matter and
scalability, the distributions of the many properties that relate objects across the knowledge base
differ. These differences test the system's reasoning performance in the face of different
combinations of rules and related instances, such as the "coauthor" rule and the number of author
instances. Fourth, the different levels of expressive power supported by the benchmarks, such as
OWL Lite or OWL DL, test the expressive power supported by the adaptive reasoning system.
5.2.2 Formative Evaluation
In Section 4.5, we described how we began the process of formative evaluation by surveying
potential community members on their views of the requirements for ScienceWeb. We will
continue to use the community identified in the survey (Virginia researchers), but will expand the
set by going one level deeper, adding the collaborators of the researchers identified in the
preliminary study. We will mainly focus on the components related to the effectiveness of the
reasoning ability: the effectiveness of the combination of the DL reasoner and the Rule-based
reasoner, of materialized inferencing, of retrieving answers from the materialized triples, of
query-invoked inferencing in the face of different custom rules, and of the storage of the
knowledge base. All of these components will be evaluated by response time.
We will choose virtual users, graduate students who have been instructed in the purpose and use
of the system. They will be asked to post queries, compose custom rules, and make necessary
changes to the ontology. We will record and analyze their behavior and draw conclusions from it
to help us build the reasoning system.
5.2.3 Summative Evaluation
In the summative evaluation, we will again make use of statistics from the use of the system and
try to evaluate the system based on questions such as: "How long do users need to wait after they
post an old query?", "How long do users need to wait after they post a new query and new
custom rules?", and "How frequently do users update the ontology?". We will conduct
evaluations of the adaptive reasoning system after the system is built and improve it accordingly.
The effectiveness of the adaptive reasoning mechanism
The system I will be building adapts to the different ways ScienceWeb will be used. First, as
users change the ontology, custom rules and instances in the storage and post queries, we will
continually record the usage and response information from the system: the changes to the
ontology, custom rules and instances with their dates, the queries and their dates, the response
times, the exact answers, and the reasoning mechanism used. Second, using these statistics, I will
analyze the factors that affect the effectiveness of the reasoning mechanism under different
circumstances. This will include the correlation between the frequency of changes to the
ontology, custom rules and instances and the choice of reasoning mechanism; the correlation
between the kinds of changes and the choice of reasoning mechanism; the effects of the
frequency of queries with new qualitative descriptors on the response time; and the effects of the
frequency of repeated queries on the response time. Third, we will set up a schedule for long-term
and short-term reasoning updates, such as the time at which full materialization should be
redone, to improve the effectiveness of the adaptive reasoning mechanism according to the usage
information.
Evaluation of the adaptive reasoning system on qualitative queries involving only old
custom rules
This evaluation tests the efficiency of Materialized Inferencing in the adaptive reasoning system.
After collecting queries from 10 researchers (students and professors) based on a given ontology,
and collecting rules explaining all the qualitative descriptors in the collected queries from the
same researchers, we will use the generator to generate instance sets of different sizes, scaling
from thousands to millions. Then we will execute Materialized Inferencing. The efficiency of the
reasoning system will be tested on qualitative queries involving only old custom rules. Efficiency
is evaluated by how long it takes from submitting the query to returning the answers, including
the time used for instance retrieval.
Evaluation of the adaptive reasoning system on qualitative queries involving changed
ontology, rules or instances
This evaluation tests the efficiency of the triple store and Query-invoked Inferencing in the
adaptive reasoning system. After collecting queries from 10 researchers (students and professors)
based on a given ontology, and collecting rules explaining all the qualitative descriptors in the
collected queries from the same researchers, we will use the generator to generate instance sets
of different sizes, scaling from thousands to millions.
We will choose 20-30 researchers, including students and professors, and provide them with a
script of changes. First, a user will change a property in the ontology; the system will then
execute Query-invoked Inferencing, and we will test its efficiency. Second, they will change the
custom rules, and Query-invoked Inferencing will be executed and its efficiency tested. Third,
they will change the instances, again executing Query-invoked Inferencing and testing its
efficiency. Fourth, they will change the ontology, the custom rules and the instances together,
and Query-invoked Inferencing will be executed and tested once more.
All the changes include adding, editing and deleting. Efficiency is evaluated by how long it takes
from submitting the changes to returning the answers, including the time used for loading data
and inferencing.
The completeness and soundness of query results
The completeness and soundness of the results is one of the most important factors in
establishing the usability of the system. However, it is difficult and time-consuming to validate
completeness and soundness, especially with unpredictable queries and a changing ontology,
custom rules and instances. To evaluate them, we will record feedback from users, who will
check the completeness and soundness of the answers after they post queries. The completeness
and soundness of the answers will be recorded with the queries. We will examine the situations
in which the reasoning system fails to return correct and complete answers and then make the
necessary changes, such as adding more entailment rules.
5.3 Schedule
We have partitioned the tasks along the lines of our research issues, and the timeline is presented
in Table 8, below.
Table 8 Schedule of tasks (months 3-18, spanning Year 1 and Year 2)

Community feedback
  - Identify metrics
  - Select and create benchmark
  - Define qualitative queries and custom rules

Research and Design
  - DL Reasoning algorithm
  - Rule-based Reasoning algorithm
  - Materialized Inferencing
  - Query-invoked Inferencing

Tool Development
  - Combine DL reasoner and Rule-based reasoner
  - Store, group and index base triples and inferred triples
  - Implement Query-invoked Inferencing
  - Optimize inferencing

Testing and Evaluation
  - Create a test bed
  - Formative Evaluation
  - Summative Evaluation
6. Contributions
The contributions of this work are in designing an architecture and developing new technologies
and an adaptive reasoning system for ScienceWeb, allowing users to: (a) get answers effectively
in real time after posting existing qualitative queries, (b) get answers effectively in real time
after changing the ontology, custom rules, or instances and then posting qualitative queries, and
(c) get answers effectively in real time as the size of the knowledge base scales from thousands
to millions.
Sufficient reasoning support is a cornerstone for any realization of ScienceWeb. The work in
this thesis will provide efficient reasoning over a scalable knowledge base. My system adapts to
changing situations to ensure efficient reasoning. Specifically, first, we will combine Query-invoked
Inference and Materialization Inference with enhanced functions, such as incremental
inference, for more efficient query answering. Second, we will introduce adaptive mechanisms
that switch between Query-invoked Inference and Materialization Inference after changes have
been made. Third, we will introduce new ways of storing, grouping and indexing in the
knowledge base for faster searching and reasoning as the size of the triple set scales to the
millions and the complexity of the ABox increases. Since reasoning is necessary to many
semantic applications, my contributions to the field of efficient reasoning should make these
methods extensible to applications other than ScienceWeb.
References
1. Microsoft. Microsoft Academic Search. 2010 [cited 2010 October 27]; Available from: http://academic.research.microsoft.com/.
2. Nie, Z., et al., Object-level ranking: bringing order to web objects, in Proceedings of the 14th International World Wide Web Conference. 2005: Chiba, Japan. p. 567-574.
3. Doan, A., et al., Community information management. IEEE Data Eng. Bull., 2006. 29(1): p. 64-72.
4. Tang, J., et al., ArnetMiner: Extraction and mining of academic social networks, in Proceedings of the Fourteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD'2008). 2008, ACM. p. 990-998.
5. Shi, H., et al., Comparison of Ontology Reasoning Systems Using Custom Rules, in (accepted) International Conference on Web Intelligence, Mining and Semantics. 2011: Sogndal, Norway.
6. Yaseen, A., et al., Performance Evaluation of Oracle Semantic Technologies With Respect To User Defined Rules, in (submitted) 10th International Workshop on Web Semantics. 2011: Toulouse, France.
7. Maly, K., S. Zeil, and M. Zubair. ScienceWeb pre-survey. 2010 [cited 2010 December 3]; Available from: http://www.cs.odu.edu/~zeil/scienceWeb/survey10/.
8. IEEE. IEEE - The world's largest professional association for the advancement of technology. 2010 [cited 2010 October 27]; Available from: http://www.ieee.org/index.html.
9. ACM Inc. ACM Digital Library. 2010 [cited 2010 October 27]; Available from: http://portal.acm.org/.
10. Springer. Springer - International Publisher Science, Technology, Medicine. 2010 [cited 2010 October 27]; Available from: http://www.springer.com.
11. Google. Google Scholar. 2010 [cited 2010 October 27]; Available from: http://scholar.google.com/.
12. Ley, M. DBLP Bibliography. 2010 [cited 2010 October 27]; Available from: http://www.informatik.uni-trier.de/~ley/db/.
13. Ley, M., The DBLP Computer Science Bibliography: Evolution, Research Issues, Perspectives, in Proc. SPIRE 2002. 2002: Lisbon, Portugal. p. 1-10.
14. The Pennsylvania State University. CiteSeerX. 2010 [cited 2010 October 27]; Available from: http://citeseerx.ist.psu.edu/.
15. Giles, C., K. Bollacker, and S. Lawrence, CiteSeer: An automatic citation indexing system, in Proceedings of the Third ACM Conference on Digital Libraries. 1998. p. 89-98.
16. getCITED Inc. getCITED: Academic Research, citation reports and discussion lists. 2010 [cited 2010 October 27]; Available from: http://www.getcited.org/.
17. Chitea, H., F. Suchanek, and I. Weber, ESTER: Efficient Search on Text, Entities, and Relations, in Proceedings of the 30th ACM SIGIR Conference (2007). 2007. p. 671-678.
18. Cafarella, M., et al., Structured querying of web text, in Proceedings of CIDR (2007). 2007: Asilomar, California, USA. p. 225-234.
19. Cheng, T., X. Yan, and K. Chang, EntityRank: searching entities directly and holistically, in Proceedings of the 33rd International Conference on Very Large Data Bases. 2007: Vienna, Austria. p. 387-398.
20. DeRose, P., et al., Building structured web community portals: A top-down, compositional, and incremental approach, in Proceedings of the 33rd International Conference on Very Large Data Bases. 2007: Vienna, Austria. p. 399-410.
21. Kasneci, G., et al., NAGA: Searching and ranking knowledge, in Proceedings of the 24th International Conference on Data Engineering. 2008: Cancun, Mexico. p. 953-962.
22. Milne, D., I. Witten, and D. Nichols, A knowledge-based search engine powered by Wikipedia, in Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management. 2007: Lisbon, Portugal. p. 445-454.
23. DeRose, P., et al., DBLife: A community information management platform for the database research community, in CIDR 2007. 2007. p. 169-172.
24. Department of Mathematics, North Dakota State University. The Mathematics Genealogy Project. 2010 [cited 2010 October 27]; Available from: http://genealogy.math.ndsu.nodak.edu/.
25. Bizer, C., et al., DBpedia - A crystallization point for the Web of Data. Web Semantics: Science, Services and Agents on the World Wide Web, 2009. 7(3): p. 154-165.
26. Suchanek, F., G. Kasneci, and G. Weikum, Yago: A large ontology from Wikipedia and WordNet. Web Semantics: Science, Services and Agents on the World Wide Web, 2008. 6(3): p. 203-217.
27. Aleman-Meza, B., et al., SwetoDblp ontology of Computer Science publications. Web Semantics: Science, Services and Agents on the World Wide Web, 2007. 5(3): p. 151-155.
28. Glaser, H., I. Millard, and A. Jaffri, RKBExplorer.com: a knowledge driven infrastructure for linked data providers, in Proceedings of the 5th European Semantic Web Conference on The Semantic Web: Research and Applications. 2008: Tenerife, Canary Islands, Spain. p. 797-801.
29. Wikipedia. Wiki - Wikipedia, the free encyclopedia. 2010 [cited 2010 October 27]; Available from: http://en.wikipedia.org/wiki/Wiki.
30. Princeton University. About WordNet. 2010 [cited 2010 December 10]; Available from: http://wordnet.princeton.edu/.
31. Niskanen, I. and J. Kantorovitch, Ontology driven data mining and information visualization for the networked home, in 2010 Fourth International Conference on Research Challenges in Information Science (RCIS). 2010: Nice, France. p. 147-156.
32. Zeman, M., et al., Ontology-Driven Data Preparation for Association Mining, in Znalosti 2009. 2009: Brno. p. 270-283.
33. Kuo, Y., et al., Domain ontology driven data mining: a medical case study, in Proceedings of the 2007 International Workshop on Domain Driven Data Mining. 2007: San Jose, California. p. 11-17.
34. Singh, S., P. Vajirkar, and Y. Lee, Context-aware Data Mining using Ontologies. Conceptual Modeling - ER 2003. 2003. 2813/2003: p. 405-418.
35. Happel, H. and S. Seedorf, Applications of ontologies in software engineering, in 2nd International Workshop on Semantic Web Enabled Software Engineering (SWESE 2006), held at the 5th International Semantic Web Conference (ISWC 2006). 2006: Athens, GA, USA. p. 5-9.
36. Lopez, V., M. Pasin, and E. Motta, AquaLog: An ontology-portable question answering system for the semantic web. The Semantic Web: Research and Applications, 2005. 3532: p. 546-562.
37. Tablan, V., D. Damljanovic, and K. Bontcheva, A natural language query interface to structured information, in Proceedings of the 5th European Semantic Web Conference on The Semantic Web: Research and Applications. 2008: Tenerife, Canary Islands, Spain. p. 361-375.
38. Zenz, G., et al., From keywords to semantic queries - Incremental query construction on the semantic web. Web Semantics: Science, Services and Agents on the World Wide Web, 2009. 7(3): p. 166-176.
39. Diaz, A. and G. Baldo, Co-Protégé: A groupware tool for supporting collaborative ontology design with divergence, in The Eighth International Protégé Conference. 2005. p. 32-32.
40. Gruber, T., Ontolingua: A mechanism to support portable ontologies. Vol. KSL-91. 1992: Stanford University, Knowledge Systems Laboratory. 61.
41. Swartout, B., et al., Ontosaurus: a tool for browsing and editing ontologies, in Proceedings of the 6th international The semantic web and 2nd Asian conference on Asian semantic web conference. 1996. p. 1564-1571.
42. Sure, Y., J. Angele, and S. Staab, OntoEdit: Guiding ontology development by methodology and inferencing. On the Move to Meaningful Internet Systems 2002: CoopIS, DOA, and ODBASE, 2002. 2519/2002: p. 1205-1222.
43. Corcho, Ó., et al., WebODE: An integrated workbench for ontology representation, reasoning, and exchange. Knowledge Engineering and Knowledge Management: Ontologies and the Semantic Web, 2002. 2473/2002: p. 295-310.
44. Volz, R., et al., KAON server - a semantic web management system, in Alternate Track Proceedings of the Twelfth International World Wide Web Conference, WWW2003. 2003: Budapest, Hungary. p. 20-24.
45. Bechhofer, S., et al., OilEd: a reason-able ontology editor for the semantic web. KI 2001: Advances in Artificial Intelligence, 2001. 2174/2001: p. 396-408.
46. Tang, J., et al., ArnetMiner: extraction and mining of academic social networks, in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2008, ACM: Las Vegas, Nevada, USA.
47. NIH. PubMed Central. 2010 [cited 2010 October 27]; Available from: http://www.ncbi.nlm.nih.gov/pmc/.
48. National Academy of Sciences. The National Academies: Advisers to the Nation on Science, Engineering, and Medicine. 2010 [cited 2010 October 27]; Available from: http://www.nationalacademies.org/.
49. Nardi, D. and R.J. Brachman, An introduction to description logics, in The Description Logic Handbook: Theory, Implementation, and Applications. 2002. p. 1-40.
50. Minsky, M., A framework for representing knowledge. The Psychology of Computer Vision, 1975.
51. Quillian, M.R., Semantic memory. Semantic Information Processing, 1968: p. 227-270.
52. Baader, F., The Description Logic Handbook: Theory, Implementation, and Applications. 2003: Cambridge University Press.
53. Baader, F., I. Horrocks, and U. Sattler, Description logics. Foundations of Artificial Intelligence, 2008. 3: p. 135-179.
54. Wikipedia. Description logic. 2011 [cited 2011 March 8]; Available from: http://en.wikipedia.org/wiki/Description_logic.
55. Stoilos, G., An Introduction to Description Logics. 2005.
56. Gruber, T.R., A translation approach to portable ontology specifications. Knowledge Acquisition, 1993. 5(2): p. 199-220.
57. Mays, E., R. Dionne, and R. Weida, K-Rep system overview. ACM SIGART Bulletin, 1991. 2(3): p. 97.
58. Areces, C., M. De Rijke, and H. De Nivelle, Resolution in modal, description and hybrid logic. Journal of Logic and Computation, 2001. 11(5): p. 717.
59. Hustadt, U., B. Motik, and U. Sattler, Reducing SHIQ- description logic to disjunctive datalog programs, in Proc. of the 9th Int. Conf. on Principles of Knowledge Representation and Reasoning (KR 2004). 2004: Morgan Kaufmann, Los Altos. p. 152-162.
60. Hustadt, U. and R.A. Schmidt, On the relation of resolution and tableaux proof systems for description logics, in Proc. of the 16th Int. Joint Conf. on Artificial Intelligence (IJCAI'99). 1999. p. 110-117.
61. Hustadt, U. and R. Schmidt, Issues of decidability for description logics in the framework of resolution. Automated Deduction in Classical and Non-Classical Logics, 2000: p. 532-535.
62. Kazakov, Y. and B. Motik, A Resolution-Based Decision Procedure for SHOIQ. Journal of Automated Reasoning, 2008. 40(2): p. 89-116.
63. Baader, F., et al., From tableaux to automata for description logics. Fundamenta Informaticae, 2003. 57(2): p. 247-279.
64. Lutz, C., Interval-based temporal reasoning with general TBoxes, in Proc. of the 17th Int. Joint Conf. on Artificial Intelligence (IJCAI 2001). 2001. p. 85-96.
65. Lutz, C. and U. Sattler, Mary likes all cats, in Proceedings of the 2000 International Workshop on Description Logics (DL2000). 2000. p. 213-226.
66. Baader, F. and U. Sattler, An overview of tableau algorithms for description logics. Studia Logica, 2001. 69(1): p. 5-40.
67. Fitting, M., Tableau methods of proof for modal logics. Notre Dame Journal of Formal Logic, 1972. 13(2): p. 237-247.
68. Schmidt-Schauß, M. and G. Smolka, Attributive concept descriptions with complements. Artificial Intelligence, 1991. 48(1): p. 1-26.
69. Horrocks, I., O. Kutz, and U. Sattler, The even more irresistible SROIQ, in Proc. of the 10th Int. Conf. on Principles of Knowledge Representation and Reasoning (KR 2006). 2006: Lake District, UK. p. 57-67.
70. Russell, S.J. and P. Norvig, Artificial Intelligence: A Modern Approach. 2009: Prentice Hall.
71. Forgy, C.L., Rete: A fast algorithm for the many pattern/many object pattern match problem. Artificial Intelligence, 1982. 19(1): p. 17-37.
72. Kiryakov, A., D. Ognyanov, and D. Manov, OWLIM - a pragmatic semantic repository for OWL. Web Information Systems Engineering, 2005. 3807/2005: p. 182-192.
73. Ontotext. OWLIM - OWL Semantic Repository. 2010 [cited 2010 October 13]; Available from: http://www.ontotext.com/owlim/.
74. Oracle Corporation. Oracle Database 11g R2. 2010 [cited 2010 October 13]; Available from: http://www.oracledatabase11g.com.
75. Zhou, J., et al., Minerva: A scalable OWL ontology storage and inference system. The Semantic Web - ASWC 2006, 2006: p. 429-443.
76. Heflin, J. and Z. Pan, DLDB: Extending Relational Databases to Support Semantic Web Queries, in Proceedings of the Workshop on Practical and Scalable Semantic Web Systems. 2003. p. 109-113.
77. Erling, O. and I. Mikhailov, RDF Support in the Virtuoso DBMS. Networked Knowledge - Networked Media, 2009: p. 7-24.
78. Franz Inc. AllegroGraph RDFStore: Web 3.0's Database. 2011; Available from: http://www.franz.com/agraph/allegrograph/.
79. Dolby, J., et al., Scalable highly expressive reasoner (SHER). Web Semantics: Science, Services and Agents on the World Wide Web, 2009. 7(4): p. 357-361.
80. Chen, Y., et al., HStar - a semantic repository for large scale OWL documents. The Semantic Web - ASWC 2006, 2006: p. 415-428.
81. Jena team and Hewlett-Packard. Jena Semantic Web Framework. 2010 [cited 2010 October 13]; Available from: http://jena.sourceforge.net/.
82. Berners-Lee, T., J. Hendler, and O. Lassila, The semantic web. Scientific American, 2001. 284(5): p. 34-43.
83. Antoniou, G., et al., Combining Rules and Ontologies - A survey. Deliverable I3-D3, REWERSE. 2005.
84. Mei, J., Z. Lin, and H. Boley, ALC: an integration of description logic and general rules, in Proceedings of Web Reasoning and Rule Systems. 2007, Springer-Verlag. p. 163-177.
85. Rosati, R., Semantic and computational advantages of the safe integration of ontologies and rules. Principles and Practice of Semantic Web Reasoning, 2005. 3703/2005: p. 50-64.
86. Rosati, R., DL+log: Tight integration of description logics and disjunctive datalog. Principles of Knowledge Representation and Reasoning, 2006: p. 68-78.
87. Horrocks, I., et al., SWRL: A semantic web rule language combining OWL and RuleML. W3C Member Submission, 21 May 2004.
88. Grosof, B.N., et al., Description logic programs: Combining logic programs with description logic, in Proc. of the Twelfth International World Wide Web Conference (WWW 2003). 2003, ACM. p. 48-57.
89. Donini, F.M., et al., AL-log: Integrating datalog and description logics. Journal of Intelligent Information Systems, 1998. 10(3): p. 227-252.
90. Rosati, R., Towards expressive KR systems integrating datalog and description logics: Preliminary report, in Proc. of DL'99. 1999, Citeseer. p. 160-164.
91. Drabent, W., J. Henriksson, and J. Maluszynski, HD-rules: a hybrid system interfacing Prolog with DL-reasoners, in Applications of Logic Programming to the Web, Semantic Web and Semantic Web Services. 2007, Citeseer. p. 76-90.
92. Eiter, T., et al., NLP-DL: A Knowledge-Representation System for Coupling Nonmonotonic Logic Programs with Description Logics, in International Semantic Web Conference. 2005, Citeseer.
93. Levy, A.Y. and M.C. Rousset, Combining Horn rules and description logics in CARIN. Artificial Intelligence, 1998. 104(1-2): p. 165-209.
94. Eiter, T., et al., Combining answer set programming with description logics for the semantic web. Artificial Intelligence, 2008. 172(12-13): p. 1495-1539.
95. Rosati, R., On the decidability and complexity of integrating ontologies and rules. Web Semantics: Science, Services and Agents on the World Wide Web, 2005. 3(1): p. 61-73.
96. Mei, J., et al., DatalogDL: Datalog Rules Parameterized by Description Logics. Springer Series: Semantic Web and Beyond, 2010. 2: p. 171-187.
97. Grant, J. and D. Beckett, RDF Test Cases. W3C Recommendation, February 2004.
98. Hayes, P. and B. McBride, RDF Semantics. W3C Recommendation, 10 February 2004.
99. ter Horst, H.J., Combining RDF and part of OWL with rules: Semantics, decidability, complexity. The Semantic Web - ISWC 2005, 2005. 3729/2005: p. 668-684.
100. ter Horst, H.J., Extending the RDFS entailment lemma. The Semantic Web - ISWC 2004, 2004. 3298/2004: p. 77-91.
101. ter Horst, H.J., Completeness, decidability and complexity of entailment for RDF Schema and a semantic extension involving the OWL vocabulary. Web Semantics: Science, Services and Agents on the World Wide Web, 2005. 3(2-3): p. 79-115.
102. Haarslev, V. and R. Möller, Racer: A core inference engine for the semantic web, in Proceedings of the 2nd International Workshop on Evaluation of Ontology-based Tools. 2003, Citeseer. p. 27-36.
103. Sirin, E., et al., Pellet: A practical OWL-DL reasoner. Web Semantics: Science, Services and Agents on the World Wide Web, 2007. 5(2): p. 51-53.
104. Clark & Parsia. Pellet: The Open Source OWL 2 Reasoner. 2010 [cited 2010 October 13]; Available from: http://clarkparsia.com/pellet/.
105. Tsarkov, D. and I. Horrocks, FaCT++ description logic reasoner: System description. Automated Reasoning, 2006: p. 292-297.
106. Motik, B., R. Shearer, and I. Horrocks, Hypertableau reasoning for description logics. Journal of Artificial Intelligence Research, 2009. 36(1): p. 165-228.
107. Kopena, J. and W. Regli, OWLJessKB: a semantic web reasoning tool. 2005.
108. Matheus, C.J., et al., BaseVISor: A Forward-Chaining Inference Engine Optimized for RDF/OWL Triples, in Digital Proceedings of the 5th International Semantic Web Conference, ISWC 2006. 2006, Citeseer: Athens, GA.
109. Carroll, J.J., et al., Jena: implementing the semantic web recommendations, in Proceedings of the 13th International World Wide Web Conference on Alternate Track Papers & Posters. 2004, ACM. p. 74-83.
110. Jang, M. and J.C. Sohn, Bossam: an extended rule engine for OWL inferencing. Rules and Rule Markup Languages for the Semantic Web, 2004: p. 128-138.
111. Grosof, B. and C. Neogy. SweetRules Project. 2005 [cited 2011 March 8]; Available from: http://sweetrules.semwebcentral.org/.
112. Reynolds, D. Jena2 Inference Support. 2010 [cited 2011 March 8]; Available from: http://jena.sourceforge.net/inference/.
113. Motik, B. and R. Studer, KAON2 - A Scalable Reasoning Tool for the Semantic Web, in Proc. 2nd ESWC. 2005: Heraklion, Greece.
114. Information Process Engineering (IPE), Institute of Applied Informatics and Formal Description Methods (AIFB), and Information Management Group (IMG). KAON2 - Ontology Management for the Semantic Web. 2010 [cited 2010 October 13]; Available from: http://kaon2.semanticweb.org/.
115. Zou, Y., T. Finin, and H. Chen, F-OWL: An inference engine for semantic web. Formal Approaches to Agent-Based Systems, 2005: p. 238-248.
116. Haarslev, V. and R. Möller, An empirical evaluation of optimization strategies for ABox reasoning in expressive description logics, in Proc. Workshop on Description Logics. 1999, Citeseer. p. 115-119.
117. Horrocks, I., et al., The instance store: DL reasoning with large numbers of individuals, in Description Logic Workshop. 2004, Citeseer. p. 31-40.
118. Motik, B. and U. Sattler, A comparison of reasoning techniques for querying large description logic ABoxes. Logic for Programming, Artificial Intelligence, and Reasoning, 2006. 4246/2006: p. 227-241.
119. Meditskos, G. and N. Bassiliades, Combining a DL reasoner and a rule engine for improving entailment-based OWL reasoning. The Semantic Web - ISWC 2008, 2008: p. 277-292.
120. Meditskos, G. and N. Bassiliades, DLEJena: A practical forward-chaining OWL 2 RL reasoner combining Jena and Pellet. Web Semantics: Science, Services and Agents on the World Wide Web, 2010. 8(1): p. 89-94.
121. Li, H., et al., A Reasoning Algorithm for pD*. The Semantic Web - ASWC 2006, 2006: p. 293-299.
122. Heymans, S., et al., Ontology reasoning with large data repositories. Ontology Management, 2008: p. 89-128.
123. Broekstra, J., A. Kampman, and F. Van Harmelen, Sesame: A generic architecture for storing and querying RDF and RDF Schema. The Semantic Web - ISWC 2002, 2002: p. 54-68.
124. Guo, Y., Z. Pan, and J. Heflin, LUBM: A benchmark for OWL knowledge base systems. Web Semantics: Science, Services and Agents on the World Wide Web, 2005. 3(2-3): p. 158-182.
125. Franz Inc. Introduction to AllegroGraph 4.2. 2011 [cited 2011 March 8]; Available from: http://www.franz.com/agraph/support/documentation/v4/agraph-introduction.html.
126. Jena team and Hewlett-Packard. TDB/Architecture - Jena Wiki. 2009 [cited 2011 March 8]; Available from: http://www.openjena.org/wiki/TDB/Architecture.
127. Erling, O. and I. Mikhailov, Towards web scale RDF, in 4th International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS 2008). 2008.
128. Erling, O. Advances in Virtuoso RDF Triple Storage (Bitmap Indexing). 2006 [cited 2011 March 8]; Available from: http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VOSBitmapIndexing.
129. Motik, B., Reasoning in description logics using resolution and deductive databases. PhD thesis, University of Karlsruhe, Germany, 2006.
130. Motik, B., U. Sattler, and R. Studer, Query answering for OWL-DL with rules. Web Semantics: Science, Services and Agents on the World Wide Web, 2005. 3(1): p. 41-60.
131. Fokoue, A., et al., The summary ABox: Cutting ontologies down to size. The Semantic Web - ISWC 2006, 2006: p. 343-356.
132. Dolby, J., et al., Scalable semantic retrieval through summarization and refinement, in Proceedings of the National Conference on Artificial Intelligence. 2007. p. 299-304.
133. Guo, Y. and J. Heflin, A scalable approach for partitioning OWL knowledge bases, in Second International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS 2006). 2006, Citeseer. p. 47-60.
134. Wandelt, S. and R. Möller, Scalability of OWL reasoning: role condensates. On the Move to Meaningful Internet Systems 2007: OTM 2007 Workshops, 2007. 4806/2007: p. 1145-1154.
135. Calvanese, D., et al., DL-Lite: Tractable description logics for ontologies, in Proceedings of the 20th National Conference on Artificial Intelligence. 2005. p. 602-607.
136. Calvanese, D., et al., Data complexity of query answering in description logics, in Proceedings of the 10th International Conference on the Principles of Knowledge Representation and Reasoning. 2006. p. 260-270.
137. Hustadt, U., B. Motik, and U. Sattler, Data complexity of reasoning in very expressive description logics, in Proceedings of the 19th International Joint Conference on Artificial Intelligence. 2005, Citeseer. p. 466-471.
138. Eiter, T., et al., Query answering in the description logic Horn-SHIQ, in Proc. of JELIA. 2008. p. 166-179.
139. Schmidt, M., et al., SP^2Bench: A SPARQL Performance Benchmark, in IEEE 25th International Conference on Data Engineering (ICDE '09). 2009, IEEE. p. 222-233.
140. Jena team and Hewlett-Packard. ARQ - A SPARQL Processor for Jena. 2011 [cited 2011 March 8]; Available from: http://jena.sourceforge.net/ARQ/.
141. Bizer, C. and A. Schultz, The Berlin SPARQL benchmark. International Journal on Semantic Web & Information Systems, 2009. 5(2): p. 1-24.
142. semanticweb.org. SPARQL endpoint. 2009 [cited 2011 March 30]; Available from: http://semanticweb.org/wiki/SPARQL_endpoint.
143. W3C. SparqlEndpoints. 2011 [cited 2011 March 30]; Available from: http://www.w3.org/wiki/SparqlEndpoints.
144. Harris, S., N. Lamb, and N. Shadbolt, 4store: The Design and Implementation of a Clustered RDF Store, in Proceedings of the 5th International Workshop on Scalable Semantic Web Knowledge Base Systems. 2009. p. 81-96.
145. Ma, L., et al., Towards a complete OWL ontology benchmark. The Semantic Web: Research and Applications, 2006: p. 125-139.
146. Weithöner, T., et al., What's wrong with OWL benchmarks, in Proceedings of the 2nd International Workshop on Scalable Semantic Web Knowledge Base Systems. 2006, Citeseer. p. 101-114.
147. Weithöner, T., et al., Real-world reasoning with OWL. The Semantic Web: Research and Applications, 2007: p. 296-310.
148. Yaseen, A., et al., Performance Evaluation of Oracle Semantic Technologies With Respect To User Defined Rules, in (in preparation for) International Conference on Intelligent Semantic Web-Services and Applications. 2011: Amman, Jordan.