Performance Evaluation of Oracle Semantic Technologies: Data

Ashraf Yaseen, Kurt J. Maly, Steven J. Zeil and Mohammad Zubair
Department of Computer Science, Old Dominion University, Norfolk, VA 23529-0162 USA
{ayaseen, maly, zeil, zubair}@cs.odu.edu

Abstract—Ontology-based reasoning systems have a native rule base but also allow for the addition of application domain-specific rules. Previous work comparing the performance of these systems mainly considered performance with respect to system-supported rule bases. In this paper we present an evaluation of Oracle as an ontology reasoning system with respect to domain-specific rule bases, in the context of a question/answer system called ScienceWeb. We also present UnivGenerator, a tool to generate synthetic data samples in accordance with the ScienceWeb ontology, as a component of the evaluation.

Keywords—Oracle Semantic Technology, ontology, reasoning system, domain-specific rules

I. INTRODUCTION

The basic elements of the semantic web [1] are the Resource Description Framework (RDF), RDF Schema (RDFS) and the Web Ontology Language (OWL). With these elements, domain concepts, properties and relationships can be specified. Other elements include a query language for RDF (SPARQL) [2] and a language to express custom rules (SWRL) [3].

ScienceWeb is a question/answer system that is being developed on top of a reasoning and inference layer. ScienceWeb focuses on scientific research and the researchers who perform it. Information freely available on the internet is being harvested, organized, stored, and queried in a collaborative environment. The objective of the system is to provide answers to qualitative queries that represent the evolving agreement of the community of researchers. The system allows qualitative descriptors such as “groundbreaking researchers” or “tenurable record”. Having rule-based definitions of custom descriptors makes it possible to develop systems capable of answering questions that contain qualitative descriptors even when those descriptors involve transitive reasoning, as in “What is the PhD advisor genealogy of professor x?”, or require algebraic computation across populations, as in “Who are the groundbreaking researchers in Software Engineering?”.

In a collaborative environment, where users are able to customize their qualitative queries, a fast response is required from the reasoning system. This is a challenge when dealing with the custom rules that are required to answer qualitative queries. In order to evaluate the performance (scalability) of a candidate reasoning system, we need large samples of semantic data that are instances of a rich ontology. In this paper, we present UnivGenerator, a tool to generate synthetic data according to the ScienceWeb ontology. We then present an evaluation of the suitability of the Oracle 11g reasoning system based upon the data samples generated in the context of a ScienceWeb-like system.

The rest of the paper is organized as follows: previous related work is presented in section 2, the ontology and the generator program are explained in section 3, our experiment is explained in section 4, the results and discussions are in section 5, and finally the conclusion is in section 6.

II. RELATED WORK

A. Reasoning Support Systems

A number of ontology systems have been developed for reasoning and querying in the semantic web. Examples of such systems include Jena [4], a Java framework for semantic web applications with a rule-based inference engine over RDFS and OWL.
Pellet [5] is a Java-based reasoner supporting OWL-DL. KAON2 [6] is a Java reasoner supporting OWL-Lite and OWL-DL; SPARQL is used as a query language and SWRL for custom rules. OWLIM [7] provides semantic repository and reasoning capabilities over RDFS, OWL Horst and OWL 2 RL. Oracle 11g Semantic Technologies [8] is a further example.

Ontology reasoning systems differ in the storage of semantic data, the degree of reasoning support and the way reasoning is performed. Storage of semantic data varies: some systems store all data in main memory while others are secondary storage-based. In regard to reasoning support, some systems only provide support for RDF/RDFS inference, while others provide more support by including partial or full support of OWL inference. Some systems perform the reasoning during the loading stage; some provide a separate step right after loading; and others perform the reasoning when querying the system. Many current systems provide another level of reasoning support beyond the subclass/superclass types of reasoning by allowing users to define their own rules. These custom rules enable the addition of specialized reasoning capabilities.

Figure 1. Inheritance hierarchy for chosen subset of the ScienceWeb ontology

Oracle Semantic Technologies provides persistent, secondary storage of semantic data and different degrees of reasoning support by allowing users to select the required level. It provides for domain-specific rules, and reasoning is performed in a separate step.

B. Performance Evaluation

A number of studies have been done on the performance of reasoning systems [12-18]. These studies address the scalability of the reasoning systems with regard to the size and complexity of the ontology. Benchmarks were created and used to facilitate systems evaluation. For example, LUBM [9, 10] is a widely used benchmark in evaluating the performance of ontology reasoning systems. It has a university-domain ontology and a dataset generator that provides samples that are repeatable and scalable to a specific size. Fourteen test queries and several performance metrics come with this benchmark. Another benchmark is the UOBM [11], an extended version of LUBM, which provides a higher degree of reasoning by covering OWL Lite and OWL DL. Moreover, it adds relations that enrich the ontology’s complexity, albeit not to the level needed for ScienceWeb.

Previous work on ontological systems’ evaluation addresses the reasoning capabilities, the scalability and the efficiency of each system [12-18]. All of these compared and evaluated the performance of reasoning systems over predefined rule bases, for example RDF/RDFS rules or OWL. We are not aware of any study that addresses systems’ capabilities and reasoning performance over customized rule bases; this includes Oracle’s own study of performance using the LUBM benchmark [19]. The authors have undertaken a study of the performance of a variety of systems, both open-source and commercial, when performing reasoning over customized rule sets based upon both the LUBM and ScienceWeb ontologies [21, 22].
Oracle was selected for the more detailed investigation presented in this paper both because of its promising overall performance and because of its prominent position in the industry. Our target is to evaluate Oracle as an ontology reasoning system in terms of reasoning and querying using custom rules in the context of the ScienceWeb system.

III. THE SCIENCEWEB ONTOLOGY AND BENCHMARK GENERATOR

At the core of ScienceWeb is an ontology that covers the concepts and their relationships used in the science research community. For this performance study we have used the subset depicted in Figure 1. The ontology includes the concepts of Department, Publication, and Researcher, and numerous properties (not shown) such as advisorOf, authorOf, worksFor, etc. This subset has 32 classes and 48 properties (18 object properties and 30 data type properties). All the concepts of the ontology found in the LUBM [10] benchmark can be found in the ScienceWeb ontology, although the exact names for classes and properties may not be the same.

For ScienceWeb the scalability issues arise from both the complexity of the class tree and the complexity of the relationships. We found both LUBM and UOBM wanting in that regard. Consider, for example, the complexity of the Publication subtree shown in Figure 1. This is already more elaborate than the corresponding structures in LUBM and UOBM and, as ScienceWeb matures, we anticipate that a still richer ontological structure will be required. This complexity has a direct effect on the generation of benchmark data. The LUBM generator, for example, generates a flat list of publication objects, all of the base Publication type. We felt that a more realistic distribution would be required for even basic prototyping of ScienceWeb, and developed our own tool for this purpose.

The tool is called UnivGenerator, and the instances generated are repeatable and scalable to different sizes. It generates semantic OWL data in RDF/XML format. To make sample generation repeatable, a seed value is set at the beginning of each run. Using different seed values while keeping the same settings for the rest of the parameters will generate different samples of similar size. To provide randomness in the output set and to control the size of the output, we have a property for each class that gives the range of possible values. For instance, there is a range property for the number of universities, the number of departments, and the number of full professors in a department. The minimum and the maximum can be the same if only one value is desired. For example, a Full Professor might have the range (1,1) for the property hasWebsite. On the other hand, the range of the number of authors for a paper might be set to (1,8) in the field of Computer Science.

Figure 2. Examples of aggregation in ScienceWeb

We are in the process of developing a statistical model of these ranges for the field of Computer Science by obtaining real data for the ranges of the top-level classes in the tree (e.g., universities, departments, researchers) and sampling the various other relationships (e.g., number of projects per faculty, number of conference papers, number of authors). In the meantime we have estimated these ranges according to our own judgment. To control the size of the output (number of OWL triples), the user of the generator fixes the top-level ranges and provides appropriate ranges for the properties that should be randomly selected.
These ranges describe desired values for the arity of these properties, as opposed to the more relaxed ranges that might be encoded into the OWL ontology specification to describe the legal ranges of property arities. This distinction is important to the generation of a representative internal structure for the generated knowledge base. We have developed a spreadsheet that allows the user to experiment with these parameters and provides an estimate of the number of triples that will be generated by the tool. Once a suitable set of parameters has been derived, the parameters can be input to the generator from the spreadsheet through a text file.

There are two types of properties that have ranges associated with them: those that reflect relationships between objects and those that reflect the scalar value of a property. For object-valued properties, the input parameter ranges specify the arity of the relationship. For scalar-valued properties, the range controls the value of that scalar. For example, if the Journal-to-JournalArticle property contains() has a parameter range of (1,20), this indicates that the generator should attempt to provide 1 to 20 articles for each issue of a journal. If the journal property publicationYear() has a range of (1965,2010), then any value in that range might be generated as a property value. In a future version of the generator, we hope to provide a selection of common distributions rather than simply the current selection from a uniform distribution.

The generator program achieves this linking of objects by processing the ontology in a specific order (the classes are set and fixed according to the ScienceWeb ontology, so any changes to the ontology require changes in the program source code; in the future we may want to generalize this for other ontologies in this domain). There is a small handful of aggregation hierarchies rooted at the types ResearchField, PublicationType, Projects, Universities, and Researchers. For example, universities contain departments, which contain faculty (Figure 2). In general, the aggregation hierarchies are used first to guide creation of instances. Other object-valued properties could be used, but arity information on aggregation relationships is often easily estimated. What is essential is that a minimal collection of properties be selected that spans the set of classes in the ontology. The hierarchies are visited in the order listed, and, for each instance, a random number (according to the range values supplied in the generator input parameters) of contained objects is created. For example, for each university object, a random number of departments are created; then within each department a random number of faculty are created. When an instance is created of a class that is a non-leaf in the inheritance hierarchy, parameters are used to control which of its subclasses will be selected as the class for the new instance. The code controlling this is currently specific to each subclass and may be generalized in the future. Properties that do not constitute one of the primary aggregation relationships (including properties whose members span the different aggregation hierarchies) are populated after all aggregation hierarchies have been processed. This can then be done by selection from among the already generated instances of the appropriate type.
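To make this range-driven process concrete, the following is a minimal sketch of how fixed top-level counts, a seeded random generator, and per-property (min, max) ranges combine to walk one aggregation hierarchy. The class names, parameter values, and the Range helper are illustrative assumptions chosen for exposition; they are not the actual UnivGenerator source.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;

    /** Sketch of range-driven generation along the university -> department -> faculty hierarchy. */
    public class GenerationSketch {

        /** A (min, max) range as it might be read from the generator's parameter file. */
        static final class Range {
            final int min, max;
            Range(int min, int max) { this.min = min; this.max = max; }
            int pick(Random rnd) { return min + rnd.nextInt(max - min + 1); }
        }

        public static void main(String[] args) {
            Random rnd = new Random(42L);                       // fixed seed => repeatable samples
            int universities = 2;                               // fixed top-level count
            Range departmentsPerUniversity = new Range(5, 15);  // arity range of an aggregation property
            Range facultyPerDepartment = new Range(10, 40);
            Range publicationYear = new Range(1965, 2010);      // a scalar-valued property range

            List<String> faculty = new ArrayList<>();
            // Walk the aggregation hierarchy, creating a random number of contained
            // instances (within the supplied range) at each level.
            for (int u = 0; u < universities; u++) {
                int deps = departmentsPerUniversity.pick(rnd);
                for (int d = 0; d < deps; d++) {
                    int profs = facultyPerDepartment.pick(rnd);
                    for (int f = 0; f < profs; f++) {
                        faculty.add("univ" + u + "/dep" + d + "/prof" + f);
                    }
                }
            }
            // Properties outside the aggregation hierarchies (e.g., authorOf) would be
            // populated afterwards by random selection among the instances created above.
            System.out.println(faculty.size() + " faculty; example publication year: "
                    + publicationYear.pick(rnd));
        }
    }

A similar parameterized random choice decides, for a non-leaf class such as Faculty, which subclass (e.g., Full Prof. versus Asst. Prof.) a newly created instance receives.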
Again, input parameters to the generator specify the minimum and maximum multiplicity of the assigned property instances (e.g., the minimum and maximum number of co-authors on a journal paper), and a random selection is made within the specified range. For example, each Publication has a number of authors; this number can vary within the specified range. Initially, an empty list of authors for the current publication is created in the PublicationType phase. The actual assignment of authors is done once researchers have been created in the Researcher phase. Some publication types can reference other publication types; the number of such references is again governed by a specified range. After all publication templates have been created, references can be linked (at this time we create references only within the generated universe); the program makes sure that no self-references are generated.

In the University phase, the program creates the specified number of universities; for each university, the program creates a number of departments; and for each department, the program creates a number of researchers. Each such object has random values selected for properties such as ID, name, contact info, and website. Completing this simple generation phase is the generation of researchers: a researcher can be a faculty member (full professor, associate professor, assistant professor, or lecturer) or a student (PhD student or master's student). For each of these objects we set property values such as the number of research interests and the number of publications, and details such as name, contact info, and graduation info.

In the Researcher phase, a list of publications of a specific size is created for each researcher. At this stage, for the current researcher, the program goes through the list of publications, randomly selecting and assigning publications to that researcher. While doing so, there are preconditions the program must satisfy: the selected publication must belong to a research field that has been selected for the current researcher, it must not have been selected already (no duplications), and the selected publication must not exceed its maximum number of authors (since there is a limit on the number of authors that can be assigned to one publication).

IV. EXPERIMENTAL DESIGN

The basic concern addressed in this study is: How does performance scale when answering queries in the context of ScienceWeb, where domain-specific rules are added to native logic?

A. Targeted system

The Oracle Database (11g R2) Semantic Technologies was used in this study [8]. Although Oracle provides both incremental and batch loading mechanisms for semantic data, systems with domain-specific rules are limited to batch loading because inference entailments must be computed before any queries employing those rules can be answered. Oracle provides mechanisms for submitting queries directly in SQL-related notation but also provides APIs via which other systems, such as Jena or Sesame, may submit queries. Inference is done all at once, using forward chaining; triples are inferred and stored ahead of query time, i.e., there is no “on-the-fly” reasoning. This should result in fast query response times. Incremental inference is supported, but only over the various native rule bases provided, such as RDF, RDFS, and OWL. Incremental inference over domain-specific rule bases is not supported [8, 19].
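As a rough illustration of this batch workflow, the sketch below shows how a semantic model, a domain-specific rulebase, and the entailment might be set up through JDBC. It is a sketch under assumptions, not the scripts used in our tests: the connection details and triple table layout are placeholders, while the model name univ, the rulebase name UNIV_RB, and the co-author rule mirror those used later in Section IV.D.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    /**
     * Rough sketch of the batch setup for reasoning with a domain-specific
     * rulebase in Oracle 11g Semantic Technologies. Connection details are
     * placeholders; model/rulebase/index names match those used in the queries.
     */
    public class EntailmentSetupSketch {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:oracle:thin:@//localhost:1521/orcl", "semuser", "password");
                 Statement st = conn.createStatement()) {

                // Register a semantic model over an application table that holds the
                // triples (the table is assumed to exist with an SDO_RDF_TRIPLE_S column).
                st.execute("BEGIN SEM_APIS.CREATE_SEM_MODEL('univ', 'univ_rdf_data', 'triple'); END;");

                // Create the domain-specific rulebase; its rules live as rows of the
                // MDSYS.SEMR_<rulebase> view.
                st.execute("BEGIN SEM_APIS.CREATE_RULEBASE('UNIV_RB'); END;");

                // Register the co-author rule (antecedents and consequent copied from the
                // rule listed in Section IV.D; the notEqual(?x,?y) condition would go into
                // the FILTER column, omitted here for brevity).
                st.execute("INSERT INTO MDSYS.SEMR_UNIV_RB (rule_name, antecedents, filter, consequents, aliases) "
                    + "VALUES ('coauthor_rule', "
                    + "'(?x :authorOf ?p) (?y :authorOf ?p)', NULL, '(?x :coAuthor ?y)', "
                    + "SEM_ALIASES(SEM_ALIAS('', "
                    + "'http://www.owlontologies.com/OntologyUniversityResearchModel.owl#')))");

                // Batch inference: all triples entailed by OWLPRIME plus the custom rulebase
                // are computed and stored before any query is run.
                st.execute("BEGIN SEM_APIS.CREATE_ENTAILMENT('univ_idx', "
                    + "SEM_MODELS('univ'), SEM_RULEBASES('OWLPRIME','UNIV_RB')); END;");
            }
        }
    }

Queries such as those listed in Section IV.D can then be answered directly against the stored entailment.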
When working with domain-specific rules, any change to the contents of the Oracle database, whether a change to the rules, the ontology, or even the semantic data, requires us to redo the whole procedure of loading and computing inference entailments from the beginning.

B. Experiment Settings and Performance Metrics

We conducted a number of tests addressing the scalability and the efficiency of the targeted system. The experiments were done on a PC with a 2.40 GHz Intel Xeon processor and 16 GB of memory, running Windows Server 2008 R2 Enterprise. The performance metrics used are:
Load time: the time spent loading the sample data from the input files into the DB.
Inference time (or reasoning time): the time it takes to reason about the data stored in the database.
Query response time: the time it takes to query the database.

C. The test procedure

For each test sample, we created an RDF_data table, a semantic model and a staging table. Then, we loaded the sample data into the database. After that, entailments were computed and stored. Finally, we submitted selected queries about the sample data. Time was recorded during each stage: loading time, inference time, and query response time.

D. Rules and Queries

Each query in the custom query set we created answers a question that requires reasoning over a specific rule, or sometimes more than one rule. Because the ScienceWeb knowledge base does not yet exist, we cannot claim that these rules and queries exercise the reasoning system in ways that are representative of the eventual system. However, they are designed to exhibit the key kinds of reasoning that we anticipate, including summary statistics over large sets, reasoning over transitive closures, and reasoning over deeply recursive rules. The following are the rules and the queries used in our experiment:

Co-author rule: Authors of a common publication are called “co-authors” of one another.
[(?x :authorOf ?p) (?y :authorOf ?p) notEqual(?x,?y) -> (?x :coAuthor ?y)]

Collaborator Of rule: Researchers are “collaborators” if they are co-authors or if one is the advisor of the other.
[(?x :coAuthor ?y) -> (?x :collaboratorOf ?y)]
[(?x :advisorOf ?y) -> (?x :collaboratorOf ?y)]
[(?x :advisedBy ?y) -> (?x :collaboratorOf ?y)]

Ground Breaking rule: A “ground breaking” researcher in a specific field is an author of a publication in that research field, published prior to 1990, with a Google citation count greater than 1,000.
[(?x :authorOf ?p) (?p :inResearchField ?f) (?p :hasGCCount ?c) (?p :inYear ?n) greaterThan(?c,50000) lessThan(?n,1990) -> (?x :IsGroundBreaking ?f)]

Cool Project rule: A project that is funded with more than $500,000/yr is said to be a “cool” project.
[(?x :hasAmount ?c) (?x rdf:type :Projects) (?x :inResearchField ?f) greaterThan(?c,"500000"^^xsd:int) -> (?x :coolProject ?f)]

Same Advisor rule: Researchers have the “same advisor” if they were advised by the same person but are themselves distinct people.
[(?x :advisedBy ?z) (?y :advisedBy ?z) notEqual(?x,?y) -> (?x :sameAdvisor ?y)]

Colleague Of rule: Two faculty are “colleagues” if they work for the same department but are distinct people.
[(?x :worksFor ?z) (?y :worksFor ?z) (?x rdf:type :Faculty) (?y rdf:type :Faculty) notEqual(?x,?y) -> (?x :colleagueOf ?y)]

Research Ancestor rule (transitive): A researcher is a “research ancestor” of another researcher if the former is a (direct or indirect) advisor of the latter.
[(?a :advisorOf ?b) -> (?a :RAncestorOf ?b)]
[(?a :advisorOf ?c) (?c :RAncestorOf ?b) -> (?a :RAncestorOf ?b)]

Multi-Generational Advisor rule (recursive): A “multi-generational advisor” is a researcher who is the advisor of two different researchers where one of those researchers is an advisor of the other. Any research ancestor of a multi-generational advisor is a multi-generational advisor himself/herself.
[(?z :advisorOf ?y) (?z :advisorOf ?w) (?w :advisorOf ?y) -> (:Multi_generational rdf:type Class) (?z rdf:subClassOf :Multi_generational)]
[(?z :RAncestorOf ?y) (?y rdf:subClassOf :Multi_generational) -> (?z rdf:subClassOf :Multi_generational)]

The queries used for our experiment are all expressed as counts of a set of selected objects: to exercise summary capabilities, to simplify the evaluation, and to eliminate the time for formatting and output of large result sets as a factor in the timing. Again, formal statements can be found in [cite full version].

Query 1: returns and counts the number of triples in the model table.
select count(*) from univ_rdf_data;

Query 2: returns and counts all triples in the model (including the inferred ones).
select count(*) from
  (SELECT s, p, o FROM TABLE(SEM_MATCH('(?s ?p ?o)',
    SEM_Models('univ'), SEM_Rulebases('OWLPRIME','UNIV_RB'),
    SEM_ALIASES(SEM_ALIAS('','http://www.owlontologies.com/OntologyUniversityResearchModel.owl#')),
    null)));

Query 3: returns and counts all co-authors (as a result of executing the co-author rule).
select count(*) from
  (SELECT a as "Author", b as "Co-Author" FROM TABLE(SEM_MATCH('(?a :coAuthorOf ?b)',
    SEM_Models('univ'), SEM_Rulebases('OWLPRIME','UNIV_RB'),
    SEM_ALIASES(SEM_ALIAS('','http://www.owlontologies.com/OntologyUniversityResearchModel.owl#')),
    null)));

Query 4: returns and counts all research ancestors (as a result of executing the research ancestor rule).
select count(*) from
  (SELECT a as "Advisor", b as "Advisee" FROM TABLE(SEM_MATCH('(?a :RAncestorOf ?b)',
    SEM_Models('univ'), SEM_Rulebases('OWLPRIME','UNIV_RB'),
    SEM_ALIASES(SEM_ALIAS('','http://www.owlontologies.com/OntologyUniversityResearchModel.owl#')),
    null)));

Query 5: returns and counts all multi-generational advisors (as a result of executing the multi-generational advisor rule).
select count(*) from
  (SELECT z as "Multi-generational" FROM TABLE(SEM_MATCH('(?z :Multi_generational ?z)',
    SEM_Models('univ'), SEM_Rulebases('OWLPRIME','UNIV_RB'),
    SEM_ALIASES(SEM_ALIAS('','http://www.owlontologies.com/OntologyUniversityResearchModel.owl#')),
    null)));

E. Data Samples

Three tests were conducted. In each test a number of data samples were generated. Each sample was studied alone: first the sample was loaded into the DB, then the entailment was created, and finally queries were run over the data in that sample. Time was recorded during each stage: loading time, inference time, and query response time.

The first test is a direct examination of the effect of sample size upon performance time using a minimal domain-specific rule set. Twenty-four data samples of different sizes were generated, with sizes ranging from 3,500 to 4 million triples. A single domain-specific rule was used in this test, the co-author rule, and only the co-author query was used.

The second test examined the variance in performance, due to random changes in the internal structure of the data set, on samples that were of approximately the same size. Twenty-four samples were generated with the same input parameters but with different seed values each time, resulting in multiple samples of similar size but varying internal connectivity.
Again, a single domain-specific rule was used in this test, the co-author rule, together with its corresponding query, the co-author query.

The third test focused on the effect of adding more elaborate domain-specific rules. Sixteen samples of different sizes were generated. These samples are more complicated than the ones used in the previous tests: all domain-specific rules were used in this test and all queries were executed, and to support this, the number of relations among some classes was increased. In this test, each sample was tested (loaded into the database, entailments generated, and queries applied) six times, with the ordering of tests determined randomly. This procedure was chosen to explore some unexpected variance in observed times, as discussed later.

Oracle bulk load was used to load the sample data into the database. It requires the data to be in N-Triple format, so another tool, RDF2RDF [20], was used for format translation. Loading the data into a semantic model in the database is done in two steps. First the data is loaded from the input file into a staging table using SQL*Loader, a tool from Oracle. After that, the data is loaded from the staging table into the semantic model by calling a specific PL/SQL procedure. The inference is done by executing a “create_entailment” procedure; inferred triples are generated and saved in the DB.

V. RESULTS AND DISCUSSIONS

The results from the first test are presented in Figures 3 and 4. In this test, inferencing was performed over OWL Prime type relationships and a single domain-specific rule. Figure 3 shows faster-than-linear growth, as the sample size increases, in the time required to load the sample into the staging table, the time required to load it from the staging table into the DB, and the time required to reason about the data. A log-log plot of this same data has a linear correlation coefficient of 0.98 and a linear regression slope of 1.5, suggesting that the growth is polynomial and significantly slower than the square of the sample size.

Figure 3. Load/Inference performance (Test 1)
Figure 4. Query performance & caching effect (Test 1)

Since all the reasoning is done at once when the entailment is created, and all the inferred data is saved in the database ahead of time, querying does not take much time. Figure 4 shows the query response times. Note that the total time required is orders of magnitude smaller than the times shown in Figure 3. Again, faster-than-linear growth is observed as the sample size increases. The logarithms of the sample size and the first execution time have a linear correlation of 0.97 and a linear regression slope of 1.75, again indicating a low-level polynomial growth rate. The same query was executed three times over the same sample data to measure the possible effects of caching. The third execution of the query takes less time than the second, which in turn takes less time than the first execution. Any number of executions after the third one (not shown in the figure) results in approximately the same response time as the third one.

Figures 5 and 6 show the results of the second test. In this second test, UnivGenerator was fed different seed values but the same values for the rest of the parameters. Hence, the samples generated have similar sizes and complexities, but random selection results in samples with different internal connectivity. Each of the samples underwent the same testing procedure we conducted in the first test. The time for loading and inference was recorded and compared.

Figure 5. Load/Inference performance (Test 2)
Figure 6. Query performance & caching effect (Test 2)

Working with samples of similar sizes and similar complexities should result in similar performance metrics. This is confirmed in Figure 5, which shows the loading/inference time for these samples. Figure 6 shows that querying over samples of similar sizes and similar complexities also results in similar performance metrics. The figure also shows the caching effect when executing the same query more than once over each sample.

The results of the third test are shown in Figures 7, 8 and 9. With all rules now in the domain-specific rule base, and five queries, the testing procedure was conducted six times per sample. The idea behind processing the same sample six times is to test the system’s consistency over the same data.

Figure 7. Load data from input file into staging table (Test 3)
Figure 8. Load data from staging table to model (Test 3)
Figure 9. Create entailment (Test 3)

When comparing the results for one sample, the time it takes to load the data of that sample into a staging table is almost the same from one run to another. This loading process takes place outside the Oracle DBMS, through the use of an external Oracle tool called SQL*Loader. With an average standard deviation of 0.21 seconds for each of the 16 samples (executed 6 times each), the variance is negligible. When loading the samples in the second step, from the staging table into the semantic model, moderate differences were observed from one run to another, with an average standard deviation of 1.0 seconds.

After creating the entailments for the samples in this test, we noticed significant differences in inference time among the different runs of the same sample and among all samples as well. For example, on the 2nd run of sample 2, the inference took 1,344 seconds, whereas in the 4th run the inference took 180.16 seconds. The standard deviation of the inference times of sample 2 executed 6 times was 507 seconds, quite large compared to the mean of 804 seconds. As the number of triples in a sample data set increases, one would expect an increase in the inference time. However, Figure 9 shows a different behavior. For example, sample 8, of size 291,720 triples, has an average inference time of 2,664 seconds over six runs, whereas sample 9, of size 328,035 triples, has an average inference time of 1,942 seconds. The system was sometimes able to perform the reasoning in less time for a bigger sample size.

We extended our experiment for this test to closely study the variance observed in the inference stage of Test 3. Suspecting possible caching effects, we varied the order in which samples were tested. Suspecting possible interference from other processes running on the server, we performed the experiment multiple times over periods of several days and at many different times of day. Neither of these postulated effects appeared to be a significant contributor to the high variance. Suspecting an only partly understood relationship between the complexity of the domain-specific rule base and the variance, we excluded all domain-specific rules except one, the co-author rule, and conducted the whole testing procedure again on the same samples. Figure 10 shows the performance results of the inference. The system shows consistency when running the same sample multiple times, and, as expected, the bigger the sample, the more time the inference takes.
Hence, the number of domain-specific rules has an impact on this high variance of Oracle when reasoning over sample data. We then selected one sample and, using all custom rules, ran it many times to see how often this behavior occurs. Figure 11 shows the variation in the inference time among the different runs on the same sample. The minimum inference time for sample 3, recorded on the 5th run, was 3 minutes and 17.69 seconds, and the maximum inference time, recorded on the 33rd run, was 20 minutes and 25 seconds. The overall standard deviation equals 332 seconds, a significant fraction of the mean of 816 seconds. The reason for this large, very real variation is an open question.

Figure 10. Create entailment (one rule)
Figure 11. Create entailment on sample 3 with all rules, 35 times

VI. CONCLUSION

We have explored in this paper the feasibility of developing a system to handle qualitative queries, using ontologies and reasoning systems that can handle customized rules. When working with a realistic model (ScienceWeb), a challenging issue of scalability arises as the size of the semantic data increases to millions of triples. Moreover, another serious challenge, real-time response, arises when many customized rules are used.

Oracle offers decent scalability for the kinds of datasets anticipated for ScienceWeb: the time required for critical operations generally grew more slowly than the square of the dataset size, and it offers good response times when querying the semantic data. This fast response comes in part from the heavy pre-computing of the entailed relationships at the beginning. Although this has a positive implication for query response times, it has a negative one for evolving systems: any change in the ontology, the semantic data or the rule set requires setting up the system from the beginning.

Of some concern is the absolute magnitude of the time required for operations. Loading new information and computing the consequent entailments is far too slow for an interactive system, even though that time grows quite slowly. Also of concern is the high degree of variance in the time required to perform inferencing for even a small number of rules, which raises concerns about the consistent responsiveness of systems based upon this platform.

REFERENCES

[1] T. Berners-Lee, J. Hendler, and O. Lassila, “The semantic web,” Scientific American, vol. 284, no. 5, pp. 28-37, 2001.
[2] E. Prud'hommeaux and A. Seaborne, SPARQL Query Language for RDF, W3C Recommendation, World Wide Web Consortium, 15 January 2008.
[3] I. Horrocks, P. Patel-Schneider, H. Boley, S. Tabet, B. Grosof, and M. Dean, “SWRL: A semantic web rule language combining OWL and RuleML,” W3C Member Submission, vol. 21, 2004.
[4] Jena team and Hewlett-Packard, Jena Semantic Web Framework, 2010; http://jena.sourceforge.net/
[5] Pellet: The Open Source OWL2 Reasoner, 2010; http://clarkparsia.com/pellet/
[6] Information Process Engineering (IPE), Institute of Applied Informatics and Formal Description Methods (AIFB), and I. M. G. (IMG), “KAON2 - Ontology Management for the Semantic Web,” 2010; http://kaon2.semanticweb.org/
[7] Ontotext, OWLIM - OWL Semantic Repository, 2010; http://www.ontotext.com/owlim/
[8] Oracle Corporation, Oracle Database 11g R2, 2010; http://www.oracledatabase11g.com
[9] Lehigh University Benchmark (LUBM); http://swat.cse.lehigh.edu/projects/lubm/
[10] Y. Guo, Z. Pan, and J. Heflin, “LUBM: A benchmark for OWL knowledge base systems,” Web Semantics: Science, Services and Agents on the World Wide Web, vol. 3, no. 2-3, pp. 158-182, 2005.
[11] L. Ma, Y. Yang, Z. Qiu, G. Xie, Y. Pan, and S. Liu, “Towards a Complete OWL Ontology Benchmark,” The Semantic Web: Research and Applications, 2006, pp. 125-139.
[12] C. Lee, S. Park, D. Lee, J.-w. Lee, O.-R. Jeong, and S.-g. Lee, “A comparison of ontology reasoning systems using query sequences,” Proc. of the 2nd International Conference on Ubiquitous Information Management and Communication, Suwon, Korea, ACM, 2008.
[13] B. Motik and U. Sattler, “A Comparison of Reasoning Techniques for Querying Large Description Logic ABoxes,” Logic for Programming, Artificial Intelligence, and Reasoning, 2006, pp. 227-241.
[14] T. Gardiner, I. Horrocks, and D. Tsarkov, “Automated benchmarking of description logic reasoners,” Proc. of the Workshop on Description Logics (DL’06), vol. 189 of CEUR, Lake District, UK, 2006, pp. 167-174.
[15] J. Bock, P. Haase, Q. Ji, and R. Volz, “Benchmarking OWL reasoners,” Proc. of the ARea2008 Workshop, vol. 350, Tenerife, Spain, 2008.
[16] E. Franconi, M. Kifer, W. May, T. Weithöner, T. Liebig, M. Luther, S. Böhm, F. von Henke, and O. Noppens, “Real-World Reasoning with OWL,” The Semantic Web: Research and Applications, vol. 4519, Springer Berlin / Heidelberg, 2007, pp. 296-310.
[17] K. Rohloff, M. Dean, I. Emmons, D. Ryder, and J. Sumner, “An evaluation of triple-store technologies for large data stores,” Proc. of the 2007 OTM Confederated International Conference on On the Move to Meaningful Internet Systems, 2007, pp. 1105-1114.
[18] T. Weithöner, T. Liebig, M. Luther, and S. Böhm, “What's Wrong with OWL Benchmarks?,” Proc. of the Second Int. Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS 2006), Athens, GA, USA, 2006, pp. 101-114.
[19] X. Lopez and S. Das, “Semantic Technologies in Oracle Database 11g Release 2: Capabilities, Interfaces, Performance,” sessions at the Semantic Technology Conference, June 2010.
[20] E. Minack, RDF2RDF, July 2010; http://www.l3s.de/~minack/rdf2rdf/
[21] H. Shi, K. Maly, S. Zeil, and M. Zubair, “Comparison of Ontology Reasoning Systems Using Custom Rules,” WIMS 2011, Article No. 16, Sogndal, Norway, May 2011.
[22] H. Shi, “Adaptive Reasoning for Semantic Queries: a White Paper”; http://www.cs.odu.edu/~maly/papers/AdaptiveReasoning.docx