Ontological Framework for Enabling Free-Form Search in Scientific Discovery
Chaitali Gupta, Madhusudhan Govindaraju
Grid Computing Research Laboratory, SUNY Binghamton
7/27/2016, E-science Microsoft Workshop 2008: Semantics Birds of a Feather Session

Motivation
- Most computer users today do not have to write programs
  - most end users of Grid and scientific data sets should be shielded from low-level details
- Web search engines
  - search billions of web pages
  - use Natural Language Processing (NLP) and Information Retrieval (IR) technologies
  - return many links for any given search
- XML-based technology and ontologies can be used to
  - categorize and organize information in a machine-readable and understandable manner
  - retrieve specific information from Grid/scientific services

Project Vision
- Our vision is that Web semantics can be leveraged to build search-engine-like interfaces even for Grid/scientific application metadata.
- Abstract away the fundamental complexity of XML-based service specifications and toolkits
- Add a search box on portal dashboards
- Automatically convert queries to job description specification formats

Related Work
- MDS: WSRF-compliant service to publish/retrieve resource information
- Condor ClassAds: combines schema, data, and query in a simple but powerful query specification language
- Condor Gangmatching: overcomes the bilateral matching limitations of ClassAds

Comparing with SPARQL
SPARQL Query versus Query in the Proposed Framework:

  PREFIX dc: <http://example.org/dc/element/1.1/>
  PREFIX ns: <http://example.org/ns#>
  SELECT ?machine_name ?cpu
  WHERE {
    ?x ns:cpu ?cpu .
    FILTER (?cpu > 2.0) .
    ?x dc:machine-name ?machine_name .
  }

“All machine names with CPU speed greater than 2.0 GHz”

Scope of Free-Form Queries
- Processing and acting upon arbitrary English is an extremely challenging problem
  - actively addressed in the AI community
  - uses many techniques from NLP and the Semantic Web
- The scope of our work is therefore limited
  - cannot accept any free-form query
  - designed to accept a limited form of English with a vocabulary taken from the ontology

Example queries for New York State Grid (NYSGrid)
- List all sites of NYSGrid
- All sites of NYSGrid with Xeon processors
- Processor configuration of nodes at Binghamton site of NYSGrid
- All machine names in NYSGrid with CPU speed greater than 2.0 GHz
- Status of job ID 117 running on NYSGrid
- Names of 16 free nodes on the NYSGrid with at least 4 GB of memory
- List all nodes of NYSGrid having CPU speed greater than 1 GHz and less than 4 GHz

Example ontology model
[Figure: example ontology model]

System Components
- WSDL Processor
- User Query Interface
- Query Processor
- Match Processor
- Ontology Matcher
- Dictionary Matcher: direct, stripped matching, hypernyms, hyponyms
- Lexicon: how people use words, etc.
- Relevance Checker: glossary, input and output parameters of the Web service

Example query that lights up the model
- The Ontology Matcher retrieves the ontologies from the ontology repository and matches them with the user query.
- Ontologies are built in OWL for storing the vocabularies
  - concepts include “CPU”, “memory”, “storage”, “job”, etc.
- Jena is used to process OWL models/statements <subject, predicate, object>

System Components
[Figure: execution time (in milliseconds) versus length of client query excluding stop words (5 to 16). Series: execution time with Ontology and Dictionary Matcher; execution time with Ontology Matcher.]
- Queries that hit only the Ontology Matcher perform on average 95% to 96% better than those requiring both the Ontology and Dictionary Matchers.

Performance of System Components
[Figure: execution time (in milliseconds) taken by the major components: Query Processor, WSDL Processor, Dictionary Matcher, Ontology Matcher, Lexicon.]

System Components
[Figure: precision and recall (axis range 65.00% to 87.50%) for domain-dependent ontologies versus domain-independent methodologies.]
- Recall and precision increase when domain-dependent ontologies are considered.

Research Challenges
- Design algorithms to automatically infer the context of user queries and map them to an appropriate set of Grid and scientific services.
- Automatically extend and update domain knowledge using Semantic Web techniques and WordNet.
- Build a feedback loop for cases that don’t work
- Enable construction of simple workflows
  - multiple Grid services may be needed for a query
  - merging results from different services
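
Illustrative sketch (not the authors' implementation)
The matching flow described under System Components can be sketched roughly as follows: try each query word against the ontology vocabulary first, and only fall back to a dictionary lookup when that fails. Everything here is an assumption made for illustration: the vocabulary and synonym tables, the stop-word list, and the function names (match_query, extract_constraints) are hypothetical, the flat synonym map only stands in for hypernym/hyponym lookup, and the real system processes OWL models with Jena rather than Python dictionaries.

```python
import re

# Hypothetical ontology vocabulary: surface term -> concept name.
ONTOLOGY_TERMS = {"cpu": "CPU", "memory": "memory", "storage": "storage", "job": "job"}

# Hypothetical synonym table standing in for a dictionary lookup.
DICTIONARY_SYNONYMS = {"processor": "cpu", "ram": "memory", "disk": "storage", "task": "job"}

# Hypothetical stop-word list; the slides do not say which words are dropped.
STOP_WORDS = {"all", "of", "with", "the", "on", "in", "at", "a", "an", "list", "than", "and"}

COMPARATORS = {"greater than": ">", "less than": "<", "at least": ">="}
CONSTRAINT_RE = re.compile(r"(greater than|less than|at least)\s+(\d+(?:\.\d+)?)\s*([a-z]+)")


def tokenize(query):
    """Lowercase the query, keep alphabetic words, drop stop words."""
    return [w for w in re.findall(r"[a-z]+", query.lower()) if w not in STOP_WORDS]


def strip_suffix(word):
    """Very rough 'stripped matching': remove a trailing plural 's'."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def match_term(word):
    """Return (concept, matcher) for a word, or None if nothing matches."""
    candidates = (word, strip_suffix(word))
    # First pass: direct or stripped match against the ontology vocabulary.
    for candidate in candidates:
        if candidate in ONTOLOGY_TERMS:
            return ONTOLOGY_TERMS[candidate], "ontology"
    # Second pass: fall back to the dictionary, then map back to a concept.
    for candidate in candidates:
        base = DICTIONARY_SYNONYMS.get(candidate)
        if base is not None:
            return ONTOLOGY_TERMS[base], "dictionary"
    return None


def match_query(query):
    """Map each recognized concept to the matcher that found it first."""
    matches = {}
    for word in tokenize(query):
        hit = match_term(word)
        if hit is not None:
            concept, matcher = hit
            matches.setdefault(concept, matcher)
    return matches


def extract_constraints(query):
    """Pull numeric comparisons such as 'greater than 2.0 GHz' out of a query."""
    return [(COMPARATORS[phrase], float(value), unit)
            for phrase, value, unit in CONSTRAINT_RE.findall(query.lower())]


if __name__ == "__main__":
    q = "List all nodes of NYSGrid having CPU speed greater than 1Ghz and less than 4 Ghz"
    print(match_query(q))
    print(extract_constraints(q))
```

In this sketch a query containing “CPU” is resolved by the ontology pass alone, while one containing “processors” needs the dictionary fallback, which mirrors the two cases compared in the execution-time figure. The extracted comparisons are the kind of structured constraint that could later be written into a job description or resource query.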