Ontological Framework for Enabling Free-Form Search in Scientific Discovery Chaitali Gupta, Madhusudhan Govindaraju

Grid Computing Research Laboratory
SUNY Binghamton
7/27/2016
E-science Microsoft Workshop 2008: Semantics Birds of a Feather Session
Motivation
• Most computer users today do not have to write programs
  - Likewise, most end users of Grid and scientific data sets should be shielded from low-level details
• Web search engines search billions of web pages
  - They use Natural Language Processing (NLP) and Information Retrieval (IR) technologies
  - They return many links for any given search
• XML-based technology and ontologies can be used to categorize and organize information
  - in a machine-readable and machine-understandable manner
  - to retrieve specific information from Grid/scientific services
Project Vision
• Our vision is that Web semantics can be leveraged to build search-engine-like interfaces even for Grid/scientific application metadata
• Abstract away the fundamental complexity of XML-based service specifications and toolkits
• Add a search box on portal dashboards
• Automatically convert queries to job description specification formats
Related Work
• MDS
  - WSRF-compliant service to publish/retrieve resource information
• Condor ClassAds
  - Combines schema, data, and query in a simple but powerful query specification language
• Condor Gangmatching
  - Overcomes the bilateral matching limitations of ClassAds
Comparing with SPARQL
SPARQL query:
PREFIX dc: <http://example.org/dc/element/1.1/>
PREFIX ns: <http://example.org/ns#>
SELECT ?machineName ?cpu
WHERE {
  ?x ns:cpu ?cpu .
  FILTER (?cpu > 2.0)
  ?x dc:machine-name ?machineName .
}

Query in the proposed framework:
“All machine names with CPU speed greater than 2.0 GHz”
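To make the comparison concrete, the following is a minimal Python sketch of what the SPARQL query computes: a filter on the CPU value joined with the machine name on the shared subject. It is not part of the framework; the triples, the node identifiers, and the function name `machines_faster_than` are hypothetical and exist only for illustration.

```python
# Hypothetical in-memory triples. The property names mirror the slide's
# ns:cpu and dc:machine-name; the machine data is made up for illustration.
TRIPLES = [
    ("node1", "ns:cpu", 2.4),
    ("node1", "dc:machine-name", "alpha"),
    ("node2", "ns:cpu", 1.8),
    ("node2", "dc:machine-name", "beta"),
    ("node3", "ns:cpu", 3.0),
    ("node3", "dc:machine-name", "gamma"),
]


def machines_faster_than(triples, min_ghz):
    """Return sorted (machine name, cpu) pairs whose cpu value exceeds min_ghz."""
    # Corresponds to: ?x ns:cpu ?cpu . FILTER (?cpu > min_ghz)
    cpu_by_subject = {
        s: o for s, p, o in triples if p == "ns:cpu" and o > min_ghz
    }
    # Corresponds to: ?x dc:machine-name ?machineName . (join on subject ?x)
    return sorted(
        (o, cpu_by_subject[s])
        for s, p, o in triples
        if p == "dc:machine-name" and s in cpu_by_subject
    )


RESULT = machines_faster_than(TRIPLES, 2.0)
print(RESULT)  # [('alpha', 2.4), ('gamma', 3.0)]
```

The free-form query hides both the join and the filter syntax from the user, which is the point of the comparison.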
Scope of Free-Form Queries
• The problem of processing and acting upon arbitrary English is extremely challenging
  - actively addressed in the AI community
• Use many techniques from NLP and the Semantic Web
• The scope of our work is therefore limited
  - cannot accept arbitrary free-form queries
  - designed to accept a limited form of English with a vocabulary taken from the ontology
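One way to picture the restricted scope is a vocabulary check that rejects queries containing words outside the ontology. The sketch below is only an illustration under stated assumptions: the word lists and the function names `unknown_terms` and `is_in_scope` are hypothetical, and a real vocabulary would be read from the ontology rather than hard-coded.

```python
import re

# Hypothetical word lists. In the framework the vocabulary would come from
# the ontology; these sets are stand-ins for illustration only.
ONTOLOGY_VOCABULARY = {
    "machine", "names", "cpu", "speed", "sites", "nodes",
    "memory", "job", "status", "nysgrid",
}
STOP_WORDS = {"all", "with", "of", "the", "in", "on", "at", "list"}
COMPARISON_WORDS = {"greater", "less", "than", "least"}
UNITS = {"ghz", "gb"}

ALLOWED = ONTOLOGY_VOCABULARY | STOP_WORDS | COMPARISON_WORDS | UNITS


def unknown_terms(query):
    """Return the alphabetic tokens of the query that are not allowed."""
    tokens = re.findall(r"[a-z]+", query.lower())
    return [t for t in tokens if t not in ALLOWED]


def is_in_scope(query):
    """A query is in scope when every word is known to the system."""
    return not unknown_terms(query)


print(is_in_scope("All machine names with CPU speed greater than 2.0 GHz"))
print(unknown_terms("What is the weather in Binghamton"))
```

Reporting the unknown terms, rather than only rejecting the query, would let an interface tell the user which words fall outside the supported vocabulary.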
Example queries for New York State Grid (NYSGrid)
• List all sites of NYSGrid
• All sites of NYSGrid with Xeon processors
• Processor configuration of nodes at Binghamton site of NYSGrid
• All machine names in NYSGrid with CPU speed greater than 2.0 GHz
• Status of job ID 117 running on NYSGrid
• Names of 16 free nodes on the NYSGrid with at least 4 GB of memory
• List all nodes of NYSGrid having CPU speed greater than 1 GHz and less than 4 GHz
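Several of these queries carry numeric constraints ("greater than", "less than", "at least"). As a rough sketch of how such constraints could be pulled out of the text, the following uses a regular expression. The pattern, the operator table, and the function name `extract_constraints` are assumptions made for illustration and are not taken from the framework.

```python
import re

# Hypothetical pattern: a comparison phrase, a number, and a unit.
CONSTRAINT_PATTERN = re.compile(
    r"(greater than|less than|at least)\s*(\d+(?:\.\d+)?)\s*(ghz|gb)",
    re.IGNORECASE,
)

# Hypothetical mapping from comparison phrases to operators.
OPERATORS = {"greater than": ">", "less than": "<", "at least": ">="}


def extract_constraints(query):
    """Return (operator, value, unit) tuples found in the query text."""
    return [
        (OPERATORS[phrase.lower()], float(value), unit.lower())
        for phrase, value, unit in CONSTRAINT_PATTERN.findall(query)
    ]


print(extract_constraints(
    "List all nodes of NYSGrid having CPU speed greater than 1 GHz and less than 4 GHz"
))
print(extract_constraints(
    "Names of 16 free nodes on the NYSGrid with at least 4 GB of memory"
))
```

The count "16" in the second query is deliberately not captured, since it is not preceded by a comparison phrase; handling result limits would need a separate rule.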
Example ontology model
System Components
• WSDL Processor
• User Query Interface
• Query Processor
• Match Processor
  - Ontology Matcher
  - Dictionary Matcher
    - direct, stripped matching, hypernyms, hyponyms
  - Lexicon
    - how people use words etc.
  - Relevance Checker
    - Glossary, input and output parameters of the Web service
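The matching strategies listed for the Dictionary Matcher (direct, stripped, hypernym) can be pictured as a cascade that tries the cheapest match first. The sketch below is a toy illustration, not the framework's implementation: the concept set, the `BROADER_TERMS` table standing in for a lexical resource, and the function names are all hypothetical, and hyponym matching is left out.

```python
# Toy stand-ins for the ontology concepts and for a lexical resource.
ONTOLOGY_CONCEPTS = {"cpu", "memory", "storage", "job", "node", "site"}
BROADER_TERMS = {"xeon": "cpu", "ram": "memory", "ssd": "disk", "disk": "storage"}


def stripped_forms(term):
    """Candidate forms with a simple plural ending removed."""
    forms = []
    if term.endswith("es"):
        forms.append(term[:-2])
    if term.endswith("s"):
        forms.append(term[:-1])
    return forms


def match_term(term):
    """Return (concept, strategy) using direct, stripped, then hypernym matching."""
    term = term.lower()
    if term in ONTOLOGY_CONCEPTS:
        return term, "direct"
    for form in stripped_forms(term):
        if form in ONTOLOGY_CONCEPTS:
            return form, "stripped"
    # Walk up the broader-term chain until a known concept is reached.
    current, seen = term, set()
    while current in BROADER_TERMS and current not in seen:
        seen.add(current)
        current = BROADER_TERMS[current]
        if current in ONTOLOGY_CONCEPTS:
            return current, "hypernym"
    return None, "unmatched"


for word in ("CPU", "nodes", "SSD", "weather"):
    print(word, match_term(word))
```

Ordering the strategies from cheapest to most expensive means a term that already appears in the ontology never triggers a dictionary lookup.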
Example query that lights up the model
• The Ontology Matcher retrieves the ontologies from the ontology repository and matches them with the user query.
• Ontologies are built in OWL for storing the vocabularies
  - concepts include “CPU”, “memory”, “storage”, “job”, etc.
  - Jena is used to process OWL models/statements of the form <subject, predicate, object>
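The idea of matching query words against ontology statements can be sketched in a few lines. The example below is a Python stand-in, not Jena code (Jena is a Java library), and the statements, predicate names, and function names are hypothetical. It only shows how a query token can be looked up among <subject, predicate, object> statements.

```python
from collections import namedtuple

# A statement in <subject, predicate, object> order.
Statement = namedtuple("Statement", ["subject", "predicate", "object"])

# Hypothetical statements; a real model would be loaded from an OWL file.
STATEMENTS = [
    Statement("CPU", "rdfs:subClassOf", "Resource"),
    Statement("memory", "rdfs:subClassOf", "Resource"),
    Statement("storage", "rdfs:subClassOf", "Resource"),
    Statement("job", "runsOn", "node"),
    Statement("node", "hasResource", "CPU"),
]


def statements_about(concept, statements):
    """Return statements whose subject or object names the concept."""
    wanted = concept.lower()
    return [
        st for st in statements
        if wanted in (st.subject.lower(), st.object.lower())
    ]


def match_query(query, statements):
    """Map each query token to the statements that mention it, if any."""
    hits = {}
    for token in query.lower().split():
        found = statements_about(token, statements)
        if found:
            hits[token] = found
    return hits


print(match_query("Status of job ID 117", STATEMENTS))
```

Here only the token "job" matches a statement; the remaining tokens would fall through to the other matchers or be treated as values.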
System Components
[Figure: execution time (in milliseconds, axis 0 to 600) versus length of client query excluding stop words (5 to 16). Series: execution time with Ontology and Dictionary Matcher; execution time with Ontology Matcher.]
• Queries resolved by the Ontology Matcher alone show, on average, a 95% to 96% performance improvement over those requiring both the Ontology and Dictionary Matchers.
Performance of System Components
[Figure: execution time (in milliseconds, axis 0 to 350) for the Query Processor, WSDL Processor, Dictionary Matcher, Ontology Matcher, and Lexicon.]
• Execution time taken by the major components
System Components
[Figure: precision and recall (axis 65.00% to 87.50%) for domain-dependent ontologies versus domain-independent methodologies.]
• Recall and precision increase when domain-dependent ontologies are used.
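For reference, precision and recall can be computed from the set of results a query returns and the set that is actually relevant. The sketch below assumes the standard information-retrieval definitions, which the slides do not spell out; the function name and the example sets are hypothetical and are not the measured data.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved set against a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    # Precision: fraction of retrieved items that are relevant.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    # Recall: fraction of relevant items that were retrieved.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up example: 4 services returned, 5 relevant, 3 in common.
PRECISION, RECALL = precision_recall(
    {"a", "b", "c", "d"}, {"a", "b", "c", "e", "f"}
)
print(PRECISION, RECALL)  # 0.75 0.6
```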
Research Challenges
• Design algorithms to automatically infer the context of user queries and map them to an appropriate set of Grid and scientific services.
• Automatically extend and update domain knowledge using Semantic Web techniques and WordNet.
  - Build a feedback loop for cases that don’t work
• Enable construction of simple workflows
  - multiple Grid services may be needed for a query
  - merging results from different services