Implementing Ontologies in (my)Grid environments Professor Carole Goble, Chris Wroe

Implementing Ontologies in
(my)Grid environments
Professor Carole Goble, Chris Wroe
University of Manchester, UK
myGrid project
Geodise project
http://www.mygrid.org.uk
http://www.geodise.org
GGF Sem-Grd co-chair
http://www.semanticgrid.org
EIC Journal of Web Semantics http://www.jws.ac.uk
Semantic Web Science Association member
Take home message
• The Grid is metadata driven middleware
– Schemas and ontologies are prevalent and
pervasive for carrying semantics
• Information finding, integration and exchange is a
core component of Grid computing
• Semantics for applications of the Grid
• Semantics for the infrastructure of the Grid
• The Grid as a mechanism for delivering ontology and
schema services?
• Context: myGrid, Geodise and GRIP
The Grid Problem
(Foster, Kesselman, Tuecke)
“Flexible, secure, coordinated resource sharing among
dynamic collections of individuals, institutions, and
resources - what we refer to as virtual organizations.”
• A low level framework to allow inter-operation of resources
• Mainly for the benefit of application developers
• Deploy standard tasks on the Grid in a straightforward manner
Challenging Technical Requirements
• Dynamic formation and management of virtual
organizations
• Online negotiation of access to services: who, what,
why, when, how
• Configuration of applications and systems able to
deliver multiple qualities of service
• Autonomic management of distributed infrastructures,
services, and applications
• Management of distributed state as a fundamental
issue
Interacting with Grid middleware services
• Empower the user or a process to discover and
orchestrate Grid enabled resources as required.
• Means cataloguing and indexing available resources
using agreed vocabularies.
– As in digital libraries
• Many tasks involve the communication of information
between sets of Grid services to perform a more
complex overall goal.
– Requires the adoption of standard schemas and
semantics for data interchange between Grid
services or a mechanism to map between
schemas.
myGrid
• EPSRC UK e-Science pilot project
• Open Source Upper Middleware for Bioinformatics
• Data intensive not compute intensive
• Sharing knowledge and sharing components
myGrid: Integration of Life Science Information
ID   MURA_BACSU     STANDARD;      PRT;   429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE   ENOLPYRUVYL TRANSFERASE) (EPT).
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
OC   BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE;
OC   BACILLUS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
FT   ACT_SITE    116    116       BINDS PEP (BY SIMILARITY).
FT   CONFLICT    374    374       S -> A (IN REF. 3).
SQ   SEQUENCE   429 AA;  46016 MW;  02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
     GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP
     RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT
     IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI
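To illustrate why integrating such records takes work, here is a minimal sketch that groups the lines of a SWISS-PROT style flat file by their two-letter line code. The function name `parse_flat_record` and the shortened sample are illustrative assumptions, not part of myGrid.

```python
from collections import defaultdict

# Abbreviated copy of the record above (sequence shortened for the example).
SAMPLE = """\
ID   MURA_BACSU     STANDARD;      PRT;   429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE   ENOLPYRUVYL TRANSFERASE) (EPT).
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
SQ   SEQUENCE   429 AA;  46016 MW;  02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP
"""

def parse_flat_record(text):
    """Group line contents by two-letter line code; sequence lines go under 'SEQ'."""
    fields = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        code, content = line[:2], line[5:].strip()
        if code.strip() == "":
            # Sequence lines carry no line code, only leading spaces.
            fields["SEQ"].append(content.replace(" ", ""))
        else:
            fields[code].append(content)
    return dict(fields)

record = parse_flat_record(SAMPLE)
entry_name = record["ID"][0].split()[0]
keywords = [k.strip() for k in record["KW"][0].rstrip(".").split(";")]
sequence = "".join(record["SEQ"])
print(entry_name, keywords, len(sequence))
```

Even after parsing, fields such as `KW` remain free text. Agreed vocabularies are still needed before two resources can be reliably linked.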
Experiment life cycle
• Forming experiments
– Resource & service discovery
– Repository creation
– Workflow creation
– Database query formation
• Personalisation
– Personalised registries
– Personalised workflows
– Info repository views
– Personalised annotations
– Personalised metadata
– Security
• Discovering and reusing experiments and resources
– Workflow discovery & refinement
– Resource & service discovery
– Repository creation
– Provenance
• Providing services & experiments
– Service registration
– Workflow deposition
– Metadata annotation
– Third party registration
• Managing experiments
– Information repository
– Metadata management
– Provenance management
– Workflow evolution
– Event notification
• Executing experiments
– Workflow enactment
– Distributed query processing
– Job execution
– Provenance generation
– Single sign-on authorisation
– Event notification
myGrid in a nutshell
• An example of a “second generation” open service-based Grid project, specifically a testbed for the OGSI, OGSA and OGSA-DAI base services;
– myGrid Information Repository that is OGSA-DAI compliant
• Developing high level services for data intensive integration, rather than computationally intensive problems;
– Workflow & distributed query processing
• Developing high level services for e-Science experimental management;
– Provenance, change notification and personalisation
• Developing Semantic Grid capabilities and knowledge-based technologies, such as semantic-based resource discovery and matching.
– Metadata descriptions and ontologies for service discovery, component discovery and linking components.
myGrid
[Figure: myGrid architecture diagram. Labels: UTOPIA; Third party applications; Lab Workbench application; Web Portal; Gateway; Resource annotations; Ontology Services; Shared metadata and data repositories (mIR); Semantic-based Services; Inference engines; Service & resource registration & discovery; Personalisation; e-Science Services; Literature; Provenance; Change & event notification; SoapLab; Databases; Distributed Query Processing; Analytical Tools; Workflow; Integration Services]
Service based architecture
• Find them: publication, registration, discovery, matchmaking, deregistration.
• Run them: execution, monitoring, exception handling.
• Organise them: interoperation, composition, substitution.
• Each bio resource is a service
– Database, archive, analysis, tool, person, instrument, a workflow …
• Each myGrid architectural component is a service
– Workflow enactment engine, event notification service, registry, scheduler…
• Services come and go
• Services are not owned by the user
• Services have different levels and kinds of metadata
Realizing a Service-Oriented Architecture
How Do I
• Create, name, manage, discover services?
• Render resources, data, sensors as services?
• Negotiate service level agreements?
• Express & negotiate policy?
• Organize & manage service collections?
• Establish identity, negotiate authentication?
• Manage VO membership & communication?
• Compose services efficiently?
• Achieve interoperability?
Metadata: agreed and shared schemas and vocabularies
(Foster, Argonne Labs, 2003)
Roles of a service ontology
• Discovery of an appropriate Web Service within a registry by its properties and capabilities;
• Invocation by some agent/service;
• Interoperability is increased by describing the semantic type of inputs and outputs;
• Composition of new services;
• Verification of a service’s properties;
• Execution monitoring by tracking what is happening to the described aspects of a service and its sub-services.
Services: Soaplab & EMBOSS
Workflows and Services
• Workflow specification
– Finding classes of services
– Blastn compares a nucleotide query sequence against a nucleotide sequence database (usually – intelligent misuse of services…)
– Guiding service composition
– Service A outputs compatible with Service B inputs
• Dynamic workflow service invocation and service discovery
– Choose service instances when running workflow
• Workflow discovery
– Finding workflows that others have done, and that I have done myself
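The “Service A outputs compatible with Service B inputs” check can be sketched as a walk over a small hierarchy of semantic types. This is only an illustration: the type names, the `SERVICES` table and the `check_workflow` helper are all hypothetical, and a real system would delegate the subtype test to an ontology reasoner.

```python
# Hypothetical semantic type hierarchy: child -> parent.
TYPE_PARENT = {
    "nucleotide_sequence": "sequence",
    "protein_sequence": "sequence",
    "sequence": "data",
    "alignment_report": "data",
}

def is_subtype(candidate, expected):
    """True if candidate equals expected or is one of its descendants."""
    while candidate is not None:
        if candidate == expected:
            return True
        candidate = TYPE_PARENT.get(candidate)
    return False

# Hypothetical service descriptions: name -> (input type, output type).
SERVICES = {
    "fetch_nucleotide": ("accession", "nucleotide_sequence"),
    "blastn": ("nucleotide_sequence", "alignment_report"),
    "blastp": ("protein_sequence", "alignment_report"),
}

def check_workflow(steps):
    """Return the (producer, consumer) pairs whose types do not fit together."""
    problems = []
    for producer, consumer in zip(steps, steps[1:]):
        produced = SERVICES[producer][1]
        expected = SERVICES[consumer][0]
        if not is_subtype(produced, expected):
            problems.append((producer, consumer))
    return problems

ok = check_workflow(["fetch_nucleotide", "blastn"])
bad = check_workflow(["fetch_nucleotide", "blastp"])
print(ok, bad)
```

The second workflow is flagged because a nucleotide sequence is not a kind of protein sequence. A workflow editor could use this sort of feedback to guide composition.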
[Figure: myGrid component overview. Labels: Bioinformaticians; Tool Providers; Service providers; Exemplars; Graves Disease; Lab Book; Workflow Editor; Talisman; Generic Applications; LSID/R; Gateway; Portal; Core components: Service Registration & Discovery, Personalisation, Knowledge Mgt, Provenance, Metadata Mgt, Notification, Workflow enactment, Information Repository, Distributed Query Processing; Soaplab; Communication fabric; Bio Services; Text Extraction Services]
Architecture
[Figure: myGrid architecture diagram. Labels: Knowledge Services; Ontology Server; Reasoner; Matcher; KB Store; Service Registry; Semantic registration; Structural registration; UDDI; RDF-based UDDI; Registry View; Service Discovery; Notification Service; JMS; Test Data; Discover Workflow or Service; Workflow enactment engine; Skufl & WSFL; Provenance service; mG Object Discovery; mIR (myGrid Info Repository): Workflow templates, Workflow instances, Metadata, Concepts, Provenance Data; DB2; Distributed Query Processor; Job Execution; Information Extraction; PESTO; SoapLab; Service]
Service Discovery Requirements
• descriptions must be attached to different resources
– services and workflows
• descriptions may be in different myGrid services
– registries and myGrid Information Repositories
• publication of descriptions must be supported both for the author of the service and third parties;
• third party annotations are a view of a service and discovery should offer a variety of views based upon third party annotations;
• there is a need for control over who may add and alter third party annotations;
• we must support two types of discovery:
– using cross-domain knowledge independent of application
• Quality of service, ownership, location, organisations …
– requiring access to common application domain ontologies
• Biology and bioinformatics
Discovering services
based on their application domain properties
• Finding a service that will fulfil some task, e.g. aligning of biological sequences.
– What services perform a specific kind of task → what services can I use to perform a biological sequence similarity search?
• Finding a service that will accept or produce some kind of data.
– What services produce this kind of data → from where can I find sequence data for a protein?
– What services consume this kind of data → if I have protein sequence data, what can I do with it?
• Class of service:
– a protein sequence alignment, a protein sequence database
• Specific example of an abstract service:
– BLAST, BLASTn, SWISS-PROT
Applies to class of services and workflow specifications
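The three kinds of question above (by task, by data produced, by data consumed) can be sketched as property matching over service descriptions. The `ServiceDescription` class, the registry entries and the property values below are hypothetical and only illustrate the shape of the query. They are not the myGrid registry API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ServiceDescription:
    name: str
    task: str
    consumes: str
    produces: str

# Hypothetical registry entries; the property values are illustrative only.
REGISTRY = [
    ServiceDescription("BLASTp", "sequence_similarity_search",
                       "protein_sequence", "alignment_report"),
    ServiceDescription("BLASTn", "sequence_similarity_search",
                       "nucleotide_sequence", "alignment_report"),
    ServiceDescription("SWISS-PROT", "database_retrieval",
                       "accession", "protein_sequence"),
]

def find(registry, **wanted):
    """Return names of services whose description matches every given property."""
    return [s.name for s in registry
            if all(getattr(s, prop) == value for prop, value in wanted.items())]

by_task = find(REGISTRY, task="sequence_similarity_search")
producers = find(REGISTRY, produces="protein_sequence")
consumers = find(REGISTRY, consumes="protein_sequence")
print(by_task, producers, consumers)
```

Exact string matching is the weakest form of this query. It finds nothing when the request is phrased at a different level of abstraction from the description.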
Representing and using domain metadata
• Classification of services/workflows
• Imprecise (best effort) substitutions of services/workflows
• Service/workflow organisation & indexing, matching and substitution
– “BLAST” finds tblastx, tblastn, psi-blast, marks_super_blast.
– “Alignment” finds ClustalW, Blast, Smith-Waterman, Needleman-Wunsch
• Expanded selection of services based on expansion of in-hand object
• A vocabulary for expressing service descriptions without predetermining every description
• A reasoning process to manage:
– coherency of the classifications and the descriptions when they are created,
– the service discovery, matching and composition when they are deployed.
• Ontologies in DAML+OIL/OWL based on the DAML-S ontology
• W3C OWL Web Ontology Language 1.0 Reference
– http://www.w3.org/TR/owl-ref/
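The way a query for “BLAST” or “Alignment” expands to more specific services can be sketched as a walk down a classification. The fragment below is a hypothetical hand-written hierarchy using names from the slide. In myGrid the hierarchy would come from the ontology and a reasoner, not from a hard-coded table.

```python
# Hypothetical fragment of a service classification: class -> direct subclasses.
SUBCLASSES = {
    "Alignment": ["ClustalW", "BLAST", "Smith-Waterman", "Needleman-Wunsch"],
    "BLAST": ["tblastx", "tblastn", "psi-blast"],
}

def descendants(cls):
    """All classes below cls, found by following subclass links depth first."""
    found = []
    for child in SUBCLASSES.get(cls, []):
        found.append(child)
        found.extend(descendants(child))
    return found

blast_hits = descendants("BLAST")
alignment_hits = descendants("Alignment")
print(blast_hits)
print(alignment_hits)
```

Asking for the more general class returns everything beneath it, which is what makes best-effort substitution possible.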
myGrid service classification
Taxonomic approach
Classification:
• By operation?
• By data source?
• By algorithm?
[Figure: service taxonomy. Nodes: Sequence alignment; Pairwise; Multiple; SmithWaterman; BLAST; BLASTn; BLASTp; tBLASTn]
Annotations on the figure:
• Protein pairwise alignment: missing
• BLASTn: implicit, over nucleotide sequences
• BLASTp: implicit, over protein sequences
• Alignment applies to sequences, not pathways, and needs at least 2 inputs
Originally Based on DAML-S
• US DARPA Agent Markup Language – Services
http://www.daml.org
• An Upper Ontology for Services
– Resource provides Service
– Service presents Service profile: what it does (description, functionalities, functional attributes)
– Service describedBy Service model: how it works
– Service supports Service grounding: how to access it
Why pick a Description Logic style language?
• Descriptive capability
– Directly and richly describe the properties of a service
– Compositional – conceptual Lego
– Post-coordinated composition, e.g. “SNPolymorphism of CFTRGene causing Defect in MembraneTransport of ChlorideIon causing Increase in Viscosity of Mucus in CysticFibrosis…” or “Hand which is anatomically normal”
• Non-predetermined descriptions are classified too.
• Classification and reasoning based on the properties
– Define the properties and the structure takes care of itself
– The inverse of OO style ontology building.
– If the properties change so does the classification
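“Define the properties and the structure takes care of itself” can be illustrated with a deliberately simplified model in which each class is just a set of required property values, and subsumption is set inclusion. This is a toy stand-in for a Description Logic reasoner, not how DAML+OIL or OWL reasoning is implemented. All class and property names are hypothetical.

```python
# Each class is defined only by the properties its members must have.
DEFINITIONS = {
    "Alignment": {("performs_task", "aligning")},
    "ProteinAlignment": {("performs_task", "aligning"),
                         ("input", "protein_sequence")},
    "PairwiseProteinAlignment": {("performs_task", "aligning"),
                                 ("input", "protein_sequence"),
                                 ("arity", "pairwise")},
}

def subsumes(general, specific):
    """general subsumes specific if all of general's constraints hold for specific."""
    return DEFINITIONS[general] <= DEFINITIONS[specific]

def classify(description):
    """Return the defined classes whose constraints the description satisfies."""
    return sorted(name for name, constraints in DEFINITIONS.items()
                  if constraints <= description)

# A description nobody predefined is still placed in the hierarchy.
new_service = {("performs_task", "aligning"),
               ("input", "protein_sequence"),
               ("provider", "EBI")}
placed_under = classify(new_service)
print(subsumes("Alignment", "PairwiseProteinAlignment"), placed_under)
```

No subclass links are asserted anywhere. The hierarchy follows from the definitions, so changing a property changes the classification.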
Reasoning in DAML+OIL / OWL
• Consistency
– check if knowledge is meaningful
• Subsumption
– structure knowledge, compute classification
• Equivalence
– check if two classes denote same set of instances
• Instantiation
– check if individual i is an instance of class C
• Retrieval
– retrieve set of individuals that instantiate C
Suite
[Figure: suite of ontologies. Nodes: Upper level ontology; Task ontology; Informatics ontology; Web service ontology; Molecular biology ontology; Publishing ontology; Organisation ontology; Bioinformatics ontology]
• Specialises: all concepts are subclassed from those in the more general ontology.
• Contributes concepts to form definitions.
• Properties shown: parameters (input, output, precondition, effect); performs_task; uses-resource; is_function_of
Discovering services
based on their operational properties
• What resources does a specific organisation provide?
• Who authored this resource?
Third party metadata
• What services offering x currently give the best quality of service?
• Which service would the local bioinformatics expert suggest we use?
• Data quality, quality of service, cost, geographical location, authorisation, provenance of data and so on.
• Instance service description of a specific service
– BLAST, SWISS-PROT as offered by the EBI is 80% reliable.
• Invoked instance service description
– BLAST as offered by the EBI on a particular date, with particular parameters when the service is invoked.
• Applies to instances of services and workflows
RDF based UDDI metadata for service instances
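Operational metadata of this kind fits a simple subject-predicate-object model. The sketch below uses plain Python tuples in place of real RDF, with hypothetical identifiers and predicates. Only the 80% reliability figure echoes the slide. The second instance and its score are invented for illustration.

```python
# Triples of (subject, predicate, object). All identifiers are hypothetical.
TRIPLES = [
    ("blast@ebi", "instanceOf", "BLAST"),
    ("blast@ebi", "providedBy", "EBI"),
    ("blast@ebi", "reliability", 0.80),
    ("blast@example", "instanceOf", "BLAST"),
    ("blast@example", "providedBy", "ExampleLab"),
    ("blast@example", "reliability", 0.65),  # invented value
]

def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

def best_instance(triples, service_class):
    """Pick the instance of service_class with the highest recorded reliability.

    Assumes every instance has exactly one reliability triple.
    """
    instances = [s for s, _, _ in match(triples, p="instanceOf", o=service_class)]
    scored = [(match(triples, s=i, p="reliability")[0][2], i) for i in instances]
    return max(scored)[1]

ebi_resources = [s for s, _, _ in match(TRIPLES, p="providedBy", o="EBI")]
best = best_instance(TRIPLES, "BLAST")
print(ebi_resources, best)
```

Third party annotations would be further triples about the same subjects, so a view amounts to choosing whose triples to include.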
What do we need to discover
Tiered levels of abstraction
• Classes of services: domain “semantic”, unexecutable, “potentials”
– Description: ontology
– Examples: Sequence alignment, Blastn
• Instances of services: business “operational”, executable, “actuals”
– Description: data model or ontology; Service Data Element
– Examples: Blastn@EBI, Blastn@EBI invoked proxy
myGrid Find Service
[Figure: Find Service architecture. Labels: Discovery Client; Find Service; Word-based discovery; Syntactic discovery; Semantic discovery; UDDI-M; RDF; Ontologies; Views; Ontology Server; Service; Third Party; Third party description; publishes; Gather service descriptions; Org. registry; Public registry; UDDI; WSDL; Description Store; Reasoner; FaCT; Matcher; KAON]
Pedro interface to Service Discovery
Multiple (data) types….BLASTn
Conceptual Type in Ontology
“plumbing syntax” in XSD in WSDL
Life Science ID
urn:LSID:ebi.ac.uk:SWISS-PROT/accession:P34355:3
Formats: FASTA, BSML, Agave …
MIME types
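The Life Science ID above can be split into its parts with a few lines of code. This is a minimal sketch assuming the usual LSID layout `urn:lsid:authority:namespace:object[:revision]`. The `LSID` tuple and `parse_lsid` function are illustrative names, not a myGrid or LSID library API.

```python
from typing import NamedTuple, Optional

class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]

def parse_lsid(urn):
    """Split an LSID of the form urn:lsid:authority:namespace:object[:revision]."""
    parts = urn.split(":")
    if (len(parts) not in (5, 6)
            or parts[0].lower() != "urn"
            or parts[1].lower() != "lsid"):
        raise ValueError(f"not an LSID: {urn!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)

lsid = parse_lsid("urn:LSID:ebi.ac.uk:SWISS-PROT/accession:P34355:3")
print(lsid)
```

The identifier names the data item. It does not say which format (FASTA, BSML, Agave) or MIME type a given service will return, so those have to be described separately.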
1. User selects values from a drop down list to create a property based description of their required service. Values are constrained to provide only sensible alternatives.
2. Once the user has entered a partial description they submit it for matching. The results are displayed below.
3. The user adds the operation to the growing workflow.
4. The workflow specification is complete and ready to match against those in the workflow repository.
Open questions
• Operational metadata – simple data model (in RDF) or ontology? Does it matter?
• Associating semantic types with either service instances or WSDL abstract service descriptions
– Querying the service instance directly.
– Using UDDI's tModel capabilities directly.
– Extending the associated WSDL.
– A separate mapping table/database
– Directly putting the semantic descriptions in the UDDI-M metadata for the view
• Service discovery during workflow enactment.
Roles of Ontologies in myGrid
• Service and workflow describing, linking and matching
• Indexing & provisioning provenance records
• Service & resource registration & discovery
• Annotating resources
• Help: knowledge-based guidance and recommendation
• Composing and validating workflows and service compositions & negotiations
• Change & event notification topics
• Controlling contents of, and indexing, metadata and data
• Schema mediation
Geodise: Knowledge Management for Engineering Design Search and
Optimisation
Addresses issues in the process of acquiring, modelling and sharing domain knowledge,
using state-of-the-art ontology and Semantic Web technology as well as traditional best
practice such as rule-based reasoning and templates.
[Figure: Geodise screenshots, with captions:]
• Ontology service
• Rule based reasoning for workflow advising using JESS, e.g. You may be able to do objectiveFunctionAnalysis using “meshFile”, but still need input (“fluentJournalFile”)
• Task ontology
• Ontology assisted editor for gambit journal file
• Convert to .m file; submit; MATLAB Engine; to resource repository
• Ontology representing the GEODISE domain, with elements selected that will be used to enrich a piece of design workflow
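The workflow advice quoted in the figure (“you may be able to do objectiveFunctionAnalysis using meshFile, but still need fluentJournalFile”) can be sketched as a single rule over task inputs. Geodise uses the JESS rule engine for this. The Python below is only a stand-in showing the shape of such a rule, and the `TASK_INPUTS` table is an assumption built from the two file names on the slide.

```python
# Hypothetical task requirements; only the task and file names come from the slide.
TASK_INPUTS = {
    "objectiveFunctionAnalysis": {"meshFile", "fluentJournalFile"},
}

def advise(available):
    """For each task, say whether it can run and which inputs are still missing."""
    advice = []
    for task, required in TASK_INPUTS.items():
        have = required & available
        missing = required - available
        if have and not missing:
            advice.append(f"You can do {task}.")
        elif have:
            advice.append(
                f"You may be able to do {task} using {sorted(have)}, "
                f"but still need input {sorted(missing)}."
            )
    return advice

messages = advise({"meshFile"})
print(messages)
```

A task with none of its inputs available produces no advice, which keeps the suggestions tied to what the engineer already has in hand.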
Geodise
An ontology log file in the right pane has been annotated in RDF to
describe the design parameter contained in the middle pane. The
whole log can be enriched in this way; the aim is to automate this
process as far as possible.
Grid Interoperability Project
Interoperable Resource Broker
[Figure: diagram of broker architecture [Brooke]. Labels: NJS; IDB; Broker; Unicore Broker; Globus Broker; Other Brokers; Resource Discovery Service; Translator; Filter; Ontology engine; Nodal Grid Search; Hierarchical Grid Search; Lookup resources. Edge labels: Delegates resource check; Delegates translation; Uses to drive MDS search]
Take home message
• The Grid is metadata driven middleware
– Schemas and ontologies are prevalent and
pervasive for carrying semantics
• Information finding, integration and exchange is a
core component of Grid computing
• Semantics for applications of the Grid
• Semantics for the infrastructure of the Grid
• The Grid as a mechanism for delivering ontology and
schema services?
• Context: myGrid, Geodise and GRIP
http://www.mygrid.org.uk/
http://www.geodise.org
Spares
myGrid ThreeTier
Architecture
DAML+OIL / OWL Ontology Languages
• DAML+OIL designed to describe and reason over
ontologies
• Maps to RDF and RDFS
• Ontologies incorporate information about classes,
properties, and individuals (instances), each of which
can have an ID which is a URI reference.
– sequence of axioms and facts
– inclusion references to other ontologies
• Ontologies can also reference XML Schema
datatypes, by a name for the datatype
• Equivalent to the expressive Description Logic SHIQ
• W3C OWL Web Ontology Language 1.0 Reference
– http://www.w3.org/TR/owl-ref/
Semantic Grid: the gap
• A gap between grid computing endeavours and the vision of Grid computing
– high degree of easy-to-use and seamless automation
– flexible collaborations and computations on a global scale.
• To support the full richness of the grid computing vision we need knowledge to be explicitly asserted & explicitly used.
• The Semantic Grid http://www.semanticgrid.org