Implementing Ontologies in (my)Grid environments

Professor Carole Goble, Chris Wroe
University of Manchester, UK
myGrid project http://www.mygrid.org.uk
Geodise project http://www.geodise.org
GGF Sem-Grd co-chair http://www.semanticgrid.org
EIC, Journal of Web Semantics http://www.jws.ac.uk
Semantic Web Science Association member

Take home message
• The Grid is metadata driven middleware
  – Schemas and ontologies are prevalent and pervasive for carrying semantics
• Information finding, integration and exchange is a core component of Grid computing
• Semantics for applications of the Grid
• Semantics for the infrastructure of the Grid
• The Grid as a mechanism for delivering ontology and schema services?
• Context: myGrid, Geodise and GRIP

The Grid Problem (Foster, Kesselman, Tuecke)
“flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources - what we refer to as virtual organizations.”
• A low level framework to allow inter-operation of resources.
• Mainly for the benefit of application developers
• Deploy standard tasks on the Grid in a straightforward manner

Challenging Technical Requirements
• Dynamic formation and management of virtual organizations
• Online negotiation of access to services: who, what, why, when, how
• Configuration of applications and systems able to deliver multiple qualities of service
• Autonomic management of distributed infrastructures, services, and applications
• Management of distributed state as a fundamental issue

Interacting with Grid middleware services
• Empower the user or a process to discover and orchestrate Grid enabled resources as required.
• Means cataloguing and indexing available resources using agreed vocabularies.
  – As in digital libraries
• Many tasks involve the communication of information between sets of Grid services to perform a more complex overall goal.
  – Requires the adoption of standard schemas and semantics for data interchange between Grid services, or a mechanism to map between schemas.

myGrid
• EPSRC UK e-Science pilot project
• Open Source Upper Middleware for Bioinformatics
• Data intensive not compute intensive
• Sharing knowledge and sharing components

myGrid: Integration of Life Science Information
ID   MURA_BACSU   STANDARD;   PRT;   429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE   ENOLPYRUVYL TRANSFERASE) (EPT).
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
OC   BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE;
OC   BACILLUS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
FT   ACT_SITE   116   116   BINDS PEP (BY SIMILARITY).
FT   CONFLICT   374   374   S -> A (IN REF. 3).
SQ   SEQUENCE   429 AA;   46016 MW;   02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
     GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP
     RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT
     IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI

Experiment life cycle
Diagram stages: Forming experiments; Personalisation; Discovering and reusing experiments and resources; Providing services & experiments; Executing experiments; Managing experiments.
Diagram labels: Personalised registries; Personalised workflows; Info repository views; Personalised annotations; Personalised metadata; Security; Resource & service discovery; Repository creation; Workflow creation; Database query formation; Workflow discovery & refinement; Provenance; Service registration; Workflow deposition; Metadata annotation; Third party registration; Information repository; Metadata management; Provenance management; Workflow evolution; Event notification; Workflow enactment; Distributed query processing; Job execution; Provenance generation; Single sign-on authorisation.

myGrid in a nutshell
• An example of a “second
generation” open service-based Grid project, specifically a testbed for the OGSI, OGSA and OGSA-DAI base services;
  – myGrid Information Repository that is OGSA-DAI compliant
• Developing high level services for data intensive integration, rather than computationally intensive problems;
  – Workflow & distributed query processing
• Developing high level services for e-Science experimental management;
  – Provenance, change notification and personalisation
• Developing Semantic Grid capabilities and knowledge-based technologies, such as semantic-based resource discovery and matching.
  – Metadata descriptions and ontologies for service discovery, component discovery and linking components.

myGrid
Diagram labels: UTOPIA; Third party applications; Lab Workbench application; Web Portal; Gateway; Resource annotations; Ontology Services; Shared metadata and data repositories; mIR; Semantic-based Services; Inference engines; Service & resource registration & discovery; Personalisation; e-Science Services; Literature; Provenance; Change & event notification; SoapLab; Databases; Distributed Query Processing; Analytical Tools; Workflow; Integration Services.

Service based architecture
• Find them. Publication, registration, discovery, matchmaking, deregistration.
• Run them. Execution, monitoring, exception handling.
• Organise them. Interoperation, composition, substitution.
• Each bio resource is a service
  – Database, archive, analysis, tool, person, instrument, a workflow …
• Each myGrid architectural component is a service
  – Workflow enactment engine, event notification service, registry, scheduler…
• Services come and go
• Services are not owned by the user
• Services have different levels and kinds of metadata

Realizing a Service-Oriented Architecture
How Do I
• Create, name, manage, discover services?
• Render resources, data, sensors as services?
• Negotiate service level agreements?
• Express & negotiate policy?
• Organize & manage service collections?
• Establish identity, negotiate authentication?
• Manage VO membership & communication?
• Compose services efficiently?
• Achieve interoperability?
Metadata: agreed and shared schemas and vocabularies
(Foster, Argonne Labs, 2003)

Roles of a service ontology
• Discovery of an appropriate Web Service within a registry by its properties and capabilities;
• Invocation by some agent/service;
• Interoperability is increased by describing the semantic type of inputs and outputs;
• Composition of new services;
• Verification of a service’s properties;
• Execution monitoring by tracking what is happening to the described aspects of a service and its sub-services.

Services: Soaplab & EMBOSS

Workflows and Services
• Workflow specification
  – Finding classes of services
    – Blastn compares a nucleotide query sequence against a nucleotide sequence database (usually – intelligent misuse of services…)
  – Guiding service composition
    – Service A outputs compatible with Service B inputs
• Dynamic workflow service invocation and service discovery
  – Choose service instances when running the workflow
• Workflow discovery
  – Finding workflows that others have done, and that I have done myself

Diagram labels: Bioinformaticians; Exemplars; Graves Disease; Lab Book; Workflow Editor; Tool Providers; Talisman; Generic Applications; LSID/R Gateway; Service Registration & Discovery; Personalisation; Knowledge Mgt; Provenance; Metadata Mgt; Notification; Workflow enactment; Service providers; Portal; Information Repository; Core components; Distributed Query Processing; Soaplab; Communication fabric; Bio Services; Text Extraction Services.

Architecture
Diagram labels: Knowledge Services; Ontology Server; Semantic registration; Structural registration; Service Registry; Registry; Reasoner; UDDI; Matcher; KB Store; Registry View; Notification Service; RDF-based UDDI; Service Discovery; Test Data; Discover Workflow or Service; JMS; Workflow enactment engine; Provenance service; mIR; Skufl & WSFL; mG Object Discovery; Info Repository; Workflow templates; Workflow instances; Metadata; Concepts;
Provenance; Data; DB2; Distributed Query Processor; Job Execution; Information Extraction; PESTO; SoapLab.

Service Discovery Requirements
• Descriptions must be attached to different resources: services and workflows;
• Descriptions may be held in different myGrid services: registries and myGrid Information Repositories;
• Publication of descriptions must be supported both for the author of the service and third parties;
• Third party annotations are a view of a service, and discovery should offer a variety of views based upon third party annotations;
• There is a need for control over who may add and alter third party annotations;
• We must support two types of discovery:
  – using cross-domain knowledge independent of application
    • Quality of service, ownership, location, organisations …
  – requiring access to common application domain ontologies
    • Biology and bioinformatics

Discovering services based on their application domain properties
• Finding a service that will fulfil some task, e.g. aligning of biological sequences.
  – What services perform a specific kind of task → what services can I use to perform a biological sequence similarity search?
• Finding a service that will accept or produce some kind of data.
  – What services produce this kind of data → from where can I find sequence data for a protein?
  – What services consume this kind of data → if I have protein sequence data, what can I do with it?
• Class of service:
  – a protein sequence alignment, a protein sequence database
• Specific example of an abstract service:
  – BLAST, BLASTn, SWISS-PROT
Applies to classes of services and workflow specifications

Representing and using domain metadata
• Classification of services/workflows
• Imprecise (best effort) substitutions of services/workflows
• Service/workflow organisation & indexing, matching and substitution
  – “BLAST” finds tblastx, tblastn, psi-blast, marks_super_blast.
  – “Alignment” finds ClustalW, Blast, Smith-Waterman, Needleman-Wunsch
• Expanded selection of services based on expansion of in-hand object
• A vocabulary for expressing service descriptions without predetermining every description
• A reasoning process to manage:
  – coherency of the classifications and the descriptions when they are created,
  – the service discovery, matching and composition when they are deployed.
• Ontologies in DAML+OIL/OWL based on the DAML-S ontology
• W3C OWL Web Ontology Language 1.0 Reference
  – http://www.w3.org/TR/owl-ref/

myGrid service classification
Taxonomic approach
Diagram labels (taxonomy nodes): Sequence alignment; Pairwise; Multiple; SmithWaterman; BLAST; BLASTn; BLASTp; tBLASTn.
Diagram annotations:
• Protein pairwise alignment missing
• Classification: By operation? By data source? By algorithm?
• BLASTn: implicit, over nucleotide sequences
• BLASTp: implicit, over protein sequences
• Alignment applies to sequences not pathways, and needs at least 2 inputs

Originally Based on DAML-S
• US DARPA Agent Markup Language – Services http://www.daml.org
• An Upper Ontology for Services
Diagram: Resource provides Service. Service presents Service profile (what it does: description, functionalities, functional attributes). Service is describedBy Service model (how it works). Service supports Service grounding (how to access it).

Why pick a Description Logic style language?
• Descriptive capability
  – Directly and richly describe the properties of a service
  – Compositional – conceptual lego
  – Post-coordinated composition
    SNPolymorphism of CFTRGene causing Defect in MembraneTransport of ChlorideIon causing Increase in Viscosity of Mucus in CysticFibrosis…
• Non-predetermined descriptions are classified too.
• Classification and reasoning based on the properties
  – Define the properties and the structure takes care of itself
  – The inverse of OO style ontology building.
  – If the properties change so does the classification
    Hand which is anatomically normal

Reasoning in DAML+OIL / OWL
• Consistency – check if knowledge is meaningful
• Subsumption – structure knowledge, compute classification
• Equivalence – check if two classes denote the same set of instances
• Instantiation – check if individual i is an instance of class C
• Retrieval – retrieve the set of individuals that instantiate C

Suite
Diagram labels: Upper level ontology; Task ontology; Informatics ontology; Bioinformatics ontology; Molecular biology ontology; Publishing ontology; Organisation ontology; Web service ontology (parameters: input, output, precondition, effect; performs_task; uses-resource; is_function_of).
Diagram legend:
• Specialises. All concepts are subclassed from those in the more general ontology.
• Contributes concepts to form definitions.

Discovering services based on their operational properties
• What resources does a specific organisation provide?
• Who authored this resource?
• Third party metadata
• What services offering x currently give the best quality of service?
• Which service would the local bioinformatics expert suggest we use?
• Data quality, quality of service, cost, geographical location, authorisation, provenance of data and so on.
• Instance service description of a specific service
  – BLAST, SWISS-PROT as offered by the EBI is 80% reliable.
• Invoked instance service description
  – BLAST as offered by the EBI on a particular date, with particular parameters, when a service is invoked.
Applies to instances of services and workflows
RDF based UDDI metadata for service instances

What do we need to discover
Tiered levels of abstraction
• Classes of services: domain “semantic”, unexecutable “potentials”
  – Sequence alignment (description: ontology)
  – Blastn (description: ontology)
• Instances of services: business “operational”, executable “actuals”
  – Blastn@EBI (description: data model or ontology)
  – Blastn@EBI invoked (description: Service Data Element)

Diagram labels: proxy; myGrid; Discovery Client; Find Service; Word-based discovery; Syntactic discovery; Semantic discovery; UDDI-M; RDF; Ontologies; Views; Ontology Server; Third party description; Service publishes; Third Party publishes; Gather service descriptions; Org. registry; Public registry; UDDI; WSDL; Description Store; Reasoner; FaCT; Matcher; KAON.

Pedro interface to Service Discovery

Multiple (data) types … BLASTn
• Conceptual type in ontology
• “Plumbing syntax” in XSD in WSDL
• Life Science ID: urn:LSID:ebi.ac.uk:SWISS-PROT/accession:P34355:3
• Formats: FASTA, BSML, Agave …
• MIME types

1. User selects values from a drop down list to create a property based description of their required service. Values are constrained to provide only sensible alternatives.
2. Once the user has entered a partial description they submit it for matching. The results are displayed below.
3. The user adds the operation to the growing workflow.
4. The workflow specification is complete and ready to match against those in the workflow repository.

Open questions
• Operational metadata – simple data model (in RDF) or ontology? Does it matter?
• Associating semantic types with either service instances or WSDL abstract service descriptions
  – Querying the service instance directly.
  – Using UDDI's tModel capabilities directly.
  – Extending the associated WSDL.
  – A separate mapping table/database
  – Directly putting the semantic descriptions in the UDDI-M metadata for the view
• Service discovery during workflow enactment.
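
The property-based matching described for the discovery interface (a partial description submitted against classified services, so that “BLAST” also finds its specialisations) can be illustrated with a minimal sketch. It assumes a hand-written toy taxonomy and registry; the concept names, the `ServiceDescription` class, the `clustalw@example` entry and the `discover` function are hypothetical and are not the myGrid implementation, which delegates classification to a description logic reasoner.

```python
from dataclasses import dataclass

# Hypothetical toy taxonomy (child concept -> parent concept).
# A real deployment would obtain this hierarchy from a DL reasoner.
PARENT = {
    "pairwise_alignment": "sequence_alignment",
    "multiple_alignment": "sequence_alignment",
    "BLAST": "pairwise_alignment",
    "Smith-Waterman": "pairwise_alignment",
    "ClustalW": "multiple_alignment",
    "BLASTn": "BLAST",
    "BLASTp": "BLAST",
    "nucleotide_sequence": "biological_sequence",
    "protein_sequence": "biological_sequence",
}


def subsumes(general: str, specific: str) -> bool:
    """True if `general` equals `specific` or is one of its ancestors."""
    node = specific
    while node is not None:
        if node == general:
            return True
        node = PARENT.get(node)
    return False


@dataclass(frozen=True)
class ServiceDescription:
    name: str           # service instance, e.g. "Blastn@EBI"
    performs_task: str  # concept naming the task the service performs
    input_type: str     # concept naming the data the service consumes


# Hypothetical registry contents, for illustration only.
REGISTRY = [
    ServiceDescription("Blastn@EBI", "BLASTn", "nucleotide_sequence"),
    ServiceDescription("Blastp@EBI", "BLASTp", "protein_sequence"),
    ServiceDescription("clustalw@example", "ClustalW", "protein_sequence"),
]


def discover(registry, **query):
    """Match a partial description: every property given in the query
    must subsume the corresponding property of the service."""
    return [
        s.name
        for s in registry
        if all(subsumes(wanted, getattr(s, prop)) for prop, wanted in query.items())
    ]


if __name__ == "__main__":
    print(discover(REGISTRY, performs_task="BLAST"))
    print(discover(REGISTRY, performs_task="sequence_alignment",
                   input_type="protein_sequence"))
```

Leaving a property out of the query makes the description less specific and the result set larger, which is the behaviour the drop-down interface relies on when a user submits a partial description.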
Roles of Ontologies in myGrid
Diagram labels: Ontologies; Service and workflow describing, linking and matching; Indexing & provisioning provenance records; Service & resource registration & discovery; Annotating resources; Help; Knowledge-based guidance and recommendation; Composing and validating workflows and service compositions & negotiations; Change & event notification topics; Controlling contents of, and indexing, metadata and data; Schema mediation.

Geodise: Knowledge Management for Engineering Design Search and Optimisation
Addresses issues in acquiring, modelling and sharing domain knowledge, using state-of-the-art ontology and Semantic Web technology as well as traditional best practice such as rule-based reasoning and templates.

Screenshot labels: Ontology service; Rule based reasoning for workflow advising using JESS; To resource repository; Convert to .m file; submit; Task ontology; Ontology assisted editor for gambit journal file; MATLAB Engine.
Advice shown: You may be able to do objectiveFunctionAnalysis using “meshFile”, but still need input (“fluentJournalFile”)
Caption: Ontology representing the GEODISE domain, with elements selected that will be used to enrich a piece of design workflow

Geodise
An ontology log file in the right pane that has been annotated in RDF to describe the design parameter contained in the middle pane.
The whole log can be enriched in this way, the aim being to automate this process as far as possible.

Grid Interoperability Project
Diagram of broker architecture [Brooke]
Diagram labels: Interoperable Resource Broker; Resource Discovery Service; NJS; Broker; Unicore Broker; Globus Broker; Other Brokers; IDB; Translator; Filter; Ontology engine; Nodal Grid Search; Hierarchical Grid Search.
Diagram edge labels: Delegates resource check; Delegates translation; Lookup resources; Uses to drive MDS search.

Take home message
• The Grid is metadata driven middleware
  – Schemas and ontologies are prevalent and pervasive for carrying semantics
• Information finding, integration and exchange is a core component of Grid computing
• Semantics for applications of the Grid
• Semantics for the infrastructure of the Grid
• The Grid as a mechanism for delivering ontology and schema services?
• Context: myGrid, Geodise and GRIP
http://www.mygrid.org.uk/
http://www.geodise.org

Spares

myGrid Three-Tier Architecture

DAML+OIL / OWL Ontology Languages
• DAML+OIL designed to describe and reason over ontologies
• Maps to RDF and RDFS
• Ontologies incorporate information about classes, properties, and individuals (instances), each of which can have an ID which is a URI reference.
  – sequence of axioms and facts
  – inclusion references to other ontologies
• Ontologies can also reference XML Schema datatypes, by a name for the datatype
• Equivalent to the expressive Description Logic SHIQ
• W3C OWL Web Ontology Language 1.0 Reference
  – http://www.w3.org/TR/owl-ref/

Semantic Grid: the gap
• A gap between grid computing endeavours and the vision of Grid computing
  – high degree of easy-to-use and seamless automation
  – flexible collaborations and computations on a global scale.
• To support the full richness of the grid computing vision we need knowledge to be explicitly asserted & explicitly used.
• The Semantic Grid http://www.semanticgrid.org
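
As a small illustration of knowledge being “explicitly asserted & explicitly used”, the sketch below asserts a handful of RDF-style triples and then uses them for the instantiation and retrieval reasoning tasks. It is a minimal sketch under stated assumptions: the `ex:` identifiers and the triples are hypothetical, only `rdfs:subClassOf` and `rdf:type` are interpreted, and a real system would use an RDF store and an OWL reasoner rather than these helper functions.

```python
# Hypothetical asserted knowledge as (subject, predicate, object) triples.
TRIPLES = {
    ("ex:BLASTn", "rdfs:subClassOf", "ex:BLAST"),
    ("ex:BLAST", "rdfs:subClassOf", "ex:SequenceAlignment"),
    ("ex:blastn_at_ebi", "rdf:type", "ex:BLASTn"),
    ("ex:clustalw_local", "rdf:type", "ex:SequenceAlignment"),
}


def superclasses(cls, triples):
    """All classes reachable from `cls` via rdfs:subClassOf, including `cls`."""
    seen, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "rdfs:subClassOf" and o not in seen:
                seen.add(o)
                frontier.append(o)
    return seen


def is_instance(individual, cls, triples):
    """Instantiation: is `individual` an instance of `cls`, directly or
    through the subclass hierarchy?"""
    return any(
        s == individual and p == "rdf:type" and cls in superclasses(o, triples)
        for s, p, o in triples
    )


def retrieve(cls, triples):
    """Retrieval: the sorted individuals that instantiate `cls`."""
    individuals = {s for s, p, _ in triples if p == "rdf:type"}
    return sorted(i for i in individuals if is_instance(i, cls, triples))


if __name__ == "__main__":
    print(retrieve("ex:SequenceAlignment", TRIPLES))
    print(is_instance("ex:clustalw_local", "ex:BLAST", TRIPLES))
```

Nothing states directly that `ex:blastn_at_ebi` is a sequence alignment service; that membership follows from the asserted subclass triples.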