Semantic Web for Life Sciences
14th Annual PRISM Forum, NeSC, Edinburgh, UK
5/2/2005
Eric Neumann

Outline
1. Why Semantics in Drug Discovery are important
2. What is the Semantic Web? Some Basics
3. Working Example: BioDASH
4. Semantic Web for Life Sciences

Why the focus on Semantics?
• Scientists define the semantics of their information
• Consequently, databases today are not directly usable or extensible by the scientists
• Shift from passive data exchange (hidden semantics) to interpretable information (explicit semantics)
• Semantic interoperability
• Different models can be connected together if semantics are clear and explicit
• New set of standards based on RDF-OWL
• Optimal Situation: If defining semantics led directly to data structures (RDBM, KB, XML, Documents)

Information Technologies: where are things headed?
• Tools are available to develop solid systems where clear requirements can be obtained
• APIs are useful if there is a common model for information handling
• As applications become more complex, it is necessary to include semantics into such common models
• Semantic Interoperability requires a different methodology to develop applications and database systems

Modern Drug Discovery Challenges
• Qualified Targets
• Lead Optimization
• Lead Generation
• Toxicity & Safety
• Molecular Mechanisms
• Biomarkers
• Pharmacogenomics
• Clinical Trials

New Regulatory Issues Confronting Pharmaceutics
[Figure labels: Safety/Efficacy; ADME Optim]
From Innovation or Stagnation, FDA Report, March 2004
• Select an Advanced Scientifically Qualified Target
• Screening
• Hit Evaluation
• Selecting a Lead
• Compound Optimization
• Select an Early Development Compound (EDC)

Drug Discovery Opportunities for Semantic Web
• Aggregation for Discovery Research Knowledge
• Ever increasing, especially with Systems Biology
• Info Extraction from Scientific Literature
• HTS models and Lead-Target knowledge-bases
• Compound-Target Spaces, Relationships, and Rules
• Patents and Competitive Intelligence
• PreClinical testing: safety/tox and cross-species relations
• Biomarkers and integration of multiple platforms
• Clinical Trials Consolidation with CVs
• eCTD and alignment with Healthcare semantics
• Future Submission Mechanisms
• Improve submission process
• Better Use of Regulatory Documents
• Support for FDA’s Critical Path Initiative

Semantic Issues facing Drug Discovery
• Common meta-model(s) for all data
• Connected Knowledge for Cross-Functional uses
• Support for Decision Making
• Mapping unstructured (text) information (IR)
• More Productive queries
• Discovery of all related and important facts
• Internal system for organizing and aggregating important information from users: insights, alerts, opportunities, best practices related to all forms of internal information and data

Knowledge Aggregation Networks
[Figure] Courtesy of BeyondGenomics

Where Can Semantic Web Help Now?
o Papers about the hedgehog gene
o Papers that disagree with this one
o The paper where this idea first came from
o The most commonly cited reviews about prions
o The names and contact details of authors who have used method W to investigate protein X
o Molecular biology research groups within 100 miles of Boston that have used method Y
o The work/collaborations of Dr Z
source: Nature Publishing Group
Even Google can’t help much

Proposed Benefits of Semantic Web in Pharma
• Create “minimal network” connecting the communities involved in making decisions
• Publish meaning, not just data
• Allow individuals (trusted adopters at first, then more widespread) to edit, annotate and publish their knowledge
• Working across boundaries
• Scalable, evolutionary and decentralized record of knowledge created and used

Communities of Practice (A KM Concept)
• Dense relations of mutual engagement organized around a motivating principle (Wenger)
• Semantic Web enables engagement
• Formal / understood
• Informal / evolutionary
• Preserves diversity and encourages serendipity

Semantics in Communities
• Multiple communities and subcommunities
• Biology: Genes, proteins, pathways, MoA
• Chemistry: Structure, activity, Lipinski
• Animal screen: ADME, tox
• Legal: Intellectual property
• Business: Market size, competition, cost
• Regulatory: FDA compliance
• Informal understanding of community intersections inherent in drug discovery, but reliant on human input for understanding
• The first companies to codify and automate elements of understanding have an edge

Boundary and Semantics
• Boundary objects: artifacts, documents, reification around which communities can be organized
• Brokering: transferring elements of knowledge from one community to another (people driven)
• Semantics – context – critical to success of both concepts

Strawman: Pharma as Investor
• Drug discovery not dissimilar to venture investing: multiple bets, aiming for the long home run
• Data submitted from multiple areas of company (genomics, proteomics, combichem, ADME)
• Processed within therapeutic areas
• Add layers of geographic, cultural and lexical complexity
• Boundary objects and brokers critical to creating knowledge feedback

Semantic Web – Boundary Development
• Design philosophy of SW creates technical space to create boundary systems
• Easy to quickly reify central dogma (not married to a schema afterwards)
• Easy to publish – like the web
• Anyone can publish a boundary object, edit another's boundary object, or comment

Brokering Knowledge
• FDA: when is enough to submit?
• Pharma case
• 80% of application done in strong TA
• Elected not to file, spent three years finishing final 20%
• Beaten to market
• Hard lessons learned – how to broker those lessons across the enterprise?

Semantic Web – Broker ID
• If everyone can publish knowledge in structured form, easier to identify the brokers
• Their “subgraphs” (hypotheses, models, statements of belief) become more heavily connected
• Like the web – Google ranks sites by connectivity...

The Current Web
• What the computer sees: “Dumb” links
• No semantics - <a href> treated just like <bold>
• Minimal machine-processable information

The Semantic Web
• Machine-processable semantic information
• Semantic context published – making the data more informative to both humans and machines

Semantic Web Featured Elements
• Designed to work on Web backbone
• RDF can be serialized as XML (RDF/XML)
• All referenced resources are URIs
• Distributed RDF data forms a graph that can be merged with other RDF graphs whose nodes coincide
• Ontologies are defined using OWL
• 3 levels of logic
• OWL is a form of RDF
• Documents can reference multiple OWL ontologies
• Namespaces
• RDF-OWL can be queried
• Rules (SW) can be applied to perform inferences and productions

The Technologies: URI
• Uniform Resource Identifiers (URIs)
• URI has two different uses:
• Unambiguous name for something
• Location of a document (URL)
• URIs can be used to identify definitions for concepts
• Especially useful for ontologies & metadata
http://www.w3.org/2004/10/jtw-virtconf.html

The Technologies: RDF
• Think: "Relational Data Format"
• W3C standard for making statements of fact or belief
• Descriptive statements are expressed as triples: (Subject, Verb, Object)
• We call verb a “predicate” or a “property”
[Diagram: Subject, Property, Object]

RDF: Represents Knowledge
• Sources: literature, databases
• Anywhere an assertion of relationship is made
• “Triples” connect to one another, allowing Integration across databases

Example (using N3)

    :GSK3betaTx1 a ls:Transcript ;
        ls:expressedBy :GSK3beta ;        # Gene
        ls:unigene "u434343" ;            # "unigeneID"
        ls:translatesTo "MBGVGTANAC" ;    # Literal
        ls:fullCds [ ls:startsAt "1" ; ls:stopsAt "345" ] ;
        ls:hasExons (:ex1 :ex2 :ex3) ;    # ls:Exon
        ls:hasAnnotation "The only known transcript" .

    :GSK3beta_Tgt a ls:Target ;
        ls:references :GSK3beta ;
        ls:inPathway :Wnt ;
        ls:for :DBP, :AKAP, :CHIR99021, :CHIR98014, :ARA014418, :SB216763,
               :SB415286, :bisindolmaleimide, :TDZD-8, :OTDZT, :CDT-ethanone ;
        ls:code "GSK-3beta" ;
        ls:xref <urn:lsid:uniprot.org.lsid.biopathways.org:uniprot:P49841> ;
        ls:contextDisease :DiabetesType2 .

    :DiabetesType2 a ls:Disease ;
        rdfs:label "Type 2 Diabetes" ;
        dc:comment "nonjuvenile diabetes" ;
        ls:omim <urn:lsid:ncbi.nlm.nih.gov.lsid.biopathways.org:omim:125853> ;
        ls:affectsTissue nlm:Muscle, nlm:Liver, nlm:Spleen .

Bringing together Databases
[Diagram labels: SW-Space; SPARQL; RDF Query; RDB-Access “SQARQL”; RDBM (three sources); RDF Results]

Why begin to utilize Semantic Web technology?
• Any data or docs that can be accessed over the intranet can be semantically linked (aggregated) today
• All our resources can be linked together in a navigable and searchable way
• Supports web-based use of CV and ontologies today
• Local copy storage is a bad idea
• OWL is fast becoming the Ontology exchange format
• Easily created and maintained within a de-centralized model
• Use to manage structure and content of intranet sites
• Newsfeeds based on SW (Information Library Services)
• RSS 1.0 = RDF
• Works with current web technologies
• No proprietary technologies needed

Low Hanging Fruit: Bridging Ontologies via Rules
• Projecting protein properties onto compounds

    { :cmpd :targets ?prot . ?prot :bio-process ?proc } => { :cmpd :affectsProcess ?proc } .
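
The projection rule above can be sketched in plain Python as a single forward-chaining pass over a set of triples. This is only an illustration under stated assumptions: the triples and the names `:cmpdA`, `:protX`, `GO:procP` and so on are hypothetical, the function `project_processes` is not part of any tool named in this deck, and the compound is treated as a variable rather than the fixed `:cmpd` resource written on the slide.

```python
# Triples are (subject, predicate, object) tuples.
# All facts below are hypothetical illustrations, not data from the slides.

def project_processes(triples):
    """Apply {?c :targets ?p . ?p :bio-process ?proc} => {?c :affectsProcess ?proc}.

    Returns only the newly inferred triples.
    """
    # Index each protein by the biological processes asserted for it.
    processes = {}
    for subj, pred, obj in triples:
        if pred == ":bio-process":
            processes.setdefault(subj, set()).add(obj)

    # For every compound-target link, project the target's processes
    # onto the compound.
    inferred = set()
    for subj, pred, obj in triples:
        if pred == ":targets":
            for proc in processes.get(obj, ()):
                inferred.add((subj, ":affectsProcess", proc))

    return inferred - set(triples)


facts = {
    (":cmpdA", ":targets", ":protX"),
    (":protX", ":bio-process", "GO:procP"),
    (":protX", ":bio-process", "GO:procQ"),
    (":cmpdB", ":targets", ":protY"),  # no process is known for :protY
}

new_triples = project_processes(facts)
for triple in sorted(new_triples):
    print(triple)
```

Because the inferred statements are ordinary triples, they can be added back to the same graph and queried like any asserted fact. A rule engine would do the same job declaratively; the loop here only makes the join on `?prot` explicit.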
[Diagram: Cmpd, Prot and GO:Proc nodes linked by :targets and :affectsProcess]

Key Ingredient: Life Science Identifiers (LSID)
• To be used for all biomolecular and chemical entities
• Retro-fit legacy data (+ revisions), as well as supporting meta-data linked to these data objects
• URIs, URLs, and URNs
• Resolvers and Handlers - best practice needed
• Prototype Resolver @ http://lsid.biopathways.org
• Casein Kinase 1: urn:lsid:uniprot.org.lsid.biopathways.org:uniprot:P48729

Use Case: Aggregation of Drug Discovery Data with Multiple Ontologies

From Portal… …to Knowledge Aggregator
[Screenshot: meeting notes, gene record and K-Net annotations]
Participants: R. Lewis, V. Mikol, Tom Glenn, Eric Davidson, Eric Neumann, Beth Koch, Daniel Schirlin
• Introduction of participants
• Role of Eric Neumann and expectations: take advantage of and transfer knowledge; not so much technology as how to best build on knowledge, competencies, and networks.
• Brief recap of material presented by subgroups at Workshop on day 3 (shared via e-mail)
• Short update on Druggable Workshop outcomes
• Discussion of the 3 proposed actions plus a fourth potential one:
• Target Validation
K-Net

    <target id=m#gsk3b>
    <dis res=m#ALZ>
    <struct res=m#gsk3>
    <loc res=m#csf>
    <mech res=m#prosCanc>
    <gene res=m#gsk3beta>
    <mech res=m#wnt>

GeneRIFs:
1. 6-OHDA inhibited phosphorylation of GSK3beta at Ser9, & induced hyperphosphorylation of Tyr216 with little effect on expression. GSK3beta is a critical intermediate in pro-apoptotic signaling cascades that are associated with neurodegenerative diseases.
2. A glycogen synthase kinase 3-beta promoter gene single nucleotide polymorphism is associated with age at onset and response to total sleep deprivation in bipolar depression.
3. the reduction in brain GSK-3beta is reflected in CSF of schizophrenia patients
4. overexpression of human GSK-3beta in skeletal muscle of male mice resulted in impaired glucose tolerance despite raised insulin levels
5. GSK3 beta may function as a repressor to suppress AR-mediated transactivation and cell growth
6. a mechanism involving GSK-3beta activation may be responsible for tumor necrosis factor-related apoptosis-inducing ligand resistance in prostate cancer cells
7. GSK3beta is connected to tau by 14-3-3 and Ser(9)-phosphorylated GSK3beta phosphorylates tau
8. Most importantly, knocking down GSK-3beta expression via a small interference RNA-mediated gene silencing approach also reduced R1881-stimulated gene expression, demonstrating the specificity of GSK-3beta involvement.
9. glycogen synthase kinase-3 beta phosphorylates the androgen receptor, thereby inhibiting androgen receptor-driven transcription

Haystack Semantic Web Browser – MIT/IBM
http://haystack.lcs.mit.edu

BioDASH Topic View

Bridging Chemistry and Molecular Biology
• Different Views have different semantics: Lenses
• When there is a correspondence between objects, a semantic binding is possible
Uniprot:P49841
Apply Correspondence Rule:

    if ?target.xref.lsid == ?bpx:prot.xref.lsid
    then ?target.correspondsTo.?bpx:prot

Bridging Chemistry and Molecular Biology
• Lenses can aggregate, accentuate, or analyze new result sets
• Behind the lens, the data can be persistently stored as RDF-OWL

Pathway Polymorphisms
• Identify targets with lowest chance of variance
• Predict parts of pathways with highest variability
• Select mechanisms of action that are minimally impacted by polymorphisms

Pathway Semantic Lens

    add { :predicateSet
        rdf:type graph:CollectionPredicateSet ;
        rdf:type graph:PredicateSet ;
        dc:title "BioPAX pathway arrows" ;
        hs:member biopax:NEXT-STEP ;
        hs:member :pointingTo ;
        hs:member ${
            rdf:type vowl:RDFQueryLens ;
            vowl:sourceExistential ?s ;
            vowl:targetExistential ?t ;
            rdfs:label "" ;
            vowl:existentials @( ?s ?t ?type ) ;
            vowl:statement ${
                vowl:subject ?type ;
                vowl:predicate biopax:LEFT ;
                vowl:object ?s
            } ;
            vowl:statement ${
                vowl:subject ?type ;
                vowl:predicate biopax:RIGHT ;
                vowl:object ?t
            }
        }
    }

Power of Semantic Lenses in Research
• Separates information collection and presentation from information processing: not all require coding!
• Database federation can be achieved using lenses
• Allows users to create powerful context-specific views of combined information that can be annotated and shared
• Lenses do not require programming, can be extended, and can be shared/traded
• Less development time, more definition by scientists → More can be achieved in less time and for less cost!

Semantic Web for Life Sciences
• An Open Scientific Forum for Defining Cross-Disciplinary Life Science needs
• Show-Casing Working Examples
• Initiating SW Work Groups
• Capturing Best Practices
• Charter almost completed
• Promote LSID awareness and use
• Sandbox for BioDASH demo and semantic lenses
• Identify Semantic Issues for CT and HC
• Recent members include Merck and caBIG/NCI

Semantic Web for Life Sciences Participants
MIT, Oct 27, 28
Jackson Laboratories
Berlex Biosciences
Novartis
Sanofi-Aventis
Woods Hole Oceanographic Institute
Fred Hutchinson Cancer Research Center
Infinity Pharmaceuticals
AstraZeneca R&D
Elsevier
Millennium Pharmaceuticals
Nature Publishing Group
Pacific Northwest National Laboratory
Stanford Medical Informatics
Harvard Partners
Affymetrix
Mayo Clinic
American Chemical Society
European Bioinformatics Institute
National Science Foundation
Hewlett-Packard
Pfizer
Genentech
MacArthur Foundation
National Center for Genome Resources
Oracle
BioGrid
SemantxLS
PRISM Forum
Swiss Institute of Bioinformatics
National Cancer Institute (Center for Bioinformatics)
Children's Hospital
IBM
INRIA
University of Michigan
University of Massachusetts Boston
Harvard Medical School
AGFA Healthcare
MIT / CSBi
KEVRIC
Chevron Texaco
University of Cambridge (UK)
Fujitsu Laboratories of America
Broad Institute / MIT
MITRE
Genstruct
Network Inference
Alzheimer's Research Forum
German Cancer Research Center
Stanford Medical Informatics
Annotea
BioPAX
HydroJoule
University of Manchester
VTT Finland
Matsushita / W3C
SkyPrise
Djinnisys
Siderean
Yale Center for Medical Informatics
MIND (University of Maryland)
DSTC Pty Ltd
Technion – Israel Institute of Technology
Columbia University
Intelligent Solutions
Panther Informatics
Image Bioinformatics Lab, University of Oxford
University of Colorado
Northeastern University
Tucana Technologies
University of Georgia
Japan Biological Information Consortium
University of Zurich
University of Michigan
Life Sciences Insights
De Novo Pharmaceuticals
European Network of Excellence REWERSE
Object Management Group

Semantic Web Resources
• Semantic Web - http://www.w3.org/sw/
• SWLS Workshop Report - http://www.w3.org/2004/10/swlsworkshop-report.html
• http://semwebcentral.org/
• http://www.w3.org/TR/2004/WD-rdf-sparql-query-20041012/
Tools
• JENA (HP)
• Haystack (MIT)
• Longwell/SIMILE/Piggy-Bank (HP)
RDB Access
• http://www.w3.org/2003/01/21-RDF-RDB-access/
• SPARQL

Acknowledgments
Melissa Cline, Affymetrix, Pasteur Inst.
Ryan Lee, MIT
Joanne Luciano, Harvard Medical School
Eric Prud’hommeaux, W3C
Dennis Quan, IBM
Susie Stephens, Oracle
John Wilbanks, Science Commons/W3C
Ian Wilson, Univ Colorado