Semantic Web for Life Sciences 14th Annual PRISM Forum NeSC Edinburgh UK

Semantic Web for Life Sciences
14th Annual PRISM Forum
NeSC
Edinburgh UK
5/2/2005
Eric Neumann
Outline
1. Why semantics are important in drug discovery
2. What is the Semantic Web? Some Basics
3. Working Example: BioDASH
4. Semantic Web for Life Sciences
Why the focus on Semantics?
Scientists define the semantics of their information
Consequently, databases today are not directly usable or extensible by the scientists
Shift from passive data exchange (hidden semantics) to interpretable information (explicit semantics)
Semantic interoperability
Different models can be connected together if semantics are clear and explicit
New set of standards based on RDF-OWL
Optimal situation: defining semantics would lead directly to data structures (RDBM, KB, XML, Documents)
Information Technologies: where are things headed?
Tools are available to develop solid systems where clear requirements can be obtained
APIs are useful if there is a common model for information handling
As applications become more complex, it is necessary to include semantics in such common models
Semantic interoperability requires a different methodology to develop applications and database systems
Modern Drug Discovery Challenges
Qualified Targets
Lead Optimization
Lead Generation
Toxicity & Safety
Molecular Mechanisms
Biomarkers
Pharmacogenomics
Clinical Trials
New Regulatory Issues Confronting Pharmaceutics
Safety/Efficacy
ADME Optim
from Innovation or Stagnation, FDA Report March 2004
• Select an Advanced Scientifically Qualified Target
• Screening
• Hit Evaluation
• Selecting a Lead
• Compound Optimization
• Select an Early Development Compound (EDC)
Drug Discovery Opportunities for Semantic Web
Aggregation for Discovery Research Knowledge
Ever increasing, especially with Systems Biology
Info Extraction from Scientific Literature
HTS models and Lead-Target knowledge-bases
Compound-Target Spaces, Relationships, and Rules
Patents and Competitive Intelligence
PreClinical testing: safety/tox and cross-species relations
Biomarkers and integration of multiple platforms
Clinical Trials Consolidation with CVs
eCTD and alignment with Healthcare semantics
Future Submission Mechanisms
Improve submission process
Better Use of Regulatory Documents
Support for FDA’s Critical Path Initiative
Semantic Issues facing Drug Discovery
Common meta-model(s) for all data
Connected Knowledge for Cross-Functional uses
Support for Decision Making
Mapping unstructured (text) information (IR)
More Productive queries
Discovery of all related and important facts
Internal system for organizing and aggregating important information from users
insights, alerts, opportunities, best practices
related to all forms of internal information and data
Knowledge Aggregation
Networks
Courtesy of BeyondGenomics
Where Can Semantic Web Help Now?
o Papers about the hedgehog gene
o Papers that disagree with this one
o The paper where this idea first came from
o The most commonly cited reviews about prions
o The names and contact details of authors who have used method W to investigate protein X
o Molecular biology research groups within 100 miles of Boston that have used method Y
o The work/collaborations of Dr Z
Source: Nature Publishing Group
Even Google can’t help much
Proposed Benefits of Semantic Web in Pharma
Create “minimal network” connecting the communities involved in making decisions
Publish meaning, not just data
Allow individuals (trusted adopters at first, then more widespread) to edit, annotate and publish their knowledge
Working across boundaries
Scalable, evolutionary and decentralized record of knowledge created and used
Communities of Practice (A KM Concept)
Dense relations of mutual engagement organized around a motivating principle (Wenger)
Semantic Web enables engagement
Formal / understood
Informal / evolutionary
Preserves diversity and encourages serendipity
Semantics in Communities
Multiple communities and subcommunities
Biology: Genes, proteins, pathways, MoA
Chemistry: Structure, activity, Lipinski
Animal screen: ADME, tox
Legal: Intellectual property
Business: Market size, competition, cost
Regulatory: FDA compliance
Informal understanding of community intersections inherent in drug discovery, but reliant on human input for understanding
The first companies to codify and automate elements of understanding have an edge
Boundary and Semantics
Boundary objects: artifacts, documents, reification around which communities can be organized
Brokering: transferring elements of knowledge from one community to another (people driven)
Semantics – context – critical to success of both concepts
Strawman: Pharma as Investor
Drug discovery not dissimilar to venture investing: multiple bets, aiming for the long home run
Data submitted from multiple areas of company (genomics, proteomics, combichem, ADME)
Processed within therapeutic areas
Add layers of geographic, cultural and lexical complexity
Boundary objects and brokers critical to creating knowledge feedback
Semantic Web – Boundary Development
Design philosophy of SW creates technical space to create boundary systems
Easy to quickly reify central dogma (not married to a schema afterwards)
Easy to publish – like the web
Anyone can publish a boundary object, edit another's boundary object, or comment
Brokering Knowledge
FDA: when is there enough to submit? A pharma case:
80% of the application was done in a strong therapeutic area (TA)
Elected not to file, spent three years finishing the final 20%
Beaten to market
Hard lessons learned – how to broker those lessons across the enterprise?
Semantic Web – Broker ID
If everyone can publish knowledge in structured form, it is easier to identify the brokers
Their “subgraphs” (hypotheses, models, statements of belief) become more heavily connected
Like the web – Google ranks sites by connectivity...
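The connectivity idea can be illustrated with a small sketch. It assumes published statements are available as (subject, predicate, object) triples and uses a plain link count per node; this is a deliberately simpler measure than the ranking Google uses. The names and predicates below are hypothetical.

```python
from collections import Counter

# Hypothetical published statements: (who, relation, whose work).
statements = [
    ("alice", "buildsOn", "carol"),
    ("bob", "buildsOn", "carol"),
    ("dave", "annotates", "carol"),
    ("carol", "buildsOn", "bob"),
]

def connectivity(triples):
    """Count how many statements touch each node, as subject or as object."""
    degree = Counter()
    for subject, _predicate, obj in triples:
        degree[subject] += 1
        degree[obj] += 1
    return degree

degree = connectivity(statements)
# The most heavily connected node is a candidate broker.
broker, links = degree.most_common(1)[0]
print(broker, links)
```

A real system would likely weight links by who made them, but even a raw count shows how brokers surface once knowledge is published as a graph.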
The Current Web
What the computer sees: “Dumb” links
No semantics - <a href> treated just like <bold>
Minimal machine-processable information
The Semantic Web
Machine-processable semantic information
Semantic context published – making the data more informative to both humans and machines
Semantic Web Featured Elements
Designed to work on Web backbone
RDF can be serialized as XML (RDF/XML)
All referenced resources are identified by URIs
Distributed RDF data forms a graph that can be merged with other RDF graphs whose nodes coincide
Ontologies are defined using OWL
3 levels of logic (OWL Lite, OWL DL, OWL Full)
OWL is expressed in RDF
Documents can reference multiple OWL ontologies
namespaces
RDF-OWL can be queried
Rules (SW) can be applied to perform inferences and productions
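The graph-merging point can be sketched in a few lines. This is a minimal illustration, assuming each graph is simply a set of (subject, predicate, object) triples; the example.org URIs and predicate names are placeholders, not identifiers from any real dataset.

```python
# Placeholder URI shared by both graphs (hypothetical).
GSK3B = "http://example.org/gene/GSK3beta"

# Two independently published graphs, each a set of triples.
biology_graph = {
    (GSK3B, "http://example.org/ls#inPathway", "http://example.org/pathway/Wnt"),
}
chemistry_graph = {
    ("http://example.org/cmpd/CHIR99021", "http://example.org/ls#targets", GSK3B),
}

def merge(*graphs):
    """Merging is a set union: nodes with the same URI coincide automatically."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged

def neighbours(graph, node):
    """Return every triple in which `node` appears as subject or object."""
    return {t for t in graph if t[0] == node or t[2] == node}

merged = merge(biology_graph, chemistry_graph)
linked = neighbours(merged, GSK3B)
print(len(linked))
```

Because both graphs use the same URI for the gene, the merged graph connects the compound to the pathway without any explicit join step.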
The Technologies: URI
Uniform Resource Identifiers (URIs)
URI has two different uses:
Unambiguous name for something
Location of a document (URL)
URIs can be used to identify definitions for concepts
Especially useful for ontologies & metadata
http://www.w3.org/2004/10/jtw-virtconf.html
The Technologies: RDF
RDF = Resource Description Framework (think: "Relational Data Format")
W3C standard for making statements of fact or belief
Descriptive statements are expressed as triples:
(Subject, Verb, Object)
We call the verb a “predicate” or a “property”
Subject -> Property -> Object
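As a rough illustration of the triple model, the sketch below stores statements as Python named tuples and answers simple pattern queries. The `Triple` class and `match` helper are hypothetical names introduced for this example; the sample facts are taken from the GSK3beta example in this deck.

```python
from typing import List, NamedTuple, Optional

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

facts = [
    Triple(":GSK3beta_Tgt", "ls:inPathway", ":Wnt"),
    Triple(":GSK3beta_Tgt", "ls:contextDisease", ":DiabetesType2"),
    Triple(":DiabetesType2", "rdfs:label", "Type 2 Diabetes"),
]

def match(triples: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.obj == o)
    ]

pathways = match(facts, p="ls:inPathway")
about_target = match(facts, s=":GSK3beta_Tgt")
print(pathways, len(about_target))
```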
RDF: Represents Knowledge
Sources: literature, databases
Anywhere an assertion of relationship is made
“Triples” connect to one another, allowing integration across databases
Example (using N3)
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
# The ls:, nlm: and default namespaces are not given on the slide; these URIs are placeholders.
@prefix ls:   <http://example.org/ls#> .
@prefix nlm:  <http://example.org/nlm#> .
@prefix :     <http://example.org/data#> .
:GSK3betaTx1 a ls:Transcript ;
    ls:expressedBy :GSK3beta ;                  # Gene
    ls:unigene "u434343" ;                      # "unigeneID"
    ls:translatesTo "MBGVGTANAC" ;              # Literal
    ls:fullCds [ ls:startsAt "1" ; ls:stopsAt "345" ] ;
    ls:hasExons ( :ex1 :ex2 :ex3 ) ;            # each an ls:Exon
    ls:hasAnnotation "The only known transcript" .
:GSK3beta_Tgt a ls:Target ;
    ls:references :GSK3beta ;
    ls:inPathway :Wnt ;
    ls:for :DBP, :AKAP, :CHIR99021, :CHIR98014, :ARA014418, :SB216763,
        :SB415286, :bisindolmaleimide, :TDZD-8, :OTDZT, :CDT-ethanone ;
    ls:code "GSK-3beta" ;
    ls:xref <urn:lsid:uniprot.org.lsid.biopathways.org:uniprot:P49841> ;
    ls:contextDisease :DiabetesType2 .
:DiabetesType2 a ls:Disease ;
    rdfs:label "Type 2 Diabetes" ;
    rdfs:comment "non-juvenile diabetes" ;
    ls:omim <urn:lsid:ncbi.nlm.nih.gov.lsid.biopathways.org:omim:125853> ;
    ls:affectsTissue nlm:Muscle, nlm:Liver, nlm:Spleen .
Bringing together Databases
[Diagram. Labels: SPARQL query, SW-Space (RDF), RDB-Access (“SQARQL”), RDBM sources, RDF results]
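One way to picture relational data entering the RDF space is to expose each table cell as a triple. The sketch below is an assumption-laden illustration using Python's built-in `sqlite3` module, not the RDB-Access design itself; the `target` table, its columns, and the `table:key` / `table#column` naming scheme are all hypothetical.

```python
import sqlite3

def rows_to_triples(conn, table, key_column):
    """Expose each non-key cell of `table` as a (subject, predicate, object) triple.

    `table` is interpolated into the SQL text, so it must be a trusted name.
    """
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    key_index = columns.index(key_column)
    triples = []
    for row in cursor:
        subject = f"{table}:{row[key_index]}"
        for column, value in zip(columns, row):
            if column != key_column and value is not None:
                triples.append((subject, f"{table}#{column}", value))
    return triples

# Hypothetical in-memory table standing in for a relational source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (code TEXT PRIMARY KEY, pathway TEXT, disease TEXT)")
conn.execute("INSERT INTO target VALUES ('GSK-3beta', 'Wnt', 'Type 2 Diabetes')")

triples = rows_to_triples(conn, "target", "code")
print(triples)
```

Once rows are available in this form, they can be merged with triples from other sources and queried uniformly.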
Why begin to utilize Semantic Web technology?
Any data or docs that can be accessed over the intranet can be semantically linked (aggregated) today
All our resources can be linked together in a navigable and searchable way
Supports web-based use of CV and ontologies today
Local copy storage is a bad idea
OWL is fast becoming the Ontology exchange format
Easily created and maintained within a de-centralized model
Use to manage structure and content of intranet sites
Newsfeeds based on SW (Information Library Services)
RSS 1.0 = RDF
Works with current web technologies
No proprietary technologies needed
Low Hanging Fruit:
Bridging Ontologies via Rules
Projecting protein properties onto compounds
{ ?cmpd :targets ?prot .
  ?prot :bio-process ?proc }
=>
{ ?cmpd :affectsProcess ?proc } .
[Diagram: Cmpd -(:targets)-> Prot -> GO:Proc; Cmpd -(:affectsProcess)-> GO:Proc]
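The rule above can be read as a single forward-chaining step. The following sketch applies it to an in-memory set of triples; it assumes the compound is meant as a variable, and the sample facts (including the `go:exampleProcess` term) are placeholders rather than real annotations.

```python
def apply_rule(triples):
    """If ?cmpd :targets ?prot and ?prot :bio-process ?proc,
    infer ?cmpd :affectsProcess ?proc."""
    inferred = set()
    for cmpd, predicate, prot in triples:
        if predicate != ":targets":
            continue
        for subject, predicate2, proc in triples:
            if subject == prot and predicate2 == ":bio-process":
                inferred.add((cmpd, ":affectsProcess", proc))
    return inferred

facts = {
    (":CHIR99021", ":targets", ":GSK3beta"),
    (":GSK3beta", ":bio-process", "go:exampleProcess"),  # placeholder GO term
    (":OtherProtein", ":bio-process", "go:otherProcess"),  # no compound targets this
}

new_facts = apply_rule(facts)
print(new_facts)
```

The nested loop is quadratic and only meant to show the logic; a rule engine would index triples by predicate and subject.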
Key Ingredient: Life Science Identifiers (LSID)
To be used for all bio molecular and chemical entities
Retro-fit legacy data (+ revisions), as well as supporting meta-data linked to these data objects
URIs, URLs, and URNs
Resolvers and Handlers - best practice needed
Prototype Resolver @ http://lsid.biopathways.org
Casein Kinase 1 :
urn:lsid:uniprot.org.lsid.biopathways.org:uniprot:P48729
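An LSID is a URN of the form `urn:lsid:authority:namespace:object`, with an optional trailing revision. A minimal parser, sketched here with hypothetical names (`LSID`, `parse_lsid`), splits the Casein Kinase 1 identifier above into its parts; it does not resolve the identifier or validate the authority.

```python
from typing import NamedTuple, Optional

class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None

def parse_lsid(urn: str) -> LSID:
    """Split an LSID URN into authority, namespace, object and optional revision."""
    parts = urn.split(":")
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {urn!r}")
    authority, namespace, object_id = parts[2:5]
    revision = parts[5] if len(parts) == 6 else None
    return LSID(authority, namespace, object_id, revision)

ck1 = parse_lsid("urn:lsid:uniprot.org.lsid.biopathways.org:uniprot:P48729")
print(ck1.authority, ck1.namespace, ck1.object_id)
```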
Use Case: Aggregation of Drug Discovery Data with Multiple Ontologies
From Portal…
…to Knowledge Aggregator
Related sites
Participants: R. Lewis, V. Mikol, Tom Glenn, Eric Davidson, Eric Neumann, Beth Koch, Daniel Schirlin
• Introduction of participants
• Role of Eric Neumann and expectations: take advantage of and transfer knowledge; not so much technology as how to best build on knowledge, competencies, and networks
• Brief recap of material presented by subgroups at Workshop on day 3 (shared via e-mail)
• Short update on Druggable Workshop outcomes
• Discussion of the 3 proposed actions plus a fourth potential one:
• Target Validation
Entrez Genome
Genomic Biology
GEO
HomoloGene
Map Viewer
OMIM
RefSeq
UniGene
UniSTS
Feedback
Help Desk
Corrections
K-Net
<target id=m#gsk3b>
<dis res=m#ALZ>
<struct res=m#gsk3>
<loc res=m#csf>
<mech res=m#prosCanc>
<gene res=m#gsk3beta>
<mech res=m#wnt>
GeneRIFs:
1. 6-OHDA inhibited phosphorylation of GSK3beta at Ser9, & induced hyperphosphorylation of Tyr216 with little effect on expression. GSK3beta is a critical intermediate in pro-apoptotic signaling cascades that are associated with neurodegenerative diseases.
2. A glycogen synthase kinase 3-beta promoter gene single nucleotide polymorphism is associated with age at onset and response to total sleep deprivation in bipolar depression.
3. the reduction in brain GSK-3beta is reflected in CSF of schizophrenia patients
4. overexpression of human GSK-3beta in skeletal muscle of male mice resulted in impaired glucose tolerance despite raised insulin levels
5. GSK3 beta may function as a repressor to suppress AR-mediated transactivation and cell growth
6. a mechanism involving GSK-3beta activation may be responsible for tumor necrosis factor-related apoptosis-inducing ligand resistance in prostate cancer cells
7. GSK3beta is connected to tau by 14-3-3 and Ser(9)-phosphorylated GSK3beta phosphorylates tau
8. Most importantly, knocking down GSK-3beta expression via a small interference RNA-mediated gene silencing approach also reduced R1881-stimulated gene expression, demonstrating the specificity of GSK-3beta involvement.
9. glycogen synthase kinase-3 beta phosphorylates the androgen receptor, thereby inhibiting androgen receptor-driven transcription
Haystack Semantic Web Browser – MIT/IBM
http://haystack.lcs.mit.edu
BioDASH Topic View
Bridging Chemistry and Molecular Biology
• Different views have different semantics: Lenses
• When there is a correspondence between objects, a semantic binding is possible
Uniprot:P49841
Apply Correspondence Rule:
if ?target.xref.lsid == ?bpx:prot.xref.lsid
then ?target.correspondsTo.?bpx:prot
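The correspondence rule amounts to a join on the cross-referenced LSID. The sketch below assumes each record is a dictionary with an `xref` field holding the LSID; the record names (`bpx:GSK3B_protein`, `bpx:CK1_protein`) and the `bind_by_lsid` helper are hypothetical.

```python
def bind_by_lsid(targets, proteins):
    """Return (target, 'correspondsTo', protein) for records sharing an xref LSID."""
    proteins_by_lsid = {}
    for name, record in proteins.items():
        proteins_by_lsid.setdefault(record["xref"], []).append(name)
    bindings = []
    for name, record in targets.items():
        for protein in proteins_by_lsid.get(record["xref"], []):
            bindings.append((name, "correspondsTo", protein))
    return bindings

P49841 = "urn:lsid:uniprot.org.lsid.biopathways.org:uniprot:P49841"
P48729 = "urn:lsid:uniprot.org.lsid.biopathways.org:uniprot:P48729"

# Chemistry-side target records and pathway-side protein records (illustrative).
targets = {":GSK3beta_Tgt": {"xref": P49841}}
proteins = {
    "bpx:GSK3B_protein": {"xref": P49841},
    "bpx:CK1_protein": {"xref": P48729},
}

bindings = bind_by_lsid(targets, proteins)
print(bindings)
```

Indexing one side by LSID first keeps the match linear in the number of records, which matters once the two views hold thousands of entities.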
Bridging Chemistry and Molecular Biology
• Lenses can aggregate, accentuate, or analyze new result sets
• Behind the lens, the data can be persistently stored as RDF-OWL
Pathway Polymorphisms
• Identify targets with lowest chance of variance
• Predict parts of pathways with highest variability
• Select mechanisms of action that are minimally impacted by polymorphisms
Pathway Semantic Lens
add { :predicateSet
    rdf:type graph:CollectionPredicateSet ;
    rdf:type graph:PredicateSet ;
    dc:title "BioPAX pathway arrows" ;
    hs:member biopax:NEXT-STEP ;
    hs:member :pointingTo ;
    hs:member ${
        rdf:type vowl:RDFQueryLens ;
        vowl:sourceExistential ?s ;
        vowl:targetExistential ?t ;
        rdfs:label "" ;
        vowl:existentials @( ?s ?t ?type ) ;
        vowl:statement ${
            vowl:subject ?type ;
            vowl:predicate biopax:LEFT ;
            vowl:object ?s
        } ;
        vowl:statement ${
            vowl:subject ?type ;
            vowl:predicate biopax:RIGHT ;
            vowl:object ?t
        }
    }
}
Power of Semantic Lenses in Research
Separates information collection and presentation from information processing: not all require coding!
Database federation can be achieved using lenses
Allows users to create powerful context-specific views of combined information that can be annotated and shared
Lenses do not require programming, can be extended, and can be shared/traded
Less development time, more definition by scientists →
More can be achieved in less time and for less cost!
Semantic Web for Life Sciences
An Open Scientific Forum for
Defining Cross-Disciplinary Life Science needs
Show-Casing Working Examples
Initiating SW Work Groups
Capturing Best Practices
Charter almost completed
Promote LSID awareness and use
Sandbox for BioDASH demo and semantic lenses
Identify Semantic Issues for CT and HC
Recent members include Merck and caBIG/NCI
Semantic Web for Life Sciences Participants
MIT, Oct 27, 28
Jackson Laboratories
Berlex Biosciences
Novartis
Sanofi-Aventis
Woods Hole Oceanographic Institution
Fred Hutchinson Cancer Research Center
Infinity Pharmaceuticals
AstraZeneca R&D
Elsevier
Millennium Pharmaceuticals
Nature Publishing Group
Pacific Northwest National Laboratory
Stanford Medical Informatics
Harvard Partners
Affymetrix
Mayo Clinic
American Chemical Society
European Bioinformatics Institute
National Science Foundation
Hewlett-Packard
Pfizer
Genentech
MacArthur Foundation
National Center for Genome Resources
Oracle
BioGrid
SemantxLS
PRISM Forum
Swiss Institute of Bioinformatics
National Cancer Institute (Center for Bioinformatics)
Children's Hospital
IBM
INRIA
University of Michigan
University of Massachusetts Boston
Harvard Medical School
AGFA Healthcare
MIT / CSBi
KEVRIC
Chevron Texaco
University of Cambridge (UK)
Fujitsu Laboratories of America
Broad Institute / MIT
MITRE
Genstruct
Network Inference
Alzheimer's Research Forum
German Cancer Research Center
Annotea
BioPAX
HydroJoule
University of Manchester
VTT Finland
Matsushita / W3C
SkyPrise
Djinnisys
Siderean
Yale Center for Medical Informatics
MIND (University of Maryland)
DSTC Pty Ltd
Technion – Israel Institute of Technology
Columbia University
Intelligent Solutions
Panther Informatics
Image Bioinformatics Lab, University of Oxford
University of Colorado
Northeastern University
Tucana Technologies
University of Georgia
Japan Biological Information Consortium
University of Zurich
Life Sciences Insights
De Novo Pharmaceuticals
European Network of Excellence REWERSE
Object Management Group
Semantic Web Resources
Semantic Web - http://www.w3.org/sw/
SWLS Workshop Report - http://www.w3.org/2004/10/swlsworkshop-report.html
http://semwebcentral.org/
http://www.w3.org/TR/2004/WD-rdf-sparql-query-20041012/
Tools
JENA (HP)
Haystack (MIT)
Longwell/SIMILE/Piggy-Bank (HP)
RDB Access http://www.w3.org/2003/01/21-RDF-RDB-access/
SPARQL
Acknowledgments
Melissa Cline, Affymetrix, Pasteur Inst.
Ryan Lee, MIT
Joanne Luciano, Harvard Medical School
Eric Prud'hommeaux, W3C
Dennis Quan, IBM
Susie Stephens, Oracle
John Wilbanks, Science Commons/W3C
Ian Wilson, Univ Colorado