Joanne S. Luciano, PhD
Predictive Medicine, Inc., Belmont, MA (predmed.com)
Rensselaer Polytechnic Institute, Troy, NY (twc.rpi.edu)
Semantic Computing in Healthcare Industry
September 16-18, 2013
Hyatt Regency Irvine, Irvine, CA, USA
Joanne S. Luciano
Email: jluciano@rpi.edu
jluciano@predmed.com
Enabling
Health and
Wellbeing through
Knowledge
Technology
Interests
Community
BS, MS Computer Science
PhD Cognitive and Neural Systems
(Computational Neuroscience)
Wang Labs
Harvard Medical School
MITRE
Lotus Development
Predictive Medicine, Inc.
Rensselaer Polytechnic Institute
Flying planes, climbing rocks, traveling, balancing rocks, art & music, tbd
BioPathways Consortium, BioPAX, W3C
HCLSIG
1.
Introduction (10 minutes)
1.
Scope
1.
BioMed Domain not Ontology Building
2.
Overview
3.
Background
1.
2.
3.
Health Care, Life Science, and People
Reference or Application
Function (Use Case)
2.
Examples 3 reference, 1 application, Best Practices (40 Minutes)
1.
UMLS – High level across biomedicine (5)
2.
BioPAX – Mid level – biological pathways (10)
3.
Gene Ontology (“GO”) – Gene annotation (5)
4.
Influenza Ontology (5)
5.
Best Practices (10)
1.
Process: Start with Use Case, develop prototype
2.
Details: Converting Data to RDF to support Linked Open Data
3.
Standards: BioMedical Ontology Best practices (BioPortal, BFO, SIO)
4.
Get Involved!
3.
Conclusion (5 minutes)
1.
Recap, Where to go for more information
Scope
BioMed Domain
Overview
What will be presented
Background
1.
Domain: Health Care, Life Science, and People
2.
What are you building: Reference vs. Application
3.
Why: Function (Use Case)
Health Care & Life Science
The Open Biological and Biomedical Ontologies http://www.obofoundry.org
Goal: a suite of orthogonal interoperable reference ontologies
Barry Smith
U Buffalo, NCBO
From: Nat Biotechnol. 2007 November; 25(11): 1251.
doi: 10.1038/nbt1346
Knowledge Management
Annotate data (such as genomes)
Access information (search, find, and access)
Map across ontologies relate
Data integration and exchange
Model dynamic cellular processes
Identify Drug Interactions
Decision support
SafetyCodes
Diabetic Care
Lab Alerts
(Bodenreider YBMI 2008) http://themindwobbles.wordpress.com/2009/05/04/olivier-bodenreider-nlm-bestpractices-pitfalls-and-positives-cbo-2009/
1.
Domain: Health Care, Life Science, and People
1.
Times have changed
2.
Data Driven Medicine
3.
Health Care Singularity
2.
What are you building: Reference vs. Application
1.
Ontology Spectrum
2.
Reference vs Application Ontology
3.
Why: Function (Use Case)
1.
Link, Aggregate, Search, Integrate, etc.
Aging population (end of life costly)
More people with chronic illnesses
The end of the blockbuster era
Personalized Medicine (right treatment to the right patient at the right time)
Need lower cost drug development
Improved patient response to treatment
(Evidence Based)
Web and Mobile
The technology itself (ubiquitous)
Patients increasingly engaging
9
Photo: http://www.flickr.com/photos/sepblog/4014143391/
10
Existing formalisms
Weak
Semantics
Strong
Semantics
Reuse of terminological resources for efficient ontological engineering in Life Sciences by Jimeno-Yepes, Antonio; Jiménez-Ruiz, Ernesto; Berlanga-Llavori,
Rafael; Rebholz-Schuhmann, Dietrich Journal: BMC
Bioinformatics Vol.
10 Issue Suppl 10 DOI: 10.1186/1471-2105-10-
S10-S4 http://www.mkbergman.com/wpcontent/themes/ai3v2/images/2007Posts/070501d_SemanticSpe ctrum.png
Application vs. Reference
Ontology
Reference Ontology
Intended as an authorative source
True to the limits of what is known
Used by others
Application Ontology
Built to support a particular application (use case)
Reused rather than define terms
Skeleton structure to support application
Terms defined refine or create new concepts directly or through new classes based on inference http://www.nlm.nih.gov/research/umls/presentations/2004-medinfo_tut.pdf
The UMLS, or Unified Medical Language System
Enables Interoperability between computer systems
Files
Software that brings together many health and biomedical
vocabularies and standards
You can use the UMLS to enhance or develop applications, such as electronic health records, classification tools, dictionaries and language translators.
http://www.nlm.nih.gov/research/umls/presentations/2004-medinfo_tut.pdf
http://www.nlm.nih.gov/research/umls/quickstart.html
Unified Medical Language System
Use UMLS to link health information, medical terms, drug names, and billing codes across different computer systems.
Some examples:
Linking terms and codes between doctor, pharmacy, and insurance company
Patient care coordination among several departments within a hospital
SNOMED, ICD-9, LOINC, RxNorm
– clinical setting, more about this later in the next part of the tutorial
The UMLS has many other uses, including search engine retrieval, data mining, public health statistics reporting, and terminology research.
http://www.nlm.nih.gov/research/umls/presentations/2004-medinfo_tut.pdf
Unified Medical Language System
The UMLS has three tools, called the UMLS
Knowledge Sources:
Metathesaurus : Terms and codes from many vocabularies, including CPT®, ICD-10-CM, LOINC®,
MeSH ®, RxNorm, and SNOMED CT®
Semantic Network : Broad categories (semantic types) and their relationships (semantic relations)
SPECIALIST Lexicon and Lexical Tools : Natural language processing tools
Unified Medical Language System
NLM uses the Semantic Network and Lexical Tools to produce the Metathesaurus.
Metathesaurus production involves:
Processing the terms and codes using the Lexical Tools
Grouping synonymous terms into concepts
Categorizing concepts by semantic types from the Semantic
Network
Incorporating relationships and attributes provided by vocabularies
Releasing the data in a common format
They can be accessed separately or in any combination according to your needs.
Unified Medical Language System
The UMLS Terminology Services (UTS) provides three ways to access the UMLS:
Web Browsers You can search the data through UTS applications:
Metathesaurus Browser - Retrieve UMLS concept information, including CUIs, semantic types, and synonymous terms.
Semantic Network Browser - View the names, definitions, and hierarchical structure of the Semantic Network.
Local Installation download, customize and load into your database system, or browse your data using the MetamorphoSys
RRF browser.
Web Services APIs You can use NLM’s application programming interfaces (APIs) to query the UMLS data within your own application.
Unified Medical Language System
Request a license (FREE)
Sign up for a UMLS Terminology Services (UTS) account.
UMLS licenses are issued only to individuals
NLM is a member of the IHTSDO (owner of SNOMED CT), and there is no charge for SNOMED CT use in the United
States and other member countries. Some uses of the
UMLS may require additional agreements with individual terminology vendors.
The UTS account allows you to browse, download, and query the UMLS.
Bio
PA
X
An abstract data model for biological pathway integration
Initiative arose from the community
19
Biological Pathways of the Cell
BioPAX
What’s a pathway?
Depends on who you ask!
A series of chemical reactions catalyzed by enzymes and connected
Metabolic
Pathways The products of one are the reactants of the next
BioPAX
Level 1 e.g. Conversion, Transport 20
Biological Pathways of the Cell
BioPAX
BioPAX
Level 2
Molecular Interaction Networks
All molecular interactions are
One classification:
• Short range repulsive interactions in nature and can be described by
• Electrostatics (charge-charge
Coulombs Law: • Dipolar interactions
• Fluctuating dipole
•
•
Electrons to nuclei in atoms • Solvent, counter ion, and entropic
• Molecules to molecules in liquids and solids.
• Water and the hydrophobic effect
http://ww2.chemistry.gatech.edu/~lw26/structure/molecular_interactions/mol_int.html
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1751541/?tool=pubmed
21
Biological Pathways of the Cell
BioPAX
BioPAX
Level 3
Signaling molecules trigger cellular responses.
They bind to the cell surface causing a cascade of activation reactions:
Signaling
Pathways
A
B
C….
Adapted from Cell Signalling Biology - Michael J. Berridge - www.cellsignallingbiology.org - 2012 and http://www.hartnell.edu/tutorials/biology/signaltransduction.html
22
Biological Pathways of the Cell
BioPAX
The modulation of any of the stages of gene expression that control: which genes are switched on and off when, how long, and how much
Gene regulation may occur in the following stages of gene expression:
Transcription
Post-transcriptional modification
RNA transport
Translation mRNA degradation
Post-translational modifications
Gene
Regulation http://en.wikipedia.org/wiki/Regulation_of_gene_expression http://www.biology-online.org/dictionary/Gene_regulation
23
BioPAX
Biological Pathways of the Cell
What’s a pathway?
Depends on who you ask!
Metabolic
Pathways
BioPAX
Level 1
Molecular
Interaction
Networks
BioPAX
Level 2
Signaling
Pathways
BioPAX
Level 3
Gene
Regulation
BioPAX
Level 4
24
(
)
Instances
(Individuals)
(data) phosphoglucose isomerase
5.3.1.9
25
parts
a set of interactions how the parts are known to interact
Level 1 v1.0 (July 7th, 2004)
26
>200 DBs and tools
Application
User
Before BioPAX
Database
With BioPAX
Common “ computable semantic ” enables scientific discovery
Standard representation of gene and gene product attributes across species and databases
Structured and controlled vocabularies
Organized as 3 independent Ontologies
Molecular Interactions
Biological Processes
Cellular Location
http://bib.oxfordjournals.org/content/early/2
011/02/16/bib.bbr002.full
http://bib.oxfordjournals.org/content/early/2011/02/16/bib.bbr002.full
Annotation of genomes to enable the analysis of the genome through the annotation terms.
Why annotate?
•Assess quality of assembly
•Characterize assembly
•Identify genes/suites of genes which are of a priori interest.
•Identify genes/suites of genes which have been experimentally determined to be of interest (i.e., significantly differentially expressed).
•Gene enrichment analysis (comparison of a set of genes of interest to a null set).
Manually-assigned evidence codes fall into four general categories: experimental, computational analysis, author statements, and curatorial statements.
Adapted from: http://people.oregonstate.edu/~knausb/rna_seq/annot.pdf
Rhee, S.Y, Wood, V., Dolinski, K. and Draghici, S. 2008. Use and misuse of the gene ontology annotations. Nature Reviews Genetics 9:509-515.
See also: http://www.geneontology.org/GO.evidence.shtml
Sequence Ontology (SO) ‘terms and relationships used to describe the features and attributes of biological sequence.’ (E.g., binding_site, exon, etc.) sequence_attribute feature_attribute polymer_attribute sequence_location variant_quality sequence_collection contig_collection genome peptide_collection variant_collection sequence_feature junction region sequence_alteration sequence_variant functional_variant structural_variant
SO http://www.sequenceontology.org/
Relationship (lots!)
Directed, Acyclic Graph
‘standardizing the representation of gene and gene product attributes across species and databases’
Structured and controlled vocabularies
[1] Rhee, S.Y, Wood, V., Dolinski, K. and Draghici, S. 2008. Use and misuse of the gene ontology annotations. Nature
Reviews Genetics 9:509-515.
[2] http://people.oregonstate.edu/~knausb/rna_seq/annot.pdf
http://people.oregonstate.edu/~knausb/rna
_seq/annot.pdf
Components
•
Understand disease heterogeneity
Comprehend disease progression
•
Determine genetic and environmental contributors
Create treatments against relevant targets
drugs against relevant targets (molecular structures)
Yoga against stress
Exercise against obesity
Elimination against food intolerance or allergy
•
Develop markers to predict response
•
Identify concrete endpoints to measure response
Quickly retrieve pharmacogenomic markers of patients when needed
No central storage of data is necessary, giving patients full control over their personal health information.
http://safety-code.org/
Photo: http://www.flickr.com/photos/sepblog/4014143391/
37
Semantic Web Methodology & Technology Development Process
Fox, Peter & McGuinness, Deborah 2008 http://tw.rpi.edu/web/doc/TWC_SemanticWebMethodology
http://bioportal.bioontology.org/
Provides access to commonly used biomedical ontologies and to tools for working with them. BioPortal allows you to
Browse
the library of ontologies
mappings between terms in different ontologies
a selection of projects that use BioPortal resources
Search
biomedical resources for a term
for a term across multiple ontologies
Receive recommendations
on which ontologies are most relevant for a corpus
Annotate text
with terms from ontologies
All information available through the BioPortal Web site is also available through the NCBO Web service REST API. Please see REST API documentation for more information.
http://www.bioontology.org/wiki/index.php/NCBO_REST_services
Healthcare Singularity and the age of Semantic Medicine
2,300 years after the first report of angina for the condition to be commonly taught in medical curricula, modern discoveries are being disseminated at an increasingly rapid pace.
Focusing on the last 150 years, the trend still appears to be linear, approaching the axis around 2025.
http://research.microsoft.com/enus/collaboration/fourthparadigm/4th_paradigm_book_part2_gillam.pdf
Tutorial sources
BioPortal
W3C HCLSIG
Consortia to join
W3C HCLSIG
OpenPHACTS
Identifiers.org
Pistoia Alliance
BioPAX (check for new name)
HL-7 and RIM: http://www.w3.org/2013/HCLStutorials/RIM/#%286%29
RDF RIM Tutorial Eric Prud'hommeaux , < eric@w3.org
>
Basic understanding of the structure of how data written in
HL7's RIM can be expressed in RDF.
It is not a substitute for HL7's documentation, but instead the author's notion of a quick way to familiarize oneself with the concepts and terms used in the RIM and how the graph structure of RDF is a natural way to represent this data.
Copyright © 2013 W3C ®
( MIT , ERCIM , Keio , Beihang ) Usage policies apply .
The Translational Medicine Ontology and Knowledge
Base: driving personalized medicine by bridging the gap between bench and bedside
Luciano et al. Journal of Biomedical Semantics 2011,
2(Suppl 2):S1 http://www.jbiomedsem.com/content/2/S2/S1
Overview of selected types, subtypes
(overlap) and existential restrictions
(arrows) in the Translational Medicine
Ontology.
Translational
Medicine Ontology with mappings to ontologies and terminologies listed in the NCBO
BioPortal.
The TMO provides a global schema for
Indivo-based electronic health records (EHRs) and can be used with formalized criteria for
Alzheimer’s Disease.
The TMO maps types from Linking
Open Data sources.
(earlier work: 10 years in Software Research & Development and Product Development)
World Congress on Neural
Networks,
July 11-15, 1993,
Portland, Oregon
SIG
Mental Function and
Dysfunction
PhD
US Patents
No. 6,063,028
Awarded
Patents
Offered at
Ocean Tomo
Auction
Chicago, IL
BioPAX
EMPWR
Patents Sold to Advanced
Biological
Laboratories
Belgium
U Pitt
Greg Siegle
Collaboration
Center for
Yuezhang Proactive
Xiao
Master’s
Depression
Treatment
Thesis
(RPI)
Proposal
Approved
1995 1997 2001 2006
?
2008 2009 2010 2011 2012
1996 2000
Actively
Samson,
Mc Lean
Hospital
Depression
Modeling of Cognitive and
Brain Disorders
Poster
Presented
ISMB 1997
PSB 1998
Linked Data
W3C HCLS
BioDASH
EPOS
US Patent No.
6,317,73
Awarded
Rensselaer
(RPI)
(RPI)
Actively
SEEKING
FUNDING
Nightingale
FUNDING
Nightingale