Semantic Web for Health Care and Biomedical Informatics Keynote at NSF Biomed Web Workshop, December 4-5, 2007 Amit P. Sheth amit.sheth@wright.edu Thanks Pablo Mendes, Satya Sahoo and Kno.e.sis team; Collaborators at Athens Heart Center (Dr. Agrawal), NLM (Olivier Bodenreider), CCRC, UGA (Will York), CCHMC (Bruce Aronow) Knowledge Enabled Information and Services Science Outline • Semantic Web – very brief intro • Scenarios to demonstrate the applications and benefit of semantic web technologies – Health care – Biomedical Research Knowledge Enabled Information and Services Science Biomedical Informatics... ...needs a connection Hypothesis Validation Experiment design Predictions Personalized medicine Biomedical Informatics Semantic Web research aims at providing this connection! Etiology Pathogenesis Clinical findings Diagnosis Prognosis Treatment Pubmed Clinical Trials.gov Medical Informatics Genome Transcriptome Proteome Genbank Metabolome Physiome ...ome Uniprot More advanced capabilities for search, integration, analysis, linking to new insights and discoveries! Bioinformatics Knowledge Enabled Information and Services Science Evolution of the Web Web as an oracle / assistant / partner - “ask to the Web” - using semantics to leverage 2007 text + data + services + people Web of people - social networks, user-created content - GeneRIF, Connotea Web of services - data = service = data, mashups - ubiquitous computing 1997 Web of databases - dynamically generated pages - web query interfaces Web of pages - text, manually created links - extensive navigation Knowledge Enabled Information and Services Science Semantic Web Enablers and Techniques • Ontology: Agreement with Common Vocabulary & Domain Knowledge; Schema + Knowledge base • Semantic Annotation (meatadata Extraction): Manual, Semi-automatic (automatic with human verification), Automatic • Reasoning/computation: semantics enabled search, integration, complex queries, analysis (paths, subgraph), pattern finding, mining, hypothesis validation, discovery, visualization Knowledge Enabled Information and Services Science Maturing capabilites and ongoing research • Text mining: Entity recognition, Relationship extraction • Integrating text, experimetal data, curated and multimedia data • Clinical and Scientific Workflows with semantic web services • Hypothesis driven retrieval of scientific literature, Undiscovered public knowledge Knowledge Enabled Information and Services Science Metadata and Ontology: Primary Semantic Web enablers Deep semantics Shallow semantics Knowledge Enabled Information and Services Science Characteristics of Semantic Web Self Describing Easy to Understand Semantic Web: Machine & IssuedThe by Human a XML, Trusted RDF & Ontology Readable Authority Convertible Can be Secured Adapted from William Ruh (CISCO) Knowledge Enabled Information and Services Science Many ontologies exist Open Biomedical Ontologies Knowledge Enabled Information and Services Science Open Biomedical Ontologies, http://obo.sourceforge.net/ Drug Ontology Hierarchy (showing is-a relationships) non_drug_ reactant interaction_ property formulary_ property formulary indication monograph _ix_class prescription _drug_ property cpnum_ group property indication_ property brandname_ individual brandname_ undeclared prescription _drug_ brand_name brandname_ composite generic_ composite prescription _drug prescription _drug_ generic owl:thing interaction interaction_ with_prescri ption_drug generic_ individual Knowledge Enabled Information and Services Science interaction_ with_non_ drug_reactant interaction_ with_mono graph_ix_cl ass N-Glycosylation metabolic pathway N-glycan_beta_GlcNAc_9 GNT-I attaches GlcNAc at position 2 N-acetyl-glucosaminyl_transferase_V N-glycan_alpha_man_4 GNT-V attaches GlcNAc at position 6 UDP-N-acetyl-D-glucosamine + alpha-D-Mannosyl-1,3-(R1)-beta-D-mannosyl-R2 <=> UDP + N-Acetyl-$beta-D-glucosaminyl-1,2-alpha-D-mannosyl-1,3-(R1)-beta-D-mannosyl-$R2 UDP-N-acetyl-D-glucosamine + G00020 <=> UDP + G00021 Knowledge Enabled Information and Services Science Opportunity: exploiting clinical and biomedical data binary text Scientific Literature Health Information Services PubMed 300 Documents Published Online each day Elsevier iConsult NCBI User-contributed Content (Informal) Public Datasets GeneRifs Genome, Protein DBs new sequences daily Clinical Data Personal health history Laboratory Data Lab tests, RTPCR, Mass spec Search, browsing, complex query, integration, workflow, analysis, hypothesis validation, decision support. Knowledge Enabled Information and Services Science Scenario 1: • Status: In use today • Where: Athens Heart Center • What: Use of semantic Web technologies for clinical decision support Knowledge Enabled Information and Services Science Operational since January 2006 Knowledge Enabled Information and Services Science Active Semantic Electronic Medical Records (ASEMR) Goals: • Increase efficiency with decision support • formulary, billing, reimbursement • real time chart completion • automated linking with billing • Reduce Errors, Improve Patient Satisfaction & Reporting • drug interactions, allergy, insurance • Improve Profitability Technologies: • Ontologies, semantic annotations & rules • Service Oriented Architecture Thanks -- Dr. Agrawal, Dr. Wingeth, and others. ISWC2006 paper Knowledge Enabled Information and Services Science Demonstration Knowledge Enabled Information and Services Science ASMER Efficiency Chart Completion before the preliminary deployment 600 400 Same Day 300 Back Log 200 100 0 M ar 04 M ay 04 Ju l0 Se 4 pt 04 N ov 04 Ja n 05 M ar 05 M ay 05 Ju l0 5 Chart Completion after the preliminary deployment Charts 04 Ja n Charts 500 700 600 500 400 300 200 100 0 Month/Year Same Day Back Log Sept 05 Nov 05 Jan 06 Mar 06 Month/Year Knowledge Enabled Information and Services Science Scenario 2: • Status: Demonstration • Where: W3C Health Care and Life Sciences (HCLS) interest group • What: Using semantic web to aggregate and query data about Alzheimer’s • http://www.w3.org/2001/sw/hcls/ Knowledge Enabled Information and Services Science Scenario 2: Scientific Data Sets for Alzheimer’s Knowledge Enabled Information and Services Science SPARQL Query spanning multiple sources Knowledge Enabled Information and Services Science Scenario 3 • Status: Completed research • Where: NIH • What: Understanding the genetic basis of nicotine dependence. Integrate gene and pathway information and show how three complex biological queries can be answered by the integrated knowledge base. • How: Semantic Web technologies (especially RDF, OWL, and SPARQL) support information integration and make it easy to create semantic mashups (semantically integrated resources). Knowledge Enabled Information and Services Science Motivation • NIDA study on nicotine dependency • List of candidate genes in humans • Analysis objectives include: o Find interactions between genes o Identification of active genes – maximum number of pathways o Identification of genes based on anatomical locations • Requires integration of genome and biological pathway information Knowledge Enabled Information and Services Science Genome and pathway information integration Reactome KEGG •pathway •pathway •protein •protein •pmid •pmid HumanCyc •pathway •protein •pmid Entrez Gene •GO ID •HomoloGene ID GeneOntology HomoloGene Knowledge Enabled Information and Services Science JBI Knowledge Enabled Information and Services Science Entrez Knowledge Model (EKoM) BioPAX ontology Knowledge Enabled Information and Services Science Deductive Reasoning Protein-Protein Interaction RULE: given that two genes interact with each other, given certain number of parameters being met, we can assert that the gene products also interact with each other IF (x have_common_pathway y) AND (x rdf:type gene) AND (y rdf:type gene) AND (x has_product m) AND (y has_product n) AND (m rdf:type gene_product) AND (n rdf:type gene_product) THEN (m ? n) has_product gene_product associated_with associated_with has_product gene1 database_identifier 2 interacts_with have_common_pathway gene2 gene_product database_identifier 1 Knowledge Enabled Information and Services Science Scenario 4 • Status: Completed research • Where: NIH • What: queries across integrated data sources – Enriching data with ontologies for integration, querying, and automation – Ontologies beyond vocabularies: the power of relationships Knowledge Enabled Information and Services Science Use data to test hypothesis Link between glycosyltransferase activity and congenital muscular dystrophy? Gene name Interactions Glycosyltransferase GO gene Sequence PubMed OMIM Congenital muscular dystrophy Adapted from: Olivier Bodenreider, presentation at HCLS Workshop, WWW07 Knowledge Enabled Information and Services Science In a Web pages world… (GeneID: 9215) has_associated_disease Congenital muscular dystrophy, type 1D has_molecular_function Acetylglucosaminyltransferase activity Knowledge Enabled and Services Science at HCLS Workshop, WWW07 Adapted from: Information Olivier Bodenreider, presentation With the semantically enhanced data SELECT DISTINCT ?t ?g ?d { ?t is_a GO:0016757 . GO:0016757 ?g has molecular functionglycosyltransferase ?t . ?g has_associated_phenotype ?b2 . isa ?b2 has_textual_description ?d . FILTER (?d, “muscular distrophy”, “i”) . GO:0008194 FILTER (?d, “congenital”,GO:0016758 “i”) } GO:0008375 acetylglucosaminyltransferase GO:0008375 acetylglucosaminyltransferase MIM:608840 Muscular dystrophy, congenital, type 1D has_molecular_function LARGE EG:9215 has_associated_phenotype From medinfo paper. Adapted from: Olivier Bodenreider, presentation at HCLS Workshop, WWW07 Knowledge Enabled Information and Services Science Scenario 5 • Status: Research prototype and in progress • Workflow withSemantic Annotation of Experimental Data already in use • Where: UGA • What: – Knowledge driven query formulation – Semantic Problem Solving Environment (PSE) for Trypanosoma cruzi (Chagas Disease) Knowledge Enabled Information and Services Science Knowledge driven query formulation Complex queries can also include: - on-the-fly Web services execution to retrieve additional data - inference rules to make implicit knowledge explicit Knowledge Enabled Information and Services Science T.Cruzi PSE Query Interface Figure 4: Semantic annotation of ms scientific data Knowledge Enabled Information and Services Science N-Glycosylation Process (NGP) Cell Culture extract Glycoprotein Fraction proteolysis Glycopeptides Fraction 1 n Separation technique I Glycopeptides Fraction n PNGase Peptide Fraction Separation technique II n*m Peptide Fraction Mass spectrometry ms data ms/ms data Data reduction ms peaklist Data reduction ms/ms peaklist binning Glycopeptide identification and quantification N-dimensional array Peptide identification Peptide list Data correlation Knowledge Enabled Information and Services Science Signal integration Semantic Web Process to incorporate provenance Agent Biological Sample Analysis by MS/MS O Semantic Annotation Applications Agent Raw Data to Standard Format I Data Preprocess O Raw Data Agent I Standard Format Data (Mascot/ Sequest) O Filtered Data Agent DB Search I Search Results O Final Output Storage Biological Information Knowledge Enabled Information and Services Science Results Postprocess (ProValt) I O ProPreO: Ontology-mediated provenance parent ion charge 830.9570 194.9604 2 580.2985 0.3592 parent ion m/z 688.3214 0.2526 779.4759 38.4939 784.3607 21.7736 1543.7476 1.3822 fragment ion m/z 1544.7595 2.9977 1562.8113 37.4790 1660.7776 476.5043 parent ion abundance fragment ion abundance ms/ms peaklist data Mass Spectrometry (MS) Data Knowledge Enabled Information and Services Science ProPreO: Ontology-mediated provenance <ms-ms_peak_list> <parameter instrument=“micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer” mode=“ms-ms”/> <parent_ion m-z=“830.9570” abundance=“194.9604” z=“2”/> <fragment_ion m-z=“580.2985” abundance=“0.3592”/> <fragment_ion m-z=“688.3214” abundance=“0.2526”/> <fragment_ion m-z=“779.4759” abundance=“38.4939”/> Ontological <fragment_ion m-z=“784.3607” abundance=“21.7736”/> Concepts <fragment_ion m-z=“1543.7476” abundance=“1.3822”/> <fragment_ion m-z=“1544.7595” abundance=“2.9977”/> <fragment_ion m-z=“1562.8113” abundance=“37.4790”/> <fragment_ion m-z=“1660.7776” abundance=“476.5043”/> </ms-ms_peak_list> Semantically Annotated MS Data Knowledge Enabled Information and Services Science Scenario 6 • When: Research in progress • Where: Athens Heart Center and Cincinatti Children’s Hospital Medical Center • What: scientific literature mining – Dealing with unstructured information – Extracting knowledge from text – Complex entity recognition – Relationship extraction Knowledge Enabled Information and Services Science Heart Failure Clinical Pathway Disease causes Angiotension Receptor Blocker (ARB) Ontology: A Framework for Schema-Driven Relationship Discovery from Unstructured Text, Ramakrishnan, et. al., ISWC 2006, LNCS 4273, pp. 583-596 Knowledge Enabled Information and Services Science Contextual delivery of information Knowledge Enabled Information and Services Science • Two technical challenges – Text mining – Workflow adaptation Knowledge Enabled Information and Services Science Extracting the Relationship Diabetes mellitus adversely affects the outcomes in patients with myocardial infarction (MI), due in part to the exacerbation of left ventricular (LV) remodeling. Although angiotensin II type 1 receptor blocker (ARB) has been demonstrated to be effective in the treatment of heart failure, information about the potential benefits of ARB on advanced LV failure associated with diabetes is lacking. To induce diabetes, male mice were injected intraperitoneally with streptozotocin (200 mg/kg). At 2 weeks, anterior MI was created by ligating the left coronary artery. These animals received treatment with olmesartan (0.1 mg/kg/day; n = 50) or vehicle (n = 51) for 4 weeks. Diabetes worsened the survival and exaggerated echocardiographic LV dilatation and dysfunction in MI. Treatment of diabetic MI mice with olmesartan significantly improved the survival rate (42% versus 27%, P < 0.05) without affecting blood glucose, arterial blood pressure, or infarct size. It also attenuated LV dysfunction in diabetic MI. Likewise, olmesartan attenuated myocyte hypertrophy, interstitial fibrosis, and the number of apoptotic cells in the noninfarcted LV from diabetic MI. Post-MI LV remodeling and failure in diabetes were ameliorated by ARB, providing further evidence that angiotensin II plays a pivotal role in the exacerbated heart failure after diabetic MI. ARB causes heart failure Angiotensin II type 1 receptor blocker attenuates exacerbated left ventricular remodeling and failure in diabetes-associated myocardial infarction., Matsusaka H, et. al. Knowledge Enabled Information and Services Science Problem – Extracting relationships between MeSH terms from PubMed Biologically active substance UMLS Semantic Network complicates affects causes causes Lipid affects Disease or Syndrome instance_of instance_of ??????? Fish Oils Raynaud’s Disease MeSH 9284 documents 5 documents Knowledge Enabled Information and Services Science 4733 documents PubMed Background knowledge used • UMLS – A high level schema of the biomedical domain – 136 classes and 49 relationships – Synonyms of all relationship – using variant lookup (tools from NLM) – 49 relationship + their synonyms = ~350 mostly verbs • MeSH – 22,000+ topics organized as a forest of 16 trees – Used to query PubMed • PubMed T147—effect T147—induce T147—etiology T147—cause T147—effecting T147—induced – Over 16 million abstract – Abstracts annotated with one or more MeSH terms Knowledge Enabled Information and Services Science Method – Parse Sentences in PubMed SS-Tagger (University of Tokyo) SS-Parser (University of Tokyo) • Entities (MeSH terms) in sentences occur in modified forms • “adenomatous” modifies “hyperplasia” (TOP (S (NP (NP (DT An) (JJ excessive) (ADJP (JJ endogenous) (CC or) (JJ • “An excessive endogenous or exogenous modifies exogenous) ) (NN stimulation) ) (PP (IN by) (NPstimulation” (NN estrogen) ) ) ) (VP (VBZ “estrogen” induces) (NP (NP (JJ adenomatous) (NN hyperplasia) ) (PP (IN of) (NP (DT • Entities can also occur) as of 2 or more other entities the) (NN endometrium) ) ) composites ))) • “adenomatous hyperplasia” and “endometrium” occur as “adenomatous hyperplasia of the endometrium” Knowledge Enabled Information and Services Science Method – Identify entities and Relationships in Parse Tree Modifiers Modified entities Composite Entities TOP S VP NP VBZ PP NP DT the JJ excessive JJ endogenous IN by ADJP NP induces NN estrogen NP NN stimulation JJ adenomatous CC or PP NN hyperplasia IN of NP JJ exogenous DT the Knowledge Enabled Information and Services Science NN endometrium • What can we do with the extracted knowledge? • Semantic browser demo Knowledge Enabled Information and Services Science Evaluating hypotheses Migraine affects Magnesium Stress inhibit Patient isa Calcium Channel Blockers Complex Query Keyword query: Migraine[MH] + Magnesium[MH] PubMed Supporting Document sets retrieved Knowledge Enabled Information and Services Science Workflow Adaptation: Why and How • Volatile nature of execution environments – May have an impact on multiple activities/ tasks in the workflow • HF Pathway – New information about diseases, drugs becomes available – Affects treatment plans, drug-drug interactions • Need to incorporate the new knowledge into execution – capture the constraints and relationships between different tasks activities Knowledge Enabled Information and Services Science Workflow Adaptation Why? New knowledge about treatment found during the execution of the pathway New knowledge about drugs, drug drug interactions Knowledge Enabled Information and Services Science Workflow Adaptation: How • Decision theoretic approaches – Markov Decision Processes • Given the state S of the workflow when an event E occurs – What is the optimal path to a goal state G – Greedy approaches rely on local optimization • Need to choose actions based on optimality across the entire horizon, not just the current best action – Model the horizon and use MDP to find the best path to a goal state Knowledge Enabled Information and Services Science Conclusion • semantic web technologies can help with: – Fusion of data: semi-structured, structured, experimental, literature, multimedia – Analysis and mining of data, extraction, annotation, capture provenance of data through annotation, workflows with SWS – Querying of data at different levels of granularity, complex queries, knowledge-driven query interface – Perform inference across data sets Knowledge Enabled Information and Services Science Take home points • Shift of paradigm: from browsing to querying • Machine understanding: – extracting knowledge from text – Inference, software interoperation • Semantic-enabled interfaces towards hypothesis validation Knowledge Enabled Information and Services Science References 1. 2. 3. 4. 5. 6. • A. Sheth, S. Agrawal, J. Lathem, N. Oldham, H. Wingate, P. Yadav, and K. Gallagher, Active Semantic Electronic Medical Record, Intl Semantic Web Conference, 2006. Satya Sahoo, Olivier Bodenreider, Kelly Zeng, and Amit Sheth, An Experiment in Integrating Large Biomedical Knowledge Resources with RDF: Application to Associating Genotype and Phenotype Information WWW2007 HCLS Workshop, May 2007. Satya S. Sahoo, Kelly Zeng, Olivier Bodenreider, and Amit Sheth, From "Glycosyltransferase to Congenital Muscular Dystrophy: Integrating Knowledge from NCBI Entrez Gene and the Gene Ontology, Amsterdam: IOS, August 2007, PMID: 17911917, pp. 1260-4 Satya S. Sahoo, Olivier Bodenreider, Joni L. Rutter, Karen J. Skinner , Amit P. Sheth, An ontology-driven semantic mash-up of gene and biological pathway information: Application to the domain of nicotine dependence, submitted, 2007. Cartic Ramakrishnan, Krzysztof J. Kochut, and Amit Sheth, "A Framework for Schema-Driven Relationship Discovery from Unstructured Text", Intl Semantic Web Conference, 2006, pp. 583596 Satya S. Sahoo, Christopher Thomas, Amit Sheth, William S. York, and Samir Tartir, "Knowledge Modeling and Its Application in Life Sciences: A Tale of Two Ontologies", 15th International World Wide Web Conference (WWW2006), Edinburgh, Scotland, May 23-26, 2006. Demos at: http://knoesis.wright.edu/library/demos/ Knowledge Enabled Information and Services Science