Harnessing the Semantic Web to Answer Scientific Questions: A Health Care and Life Sciences Interest Group demo Susie Stephens, Principal Research Scientist, Lilly Agenda • Health Care and Life Sciences Interest Group • Scientific Use Case • Technological Approach • Demonstration • Benefits of the Semantic web Health Care & Life Sciences Interest Group HCLSIG is chartered to develop and support the use of Semantic Web technologies and practices to improve collaboration, research and development, and innovation adoption in the Health Care and Life Science domains More details on HCLS are available at: http://www.w3.org/2001/sw/hcls/ Benefits of Semantic Web Technologies • Fusion of data across many scientific disciplines • Easier recombination of data • Querying of data at different levels of granularity • Capture provenance of data through annotation • Perform inference across data sets • Machine processable approach • Data can be assessed for inconsistencies Scientific Use Case • Use case focuses on Alzheimer’s Disease • AD is a devastating illness that impacts 26.6 million people worldwide • Prevalence is predicted to quadruple to 106.8 million by 2050 • Many different types of evidence need to be integrated • An active Web community exists for AD research Scientific Data Sets • Integration and analysis of heterogeneous data sets • Hypothesis, Genome, Pathways, Molecular Properties, Disease, etc. PDSPki Gene Ontology NeuronDB Reactome BAMS Antibodies NC Annotations Entrez Gene Allen Brain Atlas BrainPharm MESH Mammalian Phenotype SWAN AlzGene PubChem Homologene Publications Scientific Hypothesis & Research Questions Scientific Hypothesis: - Amyloid beta peptide may impair memory by inhibiting long-term potentiation (LTP) Research Questions: - By what mechanism does amyloid beta inhibit LTP? - Can we identify a novel therapeutic target based on this mechanism? - How can we validate the therapeutic target? Technological Approach • Careful modeling that reflect biology to enable integration of data sources • All bio-entities were assigned URIs • Most data translated to RDF and managed in a triple store • Other data maintained in original store and mapped to RDF • Using a reasoner to infer triples to increase expressiveness of queries • Query data with SPARQL and visualization tools Conclusions • Semantic Web provides ability to query across many disparate data sources to discover new insights • Potential to identify patterns and insights across many data sources • Data needs to be carefully modeled • Flexible re-use of data, which is important in a discipline where knowledge is frequently updated Acknowledgements HCLS Demo Contributors HCLS Demo Contributors • John Barkley (NIST) • Susie Stephens (Eli Lilly) • Olivier Bodenreider (NLM, NIH) • Mike Travers ( • Bill Bug (Drexel University College of Medicine) • Gwen Wong (SWAN) • Huajun Chen (Zhejiang University) • Elizabeth Wu (SWAN) • Paolo Ciccarese (SWAN) • Kei Cheung (SenseLab, Yale) Data Providers •Tim Clark (SWAN) • Judith Blake (MGD.) • Don Doherty (Brainstage Research Inc.) • Mikail Bota (BAMS) • Kerstin Forsberg (AstraZeneca) • David Hill (MGD) • Ray Hookaway (HP) • Oliver Hoffman (CL) • Vipul Kashyap (Partners Healthcare) • Minna Lehvaslaiho (CL) • June Kinoshita (AlzForum) • Colin Knep (Alzforum) • Joanne Luciano (Harvard Medical School) • Maryanne Martone (CCDB) • Scott Marshall (University of Amsterdam) • Susan McClatchy (MGD) • Chris Mungall (NCBO) • Simon Twigger (RGD) • Eric Neumann (Teranode) • Allen Brain Institute • Eric Prud’hommeaux (W3C) • Jonathan Rees (Science Commons) • Alan Ruttenberg (Science Commons) • Matthias Samwald (Medical University of Vienna) Vendor Support • OpenLink - Kingsley Idehen, Ivan Mikhailov, Orri Erling, Mitko Iliev • HP - Ray Hookaway, Jeannine Crockford NeuronDB PDSPki GO Reactome Genes/proteins Interactions Cellular location Processes (GO) Molecular function Cell components Biological process Annotation gene PubMedID Antibodies Genes Antibodies NC Annotations Genes/Proteins Processes Cells (maybe) PubMed ID PubMedID Hypothesis Questions Evidence Genes SWAN Proteins Chemicals Neurotransmitters Entrez Gene Genes Protein GO pubmedID Interaction (g/p) Chromosome C. location Allen Brain Atlas Genes Brain images Gross anatomy -> neuroanatomy BAMS Protein Neuroanatomy Cells Metabolites (channels) PubmedID MESH Drugs Anatomy Phenotypes Compounds Chemicals PubMedID PubChem Genes Phenotypes Disease PubMedID Genes Mammalian Species Phenotype Gene Orthologies Polymorphism Proofs Population Alz Diagnosis Homologene AlzGene Protein (channels/receptors) Neurotransmitters Neuroanatomy Cell Compartments Currents BrainPharm Drug Drug effect Pathological agent Phenotype Receptors Channels Cell types pubMedID Disease Name Structure Properties Mesh term PubChem