Integrating Literature and Experimental Data

Fan Meng, Ph.D.
Microarray Laboratory
Psychiatry Department and Molecular & Behavioral Neuroscience Institute
University of Michigan

High Throughput Data Analysis Overview
• Integrative Exploration → Hypothesis (freewheeling, glamorous)
• System → Pathway/Network/Gene Set
• Molecular → Gene/Transcript/SNP/Genome
• Raw Data: Expression/Genotype/Sequence (rigid, dull)

MGREP Concept Mapping Engine
Concepts → Remove Common Words → Single Word Variation → Combine with Word Order Permutation → Radix-tree Match
Figure 1. Overview of our free text-to-ontology mapping method

Key Idea: While classical concept matching algorithms take the time-consuming approach of generating concept variations during matching, mgrep pre-generates concept variations and uses highly efficient string matching algorithms, achieving a speed increase of two orders of magnitude over MetaMap.

Evaluation of MGREP by NCBO
Precision of Mgrep and MetaMap using the 'diseases' dictionary

Data Source        Mgrep    MetaMap
Clinical Trials    0.87     0.71
Gold Miner         0.73     0.548
GEO                0.88     0.755
MedLine            0.23     0.091

Shah NH, Bhatia N, Jonquet C, Rubin D, Chiang AP, Musen MA (2009) Comparison of concept recognizers for building the Open Biomedical Annotator. BMC Bioinformatics. 2009 Sep 17;10 Suppl 9:S14.
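The pre-generation idea behind the MGREP concept mapping engine can be illustrated with a minimal Python sketch. It is not the mgrep implementation: the stop-word list, the naive plural rule, and the names concept_variations, build_trie, and annotate are all assumptions made for illustration, and a plain token-keyed trie stands in for the radix tree named in Figure 1. The point it shows is that all variation work happens once, when the dictionary is built, so matching a text is only a tree walk.

```python
import re
from itertools import permutations

# Assumption: a tiny illustrative stop-word list (the real list is not given).
COMMON_WORDS = {"of", "the", "and", "in", "a", "an"}
END = None  # trie key marking "a concept ends here"; tokens are always str


def tokenize(text):
    """Lowercase, split on non-alphanumerics, and drop common words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower())
            if t not in COMMON_WORDS]


def word_variants(word):
    """Naive single-word variation: add a crude singular or plural form."""
    if word.endswith("s") and len(word) > 3:
        return {word, word[:-1]}
    return {word, word + "s"}


def concept_variations(name, max_words=4):
    """Pre-generate every word-order permutation of every word variant."""
    words = tokenize(name)
    if not words:
        return set()
    if len(words) > max_words:  # cap the factorial blow-up for long names
        return {tuple(words)}
    results = set()
    for order in permutations(words):
        combos = [()]
        for w in order:
            combos = [c + (v,) for c in combos for v in word_variants(w)]
        results.update(combos)
    return results


def build_trie(concepts):
    """Store all variations of {concept_id: name} in a nested-dict trie."""
    root = {}
    for cid, name in concepts.items():
        for variation in concept_variations(name):
            node = root
            for token in variation:
                node = node.setdefault(token, {})
            node[END] = cid
    return root


def annotate(text, trie):
    """Walk the trie from every token position and collect concept hits."""
    tokens = tokenize(text)
    hits = []
    for start in range(len(tokens)):
        node = trie
        for end in range(start, len(tokens)):
            node = node.get(tokens[end])
            if node is None:
                break
            if END in node:
                hits.append((" ".join(tokens[start:end + 1]), node[END]))
    return hits
```

With the made-up dictionary {"D1": "cancer of the lung", "D2": "schizophrenia"}, annotate("Lung cancers and schizophrenia were studied.", trie) returns [("lung cancers", "D1"), ("schizophrenia", "D2")]. Because common words are also dropped from the input text, this sketch would match "lung and cancer" as well, a simplification a production matcher would need to handle more carefully.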
MGREP in NCBO Annotator Web Service

PubAnatomy
• Integrate Medline literature with external data
• Enable efficient visual query
• Open architecture

Linking Literature and Experimental Data
• Mapping Medline to brain structures
• Integrating multiple data sets
  – Gene expression from the Allen Brain Atlas
  – Brain structure relationship from NeuroName
  – Protein-protein interaction from MiMI
• Graphic presentation of data
  – Allen Brain Atlas
  – Protein-protein interaction network
  – Gene Co-expression network

PubAnatomy Architecture
• Visualization components: Flex
• Server-side web services: algorithms and graphics
• Backend database: Oracle

[Architecture diagram: three layers labeled Visualization Components, Server-Side Web Services, and Backend Database. The PubAnatomy UI integrates literature (BioNLP) and user selection with internal services (algorithm I1, algorithm I2; service I1, service I2; dataset I1, dataset I2, …) and, through an open API, with user plug-ins (algorithm U1, algorithm U2; plugin U1, plugin U2; dataset U1, dataset U2, …).]

PubAnatomy Interface

PubAnatomy
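The open architecture outlined on the PubAnatomy Architecture slide, where internal services and user plug-ins sit behind one API, might be sketched as a simple registry. This is only an illustration under stated assumptions: ServiceRegistry, its methods, and the two toy services are hypothetical names, not part of PubAnatomy, and the real system exposes Flex components and server-side web services rather than in-process Python callables.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical service signature: takes the brain structures selected in the
# UI and returns a result keyed by structure name.
Service = Callable[[List[str]], Dict[str, object]]


class ServiceRegistry:
    """Keeps internal services and user plug-ins behind one lookup API."""

    def __init__(self) -> None:
        self._services: Dict[str, Service] = {}
        self._origin: Dict[str, str] = {}

    def register(self, name: str, service: Service,
                 origin: str = "internal") -> None:
        """Add a service; origin is 'internal' or 'user' in this sketch."""
        if name in self._services:
            raise ValueError(f"service already registered: {name}")
        self._services[name] = service
        self._origin[name] = origin

    def names(self, origin: Optional[str] = None) -> List[str]:
        """List registered service names, optionally filtered by origin."""
        return sorted(n for n, o in self._origin.items()
                      if origin is None or o == origin)

    def run(self, name: str, structures: List[str]) -> Dict[str, object]:
        """Dispatch the user's selection to the named service."""
        return self._services[name](structures)


# Toy services (made up): one internal, one contributed as a user plug-in.
def name_length(structures: List[str]) -> Dict[str, object]:
    return {s: len(s) for s in structures}


def upper_label(structures: List[str]) -> Dict[str, object]:
    return {s: s.upper() for s in structures}


registry = ServiceRegistry()
registry.register("name_length", name_length)
registry.register("upper_label", upper_label, origin="user")
```

Because both kinds of service share one call signature, the UI layer can call registry.run("upper_label", ["hippocampus"]) without knowing whether the code behind it is internal or a plug-in.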