Bioinformatic and statistical tools for biomedical research
Data analysis with Ingenuity Pathways
Alex Sánchez
Unitat d’Estadística i Bioinformàtica
07/07/2010

“We are drowning in information and starved for knowledge.” (John Naisbitt)
“Who on efficient work is bent, / Must choose the fittest instrument.” (Goethe, Faust)

Outline
• Beyond microarrays…
• Ingenuity Pathways Analysis
  – Overview
  – Components
  – Types of studies
• Usage examples
  – Exploring and searching for information
  – Data analysis

Beyond microarrays…
A microarray experiment... yields lists of selected identifiers (genes, miRNAs, …)

So where do we go from here? Or, how to drive a biologist crazy?
• gi|84939483  • gi|44308356
• gi|39893845  • gi|50021874
• gi|27394934  • gi|10003001
• gi|18890092  • gi|27762947
• gi|10192893  • gi|24537303
• gi|11243007  • gi|27284958
• gi|20119252  • gi|37373499
• gi|19748300  • …
(Ted Slater, Proteomics Center of Emphasis, Pfizer Global R&D, Michigan)

From lists to biology
• Traditional approach to analyzing gene lists: one gene at a time
  – Literature, databases, ...
• Problem:
  – A slow, tedious task and, worse still...
  – It ignores possible interactions
• Alternative approach: functional analysis, or “biological significance” analysis.

Functional analysis methods
• They are automatic methods for
  – Identifying biological processes associated with the experimental results.
  – Determining functional themes common to groups of selected genes.
  – Analyzing the connections between genes, molecules and diseases by automatically mining the literature for associations relevant to the experimental results.
• They make it easier to use auxiliary information.
• They help to understand the underlying biological phenomena.

Functional analysis tools
• Dozens of programs released over the last 10 years (http://estbioinfo.stat.ub.es/resources/index.html)
• Direct study of the gene lists
  – Based on GO or other databases (KEGG, ...)
    • FatiGO, DAVID, GSEA, Babelomics ... [SerbGO]
    • Ingenuity Pathways Analysis
• Exploring relationships in the literature
  – PubMed, Scopus, HighWire, GOPubMed, …
  – Ingenuity Pathways Analysis
• Study of the pathways associated with the lists
  – Pathway Explorer, GenMAPP, ...
  – Ingenuity Pathways Analysis

Courses and materials
• CNIO
  – 4th Course on Functional Analysis of Gene Expression
• Canadian Bioinformatics Workshop
  – Interpreting gene lists from omics sets
• EADGENE and SABRE
  – Post-analyses Workshop

Examples of functional analysis

Example 1
• “The Polycomb group protein EZH2 is involved in progression of prostate cancer” (Nature, 419 (10) 624-629)
  – Varambally et al. (2002) study the differences between localized prostate cancer (PCA) and metastatic prostate cancer (MET)
    • EZH2 is overexpressed in MET
    • PCA cases with high EZH2 have a worse prognosis
  – They suggest that EZH2 may
    • Be involved in the progression from PCA to MET
    • Distinguish indolent PCA from PCA with a poor prognosis.

Example 1 (continued)
• Microarray analysis
  – Lists of up-regulated (55) and down-regulated (438) genes
• A functional analysis lets us study
  – Which biological processes (pathways) are related to the genes in the lists
    • Annotation databases
  – Which functions appear in the lists with a frequency different from their frequency among all the genes studied
    • Enrichment analysis
  – The tools available in Babelomics are a good option for this analysis.

Example 2 – From genes to pathways
Genes are grouped by function
Functions are associated with pathways
Expression changes are projected onto the pathway

Introduction: “The Ingenuity View”

Ingenuity Pathways Analysis
• Ingenuity Pathways Analysis (IPA) is an all-in-one software application that enables researchers to model, analyze, and understand the complex biological and chemical systems at the core of life science research.

IPA challenge: integrate, interpret and gain therapeutic insight from experimental data
[Diagram. Levels: disease processes, cellular processes, molecules, experimental platforms. Labels: disease/physiological response; overlapping cellular processes/pathways; molecular interactions; molecular perturbation. Example entities: cancer, apoptosis, angiogenesis, FAS, VEGFA, bevacizumab.]

IPA challenge: gain rapid understanding of experimental systems
[Diagram. Same levels and example entities. Workflow: search for genes implicated in disease → identify related cellular processes/pathways → generate hypotheses → guide in vivo/in vitro assays.]

Ingenuity platform: the Ingenuity Knowledge Base
Content
• Findings manually extracted from full text
• Extensive libraries of metabolic and signaling pathways
• Chemical and drug information
• Scalable, best-in-class content acquisition processes
Ontology
• Designed to enable computation
• Consists of biological objects and processes organized into major branches
• Robust, up-to-date synonym library
• Knowledge infrastructure tools and processes for structuring biological and chemical knowledge

Ingenuity Knowledge Base: content
Expert extraction (full text from top journals):
• Coverage of peer-reviewed journals, plus review articles and textbooks
• Manually extracted by Ph.D. scientists
Imported annotations and findings:
• OMIM, GO, Entrez Gene
• Tissue and fluid expression location
• Molecular interactions (e.g. BIND, DIP, TarBase)
Internally curated knowledge:
• Signaling and metabolic pathways
• Drug/target/disease relationships
• Toxicity lists
All findings are structured for computation and updated regularly.

How they work together

Types of analysis

Questions and answers

Installation, access and use

Installation and setup
• IPA runs online.
  – Nothing is installed; it is simply accessed.
• An account is required to use it
  – Trial account (15 days).
  – Access for IRHUVH and HVH users by prior booking with the UEB, in morning or afternoon sessions.
• It runs on Windows or Mac, but not on Linux.

System requirements

Access

Ways of starting IPA

The Ingenuity environment: screens, menus, help

Quick start screen

Project manager

Search bar

Help (1): help systems

Help (2): tutorials

Help (3): workflows

Training program

Basic capabilities of the program: search, analysis, communication
Search & Explore

Search and explore biological and chemical knowledge

Gene View / Chem View

Dynamic signaling and metabolic pathways

My Pathways & Lists
• Build custom libraries of pathways representing mechanisms of action and mechanisms of toxicity. Create custom, literature-supported signaling pathways with proteins of interest. Store collections of custom pathways and lists for subsequent core, IPA-Tox™, IPA-Biomarker™ or IPA-Metabolomics™ analyses.
• Use the Grow and Connect tools to edit and expand networks based on the molecular relationships most relevant to the project:
  – Transcriptional networks
  – Phosphorylation cascades
  – Protein-protein or protein-DNA interaction networks
  – microRNA-mRNA target networks
  – Chemical effects on proteins
• Use search results as building blocks for custom pathways
  – Identify cross-talk between biological processes and pathways
  – Understand whether gene lists and signatures are tightly connected at the molecular level

Path Explorer

Path Designer

Analyze & Interpret data
• IPA Core Analysis
• IPA Tox Analysis
• IPA Biomarker Analysis
• IPA Metabolomic Analysis

IPA Core Analysis

IPA-Biomarker™ Analysis
• IPA-Biomarker identifies the most promising and relevant biomarker candidates within experimental data.
  – Prioritizes molecular biomarker candidates based on key biological characteristics.
  – Elucidates mechanisms linking potential markers to a disease or biological process of interest.
  – Performs analysis across biomarker lists to find candidates unique to a disease stage or common across all stages.
  – Helps understand the molecular differences between patient populations.

IPA-Tox Analysis
• IPA-Tox delivers a focused toxicity and safety assessment of candidate compounds.
  – Enables assessment of the toxicity and safety of compounds early in the development process.
  – Provides expert interpretation of molecular toxicology data to non-expert users.
  – Reveals clinical pathology endpoints associated with a dataset.
  – Generates new hypotheses that may not have been revealed by traditional toxicology approaches.
  – Elucidates mechanisms of toxicity and identifies potential markers of toxicity.

IPA-Metabolomics Analysis
• IPA-Metabolomics extracts rich pathway information from metabolomics data.
  – Overcomes the metabolomics data analysis challenge by integrating transcriptomics, proteomics and metabolomics data to enable a complete systems biology approach.
  – Provides the critical context needed to gain insight into cell physiology and metabolism from metabolite data.

Communicate & Collaborate
• Share
• Report
• Interactive pathways
• Integrate with other software

Summary and recap

Summary
• Functional analysis improves the understanding of biological phenomena by studying groups of genes simultaneously rather than one at a time.
• Ingenuity Pathways lets you
  – Explore
  – Analyze
  – Communicate and share

Advantages and drawbacks
Advantages:
• Intuitive and easy to use
• Integrates all functions in one tool
• Very powerful for human data and for cancer
Drawbacks:
• Less powerful for other species or diseases
• Not free: it requires a paid license
• Does not include powerful advanced algorithms such as GSEA

Networks and Pathways in IPA

Networks
• A network is a set of terms (“nodes”) connected by a set of relations (“edges”).
• IPA transforms a list of genes into a set of relevant networks based on information maintained in the Ingenuity Pathways Knowledge Base (IPKB).
• This knowledge base has been abstracted into a large network, called the Global Molecular Network, composed of thousands of genes and gene products that interact with each other.

A network in IPA

Networks in IPA
• Purpose:
  – To show as many interactions as possible between the user-specified molecules in a given dataset, and how they might work together at the molecular level
• Why are Ingenuity networks biologically interesting?
  – Highly interconnected networks are likely to represent significant biological function

Key terminology
• Focus molecules:
  – Molecules from the uploaded list that pass the applied filters and are available for generating networks
• Networks:
  – Are generated de novo from the input data
  – Have no directionality
  – Contain molecules involved in a variety of canonical pathways
• Canonical pathways (signaling and metabolic):
  – Are generated before any data are input, based on the literature
  – Do NOT change when data are input
  – Do have directionality (they proceed “from A to Z”)
• My Pathways and Path Designer pathways:
  – Custom pathways created manually from user input

Viewing networks

How networks are generated
1. Focus molecules are used as “seeds”.
2. The focus molecules with the most interactions with other focus molecules are then connected together to form a network.
3. Non-focus molecules from the dataset are then added.
4. Molecules from Ingenuity’s Knowledge Base are added.
5. The resulting networks are scored and then sorted by score.
Each network is limited to 35 molecules for visualization purposes.

Calculation of the network score in IPA
• Based on the right-tailed Fisher's exact test
• Used to rank networks so that those with the most focus molecules appear at the top of the list
• Takes into account the number of focus molecules in the network and the size of the network
• Not an indication of the quality or biological significance of the network

Network notation (1) (Help legend)

Network notation (2)

Significance calculations
• Measure the likelihood that a function is over-represented among the molecules in your dataset
• Expressed as a p-value calculated with the right-tailed Fisher's exact test
• The reported range runs from the most significant to the least significant low-level function

Multiple testing correction
• Benjamini-Hochberg method of multiple testing correction
• Calculates the false discovery rate (FDR)
  – The threshold indicates the expected fraction of false positives among the significant functions
  – On a scale from 0 to 1.0, a threshold of 0.05 means that 5% (1 in 20) of the significant functions may be false positives

Which p-value calculation should I use?
• What is the significance of function X relative to the dataset?
  – Use the right-tailed Fisher's exact test result
• What is the significance of function X relative to all the other functions in the dataset?
  – Use the Benjamini-Hochberg multiple testing correction
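The network generation steps described above can be illustrated with a toy greedy procedure. This is a minimal sketch under stated assumptions, not Ingenuity's actual (proprietary) algorithm: the interaction list `EDGES`, the helper names `build_graph` and `grow_network`, the "most links into the current network" growth rule and the alphabetical tie-breaking are all hypothetical. Only the ordering (focus molecules first, then other dataset molecules, then knowledge-base molecules), the lack of directionality and the 35-molecule limit come from the slides. Scoring (step 5) is left out here.

```python
# Hypothetical toy stand-in for the Global Molecular Network (undirected edges).
EDGES = [
    ("EZH2", "SUZ12"), ("EZH2", "EED"), ("SUZ12", "EED"),
    ("EZH2", "HDAC1"), ("HDAC1", "RB1"), ("RB1", "E2F1"),
    ("E2F1", "EZH2"), ("E2F1", "CCNE1"),
]

MAX_SIZE = 35  # molecules per network, the visualization limit quoted in the slides


def build_graph(edges):
    """Adjacency sets for an undirected graph (IPA networks have no direction)."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def grow_network(graph, focus, dataset, max_size=MAX_SIZE):
    """Hypothetical greedy sketch of network growth (not IPA's real algorithm)."""
    nodes = set(graph)
    focus = set(focus) & nodes
    dataset = set(dataset)
    if not focus:
        return set()
    # Steps 1-2: seed with the focus molecule that has most links to other focus molecules.
    seed = max(sorted(focus), key=lambda m: len(graph[m] & focus))
    network = {seed}
    # Steps 2-4: prefer focus molecules, then other dataset molecules, then KB molecules.
    tiers = (focus, dataset - focus, nodes - dataset)
    while len(network) < max_size:
        frontier = {nb for node in network for nb in graph[node]} - network
        for tier in tiers:
            candidates = sorted(frontier & tier)
            if candidates:
                # Add the candidate most connected to the current network (ties: alphabetical).
                network.add(max(candidates, key=lambda m: len(graph[m] & network)))
                break
        else:
            break  # no connected molecule left to add
    return network


if __name__ == "__main__":
    g = build_graph(EDGES)
    print(sorted(grow_network(g, focus={"EZH2"}, dataset={"EZH2", "RB1"}, max_size=3)))
```

Checking the higher-priority tiers first at every step means a knowledge-base molecule is only added when no connected focus or dataset molecule is available, which keeps the sketch biased towards networks dense in user data.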
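Both the network score and the significance of a function rest on the right-tailed Fisher's exact test which, for a 2×2 table, equals the upper tail of a hypergeometric distribution: the probability of seeing at least k annotated molecules in the dataset by chance. The sketch below computes that tail with the Python standard library only. The function names are hypothetical, the choice of reference set N is an assumption (IPA defines its own), and turning p into a score as -log10(p) is shown as a common convention, not as a documented IPA formula.

```python
from math import comb, log10


def right_tailed_fisher(k, n, K, N):
    """Right-tailed Fisher's exact test p-value: P(X >= k), X ~ Hypergeometric.

    N: molecules in the reference set (assumed universe)
    K: molecules in the reference set annotated to the function
    n: molecules in the user's dataset (e.g. focus molecules)
    k: dataset molecules annotated to the function
    """
    if not (0 <= k <= min(n, K) and n <= N and K <= N):
        raise ValueError("inconsistent counts")
    # Sum the hypergeometric probabilities of k, k+1, ..., min(n, K) overlaps.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


def minus_log10(p):
    """Convert a p-value into a score: smaller p-values give larger scores."""
    return -log10(p)


if __name__ == "__main__":
    # Hypothetical counts: 10 reference molecules, 3 annotated to the function,
    # 4 molecules in the dataset, 2 of which carry the annotation.
    p = right_tailed_fisher(k=2, n=4, K=3, N=10)
    print(f"p = {p:.4f}, score = {minus_log10(p):.2f}")
```

For real analyses, `scipy.stats.fisher_exact` with `alternative="greater"` gives the same one-sided p-value from the corresponding 2×2 table.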
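The Benjamini-Hochberg correction mentioned in the slides can be sketched as follows: sort the m p-values, scale the i-th smallest by m/i, then walk from the largest down taking running minima so that adjusted values never decrease with rank. This is the textbook step-up procedure, not IPA's internal code; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical right-tailed Fisher p-values for four functions.
    pvals = [0.01, 0.04, 0.03, 0.20]
    qvals = benjamini_hochberg(pvals)
    print([round(q, 4) for q in qvals], [q <= 0.05 for q in qvals])
```

With an FDR threshold of 0.05, only the first of the four hypothetical functions stays significant, even though three of the raw p-values are below 0.05.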