Bioinformatic and statistical tools for
biomedical research
Data analysis with Ingenuity Pathways
Alex Sánchez
Unitat d’Estadística i Bioinformàtica
We are drowning in information and
starved for knowledge
John Naisbitt
Who on efficient work is bent,
Must choose the fittest instrument.
Goethe (Faust)
07/07/2010
Presentation outline
• Beyond microarrays…
• Ingenuity Pathways Analysis
– Overview
– Components
– Types of studies
• Usage examples
– Exploring and searching for information
– Data analysis
Alex Sánchez. Unitat d'Estadística i Bioinformatica
Beyond microarrays…
A microarray experiment...
Lists of selected identifiers (genes, miRNAs, …)
So where do we go from here?
Or, how to drive a biologist crazy?
• gi|84939483
• gi|39893845
• gi|27394934
• gi|18890092
• gi|10192893
• gi|11243007
• gi|20119252
• gi|19748300
• gi|44308356
• gi|50021874
• gi|10003001
• gi|27762947
• gi|24537303
• gi|27284958
• gi|37373499
• …
Ted Slater
Proteomics Center of Emphasis
Pfizer Global R&D Michigan
From lists to biology
• Traditional approach to analysing gene
lists: one gene at a time
– Literature, databases, ...
• Problem:
– Slow, tedious and, worse still...
– Ignores possible interactions
• Alternative approach: Functional Analysis, or
“Biological Significance” analysis.
Functional Analysis methods
• They are automated methods to
– Identify biological processes associated with the
experimental results.
– Determine the functional themes shared by groups
of selected genes.
– Analyze the connections between genes, molecules and
diseases by automatically mining the literature
for associations relevant to the experimental
results.
• They make it easier to use auxiliary information.
• They help us understand the underlying
biological phenomena.
Functional Analysis tools
• Dozens of programs over the last 10 years
http://estbioinfo.stat.ub.es/resources/index.html
• Direct study of gene lists
– Based on GO or other databases (KEGG, ...)
• fatiGO, DAVID, GSEA, Babelomics ... [SerbGO]
• Ingenuity Pathways Analysis
• Exploring relationships in the literature
– PubMed, Scopus, HighWire, GOPubMed, …
– Ingenuity Pathways Analysis
• Study of pathways associated with the lists
– Pathway Explorer, GenMAPP
– Ingenuity Pathways Analysis
Courses and materials
• CNIO
– 4th Course on Functional Analysis of Gene
Expression
• Canadian Bioinformatics Workshop
– Interpreting gene lists from omics sets
• EADGENE and SABRE
– Post-analyses Workshop
Functional Analysis examples
Example 1
• The Polycomb group protein EZH2 is
involved in progression of prostate
cancer (Nature, 419 (10) 624-629)
– Varambally et al. (2002) study the
differences between localized (PCA) and
metastatic (MET) prostate cancer
• EZH2 is overexpressed in MET
• PCA cases with high EZH2 have a worse prognosis
– They suggest that EZH2 may
• Be involved in the PCA → MET progression
• Distinguish benign PCA from PCA with a poor prognosis.
Example 1
• Microarray analysis
– Lists of up-regulated (55) and down-regulated (438) genes
• A functional analysis lets us study
– Which biological processes (pathways) are
related to the genes in the lists
• Annotation databases
– Which functions appear in the lists with
a frequency different from that among all the
genes studied
• Enrichment analysis
– The tools available in Babelomics
are a good option for this analysis.
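One common way to run the enrichment test described above is a right-tailed hypergeometric test, which is equivalent to a one-sided Fisher's exact test on the 2×2 table (in list / not in list × annotated / not annotated). The Python sketch below is illustrative only: it is not the code used by Babelomics, and the function name, parameters and example counts are hypothetical.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """Right-tailed hypergeometric p-value P(X >= k) for one annotation term.

    N: genes in the reference set (all genes studied)
    K: genes in the reference set annotated with the function
    n: genes in the selected list (e.g. the up-regulated genes)
    k: genes in the selected list annotated with the function
    """
    # Sum exact integer counts first and divide once, so P(X >= 0) == 1.0.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)


# Hypothetical counts: 8 of the 55 up-regulated genes carry a term that
# 200 of 10,000 studied genes carry (about 1.1 expected by chance).
print(enrichment_p_value(k=8, n=55, K=200, N=10_000))
```

A small p-value means the function appears in the list more often than its frequency among all studied genes would predict; when many functions are tested, these p-values still need a multiple testing correction.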
Example 2: From genes to pathways
Genes are grouped by function
Functions are associated with pathways
Expression changes are
projected onto the pathway
Introduction
“The Ingenuity View”
Ingenuity Pathways Analysis
• Ingenuity Pathways Analysis (IPA) is an all-in-one
software application that enables researchers to model,
analyze, and understand the complex biological and
chemical systems at the core of life science research
IPA Challenge
Integrate, Interpret, Gain
Therapeutic Insight from Experimental Data
[Figure: four layers (Experimental Platforms, Molecules, Cellular Processes, Disease Processes) linked by molecular perturbation, molecular interactions, overlapping cellular processes/pathways and disease/physiological response; example nodes: bevacizumab, VEGFA, FAS, Angiogenesis, Apoptosis, Cancer.]
IPA Challenge
Gain Rapid Understanding of
Experimental Systems
[Figure: four layers (Experimental Platforms, Molecules, Cellular Processes, Disease Processes) with example nodes Cancer, Apoptosis, FAS, Angiogenesis, VEGFA, bevacizumab; workflow: search for genes implicated in disease, identify related cellular processes/pathways, generate hypotheses, guide in vivo/in vitro assays.]
Ingenuity Platform
Ingenuity Knowledge Base
Content
• Findings manually extracted from
full text
Ontology
• Designed to enable computation
• Extensive libraries of metabolic
and signaling pathways
• Consists of biological objects and
processes organized into major
branches
• Chemical and drug information
• Robust, up-to-date synonym library
• Scalable best-in-class content
acquisition processes
• Knowledge infrastructure tools and
processes for structuring biological
and chemical knowledge
Ingenuity Knowledge Base: Content
Expert Extraction: Full text from top journals
• Coverage of peer-reviewed journals, plus review
articles and textbooks
• Manually extracted by Ph.D. scientists
Import Annotations, Findings:
• OMIM, GO, Entrez Gene
• Tissue and Fluid Expression Location
• Molecular Interactions (e.g. BIND, DIP, TarBase)
Internally curated knowledge:
• Signaling & Metabolic Pathways
• Drug/Target/Disease relationships
• Toxicity Lists
All findings are structured for computation
and updated regularly
How they work together
Types of analysis
Questions and answers
Installation, access and use
Installation and getting started
• IPA runs online.
– Nothing is installed; you simply log in to it.
• You need an account to use it
– Trial (15 days).
– Access (IRHUVH and HVH) by prior booking
with the UEB, in a morning or afternoon
slot.
• It runs on Windows or Mac, but not on
Linux
System requirements
Access
Ways to start IPA
The Ingenuity environment
Screens, menus, help
Quick start screen
Project manager
Search bar
Help (1): Help systems
Help (2): Tutorials
Help (3): Workflows
Training program
Core capabilities of the program
Search, Analysis, Communication
Search & Explore
Search and Explore
Biological & Chemical Knowledge
Gene View / Chem View
Dynamic Signalling & Metabolic Pathways
My Pathway & Lists
• Build custom libraries of pathways representing mechanism of
action and mechanism of toxicity. Create custom, literature-supported signaling pathways with proteins of interest. Store
collections of custom pathways and lists for subsequent core,
IPA-Tox™, IPA-Biomarker™, or IPA-Metabolomics™ analyses.
• Use the Grow and Connect tools to edit and expand networks
based on the molecular relationships most relevant to the
project:
– Transcriptional networks
– Phosphorylation cascades
– Protein-Protein or Protein-DNA interaction networks
– microRNA-mRNA target networks
– Chemical effects on proteins
• Use Search results as building blocks for custom pathways
– Identify cross-talk between biological processes and pathways
– Understand whether gene lists and signatures are tightly
connected at the molecular level
Path Explorer and Path Designer
Analyze & Interpret data
• IPA Core Analysis
• IPA Tox Analysis
• IPA Biomarker Analysis
• IPA Metabolomic Analysis
IPA Core Analysis
IPA-Biomarker™ Analysis
• IPA-Biomarker identifies the most promising
and relevant biomarker candidates within
experimental data.
– Prioritize molecular biomarker candidates based on
key biological characteristics.
– Elucidate mechanisms linking potential markers to a
disease or biological process of interest.
– Perform analysis across biomarker lists to find
biomarker candidates unique to a disease stage or
common across all stages.
– Understand the molecular differences between
patient populations.
IPA-Tox Analysis
• IPA-Tox delivers a focused toxicity and safety
assessment of candidate compounds.
– Enables assessment of the toxicity and safety of
compounds early in the development process.
– Provides expert molecular toxicology data
interpretation to non-expert users.
– Reveals clinical pathology endpoints associated with
a dataset.
– Generates new hypotheses that may not have been
revealed using traditional toxicology approaches.
– Elucidates mechanisms of toxicity and identifies
potential markers of toxicity.
IPA-Metabolomics Analysis
• IPA-Metabolomics extracts rich pathway
information from metabolomics data.
– Overcomes the metabolomics data analysis
challenge by integrating transcriptomics,
proteomics, and metabolomics data to
enable a complete systems biology
approach.
– Provides the critical context necessary to
gain insights into cell physiology and
metabolism from metabolite data.
Communicate & Collaborate
• Share
• Report
• Interactive Pathways
• Integrate with other software
Summary and recap
Summary
• Functional analysis improves our
understanding of biological phenomena
through the simultaneous study of
groups of values.
• Ingenuity Pathways lets you
– Explore
– Analyze
– Communicate and share
Advantages and disadvantages
+ Intuitive and easy to use
+ Integrates all the functions
+ Very powerful for human data and cancer
- Not as powerful for other species or
diseases.
- Not free: it must be paid for
- Does not include powerful advanced
algorithms such as GSEA
Networks and Pathways in IPA
Networks
• A network is a set of terms (“nodes”) related
by a set of relations (“edges”).
• IPA transforms a list of genes into a set of
relevant networks based on information
maintained in the Ingenuity Pathways
Knowledge Base (IPKB)
• This knowledge base has been abstracted into
a large network, called the Global Molecular
Network, composed of thousands of genes and
gene products that interact with each other.
A network in IPA
Networks in IPA
• Purpose:
– To show as many interactions as possible between
user-specified molecules in a given dataset,
and how they might work together at the
molecular level
• Why are Ingenuity networks biologically
interesting?
– Highly-interconnected networks are likely to
represent significant biological function
Key Terminology
• Focus Molecule:
– Molecules from the uploaded list that pass the applied filters
and are available for generating networks
• Networks:
– Generated de novo based upon input data
– Do not have directionality
– Contain molecules involved in a variety of Canonical
Pathways
• Canonical Pathways (Signaling and Metabolic)
– Are generated prior to data input, based on the literature
– Do NOT change upon data input
– Do have directionality (proceed “from A to Z”)
• My Pathways and Path Designer Pathways
– Custom-built pathways created manually from user
input
Viewing networks
How Networks Are Generated
1. Focus molecules are “seeds”
2. Focus molecules with the most
interactions to other focus
molecules are then connected
together to form a network
3. Non-focus molecules from the
dataset are then added
4. Molecules from the Ingenuity KB are
added
5. Resulting networks are scored
and then sorted by score
At most 35 molecules per network, for visualization purposes
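The steps above can be illustrated with a simplified greedy procedure. This is a hypothetical Python sketch, not Ingenuity's proprietary algorithm: the function name, the "most links first" ordering and the rule that knowledge-base molecules must link at least two network members are assumptions made for illustration. Scoring (step 5) is left out.

```python
from collections import defaultdict


def build_network(interactions, focus, dataset=(), max_size=35):
    """Grow one network greedily (hypothetical sketch, not IPA's algorithm).

    interactions: (a, b) pairs from a knowledge base, treated as undirected
    focus: focus molecules; dataset: other molecules from the uploaded data
    """
    graph = defaultdict(set)
    for a, b in interactions:
        graph[a].add(b)
        graph[b].add(a)
    focus = set(focus)

    # Step 1: seed with the focus molecule that has most links to other focus
    # molecules (sorted() makes ties resolve alphabetically, for reproducibility).
    seed = max(sorted(focus), key=lambda m: len(graph[m] & focus))
    network = {seed}

    def grow(pool, min_links=1):
        # Repeatedly add the candidate with most links into the current network.
        while len(network) < max_size:
            candidates = [m for m in pool - network
                          if len(graph[m] & network) >= min_links]
            if not candidates:
                return
            network.add(max(sorted(candidates), key=lambda m: len(graph[m] & network)))

    grow(focus)                     # Step 2: connect other focus molecules
    grow(set(dataset) - focus)      # Step 3: non-focus molecules from the dataset
    grow(set(graph), min_links=2)   # Step 4: knowledge-base "linker" molecules
    return network


kb = [("A", "B"), ("B", "C"), ("C", "X"), ("X", "D"), ("D", "A"), ("E", "F")]
print(sorted(build_network(kb, focus={"A", "B", "C", "D"})))
# → ['A', 'B', 'C', 'D', 'X']
```

In this toy example X is not a focus molecule, but it joins the network because it links two network members (C and D), while the disconnected pair E-F is never reached.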
Calculation of Score for Networks in IPA
• Based on the Right-tailed Fisher's Exact Test
• Used as a means to rank/sort networks so that those with the
most focus molecules are at the top of the list
• Takes into account the number of focus molecules in the
network and the size of the network
• Not an indication of the quality or biological significance of the
network
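As a sketch of how such a score can be computed, the Python below takes the right-tailed Fisher's exact (hypergeometric) p-value of finding at least the observed number of focus molecules in a network of that size drawn at random from the Global Molecular Network, and reports it as -log10(p), so that higher scores rank first. The -log10 transformation, the parameter names and the example counts are assumptions for illustration; this is not IPA's published implementation.

```python
from math import comb, log10


def network_score(focus_in_network: int, network_size: int,
                  focus_total: int, global_size: int) -> float:
    """Hypothetical network score: -log10 of a right-tailed Fisher's exact p-value.

    focus_in_network: focus molecules inside the network
    network_size: molecules in the network (e.g. up to 35)
    focus_total: focus molecules eligible for networks
    global_size: molecules in the global network
    """
    # Count random networks with at least `focus_in_network` focus molecules.
    tail = sum(comb(focus_total, i) * comb(global_size - focus_total, network_size - i)
               for i in range(focus_in_network, min(network_size, focus_total) + 1))
    # log10 of exact integers avoids float underflow for very small p-values.
    return log10(comb(global_size, network_size)) - log10(tail)


# Hypothetical counts: a 35-molecule network containing 15 of 300 focus
# molecules, drawn from a global network of 20,000 molecules.
print(round(network_score(15, 35, 300, 20_000), 1))
```

Consistent with the slide, this score grows with the number of focus molecules and falls as the network or the focus list gets larger, so it works as a ranking device rather than as a measure of biological relevance.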
Network notation (1)
(Help → Legend)
Network notation (2)
Significance Calculations
• Measures the likelihood that a function is over-represented by
the molecules in your dataset
• Expressed as a p-value calculated by using the right-tailed
Fisher's Exact Test
• The range shown runs from the most significant low-level
function to the least significant low-level function
Multiple Testing Correction
• Benjamini-Hochberg method of multiple
testing correction
• Calculates the False Discovery Rate (FDR)
– The threshold sets the expected fraction of false positives
among significant functions
[Figure: FDR threshold scale from 0 to 1.0; at 0.05, 5% (1/20) of the significant functions may be false positives]
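A minimal Python sketch of the Benjamini-Hochberg step-up procedure, which converts a list of raw p-values (for example, one right-tailed Fisher's exact p-value per function) into adjusted values that can be compared directly with an FDR threshold such as 0.05. The function name and the example p-values are hypothetical, and IPA may differ in details such as tie handling.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.2]  # hypothetical p-values for four functions
print([round(q, 4) for q in benjamini_hochberg(raw)])
# → [0.04, 0.0533, 0.0533, 0.2]
```

With an FDR threshold of 0.05, only the first function remains significant after correction, even though three of the four raw p-values are below 0.05.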
Which p-value calculation should I use?
• What is the significance of function X
relative to the dataset?
– Use the right-tailed Fisher's Exact Test result
• What is the significance of function X
relative to all the other functions in the
dataset?
– Use the Benjamini-Hochberg multiple testing
correction