Divining Systems Biology Knowledge from
High-throughput Experiments Using EGAN
Jesse Paquette
ISMB 2010
Biostatistics and Computational Biology Core
Helen Diller Family Comprehensive Cancer Center
University of California, San Francisco
(AKA BCBC HDFCCC UCSF)
High-throughput experiments
• This talk applies to
– Expression microarrays
– aCGH
– SNP/CNV arrays
– MS/MS Proteomics
– DNA methylation
– ChIP-Seq
– RNA-Seq
– In-silico experiments
• If parts of the output can be mapped to gene IDs
– You can use EGAN
What do you hope to accomplish?
Collect data
Process data
Differential analysis
Publish!
Clusters and/or gene lists
Produce insight about the underlying biology
New papers!
New testable hypotheses
Drug targets!
New grants!
Leverage organic intelligence
Clusters and/or gene lists
Summarize
Visualize
Produce insight about the underlying biology
Contextualize
New testable hypotheses
Producing insight from clusters and gene lists
• Summarize: find enriched pathways (and other gene sets)
– Hypergeometric over-representation
• DAVID
– Global trends
• GSEA
• Visualize: gene relationships in a graph
– Protein-protein interactions
• Cytoscape
– Network module discovery
• Ingenuity IPA
– Literature co-occurrence
• PubGene
• Contextualize: pertinent literature
• PubMed
• Google
• iHOP
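The hypergeometric over-representation test listed above asks whether a gene set shares more genes with a cluster or gene list than chance would predict. With N background genes, K of them in the gene set, a list of n genes and an overlap of k, the one-sided p-value is P(X ≥ k) = Σ_{i=k}^{min(n,K)} C(K,i)·C(N−K,n−i) / C(N,n). A minimal Python sketch of that calculation follows; the function name and example counts are hypothetical, and this is not the code used by EGAN or DAVID.

```python
from math import comb


def hypergeom_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """One-sided over-representation p-value, P(X >= k).

    X ~ Hypergeometric: draw n genes without replacement from N.
      N: genes in the background (all genes measured on the platform)
      K: background genes that belong to the gene set
      n: genes in the cluster or gene list
      k: genes in the list that are also in the gene set
    """
    if not (0 <= K <= N and 0 <= k <= n <= N):
        raise ValueError("inconsistent counts")
    # Sum the upper tail: overlaps of k, k+1, ..., min(n, K).
    # math.comb returns 0 when the lower argument exceeds the upper one.
    tail = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1))
    # Exact integer arithmetic until this final division
    return tail / comb(N, n)


if __name__ == "__main__":
    # Hypothetical counts: 20,000 background genes, a 100-gene pathway,
    # and a 200-gene cluster of which 10 genes fall in the pathway.
    print(hypergeom_enrichment_p(20000, 100, 200, 10))
```

The choice of background N matters: using every gene in the genome rather than only the genes measured on the platform can inflate apparent enrichment.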
EGAN: Exploratory Gene Association Networks
• Methods: state-of-the-art analysis of clusters and gene lists
– Hypergeometric enrichment of gene sets
– Global statistical trends of gene sets
– Hypergraph visualization (via Cytoscape libraries)
– Literature identification
– Network module discovery
• User Interface: responds quickly to new queries from the biologist
– Sandbox-style functionality
– Dynamic adjustment of p-value cutoffs
– Point-and-click interface
– All data in-memory for immediate access
– Links to external websites
• Modular: integrates as a flexible plug-and-play cog
– All data is customizable
– Proprietary data can be restricted to the client location
– Java runs on almost every OS (PC, Mac, Linux)
– Can be configured and launched from a different application (e.g. GenePattern)
– Analyses can be scripted for automation
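"Global statistical trends of gene sets" (the GSEA-style approach) look at where a set's members fall across the entire ranked gene list instead of thresholding it into a cluster. The sketch below computes an unweighted running-sum enrichment score, the Kolmogorov-Smirnov-like statistic GSEA reduces to when its weighting exponent is 0. It is an illustration under assumptions: it omits the permutation test GSEA uses to assess significance, assumes gene names in the ranking are unique, and whether EGAN computes exactly this statistic is not stated here. The function name is hypothetical.

```python
from __future__ import annotations


def enrichment_score(ranked_genes: list[str], gene_set: set[str]) -> float:
    """Unweighted running-sum enrichment score (Kolmogorov-Smirnov style).

    ranked_genes: every measured gene (unique names), ordered e.g. from most
                  up-regulated to most down-regulated
    gene_set:     members of the gene set being tested
    Returns the running sum's maximum deviation from zero, keeping its sign:
    positive means the set is concentrated at the top of the ranking.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_total, n_hit = len(hits), sum(hits)
    if n_hit == 0 or n_hit == n_total:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    step_up = 1.0 / n_hit                # added at each gene-set member
    step_down = 1.0 / (n_total - n_hit)  # subtracted at each non-member
    running = best = 0.0
    for is_hit in hits:
        running += step_up if is_hit else -step_down
        if abs(running) > abs(best):
            best = running
    return best


if __name__ == "__main__":
    # Toy ranking and gene set, for illustration only
    ranking = ["G1", "G2", "G3", "G4", "G5", "G6"]
    print(enrichment_score(ranking, {"G1", "G3"}))
```

Because the two step sizes are normalized, the running sum always returns to zero at the end of the list, so the score depends only on where the set's members sit in the ranking, not on any p-value cutoff.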
Gene sets
• A gene set is a set of semantically related genes
– e.g. Wnt signaling pathway
• EGAN contains a database of gene sets
– > 100k gene sets by default
• KEGG, Reactome, NCI-Nature, Gene Ontology, MeSH, Conserved Domain, Cytoband, miRNA targets
– You can easily add your own
• Simple file format
• Download from MSigDB (Broad Institute)
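EGAN's own gene-set file format is described above only as "simple". MSigDB distributes gene sets as GMT files: one set per line, tab-separated, with the set name, a description, then the member genes. Assuming that layout, a loader could look like the sketch below; `read_gmt` and the example file name are hypothetical, not part of EGAN.

```python
from __future__ import annotations

import io
from typing import Iterable


def read_gmt(lines: Iterable[str]) -> dict[str, set[str]]:
    """Parse GMT text: tab-separated name, description, then gene symbols."""
    gene_sets: dict[str, set[str]] = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank lines and lines with no genes
        name, _description, *genes = fields
        # drop empty fields left by trailing tabs; a repeated name overwrites
        gene_sets[name] = {g for g in genes if g}
    return gene_sets


if __name__ == "__main__":
    # Inline example; with a real MSigDB download, pass an open file instead:
    #   with open("some_collection.gmt") as fh: sets = read_gmt(fh)
    demo = io.StringIO("MY_SET\thypothetical example\tWNT1\tCTNNB1\tAXIN1\n")
    print(read_gmt(demo))
```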
Gene-gene relationships
• EGAN also contains
– Protein-protein interactions (PPI)
– Literature co-occurrence
– Chromosomal adjacency
– Kinase-target relationships
• Other possibilities
– Sequence homology
– Expression correlation
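The gene-gene relationships above can be treated as edges of a graph, and a cluster or gene list as a set of nodes in it. The slides do not describe EGAN's network module discovery algorithm, so the sketch below substitutes the simplest possible stand-in: connected components of the subgraph induced by the gene list. The function name is hypothetical, and the edges in the usage example are toy data, not entries from EGAN's database.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable


def network_modules(edges: Iterable[tuple[str, str]],
                    gene_list: Iterable[str],
                    min_size: int = 2) -> list[set[str]]:
    """Connected components of the subgraph induced by gene_list."""
    genes = set(gene_list)
    adjacency: dict[str, set[str]] = defaultdict(set)
    for a, b in edges:
        # keep only relationships with both endpoints in the list
        if a in genes and b in genes and a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen: set[str] = set()
    modules: list[set[str]] = []
    for start in sorted(genes):
        if start in seen:
            continue
        # iterative depth-first search collects one connected component
        component: set[str] = set()
        stack = [start]
        while stack:
            gene = stack.pop()
            if gene in component:
                continue
            component.add(gene)
            stack.extend(adjacency[gene] - component)
        seen |= component
        if len(component) >= min_size:
            modules.append(component)
    # largest modules first
    return sorted(modules, key=len, reverse=True)


if __name__ == "__main__":
    toy_edges = [("EGFR", "GRB2"), ("GRB2", "SOS1"),
                 ("ESR1", "ANXA1"), ("TP53", "MDM2")]
    toy_list = ["EGFR", "GRB2", "SOS1", "ESR1", "ANXA1", "MYC"]
    print(network_modules(toy_edges, toy_list))
```

In the toy example MYC drops out because it has no relationship to another list gene, and the TP53-MDM2 edge is ignored because neither gene is in the list. Dedicated module-discovery methods go further, for example by scoring subnetworks with the genes' statistics rather than taking every connected piece.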
Example with microarray and aCGH results
• Mirzoeva et al. (2009) Cancer Research
– UCSF-LBL collaboration
– Analysis of breast cancer cell lines
• Basal vs. luminal
• Discoveries in this presentation
– miRNA regulator of subtype (mir-200)
– Annexin (ANXA1) as potential regulator of ER, glucocorticoid and EGFR signaling
Gene list: higher expression in basal cell lines
Gene set/pathway enrichment
Importing gene lists from publications
Combining expression with aCGH
Finding network modules
Where to find EGAN
• Website
– http://akt.ucsf.edu/EGAN/
• 2010 paper in Bioinformatics
– http://www.ncbi.nlm.nih.gov/pubmed/19933825
Acknowledgements
• BCBC HDFCCC UCSF
– Taku Tokuyasu
– Adam Olshen
– Ritu Roy
– Ajay Jain
• LBNL
– Debopriya Das
– Joe Gray
• Funding
– UCSF Cancer Center Support Grant
• UCSF
– Early adopters
• Ingrid Revet
• Antoine Snijders
• Stephan Gysin
• Sook Wah Yee
• Joachim Silber
– Cytoscape gurus
• David Quigley
• Scooter Morris
– OTM
• David Eramian
• Ha Nguyen
– Laura van ’t Veer
– Donna Albertson
– Graeme Hodgson