Improving PPI Networks with Correlated Gene Expression Data

Improving PPI Networks with Correlated Gene Expression Data Jesse Walsh Background • PPI networks are currently derived either computationally or experimentally • It is well known that there are a great number of false positives and false negatives in computationally derived networks • High-throughput – Two major published yeast PPI experiments showed only ≈150 similar interactions out of thousands [1] Goal: Improve the Data Quality • The goal of this study is to improve the quality of computationally predicted protein-protein interactions • Hypothesis: – Proteins that interact may also have similar expression patterns – Gene coexpression is correlated to PPIs Previous Work • Deane et al. (2001) [2] – Proposed EPR metric: use gene expression profiles to assess the quality of computationally predicted protein-protein interactions Figure adapted from Deane et al. [2] Glimpse at the Data Interaction Data: DIP dataset statistics Genome Size Interactions Proteins E. Coli (2008) 4.6 million bp 7447 1863 Yeast (2001) -- 8063 4150 Yeast (2008) 12.5 million bp 18440 4943 DIP (Database of Interacting Proteins) http://dip.doe-mbi.ucla.edu/dip/Main.cgi [3] Affymetrix Expression Data: M3D (Many Microbe Microarrays Database) Number of Experiments: 466 Number of Chips: 907 Genes: 4298 M3D (Many Microbe Microarrays Database) http://m3d.bu.edu/cgi-bin/web/array/index.pl?section=home [4] Expression Data Selection • Concerned about complications from adding to many expression conditions – Knockouts, over-expression, foreign genes • Selected a group of 20 conditions that were published as a part of the same experiment – Hope for more homogenous data – Allen et al. [5] Method Gene Expression Distance • Expression Distance given by: • Summed over 20 conditions – Treated the first condition (wild-type anaerobic) as the reference condition for lack of a true control Expression Distance equation from Deane et al. [2] Method DIP Data • DIP labels as ‘core’ or ‘non-core’ – Corresponds roughly to small scale experiments and high-throughput experiments • 3 Interaction Datasets – Core • Core interactions – Non • Non-core interactions – Rand • 100,000 random interactions were created Method Mapping • DIP interaction set used uniprot protein identifiers, while M3D used gene ids • Ran a blast of protein sequences against translated E. coli genome to map the datasets together • Lost most of my data on this step Number of Interactions Mapping Available CORE 991 220 NON 5999 903 RAND 100,000 100,000 Results Density distribution of squared distances Bin size = .05 Results from Deane et al. [2] Bin size = 1.25 Figure adapted from Deane et al. [2] Results from Deane et al. [2] Least Squares Factorization Figure adapted from Deane et al. [2] Discussion • Mapping/Demerging problem – Kept 1044 of my 6991 interactions (15%) • Case study P22885 – Obsolete since 2005 – Demerged to P0A8P6 and P0A8P7 • Tyrosine recombinase xerC • All three have a perfect ClustalW match Discussion • Shape of curve – Multimers and proteins that link to themselves in the PPI network (the zeros problem) – 66.6% of Core, 43.6% of Non, <0.1% of Rand Conclusion • Cannot predict novel interactions • Cannot assign confidence values to individual interactions • Can provide some measure of the overall quality of a PPI dataset Thank You • • • • • • References: [1] Deeds EJ, Ashenberg O, Shakhnovich EI. “A simple physical model for scaling in protein-protein interaction networks.” Proc. Natl Acad. Sci. USA (2006) 103:311–316 [2] Charlotte M. Deane et al. “Protein Interactions: Two Methods for Assessment of the Reliability of High Throughput Observations.” Molecular & Cellular Proteomics 1.5 349-356 [3] Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D (2004) The Database of Interacting Proteins: 2004 update. NAR 32 Database issue:D449-51 [4] Faith JJ, Driscoll ME, Fusaro VA, Cosgrove EJ, Hayete B, Juhn FS, Schneider SJ, and Gardner TS. Many Microbe Microarrays Database: uniformly normalized Affymetrix compendia with structured experimental metadata. Nucleic Acids Research [5] Timothy e. Allen et al. “Genome-Scale Analysis of the Uses of the Escherichia coli Genome: Model-Driven Analysis of Heterogeneous Data Sets.” J Bacteriol. 2003 November, 185(21): 63926399

Improving PPI Networks with Correlated Gene Expression Data

Related documents

Products

Support

Improving PPI Networks with Correlated Gene Expression Data

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib