Principal Investigator/Program Director (Last, first, middle): Gerstein, Mark B

b. Studies and Results

* Genomic surveys of membrane proteins and membrane protein motifs: Liu et al., Genome Biology (2002) and Zhang et al., JMB (2002). [Related to Aim 1]

During this year, we completed two surveys of membrane proteins and membrane protein motifs. In the first, we grouped membrane proteins into families and compared their relative abundance across a number of genomes. We also examined the abundance of several motifs, in particular GXXXG. In the second paper, we extended the motif work, looking at the occurrence of protein motifs not only in known proteins but also in the intergenic regions of genomes. We called the motifs found there "pseudomotifs" (in the spirit of pseudogenes) and compared four eukaryotic genomes in terms of their occurrence.

* Helix Packing Calculations: Tsai & Gerstein, Bioinformatics (2002) [Related to Aim 2]

This year we continued our work on 3D helix packing calculations. We performed a detailed sensitivity analysis of our calculations and created a database of the relevant parameters, which is available through the web. This sensitivity analysis was invaluable in helping us understand which parameters were most useful in our packing calculations.

* Helix Interaction Motifs: Schneider et al., FEBS (2002) [Related to Aim 2]

We carried out a composition analysis of membrane proteins, focusing on the implications of composition for helix-helix interactions.

* Development of Integrative Database Systems: Lin et al., NAR (2002) and Mateos et al., Genome Research (2002) [Related to Aim 3]

This year we set up a new integrated resource, GeneCensus.org, which follows on from last year's system, PartsList.org. GeneCensus takes a more sequence-based and less structural view of genome comparisons, focusing on expression data, pathway activities, and protein interactions.
It has an extensive section devoted to the occurrence of transmembrane helix interaction motifs in different genomes. In a second paper, we collaborated with IBM researchers to develop a neural network system for integrating many different types of genomic information. This system was used to predict various aspects of protein function and protein class.

c. Significance

* Our survey of membrane proteins highlights the importance and overrepresentation of specific motifs.
* Our various parameter sets for packing calculations and our detailed sensitivity analysis will be generally useful for helix packing calculations.
* Our GeneCensus website represents new ways of organizing "poly-genomic" information.

PHS 398/2590 (Rev. 05/01) Page _______ Continuation Format Page

d. Plans

Next year, we plan to continue with the project as outlined in the original proposal. In particular:

* Ursula Lehnert will continue to work on the project and, as before, will carry on with the packing calculations in collaboration with Neil Voss, Julian Graham, and Nat Echols (individuals not funded by the project). We expect to complete a significant number of membrane helix packing and dynamics calculations.
* We plan to continue work on database integration. We hope to combine the PartsList and GeneCensus systems using a smart-linking technology.
* We plan to continue our inventory of membrane proteins, looking in more detail at the occurrence of particular domain combinations.
* Finally, we plan to continue surveying the occurrence of membrane proteins and membrane-associated proteins in intergenic regions, looking at the fossil history of these proteins. In particular, we are keen to survey the pseudogenes associated with cytochrome c.

e. Publications

Calculations of protein volumes: sensitivity analysis and parameter database.
J Tsai, M Gerstein (2002) Bioinformatics 18: 985-95.

Thermostability of membrane protein helix-helix interaction elucidated by statistical analysis.
D Schneider, Y Liu, M Gerstein, DM Engelman (2002) FEBS Lett 532: 231-6.

Genomic analysis of membrane protein families: abundance and conserved motifs.
Y Liu, DM Engelman, M Gerstein (2002) Genome Biol 3: research0054.

Digging deep for ancient relics: a survey of protein motifs in the intergenic sequences of four eukaryotic genomes.
ZL Zhang, PM Harrison, M Gerstein (2002) J Mol Biol 323: 811-22.

GeneCensus: genome comparisons in terms of metabolic pathway activity and protein family sharing.
J Lin, J Qian, D Greenbaum, P Bertone, R Das, N Echols, A Senes, B Stenger, M Gerstein (2002) Nucleic Acids Res 30: 4574-82.

Systematic learning of gene functional classes from DNA array expression data by using multilayer perceptrons.
A Mateos, J Dopazo, R Jansen, Y Tu, M Gerstein, G Stolovitzky (2002) Genome Res 12: 1703-15.

f. Project-generated Resources

This past year, we created a number of new web resources for this grant, in particular GeneCensus.org. This adds to our existing sites, http://bioinfo.mbb.yale.edu and http://www.partslist.org. These sites make available:

- all of our transmembrane helix identifications and surveys of helix interaction motifs
- the expression levels of membrane proteins with known structure
- the parameters from our packing calculations

We also have a page devoted exclusively to this grant: http://thorin.csb.yale.edu/papers/grant/mem