a. Specific Aims

The aims for this year were:

* Our first aim was to develop methods to inventory all the transmembrane (TM) proteins in the recently sequenced microbial genomes.
* Our second aim was to look at protein-protein interactions among helical membrane proteins from a database perspective. We wanted to put the TM-helix oligomerization motifs found in the genetic screens by the Beckwith and Engelman groups (e.g. GXXXG) into context by comparing them to the helix-helix interfaces in the database of known structures, both in the many soluble proteins and in the few TM ones.
* Our final aim was to build a comprehensive database that integrates the information on the occurrence and interaction of membrane proteins generated in the first two aims with further information, e.g. related to expression.

b. Studies and Results

* Gerstein et al. Pac. Symp. (2000). We surveyed the occurrence of membrane proteins in the worm genome.
* Drawid & Gerstein, JMB (2000) and Drawid et al. TIG (2000). Previously, using our integrated database system, we were able to connect the prediction of transmembrane helices in yeast with a number of datasets giving measurements of whole-genome expression levels. This had the notable result that membrane proteins are expressed at a considerably lower level than soluble proteins (by ~22%) and that certain broad groups of membrane proteins are expressed more highly than others; e.g. 4-TMs are expressed at a higher level than 2-TMs. Last year we extended this analysis to fully predict subcellular localization based on expression level in combination with traditional sequence patterns.
* Qian et al. (2001). This paper presents the PartsList system (available from partslist.org). This is one component of the integrated database presented in the grant.

c. Significance

We believe our analysis linking protein structure to gene-expression levels was a new type of study that can potentially highlight the overall structural characteristics of highly expressed proteins. It is also useful in practice, since it helps predict the subcellular localization of proteins. Our survey of membrane proteins in the worm genome highlighted the large number of 7TM proteins in that genome.

d. Plans

We plan to continue with the project as outlined in the original proposal. In particular, we hope to implement a more sophisticated transmembrane identification program and to extend the analysis of expression data from yeast to human. We are very enthusiastic about the integration with the expression data. We hope to use these data as a novel way to predict membrane localization and protein subcellular localization. We will continue building an integrated project database on top of the PartsList system.

e. Publications

M Gerstein, J Lin, H Hegyi. "Protein folds in the worm genome." Pac Symp Biocomput 4: 30-41 (2000).

A Drawid, R Jansen, M Gerstein. "Genome-wide analysis relating expression level with protein subcellular localization." Trends Genet 16: 426-30 (2000).

J Qian, B Stenger, C Wilson, J Lin, R Jansen, W Krebs, V Alexandrov, N Echols, S Teichmann, J Park, M Gerstein. "PartsList: a web-based system for dynamically ranking protein folds based on disparate attributes, including whole-genome expression and interaction information." Nucleic Acids Res 29: 1750-64 (2001).

A Drawid, M Gerstein. "A Bayesian system integrating expression data with sequence patterns for localizing proteins: comprehensive application to the yeast genome." J Mol Biol 301: 1059-75 (2000).

f. Project-generated Resources

From our websites, http://bioinfo.mbb.yale.edu and http://www.partslist.org, we make available:

- all of our transmembrane helix identifications
- the expression level of membrane proteins with known structure