TEXT S1: CYTOPLASMIC RIBOSOMAL PROTEINS

Martin Helmkampf and Jürgen Gadau
School of Life Sciences, Arizona State University, Tempe, AZ 85287, United States of America

Ribosomes are vital components of the translational machinery that directs protein synthesis in all cells. In eukaryotes, ribosomes residing in the cytoplasm are composed of a large (60S) and a small (40S) ribosomal subunit, which together comprise about 80 cytoplasmic ribosomal proteins (CRPs) and four RNA species (rRNA) [1]. Although the central processes of protein translation are catalyzed by rRNA, CRPs fulfill many important roles in ribosome biogenesis, stability and molecular interaction. These include facilitating rRNA folding, protecting the rRNA from nuclease degradation, tethering mRNA during translation, and serving as a binding platform for translation factors. CRPs also link the ribosome to cellular signaling pathways, thus permitting the regulation of translation levels and possibly localized ribosome recruitment [2]. Many CRPs perform extra-ribosomal functions as well, for instance in DNA repair, transcription regulation and apoptosis [3].

It has been hypothesized on the basis of this multifunctionality that ribosomal proteins were co-opted from a pre-existing set of proteins during the transformation of the ribosome from an RNA-only complex to a ribonucleoprotein particle [4]. Homology between a substantial number of eubacterial, archaeal and eukaryotic ribosomal protein genes further suggests that this conversion occurred before the divergence of these ancient lineages. Due to their universally essential role, the basic functional and structural features of these genes have since been preserved. In eukaryotes, the number and sequence of CRPs are thus highly conserved, although they can be encoded by a variable number of genes.

Gene models coding for Atta cephalotes CRPs were identified by performing BLAST searches against the official gene set v1 (OGS1.0) produced by MAKER. CRP sequences of Drosophila melanogaster, taken from FlyBase (http://flybase.org), served as query sequences. The resulting gene models were inspected and, if necessary, edited using the annotation editor Apollo [5]. Care was taken to ensure that the predicted gene structures matched the corresponding transcriptomic data. Gene models were also aligned to homologous protein sequences from D. melanogaster, Apis mellifera (obtained from the Ribosomal Protein Gene Database, http://ribosome.med.miyazaki-u.ac.jp), Pogonomyrmex barbatus and Linepithema humile (both unpublished) using MAFFT v6 ([6], default parameters) to monitor the integrity of the reading frame and the extent of the predicted coding domains. Gene homology relations were inferred by querying the annotated D. melanogaster proteins deposited at FlyBase with the translated gene models. Best reciprocal BLAST hits were interpreted as orthologs [7]. Pseudogenized gene copies were identified by searching the A. cephalotes genome assembly with the tblastn program, using the D. melanogaster CRP sequences as queries, with the low-complexity filter disabled and the e-value cutoff set to 10⁻⁴. The number of CRP genes in Nasonia vitripennis was determined by the same strategy. Identity scores between protein sequence pairs were computed with bl2seq, part of the BLAST software package. Nomenclature of the CRP genes follows Wool et al. [4] and Marygold et al. [8].

In total, we identified 89 genes in the A. cephalotes genome that encode the full complement of 79 CRPs traditionally recognized in animal genomes. While the majority of CRPs are represented by single genes, eight are encoded by gene duplicates (RpL11, RpL14, RpS2, RpS3, RpS7, RpS13, RpS19, RpS28), and one by a gene triplicate (RpL22).
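
As a quick consistency check on these figures, the gene total can be tallied from the copy numbers listed above. The following is only an illustrative sketch: the function name `count_crp_genes` and the variable names are hypothetical, and it assumes that every CRP not listed as duplicated or triplicated is encoded by exactly one gene.

```python
# Copy numbers as stated in the text. Assumption: all CRPs not named
# below are single-copy.
TOTAL_CRPS = 79
DUPLICATED = ["RpL11", "RpL14", "RpS2", "RpS3", "RpS7", "RpS13", "RpS19", "RpS28"]
TRIPLICATED = ["RpL22"]


def count_crp_genes(total_crps, duplicated, triplicated):
    """Return the number of genes needed to encode `total_crps` proteins."""
    single_copy = total_crps - len(duplicated) - len(triplicated)
    return single_copy + 2 * len(duplicated) + 3 * len(triplicated)


gene_total = count_crp_genes(TOTAL_CRPS, DUPLICATED, TRIPLICATED)
print(gene_total)  # 70 single-copy + 16 + 3 = 89
```

The tally agrees with the 89 genes reported for the 79 CRPs; RACK1 and the CRP-like genes are counted separately in the text.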
With the exception of RpL14a/b, all multi-copy genes display identical intron-exon structures and high sequence similarity (95 % on average) between paralogues, suggesting a recent evolutionary origin. This interpretation is supported by the fact that the homologous genes are single-copy in A. mellifera and N. vitripennis. In addition, a recent newcomer to the list of ribosomal protein genes, the receptor of activated C kinase (RACK1) [9], has been identified. Of the two CRP-like genes present in all eukaryotes [8], only RpL24-like could be found, while RpLP0-like seems to have been lost. The corresponding proteins are presumably not associated with ribosome function, and might not be as essential as proper CRPs (indeed, loss of CRP-like genes has been reported before, e.g. in Rattus norvegicus). In contrast to other genomes, neither additional CRP-like genes (characterized by low sequence similarity to the reference gene) nor processed pseudogenes were discovered [10]. As in other eukaryotes, RpL40, RpS27A and RpS30 precursors are C-terminally fused to ubiquitin or a ubiquitin-like protein. All genes mentioned above are supported by EST data, testifying to the high expression levels expected from CRP genes. It is conceivable, however, that only one of multiple functional gene copies is transcribed at a high level, as is generally assumed to be the case in animals [8,10]. Overall, the CRP gene inventory of A. cephalotes is highly similar to that of other insects, both with regard to gene number (88, 80 and 79 in D. melanogaster, A. mellifera and N. vitripennis, respectively) and sequence similarity (78 % identity to D. melanogaster at the protein level, range 52–100 %).

References

1. Taylor DJ, Devkota B, Huang AD, Topf M, Narayanan E, et al. (2009) Comprehensive Molecular Structure of the Eukaryotic Ribosome. Structure (London, England: 1993) 17: 1591.
2. Brodersen DE, Nissen P (2005) The social life of ribosomal proteins. FEBS Journal 272: 2098.
3. Warner JR, McIntosh KB (2009) How Common Are Extraribosomal Functions of Ribosomal Proteins? Molecular Cell 34: 3.
4. Wool I, Chan Y, Gluck A (1995) Structure and evolution of mammalian ribosomal proteins. Biochemistry and Cell Biology 73: 933–947.
5. Lewis SE, Searle SMJ, Harris N, Gibson M, Iyer V, et al. (2002) Apollo: a sequence annotation editor. Genome Biology 3: research0082.1–0082.14.
6. Katoh K, Misawa K, Kuma Ki, Miyata T (2002) MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research 30: 3059.
7. Wall DP, Fraser HB, Hirsh AE (2003) Detecting putative orthologs. Bioinformatics 19: 1710.
8. Marygold S, Roote J, Reuter G, Lambertsson A, Ashburner M, et al. (2007) The ribosomal protein genes and Minute loci of Drosophila melanogaster. Genome Biology 8: R216.
9. Sengupta J, Nilsson J, Gursky R, Spahn CMT, Nissen P, et al. (2004) Identification of the versatile scaffold protein RACK1 on the eukaryotic ribosome by cryo-EM. Nature Structural & Molecular Biology 11: 957.
10. Zhang Z, Harrison P, Gerstein M (2002) Identification and Analysis of Over 2000 Ribosomal Protein Pseudogenes in the Human Genome. Genome Research 12: 1466.