TEXT S9: YELLOW/MAJOR ROYAL JELLY PROTEIN FAMILY

Rick Overson, Martin Helmkampf, and Jürgen Gadau

School of Life Sciences, Arizona State University, Tempe, AZ 85287, United States of America

The yellow/major royal jelly protein family is a rapidly evolving gene family that, curiously, has been found in all insects investigated to date, as well as in some bacterial and fungal species, but in no non-insect metazoan [1]. Yellow genes function in diverse roles in development, locomotion, melanization, immune response, and mating and courtship behavior [2-4]. An expansion of an ancestral gene similar to the extant yellow-e3 has led to the formation of the major royal jelly protein (MRJP) subfamily, which was first detected in Apis mellifera. In this species, members of the MRJP subfamily have taken on a nutritional role (the production of royal jelly), which in turn regulates reproductive division of labor in the colony [1,5]. A similar but apparently independent expansion of major royal jelly protein-like genes has been detected in the genome of the parasitoid wasp Nasonia vitripennis [6].

In the Atta cephalotes genome we detected a total of 21 yellow/MRJP genes, 13 of which are yellow genes and eight of which are similar to Apis mellifera MRJP genes, using an approach described elsewhere (see section on cytoplasmic ribosomal proteins; the MRJP genes of Apis mellifera served as additional reference genes). Of the 13 yellow genes, nine were identified as single-copy orthologs of the yellow genes of either Drosophila melanogaster or Apis mellifera (Acep_Y-b, -c, -e, -e3, -g, -g2, -h, -y and -x2). In contrast, the Apis mellifera and Nasonia vitripennis gene Y-x1 is only partially represented, by three gene models: Acep_Y-x1_frag1 contains the first two thirds of a complete gene (using the Apis mellifera Y-x1 gene as a reference), while Acep_Y-x1_frag2 contains exactly the final third of the gene but is located on a different scaffold.
Acep_Y-x1_frag3, which is located on the same scaffold as Acep_Y-x1_frag2, also contains the 3’ end of a Y-x1 gene. We included only Acep_Y-x1_frag1 in the phylogenetic analysis below, as it is the largest and most complete of the three. Whether this fragmentation is due to repeated duplications and truncations or to genome misassembly remains unknown. Finally, a gene termed Acep_Y-1, without clear homology relations to other insect yellow genes, was also found.

Of the eight MRJP-like genes we detected, three models (Acep_MRJPL1, -2, -3) possess the six exons of a typical complete MRJP or MRJPL gene (Acep_MRJPL3 is missing part of the sixth exon, but this appears to be an artifact of the gene model being cut short by the end of the scaffold). The remaining five MRJP-like genes (Acep_MRJPL4, -5, -6, -7, -8) are presumably pseudogenes, as they have missing exons or large indels, or lack open reading frames.

To understand the evolution of the gene family, we performed a phylogenetic analysis of yellow/MRJP genes across insect taxa. We started with the initial reference set of Drosophila melanogaster yellow genes and retrieved homologous genes from the genomes of Apis mellifera, Nasonia vitripennis and Tribolium castaneum from public archives. A yellow gene from the bacterium Deinococcus radiodurans was included to serve as the outgroup for the analysis. Amino acid sequences of these genes and those from Atta cephalotes (88 genes in total) were aligned with MAFFT v6 using the L-INS-i algorithm [7]. Ambiguously aligned positions were removed using Aliscore v1 [8] with default settings. This resulted in a final dataset containing 254 amino acid positions. The evolutionary model with the best fit to this dataset, LG+G, was determined by ProtTest [9] according to the Akaike Information Criterion corrected for small sample size. Based on this model, a maximum likelihood tree was reconstructed using RAxML v7.2.6 [10].
Nodal support values were obtained by the rapid bootstrap algorithm as implemented in RAxML (500 replicates).

The tree (Fig. 1) reveals twelve gene subfamilies within insect yellow/MRJP genes, most of which are characterized by a one-to-one orthologous relationship among the five focal taxa (Y-b, -c, -e, -g, -h, -y). Y-x1, -g2 and -e3 display expansions in individual taxa, mainly Drosophila melanogaster and Nasonia vitripennis. Gene losses are encountered rarely, with Y-x1 and Y-x2 being restricted to hymenopterans (although the Y-1 to Y-5 genes in Tribolium castaneum might be co-orthologous to the hymenopteran Y-x1 genes), and hymenopterans in turn lack Y-f (according to the tree, the genes named Y-f in Apis mellifera and Nasonia vitripennis are probably misnamed, as they are part of the Y-c clade). Finally, the MRJP subfamily is restricted to Hymenoptera and is characterized by independent expansions in all three represented taxa, as the genes of each species are more closely related to their intraspecific paralogues than to genes in other taxa. Although only three complete MRJP genes could be identified in Atta cephalotes, the existence of five putative pseudogenes indicates that the gene number was originally in the same range as in the other two taxa. It may be that an ancestral gene with the propensity to expand in this manner has done so independently to fulfill varying roles. In the case of Apis mellifera MRJP genes, one of these roles is the production of royal jelly. Their function in ants and parasitoid wasps, however, is unknown.

The phylogenetic position of two orphaned genes, Atta cephalotes Y-1 and Drosophila melanogaster Y-k, could not be determined reliably. While nodal support for many yellow/MRJP subfamilies is strong, only a few inter-clade relationships could be resolved. These include the sister-group relationship between the Y-g and Y-g2 genes and the well-supported monophyly of the Y-b, -c, -f, -y and possibly Y-h genes.
Because this clade contains the originally described Y-y gene of Drosophila melanogaster (reviewed in [11]), we refer to it as the yellow core group.

References

1. Drapeau MD, Albert S, Kucharski R, Prusko C, Maleszka R (2006) Evolution of the Yellow/Major Royal Jelly Protein family and the emergence of social behavior in honey bees. Genome Research 16: 1385.
2. Claycomb JM, Benasutti M, Bosco G, Fenger DD, Orr-Weaver TL (2004) Gene Amplification as a Developmental Strategy: Isolation of Two Developmental Amplicons in Drosophila. Developmental Cell 6: 145.
3. Drapeau MD, Radovic A, Wittkopp PJ, Long AD (2003) A gene necessary for normal male courtship, yellow, acts downstream of fruitless in the Drosophila melanogaster larval brain. Journal of Neurobiology 55: 53.
4. Wittkopp PJ, True JR, Carroll SB (2002) Reciprocal functions of the Drosophila Yellow and Ebony proteins in the development and evolution of pigment patterns. Development 129: 1849.
5. Schmitzová J, Klaudiny J, Albert Š, Schröder W, Schreckengost W, et al. (1998) A family of major royal jelly proteins of the honeybee Apis mellifera L. Cellular and Molecular Life Sciences 54: 1020.
6. Werren JH, Richards S, Desjardins CA, Niehuis O, Gadau J, et al. (2010) Functional and Evolutionary Insights from the Genomes of Three Parasitoid Nasonia Species. Science 327: 343-348.
7. Katoh K, Misawa K, Kuma Ki, Miyata T (2002) MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research 30: 3059.
8. Kuck P, Meusemann K, Dambach J, Thormann B, von Reumont B, et al. (2010) Parametric and non-parametric masking of randomness in sequence alignments can be improved and leads to better resolved trees. Frontiers in Zoology 7: 10.
9. Abascal F, Zardoya R, Posada D (2005) ProtTest: selection of best-fit models of protein evolution. Bioinformatics 21: 2104.
10. Stamatakis A (2006) RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22: 2688.
11. Drapeau MD (2001) The Family of Yellow-Related Drosophila melanogaster Proteins. Biochemical and Biophysical Research Communications 281: 611.

Figure 1. Maximum likelihood tree of the yellow/MRJP genes found in the genomes of Atta cephalotes (Ac, highlighted), Apis mellifera (Am), Nasonia vitripennis (Nv), Drosophila melanogaster (Dm) and Tribolium castaneum (Tc). Support values > 50 based on 500 rapid bootstrap replicates are shown at the nodes of the tree.
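
Illustrative sketch of the alignment and tree-building steps. The text states that sequences were aligned with the MAFFT L-INS-i algorithm and that a maximum likelihood tree with 500 rapid bootstrap replicates was inferred in RAxML under LG+G. The exact command lines are not reported, so the following Python sketch only assembles plausible invocations and should not be read as the authors' actual commands. The file names, run name, random seed and the `raxmlHPC` binary name are assumptions made for illustration. The Aliscore masking and ProtTest model selection steps are omitted, as is the format conversion RAxML may need for its input alignment.

```python
import shlex
from typing import List


def mafft_linsi_cmd(fasta_in: str) -> List[str]:
    """Build a MAFFT command for the L-INS-i strategy.

    L-INS-i corresponds to local pairwise alignment with iterative
    refinement. MAFFT writes the alignment to standard output, so the
    caller has to redirect it to a file.
    """
    return ["mafft", "--localpair", "--maxiterate", "1000", fasta_in]


def raxml_rapid_bootstrap_cmd(
    alignment: str,
    run_name: str,
    replicates: int = 500,
    seed: int = 12345,  # arbitrary seed, not taken from the text
) -> List[str]:
    """Build a RAxML command for a rapid bootstrap plus ML tree search.

    "-f a" requests rapid bootstrapping and a best-scoring ML tree in
    one run; PROTGAMMALG is the RAxML name for the LG matrix with
    gamma-distributed rate heterogeneity (LG+G).
    """
    return [
        "raxmlHPC",            # binary name is an assumption
        "-f", "a",
        "-m", "PROTGAMMALG",
        "-p", str(seed),       # seed for parsimony starting trees
        "-x", str(seed),       # seed for rapid bootstrapping
        "-#", str(replicates),
        "-s", alignment,
        "-n", run_name,
    ]


if __name__ == "__main__":
    # Hypothetical file and run names, for illustration only.
    print(shlex.join(mafft_linsi_cmd("yellow_mrjp.fasta")))
    print(shlex.join(raxml_rapid_bootstrap_cmd("yellow_mrjp_masked.phy", "yellow_mrjp")))
```

The functions only build argument lists and do not execute anything, so the commands can be inspected or adapted to a local installation before being passed to a process runner.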