Genome Sequence of the Pea Aphid Acyrthosiphon pisum: Adaptation to Host Plants and Symbiotic Bacteria
The International Aphid Genomics Consortium
Abstract
to be done later
Highlights
Note, these highlights may be incorporated into a single paragraph near the end of the introduction and/or into a synopsis of the article to be included in a PLOS edition. They will also be the points that we will want to highlight when talking with others about the project, so it is very important that we pick the most exciting elements.
General Novel Aspects of the Project
Sequenced a symbiotic system: the aphid host and its primary and secondary symbionts.
First hemimetabolous insect sequenced.
First agricultural pest sequenced.
General Features of the Pea Aphid
1. A high, steady wave of gene duplications characterizes the Acyrthosiphon pisum genome.
2. Abundant chromatin remodeling proteins may enable functional specialization of epigenetic pathways.
3. Unexpected expansion of the microRNA machinery, a first observation in metazoans.
4. A. pisum has acquired functional genes from bacteria via lateral gene transfer, but the number is small, and the transfers originate from α-proteobacteria, not from the group containing the primary symbiont Buchnera or most secondary symbionts.
Feeding
5. Gene duplication has led to an extensive and diverse family of uniporters of sugars and other compounds in the pea aphid.
6. The number of detoxification genes is correlated with aphid host variety.
7. The pea aphid and its obligate symbiont Buchnera engage in a true nutritional symbiosis, particularly in amino acid and purine metabolism.
Reproduction
8. Expansion of some regulatory kinases involved in controlling mitosis, possibly involved in reproductive polyphenism?
Phenotypic Plasticity
9. Different DNA methylation states are associated with the pea aphid wing polyphenism.
Some Striking Differences to Other Sequenced Insects
10. There are many cases of aphid-specific losses and duplications of “toolkit genes” for development, which are known to be highly conserved among metazoans. This suggests that these pathways are modified in aphids.
11. Characterization of cuticular proteins reveals both a large gene expansion of RR-2 proteins and a reduced number of chitinase genes, which might reflect the absence of dramatic exoskeleton reconstruction in hemimetabolous insects.
12. Pea aphids are missing many immune-related genes common to other insects, including genes commonly involved in pathogen recognition, immune signaling, and antimicrobial peptides.
13. The gene Period in A. pisum does not contain the motifs necessary for nuclear import, and the whole protein seems to be evolving at an accelerated rate.
Introduction
Aphids are among the most severe pests of agricultural crops. These small, soft-bodied insects feed from plants, affecting their growth and acting as vectors of plant viruses. As a result, they have an impact on the production of food and fibers worldwide. There is a need to advance the overall understanding of the biological interactions of these pests with their symbionts and host plants. Aphids feed by inserting their slender mouthparts, referred to as stylets, into phloem cells, one of the food conduits of plants. Most of the approximately 5000 species of aphid feed on only one or a few species of host plants, and closely related aphid species tend to feed on related host species. Once an aphid finds a suitable plant, using a variety of visual and chemical cues, it settles to feed and reproduce simultaneously. Offspring are born live and typically settle close to their mothers, spawning large colonies.
Newborn nymphs molt four times, each time growing larger but otherwise looking similar to the previous instar. Like other members of the Hemiptera, aphids are hemimetabolous insects, undergoing an incomplete metamorphosis from the juvenile to the adult stages. Adults may be winged (alate) or wingless (apterous), and they disperse readily, spreading plant diseases. Phloem fluid provides a diet with high concentrations of simple sugars and an unbalanced mixture of amino acids. Aphids have evolved specialized gut morphology and physiology to reduce the high osmotic potential of phloem fluid. Most aphids harbor intracellular symbiotic bacteria, Buchnera aphidicola, that produce several essential amino acids that are found at low levels in the aphid diet. Aphids and Buchnera have coevolved since the origin of aphids, about 200 million years ago, with aliquots of Buchnera transferred directly from maternal tissues into developing embryos and oocytes every generation. The Buchnera genome underwent a dramatic reduction in gene content, to about 620 genes, soon after the origin of the symbiosis, underscoring the dependence of Buchnera on the host cells. Buchnera have dispensed with many genes that would allow them to live outside aphid cells, and they import all of their food and potentially other essential products from the aphid cell. Some aphids also harbor a variety of other, facultative bacterial symbionts that provide ecologically relevant benefits, such as heat tolerance and resistance to parasitoids. Aphids are essentially plant parasites and, like many parasites, they have evolved complex life cycles with alternating generations of individuals specialized to meet different ecological challenges. They have taken this specialization, called polyphenism, to extremes.
Aphids produce forms specialized for sexual versus asexual reproduction, for sedentary rapid reproduction versus dispersal, for feeding on distinct sets of host plant species, for desiccation resistance and, in the case of social species, for colony defense. Asexual forms have evolved a highly modified meiosis, which skips the reduction division of meiosis I, to allow parthenogenetic reproduction. Embryos develop directly within their mothers, and sometimes embryos develop within embryos (paedogenesis), such that females carry their grand-offspring within them. This telescoping of generations promotes short generation times, allowing aphid colonies to rapidly exploit new resources. (last two sentences for summer generations only?)
(EXPLAIN WHY WE CHOSE PEA APHID)
The pea aphid genome, the first published genome of a hemimetabolous insect, provides an outgroup for the published genomes of multiple holometabolous insects such as flies, beetles, butterflies and bees. The pea aphid genome thus creates a dataset with which to reconstruct more accurately the gene content of the common ancestor of hemimetabolous and holometabolous insects, which diverged about 310-350 MYA. In addition, this hemipteran genome provides a foundation for exploring the genetic basis of host plant specialization, extreme developmental phenotypic plasticity, and coevolved associations with bacterial endosymbionts.
Results
Features of the Pea Aphid Genome
Genome Sequence and Organization. Initial Sanger sequencing of DNA samples from the pea aphid strain LSR1.AC.F1 produced 3.13 million reads. This represents about 464 Mb of sequence and about 6.2X coverage of the (clonable) A. pisum genome. The final Acyr_1.0 assembly contained 72,844 contigs, with an N50 length of 10.8 kb and a total length of 446.6 Mb. The sequenced sample also contained cells from the obligate symbiont Buchnera aphidicola.
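The N50 contig length reported for the assembly is the length L such that contigs of length L or longer together contain at least half of the assembled bases. The following is a minimal sketch of that calculation; the function name and the toy contig lengths are illustrative assumptions, not the Acyr_1.0 data or the assembler's own code.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    N50 is the largest length L such that contigs of length >= L
    together contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly, not real Acyr_1.0 contigs.
toy_contigs = [100, 200, 300, 400, 500]
print(n50(toy_contigs))
```

Because the statistic depends only on the sorted contig lengths, it can be recomputed directly from any contig length table for a given assembly release.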
The facultative symbiont Regiella insecticola was sequenced separately, from a large insert library. Since the genome sequence of R. insecticola has not been reported and the only available genome sequence of B. aphidicola is from a different pea aphid strain, we identified and used contaminating reads to assemble complete sequences of both bacterial genomes. After Sanger sequencing and assembly, the pea aphid genome was subjected to additional pyrosequencing on the 454 platform to improve contig and scaffold lengths. <DESCRIBE THE Acyr 2.0 ASSEMBLY HERE AS SOON AS I HAVE IT> Here we report both versions of the pea aphid genome sequence, but the annotation and analysis are based on the first, Sanger-based assembly, Acyr_1.0.
Genome Sequence and Organization. This genome has a GC content of 29.6%, the lowest among the available insect genomes (which range from 34.8% in Apis mellifera to 45% in Drosophila pseudoobscura). Transcript GC content is higher, averaging 38.8% (sd=8.4, N=37994), and is similar to that of Apis (mean=38.6%, sd=9.7, N=17182) [Supplemental table at http://insects.eugenes.org/arthropods/data/summaries/arthropod_insect_gc_content.txt ].
Transposable elements. Transposable elements (TEs) are key elements of genome plasticity and account for a large part of many eukaryotic genomes. To find all the TEs inserted in the pea aphid Acyrthosiphon pisum genome, we used a previously described transposable element annotation pipeline (Reference and see Methods). This procedure revealed nearly 14,000 (WHY NOT THE EXACT NUMBER?) consensus sequences in the A. pisum genome, which we classified into almost 1400 (EXACT NUMBER) families according to their structural and coding features. Table 1 shows the abundance and coverage of TE families. Figure 1 shows the distribution of nucleotide identity per TE category, from which the order in which TE families invaded the genome can be estimated. XX percent of the pea aphid genome matches consensus sequences for repeats.
Additionally, we discovered chimeric TE families and evidence of co-evolution. In particular, Ty3/Gypsy long-terminal-repeat (LTR) retrotransposons are among the best-known mobile genetic elements. Proteins coded by Ty3/Gypsy LTR retroelements occasionally assume cellular roles. GIN-1, for instance, is an apparently functional integrase found in humans and other vertebrates [11]. GIN-1 is similar to the integrases coded by the 412/Mdg1 clade of Ty3/Gypsy elements described in arthropods. The evolutionary history of GIN-1 is not exceptional. Screening the A. pisum genome revealed a similar co-evolutionary history between Ty3/Gypsy LTR retroelements and KRB2, a family of nonviral integrases described in humans and other vertebrates.
Telomeres. As in other non-dipteran insects, the pea aphid possesses a single candidate telomerase gene. The canonical arthropod telomere repeat TTAGG can be found in long stretches, and the vast majority of these are plus/minus matches, indicating that they are at the ends of chromosomes. Sun and Robinson (1966) published karyotypes for various aphids, and the four haploid chromosomes of the pea aphid match the four linkage groups of Hawthorne and Via (reference). Of the expected 8 telomeres, we were able to identify TTAGG repeat stretches at the ends of just 5 scaffolds in the assembled genome, of which all but two are very short. The two relatively long scaffolds identified appear to be true, relatively simple telomeres. The others, however, are more complex, like the Bombyx and Tribolium telomeres, with non-LTR retrotransposon insertions that presumably confounded the WGS assembly. Overall, it appears that the aphid telomeres are quite heterogeneous, ranging from simple to complex, and will require further bioinformatic and experimental analysis.
Gene model prediction. Fewer than 200 pea aphid genes had been sequenced prior to this project.
Consequently, we relied heavily on automated gene predictions to aid our understanding of the gene content of the pea aphid. Partially or fully supported models computed by NCBI's gene prediction pipeline serve as a core set of 10,245 gene models, and are integrated into the public RefSeq databases at NCBI. Since this number is likely to underestimate the true number of protein-coding genes in the pea aphid, further models were calculated using six additional gene prediction programs and combined into a consensus set of 24,355 additional gene models using GLEAN [ref] (Table Suggestion 6). The combined total of 34,600 gene predictions is likely to be an overestimate of the true number of pea aphid genes, since it includes unsupported ab initio models, transposons, partial gene models, and predictions of genes duplicated in the Acyr_1.0 assembly. However, it provides an expansive foundation for identifying genes for the subsequent analyses described below. Using a variety of approaches, a subset of genes of interest was then annotated manually. All gene predictions and other identified features were loaded into a GMOD-Chado database (ref) accessible at the AphidBase web portal (http://www.aphidbase.com). AphidBase uses various open source software tools from the Generic Model Organism Database (GMOD), in particular the graphical genome browser GBrowse (ref) and the manual curation software Apollo (ref).
Comparison of Gene Set to Other Organisms. In order to compare the gene content of A. pisum to that of other organisms, we performed sequence searches against a database containing the proteomes encoded in 16 other species. These include 12 other insects, representing all major insect groups with sequenced genomes, and four out-groups: the crustacean Daphnia pulex, the nematode Caenorhabditis elegans and the two chordates Ciona intestinalis and Homo sapiens.
To set the genome comparisons in an evolutionary context, a species phylogeny was reconstructed based on a Maximum Likelihood analysis of 197 concatenated alignments of genes with a single-copy ortholog in all species considered (see Material and Methods and FIGURE 1). The resulting phylogeny groups the major insect groups according to previously established taxonomy, including the recovery of the Diptera and Hymenoptera clades. Similarly, the phylogeny correctly places the pea aphid as a sister group of Pediculus humanus, also a member of the Paraneoptera clade, which appears at the base of the insect phylogeny. The long branch leading to A. pisum is indicative of a very long evolutionary distance, and significant genomic differences from its closest relatives with sequenced genomes are therefore expected. Figure 1B shows a summary of the comparison of the pea aphid gene repertoire to that of other organisms. 12,885 genes in the Acyr_1.0 gene set (37%) show no significant hits (e-value < 10^-3) to genes in the other species included in the analysis. This large number of species-specific genes might be due in part to failures of the gene prediction programs or to homology made undetectable by extensive sequence divergence, but it might also reflect true genetic specificities of this species compared to other insects. A. pisum shares 30-53% of its gene repertoire with other insects, with Nasonia vitripennis and Tribolium castaneum being the two species that share the highest percentage of aphid genes (53% in both cases). Interestingly, the closest relative among insects with sequenced genomes, Pediculus humanus, shares only 38% of the pea aphid genes.
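The shared-gene percentages above come from counting, for each comparison species, how many pea aphid genes have at least one hit below the e-value cutoff, and how many have no such hit in any species. The sketch below illustrates that bookkeeping under stated assumptions: the function name, the (gene, species, e-value) tuple layout and the toy data are hypothetical, and only the 10^-3 cutoff is taken from the text.

```python
from collections import defaultdict


def summarize_hits(all_genes, hits, max_evalue=1e-3):
    """Summarize sequence-search hits for a gene set.

    all_genes: iterable of gene identifiers in the query gene set.
    hits: iterable of (gene_id, species, evalue) tuples (assumed layout).
    Returns (fraction of genes with no significant hit in any species,
             {species: fraction of genes with a significant hit}).
    """
    genes = set(all_genes)
    if not genes:
        raise ValueError("gene set is empty")
    by_species = defaultdict(set)
    for gene_id, species, evalue in hits:
        # Keep only hits for known genes that pass the e-value cutoff.
        if gene_id in genes and evalue < max_evalue:
            by_species[species].add(gene_id)
    with_hit = set().union(*by_species.values())
    no_hit_fraction = len(genes - with_hit) / len(genes)
    shared = {sp: len(ids) / len(genes) for sp, ids in by_species.items()}
    return no_hit_fraction, shared


# Hypothetical toy data: four genes, hits against two species.
toy_genes = ["g1", "g2", "g3", "g4"]
toy_hits = [
    ("g1", "Tribolium castaneum", 1e-20),
    ("g1", "Pediculus humanus", 1e-5),
    ("g2", "Tribolium castaneum", 1e-4),
    ("g3", "Pediculus humanus", 0.5),  # fails the cutoff
]
no_hit, shared = summarize_hits(toy_genes, toy_hits)
print(no_hit, shared)
```

As a consistency check on the figures quoted in the text, 12,885 of the 34,600 predicted genes is about 37%, assuming the combined prediction set is the denominator.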
The pea aphid phylome: detection of orthology, paralogy and lineage-specific gene expansions
To obtain an overview of the evolution of every pea aphid gene, and to infer the corresponding phylogeny-based orthology and paralogy relationships among pea aphid genes and those of other organisms, we reconstructed the pea aphid phylome, that is, the complete collection of phylogenetic trees for every protein encoded in the A. pisum genome. To do so, we followed a pipeline similar to that used for the human phylome (Huerta-Cepas et al 2008). The resulting alignments, phylogenies and orthology predictions can be accessed through phylomeDB (Huerta-Cepas et al 2008) (http://phylomedb.org) and AphidBase. We scanned the pea aphid phylome with a previously described, phylogeny-based orthology prediction algorithm (Huerta-Cepas et al. 2007). Prediction of orthology is a fundamental step in the functional annotation of newly sequenced genomes. Most projects use BLAST-based orthology detection methods, although phylogeny-based approaches are considered more accurate (Gabaldon 2008). Using phylogeny-based orthology, we were able to directly transfer GO annotations to 4058 pea aphid genes that display one-to-one orthology relationships with Drosophila melanogaster genes (see Material and Methods). To our knowledge, this is the first newly sequenced genome for which phylogeny-based orthology predictions have been used in the annotation pipeline. Another advantage of the availability of the phylome is that we can readily obtain a picture of the gene duplications that occurred specifically within the A. pisum lineage. For this, we used the above-mentioned algorithm to detect all A. pisum paralogy relationships resulting from duplications in the pea aphid lineage. 2459 pea aphid gene families present lineage-specific duplications (Figure 2). Most of these gene family expansions are small to moderate in size, resulting in a total of 2 to 10 in-paralogs (2239 families).
The remaining 220 families have experienced massive expansions resulting in in-paralog groups with 10-50 members (196 families) and 50 to roughly 200 members (19 families). Sequence analyses of members of the latter groups have identified reverse-transcriptase and transposase domains, suggesting that these may represent expansions of transposable elements. However, other expansions affect other classes of genes. For instance, the pea aphid possesses circa 200 lineage-specific paralogs of the Drosophila gene kelch, coding for an actin-binding protein involved in ovarian follicle cell migration and oogenesis (see, for instance, the gene tree for ACYPI51424-PA in phylomeDB). Another example is a lineage-specific expansion of a putative AcycoA transporter leading to 19 in-paralogs (Figure 2B). The exact functional meaning of these and other expansions remains to be investigated, but some are likely to be related to specific adaptations of aphids in terms of life cycle and diet. Further examples of gene family expansions are discussed throughout the text. (NOTE: some expert in aphids can give a hint on the functional meaning of these two examples?)
Chromatin modifications. The extent and function of DNA methylation in insects still remain largely unknown. The pea aphid has the full complement of CpG methylation associated proteins. Two copies of DNA methyltransferase 1 (Dnmt1a and Dnmt1b), the maintenance methyltransferase, one copy of Dnmt2, and one Dnmt3, the de novo methyltransferase, were identified (Walsh et al., companion paper). All of the Dnmts were active in vitro and their mRNAs were detected by RT-PCR. Also present were a CpG binding protein and a Dnmt1 associated protein (Dmap1). Global methylation levels are (to be determined or reference (Mandrioli and Borsatti 2007)). Additionally, the pea aphid has a full complement of the histone genes subject to post-translational modifications such as acetylation, methylation, phosphorylation, and ADP ribosylation.
Several genes appear to have undergone recent duplications, potentially enabling greater diversity and specialization among chromatin remodeling complexes. The pea aphid possesses orthologs of histone deacetylase proteins that are absent from the Drosophila genome, such as HDAC8, a putative HDAC10 and extra Rpd3-like proteins that may participate in gene silencing. Histone acetyltransferases are also abundant, with two paralogs for PCAF/GCN5 and for the MYST family members related to the Drosophila genes enoki mushroom (enok) and males absent on the first (mof). The pea aphid possesses an extended repertoire of more than two dozen SET-domain proteins, protein arginine methyltransferase-like proteins, and two Dot1-like proteins, predicted to be involved in histone methylation. Multiple classes of Jumonji C domain containing proteins exist that, along with two LSD1-like proteins, are likely to participate in histone demethylation. The attachment of ubiquitin or the small ubiquitin-like modifier (SUMO) to histones and other transcriptional regulators can have a dramatic effect on chromatin structure (2,3). There are at least three SUMO-related proteins in the pea aphid genome, one of which is a clear ortholog of Drosophila smt3, a protein that is highly active in the germline and also required for morphogenesis (4,5). A family of poly ADP ribosylases was also identified that may participate in chromatin replication through histone modification (6).
Small non-coding regulatory RNAs. RNA-mediated gene silencing relies on two types of small non-coding RNAs: the small interfering RNAs (siRNAs) and the microRNAs (miRNAs). Both miRNAs and siRNAs play a crucial role in the regulation of gene expression in eukaryotes. While miRNAs are processed from endogenous genes encoding stem-loop hairpin transcripts, siRNAs arise by cleavage of either exogenous or endogenous long double-stranded RNA (dsRNA) precursors.
Depending on the organism, the siRNA and miRNA pathways are either overlapping (as in vertebrates) or parallel (as in insects), and involve key factors such as dicer proteins, double-stranded RNA binding proteins and Argonautes. We identified the pea aphid genes involved in the siRNA and miRNA machinery and found an unexpected gene expansion specific to the miRNA-related factors. We identified two copies of the miRNA-specific dicer-1 and argonaute-1 and four copies of pasha, a cofactor of drosha involved in miRNA biosynthesis (Legeai et al. companion paper). Many of these expansions were also identified by PCR cloning and sequencing in other aphid species. Since all these genes are present as single copies in other insect species, this expansion of the microRNA machinery appears to be unique among metazoans. Moreover, we have shown that the expression of some of these expanded miRNA-related genes is linked to the reproductive mode of the pea aphid. MicroRNAs of the pea aphid have been identified by deep sequencing and bioinformatic analyses (Jaubert-Possamai et al., companion paper). By combining these methods we identified 132 microRNAs, including 65 conserved and 67 new aphid-specific microRNAs.
The Genome of a Phloem-Feeding Specialist
Finding a Suitable Host Plant. Like other insects, aphids face the challenge of finding suitable food supplies by distinguishing hosts from non-hosts via semiochemical cues. In the case of the pea aphid, these insects are limited to plants in the family… The first step in this olfactory signal transduction involves the semiochemicals entering the antennae and binding to odorant-binding proteins (OBPs), which transport the molecules to the olfactory receptors (Ors).
OBPs are a family of small water-soluble proteins that can be classified into four groups: classic OBPs (with 6 conserved cysteines), plus-C OBPs (with 8 conserved cysteines and one conserved proline), atypical OBPs (with 9 to 10 cysteines) and chemosensory proteins (CSPs, with 4 conserved cysteines). We identified 11 classic OBPs, one plus-C OBP and 11 CSPs. The genes for the OBPs tend to be clustered in the genome and have more and longer introns than their counterparts in Drosophila. Orthologous sequences have also been identified in eight other aphid species, showing that although there are diverse OBPs within each species, there is very high similarity between homologues in different species. This means that, having identified OBPs in A. pisum, the information can be readily transferred to other aphid species for studies of olfaction in, for example, the important pest aphid Myzus persicae. Although the exact way in which semiochemicals/OBPs interact with Ors is not established, Ors have been identified in many insect species and usually constitute a large superfamily of 7TM ligand-gated ion channels. Four A. pisum Or genes have been annotated manually, these genes being poorly represented in the consensus gene set. Seventy-nine genes in the Or family have been identified, including 49 intact and complete genes, 22 partially annotated genes and 8 putative pseudogenes. As expected, because of the striking conservation of the D. melanogaster gene DmOr83b in all insects, an ortholog has been identified in A. pisum. There are three other highly divergent genes that show some homology with other insect sequences, but the remaining 75 genes correspond to aphid-specific expansions with no clear relatives among other insect receptor genes.
These genes form two gene subfamilies of 9 and 37 intact complete genes. Some clades have reasonably long branches to each protein, indicating relatively old expansions of receptors, while other groups exhibit short branch lengths to most proteins and contain some tandem genes, which might indicate more recent duplicates. Pseudogenes are mainly found in recently expanded groups of genes. Except for the ortholog of DmOr83b, the possible functions of these receptors are unclear and further functional annotation is required. The insect chemoreceptor superfamily of 7TM ligand-gated ion channels consists not only of Ors but also of the basal gustatory receptor (Gr) family. For A. pisum, 78 genes were identified as Grs, with six members of the sugar receptor subfamily. This subfamily is generally well conserved throughout the holometabolous insects, with relatives in Daphnia. No relatives were found of the carbon dioxide receptors that are highly conserved from flies to moths and beetles, but not in Hymenoptera. The remaining 72 genes fall into two distinct classes, neither of which has clear-cut relationships to any of the Grs from holometabolous insects. Five are highly divergent "singletons" while another four form a small subfamily; the possible functions of these receptors are unclear. The remaining Grs form two large subfamilies of 21 and 42 genes that, by analogy with similar Gr expansions in other insects, might constitute the "bitter" taste receptors of aphids, involved in the detection of the many secondary plant compounds they are exposed to. These large subfamilies have quite distinct phylogenetic properties. The 21-gene subfamily has reasonably long branches to each protein; only three of its members are apparent pseudogenes and two have parts missing in assembly gaps, indicating that this is a relatively old expansion of receptors, most of which are still functional. The 42-gene subfamily exhibits extremely short branch lengths to most proteins.
Eight of them are clear pseudogenes, while another 14 are only partial gene models due to assembly problems, with parts of these genes missing in gaps. In summary, the Or and Gr subfamilies appear to be undergoing rapid and recent expansion in the A. pisum genome, and it can be speculated that some of these genes might be crucially involved in the perception of host plant chemicals, either volatile or non-volatile. Specifically, bitter taste receptors are good candidates for the perception of chemicals present in plant subepidermal tissues, which are known to be important for host plant acceptance in aphids. Therefore, the identification of chemoreceptor genes is a crucial step towards understanding the mechanisms of host plant specialisation and host race formation.
Initiating Feeding and Overcoming Plant Defenses. Once an aphid has located a host plant, feeding activity must commence. Feeding in aphids differs from that of insects with chewing mouthparts because aphids feed on a single plant tissue, namely phloem sap located within the sieve tubes, rather than ingesting a mixture of whole cells. Bioactive compounds in the salivary secretions of sap-feeding insects are believed to overcome plant defense mechanisms, both in the overlying tissue and within the sieve elements, so that the insect can feed for prolonged periods of time. A proteomic analysis of the A. pisum salivary gland has catalogued proteins from this tissue (referred to as the saliome) and identified putative secreted salivary proteins. A dual approach of GelC MS/MS and MALDI TOF/MS on 1DE and 2DE fractionated glands, respectively, identified a total of XX proteins, 67 of which were novel (with no similarity to sequences in public databases). Following SignalP analysis, YY of the identified proteins contained a secretion signal and, of these, ZZ had previously been identified as being secreted into artificial diets during feeding. [Annotations of the genes involved in A. pisum? Numbers present in genome?]
When aphids feed from plants, they take up both food substances and plant secondary metabolites. The need to detoxify these potentially deleterious compounds is a problem faced by all herbivorous insects, which generally use a range of detoxification enzymes, including cytochrome P450 monooxygenases (P450s), glutathione S-transferases (GSTs), and esterases. A. pisum genome analysis has identified 82 P450 genes [and XX GSTs? and YY esterases?]. Whereas A. pisum feeds almost exclusively from the Fabaceae, the aphid Myzus persicae feeds on hundreds of species in more than forty plant families. Therefore, M. persicae is exposed to a greater diversity of plant secondary metabolites and might be predicted to have a wider array of detoxification enzymes. Consistent with the hypothesis of a larger complement of detoxification enzymes in M. persicae, analysis of available M. persicae ESTs has identified 140 putative P450s, compared with the 82 in A. pisum. Detoxification enzymes may also be involved in resistance to many insecticides. Additionally, insecticide resistance can be caused by mutations at the target site of the chemical, most commonly ion channels. These include the voltage-gated sodium channel, glutamate receptors and nicotinic acetylcholine receptors, and the A. pisum genome has been found to contain XX, YY and ZZ genes encoding these proteins, respectively. Other genes have been identified encoding potassium channels, calcium channels, and chloride channels. Although the A. pisum EST coverage is not yet sufficient to confirm that the full-length gene models are correct, the data have allowed us to determine whether orthologs of D. melanogaster ion channel genes are present. This revealed the extent of gene duplication and loss in the ion channel genes, a pattern that is prevalent in other gene families studied in A. pisum. Another group of proteins involved in detoxification is the proteases.
Proteases are a structurally and functionally diverse set of enzymes involved in a plethora of biological processes, ranging from non-specific degradation of ingested proteins to complex cascading pathways involving highly selective cleavage of substrates. The availability of the A. pisum genome has allowed the comprehensive characterization and analysis of the complete protease repertoire (degradome) of this organism. Using the peptidase classification system established by MEROPS, taxonomic levels were assigned to the peptidases, which were then used for annotation purposes. Additionally, a non-redundant taxonomical evaluation of clan AA peptidases was carried out according to the most recent trends, focusing on the large diversity of LTR retroelement proteases and their relationships with their host gene counterparts. The current annotation of the A. pisum degradome indicates that there are at least XXX proteases and homologues, which are distributed into xx aspartic, xx cysteine, xx metallo, xx serine, and xx threonine proteases. The distribution and taxonomy of A. pisum proteases were investigated through comparative and phylogenetic analyses conducted against a number of insect species (in particular Drosophila melanogaster). [GENERAL RESULTS TO FOLLOW]. The annotation of the degradome of A. pisum is still ongoing, and an accurate assessment of the distribution of peptidases will only be obtained after considerable functional experimentation. [protease inhibitors?]
Plant Phloem as a Food Source. Phloem sap has a high and variable sucrose concentration, in the range 0.2–1.5 M (Douglas et al., 2006). Consequently, although the diet ingested by aphids supplies a high level of carbon for nutrition, it also routinely has an osmotic pressure significantly above that of insect haemolymph. Ingested sucrose is hydrolysed by a gut sucrase (Christofoletti et al., 2003; Price et al., 2007), and access to the constituent sugars as a resource for metabolism depends on their transport out of the gut.
Sugar transport also plays an important role in osmoregulation, both by removal from the gut and by enabling accumulation in the haemolymph, where the major sugars for A. pisum are fructose (source of metabolic intermediates; estimated concentration approx. 130 mM) and the disaccharide trehalose (estimated concentration approx. 260 mM) (Rhodes et al., 1997; Ashford et al., 2000). Sugar transport across cell membranes in higher organisms uses both proteins of the major facilitator superfamily (MFS), which can be uniporters or coupled transporters, and proteins of the sodium:solute symporter family (SSF), in which sugar transport is coupled to sodium ion transport. The A. pisum genome contains approximately 200 predicted genes encoding proteins belonging to clan MFS (Pfam CL0015), which covers a wide range of functional roles involving transport of many different types of small molecules. Sequence analysis of the predicted gene products has allowed the identification of families and subfamilies of putative transporters of oligopeptides, nucleosides, folate, organic anions, phosphate, amines, organic cations and monocarboxylates, on the basis of sequence similarity to previously identified proteins in D. melanogaster and other insects. A nonredundant and nearly complete family of 54 genes has been identified in A. pisum that encode proteins belonging to the sugar transporter family Sugar_tr (Pfam PF00083) within clan MFS. These gene products show greater similarity to proteins annotated as “sugar transporters” in D. melanogaster and other insects than to proteins annotated with other functions. Of the 54 potential sugar transporter genes in A. pisum, 16 have no corresponding ESTs, leaving 38 genes with EST evidence of expression. The sugar transporter genes in A. pisum can be divided into two groups, based on the presence or absence of an InterPro signature diagnostic for transporters of sugars and inositol (IPR003663). Preliminary functional analysis of selected A.
pisum sugar transporters has been carried out by complementation analysis; growth of a yeast hexose transport-deficient mutant on minimal media containing hexoses has been restored by transformation with expression constructs containing coding sequences for the proteins. Several gene products containing the IPR003663 signature were able to transport hexoses in this assay, whereas a gene product not containing this signature was not a hexose transporter (data not presented). However, characterisation of substrate specificities for individual transporters requires direct uptake assays with labelled substrates. These assays have been carried out for the most highly expressed sugar transporter in A. pisum (Ap_ST3; see companion paper), which is a uniporter with specificity for fructose and glucose. This transporter is expressed in gut tissue, and is likely to play an important role in transport of sucrose hydrolysis products from the gut lumen into the haemolymph. The number of sugar transporters in A. pisum is higher than in other insects, except Tribolium, which has a similarly extreme diet with potentially high sugar concentration due to low water content. The high number of sugar transporters in A. pisum may result from gene duplication, as evidenced by a “clustering” of genes; one genomic scaffold contains 7 sugar transporter genes in a single 250 kbp region of genomic DNA. Although aphid equivalents can be identified for the trehalose transporter characterised from the anhydrobiotic insect Polypedilum vanderplanki (Kikawada et al., 2007), and for some Drosophila sugar transporters, including those containing a glucose transporter signature (IPR000803), there is a high level of sequence diversity among these genes, with evidence of species-based grouping of sequences when analysed by the Clustal method.
It is suggested that aphids have evolved an increased set of MFS-type sugar transporter genes as a functional adaptation to feeding on specialised diets, with specific requirements for the transport of sugars and other small molecules. In contrast to members of clan MFS, genes encoding sodium:solute symporters of family SSF are rarer in A. pisum than in D. melanogaster (5 predicted genes vs. 14). The A. pisum genes encode proteins predicted to transport short-chain fatty acids or choline, but not sugars. Proteins capable of transporting sugars against a concentration gradient have yet to be identified in insects. Transmission of Plant Diseases. One consequence of the phloem feeding of aphids is that they are able to transmit plant diseases; indeed, aphids are responsible for the transmission of over 55% of plant viruses. Two principal phytovirus transmission mechanisms have been described so far: circulative transmission, which involves the transport by transcytosis of virions through two different barriers in the insect vector (the gut and the salivary gland), and non-circulative transmission, in which virus particles are retained on the cuticle lining of the stylet and which does not involve internalization of virions in insect cells. In the circulative mode of transmission, virions enter the cell by receptor-mediated endocytosis and are transported across the cell in vesicles of different types before being released at the other side of the cell by exocytosis. Transcytosis is a general mechanism used for the transport of macromolecules across cells. The correct uptake, transport and delivery of the vesicle cargo relies on the participation of several families of proteins. Sequence comparisons with annotated genomes such as those of D. melanogaster, T. castaneum and humans have allowed the identification of A.
pisum protein families involved in transcytosis and potentially involved in the circulative transmission of viruses: clathrins and dynamins (involved in vesicle formation), SNAREs and Rab GTPases (involved in membrane fusion), sec proteins (involved in protein translocation), synaptotagmins, cytoskeleton proteins, and receptor proteins (scavenger receptors, receptor tyrosine kinases, …). Genomes of a Symbiotic Association Host-symbiont genomes. Like many insects, aphids are hosts to various symbiotic microorganisms. Aphids harbor the obligate mutualistic primary symbiont, Buchnera aphidicola (Gammaproteobacteria), within the cytoplasm of specialized cells called bacteriocytes. Buchnera synthesizes essential amino acids (i.e. the amino acids that animals cannot synthesize de novo but that are required in proteins) and is required for aphid survival. It is widely accepted that aphids can utilize the diet of plant phloem sap, which is deficient in essential amino acids, only because their Buchnera symbionts are a supplementary source of these nutrients. Additionally, many aphids harbor facultative secondary symbiotic bacteria (Moran et al. 2005) that have been shown to influence several aspects of aphid ecology, including host plant specialization and heat tolerance (Chen et al., 2000; Montllor et al., 2002; Russell, Moran, 2006; Tsuchida et al., 2004). These symbionts also protect their hosts from fungal pathogens and parasitoid wasps (Oliver et al., 2003; Oliver et al., 2005; Scarborough et al., 2005). Such intimate, evolutionarily stable associations influence host and bacterial evolution and likely shape their genomes as well. Lateral Gene transfer from symbionts to the host. The A. pisum genome provides the first opportunity to examine the complete genome of an animal that is host to an obligate mutualistic intracellular bacterium (primary symbiont). Aphids have been dependent on symbionts for millions of years.
Since the initial infection of an aphid ancestor more than 100 Myr ago (Moran et al., 1993), Buchnera have been subjected to strict vertical transmission through host generations, and the mutualism between Buchnera and their host has evolved to the point that neither can reproduce in the absence of the other. During the course of coevolution with the host, Buchnera has lost a number of genes that appear to be essential for bacterial existence. The genome of Buchnera from A. pisum encodes about 620 genes, which is only one seventh the number of genes in the genome of related bacteria such as Escherichia coli (Shigenobu et al., 2000). This raises the question of whether any lost genes have been transferred from the genome of ancestral Buchnera to the genome of aphids. Such lateral gene transfer (LGT) would parallel that known to have occurred from bacterial endosymbionts to the host nuclei during the evolution of mitochondria and plastids in eukaryotic hosts (Dyall et al., 2004). Secondary symbionts, bacterial commensals, and bacterial pathogens could also serve as sources of transferred DNA. Indeed, there are some reports of LGT between a facultative endosymbiont Wolbachia (secondary symbiont) and its host arthropods and nematodes (Kondo et al., 2002; Fenn et al., 2006; Hotopp et al. 2007; Nikoh et al., 2008). However, none of these laterally transferred genes reported thus far appear to be functional. Screening the genome of A. pisum for bacterial sequences, followed by phylogenetic analyses, identified several genes that seem to have been transferred from bacterial genomes to the genome of an ancestor of A. pisum (Nikoh et al., 2009, companion paper). The candidate genes include those for LD-carboxypeptidase (ldcA), N-acetylmuramoyl-L-alanine amidase (ybjR), rare lipoprotein A (rlpA), DNA polymerase III alpha chain (dnaE), and uridylyltransferase (glnD). Buchnera lacks all of these genes other than dnaE. 
Transcripts of ldcA and rlpA were originally detected in the transcriptome analysis of the bacteriocyte (Nakabachi et al., 2005). While phylogenetic analyses suggested that ldcA and ybjR were transferred from Rickettsiales (Alphaproteobacteria), not Buchnera, dnaE in the aphid genome was significantly similar to that of extant Buchnera, and glnD appeared to be of gammaproteobacterial origin (Buchnera and many facultative symbionts and pathogens are Gammaproteobacteria). Coding regions of ldcA, ybjR, and rlpA appear to be intact, and these genes were shown to be expressed strongly in the bacteriocyte (Fig. 1), implying not only that they are functional, but also that they may play important roles in symbiosis with Buchnera (Nikoh and Nakabachi, 2009; Nikoh et al., 2009, companion paper). Only small parts of dnaE and glnD are represented in the aphid genome, implying that they are not functional. Thus, although it seems that aphids acquired some functional genes via LGT from secondary symbionts, the aphid genome appears not to contain a significant portion of the hundreds of genes that Buchnera lost as it evolved a highly reduced genome. Metabolism and symbiosis. The metabolic capacity of the pea aphid was examined in the context of its unusual diet of plant phloem sap, which is rich in sugars and deficient in essential amino acids, and its obligate symbiosis with Buchnera. The genetic capacities of the insect and Buchnera for amino acid metabolism are broadly complementary, largely as a result of gene loss from Buchnera (Shigenobu et al. 2000). This complementarity results in several apparent instances of metabolic pathways shared between the aphid and Buchnera (see companion paper, Wilson et al.). The pea aphid also appears to lack some nitrogen metabolism genes that are present in other sequenced insect genomes. Of particular note is the amino acid tyrosine.
Insects generally have the gene for phenylalanine hydroxylase, which mediates the synthesis of tyrosine from the essential amino acid phenylalanine, and the genes for tyrosine degradation to fumarate and acetoacetic acid. The pea aphid has the gene for phenylalanine hydroxylase, and the high abundance of this gene transcript in bacteriocytes (Nakabachi et al., 2005) suggests that phenylalanine synthesized by Buchnera is an important source of the aphid tyrosine requirement. Unlike other insects, however, the pea aphid lacks tyrosine transaminase and other tyrosine catabolism genes (Wilson et al. companion paper). Perhaps tyrosine degradation is redundant in the pea aphid because Buchnera, which can neither synthesize nor degrade tyrosine but requires it for protein synthesis, is predicted to be a major sink for this amino acid (Shigenobu et al., 2000). The genes for the urea cycle and two core genes of the purine salvage pathway, adenosine deaminase and purine nucleoside phosphorylase, are also apparently absent (see companion paper, Ramsey et al.), with the implication that the aphid is unlikely to be able to produce urea and uric acid, respectively. Furthermore, the pea aphid is expected to be entirely dependent on the diet and Buchnera for its supply of the amino acid arginine, unlike the many animals that derive part of their arginine requirement from the urea cycle. The presence in the aphid genome of a gene for glutamine synthetase 2 (ACYPI006239; EC 6.3.1.2; Glutamate + ATP + ammonia -> Glutamine + ADP + phosphate), which was found to be highly expressed in the bacteriocyte cells that house Buchnera (Nakabachi et al. 2005; Nakabachi et al., 2009, companion paper), raises the possibility that bacteriocytes actively synthesize glutamine, which is then utilized by Buchnera as an amino donor in several metabolic pathways, including arginine biosynthesis.
These genomic data are fully consistent with the evidence that the nitrogen excretory products of pea aphids include no detectable uric acid or urea (Sasaki et al. 1990), and that the growth of pea aphids experimentally deprived of Buchnera is significantly depressed on arginine-free diets (Gündüz et al. 2008). Nitrogen excretion in most terrestrial insects is dominated by uric acid voided via the Malpighian tubules. Aphids are most unusual in that ammonia is their sole known nitrogen excretory compound and Malpighian tubules are absent. These evolutionary changes and the loss of genes in the urea cycle and purine salvage pathway of aphids can be correlated with the high water content of the phloem sap diet and efficient water cycling in the aphid gut (Shakesby et al. 2008). Immunity and symbiosis. Studies of diverse hosts suggest that host immune function plays a role in the establishment and maintenance of symbiotic associations (Heddi et al. 2006). In turn, the evolutionary maintenance of a host immune response to microbes may be influenced by the ability of symbionts to protect their hosts from pathogens and parasites. Therefore, host immunity may both shape and be shaped by symbiotic associations. Aphids lack many immune-related genes common to insects and other invertebrates (companion paper, Gerardo et al.). First, while orthologs of key components of the immune-related Toll, JAK/STAT and JNK signaling pathways were identified, many of the genes comprising the IMD pathway (IMD, BG4, Dredd, Relish) appear to be missing in the pea aphid. This pathway is intact in the genomes of other sequenced insects (and several of the genes are also found in the crustacean Daphnia pulex).
Second, several of the main insect immune pathways are triggered by recognition of pathogens via peptidoglycan recognition proteins (PGRPs), which are also absent in the pea aphid but are present in many insects and other arthropods (e.g., flies, mosquitoes, bees, lice, ticks; but notably, not Daphnia pulex). Finally, in eukaryotes, recognition and signaling ultimately lead to the production of diverse antimicrobial peptides (AMPs), some of which are genus-specific (e.g., drosomycin in Drosophila) and some of which are shared across diverse organisms (e.g., defensins in plants and animals). Manual annotation revealed few identifiable AMP genes in the pea aphid, and RNA and protein isolation methods (i.e., suppression subtractive hybridization, sequencing of ESTs from infected individuals and HPLC analyses) successfully used to identify AMPs in other immune-challenged insects did not recover any AMPs from immune-challenged aphids (Altincicek et al., 2008; Gerardo et al., companion paper). In fact, during these immune challenges, aphids upregulated few genes of known immune function and few novel genes that could be associated with an alternative immune response. Although further functional assays are important to test for additional immune responses in aphids, the missing immune genes, coupled with the weak response to standard immune challenges, suggest that aphids might have a reduced immune response compared to the insects sequenced thus far. If so, immune processes may not play a critical role in aphid interactions with their symbiotic bacteria. There are several possible explanations for why aphids may have a reduced immune system. First, aphids feed on mostly microbe-free phloem sap, reducing the risk of ingestion of pathogens while feeding.
Second, during much of their lifecycle, pea aphids rapidly reproduce clones of themselves, making it possible that they may have higher fitness if they invest in reproduction rather than in a presumably costly immune defense. Finally, reduced immune function could be a consequence of symbiosis itself. Because aphids frequently harbor protection-conferring symbionts, symbioses may relax selection for maintenance of the hosts’ own immune systems or may select for loss of immune functions that could prevent the establishment of beneficial symbionts. Future work on other aphid species and other insect groups will be important to establish when the distinctive features of the aphid immune gene repertoire evolved, and how these may relate to diet, symbiosis and other aspects of the insect lifestyle. Genome of the primary symbiont Buchnera aphidicola. Though the sequencing project was designed to target the genome of A. pisum, the project also generated sequences of the primary and secondary symbiotic bacteria. We obtained 24,947 sequence reads corresponding to the Buchnera genome. Using such "contaminants", we were able to reconstruct the 642,011-base-pair complete genome of Buchnera with 20x coverage. Compared with the originally sequenced strain (from Japan; Shigenobu et al., 2000), the new strain (from North America) shows approximately 1500 mismatches (0.23%) and two larger inserts (1.2 kbp and 150 bp). The newly sequenced strain is almost 100% identical to a cluster of five recently sequenced Buchnera strains from A. pisum collected in North America (Moran et al. 2009). Compared to the closest strain, it shows only 10 nucleotide substitutions, 5 single-base indels in homopolymeric runs of 5-29 bases, and 4 larger indels including a 282 bp deletion and a 157 bp insertion, both in intergenic spacers.
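The Buchnera assembly figures quoted above can be cross-checked with simple arithmetic. The short Python sketch below recomputes the mismatch rate from the reported counts and, under the assumption that coverage is simply total read bases divided by genome length, back-calculates the mean read length implied by 20x coverage. The variable names are illustrative only, and the implied read length is a derived estimate, not a figure reported by the project.

```python
# Figures quoted in the text for the newly assembled Buchnera genome.
GENOME_LENGTH_BP = 642_011    # complete genome length
NUM_READS = 24_947            # reads assigned to Buchnera
REPORTED_COVERAGE = 20        # reported fold coverage
APPROX_MISMATCHES = 1_500     # mismatches versus the Japanese strain

# Fraction of genome positions that differ between the two strains.
mismatch_rate = APPROX_MISMATCHES / GENOME_LENGTH_BP

# Assumption: coverage = (reads * mean read length) / genome length,
# so the mean read length follows by rearranging that relation.
implied_read_length = REPORTED_COVERAGE * GENOME_LENGTH_BP / NUM_READS

print(f"mismatch rate: {mismatch_rate:.2%}")
print(f"implied mean read length: {implied_read_length:.0f} bp")
```

The recomputed mismatch rate agrees with the 0.23% given in the text. The implied mean read length is only as reliable as the rounded 20x coverage figure it is derived from.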
The close correspondence between the newly sequenced strain, assembled from traditional small-insert clones and Sanger sequencing, and the five previously sequenced North American strains, sequenced using Solexa/Illumina methods without traditional cloning, verifies the general accuracy of both approaches. Genome of the secondary aphid symbiont Candidatus Regiella insecticola. Most of the genome of Regiella was also sequenced in connection with the pea aphid genome project. Regiella infects a range of aphid species, including the pea aphid, in which it is sporadically distributed among individuals and populations. It is often intracellular, like Buchnera (Fig. 2), but also lives extracellularly in the hemolymph. In contrast to Buchnera, secondary symbionts such as Regiella are at low titer in host tissues, resulting in low representation of symbiont DNA in samples. Because the aphid sequencing project was carried out on a strain cured of Regiella infection, most of the Regiella sequence was obtained by constructing a large-insert (BAC) clone library from DNA isolated from infected hosts and sequencing clones categorized as symbiont-derived on the basis of end sequences. This produced a sequence in two scaffolds estimated to represent at least 98% of the single circular chromosome that makes up the entire Regiella genome. Contrasting the gene inventories of Buchnera and Regiella illustrates the very different lifestyles of these two bacterial symbionts (Table XX). Buchnera, as previously known, possesses a highly reduced genome composed largely of essential genes and genes for host nutrition, and completely lacks mobile elements, phage, and genes for toxin production, all of which are present in Regiella. Regiella possesses a larger genome, intermediate in size between that of free-living Gammaproteobacteria and that of Buchnera.
It also contains many genes involved in transport and invasion, and it has a lower overall coding density, reflecting the recent degradation of genes from which DNA persists in the genome. … Development in a Polymorphic Hemimetabolous Insect Overview of Development. Aphids display a wide range of adult phenotypes and possess two divergent modes of embryonic development: parthenogenetic and sexual embryogenesis (Miura et al. 2003). Remarkably, all phenotypes and both modes of embryonic development are encoded by a single genome. This ability of a single genotype to produce different phenotypes in response to environmental cues is an example of phenotypic plasticity, which in cases like aphids, where plasticity results in the production of discrete forms without intermediates, is known as polyphenism (Nijhout 2003). Polyphenism requires that an input signal (the environmental cue) coincide with a critical period of development during which the developing organism (usually an embryo) is able to respond to this input. A parthenogenetic adult female aphid contains embryos at all stages of differentiation and development, ensuring that at least one embryo will be responsive to an environmental trigger. Aphid polyphenism includes soldiers in gall-forming aphids (Ijichi et al. 2005), a switch from the production of apterous to winged individuals in response to crowding or predation (Braendle et al. 2006), and, extraordinarily, a switch of reproductive strategy from apomictic parthenogenesis to sexual reproduction in response to seasonal change (Lees 1959, reviewed in Le Trionnaire et al. 2008). Signaling pathways and transcription factors. Genes of the highly conserved TGF-β, Wnt, EGF and JAK/STAT signaling pathways have undergone several aphid-specific duplications and losses.
Multiple paralogs were found for Dpp (4 paralogs, TGF-β ligand), Medea (5, TGF-β co-Smad), Mad (2, TGF-β R-Smad), Domeless (4, JAK/STAT receptor), STAT (2, JAK/STAT transcription factor), Argos (4, negative regulator of EGF signaling) and Armadillo (2, β-catenin in Wnt signaling). Aphid-specific gene losses were found for several TGF-β ligands (BMP10, Maverick and Alp23), Wnt ligands (Wnt6, Wnt10) and Sprouty (RTK signaling inhibitor). Most of the transcription factor families are similar in size and composition to those of other insects. However, the aphid has significantly more zinc-finger-containing proteins. Although the number of bHLH-containing genes looks similar to that of other insects, direct orthologs of the achaete-scute genes cannot be found in the aphid genome. All HOX complex genes are present, but Hox3 (zen) and ftz, which have evolved non-homeotic functions in insects, are highly diverged from orthologs of other species. Circadian rhythm. In Drosophila, two interdependent transcriptional negative feedback loops centered on the genes Clock and Period are essential for the circadian clock (Cyran et al., 2003). The clock feedback loop is highly conserved in the pea aphid, with well-conserved orthologs of the genes Clock, Vrille and PDP1 all present in the genome. In contrast, the period feedback loop is not well conserved; the gene Period in A. pisum does not contain the motifs necessary for nuclear import and the whole protein seems to be evolving at accelerated rates (circadian rhythm companion paper). Other participants of the circadian clock, the cryptochromes Cry1 and Cry2 (Yuan et al. 2007), are present in the pea aphid genome; the latter is duplicated. The circadian clock repertoire includes a collection of additional genes; of these, orthologs of Drosophila kinases Double-Time, Shaggy and Casein Kinase 2, as well as Protein Phosphatase 2a and the protein degradation protein Supernumerary Limbs are relatively well conserved in A.
pisum (Table X, Comparison of circadian rhythm genes across insects). Neuropeptides. Neuropeptides are cell-to-cell signaling molecules that act as hormones, neurotransmitters or neuromodulators and are possibly involved in transmitting the environmental input to the target tissues in polyphenism (Hardie 1987). In A. pisum there are about 30 neuropeptide precursor genes that can generate more than 88 neuropeptides. The PDF and corazonin precursor genes were not found. This is surprising, as PDF is known to be involved in circadian rhythm and corazonin has already been found in hemipterans (Rhodnius) and is known to regulate migratory phase transition in Locusta and Schistocerca. Mitosis, meiosis and cell cycle. Most of the genes involved in mitosis, meiosis and the cell cycle are present in the pea aphid genome. Remarkably, the complement of meiotic genes is more similar to that of vertebrates than to that of other insects because it retains meiotic genes (Hop2, Mnd1, Msh4, Msh5) lost in many other insects and the Ecdysozoa. The complement of genes known to regulate the transition from G1 (growth) to S (DNA replication) phase in other organisms is similar in the pea aphid to that of other insects and metazoans; D-type and E-type cyclins, E2F transcription factors and Retinoblastoma protein are all found in the pea aphid genome. Interestingly, compared to Drosophila, the pea aphid genome contains duplications of several mitotic kinases, such as Cdk1, Polo and Aurora. Expression studies implicate these mitotic kinase paralogs in the regulation of aphid reproductive polyphenism (Srinivasan et al., companion paper). The aphid has several duplicated mitosis-related genes that are single-copy in other insects but duplicated in Daphnia, including Smc6 (Structural maintenance of chromosomes 6; Uniprot:Q96SB8; ARP1_G1616) and Topo2 (DNA Topoisomerase 2; Uniprot:Q92547; ARP1_G1509).
These are involved in DNA double-strand breaks and homologous recombination (Harvey et al 2002; Hwang et al 2008). While neither loss nor expansion of cell cycle or meiosis genes is sufficient to account for aphid reproductive plasticity, the expansion of key mitotic regulatory kinases raises interesting questions about the role of these kinase paralogs in aphid mitotic and meiotic reproductive plasticity. Embryogenesis. The majority of the genes involved in axis formation, segmentation, neurogenesis, eye development and germ-line specification in the embryo are well conserved. Genes that play critical roles in Drosophila embryogenesis but are not found in non-dipteran insects, such as oskar (germ-line specification), bicoid (anterior development) and gurken (dorso-ventral patterning), are also missing from aphids. Despite the absence of these orthologs, the downstream components of the signaling pathways are well conserved. Lineage-specific gene losses were found for the gap genes giant and huckbein. Although a single homolog of the Drosophila anterior gap gene otd has been identified, it is not expressed in the anterior (Huang et al. companion paper). Some orthologous genes for establishing the body plan have undergone aphid-specific gene duplications. For example, spatzle and Dorsal, the key components of dorso-ventral patterning, have been duplicated. There are two paralogs of Torso-like, the most conserved molecule in the terminal patterning pathway. More striking examples of duplications are seen in the key genes of germ-line development: the aphid genome has 3 paralogs of vasa and 4 paralogs of nanos, of which only 1 and 2, respectively, are expressed in the germline (Chang et al. 2007, 2008, and unpublished data). Juvenile hormones. The main enzymes responsible for the synthesis and degradation of juvenile hormones (JH) are present in the pea aphid genome (Table X JH related genes in pea aphid and their methylation status, JH companion paper).
However, the pea aphid apparently lacks other JH associated proteins such as hexamerins, which constitute a class of JH binding proteins implicated in social insect caste regulation (Zhou et al. 2007). JH has been postulated to regulate polyphenism in insects, including aphids (Corbitt & Hardie 1985, Nijhout 2003). The coding sequence of JH binding protein is differentially methylated between apterous and alate morphs of the pea aphid (Walsh et al., companion paper). However, no correlation has been demonstrated between JHIII titre and the proportion of winged offspring from aphids induced during their parthenogenetic phase (Schwartzberg et al. 2008). Hemimetabolous development. In holometabolous insects such as Drosophila, wing development occurs primarily during the last larval instar and pupal stages, either directly from the ectoderm or from sequestered imaginal discs. In contrast, in hemimetabolous insects such as aphids, wing development progresses gradually across all nymphal instars. Despite this marked developmental difference between holometabolous and hemimetabolous insects, all major components of Drosophila wing development are conserved in the pea aphid. However, in contrast to holometabola, the pea aphid has fewer genes encoding for chitinase, an enzyme with chitinolytic activities that degrades old cuticles and the peritrophic membrane. This difference possibly reflects the fact that hemimetabolous insects do not require dramatic exoskeletal reconstruction. Sex determination and dosage compensation. Sex determination in aphids is chromosomal, where females have two X chromosomes and males only one (Wilson et al 1997) while both sexes share the same autosomal complement. Insect sex determination pathways appear diverse and only moderately conserved through evolutionary time in insects. The pea aphid has homologs to the terminal two genes, transformer 2 and doublesex, of the D. melanogaster somatic sex determination pathway. 
However, the pea aphid gene model for doublesex is currently incomplete because it is missing the dimerization domain. Several genes associated with Drosophila sexual differentiation and courtship behavior are also conserved in the pea aphid genome (Huang et al companion paper). Dosage compensation in Drosophila involves hypertranscription of the single male X chromosome (Arnold et al 2008). While some Drosophila dosage compensation genes are present in the pea aphid genome (maleless, males-absent-on-the-first), others, such as the male-specific-lethal genes (msl-1, msl-2) and RNA genes (roX1, roX2), are absent. Since some dosage compensation genes are also found in the honeybee (mle, mof and msl-3), an insect that lacks X-specific dosage compensation (Honeybee Consortium, 2006), it remains an open question whether the genes present in aphids function in dosage compensation or are more broadly associated with chromatin structure and transcriptional regulation. Discussion Other sections being written currently? Polyphenism. Living organisms adapt their physiology to a changing environment during their life span. Aphids, through their high capacity for phenotypic plasticity, adapt not only their physiology but also their embryogenesis program. Adults exposed to environmental changes are thus able to produce progeny morphs that are genetically identical to themselves but better adapted to the new environment. This requires sensing and transducing environmental signals, leading to regulation of genetic developmental programs. One of the most prominent changes in the environment sensed by aphids is reduced day length in autumn, which is responsible for the switch from viviparous parthenogenesis to oviparous sexual reproduction. It is still under debate whether sensing and transducing this seasonal photoperiodism is linked to the circadian clock as a mechanism to measure day length (Van Nunez and Hardie 2001).
Pea aphid genome analysis showed that the period feedback loop is not well conserved. The high rate of evolution of the pea aphid period protein is intriguing and suggests an atypical circadian mechanism. The absence of the PDF precursor gene from the pea aphid genome strengthens this observation: PDF is a well-known regulator of the circadian rhythm that operates with the brain master clock. In D. melanogaster, PDF is secreted in a circadian manner and perturbation of its expression causes arrhythmicity (Helfrich-Förster 2005). Early experiments in the 1980s strongly suggested a role for JHs in transducing, from the brain to the ovaries, the photoperiodic signals responsible for the switch of reproductive mode (Corbitt and Hardie 1985). JHs also regulate the development of winged or apterous morphs (Braendle et al. 2006). The synthesis, complex formation and degradation of JHs, which are very potent morphogenic molecules, are highly regulated, and this repertoire is largely present in the pea aphid genome. However, the JH binding protein hexamerin, which has been shown to play a key role in phenotypic plasticity in termites and honeybee (ref), was not found in the pea aphid genome. Interestingly, some JH binding proteins have been found to be differentially methylated between apterous and winged morphs. This suggests that epigenetic regulation might be involved in aphid phenotypic plasticity (Wang et al. 2006). The expansion of part of the small non-coding RNA machinery in the pea aphid could also play a role in epigenetic regulation (Brennecke et al. 2008). In particular, the duplication of Dicer 1 and the preferential expression of one of its paralogs in sexual morphs suggest a role for miRNAs in the regulation of reproductive mode, and therefore in phenotypic plasticity.
The final effect of JHs during seasonal photoperiodism is to determine the fate of the germaria of developing embryos: either to undergo a modified mitosis and enter parthenogenetic embryogenesis, or to enter meiosis and produce sexual female gametes. Parthenogenetic aphids derive from a modified mitosis of oocyte stem cells, in which a single division free of recombination produces two diploid cells: a polar body that degenerates and a diploid oocyte that immediately undergoes synchronous mitotic divisions and embryogenesis. The regulation of this process is not understood and needs to be studied in more detail. Differences between the cell cycle genes of the pea aphid and those of Drosophila have been identified, and expression studies indicate preferential regulation of mitotic kinase paralogs in different reproductive morphs. Although the post-fertilization embryology of oviparous development is typical for true bugs, the embryology of viviparous development differs in several profound ways (Miura et al. 2003). Viviparous embryos take 10-15 days to develop, while oviparous embryos can take more than 100 days. Viviparous eggs are yolk-free and as a result are less than one tenth the length of oviparous eggs. The endosymbiotic bacteria are transferred into the parthenogenetic embryo just after cellularization, whereas bacteria are packaged into sexual eggs before fertilization. The central question regarding these polyphenisms is how a single genome gives rise to such divergent developmental outcomes. We comprehensively identified developmental genes in the pea aphid genome. The majority of genes involved in signaling pathways, establishment of the body plan and organogenesis are well conserved, but peculiar lineage-specific gene expansions and losses were also found. It is unclear how much these gene duplications and losses have affected the developmental pathways of the pea aphid.
It is also likely that the same genes are used in different ways in the two modes. Future studies should focus on comparative analyses of how the developmental genes identified here are used in such divergent embryological contexts. The sex determination and dosage compensation mechanisms in aphids are not known. One could hypothesise that the developmental programs of XX females and X0 males in aphids follow the same general mechanisms as those in Drosophila, where the ratio between the numbers of X chromosomes and autosomes dictates a cascade of alternative splicing regulation leading to the expression of female-specific or male-specific genetic programs. Key regulators of sex determination in Drosophila, such as Sex-lethal, have not been found in the pea aphid, but these are, in general, poorly conserved between species. Surprisingly, the doublesex genes, transcription factors responsible for the expression of female-specific or male-specific genetic programs, differ slightly from those of other insects, which might suggest a different regulation of sexual phenotypes in Hemiptera.

Materials and Methods

Sequencing strain. Aphids for DNA isolation were from a clone, LSR1.AC.F1, resulting from a single generation of inbreeding of clone LSR1. Aphids were treated with ampicillin to reduce their only facultative symbiont, Regiella insecticola. Prior to DNA preparation, aphids were heat treated to reduce the number of primary symbionts, Buchnera aphidicola: entire aphid colonies on broad bean plants were placed in a 30°C incubator for 4 days. Quantification of Buchnera DNA revealed a significant decrease in the level of Buchnera. Approximately 2% of the sequencing reads came from the Buchnera genome and were removed prior to assembly. WHERE CAN THE STRAIN BE OBTAINED FROM?

Sequencing and Assembly, Acyr 1.0.
Sanger reads (3.13 million), produced on 3730 sequencing machines (Applied Biosystems, Foster City, CA), were assembled using the Atlas assembly pipeline; they represent about 464 Mb of sequence and about 6.2X coverage of the (clonable) A. pisum genome. Two WGS libraries, with inserts of 2-3 kb and 4-5 kb, and a BAC library with an insert size of ~130 kb were used to produce the data. The assembly contained 72,844 contigs, with an N50 length of 10.8 kb and a total length of 446.6 Mb. Based on paired-end data, these contigs were ordered and oriented into 22,801 scaffolds, with an N50 length of 86.9 kb and a total length of 464.3 Mb when gaps between contigs within scaffolds are included. The LSR1 pea aphid genome sequence is available from the NCBI with project accession ABLF01000000.

Automated Gene Model Prediction. The NCBI gene prediction pipeline uses a combination of homology searching and ab initio modeling. cDNAs and ESTs were aligned to the genomic sequences using Splign[1]. Proteins were aligned to the genomic sequences using ProSplign[2]. The best-scoring CDS was identified for each cDNA alignment using the same scoring system as Gnomon[3], the NCBI ab initio prediction tool. All cDNAs with a CDS scoring above a certain threshold were marked as coding cDNAs, and all others were marked as UTRs. CDSs that lacked a translation initiation or termination signal were categorized as incomplete. Protein alignments were scored in the same way, and CDSs that did not satisfy the threshold criterion for a valid CDS were removed. After determining the UTR/CDS nature of each alignment, the alignments were assembled using a modification of the Maximal Transcript Alignment algorithm[4], taking into account not only exon-intron structure compatibility but also the compatibility of the reading frames. Two coding alignments were connected only if they both had open and compatible CDSs.
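The complete/incomplete distinction for candidate CDSs described above can be illustrated with a minimal sketch. It is not the NCBI pipeline: it assumes, for illustration only, that the translation initiation signal is a leading ATG, that the termination signal is one of the three standard stop codons, and that a complete CDS has a length divisible by three. The function name classify_cds is hypothetical.

```python
# Illustrative only: a simplified stand-in for the completeness check.
# Assumptions (not from the pipeline description): start signal = ATG,
# stop signal = TAA/TAG/TGA, and a complete CDS is a whole number of codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_cds(cds: str) -> str:
    """Return 'complete' if the CDS has start and stop signals, else 'incomplete'."""
    seq = cds.upper()
    has_start = seq.startswith("ATG")
    has_stop = len(seq) >= 3 and seq[-3:] in STOP_CODONS
    whole_codons = len(seq) % 3 == 0
    if has_start and has_stop and whole_codons:
        return "complete"
    return "incomplete"
```

Under these assumptions, a model such as ATGAAATAA would be labelled complete, whereas the same sequence without its ATG or without its stop codon would be labelled incomplete and, in the pipeline described here, passed on for ab initio extension.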
UTRs were connected to coding alignments only if the necessary translation initiation or termination signals were present. There were no restrictions on the connection of UTRs other than exon-intron structure compatibility. All assembled models with a complete CDS, including the translation initiation and termination signals, were combined into alternatively spliced isoform groups. Incomplete or partially supported models were directed to Gnomon[3] for extension by ab initio prediction. Models containing a debilitating mutation, such as a frameshift or nonsense mutation, were categorized as either transcribed or non-transcribed pseudogenes. A subset of pseudogenes are likely to be functional genes affected by errors in the Acyr_1.0 assembly, and may be reclassified as protein-coding genes with subsequent improvements to the assembly and annotation. Gnomon[3] was also used to predict pure ab initio models in regions of the genome that lacked any cDNA, EST or protein alignments.

AphidBase. The assembled Acyrthosiphon pisum genome was scanned broadly for transcription evidence. ESTs, EST contigs and full-length cDNAs were mapped using SIM-4, whereas homologs in other insect genomes or in UniProt were identified by high-throughput BLAST searches. In addition, various gene prediction programs (Augustus, RefSeq, Genscan, Maker, Snap, GeneID, Gnomon, Fgenesh) were run, and a reference set of 34,821 putative genes was established, based on the RefSeq predictions when present or otherwise on a combination of the other predictions using Glean. In parallel, the 27 annotation groups used Apollo to curate about a thousand of these gene models. All of these features have been loaded into a GMOD-Chado database (ref) accessible at the AphidBase web portal.
In addition to a wiki, a BLAST search and a full-text search engine, AphidBase uses various open source software tools from the Generic Model Organism Database (GMOD), in particular the graphical genome browser GBrowse (ref) and the manual curation software Apollo (ref).

Symbiont sequences. During whole genome sequencing of the LSR1 clone of A. pisum, 24,947 sequence reads corresponding to the Buchnera genome were obtained as byproducts. Using these “contaminants”, the whole genome of Buchnera was reconstructed by two distinct methods: de novo assembly using the CAP3 software and comparative (reference mapping) assembly using AMOS. The results of both methods were essentially the same, but the latter produced longer and fewer contigs.

Transposable Elements detection. We used methods for de novo TE identification, which are required to overcome the challenge of detecting nested and fragmented TEs. The “REPET” pipeline (http://urgi.versailles.inra.fr/development/repet/) that we have developed was used and improved to analyze the pea aphid genome. TE consensus sequences were predicted “ab initio” by first searching for repeats with BLASTER in an all-by-all genome comparison and then grouping the results using three clustering methods (GROUPER, RECON, PILER) with default parameters. We then built one consensus per group with the MAFFT multiple sequence alignment program and classified each consensus according to BLASTER matches, using TBLASTX and BLASTX with the entire Repbase Update (for coding TE features) as the reference data bank, and according to the presence of structural features such as terminal repeats (TIR, LTR, SSR tails). For example, a consensus is defined as a MITE if (i) it carries TIRs; (ii) it does not match known TEs via tBlastx or Blastx [6]; and (iii) its length without its TIRs is less than 500 bp.
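The three-part MITE rule above lends itself to a short illustration. The sketch below is not code from the REPET pipeline; the Consensus record, its field names and the is_mite function are hypothetical, and it assumes that both TIRs of a consensus have the same length.

```python
from dataclasses import dataclass


@dataclass
class Consensus:
    """Hypothetical summary of one TE consensus sequence."""
    name: str
    length: int             # full consensus length in bp
    tir_length: int         # length of one terminal inverted repeat, 0 if none
    matches_known_te: bool  # True if tBLASTx/BLASTx finds a known TE


def is_mite(c: Consensus, max_internal_bp: int = 500) -> bool:
    """Apply the three criteria: TIRs present, no known-TE match, short interior."""
    has_tirs = c.tir_length > 0
    # Assumption: the two TIRs are equally long, so the interior is
    # the full length minus twice the TIR length.
    internal_length = c.length - 2 * c.tir_length
    return has_tirs and not c.matches_known_te and internal_length < max_internal_bp
```

For instance, a 300 bp consensus with 20 bp TIRs and no similarity to known TEs would be called a MITE, whereas the same consensus with a BLASTX hit to a known transposase would not.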
The set of consensus sequences was then analyzed by an all-by-all BLASTER procedure to remove redundancies, i.e., cases where one consensus sequence is included in another at a 95% identity threshold and a 98% length threshold. This step yielded TE consensus sequences representing ancestral copies of TE subfamilies. These were then clustered into groups, using the GROUPER clustering method, to identify TE families. Each family was then classified on the assumption that the most populated well-characterized TE category within a group of consensus sequences defines the order to which the group belongs. Eighty-five families containing at least 5 TE consensus sequences were then manually curated using multiple sequence alignments, phylogenies and Hidden Markov Models. This close examination allowed us to confirm the grouping and to decipher specific features such as chimeric TE families or subfamilies. The pea aphid genome was then annotated with all the subfamily TE consensus sequences (the output of the de novo step) using the annotation step of the “REPET” pipeline. This step combines the TE detection programs BLASTER, RepeatMasker [7] and Censor with the satellite detection programs RepeatMasker, TRF [9] and Mreps [10]. To save computer time and reduce memory requirements, we segmented the genomic sequences into chunks of 200 kb overlapping by 10 kb. Each chunk was then analyzed independently by the different programs. Simple repeats were used to filter out spurious hits. TE or repeat copies shorter than 20 bp after removal of simple repeat regions were discarded. Because TEs often insert into one another, so that fragments belonging to the same copy become separated, a specific “long join” annotation procedure was performed using age estimates of repeat fragments. Indeed, the identity percentage between a fragment and its reference TE/repeat consensus can be used to estimate the age of this fragment.
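The segmentation of genomic sequences into 200 kb chunks overlapping by 10 kb, mentioned above, can be sketched as follows. This is only an illustration of the coordinate arithmetic, not the pipeline's own code; the function name chunk_coordinates and the use of 0-based, half-open coordinates are assumptions made here.

```python
def chunk_coordinates(seq_len: int, chunk_size: int = 200_000, overlap: int = 10_000):
    """Return 0-based half-open (start, end) windows covering a sequence.

    Consecutive windows share `overlap` bases; the last window is truncated
    at the end of the sequence.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if seq_len <= 0:
        return []
    step = chunk_size - overlap
    chunks = []
    start = 0
    while True:
        end = min(start + chunk_size, seq_len)
        chunks.append((start, end))
        if end >= seq_len:
            break
        start += step
    return chunks
```

With the default values, a 450 kb scaffold would be split into three windows, (0, 200000), (190000, 390000) and (380000, 450000), so that a repeat spanning a chunk boundary is still seen whole in at least one window as long as it is shorter than the overlap.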
Consecutive fragments on both the genome and the same reference repeat consensus are automatically joined if their identity percentages differ by less than 2% (the two fragments have approximately the same age) and (i) if they are separated by a gap of less than 5000 bp and/or by a mismatch region of less than 500 nucleotides, or (ii) if there are nested repeats: the fragments are separated by a sequence of which more than 95% consists of other, younger repeat insertions, all inserts having a higher identity to their respective consensus. Fragments separated by more than 100 kb are not joined. At the end, nested repeats are split if inner repeat fragments are longer than outer joined fragments. Finally, the Acyrthosiphon pisum genome gives the opportunity to compare the evolutionary dynamics and diversity of mobile genetic elements between this organism and other species. To assist further research on this topic, we explored new proteome-based protocols: we additionally screened the aphid genome using 28 full-length, single-frame proteome sequences, which concatenate the Gypsy database [12,13] collection of majority-rule consensus (MRC) sequences. This collection describes the different protein products encoded by the gag-pol (and env) internal region of different LTR retroelement lineages. In collaboration with the Virus Transmission and Transcytosis Annotation Team (LTR retroelements include not only retrotransposons but also infectious retroviruses), we characterized a number of protein sequences belonging to the main lineages described in the pipeline, in particular those of the Ty3/gypsy, Bel and Ty1/Copia groups. The curated material has been organized within a Refseq database of prototypic LTR retroelement protein products, named Retroproteome. For simplicity's sake, we will prepare a supplementary manuscript describing this tool, the availability of BLAST web servers, and examples of its utility.

Phylome reconstruction.
We reconstructed the complete collection of phylogenetic trees, also known as the “phylome”, for all A. pisum protein-coding genes. For this we used an automated pipeline similar to that described earlier for the human genome (Huerta-Cepas et al. 2007). A database was created containing the A. pisum proteome and those of 16 other species. These include 12 other insects: Tribolium castaneum, Nasonia vitripennis, Apis mellifera (from the NCBI database), Drosophila pseudoobscura, Drosophila melanogaster, Drosophila mojavensis, Drosophila yakuba (from FlyBase), Pediculus humanus, Culex pipiens (from VectorBase), Anopheles gambiae, Aedes aegypti (from Ensembl) and Bombyx mori (from SILKDB), and four outgroups: the crustacean Daphnia pulex (the GNOMON predicted set provided by the JGI institute), the nematode Caenorhabditis elegans and the two chordates Ciona intestinalis and Homo sapiens (from Ensembl). Then, for each protein encoded in the A. pisum genome, a Smith-Waterman (Smith and Waterman, 1981) search (e-value threshold 10^-3) was performed against the above-mentioned proteomes. Sequences that aligned over a continuous region longer than 33% of the query sequence were selected and aligned using MUSCLE 3.6 (Edgar, 2004) with default parameters. Gappy positions in the alignments were removed using trimAl v1.0 (http://bioinfo.cipf.es/trimal), with a gap threshold of 10% and a conservation threshold of 50%. Phylogenetic trees were derived using Neighbor Joining (NJ) with scoredist distances as implemented in BioNJ (Gascuel, 1997) and Maximum Likelihood (ML) as implemented in PhyML v2.4.4 (Guindon and Gascuel, 2003) with aLRT, using JTT as the evolutionary model and assuming a discrete gamma-distribution model with four rate categories and invariant sites, where the gamma shape parameter and the fraction of invariant sites were estimated from the data. Support for the different partitions was computed by the approximate likelihood ratio test as implemented in PhyML (aLRT) (M. Anisimova and O.
Gascuel, 2006). All trees and alignments have been deposited in PhylomeDB (Huerta-Cepas et al. 2008) (http://phylomedb.org).

Phylogeny-based orthology determination. Orthology and paralogy relationships among A. pisum genes and those encoded in the other genomes included in the analysis were inferred by a phylogenetic approach that uses a previously described species-overlap algorithm (Huerta-Cepas et al. 2007). In essence, this algorithm uses the level of species overlap between the two daughter partitions of a given node to define it as a duplication (if there is species overlap) or a speciation (if there is no overlap). After all duplications and speciations have been mapped onto the phylogenetic tree of a given gene family, all orthology and paralogy relationships are inferred accordingly. All orthology and paralogy predictions can be accessed through PhylomeDB.

Detection of aphid-specific gene expansions. The duplication events defined by the above-mentioned species-overlap algorithm that comprised only paralogs from A. pisum were considered lineage-specific duplications. Whenever more than one round of duplication followed the A. pisum speciation event (family expansion), all resulting paralogs were grouped into a single group of “in-paralogs”. Results from all the trees in the phylome were merged into a non-redundant list of in-paralog groups by merging groups that shared a significant fraction of their members (50%).

Orthology-based functional annotation. A list of orthology-based transfers of functional annotation was built from the phylogeny-based orthology relationships with Drosophila melanogaster. A. pisum genes with orthology relationships to annotated D. melanogaster genes were grouped according to the type of orthology relationship. 4058 aphid genes could be annotated based on a clear one-to-one orthology relationship with a Drosophila gene.
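The species-overlap rule described above under phylogeny-based orthology determination can be illustrated with a minimal sketch. It is not the PhylomeDB implementation: the tree is represented here as nested two-element tuples, leaf names are assumed to follow a hypothetical "Species|gene" convention, and any overlap at all between the two daughter partitions is treated as a duplication.

```python
from itertools import product


def species_of(leaf: str) -> str:
    """Extract the species tag from a hypothetical 'Species|gene' leaf name."""
    return leaf.split("|")[0]


def leaves(tree):
    """List the leaf names below a node; a leaf is a plain string."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def infer_relationships(tree, orthologs=None, paralogs=None):
    """Label each internal node and collect ortholog and paralog gene pairs.

    A node whose two daughter partitions share a species is a duplication
    (pairs across it are paralogs); otherwise it is a speciation (orthologs).
    """
    if orthologs is None:
        orthologs, paralogs = set(), set()
    if isinstance(tree, str):
        return orthologs, paralogs
    left, right = tree
    left_leaves, right_leaves = leaves(left), leaves(right)
    overlap = {species_of(x) for x in left_leaves} & {species_of(x) for x in right_leaves}
    target = paralogs if overlap else orthologs
    for a, b in product(left_leaves, right_leaves):
        target.add(frozenset((a, b)))
    infer_relationships(left, orthologs, paralogs)
    infer_relationships(right, orthologs, paralogs)
    return orthologs, paralogs
```

For the toy tree ((("Apisum|a1", "Apisum|a2"), "Dmel|d1"), "Hsap|h1"), the innermost node is called a duplication, so a1 and a2 are in-paralogs, and both are co-orthologous to d1, which corresponds to the many-to-one case.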
An additional 2315 genes presented a many-to-one orthology relationship with annotated Drosophila genes and can thus tentatively be annotated with the GO terms associated with the fly genes, with the caveat that neo-functionalization and sub-functionalization may have occurred.

Species tree reconstruction. 197 genes having a single-copy ortholog in all the species included in the analyses were selected to infer a species phylogeny. Alignments performed with MUSCLE as previously described were concatenated into a superalignment containing 144,922 positions. The removal of positions with gaps in more than 50% of the sequences resulted in a final alignment of 109,422 positions. This alignment was used for Maximum Likelihood (ML) tree reconstruction as implemented in PhyML v2.4.4 (Guindon and Gascuel, 2003), using JTT as the evolutionary model and assuming a discrete gamma-distribution model with four rate categories and invariant sites, where the gamma shape parameter and the fraction of invariant sites were estimated from the data. Bootstrap analysis was performed on the basis of 100 replicates.

Detection of Odorant Binding Proteins. Gene sequences can be predicted to encode OBPs when their predicted proteins have: 1) an α-helix pattern, 2) the conserved cysteine residues with the expected spacing between them, 3) a globular water-soluble nature and 4) a signal peptide. Given this, genes encoding OBPs in the A. pisum genome were identified and annotated using several approaches. The A. pisum EST database (167,706 sequences) and the whole genome sequence were searched using: 1) an algorithm to identify the conserved cysteine motif C1-X8-41-C2-X3-C3-X21-47-C4-X7-15-C5-X8-C6 in the 6-frame translated sequences; 2) rps-BLAST with the PBP/GOBP (pfam01395) and CSP (pfam03392) conserved domains; and 3) tBLASTn and PSI-BLAST using known OBPs from other insects as the ‘query’.
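The cysteine-motif search in item 1 can be approximated with a regular expression. The sketch below is not the algorithm used by the annotators; it assumes that the fourth spacer of the motif reads X21-47, that X stands for any residue other than a stop (written here as '*'), and that the input is an already translated protein sequence (six-frame translation is left out). The names OBP_MOTIF and find_cysteine_motifs are hypothetical.

```python
import re

# C1-X8-41-C2-X3-C3-X21-47-C4-X7-15-C5-X8-C6, with X = any non-stop residue.
# The pattern is wrapped in a lookahead so that overlapping candidates
# starting at different cysteines are all reported.
OBP_MOTIF = re.compile(
    r"(?=(C[^*]{8,41}C[^*]{3}C[^*]{21,47}C[^*]{7,15}C[^*]{8}C))"
)


def find_cysteine_motifs(protein: str):
    """Return 0-based half-open (start, end) spans of candidate motifs."""
    return [(m.start(1), m.end(1)) for m in OBP_MOTIF.finditer(protein.upper())]
```

A hit only flags a candidate: as the criteria above indicate, a predicted OBP would still need to be checked for a signal peptide and the expected helical, globular character.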
The genome sequence was also searched using tBLASTn and HMMER against the 6-frame translated sequences (known OBPs from other insects were used as query for all BLAST searches; Pfam profiles were used on all HMMER searches). References (not in alphabetical order) M. Anisimova and O. Gascuel, "Approximate likelihood ratio test for branchs: A fast, accurate and powerful alternative," Systematic Biology, 55(4), 539-552, 2006 Ashford, D.A., Smith, W.A., Douglas, A.E., 2000. Living on a high sugar diet: the fate of sucrose ingested by a phloem-feeding insect, the pea aphid Acyrthosiphon pisum. J. Insect Physiol. 46, 335–341. A. E. Douglas, D. R. G. Price, L. B. Minto, E. Jones, K. V. Pescod, C. L. M. J. François, J. Pritchard and N. Boonham (2006) Sweet problems: insect traits defining the limits to dietary sugar utilisation by the pea aphid, Acyrthosiphon pisum. Journal of Experimental Biology 209, 1395-1403 (2006) Kikawada et al., 2007, PNAS 104:11585-90 Rhodes et al. Dietary sucrose and oligosaccharide synthesis in relation to osmoregulation in the pea aphid, Acyrthosiphon pisum. Physiol Entomol (1997) vol. 22 (4) pp. 373-379 D. R. G Price, A. J Karley, D. A Ashford, H. V Isaacs, M. E Pownall, H. S Wilkinson, J. A Gatehouse, A. E Douglas (2007) Molecular characterisation of a candidate gut sucrase in the pea aphid, Acyrthosiphon pisum. Insect Biochem. Mol. Biol. 37, 307-317 Arnold, A. P., Itoh, Y. & Melamed, E. (2008) A Bird's-Eye View of Sex Chromosome Dosage Compensation. Annual Review of Genomics and Human Genetics, 9, 109-127. Bloch, G; Toma, DP; Robinson, GE. 2001. Behavioral rhythmicity, age, division of labor and period expression in the honey bee brain. JOURNAL OF BIOLOGICAL RHYTHMS Volume: 16 Issue: 5 Pages: 444-456 Braendle, C., G. K. Davis, J. A. Brisson, and D. L. Stern. 2006. Wing dimorphism in aphids. Heredity 97:192-199. Brisson, J. A., G. K. Davis, and D. L. Stern. 2007. 
Common genome-wide transcription patterns underlying the wing polyphenism and polymorphism in the pea aphid (Acyrthosiphon pisum). Evol. Dev. 9:338-346. Burmester, T; Scheller, K. 1999. Ligands and receptors: Common theme in insect storage protein transport. NATURWISSENSCHAFTEN Volume: 86 Issue: 10 Pages: 468-474. Chang, C-c, G. W. Lin, C. E. Cook, S. B. Horng, H. J. Lee, and T. Y. Huang. 2007. Apvasa marks germ-cell migration in the parthenogenetic pea aphid Acyrthosiphon pisum (Hemiptera: Aphidoidea). Dev Genes Evol. 217:275-287. PMID: 17333259 Chang, C-c, T. Y. Huang, C. E. Cook, G. W. Lin, C. L. Shih, and R. P. Y. Chen. 2008. Developmental expression of Apnanos during oogenesis and embryogenesis in the parthenogenetic pea aphid Acyrthosiphon pisum. Int. J. Dev. Biol. (in press) doi: 10.1387/ijdb.082570cc Corbitt & Hardie 1985 Entomol exp appl 38 131-135, Cyran SA, Buchsbaum AM, Reddy KL, Lin MC, Glossop NRJ, Hardin PE, Young MW, Storti RV, Blau J 2003. vrille, Pdp1, and dClock form a second feedback loop in the Drosophila circadian clock. Cell. 112: 329-341. Ghanim, M., A. Dombrovsky, B. Raccah, and A. Sherman. 2006. A microarray approach identifies ANT, OS-D and takeout-like genes as differentially regulated in alate and apterous morphs of the green peach aphid Myzus persicae (Sulzer). Insect Biochemistry and Molecular Biology 36:857-868. Hardie, J; Nunes, MV 2001 Aphid photoperiodic clocks. JOURNAL OF INSECT PHYSIOLOGY Volume: 47 Issue: 8 Pages: 821-832 Ishikawa, A., S. Hongo, and T. Miura. 2008. Morphological and histological examination of polyphenic wing formation in the pea aphid Acyrthosiphon pisum (Hemiptera, Hexapoda). Zoomorphology 127:121-133. Lees AD 1959. The role of photoperiod and temperature in the determination of parthenogenetic and sexual forms in the aphid Megoura viciae Buckton. I - The influence of these factors on apterous virginoparae and their progeny. J.Ins.Physiol. 3 92-117. 
Miura T, Braendle C, Shingleton A, Sisk G, Kambhampati S, Stern DL 2003. A comparison of parthenogenetic and sexual embryogenesis of the pea aphid Acyrthosiphon pisum (Hemiptera : Aphidoidea). JOURNAL OF EXPERIMENTAL ZOOLOGY PART B-MOLECULAR AND DEVELOPMENTAL EVOLUTION Volume: 295B Issue: 1 Pages: 59-81 Muller, C. B., I. S. Williams, and J. Hardie. 2001. The role of nutrition, crowding, and interspecfic interactions in the development of winged aphids. Ecol. Ent. 26:330-340. Myers EM, 2003. The circadian control of eclosion. CHRONOBIOLOGY INTERNATIONAL Volume: 20 Issue: 5 Pages: 775-794 Nijhout, H. F. 2003. Development and evolution of adaptive polyphenisms. Evol Dev 5:918. Schwartzberg, Ezra G, Kunert, Grit, Westerlund, Stephanie, Hoffmann, Klaus H., Weisser, Wolfgang W. 2008. Juvenile hormone titres and winged offspring production do not correlate in the pea aphid, Acyrthosiphon pisum J Insect Physiol 34 Issue: 9 Pages: 1146-1148 Tagu, D; Sabater-Munoz, B; Simon, JC 2005 Deciphering reproductive polyphenism in aphids. Invertebr Reprod Dev 48 71-80 The Honeybee Genome Sequencing Consortium (2006) Insights into social insects from the genome of the honeybee Aphis mellifera. Nature, 443, 931-949. Yuan, Quan, Metterville, Danielle, Briscoe, Adriana D, Reppert, Steven M. 2007 Insect cryptochromes: Gene duplication and loss define diverse ways to construct insect circadian clocks. MOLECULAR BIOLOGY AND EVOLUTION Volume: 24 Issue: 4 Pages: 948-955 Wilson, A. C. C., Sunnucks, P. & Hales, D. F. (1997) Random loss of X chromosome at male determination in an aphid, Sitobion near fragariae, detected using an X-linked polymorphic microsatellite marker. Genetical Research, Cambridge, 69, 233-236. Zhou XG; Tarver, MR; Scharf, ME 2007. Hexamerin-based regulation of juvenile hormone-dependent gene expression underlies phenotypic plasticity in a social insect Development 134 601-610 Altincicek B, Gross J, Vilcinskas A. 2008. 
Wound-mediated gene expression and accelerated viviparous reproduction of the pea aphid Acyrthosiphon pisum. Insect Molecular Biology 17: 711-716. Anselme C, Villar A, Balmand S, Fauvarque MO, Heddi A. 2006. Host PRGP gene expression and bacterial release in endosymbiosis of the weevil Sitophilus zeamais. Applied and Environmental Microbiology 72: 6766-6772. Chen DQ, Montllor CB, Purcell AH. 2000. Fitness effects of two facultative endosymbiotic bacteria on the pea aphid, Acyrthosiphon pisum, and the blue alfalfa aphid, A-kondoi. Entomologia Experimentalis Et Applicata 95: 315-23 Dyall SD, Brown MT, Johnson PJ. 2004 Ancient invasions: from endosymbionts to organelles. Science 304:253-7. PMID: 15073369 Fenn K, Conlon C, Jones M, Quail MA, Holroyd NE, Parkhill J, Blaxter M: Phylogenetic relationships of the Wolbachia of nematodes and arthropods. PLoS Pathog 2006, 2(10):e94. Gündüz E. et al. 2009. Symbiotic bacteria enable insect to utilise a nutritionallyinadequate diet. Proceedings of the Royal Society of London B.in press. Hotopp JC, Clark ME, Oliveira DC, Foster JM, Fischer P, Torres MC, Giebel JD, Kumar N, Ishmael N, Wang S, Ingram J, Nene RV, Shepard J, Tomkins J, Richards S, Spiro DJ, Ghedin E, Slatko BE, Tettelin H, Werren JH. 2007. Widespread lateral gene transfer from intracellular bacteria to multicellular eukaryotes. Science 317:1753-6. PMID: 17761848 Harvey SH, Krien MJ, O'Connell MJ. 2002. Structural maintenance of chromosomes (SMC) proteins, a family of conserved ATPases. Genome Biol. 2002;3(2):REVIEWS3003.1-3003.5 doi:10.1186/gb-2002-3-2reviews3003 PMID: 11864377 Hwang JY, Smith S, Ceschia A, Torres-Rosell J, Aragon L, Myung K. 2008. Smc5-Smc6 complex suppresses gross chromosomal rearrangements mediated by break-induced replications. DNA Repair (Amst). 7(9):1426-36. PMID: 18585101 Kondo N, Nikoh N, Ijichi N, Shimada M, Fukatsu T. 2002. Genome fragment of Wolbachia endosymbiont transferred to X chromosome of host insect. Proc Natl Acad Sci U S A. 
99:14280-5. PMID: 12386340 Montllor CB, Maxmen A, Purcell AH. 2002. Facultative bacterial endosymbionts benefit pea aphids Acyrthosiphon pisum under heat stress. Ecological Entomology 27: 189-95 Moran NA, McLaughlin HJ, Sorek R. 2009. The dynamics and time scale of ongoing genomic erosion in symbiotic bacteria. Science 323: in press. Moran NA, Munson MA, Baumann P, Ishikawa H: A Molecular Clock in Endosymbiotic Bacteria Is Calibrated Using the Insect Hosts. P Roy Soc Lond B Bio 1993, 253(1337):167-171. Moran NA, Russell JA, Koga R, Fukatsu T. 2005. Evolutionary relationships of three new species of Enterobacteriaceae living as symbionts of aphids and other insects. Applied and Environmental Microbiology 71: 3302-10 Nakabachi A. Shigenobu S., Sakazume N., Shirake T., Hayashizaki Y., Carninci P., Ishikawa H., Kudo T, Fukatsu T. 2005. Transcriptome analysis of the aphid bacteriocyte, the symbiotic host cell that harbors an endocellular mutualistic bacterium, Buchnera. Proc. Natl. Acad. Sci. USA 102: 5477-5482. PMID: 15800043 Nikoh N, Tanaka K, Shibata F, Kondo N, Hizume M, Shimada M, Fukatsu T. 2008. Wolbachia genome integrated in an insect chromosome: evolution and fate of laterally transferred endosymbiont genes. Genome Res. 18:272-280. PMID: 18073380 Oliver KM, Russell JA, Moran NA, Hunter MS. 2003. Facultative bacterial symbionts in aphids confer resistance to parasitic wasps. Proceedings of the National Academy of Sciences of the United States of America 100: 1803-7 Oliver KM, Moran NA, Hunter MS. 2005. Variation in resistance to parasitism in aphids is due to symbionts not host genotype. Proceedings of the National Academy of Sciences of the United States of America 102: 12795-800 Russell JA, Moran NA. 2006. Costs and benefits of symbiont infection in aphids: variation among symbionts and across temperatures. Proceedings of the Royal Society B-Biological Sciences 273: 603-10 Sasaki T et al. 1990. J. Insect Physiol. 36: 35-40. 
Sandstrom JP, Russell JA, White JP, Moran NA. 2001. Independent origins and horizontal transfer of bacterial symbionts of aphids. Molecular Ecology 10: 217-28 Scarborough CL, Ferrari J, Godfray HCJ. 2005. Aphid protected from pathogen by endosymbiont. Science 310: 1781. Shakesby AJ, Wallace IS, Isaacs HV, Pritchard J, Roberts DM and Douglas AE 2008. A water-specific aquaporin involved in aphid osmoregulation. Insect Biochemistry and Molecular Biology, in press. Shigenobu S., Watanabe H., Hattori M., Sakaki Y., Ishikawa H. 2000. Genome sequence of the endocellular bacterial symbiont of aphids Buchnera sp. APS. Nature 407: 81–86. PMID:10993077 Tsuchida T, Koga R, Fukatsu T. 2004. Host plant specialization governed by facultative symbiont. Science 303: 1989. Arnold, A. P., Itoh, Y. & Melamed, E. (2008) A Bird's-Eye View of Sex Chromosome Dosage Compensation. Annual Review of Genomics and Human Genetics, 9, 109-127. PMID: 18489256 Braendle, C., G. K. Davis, J. A. Brisson, and D. L. Stern. 2006. Wing dimorphism in aphids. Heredity 97:192-199. PMID: 16823401 Corbitt, T. S. & Hardie, J. (1985) Juvenile hormone effects on polymorphism in the pea aphid, Acyrthosiphon pisum. Entomologia Experimentalis et Applicata, 38, 131-135. Cyran SA, Buchsbaum AM, Reddy KL, Lin MC, Glossop NRJ, Hardin PE, Young MW, Storti RV, Blau J 2003. vrille, Pdp1, and dClock form a second feedback loop in the Drosophila circadian clock. Cell. 112: 329-341. PMID: 12581523 Hardie J 1987. Neurosecretory and endocrine systems. In “Aphids, their biology, natural ennemies and control, Volume A”, Eds Minks,A.K.; Harrewijn,P. Elsevier, Amsterdam, Oxford, New York, Tokyo pp 139-152. Ijichi N, Shibao H, Miura T, Matsumoto T, Fukatsu T 2005. Analysis of natural colonies of a social aphid Colophina arma: population dynamics, reproductive schedule, and survey for ecological correlates with soldier production. Applied Entomology and Zoology. 40: 239-245. Lees AD 1959. 
The role of photoperiod and temperature in the determination of parthenogenetic and sexual forms in the aphid Megoura viciae Buckton. I - The influence of these factors on apterous virginoparae and their progeny. J. Ins. Physiol. 3: 92-117.
Le Trionnaire, G., Hardie, J., Jaubert-Possamai, S., Simon, J. C., Tagu, D. (2008) Shifting from clonal to sexual reproduction in aphids: physiological and developmental aspects. Biology of the Cell, 100, 441-451. PMID: 18627352
Miura T, Braendle C, Shingleton A, Sisk G, Kambhampati S, Stern DL 2003. A comparison of parthenogenetic and sexual embryogenesis of the pea aphid Acyrthosiphon pisum (Hemiptera : Aphidoidea). Journal of Experimental Zoology Part B-Molecular and Developmental Evolution 295B: 59-81
Nijhout, H. F. 2003. Development and evolution of adaptive polyphenisms. Evol Dev 5:9-18. PMID: 12492404
Schwartzberg, Ezra G, Kunert, Grit, Westerlund, Stephanie, Hoffmann, Klaus H., Weisser, Wolfgang W. 2008. Juvenile hormone titres and winged offspring production do not correlate in the pea aphid, Acyrthosiphon pisum. J Insect Physiol 34(9): 1146-1148 PMID: 18634797.
The Honeybee Genome Sequencing Consortium (2006) Insights into social insects from the genome of the honeybee Apis mellifera. Nature, 443, 931-949. PMID: 17073008
Yuan, Quan, Metterville, Danielle, Briscoe, Adriana D, Reppert, Steven M. 2007. Insect cryptochromes: Gene duplication and loss define diverse ways to construct insect circadian clocks. Molecular Biology and Evolution, 24(4): 948-955 PMID: 17244599.
Wilson, A. C. C., Sunnucks, P. & Hales, D. F. (1997) Random loss of X chromosome at male determination in an aphid, Sitobion near fragariae, detected using an X-linked polymorphic microsatellite marker. Genetical Research, Cambridge, 69, 233-236.
Zhou XG; Tarver, MR; Scharf, ME 2007. Hexamerin-based regulation of juvenile hormone-dependent gene expression underlies phenotypic plasticity in a social insect. Development 134: 601-610. PMID: 17215309
Edgar RC.
(2004) MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics, 5:113.
Gabaldón T. (2008) Large-scale assignment of orthology: back to phylogenetics? Genome Biol. Oct 30;9(10):235.
Gascuel O. (1997) BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. Mol Biol Evol, 14:685-695.
Guindon S, Gascuel O. (2003) A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol, 52:696-704.
Huerta-Cepas J, Bueno A, Dopazo J, Gabaldón T. (2008) PhylomeDB: a database for genome-wide collections of gene phylogenies. Nucleic Acids Res. 36:D491-496.
Huerta-Cepas J, Dopazo H, Dopazo J, Gabaldón T. (2007) The human phylome. Genome Biol. 8:R109.
Smith TF, Waterman MS. (1981) Identification of common molecular subsequences. J Mol Biol. 147:195-197.
[1] Yu. Kapustin, A. Souvorov, T. Tatusova. Splign - a Hybrid Approach To Spliced Alignments. RECOMB 2004 - Currents in Computational Molecular Biology. p.741.
[2] B. Kiryutin, A. Souvorov. New global protein-nucleotide alignment tool. ISMB 2005.
[3] A. Souvorov, T. Tatusova, D. Lipman. Eukaryotic Genome Annotation with Gnomon - a Multi-step Combined Gene Prediction Tool. ISMB 2004, p125.
[4] Haas BJ, Delcher AL, Mount SM, Wortman JR et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 2003 Oct 1;31(19):5654-66. (PMID: 14500829)

Tables

Table Suggestion 1. Sanger Read Statistics.
Insert Size (kb)   Raw reads   Passed reads   Assembled reads   Clone
2-5                4,325,313   3,955,990      3,044,414         plasmid
35                 24,673      8,158          5,294             fosmid
110-130            56,246      45,140         2,286             BAC
TOTAL              4,406,232   4,009,288      3,051,994

Table Suggestion 2. Repeats
Repeat type   Number of families   Number of copies   Coverage (bp)   Coverage (% of genome)
LTR           4
LARD          13
LINE          16
SINE          7
TIR           37
Helitron      2
Polinton      3
MITE          3
Total         85

Table Suggestion 3.
Comparison of main features of a primary and secondary symbiont (could be coupled with image of both inside cells)
Feature                   Buchnera of A. pisum   Regiella from A. pisum
Required by aphid host?   Yes                    No
Maternally inherited?     Yes                    Yes
Can invade new hosts?     No                     Yes
Location in host          Bacteriocytes only     Bacteriocytes, hemolymph
Bacterial division        Gammaproteobacteria    Gammaproteobacteria
Genome size               0.64 Mb                ~2-2.5 Mb
%G+C                      26%                    45%
# coding genes            564                    In progress
% coding                  86%                    In progress
# rRNA operons            1 (in 2 parts)         4 (intact)
Mobile elements           Absent                 Abundant

Table Suggestion 4. Comparison of A. pisum core clock gene sequences with homologous sequences from other insects with sequenced genomes available.
% identical positions Gene Id. Comp.b Rate Lengthd c Period D. B. mori A. mellifera melanogaster ca 0,009** 644 20,8 / 34,7 22,6 / 34,6 27,3 / 41,0 27 0,006** 418 29,0 / 59,7 29,0& / 55,8& NA 30 Cycle 0,519 308 67,8$ / 81,1$ 57,1 / 68,3 68,6 / 79,8 68 Clock 0,983 319 44,2 / 53,3 44,0& / 50,0& 49,3 / 56,8 45 Timeless Vrille Acypix L1 L2 1,000 = 130 48,0 / 42,3 50,7 / 52,7 47,7 / 52,7 49 PDP1 1,000 = 143 80,4 / 84,6 84,6 / 88,1 86,0 / 88,8 81 Cryptochrome1 0,200 = 466 52,4 / 59,2# 54,1 / 60,5# NA Cryptochrome2a 0,830 491 71,1$ / 83,9$ 73,3 / 79,8 73,7 / 85,3 72 491 71,1$ / 83,9$ 72,7 / 79,8 73,5 / 85,3 71 Cryptochrome2b 0,664
a) % of identical positions between each given species and A. pisum / Pediculus humanus. Since some sequences were either not available or badly predicted, they were replaced in the comparisons by sequences from other species: $, Anopheles gambiae replaces D. melanogaster; &, Antheraea pernyi replaces Bombyx mori; #, Dianemobius nigrofasciatus replaces P. humanus. Average % identity over all comparisons is shown in the last column.
b) p-values obtained for A. pisum sequences in the chi-square tests performed using TreePuzzle (Schmidt et al., 2002) to test for homogeneity of amino acid composition in insect sequences.
**, highly significant (<0,01) deviations in amino acid composition of A. pisum sequences.
c) The program RRTree (Robinson-Rechavi and Huchon, 2000) was used to test for homogeneity in rates of amino acid sequence evolution among insect sequences. , A. pisum sequences showing highly accelerated rates in all comparisons; , A. pisum sequences showing accelerated rates in most comparisons; = , A. pisum sequences not showing accelerated rates in any comparison.
d) Length of aligned sequences (for each gene, only blocks whose alignment was unequivocal were used).
L1, L2 correspond to genes from conserved loops 1 and 2, respectively, of the D. melanogaster clock. NA, not applicable.

Table… List of pea aphid genes related to juvenile hormone and insulin pathways. Putative orthologs for each pea aphid gene prediction are indicated. M indicates CpG methylation detected, X indicates CpG methylation not found.
Pea aphid gene gene name abbrev. prediction Drosophila ortholog Tribolium ortholog Apis JH-related genes Juvenile Hormone Acid Methyltransferase Cytosolic Juvenile Hormone Binding Protein Juvenile Hormone Epoxide Hydrolase Juvenile Hormone Esterase* Juvenile Hormone Esterase Binding Protein ACYPI255574 X, ACYPI568283 X FBgn0028841 JHEH ACYPI154871 M ACYPI275360 X, ACYPI189600 X, ACYPI307696 M FBgn0010053, FBgn0034405, FBgn0034406 JHE ACYPI381461 JHEBP ACYPI563350 M FBgn0035088 XM_964394 JHAMT1 JHBP NM_001127311 XM_ XM_964351 XM_ XM_970006 XM_ XM_ Hexamerin Methoprene-tolerant allatostatin hex none Met Ast hmm126914 hmm252834 FBgn0002723 FBgn0015591 ACYPI008623 ACYPI003035 ACYPI003572 FBgn0028961 allatostatin receptor FKBP39 Chd64 broad Br Retinoid X receptor RXR (ultraspiracle) (usp) insulin-related genes Insulin receptor (InR) Insulin receptor tyrosine kinase substrate Pkb/Akt (rac serine/threonine kinase) Forkhead box subgroup O Pten XM_961866, XM_962135 NM_001099342 XM_001809286 XM_ FBgn0035499 XM_ NM_ XM_ ACYPI008576 FBgn0000210 XM_001810758, XM_001810798 ACYPI005934
FBgn0003964 NM_001114294 NM_ XM_967677 XM_ XM_ ACYPI009339, ACYPI010079 InR NM_ NM_ NM_ FBgn0013984 ACYPI008202 Pkb/Akt ACYPI002231 FBgn0010379 FOXO Pten ACYPI008827 ACYPI004294 FBgn0038197 FBgn0026379 Target of rapamycin Tor ACYPI004568 FBgn0021796 XM_ XM_
* The predicted juvenile hormone esterase is identified by the characteristic GQSAG motif and does not show significant homology to other known JHEs.

Table Suggestion 6. Summary of gene models produced by different gene annotation pipelines. NCBI models are subdivided into protein-coding models completely or partially based on EST or protein alignments, pseudogene models containing debilitating frameshifts or nonsense codons, and ab initio models.
Pipeline                   # of models
NCBI, complete support     3403 (3623 transcripts)
NCBI, partial support      6842
NCBI, pseudogenes          841
NCBI, ab initio            26,689
AUGUSTUS
Fgenesh
GENEid
GenScan
MAKER
SNAP
GLEAN (- RefSeq)           24,355
preliminary OGS            34,600 (34,820 transcripts)

Figures

Figure Suggestion 1. Species phylogeny based on a Maximum Likelihood analysis of a concatenated alignment of 197 widespread, single-copy proteins. The tree has been rooted using chordates as the most external out-group. Bars and lines on the right summarize the results of comparative genomics analyses: A.- Comparison of the gene content of all species included in the analysis. Bars represent the gene content of each species (scale on the top).
These have been subdivided to indicate different types of homology relationships: Black: widespread genes found with one-to-one orthology in at least 16 of the 17 species; Blue: widespread genes that can be found in at least 16 of the 17 species and are sometimes present in more than one copy; Red: insect-specific widespread genes present in at least 12 of the 13 insect species and absent from non-insect species; Yellow: insect-specific non-widespread genes (present in fewer than 12 insect species); Green: genes present in insects and other groups but with a patchy distribution; White: species-specific genes with no (detectable) homologs in other species (the striped fraction corresponds to species-specific genes present in more than one copy). The thin red line under each bar represents the percentage of A. pisum genes that have homologs in a given species.
B.- This graphic represents the number of single genes and duplicated genes for each species. Singletons are represented by the purple fraction, while multi-copy genes are marked by the pink part of the bar.
Figure Suggestion 2 (NOTE: it was proposed to merge this figure with the figure above, but I think that would make it too confusing. I suggest this be figure 2). Lineage-specific gene expansions in the pea aphid.
A) Size distribution of the major lineage-specific groups of in-paralogs (paralogs arising from duplications that occurred after the speciation of the lineages leading to the pea aphid and Pediculus humanus). The Y axis (note the logarithmic scale) represents the number of gene families with lineage-specific expansions of a given size (X axis), as inferred from the analysis of the pea aphid phylome.
B) Maximum Likelihood phylogenetic tree showing a lineage-specific gene expansion in a family coding for a putative Acetyl-CoA transporter.
This expansion has resulted in 19 intra-specific paralogs in the pea aphid, whereas other insects and out-groups included in the analysis present only one orthologous sequence. The tree was downloaded from phylomeDB.org (Huerta-Cepas et al., 2008) and reformatted; the complete tree can be accessed with the code ACYPI004176-PA. The tree was reconstructed following the phylome tree reconstruction pipeline (see Methods).
Figure Suggestion 2. Distribution of the mean identity between the copies and the consensus for different super-families of TEs (NOTE: these data are not for pea aphids)
Figure Suggestion 3. Preliminary phylogeny based on Ty3/Gypsy and Retroviridae integrases. KRB2, which is present in the pea aphid genome, and its relationship with Ty3/Gypsy integrases are colored in red.
Figure Suggestion 4. Distribution of synonymous distances (dS) among pairs of paralogs. Left, only pairs matching a reciprocal best hit (RBH) criterion; right, all pairs of paralogs.
Figure Suggestion 5. Transcription levels of laterally transferred genes in the bacteriocyte. Ivory columns and blue columns indicate abundance of transcripts in the whole body and in the bacteriocyte, respectively; bars, standard errors (n = 6). Expression levels are shown as mRNA copies of the target gene per copy of mRNA for RpL7. Asterisks indicate statistically significant differences (Mann-Whitney U test; **, p < 0.01). Transcripts for ldcA, ybjR, and rlpA are 11.6-, 8.5-, and 154-fold more abundant in the bacteriocyte than in the whole body, respectively. Notably, the copy numbers of these transcripts in the bacteriocyte were comparable to that of the control transcript encoding ribosomal protein L7 (RpL7), indicating that their expression levels are relatively high.
Figure Suggestion 6. Transmission electron micrograph showing Buchnera and Regiella in adjacent host bacteriocytes.
Figure Suggestion 7.
There is interest in having a figure that highlights the interaction of aphids and Buchnera. The obligate symbiosis is central to the aphid story, and even a generalized descriptive diagram would help to highlight this. However, no one has come up with a concrete idea yet.
Figure Suggestion 8. It would be possible to have a figure highlighting the missing genes of the IMD immune pathway, which is intact in all other insects sequenced to date and is absent in pea aphids, and/or a table of major immune gene classes and their numbers across the sequenced insects, highlighting the genes missing in aphids.
Figure Suggestion 9. Atsushi and Angela have data on the expression of several metabolism-related genes across different aphid tissues. The symbiont-section group is having an ongoing discussion about these data, but it may be an interesting point to highlight (once we figure out what it means).
Figure Suggestion 10. A cell cycle figure with all the genes marked as present or absent, and indicating genes that have duplications. We would need someone in the mitosis/meiosis team to generate this (ask Dayalan).
Figure Suggestion 11. Gene duplications of mitotic kinases that may be involved in polyphenism regulation. This story appears very interesting. Can it generate a figure? Possibly a phylogeny with some annotation about polyphenism? (ask Dayalan).
Figure Suggestion 12. A figure about the circadian rhythm pathways (Clock and Period), showing the Drosophila pathways with the genes present in the pea aphid shown clearly and the missing genes appearing in "ghost-like" writing (ask David Martinez).
Figure Suggestion 13. (A) NJ tree of A. pisum facilitative sugar transporters (ApST1-ApSTxx). Numbers on the branches represent the level of confidence as determined by bootstrap analysis (1000 replicates). The scale bar indicates an evolutionary distance of 0.05 amino acid substitutions per position in the sequence. The number of ESTs supporting each gene sequence is indicated.
(B) Genomic structure of duplicated A. pisum sugar transporters. Regions with sequence identity >65% are shaded. Intron regions are not drawn to scale. (C) Genomic clustering of a group of duplicated A. pisum sugar transporters; duplicated genes are boxed.

Authors and Affiliations
Being assembled in a separate document.