The First Genome Sequence of a Symbiotic Insect:

Genome Sequence of the Pea Aphid Acyrthosiphon pisum: Adaptation to
Host Plants and Symbiotic Bacteria
The International Aphid Genomics Consortium
Abstract
to be done later
Highlights
Note, these highlights may be incorporated into a single paragraph near the end of the
introduction and/or may be incorporated into a synopsis of the article to be included in a
PLOS edition. They will also be the points that we will want to try and highlight when
talking with others about the project, so it is very important that we pick the most exciting
elements.
General Novel Aspects of the Project

Sequenced a symbiotic system: the aphid host and its primary and secondary
symbionts.

First hemimetabolous insect sequenced

First agricultural pest sequenced
General Features of the Pea Aphid
1. A high, steady wave of gene duplications characterizes the Acyrthosiphon
pisum genome.
2. Abundant chromatin remodeling proteins may enable functional specialization of
epigenetic pathways.
3. Unexpected expansion of the microRNA machinery, a first observation in metazoans.
4. A. pisum has acquired functional genes from bacteria via lateral gene transfer, but the
number is small, and the transfers originate from α-proteobacteria, not from the group
containing the primary symbiont Buchnera or most secondary symbionts.
Feeding
5. Gene duplication has led to an extensive and diverse family of uniporters of sugars and
other compounds in the pea aphid.
6. The number of detoxification genes is correlated with aphid host variety.
7. The pea aphid and its obligate symbiont Buchnera engage in a true nutritional symbiosis,
particularly in amino acid and purine metabolism.
Reproduction
8. Expansion of some regulatory kinases involved in controlling mitosis, possibly involved in
reproductive polyphenism?
Phenotypic Plasticity
9. Different DNA methylation states are associated with the pea aphid wing polyphenism.
Some Striking Differences to Other Sequenced Insects
10. There are many cases of aphid-specific losses and duplications of “toolkit genes” for
development, which are known to be highly conserved among metazoans. Modifications
of these pathways in aphids are suggested.
11. Characterization of cuticular proteins reveals both a large gene expansion of RR-2
proteins and a reduced number of chitinase genes, which might reflect the absence of
dramatic exoskeleton reconstruction in hemimetabolous insects.
12. Pea aphids are missing many immune-related genes common to other insects, including
genes commonly involved in pathogen recognition, immune signaling, and antimicrobial
peptides.
13. The gene Period in A. pisum does not contain the motifs necessary for nuclear import
and the whole protein seems to be evolving at accelerated rates.
Introduction
Aphids are among the most severe pests of agricultural crops. These small, soft-bodied
insects feed from plants, affecting their growth and acting as vectors of plant viruses. As
a result, they have an impact on the production of food and fibers worldwide. There is a
need to advance the overall understanding of the biological interactions of these pests
with their symbionts and host plants. Aphids feed by inserting their slender mouthparts,
referred to as stylets, into phloem cells, one of the food conduits of plants. Most of the
approximately 5000 species of aphid feed on only one or a few species of host plants,
and closely related aphid species tend to feed on related host species. Once an aphid
finds a suitable plant, using a variety of visual and chemical cues, it settles to then
simultaneously feed and reproduce. Offspring are born live and typically settle close to
their mothers, spawning large colonies. Newborn nymphs molt four times, each time
growing larger but otherwise looking similar to the previous instar – like other members
of the Hemiptera, aphids are hemimetabolous insects, undergoing an incomplete
metamorphosis from the juvenile to the adult stages, which may be winged (alate) or
wingless (apterous) and which disperse readily, spreading plant diseases.
Phloem fluid provides a diet with high concentrations of simple sugars and an
unbalanced mixture of amino acids. Aphids have evolved specialized gut morphology
and physiology to reduce the high osmotic potential of phloem fluid. Most aphids harbor
intracellular symbiotic bacteria, Buchnera aphidicola, that produce several essential
amino acids that are found at low levels in the aphid diet. Aphids and Buchnera have
coevolved since the origin of aphids, about 200 million years ago, with aliquots of
Buchnera transferred directly from maternal tissues into developing embryos and
oocytes every generation. The Buchnera genome underwent a dramatic reduction in
gene content — to about 620 genes — soon after the origin of the symbiosis,
underscoring the dependence of Buchnera on the host cells. Buchnera have dispensed
with many genes that would allow them to live outside aphid cells, and they import all of
their food and potentially other essential products from the aphid cell. Some aphids also
harbor a variety of other facultative bacterial symbionts that provide ecologically relevant
benefits, such as heat tolerance and resistance to parasitoids.
Aphids are essentially plant parasites and, like many parasites, they have evolved
complex life cycles with alternative generations of individuals specialized to meet
different ecological challenges. They have taken this specialization, called polyphenism,
to extremes. Aphids produce forms specialized for sexual versus asexual reproduction,
sedentary rapid reproduction versus dispersal, feeding on distinct sets of host plant
species, desiccation resistance, or, in the case of social species, colony defense.
Asexual forms have evolved a highly modified meiosis, which skips the reduction division
of meiosis I, to allow parthenogenetic reproduction. Embryos develop directly within their
mothers, and sometimes embryos develop within embryos (paedogenesis), such that
females carry their grand-offspring within them. This telescoping of generations
promotes short generation times, allowing aphid colonies to rapidly exploit new
resources. (last two sentences for summer generations only?)
(EXPLAIN WHY WE CHOSE PEA APHID)
The pea aphid genome, the first published genome of a hemimetabolous insect,
provides an outgroup for the published genomes of multiple holometabolous insects
such as flies, beetles, butterflies and bees. The pea aphid genome thus creates a
dataset with which a more accurate reconstruction can be made of the gene content in
the common ancestor of hemimetabolous and holometabolous insects, which diverged
about 310-350 MYA. In addition, this hemipteran genome provides a foundation for
exploring the genetic basis of host plant specialization, extreme developmental
phenotypic plasticity, and coevolved associations with bacterial endosymbionts.
Results
Features of the Pea Aphid Genome
Genome Sequence and Organization. Initial Sanger sequencing of DNA
samples from the pea aphid LSR1.AC.F1 strain produced 3.13 million reads. This represents
about 464 Mb of sequence and about 6.2X coverage of the (clonable) A. pisum genome.
The final Acyr_1.0 assembly contained 72,844 contigs, with an N50 length of 10.8 kb,
and a total length of 446.6 Mb. The sequenced sample also contained cells
from the obligate symbiont Buchnera aphidicola. The facultative symbiont Regiella
insecticola was sequenced separately, from a large insert library. Since the genome
sequence of R. insecticola has not been reported and the only available genome
sequence of B. aphidicola is from a different pea aphid strain, we identified and used
contaminating reads to assemble complete sequences of both bacterial genomes.
After Sanger sequencing and assembly, the pea aphid genome was subjected to
additional pyrosequencing on the 454 platform to improve the contig
and scaffold lengths. <DESCRIBE THE Acyr 2.0 ASSEMBLY HERE AS SOON AS I
HAVE IT> Here we report both versions of the pea aphid genome sequence,
but the annotation and analysis are based on the first, Sanger-based assembly, Acyr_1.0.
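For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it is usually computed from a list of contig lengths: the length of the contig at which the running total, taken from the longest contig downward, first reaches half of the assembly length. The function name and the toy lengths are hypothetical illustrations, not the assembly data or the pipeline used for Acyr_1.0.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    Contigs are sorted from longest to shortest; N50 is the length of the
    contig at which the cumulative length first reaches half of the total.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


# Hypothetical toy assembly (lengths in kb), for illustration only.
toy_contigs = [10, 8, 6, 4, 2]
toy_n50 = n50(toy_contigs)
```

The same function can be applied unchanged to scaffold lengths to obtain a scaffold N50.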
Genome Sequence and Organization. This genome has a GC content of
29.6%, below the range of other available insect genomes (34.8% in Apis mellifera
to 45% in Drosophila pseudoobscura). Transcript GC content is higher, averaging
38.8% (sd=8.4, N=37994), and is similar to that of Apis (mean=38.6%, sd=9.7, N=17182)
[Supplemental table at http://insects.eugenes.org/arthropods/data/summaries/arthropod_insect_gc_content.txt].
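As an illustration of the quantity reported here, GC content can be sketched as the fraction of G and C bases among the unambiguous bases of a sequence. The function below is a hypothetical helper, not the procedure used to produce the supplemental table; in particular, ignoring N and other ambiguity codes is an assumption.

```python
def gc_percent(seq):
    """Percentage of G+C among unambiguous bases (A, C, G, T) in seq.

    Ambiguous characters such as N are ignored (an assumption of this sketch).
    """
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


# Hypothetical toy sequence, for illustration only.
toy_gc = gc_percent("AATTNNGC")
```

Genome-wide and per-transcript values would be obtained by applying the same calculation to the assembly and to each transcript sequence, respectively.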
Transposable elements. Transposable elements (TEs) are key elements of
genome plasticity and account for a large part of many eukaryotic genomes. To find all
the TEs inserted in the pea aphid Acyrthosiphon pisum genome, we used a
previously described transposable element annotation pipeline (Reference and see
Methods). This procedure revealed nearly 14,000 (WHY NOT THE EXACT NUMBER?)
consensus sequences in the A. pisum genome, which we classified into almost 1400
(EXACT NUMBER) families, according to their structural and coding features. Table 1
shows the abundance and coverage of TE families. Figure 1 shows the distribution of
nucleotide identity per TE category, from which the order of TE family invasions of the
genome can be estimated. XX percent of the pea aphid genome matches consensus sequences for repeats.
Additionally, we discovered chimeric TE families and evidence of co-evolution. In
particular, Ty3/Gypsy long-terminal-repeat (LTR) retrotransposons are among the best-known mobile genetic elements. Proteins coded by Ty3/Gypsy LTR retroelements
occasionally assume cellular roles. GIN-1, for instance, is an apparently functional
integrase found in humans and other vertebrates [11]. GIN-1 is similar to the integrases
coded by the 412/Mdg1 clade of Ty3/Gypsy elements described in arthropods. The
evolutionary history of GIN-1 is not exceptional. Screening the A. pisum genome
revealed a similar co-evolutionary history between Ty3/Gypsy LTR retroelements and
KRB2, a family of nonviral integrases described in humans and other vertebrates.
Telomeres. Like other non-dipteran insects, the pea aphid possesses a
single candidate telomerase gene. The canonical arthropod telomere repeat TTAGG can
be found in long stretches and the vast majority of these are plus/minus matches,
indicating that they are at the ends of chromosomes. Sun and Robinson (1966)
published karyotypes for various aphids, and the four haploid chromosomes of the pea
aphid match the four linkage groups of Hawthorne and Via (reference). Of the
expected eight telomeres, we were able to identify TTAGG repeat stretches at the ends
of just five scaffolds in the assembled genome, of which all but two are very short. The two
relatively long scaffolds identified appear to be true, relatively simple telomeres.
However, the others are more complex, like the Bombyx and Tribolium telomeres, with
non-LTR retrotransposon insertions that presumably confounded the WGS assembly.
Overall, it appears that the aphid telomeres are quite heterogeneous, ranging from simple
to complex, and will require further bioinformatic and experimental analysis.
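The kind of search described above can be sketched as a scan for tandem TTAGG units running up to either end of a scaffold. This is only an illustration under simplifying assumptions: the function names and the three-unit threshold are hypothetical, the repeat is required to be perfect and to end exactly at the scaffold terminus, and the 5' end is checked by looking for the reverse complement (CCTAA). It is not the procedure actually used for the analysis.

```python
import re


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def terminal_repeat_units(seq, unit="TTAGG"):
    """Count perfect tandem copies of `unit` that run up to the 3' end of seq."""
    match = re.search(r"(?:%s)+$" % re.escape(unit), seq.upper())
    return len(match.group(0)) // len(unit) if match else 0


def telomere_like_ends(seq, min_units=3):
    """Flag scaffold ends carrying at least `min_units` terminal TTAGG units.

    `min_units` is an arbitrary illustrative threshold, not a published cutoff.
    """
    five_prime = terminal_repeat_units(reverse_complement(seq))
    three_prime = terminal_repeat_units(seq)
    return {"5prime": five_prime >= min_units, "3prime": three_prime >= min_units}


# Hypothetical toy scaffold: CCTAA repeats at the 5' end, none at the 3' end.
toy_result = telomere_like_ends("CCTAA" * 4 + "ACGTACGT")
```

Complex telomeres interrupted by retrotransposon insertions, as described for the remaining scaffolds, would not be detected by such a strict terminal scan.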
Gene model prediction. Fewer than 200 pea aphid genes had been sequenced
prior to this project. Consequently, we heavily utilized automated gene predictions to aid
our understanding of the gene content in the pea aphid. Partially or fully supported
models computed by NCBI's gene prediction pipeline serve as a core set of 10,245 gene
models, and are integrated into the public RefSeq databases at NCBI. Since this number
is likely to underestimate the true number of protein-coding genes in the pea aphid,
additional models were calculated using six additional gene prediction programs and
combined into a consensus set of 24,355 additional gene models using GLEAN [ref]
(Table Suggestion 6). The combined total of 34,600 gene predictions is likely to be an
over-estimate of the true number of pea aphid genes, since it includes unsupported ab
initio models, transposons, partial gene models, and predictions of genes duplicated in
the Acyr_1.0 assembly. However, it provides an expansive foundation to identify genes
for subsequent analyses described below.
Utilizing a variety of approaches, a subset of genes of interest was then
annotated manually. All gene predictions and other identified features were loaded into a
GMOD-Chado database (ref) accessible at the AphidBase web portal
(http://www.aphidbase.com). AphidBase uses various open-source software tools
from the Generic Model Organism Database (GMOD), in particular the graphical genome
browser GBrowse (ref) and the manual curation software Apollo (ref).
Comparison of Gene Set to Other Organisms. In order to compare the gene
content of A. pisum to that of other organisms, we performed sequence searches against
a database containing the proteomes encoded in 16 other species. These include 12
other insects, representing all major insect groups with sequenced genomes, and four
out-groups: the crustacean Daphnia pulex, the nematode Caenorhabditis
elegans and the two chordates Ciona intestinalis and Homo sapiens. To set the genome
comparisons in an evolutionary context, a species phylogeny was reconstructed based
on a Maximum Likelihood analysis of 197 concatenated alignments of genes with a
single-copy ortholog in all species considered (see Material and Methods and FIGURE 1).
The resulting phylogeny groups major insect groups according to previously established
taxonomy, including the recovery of the Diptera and Hymenoptera clades. Similarly, the
phylogeny correctly places the pea aphid as a sister group of Pediculus humanus, also a
member of the Paraneoptera clade, which appears at the base of the insect
phylogeny. The long branch leading to A. pisum is indicative of a very long evolutionary
distance, and therefore significant genomic differences from its closest relatives with
sequenced genomes are expected. Figure 1B shows a summary of the comparison of
the pea aphid gene repertoire to that of other organisms. 12,885 genes in the Acyr_1.0
gene set (37%) show no significant hits (e-value < 10^-3) with genes in other species
included in the analysis. This large number of species-specific genes might be in part
due to failures in gene prediction programs or undetectable homology due to extensive
sequence divergence, but might also reflect true genetic specificities of this species as
compared to other insects. A. pisum shares 30-53% of its gene repertoire
with other insects, with Nasonia vitripennis and Tribolium castaneum being the two species
that share the highest percentage of aphid genes (53% in both cases). Interestingly, the
closest relative among insects with sequenced genomes, Pediculus humanus, shares
only 38% of the pea aphid genes.
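The percentages in this comparison can be illustrated with a short sketch that tallies, from a table of sequence-search hits, how many genes have no significant hit in any other species and what share of the gene set has a significant hit in each species. The input layout (gene, species, e-value triples) and the function name are assumptions made for illustration; only the e-value threshold of 10^-3 is taken from the text.

```python
E_VALUE_CUTOFF = 1e-3  # threshold quoted in the text (e-value < 10^-3)


def summarize_hits(all_genes, hits, cutoff=E_VALUE_CUTOFF):
    """Summarize search hits for a gene set.

    all_genes: iterable of gene identifiers in the query gene set.
    hits: iterable of (gene, species, evalue) triples (hypothetical layout).
    Returns (percent of genes without any significant hit,
             {species: percent of genes with a significant hit}).
    """
    all_genes = set(all_genes)
    if not all_genes:
        raise ValueError("empty gene set")
    shared = {}
    for gene, species, evalue in hits:
        if gene in all_genes and evalue < cutoff:
            shared.setdefault(species, set()).add(gene)
    with_hit = set().union(*shared.values())
    total = len(all_genes)
    pct_no_hit = 100.0 * len(all_genes - with_hit) / total
    pct_shared = {sp: 100.0 * len(genes) / total for sp, genes in shared.items()}
    return pct_no_hit, pct_shared


# Hypothetical toy data, for illustration only.
toy_genes = ["g1", "g2", "g3", "g4"]
toy_hits = [("g1", "sp1", 1e-10), ("g2", "sp1", 0.5),
            ("g2", "sp2", 1e-5), ("g3", "sp2", 1e-4)]
toy_no_hit, toy_shared = summarize_hits(toy_genes, toy_hits)
```

With the figures given in the text, 12,885 genes without hits out of the combined total of 34,600 gene predictions corresponds to the reported 37%.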
The pea aphid phylome: detection of orthology, paralogy and lineage-specific
gene expansions
To obtain an overview of the evolution of each single pea aphid gene and infer the
corresponding phylogeny-based orthology and paralogy relationships among pea aphid
genes and those in other organisms, we reconstructed the pea aphid phylome, that is,
the complete collection of phylogenetic trees of every protein encoded in the A. pisum
genome. To do so, we followed a similar pipeline to that used for the human phylome
(Huerta-Cepas et al 2008). The resulting alignments, phylogenies and orthology
predictions can be accessed through phylomeDB (Huerta-Cepas et al 2008)
(http://phylomedb.org) and AphidBase. We scanned the pea aphid phylome with a
previously-described, phylogeny-based orthology prediction algorithm (Huerta-Cepas et
al. 2007). Prediction of orthology is a fundamental step in the functional annotation of
newly sequenced genomes. Most projects use BLAST-based orthology detection methods,
although phylogeny-based approaches are considered more accurate (Gabaldon 2008).
Using phylogeny-based orthology, we were able to directly transfer GO annotations to
4058 pea aphid genes that display one-to-one orthology relationships with Drosophila
melanogaster genes (see Material and Methods ). To our knowledge, this is the first
newly sequenced genome for which phylogeny-based orthology predictions have been
used in the annotation pipeline.
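The annotation transfer described above can be sketched as follows: GO terms are copied from a D. melanogaster gene to a pea aphid gene only when the two stand in a one-to-one orthology relationship. The data structures, function name and identifiers below are hypothetical placeholders; the sketch assumes orthology predictions have already been derived from the gene trees and does not reproduce the actual pipeline.

```python
from collections import Counter


def transfer_go_terms(orthologs, go_annotations):
    """Transfer GO terms across one-to-one orthology relationships only.

    orthologs: {aphid_gene: [fly orthologs]} (hypothetical layout).
    go_annotations: {fly_gene: set of GO term identifiers}.
    Returns {aphid_gene: set of transferred GO terms}.
    """
    # How many aphid genes point to each fly gene (to exclude many-to-one cases).
    fly_usage = Counter(fly for flies in orthologs.values() for fly in flies)
    transferred = {}
    for aphid_gene, flies in orthologs.items():
        if len(flies) != 1 or fly_usage[flies[0]] != 1:
            continue  # one-to-many, many-to-one or many-to-many: skip
        terms = go_annotations.get(flies[0])
        if terms:
            transferred[aphid_gene] = set(terms)
    return transferred


# Hypothetical toy input, for illustration only.
toy_orthologs = {"ap1": ["fly1"], "ap2": ["fly2", "fly3"],
                 "ap3": ["fly4"], "ap4": ["fly4"], "ap5": ["fly5"]}
toy_go = {"fly1": {"GO:term_A"}, "fly4": {"GO:term_B"}}
toy_transferred = transfer_go_terms(toy_orthologs, toy_go)
```

Restricting the transfer to one-to-one pairs is the conservative choice implied by the text; genes with lineage-specific duplications are left unannotated by this step.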
Another advantage of the availability of the phylome is that we can readily obtain a
picture of the gene duplications that occurred specifically within the A. pisum lineage. For
this, we used the above-mentioned algorithm to detect all A. pisum paralogy
relationships resulting from duplications in the pea aphid lineage. 2459 pea aphid gene
families present lineage-specific duplications (Figure 2). Most of these gene family
expansions are small to moderate in size, resulting in a total of 2 to 10 in-paralogs (2239
families). The remaining 220 families have experienced massive expansions resulting in
in-paralog groups with 10-50 members (196 families) and 50 to roughly 200 members
(19 families). Sequence analyses of members of the latter groups have identified
reverse-transcriptase and transposase domains, suggesting that these may represent
expansions of transposable elements. However, other expansions affect other classes of
genes. For instance, the pea aphid possesses circa 200 lineage-specific paralogs of the
Drosophila gene kelch, coding for an actin-binding protein involved in ovarian follicle
cell migration and oogenesis (see, for instance, the gene tree for ACYPI51424-PA in
phylomeDB). Another example is a lineage-specific expansion of a putative AcycoA transporter leading to 19 in-paralogs (Figure 2B). The exact functional meaning of
these and other expansions remains to be investigated, but some are likely to be related
to specific adaptations of aphids in terms of life cycle and diet. Further examples of
gene family expansions are discussed throughout the text. (NOTE: some expert in aphids
can give a hint on the functional meaning of these two examples?)
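The size classes used above can be sketched as a simple tally over in-paralog group sizes. Because the ranges quoted in the text share their boundary values (10 and 50), the sketch has to pick a convention: here 2 to 10, 11 to 50, and more than 50 members, which is an assumption and not necessarily the binning used for Figure 2. Family names and sizes are hypothetical.

```python
from collections import Counter


def size_class(n_paralogs):
    """Assign an in-paralog group to a size class (boundaries are assumed)."""
    if n_paralogs < 2:
        raise ValueError("a lineage-specific duplication implies at least 2 in-paralogs")
    if n_paralogs <= 10:
        return "2-10"
    if n_paralogs <= 50:
        return "11-50"
    return ">50"


def tally_families(family_sizes):
    """Count families per size class from {family_id: number of in-paralogs}."""
    return Counter(size_class(n) for n in family_sizes.values())


# Hypothetical toy families, for illustration only.
toy_tally = tally_families({"famA": 2, "famB": 10, "famC": 11, "famD": 200})
```

Making the class boundaries explicit in this way also makes it straightforward to check that the per-class counts sum to the total number of expanded families.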
Chromatin modifications. The extent and function of DNA methylation in
insects still remain largely unknown. The pea aphid has the full complement of CpG
methylation-associated proteins. Two copies of DNA methyltransferase 1 (Dnmt1a and
Dnmt1b), the maintenance methyltransferase, one copy of Dnmt2, and one Dnmt3, the
de novo methyltransferase, were identified (Walsh et al., companion paper). All of the
Dnmts were active in vitro and their mRNAs were detected by RT-PCR. Also present were a
CpG binding protein and a Dnmt1-associated protein (Dmap1). Global methylation levels
are (to be determined or reference (Mandrioli and Borsatti 2007)). Additionally, the pea
aphid has a full complement of the histone genes, whose products are subject to post-translational
modifications such as acetylation, methylation, phosphorylation, and ADP ribosylation.
Several genes appear to have undergone recent duplications, potentially enabling
greater diversity and specialization among chromatin remodeling complexes. The pea
aphid possesses orthologs of histone deacetylase proteins such as HDAC8, a putative
HDAC10 and extra Rpd3-like proteins that may participate in gene silencing, that are
absent from the Drosophila genome. Histone acetyltransferases are also abundant with
two paralogs for PCAF/GCN5 and the MYST family members related to the Drosophila
genes enoki mushroom (enok) and males absent on the first (mof). The pea aphid
possesses an extended repertoire of more than two dozen SET-domain proteins, protein
arginine methyltransferase-like proteins, and two Dot1-like proteins, predicted to be
involved in histone methylation. Multiple classes of Jumonji C domain containing proteins
exist that, along with two LSD1-like proteins, are likely to participate in histone
demethylation. The attachment of ubiquitin or the small ubiquitin-like modifier (SUMO) to
histones and other transcriptional regulators can have a dramatic effect on chromatin
structure (2,3). There are at least three SUMO-related proteins in the pea aphid genome,
one of which is a clear ortholog of Drosophila smt3, a protein that is highly active in the
germline and also required for morphogenesis (4,5). A family of poly ADP ribosylases
was also identified that may participate in chromatin replication through histone
modification (6).
Small non-coding regulatory RNAs. RNA-mediated gene silencing is carried out
by two types of small non-coding RNAs: the small interfering RNAs (siRNAs) and the
microRNAs (miRNAs). Both miRNAs and siRNAs play a crucial role in the regulation of gene
expression in eukaryotes. While miRNAs are processed from endogenous genes encoding
stem-loop hairpin transcripts, siRNAs arise by cleavage of either exogenous or
endogenous long double-stranded RNA (dsRNA) precursors. Depending on the
organism, siRNAs and miRNAs have two overlapping (as in vertebrates) or parallel
(as in insects) pathways involving key factors such as dicer proteins, double-stranded RNA
binding proteins and Argonautes. We identified the pea aphid genes involved in the
siRNA and miRNA machinery and found an unexpected gene expansion specific to
the miRNA-related factors. We identified two copies of the miRNA-specific dicer-1 and
argonaute-1 and four copies of pasha, a cofactor of drosha involved in miRNA
biosynthesis (Legeai et al. companion paper). Many of these expansions were also
identified by PCR cloning and sequencing in other aphid species. Since all these genes
are single-copy in other insect species, this expansion of the microRNA machinery
appears to be unique among metazoans. Moreover, we have shown that the
expression of some of these expanded miRNA-related genes is linked to the
reproduction mode of the pea aphid. MicroRNAs of the pea aphid have been identified
by deep sequencing and bioinformatic analyses (Jaubert-Possamai et al., companion
paper). By combining these methods, we identified 132 microRNAs, including 65
conserved and 67 new aphid-specific microRNAs.
The Genome of a Phloem-Feeding Specialist
Finding a Suitable Host Plant. Like other insects, aphids face the challenge of
finding suitable food supplies by distinguishing hosts and non-hosts via semiochemical
cues. In the case of the pea aphid, these insects are limited to plants in the family… The
first step in this olfactory signal transduction involves the semiochemicals entering the
antennae and binding to odorant-binding proteins (OBPs), which transport the molecules
to the olfactory receptors (Ors). OBPs are a family of small water-soluble proteins that
can be classified into four groups: classic OBPs (with 6 conserved cysteines), plus-C
OBPs (with 8 conserved cysteines and one conserved proline), atypical OBPs (with 9 to
10 cysteines) and chemosensory proteins (CSPs with 4 conserved cysteines). We
identified 11 classic OBPs, one plus-C OBP and 11 CSPs. The genes for the OBPs tend
to be clustered in the genome and have more and longer introns than their counterparts
in Drosophila. Orthologous sequences have also been identified in eight other aphid
species showing that although there are diverse OBPs within each species there is a
very high similarity between homologues in different species. This means that, having
identified OBPs in A. pisum, the information can be readily transferred to other aphid
species for studies of olfaction in, for example, the important pest aphid Myzus persicae.
Although the exact way in which semiochemicals/OBPs interact with Ors is not
established, Ors have been identified in many insect species and usually constitute a
large superfamily of 7TM ligand-gated ion channels. Four A. pisum Or genes have been
annotated manually, these genes being poorly represented in the consensus gene set.
Seventy-nine genes in the Or family have been identified, including 49 intact and
complete genes, 22 partially annotated genes and 8 putative pseudogenes. As
expected, because of the striking conservation of the D. melanogaster gene DmOr83b in all
insects, an ortholog has been identified in A. pisum. There are three other highly
divergent genes that show some homology with other insect sequences but the
remaining 75 genes correspond to aphid specific expansions with no clear relatives
among other insect receptor genes. These genes form two gene subfamilies of 9 and 37
intact complete genes with some clades having reasonably long branches to each
protein indicating relatively old expansions of receptors, while other groups exhibit short
branch lengths to most proteins and some tandem genes, which might indicate more
recent duplicates. Pseudogenes are mainly found in recently expanded groups of genes.
Except for the ortholog of DmOr83b, the possible functions of these receptors are
unclear and further functional annotation is required.
The insect chemoreceptor superfamily of 7TM ligand-gated ion channels consists
not only of Ors but also of the basal gustatory receptor (Gr) family. For A. pisum, 78
genes were identified as Grs, with six members of the sugar receptor subfamily. This
subfamily is generally well conserved throughout the holometabolous insects, with relatives
in Daphnia. No relatives of the carbon dioxide receptors that are highly conserved from
flies to moths and beetles, but not Hymenoptera, were found. The remaining 72 genes
fall into two distinct classes, neither of which has clear-cut relationships to any of the Grs
from holometabolous insects. Five are highly divergent "singletons" while another four form
a small subfamily; the possible functions of these receptors are unclear. The remaining
Grs form two large subfamilies of 21 and 42 genes that by analogy with similar Gr
expansions in other insects might constitute the "bitter" taste receptors of aphids
involved in detection of the many secondary plant compounds they are exposed to.
These large subfamilies have quite distinct phylogenetic properties. The 21 gene
subfamily has reasonably long branches to each protein, and only three of them are
apparent pseudogenes and two have parts missing in assembly gaps, indicating that this
is a relatively old expansion of receptors, most of which are still functional. The 42 gene
subfamily exhibits extremely short branch lengths to most proteins. Eight of them are
clear pseudogenes, while another 14 are only partial gene models due to assembly
problems with parts of these genes missing in gaps. In summary, the Or and Gr
subfamilies appear to be undergoing rapid and recent expansion in the A. pisum genome
and it can be speculated that some of these genes might be crucially involved in
perception of host plant chemicals, either volatile or non-volatile. Specifically, bitter taste
receptors are good candidates for the perception of chemicals present in plant
subepidermic tissues, which are known to be important for host plant acceptance in
aphids. Therefore, the identification of chemoreceptor genes is a crucial step towards
understanding the mechanisms of host plant specialisation and host race formation.
Initiating Feeding and Overcoming Plant Defenses. Once an aphid has
located a host plant, feeding activity must commence. For aphids this is distinct from
insects with chewing mouthparts since they feed on a single plant tissue, namely phloem
sap located within the sieve tubes, rather than ingesting a mixture of whole cells.
Bioactive compounds in the salivary secretions of sap-feeding insects are believed to
overcome plant defense mechanisms, both in the overlying tissue and within the sieve
elements, so that the insect can feed for prolonged periods of time. A proteomic analysis
of the A. pisum salivary gland has catalogued proteins from this tissue (referred to as the
saliome) and identified putative secreted salivary proteins. A dual approach of GelC
MS/MS and MALDI TOF/MS on 1DE and 2DE fractionated glands, respectively,
identified a total of XX proteins, 67 of which were novel (with no identity in public
databases). Following SignalP analysis, YY of the identified proteins contained a
secretion signal and, of these, ZZ had previously been identified as being secreted into
artificial diets during feeding. [Annotations of the genes involved in A.pisum? Numbers
present in genome?]
When aphids feed from plants, they take up both food substances and plant
secondary metabolites. The need to detoxify these potentially deleterious compounds is
a problem faced by all herbivorous insects, which generally use a range of detoxification
enzymes, including cytochrome P450 monooxygenases (P450s), glutathione S-transferases (GSTs), and esterases. A. pisum genome analysis has identified 82 P450
genes [and XX GSTs? and YY esterases?]. Whereas A. pisum feeds almost exclusively
from the Fabaceae, the aphid Myzus persicae feeds on hundreds of species in more than
forty plant families. M. persicae is therefore exposed to a greater diversity of plant
secondary metabolites and might be predicted to have a wider array of detoxification
enzymes. Consistent with a hypothesis of a larger complement of detoxification enzymes
in M. persicae, analysis of available M. persicae ESTs has identified 140 putative P450s,
compared with the 82 in A. pisum.
Detoxification enzymes may also be involved in resistance to many insecticides.
Additionally, insecticide resistance can be caused by mutations at the target-site of the
chemical, most commonly ion channels. These include the voltage-gated sodium
channel, glutamate receptors and nicotinic acetylcholine receptors and the A. pisum
genome has been found to contain XX, YY and ZZ genes encoding these proteins
respectively. Other genes have been identified encoding potassium channels, calcium
channels, and chloride channels. Although the A. pisum EST coverage is not yet
sufficient to confirm that the full-length gene models are correct, the data have allowed us to
determine whether orthologs of D. melanogaster ion channel genes are present. This
revealed the extent of gene duplication and loss in the ion channel genes that is so
prevalent in other gene families studied in A. pisum.
Another group of proteins involved in detoxification are the proteases. Proteases
are a structurally and functionally diverse set of enzymes involved in a plethora of
biological processes ranging from nonspecific degradation of ingested proteins to
complex cascading pathways involving highly selective cleavage of substrates. The
availability of the A. pisum genome has allowed the comprehensive characterization and
analysis of the complete protease repertoire (degradome) of this organism. Using the
peptidase classification system established by MEROPS, taxonomic levels were
assigned to the peptidases, which were then used for annotation purposes. Additionally,
a non-redundant taxonomical evaluation of clan AA peptidases was carried out according to the most recent
trends, focusing on the large diversity of LTR retroelement proteases and their
relationships with their host gene counterparts. The current annotation of the A. pisum
degradome indicates that there are at least XXX proteases and homologues which are
distributed into xx aspartic, xx cysteine, xx metallo, xx serine, xx threonine proteases.
The distribution and taxonomy of A. pisum proteases was investigated through
comparative and phylogenetic analyses conducted against a number of insect species
(in particular Drosophila melanogaster). [GENERAL RESULTS TO FOLLOW]. The
annotation of the degradome of A. pisum is still ongoing and an accurate assessment of
the distribution of peptidases will only be obtained after considerable functional
experimentation. [protease inhibitors?]
Plant Phloem as a Food Source. Phloem sap has a high and variable sucrose
concentration, in the range 0.2–1.5 M (Douglas et al., 2006). Consequently, although the
diet ingested by aphids supplies a high level of carbon for nutrition, routinely it also has
an osmotic pressure significantly above that of insect haemolymph. Ingested sucrose is
hydrolysed by a gut sucrase (Christofoletti et al., 2003; Price et al., 2007), and accessing
the constituent sugars as a resource for metabolism is dependent on transport out of the
gut. Sugar transport also plays an important role in osmoregulation, both by removal
from the gut, and by enabling accumulation in the haemolymph, where the major sugars
for A. pisum are fructose (source of metabolic intermediates; estimated concentration
approx. 130mM), and the disaccharide trehalose (estimated concentration approx.
260mM) (Rhodes et al., 1997; Ashford et al., 2000). Sugar transport across cell
membranes in higher organisms uses both proteins of the major facilitator superfamily
(MFS), which can be uniporters or coupled transporters, and proteins of the
sodium:solute symporter family (SSF) in which sugar transport is coupled to sodium ion
transport. The A. pisum genome contains approximately 200 predicted genes encoding
proteins belonging to clan MFS (Pfam CL0015), which includes a wide range of
functional roles involving transport of many different types of small molecules. Sequence
analysis of the predicted gene products has allowed the identification of families and
subfamilies of putative transporters of oligopeptides, nucleosides, folate, organic anions,
phosphate, amines, organic cations and monocarboxylates, on the basis of sequence
similarity to previously identified proteins in D. melanogaster and other insects. A
non-redundant and nearly complete family of 54 genes has been identified in A. pisum which
encode proteins belonging to the sugar transporter family Sugar_tr (Pfam PF00083)
within clan MFS. These gene products are more similar to proteins annotated as “sugar
transporters” in D. melanogaster and other insects than to proteins
annotated with other functions. Of the 54 potential sugar transporter genes in A. pisum,
16 have no corresponding ESTs, leaving 38 genes with EST evidence of expression.
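The bookkeeping behind the count above (54 candidate genes, 16 of them without ESTs) can be sketched as follows. This is a minimal illustration only: the gene identifiers and the `est_counts` mapping are hypothetical placeholders, not the consortium's annotation data or pipeline.

```python
def split_by_est_support(est_counts):
    """Partition gene IDs into those with and without EST evidence.

    est_counts maps a gene ID to the number of ESTs matching it.
    """
    supported = sorted(g for g, n in est_counts.items() if n > 0)
    unsupported = sorted(g for g, n in est_counts.items() if n == 0)
    return supported, unsupported


# Hypothetical toy input: 54 placeholder genes, 16 of them with zero ESTs.
est_counts = {f"ST_{i:02d}": (0 if i <= 16 else 1) for i in range(1, 55)}

supported, unsupported = split_by_est_support(est_counts)
print(len(supported), len(unsupported))
```

Note that the absence of a matching EST does not by itself show that a gene is inactive; it may simply be expressed at low levels or in tissues and conditions not sampled by the EST libraries.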
The sugar transporter genes in A. pisum can be divided into two groups, based
on an InterPro signature diagnostic for transporters of sugars and inositol (IPR003663).
Preliminary functional analysis of selected A. pisum sugar transporters has been carried
out by complementation analysis; growth of a yeast hexose transport-deficient mutant on
minimal media containing hexoses has been restored by transformation with expression
constructs containing coding sequences for the proteins. Several gene products
containing the IPR003663 signature were able to transport hexoses in this assay,
whereas a gene product not containing this signature was not a hexose transporter (data
not presented). However, characterisation of substrate specificities for individual
transporters requires direct uptake assays with labelled substrates. These assays have
been carried out for the most highly expressed sugar transporter in A. pisum (Ap_ST3; see
companion paper), which is a uniporter with specificity for fructose and glucose. This
transporter is expressed in gut tissue, and is likely to play an important role in transport
of sucrose hydrolysis products from gut lumen into the haemolymph.
The number of sugar transporters in A. pisum is higher than in other insects, except
Tribolium, which has a similarly extreme diet with potentially high sugar concentration
due to low water content. The high number of sugar transporters in A. pisum may result
from gene duplication, evidenced by a “clustering” of genes; one genomic scaffold
contains 7 sugar transporter genes in a single 250kbp region of genomic DNA. Although
aphid equivalents can be identified for the trehalose transporter characterised from the
anhydrobiotic insect, Polypedilum vanderplanki (Kikawada et al., 2007), and for some
Drosophila sugar transporters, including those containing a glucose transporter
signature (IPR000803), there is a high level of sequence diversity among these genes,
with evidence of species-based grouping of sequences when analysed by the Clustal
method. It is suggested that aphids have evolved an increased set of MFS-type sugar
transporter genes as a functional adaptation to feeding on specialised diets, with specific
requirements for the transport of sugars and other small molecules. In contrast to
members of clan MFS, genes encoding sodium:solute symporters of family SSF are
rarer in A. pisum than in D. melanogaster (5 predicted genes vs. 14).
The A. pisum genes encode proteins predicted to transport short-chain fatty acids or
choline, but not sugars. Proteins capable of transport of sugars against a concentration
gradient have yet to be identified in insects.
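The kind of "clustering" evidence mentioned above (several family members within one 250 kbp region of a scaffold) can be looked for with a simple sliding window over gene start coordinates. The sketch below is an assumption-laden illustration, not the method used for the annotation: the function name `densest_window` and the toy coordinates are hypothetical.

```python
def densest_window(starts, window_bp):
    """Return (count, first_start) for the window of width window_bp
    that contains the most gene start positions on one scaffold."""
    starts = sorted(starts)
    best_count, best_start = 0, None
    left = 0
    for right, pos in enumerate(starts):
        # Shrink the window from the left until it spans at most window_bp.
        while pos - starts[left] > window_bp:
            left += 1
        count = right - left + 1
        if count > best_count:
            best_count, best_start = count, starts[left]
    return best_count, best_start


# Hypothetical start coordinates (bp) of family members on a single scaffold.
toy_starts = [10_000, 45_000, 80_000, 120_000, 160_000, 200_000, 240_000,
              900_000, 1_500_000]
print(densest_window(toy_starts, 250_000))
```

A dense window of paralogs is consistent with tandem duplication but does not prove it; phylogenetic analysis of the clustered genes would be needed to confirm recent common origin.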
Transmission of Plant Diseases. One of the consequences of phloem
feeding by aphids is that they are able to transmit plant diseases; indeed, aphids are
responsible for the transmission of over 55% of plant viruses. Two principal phytovirus
transmission mechanisms have been described so far: circulative transmission,
which involves the transport by transcytosis of virions through two different barriers in the
insect vector (the gut and the salivary gland), and non-circulative transmission, in
which virus particles are retained on the cuticle lining of the stylet and which does not
involve internalization of virions in insect cells. In the circulative mode of transmission,
virions enter the cell by receptor-mediated endocytosis and are transported
across the cell in vesicles of different kinds before being released at the other side of
the cell by exocytosis. Transcytosis is a general mechanism used for the transport of
macromolecules across cells. The correct uptake, transport and delivery of the vesicle
cargo rely on the participation of several families of proteins. Sequence comparisons
with annotated genomes such as those of D. melanogaster, T. castaneum and humans
have allowed the identification of A. pisum protein families involved in transcytosis and
potentially involved in the circulative transmission of viruses: clathrins and dynamins
(involved in vesicle formation), SNAREs and Rab GTPases (involved in membrane
fusion), sec proteins (involved in protein translocation), synaptotagmins,
cytoskeleton proteins, as well as receptor proteins (scavenger receptors, receptor
tyrosine kinases, …).
Genomes of a Symbiotic Association
Host-symbiont genomes. Like many insects, aphids are hosts to various
symbiotic microorganisms. Aphids harbor the obligate mutualistic primary symbiont,
Buchnera aphidicola (Gammaproteobacteria), within the cytoplasm of specialized cells
called bacteriocytes. Buchnera synthesizes essential amino acids (i.e. the amino acids
that animals cannot synthesize de novo but that are required in proteins) and is required
for aphid survival. It is widely accepted that aphids can utilize the diet of plant phloem
sap, which is deficient in essential amino acids, only because their Buchnera symbionts
are a supplementary source of these nutrients. Additionally, many aphids also harbor
facultative secondary symbiotic bacteria (Moran et al. 2005) that have been shown to
influence several aspects of aphid ecology, including host plant specialization and heat
tolerance (Chen et al., 2000; Montllor et al., 2002; Russell, Moran, 2006; Tsuchida et al.,
2004). These symbionts also protect their hosts from fungal pathogens and parasitoid
wasps (Oliver et al., 2003; Oliver et al., 2005; Scarborough et al., 2005). Such intimate,
evolutionarily stable associations influence host and bacterial evolution and likely shape
their genomes as well.
Lateral gene transfer from symbionts to the host. The A. pisum genome
provides the first opportunity to examine the complete genome of an animal that is host
to an obligate mutualistic intracellular bacterium (primary symbiont). Aphids have been
dependent on symbionts for millions of years. Since the initial infection of an aphid
ancestor more than 100 Myr ago (Moran et al., 1993), Buchnera have been subjected to
strict vertical transmission through host generations, and the mutualism between
Buchnera and their host has evolved to the point that neither can reproduce in the
absence of the other. During the course of coevolution with the host, Buchnera has lost
a number of genes that appear to be essential for bacterial existence. The genome of
Buchnera from A. pisum encodes about 620 genes, which is only one seventh the
number of genes in the genome of related bacteria such as Escherichia coli (Shigenobu
et al., 2000). This raises the question of whether any lost genes have been transferred
from the genome of ancestral Buchnera to the genome of aphids. Such lateral gene
transfer (LGT) would parallel that known to have occurred from bacterial endosymbionts
to the host nuclei during the evolution of mitochondria and plastids in eukaryotic hosts
(Dyall et al., 2004). Secondary symbionts, bacterial commensals, and bacterial
pathogens could also serve as sources of transferred DNA. Indeed, there are some
reports of LGT between the facultative endosymbiont Wolbachia (a secondary symbiont)
and its host arthropods and nematodes (Kondo et al., 2002; Fenn et al., 2006; Hotopp et
al., 2007; Nikoh et al., 2008). However, none of these laterally transferred genes
reported thus far appear to be functional.
Screening the genome of A. pisum for bacterial sequences, followed by
phylogenetic analyses, identified several genes that seem to have been transferred from
bacterial genomes to the genome of an ancestor of A. pisum (Nikoh et al., 2009,
companion paper). The candidate genes include those for LD-carboxypeptidase (ldcA),
N-acetylmuramoyl-L-alanine amidase (ybjR), rare lipoprotein A (rlpA), DNA polymerase
III alpha chain (dnaE), and uridylyltransferase (glnD). Buchnera lacks all of these genes
other than dnaE. Transcripts of ldcA and rlpA were originally detected in the
transcriptome analysis of the bacteriocyte (Nakabachi et al., 2005). While phylogenetic
analyses suggested that ldcA and ybjR were transferred from Rickettsiales
(Alphaproteobacteria), not Buchnera, dnaE in the aphid genome was significantly similar
to that of extant Buchnera, and glnD appeared to be of gammaproteobacterial origin
(Buchnera and many facultative symbionts and pathogens are Gammaproteobacteria).
Coding regions of ldcA, ybjR, and rlpA appear to be intact, and these genes were shown
to be expressed strongly in the bacteriocyte (Fig. 1), implying that they are not only
functional, but also that they may play important roles in symbiosis with Buchnera
(Nikoh and Nakabachi, 2009; Nikoh et al., 2009, companion paper). Only small parts of
dnaE and glnD are represented in the aphid genome, implying that they are not
functional. Thus, although it seems that aphids acquired some functional genes via LGT
from bacteria other than Buchnera, the aphid genome appears not to contain a significant
portion of the hundreds of genes that Buchnera lost as it evolved a highly reduced
genome.
Metabolism and symbiosis. The metabolic capacity of the pea aphid was
examined in the context of its unusual diet of plant phloem sap, which is rich in sugars
and deficient in essential amino acids, and its obligate symbiosis with Buchnera. The
genetic capacities of the insect and Buchnera for amino acid metabolism are broadly
complementary, largely as a result of gene loss from Buchnera (Shigenobu et al. 2000).
This complementarity results in several apparent instances of metabolic pathways
shared between the aphid and Buchnera (see companion paper, Wilson et al.). The pea
aphid also appears to lack some nitrogen metabolism genes that are present in other
sequenced insect genomes. Of particular note is the amino acid tyrosine. Insects
generally have the gene for phenylalanine hydroxylase, which mediates the synthesis of
tyrosine from the essential amino acid phenylalanine, and the genes for tyrosine
degradation to fumarate and acetoacetic acid. The pea aphid has the gene for
phenylalanine hydroxylase, and the high abundance of this gene transcript in
bacteriocytes (Nakabachi et al., 2005) suggests that phenylalanine synthesized by
Buchnera is an important source for meeting the aphid tyrosine requirement. Unlike other insects,
however, the pea aphid lacks tyrosine transaminase and other tyrosine catabolism
genes (Wilson et al. companion paper). Perhaps tyrosine degradation is redundant in
the pea aphid because Buchnera, which can neither synthesize nor degrade tyrosine but
requires it for protein synthesis, is predicted to be a major sink for this amino acid
(Shigenobu et al., 2000).
The genes for the urea cycle and two core genes of the purine salvage pathway,
adenosine deaminase and purine nucleoside phosphorylase, are also apparently absent
(see companion paper, Ramsey et al.), with the implication that the aphid is unlikely to
be able to produce urea and uric acid, respectively. Furthermore, the pea aphid is
expected to be entirely dependent on the diet and Buchnera for its supply of the amino
acid arginine, unlike the many animals that derive part of their arginine requirement from
the urea cycle. The presence in the aphid genome of a gene for glutamine synthetase 2
(ACYPI006239; EC 6.3.1.2; Glutamate + ATP + ammonia -> Glutamine + ADP +
phosphate), which was found to be highly expressed in the bacteriocyte cells that house
Buchnera (Nakabachi et al. 2005; Nakabachi et al., 2009, companion paper), raises the
possibility that bacteriocytes actively synthesize glutamine, which is then utilized by
Buchnera as an amino donor in several metabolic pathways, including arginine
biosynthesis. These genomic data are fully consistent with the evidence that the nitrogen
excretory products of pea aphids include no detectable uric acid or urea (Sasaki et al.
1990) and that the growth of pea aphids experimentally deprived of Buchnera is
significantly depressed on arginine-free diets (Gündüz et al. 2008). Nitrogen excretion in
most terrestrial insects is dominated by uric acid voided via the Malpighian tubules.
Aphids are most unusual in that ammonia is their sole known nitrogen excretory
compound and Malpighian tubules are absent. These evolutionary changes and the loss
of genes in the urea cycle and purine salvage pathway of aphids can be correlated with
the high water content of the phloem sap diet and efficient water cycling in the aphid gut
(Shakesby et al. 2008).
Immunity and symbiosis. Studies of diverse hosts suggest that host immune
function plays a role in the establishment and maintenance of symbiotic associations
(Heddi et al. 2006). In turn, the evolutionary maintenance of a host immune response to
microbes may be influenced by the ability of symbionts to protect their hosts from
pathogens and parasites. Therefore, host immunity may both shape and be shaped by
symbiotic associations.
Aphids lack many immune-related genes common to insects and other
invertebrates (companion paper, Gerardo et al.). First, while orthologs of key
components of the immune-related Toll, JAK/STAT and JNK signaling pathways were
identified, many of the genes comprising the IMD pathway (IMD, BG4, Dredd, Relish)
appear to be missing in the pea aphid. This pathway is intact in genomes of other
sequenced insects (and several of the genes are also found in the crustacean Daphnia
pulex). Second, several of the main insect immune pathways are triggered by
recognition of pathogens via peptidoglycan recognition proteins (PGRPs), which are also
absent in the pea aphid but are present in many insects and other arthropods (e.g., flies,
mosquitoes, bees, lice, ticks; but notably, not Daphnia pulex). Finally, in eukaryotes,
recognition and signaling ultimately lead to the production of diverse antimicrobial
peptides (AMPs), some of which are genus-specific (e.g., drosomycin in Drosophila) and
some of which are commonly shared across diverse organisms (e.g., defensins in plants
and animals). Manual annotation revealed few identifiable AMP genes in the pea aphid,
and RNA and protein isolation methods (i.e., suppression subtractive hybridization,
sequencing of ESTs from infected individuals and HPLC analyses) successfully used to
identify AMPs in other immune-challenged insects did not recover any AMPs from
immune-challenged aphids (Altincicek et al., 2008; Gerardo et al., companion paper). In
fact, during these immune challenges, aphids upregulated few genes of known immune
function and few novel genes that could be associated with an alternative immune
response. Although further functional assays are important to test for additional immune
responses in aphids, missing immune genes coupled with the weak response to
standard immune challenges suggest that aphids might have a reduced immune
response compared to the insects sequenced thus far. If so, immune processes may not
play a critical role in aphid interactions with their symbiotic bacteria.
There are several possible explanations for why aphids may have a reduced
immune system. First, aphids feed on mostly microbe-free phloem sap, reducing the risk
of ingestion of pathogens while feeding. Second, during much of their lifecycle, pea
aphids rapidly reproduce clones of themselves, making it possible that they may have
higher fitness if they invest in reproduction rather than in a presumably costly immune
defense. Finally, reduced immune function could be a consequence of symbiosis itself.
Because aphids frequently harbor protection-conferring symbionts, symbioses may relax
selection for maintenance of the hosts’ own immune systems or may select for loss of
immune functions that could prevent the establishment of beneficial symbionts. Future
work on other aphid species and other insect groups will be important to establish when
the distinctive features of the aphid immune gene repertoire evolved, and how these
may relate to diet, symbiosis and other aspects of the insect life style.
Genome of the primary symbiont Buchnera aphidicola. Though the
sequencing project was designed to target the genome of A. pisum, the project also
generated sequences of the primary and secondary symbiotic bacteria. We obtained
24,947 sequence reads corresponding to the Buchnera genome. Using such
"contaminants", we were able to reconstruct the 642,011-base-pair complete genome of
Buchnera with 20x coverage. Compared with the originally sequenced strain (from
Japan; Shigenobu et al., 2000), the new strain (from North America) shows approximately
1500 mismatches (0.23%) and two larger inserts (1.2 kbp and 150 bp). The newly
sequenced strain is almost 100% identical to a cluster of five recently sequenced
Buchnera strains from A. pisum collected in North America (Moran et al. 2009).
Compared to the closest strain, it shows only 10 nucleotide substitutions, 5 single-base
indels in homopolymeric runs of 5-29 bases, and 4 larger indels including a 282 bp
deletion and a 157 bp insertion, both in intergenic spacers. The close correspondence
between the newly sequenced strain, assembled from traditional small insert clones and
Sanger sequencing, and five previously sequenced North American strains, sequenced
using Solexa/Illumina methods without traditional cloning, verifies the general accuracy
of both approaches.
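As a quick consistency check on the figures quoted above, the mismatch rate and the mean read length implied by the stated coverage can be recomputed. This sketch assumes the usual approximation coverage ≈ (number of reads × mean read length) / genome size; the implied read length is a derived estimate, not a value reported in the text.

```python
# Values as stated in the text.
genome_bp = 642_011      # reconstructed Buchnera genome size
reads = 24_947           # sequence reads assigned to Buchnera
coverage = 20            # reported fold coverage
mismatches = 1_500       # approximate mismatches vs. the Japanese strain

# Mismatch rate as a percentage of genome length.
mismatch_pct = 100 * mismatches / genome_bp

# Mean read length implied by coverage = reads * read_len / genome_bp
# (assumption: every read contributes its full length to the assembly).
implied_read_len = coverage * genome_bp / reads

print(f"mismatch rate: {mismatch_pct:.2f}%")
print(f"implied mean read length: {implied_read_len:.0f} bp")
```

The recomputed mismatch rate agrees with the 0.23% given above, and the implied read length falls in the range expected for Sanger reads.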
Genome of the secondary aphid symbiont Candidatus Regiella insecticola.
Most of the genome of Regiella was also sequenced in connection with the pea aphid
genome project. Regiella infects a range of aphid species, including pea aphid, in which
it is sporadically distributed among individuals and populations. It is often intracellular,
like Buchnera (Fig. 2), but also lives extracellularly in the hemolymph. In contrast to
Buchnera, secondary symbionts such as Regiella are at low titer in host tissues,
resulting in low representation of symbiont DNA in samples. Although the aphid
sequencing project was carried out on a strain cured of Regiella infection, most of the
Regiella sequence was obtained by constructing a large insert (BAC) clone library from
DNA isolated from infected hosts and sequencing clones categorized as symbiont-derived
on the basis of end sequences. This produced a sequence in two scaffolds
estimated to represent at least 98% of a single circular chromosome representing the
entire Regiella genome. Contrasting the gene inventories of Buchnera and Regiella
illustrates the very different lifestyles of these two bacterial symbionts (Table XX).
Buchnera, as previously known, possesses a highly reduced genome largely comprised
of essential genes and genes for host nutrition, and completely lacks mobile elements,
phage or genes for toxin production, all of which are present in Regiella. Regiella
possesses a larger genome, intermediate in size between that of free-living
Gammaproteobacteria and that of Buchnera. It also contains many genes involved in
transport and invasion, and it has a lower overall coding density, reflecting the recent
degradation of genes from which DNA persists in the genome. …
Development in a Polymorphic Hemimetabolous Insect
Overview of Development. Aphids display a wide range of adult phenotypes and
possess two divergent modes of embryonic development: parthenogenetic and sexual
embryogenesis (Miura et al. 2003). Remarkably, all phenotypes and both modes of
embryonic development are coded by a single genome. This ability of a single genotype
to produce different phenotypes in response to environmental cues is an example of
phenotypic plasticity, which in cases like aphids, where plasticity results in the production
of discrete forms without intermediates, is known as polyphenism (Nijhout 2003).
Polyphenism requires concomitance between an input signal (the environmental cue)
and a critical period of development where the developing organism (usually an embryo)
is able to respond to this input. When parthenogenetic, a female adult aphid contains
embryos at all stages of differentiation and development, ensuring that at least one
embryo will be responsive to an environmental trigger. Aphid polyphenism includes
soldiers in gall-forming aphids (Ijichi et al. 2005), a switch from the production of
apterous to winged individuals in response to crowding or predation (Braendle et al.
2006), and, extraordinarily, a switch of reproductive strategy from apomictic
parthenogenesis to sexual reproduction in response to seasonal change (Lees 1959,
reviewed in Le Trionnaire et al. 2008).
Signaling pathways and transcription factors. Genes of the highly conserved
TGF-β, Wnt, EGF and JAK/STAT signaling pathways have undergone several aphid-specific
duplications and losses. Multiple paralogs were found for Dpp (4 paralogs, TGF-β
ligand), Medea (5, TGF-β co-Smad), Mad (2, TGF-β R-Smad), Domeless (4, JAK/STAT
receptor), STAT (2, JAK/STAT transcription factor), Argos (4, negative regulator of EGF
signaling) and Armadillo (2, β-catenin in Wnt signaling). Aphid-specific gene losses were
found for several TGF-β ligands (BMP10, Maverick and Alp23), Wnt ligands (Wnt6,
Wnt10) and Sprouty (RTK signaling inhibitor).
Most of the transcription factor families are similar in size and composition to those
of other insects. However, the aphid has significantly more zinc finger-containing proteins.
Although the number of bHLH-containing genes looks similar to that of other insects, direct
orthologs of the achaete-scute genes cannot be found in the aphid genome. All HOX
complex genes are present, but Hox3 (zen) and ftz, which have evolved non-homeotic
functions in insects, are highly diverged from orthologs of other species.
Circadian rhythm. In Drosophila, two interdependent transcriptional negative
feedback loops centered on the genes Clock and Period are essential for the circadian
clock (Cyran et al., 2003). The clock feedback loop is highly conserved in the pea aphid
with well-conserved orthologs of the genes Clock, Vrille and PDP1 all present in the
genome. In contrast, the period feedback loop is not well conserved; the gene Period in
A. pisum does not contain the motifs necessary for nuclear import and the whole protein
seems to be evolving at accelerated rates (circadian rhythm companion paper). Other
participants of the circadian clock, the cryptochromes Cry1 and Cry2 (Yuan et al. 2007),
are present in the pea aphid genome; the latter is duplicated. The circadian clock
repertoire includes a collection of additional genes; of these, orthologs of Drosophila
kinases Double-Time, Shaggy and Casein Kinase 2, as well as Protein Phosphatase 2a
and the protein degradation protein Supernumerary Limbs are relatively well conserved
in A. pisum (Table X, Comparison of circadian rhythm genes across insects).
Neuropeptides. Neuropeptides are cell-to-cell signaling molecules that act as
hormones, neurotransmitters or neuromodulators and possibly are involved in
transmitting the environmental input to the target tissues in polyphenism (Hardie 1987).
In A. pisum there are about 30 neuropeptide precursor genes that can generate more
than 88 neuropeptides. The PDF and corazonin precursor genes were not found. This is
surprising, as PDF is known to be involved in circadian rhythm and corazonin has
already been found in hemipterans (Rhodnius) and is known to regulate migratory phase
transition in Locusta and Schistocerca.
Mitosis, meiosis and cell cycle. Most of the genes involved in mitosis, meiosis
and the cell cycle are present in the pea aphid genome. Remarkably, the complement of
meiotic genes is more similar to that of vertebrates than to that of other insects because
it retains meiotic genes (Hop2, Mnd1, Msh4, Msh5) lost in many other insects and the
Ecdysozoa. The complement of genes known to regulate transition from G1 (growth) to
S (DNA replication) phases in other organisms is similar in the pea aphid to that of other insects
and other metazoans; D-type and E-type cyclins, E2F transcription factors and
Retinoblastoma protein are all found in the pea aphid genome. Interestingly, compared
to Drosophila, the pea aphid genome contains duplications of several mitotic kinases,
such as Cdk1, Polo and Aurora. Expression studies implicate these mitotic kinase
paralogs in regulation of aphid reproductive polyphenism (Srinivasan et al companion
paper). The aphid has several duplicated mitosis-related genes that are single-copy in other
insects but are duplicated in Daphnia, including Smc6 (Structural maintenance of
chromosomes 6; Uniprot:Q96SB8; ARP1_G1616) and Topo2 (DNA Topoisomerase 2;
Uniprot:Q92547; ARP1_G1509). These are involved in DNA double-strand breaks and
homologous recombination (Harvey et al 2002; Hwang et al 2008). While neither loss nor
expansion of cell cycle or meiosis genes is sufficient to account for aphid reproductive
plasticity, the expansion of key mitotic regulatory kinases raises interesting questions
about the role of these kinase paralogs in aphid mitotic and meiotic reproductive
plasticity.
Embryogenesis. The majority of the genes involved in axis formation,
segmentation, neurogenesis, eye development and germ-line specification in the embryo
are well conserved. Genes playing critical roles in Drosophila embryogenesis, but not
found in non-dipteran insects, are also missing from aphids, such as oskar (germ-line
specification), bicoid (anterior development) and gurken (dorso-ventral patterning).
Despite the absence of these orthologs, the downstream components of the signaling
pathways are well conserved. Lineage-specific gene losses were found for the gap
genes, giant and huckbein. Although a single homolog of Drosophila anterior gap gene
otd has been identified, it is not expressed in the anterior (Huang et al. companion
paper). Some orthologous genes for establishing the body plan have undergone aphid-specific
gene duplications. For example, spatzle and Dorsal, the key components of
dorso-ventral patterning, have been duplicated. There are two paralogs of Torso-like, the
most conserved molecule in the terminal patterning pathway. More striking examples of
duplications are seen in the key genes of germ-line development: the aphid genome has
3 paralogs of vasa and 4 paralogs of nanos, of which only 1 and 2, respectively, are
expressed in the germline (Chang et al. 2007, 2008, and unpublished data).
Juvenile hormones. The main enzymes responsible for the synthesis and
degradation of juvenile hormones (JH) are present in the pea aphid genome (Table X, JH-related
genes in pea aphid and their methylation status; JH companion paper). However,
the pea aphid apparently lacks other JH-associated proteins such as hexamerins, which
constitute a class of JH binding proteins implicated in social insect caste regulation
(Zhou et al. 2007).
JH has been postulated to regulate polyphenism in insects, including aphids
(Corbitt & Hardie 1985, Nijhout 2003). The coding sequence of JH binding protein is
differentially methylated between apterous and alate morphs of the pea aphid (Walsh et
al., companion paper). However, no correlation has been demonstrated between JHIII
titre and the proportion of winged offspring from aphids induced during their
parthenogenetic phase (Schwartzberg et al. 2008).
Hemimetabolous development. In holometabolous insects such as Drosophila,
wing development occurs primarily during the last larval instar and pupal stages, either
directly from the ectoderm or from sequestered imaginal discs. In contrast, in
hemimetabolous insects such as aphids, wing development progresses gradually across
all nymphal instars. Despite this marked developmental difference between
holometabolous and hemimetabolous insects, all major components of Drosophila wing
development are conserved in the pea aphid. However, in contrast to holometabola, the
pea aphid has fewer genes encoding chitinase, an enzyme with chitinolytic activities
that degrades old cuticles and the peritrophic membrane. This difference possibly
reflects the fact that hemimetabolous insects do not require dramatic exoskeletal
reconstruction.
Sex determination and dosage compensation. Sex determination in aphids is
chromosomal: females have two X chromosomes and males only one (Wilson et
al 1997), while both sexes share the same autosomal complement. Insect sex
determination pathways appear diverse and only moderately conserved through
evolutionary time. The pea aphid has homologs to the terminal two genes,
transformer 2 and doublesex, of the D. melanogaster somatic sex determination
pathway. However, the pea aphid gene model for doublesex is currently incomplete
because it is missing the dimerization domain. Several genes associated with Drosophila
sexual differentiation and courtship behavior are also conserved in the pea aphid
genome (Huang et al companion paper).
Dosage compensation in Drosophila involves hypertranscription of the single male
X chromosome (Arnold et al 2008). While some Drosophila dosage compensation genes
are present in the pea aphid genome (maleless, males-absent-on-the-first), others, such
as the male-specific-lethal genes (msl-1, msl-2) and RNA genes (roX1, roX2), are
absent. Since some dosage compensation genes are also found in the honeybee (mle,
mof and msl-3), an insect that lacks X-specific dosage compensation (Honeybee
Consortium, 2006), it remains an open question as to whether the genes present in
aphids function in dosage compensation or are rather more broadly associated with
chromatin structure and transcriptional regulation.
Discussion
Other sections being written currently?
Polyphenism. Living organisms adapt their physiology to changing environments
during their life-span. Aphids, through their high capacity for phenotypic plasticity, adapt
not only their physiology, but also their embryogenesis program. Adults exposed to
environmental changes are thus able to produce progeny morphs that are genetically
identical but better adapted to the new environment. This requires sensing and
transducing of environmental signals, leading to regulation of genetic developmental
programs. One of the most prominent changes in the environment sensed by aphids is
reduced day-length in autumn, responsible for the switch from viviparous
parthenogenesis to oviparous sexual reproduction. It is still under debate whether
sensing and transducing this seasonal photoperiodism is linked to the circadian clock as
a mechanism to measure day-length (Hardie and Nunes 2001). Pea aphid genome
analysis showed that the period feedback loop is not well conserved. The high rate of
evolution of the pea aphid period protein is intriguing and suggests an atypical circadian
mechanism. The absence of a PDF precursor gene in the pea aphid genome strengthens
this observation: PDF is a well-known regulator of the circadian rhythm that operates
with the master clock in the brain. In D. melanogaster, PDF is secreted in a circadian
manner and perturbation of its expression causes arrhythmicity (Helfrich-Förster 2005).
Early experiments in the 1980s strongly suggested a role for juvenile hormones (JHs) in
transducing, from the brain to the ovaries, the photoperiodic signals responsible for the
switch of reproductive mode (Corbitt and Hardie 1985). JHs also regulate the development
of winged or apterous morphs (Braendle et al. 2006). The synthesis, complex formation and
degradation of JHs, which are very potent morphogenic molecules, are highly regulated, and this
entire repertoire is present in the pea aphid genome. However, the JH binding protein
hexamerin, which has been shown to play a key role in phenotypic plasticity in termites
and honeybee (ref), was not found in the pea aphid genome.
Interestingly, some of the JH binding proteins have been found to be differentially
methylated in apterous and winged morphs. This suggests that epigenetic regulation
might be involved in aphid phenotypic plasticity (Wang et al. 2006). The expansion of
part of the small non-coding RNA machinery in the pea aphid could play a role in
epigenetic regulation (Brennecke et al. 2008). In particular, the duplication of Dicer 1 and
a preferential expression of one of its paralogs in sexual morphs suggest a role for
miRNAs in the regulation of reproduction mode and therefore a role in phenotypic
plasticity.
The final effect of JHs during seasonal photoperiodism is to determine the fate of the
germaria of developing embryos: either to undergo a modified mitosis and enter parthenogenetic
embryogenesis, or to enter meiosis and produce sexual female gametes. Parthenogenetic
aphids derive from a modified mitosis of oocyte stem cells, in which a single division
free of recombination produces two diploid cells: a polar body that degenerates and a
diploid oocyte that immediately undergoes synchronous mitotic divisions and
embryogenesis. The regulation of this process is not understood and needs to be
studied in more detail. Differences in the cell cycle genes of the pea aphid compared to
Drosophila have been identified, and expression studies indicate preferential regulation
of mitotic kinase paralogs in different reproductive morphs.
Although the post-fertilization embryology of oviparous development is typical for true
bugs, the embryology of viviparous development differs in several profound ways (Miura
et al. 2003). Viviparous embryos take 10-15 days to develop, while oviparous embryos can
take more than 100. Viviparous eggs are yolk-free and as a result are less than 1/10th
the length of oviparous eggs. The endosymbiotic bacteria are transferred into the
embryo just after cellularization in the parthenogenetic embryo, whereas bacteria are
packaged into sexual eggs before fertilization. The central question about these polyphenisms
is how a single genome gives rise to such divergent developmental outcomes. We
comprehensively identified developmental genes in the pea aphid genome. The majority
of genes involved in signaling pathways, establishment of the body plan
and organogenesis are well conserved, but peculiar lineage-specific gene expansions
and losses were found. It is unclear how much these gene duplications and losses have
affected the developmental pathways of the pea aphid. It is also likely that the same
genes are used in different ways in the two modes. Future studies should focus
on comparative analyses of how the developmental genes identified here are
used in such divergent embryological contexts.
The sex determination and dosage compensation mechanisms in aphids are not
known. One could hypothesise that the developmental programs of XX females and X0
males in aphids follow the same general mechanisms as those in Drosophila, where the
ratio between the numbers of X chromosomes and autosomes dictates a cascade of
alternative splicing regulations leading to the expression of the genetic programs specific
for females or males. Key regulators of sex determination in Drosophila, such as
Sex-lethal, have not been found in the pea aphid, but these are, in general, poorly conserved
between species. Surprisingly, the doublesex genes, transcription factors responsible
for the expression of female- or male-specific genetic programs, are slightly different
from those of other insects, which might suggest a different regulation of sexual phenotypes
in Hemiptera.
Materials and Methods
Sequencing strain. Aphids for DNA isolation were from a clone, LSR1.AC.F1,
resulting from a single generation of inbreeding of clone LSR1. Aphids were treated with
ampicillin to reduce their only facultative symbiont, Regiella insecticola. Prior to DNA
preparation aphids were heat treated to reduce the number of primary symbionts,
Buchnera aphidicola. Entire aphid colonies on broad bean plants were placed in a
30°C incubator for 4 days. Quantification of levels of Buchnera DNA revealed a
significant decrease in the level of Buchnera. Approximately 2% of the sequencing reads
came from the Buchnera genome and were removed prior to assembly. WHERE CAN
THE STRAIN BE OBTAINED FROM?
Sequencing and Assembly, Acyr 1.0. Sanger reads (3.13 million), produced on
3730 sequencing machines (Applied Biosystems, Foster City, CA), were assembled using
the Atlas assembly pipeline, representing about 464Mb of sequence and about 6.2X
coverage of the (clonable) A. pisum genome. Two WGS libraries, with inserts of 2-3 kb
and 4-5 kb and a BAC library with insert size ~130kb were used to produce the data.
The assembly contained 72,844 contigs, with an N50 length of 10.8 kb, and a total
length of 446.6 Mb. Based on paired end data, these contigs were ordered and oriented
into 22,801 scaffolds, with an N50 length of 86.9 kb and a total length of 464.3 Mb when
gaps between contigs within scaffolds are included. The LSR1 pea aphid genome
sequence is available from the NCBI with project accession ABLF01000000.
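The N50 values quoted above summarize assembly contiguity. N50 is commonly defined as the length L such that contigs (or scaffolds) of length L or longer together contain at least half of the assembled bases. The following is a minimal Python sketch of that definition; the function name and the toy lengths are illustrative assumptions, not the Acyr 1.0 data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half of the
        # total assembled bases is the N50.
        if running >= half_total:
            return length
    raise ValueError("no lengths given")


# Hypothetical toy lengths, for illustration only.
toy_contigs = [100, 80, 60, 40, 20]
print(n50(toy_contigs))
```

Applied to the real contig or scaffold lengths, the same function would be expected to return the N50 values reported for the assembly.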
Automated Gene Model Prediction. The NCBI gene prediction pipeline uses a
combination of homology searching with ab initio modeling. cDNAs and ESTs were
aligned to the genomic sequences using Splign[1]. Proteins were aligned to the genomic
sequences using ProSplign[2]. The best scoring CDS was identified for all cDNA
alignments using the same scoring system used by Gnomon[3], the NCBI ab initio
prediction tool. All cDNAs with a CDS scoring above a certain threshold were marked as
coding cDNAs, and all others were marked as UTRs. CDSes that lack a translation
initiation or termination signal were categorized as incomplete. Protein alignments were
scored the same way, and CDSes that did not satisfy the threshold criterion for a valid
CDS were removed. After determining the UTR/CDS nature of each alignment, the
alignments were assembled using a modification of the Maximal Transcript Alignment
algorithm[4], taking into account not only exon-intron structure compatibility but also the
compatibility of the reading frames. Two coding alignments were connected only if they
both had open and compatible CDSes. UTRs were connected to coding alignments only if
the necessary translation initiation or termination signals were present. There were no
restrictions on the connection of UTRs other than the exon-intron structure compatibility.
All assembled models with a complete CDS, including the translation initiation and
termination signals, were combined into alternatively spliced isoform groups. Incomplete
or partially supported models were directed to Gnomon[3] for extension by ab initio
prediction. Models containing a debilitating mutation such as a frameshift or nonsense
mutation were categorized as either transcribed or non-transcribed pseudogenes. A
subset of pseudogenes are likely to be functional genes that have errors in the Acyr_1.0
assembly, and may be re-classified as protein-coding genes with subsequent
improvements to the assembly and annotation. Gnomon[3] was also used to predict pure
ab initio models in regions of the genome that lacked any cDNA, EST or protein
alignments.
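As a rough illustration of the completeness rule stated above (a CDS lacking a translation initiation or termination signal is categorized as incomplete), the check might be sketched in Python as follows. This sketch assumes ATG as the only initiation signal and the three standard stop codons; it does not reproduce the Gnomon scoring system, and the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_cds(cds):
    """Label a candidate CDS as 'complete' or 'incomplete'.

    Assumption: a complete CDS starts with ATG and ends with an
    in-frame standard stop codon.
    """
    cds = cds.upper()
    has_start = cds.startswith("ATG")
    in_frame = len(cds) >= 3 and len(cds) % 3 == 0
    has_stop = in_frame and cds[-3:] in STOP_CODONS
    return "complete" if has_start and has_stop else "incomplete"


print(classify_cds("ATGAAATAA"))
print(classify_cds("AAATAA"))
```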
AphidBase. The Acyrthosiphon pisum assembly was scanned broadly for
evidence of transcription. ESTs, EST contigs and full-length cDNAs were
mapped using sim4, whereas homologs in other insect genomes or UniProt were
identified by high-throughput BLAST searches. In addition, various gene prediction sets (Augustus,
RefSeq, Genscan, Maker, Snap, GeneID, Gnomon, Fgenesh) were generated and a
reference set containing 34,821 putative genes was established, based on the
RefSeq predictions when present or otherwise on a combination of the other predictions using Glean.
In parallel, the 27 annotation groups used Apollo to curate about a thousand of these
gene models.
All these features have been loaded into a GMOD-Chado database (ref) accessible
at the AphidBase web portal. In addition to a wiki, a BLAST search and a full-text search
engine, AphidBase uses various open source software tools from the Generic Model
Organism Database (GMOD) project, in particular the graphical genome browser GBrowse (ref)
and the manual curation software Apollo (ref).
Symbiont sequences. During the course of whole genome sequencing of the
LSR1 clone of A. pisum, 24,947 sequence reads corresponding to the Buchnera
genome were obtained as byproducts. Using such “contaminants”, the whole genome of
Buchnera was reconstructed by two distinct methods: de novo assembly using CAP3
software and comparative (reference-mapping) assembly using AMOS. Results of
both methods were essentially the same, but the latter produced longer and fewer
contigs.
Transposable Elements detection. De novo methods for TE
identification are required to overcome the challenge of detecting nested and fragmented
TEs. The “REPET” pipeline (http://urgi.versailles.inra.fr/development/repet/) that we
have developed was used and improved to analyze the pea aphid genome. TE
consensus sequences were predicted “ab initio” by first searching for repeats with BLASTER
in an all-by-all genome comparison and then grouping the results using three clustering methods
(GROUPER, RECON, PILER) with default parameters. We then built one consensus per
group with the MAFFT multiple sequence alignment program and classified each
consensus according to BLASTER matches using TBLASTX and BLASTX with the
entire Repbase Update (for coding TE features) as reference data bank, and according
to the presence of structural features such as terminal repeats (TIR, LTR, SSR tails). For
example, a consensus is defined as a MITE if (i) it carries TIRs; (ii) it does not match known
TEs via TBLASTX or BLASTX [6]; and (iii) its length without its TIRs is less than 500 bp.
The set of consensus sequences was then analyzed by an all-by-all BLASTER procedure to remove
redundancies, i.e. cases where a consensus sequence is included in another at a 95% identity
threshold and 98% length threshold.
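The three MITE criteria given above can be expressed as a simple predicate. The Python sketch below is illustrative only: the record fields and function name are hypothetical and are not part of REPET, and the similarity searches are represented by a precomputed boolean.

```python
from dataclasses import dataclass


@dataclass
class Consensus:
    name: str
    length: int             # total consensus length in bp
    tir_length: int         # length of one terminal inverted repeat, 0 if none
    matches_known_te: bool  # True if any TBLASTX/BLASTX hit to a known TE


def is_mite(consensus, max_internal_bp=500):
    """Apply the three criteria: TIRs present, no match to known TEs,
    and length without the two TIRs below 500 bp."""
    has_tirs = consensus.tir_length > 0
    internal = consensus.length - 2 * consensus.tir_length
    return has_tirs and not consensus.matches_known_te and internal < max_internal_bp


# Hypothetical example records.
print(is_mite(Consensus("cons_a", 300, 20, False)))
print(is_mite(Consensus("cons_b", 800, 20, False)))
```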
From that step we obtained TE consensus sequences representing ancestral copies of
TE subfamilies. These were then clustered into groups to identify TE families using
the GROUPER clustering method. Each family was then identified assuming that the
most populated well-characterized TE category in a group of consensus sequences
defines the order to which the group belongs. Eighty-five families containing at least 5 TE
consensus sequences were then manually curated using multiple sequence
alignments, phylogenies and Hidden Markov Models. This close examination allowed us
to confirm the grouping and to identify specific features such as chimeric TE families or subfamilies.
The pea aphid genome was then annotated with all the TE subfamily
consensus sequences (output from the de novo step) using the annotation step of the
“REPET” pipeline. This step combines the TE detection programs BLASTER,
RepeatMasker [7] and Censor, and the satellite detection programs RepeatMasker, TRF
[9] and Mreps [10].
To save computer time and reduce software memory requirements, we
segmented the genomic sequences into chunks of 200 kb overlapping by 10 kb. Each
chunk was then independently analyzed by the different programs. Simple repeats were
used to filter out spurious hits. TE or repeat copies shorter than 20 bp after removal of
simple repeat regions were discarded.
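The segmentation into overlapping chunks can be sketched as follows. This is a minimal Python illustration assuming 0-based, half-open coordinates and a final chunk that is simply truncated at the sequence end; the function name is hypothetical and the REPET implementation may differ in these details.

```python
def chunk_coordinates(seq_length, chunk_size=200_000, overlap=10_000):
    """Return (start, end) pairs covering a sequence with overlapping chunks."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if seq_length <= 0:
        return []
    step = chunk_size - overlap  # each new chunk starts 190 kb after the last
    chunks = []
    start = 0
    while True:
        end = min(start + chunk_size, seq_length)
        chunks.append((start, end))
        if end == seq_length:
            break
        start += step
    return chunks


# A hypothetical 450 kb scaffold.
for start, end in chunk_coordinates(450_000):
    print(start, end)
```

Because consecutive chunks share 10 kb, a repeat copy shorter than the overlap that straddles a chunk boundary is still seen whole in at least one chunk.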
To take into account the fact that TEs often insert into one another, so that
fragments belonging to the same copy become separated, a specific “long join”
annotation procedure was performed, using age estimates of repeat fragments.
The identity percentage between a fragment and its reference TE/repeat
consensus can be used to estimate the age of this fragment. Consecutive fragments on
both the genome and the same reference repeat consensus are automatically joined if
their identity percentage difference is less than 2% (the two fragments have
approximately the same age) and (i) if they are separated by a gap of less than 5000 bp
and/or by a mismatch region of less than 500 nucleotides, or (ii) if there are nested
repeats: the fragments are separated by a sequence of which more than 95% consists
of other, younger repeat insertions, all inserts having a higher identity to their
respective consensus. Fragments separated by more than 100 kb are not joined. Finally,
nested repeats are split if inner repeat fragments are longer than outer joined
fragments.
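The joining rule above can be written as a decision function. The Python sketch below is illustrative only and rests on several assumptions: the "and/or" in condition (i) is read as an inclusive or, the 100 kb limit is applied to both conditions, and the fraction of the intervening sequence covered by younger insertions is taken as precomputed. The field and function names are hypothetical and are not taken from REPET.

```python
from dataclasses import dataclass


@dataclass
class FragmentPair:
    identity_a: float               # % identity of first fragment to the consensus
    identity_b: float               # % identity of second fragment to the consensus
    genome_gap: int                 # bp separating the fragments on the genome
    consensus_gap: int              # unmatched nt between them on the consensus
    younger_insert_fraction: float  # share of the genome gap made of younger inserts


def should_join(pair):
    """Decide whether two consecutive fragments belong to the same TE copy."""
    # Fragments must be of approximately the same age.
    if abs(pair.identity_a - pair.identity_b) >= 2.0:
        return False
    # Fragments more than 100 kb apart are never joined.
    if pair.genome_gap > 100_000:
        return False
    # (i) short gap on the genome and/or short mismatch region (read as "or").
    close = pair.genome_gap < 5000 or pair.consensus_gap < 500
    # (ii) nested repeats: gap made almost entirely of younger insertions.
    nested = pair.younger_insert_fraction > 0.95
    return close or nested


print(should_join(FragmentPair(90.0, 91.0, 1200, 50, 0.0)))
print(should_join(FragmentPair(90.0, 90.5, 30_000, 2000, 0.97)))
```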
Finally, the Acyrthosiphon pisum genome gives the opportunity to compare the
evolutionary dynamics and diversity of mobile genetic elements between this organism
and other biological species. To assist further research on this topic, we explored new
proteome-based protocols: we additionally screened the
aphid genome using 28 full-length single-frame proteome sequences, which
concatenate the Gypsy database [12,13] collection of majority-rule consensus (MRC)
sequences. This collection describes the different protein products encoded by the gag-pol
(and env) internal region of different LTR retroelement lineages. In collaboration with the
Virus Transmission and Transcytosis Annotation Team (LTR retroelements include not
only retrotransposons but also infectious retroviruses), we characterized a number of
protein sequences belonging to the main described lineages in the pipeline, in particular
those of the Ty3/gypsy, Bel and Ty1/Copia groups. The curated material has been
organized within a Refseq database of prototypic LTR retroelement protein products
named Retroproteome. For simplicity's sake, we will prepare a supplementary
manuscript describing this tool, the availability of BLAST web servers and examples of its use.
Phylome reconstruction. We reconstructed the complete collection of phylogenetic
trees, also known as the “phylome”, for all A. pisum protein-coding genes. For this we used
an automated pipeline similar to that described earlier for the human genome (Huerta-Cepas
et al. 2007). A database was created containing the A. pisum proteome and those of
16 other species. These include 12 other insects: Tribolium castaneum, Nasonia
vitripennis, Apis mellifera (from the NCBI database), Drosophila pseudoobscura, Drosophila
melanogaster, Drosophila mojavensis, Drosophila yakuba (from FlyBase), Pediculus
humanus, Culex pipiens (from VectorBase), Anopheles gambiae, Aedes aegypti (from
Ensembl) and Bombyx mori (from SilkDB), and four out-groups: the crustacean
Daphnia pulex (the GNOMON predicted set provided by the JGI), the
nematode Caenorhabditis elegans and the two chordates Ciona intestinalis and Homo
sapiens (from Ensembl). Then, for each protein encoded in the A. pisum genome, a
Smith-Waterman (Smith and Waterman, 1981) search (E-value cutoff 10^-3) was performed against the
above-mentioned proteomes. Sequences that aligned with a continuous region longer
than 33% of the query sequence were selected and aligned using MUSCLE 3.6 (Edgar,
2004) with default parameters. Gappy positions in the alignments were removed using
trimAl v1.0 (http://bioinfo.cipf.es/trimal), using a gap threshold of 10% and a
conservation threshold of 50%. Phylogenetic trees were derived using Neighbor Joining
(NJ) with scoredist distances as implemented in BioNJ (Gascuel, 1997) and
Maximum Likelihood (ML) as implemented in PhyML v2.4.4 (Guindon and Gascuel, 2003),
using JTT as an evolutionary model and assuming a discrete gamma-distribution
model with four rate categories and invariant sites, where the gamma shape
parameter and the fraction of invariant sites were estimated from the data. Support for
the different partitions was computed by the approximate likelihood ratio test (aLRT) as
implemented in PhyML (Anisimova and Gascuel, 2006). All trees and
alignments have been deposited in PhylomeDB (Huerta-Cepas et al. 2008)
(http://phylomedb.org).
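The homolog selection step described above can be illustrated with a small filter. The Python sketch below assumes that each search hit has already been parsed into a record holding its E-value and the coordinates of the continuous aligned region on the query; the record layout, identifiers and function name are hypothetical, and the thresholds (E-value of at most 1e-3, aligned region longer than 33% of the query) follow the values quoted in the text.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    target: str
    evalue: float
    query_start: int  # 1-based, inclusive
    query_end: int    # 1-based, inclusive


def select_homologs(hits, query_length, max_evalue=1e-3, min_fraction=0.33):
    """Keep hits passing the E-value cutoff and the query-coverage rule."""
    selected = []
    for hit in hits:
        aligned = hit.query_end - hit.query_start + 1
        if hit.evalue <= max_evalue and aligned > min_fraction * query_length:
            selected.append(hit.target)
    return selected


# Hypothetical hits for a 300-residue query protein.
hits = [
    Hit("dmel_1", 1e-20, 10, 250),  # long, significant: kept
    Hit("amel_2", 1e-5, 1, 80),     # significant but too short: dropped
    Hit("tcas_3", 0.5, 1, 300),     # long but not significant: dropped
]
print(select_homologs(hits, query_length=300))
```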
Phylogeny-based orthology determination. Orthology and paralogy relationships
among A. pisum genes and those encoded in the other genomes included in the
analysis were inferred by a phylogenetic approach that uses a previously-described
species-overlap algorithm (Huerta-Cepas et al. 2007). In brief, this algorithm uses the
level of species overlap between the two daughter partitions of a given node to define it
as a duplication (if there is species overlap) or a speciation (if there is no overlap). After
mapping all duplications and speciations onto the phylogenetic tree of a given gene
family, all orthology and paralogy relationships are inferred accordingly. All orthology and
paralogy predictions can be accessed through PhylomeDB.
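The node classification rule of the species-overlap approach can be sketched on a toy tree. The Python example below is a simplified illustration, not the published implementation: it assumes a strictly binary, rooted tree encoded as nested tuples, leaf names of the form "species_gene", and any shared species counting as overlap. All names are hypothetical.

```python
def species_of(leaf):
    """Extract the species prefix from a leaf name such as 'Apisum_g1'."""
    return leaf.split("_", 1)[0]


def leaves(node):
    """Return all leaf names below a node."""
    if isinstance(node, str):
        return [node]
    left, right = node
    return leaves(left) + leaves(right)


def classify_nodes(node, events=None):
    """Label every internal node as a duplication or a speciation."""
    if events is None:
        events = []
    if isinstance(node, str):
        return events
    left, right = node
    species_left = {species_of(leaf) for leaf in leaves(left)}
    species_right = {species_of(leaf) for leaf in leaves(right)}
    # Shared species between the two daughter partitions imply a duplication.
    kind = "duplication" if species_left & species_right else "speciation"
    events.append((kind, sorted(leaves(left)), sorted(leaves(right))))
    classify_nodes(left, events)
    classify_nodes(right, events)
    return events


# Toy gene tree with an aphid-specific duplication.
tree = ((("Apisum_g1", "Apisum_g2"), "Dmel_g1"), "Hsap_g1")
kinds = [event[0] for event in classify_nodes(tree)]
print(kinds)
```

Genes on opposite sides of a speciation node are then treated as orthologs, and genes on opposite sides of a duplication node as paralogs.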
Detection of aphid-specific gene expansions
The duplication events defined by the above-mentioned species-overlap algorithm that
comprised only paralogs from A. pisum were considered lineage-specific duplications.
Whenever more than one round of duplication followed the A. pisum speciation event (family
expansion), all resulting paralogs were grouped into a single group of “in-paralogs”.
Results from all the trees in the phylome were merged into a non-redundant list of in-paralog
groups, by merging groups sharing a significant fraction of their members
(50%).
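The merging of in-paralog groups into a non-redundant list can be sketched as an iterative set merge. The text does not state which group size the 50% refers to, so the Python sketch below assumes the fraction is measured against the smaller of the two groups and that merging is repeated until no pair qualifies. The function name and gene labels are hypothetical.

```python
def merge_inparalog_groups(groups, min_shared=0.5):
    """Merge groups that share at least min_shared of the smaller group."""
    merged = [set(group) for group in groups]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                shared = len(merged[i] & merged[j])
                smaller = min(len(merged[i]), len(merged[j]))
                if smaller and shared / smaller >= min_shared:
                    merged[i] |= merged[j]
                    del merged[j]
                    changed = True
                    break
            if changed:
                break  # restart the scan after every merge
    return merged


# Hypothetical groups collected from different trees of the phylome.
groups = [{"a", "b", "c"}, {"b", "c", "d"}, {"x", "y"}, {"a", "b", "c"}]
result = merge_inparalog_groups(groups)
print(sorted(sorted(group) for group in result))
```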
Orthology-based functional annotation. A list of orthology-based transfer of functional
annotations was built based on phylogeny-based orthology relationships with Drosophila
melanogaster. A. pisum genes with orthology relationships with annotated D.
melanogaster genes were grouped according to the type of orthology relationship. In total, 4058
aphid genes could be annotated based on a clear one-to-one orthology relationship with
a Drosophila gene. An additional 2315 genes presented a many-to-one orthology relationship
with annotated Drosophila genes and thus can tentatively be annotated with the GO
terms associated with the fly genes, with the cautionary remark that neo-
and sub-functionalization may have occurred.
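Grouping genes by orthology type can be illustrated as follows. The Python sketch assumes orthology predictions are available as (aphid gene, fly gene) pairs; the identifiers are hypothetical placeholders and the function is not part of the pipeline described here.

```python
from collections import defaultdict


def classify_orthology(pairs):
    """Assign each aphid gene an orthology type relative to fly genes."""
    aphid_to_fly = defaultdict(set)
    fly_to_aphid = defaultdict(set)
    for aphid_gene, fly_gene in pairs:
        aphid_to_fly[aphid_gene].add(fly_gene)
        fly_to_aphid[fly_gene].add(aphid_gene)

    result = {}
    for aphid_gene, fly_genes in aphid_to_fly.items():
        n_fly = len(fly_genes)
        # Largest number of aphid co-orthologs over the partner fly genes.
        n_aphid = max(len(fly_to_aphid[f]) for f in fly_genes)
        if n_fly == 1 and n_aphid == 1:
            result[aphid_gene] = "one-to-one"
        elif n_fly == 1:
            result[aphid_gene] = "many-to-one"
        elif n_aphid == 1:
            result[aphid_gene] = "one-to-many"
        else:
            result[aphid_gene] = "many-to-many"
    return result


# Hypothetical identifiers, for illustration only.
pairs = [("aphid_1", "fly_1"), ("aphid_2", "fly_2"), ("aphid_3", "fly_2")]
print(classify_orthology(pairs))
```

Under this scheme, "many-to-one" marks aphid genes that share a single fly ortholog, which is the class annotated tentatively in the text.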
Species tree reconstruction. A total of 197 genes having a single-copy ortholog in all the
species included in the analyses were selected to infer a species phylogeny. Alignments
performed with MUSCLE as previously described were concatenated into a super-alignment
containing 144,922 positions. The removal of positions with gaps in more than
50% of the sequences resulted in a final alignment of 109,422 positions. This alignment
was used for Maximum Likelihood (ML) tree reconstruction as implemented in PhyML
v2.4.4 (Guindon and Gascuel, 2003), using JTT as an evolutionary model and assuming
a discrete gamma-distribution model with four rate categories and invariant sites, where
the gamma shape parameter and the fraction of invariant sites were estimated from the
data. Bootstrap analysis was performed on the basis of 100 replicates.
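The column filter applied to the concatenated alignment (removal of positions with gaps in more than 50% of the sequences) can be sketched in a few lines of Python. The sketch assumes the alignment is held as a list of equal-length strings with "-" as the gap character; the function name and the toy alignment are illustrative.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.5):
    """Remove columns whose gap fraction exceeds max_gap_fraction."""
    if not alignment:
        return []
    n_seqs = len(alignment)
    n_cols = len(alignment[0])
    keep = [
        col
        for col in range(n_cols)
        if sum(seq[col] == "-" for seq in alignment) / n_seqs <= max_gap_fraction
    ]
    return ["".join(seq[col] for col in keep) for seq in alignment]


# Toy alignment: the third column has gaps in 3 of 4 sequences.
toy = ["AC-G", "A--G", "AT-G", "A-TG"]
print(drop_gappy_columns(toy))
```

A column with gaps in exactly half of the sequences is retained, since the rule removes only positions with gaps in more than 50% of the sequences.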
Detection of Odorant Binding Proteins. Gene sequences can be predicted to encode
OBPs when their predicted proteins have: 1) an α-helix pattern, 2) the conserved cysteine
residues with the expected spacing between them, 3) a globular water-soluble nature
and 4) a signal peptide. Given this, genes encoding OBPs in the A.
pisum genome were identified and annotated using several approaches. The A. pisum
EST database (167,706 sequences) and the whole genome sequence were searched
using: 1) an algorithm to identify the conserved cysteine motif
C1-X8-41-C2-X3-C3-X21-47-C4-X7-15-C5-X8-C6 in the 6-frame translated sequences; 2) rps-BLAST with the
PBP/GOBP (pfam01395) and CSP (pfam03392) conserved domains; and 3) tBLASTn
and PSI-BLAST using known OBPs from other insects as the query. The genome
sequence was also searched using tBLASTn and HMMER against the 6-frame
translated sequences (known OBPs from other insects were used as queries for all BLAST
searches; Pfam profiles were used for all HMMER searches).
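The cysteine-spacing search can be illustrated with a regular expression. The Python sketch below is not the algorithm actually used: it assumes the spacer between C3 and C4 is 21-47 residues, allows any residue except a stop ("*") in the spacers, and reports only the first match. The function name and test sequences are synthetic.

```python
import re

# C1-X(8-41)-C2-X(3)-C3-X(21-47)-C4-X(7-15)-C5-X(8)-C6
OBP_CYS_MOTIF = re.compile(
    r"C[^*]{8,41}C[^*]{3}C[^*]{21,47}C[^*]{7,15}C[^*]{8}C"
)


def has_obp_motif(protein):
    """Return True if a translated sequence contains the six-cysteine pattern."""
    return OBP_CYS_MOTIF.search(protein.upper()) is not None


# Synthetic sequences: 'good' respects the spacing, 'bad' has only two
# residues between the second and third cysteines.
good = "M" + "C" + "A" * 10 + "C" + "A" * 3 + "C" + "A" * 30 + "C" + "A" * 9 + "C" + "A" * 8 + "C" + "K"
bad = "M" + "C" + "A" * 10 + "C" + "A" * 2 + "C" + "A" * 30 + "C" + "A" * 9 + "C" + "A" * 8 + "C" + "K"
print(has_obp_motif(good), has_obp_motif(bad))
```

In a 6-frame translation, each frame would be scanned separately, and candidate matches would still need the other three criteria (helix pattern, solubility, signal peptide) checked by other means.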
References (not in alphabetical order)
M. Anisimova and O. Gascuel, "Approximate likelihood ratio test for branches: A fast,
accurate and powerful alternative," Systematic Biology, 55(4), 539-552, 2006
Ashford, D.A., Smith, W.A., Douglas, A.E., 2000. Living on a high sugar diet: the fate of
sucrose ingested by a phloem-feeding insect, the pea aphid Acyrthosiphon pisum. J.
Insect Physiol. 46, 335–341.
A. E. Douglas, D. R. G. Price, L. B. Minto, E. Jones, K. V. Pescod, C. L. M. J. François,
J. Pritchard and N. Boonham (2006) Sweet problems: insect traits defining the limits to
dietary sugar utilisation by the pea aphid, Acyrthosiphon pisum. Journal of Experimental
Biology 209, 1395-1403.
Kikawada et al., 2007, PNAS 104:11585-90
Rhodes et al. Dietary sucrose and oligosaccharide synthesis in relation to
osmoregulation in the pea aphid, Acyrthosiphon pisum. Physiol Entomol (1997) vol. 22
(4) pp. 373-379
D. R. G Price, A. J Karley, D. A Ashford, H. V Isaacs, M. E Pownall, H. S Wilkinson, J. A
Gatehouse, A. E Douglas (2007) Molecular characterisation of a candidate gut sucrase
in the pea aphid, Acyrthosiphon pisum. Insect Biochem. Mol. Biol. 37, 307-317
Bloch, G; Toma, DP; Robinson, GE. 2001. Behavioral rhythmicity, age, division of labor
and period expression in the honey bee brain. JOURNAL OF BIOLOGICAL RHYTHMS
Volume: 16 Issue: 5 Pages: 444-456
Brisson, J. A., G. K. Davis, and D. L. Stern. 2007. Common genome-wide transcription
patterns underlying the wing polyphenism and polymorphism in the pea aphid
(Acyrthosiphon pisum). Evol. Dev. 9:338-346.
Burmester, T; Scheller, K. 1999. Ligands and receptors: Common theme in insect
storage protein transport. NATURWISSENSCHAFTEN Volume: 86 Issue: 10 Pages:
468-474.
Chang, C-c, G. W. Lin, C. E. Cook, S. B. Horng, H. J. Lee, and T. Y. Huang. 2007.
Apvasa marks germ-cell migration in the parthenogenetic pea aphid Acyrthosiphon
pisum (Hemiptera: Aphidoidea). Dev Genes Evol. 217:275-287. PMID: 17333259
Chang, C-c, T. Y. Huang, C. E. Cook, G. W. Lin, C. L. Shih, and R. P. Y. Chen. 2008.
Developmental expression of Apnanos during oogenesis and embryogenesis in the
parthenogenetic pea aphid Acyrthosiphon pisum. Int. J. Dev. Biol. (in press) doi:
10.1387/ijdb.082570cc
Ghanim, M., A. Dombrovsky, B. Raccah, and A. Sherman. 2006. A microarray approach
identifies ANT, OS-D and takeout-like genes as differentially regulated in alate and
apterous morphs of the green peach aphid Myzus persicae (Sulzer). Insect Biochemistry
and Molecular Biology 36:857-868.
Hardie, J; Nunes, MV 2001 Aphid photoperiodic clocks. JOURNAL OF INSECT
PHYSIOLOGY Volume: 47 Issue: 8 Pages: 821-832
Ishikawa, A., S. Hongo, and T. Miura. 2008. Morphological and histological examination
of polyphenic wing formation in the pea aphid Acyrthosiphon pisum (Hemiptera,
Hexapoda). Zoomorphology 127:121-133.
Lees AD 1959. The role of photoperiod and temperature in the determination of
parthenogenetic and sexual forms in the aphid Megoura viciae Buckton. I - The influence
of these factors on apterous virginoparae and their progeny. J.Ins.Physiol. 3 92-117.
Miura T, Braendle C, Shingleton A, Sisk G, Kambhampati S, Stern DL 2003. A
comparison of parthenogenetic and sexual embryogenesis of the pea aphid
Acyrthosiphon pisum (Hemiptera : Aphidoidea). JOURNAL OF EXPERIMENTAL
ZOOLOGY PART B-MOLECULAR AND DEVELOPMENTAL EVOLUTION Volume:
295B Issue: 1 Pages: 59-81
Muller, C. B., I. S. Williams, and J. Hardie. 2001. The role of nutrition, crowding, and
interspecific interactions in the development of winged aphids. Ecol. Ent. 26:330-340.
Myers EM, 2003. The circadian control of eclosion. CHRONOBIOLOGY
INTERNATIONAL Volume: 20 Issue: 5 Pages: 775-794
Tagu, D; Sabater-Munoz, B; Simon, JC 2005 Deciphering reproductive polyphenism
in aphids. Invertebr Reprod Dev 48 71-80
Yuan, Quan, Metterville, Danielle, Briscoe, Adriana D, Reppert, Steven M. 2007 Insect
cryptochromes: Gene duplication and loss define diverse ways to construct insect
circadian clocks. MOLECULAR BIOLOGY AND EVOLUTION Volume: 24 Issue: 4
Pages: 948-955
Wilson, A. C. C., Sunnucks, P. & Hales, D. F. (1997) Random loss of X chromosome at
male determination in an aphid, Sitobion near fragariae, detected using an X-linked
polymorphic microsatellite marker. Genetical Research, Cambridge, 69, 233-236.
Zhou XG; Tarver, MR; Scharf, ME 2007. Hexamerin-based regulation of juvenile
hormone-dependent gene expression underlies phenotypic plasticity in a social insect
Development 134 601-610
Altincicek B, Gross J, Vilcinskas A. 2008. Wound-mediated gene expression and
accelerated viviparous reproduction of the pea aphid Acyrthosiphon pisum. Insect
Molecular Biology 17: 711-716.
Anselme C, Villar A, Balmand S, Fauvarque MO, Heddi A. 2006. Host PRGP gene
expression and bacterial release in endosymbiosis of the weevil Sitophilus zeamais.
Applied and Environmental Microbiology 72: 6766-6772.
Chen DQ, Montllor CB, Purcell AH. 2000. Fitness effects of two facultative
endosymbiotic bacteria on the pea aphid, Acyrthosiphon pisum, and the blue alfalfa
aphid, A. kondoi. Entomologia Experimentalis et Applicata 95: 315-23
Dyall SD, Brown MT, Johnson PJ. 2004 Ancient invasions: from endosymbionts to
organelles. Science 304:253-7. PMID: 15073369
Fenn K, Conlon C, Jones M, Quail MA, Holroyd NE, Parkhill J, Blaxter M: Phylogenetic
relationships of the Wolbachia of nematodes and arthropods. PLoS Pathog 2006,
2(10):e94.
Gündüz E. et al. 2009. Symbiotic bacteria enable insect to utilise a nutritionally
inadequate diet. Proceedings of the Royal Society of London B, in press.
Hotopp JC, Clark ME, Oliveira DC, Foster JM, Fischer P, Torres MC, Giebel JD, Kumar
N, Ishmael N, Wang S, Ingram J, Nene RV, Shepard J, Tomkins J, Richards S, Spiro DJ,
Ghedin E, Slatko BE, Tettelin H, Werren JH. 2007. Widespread lateral gene transfer
from intracellular bacteria to multicellular eukaryotes. Science 317:1753-6. PMID:
17761848
Harvey SH, Krien MJ, O'Connell MJ. 2002.
Structural maintenance of chromosomes (SMC) proteins, a family of conserved
ATPases.
Genome Biol. 2002;3(2):REVIEWS3003.1-3003.5 doi:10.1186/gb-2002-3-2-reviews3003
PMID: 11864377
Hwang JY, Smith S, Ceschia A, Torres-Rosell J, Aragon L, Myung K. 2008.
Smc5-Smc6 complex suppresses gross chromosomal rearrangements mediated by
break-induced replications.
DNA Repair (Amst). 7(9):1426-36. PMID: 18585101
Kondo N, Nikoh N, Ijichi N, Shimada M, Fukatsu T. 2002. Genome fragment of
Wolbachia endosymbiont transferred to X chromosome of host insect. Proc Natl Acad
Sci U S A. 99:14280-5. PMID: 12386340
Montllor CB, Maxmen A, Purcell AH. 2002. Facultative bacterial endosymbionts benefit
pea aphids Acyrthosiphon pisum under heat stress. Ecological Entomology 27: 189-95
Moran NA, McLaughlin HJ, Sorek R. 2009. The dynamics and time scale of ongoing
genomic erosion in symbiotic bacteria. Science 323: in press.
Moran NA, Munson MA, Baumann P, Ishikawa H: A Molecular Clock in Endosymbiotic
Bacteria Is Calibrated Using the Insect Hosts. P Roy Soc Lond B Bio 1993,
253(1337):167-171.
Moran NA, Russell JA, Koga R, Fukatsu T. 2005. Evolutionary relationships of three new
species of Enterobacteriaceae living as symbionts of aphids and other insects. Applied
and Environmental Microbiology 71: 3302-10
Nakabachi A. Shigenobu S., Sakazume N., Shirake T., Hayashizaki Y., Carninci P.,
Ishikawa H., Kudo T, Fukatsu T. 2005. Transcriptome analysis of the aphid bacteriocyte,
the symbiotic host cell that harbors an endocellular mutualistic bacterium, Buchnera.
Proc. Natl. Acad. Sci. USA 102: 5477-5482. PMID: 15800043
Nikoh N, Tanaka K, Shibata F, Kondo N, Hizume M, Shimada M, Fukatsu T. 2008.
Wolbachia genome integrated in an insect chromosome: evolution and fate of laterally
transferred endosymbiont genes. Genome Res. 18:272-280. PMID: 18073380
Oliver KM, Russell JA, Moran NA, Hunter MS. 2003. Facultative bacterial symbionts in
aphids confer resistance to parasitic wasps. Proceedings of the National Academy of
Sciences of the United States of America 100: 1803-7
Oliver KM, Moran NA, Hunter MS. 2005. Variation in resistance to parasitism in aphids is
due to symbionts not host genotype. Proceedings of the National Academy of Sciences
of the United States of America 102: 12795-800
Russell JA, Moran NA. 2006. Costs and benefits of symbiont infection in aphids:
variation among symbionts and across temperatures. Proceedings of the Royal Society
B-Biological Sciences 273: 603-10
Sasaki T et al. 1990. J. Insect Physiol. 36: 35-40.
Sandstrom JP, Russell JA, White JP, Moran NA. 2001. Independent origins and
horizontal transfer of bacterial symbionts of aphids. Molecular Ecology 10: 217-28
Scarborough CL, Ferrari J, Godfray HCJ. 2005. Aphid protected from pathogen by
endosymbiont. Science 310: 1781.
Shakesby AJ, Wallace IS, Isaacs HV, Pritchard J, Roberts DM and Douglas AE 2008. A
water-specific aquaporin involved in aphid osmoregulation. Insect Biochemistry and
Molecular Biology, in press.
Shigenobu S., Watanabe H., Hattori M., Sakaki Y., Ishikawa H. 2000. Genome sequence
of the endocellular bacterial symbiont of aphids Buchnera sp. APS. Nature 407: 81–86.
PMID:10993077
Tsuchida T, Koga R, Fukatsu T. 2004. Host plant specialization governed by facultative
symbiont. Science 303: 1989.
Arnold, A. P., Itoh, Y. & Melamed, E. (2008) A Bird's-Eye View of Sex Chromosome
Dosage Compensation. Annual Review of Genomics and Human Genetics, 9, 109-127.
PMID: 18489256
Braendle, C., G. K. Davis, J. A. Brisson, and D. L. Stern. 2006. Wing dimorphism in
aphids. Heredity 97:192-199. PMID: 16823401
Corbitt, T. S. & Hardie, J. (1985) Juvenile hormone effects on polymorphism in the pea
aphid, Acyrthosiphon pisum. Entomologia Experimentalis et Applicata, 38, 131-135.
Cyran SA, Buchsbaum AM, Reddy KL, Lin MC, Glossop NRJ, Hardin PE, Young MW,
Storti RV, Blau J 2003. vrille, Pdp1, and dClock form a second feedback loop in the
Drosophila circadian clock. Cell. 112: 329-341. PMID: 12581523
Hardie J 1987. Neurosecretory and endocrine systems. In “Aphids, their biology, natural
enemies and control, Volume A”, Eds Minks AK, Harrewijn P. Elsevier, Amsterdam,
Oxford, New York, Tokyo pp 139-152.
Ijichi N, Shibao H, Miura T, Matsumoto T, Fukatsu T 2005. Analysis of natural colonies of
a social aphid Colophina arma: population dynamics, reproductive schedule, and survey
for ecological correlates with soldier production. Applied Entomology and Zoology. 40:
239-245.
Lees AD 1959. The role of photoperiod and temperature in the determination of
parthenogenetic and sexual forms in the aphid Megoura viciae Buckton. I - The influence
of these factors on apterous virginoparae and their progeny. J. Insect Physiol. 3: 92-117.
Le Trionnaire, G., Hardie, J., Jaubert-Possamai, S., Simon, J. C. & Tagu, D. (2008)
Shifting from clonal to sexual reproduction in aphids: physiological and developmental
aspects. Biology of the Cell, 100, 441-451. PMID: 18627352
Miura T, Braendle C, Shingleton A, Sisk G, Kambhampati S, Stern DL 2003. A
comparison of parthenogenetic and sexual embryogenesis of the pea aphid
Acyrthosiphon pisum (Hemiptera: Aphidoidea). Journal of Experimental
Zoology Part B: Molecular and Developmental Evolution 295B: 59-81
Nijhout, H. F. 2003. Development and evolution of adaptive polyphenisms. Evol Dev 5:9-18. PMID: 12492404
Schwartzberg, Ezra G, Kunert, Grit, Westerlund, Stephanie, Hoffmann, Klaus H.,
Weisser, Wolfgang W. 2008. Juvenile hormone titres and winged offspring production do
not correlate in the pea aphid, Acyrthosiphon pisum. J Insect Physiol 34(9): 1146-1148
PMID: 18634797.
The Honeybee Genome Sequencing Consortium (2006) Insights into social insects from
the genome of the honeybee Apis mellifera. Nature, 443, 931-949. PMID: 17073008
Yuan, Quan, Metterville, Danielle, Briscoe, Adriana D, Reppert, Steven M. 2007. Insect
cryptochromes: Gene duplication and loss define diverse ways to construct insect
circadian clocks. Molecular Biology and Evolution 24(4): 948-955. PMID:
17244599.
Wilson, A. C. C., Sunnucks, P. & Hales, D. F. (1997) Random loss of X chromosome at
male determination in an aphid, Sitobion near fragariae, detected using an X-linked
polymorphic microsatellite marker. Genetical Research, Cambridge, 69, 233-236.
Zhou XG, Tarver MR, Scharf ME 2007. Hexamerin-based regulation of juvenile
hormone-dependent gene expression underlies phenotypic plasticity in a social insect.
Development 134: 601-610. PMID: 17215309
Edgar RC. (2004) MUSCLE: a multiple sequence alignment method with reduced time
and space complexity. BMC Bioinformatics, 5:113.
Gabaldón T. (2008) Large-scale assignment of orthology: back to phylogenetics?
Genome Biol. 9(10):235.
Gascuel O. (1997) BIONJ: an improved version of the NJ algorithm based on a simple
model of sequence data. Mol Biol Evol, 14:685-695.
Guindon S, Gascuel O. (2003) A simple, fast, and accurate algorithm to estimate large
phylogenies by maximum likelihood. Syst Biol, 52:696-704.
Huerta-Cepas J, Bueno A, Dopazo J, Gabaldón T (2008). PhylomeDB: a database for
genome-wide collections of gene phylogenies. Nucleic Acids Res. 36:D491-496.
Huerta-Cepas J, Dopazo H, Dopazo J, Gabaldón T. (2007) The human phylome.
Genome Biol. 8:R109.
Smith TF, Waterman MS. (1981) Identification of common molecular subsequences. J
Mol Biol . 147:195-197.
[1] Yu.Kapustin, A.Souvorov, T.Tatusova. Splign - a Hybrid Approach To Spliced
Alignments. RECOMB 2004 - Currents in Computational Molecular Biology. p.741.
[2]. B. Kiryutin, A. Souvorov. New global protein-nucleotide alignment tool. ISMB 2005.
[3]. A. Souvorov, T. Tatusova, D. Lipman. Eukaryotic Genome Annotation with Gnomon - a Multi-step Combined Gene Prediction Tool. ISMB 2004, p125.
[4]. Haas BJ, Delcher AL, Mount SM, Wortman JR et al. Improving the Arabidopsis
genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res.
2003 Oct 1;31(19):5654-66. (PMID: 14500829)
Tables
Table Suggestion 1. Sanger Read Statistics.
Insert Size (kb)   Raw reads   Passed reads   Assembled reads   Clone
2-5                4,325,313   3,955,990      3,044,414         plasmid
35                 24,673      8,158          5,294             fosmid
110-130            56,246      45,140         2,286             BAC
TOTAL              4,406,232   4,009,288      3,051,994
Table Suggestion 2. Repeats
Repeat type   Number of families   Number of copies   Coverage (bp)   Coverage (% of genome)
LTR           4
LARD          13
LINE          16
SINE          7
TIR           37
Helitron      2
Polinton      3
MITE          3
Total         85
Table Suggestion 3. Comparison of main features of a primary and secondary symbiont
(could be coupled with image of both inside cells)
Feature                     Buchnera of A. pisum   Regiella from A. pisum
Required by aphid host?     Yes                    No
Maternally inherited?       Yes                    Yes
Can invade new hosts?       No                     Yes
Location in host            Bacteriocytes only     Bacteriocytes, hemolymph
Bacterial Division          Gammaproteobacteria    Gammaproteobacteria
Genome Size                 0.64 Mb                ~2-2.5 Mb
%G+C                        26%                    45%
# coding genes              564                    In progress
% coding                    86%                    In progress
# rRNA operons              1 (in 2 parts)         4 (intact)
Mobile elements             Absent                 Abundant
Table Suggestion 4. Comparison of A. pisum core clock gene sequences with
homologous sequences from other insects with sequenced genome available.
                                                           % identical positions (a)
L1 / L2   Gene             Comp. (b)   Rate (c)   Length (d)   D. melanogaster   B. mori         A. mellifera   Id.
          Period           0,009**                644          20,8 / 34,7       22,6 / 34,6     27,3 / 41,0    27
          Timeless         0,006**                418          29,0 / 59,7       29,0& / 55,8&   NA             30
          Cycle            0,519                  308          67,8$ / 81,1$     57,1 / 68,3     68,6 / 79,8    68
          Clock            0,983                  319          44,2 / 53,3       44,0& / 50,0&   49,3 / 56,8    45
          Vrille           1,000       =          130          48,0 / 42,3       50,7 / 52,7     47,7 / 52,7    49
          PDP1             1,000       =          143          80,4 / 84,6       84,6 / 88,1     86,0 / 88,8    81
          Cryptochrome1    0,200       =          466          52,4 / 59,2#      54,1 / 60,5#    NA
          Cryptochrome2a   0,830                  491          71,1$ / 83,9$     73,3 / 79,8     73,7 / 85,3    72
          Cryptochrome2b   0,664                  491          71,1$ / 83,9$     72,7 / 79,8     73,5 / 85,3    71
a) % of identical positions between each given species and A. pisum / Pediculus humanus.
Since some sequences were either not available or badly predicted, they were conveniently
replaced in the comparisons by sequences from other species: $, Anopheles gambiae replaces
D. melanogaster, &, Antheraea pernyi replaces Bombyx mori; #, Dianemobius nigrofasciatus
replaces P. humanus. Average % identity over all comparisons is shown in the last column.
b) p-values obtained for A. pisum sequences in the chi-square tests performed using Tree-Puzzle (Schmidt et al., 2002) to test for homogeneity of amino acid composition in insect
sequences. **, highly significant (<0,01) deviations in amino acid composition of A. pisum
sequences.
c) Program RRTree (Robinson-Rechavi and Huchon, 2000) was used to test for homogeneity in
rates of amino acid sequence evolution among insect sequences.
, A. pisum sequences
showing highly accelerated rates in all comparisons; , A. pisum sequences showing
accelerated rates in most comparisons; = , A. pisum sequences not showing accelerated rates
in any comparison.
d) Length of aligned sequences (for each gene only blocks whose alignment was unequivocal
were used).
L1, L2 correspond to genes from conserved loops 1 and 2 respectively of the D. melanogaster
clock
NA, not applicable.
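The "% identical positions" values in note a) can be illustrated with a short calculation over a pairwise alignment. This is a minimal sketch only: the function name, the gap character, and the choice to skip gapped columns are assumptions made for illustration, and the toy sequences in the usage note are not clock gene data.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over the ungapped aligned columns.

    Both inputs are rows of the same alignment, so they must have equal length.
    Columns where either sequence has a gap are skipped (an assumption of this
    sketch; other conventions divide by the full alignment length).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip columns that are not aligned residue-to-residue
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped aligned positions to compare")
    return 100.0 * identical / compared
```

For example, percent_identity("MKT-LV", "MRTALV") compares five ungapped columns, four of which are identical.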
Table… List of pea aphid genes related to juvenile hormone and insulin pathways.
Putative orthologs for each pea aphid gene prediction are indicated.
M indicates CpG methylation detected, X indicates CpG methylation not found.
Pea aphid gene
gene name
abbrev.
prediction
Drosophila ortholog
Tribolium ortholog Apis
JH-related genes
Juvenile Hormone
Acid
Methyltransferase
Cytosolic Juvenile
Hormone Binding
Protein
Juvenile Hormone
Epoxide Hydrolase
Juvenile Hormone
Esterase*
Juvenile Hormone
Esterase Binding
Protein
ACYPI255574 X,
ACYPI568283 X
FBgn0028841
JHEH
ACYPI154871 M
ACYPI275360 X,
ACYPI189600 X,
ACYPI307696 M
FBgn0010053,
FBgn0034405,
FBgn0034406
JHE
ACYPI381461
JHEBP
ACYPI563350 M
FBgn0035088
XM_964394
JHAMT1
JHBP
NM_001127311
XM_
XM_964351
XM_
XM_970006
XM_
XM_
Hexamerin
Methoprene-tolerant
allatostatin
hex
none
Met
Ast
hmm126914
hmm252834
FBgn0002723
FBgn0015591
ACYPI008623
ACYPI003035
ACYPI003572
FBgn0028961
allatostatin receptor
FKBP39
Chd64
broad
Br
Retinoid X receptor RXR
(ultraspiracle)
(usp)
insulin-related
genes
Insulin receptor
(InR)
Insulin receptor
tyrosine kinase
substrate
Pkb/Akt (rac
serine/threonine
kinase)
Forkhead box
subgroup O
Pten
XM_961866,
XM_962135
NM_001099342
XM_001809286
XM_
FBgn0035499
XM_
NM_
XM_
ACYPI008576
FBgn0000210
XM_001810758,
XM_001810798
ACYPI005934
FBgn0003964
NM_001114294
NM_
XM_967677
XM_
XM_
ACYPI009339,
ACYPI010079
InR
NM_
NM_
NM_
FBgn0013984
ACYPI008202
Pkb/Akt
ACYPI002231
FBgn0010379
FOXO
Pten
ACYPI008827
ACYPI004294
FBgn0038197
FBgn0026379
Target of rapamycin Tor
ACYPI004568
FBgn0021796
* The predicted juvenile hormone esterase is identified by the characteristic GQSAG motif
and does not show significant homology to other known JHEs
Table Suggestion 6. Summary of gene models produced by different gene annotation
pipelines. NCBI models are subdivided into protein coding models completely or partially
based on EST or protein alignments, pseudogene models containing debilitating
frameshift or nonsense codons, and ab initio models.
Source                    # of models
NCBI
  complete support        3403 (3623 transcripts)
  partial support         6842
  pseudogenes             841
  ab initio               26,689
AUGUSTUS
Fgenesh
GENEid
GenScan
MAKER
SNAP
GLEAN (- RefSeq)          24,355
preliminary OGS           34,600 (34,820 transcripts)
Figures
Figure Suggestion 1.
Species phylogeny based on a Maximum Likelihood analysis of a concatenated
alignment of 197 widespread, single-copy proteins. The tree has been rooted using
chordates as the most external out-group. Bars and lines on the right summarize the
results of comparative genomics analyses: A.- Comparison of the gene content of all
species included in the analysis. Bars represent the gene content of each species (scale on
the top). These have been subdivided to indicate different types of homology
relationships: Black: widespread genes that are found with one-to-one orthology in at
least 16 of the 17 species; Blue: widespread genes that can be found in at least 16 of the
17 species and are sometimes present in more than one copy; Red: insect-specific
widespread genes present in at least 12 of the 13 insect species and absent from non-insect
species; Yellow: insect-specific non-widespread genes (present in fewer than 12
insect species); Green: genes present in insects and other groups but with a patchy
distribution; White: species-specific genes with no (detectable) homologs in other species
(the striped fraction corresponds to species-specific genes present in more than one copy).
The thin red line under each bar represents the percentage of A. pisum genes that have
homologs in a given species. B.- This graphic represents the number of single-copy genes and
duplicated genes for each species. Singletons are represented by the purple fraction
while multi-copy genes are marked by the pink part of the bar.
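The colour categories of panel A amount to a small decision rule over presence/absence and copy number. A minimal sketch, assuming each gene family is summarized as a hypothetical mapping from species name to copy number; the function name, the input layout, and the order in which the rules are tested are illustrative assumptions and not the consortium's pipeline:

```python
def classify_family(copies, insects, all_species,
                    min_widespread=16, min_insect_widespread=12):
    """Assign a panel A colour category to one gene family.

    copies: dict mapping species name -> gene copy number (hypothetical layout).
    insects: set of insect species names.
    all_species: set of all species in the comparison (17 in the caption).
    The thresholds default to the values quoted in the caption.
    """
    present = {s for s in all_species if copies.get(s, 0) > 0}
    if not present:
        raise ValueError("family has no members in the compared species")
    if len(present) == 1:
        return "white"   # species-specific, no detectable homologs elsewhere
    if len(present) >= min_widespread:
        # widespread: one-to-one everywhere it occurs, or with extra copies
        single_copy = all(copies[s] == 1 for s in present)
        return "black" if single_copy else "blue"
    if present <= insects:
        # found only in insects: widespread or not among the insect species
        if len(present) >= min_insect_widespread:
            return "red"
        return "yellow"
    # remaining families have a patchy distribution across groups
    # (assumption: this also catches families restricted to non-insects)
    return "green"
```

The thresholds are keyword arguments so the same rule can be reused if the species sample changes.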
Figure suggestion 2 (NOTE: this figure was proposed to be merged with the figure
above. But I think this will make it too confusing. I suggest this to be figure 2)
Lineage-specific gene expansions in the pea aphid. A) Size distribution of the major
lineage-specific groups of in-paralogs (paralogs arising from duplications that occurred after
the speciation of the lineages leading to the pea aphid and Pediculus humanus). The Y axis
(note the logarithmic scale) represents the number of gene families with lineage-specific
expansions of a given size (X axis), as inferred from the analysis of the pea aphid
phylome. B) Maximum Likelihood phylogenetic tree showing a lineage-specific gene
expansion in a family coding for a putative Acetyl-CoA transporter. This expansion has
resulted in 19 intra-specific paralogs in the pea aphid, whereas other insects and out-groups
included in the analysis present only one orthologous sequence. The tree was
downloaded from phylomeDB.org (Huerta-Cepas et al. 2008) and reformatted; the
complete tree can be accessed with the code ACYPI004176-PA. The tree was
reconstructed following the phylome tree reconstruction pipeline (see Methods).
Figure Suggestion 2. Distribution of the mean identity between the copies and the
consensus for different super-families of TEs (NOTE: these data are not for pea aphids)
Figure Suggestion 3. Preliminary phylogeny based on Ty3/Gypsy and Retroviridae
integrases. KRB2, which is present in the pea aphid genome, and its relationship with
Ty3/Gypsy integrases are colored in red.
Figure Suggestion 4. Distribution of synonymous distances (dS) among pairs of
paralogs. Left, only pairs matching a reciprocal best hit criterion (RBH), Right, all pairs of
paralogs.
Figure Suggestion 5. Transcription levels of laterally transferred genes in the
bacteriocyte. Ivory columns and blue columns indicate abundance of transcripts in the
whole body and in the bacteriocyte, respectively; bars, standard errors (n = 6). The
expression levels are shown in terms of mRNA copies of target genes per copy of
mRNA for RpL7. Asterisks indicate statistically significant differences (Mann-Whitney U
test; **, p < 0.01). Transcripts for ldcA, ybjR, and rlpA are 11.6-, 8.5-, and 154-fold more
abundant in the bacteriocyte than in the whole body, respectively. It is also notable that
the copy numbers of their transcripts in the bacteriocyte were comparable to that of the
control transcript encoding ribosomal protein L7 (RpL7), indicating that their expression
levels are relatively high.
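The normalization used in this caption (target mRNA copies per copy of RpL7 mRNA, then bacteriocyte versus whole body) can be sketched as follows. The function names are hypothetical, any input numbers are placeholders chosen only to show the arithmetic rather than measurements from this study, and the Mann-Whitney U test is not reproduced here.

```python
from statistics import mean


def relative_expression(target_copies, reference_copies):
    """Per-sample target mRNA copies divided by reference (e.g. RpL7) copies."""
    if len(target_copies) != len(reference_copies):
        raise ValueError("need one reference measurement per target measurement")
    return [t / r for t, r in zip(target_copies, reference_copies)]


def fold_enrichment(tissue_relative, whole_body_relative):
    """Ratio of mean relative expression in a tissue to that in the whole body."""
    return mean(tissue_relative) / mean(whole_body_relative)
```

With placeholder inputs, relative_expression([8, 10, 12], [10, 10, 10]) averages to about 1 copy per RpL7 copy; against a whole-body average of 0.1, fold_enrichment gives roughly a tenfold enrichment.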
Figure Suggestion 6. Transmission electron micrograph showing Buchnera and
Regiella in adjacent host bacteriocytes.
Figure Suggestion 7. There is interest in having a figure that highlights the interaction
of aphids and Buchnera. The obligate symbiosis is central to the aphid story, and even a
generalized descriptive diagram would help to highlight this. However, no one has come
up with a concrete idea yet.
Figure Suggestion 8. It would be possible to have a figure highlighting the missing
genes of the IMD immune pathway, which is intact in all other insects sequenced to date
and is absent in pea aphids, and/or a table of major immune gene classes and their
numbers across the sequenced insects, highlighting the missing genes of aphids.
Figure Suggestion 9. Atsushi and Angela have data on the expression of several
metabolism-related genes across different aphid tissues. The symbiont-section group is
having an ongoing discussion about these data, but it may be an interesting point to
highlight (once we figure out what it means).
Figure Suggestion 10. A cell cycle figure with all the genes marked as present and
absent and indicating genes that have duplications - we would need someone in the
mitosis/meiosis team to generate this (ask Dayalan).
Figure Suggestion 11. Gene duplications of mitotic kinases being involved in
polyphenism regulation. This story appears super interesting. Can it generate a figure??
Possibly a phylogeny with some annotation about polyphenism? (ask Dayalan).
Figure Suggestion 12. A figure about the circadian rhythm pathways (Clock and Period)
showing the Drosophila pathways, with the genes present in the pea aphid shown clearly
and the missing genes appearing in "ghost-like" writing (ask David Martinez).
Figure Suggestion 13. (A) NJ tree of A. pisum facilitative sugar transporters (ApST1-ApSTxx). Numbers on the branches represent level of confidence as determined by
bootstrap analysis (1000 replicates). The scale bar indicates an evolutionary distance of
0.05 amino acid substitutions per position in the sequence. The number of ESTs supporting
each gene sequence is indicated. (B) Genomic structure of duplicated A. pisum sugar
transporters. Regions with sequence identity >65% are shaded. Intron regions are not
drawn to scale. (C) Genomic clustering of a group of duplicated A. pisum sugar
transporters; duplicated genes are boxed.
Authors and Affiliations
Being assembled in a separate document.