SUPPLEMENTAL DATA
Supplemental Text 1
Molecular methods
RNA extraction: Around 200 male Drosophila mayaguana flies (stock 15081-1397.03
from the Tucson Stock Center) were dissected for their accessory glands. Dissections
were done in PBS with RNAlater (Ambion). Accessory glands were pooled and
kept in RNAlater at –20°C until RNA extraction. Total RNA was extracted with the
RNeasy kit (QIAGEN), and mRNA was isolated from total RNA using the
Oligotex mRNA kit (QIAGEN).
cDNA synthesis and library construction: cDNA was synthesized with the Super
SMART cDNA Synthesis Kit (Clontech), which uses PCR to amplify the cDNA and so
compensates for the small amount of starting mRNA. Female cDNA generated by the same
process was used to subtract housekeeping genes and other non-male-specific transcripts
with the PCR-Select cDNA Subtraction Kit (Clontech). A potential drawback of using the
Super SMART cDNA Synthesis Kit is that the protocol involves a restriction digestion, which
may cut some of the cDNA and generate many short fragments. To reduce the chance of
cloning these short fragments, the final PCR product of the subtraction process was
size-fractionated to retain fragments larger than ~700 bp. Size fractionation was carried
out in a 2% low-melting-point agarose gel stained with SYBR. PCR fragments were
retrieved from the gel using the Wizard gel purification kit (PROMEGA). After a brief
incubation with Taq polymerase to add 3’ A overhangs, the retrieved fragments were
cloned into a TA plasmid vector (TOPO TA Cloning Kit, Invitrogen). Chemically competent
cells (Invitrogen) were then transformed to produce a subtracted accessory gland library.
Dot blot: To further screen for accessory gland-specific genes, a dot-blot
procedure was carried out. About 600 white colonies were picked and used for PCR
amplification of inserts with standard M13 primers. PCR products were collected in
96-well plates and cleaned using a BIOMEK robot. 5 µl of the cleaned PCR product was
mixed with 5 µl of 0.6 N NaOH, and 1 µl of this mixture was blotted onto positively
charged nylon membranes. DNA was immobilized on the membranes by UV
crosslinking. Membranes were hybridized with DIG-labeled female cDNA as the probe.
Hybridization and visualization were done with the DIG starter kit (Roche) according to
the manufacturer’s protocol.
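As a worked check of the denaturation step, assuming the two volumes are simply additive: mixing equal volumes (5 each, in the protocol's volume unit) of cleaned PCR product and 0.6 N NaOH halves the NaOH concentration, and half of the volume of each blotted spot is PCR product.

```latex
[\mathrm{NaOH}]_{\mathrm{final}} = 0.6\ \mathrm{N} \times \frac{5}{5+5} = 0.3\ \mathrm{N},
\qquad
\frac{V_{\mathrm{PCR}}}{V_{\mathrm{spot}}} = \frac{5}{5+5} = 0.5
```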
Sequencing and editing: Inserts were sequenced with the M13 forward primer and a
poly(T) primer. Sequences were obtained on an ABI 3730xl automated sequencer and
edited with Sequencher v4.2 (Gene Codes). Vector and adaptor sequences were trimmed,
and similar sequences were assembled into contigs.
Supplemental Text 2
Acp classification
Dot-blot data were available for 75 of the unique sequences. Among those, 48 had a
clear negative result (no hybridization with the female probe; Table 1). The presence of a
signal peptide can be evaluated using three different statistics (S-mean, D-score, and p), as
implemented in the program SignalP. Because these statistics are sometimes discordant,
we report all three for ease of comparison with other studies (Table 1). According to any
of the three statistics, 64 transcripts had a greater than 50% chance of containing a signal
peptide. Many of the sequences in which a signal peptide was not detected seem to lack
their 3’ end; therefore, the presence of a signal peptide in these genes cannot be ruled out
(Supplemental data - Table 1, n*).
Among the 91 unique transcripts uncovered in the D. mayaguana library, 18
matched at least three of the Acp criteria (Table 1, category 1). Twelve further transcripts
were also classified as Acps: they had negative dot-blot results and a high probability of
having a signal peptide, supported by all three SignalP statistics, but had no Acp orthologs
in other species and no known Acp-related conserved domain (Table 1, category 2).
For 15 transcripts, the dot-blot results were inconclusive, but homology to an Acp or a
common Acp functional domain, together with strong evidence of a signal peptide, was
considered a good indicator of Acp status (Table 1, category 3). Finally, nine other
transcripts were classified as Acps despite little support for the presence of a signal
peptide if (a) there was evidence of another marker of secretion, as detected by the
program SecretomeP, and/or (b) there was no evidence that the 3’ end of the mRNA
sequence was complete (Table 1, category 4). These sequences also met either criterion 1
(homology) or criterion 4 (function) and had a negative dot blot. These classification steps
resulted in a total of 54 likely Acp candidates (18 + 12 + 15 + 9).
Among the transcripts that were not classified as Acps, 18 could possibly be Acps
(Supplemental data – Table 1, category 5), since there is no strong evidence against their
inclusion (dot-blot results were negative, they had no domains pointing to a non-Acp
function, and the presence of a signal peptide could not be ruled out). The remaining 19
transcripts either showed clear positive hybridization in the dot-blot tests or had a
non-Acp-associated function.
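The classification rules above can be summarized as a minimal Python sketch. It is a simplified reading of the text, not the authors' pipeline: the `Transcript` record and all of its field names are hypothetical, the Acp criteria themselves (defined in the main text) are reduced to a pre-computed count, and "strong" versus "little" signal-peptide support is collapsed into a single flag rather than the three SignalP statistics.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class Transcript:
    """Hypothetical evidence record for one transcript (field names are illustrative)."""
    name: str
    n_criteria: int             # number of Acp criteria met (criteria defined in the main text)
    dot_blot: str               # "negative", "positive", "inconclusive", or "none" (no data)
    sp_strong: bool             # strong SignalP support for a signal peptide
    sp_ruled_out: bool          # True only if a signal peptide can be confidently excluded
    homology: bool              # criterion 1: homologous to a known Acp
    acp_function: bool          # criterion 4: Acp-related function or conserved domain
    secretomep: bool            # SecretomeP predicts secretion
    three_prime_complete: bool  # evidence that the 3' end of the mRNA sequence is complete
    non_acp_function: bool      # domain pointing to a non-Acp function


def classify(t: Transcript) -> Optional[int]:
    """Return category 1-5 (1-4: likely Acp, 5: possible Acp) or None (not an Acp)."""
    acp_evidence = t.homology or t.acp_function
    # Category 1: at least three Acp criteria met.
    if t.n_criteria >= 3:
        return 1
    # Category 2: negative dot blot, strong signal peptide, no Acp homology or domain.
    if t.dot_blot == "negative" and t.sp_strong and not acp_evidence:
        return 2
    # Category 3: inconclusive dot blot, strong signal peptide, Acp homology or domain.
    if t.dot_blot == "inconclusive" and t.sp_strong and acp_evidence:
        return 3
    # Category 4: weak signal peptide, but SecretomeP support or an incomplete 3' end,
    # plus homology or function and a negative dot blot.
    if (t.dot_blot == "negative" and not t.sp_strong and acp_evidence
            and (t.secretomep or not t.three_prime_complete)):
        return 4
    # Category 5: possible Acp, with no strong evidence against inclusion.
    if t.dot_blot == "negative" and not t.non_acp_function and not t.sp_ruled_out:
        return 5
    return None


def summarize(transcripts: Iterable[Transcript]) -> Tuple[Counter, int]:
    """Count transcripts per category and the number of likely Acps (categories 1-4)."""
    counts = Counter(classify(t) for t in transcripts)
    return counts, sum(counts[c] for c in (1, 2, 3, 4))


# Consistency check on the counts reported in the text.
REPORTED = {1: 18, 2: 12, 3: 15, 4: 9, 5: 18, None: 19}
assert sum(REPORTED[c] for c in (1, 2, 3, 4)) == 54
assert sum(REPORTED.values()) == 91
```

Because the rules are tested in the order given in the text, each transcript falls into the first category it satisfies, and the reported counts partition all 91 unique transcripts.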
Supplemental Text 3
Gene families with more than three identified genes in D. mayaguana
Cluster 7: Members of Cluster 7 had D. mojavensis Acp16b and Acp24 as their best hits
in the D. mojavensis reproductive tract library database. WAGSTAFF and BEGUN (2005b)
identified two paralogs of Acp16b in D. mojavensis, named Acp16a and Acp16c
(although they did not mention the similarity between Acp16 and Acp24). To better
understand the orthology of genes in the two species, we carried out a
phylogenetic analysis including the two D. mayaguana transcripts and D. mojavensis
Acp16a, Acp16b, Acp16c, and Acp24. Although the maximum likelihood and maximum
parsimony trees did not agree on all gene relationships, both analyses agreed that the
D. mayaguana transcripts form a clade, while Acp16a, Acp16b, and Acp16c (of D.
mojavensis) cluster together in another clade (Supplemental data - Figure 2). In the
D. mojavensis genome, the best BLASTn hit of the D. mayaguana transcripts was the same
locus as Acp24, whereas with tBLASTx they were most similar to the Acp16b locus. All
these loci, including the similar D. mayaguana transcripts, had several hits matching the
beginning of their sequences (30–80 bp) within a genomic region spanning ~25 kb. This is
a transposon-rich genomic region, but none of the transcripts matched a transposon repeat
sequence itself. Also, none of these genes has a conserved protein domain.
Cluster 11: In the BLAST searches, the genes in Cluster 11 all hit the same D. melanogaster
Acp, Acp53C14c. Acp53C14c is the most divergent of four paralogous genes present in
D. melanogaster (the others are Acp53Ea, Acp53C14a, and Acp53C14b; HOLLOWAY and
BEGUN 2004). The genomic region is conserved across Drosophila species and may
contain different numbers of tandem copies of these genes. For instance, D.
pseudoobscura has seven similar ORFs in this microsyntenic region, but it is not known
whether all these copies are expressed, or where (WAGSTAFF and BEGUN 2005a).
The four D. mayaguana sequences in Cluster 11 matched three D. mojavensis
Acps: Acp1, Acp2, and Acp25. These three loci are adjacent in the D.
mojavensis genome; the region encompassing them spans approximately 5.5
kb and contains seven ORFs identified by different methods, including the Oxford pipeline,
NCBI Gnomon, GeneWise (BIRNEY et al. 2004), and Contrast (GROSS et al. 2005)
(Supplemental data – Figure 3a). A phylogenetic analysis including all the D. mayaguana
paralogs and orthologous sequences in D. mojavensis, D. arizonae, and D. mulleri
(Supplemental data – Figure 3b) showed that relationships within each ortholog clade are
compatible with the most widely accepted phylogenetic hypothesis for the mulleri cluster
(DURANDO et al. 2000). The trees suggest that mayAcp2a is the product of an early
duplication followed by loss in D. mojavensis or in the lineage leading to this species.
Cluster 12: With a total of nine distinct sequences, Cluster 12 is composed of transcripts
that had Acp5a or Acp23 as their best hits in the D. mojavensis Acp database. This cluster
consists mostly of short transcripts (163–356 bp), although one of them is 925 bp long.
Most of these transcripts were found in two or more clones, and seven of these sequences
had a signal peptide as detected by SignalP 3.0, indicating that they likely encode
secreted proteins. These transcripts did not have significant hits in other Drosophila
species. An alignment of the 10 D. mayaguana sequences and the three D. mojavensis
homologs showed that, despite many long and short indels and numerous substitutions,
one long (100 bp) block and a few short blocks are conserved. However, no conserved
protein domain was detected in the CD-Search. These genes were screened for repetitive
elements with RepeatMasker 3.0 (SMIT and GREEN 1996–2004), but no significant
association was found. Nevertheless, they match transposon sequences located in a
transposon-rich region of the D. mojavensis genome. A phylogenetic analysis of the
sequences suggests species-specific duplications (Supplemental data – Figure 4). A third
copy of Acp5 was also found in the D. mojavensis genome (WAGSTAFF and BEGUN
2007).
Cluster 13: The largest cluster (Cluster 13 in Table 4) included 10 transcripts. These
sequences were not similar to any D. mojavensis Acps or to other ESTs found in the male
reproductive tract of that species. They are similar (1e-11 > E > 1e-33) to five adjacent
loci in the D. mojavensis genome, although only three of these loci were the best hit for
any of the D. mayaguana transcripts in this cluster. All D. mayaguana transcripts in this
cluster were relatively short (220–450 bp), with a signal peptide and a conserved
Kazal-type serine protease inhibitor domain (cd00104). These genes do not seem to have
orthologs in any other Drosophila species (Supplemental data – Table 2). The dot-blot
results were variable: some transcripts appeared to be specific to accessory glands,
while others showed strong hybridization to the female probe. When the sequences were
searched against each other, every sequence hit all the other family members, but
E-values varied considerably (1e-11 > E > 1e-169), indicating distinct degrees of
similarity among them. A phylogenetic analysis of the family showed that its members fall
into three main clades, each of which has members that met the Acp criteria
(Supplemental data – Figure 5).
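The all-against-all comparison can be summarized with a short Python sketch, assuming the searches were run with BLAST+ and saved in its default tabular format (`-outfmt 6`, 12 tab-separated columns with the E-value in column 11). The function name and the choice to keep the lowest E-value per ordered query/subject pair are hypothetical, not taken from this study.

```python
from itertools import permutations
from typing import Dict, Iterable, List, Optional, Set, Tuple


def all_vs_all_summary(
    lines: Iterable[str], members: Set[str]
) -> Tuple[List[Tuple[str, str]], Optional[Tuple[float, float]]]:
    """Report ordered member pairs without a hit and the (min, max) best E-value."""
    best: Dict[Tuple[str, str], float] = {}  # (query, subject) -> lowest E-value seen
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank, comment, or malformed lines
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if query == subject or query not in members or subject not in members:
            continue  # ignore self-hits and sequences outside the family
        key = (query, subject)
        best[key] = min(evalue, best.get(key, float("inf")))
    # Every ordered pair of distinct members should appear if all members hit each other.
    missing = [pair for pair in permutations(sorted(members), 2) if pair not in best]
    evalues = list(best.values())
    e_range = (min(evalues), max(evalues)) if evalues else None
    return missing, e_range
```

An empty `missing` list corresponds to every sequence hitting all other family members, and `e_range` gives the span of best pairwise E-values.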
Supplemental Figure 1 (species and number of D. mayaguana transcripts with significant hits):
D. simulans - 31
D. sechellia - 30
D. melanogaster - 30
D. yakuba - 33
D. erecta - 31
D. ananassae - 31
D. pseudoobscura - 36
D. persimilis - 36
D. willistoni - 36
D. mojavensis - 87
D. virilis - 39
D. grimshawi - 34
Supplemental Figure 1. Number of D. mayaguana accessory gland transcripts with
significant BLAST hits (E < 1e-05) in each Drosophila species with a complete genome
sequence. Phylogeny taken from FlyBase.
Supplemental Figure 2. Maximum likelihood tree illustrating relationships among
members of Cluster 7. Gene name prefixes refer to the species: D. mojavensis (moj),
D. arizonae (ari), and D. mayaguana (may).
Supplemental Figure 3a. Maximum likelihood tree depicting relationships among
members of Cluster 11 identified in D. mojavensis and D. mayaguana through cDNA
libraries (sequences with prefixes mojAcp and mayAcp, respectively) and D. mojavensis
ORFs predicted in the corresponding genomic region. Gene predictions were obtained
with the Oxford pipeline (TRdmoj prefix), GeneWise (GI_EISE; BIRNEY et al. 2004), Gnomon
(GI_NCBI; http://www.ncbi.nlm.nih.gov/genome/guide/gnomon.html), and Contrast
(GI_BATZ; GROSS et al. 2005).
Supplemental Figure 3b. Maximum likelihood tree depicting relationships among
members of Cluster 11. Note the long branches between paralogs compared with the
shorter branches between orthologs. Gene name prefixes refer to the species:
D. mojavensis (moj), D. arizonae (ari), and D. mayaguana (may).
Supplemental Figure 4. Maximum likelihood tree depicting relationships among
members of Cluster 12.
Supplemental Figure 5. Maximum likelihood tree depicting relationships among
members of Cluster 13.