Mutator is a family of DNA transposons found in the genomes of

Supplemental Methods
Identification of MULE-related consensus sequences from the RECON output
The 266 MULE-related consensus sequences were identified based on the 3300 repeat
families recovered by RECON as follows: (1) the sequences of repeat families were used
as queries to search against the sequences of previously characterized MULEs in rice (refs. 1-3).
If a sequence was similar to known MULEs (BLASTN E < 10^-10), it was considered to be a
MULE-related sequence; (2) the sequences of repeat families were used to search against
proteins in GenBank (downloaded on Feb. 15, 2003). If a sequence was similar to known
Mutator-like proteins (BLASTX E < 10^-10), it was considered to be a MULE-related
sequence; (3) if a sequence was not similar to any known TEs, the following procedure
was used to define new MULE TIRs since many consensus sequences in the RECON
output represent a single MULE TIR. First, the relevant sequence was used to search the
rice genome database, and at least 20 hits (where 20 or more hits were available; BLASTX E < 10^-10)
and the 100 bp flanking sequence on each side of each hit were recovered. The recovered
sequences were then aligned using “pileup” in GCG (see Methods), with the resulting
output examined for the presence of a possible border between putative elements and their
flanking sequences. A border was defined if the sequence homology stopped at the same
position for more than half of the aligned sequences, and the 10 bp sequence at the
termini of the putative element was compared with known MULEs. If the 10 most
terminal nucleotides were similar (at least 6 of the 10 bp identical) to any known
MULE termini (see Supplemental Table 1 for possible combinations of the most terminal
sequence), the consensus sequence was considered to be a TIR candidate. To test whether
the candidate represents the TIR of a MULE, the 10 kb flanking sequences of the relevant
hits were searched for the presence of the same candidate sequence in an inverted
orientation. If such a pair of sequences was found and a 9 bp TSD was identified
immediately beside the termini, it was considered to be a MULE. If for a given consensus
five such elements were found, the consensus was considered to be a MULE TIR.
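The terminal-sequence and TSD checks in step (3) can be sketched in Python as follows. This is an illustrative sketch only, not the implementation used in the study: the function names, the example sequences, and the requirement that the two 9 bp flanking copies match exactly are assumptions made here. The actual known termini are those listed in Supplemental Table 1.

```python
def count_identical(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a.upper(), b.upper()))


def is_tir_candidate(terminus: str, known_termini: list[str],
                     min_identical: int = 6) -> bool:
    """True if the 10 most terminal nucleotides agree with any known
    MULE terminus at min_identical or more of the 10 positions."""
    end = terminus[:10]
    return any(count_identical(end, known[:10]) >= min_identical
               for known in known_termini)


def has_tsd(left_flank: str, right_flank: str, tsd_len: int = 9) -> bool:
    """True if the tsd_len bases immediately left of the element equal the
    tsd_len bases immediately right of it (exact match is an assumption)."""
    if len(left_flank) < tsd_len or len(right_flank) < tsd_len:
        return False
    return left_flank[-tsd_len:].upper() == right_flank[:tsd_len].upper()


# Hypothetical placeholder terminus, NOT taken from Supplemental Table 1.
known = ["GAGATAATTG"]
print(is_tir_candidate("GAGAAAAATT", known))
print(has_tsd("AAATTTGGGCCC", "TTTGGGCCCAAA"))
```

A real search would also tolerate mismatches between the two TSD copies and would locate the inverted TIR copy by alignment rather than by exact string comparison.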
PCR amplification of Pack-MULE fragments from Nipponbare DNA
To further confirm the presence of Pack-MULEs in the rice genome, PCR experiments
were performed for the 3 Pack-MULEs described in Figure 3, and for 10 randomly
selected Pack-MULEs from the 100 Pack-MULEs (from chromosomes 1 and 10)
analyzed in this study. Primer location is diagrammed in Supplemental Figure 2A (also
see Figure 3) with a generic Pack-MULE: each of the 13 Pack-MULEs tested was
amplified with two pairs of primers (purple and blue), with the two internal primers
located in the acquired region. In this way the two amplicons should cover the whole
element plus some flanking sequence. For 4 of the Pack-MULEs tested, significant
secondary structure necessitated either digestion of genomic DNA with a restriction
enzyme prior to PCR amplification or the use of 76 °C as the extension temperature
(compared with 72 °C for the other reactions; see Supplemental Table 7 for details).
For all elements tested, fragments of the anticipated size were obtained (Supplemental
Table 7 and Supplemental Figure 2B).
A “touchdown” protocol was used for the PCR amplification, i.e., the annealing
temperature started 6 °C higher than the final annealing temperature and was then
reduced to that temperature in 1 °C decrements each cycle. The temperature cycling
parameters were: 94 °C for 3 min; touchdown cycles of 94 °C for 45 sec, (A+6) to (A+1) °C
for 45 sec and 72 °C for 60 sec; 32 cycles of 94 °C for 45 sec, A °C for 45 sec and 72 °C
for 60 sec; and a final step of 72 °C for 3 min, where A stands for the final annealing
temperature of the individual reaction (Supplemental Table 7).
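The cycling parameters above can be written out step by step with a short Python sketch. It is an illustration rather than a thermocycler program from the study: the function name and the example value A = 55 °C are hypothetical, and it assumes one touchdown cycle per 1 °C step (six in total) and that the final 3 min step stays at 72 °C even when the cycling extension temperature is changed.

```python
def touchdown_program(final_anneal_c: float, extension_c: float = 72.0):
    """Return the protocol as a list of (step name, temperature in C, seconds)."""
    steps = [("initial denaturation", 94.0, 180)]

    # Touchdown phase: annealing at A+6, A+5, ..., A+1 (one cycle each).
    for offset in range(6, 0, -1):
        steps.append(("denature", 94.0, 45))
        steps.append(("anneal", final_anneal_c + offset, 45))
        steps.append(("extend", extension_c, 60))

    # Main phase: 32 cycles at the final annealing temperature A.
    for _ in range(32):
        steps.append(("denature", 94.0, 45))
        steps.append(("anneal", final_anneal_c, 45))
        steps.append(("extend", extension_c, 60))

    # Final step as stated in the text (assumed fixed at 72 C).
    steps.append(("final extension", 72.0, 180))
    return steps


program = touchdown_program(55.0)  # A = 55 is a hypothetical example value
touchdown_anneals = [t for name, t, _ in program[1:19] if name == "anneal"]
print(touchdown_anneals)
print(len(program))
```

Passing `extension_c=76.0` would model the reactions that required the higher extension temperature.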
Control experiment for Ka/Ks analysis
The Ka/Ks analysis indicated that 18 out of the 54 sequence pairs (MULE vs genomic
homolog) have potentially been under purifying selection (p < 0.05). Such a value (19%,
since the 54 sequence pairs were derived from 100 Pack-MULEs) is much higher than
expected by chance (5%), and can be explained in two ways. It may indicate that many of
the Pack-MULEs have been functional. Alternatively, the high value could be an artifact
due to the misidentification of the putative genomic homolog. This could occur, for
example, when the genomic homolog and its paralog were duplicated and diverged under
purifying selection. Thereafter the genomic homolog was captured by the Pack-MULE,
followed by its deletion from the genome (or the genomic homolog may be missing from
the available database, e.g., located in sequencing gaps). In this case, the paralog was
identified as the genomic homolog because it was the most closely related copy of the
Pack-MULE sequence in the database. As a result, the low Ka/Ks value would reflect the
purifying selection between genomic paralogs instead of that between the Pack-MULE and
its genomic homolog. If a paralog was mistakenly identified as the genomic homolog, this
would be signified by a relatively recent element associated with a distantly related
genomic homolog. To test this notion, the age of a Pack-MULE was estimated by
comparing the sequence similarity of its TIRs since transpositionally competent MULEs
have highly similar TIRs (refs. 4-6). If some of the apparent genomic homologs are indeed
paralogs, this is more likely to occur in elements where the sequence similarity between the
Pack-MULE and its genomic homolog is lower than or close to that of their TIRs.
Accordingly, if the observed purifying selection is an artifact and most of the
Pack-MULE-captured sequences sustained neutral drift, we would expect more examples of
purifying selection in this group of elements (referred to as “the paralog group”, where the
value of “Internal minus TIR” is smaller than or equal to 0 in Supplemental Table 10) than
among other elements. The results indicate that, of the 7 sequence pairs in the “paralog
group”, two (29%) show significant purifying selection (Supplemental Table 10).
Because this value is not higher than the average (18/54 = 33%), this suggests that the
paralog-induced artifact is not a significant issue.
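The comparison in this control amounts to simple proportion arithmetic, shown below as a minimal Python sketch. The counts are those given in the text; the helper name `percent` is introduced here for illustration only, and no statistical test is implied.

```python
def percent(hits: int, total: int) -> int:
    """Fraction of sequence pairs showing purifying selection, as a rounded percentage."""
    return round(100 * hits / total)


paralog_group = percent(2, 7)    # pairs with "Internal minus TIR" <= 0
all_pairs = percent(18, 54)      # all MULE vs genomic homolog pairs

print(paralog_group, all_pairs)
# The artifact hypothesis predicts an excess in the paralog group.
print(paralog_group > all_pairs)
```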
References
1. Tarchini, R., Biddle, P., Wineland, R., Tingey, S. & Rafalski, A. The complete
sequence of 340 kb of DNA around the rice adh1-adh2 region reveals interrupted
colinearity with maize chromosome 4. Plant Cell 12, 381-391 (2000).
2. Turcotte, K., Srinivasan, S. & Bureau, T. Survey of transposable elements from
rice genomic sequences. Plant J. 25, 169-179 (2001).
3. Jiang, N. & Wessler, S. R. Mutator-like non-autonomous DNA transposons from
Oryza sativa. Repbase Reports 2, 8-13 (2002).
4. Bennetzen, J. L. The Mutator transposable element system of maize. Curr. Top.
Microbiol. Immunol. 204, 195-229 (1996).
5. Lisch, D. Mutator transposons. Trends Plant Sci. 7, 498-504 (2002).
6. Chalvet, F., Grimaldi, C., Kaper, F., Langin, T. & Daboussi, M. J. Hop, an active
Mutator-like element in the genome of the fungus Fusarium oxysporum. Mol.
Biol. Evol. 20, 1362-1375 (2003).