Transposable elements composition in 10 sequenced regions of the

Supplemental Text 2. Sequence annotation of 10 BAC clones from wheat chromosome
3B: Transposable element prediction, annotation, classification and composition.
TEs prediction, annotation, classification and nomenclature were performed essentially as
suggested by the unified classification system for eukaryotic TEs (WICKER et al. 2007)
with two exceptions. The Sukkula were considered as Gypsy because of similarities with
the Erika (Gypsy) elements. The Athila retrotransposons were analyzed separately from
the other Gypsy retrotransposons (see also Supplemental Method 1 for detailed
classification and annotation method).
The TE space, which represents 79.1% of the cumulative sequence length, is composed of a wide variety of TEs,
distributed as follows: 61.9% for class I (171 TEs from 48 families), 16.2% for class II
(113 TEs from 28 families) and 1% for unclassified TEs (18 TEs from nine families)
(Figure 1). The distribution of transposable elements is not homogeneous or random across the
10 sequenced genomic regions, which map to different locations on chromosome 3B.
Class I retrotransposons constitute the highest TE proportion in eight of the sequenced
regions, whereas BAC clone TA3B54F7 shows the highest proportion of class II CACTA elements (40.5%)
and the smallest BAC clone, TA3B63C11 (21.23 kb), carries no TEs (Figure 1). On the
other hand, there is no clear relationship between the sequence composition of the 10
genomic regions and their BIN map position on chromosome 3B (Figure 1). For
example, BAC clones TA3B63B7, TA3B95F5, TA3B63N2 and TA3B54F7, which all map
to deletion BIN 3BL7 of the long arm of chromosome 3B, show different sequence
classes and TE proportions.
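As a quick consistency check on the figures quoted above, the class-level shares, copy counts and family counts can be summed. This is only a minimal sketch: the dictionary layout and variable names are illustrative and are not part of the original annotation procedure.

```python
# Class-level TE composition as quoted in the text:
# (percent of cumulative sequence, number of TEs, number of families)
composition = {
    "class I": (61.9, 171, 48),
    "class II": (16.2, 113, 28),
    "unclassified": (1.0, 18, 9),
}

# Sum each column; rounding removes floating-point noise in the percentages.
total_percent = round(sum(p for p, _, _ in composition.values()), 1)
total_tes = sum(n for _, n, _ in composition.values())
total_families = sum(f for _, _, f in composition.values())

print(total_percent, total_tes, total_families)
```

The three shares add up to the 79.1% quoted for the TE space, corresponding to 302 annotated TEs from 85 families.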
Charles et al. Supplemental_Text-2
Class I transposable elements
Class I TEs, which make up 61.9% of the sequence, comprise 171 TEs belonging to 48 families. Three main
retrotransposon superfamilies constitute the majority of class I TE DNA sequences, as
follows: 10.8% Athila-like (37 TEs from four families), 30.8% Gypsy-like (57 TEs from 14
families) and 14.7% Copia-like (45 TEs from 10 families) long terminal repeat (LTR)
retrotransposons (Figure 1; see also Supplemental Table 1 for details). With the exception
of BAC clone TA3B63C11 (no TE detected), the class I TE composition ranges between
32.5% and 89.7%, depending on the sequenced region (Figure 1C).
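The share of class I sequence that falls outside the three main LTR superfamilies is not stated explicitly, but it can be derived from the numbers above. The sketch below does that arithmetic; the variable names are illustrative, and the label "other class I" is an assumption covering whatever remaining class I elements are detailed in Supplemental Table 1.

```python
# Class I totals quoted in the text: (percent, number of TEs, number of families)
class_i_total = (61.9, 171, 48)

# The three main LTR retrotransposon superfamilies, same column order.
superfamilies = {
    "Athila": (10.8, 37, 4),
    "Gypsy": (30.8, 57, 14),
    "Copia": (14.7, 45, 10),
}

ltr_percent = round(sum(p for p, _, _ in superfamilies.values()), 1)
ltr_tes = sum(n for _, n, _ in superfamilies.values())
ltr_families = sum(f for _, _, f in superfamilies.values())

# Remainder attributed here to "other class I" elements (an assumed label).
other_percent = round(class_i_total[0] - ltr_percent, 1)
other_tes = class_i_total[1] - ltr_tes
other_families = class_i_total[2] - ltr_families

print(ltr_percent, ltr_tes, ltr_families)
print(other_percent, other_tes, other_families)
```

Under these figures, the three superfamilies account for 56.3% of the sequence (139 TEs, 28 families), leaving 5.6% (32 TEs, 20 families) for the remaining class I elements.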
Class II transposable elements
Class II TEs represent 16.2% of the cumulative sequence length (113 TEs from 28
families) and are composed of 54 CACTA, 56 MITEs and 3 LITEs. In terms of sequence
representation, the CACTA TEs represent the majority (96%) of class II DNA sequences.
As in previous studies of Triticeae (WICKER et al. 2003a, 2005), the CACTA
transposons were often found clustered in the genome. This is particularly the case for
BAC clone TA3B54F7, where 15 CACTA TEs (complete and truncated) were found,
representing 40.5% of the 190 kb (Figure 1 and Supplemental Table 1). It is also the case
for BAC clones TA3B95G2, TA3B95C9, TA3B95F5 and TA3B63N2, each containing 6-13
CACTA-like elements (complete and truncated) representing 15-20% of the BAC
sequence length. The other five BAC clones are relatively CACTA-poor regions
containing 0 to 2 CACTA TEs.
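The class II figures above can be cross-checked and converted into absolute terms with a few lines of arithmetic. This is a rough sketch using only the rounded values quoted in the text (the 190 kb clone length in particular is approximate), so the results are indicative rather than exact; all names are illustrative.

```python
# Class II element counts quoted in the text.
class_ii_counts = {"CACTA": 54, "MITE": 56, "LITE": 3}
class_ii_total = sum(class_ii_counts.values())

# CACTA elements make up 96% of class II sequence, which is 16.2% of the total.
cacta_percent_of_sequence = round(0.96 * 16.2, 1)

# BAC clone TA3B54F7: 40.5% CACTA in roughly 190 kb of sequence.
cacta_kb_in_ta3b54f7 = 0.405 * 190

print(class_ii_total, cacta_percent_of_sequence, round(cacta_kb_in_ta3b54f7))
```

On these numbers, the three class II categories sum to the 113 TEs reported, CACTA elements alone cover about 15.6% of the cumulative sequence, and the 15 CACTA copies in TA3B54F7 span roughly 77 kb.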
Novel transposable elements
Twenty-one transposable element families were identified for the first time in this study
(Figure 1, indicated by arrows), four of which are present in several copies. Descriptions
of these novel TEs, their features and characteristics, as well as the suggested nomenclature,
are presented in Supplemental Table 5. They account for 9.8% by number and 7.9% by
length of the overall sequences.
Class I retrotransposons are the category in which we found the majority of novel TE
families (17). Of these, 11 are novel LTR retrotransposon families.
Three novel LTR retrotransposons show stretches of weak similarity with known
Copia-like retrotransposon families and three others with known Gypsy retrotransposon families. They were
designated with new family names, based on the TE classification guidelines (WICKER et
al. 2007), and considered as belonging to the same superfamilies as the reference TEs
with which they show the highest similarity (Supplemental Table 5).
We were not able to assign five of the novel LTR retrotransposon families to any of the
three LTR retrotransposon superfamilies. Three of them (Marina, Camillia, and Cathia)
do not show matches in their LTR or internal domains with known LTR retrotransposon
families (and superfamilies) and were identified based on their structural features
(Supplemental Table 4). Two of these seem complete: the LTR_STRUC program
(MCCARTHY et al. 2003) predicts two LTRs with target site duplications (TSD), as well
as primer binding site (PBS) and polypurine tract (PPT) signatures.
The novel retrotransposon Cathia has its 3’ LTR truncated, but LTR_STRUC predicts a
putative PBS and a PPT (Supplemental Table 5) after adjustment of parameters.
Overall, seven novel LTR-retrotransposons (from six novel families) have 5’ and 3’
LTRs and target site duplication (TSD) motifs, which allow the estimation of their
insertion dates (see Results: Insertions dates and proliferation of LTR-retrotransposons).
Three novel non-LTR class I retrotransposons were identified as LINE families and the
remaining three novel class I TEs show weak similarities with class I TE polyproteins
and could not be assigned as LTR or non-LTR class I TEs.
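The insertion-date estimation mentioned above relies on the fact that the two LTRs of an element are identical at the time of insertion and diverge afterwards. The text here does not specify the distance model or the substitution rate that was used, so the following is only a sketch of one common approach: a Kimura 2-parameter distance K between the aligned 5' and 3' LTRs, converted to an age with T = K / (2r). The function names and the default rate of 1.3e-8 substitutions per site per year (a value often used for grass LTR retrotransposons) are assumptions, not values taken from this document.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(ltr5, ltr3):
    """Kimura 2-parameter distance between two pre-aligned LTR sequences."""
    if len(ltr5) != len(ltr3):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        sites += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites in the alignment")
    p = transitions / sites
    q = transversions / sites
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def insertion_age_years(distance, rate=1.3e-8):
    """Age T = K / (2r); the default rate is an assumed value, see lead-in."""
    return distance / (2 * rate)
```

For example, under the assumed rate, an LTR divergence of 0.026 would correspond to an insertion about one million years ago. Truncated elements lacking one LTR, such as Cathia, cannot be dated this way.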
The three novel class II TE families include one CACTA and two MITE families,
sharing weak homology with known CACTA and MITE elements.
The two copies of the novel unclassified element that we named Aurelie share a stretch of
weak homology with other Triticeae unclassified transposable elements (Supplemental
Table 5).
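The counts of novel families given in this section can be tallied to confirm that they add up to the 21 families reported. The grouping below is a reading of the text rather than a table from the source, and the category labels are illustrative.

```python
# Novel TE families per category, as read from the text (labels are illustrative).
novel_families = {
    "LTR, Copia-like": 3,
    "LTR, Gypsy-like": 3,
    "LTR, superfamily unassigned": 5,
    "non-LTR (LINE)": 3,
    "class I, LTR/non-LTR unassigned": 3,
    "class II, CACTA": 1,
    "class II, MITE": 2,
    "unclassified (Aurelie)": 1,
}

# Subtotals are built by matching the label prefixes used above.
novel_ltr = sum(n for k, n in novel_families.items() if k.startswith("LTR"))
novel_class_i = novel_ltr + novel_families["non-LTR (LINE)"] \
    + novel_families["class I, LTR/non-LTR unassigned"]
novel_class_ii = sum(n for k, n in novel_families.items() if k.startswith("class II"))
novel_total = sum(novel_families.values())

print(novel_ltr, novel_class_i, novel_class_ii, novel_total)
```

This reproduces the 11 novel LTR families, 17 novel class I families, 3 novel class II families and 21 novel families overall.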