Supplemental Text 2. Sequence annotation of 10 BAC clones from wheat chromosome 3B: Transposable element prediction, annotation, classification and composition.

TE prediction, annotation, classification and nomenclature were performed essentially as suggested by the unified classification system for eukaryotic TEs (WICKER et al. 2007), with two exceptions. The Sukkula elements were considered as Gypsy because of their similarities with the Erika (Gypsy) elements. The Athila retrotransposons were analyzed separately from the other Gypsy retrotransposons (see also Supplemental Method 1 for the detailed classification and annotation method).

The TE space, which represents 79.1% of the cumulative sequence length, is composed of a wide variety of TEs, distributed as follows: 61.9% for class I (171 TEs from 48 families), 16.2% for class II (113 TEs from 28 families) and 1% for unclassified TEs (18 TEs from nine families) (Figure 1). Transposable element distribution is not homogeneous or random across the 10 sequenced genomic regions, which map to different locations on chromosome 3B. Class I retrotransposons constitute the highest TE proportion in eight of the sequenced regions, whereas BAC clone TA3B54F7 shows the highest proportion of class II CACTA elements (40.5%) and the smallest BAC clone, TA3B63C11 (21.23 kb), carries no TEs (Figure 1). On the other hand, there is no clear relationship between the sequence composition of the 10 genomic regions and their BIN map position on chromosome 3B (Figure 1). For example, BAC clones TA3B63B7, TA3B95F5, TA3B63N2 and TA3B54F7, which map to deletion BIN 3BL7 of the long arm of chromosome 3B, show different sequence classes and TE proportions.

Charles et al. Supplemental_Text-2 1

Class I transposable elements

Class I TEs (61.9%) comprise 171 TEs belonging to 48 families.
Three main retrotransposon superfamilies constitute the majority of class I TE DNA sequences, as follows: 10.8% Athila-like (37 TEs from four families), 30.8% Gypsy-like (57 TEs from 14 families) and 14.7% Copia-like (45 TEs from 10 families) long terminal repeat (LTR) retrotransposons (Figure 1; see also Supplemental Table 1 for details). With the exception of BAC clone TA3B63C11 (no TE detected), the class I TE content ranges between 32.5% and 89.7%, depending on the sequenced region (Figure 1C).

Class II transposable elements

Class II TEs represent 16.2% of the cumulative sequence length (113 TEs from 28 families) and are composed of 54 CACTA, 56 MITEs and 3 LITEs. In terms of sequence representation, the CACTA TEs account for the majority (96%) of class II DNA sequences. As in previous studies of Triticeae (WICKER et al. 2003a, 2005), the CACTA transposons were often found clustered in the genome. This is particularly the case for BAC clone TA3B54F7, where 15 CACTA TEs (complete and truncated) were found, representing 40.5% of the 190 kb (Figure 1 and Supplemental Table 1). It is also the case for BAC clones TA3B95G2, TA3B95C9, TA3B95F5 and TA3B63N2, each containing 6-13 CACTA-like elements (complete and truncated) that represent 15-20% of the BAC sequence lengths. The other five BAC clones are relatively CACTA-poor regions containing 0 to 2 CACTA TEs.

Novel transposable elements

Twenty-one transposable element families were identified for the first time in this study (Figure 1, indicated by arrows), four of which are present in several copies. A description of these novel TEs, their features and characteristics, and the suggested nomenclature are presented in Supplemental Table 5. They account for 9.8% by number and 7.9% by length of the overall sequences. Class I retrotransposons are the category in which we found the majority of novel TE families (17). Of these, 11 are novel LTR class I retrotransposon families.
Three novel LTR retrotransposons show stretches of weak similarity with known Copia-like retrotransposon families, and three others with known Gypsy retrotransposon families. They were designated with new family names, based on the TE classification guidelines (WICKER et al. 2007), and considered as belonging to the same superfamilies as the reference TEs with which they show the highest similarity (Supplemental Table 5). We were not able to assign five of the novel LTR retrotransposon families to any of the three LTR retrotransposon superfamilies. Three of them (Marina, Camillia, and Cathia) do not show matches in their LTRs or internal domains with known LTR retrotransposon families (and superfamilies) and were identified based on their structural features (Supplemental Table 4). Two of these seem complete: the LTR_STRUC program (MCCARTHY et al. 2003) predicts two LTRs with target site duplications (TSD), as well as primer binding site (PBS) and polypurine tract (PPT) signatures. The novel retrotransposon Cathia has a truncated 3’ LTR, but LTR_STRUC predicts a putative PBS and a PPT (Supplemental Table 5) after adjustment of parameters.

Overall, seven novel LTR retrotransposons (from six novel families) have 5’ and 3’ LTRs and target site duplication (TSD) motifs, which allow the estimation of their insertion dates (see Results: Insertions dates and proliferation of LTR-retrotransposons).

Three novel non-LTR class I retrotransposons were identified as LINE families, and the remaining three novel class I TEs show weak similarities with class I TE polyproteins and could not be assigned as LTR or non-LTR class I TEs. The three novel class II TE families include one CACTA and two MITEs of new families, sharing weak homologies with known CACTA and MITE elements.
The two copies of the novel unclassified element that we named Aurelie share a stretch of weak homology with other Triticeae unclassified transposable elements (Supplemental Table 5).
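
The counts and percentages reported in this text can be cross-checked with a short tally. The sketch below is only an illustration and is not part of the annotation pipeline: all numbers are transcribed from the text above, while the variable names, the dictionary layout and the helper function `total` are assumptions introduced here.

```python
# Figures transcribed from the text; names and layout are illustrative only.
composition = {
    "class I": {"percent": 61.9, "copies": 171, "families": 48},
    "class II": {"percent": 16.2, "copies": 113, "families": 28},
    "unclassified": {"percent": 1.0, "copies": 18, "families": 9},
}

# Main class I superfamilies (percent of cumulative sequence, copies, families).
class_i_superfamilies = {
    "Athila": {"percent": 10.8, "copies": 37, "families": 4},
    "Gypsy": {"percent": 30.8, "copies": 57, "families": 14},
    "Copia": {"percent": 14.7, "copies": 45, "families": 10},
}

# Copy numbers of class II elements by type.
class_ii_copies = {"CACTA": 54, "MITE": 56, "LITE": 3}

# Novel families, broken down as described in the text.
novel_families = {"class I": 17, "class II": 3, "unclassified": 1}
novel_class_i = {"LTR": 11, "LINE": 3, "unassigned": 3}
novel_ltr = {"Copia-like": 3, "Gypsy-like": 3, "unassigned": 5}


def total(table, field):
    """Sum one field over all entries, rounded to one decimal place."""
    return round(sum(entry[field] for entry in table.values()), 1)


if __name__ == "__main__":
    print("TE space (%):", total(composition, "percent"))
    print("TE copies:", int(total(composition, "copies")))
    print("TE families:", int(total(composition, "families")))
    print("Main class I superfamilies (%):", total(class_i_superfamilies, "percent"))
    print("Class II copies:", sum(class_ii_copies.values()))
    print("Novel families:", sum(novel_families.values()))
```

The three main superfamilies sum to less than the class I total because class I also includes the remaining families not listed among them, such as the non-LTR and unassigned elements.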