Supplementary Information
for
detectMITE: A novel approach to detect miniature inverted repeat
transposable elements in genomes
Congting Ye, Guoli Ji and Chun Liang
Supplementary Figures
Figure S1
Figure S2
Figure S3
Figure S4
Figure S5
A >SQ261080185Oryza sativa, chromosome:Chr04, from MITE
family DTT_Ors37, SuperFamily Tc1/Mariner, complete sequence
TACTACCTCCGTCCTATATTACTTGCTTTTTTGAGTTTTTTAAGTTTTTGTTTGTCAATGTTTGATCATT
CGTCTTATTCAAATTTTTTTGGAATTATTATTTATTTTGTTTGTCATTTGCTTTATTATCAAAAGTACTT
TACATATGACTTATCTTTTTTTATATTTACACTAATTTTTCAAATAAAATGAGTTTTTGTTTGTCAATGT
TTAATCATTCGTCTTATTCAATTTTTTTTGGAATTATTATTTATTTTGTTTGTCATTTGCTTTATTATCA
AAATTACTTTACATATGACTTATTTTTTTTTATATTTGCACTAATTTTTCAAATAAAACGAGTTTATGTT
TGTCAATATTTGATCATTCGTCTTATTCAAAATTTTTTAGAATTATTATTTATTTTGTTTGTCATTTGCT
TTATTATCAAAAATACTTTACATATGACTTATCTTTTTTTATATTTGCACTAATTTTTCAAATAAAACGA
ATGGTTAAACGTTGCAAATAAAAAATCAAAAACGTCACCTATTATGGAACGGAGGGAGTA
B >SQ265159547 Oryza sativa, chromosome:Chr02, from MITE
family DTH_Ors8, SuperFamily PIF/Harbinger, complete sequence
GGCCCCCATCGGTTGGCTTTTTTTTTCTAATAAGGCAAAACGGTTTATCAGGGAATAAAAAAAATTATAG
GTAAAACTTATATATATATATATATATATATATATATATATATATATATACATACATACATATATATATA
TATACATACATACATATATATATATATACATACATACATATATATATATATATACATACATACATATATA
TATATATACATACATACATATATATATATATATACATACATACATATATATATATATATACATACATACA
TATATATACATATATATATGTGTGTGTGTGTTTTAACTTAAAAGCCAATGCTGAAAAAAATACGTTGAAA
ATATATCAAAATTAATCTCAAAATTAAGTTTGAAAATTCAAAATTTGGCTTATTCTTTAGCTTATTGGGC
CATCTGATGGGAGCC
C >SQ262021977 Oryza sativa, chromosome:Chr10, from MITE
family DTM_Ors4, SuperFamily Mutator, complete sequence
CTGGATTTTTCACATTTGGGTCCTTTTGAAAAACTTATTTTGCAAATAGACCCTGGAAAAACTTATCCCA
GAAATAGTCCTTTTTGGGGCGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGACGACGGCGCCAACCACTCTGGTGCCCCAAAAAGGAAC
CATTTCTGGAATAAGTTTTTCCAGGGGTCTACTTGCGAAATAAGTTTTTTAAAAGGGCCAAAATGTGAAA
AATCCAG
D >Rice_3_55765 Unknow 3
AATGGGCGATCGATCGCCTGGGGATGGGGGTAGCGATCAATCCCCTCCCCCTCCCTCCCTCCACCTGGTT
TCCTTTTTTGGCACCGCATTACTTTCCTATTTTAGTAAATTTATGCACCTAAAGTTTATACACCTCAAGT
TTACACATCTAAAGTTTAGAGACCAAAAGTTTATAAGTCAAAAGTTTATATATCCGATTCAAATTTGAAT
TTGAATTCAAATATTTTTTATATATAGTATTTCTATACATCTAAAGTTTATACACCTAAAGTTTATAGAC
CCAAAGTTTATAAGTCAAAAGTTTACATACCCGTTTCAAATTTGAATTTGAATTATATCTGATTCAAATT
TGAATTTGAATTCAAATATTTTCTATATATAGTATTTTTATGCATCTAAAGTTTATACACCTAAAGTTTA
TAGACCTAAAGTTTATAAGTCAAAAGTTTACATACCCGATTCAAATTTGAATTTGAATTATATCCGATTC
AAATTTGAATTTGAATTCAAATATTTTCTATATATAGTATTTCTATACATCTAAAGTTTATACACCTAAA
GTTTATAGACCCAAAGTTTATAAGTCAAAAGTTTACATACCCGATTCAAATTTGAATTTGAATTCAAATT
TTTTATATATAGTATTTCTATACATAAATTTTTCTAACTTTTGTTTTTTTTAAAAAAATTTGTGTGGTGT
ACTGTAGTAGGAAGAGAAGAAGGGGAGGAGGAAGGGGGGAGAGGAGGGAGGAGTGTATCGAGTATAGGGG
AGGGGGGGCGGATCTGATCGCTGGGCGGATGGCGTGGCGATCA
Figure S1. Examples of low complexity MITEs identified by the Lempel-Ziv
complexity algorithm in the rice genome from the detection outputs of
MITE-Hunter and RSPB.
(A) A sequence containing tandem repeats in the output of RSPB. (B) A sequence
mainly consisting of ‘AT’ dinucleotide repeats in the output of RSPB. (C) A sequence
containing too many unknown bases in the output of RSPB. (D) A sequence
containing tandem repeats in the output of MITE-Hunter.
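The caption names the Lempel-Ziv complexity algorithm but does not state the variant, normalization, or cutoff that was applied. The following is a minimal sketch, assuming a 1976-style phrase count; the function names and the log-base-4 normalization are illustrative assumptions, not the authors' implementation. It shows why tandem repeats and 'AT' dinucleotide runs such as those in panels A, B, and D score low.

```python
import math


def lz_complexity(seq):
    """Count phrases in a Lempel-Ziv (1976-style) parsing of seq.

    Each phrase is the shortest substring starting at the current
    position that has not already occurred earlier in the sequence
    (overlap with the current phrase is allowed).
    """
    s = seq.upper()
    n = len(s)
    i = 0
    phrases = 0
    while i < n:
        k = 1
        # Extend the candidate phrase while it can be copied from earlier text.
        while i + k <= n and s[i:i + k] in s[:i + k - 1]:
            k += 1
        phrases += 1
        i += k
    return phrases


def normalized_lz(seq):
    """Phrase count scaled by n / log4(n); this scaling is an assumption."""
    n = len(seq)
    if n < 2:
        return 0.0
    return lz_complexity(seq) * math.log(n, 4) / n


if __name__ == "__main__":
    for name, s in [("AT repeat", "AT" * 50), ("mixed", "ACGTTGCAAGCTTAGGCATC")]:
        print(name, lz_complexity(s), round(normalized_lz(s), 3))
```

A perfectly periodic sequence is parsed into very few phrases regardless of its length, so its normalized score falls toward zero as the repeat grows. The substring search here is quadratic or worse, which is acceptable for MITE-length sequences but not for whole chromosomes.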
[Figure S2: panels A-D; panel A marks the 5' and 3' sequence ends, panels B-D mark the TIR regions]
Figure S2. Examples of the 795 groups of MITE sequences in the rice genome
uniquely identified by RSPB, but not by detectMITE.
(A) Sequences that do not bear terminal inverted repeats (TIRs). (B) Sequences
whose TIRs have too many mismatches or non-complementary pairs. (C) Sequences
whose TIRs have too high an A/T content. (D) Groups in which fewer than 3
full-length copies across the genome possess good TIRs (i.e., the second and the
last sequences do not have good TIRs, with ≥ 3 mismatched pairs in the stem).
Relevant data are available at
http://sourceforge.net/projects/detectmite/files/Supplementary_Data.7z.
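The criteria in panels B and C can be illustrated with a small check on the two ends of a candidate sequence. This is a minimal sketch, assuming an ungapped comparison of the left end against the reverse complement of the right end. The function names, the TIR length of 10, and the A/T ceiling of 0.8 are hypothetical; only the rule that 3 or more mismatched pairs is not a good TIR comes from the caption. It is not detectMITE's own procedure.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string; non-ACGT symbols are left as-is."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def tir_stats(seq, tir_len):
    """Return (mismatches, at_fraction) for the candidate TIR pair.

    mismatches:  positions where the left end differs from the reverse
                 complement of the right end (ungapped comparison).
    at_fraction: share of A/T bases in the left TIR arm.
    """
    s = seq.upper()
    left = s[:tir_len]
    right_rc = reverse_complement(s[-tir_len:])
    mismatches = sum(1 for a, b in zip(left, right_rc) if a != b)
    at_fraction = (left.count("A") + left.count("T")) / tir_len
    return mismatches, at_fraction


def has_good_tir(seq, tir_len=10, max_mismatches=2, max_at=0.8):
    """Hypothetical filter: tir_len and max_at are illustrative defaults."""
    mismatches, at_fraction = tir_stats(seq, tir_len)
    return mismatches <= max_mismatches and at_fraction <= max_at


if __name__ == "__main__":
    # Ends taken from the first sequence of Figure S4A, joined by a filler.
    example = "AGACTTTCTA" + "GGGG" + "TAGAAAGTCT"
    print(tir_stats(example, 10), has_good_tir(example))
```

A group would then fall under panel D when fewer than 3 of its full-length copies pass such a check.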
Figure S3. Examples of MITE super-families (each represented by a family member)
uniquely detected by detectMITE in pairwise comparisons with MITE Digger,
MITE-Hunter, and RSPB individually.
(A) A MITE super-family (family_2578 in super-family_1295) missed by MITE
Digger. (B) A MITE super-family (family_3151 in super-family_1445) missed by
MITE-Hunter. (C) A MITE super-family (family_4493 in super-family_1757) missed
by RSPB. (D) Three MITE super-families (family_3080 in super-family_1419;
family_1254 in super-family_660; family_1151 in super-family_587) missed by
MITE Digger, MITE-Hunter, and RSPB together. Relevant data are available at
http://sourceforge.net/projects/detectmite/files/Supplementary_Data.7z.
A >6|31213569|31213694|2|3;family_378;super-family_161
>ORSgTEMT01701756 gi|19223840|nt141213-141384 putative MITE, MITE-adh,
type D-like
AGACTTTCTAGCATTGCCCACATTCATATAGATGTTAATGAATCTGGACATAACATCTGTATGAATGTGGGAT
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
AGACTTTCTAGCATTGCCCACATTCATATAGATGTTAATGAATCTGGACATAACATCTGTATGAATGTGGGAT
ATATGTCTAGATTCATTAATATCTATATGAATGTGGGCAATGCTAGAAAGTCT
|||||||||||||||||||||||||||||||||||||||||||||||||||||
ATATGTCTAGATTCATTAATATCTATATGAATGTGGGCAATGCTAGAAAGTCT
B >5|829372|829569|2|3;family_2688;super-family_1324
>ORSgTEMT01602375 gi|18656390|nt97785-98015 putative MITE, MITE-adh, type
B-like
TATGACACCATCGACTTTTTAACAAACATTTGTCCATTCATCTTATTCAAATTCTTTTATGCAAATATAAAAA
||||||:||:|:||| |||||||:|||:||||:||||||:||||||||||||| |||||||||||||||||||
TATGACGCCGTTGAC-TTTTAACCAACGTTTGACCATTCGTCTTATTCAAATT-TTTTATGCAAATATAAAAA
AAAATAAGTCATGCTTAAAGAATATTTGAAGATAAATCAAGTCACAATAAAATAAATAATAATTATATGTATT
:|::||:|||||:|||||||||:||||||:|||||||||||||||||||||||||||||||||||:||::|||
TACTTATGTCATACTTAAAGAACATTTGATGATAAATCAAGTCACAATAAAATAAATAATAATTACATAAATT
TTTTGAATAATACAAATGGTCAAATGTATCCCAAAAAGTCAACGGTGTCATA
||||||||||:||:||:|||||||:||:|:|||||||||||||||:||||||
TTTTGAATAAGACGAAAGGTCAAACGTTTATTAAAAAGTCAACGGCGTCATA
C >11|15010637|15010746|6|3;family_695;super-family_279
>ORSgTEMT01701949 gi|23307556|nt16547-16700 putative MITE, MITE-adh, type
D-like
GACTTTCTAGCATTGTCCACATTCATATAGATGTTAATGAATCCAGGCGCATATATATATATTTCTAGATTCA
|||||||||||||||:|||:|||:|||||:|||||||||||||:|   : ||||||||||:|:||||||||||
GACTTTCTAGCATTGCCCATATTTATATATATGTTAATGAATCTA---A-ATATATATATGTGTCTAGATTCA
TTAATATATATATGAATATAGACAATGCTAGAAAGTC
||||:||:|||||||||:|:|||||||||||||||||
TTAACATCTATATGAATGTGGACAATGCTAGAAAGTC
Figure S4. Examples of MITE super-families (each represented by a family member)
detected by both detectMITE and MITE-Hunter but missed by MITE Digger that
have valid BLAST matches (e-value ≤ 10^-10) against the TIGR Plant Repeat
Database.
In each alignment, the upper sequence is the MITE sequence detected by detectMITE,
and the lower sequence is the sequence annotated in the TIGR Plant Repeat Database.
| stands for two identical nucleotides. : stands for two different nucleotides. - stands
for a gap. Relevant data are available at
http://sourceforge.net/projects/detectmite/files/Supplementary_Data.7z.
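The match line between each pair of aligned sequences follows directly from the notation described above. As a minimal sketch (the function name is illustrative, and the tool that produced the original alignments is not stated), the match line can be regenerated from two gapped sequences of equal length, with a blank written under any column that contains a gap:

```python
def match_line(upper, lower):
    """Build the middle line of a pairwise alignment.

    '|' marks identical nucleotides, ':' marks different nucleotides,
    and a blank marks a column where either sequence has a gap ('-').
    """
    if len(upper) != len(lower):
        raise ValueError("aligned sequences must have the same length")
    symbols = []
    for a, b in zip(upper.upper(), lower.upper()):
        if a == "-" or b == "-":
            symbols.append(" ")
        elif a == b:
            symbols.append("|")
        else:
            symbols.append(":")
    return "".join(symbols)


if __name__ == "__main__":
    # First columns of the alignment in panel B.
    top = "TATGACACCATCGACTTTTT"
    bottom = "TATGACGCCGTTGAC-TTTT"
    print(top)
    print(match_line(top, bottom))
    print(bottom)
```

Counting the '|' symbols and dividing by the number of columns gives the percent identity of an alignment block.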
A >4|20606447|20606593|3|5;family_26;super-family_25
>ORSgTEMT00500006 gi|21912505|nt113703-113850 putative MITE, Gaijin/Gaigin-like
GGCCGTGTTTAGTTTCAAAGTTTTTCTTCAAACTTCTAACTTTTCTATCACATCGAAACTTTCTTACACACAT
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
GGCCGTGTTTAGTTTCAAAGTTTTTCTTCAAACTTCTAACTTTTCTATCACATCGAAACTTTCTTACACACAT
AAACTTATAACTTTTCCATCACATCGTTCCAATTTTAACCAAACTTTTAATTTTGACGTGAACTAAACACAGC
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
AAACTTATAACTTTTCCATCACATCGTTCCAATTTTAACCAAACTTTTAATTTTGACGTGAACTAAACACAGC
C
|
C
B >11|3151373|3151519|3|4;family_23;super-family_23
>ORSgTEMT00501140 gi|24137620|nt71791-71930 putative MITE, Gaijin/Gaigin-like
---GGCTGTGTTTAGATTCAAAATTTGGATCTAAACTTTAAACTTCAGTCCTTTTCCGTCACATCAACCCATC
   ||||||||||||||:|||||||||||||    |   ||||||||||||||||||:|||:|||||||::||
TAAGGCTGTGTTTAGATCCAAAATTTGGATC----C---AAACTTCAGTCCTTTTCCATCATATCAACCTGTC
ATACACATACAACTTTTCAGTCACATCATCTTCAATTTTAACCAAAATCCAAACTTCCCCCTCAACTAAACAC
|||||||:||||||||||||||||||||||||||||||:||||||||||||||||||||||||||||||||||
ATACACACACAACTTTTCAGTCACATCATCTTCAATTTCAACCAAAATCCAAACTTCCCCCTCAACTAAACAC
AGCC
|
A--
C >1|12802401|12802508|3|3;family_738;super-family_380
>ORSgTEMT01701078 gi|13603465|nt65492-65639 putative MITE, MITE-adh, type
D-like
AAGTCATTCTAGCATTTTCCACATCCATATGGATGTTAGTGAATCTAGACACATATA--TATCTAGATTCACT
||||||||||||||||||||||||||||||:|:|||||:||||||||||||:|||||  ||||||||||||:|
AAGTCATTCTAGCATTTTCCACATCCATATTGTTGTTAATGAATCTAGACATATATATTTATCTAGATTCATT
AACATCCATATGTATGTAAAAAAATCTAGAATGACTT
||:|||:|||||:||:|:|||||::||||||||||||
AATATCAATATGAATATGAAAAATGCTAGAATGACTT
Figure S5. Examples of MITE super-families (each represented by a family member)
detected by detectMITE but missed by both MITE Digger and MITE-Hunter that have
valid BLAST matches (e-value ≤ 10^-10) against the TIGR Plant Repeat Database.
In each alignment, the upper sequence is the MITE sequence detected by detectMITE,
and the lower sequence is the sequence annotated in the TIGR Plant Repeat Database.
| stands for two identical nucleotides. : stands for two different nucleotides. - stands
for a gap. Relevant data are available at
http://sourceforge.net/projects/detectmite/files/Supplementary_Data.7z.
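The e-value criterion in the caption can be applied to BLAST results in a few lines. This is a minimal sketch, assuming the hits were exported in the standard 12-column tabular format (BLAST+ `-outfmt 6`), where the e-value is the 11th column. The BLAST settings used by the authors are not given here, and the identifiers and scores in the example are made-up placeholders, not real results.

```python
def filter_hits(lines, max_evalue=1e-10):
    """Keep (query, subject, e-value) for hits with e-value <= max_evalue.

    Expects tab-separated BLAST tabular lines with the columns:
    qseqid sseqid pident length mismatch gapopen qstart qend
    sstart send evalue bitscore
    """
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        evalue = float(fields[10])
        if evalue <= max_evalue:
            kept.append((fields[0], fields[1], evalue))
    return kept


if __name__ == "__main__":
    # Placeholder rows for illustration only.
    rows = [
        "query1\tsubjectA\t98.0\t126\t2\t0\t1\t126\t1\t126\t2e-40\t150",
        "query2\tsubjectB\t85.0\t60\t9\t0\t1\t60\t5\t64\t1e-05\t45.0",
    ]
    for hit in filter_hits(rows):
        print(hit)
```

Only the first placeholder row passes the cutoff, since 1e-05 is larger than 10^-10.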