The CAB group runs GenomeThreader against the BAC sequences

The CAB group (led by Maria Luisa Chiusano) runs the GenomeThreader program against the
BAC sequences. GenomeThreader is used to compute splice alignments of each EST against the
S. lycopersicum BAC sequences.
For the iTAG pipeline, alignments are filtered at 95% identity and 90% coverage (only alignments
reaching both thresholds are kept) for both tomato species and other Solanaceae, whereas ISOL@
(Chiusano et al. 2008) uses thresholds of 90% identity and 80% coverage.
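The two acceptance rules can be illustrated as a simple threshold filter. This is a minimal sketch, assuming each splice alignment has already been summarized as a percent identity and a percent coverage, that coverage means the share of the EST covered by the alignment, and that the thresholds are inclusive; none of these details is stated above. The `Alignment` record, its fields and the example values are hypothetical and do not reflect GenomeThreader's actual output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """Hypothetical summary of one EST-to-BAC splice alignment."""
    est_id: str
    identity: float  # percent identity of the aligned region
    coverage: float  # percent of the EST covered (assumed definition)


# Thresholds from the text, as (min identity %, min coverage %)
ITAG = (95.0, 90.0)
ISOLA = (90.0, 80.0)


def accepted(alns, min_identity, min_coverage):
    """Keep alignments reaching both thresholds (assumed inclusive)."""
    return [a for a in alns
            if a.identity >= min_identity and a.coverage >= min_coverage]


if __name__ == "__main__":
    # Made-up alignments, for illustration only
    alns = [
        Alignment("est1", 97.5, 92.0),
        Alignment("est2", 93.0, 85.0),
        Alignment("est3", 99.0, 70.0),
    ]
    for name, (ident, cov) in [("iTAG", ITAG), ("ISOL@", ISOLA)]:
        kept = accepted(alns, ident, cov)
        print(name, [a.est_id for a in kept])
```

With these made-up values, `est2` passes only the ISOL@ setting and `est3` fails both on coverage, which is the kind of difference the two settings are meant to expose.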
EST sequences are downloaded from the dbEST division of GenBank (the current release is updated to
October 2008). The following sources are currently available for genome annotation:

TOMATO:
o SOLLC = Solanum lycopersicum;
o SOLHA = Solanum habrochaites;
o SOLPN = Solanum pennellii;
o SOLLP = Solanum lycopersicum x Solanum pimpinellifolium;
OTHER_SOLANACEAE:
o SOLTU = Solanum tuberosum;
o SOLCH = Solanum chacoense;
o CAPAN = Capsicum annuum;
o CAPCH = Capsicum chinense;
o TOBAC = Nicotiana tabacum;
o NICBE = Nicotiana benthamiana;
o NICSY = Nicotiana sylvestris;
o NICAT = Nicotiana attenuata;
o NICLS = Nicotiana langsdorffii x Nicotiana sanderae;
o PETHY = Petunia x hybrida;
OTHER_RELATED_SPECIES (RUBIACEAE):
o COFCA = Coffea canephora;
o COFAR = Coffea arabica;
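When ESTs from all these sources are pooled, each sequence can be tagged with its source group through the species codes listed above. The sketch below simply encodes the list as a lookup table; the `group_of` helper is hypothetical, and how each EST record is linked to its code (for example from the organism field of the dbEST entry) is an assumption left out of the example.

```python
# Species codes per source group, transcribed from the list above
EST_GROUPS = {
    "TOMATO": ["SOLLC", "SOLHA", "SOLPN", "SOLLP"],
    "OTHER_SOLANACEAE": ["SOLTU", "SOLCH", "CAPAN", "CAPCH", "TOBAC",
                         "NICBE", "NICSY", "NICAT", "NICLS", "PETHY"],
    "OTHER_RELATED_SPECIES": ["COFCA", "COFAR"],
}

# Reverse index: species code -> group name
CODE_TO_GROUP = {code: group
                 for group, codes in EST_GROUPS.items()
                 for code in codes}


def group_of(code: str) -> str:
    """Return the source group of a species code (hypothetical helper)."""
    try:
        return CODE_TO_GROUP[code.upper()]
    except KeyError:
        raise ValueError(f"unknown species code: {code!r}") from None


if __name__ == "__main__":
    for code in ("SOLLC", "NICBE", "COFAR"):
        print(code, "->", group_of(code))
```

A reverse index keeps each lookup constant-time, which matters when tagging a large EST collection.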
The latest iTAG batch contains 20 sequences, the longest contigs from the French and Dutch
assemblies. We mapped the EST dataset with both filter settings to evaluate which methodology
is better suited to annotating the complete genome.
The following tables and figures may be useful for both genome sequencing and annotation.
Contig                    Length | 80% cov, 90% id  | 90% cov, 95% id
                            (nt) |  ESTs  Cov. (nt) |  ESTs  Cov. (nt)
----------------------------------------------------------------------
cabog_fr_ctg1260083       152974 |     2       1511 |     0          0
cabog_fr_ctg1278170       163078 |     8       2571 |     0          0
cabog_fr_ctg1278984       165237 |     7       2211 |     0          0
cabog_fr_ctg1296143       158944 |     0          0 |     0          0
cabog_fr_ctg1296595       155576 |    11       5271 |     2       2223
cabog_fr_ctg1297914       237842 |     3       1203 |     0          0
cabog_fr_ctg1299069       157395 |    12      15132 |    11      15132
cabog_fr_ctg779464        172121 |     2        462 |     0          0
cabog_fr_ctg807586        153983 |     5        757 |     0          0
cabog_fr_ctg807802        203723 |     0          0 |     0          0
newbler_nl_contig06158    244520 |     1        582 |     0          0
newbler_nl_contig06218    208292 |     3       1090 |     1        602
newbler_nl_contig09955    249379 |     1        323 |     0          0
newbler_nl_contig09971    204436 |     4       1034 |     0          0
newbler_nl_contig11147    204889 |     6       1822 |     0          0
newbler_nl_contig118501   194932 |     1        707 |     0          0
newbler_nl_contig171271   197254 |     0          0 |     0          0
newbler_nl_contig21291    200928 |     0          0 |     0          0
newbler_nl_contig21309    238687 |     6       2539 |     1        745
newbler_nl_contig77086    201077 |     0          0 |     0          0
Table 1. S. lycopersicum ESTs mapped onto the iTAG batch002 contigs.
For each contig, Table 1 reports the sequence length, the number of S. lycopersicum (SOLLC) ESTs mapped
on that contig, and the corresponding EST coverage in nucleotides (the genomic extent of the alignments,
exons plus introns). Alignments are accepted at 80% coverage and 90% identity, the same thresholds used
on our genome platform (ISOL@, Chiusano et al. 2008). The last two columns report the same data for the
alignment thresholds chosen by iTAG, i.e. 90% coverage and 95% identity.
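One way to read the EST coverage column is as the total contig length spanned by the accepted alignments (first exon start to last exon end, introns included), with overlapping spans counted only once. The sketch below computes that quantity by merging intervals. It is an interpretation, not the procedure actually used at CAB: it assumes half-open `[start, end)` coordinates and that overlapping spans are merged, neither of which is stated in the caption, and `covered_nt` is a hypothetical name.

```python
def covered_nt(spans):
    """Total length covered by half-open [start, end) intervals.

    Each span is the genomic extent (exons plus introns) of one accepted
    EST alignment on a contig. Overlapping or touching spans are merged
    so that no base is counted twice.
    """
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(spans):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it and open a new one
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current run
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


if __name__ == "__main__":
    # Made-up spans: the first two overlap and merge into 100..900
    print(covered_nt([(100, 600), (400, 900), (2000, 2300)]))
```

Sorting first keeps the merge to a single pass; without merging, two ESTs from the same gene would inflate the coverage.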
EST coverage in nucleotides for all the contigs, using the whole SOLEST collection (Solanaceae EST collection at CAB).
Table A) 80% coverage, 90% identity. Table B) 90% coverage, 95% identity.
Figure 2. A) Solanum lycopersicum EST coverage on 1307 BACs (GenBank, October 2009). For each
S. lycopersicum BAC sequence annotated in the ISOL@ platform, the S. lycopersicum EST coverage is reported
as a percentage. B) Coverage of the current batch002 contigs. Note the different y-axis scales in the two panels.
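The percentage plotted in Figure 2 presumably relates covered nucleotides to sequence length. As a rough illustration, assuming percent coverage = EST coverage (nt) / sequence length × 100, the sketch below applies that formula to a few rows transcribed from Table 1 (80% coverage, 90% identity columns). The formula and the `percent_coverage` name are assumptions, not the documented method behind the figure.

```python
# A few rows transcribed from Table 1 (80% coverage & 90% identity):
# contig -> (sequence length in nt, SOLLC EST coverage in nt)
TABLE1 = {
    "cabog_fr_ctg1299069": (157395, 15132),
    "cabog_fr_ctg1296595": (155576, 5271),
    "newbler_nl_contig21309": (238687, 2539),
    "newbler_nl_contig77086": (201077, 0),
}


def percent_coverage(length_nt, covered):
    """Assumed definition: share of the sequence covered by ESTs, in %."""
    if length_nt <= 0:
        raise ValueError("sequence length must be positive")
    return 100.0 * covered / length_nt


if __name__ == "__main__":
    for contig, (length_nt, covered) in TABLE1.items():
        print(f"{contig}: {percent_coverage(length_nt, covered):.2f}%")
```

Under this assumption, cabog_fr_ctg1299069, the contig with the largest EST coverage in Table 1, comes out at about 9.6%.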