The CAB group (person responsible: Maria Luisa Chiusano) runs the GenomeThreader program against the BAC sequences. GenomeThreader is used to create spliced alignments of each EST against the S. lycopersicum BAC sequences. For the iTAG pipeline the alignments are filtered at 95% identity and 90% coverage, for both the tomato species and the other Solanaceae, while in ISOL@ (Chiusano et al. 2008) the threshold values are 90% identity and 80% coverage. EST sequences are downloaded from the dbEST division of GenBank (the current release is updated to October 2008).

The following EST sources are currently available for genome annotation:

TOMATO:
  o SOLLC = Solanum lycopersicum
  o SOLHA = Solanum habrochaites
  o SOLPN = Solanum pennellii
  o SOLLP = Solanum lycopersicum x Solanum pimpinellifolium

OTHER_SOLANACEAE:
  o SOLTU = Solanum tuberosum
  o SOLCH = Solanum chacoense
  o CAPAN = Capsicum annuum
  o CAPCH = Capsicum chinense
  o TOBAC = Nicotiana tabacum
  o NICBE = Nicotiana benthamiana
  o NICSY = Nicotiana sylvestris
  o NICAT = Nicotiana attenuata
  o NICLS = Nicotiana langsdorffii x Nicotiana sanderae
  o PETHY = Petunia x hybrida

OTHER_RELATED_SPECIES (RUBIACEAE):
  o COFCA = Coffea canephora
  o COFAR = Coffea arabica

The last iTAG batch contains 20 sequences, the longest contigs from both the French and the Dutch assemblies. We mapped the EST dataset with both filter settings to evaluate the best methodology to follow for the complete genome annotation. The tables and figures below may be useful for both the genome sequencing and the annotation.
                                  80% coverage & 90% identity       90% coverage & 95% identity
contig                    length  mapped SOLLC_EST  SOLLC_EST (nt)  mapped SOLLC_EST  SOLLC_EST (nt)
cabog_fr_ctg1260083       152974                 2            1511                 0               0
cabog_fr_ctg1278170       163078                 8            2571                 0               0
cabog_fr_ctg1278984       165237                 7            2211                 0               0
cabog_fr_ctg1296143       158944                 0               0                 0               0
cabog_fr_ctg1296595       155576                11            5271                 2            2223
cabog_fr_ctg1297914       237842                 3            1203                 0               0
cabog_fr_ctg1299069       157395                12           15132                11           15132
cabog_fr_ctg779464        172121                 2             462                 0               0
cabog_fr_ctg807586        153983                 5             757                 0               0
cabog_fr_ctg807802        203723                 0               0                 0               0
newbler_nl_contig06158    244520                 1             582                 0               0
newbler_nl_contig06218    208292                 3            1090                 1             602
newbler_nl_contig09955    249379                 1             323                 0               0
newbler_nl_contig09971    204436                 4            1034                 0               0
newbler_nl_contig11147    204889                 6            1822                 0               0
newbler_nl_contig118501   194932                 1             707                 0               0
newbler_nl_contig171271   197254                 0               0                 0               0
newbler_nl_contig21291    200928                 0               0                 0               0
newbler_nl_contig21309    238687                 6            2539                 1             745
newbler_nl_contig77086    201077                 0               0                 0               0

Table 1. S. lycopersicum ESTs mapped along the ITAG batch002 contigs. For each contig, Table 1 reports the sequence length, the number of S. lycopersicum (SOLLC) ESTs mapped on that contig, and the corresponding EST coverage in nucleotides (the extent, in nucleotides, of the exon plus intron length). In the first pair of columns the accepted alignments have at least 80% coverage and 90% identity, the same thresholds used on our genome platform (ISOL@, Chiusano et al. 2008). The last two columns report the same content for the alignment quality chosen by iTAG, i.e. 90% coverage and 95% identity.

Data concerning EST coverage in nucleotides for all the contigs and from the whole SOLEST collection (SOlanaceae EST collection at CAB). Table A) 80% coverage, 90% identity. Table B) 90% coverage, 95% identity.

Figure 2. A) Solanum lycopersicum EST coverage on 1307 BACs (GenBank, October 2009). For each S. lycopersicum BAC sequence annotated in the ISOL@ platform we report the S. lycopersicum EST coverage (%). B) Coverage of the current batch002. Please note the different scale on the y-axis.
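
The comparison of the two filter settings can be illustrated with a minimal Python sketch. It is not the pipeline's actual code. The Alignment record, the function names and the three example alignments are hypothetical. The sketch also assumes that "coverage" in the thresholds means the percentage of the EST length included in the alignment, and that the percentage coverage of a contig is the EST coverage in nucleotides divided by the contig length. Only the threshold values and the figures for contig cabog_fr_ctg1299069 come from the text and Table 1.

```python
from dataclasses import dataclass

# Threshold pairs quoted in the text, in percent.
ISOLA_FILTER = {"min_identity": 90.0, "min_coverage": 80.0}
ITAG_FILTER = {"min_identity": 95.0, "min_coverage": 90.0}


@dataclass
class Alignment:
    """Hypothetical record for one EST-to-contig spliced alignment."""
    est_id: str
    contig: str
    identity: float  # percent identity of the alignment
    coverage: float  # assumed: percent of the EST length that is aligned


def passes(aln, flt):
    """True if the alignment meets both the identity and coverage thresholds."""
    return (aln.identity >= flt["min_identity"]
            and aln.coverage >= flt["min_coverage"])


def count_per_contig(alignments, flt):
    """Count the alignments that pass the filter, grouped by contig."""
    counts = {}
    for aln in alignments:
        if passes(aln, flt):
            counts[aln.contig] = counts.get(aln.contig, 0) + 1
    return counts


def coverage_percent(est_nt, contig_length):
    """Assumed definition: EST-covered nucleotides as a percentage of the contig."""
    return 100.0 * est_nt / contig_length


# Made-up alignments, for illustration only.
alignments = [
    Alignment("est_a", "contig_1", identity=97.2, coverage=93.0),
    Alignment("est_b", "contig_1", identity=92.5, coverage=85.0),
    Alignment("est_c", "contig_2", identity=89.0, coverage=99.0),
]

isola_counts = count_per_contig(alignments, ISOLA_FILTER)
itag_counts = count_per_contig(alignments, ITAG_FILTER)

# Values taken from Table 1 for cabog_fr_ctg1299069 (length 157395, 15132 nt).
pct = coverage_percent(15132, 157395)

print(isola_counts, itag_counts, round(pct, 2))
```

In this toy input the stricter iTAG setting keeps one of the two alignments that the ISOL@ setting accepts, which mirrors the drop in mapped ESTs seen between the two halves of Table 1.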