Second Tomato Finishing Workshop, Apr. 24-25, 2008

Chromosome 8 Sequencing: Current Status and Future Prospects toward Finishing

Shusei Sato, Erika Asamizu, Takakazu Kaneko, Hiroyuki Fukuoka, Satoshi Tabata


Distribution of Anchor Markers on Chromosomes

Initial seeds on Chr.8

[Figure: anchor marker statistics for each chromosome]

# anchors   chr length (cM)   cM per anchor
   92            165               1.8
   79            143               1.8
   67            171               2.6
   62            137               2.2
   40            119               3.0
   63            101               1.6
   51            112               2.2
   33             87               2.6
   40            116               2.9
   41             87               2.1
   43            103               2.4
   39            120               3.1


Sequence strategy

We are following the same strategy that was applied in the Lotus japonicus genome project.

<Shotgun sequencing of BAC clones>
• vector for the shotgun clone: pUC118
• insert size of the shotgun clone: ca. 3 kb
• template DNA preparation: TempliPhi
• sequencing chemistry: AB Big Dye Terminator
• sequencer: AB 3730
• gap closing in finishing phase: primer walking (shotgun clone or BAC direct)

<Walking from seed clones>
• BAC end sequence database


Problem

• Walking cannot be continued from the small number of seed points
  – Extension terminated at 18/33 seed points


Complementary Efforts in Japan

1. Development of EST-derived new microsatellite markers to obtain more seed points for sequencing
2. Gap filling by an alternative sequencing strategy


Development of New Microsatellite Markers

MiBASE (http://www.kazusa.or.jp/jsol/microtom/indexj.html)
• EST unigenes (26,363)
• Full-length cDNA (57,422)
• 2,627 SSR
  – 522 have already been mapped
  – 2,105 new EST SSR


Summary of EST-SSR Marker analysis

712 markers have been mapped on EXPEN2000

chr1   chr2   chr3   chr4   chr5   chr6
 78     66     74     60     62     52

chr7   chr8   chr9   chr10  chr11  chr12
 56     62     52     49     50     51

34 new seed clones have been selected


Status of Chr.8 Sequencing

Status        Nr. BACs
Finished         128
Phase 2            5
Assembling        12
Sequencing        22
Total            167

Finished length without overlap: 12,562,802 bp
67 seed points (42 contigs, 13 single clones)


Troubled clones

Clones finished in phase 2
• C08HBa0050P21: Presence of a long (AT) cluster
• C08SLm0144I10: Presence of a long (AT) cluster
• C08HBa0045I24: Presence of a long (C) cluster
• C08HBa0202N15: Presence of highly similar repeat sequences
• C08SLe0126A12: Presence of highly similar repeat sequences


Gap Filling by Whole Genome Shotgun Sequencing

• Selected BAC Mixture (SBM) shotgun
  1. Select BACs whose end sequences do not contain undesired (repeat) sequences
  2. Mix the BACs and sequence by shotgun

[Figure: BAC, repeat, and gene space]


BLASTN BES vs. RepeatDB

402,012 end sequences from 177,408 BAC clones
vs.
14,229 repeat sequences (TIGR_SolAth_repeat, mips_repeat_collection, SGN repeat collection)

Repeat class     Percentage
LTR                 49.7
Unknown             30.3
rRNA                 7.4
Satellite            1.6
Simple repeat        1.5
DNA                  1.4
LINE                 1.3

Library   Both ends are repeat   One end is repeat   Both ends are NOT repeat
HBa             19,123                35,277                 26,181
Eco RI           9,533                18,594                 15,516
Mbo I           15,570                20,995                 13,134
Total           44,226                74,866                 54,831

Source of selected BAC mixture


Selected BAC Mixture (SBM) shotgun sequencing

10,000 clones from HBa library
 5,000 clones from EcoRI library
 5,000 clones from MboI library
20,000 BAC clones in total

Six times the euchromatin coverage


Status of SBM Shotgun Sequences

• Sequencing started in Feb. 2007.
• As of April 2007, 2.2 million sequences have been accumulated (total length: 1.5 Gbp).
• Assembled into 193,330 contigs. Total size of the contigs is ~484 Mbp.
  Longest contig: 17,702 bp; contigs > 5 kb: 21,230


Dept. Plant Genome Research
  Satoshi Tabata
  Erika Asamizu
  Shusei Sato

Sequencing
  Shigemi Sasamoto
  Akiko Watanabe
  Tsuyuko Wada
  Akiko Ono
  Ai Matsuno
  Midori Kato
  Kumiko Kawashima
  Yoshimi Shimizu
  Chiharu Minami
  Chika Takahashi

Molecular Genetics and Physiology team

Marker
  Takakazu Kaneko
  Naomi Nakazaki

National Institute of Vegetable and Tea Science
  Hiroyuki Fukuoka
  Satomi Negoro
  Yumika Kitamura
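

Appendix sketch: BAC selection for the SBM shotgun

The selection step described under "Gap Filling by Whole Genome Shotgun Sequencing" (keep only BACs whose end sequences do not contain repeat sequences) can be illustrated with a small Python sketch. This is a minimal illustration under stated assumptions, not the pipeline used in the project: the function name `classify_bacs`, the input layout (a mapping from BAC name to a pair of per-end repeat-hit flags), and the toy clone names are all hypothetical. In practice the flags would have to be derived from the BLASTN results of the BAC end sequences against the repeat database.

```python
from collections import Counter


def classify_bacs(end_hits):
    """Classify each BAC by how many of its two end sequences hit a repeat.

    end_hits: dict mapping a BAC name to a (left_hit, right_hit) pair of
    booleans, where True means that end sequence matched the repeat database.
    Returns (counts, selected): a Counter over the three classes, and the
    sorted list of BACs with no repeat hit at either end.
    """
    labels = {
        0: "both ends NOT repeat",
        1: "one end repeat",
        2: "both ends repeat",
    }
    counts = Counter()
    selected = []
    for bac, (left_hit, right_hit) in end_hits.items():
        n_repeat_ends = int(left_hit) + int(right_hit)
        counts[labels[n_repeat_ends]] += 1
        if n_repeat_ends == 0:
            # Candidate for the selected BAC mixture.
            selected.append(bac)
    return counts, sorted(selected)


# Hypothetical toy input; real flags would come from BLASTN output.
example = {
    "bac_A": (False, False),
    "bac_B": (True, False),
    "bac_C": (True, True),
    "bac_D": (False, False),
}
counts, selected = classify_bacs(example)
print(dict(counts))
print(selected)
```

Applied per library, the three counts correspond to the three columns of the library table, and `selected` is the pool from which clones for the mixture would be drawn.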