Some Jolly Fun with Barley ESTs
David Marshall & All the Folks in Computational Biology

Summary of ESTs – Sep 13, 2002

Top Twelve Plants
Glycine max (soybean)                  274,840
Hordeum vulgare (barley)               262,138
Triticum aestivum (bread wheat)        205,506
Zea mays (maize)                       179,431
Arabidopsis thaliana (thale cress)     174,624
Medicago truncatula (barrel medic)     170,500
Lycopersicon esculentum (tomato)       148,346
Oryza sativa (rice)                    108,429
Solanum tuberosum (potato)              94,420
Sorghum bicolor (sorghum)               84,712
Lactuca sativa (lettuce)                68,188
Pinus taeda (loblolly pine)             60,226

Top Four Non-Plant
Homo sapiens (human)                 4,664,006
Mus musculus + domesticus (mouse)    2,691,077
Rattus sp. (rat)                       351,864
Drosophila melanogaster (fruit fly)    256,583

BLAST for Recognition of Undesirable Clones
Summary of 84 Barley Libraries (ver. 0.90)

                                 #        %
High quality sequences     282,720
E. coli genome                 507     0.18
Lambda genome                   39     0.01
rRNA                         6,075     2.15
Chloroplast                  2,664     0.94
Mitochondrion                  204     0.07
Fungal cDNA                    366     0.13
Repetitive Elements            289     0.10
Low complexity               1,194     0.42
Odd vector                      37     0.01
Both polyA & polyT              28     0.01
Total Good                 271,317     96.0

Unigenes in ESTs in Current Assembly

Ideally: one “unigene” per gene in the genome, expecting ~50,000 based on rice.
Maximum unigene count in ESTs: the sum of the number of contigs and singletons following assembly:

Contigs       24,208
Singletons    24,899
Total         49,107

Minimum unigene count in ESTs: the sum of the number of contigs and singletons that have good 3’ ends:

Contigs       14,589
Singletons     7,219
Total         21,880

The Immediate Objective
Microarray Chip Gene Expression Data
http://www.affymetrix.com/

Barley 2H Caleosins
[Figure labels: Barley 2H, 77cM, Steptoe x Morex; Hvcal1 and Hvcal2 (<0cM>), each joined by an EST alignment to Oscal1 and Oscal2 (< 8kb >) on BAC OSJB0004; Rice R4 Gene Map, 0cM to 78.2cM.]

TIGR Rice Caleosin Gene Models
OSCal01(R4)   OSCal02(R4)   OSCal03(R3)

Comparison of Gene Structures of Barley and Rice Caleosins
[Figure: exon diagrams labelled Exon 1 to Exon 6 (Exon 1a and Exon 1b for Caleosin3). Values are listed in slide order.]

             Barley             Rice
Caleosin1    156 86 96 125      156 86 95 126
Caleosin2    156 86 96 125      156 86 99 125
Caleosin3    149 86 95 126      150 86 95 126

Wheat Group 5 Deletions

Homology of Wheat G3 Deletion line mapped ESTs to Rice Chromosomes
[Plot: x-axis, Wheat ESTs mapped to Group 3 Deletion lines (1 to 141); y-axis, Rice Chromosomes (0 to 12).]

General Conclusions
• EST sequence
  » May lack polyA
  » Reading frame may be ambiguous
  » Exon/intron boundaries may not be obvious
• We don’t have all barley genes despite >330,000 ESTs (probably between 33% and 50%).
• Value of comparative studies with rice
• BUT poor annotation (actually appalling)
• Rice genomic sequencing is work in progress
• Comparative route is OK but can’t be the only game in town. Several examples of genes not being there!

Major Issues
• Data validation
  » Errors in public database sequence
  » Errors in annotation
  » ‘Chinese whispers’ – anchoring annotation in biochemistry
• Comparative Data
  » Rice > wheat > maize – but also Arabidopsis
  » When is homology actually orthology?
  » Partial data sets
  » % match only part of the story
  » Need for domain/feature information – mammalian/bacterial bias
• Everything is work in progress?

Where are the data sources
  » dbEST
  » Nr nucleotide database at NCBI
  » Gramene at CSHL
  » TIGR
  » GrainGenes/wEST at USDA, Albany
  » CUGI > AGI
  » Iowa State/USDA
  » Harvest/Foxpro
  » ContEST at SCRI
  » The horse’s mouth

Phenotype <-> Sequence
• Sd1 – green revolution gene in rice. Mutation in gibberellin 20-oxidase (plant hormone production pathway). It is one member of a small gene family; the other members have subtly different patterns of expression and are able to partially compensate for the mutation.
• Rht1 – green revolution gene in wheat. Mutation in receptor response pathway. Copies in all 3 wheat genomes.
• Barley – commercially significant dwarfs from both of these and several other pathway or response genes.

Acknowledgements
• Robbie Waugh
• Peter Hedley
• David Caldwell
• Luke Ramsay
• Hui Liu
• Linda Cardle
• Paul Shaw
• Arnise Druker

• Doreen Ware
• Dave Mathews
• Tim Close
• Olin Anderson
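
As a cross-check on the "BLAST for Recognition of Undesirable Clones" slide, the percentage column can be recomputed from the raw counts. The sketch below is only an illustration: the function names `percent` and `screen_summary` are hypothetical, and it assumes each rejected sequence is counted in exactly one category, which the slide does not state explicitly.

```python
# Counts copied from the "BLAST for Recognition of Undesirable Clones" slide.
HIGH_QUALITY = 282_720

REJECTED = {
    "E. coli genome": 507,
    "Lambda genome": 39,
    "rRNA": 6_075,
    "Chloroplast": 2_664,
    "Mitochondrion": 204,
    "Fungal cDNA": 366,
    "Repetitive Elements": 289,
    "Low complexity": 1_194,
    "Odd vector": 37,
    "Both polyA & polyT": 28,
}


def percent(count, total=HIGH_QUALITY):
    """Return count as a percentage of the high quality sequence total."""
    return 100.0 * count / total


def screen_summary(rejected=REJECTED, total=HIGH_QUALITY):
    """Build (category, count, percent) rows plus a final 'Total Good' row.

    Assumption: categories are mutually exclusive, so 'Total Good' is the
    total minus the sum of all rejected counts.
    """
    rows = [(name, n, round(percent(n, total), 2)) for name, n in rejected.items()]
    good = total - sum(rejected.values())
    rows.append(("Total Good", good, round(percent(good, total), 1)))
    return rows


if __name__ == "__main__":
    for name, n, pct in screen_summary():
        print(f"{name:<22}{n:>9,}{pct:>8}")
```

Under that assumption the recomputed "Total Good" row agrees with the slide (271,317 sequences, 96.0%), which suggests the categories were indeed tallied without overlap.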