Some Jolly Fun with Barley ESTs
David Marshall & All the Folks in Computational Biology
Summary of ESTs – Sep 13, 2002
Top Twelve Plants                             ESTs
Glycine max (soybean)                      274,840
Hordeum vulgare (barley)                   262,138
Triticum aestivum (bread wheat)            205,506
Zea mays (maize)                           179,431
Arabidopsis thaliana (thale cress)         174,624
Medicago truncatula (barrel medic)         170,500
Lycopersicon esculentum (tomato)           148,346
Oryza sativa (rice)                        108,429
Solanum tuberosum (potato)                  94,420
Sorghum bicolor (sorghum)                   84,712
Lactuca sativa (lettuce)                    68,188
Pinus taeda (loblolly pine)                 60,226
Top Four Non-Plant                            ESTs
Homo sapiens (human)                     4,664,006
Mus musculus + domesticus (mouse)        2,691,077
Rattus sp. (rat)                           351,864
Drosophila melanogaster (fruit fly)        256,583
BLAST for Recognition of Undesirable Clones
Summary of 84 Barley Libraries (ver. 0.90)
Category                        #        %
High quality sequences    282,720
E. coli genome                507     0.18
Lambda genome                  39     0.01
rRNA                        6,075     2.15
Chloroplast                 2,664     0.94
Mitochondrion                 204     0.07
Fungal cDNA                   366     0.13
Repetitive Elements           289     0.10
Low complexity              1,194     0.42
Odd vector                     37     0.01
Both polyA & polyT             28     0.01
Total Good                271,317     96.0
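The percentages in this summary can be reproduced from the counts. The sketch below is only an arithmetic check, not the screening pipeline itself: it assumes every percentage is taken relative to the 282,720 high quality sequences and that each rejected sequence is counted in one category only. The variable and function names are made up for illustration.

```python
# Counts copied from the library summary above.
HIGH_QUALITY = 282_720
REJECTED = {
    "E. coli genome": 507,
    "Lambda genome": 39,
    "rRNA": 6_075,
    "Chloroplast": 2_664,
    "Mitochondrion": 204,
    "Fungal cDNA": 366,
    "Repetitive Elements": 289,
    "Low complexity": 1_194,
    "Odd vector": 37,
    "Both polyA & polyT": 28,
}


def percent(count, total=HIGH_QUALITY):
    """Return count as a percentage of total."""
    return 100.0 * count / total


# Assumption: no sequence is rejected under more than one category,
# so "Total Good" is the high quality count minus all rejections.
total_good = HIGH_QUALITY - sum(REJECTED.values())

for label, count in REJECTED.items():
    print(f"{label:<24}{count:>9,}{percent(count):>7.2f}")
print(f"{'Total Good':<24}{total_good:>9,}{percent(total_good):>7.1f}")
```

Under those assumptions the rejected categories add up to 11,403 sequences, which leaves the 271,317 reported as Total Good.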
Unigenes in ESTs in Current Assembly
Ideally: one “unigene” per gene in the genome, expecting ~50,000 based on rice.
Maximum unigene count in ESTs: the sum of the number of contigs and singletons following assembly:
Contigs       24,208
Singletons    24,899
Total         49,107
Minimum unigene count in ESTs: the sum of the number of contigs and singletons that have good 3’ ends:
Contigs       14,589
Singletons     7,219
Total         21,880
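The two bounds are simple sums, and dividing each by the ~50,000 genes expected from rice gives a rough sense of coverage. The sketch below is illustrative only; the names are hypothetical and the 50,000 figure is the approximate expectation quoted above. Note that the listed components of the minimum (14,589 and 7,219) sum to 21,808 rather than the stated 21,880, so one of those three figures may contain a transcription slip; the sketch works from the components.

```python
EXPECTED_GENES = 50_000  # approximate expectation based on rice


def unigene_count(contigs, singletons):
    """A unigene set is every contig plus every unassembled singleton."""
    return contigs + singletons


def coverage(unigenes, expected=EXPECTED_GENES):
    """Unigene count as a percentage of the expected gene number."""
    return 100.0 * unigenes / expected


# Upper bound: everything left after assembly.
max_unigenes = unigene_count(contigs=24_208, singletons=24_899)

# Lower bound: only contigs and singletons with good 3' ends.
min_unigenes = unigene_count(contigs=14_589, singletons=7_219)

print(f"maximum: {max_unigenes:,} ({coverage(max_unigenes):.1f}% of expected)")
print(f"minimum: {min_unigenes:,} ({coverage(min_unigenes):.1f}% of expected)")
```

The upper bound almost certainly overcounts, because non-overlapping ESTs from the same gene end up as separate contigs or singletons.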
The Immediate Objective
Microarray Chip
Gene Expression Data
http://www.affymetrix.com/
Barley 2H Caleosins
[Figure: comparative map. Barley 2H (Steptoe x Morex, 77cM) with Hvcal1 and Hvcal2 (<0cM>), joined by EST alignment to Oscal1 and Oscal2 (<8kb>) on rice BAC OSJB0004; Rice R4 Gene Map, with positions labelled 0cM and 78.2cM.]
TIGR Rice Caleosin Gene Models
OSCal01(R4)
OSCal02(R4)
OSCal03(R3)
Comparison of Gene Structures of Barley and Rice Caleosins
[Figure: exon diagrams (Exon 1 to Exon 6) for Caleosin1, Caleosin2 and Caleosin3 in Barley and Rice; Caleosin3 shows Exon 1a and Exon 1b. Exon length labels, in paired rows as extracted: 156, 86, 96, 125 / 156, 86, 95, 126; 156, 86, 96, 125 / 156, 86, 99, 125; 149, 86, 95, 126 / 150, 86, 95, 126.]
Wheat Group 5 Deletions
Homology of Wheat G3 Deletion line mapped ESTs to Rice Chromosomes
[Figure: plot of Rice Chromosomes (y-axis, 0 to 12) against Wheat ESTs mapped to Group 3 Deletion lines (x-axis, 1 to 141).]
General Conclusions
• EST sequence
  • May lack polyA
  • Reading frame may be ambiguous
  • Exon/intron boundaries may not be obvious
  • We don’t have all barley genes despite >330,000 ESTs (probably between 33% and 50%)
• Value of comparative studies with rice
• BUT poor annotation (actually appalling)
• Rice genomic sequencing is work in progress
• Comparative route is OK but can’t be the only game in town. Several examples of genes not being there!
Major Issues
• Data validation
  » Errors in public database sequence
  » Errors in annotation
  » ‘Chinese whispers’ – anchoring annotation in biochemistry
• Comparative Data
  » Rice > wheat > maize – but also Arabidopsis
  » When is homology actually orthology?
  » Partial data sets
  » % match only part of the story
  » Need for domain/feature information – mammalian/bacterial bias
  » Everything is work in progress?
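One common heuristic for the question "When is homology actually orthology?" is the reciprocal best hit: two sequences are treated as putative orthologues only if each is the other's top-scoring match. The slides do not say which approach was used, so the sketch below is only an illustration. The identifiers and bit scores are hypothetical, and real input would come from parsed similarity-search output.

```python
def best_hits(hits):
    """Map each query to its single highest-scoring subject.

    `hits` is an iterable of (query, subject, bit_score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical hit tables: barley ESTs searched against rice genes and back.
barley_vs_rice = [
    ("HvEST_A", "OsGene_1", 250.0),
    ("HvEST_A", "OsGene_2", 180.0),
    ("HvEST_B", "OsGene_1", 240.0),
]
rice_vs_barley = [
    ("OsGene_1", "HvEST_A", 250.0),
    ("OsGene_1", "HvEST_B", 240.0),
    ("OsGene_2", "HvEST_A", 180.0),
]

pairs = reciprocal_best_hits(barley_vs_rice, rice_vs_barley)
print(pairs)
```

In this toy data HvEST_B is clearly homologous to OsGene_1 but is not its reciprocal best hit, which is the sense in which % match is only part of the story. With partial data sets such as ESTs, a missing true orthologue can also produce a misleading reciprocal pair.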
Where are the data sources?
  » dbEST
  » Nr nucleotide database at NCBI
  » Gramene at CSHL
  » TIGR
  » GrainGenes/wEST at USDA, Albany
  » CUGI > AGI
  » Iowa State/USDA
  » Harvest/Foxpro
  » ContEST at SCRI
  » The horse’s mouth
Phenotype <-> Sequence
• Sd1 – green revolution gene in rice. Mutation in gibberellin 20-oxidase (plant hormone production pathway), one member of a small gene family; other members have subtly different patterns of expression and are able to partially compensate for the mutation.
• Rht1 – green revolution gene in wheat. Mutation in receptor response pathway. Copies in all 3 wheat genomes.
• Barley – commercially significant dwarfs from both of these and several other pathway or response genes.
Acknowledgements
• Robbie Waugh
• Peter Hedley
• David Caldwell
• Luke Ramsay
• Hui Liu
• Linda Cardle
• Paul Shaw
• Arnise Druker
• Doreen Ware
• Dave Mathews
• Tim Close
• Olin Anderson