Some Jolly Fun with Barley ESTs
David Marshall & All the Folks in Computational Biology
Summary of ESTs – Sep 13, 2002

Top Twelve Plants
Glycine max (soybean)                    274,840
Hordeum vulgare (barley)                 262,138
Triticum aestivum (bread wheat)          205,506
Zea mays (maize)                         179,431
Arabidopsis thaliana (thale cress)       174,624
Medicago truncatula (barrel medic)       170,500
Lycopersicon esculentum (tomato)         148,346
Oryza sativa (rice)                      108,429
Solanum tuberosum (potato)                94,420
Sorghum bicolor (sorghum)                 84,712
Lactuca sativa (lettuce)                  68,188
Pinus taeda (loblolly pine)               60,226

Top Four Non-Plant
Homo sapiens (human)                   4,664,006
Mus musculus + domesticus (mouse)      2,691,077
Rattus sp. (rat)                         351,864
Drosophila melanogaster (fruit fly)      256,583
BLAST for Recognition of Undesirable Clones
Summary of 84 Barley Libraries (ver. 0.90)

                                #         %
High quality sequences    282,720
E. coli genome                507      0.18
Lambda genome                  39      0.01
rRNA                        6,075      2.15
Chloroplast                 2,664      0.94
Mitochondrion                 204      0.07
Fungal cDNA                   366      0.13
Repetitive Elements           289      0.10
Low complexity              1,194      0.42
Odd vector                     37      0.01
Both polyA & polyT             28      0.01
Total Good                271,317      96.0
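The percentage column and the "Total Good" row can be reproduced from the counts alone. The following is a minimal sketch, assuming that each percentage is the category count divided by the number of high quality sequences and that the rejection categories do not overlap. Both assumptions are consistent with the figures on the slide but are not stated there, and the names `HIGH_QUALITY`, `REJECTED`, `percent` and `summarize` are made up for illustration.

```python
# Counts copied from the slide "Summary of 84 Barley Libraries (ver. 0.90)".
HIGH_QUALITY = 282_720

# Assumption: the rejection categories are mutually exclusive.
REJECTED = {
    "E. coli genome": 507,
    "Lambda genome": 39,
    "rRNA": 6_075,
    "Chloroplast": 2_664,
    "Mitochondrion": 204,
    "Fungal cDNA": 366,
    "Repetitive Elements": 289,
    "Low complexity": 1_194,
    "Odd vector": 37,
    "Both polyA & polyT": 28,
}


def percent(count, total=HIGH_QUALITY):
    """Share of the high quality sequences, as a percentage (2 decimals)."""
    return round(100 * count / total, 2)


def summarize(rejected=REJECTED, total=HIGH_QUALITY):
    """Return per-category percentages, the 'Total Good' count and its percentage."""
    rows = {name: percent(n, total) for name, n in rejected.items()}
    total_good = total - sum(rejected.values())
    good_pct = round(100 * total_good / total, 1)
    return rows, total_good, good_pct


rows, total_good, good_pct = summarize()
for name, pct in rows.items():
    print(f"{name:<24}{REJECTED[name]:>9,}{pct:>8.2f}")
print(f"{'Total Good':<24}{total_good:>9,}{good_pct:>8.1f}")
```

Subtracting the ten rejected categories from the high quality count gives the slide's "Total Good" figure, which is why the categories are treated as non-overlapping here.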
Unigenes in ESTs in Current Assembly

Ideally: one “unigene” per gene in the genome, expecting ~50,000 based on rice.

Maximum unigene count in ESTs: the sum of the number of contigs and singletons following assembly:
Contigs       24,208
Singletons    24,899
Total         49,107

Minimum unigene count in ESTs: the sum of the number of contigs and singletons that have good 3’ ends:
Contigs       14,589
Singletons     7,219
Total         21,880
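The two bounds are simple sums, and dividing each by the ~50,000 genes expected from rice gives a rough idea of how much of the gene set the ESTs might cover. A minimal sketch, using only the component counts from the slide; the function names are hypothetical. Note that the minimum components as printed (14,589 and 7,219) sum to 21,808 rather than the stated total of 21,880, and the source does not show which figure carries the slip, so the code works from the components.

```python
# Component counts copied from the slide.
MAX_CONTIGS, MAX_SINGLETONS = 24_208, 24_899   # all contigs and singletons
MIN_CONTIGS, MIN_SINGLETONS = 14_589, 7_219    # only those with good 3' ends
EXPECTED_GENES = 50_000                        # approximate, "based on rice"


def unigene_count(contigs, singletons):
    """Unigene estimate: every contig and every singleton counts once."""
    return contigs + singletons


def fraction_of_expected(count, expected=EXPECTED_GENES):
    """Estimated share of the expected gene set represented by `count`."""
    return count / expected


max_count = unigene_count(MAX_CONTIGS, MAX_SINGLETONS)
min_count = unigene_count(MIN_CONTIGS, MIN_SINGLETONS)

print(f"maximum: {max_count:,} ({fraction_of_expected(max_count):.0%} of expected)")
print(f"minimum: {min_count:,} ({fraction_of_expected(min_count):.0%} of expected)")
```

The maximum is an upper bound because non-overlapping ESTs from the same gene are counted separately; restricting to sequences with good 3’ ends avoids that double counting at the cost of discarding some genuine genes.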
The Immediate Objective
Microarray Chip
Gene Expression Data
http://www.affymetrix.com/
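Barley 2H Caleosins
[Figure: map comparison. Barley side: Barley 2H, 77cM, Steptoe x Morex, with Hvcal1 and Hvcal2 separated by <0cM>. Rice side: Oscal1 and Oscal2 on BAC OSJB0004, separated by <8kb>; Rice R4 Gene Map, 0cM to 78.2cM. The barley and rice genes are linked by EST alignment.]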
TIGR Rice Caleosin Gene Models
OSCal01(R4)
OSCal02(R4)
OSCal03(R3)
Comparison of Gene Structures of Barley and Rice Caleosins
[Figure: exon diagrams for barley and rice caleosins, with exons labelled Exon 1 to Exon 6 (Caleosin3 drawn with Exon 1a and Exon 1b). Exon lengths as labelled on the diagrams, in the order they appear:]

             Barley               Rice
Caleosin1    156, 86, 96, 125     156, 86, 95, 126
Caleosin2    156, 86, 96, 125     156, 86, 99, 125
Caleosin3    149, 86, 95, 126     150, 86, 95, 126
Wheat Group 5 Deletions
[Figure: Homology of Wheat G3 Deletion line mapped ESTs to Rice Chromosomes. Y axis: Rice Chromosomes (0 to 12). X axis: Wheat ESTs mapped to Group 3 Deletion lines (1 to 141).]
General Conclusions
• EST sequence
  • May lack polyA
  • Reading frame may be ambiguous
  • Exon/intron boundaries may not be obvious
  • We don’t have all barley genes despite >330,000 ESTs (probably between 33% and 50%).
• Value of comparative studies with rice
• BUT poor annotation (actually appalling)
• Rice genomic sequencing is work in progress
• Comparative route is OK but can’t be the only game in town. Several examples of genes not being there!
Major Issues
• Data validation
  » Errors in public database sequence
  » Errors in annotation
  » ‘Chinese whispers’ – anchoring annotation in biochemistry
• Comparative Data
  » Rice > wheat > maize – but also Arabidopsis
  » When is homology actually orthology?
  » Partial data sets
  » % match only part of the story
  » Need for domain/feature information – mammalian/bacterial bias
  » Everything is work in progress?
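The point that "% match" is only part of the story can be illustrated with two hits that differ in how much of the query they cover. This is a minimal sketch with made-up numbers, not data from the talk; the `Hit` class and its fields are hypothetical, loosely modelled on the query start, query end and percent identity values that alignment tools commonly report.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query_len: int   # length of the query EST in bases
    qstart: int      # first aligned query position (1-based, inclusive)
    qend: int        # last aligned query position (inclusive)
    pident: float    # percent identity within the aligned region only

    @property
    def coverage(self) -> float:
        """Fraction of the query that takes part in the alignment."""
        return (abs(self.qend - self.qstart) + 1) / self.query_len

    @property
    def identity_over_query(self) -> float:
        """Percent identity scaled by coverage: a cruder but fairer number."""
        return self.pident * self.coverage


# Hypothetical example: a short, near-perfect match versus a full-length one.
short_hit = Hit(query_len=600, qstart=1, qend=60, pident=98.0)
full_hit = Hit(query_len=600, qstart=1, qend=600, pident=85.0)

for name, hit in (("short", short_hit), ("full", full_hit)):
    print(f"{name}: {hit.pident:.1f}% identity, "
          f"{hit.coverage:.0%} coverage, "
          f"{hit.identity_over_query:.1f}% over the whole query")
```

Ranked by percent identity alone, the short hit wins; once coverage is taken into account, the full-length hit is clearly the better candidate. Neither number settles whether a match is an orthologue.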
Where are the data sources?
» dbEST
» Nr nucleotide database at NCBI
» Gramene at CSHL
» TIGR
» GrainGenes/wEST at USDA, Albany
» CUGI > AGI
» Iowa State/USDA
» Harvest/Foxpro
» ContEST at SCRI
» The horse’s mouth
Phenotype <-> Sequence
• Sd1 – green revolution gene in rice. Mutation in
gibberellin 20-oxidase (plant hormone production
pathway), one member of a small gene family; other
members have subtly different patterns of expression
and are able to partially compensate for the mutation.
• Rht1 – green revolution gene in wheat. Mutation in
receptor response pathway. Copies in all 3 wheat
genomes.
• Barley – commercially significant dwarfs from both of
these and several other pathway or response genes.
Acknowledgements
• Robbie Waugh
• Peter Hedley
• David Caldwell
• Luke Ramsay
• Hui Liu
• Linda Cardle
• Paul Shaw
• Arnise Druker

• Doreen Ware
• Dave Mathews
• Tim Close
• Olin Anderson