Bioinformatics for Real Genomes: Getting the Plumbing Right David Marshall

Scottish Crop Research Institute
Where we were
Where we are now !
ITMI Co-ordination
Large task: avoid senseless duplication
Validated set of wheat/barley ESTs
Mapping in wheat/barley and in silico
mapping to rice genomic sequence
Unigene sets for microarrays
Focus for development and curation on
annotation
Databases for SAGE tags and/or predicted
peptide fragments for proteomics.
EST Sequence Assembly
Development of Unigene Sets for Wheat/Barley
Focus for cDNA or oligo based arrays
Common focus for validated sequence
annotation
Comparative map framework links
eSNP discovery
SSR discovery
eSNP Discovery Programme
pZE40 Alignment
[Figure: alignment of ESTs against genomic sequence. Annotated features: SNPs in linkage disequilibrium, alternative polyA site, SSR in intron, unspliced intron, 2 x 6 nt indel]
•Exon/intron boundaries conserved with rice
•Two GenBank nr database entries both contained sequencing errors with a
significant effect on the predicted protein sequence
Available information
260,000 Barley ESTs in dbEST
>BE215812
ctcgtgccgaattcggcacgagctcgtgccgaattcggcacgaggagagagagagagaga
gagagagagagagagagagagagagagagagagagagagagagagagagagagagagaga
gagagagagagagagagagagagagagagagagagagagagagagagagagagagagaga
gagagagagagagagagagagaactagtctcgagggggggcccggtacccac
More than 70% of barley ESTs containing SSRs
also contain vector/adaptor sequence
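The BE215812 record above can be screened along these lines. This is a minimal sketch, not the pipeline used for the talk: the SSR definition (a 2 to 4 nt unit repeated at least six times) and the three motifs in `VECTOR_MOTIFS` are assumptions chosen for illustration, and a real screen would compare against a full vector database.

```python
import re

# Sequence copied from the slide (dbEST accession BE215812).
BE215812 = (
    "ctcgtgccgaattcggcacgagctcgtgccgaattcggcacgaggagagagagagagaga"
    "gagagagagagagagagagagagagagagagagagagagagagagagagagagagagaga"
    "gagagagagagagagagagagagagagagagagagagagagagagagagagagagagaga"
    "gagagagagagagagagagagaactagtctcgagggggggcccggtacccac"
)

# ASSUMPTION: a tiny illustrative motif set, not a complete vector screen.
VECTOR_MOTIFS = {
    "adaptor_like": "gaattcggcacgag",  # motif that appears twice at the 5' end
    "xhoi_site": "ctcgag",             # XhoI recognition site
    "kpni_site": "ggtacc",             # KpnI recognition site
}

# ASSUMPTION: an SSR is a 2-4 nt unit repeated at least 6 times in tandem.
# The lazy quantifier makes the regex prefer the shortest repeating unit.
SSR_PATTERN = re.compile(r"([acgt]{2,4}?)\1{5,}")


def find_ssrs(seq):
    """Return (start, unit, repeat_count) for each SSR found in seq."""
    seq = seq.lower()
    return [
        (m.start(), m.group(1), len(m.group(0)) // len(m.group(1)))
        for m in SSR_PATTERN.finditer(seq)
    ]


def vector_hits(seq):
    """Return the names of the illustrative motifs present in seq."""
    seq = seq.lower()
    return sorted(name for name, motif in VECTOR_MOTIFS.items() if motif in seq)


if __name__ == "__main__":
    for start, unit, count in find_ssrs(BE215812):
        print(f"SSR ({unit})x{count} starting at position {start}")
    print("vector/adaptor-like motifs:", vector_hits(BE215812))
```

An EST flagged by both functions is a candidate for trimming before SSR primers are designed, since the repeat may sit directly against cloning sequence rather than barley sequence.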
SNP Genotyping by
Pyrosequencing
Wheat Group 5 Deletions
NSF Wheat Programme
Map ~10K wheat ESTs onto Endo & Gill's
Chinese Spring deletion lines – slow progress
Also map onto rice – who/when/in silico?
So far ~750 mapping events onto Group 5 (5A,
5B & 5D) deletion lines, of which ~250 involve
different ESTs
Terminal deletions on 5A, 5B & 5D: ~10 ESTs,
most with good barley contig homology (GSP
homologue on top of 5A, 5B & 5D)
Comparative Approach
[Diagram: Barley ESTs are compared by BLAST with rice sequence; together with the barley genetic map this gives a barley virtual map based on rice]
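The ordering step of such a virtual map might look like the sketch below. It assumes BLAST results in the standard 12-column tabular format (query, subject, percent identity, length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score), keeps the best-scoring rice hit per barley EST, and sorts the ESTs by rice position. The EST and rice sequence names and all hit values are made up for illustration, and a real analysis would also need the barley genetic map to anchor and check the inferred order.

```python
def best_hits(blast_lines):
    """Keep the highest bit-score rice hit for each barley EST.

    Returns {est_id: (rice_sequence_id, leftmost_rice_coordinate, bitscore)}.
    """
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        est, rice_seq = fields[0], fields[1]
        sstart, send = int(fields[8]), int(fields[9])
        bitscore = float(fields[11])
        if est not in best or bitscore > best[est][2]:
            # min() handles hits on the reverse strand (sstart > send)
            best[est] = (rice_seq, min(sstart, send), bitscore)
    return best


def virtual_order(best):
    """Order ESTs by rice sequence name, then by position along it."""
    return sorted(best, key=lambda est: (best[est][0], best[est][1]))


# Hypothetical example hits (tab-separated, 12 columns each).
EXAMPLE = [
    "\t".join(row) for row in [
        ("est_A", "rice_chr9", "88.0", "300", "36", "0", "1", "300", "500000", "500299", "1e-80", "290"),
        ("est_A", "rice_chr3", "80.0", "200", "40", "0", "1", "200", "100", "299", "1e-20", "95"),
        ("est_B", "rice_chr9", "90.0", "250", "25", "0", "1", "250", "120000", "119751", "1e-70", "260"),
        ("est_C", "rice_chr3", "85.0", "400", "60", "0", "1", "400", "900000", "900399", "1e-90", "320"),
    ]
]

if __name__ == "__main__":
    hits = best_hits(EXAMPLE)
    for est in virtual_order(hits):
        print(est, hits[est][0], hits[est][1])
```

Taking only the single best hit is the simplest choice and is exactly where orthology/paralogy problems can creep in, so a secondary hit with a similar score is worth reporting rather than discarding.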
Rice- Barley Synteny
How it can inform Barley Genomics/Genetics
What is the state/extent of comparative map information
between the Triticeae and Rice ?
What resources are available in Rice ?
• Now
• In the near future
What other Triticeae mapping information can we exploit?
What is available in Rice
~50% of japonica genome is sequenced
• poorly annotated as yet
Rice Gene map ~ 6500 ESTs
• anchored to the physical map that forms the RGP template
Rice indica sequence
• As yet only poorly annotated contigs from shotgun
sequencing. But good for confirmation or showing missing
bits from the RGP sequence
Syngenta japonica shotgun sequence
• available with conditions
Gramene anchors of Triticeae ESTs
TIGR and NCBI Unigenes
Comparison of Indica vs Japonica
rice
Rice – Triticeae Synteny Issues
In some cases syntenic tracts are well defined, e.g. 3H-R1
In other cases the information is based on very few RFLPs, e.g. 5H –
R11 & R12
Tract ends are not well defined, e.g. R9 on 5H.
Breakdown of RFLP synteny: is it always real, or due to
orthology/paralogy issues?
Microsynteny
– every so often something is out to lunch!
Example - 5H Synteny
Lot of confidence that Rice 9 forms the central block on 5H.
– Less certain what happens at the ends: are they there, and where do
they map in barley?
Lot of confidence that the bottom of 5HL represents the end of R3.
– The rest of R3 corresponds to 4H. The end points/translocation breaks
in both species are not well defined.
The short arm of 5H and how it corresponds to R11 and R12 is not
well resolved.
– The information is based on very few RFLP markers and the absence
of R11 and R12 homologies elsewhere.
Barley EST consensus homologies
to Rice 1R Gene Map
So you thought Rice/Barley was
complicated ?
QTL Analysis for economically
relevant traits
Gene Expression
Identify unigene set & previously characterised ESTs
• insert array v oligonucleotide array
• ‘community array’ v specific arrays (ITMI)
• other expression analyses
• cDNA AFLPs
• in situ (traditional, direct PCR amplification)
• SAGE
• RT-PCR/TaqMan
[Image panels: 2 dpa whole grain; 25 dpa embryo; 30 dpa embryo]
Allelic variability at an SSR linked to a disease
locus (Bmac29)
[Figure labels: 3H; Bmac29; rym4, rym5; H. spontaneum; Middle East Landraces; Cultivated barley; Yield; Stress; Quality; Disease; Agronomy]
rym4 confers resistance to barley yellow mosaic virus
Graphical Genotypes of Foundation and Post 1985 cultivars
[Figure: graphical genotypes of two cultivar sets at SSR loci, with allele sizes per cultivar and a "No. of alleles" row per locus. Legend: "Novel in 1985s". Chromosome labels shown: 2H, 4H, 5H, 6H, 7H.]
SSR loci: Bmac 0213, Bmac0156, Bmac 0273, HvCMA, HVM4, Bmac0218, Bmag0009, Bmac0018, Bmac0316, Bmac0030, HVM 3, Hv OLE, HVM 62, Bmac0209, Bmac0067, Bmag0006, HvLEU, Bmag0005, Bmac 0113, Bmac0096, Bmac0181, Hvm 67, HVM 54, Bmac0134, HVA1, HVM 20, Bmac 0032
Foundation cultivars: Hanna, Intensiv, Gotlands, Binder, Vollkorn., Opal, I. Archer, Kenia, Bavaria, Haisa, Agio, Krim mesni, Delta, Tern, Monte Cristo, Arabische, Lyallpur
Post 1985 cultivars: Alexis, Chad, Chariot, Cooper, Dandy, Derkado, Hart, Livet, Optic, Prisma, Tyne, Riviera, Tankard, Landlord
M0 seed
Mutagenesis
M1 Plants under glasshouse
Mutant Database
Field Plots of M2 Plants
Mutation Scanning
CCM, EMC, or HA
Mutation Scanning Results
Verified mutation sites
Loci Scanned
Phenotypes Scored
M2 family
Genomic DNA Isolation & M3 Seed Harvest
http://www.fccc.edu/research/labs/yeung/page7.html
Sequence Verification
Delivery of Mutants
Subset of M3 seeds
Mapped Mutation
Primers designed for screen
Annotation of Database
Issues for Real Genomes
How good are the model organisms?
Is your gene/phenotype actually in the model organism?
With sparse data sets, when do you do the analysis?
If you do an analysis, how do you store the workflow,
propagate changes and notify the results?
How often do you re-run your workflow?
How good is the data on which your informatics is based?
Just because someone says two things are the same,
are they?
When you rely on comparative links, how do you prevent
Chinese whispers problems?
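One possible way to approach the re-run question is to record a fingerprint of the inputs with each analysis and re-run only when it changes. The sketch below is an illustration of that idea rather than anything described in the talk; the input names and version strings are hypothetical.

```python
import hashlib
import json


def fingerprint(inputs):
    """Hash a description of the inputs (name -> version string or checksum)."""
    canonical = json.dumps(inputs, sort_keys=True)  # stable key order
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_record(inputs, result_summary):
    """Store what was analysed alongside what came out."""
    return {
        "inputs": dict(inputs),
        "fingerprint": fingerprint(inputs),
        "result": result_summary,
    }


def needs_rerun(inputs, last_record):
    """True if there is no previous run or any input has changed since."""
    return last_record is None or last_record["fingerprint"] != fingerprint(inputs)


if __name__ == "__main__":
    # Hypothetical input versions, for illustration only.
    inputs = {"barley_est_set": "release-1", "rice_sequence": "release-1"}
    record = make_record(inputs, "virtual map built")
    print(needs_rerun(inputs, record))

    inputs["rice_sequence"] = "release-2"  # an upstream data set is updated
    print(needs_rerun(inputs, record))
```

Keeping the input description inside the record also helps with the "Chinese whispers" problem, because every comparative link can be traced back to the data release it was derived from.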
Some of the things we do
in REAL plant species
Protein targeting libraries
Proteomics
Metabolomics
Modelling of Flux through Metabolic Pathways
Alternative mapping strategies for Plants
• Happy Mapping
• Radiation Hybrids
BAC & YAC libraries, targeted genomics sequencing
Activation tags, promoter traps, VIGS
Mutation grid, transposon mutagenesis
Microarrays, SAGE
Phenotyping up to field and brewery/distilling stages
EST data sets are the central focus for informatics in many crop species
(up to 400K for barley, 600K for wheat)
Collaborative Activities
IGF participants
SEERAD
Universities of Dundee & Abertay
– SIMBIOS/FATMAN bioinformatics
ITMI Bioinformatics
Waite Institute, University Adelaide
GrainGenes, Albany & Cornell
Gramene, Cornell and CSHL
Genome Atlantic
Ag Canada Saskatoon