Bioinformatics for Real Genomes: Getting the Plumbing Right
David Marshall
Scottish Crop Research Institute

Where we were

Where we are now!

ITMI Co-ordination
Large task: avoid senseless duplication
Validated set of wheat/barley ESTs
Mapping in wheat/barley and in silico mapping to rice genomic sequence
Unigene sets for microarrays
Focus for development and curation on annotation
Databases for SAGE tags and/or predicted peptide fragments for proteomics

EST Sequence Assembly
Development of Unigene Sets for Wheat/Barley
Focus for cDNA or oligo based arrays
Common focus for validated sequence annotation
Comparative map framework links
eSNP discovery
SSR discovery

eSNP Discovery Programme
pZE40 Alignment
[Figure labels: SNPs in Linkage Disequilibrium; Alternative polyA site; ESTs; Genomic sequence; SSR in intron; Unspliced intron; 2 x 6 nt indel]
  • Exon/intron boundaries conserved with rice
  • 2 x GenBank nr database entries both contained sequencing errors with significant effect on predicted protein sequence

Available information
260,000 Barley ESTs in dbEST
>BE215812
ctcgtgccgaattcggcacgagctcgtgccgaattcggcacgaggagagagagagagaga
gagagagagagagagagagagagagagagagagagagagagagagagagagagagagaga
gagagagagagagagagagagagagagagagagagagagagagagagagagagagagaga
gagagagagagagagagagagaactagtctcgagggggggcccggtacccac
More than 70% of barley ESTs containing SSRs contain vector/adaptor sequence

SNP Genotyping by Pyrosequencing

Wheat Group 5 Deletions
NSF Wheat Programme
Map ~10K wheat ESTs on to Endo & Gill's Chinese Spring deletion lines (slow progress)
Also map onto rice: who, when, in silico?
So far ~750 mapping events on to Group 5 (5A, 5B & 5D) deletion lines, of which ~250 involve different ESTs
Terminal deletions on 5A, 5B & 5D: ~10 ESTs, most with good barley contig homology (GSP homologue on top of 5A, 5B & 5D)

Comparative Approach
[Diagram labels: Blast; Barley Virtual Map Based on Rice; Barley ESTs; Rice Sequence; Barley Genetic Map]

Rice-Barley Synteny
How it can inform Barley Genomics/Genetics
What is the state/extent of comparative map information between the Triticeae and Rice?
What resources are available in Rice?
  • Now
  • In the near future
What other Triticeae mapping information can we exploit?

What is available in Rice
~50% of japonica genome is sequenced
  • poorly annotated as yet
Rice Gene map ~6500 ESTs
  • anchored to the physical map that forms the RGP template
Rice indica sequence
  • As yet only poorly annotated contigs from shotgun sequencing, but good for confirmation or for showing missing bits from the RGP sequence
Syngenta japonica shotgun sequence
  • available with conditions
Gramene anchors of Triticeae ESTs
TIGR and NCBI Unigenes

Comparison of Indica vs Japonica rice

Rice-Triticeae Synteny Issues
In some cases syntenous tracts are well defined, e.g. 3H-R1
In other cases the information is based on very few RFLPs, e.g. 5H-R11 & R12
Tract ends are not well defined, e.g. R9 on 5H
Breakdown of RFLP synteny: is it always real, or due to orthology/paralogy issues?
Microsynteny: every so often something is out to lunch!

Example - 5H Synteny
Lot of confidence that Rice 9 forms the central block on 5H.
  – Less certain of what happens at the ends: are they there, and where do they map in barley?
Lot of confidence that the bottom of 5HL represents the end of R3.
  – The rest of R3 corresponds to 4H. The end points/translocation breaks in both species are not well defined
The short arm of 5H and how it corresponds to R11 and R12 is not well resolved.
  – The information is based on very few RFLP markers and the absence of R11 and R12 homologies elsewhere

Barley EST consensus homologies to Rice 1R Gene Map

So you thought Rice/Barley was complicated?

QTL Analysis for economically relevant traits

Gene Expression
Identify unigene set & previously characterised ESTs
  • insert array v oligonucleotide array
  • ‘community array’ v specific arrays (ITMI)
  • other expression analyses
  • cDNA AFLPs
  • in situ (traditional, direct PCR amplification)
  • SAGE
  • RT-PCR/TaqMan
[Image labels: 2 dpa whole grain; 25 dpa embryo; 30 dpa embryo]

Allelic variability at an SSR linked to a disease locus (Bmac29)
[Figure labels: 3H; Yield; H. spontaneum; Middle East Landraces; Cultivated barley; Stress; Quality; Disease; Agronomy; rym4, rym5; Bmac29]
rym4 confers resistance to barley yellow mosaic virus

Graphical Genotypes of Foundation and Post 1985 cultivars
[Figure: grid of SSR allele sizes per cultivar, grouped by chromosome (2H, 4H, 5H, 6H, 7H), with the number of alleles at each locus; legend "Novel in 1985s"]
Loci: Bmac0030, HVM 3, Hv OLE, HVM 62, Bmac0209, Bmac0067, Bmag0006, Bmac0156, Bmac 0273, HvCMA, HVM4, Bmac0218, Bmag0009, Bmac0018, Bmac0316, HvLEU, Bmag0005, Bmac 0113, Bmac0096, Bmac0181, Hvm 67, HVM 54, Bmac0134, HVA1, HVM 20, Bmac 0032, Bmac 0213
Post 1985 cultivars: Alexis, Chad, Chariot, Cooper, Dandy, Derkado, Hart, Livet, Optic, Prisma, Tyne, Riviera, Tankard, Landlord
Foundation cultivars: Hanna Intensiv Gotlands Binder Vollkorn. Opal I. Archer Kenia Bavaria Haisa Agio Krim mesni Delta Tern Monte Cristo Arabische Lyallpur

[Flow diagram labels: M0 seed; Mutagenesis; M1 Plants under glasshouse; Mutant Database; Field Plots of M2 Plants; Mutation Scanning (CCM, EMC, or HA); Mutation Scanning Results; Verified Mutations sites; Loci Scanned; Phenotypes Scored; M2 family; Genomic DNA Isolation & M3 Seed Harvest; Sequence Verification; Delivery of Mutants; Subset of M3 seeds; Mapped Mutation; Primers designed for screen; Annotation of Database]
http://www.fccc.edu/research/labs/yeung/page7.html

Issues for Real Genomes
How good are the model organisms?
Is your gene/phenotype actually in the model organism?
With sparse data sets, when do you do the analysis?
If you do an analysis, how do you store the workflow, propagate changes and notify the results?
How often do you re-run your workflow?
How good is the data on which your informatics is based?
Just because someone says two things are the same, are they?
When you rely on comparative links, how do you prevent Chinese whispers problems?

Some of the things we do in REAL plant species
Protein targeting libraries
Proteomics
Metabolomics
Modelling of Flux through Metabolic Pathways
Alternative mapping strategies for Plants
  • Happy Mapping
  • Radiation Hybrids
BAC & YAC libraries, targeted genomics sequencing
Activation tags, promoter traps, VIGS
Mutation grid, transposon mutagenesis
Microarrays, SAGE
Phenotyping up to field and brewery/distilling stages
EST data sets are the central focus for informatics in many crop species (up to 400K for barley, 600K for wheat)

Collaborative Activities
IGF participants
SEERAD
Universities of Dundee & Abertay: SIMBIOS/FATMAN bioinformatics
ITMI Bioinformatics
Waite Institute, University of Adelaide
GrainGenes, Albany & Cornell
Gramene, Cornell and CSHL
Genome Atlantic
Ag Canada Saskatoon
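
Worked sketch: SSRs versus vector/adaptor sequence
The "Available information" slide notes that most SSR-containing barley ESTs also carry vector/adaptor sequence, so an SSR screen probably wants to report both together. The Python below is only a minimal sketch of that idea, not the pipeline used for the talk. The adaptor motif is an assumption copied from the repeated prefix of the slide's example read (BE215812), the function names `find_ssrs` and `screen_est` are hypothetical, and the toy read is made up for illustration.

```python
import re

# Assumption: adaptor motif copied from the repeated prefix of the slide's
# example read (BE215812); a real screen would use the library's vector list.
ADAPTOR = "gaattcggcacgag"


def find_ssrs(seq, min_repeats=6, max_unit=6):
    """Return (start, unit, copies) for perfect tandem repeats of 2..max_unit nt."""
    # Lazy unit match so the shortest repeating unit is reported first.
    pattern = re.compile(r"([acgt]{2,%d}?)\1{%d,}" % (max_unit, min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq.lower()):
        unit = m.group(1)
        if len(set(unit)) > 1:  # skip homopolymer runs such as poly-G
            hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits


def screen_est(seq, adaptor=ADAPTOR):
    """Report SSRs in a read and whether the adaptor motif is also present."""
    return {"ssrs": find_ssrs(seq), "has_adaptor": adaptor in seq.lower()}


# Toy read (not real data): adaptor, short spacer, a (GA)10 repeat, short tail.
toy_read = ADAPTOR + "ttcatg" + "ga" * 10 + "ctagtc"
report = screen_est(toy_read)
print(report)
```

For the toy read this reports one (GA)10 repeat starting at position 20 and flags the adaptor as present. A read flagged on both counts is a candidate for trimming before any SSR primers are designed from it.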