Gramene Scientific Advisory Board
December 14, 2010
Gramene SAB 2010 1

Introduction of SAB Members
• David Marshall (SCRI)
• Paul Flicek (EBI)
• Michael Ashburner (Cambridge)
• Anna M McClung (USDA-ARS)
• Patricia Klein (Texas A&M)
• William Beavis (Iowa State)
• Tim Nelson (Yale)
• Georgia Davis (Missouri)

Introduction of Gramene
• Doreen Ware (CSHL, PI)
• Susan McCouch (Cornell, PI)
• Pankaj Jaiswal (OSU, PI)
• Ed Buckler (Cornell, PI)
• Vindhya Amarasinghe (OSU, Pathways)
• Karthikeyan Athikkattuvalasu (Cornell, Diversity, Phenotypes)
• Terry Casstevens (Cornell, Diversity)
• Charles Chen (Cornell, Diversity)
• Aaron Chuah (CSHL, Diversity)
• Genevieve DeClerck (Cornell, Diversity)
• Palitha Dharmawardhana (OSU, Pathways)
• Marcela Monaco (CSHL, Pathways)
• Will Spooner (CSHL, Genomes)
• Joshua Stein (CSHL, Genomes)
• Jim Thomason (CSHL, Germplasm, Website, Pathways, Genes)
• Sharon Wei (CSHL, Genomes)
• Ken Youens-Clark (CSHL, Project Manager, etc.)

Aim 1: Genomes
Doreen Ware, PI
Sharon Wei, Will Spooner, Ken Youens-Clark, Jim Thomason, Marcela Monaco, Josh Stein (total full-time equivalent [FTE]: 3.5)
Note: hired 25% FTE (Josh) to replace Noel Yap, who left the project in the Cornell group
1.5 FTE available from Ware, Dvorak NSF collaborations

Suggestions From Last Year
• Add Brachypodium
  – Added in Release 29
• Add a basal plant, e.g.
Selaginella
  – We chose Physcomitrella patens because it was better documented at the time (GB record and published)
  – Selaginella now has a GB record and will be investigated for 2011
• Add a Solanaceae and/or legume
  – We are adding tomato in 2011 and are looking into either soybean or Medicago
• Display RNAseq data
  – We can now display it as a DAS track (see maizesequence.org)
  – Need to investigate data sources

Highlights in 2010
• Genomes: 3 new; many updates
• Software: Ensembl 59 provides new visualizations
  – SNP view
  – SNP Mart
  – Multi-species view
  – Multi-sequence alignment
• New Analyses
  – Gene-centered synteny build
  – EPO multi-sequence alignment
  – Split-gene detection
• New Development
  – GERP Conservation (Sharon)
  – GWAS views (Aaron, NSF 2010 collaboration)
  – Tandem arrays (Josh, Will)

17 Genomes in Release 32
• Physcomitrella (moss): basal land plant
• Updated assemblies of grapevine & poplar
• Updated annotations of Indica rice & Arabidopsis
• Updated assemblies & annotations of Oryza chr 3S projects

Status      Species                       Assembly                         Annotation
New         Physcomitrella patens         v1.1                             v1.1
New         Oryza nivara (AA) 3S          nivara_454_AGP July 2010         CSHL_v1.1
New         Oryza rufipogon (AA) 3S       rufipogon_454_AGP July 2010      CSHL_v1.1
Updated     Oryza sativa ssp. indica      BGI-2005                         BGI GLEAN 2008
Updated     Brachypodium distachyon       Brachy1.0                        Brachy1.2
Updated     Vitis vinifera                IGGP_12X                         Genoscope 2010
Updated     Populus trichocarpa           JGI 2.0                          JGI 2.0
Updated     Arabidopsis thaliana          TAIR10                           TAIR10
Updated     Oryza brachyantha (FF) 3S     brachyantha_454_AGP July 2010    CSHL_v1.1
Updated     Oryza glaberrima (AA) 3S      BAC_Sanger_2009, Sep 2009        CSHL_v2.1
Updated     Oryza officinalis (CC) 3S     Officinalis_3S Sep 2009          CSHL_v2.1
Updated     Oryza punctata (BB) 3S        Punctata_3S Sep 2009             CSHL_v2.1
Updated     Oryza minuta (BBCC) 3S        Minuta_CC_3S Sep 2009            CSHL_v2.1
Updated     Oryza barthii (AA) 3S         BAC_pool_2008                    CSHL_v2.1
Unchanged   Oryza sativa ssp. Japonica    MSU 6.0                          MSU 6.0
Unchanged   Arabidopsis lyrata            Araly1.2                         Araly1.2
Unchanged   Sorghum bicolor               Sbi1                             Sbi1.4

Genome Plans 2011
Planning:
• Lycopersicon esculentum (tomato)
• Oryza glaberrima (African domesticated rice)
• Oryza brachyantha (wild rice)
• Aegilops tauschii (wheat D, NSF #0701916)
Investigating:
• Selaginella moellendorffii (basal vascular plant)
• Triticum aestivum (hexaploid wheat)
• Malus x domestica (apple)
• Glycine max (soybean) or Medicago

Collaborations: Genomes
– NSF PGI #0638820 PI Wing end 2009 (wild rice OMAP)
– USDA ARS Grape end 2009
– NSF PGI PI Buckler end 2009
– NSF 2010 #0723510 PI Nordborg end 2011 (Arabidopsis thaliana, A. lyrata, Capsella)
– NSF #0701916 PGI PI Dvorak end 2011 (wheat)
– NSF PGI PI Wilson end 2010 (maize)
– NSF PGI #0723510 PI Scanlon end 2012 (maize)
– NSF PGI PI Springer to start this year (maize)
– NSF PGI PI Wing end 2011 (wild rice OGE)
– NSF PGI #1032105 PI McCombie end 2012 (wheat)
– EBI BBSRC Paul Kersey (travel for coordination participants)
– NSF PGI PI McCouch end 2014 (rice)
– NSF XXX iPlant Steve Goff

New Maps and Markers
New maps in the last year:
• Sorghum genetic (Mace)
• Barley genetic (Close)
• Ae. tauschii genetic (Dvorak)
• Switchgrass genetic (Tobias)

More genomes in CMap
Added two more fully sequenced genomes to CMap with seq/seq comparisons based on orthology (build 32).

New SNP View
Shows functional consequences of polymorphism (new in Ensembl 56):
• Synonymous coding
• Non-synonymous coding
• Stop gain/loss
• Splice site
• UTR
• Intronic

SNP datasets:
• Rice: 160,000 SNPs x 21 varieties (incl. Nipponbare ref.) from OryzaSNP, MSU6
• Maize: 1.6 million SNPs x 27 NAM founder lines from Panzea, AGPv1
• Arabidopsis:
  – 2010 Project SNP Discovery: 637,522 SNPs x 21 ecotypes (incl. Col-0 ref.), TAIR9
  – 2010 Project 250K SNP chip genotypes v3.04: 214,000 SNPs x 1179 ecotypes, TAIR9
  – 1001 Genomes/WTCHG SNPs from dbSNP: 2.7 million SNPs, 17 ecotypes, TAIR9
• Grape: 71K SNPs (Myles et al.)

SNP BioMart
• Available for rice japonica, rice indica, Arabidopsis & grape datasets
• Configure output fields and format (XLS, CSV, TSV, or HTML)
• If HTML, link to Variation, Gene, or Browser pages
• Filter on region, phenotype, strains, id, consequence (e.g. introduced STOP codon), and other attributes

Whole Genome Alignments
BLASTZ-CHAIN-NET between 20 pairs of species
[Table "Alignment (Release)": pairwise matrix of the species listed below against the reference genomes O.jap, O.ind, S.bic, B.dis and A.tha; each filled cell gives the release (31 or 32) in which that alignment was built. The cell layout was lost in extraction.]
Species: Oryza sativa Japonica, Oryza sativa Indica, Sorghum bicolor, Brachypodium distachyon, Arabidopsis thaliana, Arabidopsis lyrata, Vitis vinifera, Populus trichocarpa, Oryza glaberrima 3S, Oryza minuta CC 3S, Oryza officinalis 3S, Oryza punctata 3S, Physcomitrella patens
Schwartz S et al., Genome Res., 2003;13(1):103-7
Kent WJ et al., Proc Natl Acad Sci U S A., 2003;100(20):11484-9
New & improved alignment viewer (Ensembl 56)

Multispecies View
Re-introduced in Ensembl 56
• Stack any number of genomes aligned to a common reference by BLASTZ
• Browse & zoom along any genome independently

Automated Detection of Split Genes
Special class of “paralog” since Ensembl 58
• Contiguous split paralog: non-overlapping, nearby (<1 Mb), same strand
• Putative split paralog: non-overlapping, different regions (e.g.
scaffolds)
Genome alignment confirms inconsistent annotation

Species                    Split Genes
Populus trichocarpa        1181
Sorghum bicolor            1087
Oryza sativa Japonica       916
Vitis vinifera              520
Oryza sativa Indica         365
Zea mays                    280
Arabidopsis lyrata          202
Arabidopsis thaliana        137
Brachypodium distachyon     101

Gene-Centered Synteny Build
2010: Implemented with automated pipeline runnables
• Release 31: monocots
• Release 32: dicots
Mapping classes:
• Compara orthologs
• Collinear mappings (DAGchainer)
• “In-range” mappings near collinear anchors

Synteny maps built (YES = map available, - = not built):

Map                        O.jap  B.dis  S.bic  A.tha  A.lyr  V.vin
Brachypodium distachyon    YES
Sorghum bicolor            YES    YES
Arabidopsis thaliana       -      -      -
Arabidopsis lyrata         -      -      -      YES
Vitis vinifera             -      -      -      YES    YES
Populus trichocarpa        -      -      -      YES    YES    YES

Grape Reference Highlights Duplicated Regions in Arabidopsis and Poplar
• Polyploid and segmental duplications manifest as cosyntenic regions
• SyntenyView links to the browser, so users can easily navigate between duplicated regions

EPO Multiple Alignment & Ancestor Reconstruction
• Gramene implementation in 2010
• Release 32: 8-way EPO alignment
  – Rice japonica, indica, Brachypodium, sorghum, Arabidopsis, A.
lyrata, grape, poplar
Paten et al (2008) Genome Research 18:1814
Paten et al (2008) Genome Research 18:1829

2010 Genomes Development: Constrained Elements
• Genomic Evolutionary Rate Profiling (GERP): measures purifying selection
• Method testing using 4-way and 8-way EPO alignments as input with varying parameters
• Input tree generated from 1301 ortholog sets
• Planning release in 2011
Cooper et al (2005) Genome Research 15:901

2010 Genomes Development

Tandem Duplicate Detection

Species         Clusters  Genes  Largest  Function
Rice japonica   2519      7054   24       phytosulfokine receptor-like (LRR-kinase receptor)
Sorghum         2182      5927   19       Chalcone-stilbene synthase like
Maize           1871      4564   22       DUF1754 (domain of unknown function)
Arabidopsis     1738      4581   28       ECA1 gametogenesis related family

• Adjacent paralogs with no more than 2 intervening unrelated genes
• Increase gene dosage
• Diversifying selection
• Often species-specific
[Figures: LRR-kinase species-specific expansions; LRR-kinase cluster in rice]

Collaboration with Ensembl Genomes
• Share conference calls
• Developers meeting (Hinxton, UK, Sept.
2010)
• Co-authored papers/posters
• Two releases
• Ensembl Developer’s Workshop

Website Improvements
• Home facelift: quick entry points
• Migrated to Apache 2.0 in Release 31

REST Interfaces
New RESTful interface for the site gives greater user control over data views and format

New Oryza Pages
• Highlights this genus with images, phylogeny, geographic origin, & traits of interest
• Entry points to browsers, germplasm, markers, & taxonomy ontology

Web Services
• Distributed Annotation Server (DAS) serving Ensembl genes as well as Gramene markers, sequences, and QTL
• Gramene Mart integration with Galaxy
• Public MySQL server
• Diversity data via Tassel and GDPC
• Subversion for code access

Browser Development 2011 Plans
• Communicate/distinguish gene-confidence information
  – 28% of MSU6 rice genes are annotated as “TE_related” and 17% are in the poorly conserved “hypothetical” class
  – 20% of sorghum genes are “low-confidence” (TE, pseudogenes, etc.)
  – Color-code or display in separate tracks in browser
  – Color-code in gene-tree display
• List/display detailed gene-level synteny information
  – Explicitly list syntenic genes from Gene Page
  – Indicate within the browser that a gene is syntenic to one or more genes of a different species (e.g. color-code or synteny track)
• List co-syntenic genes
  – Two genes (in separate blocks) that are syntenic to a common gene in another species arose from a large-scale duplication event (e.g. polyploidy or segmental duplication)
• Tandem Array track
  – Indicate clusters of paralogous genes within browser
• [Challenges of low-depth or highly fragmented genomes, e.g.
wheat & Physcomitrella]

2010 Ongoing Development Work
• miRNA pipeline runnable
  – Refine and automate steps in miRNA annotation
  – Vmatch alignment
  – mfold RNA secondary structure prediction
  – Filter based on secondary structure
• Gene-Build with RNAseq evidence data
  – First pilot experiments performed

Questions for the SAB?
• Nominate genomes
• New data types, e.g. RNAseq data available for current genomes that we may not be aware of
• Any physical aspects of the web site needing improvement

Aim 2: Pathways
Pankaj Jaiswal, PI
Palitha Dharmawardhana, Jim Thomason, Vindhya Amarasinghe, Liya Ren, AS Karthikeyan, Marcela Monaco
Note: Liya left the project this year and has been replaced by Marcela.

Aim #2 Plan (2009-2010 / Year 3)
• Continue curating Rice and Sorghum Pathways
• Release MaizeCyc and BrachyCyc
• Add all available microarray probesets to MarkerDb and allow OMICS viewer to validate
• Develop Reactome database for rice
• Update the gene database schema to structure the allele-based annotations on function, phenotype and interactions
• Maintain and develop ontologies

• Added BrachyCyc and MaizeCyc.
• Updated Pathway Tools twice to the latest versions.
• Updated the individual pathway databases twice to be consistent with the Pathway Tools version.
• Curated Rice Pathways: added hydroxycinnamic acid and serotonin biosynthetic pathways; updated auxin biosynthesis and tryptophan biosynthesis.
• Added 80 transport reactions and 477 transporters.

Suggestions from last SAB
• Concerns about supporting three technologies: Cyc, Reactome, WikiPathways.
• Suggested moving to Reactome and allowing the Cyc and WikiPathways databases to be populated by automated exports using BioPax.

Reactome Database Build
• Reactome:
  – Rice
    • Start with the RiceCyc import and build on the existing Ensembl and Curated Genedb resources
  – Arabidopsis
    • After consulting with the Reactome project and the Arabidopsis Reactome group, this will become part of the renewal effort. The work will start with integrating it into the Reactome central database from its current location at JIC (www.arabidopsisreactome.org), followed by active curation.
    • Active curation will be done primarily in collaboration with Nick Provart’s group at Univ. of Toronto.
    • This is a new international collaboration
  – The plan is to integrate the plant-specific Reactome database instances into the Reactome central database, but provide a modified user interface for users.

Rice Reactome
• Initial build of the Rice Reactome started by importing the complete (curated and predicted) RiceCyc data in BioPax level-2 format.
• A test-v2 Rice Reactome is available from this link.
  – The Reactome tools, with some tweaking, successfully imported 375 pathways and their child reactions
  – Efforts are now under way to integrate the mappings to
    • ChEBI, Ligand and PubChem for compounds/metabolites
    • KEGG for EC enzymes
    • UniProt
  – Drawing the network diagrams requires manual curation.
    • Priority is to draw networks for fully curated Rice Pathways using the Reactome tools
  – Integrate predicted models of regulatory pathways for rice based on the reference pathway projections for cell cycle, transcription, translation, etc.
  – Curate test-case rice pathways
    • Organized a week-long workshop attended by curators from Gramene and BAR-Univ. of Toronto (Nick Provart’s group)
    • Mentored by Reactome co-PI Peter D’Eustachio
    • A test case of ABA metabolism and signaling was curated, which contained both the molecular and genetic interaction datasets.

ABA metabolism and signaling pathway
Klingler et al. J. Exp. Bot. (2010) 61 (12): 3199-3210.
Reactome model: A prototype reaction network, ABA-mediated transcriptional regulation, was laid out using material from Nambara & Marion-Poll (2005 – PMID: 15862093) to supplement the pathways of ABA synthesis and catabolism available as RiceCyc templates, and the regulatory processes discussed by Xiong et al. (2002 – PMID: 11779861) (especially Figure 10) and Klingler et al. (2010 – PMID: 20522527)

Automated Cyc and WikiPathways builds
• Based on the SAB suggestions, progress has been made towards extending the annotation of the Cyc and Wiki versions of the pathway databases in an automated way.
• To take that approach, however, we have to streamline the data workflow and structure the current curated gene database as a central repository/aggregator of the necessary datasets.
• The Curated Gene database schema was restructured to hold whole-genome-based annotations on genes and alleles and their associations to function, phenotype, germplasm, pathways, gene-to-gene interactions, gene products, and gene models, besides providing cross references to sequencing project objects (like gene models from IRGSP-RAP, MSU-OSA, BGI gene models for rice O. sativa) and published literature.
• Use the aggregated datasets for automated Cyc builds using the standard Pathway Tools, and provide the BioPax and SBML dumps to the WikiPathways project for their users.
• Gramene’s focus will be pathway curation and annotation in Reactome and functional annotation in the gene database.

Outreach
• Curated rice-specific pathways and compounds contributed to the PlantCyc and MetaCyc reference pathway database projects.
• Organized Workshops
  – Community Gene Annotation Workshop at Plant Biology 2010 (July 2010)
    • Jointly organized with Plant Ontology (PO) Project.
    • Provided meeting support by way of a website portal and on-site helping hands
    • Tool development (plant configurations of the Phenote annotation tool and ontologies) and funding provided by the PO project.
    • Attended by about 35 researchers, of whom 12 were awarded travel support by PO.
  – Reactome workshop at CSHL, 25-29 October 2010
    • Attended by Gramene and BAR curators
    • Mentored by Reactome database (Peter D’Eustachio)
    • Hands-on curation of a test-case pathway.
    • Analysis of RiceCyc import and current Reactome annotation tools.
    • Development of curation strategy and annotation guidelines.

Plans for 2010-2011
• Release Rice Reactome
• Release the curated gene database in its new form as an aggregator of gene information
• Integrate microarray probeset mappings in OMICS validator for non-rice pathways
• Conduct the gene and pathway annotation outreach workshops.
• Develop test cases for the upcoming renewal and strategies for analyzing large-scale datasets generated by NextGen technologies in transcriptomics and metabolomics.
• Maintain the current Cyc-based pathway views and upgrade to v14.5 and later of Pathway Tools (Ptools)

Pathway Collaborations
• MetaCyc/BioCyc (Peter Karp)
• Reactome (Lincoln Stein, Peter D’Eustachio)
• Arabidopsis Reactome (Nick Provart, Henning Hermjakob)
• PlantCyc (Sue Rhee)
• SolCyc and Solanaceae Genome Network (Lukas Mueller)
• Phenote curation tool (Nomi Harris, Suzi Lewis)
• Ontologies (GO, PO, OBO)
• BrachyBase (Todd Mockler)
• Sorghum Biofuel and Bioenergy Project (John Mullet)
• MaizeSequence.org
• MaizeGDB
• Maize Pathways (Andrew Hanson)
• C3-C4 project (Tim Nelson, Tom Brutnell, Chris Myer, R. Bruskiewich)
• WikiPathways
• Expression data (Todd Mockler, Tim Nelson, Tom Brutnell)

Questions for SAB?
• Nominate pathways
• Types of analysis users are interested in
• Potential collaborators (national and international)

Aim 3: Gramene Diversity Module
Susan McCouch & Edward Buckler, PIs
Terry Casstevens, Genevieve DeClerck, Charles Chen, AS Karthikeyan, Jon Zhang, Qi Sun, Ken Youens-Clark.

Suggestions from last year
• Integration with key tools
  – We provide a new SNP query tool, web-launched Tassel, and downloads that work with Flapjack, in formats like Plink, HapMap, etc.
• How about genotype storage?
  – Implemented BLOBs to store SNPs

New Data Sets
• Arabidopsis
  – Atwell et al. Genotype, phenotype, association data. ~214,000 SNPs, 199 germplasm, 107 phenotypes.
• Rice
  – Zhao et al., PLoS, May 2010, "1536 Assay": 1311 SNPs x 395 varieties, mapped to MSU6.0
  – Gross B, et al., Mol Ecol., Aug 2010: SNP diversity study from PG
• Maize
  – dbSNP IDs and AGPv2 coordinate update for the current dataset (1.6 million SNPs x 27 NAM lines)

[Screenshot slides: Web Interface – SNP Query; Downloads; Tassel; GWAS Visualization]

Tassel Development
• New data structure significantly improving memory efficiency
• Alignment viewer
• User-friendly “wizards”
• Progress monitoring with ability to cancel tasks
• Import/export Hapmap, Flapjack, Plink data formats
• Auto-loading and analysis execution from web site startup
• GLM and MLM:
  – GLM interface simplified.
  – Compression and faster P3D implemented for MLM, resulting in reduced runtime.
  – Matrix algebra library wrapper written to make switching to newer, faster libraries easier.
  – EJML matrix algebra library interface implemented.
• Tassel 3.0 Pipeline
  – Automates complex loading/analysis pipelines
  – Doesn't need Java coding to create
  – Has simultaneously executing pipeline segments
  – Works from web site launch, command line, and GUI

[Workflow diagram: comparative candidate gene enrichment]
Selection of candidate genes
- Experimental evidence (from other species, e.g.
Arabidopsis)
- Ontology terms
Compara pipeline
Prior-candidate genes
- Coordinates of the genes
- Functional implication or annotations
Hapmap SNP information
- SNP positions
- Linkage disequilibrium estimates (r2)
Linkage block size calculations
GWAS associations
- Associated SNP map positions
- p-values
Enrichment score calculations
Functional implications

Linkage block size for the ith prior candidate gene is given by:

$B_i = Q_{0.95}\{d_{i1}, d_{i2}, d_{i3}, \ldots, d_{ix}\}$

where $Q_{0.95}$ is the 95% quantile and $d_{i1}, d_{i2}, \ldots, d_{ix}$ are the map distances from the SNP loci in the gene to other loci on the same chromosome that are in perfect LD ($r^2 = 1.0$).

For the ith prior candidate gene, the enrichment score, Ei, is calculated from the weighted hypergeometric probability of observing gi significant associations in the linkage block Bi, given the number of SNPs xi located in the block and the total number Gt of SNP loci on the chromosome. (The enrichment-score slide writes these as Sei and Nt.)

Functional implication of prior candidate genes by statistically significant overrepresentation of association signals
Example: days-to-silk flowering time associations on maize chromosome 8
- Maize first-generation hapmap: 1.6 M SNPs across all chromosomes, 136,119 SNPs on chromosome 8
- GWAS associations for the flowering time trait days-to-silk on chromosome 8: 144 associations (p-values < 1e-6)
- Curated Arabidopsis flowering time candidate genes: 274 genes in total
- Compara orthology of maize homologs to Arabidopsis flowering time candidates: 74 prior candidate genes
- Linkage disequilibrium estimates (r2) from 136,119 SNPs, filtered with MAF > 0.05
- Genetic distances calculated from each maize candidate gene to the 144 GWAS associations
- Genetic distances of every pair of SNP loci in perfect LD (r2 = 1.0)

Linkage block size calculations
[Figure: empirical cumulative probability distribution of genetic distances estimated from the SNP loci that are in perfect LD; x-axis: genetic distance of SNP loci (0 to 0.8 Mb); y-axis: probability; the 95% quantile gives a linkage block size of 105,387 bp]
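
The linkage block size defined above, the 95% quantile of distances between SNP loci in perfect LD, can be sketched in a few lines. This is a minimal illustration rather than the project's implementation: the function names, the linear-interpolation quantile rule, and the example distances are all assumptions made here.

```python
import math


def quantile(values, q):
    """Return the q-th quantile (0 <= q <= 1) using linear interpolation.

    The interpolation rule is an assumption; the slides do not say which
    quantile estimator was used.
    """
    if not values:
        raise ValueError("need at least one distance")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)          # fractional index into the sorted list
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])


def linkage_block_size(perfect_ld_distances_bp, q=0.95):
    """B_i: the q-quantile of distances (bp) from a gene's SNPs to loci in perfect LD."""
    return quantile(perfect_ld_distances_bp, q)


# Hypothetical distances (bp) for one candidate gene; not data from the slides.
example_distances = [1_000, 5_000, 12_000, 40_000, 105_000, 800_000]
block_size = linkage_block_size(example_distances)
print(f"linkage block size: {block_size:.0f} bp")
```

Taking a high quantile rather than the maximum keeps a few very distant loci in perfect LD from inflating the block.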
Enrichment score calculations
Enrichment score for the ith gene: suppose GWAS identifies Mt SNPs significantly associated with flowering time variation among the Nt total SNPs on a given chromosome. The enrichment score (Sei) is based on the probability of observing gi significant GWAS associations, weighted by p-values, within a linkage block.

$S_{e_i} = \log_{10} \dfrac{\binom{M_t}{g_i} \times \binom{N_t - M_t}{x_i - g_i}}{\binom{N_t}{x_i}}$

where
Mt: total number of significant GWAS SNPs on a given chromosome
Nt: total number of SNPs on a given chromosome
gi: number of significant GWAS SNPs in the defined window
xi: number of SNPs in the defined window
Sei: enrichment score of the ith maize flowering time candidate gene

[Figure: log10 of odds of maize flowering time prior candidate genes (y-axis, 0 to 20) plotted per gene for chromosomes 2, 3 and 8; labeled peaks: FT maize homolog, AGL79 maize homolog, GI maize homolog, TOC1 maize homolog, rap2.7 AP2 maize homolog; threshold line at LOD = 2*]
* Probability of the null hypothesis is assessed by randomizing the association results with respect to the SNP positions, without changing the number and strength of association signals.
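
The enrichment score above is the log10 of a hypergeometric probability, and it can be sketched directly from the binomial coefficients. This is a hedged illustration: the function names are hypothetical, the p-value weighting mentioned on the slide is not part of the written formula and is left out, and the sign is kept as written (the plotted "log10 of odds" may use a different sign or scaling). The window counts in the example are made up; only Nt = 136,119 and Mt = 144 come from the chromosome 8 example.

```python
import math


def log10_comb(n, k):
    """log10 of the binomial coefficient C(n, k), via log-gamma to avoid huge integers."""
    if k < 0 or k > n:
        return float("-inf")              # C(n, k) = 0 outside 0..n
    ln = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return ln / math.log(10)


def enrichment_score(m_t, n_t, g_i, x_i):
    """Unweighted Se_i = log10[ C(Mt, gi) * C(Nt - Mt, xi - gi) / C(Nt, xi) ].

    m_t: significant GWAS SNPs on the chromosome
    n_t: all SNPs on the chromosome
    g_i: significant GWAS SNPs inside the linkage block
    x_i: all SNPs inside the linkage block
    """
    if not (0 <= m_t <= n_t and 0 <= x_i <= n_t):
        raise ValueError("counts must satisfy 0 <= Mt <= Nt and 0 <= xi <= Nt")
    return (log10_comb(m_t, g_i)
            + log10_comb(n_t - m_t, x_i - g_i)
            - log10_comb(n_t, x_i))


# Nt and Mt are from the chromosome 8 example; xi and gi are hypothetical.
score = enrichment_score(m_t=144, n_t=136_119, g_i=3, x_i=60)
print(f"Se_i = {score:.2f}")
```

Working in log space keeps the calculation stable for chromosome-scale SNP counts, where the binomial coefficients themselves would be very large integers.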

Plans - Rice
• Rice Diversity 44K chip: ~39,000 SNPs, 400 rice lines, phenotype data for 23 traits (Build 33)
• Rice SNP Consortium 1M chip data (Build 34)
• Curate key large GWAS results

Plans - Maize, Arabidopsis
• Maize Diversity/Panzea: 56 million SNPs x 104 maize lines (Build 33)
• Phenotypic data for an additional 1020 traits (depending on publication acceptance rate)
• Additional data from Arabidopsis 2010 Project
• Curate key large GWAS results

Diversity Collaborations
• Rice:
  – McCouch (#0606461, #1026555)
  – Wing (#1026200)
  – Purugganan (#0701382)
  – Olsen (#0638820)
• Arabidopsis: Nordborg (#0723510)
• Maize: Buckler (#0820619)

Plans - Software
• Google Web Toolkit for association data viewer
• SNP Query: additional features
• TASSEL
  – Flapjack integration: work with SCRI to create seamless connectivity between the two applications
  – Complete support for heterozygous data
  – Greater JUnit testing (regression testing)
  – Automated MLM/GLM association analysis
  – New graphical displays (e.g. Manhattan plot)
  – Improvements to kinship calculations, imputation function
• Functional implications from GWAS associations: develop web-based interface for statistical method

Plans - Comparative GWAS
• Develop web-based interface for comparative candidate gene enrichment system.

Diversity Questions for the SAB
• What should happen to diversity data in the renewal?
  – Large projects such as SeeD (CIMMYT), Wheat/Barley CAP, GRIN-Global will likely go to new standards
• What needs to be done to transition?

Aim 5: Outreach
Everyone

Tutorials
OpenHelix’s Gramene tutorial went live at the end of March 2010. It includes a self-run tutorial as well as PowerPoint slides, handouts, and exercises. As of Sept. 7, in the five months it had been available, the landing page had received 305 views, with 36 viewings of the tutorial.
Five new Gramene-produced tutorials, such as this one on pathways.

Meetings and Presentations
– Presentations
  • PAG
  • Rice Technical Working Group
  • Maize conference
  • International Symposium on Integrative Bioinformatics
  • Evolution
  • ISMB
  • Genome Informatics
  • Agronomy, Crop and Soil Sciences Meeting
– ASPB curation workshop with hands-on exercises
– Other:
  • Gramene Retreat (CSHL, June 2010)
  • Plant Ensembl developers meeting (Hinxton, Sept. 2010)
  • Plant Reactome training workshop (CSHL, Oct. 2010)
  • Ken and Jim served as TAs for a bioinformatics course (CSHL, Oct. 2010)

Letters of Support
• Wise/Dickerson, NSF-PGRP TRPGR: NextGen PLEXdb (0543441)
• Ana Caicedo (UMass), The evolutionary genomics of invasive weedy rice (0638820)
• Rod Wing, CPGS: Oryza Genome Evolution (1026200)
• Dick McCombie, CPGS: Gene Discovery in Wheat (1032105)
• Carolyn Lawrence, NSF-PGRP GERP: Functional Structural Diversity Among Maize Haplotypes (0743804)
• Steven Briggs, TRPGR: Discovery, revision, and validation of maize genes by proteogenomics (0924023)
• Matt Vaughn, Epigenetic Variation in Maize (0922095)

Publications
• “Gramene database in 2010: updates and extensions” (Youens-Clark, et al.) Nucleic Acids Research, 2010, 1–10 doi:10.1093/nar/gkq1148.
• “Fine Quantitative Trait Loci Mapping of Carbon and Nitrogen Metabolism Enzyme Activities and Seedling Biomass in the Intermated Maize IBM Mapping Population.” (Zhang, Chen, Buckler, et al.) Plant Physiology, in press.
• “Gramene database: a hub for comparative plant genomics.” (P Jaiswal). Methods Mol Biol. 2011;678:247-75. (invited book chapter)
• “Applications and methods utilizing the Simple Semantic Web Architecture and Protocol (SSWAP) for bioinformatics resource discovery and disparate data and service integration.” (Nelson et al.) BioData Min. 2010 Jun 4;3(1):3.
Coming Up:
• “Gramene GeneTrees: A comprehensive database of phylogenetic trees in plants and other model Eukaryotes” (Plant Phys)
• RiceCyc
• Diversity
• Genome sequence analysis

Plant Ensembl Collaboration
• Lead: Will
• EBI Participants: Paul Kersey, Paul Derwent, Dan Staines, Andy Yates
• Gramene Participants: Will Spooner, Doreen Ware, Aaron Chuah, Shiran Pasternak, Sharon Wei

Plant Reactome Curators Meeting
Pankaj Jaiswal and Marcela Monaco organized an intensive five-day meeting (October 25-29) at CSHL with Peter D'Eustachio of New York University to learn how to use the Reactome model and software to curate plant pathways. Other participants included Vindhya Amarasinghe (OSU), Palitha Dharmawardhana (OSU), and Hardeep Nahal (Univ. of Toronto).

• Development work on visualizing annotations from DNA Subway within Gramene’s Ensembl views
• Contribution of reference genomes for high-throughput sequencing

Web Usage and Stats

Page Requests by Year per Month, 2001 - 2010
[Chart]

Explanation of drop in web usage
Prior to release 29, Gramene was experiencing problems from abusive spidering of our development site by web search crawlers. As a consequence, all indexing was disabled in that site's “robots.txt” file. Through an error in the release process, this file was copied to the live server, where it refused access to search engines. This explains the severe drop in usage by casual users who find Gramene through Internet searches. The problem has been fixed, and usage appears to be climbing again.

3-year Perspective
[Chart]

Top Countries - Visits%, Nov 2009 – Nov 2010
[Charts: Duration of Visit; Depth of Visit; Visitor Loyalty]

Thanks, from Gramene

End