(i) RESPONSE TO PREVIOUS REVIEW

Tree Fruit GDR: Translating Genomics into Advances in Horticulture

(ii) EXECUTIVE SUMMARY

PROJECT TITLE

Tree Fruit GDR: Translating Genomics into Advances in Horticulture

PROJECT TYPE

Standard Research and Extension Project

MANDATED FOCUS AREAS ADDRESSED IN THIS PROJECT

Focus Area 1: Research in plant breeding, genetics, and genomics to improve crop characteristics (60%);

Focus Area 2: Efforts to identify and address threats from pests and diseases (30%)

Focus Area 3: Efforts to improve production efficiency, productivity and profitability over the long term (10%)

PROGRAM STAFF

Project Director

1. Doreen S. Main - Associate Professor of Bioinformatics, Department of Horticulture and Landscape Architecture, Washington State University, 48 Johnson Hall, Pullman, WA 99164-6414.
Email: dorrie@wsu.edu

Co-Project Directors

2. Sook Jung - Assistant Research Professor of Bioinformatics, Department of Horticulture and Landscape Architecture, Washington State University, 48 Johnson Hall, Pullman, WA 99164-6414.
Email: sook@bioinfo.wsu.edu

3. Cameron Peace - Assistant Professor of Tree Fruit Molecular Genetics, Department of Horticulture and Landscape Architecture, Washington State University, 39 Johnson Hall, Pullman, WA 99164-6414.
Email: cpeace@wsu.edu

4. Katherine Evans - Associate Professor of Tree Fruit Breeding, Dept. of Horticulture and Landscape Architecture, 1100 N. Western Ave., Wenatchee, WA 98801-1230.
Email: kate_evans@wsu.edu

5. Nnadozie Oraguzie - Associate Professor of Tree Fruit Breeding, Dept. of Horticulture and Landscape Architecture, Washington State University, 24106 N Bunn Rd, Prosser, WA 99350-8694.
Email: norguzie@wsu.edu

6. Albert G. Abbott - Coker Endowed Chair and Professor of Plant Genetics, Department of Genetics and Biochemistry, Clemson University, 103 Jordan Hall, Clemson, SC 29634.
Email: aalbert@clemson.edu

7. Desmond Layne - Associate Professor of Pomology and Extension Fruit Specialist, Department of Horticulture, Clemson University, 165 Poole Agricultural Center, Clemson, SC 29634.
Email: dlayne@clemson.edu

8. Fred Gmitter - Professor of Citrus Breeding and Genetics, University of Florida, Citrus Research and Education Center, 700 Experiment Station Rd., Lake Alfred, FL 33850.
Email: fgmitter@ufl.edu

9. Lukas Mueller - Assistant Scientist of Bioinformatics and Genomics, the Boyce Thompson Institute for Plant Research, Ithaca, NY 14853-1801.
Email: lam87@cornell.edu

Other Key Personnel

10. Mercy Olmstead - Fruit Extension Specialist, Dept. of Horticulture and Landscape Architecture, Washington State University, 24106 N Bunn Rd, Prosser, WA 99350-8694. University of Florida Fruit Extension Specialist from 07/01/09.
Email: molmstead@wsu.edu

11. Gary R. Brown - Director of the Center for Teaching, Learning & Technology, Smith Center for Undergraduate Education, Washington State University, Pullman, WA 99164-4550.
Email: browng@wsu.edu

12. Randall Svancara - System/Database Administrator, Dept. of Horticulture and Landscape Architecture, Washington State University, 48 Johnson Hall, Pullman, WA 99164-6414.
Email: rsvancara@wsu.edu

13. Tetyana Zhebentyayeva - Fruit Molecular Biologist Researcher, Dept. of Genetics and Biochemistry, Clemson University, 103 Jordan Hall, Clemson, SC 29634.
Email: tzhebe@clemson.edu

14. Chunxian Chen - Assistant-In Citrus Genomics, University of Florida, Citrus Research and Education Center, 700 Experiment Station Rd., Lake Alfred, FL 33850.
Email: cxchen@crec.ifas.ufl.edu

STAKEHOLDER NEEDS ADDRESSED

Rising labor, energy, water, and land costs, rapid spread of pests and disease, together with consumer demand for high quality and environmentally safe products, provide real threats to the global competitiveness of U.S. tree fruit production. Meeting these challenges requires crop improvement – development and adoption of genetically improved cultivars as well as better management of existing cultivars. This project will provide an integrated online knowledgebase of genomics, genetic, breeding, and cultivar performance data which will facilitate research to discover genes underlying important agricultural traits and to develop markers for marker-assisted breeding, and enhance critical decision-making by breeders and growers.

OUTREACH PLAN

Extension activities (Objective 3) will enable growers, breeders, research scientists, and extension specialists to utilize the databases developed and access the data portal created. We will provide informal jamborees held around industry field days and meetings to facilitate two-way information exchange, host “train the trainer” workshops at national conferences, and disseminate successful research stories of genomic discoveries and modern integrated crop improvement that impact consumers.

POTENTIAL ECONOMIC, SOCIAL, AND ENVIRONMENTAL BENEFITS

Knowledge not used is useless, and information not shared is doomed to obscurity. The mission of tfGDR, gathering information from those who generate it and analyzing and interpreting it for those who need it, will provide significant economic, social, and environmental benefits for tree fruit production in the U.S. The integration of genetic, genomic and breeding data will enable researchers to formulate and test new models and theories. The results of this research will improve our understanding of the fundamental biology underlying the important traits of tree fruit crops, and help develop tools for generating improved cultivars. The use of a knowledgebase of “breeder tools” will accelerate development of new cultivars with superior genetic attributes that meet grower and consumer needs (disease and pest resistance; superior fruit quality). The provision of a database of cultivar performance will allow growers to choose cultivars that best meet their particular needs (local climate and soil, resource sustainability, harvest timing, consumer preferences), with efficient management to achieve and maintain genetic potential.

STAKEHOLDER ENGAGEMENT THROUGHOUT THE PROJECT

We have engaged stakeholders throughout the development of the previous submission and this resubmission, as documented by the attached letters from industry (Appendix 2). Engagement included attending many Citrus and Rosaceae industry meetings where we illustrated what the existing databases can do and what they could do with additional investment. A grower directed us to the peach cultivar performance database at Clemson University, resulting in its inclusion in this project. Research scientists and growers are integral to database development and use, and their participation and feedback will be frequently solicited. To this end we have an Advisory Panel that includes industry representatives, research scientists, and communication experts, and we are offering workshops and jamborees where participants and the Panel can provide in-person feedback on the project in addition to interim web- and phone-based communications.

(iii) INTRODUCTION

In a time when innovative structural, functional, and comparative genomics projects are generating vast amounts of data for Rosaceae and Citrus crops, a common computational infrastructure is critical for collecting, integrating, and translating this diverse information into a knowledgebase serving gene discovery, marker-trait identification, genomics-assisted breeding, and genetic manipulation for accelerated cultivar development, evaluation and adoption. Focusing on Citrus and Rosaceae, and in collaboration with Solanaceae, we will develop a standardized, integrated, user-friendly specialty crop database platform for tree fruit – Tree Fruit Genome Database Resources (tfGDR) – to facilitate efficient collection, analysis, and translation of genomics, post-genomics, genetics, and cultivar performance data into solutions for tree fruit industry production and processing problems.

OBJECTIVES

1. Collect, analyze, and integrate genomics, genetics and breeding data to facilitate gene discovery and marker-trait associations.

2. Provide data-mining resources of genomic, genetic, breeding and cultivar performance data for breeders and growers to expedite development and adoption of new and existing cultivars.

3. Extend community outreach in genomics, genomics-assisted breeding and crop management.

BACKGROUND

Edible products from Citrus (orange, grapefruit, tangerine, mandarin, lemon, lime, and pummelo) and Rosaceae (almond, apple, apricot, blackberry, cherry, nectarine, peach, pear, plum, raspberry and strawberry) have major economic and nutritional value and impact in the U.S. Constituting much of the economic backbone for many rural communities, the production value for Rosaceae and Citrus crops in 2007 amounted to $12.7 billion, representing 67% of U.S. fruit and nut crop production and 8.9% of all U.S. crop production (NASS, 2008; Shulaev et al., 2008). Rosaceous and Citrus fruits consumed in multiple forms, including fresh, dried, juice and other processed products, provide unique and valuable contributions to consumers’ dietary choices and overall health. They are a major source of antioxidants and other cancer and heart disease inhibiting compounds (Macheix et al., 1991; Swanson, 1998; Mazur et al., 2000; Schieber et al., 2001; Yau et al., 2002; Whitman et al., 2005; Gorinstein et al., 2006). The importance of improving fruit and nut products will continue to grow as the focus in agricultural research continues to highlight and publicize the dietary value of foods.

The economic viability of the U.S. tree fruit industries is now under serious threat. Rising labor, energy, water and land costs, rapid spread of pests and disease, and consumer demand for high quality and environmentally safe products pose real challenges to our global competitiveness (WTFRC, 2009; Shulaev et al., 2008; Talon and Gmitter, 2008). Of most immediate concern is the Citrus industry, where the arrival of Huanglongbing (HLB), also known as Citrus greening, threatens its very survival. Believed to be caused by Candidatus Liberibacter and vectored by the Asian citrus psyllid, HLB is spreading rapidly throughout the world’s citrus growing regions. It occurs throughout south and east Asia, in Africa, and recently in the Americas. It was found in São Paulo, Brazil in 2004 and one year later in Florida. It is now widespread in Florida, where some citrus orchards are already completely infected. Although there are varying degrees of tolerance to the different strains that exist, there are no sources of true genetic resistance or immunity yet known; for most Citrus trees, HLB infection results in decline and certain death. The perceived severity of this threat is underlined by the investment of almost $27 million in the past two years alone by the Florida Citrus industry on research projects attempting to generate solutions to the rapid and devastating spread of the disease. Among those projects funded is an investment in sequencing Citrus genomes, to provide tools for attempting to understand the genetic mechanisms underlying the disease process and devise a long-term genetic solution through the development of resistant cultivars.

Overcoming these challenges requires the development of genetically improved cultivars and their successful adoption in industry production. Improving tree fruit crops through conventional breeding is not an optimal solution as many of these tree fruit species are characterized by long generation intervals, protracted evaluation times, high costs of breeding inputs, and slow and irreversible maturation. The application of genomics tools, through marker-assisted breeding (MAB) and the identification of critical genes that can be manipulated by non-hybridization methods, will be a major contribution to overcoming many of these constraints inherent in tree fruit breeding. Current research in genomics and genetics is generating large-scale datasets (Table 1) that require access to bioinformatics capabilities for analysis, archiving, integration, and interpretation. Translating this data into knowledge that can be converted to wisdom by experts in genomics, genetics, breeding, and production necessitates the development of a robust and high-throughput computational and database platform, which is the central mission of this proposal.

The post-genomics era of research is focusing on studies to attribute functions to genes and describe regulatory networks controlling natural pathways of metabolism, protein synthesis, and signal transduction. To facilitate the analysis of post-genomics experiments, new concepts must be developed for linking the vast amount of raw data to a biological context. Fortunately we do not have to start from scratch, as many existing plant community databases provide excellent examples of resources that enable basic, translational, and applied research. For agricultural crops, MaizeGDB (Lawrence et al., 2008) is widely considered to be the most advanced database that translates basic data right through to practical crop improvement. Within the specialty crops, the Solanaceae and the Rosaceae have major clade-type databases available. The Solanaceae Genomics Network (SGN, Mueller et al., 2005) serves the tomato, potato, pepper, and eggplant community (Co-PD Mueller is PD of SGN), while the Genome Database for Rosaceae (GDR, Jung et al., 2008) serves the almond, apple, apricot, blackberry, cherry, nectarine, peach, pear, raspberry, rose and strawberry communities (PD Main is PD of GDR). Despite the importance of the Citrus industry and the availability of significant genomic resources (Table 1), there is no central repository for Citrus genomics and genetics data. While the Citrus industry and research community recognize the need for such a database (see letter from Batkin), to date the cost of creating such an entity has been prohibitive.

The Genome Database for Rosaceae (GDR, www.bioinfo.wsu.edu/gdr, Figure 1) was funded by the NSF in 2003 (Award # 0320544) to proposal PDs Main, Jung, and Abbott, just as the Rosaceae genomics community was starting to generate significant amounts of EST and mapping data. This funding was very timely for the community, allowing access to integrated structural and functional data as it was generated. NSF- and USDA-sponsored workshops organized by GDR investigators brought together representatives from almost all major research groups working on Rosaceae worldwide, ensuring community participation in database development. This database is credited with facilitating significant community collaborations and reducing redundancy of effort. GDR is currently the sole resource for comprehensive data submission, retrieval, and comparative genomics analysis for Rosaceae. GDR contains comprehensive data for the genetically anchored peach physical map, annotated EST databases of apple, peach, almond, cherry, rose, raspberry and strawberry, Rosaceae genetic maps and markers, molecular diversity data, and all publicly available Rosaceae whole genome DNA sequences. These data can be accessed via search/browse/download pages and graphical interfaces. Online analytical tools and services, based on high-throughput computational methods and manual curation, enable the research community to interrogate this information, thus catalyzing discovery. Various community and project pages have also been developed to facilitate communication among Rosaceae researchers. Our team is currently completing the annotation of the peach genome for community dissemination via GDR. GDR is functionally associated with numerous projects and databases. GDR has been a highly accessed research community resource: over the last twelve months more than one million pages were accessed (excluding hits from search engines) by researchers in 44 countries.

Figure 1: Genome Database for Rosaceae

There is a need, however, to expand GDR to collect and integrate more data to be directly utilized by breeders and industry members. A very valuable resource would be an expanded GDR which includes industry-level performance data, phenotypic and genotypic data from breeding and research trials, whole genome sequences and biochemical pathway data. The integration of Citrus within the framework of GDR to create Tree Fruit Genome Database Resources (tfGDR) makes very practical sense given the significant ties between both industries and research communities. Using our new Cacao Genome Database web framework (www.cacaogenomedb.org) we have already begun to model the web interface for the Citrus project (www.bioinfo.wsu.edu/citrus, Figure 2) using the web content management system Drupal.

Figure 2: Citrus Genome Database Design

The Solanaceae Genomics Network (SGN, Mueller et al., 2005) is the other major clade-type (that is, of multiple related species) repository that exists for Specialty Crops. Including tomato, potato, pepper, eggplant, tobacco, and coffee, it is a more advanced database than GDR; it already includes whole genome sequence for tomato and has started to develop tools for breeders. Following a two-day meeting at Cornell in February, the two teams have joined forces here to co-develop a standard platform for Specialty Crops. To this end, co-PD Mueller is responsible for the common pathways component and code uniformity. While utilizing most of their underlying table structure, we will take the lead in the co-development of resources for breeders and growers, and also in utilization of the content management system Drupal. We will therefore leverage the combined expertise of both our groups to reduce redundancy of effort in the development of our tree fruit database.

Stakeholders have been involved throughout the development of both this submission and the last. This is well documented in the letters provided by the California Citrus Research Board, the Florida Citrus Industry Research Coordinating Council, Citrus Research Board and the National Citrus Research Council, the Washington Tree Fruit Research Council and the South Carolina Peach Council (Appendix 2). Engagement with our grower stakeholders included attending many Rosaceae and Citrus industry meetings and national workshops where we illustrated the importance of this database for fundamental discovery and genomics-assisted crop improvement.

The South Carolina Peach Board has part-funded the Peach Cultivar Performance Database (http://www.clemson.edu/hort/peach/index.php?p=73) developed by co-PD Layne at Clemson University, and the Washington Tree Fruit Research Commission is funding an apple and cherry cultivar performance database (PD Hoheisel is a collaborator on this proposal) and the development of an Apple and Cherry Breeders Toolbox through PD Main. These two component features will serve as templates for the other tree fruit crops to be included in Tree Fruit GDR. Further, the Florida Citrus Production Research Advisory Council (FCPRAC) has invested $1.6 million for citrus genome sequencing and gene expression studies since 2007, and an additional $1.3 million in the comprehensive genetic improvement program in the same time. Co-PD Gmitter is the PI for these FCPRAC projects. Breeders and growers are integral to database development and use, and their participation and feedback will be frequently encouraged. To this end we have an Advisory Panel of eight individuals, including four industry representatives split between Citrus and Rosaceae. We are offering workshops and jamborees where participants and the Panel can provide in-person feedback on the project in addition to interim web- and phone-based communications.

The multi-disciplinary team of investigators is well suited for this project. It includes:

1. Dr. Doreen Main - plant bioinformatician, PD of GDR and the Cacao Genome Database, permanent member of the Rosaceae Genomics, Genetics and Breeding Executive Committee (RosEXEC).

2. Dr. Albert Abbott - tree fruit molecular biologist and co-PD of GDR, former chair of both RosEXEC and the international Rosaceae Genomics Initiative (RosIGI).

3. Dr. Fred Gmitter - an expert in Citrus breeding and genetics, current leader of the International Citrus Genome Consortium, and chair of the Citrus Crop Germplasm Committee of the USDA’s National Plant Germplasm System.

4. Dr. Lukas Mueller - PD of Solanaceae Genomics Network and expert in bioinformatics and genomics.

5. Dr. Sook Jung - plant bioinformatician and co-PD of GDR and WTFRC “Tree Fruit Breeders Toolbox”.

6. Dr. Cameron Peace - a tree fruit molecular geneticist working closely with tree fruit breeding programs, Chair of RosEXEC, and chair of the Prunus Crop Germplasm Committee of the USDA’s National Plant Germplasm System.

7. Dr. Katherine Evans - an expert in tree fruit breeding with a molecular biology doctorate.

8. Dr. Nnadozie Oraguzie - an expert in tree fruit breeding with expertise in statistical genetics.

9. Dr. Desmond Layne - an expert in pomology and tree fruit extension, and creator of the peach cultivar performance database.

10. Other key personnel include Dr. Mercy Olmstead (a fruit extension specialist and expert at web-enabled communications), Dr. Gary Brown (Director for the Center for Teaching, Learning and Technology at WSU), Randall Svancara (computational specialist on GDR), Dr. Chunxian Chen (a Citrus bioinformatics and genomics expert), and Dr. Tetyana Zhebentyayeva (an expert Prunus Molecular Biology Researcher).

Members of our Advisory Board Panel

1. Dr. James McFerson (Washington Tree Fruit Research Commission, Manager).

2. Mr. Ted Batkin (California Citrus Board Chair and Citrus Grower).

3. Mr. Chalmers Carr III (Member of the South Carolina Peach Council and Peach Grower).

4. Dr. Mikeal Roose (UC Riverside, Citrus Breeder and Geneticist).

5. Dr. Amy Iezzoni (Michigan State University, Cherry Breeder and Geneticist).

6. Dr. Pankaj Jaiswal (Oregon State University, PD Plant Ontology and co-PD GRAMENE).

7. Dr. Mary Lu Arpaia (UC Davis, Extension Subtropical Horticulturist).

8. Mr. John Jackson (Florida Citrus Industry Research Coordinating Council, Director).

(ii) RATIONALE AND SIGNIFICANCE

Focus Areas Addressed: This proposal addresses SCRI Focus Area 1 (60%) “Research in plant breeding, genetics and genomics to improve crop characteristics”, as the analysis and integration of genomics, genetics, and breeding data into a knowledgebase accessible to geneticists, genomicists, breeders and growers will add value to the data, produce models and theories, and enable the results of numerous research projects to be translated into improvement of a wide range of crop characteristics. Focus Area 2 (30%) “Efforts to identify and address threats from pests and diseases” is addressed as access to the integrated genomics, genetics, and breeding data through our proposed activity will aid in identifying markers or target genes for developing disease resistant cultivars. Focus Area 3 (10%) “Efforts to improve production efficiency, productivity and profitability over the long term” is addressed as breeders develop more productive and consumer-desired fruit cultivars, and as industry access to improved cultivars coupled with local, long term performance data will facilitate more informed decisions on optimal cultivar choice and management for growing, handling, processing, and marketing needs, leading to improved production efficiency, productivity and profitability over the long term.

Challenges to Tree Fruit Productivity and the Need for Marker-Assisted Breeding: Tree fruit growers belong to well-organized communities that both understand and have clearly articulated to researchers and government agencies their critical needs regarding profitable and sustainable tree fruit production. Current and future threats from pests and disease demand the development of genetically improved cultivars that are resistant to known threats while also producing high quality fruit that meets consumer demand for fresh, dried, and juice products (WTFRC, 2009; Shulaev et al., 2008; Talon and Gmitter, 2008). The development of genetically improved cultivars that minimize or overcome these challenges, coupled with successful adoption of these superior cultivars in their growing programs, will meet the most critical needs of tree fruit growers. For the reasons outlined in the introduction, conventional breeding has not been an optimal solution for many of these tree fruit species. In Citrus, for example, nearly all of the major scion and rootstock cultivars utilized have not arisen as a consequence of systematic and targeted breeding programs. Rather, they appeared spontaneously as seedling and/or bud sport mutations, or by introduction and field trials of materials from one location to another (Soost and Roose 1996). The reasons for this are related to the peculiarities of Citrus reproductive biology and unique aspects of the taxonomic relationships of the major cultivar groups (Gmitter et al., 1992). Citrus seedlings are subject to juvenile periods ranging from one to as many as 20 years, and even after first flowering, it is common for fruit traits to be atypical of later characteristics as scion lines mature. Juvenility results in delay between hybridization and selection for desired characteristics, imposes great costs associated with land and time, and minimizes the numbers of families or individuals that can be grown. Many citrus types produce polyembryonic seeds through nucellar embryony, yielding seedlings that are juvenile clones of the maternal parent. Several of the so-called “species” of economic significance (e.g., orange, C. sinensis; grapefruit, C. paradisi; and lemon, C. limon) are not biologically defined species; the cultivars in these groups represent accumulated somatic mutations identified over centuries through on-tree or nucellar seedling mutations (Chen et al., 2007). Finally, there has been little information available until recently on the inheritance and genetic control of important characteristics. The identification of critical trait-associated genes coupled with the application of genomic tools and technologies in marker-assisted breeding will help overcome many of these constraints inherent to tree fruit breeding. Although more genetics and genomics resources have been developed for Rosaceae than for Citrus, marker-assisted breeding has not yet been sufficiently developed or exploited in either Rosaceae or Citrus improvement programs.

The potential of MAB to increase the efficiency of tree fruit breeding is enormous. Genetic marker screening during the juvenile period can predict later performance and allow culling of inferior seedlings prior to field planting. This translates to significantly reduced costs for field space and years of maintenance to grow trees to maturity, because only the preselected elites need to be grown. Further, it allows breeders to create as seedling populations the very large families required for multi-trait improvement, which otherwise would be impractical to produce and grow to maturity in fields. Consider an example from Citrus rootstock breeding where crosses of orange or grapefruit with Poncirus trifoliata, a sexually compatible relative contributing virus, nematode, and Phytophthora resistance genes, have led to two of the most widely used Citrus rootstocks in the U.S. Virus resistance is monogenic (Gmitter et al., 1996), while nematode and Phytophthora resistance are quantitative. Assuming 1000 such hybrids were produced, of which 50% would be virus resistant, only 10% adequately resistant to nematode, and only 10% adequately resistant to Phytophthora, then just 5 of the 1000 hybrids would possess adequate resistance to the three pathogens. Screening and selection for these multi-trait resistant phenotypes would require years of testing through inoculation with virus (in a temperature-controlled greenhouse structure), followed by replicated testing with nematodes, and replicated Phytophthora tests. By contrast, using MAB would enable the breeder to select these five elite hybrids in only the time it would take to extract DNA from the 1000 and run appropriate marker screens, reducing a multi-year process to a matter of weeks. As more traits are analyzed and genomics information is associated with phenotype, the results of seedling selection provide orders of magnitude greater efficiency. Even prior to seedling screening, genotypic characterization of parents is valuable to understand the specific functional alleles carried by each and to design more efficient crosses. Similarly, genetic marker characterization of advanced selections can be used to describe their performance and speed their release and adoption as new cultivars. Realizing the promise of this approach to tree fruit breeding enhancement, however, requires a robust and high-throughput platform through which genomic and phenomic datasets can be integrated, which is the mission of this proposal.
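The arithmetic behind the rootstock example can be checked with a minimal Python sketch. It assumes, as the example implicitly does, that the three resistances segregate independently; the function name and dictionary keys are illustrative only and are not part of any tfGDR tool.

```python
from fractions import Fraction


def expected_elites(n_hybrids, pass_rates):
    """Expected number of hybrids passing every screen.

    Assumes the screened traits are inherited independently, so the
    individual pass rates can simply be multiplied together.
    """
    expected = Fraction(n_hybrids)
    for rate in pass_rates.values():
        expected *= rate
    return expected


# Pass rates taken from the rootstock example in the text.
rates = {
    "virus": Fraction(1, 2),          # 50% virus resistant
    "nematode": Fraction(1, 10),      # 10% adequately nematode resistant
    "phytophthora": Fraction(1, 10),  # 10% adequately Phytophthora resistant
}

print(expected_elites(1000, rates))  # 5
```

Adding a fourth trait with a 10% pass rate would cut the expected yield to 0.5 hybrids per 1000, which illustrates why multi-trait improvement needs the very large seedling families that only marker screening makes practical.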

Advent of Genomic and Genetics Tools and Technologies: The advent of Citrus and Rosaceae genomic sciences has generated vast resources (Table 1) which can be used for genetic improvement of these important fruit crops. More than 425,000 ESTs are publicly available for Rosaceae (www.bioinfo.wsu.edu/gdr), genetic maps have been reported for all the major species, apple and peach physical maps are complete (Han et al., 2007; Zhebentyayeva et al., 2008), and whole genome sequences are expected to be released for apple, peach, and strawberry in 2009 (see letters from Velasco, Sosinski, and Shulaev in Appendix 2). In Citrus, fundamental genomics tools include linkage maps developed with EST-SSRs (Chen et al., 2007), BAC libraries (Deng et al., 2001; Terol et al., 2007; Terol and Talon 2009), physical maps (http://phymap.ucdavis.edu:8080/Citrus; Shimizu et al., 2007), extensive EST libraries with more than 400,000 ESTs available through NCBI dbEST, at least four different microarray platforms, and whole genome sequencing of sweet orange (heterozygous diploid) and mandarin (haploid) by the International Citrus Genome Consortium (ICGC), which is planned to be fully available in 2010 (co-PD Gmitter, pers. comm.).

Breeding Programs and Cultivar Performance Data: With the appreciation that marker-assisted breeding has tremendous potential for efficient production of improved fruit tree cultivars, Citrus and Rosaceae breeders have begun organizing efforts toward the utilization of molecular resources in their programs. Very active Citrus cultivar development programs are underway at the University of Florida, the University of California at Riverside, and the USDA Horticultural Research Lab in Ft. Pierce, FL. These programs are well supported by the industries in Florida and California, and they incorporate scientists that include breeders, geneticists, molecular biologists, tissue culture and transgenic technology experts, and field horticulturists. These teams are generating substantial breeding populations and genotypic and phenotypic data, and they are also actively involved in Citrus genomics research, as most Citrus genome-based activities are supervised by the breeders (Gmitter at UF and M.L. Roose at UCR). The U.S. rosaceous breeding community consists of approximately 50 professional breeders who, like Citrus breeders, are generating breeding populations and phenotypic data, are actively supported by industry, and are increasingly involving or at least considering genomics assistance. Large-scale phenotypic data are also being actively generated from CSREES-NRI funded projects (FY 2005 and FY 2008) that specifically require submission of data to GDR. The European Commission funded HiDRAS (High quality Disease Resistant Apples for a Sustainable Agriculture) project identified many genetic factors controlling apple fruit quality. This collaboration of 11 European groups used the innovative approach of Pedigree-Based Analysis to identify and characterize quantitative trait loci based on genotypic and phenotypic data collected on a comprehensive set of multiple pedigree-linked apple populations and cultivars. This data will be released publicly to the GDR by 2011 (see letter of support from Gianfranceschi). A similar project for small fruits, GENBERRY, is a collaboration of eight European countries to promote conservation, breeding, and characterization of the genetic diversity of strawberry and raspberry, focusing on the germplasm available in national collections. The project will identify core collections using standardized descriptors and all data will also be provided to GDR (see letter of support from Denoyes-Rothan).

Need for web-access to the integrated data for breeders to efficiently employ marker-assisted

breeding: To be utilized routinely and with impact in cultivar improvement, genomics, genetics, and breeding resources must be collected, curated, integrated, and made available to breeders in formats that best meet the needs of breeders. The ideal web resource would allow breeders to identify markers to be used to track traits of interest in their germplasm. Such a one-stop resource would describe each trait with all available knowledge (such as economic value, heritability, and linkage to other traits), list germplasm incorporating the trait (with known pedigrees and origins), link to specific research trials in which the trait was examined, indicate associated markers (including functional diversity), and link to genomics information such as a graphical viewer of its position in the genome. Rosaceae and Citrus communities are in a perfect situation to create this type of web database since all the necessary data such as whole genome sequences, genetic markers, traits, large-scale phenotypic data, and genotypic data, are actively being generated. Without the use of bioinformatics, such as sequence analyses to predict genes with putative function, synteny/orthology analyses for knowledge transfer among species, integration of the large-scale sequences with known markers and trait loci, integration of the diversity data, description of data with controlled vocabulary (ontology), and access to the integrated data, the different categories of data remain in raw form without translation into practical knowledge and cannot be utilized in breeding programs. Data integration and provision of the efficient web interface is likewise essential for research programs to understand the fundamental biology underlying traits and interaction with environmental conditions. 
The ultimate scenario will be the ability to identify which DNA sequences determine a metric trait, obtain the networked mRNA profiles of these sequences, and define the critical protein activities encoded by these mRNAs. Defining how DNA variations affecting the coding and regulatory sequences of these genes alter metabolic flux, and thus the character state of the overall trait, will be crucial. The downstream effects of these proteins will provide information on the metabolic pathways that determine the trait value when modified by environmental influences on the plant. This includes spatial and temporal changes in gene expression over the plant’s life cycle, currently monitored by microarray methods. In addition, consideration of epistatic effects of genes containing DNA variation is important. To enable these types of comprehensive analyses, the appropriate DNA, RNA, and germplasm datasets need to be collected, computationally analyzed, and then linked to traits that are of interest to breeders, industry stakeholders, and consumers. This value-adding to raw data will allow the improvement of metric traits to give a more predictable outcome to plant breeding than is currently the case with conventional one-gene-at-a-time genetics or phenotypic selection approaches.

Opportunities for Continued Community Building: One of the most important assets in the efficient production of improved and sustainable cultivars through marker-assisted breeding is having a well connected and collaborative community of researchers, breeders, industry sector participants and supporting researchers, enabling exchange of needs, ideas, and resources. Community databases can significantly enable community building by acting as a communication hub as has been evident in the

Rosaceae. The Rosaceae community has become an international, cohesive, well-organized, growing body of basic, translational and applied researchers. There are elected steering committees at both the national (US Rosaceae Genomics, Genetics and Breeding Committee, RosEXEC) and international level

(Rosaceae Genomics Initiative, RosIGI); a priority-documenting White Paper; a stakeholder-driven technology roadmap; an annual Fruit and Nut Crops Workshop at the Plant and Animal Genome conference; and biennial International Rosaceae Genomics Conferences. Other important plant communities, grasses-GRAMENE (Jaiswal et al., 2006); maize-MaizeGDB (Lawrence et al., 2008),

Arabidopsis-TAIR (Rhee et al., 2003), Solanaceae-SGN (Mueller et al., 2005), legumes-LIS (Gonzales et

al., 2005), have also shown that the access to a centralized, curated database is fundamental to community building, information generation, and knowledge transfer among researchers, and can revolutionize crop improvement. Significantly higher funding levels combined with access to large-scale datasets have resulted in advanced databases for these plant communities. There is a national Citrus Genome Working

Group that meets annually, and along with the International Citrus Genome Consortium, the research community has grown closer in communication and collaboration. However, having a database for Citrus will substantially improve the collaborative environment among the national and international researchers.

The forthcoming release of whole genome sequences and large-scale phenotypic and genotypic data for

Rosaceae and Citrus makes the role of databases in integrating these new data and synergizing the

community research effort critically important.

Our approach is by definition systems-based, trans-disciplinary and stakeholder-driven, as tfGDR will provide a place where genomicists, geneticists, bioinformaticists, breeders, and growers can share their data and ideas to enable better understanding of the crops and specific cultivars, to suggest direction for new methods and research, and to produce efficient and focused practical steps toward the common goal of tree fruit improvement. Our team is also composed of people from multi-institution, multi-state and multi-crop backgrounds. Genome comparisons among Rosaceae or Citrus species will accelerate knowledge transfer among related species, resulting in discovery of cross-genera QTLs or markers, as shown in grass species (Paterson et al., 1995). The Rosaceae community is the largest and most diverse among the fruit tree crops globally, because of the genetic and crop diversity that exists within this family. Among all tree fruit crops, Citrus represents the largest single fruit crop in terms of total production, both in the US and globally. The Citrus genetics/genomics/breeding community is smaller than the Rosaceae community, but the need for an integrative database is equally critical to Citrus crop genetic improvement. Including

Citrus together with the Rosaceae is a logical way to bring the benefits of a database portal to the Citrus

Table 1: Genomic Resources of Rosaceae and Citrus Species

Columns: Crop | Scientific Name | Ploidy Level | Genome Size | Genomic Sequences | EST | Protein | Genetic Map | Physical Map | BAC Library | QTL | Genetic Diversity Study

Crop: peach; almond; apricot; cherry; apple; rose; blackberry, raspberry; pear; strawberry; All Rosaceae; Sweet orange; Clementine; All Citrus

Scientific Name: Prunus persica; Prunus dulcis; Prunus armeniaca; Prunus avium, Prunus cerasus; Malus domestica; Rosa; Rubus; Pyrus communis; Fragaria; Citrus sinensis; Citrus clementina

Ploidy Level (source order): 2n = 2x = 16; 2n = 2x = 18; 2n = 2x = 16; 2n = 2x = 18, 2n = 4x = 32; 2n = 2x = 34; 2n = 5x = 35; 2n = 28; 2n = 6x = 48; 2n = 14, 2n = 56; 2n = 2x = 18; 2n = 2x = 18

Genome Size (source order): 270 Mb; 750 Mb; 367 Mb; 367 Mb

Genomic Sequences (source order): 8X ABI, 10GB Solexa double haploid; 15X heterozygote; 4X double haploid; 14X 454 diploid; 3 genomes; Underway - heterozygous diploid; Underway - haploid; 2 genomes

EST through Genetic Diversity Study columns (values in source order): 71161 264; 3864 252; 15105 161; 1276 501; 261142 1170 14; 9289 674; 3026 77; 244 533; 1276 524; 430683; 207500 285; 13; 3; 5; 3; 4; 5; 3; 9; 86; 118365 16; 449394 2642 18; 1; 4; 1; 2; Yes; Yes yes; 12 18; 1; 6; 3; 1; 6; 9; 14; 14 34; 2 20; 40 135; 5; 7; 5; 7; 37

community, to increase the value of what is already offered and well developed through GDR, to exploit tools already developed rather than recreating a parallel system, and to expand the tree fruit research community.

(iii) APPROACH

Figure 1: Overview of the objectives and outcomes of Tree Fruit GDR

OBJECTIVE 1

Collect, analyze, and integrate genomics and genetics data to facilitate gene discovery and marker–trait associations

Data Collection and Annotation:

To provide the central data retrieval resource for gene discovery and marker development we will collect, curate, analyze, and integrate genomics and genetics data including full genome sequences, predicted and verified genes, ESTs, proteins, genetic and physical maps, and the molecular and phenotypic diversity data of Rosaceae and Citrus (Table 1). The data collection will involve computational download, literature curation and direct submission from researchers. The methods we will employ for collection, curation, and computational analysis of each type of data are described below.

Large-scale genomic sequences and annotation: Whole genome sequences will provide sound understanding of the Rosaceae and Citrus genomes.

They will allow researchers to readily study the underlying sequences of specific loci of interest, develop genome-wide markers to detect polymorphisms, and genome-wide probes to study gene activity under various conditions. The verified markers will be utilized in breeding programs to develop improved cultivars. At the same time, the sheer amount and complexity of the whole genome sequences challenges us since the data cannot be converted into knowledge without proper analyses, integration with other data, and efficient interface to access the data.

To meet this challenge, we will integrate the whole genome sequences for several tree fruit and berry crops

(apple, citrus, peach, and strawberry), which will be available soon, in tfGDR (Table 1).

For Citrus, 1.2X whole genome sequence of Citrus sinensis (sweet orange) is available (Joint Genome Institute), up to 15x coverage of the sweet orange genome using 454 technology will be available in 2010 (PI is Fred Gmitter,

Co-PD here), and the International Citrus Genome Consortium (ICGC) will provide by 2010 an 8-10X coverage of Sanger sequence of the haploid genome of ‘Clementine’ mandarin (Talon and Gmitter, 2008).

In 2009/2010, the Rosaceae research community will have entire genome sequences available for apple

(Malus), peach (Prunus) and strawberry (Fragaria). The sequence data will be available for a heterozygous

‘Golden Delicious’ apple (15X of Sanger/454), a peach double haploid (8X of Sanger and 10Mb of Solexa), and a diploid strawberry (14X of 454). In addition, large genomic sequences are available from several member species, including the annotated sequence of 53 fosmids of diploid strawberry (40kb per fosmid) produced through an NRI funded project (Award #2005-35300-15467). The large-scale genome sequences and their annotations (sequence names, coordinates, exon/intron positions, translated sequences of the predicted genes, and matching ESTs, markers and polymorphisms) will be downloaded from their respective sequencing centers, uploaded to our Chado database, and made viewable using GBrowse (Stein et al. 2002).
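
As a minimal sketch of the download-and-load step, the following shows how a predicted gene might be written as a GFF3 feature line before loading. It assumes annotations are exchanged as GFF3, which the proposal does not specify; the `GeneModel` record, the `example_center` source label, and all identifiers are hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass
class GeneModel:
    """A predicted gene on a sequenced scaffold (hypothetical record)."""
    seqid: str    # scaffold or chromosome identifier
    start: int    # 1-based inclusive start, as GFF3 requires
    end: int      # 1-based inclusive end
    strand: str   # "+" or "-"
    gene_id: str
    name: str


def to_gff3_line(gene, source="example_center"):
    """Format one gene as a 9-column, tab-separated GFF3 feature line."""
    if gene.start < 1 or gene.end < gene.start:
        raise ValueError(f"bad coordinates for {gene.gene_id}")
    # Reserved characters (; = & ,) in attribute values are percent-encoded.
    attributes = (f"ID={quote(gene.gene_id, safe='')};"
                  f"Name={quote(gene.name, safe=' ')}")
    columns = [gene.seqid, source, "gene", str(gene.start), str(gene.end),
               ".", gene.strand, ".", attributes]
    return "\t".join(columns)


def write_gff3(genes):
    """Return a GFF3 document (version header plus one line per gene)."""
    lines = ["##gff-version 3"]
    lines.extend(to_gff3_line(g) for g in genes)
    return "\n".join(lines) + "\n"
```

Validating coordinates at this stage keeps malformed records out of the database rather than relying on the viewer to reject them.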

Synteny and Orthology analysis: Additional annotation of whole genome sequences by tfGDR will include computational detection of paralogs, orthologs, and conserved syntenic regions.

Finding orthologs and conserved syntenic regions among Rosaceae species and among Citrus species will boost our understanding of these genomes, since advances made in one species can then be transferred to the others. Comparative genomics analyses with more distantly related species such as Arabidopsis will allow us to utilize the rich data available for that model, and comparison with Populus will help us to understand the evolution of tree species. Genetic mapping studies have shown almost complete synteny within Prunus, and genomes of different genera in

Rosaceae have also shown remarkable conservation of synteny (Dirlewanger et al. 2004; Shulaev et al.

2008; Vilanova et al. 2008). We have also shown considerable micro-synteny between Rosaceae and

Populus (Jung et al. 2008). To detect and visualize these, we will utilize the open-source comparative genomics tool, Sybil (Crabtree et al. 2007). We will initially focus on finding orthologs and conserved syntenic regions using closely related genomes, such as the Rosaceae species (apple, peach, and strawberry), and within the Citrus species, and then expand the analysis to include other model plant genomes such as Arabidopsis, Populus and Medicago. Several methods and tools for detection and

visualization for conserved syntenic regions are available, such as SynView (Wang et al. 2006) and

SynBrowse (Brendel et al. 2007), but Sybil offers added features: algorithms for detecting orthologs and paralogs, flexible parameters for synteny analysis, graphical interface to Chado-based schema, and algorithms that can be encoded in a workflow or invoked on-demand. As software for detection and display of syntenic regions is being continually developed, we will explore all Chado compliant software of this type at the time of analysis.
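
To illustrate what computational ortholog detection involves, the sketch below uses reciprocal best hits, a simple and widely used heuristic. This is a stand-in for illustration only and is not a description of Sybil's algorithms; the gene names and similarity scores are hypothetical.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    `scores` is a dict of {(query, subject): similarity_score}.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical similarity scores between peach and apple gene models.
peach_vs_apple = {
    ("peach1", "apple1"): 900, ("peach1", "apple2"): 300,
    ("peach2", "apple2"): 500, ("peach3", "apple1"): 400,
}
apple_vs_peach = {
    ("apple1", "peach1"): 880, ("apple2", "peach2"): 510,
    ("apple1", "peach3"): 390,
}
pairs = reciprocal_best_hits(peach_vs_apple, apple_vs_peach)
```

Here `peach3` is excluded because its best apple hit prefers `peach1`; such one-directional hits are candidates for paralogs and need further analysis.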

Unigene sets and transferable e-Markers: For Rosaceae, unigene builds are available for five major genera (Malus,

Prunus, Fragaria, Rosa and Rubus) and for the entire family. As whole genome sequences become available, we will align these unigenes, along with other data such as matching EST-SSR markers, onto them. We will continue to build new unigene sets for Rosaceae when substantial new EST data become publicly available. Over 400,000 ESTs are available for Citrus

(Table 2), but no comprehensive analysis has been conducted. We will develop a unigene dataset for

Citrus using the EST annotation pipeline developed by the GDR team (Folta et al., 2005; Jung et al., 2008;

Lewers et al., 2008). Unigene sets will be annotated to provide putative function, SSRs, SNPs, and ontology terms. The unigenes for the species with whole genome sequences will be aligned as a track in

GBrowse. The unigene sets will be instrumental for researchers to further interrogate the structure and function of the predicted gene models. The unigenes of the species without whole genome sequences provide an important resource for gene discovery and marker development.
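
The SSR annotation step can be sketched as a scan for perfect tandem repeats in each unigene sequence. This is only an illustration: the minimum repeat counts below are assumptions, not the thresholds used by the GDR pipeline, and the function name is hypothetical.

```python
import re

# Minimum repeat counts per motif length (assumed thresholds).
MIN_REPEATS = {2: 6, 3: 5, 4: 4}


def is_compound(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. AGAG, AAA)."""
    n = len(motif)
    return any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(sequence):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    `start` is a 0-based offset into `sequence`.
    """
    sequence = sequence.upper()
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # ([ACGT]{k}) captures a motif; \1{n,} requires n or more further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if is_compound(motif):
                continue  # reported under its shorter unit, or a homopolymer
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)
```

Primers designed in the sequence flanking each hit would then give candidate EST-SSR markers for testing.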

Single-copy conserved orthologs (COS) are being developed and tested for Rosaceae by Dr. Amy Iezzoni (tfGDR Advisory Board

Member) through a 2008 USDA NRI funded project. These will be integrated in tfGDR.

Loci data: We will curate data that are associated with genes (simple traits) and QTLs to provide resources to discover trait-controlling genes and develop higher quality markers for the traits. We have begun curation of Rosaceae traits, and GDR houses the description, screening methods and the map location data of 27 major simple trait loci mapped in various Prunus maps. We have characterized them by the top Trait

Ontology (TO) terms, as developed by GRAMENE (Jaiswal et al., 2006), so that users can easily query for all the loci with similar traits. We will provide a much more comprehensive trait database in tfGDR by 1) curating phenotypic data from QTL and other large-scale phenotyping projects (see objective 1b) as well as additional simple trait loci; 2) curating more extensive data such as alleles/polymorphisms, dominant/recessive inheritance, germplasm, experimental and environmental variables, and any available photographs; 3) associating the phenotype data with various ontologies; 4) integrating the phenotype data with other genetic and genomics data, such as gene products for simple traits and nearby molecular markers for mapped traits (simple traits or QTL); and 5) including trait data for Citrus. Appendix 3 summarizes Rosaceae simple traits and QTLs with loci and marker information from over 110 publications. Curation of QTL data from publications will involve extracting all the available information, such as cross population, experimental design, analysis method, software, trait names, trait description,

QTL location, test statistics, QTL effects and candidate genes if available.
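
A curated QTL entry of this kind could be held in a record such as the one sketched below. The field names follow the items listed above but are illustrative assumptions, not the tfGDR schema, and the example values are invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QTLRecord:
    """One QTL extracted from a publication (hypothetical layout)."""
    trait_name: str
    trait_description: str
    cross_population: str
    experimental_design: str
    analysis_method: str
    software: str
    linkage_group: str
    position_cm: float
    test_statistic: float            # for example, a LOD score
    effect: Optional[float] = None   # for example, variance explained
    candidate_genes: List[str] = field(default_factory=list)
    reference: str = ""

    def missing_fields(self):
        """Names of required text fields that the curator left empty."""
        required = ["trait_name", "cross_population", "analysis_method",
                    "linkage_group", "reference"]
        return [name for name in required if not getattr(self, name).strip()]


# Invented example entry; no real publication is implied.
record = QTLRecord(
    trait_name="fruit weight", trait_description="mean fruit weight (g)",
    cross_population="Parent A x Parent B F1", experimental_design="",
    analysis_method="interval mapping", software="", linkage_group="LG4",
    position_cm=32.5, test_statistic=5.1,
)
```

Calling `record.missing_fields()` lets a curator see which required items still need to be extracted before the entry is released.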

We will also download the gene and protein sequences from Rosaceae and Citrus from GenBank and

UniProtKB (Boutet et al., 2007), and these will be combined with the predicted gene models from the whole genome sequencing projects and the manually annotated genes that are associated with simple trait loci.

The gene and protein sequences will be aligned to the whole genome sequences and displayed as a track in GBrowse so that researchers can interrogate the genomic region around the trait loci of interest for further data such as gene models, associated EST/BAC/genomic clones, polymorphisms, and markers.

The phenotypes, the affected plant structure, and the environment/habitat and the experimental variables of simple traits, QTL, and other unmapped phenotypes will be associated with various ontology terms (See

below) in collaboration with the Rosaceae and Citrus communities, GRAMENE and the Plant Ontology

Consortium (See letter from Jaiswal). To expedite the curation process, we will use a tool such as Phenote

(http://www.phenote.org/). We will also provide password protected user-submission sites in tfGDR to encourage scientists to submit their published genes as in SGN. With the advent of whole genome sequences and large-scale phenotypic analyses, overwhelming numbers of genes are expected to be discovered, and this user-submitted gene annotation will greatly accelerate cataloging of new genes.

Genetic and physical map data: Genetic maps of tree fruit crops provide breeders with easier selection schemes using molecular markers rather than those based on the phenotypic traits. Physical maps are also being developed for apple, peach and Citrus and are being integrated with genetic maps. These resources accelerate the discovery of genes responsible for traits and the development of better markers for the QTL, which will result in increased effectiveness of marker assisted selection.

GDR currently provides access to 39 Rosaceae genetic maps through CMap, the comparative map viewer

(Ware et al., 2002). Detailed data for over 1300 genetic markers from 11 maps and molecular diversity studies are stored and available from the marker search page. We will transfer these GDR data to tfGDR and continue to collect genetic map data from Rosaceae and Citrus (Table 2). Specific fields for markers in GDR include: marker source descriptions, associated sequences including the primers and marker source clones, screening conditions, PCR parameters, repeat motifs, PCR product size, germplasm name, map position, marker type, source species, contact information, and references. GDR also contains information on germplasm source, marker polymorphism data, and details of genetic mapping and molecular diversity studies. We will expand the germplasm section to include pedigree and location data in order to house genotyping data from breeding programs (See below). Active curation will continue to add more data types from the literature, such as simply inherited traits and proteins associated with genetic maps. We have developed standard data input files for genetic map and molecular diversity data that require minimal curation effort to convert most researchers’ data files. We will convert these input files to work with the new tfGDR schema. We will also advance this to a web-based, password-secured data-uploading system for curators and data-generating researchers. This system can be readily built with the open-source content management platform ‘Drupal’ (See below). Data uploaded by external curators and researchers will be examined by tfGDR curators before being loaded into the central database.

GDR contains the peach physical and transcriptome map data (Zhebentyayeva et al. 2008). The map is accessible via WebFPC, and the BAC clones and associated data such as BAC contig assembly and anchored genetic markers and ESTs, are provided through the corresponding search site. More physical map data will be available for other tree fruit crops in the near future (Table 1). The apple physical map

(Han et al. 2007) will be available to us when it is anchored to genetic maps (S. Korban, personal communication). There are also two physical maps under construction for Citrus, and the previous sweet orange physical map is being expanded and enhanced with new BAC libraries planned (Dvorak, personal communication). The Spanish Citrus Genomic Consortium is constructing a genetically anchored physical map from the BAC libraries of Clementine mandarin (presented at PAGXV by Terol et al., 2007), and the

SNP discovery is being performed from the BAC end sequences (presented at PAGXVII by Terol and

Talon, 2009). The other is from Satsuma mandarin, being built by the Citrus Genome Analysis Team from

Japan (presented at PAGXV by Shimizu et al., 2007). The peach physical map data will be transferred, and those of Citrus will be incorporated into tfGDR and be available through GBrowse and CMap.

Large scale phenotypic and genotypic data: The recent availability of large scale genomic sequences and the development in biotechnology has given researchers the opportunity to perform genome-wide scan on

a wide range of germplasm in an effort to identify diverse and useful sources for biotic and abiotic resistance and desirable crop traits. Researchers are also phenotyping large sets of germplasm for important breeding traits. Integration of these data into a centralized database is extremely important since the genotyping and phenotyping data can be statistically analyzed for marker-trait associations, providing direct resources for crop improvements. A wealth of phenotypic and genotypic data has been and is being generated for Rosaceae from the CSREES NRI 2005 and 2008 projects and other international efforts such as the HiDRAS and EUROBERRY projects (see letters from Gianfranceschi, and Denoyes). The proposed

RosBREED SCRI project will conduct genome-wide genotyping and phenotyping for fruit quality traits with

Crop Reference (CR) Sets of ~ 400 individuals for apple, peach, strawberry, and sweet and sour cherry combined. The molecular diversity data has already been collected in GDR from nine projects with 20 species from Malus, Pyrus and Prunus (using 323 markers and 1088 germplasm). The marker polymorphism data pages contain polymorphism description, project description and detailed marker data including map positions.
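
As one simple illustration of a marker-trait association test on integrated data, the sketch below computes a one-way ANOVA F statistic for a phenotype grouped by genotype class at a single marker. The proposal does not prescribe a statistical method, so this choice is an assumption, and the genotype calls and phenotype values are invented.

```python
from collections import defaultdict


def anova_f(genotypes, phenotypes):
    """One-way ANOVA of phenotype on marker genotype class.

    Returns (F, df_between, df_within). F is infinite when there is
    between-class variation but no within-class variation.
    """
    if len(genotypes) != len(phenotypes):
        raise ValueError("genotypes and phenotypes must be the same length")
    groups = defaultdict(list)
    for genotype, value in zip(genotypes, phenotypes):
        groups[genotype].append(value)
    n, k = len(phenotypes), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and replication")
    grand_mean = sum(phenotypes) / n
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    ss_between = sum(len(v) * (means[g] - grand_mean) ** 2
                     for g, v in groups.items())
    ss_within = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v)
    df_between, df_within = k - 1, n - k
    if ss_within == 0:
        return float("inf"), df_between, df_within
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented data: six individuals scored at one marker and for one trait.
calls = ["AA", "AA", "AB", "AB", "BB", "BB"]
trait = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
f_stat, df_b, df_w = anova_f(calls, trait)
```

In practice, pedigree structure and multiple testing across markers would also need to be accounted for; this sketch covers only the single-marker statistic.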

For tfGDR, we will expand the effort to curate the phenotype and the genotype of the fruit tree germplasm with all the important associated data such as collection method, marker detail, pedigree, environment and locality data if available.

The phenotype and environment data will be associated with ontology such as PO, TO and EO for efficient data sharing and transfer (See below). The data will also be integrated with the existing trait, marker, gene, and sequence data in tfGDR. Currently,

SSR genotype data for germplasm held in the USDA National Clonal Germplasm Repository for

Citrus/Dates has been produced (Barkley et al., 2006). However, genotypic and phenotypic data for Citrus are generally held by the groups generating the data, which illustrates the critical need for, and the value that can be derived from, inclusion of Citrus in tfGDR to enhance community collaboration.

Biochemical pathway data: We propose to catalog all known and putative genes involved in the pathways of interest to breeders such as those that underlie quality traits (fruit appearance, flavor, texture, and postharvest life) and production attributes (yield, disease resistance, production patterns, fruit size, ease of harvest), and their integration with maps visualizing biochemical pathways.

The MetaCyc framework as a model for the RosaCyc and RutaCyc databases: PlantCyc (Zhang et al.,

2005), an offshoot of MetaCyc (Caspi et al., 2008), is a database representing knowledge about biochemical pathways in over 290 different organisms (http://plantcyc.org/). It is used to predict biochemical pathways of a given organism (using the PathoLogic software) based on the annotation of its genome. The Pathway

Tools software developed by SRI International allows users to create new Pathway/Genome Databases

(PGDBs), in which genes can be associated with biochemical pathways. In collaboration with the NSF-funded PlantCyc initiative (see letter of collaboration from Dr. Sue Rhee), family-specific databases will be developed for Rosaceae (RosaCyc) and Rutaceae (RutaCyc) with pathway visualization capabilities

(Rutaceae is the family that Citrus belongs to). Using the available annotated EST and gene sequence information for Rosaceae and Rutaceae, the Pathway Tools software will be employed to generate the

RosaCyc and RutaCyc databases, which will be made accessible to users via the tfGDR website. Users will be able to query and browse these databases. Hyperlinks will associate elements in pathway schemes

(genes, enzymes or metabolites) with various kinds of relevant information (e.g., gene function, reactions, metabolite ID, and links to other databases). We will evaluate and transfer appropriate biochemical pathways from the best annotated plant genome database (TAIR for Arabidopsis) to Rosaceae and

Rutaceae. AraCyc, which is the database for metabolic pathway-based gene annotation, contains annotation and maps covering more than 332 pathways (containing roughly 2,000 genes) in Arabidopsis.

These Arabidopsis pathways will be reviewed for their presence/absence in the available Rosaceae and

Rutaceae ESTs and manually curated by biochemical pathway annotators at BTI. Curators will perform multiple levels of annotation for each biochemical pathway: (1) quality assessment of the gene models by

comparison with full-length cDNAs of putative orthologs from other species, (2) assignment of gene function through evaluation of multiple data types such as sequence alignments, domains, motifs, GO terms and transcript profiles, (3) curation of complete biochemical pathways, and (4) further curation by specialists in particular areas of plant metabolism. To increase efficacy we are planning to involve the

Rosaceae and Rutaceae research communities in our annotation efforts. We will follow the PlantCyc model, with submission of information in Excel spreadsheets. Electronic newsletters, scientific meetings and workshops at the annual Plant and Animal Genome Conference will be used to engage the scientific community in this endeavor.

Develop maps for new pathways in Rosaceae and Rutaceae: Pathways occurring in Rosaceae and

Rutaceae for which no public maps exist will be curated into RosaCyc and RutaCyc de novo from the literature. With the help of the MetaCyc curators at SGN these maps will be integrated into the MetaCyc database for across-species comparisons. It is expected that approximately 60 pathways can be annotated in RosaCyc and RutaCyc based on high sequence homology of the associated genes in Rosaceae and

Rutaceae to their putative orthologs in Arabidopsis (B.M. Lange and M. Ghassemian, unpublished results).

For another 60 pathways represented in AraCyc, manual curation will play an important role in generating

RosaCyc and RutaCyc, whereas sequence similarity appears to be insufficient to build RosaCyc and

RutaCyc using the remaining 80 AraCyc pathways. In addition, we will add and annotate 30 pathways that are not represented in MetaCyc (novel pathways) or are substantially different in the Rosaceae and

Rutaceae (new pathway variants). Among these pathways are those responsible for the metabolism of monoterpenes, sesquiterpenes, diterpenes, limonoids, flavonoids, green leaf volatiles, aliphatic and aromatic esters, coumarins, furanones, and aliphatic acids. The maps and the annotation of genes in the represented pathways will be evaluated by external curators.

Ontology associations: Describing data such as gene function, plant phenotype, and experimental and environmental conditions with ontology terms is extremely important, since it allows users to browse and search with much greater ease than when the data are described as free text. The use of ontologies also ensures that data generated by numerous researchers and institutions will be described consistently and accurately, so that the data can be shared and utilized efficiently in other projects. We will utilize the developing ontologies such as GO, TO, PO and EO to describe the Rosaceae and Citrus data.

Genes and EST unigenes: We will assign genes and EST unigenes to Gene Ontology terms (Gene

Ontology Consortium, 2008). The annotation will be done by converting the InterPro qualifiers, SwissProt keywords and Enzyme Commission (EC) numbers associated with the Rosaceae proteins to GO accession numbers, using the interpro2go, ec2go and spkw2go mapping files available from the Gene

Ontology website (www.geneontology.org). EST unigenes will be first mapped to SwissProt proteins by sequence similarity, and then mapped to GO terms. The GO term annotation data will be submitted to the

GO website. The EST data sets in GDR are already annotated with GO terms by reference to their homology with known proteins and the GO Terms that are associated with them. The GO terms can be browsed through our web site. We will develop the local GO database, regularly downloaded from the GO website. Through the ontology search site, users will be able to search and browse the GO terms to get the associated EST unigenes and genes. The EST unigenes and gene search sites will also have a GO term search category.
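
The conversion of EC numbers to GO accessions can be sketched as below. The parser assumes the line layout used by the ec2go mapping file (`EC:... > GO:term name ; GO:accession`, with `!` marking comment lines); the protein identifiers and function names are hypothetical.

```python
import re
from collections import defaultdict

GO_ACCESSION = re.compile(r"GO:\d{7}")


def parse_mapping(lines):
    """Parse mapping lines into {external_id: [GO accession, ...]}."""
    mapping = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue  # header and comment lines begin with "!"
        external, _, go_part = line.partition(" > ")
        # The accession follows the last " ; " on the line.
        go_id = go_part.rsplit(" ; ", 1)[-1].strip()
        if GO_ACCESSION.fullmatch(go_id):
            mapping[external.strip()].append(go_id)
    return dict(mapping)


def annotate(protein_to_ec, ec2go):
    """Assign each protein the sorted GO accessions implied by its EC numbers."""
    return {protein: sorted({go for ec in ecs for go in ec2go.get(ec, [])})
            for protein, ecs in protein_to_ec.items()}


example_lines = [
    "! example header line",
    "EC:1.1.1.1 > GO:alcohol dehydrogenase (NAD+) activity ; GO:0004022",
    "EC:2.7.1.1 > GO:hexokinase activity ; GO:0004396",
]
ec2go = parse_mapping(example_lines)
annotations = annotate({"protA": ["EC:1.1.1.1"], "protB": ["EC:9.9.9.9"]}, ec2go)
```

Proteins whose EC numbers have no mapping, such as `protB` here, receive an empty list and would be candidates for annotation through the InterPro or keyword mappings instead.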

Phenotype data of simple trait loci and QTL: The importance of data standardization in knowledge sharing and transfer has brought researchers and breeders together to propose some plant specific standards for phenotypic data collection (Biodiversity, ECP/GR/European Cooperative Program for Plant Genetic

Resources). Similarly, international efforts through the IUCN-Conservation Commons as well as the Plant

Ontology Consortium are working toward developing standard vocabularies and field structure to describe habitat and environmental conditions. The Rosaceae community has formed a Germplasm and

Phenotyping Committee (facilitated through GDR and headed by Gayle Volk and co-PD Peace) to define a list and format of phenotype descriptors for use in breeding and germplasm evaluations that are applicable across breeding programs, genera, subfamilies and the family of Rosaceae. Following the 3rd

International Rosaceae Genomics Conference, the Rosaceae community produced recommendations for standard QTL nomenclature and reporting in the family. We will form tree fruit ontology working groups with the tree fruit community members who have been involved with the effort to standardize the phenotype/QTL descriptions, tfGDR curators, and participants in the development of Plant Ontology (PO),

Trait Ontology (TO) and Environment Ontology (EO) (Gramene). The working group will develop ontology terms for Rosaceae and Citrus using the already existing PO, TO and EO in the first year. TfGDR curators will propose any additional terms to the working group every four months after focused curation for each type of data (simple trait, QTL, and other phenotype).

Efficient and reusable database construction

We will use open-source software extensively to efficiently build our web database that will accommodate new data types and interfaces for tfGDR. New data types include whole genome sequences and annotation, large-scale phenotypic data, germplasm data with pedigree information, and breeding data.

Where possible these data will be stored in tables defined by the open-source Chado schema. Chado is a Model Organism Database (MOD) schema developed to reduce redundant software development and promote interoperability (Zhou et al., 2006; Mungall et al., 2007). Chado is flexible due to its ontology-driven design and modularity, and it contains many modules that are suitable for managing the new data types for tfGDR. The natural diversity module is a new module that is being developed based on the Genomic Diversity and Phenotype Data Model (http://www.maizegenetics.net/gdpdm/). We will adopt this module and collaborate with the developer D. Clements on further improvements as necessary. This module, together with other modules such as phenotype, phylogeny, organism, and stock, will be used to store data from large-scale phenotyping and genotyping projects and breeding experiments. Such data will include the phenotype and genotype data collected in multiple locations at different time points from each individual of a pedigree population, as well as simple traits and their alleles without multiple location/time data.

The web interface for tfGDR will be developed using the content management framework Drupal. There are major challenges associated with managing large Model Organism Databases (MODs), including maintaining a large amount of static content, providing intuitive web based interfaces to data, and implementing community tools such as forums or mailing lists. As a result, successful implementation of a MOD, particularly a multi-species MOD such as GDR, becomes expensive given the time and skill sets involved. Drupal is a highly configurable, customizable content management system with built-in security features that allow easy creation of log-in permissions, and with many community-building modules, including web based forums, blogs, newsletters and a collaborative authoring environment that is easier to use than a wiki. Using Drupal will significantly reduce time and cost and allow us to focus more on our scientific delivery objectives.

We already have experience using Chado and Drupal in database development. The Chado schema and Drupal were used to build the Marine Genomics Project (http://www.marinegenomics.org/) at Clemson and the Cacao Genome Database (http://www.cacaogenomedb.org/) now being built at WSU. Using Chado and Drupal has significantly shortened the development time for these new projects. Our experience and any software developed in building tfGDR with Chado and Drupal will be freely available to the bioinformatics community and are expected to be a valuable resource for building other model organism databases.

Integrated User Interface

From the tfGDR home page, users will be able to choose between two sites: Rosaceae or Citrus. Each site will have the same header, search/download pages, graphic interface and analysis tools, but with the data from the corresponding tree fruit family. The whole genome sequences and the annotation (gene model and protein) of apple, peach, strawberry, and Citrus will be viewable in the graphical viewer GBrowse. Any matching EST unigenes, markers, simple trait loci (genes), QTLs, BAC/fosmid clones, sequence polymorphisms and orthologs in other related or model species will be displayed along with the genomic sequences in GBrowse as separate tracks. The physical maps, with BACs, BAC contigs, anchored markers and sequences, will also be viewable through GBrowse. The conserved syntenic regions between species, such as peach and apple or Citrus and poplar, will be viewable through the graphic interface Sybil. Users will be able to view and compare the genetic maps using the CMap graphic interface.

The unigenes, markers, genes (both gene models and simple trait loci), QTLs, BAC/fosmid clones and polymorphisms shown in GBrowse, Sybil and/or CMap will have a direct link to a detailed data page. Users will also be able to search and download these data from individual search/download sites: gene, marker, clone (BAC or fosmid), phenotype and polymorphism. The search sites will include both simple and complex search options, as in the current GDR search sites. The gene data page will have password-protected functionality for users to edit the data. We will develop the ontology search/browse page using the one in SGN as a model (SGN PD Mueller is a co-PD). Users will be able to search and browse controlled vocabularies. This page will be another tool for researchers to search for any ontology-associated tfGDR data, such as ESTs, proteins, traits, QTLs, phenotypes, and alleles/polymorphisms. The page will show controlled vocabularies and their relationships. When a term is selected, a page will list the data objects that have been associated with that term.
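The term-to-data lookup behind such a browse page can be sketched as below. This is an illustration under stated assumptions rather than the planned implementation: the class `OntologyIndex`, the term names, and the object identifiers are all hypothetical, and including data attached to more specific child terms is an assumed design choice that the text does not specify.

```python
from collections import defaultdict


class OntologyIndex:
    """Hypothetical in-memory index of terms, relationships, and annotated data."""

    def __init__(self):
        self.children = defaultdict(set)     # parent term -> child terms
        self.annotations = defaultdict(set)  # term -> associated data objects

    def add_relationship(self, child, parent):
        """Record that `child` is a more specific term under `parent`."""
        self.children[parent].add(child)

    def annotate(self, term, data_object):
        """Associate a data object (EST, QTL, phenotype, ...) with a term."""
        self.annotations[term].add(data_object)

    def descendants(self, term):
        """Collect every term reachable below `term` in the hierarchy."""
        seen = set()
        stack = [term]
        while stack:
            current = stack.pop()
            for child in self.children.get(current, ()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen

    def objects_for(self, term, include_descendants=True):
        """Return the sorted data objects for a selected term."""
        terms = {term}
        if include_descendants:  # assumed behavior, see lead-in
            terms |= self.descendants(term)
        found = set()
        for name in terms:
            found |= self.annotations.get(name, set())
        return sorted(found)


if __name__ == "__main__":
    index = OntologyIndex()
    index.add_relationship("fruit firmness", "fruit quality")
    index.annotate("fruit firmness", "QTL:example_1")
    index.annotate("fruit quality", "EST:example_2")
    print(index.objects_for("fruit quality"))
```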

Example Usage

1. Gene discovery that underlies important traits

A researcher has a peach accession with a simply inherited resistance trait to A and she wants to find out what the underlying gene is. To see whether the accession carries an allele of a known gene or represents a new gene, she searches for “resistance to A” in the phenotype search site. She finds a couple of accessions that are associated with resistance to A and finds out through the linked germplasm and polymorphism pages that some have an allele ra1 and others have rb1. She can see further that there is also an rc1 allele but the accession has been lost. The researcher performs an allelism test and finds out that her plant is not allelic to ra1 or rb1. The researcher decides to genetically map the new phenotype to find out if it is an allele of rc1. From the marker search site, she downloads all the primer information of the SSR markers of the Prunus bin map. Using the markers, she maps the trait to bin resolution. From the CMap viewer, she finds out that the bin where the trait is mapped also contains rc1, suggesting that her plant is allelic to rc1. To find closer markers, she searches for maps that contain rc1 in the CMap viewer. She finds a peach map that contains rc1 and can narrow down the position of rc1 to the interval between two markers that are about 5 cM apart.

She goes to GBrowse to look at the peach genomic region between the two peach markers. From GBrowse she can view the predicted genes, anchored markers and BACs, and polymorphisms, allowing her to select the best markers to continue fine mapping and gene discovery. When she finds the gene, she can register the gene in the gene page and continue to edit the gene data as a domain expert.

This example is based on a similar example presented in a MaizeGDB paper (Lawrence et al. 2008).


2. Search and Download markers for mapping and/or large-scale genotyping analyses

A researcher decides to generate a linkage map of pear and study the syntenic relationship with apple. He goes to the marker search site and queries for SSR markers that are mapped in apple maps. He downloads the primers and other information to screen his pear mapping population. He uploads his new pear map data into CMap through a secure site and compares his new map with other apple maps before releasing his data to the public. He can let tfGDR know when he wants to release his data to the public site. When a researcher wants to find marker sets for large-scale genotyping projects, he can go to the marker search site and select markers by species and marker type of interest. He can also limit the search results to markers that have been found to be polymorphic in certain species or germplasm.
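The search in this scenario amounts to combining optional filters on marker type, mapped species, and observed polymorphism. A minimal sketch follows, assuming a simple in-memory record; the `Marker` fields, the function `search_markers`, and the marker names are hypothetical and do not reflect the tfGDR schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    """Hypothetical marker record used only for this illustration."""
    name: str
    marker_type: str           # for example "SSR", "SNP", or "RFLP"
    mapped_in: frozenset       # species whose maps contain the marker
    polymorphic_in: frozenset  # species or germplasm where it is polymorphic


def search_markers(markers, marker_type=None, mapped_in=None, polymorphic_in=None):
    """Return markers passing every filter that was supplied (None means no filter)."""
    hits = []
    for marker in markers:
        if marker_type is not None and marker.marker_type != marker_type:
            continue
        if mapped_in is not None and mapped_in not in marker.mapped_in:
            continue
        if polymorphic_in is not None and polymorphic_in not in marker.polymorphic_in:
            continue
        hits.append(marker)
    return hits


# Invented example records, not real tfGDR data.
MARKERS = [
    Marker("SSR_001", "SSR", frozenset({"apple"}), frozenset({"apple", "pear"})),
    Marker("SSR_002", "SSR", frozenset({"apple", "peach"}), frozenset({"apple"})),
    Marker("SNP_001", "SNP", frozenset({"apple"}), frozenset({"pear"})),
]

if __name__ == "__main__":
    # SSR markers mapped in apple, as in the pear mapping scenario.
    for marker in search_markers(MARKERS, marker_type="SSR", mapped_in="apple"):
        print(marker.name)
```

Passing `polymorphic_in="pear"` in addition would narrow the same query to markers already known to be polymorphic in pear.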

3. Enhance the markers for QTL

A researcher is interested in developing high quality markers for a fruit quality QTL of strawberry, called Ab. He searches the trait/QTL search site for the fruit quality QTL Ab and finds two flanking RFLP markers. He wants to narrow the marker-QTL distance and develop PCR-based markers so that they can be used easily in marker-assisted breeding. He can follow the hyperlink from the marker site to GBrowse to examine the genomic region for any ESTs, predicted genes, and PCR-based markers. From GBrowse, he finds that some of the predicted genes in the region have orthologs in peach and apple. He follows the hyperlink to the Sybil synteny viewer and finds that the region is syntenic among three Rosaceae species: strawberry, apple and peach. He then finds some SNP and SSR markers in the syntenic peach region. Using the marker information on the marker page, he can test these SNP and SSR markers to see if they are transferable to strawberry.

Expected outcomes and utility

1. Ready view of genome sequences with associated sequence features, enabling further annotation.

2. Genome comparison (synteny, orthologs, etc.) to transfer knowledge among related species.

3. EST unigene data (and associated sequence polymorphisms) that are valuable for developing markers for genetic diversity studies, linkage mapping, gene prediction and microarray development, leading to identification of key genes involved in crop quality and productivity.

4. In silico developed eMarkers, such as SSRs and SNPs, to be tested and used in MAS.

5. Comparative mapping facilitating utilization of markers among related species.

6. The availability of allelic diversity data in breeding populations and germplasm collections, which is crucial in selecting the optimum samples that represent the overall genetic diversity for germplasm of interest.

7. Inclusion of phenotypic data with the environmental/locality data in a publicly available database to increase utilization of these data, as phenotypic data are expensive to collect and often dependent upon environmental conditions.

8. Integrated phenotypic and genotypic data with exponentially multiplied value, as they are placed into an information context where they relate to other metrics of plant husbandry and cultivation.

9. Development of the RosaCyc and RutaCyc pathway databases.

10. Use of RosaCyc and RutaCyc to identify genes involved in pathways of interest to breeders.

11. Visual representations of pathways to help biologists understand the complex relationships between components of metabolic networks, and to provide a valuable resource for the integration of transcriptomics, proteomics, and metabolomics data sets.

12. Characterization of numerous pathways prevalent in the Rosaceae and Rutaceae that have not been characterized in Arabidopsis and rice: Likely examples are genes involved in the biosynthesis of aroma metabolites.

13. Collaboration among biochemists, molecular biologists, breeders, and bioinformaticists: This information further drives planning of future research projects for molecular, evolutionary, and computational biologists, and breeders. These disciplines seek to understand the underlying biology of diverse form and function in plants, to discover genes for important traits, and to devise effective breeding schemes to generate new cultivars.

14. The first database schema that can integrate genomics, genetics, and phenotyping/genotyping data from various breeding programs, implemented using the open-source Chado schema and Drupal.

15. Integrated data that will accelerate the collection, curation, and integration of the crucial data for improvement of tree fruit crops of Rosaceae and Citrus.

16. A collection of open source software tools for creating and managing biological databases using Chado and Drupal, available through the GMOD (Generic Model Organism Database) project, enabling the re-use of any software by other community specialty crop databases.

17. A comprehensive yet expandable database schema that allows ready integration of new data types as understanding of the biology of crops and technology advance.

18. Facilitated collaboration among researchers in Rosaceae, Solanaceae and Citrus.

OBJECTIVE 2

Provide data-mining resources of genomic, genetic, breeding and cultivar performance data for breeders and growers to expedite development and adoption of new and existing cultivars

Breeder’s Gateway: While the marker, trait, polymorphism, gene, and germplasm search sites described for Obj. 1 will be useful for interrogating gene structure/function and genetic variability, and for developing new markers from the variants, breeders require a more combinatorial search interface to jointly analyze genotypic, phenotypic, and pedigree data for germplasm of interest within their programs and to use the results directly in QTL detection and characterization. To meet this need, we will develop a gateway where breeders can interrogate tfGDR using combinatorial search categories such as germplasm (pedigree), trait, molecular marker, and genetic map. The result page will contain the pedigree, functional marker alleles, and phenotype values for the selected germplasm, traits, and markers. The result will also be downloadable as a text file, an Excel spreadsheet, and in other formats suitable for analysis by breeding software such as Pedigree-Based Analysis’s FlexQTL™ and PediMap. This feature will expedite the utilization of the tfGDR data in QTL identification, characterization, and breeding application.
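The combined result described above, with pedigree, marker alleles, and phenotype values in one downloadable table, can be sketched as follows. This is a minimal sketch under assumptions: the dictionary layouts, column names, and sample identifiers are hypothetical, and the tab-delimited output is a generic text export rather than the input format of any particular breeding software.

```python
import csv
import io


def build_rows(germplasm, pedigree, alleles, phenotypes, markers, traits):
    """Join pedigree, marker alleles, and trait values into one row per accession."""
    header = ["germplasm", "female_parent", "male_parent"] + list(markers) + list(traits)
    rows = []
    for name in germplasm:
        female, male = pedigree.get(name, ("", ""))
        row = [name, female, male]
        # Alleles for a marker are written as "a/b"; missing data stay empty.
        row += ["/".join(alleles.get((name, marker), ())) for marker in markers]
        row += [phenotypes.get((name, trait), "") for trait in traits]
        rows.append(row)
    return header, rows


def to_tsv(header, rows):
    """Serialize the table as tab-delimited text for download."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Invented example data, not tfGDR records.
    pedigree = {"Sel_01": ("Parent_A", "Parent_B")}
    alleles = {("Sel_01", "M1"): ("a", "b")}
    phenotypes = {("Sel_01", "firmness"): 7.2}
    header, rows = build_rows(["Sel_01"], pedigree, alleles, phenotypes, ["M1"], ["firmness"])
    print(to_tsv(header, rows), end="")
```

Keeping the join separate from the serialization means the same rows could feed other export formats without changing the query logic.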

Knowledge output of tfGDR analyses will have direct hyperlinks so that researchers can further interrogate tfGDR to identify potential new marker-trait associations, conduct localized genome analysis (what else maps to a region, what genes are there, what sequences can be translated into new markers) for marker improvement, identify potential cross-species markers, and discover new useful germplasm and phenotypes.

We will also develop a secure site where breeders can upload their data (pedigree, trait, and marker) into a transitional database and analyze them in combination with other public tfGDR data through an advanced search interface. This will allow breeders to manage their data prior to any public release, or analyze and interpret their private data using tfGDR tools for personal use.

Grower’s Gateway: Few websites exist that provide access to searchable databases for cultivar performance and cultural and pest management recommendations for specialty crops where the target audience is the commercial grower. One such site does exist for peach (http://www.clemson.edu/hort/Peach/index.php?p=72), created at Clemson University in 2000 by co-PD Layne. For nine years, this site has been a very effective “gateway” for peach growers, primarily in the southeastern U.S.A. but also increasingly for growers in primary peach growing areas of other states (such as California, Georgia, and New Jersey), and in Canada, Uruguay, Chile, Italy, and China. The site is used primarily by growers, county extension agents, state specialists, breeders, researchers, and nurserymen but is also accessed by small and hobby farmers and homeowners. Current offerings at the site include contacts for commercial nurseries, cultural management tips, general peach interest information, research and extension publications, plant protection information, a peach grower’s handbook, and a regional newsletter. The most comprehensive part of the site is the searchable database for “variety evaluations”.

The publicly accessible database includes more than three hundred cultivars and advanced selections that have been rated for performance in replicated trials at four SC locations from the year 2000 through 2008. Cultivar performance does vary by environmental conditions (location and year) and the database captures these fluctuations within single genotypes. The on-line database provides evaluation data including a full cultivar description, chill hour rating, fruit set, size, shape, firmness, SSC content, and digital photos at harvest to a standard background and scale. The database has a search feature allowing simple or complex searches. Another feature was added to allow user-defined, side-by-side cultivar comparisons on the computer screen. Based on these long-term, on-farm grower trials and collaboration with the peach breeder for the USDA-ARS in Byron, GA (Dr. W.R. Okie), four new outstanding peach cultivars were recently released. They have been well received and are being grown by several major producers in the southeastern U.S.A.

For the past two years, the Washington Tree Fruit Research Commission has funded a project to create an online database of apple and cherry performance data from Washington breeders and producers. Numerous rootstock and variety combinations have been and continue to be trialed in various parts of the state, capturing variations in fruit quality and other high priority traits that are known to be affected by environment, production practices, and genetics. This project organizes the wealth of data previously collected. The Washington apple and cherry database is expected to go online in the fall of 2009.

The existing GDR database and website will be used as a template to build a Grower’s Gateway for Tree Fruit species. We will collect data from the literature and other projects such as the peach grower’s site at Clemson, the cherry and apple database in Washington State, Citrus field trials, physiology, pathology, and entomology research projects focusing on one or more cultivars, and information directly from growers where possible. The information gathered, with analytical tools to weigh and summarize data across trials, will assist growers and members of other tree fruit industry sectors to make more informed decisions on what to plant and how to grow it and handle its products to attain and maintain maximum genetic potential.

Integration of these data into the proposed tfGDR database would link phenotype and performance data for named cultivars and advanced selections from breeding populations with the vast genomics and genetics data and the phenotype data that already exist or will be collected for Obj. 1. The integration of the database of grower’s information with that of breeders, genomicists, and geneticists will facilitate collaboration in addition to basic and applied research discoveries. For example, with greater interaction and a common data language, researchers can discern growers’ perceptions and values and use them to prioritize breeding targets in their own endeavors to improve fruit tree cultivars for industry use. A platform (or at least a database format) for breeders to release information on advanced selection performance to industry (and in objective comparison to existing cultivars) is expected to speed new cultivar adoption by providing growers with the information they seek to make planting decisions with greater confidence.

Expected outcomes and utility:

1. Intuitive and comprehensive interface for breeders to utilize genomics and genetics data in the breeding program for enhancing breeding operations and more efficiently achieving breeding goals

2. Secure site where breeders can upload and analyze their private data in comparison with other tfGDR data

3. Web resources to boost interaction between breeders, industry sectors, and supporting researchers

4. Comprehensive interface for growers and other industry members to view and search cultivar performance data

5. Platform for industry to evaluate performance of new cultivars for objective decision-making on adoption

OBJECTIVE 3

Extend community outreach in genomics, genomics-assisted breeding and crop management

tfGDR training workshops at national and international conferences: We will hold tfGDR training workshops at the annual Plant and Animal Genome Conference beginning in 2011: tfGDR will apply yearly to the PAG computer demonstration session to hold a training session on GDR use and solicit feedback from the attendees building upon successful GDR demonstrations. We will advertise the workshop via the tfGDR mailing lists, newsletters, and Plant Workshop Sessions, to ensure maximum community participation.

Growers, research scientists and extension specialists knowledge sharing jamborees: We propose to offer onsite and online workshops aimed at specific target audiences including research scientists, extension professionals, and growers/producers. These workshops will be presented by growers, traditional plant breeders, geneticists, genomicists, bioinformaticists and molecular biologists. The underlying philosophy of the workshops is to bring all the participating members to a common platform enabling effective communication between lab-based scientists, field scientists and producers/growers, thereby approaching the issue of crop improvement collectively. This will help bridge the divide between problems that growers may have in the field and the solutions that lab-based scientists can provide. In addition, these workshops will serve as a conduit for educating the different functional groups (i.e., plant breeders, functional geneticists, genomicists, and bioinformaticists) about each other’s activities and will include explicit training on how to use tfGDR. This activity will help foster relationships between groups that have traditionally worked in isolation. Video presentations will be developed on each aspect and made available on the tfGDR website in a section titled “Grower’s Corner”. The following is a list of areas that each functional group will address during the workshops.

1. Grower/Producer: They will help design this workshop at all stages. They will define the existing issues/problems in the field and what needs to be done. Identification of core problem areas will thus be facilitated.

2. Breeder: Explain how breeders have traditionally addressed industry priorities and how much time and labor goes into such an endeavor.

3. Geneticist: Explain marker assisted selection (MAS) and how it can help a conventional breeder address the problems described by a grower in a much more directed and efficient manner.

4. Genomicist: Explain the concept of genomics and how activities like genomic sequencing, bioinformatics and marker identification can help geneticists and breeders with MAS. These efforts directly help breeders and ultimately growers as well.

5. Molecular Biologist: Explain the extent and scope of their activities that can aid a genomicist in associating genomic information with function or physiology.

6. Bioinformaticist: Explain bioinformatics and its role in crop improvement, and give an introduction to tfGDR.

The jamboree is planned as a one-and-a-half-day event. The first day will be devoted to presentations, followed by a reception dinner in the evening to enable informal discussions. The morning of the second day will be used for discussion and feedback from the participants, who will be asked by the instructors to assess the course and provide action items for improvement. The group will work closely with extension specialists Desmond Layne (Clemson University) and Mercy Olmstead (University of Florida, July 2009) and WSU collaborator Gwen-Alyn Hoheisel in all aspects of its planning and execution. Gary Brown and Theron DesRosier from the WSU Center for Teaching Learning and Technology (CTLT; https://my.wsu.edu/portal/page?_pageid=177,1&_dad=portal&_schema=PORTAL) will participate in roundtables and workshops to develop appropriate tools for evaluating the outcome of this project pre- and post-implementation. Through these assessment tools we will document the degree of knowledge gained from the face-to-face meetings, instructional materials and other online resources. An overall report will be published in a peer-reviewed journal (e.g., Journal of Education or Extension). Finally, from these workshops, online modules will be developed that incorporate video and PowerPoint presentations using Adobe Acrobat Connect Professional (e.g., http://breeze.ucdavis.edu/p35493900/).

Additionally, it is very important that breeders, stakeholders and the general public are trained in the usefulness of tfGDR. We will address this final portion of the information conduit by establishing a website incorporating a public aspect, highlighting the utility of tfGDR in successful research stories that impact consumers, explaining the terminology at an appropriate level and including tfGDR training from the workshop. These extension and outreach efforts will greatly enhance research and extension programs of horticultural specialty crops. Utilizing tfGDR in the process of breeding superior plant selections will distinguish the United States amongst others in the world market for horticultural specialty crops. In addition, in association with the continued growth of horticultural crop production, it will lead to more jobs with higher incomes, which in turn will create economic development and prosperity, enhancing the quality of life in rural areas.

Expected Outcomes and Utility:

1. Feedback solicited from tfGDR training workshops will be used to target future community needs.

2. Increased awareness of tfGDR within the research and extension professional community through yearly tfGDR training workshops held at the PAG meeting.

3. Establishment of an effective conduit of information between researchers, extension professionals and stakeholders.

4. Evaluation of extension and outreach efforts will be documented to assess effectiveness of communication between research, extension professionals, and stakeholder groups.

5. Examples of tfGDR utilization in improving specialty crop breeding efforts and ultimately food quality will build support amongst stakeholders and the general public.

DETAILED OUTREACH PLAN

Date Activity

Jan. 2010 Initial workshop at Plant and Animal Genome Annual Meeting (San Diego, CA) for community of plant breeders and research scientists to provide feedback on inclusion of tfGDR in breeding, genetics and genomic programs.

June 2010 Development of material on genetics, genomics and bioinformatics for inclusion in the Plant Breeding and Genomics Community of Practice for eXtension in concert with other Coordinated Agriculture Projects (ongoing throughout project).

Nov. 2010 Jamboree for extension professionals, research scientists and stakeholders on genetics, genomics and breeding to be held in Florida. Extension and outreach leader will work with local programs to coordinate workshop planning. Program assessment will include facilitated discussions and implementation of survey instruments.

Dec. 2010 Genetics, Genomics, and Breeding Workshop (explanation of translational genomics) in cooperation with the Washington State Horticultural Association in Yakima, Washington. Program assessment will include facilitated discussions and implementation of survey instruments.

Jan. 2011 Jamboree for extension professionals, research scientists and stakeholders on genetics, genomics and breeding to be held in Washington State. Extension and outreach leader will work with local programs to coordinate workshop planning.

Jan. 2011 “Train the Trainer” Workshop at Plant and Animal Genome Annual Meeting (San Diego, CA) for community of plant breeders and research scientists.

Publication of developed material for the GDR website (http://www.bioinfo.wsu.edu/gdr/) that explains expansion and benefits of database.

June 2011 Publication of multimedia presentations on the public benefits afforded by genetics, genomics, and breeding in cooperation with the Plant Breeding and Genomics CoP on eXtension.

July 2011 Jamboree for extension professionals, research scientists and stakeholders on genetics, genomics and breeding to be held in South Carolina. Extension and outreach leader will work with local programs to coordinate workshop planning. Program assessment will include facilitated discussions and implementation of survey instruments.

Jan. 2012 “Train the Trainer” Workshop at Plant and Animal Genome Annual Meeting (San Diego, CA) for community of plant breeders and research scientists.

Aug. 2012 Presentation at the Annual Meeting of the American Society for Horticultural Science (Miami, FL) on use and implementation of tfGDR and benefits to research and extension community.

Jan. 2013 Final presentation at the Plant and Animal Genome Annual Meeting on use and implementation of tfGDR and benefits to research and extension community, including reports on evaluation of program effectiveness.

July 2013 Final presentation at the Annual Meeting of the American Society for Horticultural Science (Palm Desert, CA) on use and implementation of tfGDR and benefits to research and extension community, including reports on evaluation of program effectiveness.


OVERALL PROJECT ASSESSMENT

We will use various avenues to assess the impact of the proposed project. Our website will include a form for users to submit feedback at any time. We will also run surveys to test usability and solicit new ideas. The surveys will be performed quarterly and the results will be discussed in the Advisory Panel meeting. Dr. Brown and his team will be responsible for developing the surveys and analyzing the results. They will also develop tools to assess the impact of the workshops and the overall project. We will host forums for users to discuss subjects such as genome annotation, comparative mapping, fruit quality traits/QTLs, breeding and fruit tree growing. We will maintain email lists for rapid communication and publish quarterly newsletters to notify users of new developments in tfGDR and the Rosaceae and Citrus communities. We will provide quarterly updates of our progress to the Advisory Panel and hold an annual Advisory Panel meeting where we will discuss progress and challenges with the project. Finally, we will use Google Analytics to generate monthly reports on database use.

POTENTIAL PITFALLS OF THE OVERALL PROJECT

Potential pitfalls for Objective 1 include, but are not limited to, data incompatibilities, data availability and data reliability. Data incompatibility will be minimized by using the modular and ontology-driven database schema Chado. Reliability problems will be minimized because curation is done by the data generator or an expert and re-examined by the tfGDR team.

In Objectives 2 and 3, pitfalls include using technologies that do not align properly with the use cases, availability of developer resources, underestimation of development time, and availability of computer resources. We will overcome these by building extensive use cases and by testing with sample data. We will also reduce the time and cost of building the infrastructure by using open-source software and by constructing community-driven interfaces.

Finally, for Objective 4, with large outreach programs, pitfalls will include problems with logistics and failure to identify the correct audience to have the greatest impact and how best to teach the material. This will be overcome by extensive planning and dialogue with the potential audience for the outreach program in the development stage.

SUSTAINABILITY PLAN

As the major repository for Rosaceae and Citrus genomics, genetics and breeding data, we will run full tape backups and incremental nightly backups of the data and of the code in our Subversion repository. As a community we must address the long term sustainability of database resources. To this end, we will implement an infrastructure that makes it easier to transfer the resource into a larger plant repository or for a collaborator to assume management should it be necessary.
