PROJECT SUMMARY
TRPGR: Improving the Annotations of Plant Proteins and ESTs via Protein Domain Discovery and Phylogenomics
Shin-Han Shiu, Dept. of Plant Biology, Michigan State University

The overall goal of this project is to improve the annotation of plant protein sequences and ESTs in two ways. First, we will identify additional conserved protein domains in these sequences. Second, we will transfer functional information from genes of model species such as Arabidopsis thaliana to plant protein and EST sequences. Conserved regions that represent domains in plant proteins are currently not well studied. As a result, the known protein domains in existing databases provide only a fragmentary description of plant protein space. In addition, knowledge of gene function is concentrated mostly in model species. The number of sequenced plant genomes is increasing, and over 9 million plant ESTs are available. The challenge of determining the functions of these plant sequences can be met by first generating hypotheses about gene function with phylogenomic approaches, that is, by inferring function from the evolutionary relationships between sequences. The proposed studies have four major aims: (1) identify protein domains based on conservation among plant sequences, (2) classify plant proteins and ESTs into domain families, (3) infer plant protein and EST functions with phylogenomics, and (4) construct the Domain Database of Plant Proteins (DoPP) for broader dissemination of research data. We will identify a comprehensive set of plant protein domains, including regions that are not adequately covered by current domain databases. Because of this, the proposed research will significantly improve the annotation of plant sequences. In addition, we will transfer functional information from model eukaryotes to genes from plant species important for agricultural, ecological, and/or evolutionary applications.
The results of the proposed studies will also be valuable for advancing comparative and evolutionary genomics. In particular, a dataset of plant protein domains will allow us to better investigate rates of domain sequence evolution and the frequency of domain shuffling. The functional annotations will also provide a dataset for assessing functional conservation and divergence among plant duplicate genes. The DoPP database will benefit the plant research community by providing domain and functional annotation information for their sequences of interest. Moreover, the project will foster an interdisciplinary training environment for students from different backgrounds. High school students, undergraduates, graduate students, and postdoctoral researchers trained in biological science, mathematics, statistics, or computer science will be recruited to work on sub-projects of varying complexity. Their interactions will give high school and undergraduate students a realistic view of how science is done. They will also allow graduate students and postdoctoral researchers to learn how to mentor. To broaden public understanding of science and technology beyond the campus, the PI has formed a partnership with the East Lansing Public Library to develop outreach activities. These activities will use the proposed research project as an example to enhance the general public’s understanding of science, evolution, and genomics.