ELIGIBILITY STATEMENT
The PI, Shin-Han Shiu, is an Assistant Professor in the Department of Plant Biology whose appointment started on January 1, 2006. The proposed studies are planned as the major components of an NSF grant proposal to be submitted to the Genes and Genome Systems program in late 2007 or early 2008. Part of the content was submitted as part of an NSF proposal earlier this year. Although the panel expressed enthusiasm and rated the proposal in the highly meritorious category, it was not funded and was returned with some specific suggestions. We have developed this IRGP proposal to obtain additional preliminary data for the next round of funding applications. Because the New Faculty Grants Program has the stated goal of supporting proposals with the potential to develop into externally funded programs, our proposed objectives are consistent with the intention of the IRGP.

PROJECT STATEMENT
Our goals are to use bioinformatic approaches to identify novel small protein genes in Arabidopsis thaliana and to experimentally verify the 10 top candidates. The proposed studies will generate a computational tool for novel gene prediction whose performance is demonstrated by supporting experimental evidence.

ABSTRACT
The overall goal of the proposed study is to uncover small novel coding sequences of 90 to 300 bp (mini-ORFs) in the Arabidopsis genome. Advances in whole-genome expression profiling have led to the discovery that thousands of genes in both human and Arabidopsis are left unannotated by most current gene finders. This is due to the conservative nature of gene finders, which integrate multiple properties of known genes. In addition, the question of whether these novel genes code for proteins remains unresolved, because they have been verified by evidence of transcription but not of protein production.
Small protein-coding genes, several of which are known to be important for plant development and stress responses, are particularly difficult to uncover because of their lower levels of expression and the lack of statistical power during the prediction phase. The objectives of this research plan are to (1) detect mini-ORFs in the intergenic regions of the Arabidopsis genome using two computational approaches, hexamer composition bias and purifying selection, and (2) validate ten predicted mini-ORFs by a combination of RT-PCR, sequencing and, most importantly, analysis of translational fusions. This application combines computational and experimental approaches that will overcome current limitations in finding novel mini-ORFs not only in plants but also in other eukaryotes, including humans. The outcome of the proposed study will not only contribute to a more thorough annotation of the Arabidopsis genome but also provide a practical assessment of the relationship between coding potential, transcription, and translation. Our findings will help answer the important question of how many novel transcriptional units are protein-coding genes. In addition, we will generate a set of rigorously verified novel mini-ORFs. These mini-ORFs hold the key to a better understanding of why small coding sequences are missed by current gene-finding methods; therefore, our work will lead to further refinement of in silico gene-finding methods. Furthermore, the experimental verification of novel mini-ORFs will provide a set of novel genes whose molecular functions can be characterized, which was not possible before. In addition to the intellectual merits, the proposed activities have broad impacts on the dissemination of science and technology and on the integration of research and education.
The rapid influx of sequence and functional genomic data has resulted in exciting discoveries and a great need for a new generation of biologists conversant in the language of both computational and experimental biology. The planned research integrates several different fields, including bioinformatics, molecular evolution, statistics, and molecular genetics. This highly interdisciplinary research program will provide a unique training environment that exposes students to both computational and experimental sciences. To broaden the dissemination of scientific and technological understanding, the PI has formed a partnership with the East Lansing Public Library to develop outreach activities aimed at enhancing the general public’s understanding of science, evolution, and genomics, using the proposed research project as an example.
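
ILLUSTRATIVE SKETCH
To make the first computational objective in the abstract more concrete, the following is a minimal sketch of how candidate mini-ORFs of 90 to 300 bp might be enumerated and scored for hexamer composition bias. It is not the proposed prediction tool. The function names (`find_mini_orfs`, `hexamer_counts`, `hexamer_score`), the forward-strand-only scan, the choice to count the stop codon toward ORF length, the pseudocount, and the log-odds scoring scheme are all assumptions made for illustration. Input sequences are assumed to contain only A, C, G, and T.

```python
import math
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_mini_orfs(seq, min_len=90, max_len=300):
    """Return (start, end, frame) for ATG..stop ORFs on the forward strand.

    Assumption: ORF length is measured in bp from the A of ATG through the
    last base of the stop codon, and must fall within [min_len, max_len].
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if min_len <= end - start <= max_len:
                    orfs.append((start, end, frame))
                start = None  # look for the next ORF in this frame
    return orfs


def hexamer_counts(seqs, step=3):
    """Count hexamers in a list of sequences, sampling every `step` bases.

    step=3 counts in-frame hexamers (for coding training sequences);
    step=1 counts all overlapping hexamers (for a non-coding background).
    """
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(0, len(s) - 5, step):
            counts[s[i:i + 6]] += 1
    return counts


def hexamer_score(orf, coding, background, pseudo=1.0):
    """Sum of log-odds (coding vs. background) over in-frame hexamers.

    Assumption: a flat pseudocount over all 4096 possible hexamers is added
    so that unseen hexamers do not produce log(0).
    """
    n_coding = sum(coding.values()) + pseudo * 4096
    n_background = sum(background.values()) + pseudo * 4096
    orf = orf.upper()
    score = 0.0
    for i in range(0, len(orf) - 5, 3):
        h = orf[i:i + 6]
        score += math.log((coding[h] + pseudo) / n_coding)
        score -= math.log((background[h] + pseudo) / n_background)
    return score
```

In such a scheme, the coding table would be built from annotated coding sequences and the background table from presumed non-coding intergenic sequence; candidates with a positive score resemble known coding sequence more than background. A real analysis would also scan the reverse strand and would need a principled score threshold, neither of which is shown here.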