Molecular biology

Final things to do

Today we should finish the annotation project. It will be submitted next week.

This week

1. Check the synteny of your fosmid against D. melanogaster. This is done using the UCSC browser at http://genome.ucsc.edu/. Use BLAT to enter your sequence and find the correct region.

2. Go to the Drosophila FlyBase (http://flybase.bio.indiana.edu/blast/) to determine the gene name, and try to get the RefSeq mRNA ID of each of your genes (see your BLAST output; it is the number that looks like NM_001014690).

3. Use the GeneChecker program to generate the files needed for submitting the annotation project (go to http://gep.wustl.edu, then click on “Projects”, then “Gene Model Checker”). We are working on the D. erecta control chromosome (3L).

   When specifying where the end of the gene is, use the last codon that codes for an amino acid (the end does not include the stop codon). I believe the programs already indicate this as the last position, but you should check to make sure. Also note that if you have no information on the transcription start/end sites, just use the translation ones (i.e., the beginning of the first exon and the end of the last exon).

   For the written report, GEP would like you to highlight any unusual situations (for example, unusual splice sites or introns that appear to have moved compared to other genes) and to explain what evidence and reasoning support your gene models. Include evidence for all of your exons, and for any GenScan predictions that do not seem to be valid exons.

Next week

4. Fill out the online Post survey at the GEP web site (http://gep.wustl.edu). This is required, as it is needed to get further grants to continue these projects.

5. To submit the official annotation projects, I need four files emailed to me (revie @ clunet.edu):

   a. Project report (the form is in the annotation project file, which ends in .txt).
   b. One GFF file that contains all your gene models in GFF format (text file that ends in .gff).
   c. One file that contains all your peptide sequences (text file that ends in .pep).
   d. One file that contains all your nucleotide sequences [CDS] (text file that ends in .fasta).

   Note: The project report file is in your contig’s directory. To save a file with an extension other than .txt, you must type the extension in yourself, e.g., myfosmid.gff.

   When you put in your gene models, files are generated for each gene. Generate text files for each gene, then combine the gene models into one file for each of the submission files above. When you save the data, include the fosmid number in the name of the file, and change the extension to the one listed above for the appropriate file (e.g., report_derecta_3Lextended_Jan2008_fosmid13.txt for item a, where 13 is the number of your fosmid/contig).

   The annotation report due the last day of class is different from these files. That report provides evidence, analysis, etc. that are not needed by Washington University.

6. Assuming your results are good enough, you can get your names on the publication(s) that later come out. It may take 1-2 years, so you need to give me your email address and update me when you change it. If we do not have your email address when the paper is submitted, you will not be included (it is a requirement of the journals that publish the papers).
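
For step 5, combining the per-gene files by hand in a text editor works fine. If you prefer to script it, the following is a minimal Python sketch, not an official GEP tool. The function names combine_gene_files and submission_name are made up for illustration, and the sketch assumes that each per-gene file is plain text that can simply be placed one after another. If your per-gene GFF files begin with header lines, you may need to remove the repeated headers by hand afterwards.

```python
from pathlib import Path


def submission_name(prefix, fosmid_number, extension):
    """Build a file name that includes the fosmid number and the required extension.

    The naming pattern follows the handout's example; 'prefix' is whatever
    descriptive name you choose (hypothetical, not prescribed by GEP).
    """
    return f"{prefix}_fosmid{fosmid_number}.{extension}"


def combine_gene_files(gene_files, output_path):
    """Concatenate per-gene text files, in the order given, into one file."""
    parts = []
    for path in gene_files:
        text = Path(path).read_text()
        if not text.strip():
            continue  # skip files that are empty or contain only whitespace
        # Strip trailing blank lines so each gene's block ends with one newline
        parts.append(text.rstrip("\n") + "\n")
    output_path = Path(output_path)
    output_path.write_text("".join(parts))
    return output_path
```

For example, calling combine_gene_files(["gene1.pep", "gene2.pep"], submission_name("myfosmid", 13, "pep")) would write a single file named myfosmid_fosmid13.pep; the same call with the .gff and .fasta files produces the other two combined files. Open each result in a text editor to check it before emailing it.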