Final annotation days - California Lutheran University

Molecular biology
Final things to do
Today we should finish the annotation project. It will be submitted next week.
This week
1. Check the synteny of your fosmid against D. melanogaster. This is done using the UCSC
Genome Browser at http://genome.ucsc.edu/. Use BLAT to enter your sequence and find the
correct region.
2. Go to FlyBase (http://flybase.bio.indiana.edu/blast/) to determine the gene name and try
to get the RefSeq mRNA ID of each of your genes (see your BLAST output; it is the number
like NM_001014690).
3. Use the Gene Model Checker program to generate the files needed for submitting the
annotation project (http://gep.wustl.edu, then click on “Projects”, then “Gene Model
Checker”). We are working on the D. erecta control chromosome (3L). When specifying where
the end of the gene is, use the last codon that codes for an amino acid (it does not include
the stop codon). I believe the programs already indicate this as the last position, but you
might check to make sure. Also note that if you have no information on the transcription
start/end sites, just use the translation ones (i.e., beginning of the first exon and end of
the last exon).
For the written report, GEP would like you to highlight any unusual situations (for example,
unusual splice sites or introns that appear to have moved compared to other genes) and what
evidence and reasoning support your model of the genes. Include evidence for all of your
exons, and explain which Genscan predictions do not seem to be valid exons and why.
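The end-of-gene rule in step 3 can be checked by hand with a small sketch like the one below. It is not part of the GEP tools; the function name and arguments are made up for illustration, and it assumes you have the spliced coding sequence as a string plus the genomic coordinate of its last base.

```python
# Hypothetical helper, not part of the Gene Model Checker.
# Assumes `cds` is the spliced coding sequence read 5' to 3' on the
# gene's own strand, and `last_base_pos` is the genomic coordinate of
# the final base of that sequence.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def last_coding_position(cds, last_base_pos, strand="+"):
    """Return the coordinate of the last base that codes for an amino acid."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    if cds[-3:] in STOP_CODONS:
        # The sequence still includes the stop codon, so step back three
        # bases toward the start of the gene. On the minus strand that
        # means a larger genomic coordinate.
        return last_base_pos - 3 if strand == "+" else last_base_pos + 3
    # No trailing stop codon: the coordinate already marks the last
    # amino-acid codon.
    return last_base_pos
```

For example, for the nine-base sequence ATGAAATAA whose last base is at position 1009 on the plus strand, the function returns 1006, the last base of the AAA codon.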
Next week
4. Fill out the online Post survey at the GEP web site. This is required, as it is needed to get
further grants to continue these projects (http://gep.wustl.edu).
5. To submit the official annotation projects, I need four files emailed to me (revie @
clunet.edu):
a. Project report (the form is in the annotation project file, which ends in .txt).
b. One GFF file that contains all your gene models in GFF format (text file that ends in .gff).
c. One file that contains all your peptide sequences (text file that ends in .pep).
d. One file that contains all your nucleotide sequences [CDS] (text file that ends in .fasta).
Note: The project report file is in your contig’s directory. To save a file with an ending
other than .txt, you must type the ending in yourself, e.g., myfosmid.gff
When you put in your gene models, you will get files generated for each gene. You need to
combine the gene models into one file for the above submission files. Generate text files for
each gene. When you save the data, include the fosmid number in the name of the file, and
change the extension to the one listed above for the appropriate file (e.g.
report_derecta_3Lextended_Jan2008_fosmid13.txt for item a, where 13 is the number of
your fosmid/contig).
The annotation report due on the last day of class is different from these files. That report
provides evidence, analysis, etc. that are not needed by Washington University.
6. Assuming your results are good enough, you can get your names on the publication(s) that
later come out. It may take 1-2 years, so you need to give me your email address, and
update me when you change it. If we do not have your email address when the paper is
submitted, you will not be included (it is a requirement of the journals that publish the
papers).
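Combining the per-gene files in item 5 can be done by copying and pasting in a text editor, or with a short script. The sketch below is one possible way, not a GEP requirement. The function name and the file names in the usage note are hypothetical, and it assumes the per-gene files are plain text that can simply be joined end to end. If your per-gene GFF files each begin with their own header lines, check the combined file by hand.

```python
from pathlib import Path

def combine_files(parts, combined):
    """Join several text files, in the order given, into one file.

    `parts` is a list of per-gene file names and `combined` is the name
    of the single file to submit. Both are supplied by the caller.
    """
    with open(combined, "w") as out:
        for part in parts:
            text = Path(part).read_text()
            # Make sure one file's last line does not run into the
            # next file's first line.
            if text and not text.endswith("\n"):
                text += "\n"
            out.write(text)
```

With hypothetical per-gene files gene1.pep and gene2.pep, calling combine_files(["gene1.pep", "gene2.pep"], "fosmid13.pep") would write one .pep file containing both peptide sequences. The same call works for the .gff and .fasta files.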