NGS Bioinformatics Workshop
1.5 Tutorial – Genome Annotation
April 5th, 2012
IRMACS 10900
Facilitator: Richard Bruskiewich
Adjunct Professor, MBB

Workflow for Today
Prepare to visualize annotation
Get a genomic sequence from GenBank
Repeat-mask it

Retrieve a genomic sequence…
Retrieve a relatively small (<100 kb) eukaryotic genomic sequence clone from GenBank
Query the Nucleotide division, e.g. Arabidopsis BAC clone (HE601748.1)
Select FASTA, then save it to a file in “FASTA” format (rename the file if needed)

BLAST is low-hanging fruit…
Use BLAST to quickly survey for similar sequences
Megablast against the nucleotide database, e.g. is HE601748 closest to A. thaliana chromosome 5?
Megablast against the reference RNA sequence database

Repeat Masking
Upload the clone file to RepeatMasker on the web and run it with appropriate parameters: http://www.repeatmasker.org/cgi-bin/WEBRepeatMasker
Save the results (including the masked sequence) to your computer

ab initio Gene Predictions
Genscan: http://genes.mit.edu/GENSCAN.html
Cut and paste the results as text into a file
Fgenesh: www.softberry.com

Blast2GO
http://www.blast2go.com
An annotation workbench based on Gene Ontology (GO) terms
First, save the predicted peptides (e.g. from Fgenesh)
The FASTA headers need fixing to assign proper identifiers (could write a script?)
Start the Blast2GO workbench (via Java Web Start)
Load in the peptides
Do the analysis, e.g. run BLASTP, GO, annotation, InterPro, etc.
See www.geneontology.org for details on GO
See http://www.ebi.ac.uk/interpro/ for InterPro info

EMBOSS
European Molecular Biology Open Software Suite (EMBOSS): http://emboss.sourceforge.net
Download and install the version for your platform (e.g. Linux, Mac OS X, Windows…)
Decide what to do: http://emboss.sourceforge.net/apps/groups.html
Let’s try a CpG island plot (cpgplot)

Study Genes by Comparative Genomics
JGI VISTA toolkit: http://genome.lbl.gov/vista
GenomeVISTA
rVISTA
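The Blast2GO step asks whether a script could fix the FASTA headers of the predicted peptides before loading them. Below is a minimal Python sketch of one way to do that. It assumes only that each peptide record begins with a ">" header line. The script name relabel_fasta.py, the function relabel_fasta, the PREFIX argument and the zero-padded numbering scheme are hypothetical illustrative choices, not part of Blast2GO or Fgenesh. Every header is replaced with a short, unique, space-free ID, and the original header text can optionally be kept as a free-text description.

```python
import sys
from typing import Iterable, Iterator


def relabel_fasta(lines: Iterable[str], prefix: str,
                  keep_description: bool = False) -> Iterator[str]:
    """Yield FASTA lines with every header replaced by a unique ID.

    New headers look like ">{prefix}_001", ">{prefix}_002", ... in file order.
    If keep_description is True, the old header text follows the new ID
    after a single space. Sequence lines pass through unchanged; blank
    lines are dropped.
    """
    count = 0
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            count += 1
            new_id = f"{prefix}_{count:03d}"  # hypothetical naming scheme
            old = line[1:].strip()
            if keep_description and old:
                yield f">{new_id} {old}"
            else:
                yield f">{new_id}"
        elif line:
            yield line


# Hypothetical command line: python relabel_fasta.py INPUT.fasta PREFIX > OUTPUT.fasta
# (does nothing unless exactly two arguments are given)
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as handle:
        for out_line in relabel_fasta(handle, sys.argv[2]):
            print(out_line)
```

For example (file names are placeholders), `python relabel_fasta.py fgenesh_peptides.fasta HE601748 > peptides_fixed.fasta`. Numbering in file order keeps each new ID traceable back to the order of genes in the prediction output.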
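For the cpgplot step, it may help to see the kind of statistic such a plot shows. The sketch below is a simplified Python re-implementation, not cpgplot itself. For each sliding window it computes the observed/expected CpG ratio, (CpG count * window length) / (C count * G count), and the percent C+G. It then flags windows that exceed both thresholds. The function names are hypothetical. The defaults (window of 100, obs/exp above 0.6, C+G above 50%) are assumed to match cpgplot's documented defaults, so check the cpgplot documentation before relying on them. Unlike cpgplot, this sketch does not merge flagged windows into islands of a minimum length.

```python
from typing import List, Tuple


def cpg_obs_exp(window: str) -> float:
    """Observed/expected CpG ratio: (CpG count * length) / (C count * G count)."""
    seq = window.upper()
    c = seq.count("C")
    g = seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    return cpg * len(seq) / (c * g)


def gc_percent(window: str) -> float:
    """Percent of positions in the window that are C or G."""
    seq = window.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("C") + seq.count("G")) / len(seq)


def cpg_windows(seq: str, window: int = 100, step: int = 1,
                min_oe: float = 0.6,
                min_gc: float = 50.0) -> List[Tuple[int, float, float, bool]]:
    """Slide a window along seq; return (start, obs/exp, %GC, passes) per window."""
    results = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        oe = cpg_obs_exp(w)
        gc = gc_percent(w)
        results.append((start, oe, gc, oe > min_oe and gc > min_gc))
    return results
```

Note that a repeat-masked sequence contains runs of N. These still count toward the window length here, which lowers %GC in masked regions.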