Biology 3492 Spring 2008
-----------------------------------------------------------------
Bioinformatics analysis of Tetrahymena genes
-----------------------------------------------------------------

Understanding what’s in a sequence. Once a gene is identified, one of the first steps in understanding the biological function of its encoded protein is to use bioinformatics to look for proteins of known function that share structural features with it. A number of bioinformatic tools exist to assist scientists in such analyses.

A. One of the first tests one will want to do is to see which proteins in other organisms have sequence similarity to your protein. These could be direct homologues or proteins that share functional domains with yours. This is a simple analysis using BLAST (Basic Local Alignment Search Tool), which performs pairwise comparisons of a query (your protein's sequence) against all known protein sequences in the GenBank database.

Go to the BLAST homepage:
NCBI BLAST homepage http://www.ncbi.nlm.nih.gov/blast/Blast.cgi
Select ‘protein BLAST’, paste in your protein sequence, and search.
Does the BLAST search identify any conserved protein domains? Note their location in your protein.
Look at your ‘hits’. Is there a common class of proteins that are similar to your protein sequence?
Save your search as a ‘Page Source’ for future reference.

B. Homology suggests that proteins share function, but not necessarily that they are involved in the same process. Orthologs are proteins in different organisms that appear to be the same protein, sharing a direct line of descent from a common ancestral protein (and thus likely doing the same thing). Organisms with closer evolutionary histories are more likely to share orthologous proteins. A database of ciliate orthologs has been developed to assist in this analysis.
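To make the difference between a homolog and an ortholog concrete: one widely used computational heuristic for calling putative orthologs is the "reciprocal best hit" (RBH). Protein a in organism A and protein b in organism B are paired only if b is a's top-scoring hit in B and a is b's top-scoring hit in A. The minimal Python sketch below assumes you already have pairwise similarity scores (for example, BLAST bit scores) in both directions. The gene names and scores are hypothetical, and this is not necessarily the method the Ciliate Ortholog Database uses.

```python
"""Reciprocal best hits (RBH): a common heuristic for calling putative orthologs.

Assumption: scores are hypothetical pairwise similarity values
(e.g. BLAST bit scores), where a higher score means a better match.
"""
from typing import Dict, List, Tuple

# scores[(query, subject)] -> similarity score from one pairwise comparison
Scores = Dict[Tuple[str, str], float]


def best_hits(scores: Scores) -> Dict[str, str]:
    """For each query, return the subject with the highest score."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> List[Tuple[str, str]]:
    """Return (a, b) pairs where a's best hit is b AND b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical proteins: Tt_* from Tetrahymena, Pt_* from Paramecium.
tet_vs_par = {
    ("Tt_A", "Pt_1"): 410.0,
    ("Tt_A", "Pt_2"): 120.0,
    ("Tt_B", "Pt_2"): 95.0,
    ("Tt_B", "Pt_1"): 90.0,
}
par_vs_tet = {
    ("Pt_1", "Tt_A"): 405.0,
    ("Pt_1", "Tt_B"): 88.0,
    ("Pt_2", "Tt_A"): 118.0,
    ("Pt_2", "Tt_B"): 96.0,
}

print(reciprocal_best_hits(tet_vs_par, par_vs_tet))  # [('Tt_A', 'Pt_1')]
```

In this made-up example, Tt_B's best hit is Pt_2, so the two are clearly homologous. However, Pt_2's best hit back is Tt_A, so the pair is not called an ortholog pair. This mirrors the point above that similarity alone does not establish orthology.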
Ciliate Ortholog Database http://oxytricha.princeton.edu/COD/
Select ‘BLASTO’, paste in your protein sequence, and search.
Does your protein fall into an ortholog group? Save your search as a ‘Page Source’ for future reference.
If your protein belongs to a direct ortholog group, select the ClustalW alignment display. Save it for future reference.

C. While BLAST searches allow us to find homologs as well as conserved protein domains, any one database can miss important information. The SMART database is a useful tool for finding conserved domains, less-conserved domains, and other functional motifs in a protein sequence.

SMART modular domain database http://smart.embl-heidelberg.de/
Paste in your sequence and search for: outlier homologues, PFAM domains, internal repeats, and intrinsic protein disorder.
Save your search. If you want to save your diagram, print it and save it as a PDF.

Database links
NCBI BLAST homepage http://www.ncbi.nlm.nih.gov/blast/Blast.cgi
NCBI Protein BLAST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?PAGE=Proteins&PROGRAM=blastp&BLAST_PROGRAMS=blastp&PAGE_TYPE=BlastSearch&SHOW_DEFAULTS=on
Tetrahymena Genome Database www.ciliate.org
Ciliate Ortholog Database http://oxytricha.princeton.edu/COD/
Tetrahymena Genome Blast Search http://tigrblast.tigr.org/er-blast/index.cgi?project=ttg
Paramecium Genome Database http://paramecium.cgm.cnrs-gif.fr/
ClustalW sites for multiple sequence alignments
  http://www.ebi.ac.uk/clustalw/
  http://clustalw.genome.jp/ [Easier to download trees]
SMART modular domain database http://smart.embl-heidelberg.de/
Pfam database of conserved protein motifs (two mirrored sites)
  http://pfam.sanger.ac.uk/
  http://pfam.janelia.org/
PubMed (for literature searches) http://www.ncbi.nlm.nih.gov/sites/entrez?db=pubmed