DC Bioinformatics I

Biology 3492
Spring 2008
-----------------------------------------------------------------------------------------------------------------
Bioinformatics analysis of Tetrahymena genes
----------------------------------------------------------------------------------------------------------------
Understanding what’s in a sequence.
Once a gene is identified, one of the first steps in trying to understand the biological function of
its encoded protein is to use bioinformatics to look for proteins of known function that share
structure with it. A number of bioinformatic tools exist to assist scientists in such analyses.
A. One of the first tests you will want to do is to see what proteins exist in other organisms that
have sequence similarity to your protein. These could be direct homologues or ones with shared
functional domains. This is a simple analysis using “BLAST” tools that perform pairwise
comparisons of a query (your protein’s sequence) against all the known protein sequences in the
GenBank database. Go to the BLAST homepage:
NCBI BLAST homepage
http://www.ncbi.nlm.nih.gov/blast/Blast.cgi
Select ‘protein BLAST’, paste in your protein sequence and search.
Does the BLAST search identify any conserved protein domains? Note their location in your
protein.
Look at your ‘hits’. Is there a common class of proteins that are similar to your protein
sequence? Save your search as a ‘Page Source’ for future reference.
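Before pasting a sequence into the BLAST form, it can help to tidy it up first. The sketch below
is one possible way to do that in Python; it is not part of the BLAST site. The function name
clean_protein_sequence, the allowed residue set, and the example sequence are all made up for
illustration.

```python
# The 20 standard one-letter amino acid codes, plus X (unknown) and * (stop).
# Assumption: ambiguity codes such as B or Z are treated as errors here;
# add them to this set if your sequence legitimately contains them.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY" + "X*")


def clean_protein_sequence(raw):
    """Return the sequence as one uppercase string.

    FASTA header lines (starting with '>'), whitespace, and position
    numbers are dropped. Any other unexpected character raises ValueError.
    """
    residues = []
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA header lines
        for ch in line:
            if ch.isspace() or ch.isdigit():
                continue  # skip spacing and position numbers
            residues.append(ch.upper())
    sequence = "".join(residues)
    unexpected = sorted(set(sequence) - VALID_RESIDUES)
    if unexpected:
        raise ValueError("Unexpected characters in sequence: " + "".join(unexpected))
    return sequence


# Made-up example in a numbered, space-separated layout (not a real gene).
example = """>hypothetical_protein
  1 MKTAYIAKQR QISFVKSHFS
 21 RQLEERLGLI EVQ
"""
print(clean_protein_sequence(example))
```

The cleaned string can then be pasted directly into the query box.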
B. Homology suggests that proteins share function, but not necessarily that they are involved in
the same process. Orthologs are proteins in different organisms that appear to be the same
protein, sharing a direct line of descent from a common ancestral protein (and thus are likely
doing the same thing). Organisms that share closer evolutionary histories are more likely to share
orthologous proteins. A database of ciliate orthologs has been developed to assist in this analysis.
Ciliate Ortholog Database
http://oxytricha.princeton.edu/COD/
Select ‘BLASTO’, paste in your protein sequence and search. Does your protein fall into an
ortholog group? Save your search as a ‘Page Source’ for future reference.
If you have a direct ortholog group, select the ClustalW alignment display. Save for future
reference.
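When reading a saved ClustalW alignment, the line of ‘*’ marks under the sequences flags columns
where every sequence has the same residue. The sketch below shows that idea in Python for
sequences that are already aligned. The function name conserved_columns and the three short
sequences are made up for illustration, and only the ‘*’ mark is reproduced (not the ‘:’ and ‘.’
marks ClustalW uses for similar residues).

```python
def conserved_columns(aligned):
    """Return a string with '*' under every fully conserved column.

    `aligned` is a list of equal-length strings taken from an existing
    alignment, with '-' marking gaps. Columns with a gap or with more
    than one residue get a space.
    """
    lengths = {len(seq) for seq in aligned}
    if len(lengths) != 1:
        raise ValueError("Aligned sequences must all have the same length")
    marks = []
    for column in zip(*aligned):  # walk the alignment column by column
        if "-" not in column and len(set(column)) == 1:
            marks.append("*")
        else:
            marks.append(" ")
    return "".join(marks)


# Made-up toy alignment (not real ciliate sequences).
alignment = [
    "MKT-AYIAK",
    "MKTQAYLAK",
    "MRTQAYIAK",
]
marks = conserved_columns(alignment)
for seq in alignment:
    print(seq)
print(marks)
print("Fully conserved columns:", marks.count("*"), "of", len(marks))
```

Stretches with many conserved columns are good candidates for functionally important regions.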
C. While BLAST searches allow us to find homologs as well as conserved protein domains, any
one database can miss important information. The SMART database is a useful tool to look for
conserved domains, not-so-conserved domains, and other functional motifs in one’s protein
sequence.
SMART modular domain database
http://smart.embl-heidelberg.de/
Paste in your sequence and search for: Outlier homologues, PFAM domains, internal repeats,
intrinsic protein disorder. Save your search. If you want to save your diagram, Print and Save as
a PDF.
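To get a feel for what an ‘internal repeat’ is, the sketch below lists short stretches that occur
more than once, letter for letter, in a sequence. This is only a crude illustration and is not
how SMART detects repeats, which can also find repeats that have diverged. The function name
exact_repeats, the window length k, and the toy sequence are made up for illustration.

```python
from collections import defaultdict


def exact_repeats(sequence, k=4):
    """Return {stretch: [start positions]} for every stretch of length k
    that occurs more than once. Positions are 1-based, and overlapping
    occurrences are counted."""
    positions = defaultdict(list)
    for start in range(len(sequence) - k + 1):
        positions[sequence[start:start + k]].append(start + 1)
    return {kmer: starts for kmer, starts in positions.items() if len(starts) > 1}


# Made-up toy sequence in which the stretch GTPEEK occurs twice.
toy = "MAGTPEEKLLGTPEEKQRS"
for kmer, starts in exact_repeats(toy).items():
    print(kmer, "at positions", starts)
```

Comparing such a list with the repeats SMART reports shows how much more sensitive the real tool is.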
Database links
NCBI Blast homepage
http://www.ncbi.nlm.nih.gov/blast/Blast.cgi
NCBI Protein BLAST
http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?PAGE=Proteins&PROGRAM=blastp&BLAST_PROGRAMS=blastp&PAGE_TYPE=BlastSearch&SHOW_DEFAULTS=on
Tetrahymena Genome Database
www.ciliate.org
Ciliate Ortholog Database
http://oxytricha.princeton.edu/COD/
Tetrahymena Genome Blast Search
http://tigrblast.tigr.org/er-blast/index.cgi?project=ttg
Paramecium Genome Database
http://paramecium.cgm.cnrs-gif.fr/
ClustalW sites for multiple sequence alignments
http://www.ebi.ac.uk/clustalw/
http://clustalw.genome.jp/
[Easier to download trees]
SMART modular domain database
http://smart.embl-heidelberg.de/
Pfam database of conserved protein motifs (two mirrored sites)
http://pfam.sanger.ac.uk/
http://pfam.janelia.org/
PubMed (for literature searches)
http://www.ncbi.nlm.nih.gov/sites/entrez?db=pubmed