Practical Exercise

Practical 2 Exploring bioinformatics software, tools and techniques
Introduction
During the first two practical sessions, you explored a number of biological databases
and acquired some database searching skills. Options for finding these databases
include, but are not limited to, (1) searching the Internet with a general-purpose search
engine such as Google, using well-chosen keywords, (2) searching PubMed for
publications about databases, and (3) consulting catalogs of databases, such as the
annual Database Issue of Nucleic Acids Research.
In this practical, you will explore a catalog of available bioinformatics tools, and
apply some of these tools to extract useful information from biological data. Most
tools will have a number of available options to select when setting up your analysis.
The recommended options are usually selected as the default settings. However, they
may not be the best options for the problem you are addressing. It is advisable that
users of bioinformatics tools carefully explore the options available before performing
any analysis. For the purpose of today’s exercise, we will stick to default settings.
Some tools are available at multiple servers. The functionality of the software may differ
slightly from one server to another. For this practical, we will be focusing on software
hosted at NUS.
Objectives in General
By the end of this practical you will be:
- familiar with the scope of bioinformatics tools/software and know where to search
for tools/software relevant to one’s research area
- familiar with the application of bioinformatics tools to address biological problems
Problem Scenario
Your supervisor in the bioinformatics lab is currently working on p53 and he needs your
help for the bioinformatics component of his research. With the skills acquired from the
earlier practical, you have successfully aided your supervisor in surveying the existing
p53 databases. Now he requires your expertise in analyzing the nucleotide/protein
sequence of p53 using bioinformatics tools and techniques currently available so that he
will be able to infer relevant biological information from the primary sequence.
For this purpose, your supervisor is introducing you to the Bioinformatics Links
Directory at the University of British Columbia
(http://www.bioinformatics.ca/links_directory/), which provides a list of available
bioinformatics tools. You are required to browse through the resources listed to
identify appropriate bioinformatics databases and tools that address your supervisor’s
research needs.
Practical Exercise
The Bioinformatics Links Directory: Scope of bioinformatics software
New bioinformatics tools are developed on a regular basis. Some of these are
catalogued at the Bioinformatics Links Directory. You will now browse through the
resources listed there and try to identify tools for a particular analysis. Don’t spend too
much time on this – just try to get an appreciation of the diversity of tools
available.
1 Go to http://www.bioinformatics.ca/links_directory/.
List any three categories of resources that you find particularly
interesting.
2 For each of those three categories, identify two tools that look like they might
do the same thing. For example, find two tools that allow you to perform a
sequence comparison (eg, multiple sequence alignment).
Restrict this to software, i.e. do not list any databases (yes, the Bioinformatics
Links Directory also catalogues databases because of its collaboration with
NAR).
3 Are you able to find software suites that comprise a wide range of
bioinformatics tools for protein and/or DNA sequence analysis?
(hint: “Do-it-all Tools for Proteins” under protein category, and “DNA and
Genomic analysis” under DNA category)
Note: From your previous experience in surveying the databases from the
Nucleic Acids Research annual Database Issue, you should recognize that
the resources listed in the Bioinformatics Links Directory may not be
comprehensive. It is advisable that you try searching in PubMed and Google
if you are not able to find what you are looking for.
Preparing your input file for analysis
1 Using the database query skills that you have acquired, find the human p53
RefSeq protein sequence record. (hint: search NCBI Gene Database for
p53).
Click on NP_000537.3, then change the view from GenBank format to
FASTA format. The FASTA format has a one-line header starting with the >
character that contains a description of the sequence record, followed by
the sequence itself in one-letter code.
2 Save the sequence to a text file by clicking Send → Select: Complete record;
File → Format: FASTA → Create file.
3 Open the file in Notepad or WordPad by right-clicking and selecting Notepad
or WordPad as the preferred application.
4 You will notice that the description line is too long. Most bioinformatics tools
prefer shorter names, so where possible you should rename the description to
something more familiar to you, while noting the original description somewhere
in case you need the accession number or other information. In this case,
rename the description on the first line to “wt human p53 protein” and
re-save your file.
Make sure that the descriptive information (in this case “wt human p53
protein”) is all on the same line as the > symbol, and that the sequence
starts on the next line.
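The renaming step above can also be scripted. The following is a minimal Python sketch, not part of the practical itself: the function name rename_fasta_header and the file names p53.fasta, p53_renamed.fasta and p53_original_header.txt are assumptions chosen for illustration. It replaces the description line of a single-record FASTA file and returns the original description so that the accession number is not lost.

```python
from pathlib import Path
from typing import Tuple


def rename_fasta_header(fasta_text: str, new_description: str) -> Tuple[str, str]:
    """Return (original_description, rewritten_fasta) for a single-record FASTA string."""
    # Drop blank lines and surrounding whitespace so the header is the first line.
    lines = [line.strip() for line in fasta_text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("input does not start with a FASTA header line ('>')")
    original = lines[0][1:]
    sequence_lines = lines[1:]
    if any(line.startswith(">") for line in sequence_lines):
        raise ValueError("expected exactly one sequence record")
    # The new description stays on the same line as '>', the sequence starts on the next line.
    rewritten = "\n".join([">" + new_description, *sequence_lines]) + "\n"
    return original, rewritten


if __name__ == "__main__":
    # Hypothetical file names: adjust them to wherever you saved the NCBI download.
    source = Path("p53.fasta")
    if source.exists():
        original, rewritten = rename_fasta_header(source.read_text(), "wt human p53 protein")
        Path("p53_renamed.fasta").write_text(rewritten)
        # Keep the original descriptor in case the accession number is needed later.
        Path("p53_original_header.txt").write_text(original + "\n")
```

Writing the renamed record to a new file, rather than overwriting the download, keeps the original NCBI record available for reference.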
Task
In this practical, you have been introduced to the Bioinformatics Links Directory, which
contains an extensive list of bioinformatics tools, and the Emboss suite, which
contains a variety of bioinformatics tools for protein and DNA sequence analysis.
Using the bioinformatics tools listed in the Bioinformatics Links Directory and/or the
Emboss suite, analyze the p53 mRNA and protein sequences (use the default program
settings and parameters) to retrieve the following information:
(Hint: For help and information, always refer to the user manuals accompanying the
bioinformatics software and tools)
– PROSITE motifs present in a protein sequence (eg, wt p53)
Protein sequence patterns (or motifs) are a useful means of
inferring the function(s) of proteins.
The Emboss program patmatmotifs takes a protein sequence and
compares it to the PROSITE motif database to scan for motifs present in the
sequence. To access patmatmotifs, go to Protein Motifs → patmatmotifs
in Wemboss. Use patmatmotifs to scan the wild-type p53 protein sequence
for motifs from the PROSITE database. Paste your result below.
– Antigenic sites in a protein sequence
Antigenic sites in a protein are those regions that can be recognized by
antibodies. The Emboss program antigenic (Protein Motifs → antigenic) predicts
potential antigenic sites within a protein sequence. Use the wild-type p53
protein sequence as the query (default parameters) to predict antigenic
sites.
– DNA binding residues in a protein sequence
BindN takes an amino acid sequence as input and predicts potential DNA- or
RNA-binding residues using support vector machines (SVMs). To access
BindN, go to the “Protein” category in the Bioinformatics Links Directory and
click on “Sequence Features”. Use the wild-type p53 protein sequence as the
query (default parameters) to predict DNA-binding residues in the sequence.
– Binding and interaction partners of a protein sequence
3D-partner is a tool to predict interacting partners and binding models of a
query protein sequence through the analysis of structural complexes. To access
3D-partner, go to the “Protein” category in the Bioinformatics Links Directory and
click on “Interactions, Pathways, Enzymes”. Use the wild-type p53 protein
sequence as the query (default parameters) to predict its interaction
partners.
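To make the PROSITE motif task above more concrete, here is a minimal Python sketch of what scanning a sequence with a PROSITE-style pattern involves. It is an illustration only and not how patmatmotifs is implemented. The function names prosite_to_regex and find_motifs are hypothetical, only a subset of the PROSITE pattern syntax is handled, and the demo sequence is a made-up toy string rather than p53. The pattern N-{P}-[ST]-{P} is the PROSITE N-glycosylation site pattern (PS00001).

```python
import re
from typing import List, Tuple

# One pattern element: a residue, x (any residue), [allowed] or {forbidden},
# optionally followed by a repeat count such as (3) or (2,4).
_ELEMENT = re.compile(r"([A-Zx]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern (subset of the syntax) into a regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # pattern anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # pattern anchored at the C-terminus
            suffix, element = "$", element[:-1]
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError("unsupported pattern element: " + repr(element))
        core, low, high = match.groups()
        if core == "x":
            regex = "."
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"
        else:
            regex = core  # a single residue or an [ABC] class is already valid regex
        if low:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + regex + suffix)
    return "".join(parts)


def find_motifs(sequence: str, pattern: str) -> List[Tuple[int, str]]:
    """Return (1-based start position, matched text) for every hit, including overlapping ones."""
    # The lookahead lets overlapping occurrences be reported.
    scanner = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in scanner.finditer(sequence)]


if __name__ == "__main__":
    toy_sequence = "AANGSAANPSA"  # made-up sequence for illustration, not p53
    print(find_motifs(toy_sequence, "N-{P}-[ST]-{P}"))  # prints [(3, 'NGSA')]
```

A short pattern such as this one matches by chance in many proteins, so a hit from a scan like this is a prompt for further checking rather than evidence of function.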