Sequence Alignment and Database Searches

BIT150 - Lab 2
Copy the directory ‘Z:\08_Lab2’ into your own directory, ‘C:\YourLastname\08_Lab2’.
A. SEQUENCE ALIGNMENT
The most basic task in sequence analysis is to ask whether two sequences are similar and
can be compared. Proteins with very similar sequences probably share structural
properties and similar functions.
Objective: Explore different methods of sequence alignment, interpret their results, and
compare them.
Activities:
1. Graphical method
Dotter (http://www.cgb.ki.se/cgb/groups/sonnhammer/Dotter.html): A dot-matrix
program with interactive grayscale for DNA and protein sequence analysis.

Dotter is preinstalled on your lab computers. Follow these steps to run
Dotter:
1.1. The DNA sequence file WIS.txt to be used with Dotter is in
‘08_Lab2\Dotter files’. Copy this file into the
‘C:\BIT150\Programs\Dotter’ directory (alternatively, type the full path of
each file when you run the program).
1.2. Dotter needs to be started from the Command Prompt window:
Start -> Programs -> Accessories -> Command Prompt (create a
shortcut on your desktop).
Alternatively, Start -> Run… -> in Open, type cmd -> OK.
The Command Prompt works like the old DOS command line (commands are case-insensitive).
1.3. Move to the Dotter directory (located in
‘C:\BIT150\Programs\Dotter’), typing:
call C: -> press Enter;
cd BIT150\Programs\Dotter (to change directory).
To see the files present in the Dotter directory, type dir. Check for WIS.txt
and MITE2.txt.
1.4. Using Dotter, align the DNA sequence of the retroelement WIS
(WIS.txt) with itself to look for internal repeats. To do so, type:
dotter WIS.txt WIS.TXT -> press Enter -> wait….
1.5. Analyze the Dotter output:
• Dotter window: The first sequence runs along the x-axis and the
second sequence along the y-axis. Segments of 25 bp in the first sequence
(x-axis) are compared to segments of 25 bp in the second
sequence (y-axis). In regions where the two sequences are similar to
each other, a row of high scores runs diagonally across the dot matrix.
o Set the width of the sliding window: right-click on the Dotter
window and select ‘Change size of sliding window’. The
default width of 25 residues, over which the pairwise scores
are averaged, has proven to be very robust, but you can
change it.
o Print to a file: right-click on the Dotter window and select
‘Print’. You can print the plot to a PostScript file
and later convert it to PDF.
• Greyramp Tool window: Dotter slides a window along the
diagonals and places a dot at the center of each window, shaded according
to the sum of the scores within that window. Dots above the maximum
threshold get the maximum intensity, dots below the minimum threshold get
the minimum intensity, and dots in between are rendered with a grayscale
intensity proportional to their sum of scores. Changing the maximum and
minimum thresholds interactively lets you explore
various signal stringencies.
• Alignment Tool window: Lets you see the match that produces
a given dot in the dot plot. Move the crosshair of the Dotter window
to the dot with the left mouse button, and pop up the Alignment Tool.
Once you are close to the dot, use the cursor keys to move the crosshair one
residue at a time.
- Copy and paste the alignment into your Word document (use
Shift/PrintScreen to copy everything on your screen, open
Start/Programs/Accessories/Paint, paste the image, select the part you
want, cut it, and finally paste it into your Word document).
- After aligning WIS.txt with itself, what type of repeat is present in
the sequence?
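The sliding-window comparison described for the Dotter window can be sketched in a few lines of Python. This is a simplified illustration, not Dotter's actual implementation: the function names, the +1/-1 match and mismatch scores, and the single display threshold are assumptions chosen for clarity, and the sketch compares only the forward strand.

```python
def dot_matrix(seq_x, seq_y, window=25, match=1, mismatch=-1):
    """Average match score of each window along the diagonals.

    scores[j][i] is the mean score of the `window` residues that start
    at position i of seq_x (x-axis) and position j of seq_y (y-axis).
    The scoring values are illustrative assumptions.
    """
    seq_x, seq_y = seq_x.upper(), seq_y.upper()
    n_x = len(seq_x) - window + 1
    n_y = len(seq_y) - window + 1
    scores = []
    for j in range(n_y):
        row = []
        for i in range(n_x):
            total = sum(
                match if seq_x[i + k] == seq_y[j + k] else mismatch
                for k in range(window)
            )
            row.append(total / window)
        scores.append(row)
    return scores


def render(scores, threshold=0.6):
    """Draw '#' where the window score reaches the threshold, '.' elsewhere."""
    return ["".join("#" if s >= threshold else "." for s in row) for row in scores]


if __name__ == "__main__":
    # A short sequence with a direct repeat, compared with itself.
    for line in render(dot_matrix("ACGTACGT", "ACGTACGT", window=4)):
        print(line)
```

In a self-comparison the main diagonal is always filled; any additional diagonal lines mark repeated segments within the sequence.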
1.6. Using Dotter, now compare sequence Seq1 with sequence Seq2.
Dotter uses .txt files. Copy and paste ONLY the sequence (plain sequence,
without the ‘>’ header line) of Seq1 into Notepad and save it as Seq1.txt,
and do the same with Seq2.
Remember to save these files in the Dotter directory.
>Seq1
CCTACCATACGAGTATCAGACCTATCAGGCCTATCCAGAGCAGATCATGGACTAACCCTAGGACATACCAT
>Seq2
ACTAATCATGGACTAACCCCCTAGGACATACCACTACATATGGCCTGATACCTCTGATACTCGTATGGTATCT
- Copy and paste the alignment into your Word document (use
Shift/PrintScreen).
- What types of repeats are present in the sequences? Interact with the
Greyramp Tool to identify them.
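As an alternative to Notepad, the plain-sequence files from step 1.6 could be produced with a short script. The sketch below is only one possible way to do it; the function names `fasta_to_plain` and `write_plain_files` are hypothetical, and it assumes each record starts with a ‘>’ header line as shown above.

```python
from pathlib import Path


def fasta_to_plain(fasta_text):
    """Map each record name (first word after '>') to its plain sequence."""
    records = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else "unnamed"
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def write_plain_files(records, directory="."):
    """Write one <name>.txt file per record, containing only the sequence."""
    paths = []
    for name, seq in records.items():
        path = Path(directory) / f"{name}.txt"
        path.write_text(seq + "\n")
        paths.append(path)
    return paths
```

Calling `write_plain_files(fasta_to_plain(text), r"C:\BIT150\Programs\Dotter")` would place Seq1.txt and Seq2.txt in the Dotter directory, assuming `text` holds the two records above.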
2. Dynamic-programming methods

Global: Needleman-Wunsch algorithm (1970)

Local: Smith-Waterman algorithm (1981)
2.1. Open the link: http://www.ebi.ac.uk/emboss/align/
Paste Seq1 and Seq2 from 1.6. into the Sequence1 and Sequence2
windows, respectively. Select DNA as molecule where asked.
2.2. Compare different alignment tools and parameters:
- Compare needle (global) and water (local) alignment results. What
differences can you observe between global and local alignment results?
- In water (local), compare the effect of changing the gap open penalty
from 10 to 1. Why does the number of gaps increase in the alignment?
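To see why global and local results differ, and why a lower gap penalty produces more gaps, it can help to look at a minimal dynamic-programming sketch. The code below is an illustration under stated assumptions, not the EMBOSS implementation: needle and water use separate gap open and gap extension penalties, whereas this sketch uses a single linear `gap` penalty per gap position, and the match/mismatch values of 5 and -4 are assumed for illustration.

```python
def align(a, b, match=5, mismatch=-4, gap=-10, local=False):
    """Return (score, aligned_a, aligned_b).

    local=False: global alignment (Needleman-Wunsch style).
    local=True:  local alignment (Smith-Waterman style).
    Uses a linear gap penalty; all scoring values are assumptions.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:
        # Global alignment pays for gaps at the sequence ends.
        for i in range(1, n + 1):
            H[i][0] = i * gap
        for j in range(1, m + 1):
            H[0][j] = j * gap
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if local:
                # Local alignment never lets a score drop below zero.
                H[i][j] = max(H[i][j], 0)
                if H[i][j] > best:
                    best, best_pos = H[i][j], (i, j)
    # Traceback: from the corner (global) or the best cell (local).
    i, j = best_pos if local else (n, m)
    score = H[i][j]
    top, bottom = [], []
    while i > 0 or j > 0:
        if local and H[i][j] == 0:
            break
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and H[i][j] == H[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score, "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    # Expensive gaps: the local alignment stops at the first matching block.
    print(align("AAAACCCCGGGG", "AAAAGGGG", gap=-10, local=True))
    # Cheap gaps: bridging the CCCC block now pays off.
    print(align("AAAACCCCGGGG", "AAAAGGGG", gap=-1, local=True))
```

With an expensive gap, joining two matching blocks costs more than the second block gains, so the local alignment reports only one block. With a cheap gap, the longer gapped alignment scores higher and is reported instead.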
3. Word methods (heuristic)

BLASTN: The Basic Local Alignment Search Tool (BLAST) finds
regions of local similarity between sequences.
3.1. Using BLAST 2 Sequences
(http://www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi), run the same two
sequences, Seq1 and Seq2 from 1.6. Select blastn as ‘Program’.
- Copy and paste the alignment into your Word document (use
Shift/PrintScreen).
- What type of repeat present in the sequences can you identify?
- Compare this alignment with those previously obtained using
Dotter, needle (global), and water (local).
3.2. Change ‘gap open penalty’ from 5 (default) to 3. Run.
- Copy and paste the alignment into your Word document (use
Shift/PrintScreen).
- What types of repeats present in the sequences can you identify now?
3.3. Which of the three methods (needle (global), water (local), BLAST 2
Sequences) best detected the similarities observed in Dotter?
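The ‘word’ heuristic behind BLAST can be sketched as two steps: find short exact matches (seeds) shared by the two sequences, then extend each seed without gaps until the score drops too far below its best value. The code below is a simplified illustration and not NCBI's implementation; the function names, the scoring values, and the `x_drop` cutoff are assumptions, and real BLAST adds gapped extension and statistical evaluation.

```python
from collections import defaultdict


def find_seeds(query, subject, word_size=11):
    """Return (query_pos, subject_pos) for every exact shared word."""
    index = defaultdict(list)
    for i in range(len(query) - word_size + 1):
        index[query[i:i + word_size]].append(i)
    seeds = []
    for j in range(len(subject) - word_size + 1):
        for i in index.get(subject[j:j + word_size], []):
            seeds.append((i, j))
    return seeds


def extend_seed(query, subject, i, j, word_size, match=1, mismatch=-2, x_drop=5):
    """Ungapped extension of a seed in both directions.

    Extension stops once the running score falls more than x_drop below
    the best score seen. Returns (score, query_span, subject_span), with
    spans as half-open (start, end) index pairs.
    """
    score = best = word_size * match
    right = 0
    k = 0
    while i + word_size + k < len(query) and j + word_size + k < len(subject):
        same = query[i + word_size + k] == subject[j + word_size + k]
        score += match if same else mismatch
        if score > best:
            best, right = score, k + 1
        elif best - score > x_drop:
            break
        k += 1
    score = best
    left = 0
    k = 1
    while i - k >= 0 and j - k >= 0:
        score += match if query[i - k] == subject[j - k] else mismatch
        if score > best:
            best, left = score, k
        elif best - score > x_drop:
            break
        k += 1
    return best, (i - left, i + word_size + right), (j - left, j + word_size + right)
```

Because a hit must begin with an exact word match, similarities that contain no identical stretch of at least `word_size` residues are missed. This is the price of the method's speed compared with full dynamic programming.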

BLASTX: DNA-protein alignment (searches a protein database using a
translated nucleotide query).
3.4. Using BLAST 2 Sequences, compare the genomic DNA sequence of
the Acyl Co-A Synthetase from Lab1 with the predicted protein sequence.
The sequences are in the file 08_Lab1\Sequin Acyl Co-A Synthetase\ Final
annotation.doc, and also in 08_Lab2.
Paste the Acyl Co-A synthetase DNA sequence in the Sequence 1 window
and the Acyl Co-A synthetase protein sequence in the Sequence 2
window. Select blastx as ‘Program’.
- Could you identify the 6 exons?
- Are the borders of the exons as precise as in the flat file prepared
using Sequin?
3.5. Change ‘gap extension penalty’ from 1 (default) to 2.
- Can you see any improvement?
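BLASTX translates the nucleotide query in all six reading frames before comparing it with the protein sequence, so the exons of a genomic sequence can appear as separate hits in different frames. The sketch below illustrates six-frame translation with the standard genetic code; the function names are hypothetical, and unknown or incomplete codons are reported as ‘X’ as a simplifying assumption.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def reverse_complement(dna):
    """Reverse complement of a DNA string containing A, C, G, T."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna, frame=0):
    """Translate from offset `frame` (0, 1 or 2); '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[k:k + 3], "X") for k in range(frame, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Translations of the three forward and three reverse frames."""
    frames = {}
    for f in range(3):
        frames[f"+{f + 1}"] = translate(dna, f)
        frames[f"-{f + 1}"] = translate(reverse_complement(dna), f)
    return frames


if __name__ == "__main__":
    for name, protein in sorted(six_frames("ATGGCCTAA").items()):
        print(name, protein)
```

Introns are translated along with the exons, so the exon borders reported by blastx depend on where the protein similarity happens to end, not on the splice signals themselves.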

BLASTP: Comparing two proteins.
3.6. Using BLAST 2 Sequences, align the following sequences. Select
blastp as ‘Program’.
>K_transport
VGALLLYLPISTTRPISFLDALFTATSAVTVTGLAVLDTYSDFTLFGKLVILFLIQVGGLGYMTLSTFFLVLLG
RRIGLKERLILAESLEYPSMHGLIRFLKRVFSFVFITELTGAILLSIYFSLKGVEDPVFNGIFHSVSAFNNAGF
STFKNG
>TRK system potassium uptake protein
NDIQTKYALIVTAFISIIISIKDKVPIIDSLFTVVSAMTSTGFTTINVGNLSSLSLFLIIFLMLIGGGAGTTTG
GVKIIRFLVILKALLYEIKEIIYPKSAVIHEHLDDMDLNYRIIREAFVVFFLYCLSSFLTALIFIALGYNPYDS
IFDAVSF
- Compare alignments using the ‘Matrix’ options BLOSUM62, BLOSUM80,
PAM30, and PAM70. Does the alignment change when you change the matrix?
PAM (Percentage of Acceptable point Mutations per 10^8 years) matrices
BLOSUM (BLOcks SUbstitution Matrix) matrices
B. BLAST SEARCH
Open the link: http://www.ncbi.nlm.nih.gov/blast/

BLASTN: Go to nucleotide blast (blastn).
4.1. Randomly type in a 50-bp DNA sequence, choose database
Others\Nucleotide collection (nr/nt), optimize for blastn, then click on
BLAST.
Look at the E values.
4.2. Compare it with a real search using the Acyl Co-A synthetase DNA
sequence.
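When comparing the E values of the random and the real query, it helps to recall what the number means: the E value is the number of hits with at least that score expected by chance in a search space of that size. A minimal sketch of the standard relationship between bit score and E value follows. The function names and the example lengths are hypothetical, and NCBI BLAST additionally corrects the query and database lengths for edge effects, which this sketch ignores.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw alignment score to a bit score.

    lam and k are the statistical parameters of the scoring system;
    they must be supplied by the caller (no defaults are assumed here).
    """
    return (lam * raw_score - math.log(k)) / math.log(2)


def expected_hits(bits, query_len, db_len):
    """E value: chance hits expected with at least this bit score.

    Uses E = m * n * 2**(-S'), with m the query length, n the database
    length, and S' the bit score. Length corrections are ignored.
    """
    return query_len * db_len * 2.0 ** (-bits)


if __name__ == "__main__":
    # Hypothetical numbers: a 50-residue query against 2**30 database residues.
    for bits in (20, 40, 80):
        print(bits, expected_hits(bits, 50, 2 ** 30))
```

Each additional bit of score halves the E value, while doubling the database doubles it. A short random query can therefore still return hits, but only with E values near or above 1.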

BLASTP: Go to protein blast (blastp).
4.3. Do the same as in 4.1. and 4.2., but now using a randomly generated
50-aa sequence and the Acyl Co-A synthetase protein sequence, selecting
algorithm blastp.

BLASTN: Go to nucleotide blast (blastn).
4.4. Choose database Others/Expressed sequence tags (est), in Organism
select ‘Hordeum’, and optimize for blastn. Search with the Acyl Co-A
synthetase DNA sequence. Good ESTs show alignment to exons in the
Acyl Co-A synthetase DNA sequence.
C. ENTREZ
Help file: http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helpentrez.chapter.EntrezHelp
Open the link: http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi
5.1. Get information about ‘Acyl CoA synthetase’.
Click on the number near Nucleotide.
5.2. Limit function:
If you want to see any Acyl CoA synthetase sequences from
Hordeum added to the nucleotide database in the last year:
Type ‘Hordeum’ in the query box;
Go to ‘Limits’;
In the ‘Limited to’ section, select Organism from the ‘Fields’ menu;
In the ‘Limited to’ section, select 1 year in the ‘Modified in the
last’ menu, and click on Go.
5.3. History function:
Go to ‘History’; you will see the history of all your searches, and
you can type the number given to each search in the query box for future searches.
5.4. Boolean operators:
Perform the following searches and report the number of
Nucleotide records found:
‘Acyl CoA Synthetase’
Since the name of this enzyme can also appear as ‘synthase’,
perform your search using truncation, ‘Acyl CoA synth*’
Acyl CoA Synthetase AND Hordeum
Acyl CoA Synthetase AND (Hordeum OR Triticum)
Acyl CoA Synthetase AND Hordeum OR Triticum
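The last two Boolean searches differ only in the parentheses. Entrez processes Boolean operators from left to right unless parentheses group them, so the unparenthesized query is read as (Acyl CoA Synthetase AND Hordeum) OR Triticum. The sketch below mimics this with Python sets; the record identifiers are made up for illustration and do not correspond to real database entries.

```python
# Hypothetical record sets matching each search term.
acs = {"rec1", "rec2", "rec3", "rec4"}
hordeum = {"rec1", "rec2", "rec7"}
triticum = {"rec3", "rec8", "rec9"}

# Acyl CoA Synthetase AND (Hordeum OR Triticum):
# synthetase records from either genus.
grouped = acs & (hordeum | triticum)

# Acyl CoA Synthetase AND Hordeum OR Triticum, evaluated left to right:
# synthetase records from Hordeum, plus ALL Triticum records.
left_to_right = (acs & hordeum) | triticum

if __name__ == "__main__":
    print(sorted(grouped))
    print(sorted(left_to_right))
```

The unparenthesized query returns every Triticum record, whether or not it mentions the enzyme. This is why its count is usually much larger than that of the grouped query.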