BIT150 - Lab 2
Sequence Alignment and Database Searches

Copy the directory 'Z:\08_Lab2' into your own directory as 'C:\YourLastname\08_Lab2'.

A. SEQUENCE ALIGNMENT

The most basic task in sequence analysis is to ask whether two sequences are similar and how they can be compared. Proteins with very similar sequences are likely to share structural properties and functions.

Objective: Explore different methods of sequence alignment, interpret their results, and compare them.

Activities:

1. Graphical method

Dotter (http://www.cgb.ki.se/cgb/groups/sonnhammer/Dotter.html): a dot-matrix program with an interactive grayscale for DNA and protein sequence analysis. Dotter is preinstalled on your lab computers. Follow these steps to run it:

1.1. The DNA sequence file WIS.txt to be used with Dotter is in '08_Lab2\Dotter files'. Copy this file into 'C:\BIT150\Programs\Dotter' (alternatively, type the full PATH of each file when you run the program).

1.2. Dotter must be started from the Command Prompt window: Start -> Programs -> Accessories -> Command Prompt (you may want to create a shortcut on your desktop). Alternatively, Start -> Run... -> type cmd in the Open box -> OK. The Command Prompt works like the old DOS operating system and is case-insensitive.

1.3. Move to the Dotter directory ('C:\BIT150\Programs\Dotter'): type C: and press Enter, then type cd \BIT150\Programs\Dotter (to change directory). To list the files in the Dotter directory, type dir. Check that WIS.txt and MITE2.txt are present.

1.4. Using Dotter, align the DNA sequence of the retroelement WIS (WIS.txt) with itself to look for internal repeats. To do so, type dotter WIS.txt WIS.txt, press Enter, and wait.

1.5. Analyze the Dotter output:

Dotter window: The first sequence runs along the x-axis and the second sequence along the y-axis. Segments of 25 bp in one sequence (along the x-axis) are compared to segments of 25 bp in the second sequence (along the y-axis).
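The sliding-window comparison behind the Dotter window, and the threshold-based grayscale of the Greyramp Tool, can be sketched in a few lines of Python. This is a simplified illustration, not Dotter's actual code. Assumptions: identity scoring only (match = 1, mismatch = 0), forward strands only, and a linear grey ramp between hypothetical minimum and maximum thresholds; the function names are illustrative.

```python
"""Simplified sliding-window dot plot (illustrative sketch, not Dotter's code).

Assumptions: identity scoring (match = 1, mismatch = 0), forward strands only,
and a linear grey ramp between hypothetical min/max thresholds.
"""


def window_scores(seq_x: str, seq_y: str, window: int = 25) -> list[list[float]]:
    """Average identity over each pair of windows taken along a diagonal.

    Row j / column i compares seq_y[j:j+window] with seq_x[i:i+window];
    seq_x runs along the x-axis and seq_y along the y-axis, as in Dotter.
    """
    seq_x, seq_y = seq_x.upper(), seq_y.upper()
    n_cols = max(len(seq_x) - window + 1, 0)
    n_rows = max(len(seq_y) - window + 1, 0)
    matrix = []
    for j in range(n_rows):
        row = []
        for i in range(n_cols):
            matches = sum(seq_x[i + k] == seq_y[j + k] for k in range(window))
            row.append(matches / window)
        matrix.append(row)
    return matrix


def grey_level(score: float, min_thr: float, max_thr: float) -> float:
    """Map a window score to an intensity in [0, 1], where 1 is a black dot."""
    if score >= max_thr:
        return 1.0  # above the maximum threshold: full intensity
    if score <= min_thr:
        return 0.0  # below the minimum threshold: minimum intensity
    return (score - min_thr) / (max_thr - min_thr)  # linear grey ramp


def ascii_plot(matrix: list[list[float]], min_thr: float = 0.5, max_thr: float = 0.9) -> str:
    """Render the matrix as text; darker characters mean higher scores."""
    shades = " .:*#"
    top = len(shades) - 1
    return "\n".join(
        "".join(shades[round(grey_level(s, min_thr, max_thr) * top)] for s in row)
        for row in matrix
    )


if __name__ == "__main__":
    # Toy sequence with a direct repeat ("GATTACA" occurs twice); a window of 4
    # is used only to keep the picture small (Dotter's default is 25).
    seq = "GATTACA" + "CCGG" + "GATTACA"
    print(ascii_plot(window_scores(seq, seq, window=4)))
```

In a self-comparison the main diagonal is always present; in the toy sequence, extra lines parallel to it appear because the same segment recurs later in the same orientation. Raising min_thr or max_thr has the same effect as tightening the Greyramp Tool thresholds: weaker matches fade out.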
In regions where the two sequences are similar to each other, a row of high scores runs diagonally across the dot matrix.
o Set the width of the sliding window (right-click on the Dotter window and select 'Change size of sliding window'). The default width of 25 residues, over which the pairwise scores are averaged, has proven to be very robust, but you can change it.
o Print to a file (right-click on the Dotter window and select 'Print'). You can print the dot plot to a PostScript file and later convert it to PDF.

Greyramp Tool window: Dotter slides a window along the diagonals and scores each window by summing the scores of all 'dots' within it, drawing the result at the center of the window. Windows whose score is above the maximum threshold are drawn at full intensity, windows below the minimum threshold get the minimum intensity, and windows in between are rendered with a grayscale intensity proportional to their score. Changing the maximum and minimum thresholds interactively lets you explore different signal stringencies.

Alignment Tool window: Shows the alignment that produces a given dot in the dot plot. Move the crosshair of the Dotter window to the dot with the left mouse button, and open the Alignment Tool. Once near the dot, use the cursor keys to move the crosshair one residue at a time.
- Copy and paste the alignment into your Word document (use Shift/PrintScreen to copy everything on your screen, open Start -> Programs -> Accessories -> Paint, paste the image, select the region you want, cut it, and paste it into your Word document).
- After aligning WIS.txt with itself, what type of repeat is present in the sequence?

1.6. Using Dotter, now compare Seq1 with Seq2. Dotter reads plain .txt files: copy and paste ONLY the sequence (plain sequence, without the '>' header line) of Seq1 into Notepad and save it as Seq1.txt, and do the same for Seq2. Remember to save these files in the Dotter directory.
>Seq1
CCTACCATACGAGTATCAGACCTATCAGGCCTATCCAGAGCAGATCATGGACTAACCCTAGGACATACCAT
>Seq2
ACTAATCATGGACTAACCCCCTAGGACATACCACTACATATGGCCTGATACCTCTGATACTCGTATGGTATCT
- Copy and paste the alignment into your Word document (use Shift/PrintScreen).
- What types of repeats are present in the sequences? Interact with the Greyramp Tool to identify them.

2. Dynamic-programming methods
Global: Needleman-Wunsch algorithm (1970)
Local: Smith-Waterman algorithm (1981)

2.1. Open the link http://www.ebi.ac.uk/emboss/align/. Paste Seq1 and Seq2 from 1.6 into the Sequence1 and Sequence2 windows, respectively. Select DNA as the molecule type when asked.

2.2. Compare different alignment tools and parameters:
- Compare the needle (global) and water (local) alignment results. What differences do you observe between the global and local alignments?
- In water (local), compare the effect of changing the gap open penalty from 10 to 1. Why does the number of gaps in the alignment increase?

3. Word methods (heuristic)

BLASTN: The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences.

3.1. Using BLAST 2 Sequences (http://www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi), run the same two sequences, Seq1 and Seq2 from 1.6. Select blastn as 'Program'.
- Copy and paste the alignment into your Word document (use Shift/PrintScreen).
- What type of repeat present in the sequences can you identify?
- Compare this alignment with those obtained previously using Dotter, needle (global), and water (local).

3.2. Change 'gap open penalty' from 5 (default) to 3. Run.
- Copy and paste the alignment into your Word document (use Shift/PrintScreen).
- What types of repeats present in the sequences can you identify now?

3.3. Which of the three methods (needle (global), water (local), BLAST 2 Sequences) best detected the similarities observed in Dotter?

BLASTX: DNA-protein alignment (protein database using a translated nucleotide query).

3.4.
Using BLAST 2 Sequences, compare the genomic DNA sequence of the Acyl Co-A synthetase from Lab 1 with its predicted protein sequence. The sequences are in the file 08_Lab1\Sequin Acyl Co-A Synthetase\ Final annotation.doc and also in 08_Lab2. Paste the Acyl Co-A synthetase DNA sequence into the Sequence 1 window and the Acyl Co-A synthetase protein sequence into the Sequence 2 window. Select blastx as 'Program'.
- Can you identify the 6 exons?
- Are the exon borders as precise as in the flat file prepared using Sequin?

3.5. Change 'gap extension penalty' from 1 (default) to 2.
- Do you see any improvement?

BLASTP: Comparing two proteins.

3.6. Using BLAST 2 Sequences, align the following sequences. Select blastp as 'Program'.
>K_transport
VGALLLYLPISTTRPISFLDALFTATSAVTVTGLAVLDTYSDFTLFGKLVILFLIQVGGLGYMTLSTFFLVLLG
RRIGLKERLILAESLEYPSMHGLIRFLKRVFSFVFITELTGAILLSIYFSLKGVEDPVFNGIFHSVSAFNNAGF
STFKNG
>TRK system potassium uptake protein
NDIQTKYALIVTAFISIIISIKDKVPIIDSLFTVVSAMTSTGFTTINVGNLSSLSLFLIIFLMLIGGGAGTTTG
GVKIIRFLVILKALLYEIKEIIYPKSAVIHEHLDDMDLNYRIIREAFVVFFLYCLSSFLTALIFIALGYNPYDS
IFDAVSF
- Compare the alignments obtained with the 'Matrix' settings BLOSUM62, BLOSUM80, PAM30, and PAM70. Does anything change when you change the matrix?

PAM (Percentage of Acceptable point Mutations per 10^8 years) matrices
BLOSUM (BLOcks SUbstitution Matrix) matrices

B. BLAST SEARCH

Open the link http://www.ncbi.nlm.nih.gov/blast/

BLASTN: Go to nucleotide blast (blastn).
4.1. Type in a random 50-bp DNA sequence, choose the database Others/Nucleotide collection (nr/nt), optimize for blastn, and click BLAST. Look at the E values.
4.2. Compare these results with a real search using the Acyl Co-A synthetase DNA sequence.

BLASTP: Go to protein blast (blastp).
4.3. Repeat 4.1 and 4.2, but now using a random 50-aa sequence and the Acyl Co-A synthetase protein sequence, with the blastp algorithm selected.

BLASTN: Go to nucleotide blast (blastn).
4.4.
Choose the database Others/Expressed sequence tags (est), select 'Hordeum' under Organism, and optimize for blastn. Search with the Acyl Co-A synthetase DNA sequence. Good ESTs align to the exons of the Acyl Co-A synthetase DNA sequence.

C. ENTREZ

Help file: http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helpentrez.chapter.EntrezHelp

Open the link http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi

5.1. Get information about 'Acyl CoA synthetase'. Click on the number next to Nucleotide.

5.2. Limit function: To see any Acyl CoA synthetase sequences from Hordeum added to the nucleotide database in the last year:
- Type 'Hordeum' in the query box;
- Go to 'Limits';
- In the 'Limited to' section, select Organism from the 'Fields' menu;
- In the 'Limited to' section, select 1 year in the 'Modified in the last' menu, and click Go.

5.3. History function: Go to 'History'. You will see the history of all your searches, and you can type the number assigned to each search in the query box to reuse it in later searches.

5.4. Boolean operators: Perform the following searches and report the number of Nucleotide records found for each:
- 'Acyl CoA Synthetase'
- Since the name of this enzyme can also appear as 'synthase', search using truncation: 'Acyl CoA synth*'
- Acyl CoA Synthetase AND Hordeum
- Acyl CoA Synthetase AND (Hordeum OR Triticum)
- Acyl CoA Synthetase AND Hordeum OR Triticum
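To see why the last two queries in 5.4 can return different counts, the sketch below models each search term as a set of record IDs and each Boolean operator as a set operation. It assumes, as the Entrez help file describes, that Boolean operators are processed left to right unless parentheses group them. The index, record IDs, and function names are hypothetical toy data, not real NCBI results or an NCBI API.

```python
"""Toy model of Entrez-style Boolean queries evaluated left to right.

The index below is hypothetical; it only illustrates operator order.
"""

# Hypothetical index: search term -> IDs of the records that match it.
INDEX = {
    "acyl coa synthetase": {1, 2, 3, 4},
    "hordeum": {1, 2, 7},
    "triticum": {3, 8, 9},
}

# Boolean operators as set operations.
OPS = {
    "AND": lambda a, b: a & b,  # intersection
    "OR": lambda a, b: a | b,   # union
    "NOT": lambda a, b: a - b,  # difference
}


def operand(item):
    """A string is a search term; a nested list is a parenthesized group."""
    return evaluate(item) if isinstance(item, list) else INDEX[item]


def evaluate(query):
    """Evaluate [operand, OP, operand, OP, operand, ...] strictly left to right."""
    result = operand(query[0])
    for op, item in zip(query[1::2], query[2::2]):
        result = OPS[op](result, operand(item))
    return result


if __name__ == "__main__":
    a = "acyl coa synthetase"
    no_parens = evaluate([a, "AND", "hordeum", "OR", "triticum"])
    grouped = evaluate([a, "AND", ["hordeum", "OR", "triticum"]])
    print(len(no_parens), len(grouped))  # prints "5 3"
```

Without parentheses, the query is read as (Acyl CoA Synthetase AND Hordeum) OR Triticum, so every Triticum record is added back even if it has nothing to do with the enzyme (IDs 8 and 9 in this toy index).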