Sequence analysis Practical

INTRODUCTION

Hox proteins are involved in the regulation of development (morphogenesis) and have been identified in a diverse group of organisms, such as animals, fungi and plants. Hox proteins are known to contain a characteristic signature.

The goals of this practical are to:
1. Retrieve a protein from GenBank using an accession number.
2. Identify a homolog in another species using BLAST.
3. Compare the two sequences in a dot-plot using DOTMATCHER (http://sf01.bic.nus.edu.sg/EMBOSS/).
4. Identify homologs in additional species.
5. Align all sequences using ClustalW (via its graphical interface, ClustalX).

1. Extract the human Hox A4 (NP_002132) protein sequence in FASTA format from NCBI (http://www.ncbi.nlm.nih.gov/) and save it to a text file on your desktop.

2. Use the NCBI BLAST server (http://blast.ncbi.nlm.nih.gov/Blast.cgi) to identify homologs of Hox A4 in mouse, sea-bass (Morone saxatilis) and horn shark (Heterodontus francisci). Add these sequences to your protein text file and fill in the table below.

   Organism              % identity between A4 orthologs    E-value
   Human – Mouse
   Human – Sea-bass
   Human – Horn shark

3. Generate a dot-plot for the human and shark Hox A4 proteins by comparing these sequences using the DOTMATCHER software of EMBOSS (http://sf01.bic.nus.edu.sg/EMBOSS/).
   3.1 How do you interpret the long diagonal line in the top right corner of the matrix? (Remember to copy the dot-plot into your report.)

4. Using ClustalX, generate a multiple sequence alignment of all four Hox A4 proteins.
   4.1 What region is conserved in all species? ………………… (amino acid position)
   4.2 What is the corresponding exon number? ………
   4.3 Save the alignment in a text file. How is identity indicated in the alignment?

5. Compare seq1 and seq2 (given below) using the DOTMATCHER software of EMBOSS (http://sf01.bic.nus.edu.sg/EMBOSS/).
   5.1 What scenario would explain the dot-plot pattern that you observe?
   5.2 Modify the window size to 100. What is the effect on the dot-plot?
>seq1
CACTCGATGCAGGCGCTGTCCTGGCGCAAGCTCTACTTAAGCCGCGCCAAGCT
CAAAGCTTCCAGCCGGACCTCGGCTCTGCTCTCCGGCTTCGCCATGGTAGCGA
TGGTGGAAGTCCAGCTGGACACAGACCATGACTACCCACCAGGGTTGCTCATC
GTCTTTAGTGCCTGCACCACAGTGCTAGTGGCCGTGCACCTGTTTGCCCTCATG
ATCAGCACCTGCATCCTGCCCAACATCGAGGCTGTGAGCAACGTCCACAACCT
CAACTCGGTCAAAGAGTCACCCCACGAGCGCATGCATCGCCACATCGAGCTGG
CCTGGGCCTTCTCCACGGTCATCGGGACGCT

>seq2
CACTCGATGCAGGCGCTGTCCTGGCGCAAGCTCTACTTAAGCCGCGCCAAGCT
CAAAGCTTCCAGCCGGACCTCGGCTCTGCTCTCCGGCTTCGCCATGGTAGCGA
TGGTGGAAGTCCAGCTGGACACAGACCATGACTAATGATCAGCACCTGCATCC
TGCCCAACATCGAGGCTGTGAGCAACGTCCACAACCTCAACTCGGTCAAAGAG
TCACCCCACGAGCGCATGCATCGCCACATCGAGCTGGCCTGGGCCTTCTCCAC
GGTCATCGGGACGCT

6. Questions about BLAST
   6.1 What is a high scoring sequence pair (HSP)?
   6.2 What is the Expect value?
   6.3 Why is the following statement false: “The query length and query composition affect the Expect value”?
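
Appendix: how a windowed dot-plot is computed

The dot-plot exercises depend on two settings, the window size and the threshold. The Python sketch below shows roughly how they interact. It is a simplified illustration and not DOTMATCHER's actual implementation: it counts identical characters in each window, whereas DOTMATCHER sums scores from a substitution matrix. The function names `dot_matrix` and `render` and the toy sequence are hypothetical and chosen only for this example.

```python
def dot_matrix(a, b, window=10, threshold=8):
    """Return the set of (i, j) window start positions where the window
    a[i:i+window] and b[j:j+window] agree at >= threshold positions.

    Simplification: matches are plain identities, not matrix scores.
    """
    dots = set()
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            matches = sum(1 for k in range(window) if a[i + k] == b[j + k])
            if matches >= threshold:
                dots.add((i, j))
    return dots


def render(a, b, dots, window):
    """Draw the dot matrix as text: columns follow a, rows follow b."""
    n_cols = len(a) - window + 1
    n_rows = len(b) - window + 1
    rows = []
    for j in range(n_rows):
        rows.append("".join("*" if (i, j) in dots else "." for i in range(n_cols)))
    return "\n".join(rows)


if __name__ == "__main__":
    # Toy sequence (made up); comparing it with itself gives the main diagonal.
    toy = "ACGTTGCA"
    dots = dot_matrix(toy, toy, window=4, threshold=4)
    print(render(toy, toy, dots, window=4))
```

To experiment, paste seq1 and seq2 in place of the toy sequence and vary `window` and `threshold`. A larger window with the same relative threshold generally suppresses short chance matches, at the cost of losing short genuine ones.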