ANALYZING GENE AND PROTEIN SEQUENCES

Angelique Bosse
Montgomery Blair High School
abosse@mbhs.edu

Table of Contents

Introduction
Scenarios
Activity #1 BLAST
Activity #2 Taxonomic Information
Activity #3 Gene Sequence Search
Activity #4 Restriction Enzyme Analysis
Activity #5 Find Similar Protein Sequences From Various Species
Activity #6 Compare Amino Acid Sequences From Two Different Species
Activity #7 Phylogenetic Tree Diagrams
Activity #8 Predict the Secondary Structure of a Protein
Activity #9 Find Information on Any Enzyme
Activity #10 Biochemical Pathways
Activity #11 Genetic Diseases Associated with a Protein
Activity #12 Finding Mutations Associated with a Protein
Acknowledgments

Introduction

The following activities introduce students to the use of databases as a resource for gene and protein information. The twelve activities can be done in order, in parts, or mixed and matched to suit the objectives of the unit. The teacher may tailor the activities to the level of the students. Advanced students may solve complex, open-ended problems and work at their own pace. Other students may work in pairs to complete focused tasks.

At any level, students can use gene and protein databases and other Internet resources to perform many types of sequence analyses. Students may search various databases, investigate taxonomy, and perform a number of analyses. For example, students can compare human gene/protein sequences to other species’ sequences, simulate restriction analysis, predict the secondary structure of a protein, and generate phylogenetic tree diagrams. The students may also find information on enzymes, biochemical pathways, genetic diseases, and mutations.

Scenarios

Below are some ideas of how to approach these bioinformatics activities:

1. Phenylketonuria: This disease, also known as PKU, is caused by an inborn error of metabolism.
PKU is autosomal recessive, which means babies born with PKU have two abnormal copies of the phenylalanine hydroxylase gene. This gene codes for an enzyme that converts the amino acid phenylalanine to the amino acid tyrosine. A defective enzyme results in a build-up of phenylalanine. High levels of phenylalanine result in intellectual disability. The disease may be detected at birth through routine newborn blood screening. If PKU is detected, the patient must follow a diet that restricts phenylalanine ingestion. Strict adherence to the diet is necessary to prevent the devastating effects of PKU.

Have students search for the phenylalanine hydroxylase gene (Activity #3), perform a restriction analysis (Activity #4), and identify and classify known mutations of the phenylalanine hydroxylase gene (Activity #12). Next, have the students find a known mutation that alters a restriction site. Ask the students to devise a restriction analysis test that could be used to detect that particular defect in the gene.

2. Sickle Cell Anemia: This disease is caused by a mutation in the beta chain of hemoglobin. The disease is autosomal recessive, which means that sickle cell patients have two abnormal copies of the gene. Abnormal hemoglobin may polymerize, especially under low oxygen conditions, and cause sickling of red blood cells. Normal, doughnut-shaped red blood cells move through blood vessels with ease. Abnormal sickle cells may get stuck in narrow blood vessels. When blood cells cannot flow to the tissues, the tissues are deprived of needed oxygen, which results in organ damage.

Have students find beta globin sequences from two different organisms (Activity #5). Then have them align the two sequences and compare them to determine similarities and differences (Activity #6). Next, have students align several organisms’ sequences and draw rooted and unrooted phylogenetic trees (Activity #7). The phylogenetic analysis should reveal evolutionary relationships.
Finally, have students determine the secondary structure of beta hemoglobin (Activity #8). Students can compare the secondary structures of beta hemoglobin in various organisms.

3. Amyloidosis: Amyloidosis is caused by an abnormal accumulation of the fibrous amyloid protein. Amyloidosis is observed in a variety of disorders such as rheumatoid arthritis, diabetes, and Alzheimer’s disease. The disorder, commonly seen in dogs, is often caused by abnormalities of the enzyme lysozyme. Although the disease may affect any breed of dog, it is more common in certain breeds. For example, beagles, collies, and Shar-Peis are at greater risk of amyloidosis. The accumulation of amyloid protein can occur in any organ and interferes with the normal functions of that organ.

Have students find information about the enzyme lysozyme (Activity #9). Then students should investigate the biochemical pathway that lysozyme is a part of (Activity #10). Finally, students can discover the diseases associated with lysozyme (Activity #11).

Activity #1 BLAST

-go to BLAST at http://www.ncbi.nlm.nih.gov/BLAST/
-click on “Basic BLAST search”
-click in the large blank box and input a random 35 base sequence (using a,c,g,t only)
-click on “Search”
-on the next screen click on “Format results”

The results give you several pieces of information:
1. references to journal articles
2. a distribution of BLAST hits on the query sequence (your inputted sequence is in red; the corresponding sequence similarities are in black. If you slowly move the mouse down the rows of black lines, you will see identifications for each hit.)
3. a list of sequences producing significant alignments with your inputted sequence
4.
a diagram for each sequence showing the letters in common with your inputted sequence

-go back to the home page at http://www.ncbi.nlm.nih.gov/BLAST/
-click on “Web BLAST tutorial” (in the gray margin to the left)
-this tutorial introduces the various BLAST programs available, such as blastp, blastn, blastx, tblastn, and tblastx

Activity #2 Taxonomic Information

Taxonomic information can be accessed by either of the following two approaches:

-go to BLAST at http://www.ncbi.nlm.nih.gov/BLAST/
-click on “Basic BLAST search”
-click in the blank box and input a random 35 base sequence (using a,c,g,t only)
-click on “Search”
-on the next screen click on “Format results”

The results give you several pieces of information:
1. references to journal articles
2. a distribution of BLAST hits on the query sequence (your inputted sequence is in red; the corresponding sequence similarities are in black. If you slowly move the mouse down the rows of black lines, you will see identifications for each hit.)
3. a list of sequences producing significant alignments with your inputted sequence
4. a diagram for each sequence showing the letters in common with your inputted sequence

-from the list of results on BLAST, click on an accession number (the highlighted code number to the left of the hits in the list of sequences producing significant alignments)
-result is information about your selection
-click on the highlighted term next to “ORGANISM”
-result is information such as preferred common name, other names, and lineage.
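The random 35 base query used in the steps above can be typed by hand, but students who want a reproducible one could generate it. The following is only a minimal sketch in Python; the function name `random_dna` and the `seed` parameter are illustrative choices, not part of BLAST or of this packet.

```python
import random


def random_dna(length=35, seed=None):
    """Return a random DNA string of the given length using only a, c, g, t."""
    rng = random.Random(seed)  # passing a seed makes the sequence repeatable
    return "".join(rng.choice("acgt") for _ in range(length))


# Generate one 35 base query to paste into the BLAST search box.
query = random_dna(35, seed=1)
print(query)
```

Because the sequence is random, most hits it produces are chance matches, which is a useful talking point when students read the list of significant alignments.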
OR

-go to NCBI Databases at http://www.ncbi.nlm.nih.gov/Database/index.html
-click on “Taxonomy” (part of the flow chart)
-the taxonomy browser enables you to search for information about all organisms that are represented in the genetic databases with at least one nucleotide or protein sequence
-click on the highlighted word “tree” under “the taxonomy browser”
-at this page, you can explore any taxon on the list provided, or enter a particular organism in the white box next to “search for”
-in the skinny white box next to “search for”, type in “Escherichia coli” and click “Go”
-result is lineage and the option to get additional information such as genetic code translation, nucleotide or protein sequence, and structural information
-click on the name to get more information

Activity #3 Gene Sequence Search

Gene sequences can be accessed by either of the following two approaches:

-go to Entrez at http://www.ncbi.nlm.nih.gov/Entrez/
-click on “Nucleotide” (part of the diagram)
-in the skinny box next to “Search for” type in a protein name such as “phenylalanine hydroxylase”
-click on “Go”
-the result is a list of sequences matching your request
-click on any of the accession numbers displayed and you will be able to view information about that sequence such as definition, source, classification, journal articles, comments, amino acid sequence, and the nucleotide sequence.
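The nucleotide sequence at the bottom of a GenBank-style record is displayed with a position number at the start of each line and a space after every ten bases. Some tools accept that layout as pasted, while others expect bare letters. The sketch below shows one way to strip the numbering; the function name `clean_sequence` and the short record excerpt are made up for illustration and are not taken from any real database entry.

```python
def clean_sequence(displayed):
    """Return only the bases from sequence lines copied out of a record display."""
    bases = []
    for line in displayed.splitlines():
        parts = line.split()
        # Sequence lines begin with a position number; skip every other line.
        if not parts or not parts[0].isdigit():
            continue
        bases.extend(parts[1:])
    return "".join(bases).lower()


# Hypothetical excerpt for illustration only; not a real database record.
copied = """
ORIGIN
        1 gaattcatgc ctggatccaa gcttacgtac gtacggaatt c
//
"""

plain = clean_sequence(copied)
print(plain)
print(len(plain), "bases")
```

The cleaned string can then be pasted into any tool that expects an unnumbered sequence.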
OR

-go to Biology Workbench at http://workbench.sdsc.edu/
-click “Enter the Biology Workbench 3.2” and enter user name and password, click “ok” (or set up a free account first)
-scroll down and click “Nucleic Tools”
-select “Ndjinn- Multiple Database Search”
-click “Run”
-in the blank box next to “Exact Match” type in a protein such as “phenylalanine hydroxylase”
-select “Show 10 Hits”
-scroll down to select a database, click in the little white box next to your choice, such as “GBPRI- GenBank Primate Sequences”
-click “Search”
-result is a list of “hits” for your requested sequence under “Matching Database Record”
-select a record by clearing the little white boxes next to all EXCEPT the one you want (a check mark should appear only next to your choice)
-select “Show Record(s)”
-result is information about the selected sequence such as locus, definition, source, references, base count, and amino acid sequence.
-to view the nucleic acid sequence, scroll to the bottom of the screen and click “Show Sequence(s)”

Activity #4 Restriction Enzyme Analysis

-go to Entrez at http://www.ncbi.nlm.nih.gov/Entrez/
-click on “Nucleotide” (part of the flow chart)
-in the skinny box next to “Search for” type in a protein name such as “phenylalanine hydroxylase”
-click on “Go”
-the result is a list of sequences matching your request
-click on any of the accession numbers displayed and you will be able to view information about that sequence such as definition, source, classification, journal articles, comments, amino acid sequence, and the nucleotide sequence.
-scroll down to the bottom of the page to see the nucleotide sequence.
-“copy” the sequence (you will be pasting it at another site)
-go to http://darwin.bio.geneseo.edu/~yin/WebGene/RE.html
-click on the toggle switch in front of “Copy and paste your sequence in the box”
-click on the white box and paste your sequence in the box
-scroll down and click “Send this sequence to analyse or customize the page”
-result is a description of your sequence, including its size and the number of each type of base. The result also includes a list of restriction endonucleases that cut your sequence, a list by base number of the enzymes that cut and how many times they cut, and the restriction cuts sorted by enzyme in alphabetical order.

Activity #5 Find Similar Protein Sequences From Various Species

-go to Biology Workbench at http://workbench.sdsc.edu/
-click “Enter the Biology Workbench 3.2” and enter user name and password, click “ok” (or set up a free account first)
-scroll down and click “Protein Tools”
-select “Ndjinn- Multiple Database Search”
-click “Run”
-in the skinny white box next to “Exact Match” type in a protein such as “beta hemoglobin” and select “Show all Hits”
-scroll down to choose a database such as “SWISSPROT- Swissprot Database” by clicking in the white box
-go back up to the top of the screen and click “Search”
-result is a list of “hits” for your requested protein under “Matching Database Record”. Hits include the protein sequences from a variety of organisms
-to view a protein sequence (and other information) select a protein of interest
-click “Show Record(s)”
-scroll to the bottom to “sequence” to view the amino acid sequence.
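To show what the restriction analysis in Activity #4 is doing behind the scenes, the sketch below searches a sequence for the recognition sites of three common enzymes (EcoRI, BamHI, and HindIII) and reports where each site begins. It is only an illustration, not the method used by the web tool: the function names and the 41 base test sequence are invented, and only the site positions are reported rather than the exact cut points. The second half illustrates the idea from the PKU scenario, where a single base change removes a restriction site and so could be detected by a restriction test.

```python
# Recognition sequences for three widely used restriction enzymes.
# All three sites are palindromic, so searching one strand is enough.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def base_counts(sequence):
    """Count how many times each base occurs in the sequence."""
    seq = sequence.upper()
    return {base: seq.count(base) for base in "ACGT"}


def find_sites(sequence, sites=SITES):
    """Return {enzyme: [1-based start position of each recognition site]}."""
    seq = sequence.upper()
    hits = {}
    for enzyme, site in sites.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start + 1)        # report 1-based positions
            start = seq.find(site, start + 1)  # keep looking further along
        hits[enzyme] = positions
    return hits


# Hypothetical sequence invented for this example (not a real gene).
normal = "GAATTCATGCCTGGATCCAAGCTTACGTACGTACGGAATTC"

# Hypothetical point mutation: G changed to A at base 14, inside the BamHI site.
mutant = normal[:13] + "A" + normal[14:]

print("size:", len(normal), "bases", base_counts(normal))
print("normal:", find_sites(normal))
print("mutant:", find_sites(mutant))
```

In this made-up example BamHI has a site in the normal sequence but none in the mutant, so a BamHI digest would give different fragments for the two versions.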
Activity #6 Compare Amino Acid Sequences From Two Different Species

-go to Biology Workbench at http://workbench.sdsc.edu/
-click “Enter the Biology Workbench 3.2” and enter user name and password, click “ok” (or set up a free account first)
-scroll down and click “Protein Tools”
-select “Ndjinn- Multiple Database Search”
-click “Run”
-in the skinny white box next to “Exact Match” type in a protein such as “beta hemoglobin” and select “Show 10 Hits”
-scroll down to choose a database such as “SWISSPROT- Swissprot Database” by clicking in the white box
-go back up to the top of the screen and click “Search”
-result is a list of “hits” for your requested protein under “Matching Database Record”.
-select two unrelated species (such as human and rabbit) by clearing the little white boxes next to all EXCEPT your selections (only the white boxes next to your two chosen species should have check marks)
-click “Import Sequence(s)”
-from the menu list, scroll down and select “CLUSTALW- Multiple Sequence Alignment”
-click the boxes next to your two imported sequences
-click “Run”
-on the next screen click “Submit”
-result is an alignment of your two chosen sequences
-above the aligned sequence is a “consensus key” showing what each of the symbols means. The amino acids are classified as fully conserved, strongly conserved, weakly conserved, or no consensus.
-click “Import Alignments”
-from the menu, click “BOXSHADE- Color-Coded Plots of Pre-Aligned Sequences”
-click white box next to “CLUSTALW” protein and your two imported sequences
-click “Run”
-on the next screen click “Submit”
-scroll down to see color-coded plot of your sequence alignment
    green = identical
    blue = similar
    gray = different
-click “Return”

Activity #7 Phylogenetic Tree Diagrams

-go to Biology Workbench at http://workbench.sdsc.edu/
-click “Enter the Biology Workbench 3.2” and enter user name and password, click “ok” (or set up a free account first)
-scroll down and click “Protein Tools”
-select “Ndjinn- Multiple Database Search”
-click “Run”
-in the skinny white box next to “Exact Match” type in a protein such as “beta hemoglobin” and select “Show All Hits”
-scroll down to choose a database such as “SWISSPROT- Swissprot Database” by clicking in white box
-go back up to top of screen and click “Search”
-result is a list of “hits” for your requested protein under “Matching Database Record”.
Hits include the protein sequences from a variety of organisms
-select six unrelated species’ beta hemoglobin proteins (such as chick, sheep, horse, bovine, and pig) by highlighting each
-import the selected sequences by clicking “Import Sequence(s)”
-from the menu select “CLUSTALW- Multiple Sequence Alignment”
-click the little white boxes next to the six sequences you chose to analyze
-click “Run”
-on the next screen, scroll down and click “Submit”
-on the next screen, click “Import Alignment(s)”
-from the menu select “DRAWTREE- Draw Unrooted Phylogenetic Tree from Alignment”
-click the lower white box next to “CLUSTALW-protein” and your six selected sequences
-click “Run”
-scroll down, click “Submit”
-scroll down to see the tree
-click “Return”
-from the menu, select “DRAWGRAM- Draw Rooted Phylogenetic Tree from Alignment”
-be sure the white box next to “CLUSTALW- protein” is checked
-click “Run”
-on the next screen, click “Submit”
-scroll down to see the tree
-click “Return”

Activity #8 Predict the Secondary Structure of a Protein

-go to Biology Workbench at http://workbench.sdsc.edu/
-click “Enter the Biology Workbench 3.2” and enter user name and password, click “ok” (or set up a free account first)
-scroll down and click “Protein Tools”
-select “Ndjinn- Multiple Database Search”
-click “Run”
-in the skinny white box next to “Exact Match” type in a protein such as “beta hemoglobin” and select “Show 10 Hits”
-scroll down to choose a database such as “SWISSPROT- Swissprot Database” by clicking in the white box
-go back up to the top of the screen and click “Search”
-result is a list of “hits” for your requested protein under “Matching Database Record”.
-when the list of “hits” is produced, select one to import. Clear all the little white boxes next to the hits EXCEPT for your selection. (Only the white box next to your selection should have a check mark.)
-click “Import Sequence(s)”
-from the menu list, scroll down and click “GOR4- Predict Secondary Structure of PS”
-click the white box next to your imported sequence
-click “Run”
-on the next screen click “Submit”
-scroll down to see the color-coded amino acid sequence. Refer to the key below the amino acid sequence, which indicates that red is alpha helix, blue is beta sheet, and black is random coil

Activity #9 Find Information on Any Enzyme

-go to ExPASy Molecular Biology Server at http://www.expasy.ch
-under “Databases” click on “ENZYME”
-under “Access to ENZYME”, click on “by description (official name) or alternative name(s)”
-type in an enzyme such as “lysozyme” in the skinny box next to “Type keyword(s) you want to search for”
-press enter (on your keyboard)
-click on the colored number next to the enzyme name (for lysozyme it is 3.2.1.17)
-result is information such as official name, alternative name, reaction catalyzed, and links to literature relating to the enzyme.

Activity #10 Biochemical Pathways

-go to ExPASy Molecular Biology Server at http://www.expasy.ch
-under “Databases” click on “ENZYME”
-under “Access to ENZYME”, click on “by description (official name) or alternative name(s)”
-type in an enzyme such as “lysozyme” in the skinny box next to “Type keyword(s) you want to search for”
-press enter (on your keyboard)
-click on the colored number (EC) next to the enzyme name (for lysozyme it is 3.2.1.17)
-result is information such as official name, alternative name, reaction catalyzed, and links to literature relating to the enzyme.
-in the table of information, under “Cross-References”, click on the highlighted number next to “Biochemical Pathways” (for lysozyme it is N2)
-result is a diagram of biochemical pathways, including the reaction your enzyme catalyzes
-to find out more about the enzyme, note your EC number and go to http://prowl.rockefeller.edu/enzymes/enzymes.htm
-follow the EC number sequence next to your enzyme (for lysozyme it is 3.2.1) to find its function. For example, under enzyme classes, click 3. “hydrolases”; to the right, under the list “hydrolyzing these bonds”, click 3.2 “glycosidic”; to the right, under “specifically cleaving”, click “3.2.1” next to O-glycosyl bonds

Activity #11 Genetic Diseases Associated with a Protein

-go to Online Mendelian Inheritance in Man (OMIM) at http://www.ncbi.nlm.nih.gov/Omim/
-click “Search the OMIM Database”
-type in a protein such as “lysozyme” in the box next to “Enter one or more search keywords:”
-click “Submit Search”
-click on the colored number next to your choice (for lysozyme it is *153450)
-results give text information about the protein such as its functions, location, discovery, and related research.
-click on the “ALLELIC VARIANTS” header found further down the page to obtain information about related diseases and pertinent references

Activity #12 Finding Mutations Associated with a Protein

-go to the Human Gene Mutation Database at http://www.uwcm.ac.uk/uwcm/mg/hgmd0.html
-click on the “Search now!” button
-in the box next to “Enter keyword(s)” type in a protein such as “phenylalanine hydroxylase”
-click “Search”
-under “here is the result…” click on “PAH”
-result is journal articles with information about mutations in the gene and a table of mutations
-under “Mutation type”, click on “nucleotide substitutions (missense/nonsense)”
-result for phenylalanine hydroxylase is 218 specific examples of mutations showing both the nucleotide and the amino acid mutated, the resulting phenotype (“phenylketonuria and hyperphenylalaninaemia”), and journal references
-click on the number under “Reference” to get a list of references related to your mutation

Acknowledgments: Some of the activities in this packet were adapted from a NABT workshop presented by Dr. Gerald Schlink and Dr. Scott Wells from Missouri Southern State College.
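Supplement to Activity #12: the “missense/nonsense” label on a nucleotide substitution comes from comparing the amino acid coded by the normal codon with the one coded by the mutant codon. The sketch below builds the standard genetic code and applies that comparison. It is a simplified illustration under stated assumptions: the function name `classify_substitution` and its labels are chosen for this example and are not taken from the Human Gene Mutation Database, and it only handles a single codon at a time. The first example is the sickle cell change in beta globin (GAG to GTG, glutamic acid to valine).

```python
# Standard genetic code, built from the usual T, C, A, G ordering of bases.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(normal_codon, mutant_codon):
    """Classify a codon change by its effect on the amino acid ('*' means stop)."""
    before = CODON_TABLE[normal_codon.upper()]
    after = CODON_TABLE[mutant_codon.upper()]
    if before == after:
        return "silent"      # same amino acid, no change in the protein
    if after == "*":
        return "nonsense"    # new stop codon, protein is cut short
    if before == "*":
        return "stop-loss"   # original stop codon removed
    return "missense"        # one amino acid replaced by another


for normal_codon, mutant_codon in [("GAG", "GTG"), ("CAG", "TAG"), ("GAA", "GAG")]:
    change = CODON_TABLE[normal_codon] + " -> " + CODON_TABLE[mutant_codon]
    print(normal_codon, "->", mutant_codon, change,
          classify_substitution(normal_codon, mutant_codon))
```

Students can try codons taken from the mutation table they found in Activity #12 and check whether their classification agrees with the database.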