ANALYZING GENE AND PROTEIN SEQUENCES
Angelique Bosse
Montgomery Blair High School
abosse@mbhs.edu
Table of Contents
Introduction
Scenarios
Activity #1 BLAST
Activity #2 Taxonomic Information
Activity #3 Gene Sequence Search
Activity #4 Restriction Enzyme Analysis
Activity #5 Find Similar Protein Sequences from Various Species
Activity #6 Compare the Amino Acid Sequences From Two Different Species
Activity #7 Phylogenetic Tree Diagrams
Activity #8 Predict the Secondary Structure of a Protein
Activity #9 Find Information on Any Enzyme
Activity #10 Biochemical Pathways
Activity #11 Genetic Diseases Associated with a Protein
Activity #12 Finding Mutations Associated with a Protein
Acknowledgments
Introduction
The following activities introduce students to the use of databases as a resource for gene
and protein information. The series of twelve activities can be done in order, in parts, or
can be mixed and matched to suit the objectives of the unit. The teacher may tailor the
activities to the level of students. Advanced students may solve complex, open-ended
problems and work at their own pace. Other students may work in pairs to complete
focused tasks.
Whatever their level, students can use gene and protein databases and other Internet
resources to perform many types of analyses on sequences. Students may
search various databases, investigate taxonomy, and perform a number of analyses. For
example, students can compare human gene/protein sequences to other species’
sequences, simulate restriction analysis, predict the secondary structure of a protein, and
generate phylogenetic tree diagrams. The students may also find information on
enzymes, biochemical pathways, genetic diseases, and mutations.
Scenarios
Below are some ideas of how to approach these bioinformatics activities:
1. Phenylketonuria:
This disease, also known as PKU, is caused by an inborn error of metabolism. PKU is autosomal
recessive, which means babies born with PKU have two abnormal copies of the phenylalanine
hydroxylase gene. This gene codes for an enzyme that converts the amino acid
phenylalanine to the amino acid tyrosine. A defective enzyme results in a build-up of
phenylalanine. If untreated, the high levels of phenylalanine result in intellectual disability.
The disease may be detected at birth through the routine newborn blood screening. If PKU is
detected, the patient must follow a diet that restricts phenylalanine ingestion. Strict adherence to
the diet is necessary to prevent the devastating effects of PKU.
Have students search for the phenylalanine hydroxylase gene (Activity #3), perform a restriction
analysis (Activity #4), and identify and classify known mutations of the phenylalanine
hydroxylase enzyme (Activity #12). Next, have the students find a known mutation that alters a
restriction site. Ask the students to devise a restriction analysis test that could be used to detect a
particular defect in the gene.
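The idea behind such a test can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two sequences are made-up 30-base strings, not the real phenylalanine hydroxylase gene, and the function name is hypothetical. EcoRI (recognition site GAATTC, cutting after the first G) stands in for whichever enzyme the students find. A single-base change that destroys the site changes the fragment sizes that would appear on a gel.

```python
def fragment_lengths(sequence, site, cut_offset):
    """Return fragment lengths after cutting a linear sequence at every site.

    cut_offset is the number of bases of the recognition site that lie
    to the left of the cut (1 for EcoRI, which cuts G^AATTC).
    """
    seq = sequence.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]

# Hypothetical sequences for illustration only (not a real gene).
wild_type = "ACGTACGTAC" + "GAATTC" + "TGCATGCATGCATG"
# A single A->G substitution inside the site abolishes it.
mutant = wild_type.replace("GAATTC", "GAGTTC")

wild_fragments = fragment_lengths(wild_type, "GAATTC", 1)
mutant_fragments = fragment_lengths(mutant, "GAATTC", 1)
print("wild type:", wild_fragments)
print("mutant:   ", mutant_fragments)
```

In this toy case the wild-type sequence is cut into two fragments while the mutant stays in one piece, which is the kind of difference a restriction analysis test relies on.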
2. Sickle Cell Anemia:
This disease is caused by a mutation in the beta chain of hemoglobin. The disease is autosomal
recessive, which means that sickle cell patients have two abnormal copies of the gene. Abnormal
hemoglobin may polymerize, especially under low oxygen conditions, and cause sickling of red
blood cells. The normally doughnut-shaped red blood cells can move through blood vessels with
ease. Abnormal sickle cells may get stuck in the narrow blood vessels. The inability of the blood
cells to flow to the tissues deprives the tissues of needed oxygen and results in organ damage.
Have students find beta globin sequences from two different organisms (Activity #5). Then have
them align the two sequences and compare them to determine similarities and differences
(Activity #6). Next, have students align several organisms’ sequences and draw rooted and
unrooted phylogenetic trees (Activity #7). The phylogenetic analysis should reveal evolutionary
relationships. Finally, have students determine the secondary structures of beta hemoglobin
(Activity #8). Students can compare the secondary structures of beta hemoglobin in various
organisms.
3. Amyloidosis:
Amyloidosis is caused by an abnormal accumulation of the fibrous amyloid protein. Amyloidosis
is observed in a variety of disorders such as rheumatoid arthritis, diabetes, and Alzheimer’s disease. The
disorder, commonly seen in dogs, is often caused by lysozyme enzyme abnormalities. Although
the disease may affect any breed of dog, it is more common in certain breeds.
For example, beagles, collies, and Shar-Peis are at greater risk of amyloidosis. The
accumulation of amyloid protein can occur in any organ and interferes with the normal functions
of that organ.
Have students find information about lysozyme enzyme (Activity #9). Then students should
investigate the biochemical pathway that lysozyme is a part of (Activity #10). Finally, students
can discover the diseases associated with lysozyme enzyme (Activity #11).
Activity #1 BLAST
-go to BLAST at http://www.ncbi.nlm.nih.gov/BLAST/
-click on “Basic BLAST search”
-click in the large blank box and input a random 35-base sequence (using a,c,g,t only)
-click on “Search”
-on the next screen click on “Format results”
The results give you several pieces of information:
1. references to journal articles
2. a distribution of blast hits on the query sequence (your inputted sequence is in red,
then in black are corresponding sequence similarities. If you slowly move the mouse
down the rows of black lines, you will see identifications for each hit.)
3. a list of sequences producing significant alignments with your inputted sequence
4. a diagram for each sequence showing the letters in common with your inputted
sequence
-go back to the home page at http://www.ncbi.nlm.nih.gov/BLAST/
-click on “Web BLAST tutorial” (in gray margin to the left)
-this tutorial introduces the various BLAST programs available such as blastp, blastn,
blastx, tblastn, and tblastx
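If students would rather not type a random sequence by hand, a short Python sketch like the following could generate one to paste into the BLAST box. The function name and the use of a seed are assumptions made for this example. The seed simply lets a whole class reproduce the same sequence.

```python
import random

def random_dna(length=35, seed=None):
    """Return a random DNA string of the given length using only a, c, g, t."""
    rng = random.Random(seed)  # a fixed seed makes the sequence reproducible
    return "".join(rng.choice("acgt") for _ in range(length))

query = random_dna(35, seed=1)
print(query)
```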
Activity #2 Taxonomic Information
Taxonomic information can be accessed by either of the following two approaches:
-go to BLAST at http://www.ncbi.nlm.nih.gov/BLAST/
-click on “Basic BLAST search”
-click in the blank box and input a random 35-base sequence (using a,c,g,t only)
-click on “Search”
-on the next screen click on “Format results”
The results give you several pieces of information:
1. references to journal articles
2. a distribution of blast hits on the query sequence (your inputted sequence is in red,
then in black are corresponding sequence similarities. If you slowly move the mouse
down the rows of black lines, you will see identifications for each hit.)
3. a list of sequences producing significant alignments with your inputted sequence
4. a diagram for each sequence showing the letters in common with your inputted
sequence
-from the list of results on BLAST, click on an accession number (highlighted code number to the
left of the hits in the list of sequences producing significant alignments)
-result is information about your selection
-click on highlighted term next to “ORGANISM”
-result is information such as preferred common name, other names, and lineage.
OR
-go to NCBI Databases at http://www.ncbi.nlm.nih.gov/Database/index.html
-click on “Taxonomy” (part of the flow chart)
-the taxonomy browser enables you to search for information about all organisms that are
represented in the genetic databases with at least one nucleotide or protein sequence
-click on the highlighted word “tree” under “the taxonomy browser”
-at this page, one could explore any taxon on the list provided, or enter a particular
organism in the white box next to “search for”.
-in the skinny white box, next to “search for”, type in “Escherichia coli” and click “Go”
-result is lineage and option to get additional information such as genetic code translation,
nucleotide or protein sequence, and structural information
-click on name to get more information
Activity #3 Gene Sequence Search
Gene sequences can be accessed by either of the following two approaches:
-go to Entrez at http://www.ncbi.nlm.nih.gov/Entrez/
-click on “Nucleotide” (part of the diagram)
-in the skinny box next to “Search for” type in a protein name such as “phenylalanine
hydroxylase”
-click on “Go”
-the result is a list of various types of the requested sequences
-click on any of the accession numbers displayed and you will be able to view information about
that sequence such as definition, source, classification, journal articles, comments, amino
acid sequence, and the nucleotide sequence.
OR
-go to Biology Workbench at http://workbench.sdsc.edu/
-click “Enter the Biology Workbench 3.2” and enter user name and password, click “ok”
(or set up a free account first)
-scroll down and click “Nucleic Tools”
-select “Ndjinn- Multiple Database Search”
-click “Run”
-in the blank box next to “Exact Match” type in a protein such as “phenylalanine hydroxylase”
-select “Show 10 Hits”
-scroll down to select a database, click in the little white box next to your choice, such as
“GBPRI- GenBank Primate Sequences”
-click “Search”
-result is a list of “hits” for your requested sequence under “Matching Database Record”
-select a record by clearing the little white boxes next to all EXCEPT the one you want (a check
mark should appear only next to your choice)
-select “Show Record(s)”
-result is information about the selected sequence such as locus, definition, source, references,
base count, and amino acid sequence.
-To view the nucleic acid sequence, scroll to the bottom of the screen and click “Show
Sequence(s)”
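The Entrez keyword search in the first approach above can also be written as a web address. The sketch below only builds that address and does not contact NCBI. It assumes NCBI's E-utilities "esearch" endpoint with its db, term, and retmax parameters, which is not part of the original activity, and the function name is hypothetical.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities search service (an assumption of this
# sketch; the handout itself uses the Entrez web pages instead).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_esearch_url(term, db="nucleotide", retmax=10):
    """Build a search address for the given keyword, database, and hit limit."""
    params = {"db": db, "term": term, "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)

url = build_esearch_url("phenylalanine hydroxylase")
print(url)
```

Pasting the printed address into a browser should return a list of record identifiers rather than the formatted page that students see in Entrez.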
Activity #4 Restriction Enzyme Analysis
-go to Entrez at http://www.ncbi.nlm.nih.gov/Entrez/
-click on “Nucleotide” (part of the flow chart)
-in the skinny box next to “Search for” type in a protein name such as “phenylalanine
hydroxylase”
-click on “Go”
-the result is a list of various types of the requested sequences
-click on any of the accession numbers displayed and you will be able to view information about
that sequence such as definition, source, classification, journal articles, comments, amino
acid sequence, and the nucleotide sequence.
-scroll down to the bottom of the page to see the nucleotide sequence.
-“copy” the sequence (you will be pasting it at another site)
-go to http://darwin.bio.geneseo.edu/~yin/WebGene/RE.html
-click on the toggle switch in front of “Copy and paste your sequence in the box”
-click on the white box and paste your sequence in the box
-scroll down and click “Send this sequence to analyse or customize the page”
-result is a description of your sequence including size, number of each type of base. Also, a list
of restriction endonucleases which cut your sequence, a list by base # of enzymes that
cut and how many times they cut, and restriction cuts sorted by enzyme in alphabetical
order.
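To show what the restriction analysis site is doing, here is a minimal Python sketch that reports the sequence size, the count of each base, and where three well-known enzymes would recognize the sequence. The function names and the 30-base demo sequence are made up for illustration. Only EcoRI, BamHI, and HindIII are included, whereas the web tool checks far more enzymes.

```python
from collections import Counter

# Recognition sites (5'->3') for three common enzymes. All three sites are
# palindromic, so scanning one strand is enough.
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}

def clean_sequence(text):
    """Keep letters only, so pasted line numbers and spaces are dropped."""
    return "".join(ch for ch in text if ch.isalpha()).upper()

def find_sites(sequence, site):
    """Return the 1-based start position of every occurrence of site."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start + 1)
        start = sequence.find(site, start + 1)
    return positions

def restriction_report(text):
    """Summarize size, base counts, and recognition-site positions."""
    seq = clean_sequence(text)
    return {
        "length": len(seq),
        "base_counts": dict(Counter(seq)),
        "sites": {name: find_sites(seq, site) for name, site in ENZYMES.items()},
    }

# Hypothetical sequence laid out the way a copied database record might look.
demo = "1 ttgaattcaa ggatccttaa gcttgaattc"
report = restriction_report(demo)
for name, positions in sorted(report["sites"].items()):
    print(name, "cuts", len(positions), "time(s) at", positions)
```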
Activity #5 Find Similar Protein Sequences From Various Species
-go to Biology Workbench at http://workbench.sdsc.edu/
-click “Enter the Biology Workbench 3.2” and enter user name and password, click “ok”
(or set up a free account first)
-scroll down and click “Protein Tools”
-select “Ndjinn- Multiple Database Search”
-click “Run”
-in the skinny white box next to “Exact Match” type in a protein such as “beta hemoglobin” and
select “Show all Hits”
-scroll down to choose a database such as “SWISSPROT- Swissprot Database” by clicking in
white box
-go back up to top of screen and click “Search”
-result is a list of “hits” for your requested protein under “Matching Database Record”. Hits
include the protein sequences from a variety of organisms
-to view a protein sequence (and other information) select a protein of interest
-click “Show Record(s)”.
-Scroll to the bottom to “sequence” to view the amino acid sequence.
Activity #6 Compare Amino Acid Sequences From Two Different Species
-go to Biology Workbench at http://workbench.sdsc.edu/
-click “Enter the Biology Workbench 3.2” and enter user name and password, click “ok”
(or set up a free account first)
-scroll down and click “Protein Tools”
-select “Ndjinn- Multiple Database Search”
-click “Run”
-in the skinny white box next to “Exact Match” type in a protein such as “beta hemoglobin” and
select “Show 10 Hits”
-scroll down to choose a database such as “SWISSPROT- Swissprot Database” by clicking in
white box
-go back up to top of screen and click “Search”
-result is a list of “hits” for your requested protein under “Matching Database Record”.
-select two different species (such as chick and bovine) by clearing the little white boxes next to
all EXCEPT your selections (only the white boxes next to chick and bovine should have
check marks)
-click “Import Sequence(s)”
-from the menu list, scroll down and select “CLUSTALW- Multiple Sequence Alignment”
-click the boxes next to your two imported sequences
-click “Run”
-on the next screen click “Submit”
-result is alignment of your two chosen sequences
-above the aligned sequence is a “consensus key” showing what each of the symbols means. The
amino acids are classified as fully conserved, strongly conserved, weakly conserved, or
no consensus.
-click “Import Alignments”
-from the menu, click “BOXSHADE- Color-Coded Plots of Pre-Aligned Sequences”
-click white box next to “CLUSTALW” protein and your two imported sequences
-click “Run”
-on the next screen click “Submit”
-scroll down to see color-coded plot of your sequence alignment
green= identical
blue= similar
gray= different
-click “Return”
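Once two sequences are aligned, the comparison can be summarized as a percent identity. The following Python sketch shows one way to compute it from two pre-aligned strings. The two short fragments are hypothetical, "-" marks a gap, and the second function imitates only the "*" (fully conserved) symbol of the consensus key, not the strong and weak categories.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free alignment columns where both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

def identity_line(aligned_a, aligned_b, gap="-"):
    """Mark identical columns with '*' and all other columns with a space."""
    return "".join(
        "*" if a == b and a != gap else " "
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
    )

# Hypothetical aligned fragments, for illustration only.
seq_a = "MVHLTPEEKS-AVT"
seq_b = "MVHWTAEEKSQAVT"
identity = percent_identity(seq_a, seq_b)
marks = identity_line(seq_a, seq_b)
print(seq_a)
print(seq_b)
print(marks)
print(round(identity, 1), "% identical")
```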
Activity #7 Phylogenetic Tree Diagrams
-go to Biology Workbench at http://workbench.sdsc.edu/
-click “Enter the Biology Workbench 3.2” and enter user name and password, click “ok”
(or set up a free account first)
-scroll down and click “Protein Tools”
-select “Ndjinn- Multiple Database Search”
-click “Run”
-in the skinny white box next to “Exact Match” type in a protein such as “beta hemoglobin” and
select “Show All Hits”
-scroll down to choose a database such as “SWISSPROT- Swissprot Database” by clicking in
white box
-go back up to top of screen and click “Search”
-result is a list of “hits” for your requested protein under “Matching Database Record”. Hits
include the protein sequences from a variety of organisms
-select six unrelated species’ beta hemoglobin proteins (such as chick, sheep, horse, bovine, pig,
and one more of your choice) by highlighting each
-import the selected sequences by clicking “Import Sequence(s)”
-from the menu select “CLUSTALW- Multiple Sequence Alignment”
-click the little white box next to your chosen 6 sequences to analyze
-click “Run”
-on the next screen, scroll down and click “Submit”
-on the next screen, click “Import Alignment(s)”
-from the menu select “DRAWTREE- Draw Unrooted Phylogenetic Tree from Alignment”
-click lower white box next to “CLUSTALW-protein” and your six selected sequences
-click “Run”
-scroll down, click “Submit”
-scroll down to see the tree
-click “Return”
-from the menu, select “DRAWGRAM- Draw Rooted Phylogenetic Tree from Alignment”
-be sure the white box next to “CLUSTALW- protein” is checked
-click “Run”
-on the next screen, click “Submit”
-scroll down to see the tree
-click “Return”
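Tree-drawing programs work from a measure of how different the aligned sequences are. As a rough illustration of that idea, and not a reproduction of what the Workbench tools compute, the Python sketch below calculates the fraction of differing positions for every pair in a small alignment and reports the closest pair. The species names and sequences are hypothetical.

```python
from itertools import combinations

def p_distance(a, b, gap="-"):
    """Fraction of gap-free aligned positions at which two sequences differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)

def distance_table(alignment):
    """Return {(name1, name2): distance} for every pair of sequences."""
    return {
        (n1, n2): p_distance(alignment[n1], alignment[n2])
        for n1, n2 in combinations(alignment, 2)
    }

# Hypothetical aligned fragments, for illustration only.
alignment = {
    "species_A": "MVHLTPEEKS",
    "species_B": "MVHLTPEEKA",
    "species_C": "MVHWTAEEKA",
    "species_D": "MGHWSAEDKA",
}
table = distance_table(alignment)
closest = min(table, key=table.get)
for pair, distance in sorted(table.items(), key=lambda item: item[1]):
    print(pair, round(distance, 2))
print("closest pair:", closest)
```

Pairs with the smallest distances would be expected to sit next to each other on the tree, so students can check their drawn trees against a table like this one.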
Activity #8 Predict the Secondary Structure of a Protein
-go to Biology Workbench at http://workbench.sdsc.edu/
-click “Enter the Biology Workbench 3.2” and enter user name and password, click “ok”
(or set up a free account first)
-scroll down and click “Protein Tools”
-select “Ndjinn- Multiple Database Search”
-click “Run”
-in the skinny white box next to “Exact Match” type in a protein such as “beta hemoglobin” and
select “Show 10 Hits”
-scroll down to choose a database such as “SWISSPROT- Swissprot Database” by clicking in
white box
-go back up to top of screen and click “Search”
-result is a list of “hits” for your requested protein under “Matching Database Record”.
-when the list of “hits” is produced, select one to import. Clear all the little white boxes next to
the hits EXCEPT for your selection. (Only the white box next to your selection should
have a check mark).
-click “Import Sequence(s)”
-from the menu list, scroll down and click “GOR4- Predict Secondary Structure of PS”
-click the white box next to your imported sequence
-click “Run”
-on next screen click “Submit”
-scroll down to see color-coded amino acid sequence. Refer to the key below the amino acid
sequence which indicates that red is alpha helix, blue is beta sheet, and black is
random coil
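To compare predictions between organisms, students can reduce each color-coded result to percentages of helix, sheet, and coil. The Python sketch below assumes the prediction has been written out as one letter per amino acid, with H for alpha helix, E for beta sheet, and C for random coil. That lettering and the example string are assumptions for illustration, not output copied from GOR4.

```python
from collections import Counter

# One-letter state codes assumed for this sketch.
STATE_NAMES = {"H": "alpha helix", "E": "beta sheet", "C": "random coil"}

def summarize_states(prediction):
    """Return the percentage of residues predicted in each state."""
    counts = Counter(prediction.upper())
    unknown = set(counts) - set(STATE_NAMES)
    if unknown:
        raise ValueError("unexpected state codes: " + ", ".join(sorted(unknown)))
    total = sum(counts.values())
    return {
        name: round(100.0 * counts[code] / total, 1)
        for code, name in STATE_NAMES.items()
    }

# Hypothetical prediction string for a 20-residue peptide.
prediction = "CCHHHHHHHHCCEEEECCCC"
summary = summarize_states(prediction)
print(summary)
```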
Activity #9 Find Information on Any Enzyme
-go to ExPASy Molecular Biology Server at http://www.expasy.ch
-under “Databases” click on “ENZYME”
-under “Access to ENZYME”
-click on “by description (official name) or alternative name(s)”
-type in an enzyme such as “lysozyme” in skinny box next to “Type keyword(s) you want to
search for”
-press enter (on your keyboard)
-click on the colored number next to the enzyme name (for lysozyme it is 3.2.1.17)
-result is information such as official name, alternative name, reaction catalyzed, and links to
literature relating to the enzyme.
Activity #10 Biochemical Pathways
-go to ExPASy Molecular Biology Server at http://www.expasy.ch
-under “Databases” click on “ENZYME”
-under “Access to ENZYME”
-click on “by description (official name) or alternative name(s)”
-type in an enzyme such as “lysozyme” in skinny box next to “Type keyword(s) you want to
search for”
-press enter (on your keyboard)
-click on the colored number (EC) next to the enzyme name (for lysozyme it is 3.2.1.17)
-result is information such as official name, alternative name, reaction catalyzed, and links to
literature relating to the enzyme.
-in the table of information, under “Cross-References”, click on the highlighted number next to
“Biochemical Pathways” (for lysozyme it is N2)
-result is a diagram of the biochemical pathway including the one your enzyme catalyzes
-to find out more about the enzyme, note your EC number and go to
http://prowl.rockefeller.edu/enzymes/enzymes.htm
-follow the EC number sequence next to your enzyme (for lysozyme it is 3.2.1) to find function.
For example, under enzyme classes, click 3. “hydrolases”, to the right, under list of
“hydrolyzing these bonds” click 3.2 “glycosidic”, to the right under “specifically
cleaving”, click “3.2.1” next to O-glycosyl bonds
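The way an EC number narrows from a broad class to a single enzyme can be shown with a short Python sketch. The function name is hypothetical. The table lists the six top-level classes used by the site described above. A seventh class, translocases, was added to the EC system later and is left out here.

```python
# Top-level enzyme classes, keyed by the first digit of the EC number.
EC_CLASSES = {
    1: "oxidoreductases",
    2: "transferases",
    3: "hydrolases",
    4: "lyases",
    5: "isomerases",
    6: "ligases",
}

def ec_hierarchy(ec_number):
    """Split an EC number into its class name and its four nested levels."""
    parts = ec_number.strip().split(".")
    if len(parts) != 4:
        raise ValueError("expected four fields, for example 3.2.1.17")
    top_class = EC_CLASSES.get(int(parts[0]), "unknown class")
    levels = [".".join(parts[:depth]) for depth in range(1, 5)]
    return top_class, levels

top_class, levels = ec_hierarchy("3.2.1.17")  # lysozyme
print(top_class)
print(" > ".join(levels))
```

For lysozyme, the levels printed match the clicks described above: 3, then 3.2, then 3.2.1, ending at the full number 3.2.1.17.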
Activity #11 Genetic Diseases Associated with a Protein
-go to Online Mendelian Inheritance in Man (OMIM) at http://www.ncbi.nlm.nih.gov/Omim/
-click “Search the OMIM Database”
-type in a protein such as “lysozyme” in the box next to “Enter one or more search keywords:”
-click “Submit Search”
-click on the colored number next to your choice (for lysozyme it is *153450)
-results give text information about the protein such as functions, location, discovery of the
enzyme, information about the protein, and research.
-click on the “ALLELIC VARIANTS” header found further down the page to obtain
information about related diseases and pertinent references
Activity #12 Finding Mutations Associated with a Protein
-go to the Human Gene Mutation Database at http://www.uwcm.ac.uk/uwcm/mg/hgmd0.html
-click on the “Search now!” button
-in the box next to “Enter keyword(s)” type in a protein such as “phenylalanine hydroxylase”
-click “Search”
-under “here is the result…” click on “PAH”
-result is journal articles with information about mutations in the gene and a table of mutations
-under “Mutation type”, click on “nucleotide substitutions (missense/nonsense)”
-result for phenylalanine hydroxylase is 218 specific examples of mutations showing both the
nucleotide and amino acids mutated, resulting phenotype “phenylketonuria and
hyperphenylalaninaemia”, and journal references
-click on the number under “Reference” to get a list of references related to your mutation
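The missense/nonsense distinction used in the mutation table can be checked with a small Python sketch. It builds the standard genetic code and labels a single-codon change as silent, missense, or nonsense. The function name is hypothetical, and the sketch ignores other mutation types such as insertions, deletions, and changes that remove a stop codon. The first example is the well-known sickle cell change in beta globin (GAG to GTG, glutamic acid to valine).

```python
# Standard genetic code, listed in T, C, A, G order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}

def classify_substitution(ref_codon, alt_codon):
    """Return (kind, reference amino acid, altered amino acid).

    '*' stands for a stop codon. Only silent, missense, and nonsense
    changes are distinguished in this sketch.
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        kind = "silent"
    elif alt_aa == "*":
        kind = "nonsense"
    else:
        kind = "missense"
    return kind, ref_aa, alt_aa

sickle = classify_substitution("GAG", "GTG")
stop_gain = classify_substitution("TGG", "TGA")
unchanged = classify_substitution("GAA", "GAG")
print(sickle, stop_gain, unchanged)
```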
Acknowledgements:
Some of the activities in this packet were adapted from a NABT workshop presented by
Dr. Gerald Schlink and Dr. Scott Wells from Missouri Southern State College.