Biology 21
Macromolecular Structure

Discovering the Connection between DNA Sequence and Protein Structure

A gene is a sequence of DNA nucleotides that provides the information for producing a specific protein. The RNA copy of the gene contains the code words, or codons, that determine the order of the amino acids in the protein. Each codon is a series of three nucleotides that identifies a specific amino acid to be placed at a given position in the protein. The Codon Dictionary chart below shows the code used by nearly all living organisms to produce proteins from gene-derived RNA sequences.

In this exercise, you will
1. Use a computer simulation to show the production of a protein from a short DNA sequence.
2. Determine the amino acid sequence for an unknown human RNA sequence provided by your instructor.
3. Identify the gene for this unknown human RNA sequence using the online Human Genome database at the National Center for Biotechnology Information.

Codon Dictionary

UUU = phenylalanine       PHE     UCU = serine      SER
UUC = phenylalanine       PHE     UCC = serine      SER
UUA = leucine             LEU     UCA = serine      SER
UUG = leucine             LEU     UCG = serine      SER
CUU = leucine             LEU     CCU = proline     PRO
CUC = leucine             LEU     CCC = proline     PRO
CUA = leucine             LEU     CCA = proline     PRO
CUG = leucine             LEU     CCG = proline     PRO
AUU = isoleucine          ILE     ACU = threonine   THR
AUC = isoleucine          ILE     ACC = threonine   THR
AUA = isoleucine          ILE     ACA = threonine   THR
AUG = methionine (start)  MET     ACG = threonine   THR
GUU = valine              VAL     GCU = alanine     ALA
GUC = valine              VAL     GCC = alanine     ALA
GUA = valine              VAL     GCA = alanine     ALA
GUG = valine              VAL     GCG = alanine     ALA

UAU = tyrosine            TYR     UGU = cysteine    CYS
UAC = tyrosine            TYR     UGC = cysteine    CYS
UAA = stop                        UGA = stop
UAG = stop                        UGG = tryptophan  TRP
CAU = histidine           HIS     CGU = arginine    ARG
CAC = histidine           HIS     CGC = arginine    ARG
CAA = glutamine           GLN     CGA = arginine    ARG
CAG = glutamine           GLN     CGG = arginine    ARG
AAU = asparagine          ASN     AGU = serine      SER
AAC = asparagine          ASN     AGC = serine      SER
AAA = lysine              LYS     AGA = arginine    ARG
AAG = lysine              LYS     AGG = arginine    ARG
GAU = aspartic acid       ASP     GGU = glycine     GLY
GAC = aspartic acid       ASP     GGC = glycine     GLY
GAA = glutamic acid       GLU     GGA = glycine     GLY
GAG = glutamic acid       GLU     GGG = glycine     GLY

Glossary of some terms you may encounter in the online search (Part 3)

Intron      Region that interrupts a gene and does not contribute to the protein sequence
Exon        Part of the gene specifying the amino acid sequence, separated from other exons by introns
Missense    Mutation that causes a substitution of one amino acid for another in a protein
Nonsense    Mutation that changes the codon for an amino acid to a stop codon, leading to a shortened protein
p           Short arm of a chromosome
q           Long arm of a chromosome

Directions

1. Using a computer simulation to show the production of a protein from a short DNA sequence
   a. Go to http://learn.genetics.utah.edu/content/begin/dna/transcribe/. (Or access from Otherlinks at homepage.smc.edu/colavito_mary/biology21.htm)
   b. Follow the onscreen directions to produce the protein-coding RNA strand (called messenger RNA). The following chart shows the correspondence between nucleotides in the gene sequence and nucleotides placed in the RNA copy.

      Nucleotide in DNA    Nucleotide Placed in RNA
      A                    U
      T                    A
      G                    C
      C                    G

   c. Follow the onscreen directions to produce the amino acid sequence from the RNA. [Hint: The protein-coding sequence begins with the “Start” codon, AUG, so methionine is positioned as the first amino acid in the chain.]

2. Determining the amino acid sequence for an unknown human RNA sequence provided by your instructor
   a. Obtain a worksheet showing an unknown human RNA sequence from your instructor.
   b. Produce the DNA strand that would be complementary to this RNA sequence.
   c. Using the codon dictionary, determine the amino acid sequence for the protein encoded by this unknown human RNA sequence.
   d. Record your results on the worksheet.

3. Identifying the gene for this unknown human RNA sequence using the online Human Genome database at the National Center for Biotechnology Information
   a. Go to http://blast.ncbi.nlm.nih.gov/Blast.cgi (Or access from Otherlinks at homepage.smc.edu/colavito_mary/biology21.htm)
   b. Under “Basic Blast”, choose Nucleotide Blast.
   c. Type your unknown human RNA sequence into the “Enter Query Sequence” box at the top, make sure that the Human Genomic + Transcript Database is selected, and then select the “BLAST” button at the bottom of the screen.
   d. When the results are displayed, scroll down to the heading “Sequences producing significant alignments”. Choose one of the sequences at the top of the list and click on its Accession number. It is best to use a sequence showing a gene, transcript (mRNA), protein or disease name rather than one listed simply as “clone”, “predicted” or “human sequence”.
   e. When the next screen is displayed, check for information on the gene name and chromosomal location.
   f. If you need additional information, scroll down to the section labeled “Features”. This section has the items indicated in the table below:

      Source     Shows chromosome number and map location
      Protein    Shows the sequence of the protein derived from this gene
      Gene       Gene name
                 Synonyms
                 Gene ID: links to general information and PubMed database articles for this gene
                 MIM: summary of data on inheritance and molecular biology of the gene; useful for identifying mutations as described below

      Click on the number next to MIM*.
      Chromosomal Location will be reported as Gene Map Locus.
      Click on the Chromosomal Location to see a table giving information on diseases caused by mutations in the gene.

      To find mutations in the gene:
      1. Choose Allelic Variants from the OMIM menu on the left side of the screen.
      2. Choose any one variant that involves a single amino acid change. Ex. ARG142CYS means that cysteine replaces arginine at position 142.

* MIM or OMIM: Online Mendelian Inheritance in Man
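The paper-and-pencil steps in Parts 1 and 2, and the variant notation in Part 3, can also be checked with a short program. The following is a minimal sketch, not part of the original exercise: the function names (transcribe, complementary_dna, translate, read_variant) and the example DNA sequence are made up for illustration, and the sketch assumes that translation starts at the first AUG and stops at the first stop codon or when the sequence runs out.

```python
import re
from itertools import product

# DNA base -> RNA base, copied from the handout's chart (Part 1b).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
# RNA base -> complementary DNA base (Part 2b).
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}

# Codon dictionary, built in the order UUU, UUC, UUA, UUG, UCU, ...
# Each letter below is the one-letter code for the matching codon.
BASES = "UCAG"
ONE_LETTER = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
              "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
THREE_LETTER = {
    "F": "PHE", "L": "LEU", "S": "SER", "Y": "TYR", "C": "CYS",
    "W": "TRP", "P": "PRO", "H": "HIS", "Q": "GLN", "R": "ARG",
    "I": "ILE", "M": "MET", "T": "THR", "N": "ASN", "K": "LYS",
    "V": "VAL", "A": "ALA", "D": "ASP", "E": "GLU", "G": "GLY",
    "*": "stop",
}
CODONS = {
    "".join(codon): THREE_LETTER[letter]
    for codon, letter in zip(product(BASES, repeat=3), ONE_LETTER)
}


def transcribe(dna):
    """Build the RNA copy of a DNA strand, base by base, using the chart."""
    return "".join(DNA_TO_RNA[base] for base in dna.upper())


def complementary_dna(rna):
    """Build the DNA strand complementary to an RNA sequence, base by base."""
    return "".join(RNA_TO_DNA[base] for base in rna.upper())


def translate(rna):
    """Read codons from the first AUG until a stop codon (assumed rule)."""
    rna = rna.upper()
    start = rna.find("AUG")
    if start == -1:
        return []  # no start codon, so no protein
    protein = []
    for i in range(start, len(rna) - 2, 3):
        amino_acid = CODONS[rna[i:i + 3]]
        if amino_acid == "stop":
            break
        protein.append(amino_acid)
    return protein


def read_variant(name):
    """Split a variant such as ARG142CYS into (original, position, new)."""
    match = re.fullmatch(r"([A-Z]{3})(\d+)([A-Z]{3})", name.upper())
    if match is None:
        raise ValueError("not a single amino acid change: " + name)
    return match.group(1), int(match.group(2)), match.group(3)


# Made-up example sequence; it is not one of the worksheet sequences.
rna = transcribe("TACGGTAAAACT")
print(rna)                        # AUGCCAUUUUGA
print(translate(rna))             # ['MET', 'PRO', 'PHE']
print(complementary_dna(rna))     # TACGGTAAAACT
print(read_variant("ARG142CYS"))  # ('ARG', 142, 'CYS')
```

The complementary strand here is written position by position, matching the handout's chart, rather than reversed into the conventional 5' to 3' order. A program like this is best used to check an answer after working it out by hand with the codon dictionary.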