CS 177 DNA RNA Mutations Amino acids, protein structure DNA, RNA, protein overview DNA, RNA, protein overview Questions about the genome in an organism: How much DNA, how many nucleotides? How many genes are there? What types of proteins appear to be coded by these genes? Questions about the proteome: What proteins are present? DNA RNA Mutations Amino acids, protein structure Where are they? When are they present - under what conditions? DNA, RNA, protein overview Lecture 2 * DNA and its components * RNA and its components * Mutations * Amino acids, review of protein structure DNA RNA Mutations Amino acids, protein structure DNA overview DNA deoxyribonucleic acid 4 bases Pyrimidine (C4N2H4) A = Adenine Purine (C5N4H4) T = Thymine C = Cytosine G = Guanine Nucleoside Nucleotide base + sugar (deoxyribose) base + sugar O -- - PO O P4 O OH DNA RNA Mutations Amino acids, protein structure 5’ CH2 O O 4’ 1’ H H 3’ OH H H 2’ H Numbering of carbons? sugar + phosphate Linking nucleotides 3’ 5’ 3’ Hydrogen bonds Linking nucleotides: 3’ N-H------N 3’ 3’ N-H------O The 3’-OH of one nucleotide is linked to the 5’-phosphate of the next nucleotide What next? 3’ Thymine 3’ 2nm Adenine 3’ 3’ Cytosine DNA RNA 3’ Mutations 5’ Amino acids, protein structure 3’ Guanine Base pairing 3’ 5’ A T 3’ Base pairing (Watson-Crick): C 3’ A/T (2 hydrogen bonds) G G/C (3 hydrogen bonds) 3’ Always pairing a purine and a pyrimidine yields a constant width A 3’ T 3’ DNA base composition: A + G = T + C (Chargaff’s rule) T 3’ A 3’ DNA RNA C 3’ Mutations G 5’ Amino acids, protein structure 3’ DNA conventions 1. DNA is a right-handed helix DNA RNA Mutations Amino acids, protein structure DNA conventions 1. DNA is a right-handed helix 2. The 5’ end is to the left by convention 5’ -ATCGCAATCAGCTAGGTT- 3’ sense (forward) 3’ -TAGCGTTAGTCGATCCAA- 5’ antisense (reverse) 3’ -TAGCGTTAGTCGATCCAA- 5’ 5’ -ATCGCAATCAGCTAGGTT- 3’ DNA Amino acids, protein structure 3 ’ T A G C G T T A G T C G A T C C A A 5 ’ Mutations 5 ’ A T C G C A A T C A G C T A G G T T 3 ’ RNA DNA structure Some more facts: 1. Forces stabilizing DNA structure: Watson-Crick-H-bonding and base stacking (planar aromatic bases overlap geometrically and electronically energy gain) 2. Genomic DNAs are large molecules: Eschericia coli: 4.7 x 106 bp; ~ 1 mm contour length Human: 3.2 x 109 bp; ~ 1 m contour length 3. Some DNA molecules (plasmids) are circular and have no free ends: mtDNA bacterial DNA (only one circular chromosome) 4. Average gene of 1000 bp can code for average protein of about 330 amino acids 5. Percentage of non-coding DNA varies greatly among organisms Organism DNA RNA Mutations Amino acids, protein structure small virus ‘typical’ virus bacterium yeast human amphibians plants # Base pairs # Genes 4 x 103 3 3x 5 x 106 1 x 107 3.2 x 109 < 80 x 109 < 900 x 109 Non-coding DNA 105 3000 6000 30,000? very little 200 10 - 20% > 50% 99% ? ? 23,000 - >50,000 > 99% very little RNA structure RNA 3 major types of RNA messenger RNA (mRNA); template for protein synthesis transfer RNA (tRNA); adaptor molecules that decode the genetic code ribosomal RNA (rRNA); catalyzing the synthesis of proteins ribonucleic acid 4 bases Pyrimidine (C4N2H4) A = Adenine Purine (C5N4H4) U = Uracil C = Cytosine G = Guanine Thymine (DNA) Nucleoside base Uracil (RNA) Nucleotide + sugar (ribose) base + sugar O -- DNA OH RNA 5’ CH2 Mutations Amino acids, protein structure - PO O P4 O O O 4’ 1’ H H 3’ OH H H 2’ OH sugar + phosphate Base interactions in RNA Base pairing: U/A/(T) (2 hydrogen bonds) G/C (3 hydrogen bonds) RNA base composition: A+G/ =U+C Chargaff’s rule does not apply (RNA usually prevails as single strand) RNA structure: - usually single stranded DNA RNA Mutations Amino acids, protein structure - many self-complementary regions RNA commonly exhibits an intricate secondary structure (relatively short, double helical segments alternated with single stranded regions) - complex tertiary interactions fold the RNA in its final three dimensional form - the folded RNA molecule is stabilized by interactions (e.g. hydrogen bonds and base stacking) RNA structure Primary structure A) single stranded regions formed by unpaired nucleotides Secondary structure B) duplex double helical RNA (A-form with 11 bp per turn) C C) hairpin duplex bridged by a loop of unpaired nucleotides D) internal loop D nucleotides not forming Watson-Crick base pairs E DNA RNA Mutations Amino acids, protein structure F G E) bulge loop unpaired nucleotides in one strand, other strand has contiguous base pairing F) junction B A three or more duplexes separated by single stranded regions G) pseudoknot tertiary interaction between bases of hairpin loop and outside bases RNA structure Primary structure Secondary structure Tertiary structure C D E DNA RNA Mutations Amino acids, protein structure F B A G RNA structure How to predict RNA secondary/tertiary structure? Probing RNA structure experimentally: - physical methods (single crystal X-ray diffraction, electron microscopy) - chemical and enzymatic methods - mutational analysis (introduction of specific mutations to test change in some function or protein-RNA interaction) Thermodynamic prediction of RNA structure: - RNA molecules comply to the laws of thermodynamics, therefore it should be possible to deduce RNA structure from its sequence by finding the conformation with the lowest free energy - Pros: only one sequence required; no difficult experiments; does not rely on alignments - Cons: thermodynamic data experimentally determined, but not always accurate; possible interactions of RNA with solvent, ions, and proteins Comparative determination of RNA structure: DNA RNA Mutations Amino acids, protein structure - basic assumption: secondary structure of a functional RNA will be conserved in the evolution of the molecule (at least more conserved than the primary structure); when a set of homologous sequences has a certain structure in common, this structure can be deduced by comparing the structures possible from their sequences - Pros: very powerful in finding secondary structure, relatively easy to use, only sequences required, not affected by interactions of the RNA and other molecules - Cons: large number of sequences to study preferred, structure constrains in fully conserved regions cannot be inferred, extremely variable regions cause problems with alignment Amino acids/proteins The “central dogma” of modern biology: DNA RNA protein Getting from DNA to protein: Two parts: 1. Transcription in which a short portion of chromosomal DNA is used to make a RNA molecule small enough to leave the nucleus. 2. Translation in which the RNA code is used to assemble the protein at the ribosome The genetic code - The code problem: 4 nucleotides in RNA, but 20 amino acids in proteins - Bases are read in groups of 3 (= a codon) - The code consists of 64 codons (43 = 64) - All codons are used in protein synthesis: - 20 amino acids - 3 stop codons - AUG (methionine) is the start codon (also used internally) DNA RNA Mutations Amino acids, protein structure - The code is non-overlapping and punctuation-free - The code is degenerate (but NOT ambiguous): each amino acid is specified by at least one codon - The code is universal (virtually all organisms use the same code) The genetic code Base 2 T C Phenylalanine F T Leucine L A Leucine L Isoleucine I Mutations Amino acids, protein structure Valine V Cysteine C STOP Proline P Threonine T DNA G Tyrosine Y Serine S Methionine M RNA G Alanine A Histidine H Glutamine Q STOP T C A Tryptophan W G Arginine R Asparagine N Serine S Lysine K Arginine R Aspartate B Glutamate Z Glycine G T C A G T C A G T C A G In-class exercise 1. Which amino acids are specified by single codons? methionine and tryptophan Base 3 Base 1 C A 2. How many amino acids are specified by the first two nucleotides only? five: proline, threonine, valine, alanine, glycine 3. What is the RNA code for the start codon? AUG Amino acids G Hydrophobic A V L I DNA RNA Mutations Amino acids, protein structure M F W P Amino acids Hydrophyllic S T C Y N Q K R H DNA RNA Mutations Amino acids, protein structure D E Reading frames Reading frame (also open reading frame): The stretch of triplet sequence of DNA that potentially encodes a protein. The reading frame is designated by the initiation or start codon and is terminated by a stop codon. - a reading frame is not always easily recognizable - each strand of RNA/DNA has three possible starting points (position one, two, or three): Position 1 CAG AUG AGG UCA GGC AUA gln met arg ser gly ile Position 2 C AGA UGA GGU CAG GCA UA arg trp gly gln ala Position 3 CA GAU GAG GUC AGG CAU A asp glu val arg his - mutations within an open reading frame that delete or add nucleotides can disrupt the reading frame (frameshift mutation): DNA RNA Wildtype gln met arg Mutations Amino acids, protein structure CAG AUG AGG UCA GGC AUA GAG Mutant ser gly ile glu CAG AUG AGU CAG GCA UAG AG gln met ser gln ala Up to 30% of mutations causing humane disease are due to premature termination of translation (nonsense mutations or frameshift) Mutations Mutation: any heritable change in DNA Sources of mutation: Spontaneous mutations: mutations occur for unknown reasons Induced mutations: exposure to substance (mutagen) known to cause mutations, e.g. X-rays, UV light, free radicals Mutations may influence one or several base pairs a) Nucleotide substitutions (point mutation) 1) Transitions (Pu Pu; Py Py) 2) Transversions (Pu Py) In-class exercise How many transition and transversion events are possible? 2 transitions: T C; A G 4 transversions: T A; T G C A; C G one to many bases can be involved frequently associated with repeated sequences (“hot spots”) lead to frameshift in protein-coding genes, except when N = 3X also caused by insertion of transposable elements into genes b) Insertion or deletion (“indels”) DNA - RNA Mutations Amino acids, protein structure “Weighting” of mutation events plays important role for phylogenetic analyses (model of sequence evolution) Mutations Mutations may influence phenotype a) Silent (or synonymous) substitution - nucleotide substitution without amino acid change no effect on phenotype mostly third codon position other possible silent substitutions: changes in non-coding DNA b) Replacement substitution - causes amino acid change - neutral: protein still functions normally - missense: protein loses some functions (e.g. sickle cell anemia: mutation in ß-globin) c) Sense/nonsense substitution - sense: involves a change from a termination codon to one that codes for an amino acid - nonsense: creates premature termination codon Mutation rates DNA RNA Mutations Amino acids, protein structure = a measure of the frequency of a given mutation per generation - mutation rates are usually given for specific loci (e.g. sickle cell anemia) the rate of nucleotide substitutions in humans is on the order of 1 per 100,000,000 range varies from 1 in 10,000 to 1 in 10,000,000,000 every human has about 30 new mutations involving nucleotide substitutions mutation rate is about twice as high in male as in female meiosis Mutations A single amino acid substitution in a protein causes sickle-cell disease DNA RNA Mutations Amino acids, protein structure Review of protein structure DNA RNA Mutations Amino acids, protein structure Making a polypeptide chain Review of protein structure Primary structure Proteins are chains of amino acids joined by peptide bonds Polypeptide chain The structure of two amid acids The N-C-C sequence is repeated throughout the protein, forming the backbone DNA RNA Mutations Amino acids, protein structure The bonds on each side of the C atom are free to rotate within spatial constrains, the angles of these bonds determine the conformation of the protein backbone The R side chains also play an important structural role Review of protein structure Secondary structure: Interactions that occur between the C=O and N-H groups on amino acids Much of the protein core comprises helices and sheets, folded into a threedimensional configuration: - regular patterns of H bonds are formed between neighboring amino acids the amino acids have similar angles the formation of these structures neutralizes the polar groups on each amino acid the secondary structures are tightly packed in a hydrophobic environment Each R side group has a limited volume to occupy and a limited number of interactions with other R side groups helix DNA RNA Mutations Amino acids, protein structure sheet Secondary structure Other Secondary structure elements (no standardized classification) - random coil - loop - others (e.g. 310 helix, -hairpin, paperclip) Super-secondary structure DNA RNA Mutations Amino acids, protein structure - In addition to secondary structure elements that apply to all proteins (e.g. helix, sheet) there are some simple structural motifs in some proteins - These super-secondary structures (e.g. transmembrane domains, coiled coils, helix-turn-helix, signal peptides) can give important hints about protein function Secondary structure Structural classification of proteins (SCOP) DNA RNA Mutations Amino acids, protein structure Class 1: mainly alpha Class 2: mainly beta Class 3: alpha/beta Class 4: few secondary structures Secondary structure Alternative SCOP DNA RNA Mutations Amino acids, protein structure Class : only helices Class : antiparallel sheets Class / : mainly sheets with intervening helices Class + : mainly segregated helices with antiparallel sheets Membrane structure: hydrophobic helices with membrane bilayers Multidomain: contain more than one class Review of protein structure Q: If we have all the Psi and Phi angles in a protein, do we then have enough information to describe the 3-D structure? A: No, because the detailed packing of the amino acid side chains is not revealed from this information. However, the Psi and Phi angles do determine the entire secondary structure of a protein DNA RNA Mutations Amino acids, protein structure Tertiary structure Tertiary structure The tertiary structure describes the organization in three dimensions of all the atoms in the polypeptide The tertiary structure is determined by a combination of different types of bonding (covalent bonds, ionic bonds, h-bonding, hydrophobic interactions, Van der Waal’s forces) between the side chains Many of these bonds are very week and easy to break, but hundreds or thousands working together give the protein structure great stability DNA RNA Mutations Amino acids, protein structure If a protein consists of only one polypeptide chain, this level then describes the complete structure Tertiary structure Proteins can be divided into two general classes based on their tertiary structure: - Fibrous proteins have elongated structure with the polypeptide chains arranged in long strands. This class of proteins serves as major structural component of cells Examples: silk, keratin, collagen - Globular proteins have more compact, often irregular structures. This class of proteins includes most enzymes and most proteins involved in gene expression and regulation DNA RNA Mutations Amino acids, protein structure Quaternary structure The quaternary structure defines the conformation assumed by a multimeric protein. The individual polypeptide chains that make up a multimeric protein are often referred to as protein subunits. Subunits are joined by ionic, H and hydrophobic interactions Example: Haemoglobin (4 subunits) DNA RNA Mutations Amino acids, protein structure Structure displays Common displays are (among others) cartoon, spacefill, and backbone cartoon DNA RNA Mutations Amino acids, protein structure spacefill backbone Summary protein structure Primary structure: Sequence of amino acids Secondary structure: Interactions that occur between the C=O and N-H groups on amino acids Tertiary structure: Organization in three dimensions of all the atoms in the polypeptide Quaternary structure: DNA Conformation assumed by a multimeric protein RNA Mutations Amino acids, protein structure The four levels of protein structure are hierarchical: each level of the build process is dependent upon the one below it Next week First quiz Lecture 1 - Bioinformatics definitions - The human genome project Lecture 2 - DNA structure - RNA structure - Mutations - Amino acids - Proteins DNA RNA Mutations Amino acids, protein structure