Lecture 42 Deep Sequencing of Anzick-1
KC: Determination of an individual's complete genomic DNA sequence is feasible, rapid, and relatively inexpensive.

Phases of a genomic sequencing project
o KC: Regardless of the purpose of the work, all genomic sequencing projects can be broken down into three phases:
  - Template preparation
  - Sequencing and imaging
  - Alignment/assembly

The genome of a Late Pleistocene human from a Clovis burial site in western Montana
o Sarah Anzick (important author)

Template preparation workflow: skeleton → extract DNA → fragment DNA → attach linker DNA (DNA ligase) → size selection
Fragment DNA
o Dry skeleton DNA will be in the A conformation.
o Rehydration during purification returns the DNA to the B conformation.
o Random DNA fragmentation is performed using a nebulizer or a sonicator.
  - DNA fragments are referred to as "insert" DNA.
Attach Linker DNA
o DNA ligase is used to attach synthetic double-stranded DNA linkers to both ends of the insert DNA fragments.
Size Selection
o Size selection of DNA works because:
  - DNA is a linear molecule.
  - DNA is negatively charged.
  - During electrophoresis, DNA migrates through the agarose matrix according to its length.
o Result: a library of insert DNA molecules ~300 bp long
  - Each molecule ("book") in the library contains ~200 bp of genomic insert DNA; the rest is linker.
o KC: The library was constructed from DNA prepared from thousands of cells, so each base in the genome will be present in multiple inserts.

Polymerase Chain Reaction: PCR
o PCR is used to amplify library DNA.
o Strand separation of DNA is termed denaturing the DNA.
  - Done by heating the sample to 95 °C.
o Single-stranded DNA "primers" ANNEAL to the template DNA strands by complementary base pairing to the linker sequences.
  - Anneal: re-form double-stranded DNA by base pairing after the strands have been separated by heat.
o One cycle: original DNA → denature the DNA → anneal primers → DNA polymerase + deoxyribonucleoside triphosphates (dNTPs) → 2 copies!
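Because every cycle copies every template, the number of copies compounds from cycle to cycle. A minimal Python sketch, assuming ideal conditions in which each template is copied once per cycle; the optional `efficiency` parameter is a hypothetical knob (not from the lecture) for reactions that copy fewer than 100% of templates per cycle:

```python
def pcr_copies(start_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Copies of a template after a number of PCR cycles.

    efficiency = 1.0 models perfect doubling every cycle (the idealized case);
    lower values are a hypothetical illustration of incomplete copying.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    # Each cycle (denature -> anneal primers -> extend) multiplies copies by (1 + efficiency)
    return start_copies * (1.0 + efficiency) ** cycles


for cycles in range(4):
    print(cycles, "cycles:", pcr_copies(1, cycles), "copies")
```

With perfect doubling, 2 cycles give 4 copies (2^2 = 4) and 30 cycles give 2^30, roughly a billion copies of each library molecule.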
EX: 1 original → (denature / anneal primers / DNA synthesis) → 2 copies → (repeat) → 4 copies
o 2 PCR cycles amplify the DNA four-fold (2^2 = 4).

Phase 2: Sequencing and Imaging
o "Massive Parallel Sequencing"
o How much DNA is in a human diploid genome?
  - 6.14 billion base pairs (bp), i.e., ~12 billion bases (Gbases) counting both strands
Massive Parallel Sequencing
o A: P7 primers are attached to the solid support.
o B: Denatured library DNAs are annealed to the primers on the solid support.
o C: Bridge amplification by a specialized form of PCR, followed by a final denaturing step
Sequencing Reaction
o 1. Sequencing primers are added and annealed.
o 2. DNA polymerase and fluorescence-tagged, 3'-blocked dNTPs (fluorescence-labeled nucleotides) are added.
o 3. Wash / laser light / photograph (imaging).
o 4. Remove the fluorophore and blocking group, regenerating a 3'-OH for the next sequencing reaction.
Repetitive Steps
o Data collection is good for 75-100 cycles.
o Each cycle:
  - Add fluorescent dNTPs and DNA polymerase.
  - Wash, laser, and photograph.
  - Remove the fluorophore and 3' blocking group.
From Data Set to Sequence
o Raw data from 1 cycle (example: slide 22)

Phase 3: Alignment and Assembly
o Assembly of a sequence ensemble: DNA sequence reads from different inserts → sequence alignment → determined DNA sequence
o "Coverage" is the number of times each base appears in the ensemble of insert DNA sequences.
Ensemble from Massive Parallel Sequencing
o "Deep sequencing" requires >7-fold coverage of the sequence.
  - The Anzick-1 sequence had an average of 14.4-fold coverage.
Individual Variation
o Alignment of an individual's EDN1 gene with the reference sequence from the Human Genome Diversity Project
o Single nucleotide polymorphisms (SNPs) are normal variations encountered within a population.
  - Some affect the human phenotype, but many are silent.
  - SNPs set individuals apart from one another within a species.
Simplified Genetic Models
o Human Genome Diversity Project
Genetic Affinity of Anzick-1
o Highest with Latin American/South American populations

Lecture 43 Replication "DNA Synthesis" (part I)
KC: Prior to division, a cell must duplicate its entire genomic DNA in a process called replication.
o Each daughter cell inherits a complete complement of the genetic information.
"Central Dogma" of Molecular Biology
o DNA → DNA (replication); DNA → RNA (transcription); RNA → protein (translation)
In eukaryotes, DNA is packaged into the nucleus.
o Humans have "diploid" genomes, with 2 complete sets of genes in every somatic cell.
Reaction Mechanism of DNA Polymerases
o KC: DNA polymerase catalyzes the extension of a DNA strand ONLY in the 5'→3' direction.
o Requirements for DNA synthesis:
  - A DNA template strand with a primer (RNA or DNA) annealed to it
  - The primer must have a free 3'-OH.
  - Reaction substrates are deoxyribonucleoside triphosphates (dNTPs).
o For DNA synthesis: (dNMP)n + dNTP → (dNMP)n+1 + PPi
Structure of DNA Polymerase
o The catalytic site for DNA synthesis is located in the "palm" of the enzyme.
What Happens to the Parental DNA?
o 3 possible outcomes of DNA synthesis:
  - Conservative replication yields the original intact DNA molecule (parent) and one entirely newly synthesized DNA molecule.
  - Semiconservative replication yields 2 DNA molecules, each with 1 parental strand and 1 newly synthesized strand.
  - Non-conservative replication yields 2 newly synthesized DNA molecules, with destruction of the parental strands.
Meselson and Stahl Experiment
o KC: DNA synthesis is semiconservative; product DNA contains 1 parental strand and 1 newly synthesized daughter strand.
  - E. coli grown in ¹⁵N medium → DNA prepared → density gradient centrifugation
  - E. coli grown in ¹⁵N medium, shifted to ¹⁴N medium → DNA prepared → density gradient centrifugation
Replication Fork
o Leading strand DNA synthesis occurs continuously in the 5'→3' direction.
o Lagging strand synthesis occurs discontinuously, in fragments, as a series of 5'→3' reactions.
  - Products = Okazaki fragments
Bidirectional DNA Synthesis
o DNA synthesis begins at an origin of replication.
o Replication forks proceed in both directions, hence "bidirectional synthesis."
Replication of a Circular DNA Molecule
o KC: Bidirectional, semiconservative replication yields 2 DNA molecules with identical nucleotide sequences.
o DNA polymerase cannot form a bond between the loose ends where the replication forks meet.
  - The "nick" is repaired by DNA ligase.
Bacterial Genomes and Plasmids are Circular DNAs
o Overwinding of the DNA helix results in positive supercoils.
o Underwinding of the helix yields negative supercoils.
Effect of Replication on the Double Helix
o Helicase activity at the replication fork separates the 2 strands.
  - Torsional stress from unwinding DNA at the replication fork overwinds the DNA ahead of the fork, causing positive supercoiling.
  - Both type I and type II topoisomerases are capable of relieving negative and positive supercoils in DNA.
  - "Gyrase" is a topoisomerase that introduces negative supercoils ahead of a replication fork.
Topoisomerases
o Mechanism of a type I topoisomerase: make a single-stranded break, pass the intact strand through the break, and seal the nick.
o Mechanism of a type II topoisomerase: make a double-stranded break, pass an intact segment of DNA through the break, and then rejoin the broken strands.
Prokaryotic DNA Polymerases
o DNA Pol III carries out genomic replication and DNA repair.
  - NOTE: high polymerization rate, high processivity, and proofreading
o DNA Pol I has roles in replication, recombination, and DNA repair.
  - NOTE: 3'→5' proofreading and 5'→3' exonuclease activities
o DNA Pol II, IV, and V function in DNA repair.
Architecture of Prokaryotic DNA Pol III
o DnaB helicase is not considered part of DNA Pol III, but is required for strand separation.
o Clamp loader:
  - Scaffold for the DNA Pol III complex
  - Assembles the β clamp onto DNA
o The core polymerase catalyzes DNA synthesis.
o The β sliding clamp maintains contact with the DNA.
β Sliding Clamp on DNA
o The β sliding clamp encircles DNA, providing the means for high processivity/rapid DNA polymerization.
  - A prokaryotic β clamp has 2 identical subunits.
  - The β clamp opens and closes like a lock washer.
Replication Initiation Occurs at oriC
o KC: Replication occurs only once per cell cycle; initiation is the regulated step in control of DNA synthesis.
o The origin of replication, "oriC," is a unique 245 bp sequence:
  - R1-R5 and I1-I3 are binding sites for the DnaA protein.
  - The IHF and FIS sites bind proteins (IHF, FIS) that act as replication initiation factors.
  - HU is a required DNA-bending protein.
  - The DNA unwinding element (DUE) is an AT-rich segment where strand separation occurs.
Replication Initiation
o Binding of DnaA proteins to the R and I sites stresses the DNA helix, causing localized strand separation at the DUE.
o DnaC loads a DnaB helicase at each end of the bubble.
  - Binding of DnaB helicase commits the cell to replication and division.
o After release of DnaC, DNA Pol IIIs are assembled onto both DnaB helicases.
o DnaA is then released from the DNA.
Primase
o DNA polymerase III requires a primed DNA template.
o Primase is a DNA template-dependent, primer-independent RNA polymerase.
o Primase synthesizes a short (<9 nt) RNA primer.
Replication Elongation
o The leading strand is synthesized by one core polymerase as a continuous polynucleotide chain.
o The lagging strand DNA is synthesized as a series of Okazaki fragments.
o Primase functions at the replication fork just as the core polymerase nears completion of an Okazaki fragment.
  - Primase begins synthesis of a new RNA primer at either CTG or CAG.
o Single-strand binding protein ("SSB") protects the exposed single-stranded DNA gaps.
o A new clamp is loaded onto the lagging strand at each new RNA primer.
o DNA Pol III-dependent synthesis of an Okazaki fragment is complete when the enzyme reaches the previous primer.
o The lagging strand core polymerase pauses and then releases its clamp.
o The lagging strand core polymerase is then transferred to the newly loaded β clamp; the old clamp is abandoned.
o The lagging strand core polymerase initiates synthesis of the next Okazaki fragment.
o The clamp loader acquires a new clamp and opens it in preparation for loading onto the next primer.
Completing the Lagging Strand
o DNA Pol III pauses at an Okazaki primer, then abandons its sliding clamp and transfers to a newly loaded clamp.
o DNA Pol I enters and uses its 5'→3' exonuclease to remove the primer.
o As the RNA is being digested, DNA Pol I synthesizes DNA to fill the gap.
o DNA ligase repairs the nick, linking the Okazaki fragments into a single DNA strand.
Decatenation of the Chromosomes
o Following completion of replication, the chromosomes are in a "catenated" (interlinked) state.
o A type II topoisomerase separates the two circular DNAs.
  - Identical chromosomes are sorted into each of the daughter cells.

L-44: Replication (part II)
Genetic Fidelity During Replication
Key concept: An error in replication can introduce a mutation into the genome. The mutation will be permanent and inherited by subsequent daughter cells.
3 mechanisms avoid mistakes during replication:
o Presynthetic error control demands correct base pairing.
o Proofreading removes mismatched bases.
o Mismatch repair examines newly synthesized DNA and removes mismatched bases after passage of a replication fork.
Presynthetic Error Control
o The geometry of Watson-Crick base pairs allows them to fit into the catalytic site of DNA polymerase.
  - Mispaired bases are excluded.
Base Pairs Containing Tautomers
o Tautomeric bases result from chemical rearrangements of the bonds within a base.
  - Tautomeric shifts occur spontaneously.
  - Tautomeric bases can participate in forming alternative base pairs.
  - Isomerization of the tautomeric base back to the normal form leaves a mismatched base pair (e.g., T=G with T in the enol form; slide 4).
  - The geometry of some of these base pairs is compatible with the catalytic sites of DNA polymerases.
Proofreading
o High-fidelity DNA polymerases have two active sites:
  - 1) The catalytic site for DNA synthesis
  - 2) A 3'→5' exonuclease site for removing misincorporated nucleotides
o DNA Pol I and III are high-fidelity enzymes.
o Other errors in replication still occur.
Eukaryotic DNA Polymerases
o Key concept: Although the details vary, the overall process of replication in eukaryotes is very similar to replication in prokaryotes.
o Eukaryotic cells have many DNA polymerases (~15) with specialized functions.
  - The "replisome" carries out genome replication.
  - Other DNA polymerases are involved in a variety of functions, such as DNA repair.
o Similarities to prokaryotes:
  - DNA synthesis is semiconservative and bidirectional.
  - The mechanism is both template- and primer-dependent.
  - DNA synthesis is always in the 5'→3' direction: continuous leading strand synthesis and discontinuous lagging strand synthesis.
  - The replisome has both DNA Pol ε and DNA Pol δ.
  - Primase is in a complex with DNA Pol α.
  - Many of the proteins involved in eukaryotic replication are similar to those found in bacteria, but the nomenclature is different.
o Mechanistic differences:
  - Replication initiation is very different.
  - The rate of DNA synthesis is slower (max. ~50 nt/sec).
  - Okazaki fragments are smaller (100-200 nt); in humans, ~50 million are synthesized during one round of replication.
  - Completion and joining of Okazaki fragments is different.
  - Replication termination is different in the details.
  - Eukaryotic chromosomes have multiple origins of replication; coordinate regulation of origins requires "licensing."
  - Replication elongation is similar but not identical.
  - Most DNA polymerases have both presynthetic error control and proofreading activity.
Differences Unique to Eukaryotes
o Eukaryotic chromosomes are linear and can be much longer than bacterial DNA molecules.
o Study of the architecture of the genomic DNA polymerase "replisome" is ongoing.
o With many forks per chromosome, replication must address the convergence of replication forks.
o Replication of telomeres is unique to eukaryotes.
Eukaryotic Replisome: a Work in Progress
o Familiar proteins with new names:
  - DNA Pol ε is the core polymerase for the leading strand (epsilon = leading).
  - DNA Pol δ is the core polymerase for the lagging strand (delta = lagging).
  - MCM = helicase
  - DNA Pol α-primase is a complex that contains primase for RNA primer synthesis and a separate DNA synthesis activity.
  - RFC = clamp loader
  - PCNA = clamp
  - RPA = single-strand DNA binding protein (SSB)
  - DNA ligase
  - FEN1 = flap endonuclease
Eukaryotic Clamp: "PCNA"
o Proliferating Cell Nuclear Antigen (PCNA) has three identical subunits.
Multiple Origins in Eukaryotes
o Humans have 46 chromosomes, so there must be at least 46 coordinately regulated origins of replication.
Eukaryotic Origins of Replication
o Key concept: Origins of replication on all chromosomes must be activated once, and only once, during S phase of the cell cycle.
o Each of our 46 chromosomes contains thousands of origins of replication.
o On average, origins appear to be ~25,000 bp apart.
o Locations of origins may vary from one cell type to another.
o Generally, an origin of replication is an AT-rich element.
o Generally, origins are associated with genes.
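The origin numbers above can be sanity-checked with back-of-the-envelope arithmetic. This Python sketch reuses figures quoted in these notes (a diploid genome of 6.14 billion bp, origins ~25,000 bp apart, forks moving at up to ~50 nt/sec) and assumes evenly spaced origins that all fire at the same moment, which real cells do not do; the function and constant names are illustrative:

```python
GENOME_BP_DIPLOID = 6.14e9  # diploid human genome size quoted in Lecture 42
ORIGIN_SPACING_BP = 25_000  # average spacing between origins quoted above
FORK_RATE_NT_S = 50         # maximum eukaryotic fork rate quoted above
N_CHROMOSOMES = 46


def origin_count(genome_bp: float, spacing_bp: float) -> float:
    """Approximate number of origins if they are evenly spaced."""
    return genome_bp / spacing_bp


def interval_time_s(spacing_bp: float, fork_rate: float) -> float:
    """Seconds to copy the DNA served by one origin.

    Each origin fires two forks (bidirectional synthesis); forks from adjacent
    origins meet halfway, so each fork copies only half of the spacing.
    """
    return (spacing_bp / 2) / fork_rate


origins = origin_count(GENOME_BP_DIPLOID, ORIGIN_SPACING_BP)
print(f"origins: ~{origins:,.0f} (~{origins / N_CHROMOSOMES:,.0f} per chromosome)")
print(f"time with many origins: ~{interval_time_s(ORIGIN_SPACING_BP, FORK_RATE_NT_S):.0f} s")

# Contrast: a single origin in the middle of an average-sized chromosome
avg_chromosome_bp = GENOME_BP_DIPLOID / N_CHROMOSOMES
one_origin_days = interval_time_s(avg_chromosome_bp, FORK_RATE_NT_S) / 86_400
print(f"time with one origin per chromosome: ~{one_origin_days:.0f} days")
```

The estimate gives roughly 245,600 origins (thousands per chromosome, consistent with the notes) and a few minutes of copying per origin, whereas a single origin per chromosome would need about two weeks. Real S phase takes longer than the idealized figure because origins do not all fire at once.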
Licensing Coordinates Replication Initiation
o Origin recognition complexes (ORC) bind tightly to DNA in early G1 phase.
o During mid-G1, cell division cycle 6 (CDC6) joins the ORC, followed by Cdt1.
o Mini Chromosome Maintenance (MCM) joins the ORC in late G1.
o In S phase, replication is initiated by phosphorylation of complex proteins by a cyclin-dependent kinase (CDK) and another protein kinase (DDK, Dbf4-dependent kinase).
o The replisome is assembled and bidirectional DNA synthesis is initiated.
Lagging Strand Synthesis
o Replication elongation is essentially the same as in prokaryotes.
o An important difference is that the replisome uses 2 different core enzymes:
  - DNA Pol ε synthesizes the leading strand.
  - DNA Pol δ makes the lagging strand.
o The big difference is in completion of the lagging strand!
  - DNA Pol δ is a "strand-displacing" DNA polymerase.
  - FEN1 (flap endonuclease-1) clips off the displaced overhang (flap).
  - DNA ligase repairs the nick left in the sugar-phosphate backbone, completing the joining of two Okazaki fragments.
Replication Termination
o Two replication forks approach each other from opposite directions along a eukaryotic chromosome.
o The DNA between them becomes heavily supercoiled.
o Supercoiling stalls both replication forks.
o Type II topoisomerases ease the supercoiling.
o DNA Pol α-primase functions as a DNA polymerase to synthesize DNA across the gap.
o DNA ligase repairs nicks in both strands, completing the DNA.
o Type II topoisomerase resolves the final structure, thus separating the chromosomes.
Telomere: the End of a Chromosome
o Telomeres are specialized "T-loop" DNA structures found at both ends of a eukaryotic chromosome.
o Telomeres protect the chromosome from cellular exonucleases and DNA repair enzymes.
o Telomeres consist of ~10,000 bp of a short repeat sequence and have several specific proteins bound to them.
  - The repeat in human telomeres is TTAGGG.
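Because the human telomere repeat is so short, a ~10,000 bp telomere holds well over a thousand copies of it. A minimal Python sketch, assuming a perfect, uninterrupted repeat (real telomeres only approximate this), that counts the repeats and derives the complementary C-rich strand (the "CA strand" in these notes) as a reverse complement:

```python
TG_REPEAT = "TTAGGG"  # human telomere repeat on the G-rich (TG) strand, written 5'->3'
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the partner strand, also written 5'->3'."""
    return seq.translate(DNA_COMPLEMENT)[::-1]


def repeat_copies(telomere_bp: int, repeat: str = TG_REPEAT) -> int:
    """Whole copies of the repeat that fit in a telomere of the given length."""
    return telomere_bp // len(repeat)


tg_strand = TG_REPEAT * 3
ca_strand = reverse_complement(tg_strand)
print(repeat_copies(10_000))  # 1666
print(tg_strand)              # TTAGGGTTAGGGTTAGGG
print(ca_strand)              # CCCTAACCCTAACCCTAA
```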
Telomerase is a Special Reverse Transcriptase
o Telomerase has an internal template RNA with 1½ copies of the CA repeat.
  - The template RNA anneals to the existing TG sequence at a telomere.
  - Telomerase catalyzes 5'→3' DNA synthesis of the TG strand.
  - Following each round of TG synthesis, telomerase shifts so that the internal template RNA anneals to the newly synthesized DNA.
  - The process is repeated many times.
o The complementary CA strand of a telomere is synthesized by DNA Pol α-primase acting as a DNA polymerase.
o Following removal of the primer, the overhanging 3' end base pairs to the CA strand, forming a T-loop.

Lecture 45: Prokaryotic Transcription and Gene Control
Transcription
o DNA template-dependent synthesis of RNA
o Catalyzed by RNA polymerase
o Begins at a promoter
o Ends at a terminator
Cis- and Trans-acting Factors
Key concept: Trans-acting factors are diffusible, so they can function at multiple sites in a genome.
o Usually, trans-acting factors are DNA binding proteins (e.g., transcription factors), but some non-coding RNAs are also trans-acting factors.
o Cis-acting elements are closely tied to the gene. Typically, a cis-acting element is a DNA sequence.
Ribonucleic Acid (RNA)
RNA is a linear single-stranded polynucleotide chain with:
o a sequence always read 5'→3'
o a ribose-phosphate backbone
o uracil (U) in place of thymine (T)
o intramolecular base pairing that yields complex secondary and tertiary structures.
Note: Double-stranded RNA and DNA-RNA hybrids have structures similar to A-DNA.
Glossary of RNAs in a Prokaryotic Cell
Messenger RNA (mRNA) houses a sequence of bases that encodes the primary amino acid (AA) sequence of a protein.
o mRNA serves as the template for translation by a ribosome.
Transfer RNA (tRNA) carries an amino acid into the catalytic site of a ribosome.
o tRNA base pairs to mRNA to ensure selection of the correct AA for incorporation into a nascent polypeptide chain.
Ribosomal RNAs (rRNA) are structural components of a ribosome, the enzyme that catalyzes translation.
Terms, Jargon, and Gene Coordinates
o Transcription starts at a promoter and ends at a terminator.
o The finished RNA molecule = primary transcript.
o The transcription start site of a gene is always +1.
o Key concept: In bacteria, the primary transcript is used as an mRNA without further modification.
RNA Transcript Resembles the Coding Strand
o The DNA template strand is the "reverse complement" of both the DNA coding strand and the RNA primary transcript.
o The primary transcript sequence is similar to the DNA coding strand: same sequence, except uracil replaces thymine.
Prokaryotic Promoters
Key concept: A promoter is a cis-acting element in the genome where RNA polymerase binds to initiate transcription.
Bacterial promoter sequences are conventionally written as the coding-strand sequence:
o the -35 region
o the -10 region
o the spacer between the -35 and -10 regions (optimally 17 bp)
o most transcripts (RNAs) start with a purine (+1), often within the sequence CAT.
The bacterial consensus promoter: TTGACA-N-TATAAT-N-CAT (N = spacer nucleotides)
o Note: Promoters have an orientation. As diagrammed in the lecture, the red strand is the template strand and the blue strand is the coding strand.
Prokaryotic Genes
Key concept: A gene is a unit of heredity with some function in the organism.
o Most genes encode the information needed to make a protein.
o A gene includes the DNA encoding the protein and the regulatory elements needed for its transcription.
o Open reading frame (ORF) = the sequence of bases that encodes the primary sequence of a protein.
Organization of Bacterial Genomes
Operons are coordinately regulated gene clusters.
o ORFs encoding proteins are arranged 5'→3' in a single transcription unit.
o Use of one promoter and one terminator yields a polycistronic mRNA with multiple ORFs, each encoding a different protein.
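The strand relationships described under "RNA Transcript Resembles the Coding Strand" reduce to two string operations. A Python sketch, assuming every sequence is written 5'→3' and contains only A, C, G, T; the example coding sequence is made up for illustration:

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base and reverse, so the result also reads 5'->3'."""
    return seq.translate(DNA_COMPLEMENT)[::-1]


def template_from_coding(coding: str) -> str:
    """The template strand is the reverse complement of the coding strand."""
    return reverse_complement(coding)


def transcribe(template: str) -> str:
    """RNA made 5'->3' on the template: complementary to the template,
    so identical to the coding strand except that U replaces T."""
    return reverse_complement(template).replace("T", "U")


coding = "ATGGCATTC"                   # hypothetical coding-strand sequence
template = template_from_coding(coding)
print(template)                        # GAATGCCAT
print(transcribe(template))            # AUGGCAUUC
```

Reading the output against the coding strand shows the rule directly: the transcript matches the coding strand letter for letter, with U in place of T.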
Constitutive Promoter Strength Depends on Sequence
Key concept: Similarity to the consensus sequence is the major determinant of the rate of transcription initiation from a constitutive promoter.
o "Strong promoters" = very high sequence identity with the promoter consensus sequence
o "Weak promoters" = several base differences
o A mutation that moves a promoter away from the consensus sequence decreases the rate of transcription initiation.
RNA Polymerase
E. coli has only one RNA polymerase (465 kD), and it catalyzes transcription only in the 5'→3' direction.
o The catalytic mechanism is similar to that of a DNA polymerase.
RNA polymerase holoenzyme is responsible for transcription initiation and synthesis of the first ~10 nucleotides of an RNA chain.
Holoenzyme consists of the α2ββ'ωσ subunits:
o Subunit σ recognizes a promoter.
o Subunits α2 (40 kD) are essential for enzyme assembly and are involved in interaction with activators.
o Subunits β/β' form the catalytic core.
o Subunit ω provides structural stability.
RNA polymerase core enzyme (α2ββ'ω) carries out transcription elongation.
E. coli has Seven Sigma Subunits
o The σ70 holoenzyme binds to most of the promoters in the E. coli genome.
o Other σ factors function on promoters for genes with highly specialized functions.
  - Promoters recognized by the alternative σ factors have different consensus sequences.
Transcription Initiation
RNA polymerase is DNA template-dependent but primer-independent. The substrates are ribonucleoside triphosphates (NTPs).
o Holoenzyme binds to a promoter, forming the "closed complex."
o Unwinding of 12-15 bp of DNA forms a transcription bubble, converting the closed complex to an "open complex."
o Holoenzyme initiates RNA synthesis and synthesizes about 10 nt.
  - The rate of synthesis for those first 10 nt is ~1 nt/sec.
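The key concept that constitutive promoter strength tracks similarity to the consensus can be illustrated by counting mismatches to TTGACA (-35) and TATAAT (-10). This Python sketch is a deliberate simplification: it assumes the two hexamers have already been located, ignores spacer length and every other factor that affects real promoter strength, and uses hypothetical promoter sequences:

```python
CONSENSUS_MINUS35 = "TTGACA"
CONSENSUS_MINUS10 = "TATAAT"


def mismatches(site: str, consensus: str) -> int:
    """Number of positions where a hexamer differs from the consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must be the same length")
    return sum(1 for a, b in zip(site, consensus) if a != b)


def promoter_mismatches(minus35: str, minus10: str) -> int:
    """Total differences from consensus; fewer should mean a stronger promoter."""
    return mismatches(minus35, CONSENSUS_MINUS35) + mismatches(minus10, CONSENSUS_MINUS10)


# Hypothetical promoters, listed from most to least consensus-like
for name, m35, m10 in [("strong", "TTGACA", "TATAAT"),
                       ("weaker", "TTGACT", "TATAAT"),
                       ("weak", "TTTACA", "TATGTT")]:
    print(name, promoter_mismatches(m35, m10))
```

A mutation that adds a mismatch raises the count, mirroring the note that moving away from the consensus decreases the rate of transcription initiation.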
Transcription Elongation
o Dissociation of the σ factor yields the core enzyme, allowing RNA polymerase to complete "promoter clearance."
o NusA protein replaces σ.
o The transcription elongation rate by core enzyme accelerates to ~50-90 bases/sec.
o The rate of transcription elongation can be slowed by formation of RNA secondary structure in the transcript.
o Transcription termination results in release of the RNA and dissociation of the core enzyme from the DNA.
Mechanism of RNA Polymerase
o RNA polymerases use essentially the same mechanism as DNA polymerases.
o RNA polymerases do not have a proofreading (3'→5') exonuclease activity, so the error rate is about 10^-4 to 10^-5.
Core Enzyme in Transcription Elongation
o The transcription bubble is 12-15 bases.
o The DNA-RNA hybrid extends ~8 bp.
o Topoisomerases relieve supercoiling.
o The "nascent RNA" leaves near the RNA exit.
Rho-Dependent Termination
Key concept: Rho-dependent termination requires the Rho (ρ) protein.
o ρ is a hexameric protein that binds a nascent RNA chain at a rut site.
o ρ is an RNA helicase that translocates along the RNA 5'→3'.
o RNA polymerase pauses in response to formation of a secondary structure.
o Transcription terminates when ρ makes contact with RNA polymerase.
Rho-Independent Termination
Key concept: The termination signal resides in the nascent RNA chain.
o Termination occurs due to formation of a stable hairpin structure followed by a series of 7 U's: a "rho-independent terminator."
  - Example: …AAGGGCCCAUUAGGGCCCUUUUUUU
o Hairpin formation leaves only a few weak U=A base pairs between the transcript RNA and the DNA template.
  - This DNA/RNA hybrid is unstable, so the RNA dissociates, terminating transcription.
  - No protein factor is needed!
DNA Double Helix
Key concept: Gene expression depends on sequence-specific binding of proteins (transcription factors) to DNA.
Exposed Chemical Groups in Base Pairs
Key concept: Each base pair presents a unique set of chemical groups in the major groove.
o In the minor groove, only C≡G versus A=T pairs can be distinguished.
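The rho-independent terminator rule from this lecture (a stable hairpin immediately followed by a run of U's) can be turned into a toy scanner. This Python sketch assumes Watson-Crick pairing only (no G-U wobble), a fixed 6 bp stem, a loop of 3-8 nt, and at least 6 U's; those thresholds are illustrative choices, not values from the lecture. It is run on the example terminator sequence given in the notes:

```python
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}  # Watson-Crick only


def find_intrinsic_terminator(rna: str, stem: int = 6, min_loop: int = 3,
                              max_loop: int = 8, min_u: int = 6):
    """Look for a hairpin whose 3' arm ends exactly where a U-run begins.

    Returns (stem_start, loop_sequence, u_run_length), or None if not found.
    """
    u_start = rna.find("U" * min_u)  # first run of at least min_u U's
    if u_start == -1 or u_start < 2 * stem + min_loop:
        return None
    u_end = u_start
    while u_end < len(rna) and rna[u_end] == "U":
        u_end += 1
    right_arm = rna[u_start - stem:u_start]  # 3' arm of the stem
    for loop in range(min_loop, max_loop + 1):
        left_start = u_start - 2 * stem - loop
        if left_start < 0:
            break
        left_arm = rna[left_start:left_start + stem]
        # The 5' arm pairs with the 3' arm read backwards (antiparallel strands)
        if all((a, b) in PAIRS for a, b in zip(left_arm, reversed(right_arm))):
            loop_seq = rna[left_start + stem:u_start - stem]
            return left_start, loop_seq, u_end - u_start
    return None


print(find_intrinsic_terminator("AAGGGCCCAUUAGGGCCCUUUUUUU"))  # (2, 'AUUA', 7)
```

For the example, the scanner pairs GGGCCC with GGGCCC across an AUUA loop and reports the run of 7 U's that follows the hairpin.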
Sequence-Specific DNA Binding Proteins
Key concept: The molecular basis of sequence-specific binding of a protein to DNA is the set of chemical bonds between amino acids in a recognition helix and the base pairs.
o An α-helix of the DNA binding protein is positioned within the major groove.
o The "recognition helix" participates in H bonds and van der Waals interactions with base pairs.
o In prokaryotes (e.g., bacteria), the predominant DNA binding domain motif for positioning a recognition helix is the helix-turn-helix.
o Formation of a stable DNA-protein complex depends on additional non-covalent bonds outside the recognition helix-base pair contacts.
Examples of DNA-Protein Interactions
o T=A with glutamine/asparagine (mnemonic: "TAGA")
o C≡G with arginine (mnemonic: "CGA")
Helix-Turn-Helix Motif
o The helix-turn-helix motif is defined by 2 α helices; the 2nd helix = recognition helix.
o It is the DNA binding domain of bacterial transcription factors.
o The helix-turn-helix motif is also found in some eukaryotic DNA binding proteins.
o Note that binding of a protein to DNA results in an "induced bend" in the axis of the DNA.
Dimeric DNA Binding Proteins
o Homodimeric DNA binding proteins are common.
  - A recognition helix exists on both subunits.
  - Thus, the binding sequence is a palindrome: the same sequence of bases is seen 5'→3' along both strands.
o Why dimers and oligomers? Improved specificity and stability.
Inducible Promoters
Key concept: The primary reason for gene regulation in bacteria is to respond to changes in the environment, such as nutrient availability.
o Regulation of the rate of transcription initiation is the most important step determining whether a gene is expressed.
o Regulated genes/operons are equipped with an inducible promoter (can be turned on/off)!
o Activators are transcription factors that increase the rate of transcription from a promoter.
o Repressors are transcription factors that decrease the rate of transcription from a promoter.
o In bacteria, both activators and repressors have helix-turn-helix DNA binding domains.
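The palindrome rule for homodimer binding sites (the same sequence of bases read 5'→3' along both strands) means a site equals its own reverse complement. A small Python check using made-up sequences; the `asymmetric_pairs` count is an illustrative way to describe near-palindromic sites, not a measure from the lecture:

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(DNA_COMPLEMENT)[::-1]


def is_palindromic_site(seq: str) -> bool:
    """True if both strands read the same 5'->3' (a two-fold symmetric site)."""
    return seq == reverse_complement(seq)


def asymmetric_pairs(seq: str) -> int:
    """Number of symmetry-related position pairs that break the palindrome."""
    rc = reverse_complement(seq)
    return sum(1 for a, b in zip(seq, rc) if a != b) // 2


for site in ["GAATTC", "GAATTG", "TGTGAGCGCTCACA"]:  # hypothetical sites
    print(site, is_palindromic_site(site), asymmetric_pairs(site))
```

Each half of a palindromic site presents the same bases to one recognition helix of the dimer, which is why the two-fold symmetry of the protein and the DNA match.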
Positive Regulation
o Activators recruit RNA polymerase to a promoter.
o Two possible arrangements:
  - The activator requires a ligand ("inducer") to bind DNA.
  - Or the ligand prevents the activator from binding to DNA.
Negative Regulation
o Repressors inhibit transcription by RNA polymerase!
o Two possible arrangements:
  - The repressor requires a ligand to bind DNA.
  - Or the ligand ("inducer") prevents the repressor from binding to DNA.
Regulatory Elements
Key concept: Bacterial promoters are inherently in an active state, available for transcription, so most prokaryotic gene regulation results from a repressor binding to a site proximal to an inducible promoter.
o A repressor binds to an operator either within the promoter or downstream of the promoter.
  - A repressor bound atop the promoter sequence blocks RNA polymerase from binding to the promoter.
  - A repressor bound downstream inhibits promoter clearance!
  - Either position blocks transcription.
Key concept: Some prokaryotic gene regulation results from binding of an activator to a site proximal to an inducible promoter.
o The activator binds to a positive regulatory element located upstream of the promoter.
o The activator recruits RNA polymerase to a weak promoter.
Structural Genes of the lac Operon
o All 3 genes in the operon function in lactose metabolism.
o Glucose is the preferred carbon source for E. coli and is consumed ahead of lactose.
  - In the absence of glucose, adenylate cyclase produces cAMP.
o β-galactosidase catalyzes two reactions:
  - Lactose (12C) → galactose (6C) + glucose (6C)
  - Lactose (12C) → allolactose (12C)
Regulation in the lac Operon
o The lac operon has 3 genes: lacZ, lacY, lacA (ZYA).
o Negative regulation:
  - The lacI gene is located upstream of the lac operon and encodes the lac repressor.
  - The lac repressor binds the operator in the absence of allolactose (or IPTG).
  - The operator site is located at the transcription start site.
o Positive regulation:
  - cAMP receptor protein (CRP) binds DNA in the presence of its inducer, cAMP.
  - The CRP binding site is located upstream of the lac promoter.
lac Repressor is a Tetramer
o 4 identical subunits: a "homotetramer" that forms a "dimer of dimers"
o All 4 subunits have helix-turn-helix motifs for sequence-specific binding to 2 separate palindromic operator sites.
o The inducer binding pocket is located between the globular domains, far from the DNA binding domain.
lac Operon has Three Operator Sites
o O1 at the lac promoter is the highest-affinity binding site.
o O2 and O3 are both low-affinity sites because they differ from the consensus lac repressor recognition sequence.
o The lac repressor binds to O1 and either O2 or O3.
  - Occupancy of the O1 site is responsible for repressor activity.
  - A mutation destroying O1 results in loss of regulation of the lac promoter (NO O1 = NO REGULATION).
CRP binds as a Dimer
o CRP = activator; its inducer, cAMP, is present only under low-glucose conditions.
o Homodimeric helix-turn-helix protein
o CRP-cAMP binds 5' (upstream) of the promoter.
o The lac promoter is "weak" because it differs from the consensus promoter.
o CRP recruits RNA polymerase to the lac promoter.

L-46: Eukaryotic Genes, Transcription and Gene Control (part 1)
Key concept: In complex multicellular organisms, each cell type expresses a unique set of genes from an identical DNA sequence in the genome.
o Gene regulation defines the properties of each cell!
Genome Size
o In humans, DNA contains over 3 billion bp (haploid).
  - Number of chromosomes = 46 (diploid)
  - Approx. number of genes = 29,000 (haploid)
o In E. coli, DNA contains over 4 million bp.
  - Number of chromosomes = 1
  - Approx. number of genes = 4435
DNA in Eukaryotes
o Chromosome structure and DNA content change during the cell cycle.
  - Cell division yields 2 diploid daughter cells with identical DNA content.
  - G1 = diploid
  - S = DNA replication
  - G2 = DNA content doubled ("tetraploid" DNA content: 2 × 2 = 4 copies)
The Human Genome
o Key concept: A gene is a unit of heredity.
  - It includes the DNA encoding a functional RNA/protein along with the regulatory elements controlling its expression.
  - Every gene has a specific location in the genome; e.g., the EDN1 gene is at 6p24.
Eukaryotic Chromatin
Key concept: Chromatin = the chromosomal material in a cell.
o It consists of DNA and the proteins bound to the DNA; the most abundant proteins are the histones.
DNA in the eukaryotic nucleus is packaged into higher-order structures:
o very little naked DNA
o 10 nm fiber
o 30 nm fiber
o loops of chromatin organized from the nuclear scaffold
Chromatin in Interphase Nucleus
Heterochromatin = dark-staining material in a nucleus (hetero = "different" = dark)
o The level of staining reflects a condensed chromatin structure.
o The DNA contains fully inactivated genes and many types of repetitive DNA (telomeres/centromeres).
Euchromatin = light-staining material (eu = "true" = light)
o The level of staining suggests a more open chromatin structure.
o Genes in euchromatin are available for transcription.
DNA-Histone Interaction
Key concept: The nucleosome is the basic unit of chromatin.
o Nucleosomes have a repeating unit of ~200 bp.
o "Beads on a string" refers to the 10 nm fiber.
  - 147 bp of DNA wraps ~1.7 times (nearly twice) around each histone core.
  - ~50 bp of spacer (linker) DNA connects nucleosomes.
o DNA-histone core contacts are sequence-independent.
  - Electrostatic interactions and H-bonds occur between the positively charged histone proteins and the negatively charged sugar-phosphate backbone of DNA.
Structure of the Histone Core
o The histone core contains 2 copies each of histones H2A, H2B, H3, and H4.
  - Note the "histone tails."
o H1 locks DNA to the nucleosome.
The 30 nm Fiber
o The 30 nm fiber = the first level of organization for higher-order chromatin.
o Two strands of nucleosomes are coiled around one another!
Organization of Mammalian Genomes
Key concept: Humans have two copies each of our ~29,000 genes.
o About 21,000 genes encode proteins.
  - Only 1.5% of human DNA encodes protein (i.e., coding exons).
o Most of the rest (~8000) encode functional non-coding RNAs.
  - tRNA, rRNA, snRNA, miRNA, lncRNA, etc.
o Non-expressed DNA includes introns, repetitive sequences, and transposons.
"Gene Expression"
Key concept: In eukaryotes, the final level of expression of a functional protein is regulated at many levels:
o Transcription
o RNA processing
o mRNA turnover
o Translation
o Posttranslational modification
o Cellular trafficking
o Protein turnover
Glossary of RNAs in a Eukaryotic Cell
Messenger RNA (mRNA) encodes the AA sequence of a protein.
o The primary transcript of a gene undergoes RNA processing to generate mature mRNA (which can be used by a ribosome).
Transfer RNA (tRNA) delivers AAs to the ribosome.
Ribosomal RNAs (rRNA) = ribosomal components
Small nuclear RNAs (snRNA) = components of the spliceosome, the enzyme that catalyzes intron removal during RNA processing
MicroRNA (miRNA)/small interfering RNA (siRNA) act on mature mRNAs to decrease their translation.
Small nucleolar RNAs (snoRNA) = small RNA molecules that guide posttranscriptional base modifications in tRNAs, rRNAs, and snRNAs
Long non-coding RNA (lncRNA) = RNA molecules that do not encode protein
o Some lncRNAs influence gene expression.
Eukaryotic Cells: Three RNA Polymerases
RNA polymerase I synthesizes ribosomal RNA (rRNA).
RNA polymerase II makes mRNA and some small RNAs.
o RNA polymerase II ("RNA Pol II") is a large multi-subunit enzyme.
  - Its catalytic mechanism is the same as that of the bacterial enzyme.
  - RNA Pol II does not identify promoters on its own; it must be recruited by transcription factors (more specifically, activators).
RNA polymerase III generates tRNA and other small RNAs.
RNA Polymerase Subunits
o Subunits homologous to the prokaryotic core enzyme are present in eukaryotic RNA polymerases
o C-terminal domain (CTD)
L-47: Eukaryotic Gene Transcription (part 2)
Key concept: The combination of genetic and epigenetic effects determines whether a gene is transcribed in each cell
o “Epigenetic” marks influence chromatin structure
o “Genetic” effects are associated with the sequence of bases in a gene’s cis-acting elements
-- These sequence elements serve as binding sites for the transcription factors that regulate the gene
Eukaryotic Gene Organization
Key concept: Eukaryotic genes stand alone in single transcription units
o Each gene has its own promoter(s) + terminator(s)
o Genes are organized with an intron-exon structure
-- Exons code for sequence that will be included in the mature mRNA
-- Intron sequences will be eliminated during RNA processing
-- The primary transcript includes both intron + exon sequences
Primary Transcripts
Key concept: Introns are common in mammalian genes
o Primary transcripts of mammalian genes most often contain more intron sequence than exon sequence
-- Only 3% of yeast genes have an intron (most have only one)
-- 92% of human genes have an intron
o Exon lengths average ~170 bp
o Intron lengths vary greatly
-- Most are 100-5000 bp
-- Maybe 10% are >11,000 bp
-- The longest known human intron is ~1.1 Mbp, in a potassium channel-related gene on chromosome 4
Overview of Eukaryotic Gene Expression
Key concept: Activation of a gene requires both an open chromatin structure and the binding of transcription factors (activators) that recruit RNA polymerase II to the promoter.
Eukaryotic Chromatin
Key concept: Chromatin structure is a product of the combined actions of epigenetic marks, such as histone modifications and DNA methylation, and trans-acting protein factors that bind to epigenetic marks or to DNA.
Condensed chromatin is “closed” and viewed as transcriptionally silent
o inaccessible to most activators and chromatin remodeling complexes
o accessible only to “pioneer” transcription factors
Active transcription occurs in regions of “open” chromatin in the 10 nm fiber conformation.
Modifications of the Histone Tails
Histone tails provide areas for interaction b/t nucleosomes
o Histone tails include the N-termini of histones H3 and H4, plus both the N- and C-termini of H2A and H2B
o Posttranslational modifications of the histone tails regulate chromatin structure
-- Histone hypermethylation is associated with closed chromatin
-- Histone acetylation is found in areas of open chromatin
-- (mnemonic: methylation → closed; acetylation → open)
Histone Modifying Enzymes
o Histone acetyltransferase (HAT) acetylates lys in the histone tails
o Histone deacetylase (HDAC) removes acetyl groups from the histone tails
o Histone methyltransferase (HMT) methylates lys and arg in the histone tails
o The actions of HATs & HDACs oppose each other, so acetylation of histone tails is readily reversible
-- DNA methylation is more stable!
Histone Code
Key concept: Many different posttranslational modifications of the histone tails are known
o Some are associated w/ closed chromatin and others with open chromatin
o A histone may carry several modifications at once
o (figure: transcribed gene vs. silenced gene)
Histone Acetylation Favors DNA Accessibility!
Histone acetyltransferase (HAT) is recruited to a gene locus by a transcription factor and then acetylates the surrounding histones.
Acetylation Alters Nucleosome Interactions
Acetylation neutralizes the positive charge on lysines in the histone tails. Loss of the charge:
o reduces electrostatic attraction b/t nucleosomes (they pack less tightly)
o weakens electrostatic interactions w/ DNA
o favors recruitment of a chromatin remodeling complex!
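To make the charge argument concrete, here is a small Python sketch that counts the positive charges on the N-terminal tail of histone H4 before and after acetylation. It rests on stated assumptions: only Lys (K) and Arg (R) are counted as +1, while His, the free N-terminus, and acidic residues are ignored; the tail sequence (human H4 residues 1-21) and the helper name `tail_charge` are supplied here for illustration and are not from the notes.

```python
# Simplified charge model (assumption for illustration): each Lys (K) and
# Arg (R) counts as +1; His, the free N-terminus, and acidic residues are ignored.
H4_TAIL = "SGRGKGGKGLGKGGAKRHRKV"  # human histone H4 residues 1-21 (initiator Met removed)


def tail_charge(seq: str, acetylated_lys: frozenset[int] = frozenset()) -> int:
    """Approximate net positive charge of a histone tail.

    `acetylated_lys` holds 1-based residue numbers of acetylated lysines.
    """
    for pos in acetylated_lys:
        if seq[pos - 1] != "K":
            raise ValueError(f"residue {pos} is not a lysine")
    charge = 0
    for pos, aa in enumerate(seq, start=1):
        if aa == "R":
            charge += 1
        elif aa == "K" and pos not in acetylated_lys:
            charge += 1  # an acetylated lysine no longer contributes a + charge
    return charge


print(tail_charge(H4_TAIL))                             # 8 (unmodified tail)
print(tail_charge(H4_TAIL, frozenset({5, 8, 12, 16})))  # 4 (K5, K8, K12, K16 acetylated)
```

In this toy model, acetylating the four H4 tail lysines K5, K8, K12 and K16 halves the counted positive charge, which is the direction of effect described above: weaker attraction to the DNA backbone and to neighboring nucleosomes.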
Transition to Open Chromatin
Chromatin remodeling complexes such as SWI/SNF manipulate nucleosomes
o SWI/SNF is recruited to local sites in the genome
-- SWI/SNF either binds to acetylated histones or is recruited to a gene by protein-protein interactions with a pioneer transcription factor
o Chromatin remodeling complexes alter chromatin structure by:
-- unwrapping DNA from nucleosomes
-- repositioning nucleosomes
-- evicting nucleosomes
o Chromatin remodeling “spreads” to surrounding areas of the genome, changing the accessibility of the DNA to activator proteins
Active Promoter Free of Nucleosomes
Key concept: The combined actions of HATs + chromatin remodeling complexes (e.g., SWI/SNF) lead to an open chromatin structure
o The transcription start site is devoid of nucleosomes, making room for assembly of the transcription preinitiation complex
o HATs bind near promoters and act to maintain the open chromatin structure
Cis-acting Elements in Mammalian Genes
Key concept: Mammalian genes have many cis-acting elements that provide binding sites for transcription factors
o Some are promoter proximal elements; others, called enhancers, are at distant locations along the chromosome
o The transcription start site lies within the promoter region
o Promoter proximal elements are often located upstream
-- specific transcription factors bind to these sequences
o Enhancers are located at promoter-distal positions, cover ~200-500 bp, and contain several different transcription factor binding sites
-- the human genome has ~21,000 protein-coding genes and perhaps as many as one million enhancers!
Eukaryotic Sequence-Specific DNA Binding Motifs
Key concept: Transcription factors = sequence-specific DNA binding proteins
o These proteins typically have a modular design with a DNA binding domain (DBD) and an activation domain (AD)
o The DBD positions a recognition helix in the DNA major groove, where it forms hydrogen bonds and other contacts w/ the base pairs.
~80% of eukaryotic sequence-specific DNA binding proteins have one of the following motifs:
o helix-turn-helix
o the similar homeodomain motif
o two zinc finger motifs:
-- classical zinc finger
-- nuclear receptor zinc finger
o two extended dimerization domain motifs:
-- leucine zipper (“bZIP”) proteins
-- helix-loop-helix proteins!
Homeodomain Proteins
The homeodomain motif is named for a group of proteins important for developmental processes (“homeotic” proteins)
o The motif is superficially similar to helix-turn-helix
o The homeodomain has three α helices
-- 3rd helix = recognition helix
Classical Zinc-Finger Motif
Consensus sequence of classical Zn fingers (X = any amino acid; number in parentheses = number of residues):
o … C-X(5)-C-X(3)-(F/Y)-X(5)-L-X(2-3)-H-X(3-4)-H …
Major structural features are:
o 2 β strands and an α helix
o Zn coordinated by 2 cys & 2 his
o conserved phe/tyr & leu form a “strut” positioning the recognition helix
Zn fingers can be used to recognize many different DNA sequences
o Recognition helix AAs interact w/ base pairs
Multiple Zn Fingers in One Protein
Some Zn finger proteins have a single Zn finger motif & bind as dimers/larger oligomers
Other proteins can have many Zn fingers arranged in tandem
o Each recognition helix makes sequence-specific bonds w/ base pairs
-- these proteins bind as monomers
Zinc Fingers
Key concept: The recognition helix AAs facing the DNA major groove can be different for each Zn finger.
Therefore, each Zn finger recognizes a different base sequence, so tandem Zn finger proteins bind non-palindromic sequences
Nuclear Receptor Zn Finger
Nuclear receptor transcription factors bind DNA as dimers
o Some are homodimers, others are heterodimers
o Both subunits contribute a recognition helix
o Each subunit has two Zn fingers:
-- Only the first Zn finger has a recognition helix
-- The second is a structural feature supporting the recognition helix
-- Four cys coordinate each Zn in both Zn fingers
Dimerization Domains and DNA Binding
Leucine zipper proteins have a series of leucines aligned along a helix
o The leucines participate in protein-protein interactions for dimerization
o The recognition helices are extensions of the leucine-containing helices
Helix-loop-helix proteins also have a dimerization domain
o Extensions of the helices form the recognition helices
L-48: Eukaryotic Gene Transcription (part 3)
Key concept: The capacity for fine control of the level of gene expression in a cell is central to multicellular organisms w/ complex genomes.
Overview of Eukaryotic Gene Expression
Key concept: Once chromatin structure is open, transcription factors (activators) bind specifically to DNA and recruit RNA polymerase II
o RNA pol II cannot locate promoter sequences on its own
-- No activators = no transcription
Activation of Transcription
Key concept: In eukaryotes, most gene regulation is positive, because activators are required to recruit RNA pol II to a promoter.
o Before initiation of transcription, a very large protein complex must be assembled on the cis-acting regulatory elements of the gene
o Assembly is a multistep process requiring:
-- binding of transcription factors (activators) to promoter proximal sites and enhancers
-- recruitment of coactivator proteins (including other transcription factors, HAT, and chromatin remodeling complexes)
-- recruitment of mediator
-- assembly of the transcription preinitiation complex
Eukaryotic Activators
Key concept: Activators are transcription factors that exert positive gene regulation.
In eukaryotes, they have a modular design with a sequence-specific DBD (e.g., a Zn finger), a flexible hinge, and an activation domain (AD).
o AD = area of protein-protein interaction w/ other proteins needed to transcribe a gene. These include:
-- other transcription factors
-- coactivators (HAT, chromatin remodeling complex)
-- mediator
-- preinitiation complex
o The activator DBD (e.g., zinc finger) binds to a specific DNA sequence
o Activator binding may occur at either a promoter proximal site or in a (distal) enhancer
Gene Activation by Glucocorticoid Receptor
Transcription factor “GR” = nuclear receptor acting as an activator for many mammalian genes
o Nuclear receptors have a characteristic DBD, with 2 zinc fingers in each subunit of the dimer
o GR binds to a hormone response element (HRE)
o HREs can be located at promoter proximal sites or in enhancers
In this example, GR recruits a coactivator (Hic-5) that binds to its AD
o The GR-coactivator (Hic-5) complex subsequently recruits:
-- a coactivator with HAT activity (p300-CBP)
-- mediator (MED1)
-- indirectly, the preinitiation complex and RNA polymerase II
Note that the only sequence-specific DNA binding is through the GR DBD!
o DBD (DNA binding domain) = sequence-specific binding; AD (activation domain) = protein-protein interactions
Transcription Factors Also Bind Enhancers
Key concept: Enhancers contain recognition sequences for several different transcription factors. Together these transcription factors account for enhancer function.
For example, several transcription factor binding sites are clustered in the human interferon β gene enhancer. These are bound by:
o the interferon regulatory factors IRF3 and IRF7
o the common transcription factors Jun/ATF2 and p50/p65 (NF-κB)
Enhancer activity depends on DNA bending by high mobility group (HMG) proteins.
Mediator
Key concept: Although some transcription factors directly interact with the preinitiation complex, mediator provides the primary means of communication b/t activators & the preinitiation complex (i.e.
indirect interaction w/ the preinitiation complex)
Mediator = large protein complex with >30 subunits
o Many mediator subunits make protein-protein contacts w/ transcription factor ADs
o Mediator also makes protein-protein contacts w/ the general transcription factors of the preinitiation complex
Transcription Preinitiation Complex
o TFIID binds at the promoter (dp)
o TFIIA may join the complex
o TFIIB binds to DNA / TBP
o TFIIF/RNA pol II joins the complex by binding to TFIIB
o TFIIE / TFIIH enter the complex in succession
TFIIH is a complex with 2 distinct functions:
o DNA helicase generates the transcription bubble (heli-bubble!)
o Protein kinase phosphorylates the RNA pol II CTD to initiate transcription
TFIID Finds Promoters
TFIID is a complex made up of TBP + many TAF proteins
o TATA Binding Protein (TBP) locates and binds to TATA boxes in eukaryotic promoters
-- TATA boxes are present in only about 10% of promoters
-- TBP is an example of a minor groove DNA binding protein
o 13 TBP-associated factors (TAFs) function in vivo in the recognition of other sequence elements, such as the Inr and DPE sites
HO Gene Activation
Order of events leading to HO gene activation in yeast:
o The pioneer transcription factor SWI5 is an activator that binds to an upstream enhancer (-1200 to -1400)
o SWI5 recruits the chromatin remodeling complex SWI/SNF to open the chromatin, exposing the histone tails
o The GCN5 complex (HAT) enters to acetylate histones, continuing the process of chromatin de-condensation
o The SBF activator binds at several sites in the HO gene 5’ promoter proximal regulatory region
o SBF recruits mediator to interact with the preinitiation complex
o The preinitiation complex, including RNA pol II, is assembled at the nucleosome-free HO gene promoter, leading to transcription initiation.
Of Mice and Men
Genome-wide studies in humans show a single nucleotide polymorphism (SNP) at -355 kbp in the human kit ligand (KITLG) locus
o A chromosomal abnormality (inversion) lies upstream of the murine Kitl gene
Altered LEF Binding Site Changes Kitl Expression (blondes)
An A→G SNP inhibits binding of the activator LEF to an enhancer
o Kitl gene expression is reduced enough to make a visible difference in coat color (blondes)
Combinatorial Control
Key concept: The level of transcription of a gene depends on the occupancy of its cis-acting regulatory elements by transcription factors.
o For transcription, binding of several activators to a gene is required
o Coordinated action b/t factors = combinatorial control
-- Note: eukaryotic genes typically have 6+ regulatory sites
Negative Gene Regulation in Eukaryotes
Although eukaryotic gene expression is largely positive, negative gene regulation is also important.
Repressors are transcription factors that inhibit transcription by:
o competitive binding to an activator binding site (displacing the activator)
o binding to an activator (preventing its interaction w/ mediator)
o altering assembly of the preinitiation complex
o providing a docking site for HDAC
Repression by Chromatin Modification
The mechanism of transcriptional down-regulation often involves modulation of chromatin structure
o The repressor has a DBD (e.g., Zn finger) that binds to a negative regulatory element in the gene
o Its repression domain (“RD”) recruits histone deacetylase (HDAC) to remove acetyl groups from the histone tails.
H4K16 in Chromatin Structure
HDAC targets H4K16ac, catalyzing deacetylation to H4K16
o The restored positive charge at H4K16 favors conversion of the 10 nm → 30 nm fiber
-- electrostatic interactions occur b/t H4K16 + acidic AAs on histones H2A and H2B of the adjacent nucleosome
The Combination of Regulatory Proteins Dictates Gene Expression
o weak activator + strong activator + strong repressor = activator neutralized by corepressor
DNA Methylation: Another Epigenetic Mechanism Affecting Transcription
Key concept: Hypermethylation of a CpG island silences its promoter.
Cytosine in the sequence CG (called a “CpG”) is methylated by DNA methyltransferase (DNMT) at many sites within the genome
o Histone deacetylation or DNA methylation = prevents transcription
o Some promoter proximal regions of genes have a cluster of CG base pairs [“CpG island”]
-- A CpG island is a >200 bp GC-rich element in which the observed:expected CpG ratio is >60% (0.6)
DNA Methylation
The methyl group of 5-methylcytosine protrudes into the major groove
o Binding of some transcription factors is blocked by methylation, others are indifferent to it, and some require methylation to bind
o Methylation affects factor binding!
Gene Silencing in Heterochromatin
H3K9me3 is associated with condensed heterochromatin.
Heterochromatin
Key concept: Condensation to heterochromatin silences genes. Condensation depends on the actions of HMTs + chromatin-associated proteins
o Histone methyltransferase (HMT) methylates H3K9 → H3K9me3
o H3K9me3 = docking site for heterochromatin protein 1 (HP1)
o HP1:HP1 protein-protein interactions compact the chromatin
o HP1 recruits more HMT, “spreading” the heterochromatin
Transcriptional Control of Pax6 Gene
The human Pax6 gene has many regulatory features typically found in a regulated mammalian gene
o 3 promoters allow for development-dependent regulation, and defects in Pax6 are associated with aniridia (i.e.
iris development defects)
o Multiple enhancers direct expression from the promoters in eye, brain, spine and pancreas
-- Different complements of transcription factors bind to these enhancers to activate tissue-specific transcription
Promoter Proximal Factors that Regulate EDN1
Key concept: The same gene may be subject to very different regulatory mechanisms in different cell types
o A stimulus for expression in one cell may not affect expression in another cell/organ
Different cell types vary in their responses to various signals
o Each cell type has a different complement of receptors
o Binding of a signaling molecule to its receptor activates second messenger systems
o As a result, transcription factors bind to response elements, activating the gene (or not)
L-49: Post-transcriptional RNA Processing
Makings of an RNA!
Key Concept: In eukaryotes, transcription yields a primary RNA transcript (pre-mRNA) that is an RNA copy of the DNA coding strand. This primary transcript must undergo multiple levels of RNA processing to generate a translation-ready mature RNA transcript (mRNA).
Components of a mature RNA transcript (mRNA)
Key Concept: A mature mRNA has 5’ & 3’ untranslated regions (UTRs) flanking the protein coding sequence, which encodes the primary AA sequence of the protein. The 5’ end has a “cap” and the 3’ end has a “poly-A tail.”
o pre-mRNA undergoes multiple RNA processing steps in the nucleus to produce a mature RNA molecule (mRNA) ready for translation
o mRNA is exported to the cytoplasm for translation
Generating a mature mRNA requires several steps
Key Concept: RNA pol II does not distinguish between coding (exons) and non-coding (introns) DNA sequences
o The primary RNA transcript (pre-mRNA) contains both!
o Need to differentiate b/t coding and non-coding sequences
RNA processing occurs co-transcriptionally in “transcription factories”
Key Concept: Transcription occurs at discrete sites within the nucleus termed “transcription factories”
o RNA processing occurs on-site, co-transcriptionally
The CTD of RNA Pol II serves as a docking site for RNA processing enzymes
The YSPTSPS repeat is an unstructured segment in the C-terminal domain
o Its serines are phosphorylated by protein kinases
Phosphorylation of the RNA Pol II CTD is necessary for:
o promoter escape
o docking sites for RNA processing enzymes, including:
-- capping enzyme
-- cap binding complex
-- cleavage/termination complexes
-- RNA splicing factors
5’ Cap Assembly
The 5’ cap is assembled on the 5’ end of the mRNA molecule by “capping enzyme.”
o The cap provides protection from 5’→3’ exonucleases
o The cap is also important for translation initiation
Transcription Termination and Poly(A) Tail
The signal for termination is the AAUAAA sequence located in the nascent RNA
o RNA pol II extends the transcript through and beyond the termination sequence
o CTD-bound termination factors recognize the cleavage sequence and bind to the RNA
o Termination occurs with cleavage of the completed primary transcript (pre-mRNA), and RNA pol II dissociates
-- This occurs after binding of poly-A polymerase
The 3’ end undergoes polyadenylation by “poly-A polymerase.”
o The “poly-A tail” is 100-250 nucleotides long
o Creation of the “poly-A tail” is template-independent
o The “poly-A tail” protects the RNA from degradation & serves as a binding site for proteins that facilitate RNA turnover / translation
Genes in higher organisms undergo splicing to remove introns
Key Concept: Most mRNAs in mammals undergo splicing to remove introns and join exons
o Exon sequences in the mRNA appear in exactly the order in which they are organized 5’ to 3’ along the gene.
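The 3’-end steps described above (find the AAUAAA signal, cleave downstream, add a template-independent poly(A) tail) can be sketched in Python. The function name `process_3_end`, the fixed 20 nt cleavage offset, and the default 200 nt tail are assumptions for illustration only; real cleavage sites vary, roughly 10-30 nt downstream of the signal.

```python
POLYA_SIGNAL = "AAUAAA"


def process_3_end(pre_mrna: str, offset: int = 20, tail_len: int = 200) -> str:
    """Cleave downstream of the first AAUAAA and add a poly(A) tail.

    Assumptions: cleavage is placed `offset` nt after the end of the signal,
    and the tail length is fixed (the notes give 100-250 nt).
    """
    if not 100 <= tail_len <= 250:
        raise ValueError("tail length outside the 100-250 nt range in the notes")
    idx = pre_mrna.find(POLYA_SIGNAL)
    if idx == -1:
        raise ValueError("no AAUAAA signal: this model cannot terminate")
    cleave_at = idx + len(POLYA_SIGNAL) + offset
    if cleave_at > len(pre_mrna):
        raise ValueError("transcript ends before the modeled cleavage site")
    # RNA downstream of the cleavage site is discarded; poly(A) polymerase
    # then adds A's with no template.
    return pre_mrna[:cleave_at] + "A" * tail_len
```

Note that `str.find` returns the first AAUAAA, so a transcript carrying that hexamer upstream (e.g., inside its coding sequence) would be cut too early in this model. Real 3’-end formation uses additional downstream elements that are not represented here.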
Components of the Spliceosome
Key Concept: The spliceosome is the nuclear complex responsible for removing intron sequences and ligating exon sequences 5’→3’ to generate (mature) mRNA
o >100 different proteins and RNA molecules function in splicing
o Many are contained in the small nuclear ribonucleoprotein (snRNP) complexes that make up the bulk of the spliceosome
-- Each snRNP complex contains ~10 proteins & a unique snRNA molecule
-- snRNPs are named for their respective snRNA molecule
-- Each snRNA has a unique 100-200 nt sequence
Each snRNP has a unique function:
o U1 snRNP binds to the 5’ splice site
o U2 snRNP binds to the branch site and aligns it for the 1st splicing reaction
o U4 snRNP binds to and sequesters U6 snRNP
o U5 snRNP aligns the pre-mRNA for the 2nd splicing reaction
o U6 snRNP promotes catalysis of the splicing reaction
Splice sites are identified by consensus sequences
Key Concept: The GU/AG rule governs RNA splicing
o The 5’ end of an intron is GU; the 5’ splice site is immediately upstream of it
o The 3’ end of an intron is AG; the 3’ splice site is immediately downstream of it
o Within the intron is the branch point A
-- branch point = 20-50 nt upstream of the 3’ splice site
o A pyrimidine-rich region lies between the branch point and the 3’ splice site.
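The GU/AG and branch-point rules above can be written as a simple check, followed by removal of the intron. This Python sketch is illustrative only: the helper names are hypothetical, the branch point is accepted as any A lying 20-50 nt upstream of the 3’ splice site, and the pyrimidine-rich tract is not checked, so it is far looser than real splice-site recognition by the U1 and U2 snRNPs.

```python
def has_branch_point(intron: str) -> bool:
    """True if some A lies 20-50 nt upstream of the 3' splice site (the intron's end)."""
    n = len(intron)
    # An A at index n - d is counted as d nt upstream of the 3' splice site.
    return any(intron[n - d] == "A" for d in range(20, 51) if d <= n)


def looks_like_intron(intron: str) -> bool:
    """Apply the GU/AG rule plus the branch-point spacing rule from these notes."""
    return intron.startswith("GU") and intron.endswith("AG") and has_branch_point(intron)


def splice(pre_mrna: str, start: int, end: int) -> str:
    """Remove pre_mrna[start:end] (the intron) and join the flanking exons in order."""
    if not looks_like_intron(pre_mrna[start:end]):
        raise ValueError("candidate intron fails the GU/AG or branch-point check")
    # Net result of the two transesterifications: exons joined,
    # intron released (as a lariat in the cell).
    return pre_mrna[:start] + pre_mrna[end:]


exon1, exon2 = "AUGGCC", "UUCUAA"
intron = "GU" + "C" * 10 + "A" + "C" * 25 + "AG"  # branch-point A sits 28 nt from the end
print(splice(exon1 + intron + exon2, 6, 46))     # AUGGCCUUCUAA
```

The exons come out joined in their original 5’→3’ order, matching the point that exon order in the mRNA follows the gene.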
Splicing Mechanism
The first step of splicing is defining the intron
o U1 snRNP is recruited to the 5’ splice site
o U2 snRNP is recruited to the branch point A nucleotide
o The snRNA in each snRNP base pairs to the pre-mRNA
-- U1 snRNA anneals to the 5’ splice site
-- U2 snRNA anneals to the branch site such that the A nucleotide bulges out
-- this exposes its 2’ OH for the 1st transesterification reaction
Splicing Mechanism (cont.)
o U1 snRNP is recruited to the 5’ splice site
o U2 snRNP is recruited to the branch point A nucleotide
o U5 snRNP and U4/U6 snRNP bind to the complex to complete spliceosome assembly
o Dynamic rearrangement of the spliceosome initiates splicing activity
-- Both U1 snRNP + U4 snRNP exit the complex
-- The spliceosome is now in its catalytically active conformation
Splicing: Catalytic Mechanism
o In the catalytic conformation, U2, U5, and U6 snRNAs base pair to each other, aligning the branch site with the 5’ splice site for the 1st transesterification reaction
o U2, U5, and U6 snRNAs then rearrange, aligning the 5’ and 3’ splice sites for the 2nd transesterification reaction
o The intron product is released as a “lariat” structure
o The exons are joined by formation of the splice junction
Two-step transesterification
1st transesterification:
o the 2’ OH of the branch point A nucleotide attacks the G nucleotide at the 5’ splice site
2nd transesterification:
o the 3’ OH at the end of the 5’ exon attacks the 3’ splice site (the phosphate following the intron’s terminal G)
o this forms the splice junction (exons) & releases the intron lariat (intron)
Alternative Splicing: Expanding the coding capacity of the genome
Key Concept: Alternative splicing = mechanism for developmental or tissue-specific production of differing mRNAs from a single gene.
o Refers to the programmed inclusion/exclusion of exons in different tissues or at different stages of development
o ~92% of all genes are alternatively spliced
o ~1/3 of all hereditary diseases are thought to have a splicing component
o Alternative splicing expands the coding capacity of the genome
Splicing factors bind to the pre-mRNA to define alternative splicing events
Key Concept: Splicing factors bind to key sequences in the pre-mRNA to drive splice site recognition by the spliceosome and define alternative splicing choices. Because different tissues and cell types have different complements of these proteins, exons recognized in one cell may not be used in another.
o Key sequences in the pre-mRNA that splicing factors bind to:
-- ESE: Exonic Splicing Enhancer
-- ISE: Intronic Splicing Enhancer
-- ESS: Exonic Splicing Silencer
-- ISS: Intronic Splicing Silencer
o SR/hnRNP proteins = splicing factors that bind to these sequences & drive splice site usage
Differing Alternative Splicing Mechanisms
Key Concept: Many different alternative splicing mechanisms expand the coding capacity of the genome
o Alternative splicing not only differs between tissue types in a single organism, but also varies between species
SMA: Molecular Mechanism
o All individuals have 2 copies of the SMN gene (SMN1 and SMN2)
o A C→T mutation in SMN2 leads to inefficient inclusion of exon 7 (the mutation disrupts a key ESE)
FDA approves landmark treatment for SMA using antisense oligonucleotides (ASOs)
o The ASO binds to the SMN2 pre-mRNA and blocks a key ISS bound by an hnRNP protein; this leads to inclusion of exon 7 and restoration of functional SMN protein from SMN2
mRNA turnover
o Once RNA degradation has begun, most of the mRNA molecule is reduced to nucleotides by a large, multi-enzyme complex in the cytoplasm called the exosome.
L-50: Translation “Protein Synthesis” (part 1)
Key concept: The coding sequence of an mRNA carries the info.
needed for the primary structure of 1 protein
o protein synthesis = translation
o occurs on an enzyme complex = the ribosome
Principles of the Genetic Code
Key concept: codon in mRNA = series of 3 bases specifying one AA in a protein
o The code is read 5’→3’, and the protein is synthesized from the amino (N-) to the carboxyl (C-) terminus
o The coding sequence (cistron) or open reading frame (ORF) in an mRNA is a continuous series of codons
-- In biology, a coding sequence begins w/ a start codon (AUG) and ends at a stop codon
-- One protein is encoded in each coding sequence in an mRNA
-- The genetic code is non-overlapping
The Universal Genetic Code
Key concept: The genetic code is the same in prokaryotic & eukaryotic organisms
o The genetic code has 64 possible codons (e.g., CAU encodes histidine)
o The code is “degenerate”: some AAs are encoded by more than one codon
-- leu, arg and ser have 6 codons each
-- met and trp have only 1
-- all others have 2-4
o AUG = initiation codon
o 3 stop codons (UAA, UAG and UGA) do not code for an amino acid
o “Wobble base” = 3rd base in a codon
-- advantage of the wobble base: fewer than 61 tRNAs are needed to cover the genetic code
-- For example, Ile-tRNA has the anticodon IAU, so it base pairs to all three isoleucine codons AUA, AUU and AUC.
Mutations Affecting the Genetic Code
Key Concept: Mutations occur in DNA, but can affect the sequence of a protein
Single-base substitutions:
o A nonsense mutation introduces a stop codon (e.g., GGA → UGA): “stop” the nonsense!
o A missense mutation replaces one AA codon with another (e.g., GGA → GUA): missense misplaces, replaces!
o A silent mutation changes the DNA sequence w/o altering the encoded protein (e.g., GGA → GGU): “silent killer” (no alteration)!
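The genetic-code facts above (64 codons, three stops, 6/1/2-4 codons per AA, non-overlapping 5’→3’ reading) and the three substitution classes can be checked with a short Python sketch. The table is the standard genetic code, built from the conventional UCAG ordering of first, second and third bases. `translate` simply starts at the first AUG, which is a simplification, and all function names are illustrative.

```python
from collections import Counter

# Standard genetic code in conventional UCAG order (first, second, third base);
# "*" marks a stop codon.
BASES = "UCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AAS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Degeneracy: number of codons per amino acid (stop codons excluded)
DEGENERACY = Counter(aa for aa in CODON_TABLE.values() if aa != "*")


def translate(mrna: str) -> str:
    """Read non-overlapping codons 5'->3' from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[pos:pos + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def classify_substitution(codon: str, mutant: str) -> str:
    """Classify a single-base substitution as silent, missense, or nonsense."""
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if after == before:
        return "silent"
    if after == "*":
        return "nonsense"
    return "missense"


print(translate("GGAUGCAUUGGUAAGC"))     # MHW (AUG-CAU-UGG, then UAA stops)
print(DEGENERACY["L"], DEGENERACY["M"])  # 6 1
print(classify_substitution("GGA", "UGA"),
      classify_substitution("GGA", "GUA"),
      classify_substitution("GGA", "GGU"))  # nonsense missense silent
```

The GGA examples from the notes come out as expected. GGA→UGA is nonsense, GGA→GUA is missense (Gly→Val), and GGA→GGU is silent (both Gly).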
Frameshift Mutations
o Frameshifts are caused by an insertion/deletion within a coding sequence when the change is not a multiple of 3 bases
o A frameshift often leads to an early stop codon, truncating the protein
o Mutations that affect RNA splicing often generate frameshifts
o By contrast, the in-frame deletion of a UUC (phe) codon in CFTR (ΔF508) is the most frequent mutation associated w/ CF
Transfer RNA (tRNA) is an Adaptor Molecule
o tRNAs deliver amino acids to the ribosome
o tRNAs:
-- are 75-93 nt long
-- form a “cloverleaf” secondary structure
-- have many modified bases
-- carry their AAs bound at the 3’ end
o The primary determinants for AA selection are bases #1 and #2 of the mRNA codon
o The tRNA anticodon base pairs to the codon in antiparallel orientation
Wobble Bases
Base pairing between tRNA and mRNA:
o Only Watson-Crick base pairs occur between codon bases #1 and #2 and anticodon bases #3 and #2
o Non-canonical base pairs are allowed at the wobble position (codon base #3 to anticodon base #1):
-- anticodon U pairs with A or G
-- anticodon G pairs with C or U
-- anticodon inosinate (I) pairs with A, U or C
Tertiary Structure of tRNA
o Base pairing between the D & TΨC arms generates the relatively rigid, twisted L conformation of tRNA
o The L shape is essential for positioning the AA in the catalytic site of the ribosome
Aminoacyl-tRNA Synthetases
o Each amino acid requires its own aminoacyl-tRNA synthetase
o Reaction: amino acid + tRNA + ATP (Mg2+) → aminoacyl-tRNA + AMP + PPi
“The Second Genetic Code”
Key concept: Aminoacyl-tRNA synthetases read variable sequence elements to select the correct tRNA to be charged with a particular AA
o The locations of the determinants on tRNAs recognized by aminoacyl-tRNA synthetases are not fixed
-- anticodons contribute to AA selection
-- other sites also contribute
Ribosomes
o Prokaryotes have a single type of ribosome: the 70S ribosome
o Eukaryotic cells have at least 2 types
-- 80S ribosomes located in the cytoplasm and on the endoplasmic reticulum
Polysomes
o Polysomes = multiple ribosomes bound to a single mRNA.
Each ribosome within a polysome synthesizes a copy of the same protein
o provides for very efficient utilization of an mRNA
Organization of Bacterial Genomes
o Operons are coordinately regulated gene clusters
o Open reading frames are arranged 5’→3’ in a transcription unit
o Use of one promoter and terminator yields a polycistronic mRNA
-- e.g., the atp operon encodes nine proteins
Ribosome Meets mRNA Molecule
o Shine-Dalgarno/ribosome binding sequence (RBS) = site on an mRNA where ribosomes bind to initiate translation
o RBS = 8-13 nt purine-rich element in the mRNA
o Start codon = AUG, encoding N-formylmethionine (fMet)
Key concept: Although fMet is always the first AA in a prokaryotic protein, the second and each subsequent amino acid can be any one of the 20
L-51: Translation “Protein Synthesis” (part 2)
Prokaryotic 70S Ribosome
o Ribosomes have three tRNA binding sites:
-- A site for aminoacyl-tRNA
-- P site for peptidyl-tRNA
-- E site = exit site (holds the “empty,” uncharged tRNA)
Prokaryotic mRNA Binds to the 30S Subunit
Key concept: Initiation Factors (IFs) act on the 30S ribosomal subunit
o IF-1 blocks premature tRNA binding at the A site
o IF-3 blocks premature binding of the 50S subunit
o mRNA binds to the 30S subunit by base pairing between the 16S rRNA and the RBS
-- As a result, the AUG start codon is positioned at the P site
o IF-2-GTP binds the charged initiator tRNA (fMet-tRNAfMet) and escorts it into the P site
-- The mRNA start codon base pairs with the fMet-tRNA anticodon
Completion of the Translation Initiation Complex
o The 50S subunit binds, forming the intact 70S ribosome
o The complex is now a complete translation initiation complex:
-- A site is vacant
-- P site has fMet-tRNAfMet
-- E site is vacant
Aminoacyl-tRNAs Bind to the A Site for Elongation
Key concept: Elongation factors (EFs) act on the 70S ribosome.
o EF-Tu-GTP binds an aminoacyl-tRNA in the cytoplasm and delivers this charged tRNA to the ribosomal A site.
-- Base pairing occurs between the aminoacyl-tRNA anticodon and the second codon in the mRNA.
-- EF-Tu-GTP hydrolyzes GTP → GDP + Pi and exits.
-- A site has an aminoacyl-tRNA
-- P site contains fMet-tRNAfMet
-- E site remains vacant
-- EF-Tu-GDP is recycled to EF-Tu-GTP by EF-Ts.
First Peptide Bond
Key concept: The ribosome is a “peptidyl transferase”.
-- The tRNA positions the second residue of the chain in the catalytic site.
-- The free amino group on the aminoacyl-tRNA in the A site attacks fMet-tRNA in the P site.
-- fMet is transferred to the nascent chain in the A site, forming the first peptide bond.
-- An uncharged tRNA is left in the P site.
Ribosome Translocation
o The EF-G-GTP “translocase” moves the mRNA through the cleft in the ribosome.
-- EF-G-GTP binds near the A site.
-- GTP → GDP hydrolysis shifts the ribosome one codon along the mRNA.
-- The A site is again vacant.
-- The P site contains the dipeptidyl-tRNA.
-- The E site has an uncharged tRNA that will soon dissociate.
Key concept: The process is repeated for each AA in the nascent protein.
-- EF-Tu brings the next aminoacyl-tRNA into the A site.
-- A new peptide bond is formed.
-- EF-G translocase moves the next codon into the A site.
-- Translation elongation occurs at ~20 residues/sec.
Termination at a Stop Codon
Key concept: Translation is terminated when the ribosome reaches a stop codon (UAA, UAG, UGA).
o At a stop codon, the ribosome pauses; no charged tRNA matches the codon, so a Release Factor (RF) binds in the A site instead.
-- RF-1 recognizes UAG or UAA.
-- RF-2 recognizes UAA or UGA.
-- Release factors activate peptidyl transferase to hydrolyze the peptidyl-tRNA bond, terminating translation.
-- Ribosome recycling factor (RRF) and EF-G dissociate the complex.
-- IF-3 rebinds to the 30S subunit.
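A back-of-the-envelope Python sketch of the elongation rate and release-factor specificity given above. The ~20 residues/sec figure is treated as a constant, initiation and termination time are ignored, and the function names are illustrative.

```python
# Rough, illustrative estimates from the numbers in these notes.
ELONGATION_RATE = 20  # residues per second (approximate, bacterial)

RELEASE_FACTORS = {
    "RF-1": {"UAA", "UAG"},
    "RF-2": {"UAA", "UGA"},
}


def elongation_time(n_residues: int, rate: float = ELONGATION_RATE) -> float:
    """Seconds for one ribosome to elongate a chain of n_residues (initiation ignored)."""
    return n_residues / rate


def release_factors_for(stop_codon: str) -> list[str]:
    """Bacterial release factor(s) that can recognize this codon in the A site."""
    return sorted(rf for rf, codons in RELEASE_FACTORS.items() if stop_codon in codons)


print(elongation_time(300))        # 15.0
print(release_factors_for("UAA"))  # ['RF-1', 'RF-2']
print(release_factors_for("UGA"))  # ['RF-2']
```

At that rate a single ribosome needs about 15 s for a 300-residue protein. A polysome raises the output from one mRNA because many ribosomes translate it at the same time.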
Antibiotics Target Translation
o Puromycin is a very effective translation inhibitor
Eukaryotic Translation is in the Cytoplasm
Key concept: Eukaryotic translation is very similar to bacterial translation, but important differences exist
o mRNAs are synthesized & RNA-processed in the nucleus
o mRNA is transferred to the cytoplasm, where it may be intercepted by miRNA-RISC
o The 80S ribosome carries out translation
RNA Interference (RNAi)
Key Concept: RNAi is a group of mechanisms in eukaryotes involving small ncRNAs that reduce expression of specific genes.
o MicroRNA (miRNA) & small interfering RNA (siRNA) act on mature mRNA in the cytoplasm
-- miRNA regulates both whether an mRNA can be translated and its stability
-- siRNA regulates mRNA levels by direct endonuclease cleavage
miRNA Synthesis
o The pri-miRNA stem-loop structure is excised by the Drosha-DGCR8 complex.
-- The pre-miRNA is exported to the cytoplasm.
-- The pre-miRNA undergoes a second cleavage by Dicer (yielding a ~22-25 bp duplex).
-- The single-stranded miRNA / siRNA is then loaded into the RNA-Induced Silencing Complex (RISC).
miRNA-RISC Action
o miRNA/RISC represses translation and increases turnover of target mRNAs.
-- A miRNA has imperfect base pairing to the target mRNA 3’ UTR
o siRNA/RISC cleaves a target mRNA using an endonuclease activity.
-- siRNA cleavage requires perfect base pairing between the siRNA and the target mRNA coding sequence.
(figure: complementary → mRNA cleavage; partially complementary → translational repression)
Target mRNA Recognition
o siRNA action occurs if there is ideal base pairing between the siRNA and a target mRNA (“si” = yes, perfect match)
-- Usually the target site is located in the coding sequence of an mRNA
o miRNA action occurs if the base pairing is imperfect (“mal” = imperfect match)
-- Usually the target site is in the 3’ UTR of an mRNA
Eukaryotic Preinitiation Complex
o In eukaryotes, the details of translation initiation vary. The major differences are:
-- eIF2 brings charged tRNAi to the 40S subunit before the mRNA arrives.
-- The eIF4 complex binds to the mRNA 5’ cap and brings it to the 40S subunit.
-- The ribosome scans the mRNA 5’→3’ to locate the AUG within a Kozak sequence.
-- The 60S subunit binds to complete the 80S ribosome.
Eukaryotic Initiation Complex
o Translation in eukaryotes initiates at the Kozak sequence
o Most often, the first AUG in an mRNA is selected as the translation start site
Eukaryotic Polysomes
o Translation initiation occurs in succession as each new ribosome initiates at the Kozak sequence.
o Many ribosomes on a single mRNA = polysome
o Each ribosome is making a copy of the same protein
Ricin Toxin
o Ricin = among the most toxic natural substances known
o LD50 ~ 22 μg/kg (about 1.76 mg for an 80 kg adult)
o Enters as a single polypeptide, then is cleaved into ricin toxin A & B chains
o Ricin toxin A (“RTA”) depurinates 28S rRNA at A4324, eliminating an elongation factor binding site.
L-52: GO BACK IF TIME PERMITS
L-53: DNA Damage and Repair
o Key concept: A mutation is an accidental change in the sequence of bases in the genome.
o Mutations result from DNA damage, which is an ongoing threat to the cell
o Some damage is spontaneous, but damage is often caused by environmental factors
o The vast majority of DNA damage is single-strand breaks! Double-strand breaks are RARE
Mutagens
o Mutagens are compounds that promote changes in DNA sequences
o Often these chemicals are also cancer-causing carcinogens.
“Ames test”
o Uses a Salmonella typhimurium strain with a defect in a his gene (it cannot make histidine)
-- Plate on medium lacking histidine, with or without a disc of test compound at varying concentrations
-- Score growth to detect reversion of the mutation (revertants can make histidine again)
Deamination of DNA Bases
o Deamination of C→U and 5-meC→T occurs spontaneously in human cells.
o Deamination of A & G also occurs spontaneously, but at a slower rate.
o Common chemicals can accelerate the rate of deamination
-- EX: sodium nitrate, sodium nitrite, nitrosamine
DNA Damage Must be Fixed!
o Key concept: Mutations in the genome are permanent and are subsequently inherited by daughter cells.
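Why an unrepaired deamination becomes a permanent, inherited mutation can be traced with a toy model. This is a minimal Python sketch under stated assumptions: "M" is a made-up single-letter stand-in for 5-methylcytosine, strands are written as aligned complements (5’/3’ orientation is ignored), and no repair happens between rounds of replication. The helper names `deaminate` and `replicate` are hypothetical, chosen for illustration only.

```python
# Toy model: spontaneous deamination, then replication with no repair.
# Assumptions (hypothetical): "M" = 5-methylcytosine; strands are written as
# aligned complements, so orientation is ignored.

# Spontaneous deamination products: C -> U and 5-meC -> T
DEAMINATION = {"C": "U", "M": "T"}

# Base a replicative polymerase places opposite each template base
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A", "M": "G"}


def deaminate(strand: str, position: int) -> str:
    """Deaminate the base at `position` (C -> U or 5-meC -> T)."""
    base = strand[position]
    if base not in DEAMINATION:
        raise ValueError(f"{base} at {position} is not C or 5-meC")
    return strand[:position] + DEAMINATION[base] + strand[position + 1:]


def replicate(template: str) -> str:
    """Return the new strand synthesized opposite `template`."""
    return "".join(PAIR[base] for base in template)


if __name__ == "__main__":
    original = "GCA"
    damaged = deaminate(original, 1)  # U now sits where C was
    first = replicate(damaged)        # A is placed opposite U (G expected)
    second = replicate(first)         # T now sits where C was
    print(original, damaged, first, second)
```

Running it prints `GCA GUA CAT GTA`: after two rounds the original C:G pair has become T:A in that lineage, which is why the damage has to be fixed before replication.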
Oxidation of DNA
o Reactive oxygen species (“ROS”) are generated by respiration
o The hydroxyl free radical (•OH) modifies either G or T
Depurination
o Caused by hydrolysis of the N-β-glycosyl bond linking a purine base to the sugar-phosphate backbone
o Depurination yields an “abasic” site (“AP” site)
Spontaneous Methylation of G in Cells
o Non-enzymatic methylation of G by an S-adenosylmethionine-dependent mechanism yields 7-methylguanine.
DNA Damage: Alkylating Agents
o Alkylating agents covalently modify bases in DNA
o Alkylation distorts the DNA double helix
Thymine Dimer
o Ultraviolet radiation is a common cause of DNA damage (UV = bad)
o UV causes formation of a cyclobutane ring between two adjacent pyrimidine rings
o This is especially common for pairs of thymines, forming a thymine dimer
o The cyclobutane ring kinks the axis of the DNA helix
DNA Strand Breakage
o Ionizing radiation causes DNA strand breakage (IR = bad)
o Major sources of ionizing radiation are cosmic rays, X-rays, and radioactive materials
o Radiation can cause either:
-- a single-strand break (nick)
-- a double-strand break (the ends may be staggered!)
Radiation also Damages Bases in DNA
DNA Repair
o Key concept: A cell invests enormous resources to protect the integrity of its genome. Collectively these systems are called “DNA repair”
Characteristics of DNA Repair Mechanisms
o Most DNA repair mechanisms can be broken down into 4 distinct phases:
-- 1) Recognition of the lesion
-- 2) Excision of the lesion
-- 3) Resynthesis of the DNA
-- 4) Ligation of loose ends
Eukaryotic Mismatch Repair
o MutSα binds to single base-pair mismatches or “indels” of 1-3 bases
o MutSβ binds larger indels of up to 13 bases
o MutLα binds along with PCNA, activating the MutLα endonuclease to nick the new strand
Base Excision Repair
o DNA glycosylase cleaves the N-β-glycosyl bond, making an abasic site (AP site).
o Abasic sites arise from DNA glycosylase activity or from depurination
o AP endonuclease initiates repair of abasic sites by cleaving the sugar-phosphate backbone at the abasic site
Base Excision Repair (continued)
o DNA Pol I has both 5’→3’ exonuclease & DNA synthesis activities
o In eukaryotes, DNA Pol β serves this purpose
o DNA Pol I and DNA Pol β are high-fidelity polymerases
o DNA ligase seals the nick!
Prokaryotic Nucleotide Excision Repair
o “NER” repairs lesions that distort the DNA double helix, such as thymine dimers or alkylation
o UvrABC excinuclease makes two nicks in the damaged strand, flanking the lesion
o UvrD helicase removes the damaged DNA, leaving a gap
o DNA Pol I fills the gap
o DNA ligase seals the nick
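The base excision repair steps above (glycosylase, AP endonuclease, polymerase, ligase) map onto the four general repair phases. The Python sketch below is a toy illustration of that ordering on a DNA string, not a model of the enzymes themselves. Assumptions: the lesion is a uracil, `partner` is written aligned under `damaged` so that `damaged[i]` pairs with `partner[i]`, and the function name `base_excision_repair` and its log messages are hypothetical.

```python
# Toy model of base excision repair (BER) on a DNA string.
# Assumptions (hypothetical): damaged[i] pairs with partner[i]; orientation
# is ignored; a string has no backbone, so the nick steps are only logged.

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def base_excision_repair(damaged: str, partner: str, lesion: str = "U"):
    """Repair every `lesion` base in `damaged`, using `partner` as template.

    Returns the repaired strand and a log of the repair steps per lesion.
    """
    if len(damaged) != len(partner):
        raise ValueError("strands must be the same length")
    strand = list(damaged)
    log = []
    for i, base in enumerate(damaged):
        if base != lesion:
            continue
        # 1) Recognition + 2) excision: DNA glycosylase cleaves the
        # N-glycosyl bond, leaving an abasic (AP) site, shown here as "_"
        strand[i] = "_"
        log.append(f"{i}: glycosylase removed {lesion}, leaving AP site")
        # AP endonuclease cuts the sugar-phosphate backbone at the AP site
        log.append(f"{i}: AP endonuclease nicked backbone")
        # 3) Resynthesis: Pol I (bacteria) or Pol beta (eukaryotes)
        # inserts the base complementary to the partner strand
        strand[i] = PAIR[partner[i]]
        log.append(f"{i}: polymerase inserted {strand[i]}")
        # 4) Ligation: DNA ligase seals the remaining nick
        log.append(f"{i}: ligase sealed nick")
    return "".join(strand), log


if __name__ == "__main__":
    repaired, steps = base_excision_repair("AUGCCA", "TGCGGT")
    print(repaired)
    for step in steps:
        print(step)
```

In the usage example the U at position 1 sits opposite a G, so the sketch restores a C and prints `ACGCCA`, followed by the four logged steps for that lesion.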