BCH 4024 Exam 4 Review

Lecture 42 Deep Sequencing of Anzick-1
KC: Determination of an individual’s complete genomic DNA sequence is feasible, rapid, and relatively
Phases of a genomic sequencing project!
o KC: regardless of the purpose of the work, all genomic sequencing projects can be
broken down into three phases
 Template preparation
 Sequencing and imaging
 Alignment/assembly
The genome of a Late Pleistocene human from a Clovis burial site in western Montana
o Sarah Anzick! (important author)
Skeleton → extract DNA → Fragment DNA
o Dry skeleton DNA will be in the A conformation
o Rehydration during purification returns the DNA to the B conformation
o Random DNA fragmentation is performed using a nebulizer or a sonicator
 DNA fragments are referred to as “insert” DNA
Skeleton → extract DNA → Fragment DNA → attach linker DNA
o Ligase is used to attach synthetic double-stranded DNA linkers at either end of insert
DNA fragments
Skeleton → extract DNA → Fragment DNA → DNA Ligase → Size selection!
o Size selection of DNA works bc
 DNA is a linear molecule
 DNA is negatively charged
 During electrophoresis DNA migrates through the agarose matrix according to
its length
 Library of insert DNA molecules ~300 bp long
 Genomic insert DNA accounts for ~200 bp of each molecule (“book”) in the library
o KC: library was constructed from the DNA prepared from 1000s of cells → each base in
genome will be present in multiple inserts
Polymerase Chain Reaction: PCR
o PCR is used to amplify library DNA
o Strand separation for DNA is termed denaturing the DNA
 Done by heating the sample to 95 celsius
o Single-stranded DNA “primers” ANNEAL to template DNA strands by complementary
base pairing to linker sequences
 Anneal: recombine (DNA) in the double-stranded form following separation by
Original → Denature the DNA → Anneal primers → DNA polymerase +
deoxyribonucleotide triphosphates (dNTPs) → 2 copies!!
 EX: 1 original → 2 copies → denature/anneal primers/DNA synthesis → 4 copies
 2 PCR cycles amplify the DNA four-fold (2^2 = 4)
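The doubling arithmetic above can be sketched in a few lines of Python. This is an idealized model that assumes every template is copied in every cycle (100% efficiency); the function name `pcr_copies` is made up for illustration.

```python
def pcr_copies(initial_copies: int, cycles: int) -> int:
    """Idealized PCR: every cycle doubles the number of DNA copies."""
    if initial_copies < 0 or cycles < 0:
        raise ValueError("copies and cycles must be non-negative")
    return initial_copies * 2 ** cycles


# 1 original molecule after 2 cycles gives 2^2 = 4 copies
print(pcr_copies(1, 2))
```

Real reactions fall short of perfect doubling, so the result is best read as an upper bound.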
Phase 2: Sequencing and Imaging
o “Massive Parallel Sequencing”
How much DNA is in a human diploid genome?
 6.14 billion base pairs (bp) ≈ 12 Gbases
Massive Parallel Sequencing
o A: P7 primers
o B: denatured library DNAs are annealed to the primer on the solid support
o C: Bridge amplification by fancy PCR and final denaturing step
Sequencing Reaction:
o 1. Sequencing primers added and annealed
o 2. DNA polymerase; fluorescence tagged 3’-blocked dNTPs
 Fluorescence Labeled Nucleotides
o 3. Wash/laser light/photograph (imaging)
o 4. Remove fluorophore and blocking group; generates 3’OH (sequencing reaction)
Repetitive Steps
 Data collection good for 75-100 cycles
 Each cycle:
 Add fluorescent dNTP and DNA polymerase
 Wash/laser and photograph
 Remove fluorophore and 3’ blocking group
From Data Set to Sequence
o Raw data from 1 cycle (example slide 22)
Phase 3: Alignment and Assembly
o Assembly of a Sequence Ensemble
 DNA sequence reads from different inserts
 Sequence alignment
 “Coverage” is the number of times each base appears in the ensemble
of insert DNA sequences
 Determined DNA sequence
Ensemble from Massive Parallel Sequencing
o “Deep sequencing” requires >7 fold coverage of the sequence
 Anzick-1 sequence had an average 14.4 fold coverage
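A rough sketch of how coverage could be tallied from aligned reads. The read representation (a 0-based start position plus the read sequence) and the names `per_base_coverage` and `mean_coverage` are assumptions for illustration, not the pipeline used in the Anzick-1 study.

```python
def per_base_coverage(reads, reference_length):
    """Count how many reads overlap each reference position.

    reads: iterable of (start, sequence) pairs, 0-based start on the reference.
    """
    depth = [0] * reference_length
    for start, sequence in reads:
        end = min(start + len(sequence), reference_length)
        for position in range(max(start, 0), end):
            depth[position] += 1
    return depth


def mean_coverage(depth):
    """Average fold coverage across all positions."""
    return sum(depth) / len(depth) if depth else 0.0


# Toy example: three overlapping reads on a 10-base reference
toy_reads = [(0, "ACGTAC"), (2, "GTACGT"), (4, "ACGTAC")]
depth = per_base_coverage(toy_reads, 10)
print(depth, mean_coverage(depth))
```

In this toy ensemble the middle positions are covered three times while the ends are covered once, which is why average coverage is usually reported alongside the per-base depth.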
Individual Variation
o Alignment of an individual’s EDN1 gene with the reference sequence from the Human
Genome Diversity Project
o Single Nucleotide Polymorphisms (SNP) are normal variations encountered within a
 Some affect the human phenotype, but many are silent
 SNPs set individuals apart from one another within a species
Simplified Genetic Models
o Human Genome Diversity Project
Genetic Affinity of Anzick-1
o Highest in Latin America/South America
Lecture 43 Replication “DNA Synthesis” (part I)
KC: Prior to division a cell must duplicate its entire genomic DNA in a process called replication.
o Each daughter cell inherits a complete complement of the genetic information.
“Central Dogma” of Molecular Biology
o DNA → (transcription) RNA → (translation) Protein; replication copies DNA → DNA
In eukaryotes, DNA is packaged into the nucleus
o Humans have “diploid” genomes with 2 complete sets of genes in every cell
Reaction Mechanism of DNA Polymerases
o KC: DNA polymerase catalyzes the extension of a DNA strand ONLY in the 5’→3’ direction
o Requirements for DNA synthesis
 DNA template strand w/ a primer (RNA or DNA) annealed to it
 Primer must have a 3’-OH
 Reaction substrates are deoxyribonucleotide triphosphates (dNTP)
 For DNA synthesis:
 (dNMP)n + dNTP → (dNMP)n+1 + PPi
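The requirements above (template, primer with a free 3’-OH, dNTP substrates) can be illustrated with a small sketch of primer extension. The template is written 3’→5’ so that each template position lines up with the same position of the growing strand; the function name and the DNA-primer simplification are assumptions for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extend_primer(template_3_to_5: str, primer_5_to_3: str):
    """Extend a primer along a template, adding one dNMP per step at the 3' end.

    The template is written 3'->5' so position i pairs with position i of the
    growing strand (written 5'->3'). An RNA primer would contain U; this sketch
    assumes a DNA primer. Returns (new_strand, ppi_released).
    """
    strand = primer_5_to_3
    ppi_released = 0
    for base in template_3_to_5[len(primer_5_to_3):]:
        strand += COMPLEMENT[base]   # (dNMP)n + dNTP -> (dNMP)n+1 + PPi
        ppi_released += 1
    return strand, ppi_released


print(extend_primer("TACGGATC", "ATG"))
```

Each added nucleotide releases one PPi, so the second value returned equals the number of bases added to the primer.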
Structure of DNA Polymerase
o Catalytic site for DNA synthesis is located in the “palm” of the enzyme
What happens to the Parental DNA?
o 3 possible outcomes of DNA synthesis
 Conservative replication yields an original intact DNA molecule (parent) and one
entirely newly synthesized DNA
 Semiconservative replication yields 2 DNA molecules, each with 1 parental
strand and 1 newly synthesized strand
 Non-conservative replication yields 2 newly synthesized DNA molecules and
destruction of the parental strands
Meselson and Stahl Experiment
o KC: DNA synthesis is semiconservative; product DNA contains 1 parental strand and 1
newly synthesized daughter strand
 E. coli grown in 15N medium → DNA prepared → density gradient centrifugation
 E. coli grown in 15N medium → shifted to 14N medium → DNA prepared →
density gradient centrifugation
Replication Fork
o Leading strand DNA synthesis occurs continuously in the 5’→3’ direction
o Lagging strand synthesis occurs discontinuously in fragments as a series of 5’→3’
 Products = Okazaki fragments
Bidirectional DNA Synthesis
o DNA synthesis begins at an origin of replication
o Replication forks proceed in both directions, hence “bidirectional synthesis”
Replication of a Circular DNA Molecule
KC: bi-directional, semiconservative replication yields 2 DNA molecules with identical
nucleotide sequences
o DNA polymerase cannot form a bond between the loose ends where the replication
forks meet
 The “nick” is repaired by DNA ligase
Bacterial Genomes and Plasmids are Circular DNAs
o The over-winding of DNA results in positive supercoils
o The under-winding of helix yields negative supercoils
Effect of Replication on the Double Helix
o Helicase activity at the replication fork separates the 2 strands
 Torsional stress from unwinding DNA at the replication fork results in overwinding of the DNA ahead of the fork causing positive supercoiling
 Both topoisomerase type I and type II enzymes are capable of relieving negative
and positive supercoils in DNA
 “Gyrase” is a topoisomerase that introduces negative supercoils ahead of a
replication fork
o Mechanism of a topoisomerase type I is a single-stranded break, pass the intact strand
through the break, and seal the nick
o Mechanism of a topoisomerase type II is a double-stranded break, pass an intact
segment of DNA through the break, and then rejoin broken strands
Prokaryotic DNA Polymerases
o DNA Pol III carries out genomic replication and DNA repair
 NOTE: high polymerization rate, high processivity, and proofreading
o DNA Pol I has roles in replication, recombination, and DNA repair
 NOTE: 3’→5’ proofreading and 5’→3’ exonuclease activities
o DNA Pol II, IV, and V function in DNA repair
Architecture of Prokaryotic DNA Pol III
o DnaB helicase is not considered part of DNA pol III, but is required for strand separation
o Clamp loader!
 Scaffold for DNA pol III complex
 Assembles beta clamp onto DNA
o Core polymerase catalyzes DNA synthesis
o Beta sliding clamp maintains contact with the DNA
Beta sliding clamp on DNA
o Beta sliding clamp encircles DNA providing the means for high processivity/rapid DNA
 A prokaryotic beta clamp has 2 identical subunits → beta clamp opens and closes
like a lock washer
Replication Initiation Occurs at oriC
o KC: replication occurs only once per cell cycle; initiation is the regulated step in control
of DNA synthesis
 Origin of replication “oriC” is a unique 245 bp sequence
 R1-5 and I1-3 are binding sites for the DnaA protein
IHF/FIS are binding sites for proteins called replication initiation factors.
o HU is a required DNA bending protein.
DNA unwinding element (DUE) is an AT-rich segment where strand
separation occurs.
Replication Initiation
o Binding of DnaA proteins to the R and I sites stresses the DNA helix causing localized
strand separation at the DUE
o DnaC loads a DnaB helicase at both ends of the bubble
 Binding of DnaB helicase commits the cell to replication and division.
o After release of DnaC, DNA pol III’s are assembled on to both DnaB helicases.
o DnaA is then released from the DNA.
o DNA polymerase III requires a primed DNA template
o Primase is a DNA template-dependent, primer-independent RNA polymerase.
o Primase synthesizes a <9nt RNA
Replication Elongation
o DNA synthesized as a continuous polynucleotide chain by one core polymerase
generating the leading strand.
o The lagging strand DNA is synthesized as a series of Okazaki fragments
o Primase functions at the replication fork just as core polymerase nears completion of an
Okazaki fragment.
 Primase begins synthesis of a new RNA primer at either CTG or CAG.
o Single strand binding protein “SSB” protects gaps in the DNA double helix.
o A new β clamp is loaded onto the lagging strand at each new RNA primer.
o DNA pol III-dependent synthesis of an Okazaki fragment is complete when the enzyme
reaches the previous primer.
o Lagging strand core polymerase pauses and then releases its β clamp.
o Lagging strand core polymerase is then transferred to a newly loaded β clamp.
 old clamp is abandoned
o Lagging strand core polymerase initiates synthesis of the next Okazaki fragment.
o β clamp loader acquires a new clamp and opens it in preparation for loading onto the
next primer
Completing the Lagging Strand:
o DNA Pol III pauses at an Okazaki primer, then abandons its β sliding clamp and transfers
to a newly loaded clamp.
o DNA Pol I enters and uses its 5’ →3’ exonuclease to remove the primer.
o As RNA is being digested DNA pol I synthesizes DNA.
o DNA ligase repairs the nick linking the Okazaki fragments into a single DNA strand.
Decatenation of the Chromosomes
o Following completion of replication the chromosomes are in a “catenated” state.
o Topoisomerase (type II) separates the two circular DNAs. Identical chromosomes are
sorted into each of the daughter cells
Lecture 44: Replication (part II)
Genetic Fidelity During Replication
Key concept: An error in replication can introduce a mutation into the genome. The
mutation will be permanent and inherited by subsequent daughter cells.
3 mechanisms to avoid mistakes during replication:
o Presynthetic error control demands correct base pairing.
o Proofreading removes mismatched bases.
o Mismatch repair examines newly synthesized DNA and removes mismatched bases
after passage of a replication fork.
Presynthetic Error Control
o Geometry of Watson and Crick base pairs allows them to fit into the catalytic site of
DNA polymerase.
Mis-paired base pairs are excluded
Base Pairs Containing Tautomers
o Tautomeric base pairs result from chemical rearrangements of the bonds within a base
Tautomeric shifts occur spontaneously
Tautomeric bases can participate in forming alternative base pairs
Isomerization of the tautomeric base to a normal base results in a mismatched
base pair (i.e. T=G – enol form slide 4).
The geometry of some of these base pairs is compatible with the catalytic sites
of DNA polymerases.
o High-fidelity DNA polymerases have two active sites.
1) The catalytic site for DNA synthesis.
2) A 3’ → 5’ exonuclease site for removing mis-incorporated nucleotides
DNA pol I and III are high-fidelity enzymes.
Other errors in replication occur
Eukaryotic DNA Polymerases
o Key concept: Although the details vary, the overall process of replication in eukaryotes
is very similar to replication in prokaryotes.
o Eukaryotic cells
Cells have many DNA polymerases (~15) with specialized functions
“Replisome” carries out genome replication
Other DNA polymerases are involved in a variety of functions, such as DNA
DNA synthesis is semi-conservative and bidirectional
The mechanism is both template- and primer-dependent
DNA synthesis is always in the 5’ →3’ direction
o continuous leading strand synthesis
Replisome has both DNA Pol δ and DNA Pol ε
Primase is in a complex with DNA Pol α
Replication initiation is very different
Many of the proteins involved in eukaryotic replication are similar to those
found in bacteria, but the nomenclature is different.
Mechanistic differences
The rate of DNA synthesis is slower (max. ~50 nt/sec)
Okazaki fragments are smaller (100-200 nt)
o In humans, ~50 million are synthesized during one round of
Completion and joining of Okazaki fragments is different
Replication termination is different in the details
Eukaryotic chromosomes have multiple origins of replication
o Coordinate regulation of origins requires “licensing”
Replication elongation is similar but not identical
Most DNA polymerases have both presynthetic error control and
proofreading activity.
Differences Unique to Eukaryotes
o Eukaryotic chromosomes are linear and can be much longer than bacterial DNA
o Study of the architecture of genomic DNA polymerase “replisome” is ongoing
discontinuous lagging strand
need to address convergence of replication forks
Replication of telomeres is unique to eukaryotes.
Eukaryotic Replisome: a Work in Progress
o Familiar proteins with new names:
DNA pol ε is core polymerase for the leading strand (epsilon = leading)
DNA pol δ is core polymerase for the lagging strand (del = lagging)
MCM = helicase
DNA Pol α-primase is a complex that contains primase for RNA primer synthesis
and a separate DNA synthesis activity
RFC = clamp loader
PCNA = clamp
RPA = single-strand DNA binding protein (SSB)
DNA ligase
FEN1 = endonuclease
Eukaryotic Clamp: “PCNA”
o Proliferating Cell Nuclear Antigen (PCNA) has three identical subunits.
Multiple Origins in Eukaryotes
Humans have 46 chromosomes, so there must be at least 46 coordinately regulated
origins of replication
Eukaryotic: Origin of Replication
o Key concept: Origins of replication on all chromosomes must be activated once -- and
only once – during S phase of the cell cycle.
o Each of our 46 chromosomes contains 1000s of origins of replication
o On average origins appear to be ~25000 bp apart
o Locations of origins may vary from one cell type to another
o Generally an origin of replication will be an AT-rich element
o Generally origins are associated with genes.
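As a rough consistency check on the numbers above, dividing the diploid genome size quoted in Lecture 42 (6.14 billion bp) by the average origin spacing gives an estimate of the number of origins. The even-spacing assumption and the names used here are simplifications for illustration only.

```python
DIPLOID_GENOME_BP = 6.14e9      # diploid human genome size quoted in Lecture 42
ORIGIN_SPACING_BP = 25_000      # average distance between origins
CHROMOSOMES = 46


def estimate_origins(genome_bp: float, spacing_bp: float) -> float:
    """Estimate origin count, assuming origins are evenly spaced."""
    return genome_bp / spacing_bp


total = estimate_origins(DIPLOID_GENOME_BP, ORIGIN_SPACING_BP)
per_chromosome = total / CHROMOSOMES
print(round(total), round(per_chromosome))
```

The estimate comes out at a few thousand origins per chromosome on average, which agrees with the "1000s of origins" figure, although real chromosomes differ widely in length.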
Licensing Coordinate Replication Initiation
o Origin of replication complexes (ORC) bind tightly to DNA in early G1 phase
o During mid-G1, cell division cycle 6 (CDC6) joins the ORC, followed by Cdt1
o Mini Chromosome Maintenance (MCM) joins the ORC in late G1
o In S phase, replication is initiated by phosphorylation of complex proteins by a cell
cycle-dependent kinase (cdk) and another protein kinase (DDK)
o replisome is assembled and bidirectional DNA synthesis initiated.
Lagging Strand Synthesis
o Replication elongation is essentially the same as seen in prokaryotes
o An important difference is that the replisome uses 2 different core enzymes
DNA Pol ε synthesizes the leading strand
DNA Pol δ makes the lagging strand
big difference is in completion of the lagging strand!
DNA Pol δ is a “strand-displacing” DNA polymerase
FEN1 is flap endonuclease-1 that clips off the overhang
DNA ligase repairs the nick left in the sugar-phosphate backbone to
complete the joining of two Okazaki fragments.
Replication Termination
o Two replication forks approach each other from opposite directions along a eukaryotic
o DNA becomes heavily supercoiled
o Supercoiling stalls both replication forks
o Type II topoisomerases ease the supercoiling
o DNA Pol α-primase functions as a DNA polymerase to synthesize DNA across the gap
o DNA ligase repairs nicks in both strands completing the DNA
o Type II Topoisomerase resolves the final structure, thus separating the chromosomes.
Telomere: the End of a Chromosome
o Telomeres are specialized “T loop” DNA structures found at both ends of a eukaryotic
Telomeres function to protect the chromosome from cellular exonucleases and DNA
repair enzymes
Telomeres consist of 10,000 bp of a short repeat sequence and have several specific
proteins bound to them
The repeat in human telomeres is TTAGGG.
Telomerase is a Special Reverse Transcriptase
o Telomerase has an internal template RNA with 1½ copies of the CA repeat
Template RNA anneals to the existing TG sequence at a telomere
Telomerase catalyzes 5’→3’ DNA synthesis of the TG strand
Following each round of TG synthesis, telomerase shifts so that the internal
template RNA anneals to newly synthesized DNA
The process is repeated many times
Complementary CA strand of a telomere is synthesized by DNA Pol α-primase
acting as a DNA polymerase
Following removal of the primer, the overhanging 3’ end base pairs to the CA
strand forming a T-loop
Lecture 45: Prokaryotic Transcription and Gene Control
DNA template-dependent synthesis of RNA
Catalyzed by RNA Polymerase
Begins at a Promoter
Ends at a Terminator
Cis- and Trans-acting Factors
Key concept: Trans-acting factors are diffusible, so they can function at multiple sites in a
genome. Usually, trans-acting factors are DNA binding proteins (i.e. transcription factors), but
some non-coding RNAs are also trans-acting factors. Cis-acting elements are closely tied to the
gene. Typically, a cis-acting element is a DNA sequence.
Ribonucleic Acid (RNA)
RNA is a linear single-stranded polynucleotide chain with:
o a sequence always read 5’ →3’
o a ribose-phosphate backbone
o uracil (U) in place of thymine (T)
o intramolecular base pairing that yields complex secondary and tertiary structures.
 Note: Double-stranded RNA and DNA-RNA hybrids have structures similar to A-DNA.
Glossary of RNAs in a Prokaryotic Cell
Messenger RNA (mRNA) houses a sequence of bases that encodes primary AA sequence for a
o mRNA serves as the template for translation by a ribosome
Transfer RNA (tRNA) carries an amino acid into the catalytic site of a ribosome.
o tRNA base pairs to mRNA to ensure selection of the correct AA for incorporation into a
nascent polypeptide chain.
Ribosomal RNAs (rRNA) are structural components of a ribosome, the enzyme that catalyzes
Terms, Jargon, and Gene Coordinates
Transcription starts at a promoter and ends at a terminator
Finished RNA molecule = primary transcript.
The transcription start site of a gene is always +1.
o Key concept: In bacteria, primary transcript is used as an mRNA without further
RNA Transcript Resembles the Coding Strand
The DNA template strand is said to be the “reverse complement” of the DNA coding strand and the RNA
primary transcript.
The Primary Transcript Sequence is Similar to the DNA Coding Strand!
Similar sequence, except replace thymine with uracil
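The relationship between the coding strand, the template strand, and the transcript can be sketched as follows. The example sequence and function names are made up for illustration; all strands are written 5’→3’.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna_5_to_3: str) -> str:
    """Return the reverse complement of a DNA strand, written 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna_5_to_3))


def transcript_from_coding(coding_5_to_3: str) -> str:
    """RNA has the coding-strand sequence with U in place of T."""
    return coding_5_to_3.replace("T", "U")


coding = "ATGGCATTC"                     # made-up coding strand, 5'->3'
template = reverse_complement(coding)    # template strand, 5'->3'
rna = transcript_from_coding(coding)
print(template, rna)
```

Taking the reverse complement of the template returns the coding strand again, which is why the transcript copied from the template matches the coding strand apart from U for T.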
Prokaryotic Promoters
Key concept: A promoter is a cis-acting element in the genome where RNA polymerase binds to initiate
Bacterial promoter sequences are located on the coding strand:
o the -35 Region
o the -10 Region
o spacer between the -35 and -10 Regions
o most transcripts (RNAs) start with a purine (+1) often within the sequence CAT.
The bacterial consensus promoter: TTGACA-N -TATAAT-N -CAT
o Note: Promoters have an orientation. As diagrammed here, the red is template strand
and the blue is coding strand
Prokaryotic Genes
Key concept: A gene is a unit of heredity.
o some function in the organism
o Most genes encode the info needed to make a protein
o gene includes the DNA encoding the protein and the regulatory elements needed for
its transcription.
Open reading frame (ORF) = sequence of bases that encodes the primary sequence of a protein.
Organization of Bacterial Genomes
Operons are coordinately regulated gene clusters
o ORFs encoding proteins are arranged 5’ → 3’ in a transcription unit
o Use of one promoter and terminator yields a polycistronic mRNA with multiple ORFs
each encoding a different protein.
Constitutive Promoter Strength Depends on Sequence
Key concept: Similarity to the consensus sequence = major determinant of the rate of
transcription initiation from a constitutive promoter
o “Strong promoters” = very high sequence identity with promoter consensus sequence
o “Weak promoters” = several base differences
o mutation in a promoter that moves away from the consensus sequence decreases the
rate of transcription initiation.
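One crude way to picture "similarity to the consensus" is to count matching positions in the -35 and -10 hexamers. This is only a sketch: the scoring scheme, the function names, and the second example promoter are assumptions, and spacer length is ignored.

```python
CONSENSUS_35 = "TTGACA"
CONSENSUS_10 = "TATAAT"


def matches(sequence: str, consensus: str) -> int:
    """Count positions where a hexamer agrees with the consensus."""
    if len(sequence) != len(consensus):
        raise ValueError("sequence and consensus must be the same length")
    return sum(a == b for a, b in zip(sequence.upper(), consensus))


def consensus_score(minus35: str, minus10: str) -> int:
    """Matches to the -35 and -10 consensus hexamers (maximum 12)."""
    return matches(minus35, CONSENSUS_35) + matches(minus10, CONSENSUS_10)


print(consensus_score("TTGACA", "TATAAT"))   # perfect consensus
print(consensus_score("TTTACA", "TATGTT"))   # hypothetical weaker promoter
```

A lower score corresponds to more base differences from the consensus, which the notes associate with a weaker constitutive promoter.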
RNA Polymerase
E. coli has only one RNA polymerase (465 kD), and it catalyzes transcription only in the 5’→3’
o catalytic mechanism is similar to a DNA polymerase
RNA polymerase holoenzyme is responsible for transcription initiation and synthesis of first 10
nucleotides of an RNA chain. Holoenzyme consists of the α2ββ’ωσ subunits:
o Subunit σ recognizes a promoter
o Subunits α2 (40 kD) are essential for enzyme assembly and are involved in interaction
with activators
o Subunits β/β’ form the catalytic core
o Subunit ω provides structural stability.
RNA polymerase core enzyme (α2ββ’ω) carries out transcription elongation.
E. coli has Seven Sigma (σ) Subunits
σ70 holoenzyme binds to most of the promoters in the E. coli genome
o Other σ factors function on promoters for genes with highly specialized functions.
Promoters recognized by the alternative σ have different consensus sequences.
Transcription Initiation
RNA polymerase is DNA template-dependent but primer-independent.
The substrates are ribonucleotide triphosphates (NTPs)
o Holoenzyme binds to a promoter forming the “closed complex”
o Unwinding of DNA 12-15 bp forms a transcription bubble converting a closed complex
to an “open complex”
o Holoenzyme initiates RNA synthesis and synthesizes about 10 nt. -- Rate of synthesis
for those first 10 nt is ~1 nt/sec.
Transcription Elongation
Dissociation of σ factor yields core enzyme allowing RNA polymerase to complete “promoter
clearance”
o NusA protein replaces σ
Transcription elongation rate by core enzyme accelerates to ~ 50-90 bases/sec
o rate of transcription elongation can be slowed by formation of RNA secondary structure
in the transcript
Transcription termination results in release of RNA and dissociation of core enzyme from DNA.
Mechanism of RNA Polymerase
RNA polymerases use essentially the same mechanism as DNA polymerases
RNA polymerases do not have proofreading exonuclease activity (fixing the ends), so the error
rate is about 10^-4 to 10^-5
Core Enzyme in Transcription Elongation
transcription bubble is 12-15 bases
DNA-RNA hybrid extends ~8 bp
Topoisomerases relieve supercoiling
o “nascent RNA” near RNA exit
Rho-Dependent Termination
Key concept: Termination of transcription requires Rho (ρ) protein
o ρ is a hexameric protein that binds a nascent RNA chain at a rut site
o ρ is an RNA helicase that translocates along the RNA 5’ →3’
o RNA polymerase pauses in response to formation of a secondary structure
o Transcription terminates when ρ makes contact with RNA polymerase.
Key concept: termination signal resides in the nascent RNA chain
o Termination occurs due to formation of a stable hairpin structure followed by a series
of 7 U’s. -- “rho-independent terminator.” …AAGGGCCCAUUAGGGCCCUUUUUUU
o Hairpin formation results in only a few weak U=A base pairs between transcript RNA
and DNA. DNA/RNA is unstable, so the RNA dissociates terminating transcription
 No protein factor is needed!
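The two features of a rho-independent terminator (a hairpin stem followed by a run of U’s) can be checked on the example sequence from the notes. The arm boundaries used in the slices below are an assumption read off that example, and the helper names are made up for illustration.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def can_form_stem(left_arm: str, right_arm: str) -> bool:
    """True if the two arms are reverse complements and can base pair."""
    paired = "".join(RNA_COMPLEMENT[base] for base in reversed(right_arm))
    return left_arm == paired


def trailing_u_run(rna: str) -> int:
    """Length of the run of U's at the 3' end of the transcript."""
    return len(rna) - len(rna.rstrip("U"))


transcript = "AAGGGCCCAUUAGGGCCCUUUUUUU"   # example sequence from the notes
# Assumed layout: 2-base lead, 6-base arm, 4-base loop, 6-base arm, U run
left_arm, loop, right_arm = transcript[2:8], transcript[8:12], transcript[12:18]
print(left_arm, loop, right_arm)
print(can_form_stem(left_arm, right_arm), trailing_u_run(transcript))
```

The GC-rich arms pair with each other to form the stable hairpin, leaving only the weak U=A pairs holding the transcript to the template.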
DNA Double Helix
Key Concept: Gene expression is dependent on sequence-specific binding of proteins
(transcription factors) to DNA.
Exposed Chemical Groups in Base Pairs
Key Concept: Each base pair presents a unique set of chemical groups in the major groove
o CG and AT pairs can be distinguished in the minor groove.
Sequence-Specific DNA Binding Proteins
Key concept: The molecular basis of sequence-specific binding by a DNA binding protein is
chemical bonding between AAs in a recognition helix and the base pairs.
o α-helix of DNA binding protein is positioned within the major groove
o “recognition helix” participates in H bonds & van der Waals interactions w/ base pairs
o In prokaryotes (i.e. bacteria), predominant DNA binding domain motif for positioning a
recognition helix is called the helix-turn-helix
o Formation of stable DNA-Protein complex is dependent on additional non-covalent
chemical bonds outside the recognition helix-base pair contacts
Examples of DNA-Protein Interactions
T=A with Glutamine/Asparagine (TAGA)
C=G with Arginine (CGA)
Helix-Turn-Helix Motif
helix-turn-helix motif is defined by 2 α helices
2nd helix = recognition helix
DNA binding domain of bacterial transcription factor → helix-turn-helix motif
o found in some eukaryotic DNA binding proteins
Note that binding of a protein to DNA results in “induced bend” in the axis of the DNA
Dimeric DNA Binding Proteins
Homodimeric DNA binding proteins are common
o Recognition helix exists on both subunits
 Thus, sequence is a palindrome!
 Same sequence of bases is seen 5’→3’ along both strands
Why dimers and oligomers?
o improved specificity & stability
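A palindromic binding site in this sense is one that equals its own reverse complement, so the same bases are read 5’→3’ on both strands. A minimal check, with the function name and test sequences chosen only for illustration:

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def is_palindromic(site: str) -> bool:
    """True if the site reads the same 5'->3' on both strands."""
    site = site.upper()
    reverse_complement = "".join(COMPLEMENT[b] for b in reversed(site))
    return site == reverse_complement


print(is_palindromic("GAATTC"), is_palindromic("GATTAC"))
```

A homodimer presents two identical recognition helices in opposite orientations, which is why its preferred site has this symmetry.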
Inducible Promoters
Key Concept: The primary reason for gene regulation in bacteria is to respond to changes in the
environment, such as nutrient availability. Regulation of the rate of transcription initiation is the
most important step determining whether a gene is expressed.
Regulated genes/operons → equipped w/ inducible promoter! (can be turned on/off)
Activators are transcription factors that increase rate of transcription from a promoter
Repressors are transcription factors that decrease rate of transcription from a promoter.
In bacteria, both activators and repressors have helix-turn-helix DNA binding domains.
Positive Regulation
Activator requires ligand “inducer” to bind DNA
Ligand prevents activator from binding to DNA
Activators recruit RNA polymerase to a promoter.
Negative Regulation
Repressor requires ligand to bind DNA
Ligand “inducer” prevents repressor from binding to DNA.
Repressors inhibit transcription by RNA polymerase!
Regulatory Elements
Key concept: Bacterial promoters = inherently in active state available for transcription → most
prokaryotic gene regulation occurs as a result of repressor binding to a site proximal to an
inducible promoter
Repressor binds to an operator either within the promoter or downstream of the promoter
o Repressor bound atop promoter sequence blocks RNA polymerase from binding to promoter
o Repressor bound downstream inhibits promoter clearance!
 Only want repressor bound in beginning (upstream) to block transcription
Key concept: Some prokaryotic gene regulation results from binding of an activator to a site
proximal to an inducible promoter.
o Activator binds to positive regulatory element located upstream of a promoter
o Activator recruits RNA polymerase to a weak promoter.
Structural Genes of the lac Operon
All 3 genes in the operon function in lactose metabolism
Glucose = preferred carbon source for E. coli → consumed ahead of lactose
o In the absence of glucose, adenylate cyclase produces cAMP.
β-galactosidase catalyzes two reactions:
o Lactose (12C) → galactose (6C) + glucose (6C)
o Lactose (12C) → allolactose (12C)
Regulation in the lac Operon
Lac operon has 3 genes: lacZ, lacY, lacA (ZYA)
Negative regulation
o lacI gene is located upstream of the lac operon and encodes the lac repressor
 lac repressor binds the operator in the absence of allolactose (or IPTG)
 operator site is located at the transcription start site
Positive regulation
o cAMP receptor protein (CRP) binds DNA in the presence of its inducer, cAMP
o CRP binding site is located upstream of lac promoter
lac Repressor is a Tetramer
4 identical subunits: a “homotetramer” forms a “dimer of dimers”
o All 4 subunits have helix-turn-helix motifs for sequence-specific binding to 2 separate
palindromic operator sites
o Inducer binding pocket is located b/t globular domains far from DNA binding domain
lac Operon has Three Operator Sites
O1 at lac promoter is the highest affinity binding site
both O2/O3 are low affinity sites because of differences from consensus lac repressor
recognition sequence
o lac repressor binds to O1 & either O2/O3
Occupancy of O1 site is responsible for repressor activity
Mutation destroying O1 results in a loss of regulation of the lac promoter.
o NO O1 = NO REG
CRP binds as a Dimer
CRP = activator; its inducer cAMP is present only under low gluc. conditions
o Homodimeric helix-turn-helix protein
o CRP-cAMP binds 5’ to the promoter
o lac promoter is “weak” because it differs from the consensus promoter
o CRP recruits RNA polymerase to the lac promoter
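The combined negative (lac repressor) and positive (CRP-cAMP) control described above can be summarized as a small truth table. This is a qualitative sketch under the simplified model in these notes; the function name and output labels are made up for illustration.

```python
def lac_operon_state(glucose_present: bool, lactose_present: bool) -> str:
    """Qualitative lac operon output under the simplified model in the notes."""
    repressor_on_operator = not lactose_present   # no allolactose -> repressor bound
    crp_camp_bound = not glucose_present          # cAMP made only when glucose is low

    if repressor_on_operator:
        return "repressed"
    if crp_camp_bound:
        return "high (activated)"
    return "low (weak promoter, no CRP)"


for glucose in (True, False):
    for lactose in (True, False):
        print(glucose, lactose, lac_operon_state(glucose, lactose))
```

Only the combination of lactose present and glucose absent gives both an empty operator and CRP-cAMP recruitment of RNA polymerase.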
Lecture 46: Eukaryotic Genes, Transcription and Gene Control (part 1)
Key concept: In complex multicellular organisms, each cell type expresses a unique set of genes
from an identical DNA sequence in the genome
o gene regulation defines the properties of each cell!
Genome size
o In humans, DNA contains over 3 billion bp (haploid)
 Number of chromosomes = 46 (diploid)
 Approx. # of genes = 29,000 (haploid)
o In E. coli, DNA contains over 4 million bp
 Number of chromosomes = 1
 Approx. # of genes = 4435
DNA in Eukaryotes
o Chromosome structure & DNA content changes during cell cycle
 Cell division yields 2 diploid daughter cells w/ identical DNA content
 G1 = diploid
 S = DNA replication
 G2 = tetraploid (G2*2 = 4 (tetra))
The Human Genome
o Key concept: A gene is a unit of heredity
 Includes DNA encoding a functional RNA/protein along w/ regulatory elements
controlling expression
 Every gene has a specific location in the genome
 EDN-1 gene is at 6p24.
Eukaryotic Chromatin
Key concept: Chromatin = chromosomal material in a cell
o consists of DNA and the proteins bound to the DNA. The most abundant proteins are
the histones.
DNA in eukaryotic nucleus is packaged into higher order structures
o very little naked DNA
o 10 nm fiber
o 30 nm fiber
o organized as loops of chromatin from the nuclear scaffold
Chromatin in Interphase Nucleus
Heterochromatin = dark staining matter in a nucleus (“Diff” → Dark)
o Level of staining reflects condensed chromatin structure
o DNA contains fully inactivated genes & many types of repetitive DNA
Euchromatin = light staining material (“true” → light)
o Level of staining suggests a more open chromatin structure
o Genes in euchromatin are available for transcription
DNA-Histone Interaction
Key concept: nucleosome = basic unit of chromatin.
o Nucleosomes have a repeating unit of ~ 200 bp
o “beads on a string” refers to the 10nm fiber
 147 bp of DNA wraps nearly twice (~1.65 turns) around each histone core
 ~ 50 bp spacer DNA connects nucleosomes.
DNA-histone core contacts are sequence-independent
o Electrostatic interactions & H-bonds occur b/t the positively charged histone proteins
and the negatively charged sugar-phosphate backbone of DNA
Structure of the Histone Core
Histone core contains 2 copies each of histones H2A, H2B, H3 and H4
o Note: “histone tails”
H1 locks DNA to the nucleosome
The 30nm Fiber
30 nm fiber = first level of org. for higher order chromatin
o 2 chromatin fibers coiled around one another!
Organization of Mammalian Genomes
Key concept: Humans have two copies each of our ~29,000 genes
o About 21,000 genes encode proteins
 Only 1.5% of human DNA encodes protein (i.e. exons)
o Most of the rest (~8000) encode functional non-coding RNAs
 tRNA, rRNA, snRNA, miRNA, lncRNA etc.
o Non-expressed DNA includes: introns, repetitive sequences, and transposons
“Gene Expression”
Key concept: Eukaryotes final level of expression of func’nl protein is regulated at many levels
o Transcription
o RNA processing
o mRNA turnover
o translation
o posttranslational modification
o cellular trafficking
o protein turnover
Glossary of RNAs in a Eukaryotic Cell
Messenger RNA (mRNA) encodes AA sequence for a protein
o Primary transcript of a gene undergoes RNA processing to generate mature mRNA (can
be used by ribosome)
Transfer RNA (tRNA) delivers AA to ribosome
Ribosomal RNAs (rRNA) = ribosomal components
Small nuclear RNA (snRNA) = components of the spliceosome -- enzyme that catalyzes intron
removal during RNA processing
MicroRNA (miRNA)/small interfering RNA (siRNA) act on mature mRNAs to decrease translation
Small nucleolar RNAs (snoRNA) = small RNA molecules that guide posttranscriptional base
modifications in tRNAs, rRNAs and snRNAs.
Long non-coding RNA (lncRNA) = RNA molecules that do not encode protein
o Some lncRNAs influence gene expression
Eukaryotic Cells: Three RNA Polymerases
RNA polymerase I synthesizes ribosomal RNA (rRNA)
RNA polymerase II makes mRNA and some small RNAs
o RNA polymerase II “RNA pol II” = large multi-subunit enzyme
 Catalytic mechanism is the same
 RNA Pol II does not identify promoters -- must be recruited by transcription
factors (more specifically activators)
RNA polymerase III generates tRNA and other small RNAs.
RNA Polymerases
subunits homologous [to prokaryotic core enzyme are present in eukaryotic RNA polymerases]
o C-terminal domain (CTD)
L-47: Eukaryotic Gene Transcription (part 2)
Key concept: The combination of genetic and epigenetic effects determine whether a gene is
transcribed in each cell
o “Epigenetic” marks influence chromatin structure.
o “Genetic” effects are associated with the sequence of bases in a gene’s cis-acting elements
These sequence elements serve as binding sites for transcription factors that
regulate the gene
Eukaryotic Gene Organization
Key concept: Eukaryotic genes stand alone in single transcription units.
o Each gene has its own promoter(s) + terminator(s).
o Genes are organized with intron-exon structure
Exons code for sequence that will be included in a mature mRNA
Intron sequences will be eliminated during RNA processing
Primary transcript includes both introns + exon sequences
Primary Transcripts
Key concept: Introns are common in mammalian genes
o Primary transcripts of mammalian genes most often contain more intron sequence than
exon sequence.
 More intron = more useless junk!
Only 3% of yeast genes have an intron -- most only one
92% of human genes have an intron
o Exon lengths average ~170 bp
o Intron lengths vary greatly
Most are 100-5000 bp
Maybe 10% are >11,000 bp
longest known human intron is ~1.1 Mbp, in a gene coding for a K channel on
chromosome 4.
Overview of Eukaryotic Gene Expression
Key concept: Activation of a gene requires both open chromatin structure and binding of
transcription factors (activators) that recruit RNA polymerase II to the promoter.
Eukaryotic Chromatin
Key concept: Chromatin structure is a product of the combined actions of epigenetic marks,
such as histone modifications and DNA methylation, and trans-acting protein factors that bind
to epigenetic marks or to DNA.
Condensed chromatin is “closed” and viewed as transcriptionally silent
inaccessible to most activators and chromatin remodeling complexes
accessible only by “pioneering” transcriptional factors.
Active transcription occurs in regions of “open” chromatin in the 10 nm fiber conformation.
Modifications of the Histone Tails
Histone tails provide areas for interaction b/t nucleosomes
o Histone tails include the N-termini of histones H3 and H4, plus both the N and C termini
of H2A and H2B
o Posttranslational modifications of the histone tails regulate chromatin structure
 Hypermethylation is associated with closed chromatin (meth → closed)
 Histone acetylation is found in areas of open chromatin (his → open)
Histone Modifying Enzymes
Histone acetyltransferase (HAT) acetylates lys in the histone tails
Histone deacetylase (HDAC) removes acetyl groups from the histone tails
Histone methyltransferase (HMT) methylates lys and arg in the histone tails
Actions of HATs & HDACs oppose each other → acetylation of histone tails is readily reversible.
o DNA methylation more stable!
Histone Code
Key concept: Many different [known] posttranslational modifications of the histone tails
o Some associated w/ closed and others with open chromatin
o Histone may have several modifications
Transcribed gene vs. silenced gene
Histone Acetylation Favors DNA Accessibility!
Histone acetyltransferase (HAT) is recruited to a gene locus by transcription factor and then
acetylates surrounding histones.
Acetylation Alters Nucleosome Interactions
Acetylation neutralizes positive charge on lysines in histone tails. Loss of the charge:
o reduces electrostatic attraction b/t nucleosomes (repels)
o weakens electrostatic interactions w/ DNA
o favors recruitment of a chromatin remodeling complex!
Transition to Open Chromatin
Chromatin remodeling complexes such as SWI/SNF manipulate nucleosomes
o SWI/SNF is recruited to local sites in the genome
 SWI/SNF either binds to acetylated histone or is recruited to a gene by protein-protein interactions with a pioneering transcription factor
Chromatin remodeling complexes alter chromatin structure by:
 unwrapping DNA from nucleosomes
 repositioning nucleosomes
 evicting nucleosomes
Chromatin remodeling “spreads” to surrounding areas of the genome changing accessibility of
the DNA for activator proteins
Active Promoter Free of Nucleosomes
Key concept: combined actions of HATs + chromatin remodeling complexes (e.g. SWI/SNF)
together lead to open chromatin structure
transcription start site is devoid of nucleosomes to make room for assembly of the transcription
preinitiation complex
HATs bind near promoters and act to maintain open chromatin structure
Cis-acting Elements in Mammalian Genes
Key concept: Mammalian genes have many cis-acting elements that provide binding sites for
transcription factors
o Some are promoter proximal elements, others called enhancers are at distant locations
along the chromosome.
 transcription start site  within promoter region
 Promoter proximal elements often located upstream
o specific transcription factors bind to these sequences
 Enhancers located at promoter distal positions, cover ~200-500 bp, and contain several
different transcription factor binding sites
 human genome has 21000 protein coding genes, and perhaps as many as one million enhancers
Eukaryotic Sequence Specific DNA Binding Motifs
Key concept: Transcription factors = sequence specific DNA binding proteins
o These proteins typically have a modular design with a DNA binding domain (DBD) and an
activation domain (AD)
DBD  designed to position recognition helix into major groove forming chemical bonds w/
base pairs. ~ 80% of eukaryotic sequence-specific DNA binding proteins have one of the
following motifs:
o helix-turn-helix
o similar motif called homeodomain
o one of two zinc finger motifs:
 classical zinc finger
 nuclear receptor zinc finger
o one of two extended dimerization domain motifs:
 leucine zipper “bZip” proteins
 helix-loop-helix proteins!
Homeodomain Proteins
Homeodomain motif  named for a group of proteins important for developmental processes
(“homeotic” proteins)
o Motif is superficially similar to helix-turn-helix
o homeodomain has three α helices
 3rd helix = recognition helix
Classical Zinc-Finger Motif
Consensus sequence of classical Zn fingers
o … C-X(5)-C-X(3)-(F/Y)-X(5)-L-X(2-3)-H-X(3-4)-H …
Major structural features are:
o 2 β strands and an α helix
o Zn coordinated by 2 cys & 2 his
o conserved phe/tyr & leu form a “strut” positioning the recognition helix.
Zn fingers can be used to recognize many different DNA sequences.
o Recognition helix AAs interact w/ base pairs
Multiple Zn Fingers in One Protein
Some Zn finger proteins have single Zn finger motif & bind as dimers/larger oligomers
Other proteins can have many Zn fingers arranged in tandem
o Each recognition helix makes sequence-specific bonds w/ base pairs
 these proteins bind as monomers
Zinc Fingers
Key concept: The recognition helix AAs facing the DNA major groove can be different for each Zn
finger. Therefore, each Zn finger recognizes a different base sequence
 Non-palindromic sequence
Nuclear Receptor Zn Finger
Nuclear receptor transcription factors bind DNA as dimers
o Some are homodimers, others are heterodimers
o Both subunits contribute a recognition helix
o Each subunit has two Zn fingers:
 Only first Zn finger has recognition helix
 second is a structural feature supporting recognition helix
 Four cys coordinate each Zn in both Zn fingers
Dimerization Domains and DNA Binding
Leucine zipper proteins have a series of leucines aligned along an α helix
o leucines participate in protein-protein interactions for dimerization
o recognition helices are extensions of leucine-containing helices.
Helix-loop-helix proteins also have a dimerization domain
o extensions of the α helices form the recognition helices
L-48: Eukaryotic Gene Transcription (part 3)
Key concept: Capacity for fine control of the level of gene expression in a cell is central to multicellular
organisms w/ complex genomes.
Overview of Eukaryotic Gene Expression
Key concept: Once chromatin structure is open, transcription factors (activators) bind
specifically to DNA and recruit RNA polymerase II
o RNA pol II cannot locate promoter sequences
No activators  no transcription
Activation of Transcription
Key concept: In eukaryotes, most gene regulation is positive, because activators are
required to recruit RNA pol II to a promoter. Before initiation of transcription, a very
large protein complex must be assembled on cis-acting regulatory elements of the gene.
Assembly is a multistep process requiring:
 binding of transcription factors (activators) to promoter proximal sites and enhancers
 recruitment of coactivator proteins
o Include other transcription factors, HAT, and chromatin remodeling complexes
 recruitment of mediator
 assembly of transcription pre-initiation complex
Eukaryotic Activators
Key concept: Activators are transcription factors that exert positive gene regulation. In
eukaryotes, they have a modular design with a sequence-specific DBD (Zn finger), flexible hinge,
and an activation domain (AD). AD = area of protein-protein interaction w/ other proteins
needed to transcribe a gene.
o These include:
Transcription factor
chromatin remodeling complex
preinitiation complex
Activator DBD (e.g. zinc finger) binds to a specific DNA sequence. Activator binding may
occur at either a promoter proximal site or in an enhancer (distal)
Gene Activation by Glucocorticoid Receptor
Transcription factor “GR” = nuclear receptor acting as an activator for many mammalian genes.
Nuclear receptors have characteristic DBD composed of 2 zinc fingers from each dimer subunit
o GR binds to a hormone response element (HRE)
o HREs can be located at promoter proximal sites or in enhancers.
In this example, GR recruits a coactivator (Hic-5) that binds to its AD
o The GR-coactivator (Hic-5) complex subsequently recruits:
coactivator with HAT activity (p300-CBP)
mediator (MED1)
indirectly the preinitiation complex and RNA polymerase II.
Note that the only sequence‐specific binding is through GR ‐DBD!
o AD (activation domain) + DBD (DNA binding domain) = SSB (sequence-specific binding)
Transcription Factors also bind Enhancers
Key concept: Enhancers contain recognition sequences for several different transcription
factors. Together these transcription factors account for enhancer function.
For example, several transcription factor binding sites are clustered in the human interferon β
gene enhancer. These are bound by:
o the interferon regulatory factors IRF3 and IRF7
o the common transcription factors Jun/ATF2 and p50/p65 (NF κB).
Enhancer activity depends on DNA bending by high mobility group (HMG) proteins.
Key concept: Although some transcription factors directly interact with the preinitiation
complex, mediator provides primary means of communication b/t activators & preinitiation
complex (i.e. indirect interaction w/ preinitiation complex)
Mediator = large protein complex with >30 subunits
o Many mediator subunits make protein-protein contact w/ transcription factor ADs
o Mediator also makes protein-protein contacts w/ general transcription factors of
preinitiation complex.
Transcription Preinitiation Complex
TFIID binds at promoter (dp)
TFIIA may join the complex
TFIIB binds to DNA / TBP
TFIIF/RNA pol II joins complex by binding to TFIIB
TFIIE / TFIIH enter complex in succession
TFIIH is a complex with 2 distinct functions:
o DNA helicase  generate transcription bubble (heli-bubble!)
o Protein kinase  phosphorylates RNA pol II CTD to initiate transcription
TFIID finds Promoters
TFIID is a complex made up of TBP + many TAF proteins
o TATA Binding Protein (TBP) locates and binds to TATA boxes in eukaryotic promoters.
 TATA boxes are present in only about 10% of promoters
 TBP is an example of a minor groove DNA binding protein
o 13 TBP-associated factors (TAFs) function in vivo in recognition of other sequence
elements, such as the Inr and DPE sites.
HO Gene Activation
Order of events leading to HO gene activation in yeast:
o Pioneering transcription factor SWI5 is an activator that binds to an upstream enhancer
(-1200 to -1400).
o SWI5 recruits chromatin remodeling complex SWI/SNF to open the chromatin exposing
the histone tails
o GCN5 complex (HAT) enters to acetylate histones continuing the process of chromatin
o SBF activator binds at several sites in the HO gene 5’ promoter proximal regulatory region
o SBF recruits mediator to interact with the preinitiation complex
o Preinitiation complex including RNA Pol II is assembled at nucleosome-free HO gene
promoter leading to transcription initiation.
Of Mice and Men
Genome wide studies in humans show single nucleotide polymorphism (SNP) at -355 kbp in
human kit ligand (KITLG) locus
o Chromosomal abnormality (inversion) upstream of the murine Kitl gene
Altered LEF Binding Site Changes Kitl Expression (blondes)
A →G SNP inhibits binding of activator LEF to enhancer
o Kitl gene expression is reduced enough to make visible difference in coat color (blondes)
Combinatorial Control
Key concept: level of transcription of a gene depends on occupancy of its cis-acting regulatory
elements by transcription factors.
o For transcription, binding of several activators to a gene is required
o Coordinate action b/t factors = combinatorial control
Note: eukaryotic genes typically have 6+ regulatory sites.
Negative Gene Regulation in Eukaryotes
Although eukaryotic gene expression is largely positive, negative gene regulation is also
important. Repressors are transcription factors that inhibit transcription by:
o Competitive binding to activator binding site = displacing the activator
o binding to activator = prevent interaction w/ mediator
o altering assembly of preinitiation complex
o providing docking site for HDAC
Repression by Chromatin Modification
mechanism for gene transcription down-reg. often involves modulation of chromatin structure
o repressor has a DBD (i.e. Zn finger) that binds to negative regulatory element in gene
o repression domain “RD” recruits histone deacetylase (HDAC) to remove acetyl groups
from the histone tails.
H4K16 in Chromatin Structure
HDAC targets H4K16ac catalyzing deacetylation to H4K16
o Positive charge at H4K16 favors conversion of 10 nm →30 nm fiber
 electrostatic interactions occur b/t H4K16 + acidic AAs on histones H2A and H2B
of the adjacent nucleosome
The Combination of Regulatory Proteins Dictates Gene Expression
weak activator + strong activator + strong repressor = activator neutralized by corepressor
DNA Methylation: Another Epigenetic Mechanism Affecting Transcription
Key concept: Hypermethylation of a CpG island silences promoter. Cytosine in the sequence CG
called a “CpG” is methylated by DNA methyltransferase (DNMT) at many sites within the
o Deacetylation or Methylation = prevent transcription
o Some promoter proximal regions of genes have a cluster of CG base pairs [“CpG island”]
 CpG island is a >200 base pair GC-rich element in which observed:expected of
CpG > 60%.
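A rough sketch of the CpG island criteria above (length > 200 bp, observed:expected CpG > 0.6). The expected-CpG formula used here, (number of C × number of G) / length, is one commonly used definition and is an assumption; the lecture does not say which formula applies. Function names are made up for illustration.

```python
def cpg_stats(seq):
    """Return (length, GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    observed = seq.count("CG")  # CpG dinucleotides; "CG" cannot overlap itself
    gc_fraction = (c + g) / n if n else 0.0
    expected = (c * g) / n if n else 0.0  # assumed formula, see note above
    ratio = observed / expected if expected else 0.0
    return n, gc_fraction, ratio


def looks_like_cpg_island(seq, min_len=200, min_ratio=0.6):
    """Apply only the two thresholds stated in the notes."""
    n, _gc, ratio = cpg_stats(seq)
    return n > min_len and ratio > min_ratio


print(looks_like_cpg_island("CCGG" * 100))  # True: 400 bp, ratio 1.0
print(looks_like_cpg_island("AT" * 150))    # False: no CpG at all
```

This only scores a sequence that is handed to it; finding islands within a chromosome would need a sliding window, which is not shown.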
DNA Methylation
Methyl group of 5-methyl cytosine protrudes into major groove
o Binding of some transcription factors is blocked by methylation, others are indifferent to
it, and some require methylation to bind.
o Methylation affects factor binding!
Gene Silencing in Heterochromatin
H3K9me3 is associated with condensed heterochromatin.
Key concept: Condensation to heterochromatin silences genes. Condensation depends on
actions of HMTs + chromatin associated proteins
o Histone methyltransferase (HMT) methylates H3K9 →H3K9me3
o H3K9me3 = docking site for heterochromatin protein (HP1)
o HP1:HP1 protein-protein interactions compact the chromatin
o HP1 recruits more HMT “spreading” the heterochromatin
Transcriptional Control of Pax6 Gene
Human Pax6 gene has many regulatory features typically found in regulated mammalian gene
o 3 promoters allow for development-dependent regulation, and defects in Pax6 are
associated with aniridia (i.e. iris development defects)
o Multiple enhancers direct expression from promoters in eye, brain, spine and pancreas
 Different complements of transcription factors bind to these enhancers to
activate tissue-specific transcription.
Promoter Proximal Factors that Regulate EDN1
Key concept: same gene may be subject to very different regulatory mechanisms in different cell types
o stimulus for expression in one cell may not affect expression in another cell/organ.
Different cell types vary in their responses to various signals
o Each cell type has a different complement of receptors
o Binding a signaling molecule to its receptor activates second messenger systems
o As a result, transcription factors bind to response elements activating the gene (or not).
L-49: Post-transcriptional RNA Processing
Makings of an RNA!
Key Concept: In eukaryotes, transcription yields primary RNA transcript (pre-mRNA) that is an
RNA copy of DNA coding strand. This primary transcript must undergo multiple levels of RNA
processing to generate a translation-ready mature RNA transcript (mRNA)
Components of a mature RNA transcript (mRNA)
Key Concept: A mature mRNA has 5’ & 3’ untranslated region (UTR) flanking the protein coding
sequence. This encodes the primary AA sequence for the proteins. The 5’ end has a “cap” and
the 3’ end has a “poly-A tail.”
pre-mRNA undergoes multiple RNA processing steps in the nucleus to produce a mature RNA
molecule (mRNA) ready for translation
mRNA is exported to the cytoplasm for translation
Generating a mature mRNA requires several steps
Key Concept: RNA pol II does not distinguish between coding (exons) and non-coding (introns)
DNA sequences
o primary RNA transcript (pre-mRNA) contains both!
o Need to differentiate b/t coding and non-coding sequences
RNA processing occurs co-transcriptionally in “transcription factories”
Key Concept: Transcription occurs at discrete sites within the nucleus termed “transcription factories”
o RNA-processing occurs on-site, co-transcriptionally
The CTD of RNA Pol II serves as a docking site for RNA processing enzymes
YSPTSPS repeat is an unstructured segment in the C-terminal domain
o serines are phosphorylated by protein kinases
Phosphorylation of RNA Pol II CTD is necessary for:
o promoter escape
o docking sites for RNA processing enzymes including:
 capping enzyme
 cap binding complex
 cleavage/termination complexes
 RNA splicing factors
5’ Cap Assembly
5’ cap is assembled on the 5’ end of the mRNA molecules by “capping enzyme.”
o cap provides protection from 5’ →3’ exonucleases
o cap is also important for translation initiation
Transcription Termination and Poly(A) tail
signal for termination is AAUAAA sequence located in nascent RNA
o RNA pol II extends transcript through and beyond the termination sequence.
CTD bound termination factors recognize the cleavage sequence and bind to the RNA
o Termination occurs with cleavage of completed primary transcript (pre-mRNA) and RNA
pol II dissociates. This occurs after binding of poly-A polymerase
3’ end undergoes polyadenylation by “poly-A polymerase.”
o “poly-A tail” is 100-250 nucleotides long
o Creation of the “poly-A tail” is template-independent
o “poly-A tail” protects RNA from degradation & serves as binding site for proteins that
facilitate RNA turnover / translation
Genes in higher organisms undergo splicing to remove introns
Key Concept: Most mRNAs in mammals undergo splicing to remove introns and join exons
o Exon sequences in the mRNA are in exactly the order in which they are organized 5’ to 3’ along the gene.
Components of the Spliceosome
Key Concept: Spliceosome is the nuclear complex responsible for removing intron sequences
and ligation of exon sequences 5’ → 3’ to generate mRNA (“mature”)
>100 different proteins and RNA molecules function in splicing. Many are contained in the small
nuclear ribonucleoprotein (snRNP) complexes that make up the bulk of spliceosome
o Each snRNP complex contains ~10 proteins & unique snRNA molecule
o snRNPs are named for their respective snRNA molecule
 Each snRNA has unique 100-200nt sequence
Components of the Spliceosome
Each snRNP has a unique function:
o U1 snRNP binds to the 5’ splice site
o U2 snRNP binds to the branch site and aligns it for the 1st splicing reaction
o U4 snRNP binds to and sequesters U6 snRNP
o U5 snRNP aligns the pre-mRNA for the 2nd splicing reaction
o U6 snRNP promotes catalysis of splicing reaction
Splice sites are identified by consensus sequences
Key Concept: GU/AG rule governs RNA splicing
o 5’ end of an intron is GU (GU5). 5’ splice site is immediately upstream
o 3’ end of an intron is AG (AG3). 3’ splice site is immediately downstream
o Within intron is the branch point A. branch point = 20–50bp upstream of 3’ splice site
o Pyrimidine-rich region is between the branch point and 3’ splice site.
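A small sketch of the GU/AG rule above: given an intron written 5’→3’, check that it starts with GU and ends with AG, and list any A residues that sit 20–50 nt upstream of the 3’ splice site as candidate branch points. The function name and the toy intron are invented for illustration; real branch point selection depends on U2 snRNA base pairing, which this does not model.

```python
def check_intron(intron):
    """Check an intron (written 5'->3', RNA or DNA letters) against the GU/AG rule."""
    seq = intron.upper().replace("T", "U")
    n = len(seq)
    # A base at 0-based index i lies (n - i) nt upstream of the 3' splice site,
    # so indices n-50 .. n-20 cover the 20-50 nt window from the notes.
    branch_candidates = [
        i for i in range(max(0, n - 50), n - 19) if seq[i] == "A"
    ]
    return {
        "starts_with_GU": seq.startswith("GU"),
        "ends_with_AG": seq.endswith("AG"),
        "branch_candidates": branch_candidates,
    }


# Toy intron: GU ... branch A ... pyrimidine-rich tract ... AG
toy = "GU" + "C" * 10 + "A" + "U" * 28 + "AG"
print(check_intron(toy))
# {'starts_with_GU': True, 'ends_with_AG': True, 'branch_candidates': [12]}
```

The A in the terminal AG is not reported because it is only 2 nt from the 3’ splice site, outside the 20–50 nt window.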
Splicing Mechanism
first step of splicing begins with defining the intron
o U1 snRNP is recruited to 5’ splice site
o U2 snRNP is recruited to branch point A nucleotide
snRNA in the snRNP base pairs to the pre-mRNA
U1 snRNA anneals to 5’ splice site
U2 snRNA anneals to branch site such that A nucleotide bulges out
o exposes 2’ OH for 1st transesterification reaction
Splicing Mechanism
U1 snRNP is recruited to the 5’ splice site
U2 snRNP is recruited to branch point A nucleotide
U5 snRNP and U4 snRNP/U6 snRNP bind to the complex to complete spliceosome assembly
Dynamic rearrangement of the spliceosome initiates splicing activity
o Both U1 snRNP + U4 snRNP exit the complex
o Spliceosome → now in catalytically active conformation
Splicing: Catalytic Mechanism
In catalytic conformation U2, U5, and U6 snRNAs base pair to each other aligning the branch site
with the 5’ splice site for the 1st transesterification reaction
U2, U5, and U6 snRNAs rearrange, aligning the 5’ and 3’ splice sites for the 2nd
transesterification reaction
Intron product released as “lariat” structure
Exons joined by formation of splice junction
Two-step transesterification
1st transesterification:
o branch point A nucleotide attacks G nucleotide of 5’ splice site
2nd transesterification:
o 3’ end of 5’ exon attacks G nucleotide of 3’ splice site
o leads to formation of splice junction (exons) & release of intron lariat (introns)
Alternative Splicing: Expanding the coding capacity of the genome
Key Concept: Alternative splicing = mechanism for developmental or tissue-specific production
of differing mRNAs from a single gene.
o Refers to programmed inclusion/exclusion of exons in different tissues at different stages
of development
o ~92% of all genes are alternatively spliced
o ~1/3 of all hereditary diseases are thought to have a splicing component
o Alternative splicing expands the coding capacity of the genome
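To make "expands the coding capacity" concrete, here is an idealized sketch that enumerates every mRNA that could arise if each optional ("cassette") exon is independently included or skipped, with exon order always kept 5’→3’. The exon names and the independence assumption are illustrative only; real splicing choices are tissue-specific and not all combinations occur.

```python
from itertools import product


def cassette_isoforms(exons, optional):
    """List exon combinations when each exon in `optional` may be skipped.

    exons: exon names in 5'->3' gene order.
    optional: set of exon names that can be included or excluded.
    """
    # Constitutive exons have one choice; cassette exons have two (keep or skip).
    choices = [(e, None) if e in optional else (e,) for e in exons]
    return [
        [e for e in combo if e is not None]
        for combo in product(*choices)
    ]


isoforms = cassette_isoforms(["E1", "E2", "E3", "E4"], {"E2", "E3"})
for iso in isoforms:
    print("-".join(iso))
print(len(isoforms))  # 4 possible mRNAs from one gene with 2 cassette exons
```

With n independent cassette exons the count is 2^n, which is why a modest number of genes can encode a much larger set of proteins.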
Splicing factors bind to the pre-mRNA to define alternative splicing events
Key Concept: Splicing factors bind to key sequences in pre-mRNA to drive splice site
recognition by the spliceosome and define alternative splicing choices. Due to different
complements of these proteins in different tissues and cell types, exons recognized in one
cell may not be used in another
Several key sequences in pre-mRNA that splicing factors bind to:
 ESE: Exonic Splicing Enhancer
 ISE: Intronic Splicing Enhancer
 ESS: Exonic Splicing Silencer
 ISS: Intronic Splicing Silencer
SR/hnRNP proteins = splicing factors that bind to these sequences & drive splice site usage
Differing Alternative Splicing Mechanisms
Key Concept: Many different alt. splicing mech’s that expand coding capacity of the genome
o Alt. splicing not only differs between tissue types in a single organism, but also varies
between different species
SMA: Molecular Mechanism
Humans have 2 versions of the SMN gene (SMN1 and SMN2)
C → T mutation in SMN2 leads to inefficient inclusion of exon 7 (mutation disrupts a key ESE)
FDA approves landmark treatment for SMA using antisense oligonucleotides (ASOs)
ASO binds to the SMN2 pre-mRNA and blocks a key ISS bound by a hnRNP protein; leads to
inclusion of exon 7 and restoration of functional SMN protein
mRNA turnover
Once RNA degradation has begun, most of mRNA molecule is reduced to nucleotides by a large,
multi-enzyme complex in the cytoplasm called the exosome.
L-50: Translation “Protein Synthesis” (part 1)
Key concept: coding sequence of mRNA carries info. needed for primary structure of 1 protein
o protein synthesis = translation
o occurs on enzyme complex = ribosome
Principles of the Genetic Code
Key concept: codon in mRNA = series of 3 bases specifying one AA in a protein
o Code is read 5’→3’, and the protein is synthesized from the amino- (N-) to the carboxyl- (C-) terminus
Coding sequence (cistron) or open reading frame (ORF) in an mRNA is a continuous series of codons
o In bio, coding sequence begins w/ start codon (AUG) and ends at stop codon.
o One protein encoded in the coding sequence in mRNA
o Genetic code is non-overlapping.
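A minimal sketch of reading a coding sequence as described above: scan for the first AUG, then read non-overlapping triplets 5’→3’ until a stop codon. The codon table is the standard genetic code written in U, C, A, G order; the function name and the example mRNA are invented for illustration. Real start-site selection (RBS in prokaryotes, cap scanning in eukaryotes) is more involved than "first AUG".

```python
from itertools import product

# Standard genetic code in U, C, A, G order; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_first_orf(mrna):
    """Translate from the first AUG to the first in-frame stop codon."""
    mrna = mrna.upper().replace("T", "U")
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for i in range(start, len(mrna) - 2, 3):  # non-overlapping triplets
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate_first_orf("GGAUGCAUUUCUAAGG"))  # MHF  (AUG CAU UUC UAA)
```

One-letter amino acid codes are used for output (M = met, H = his, F = phe). Characters other than A, C, G, U/T raise a KeyError in this sketch.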
The Universal Genetic Code
Key concept: genetic code is the same in prokaryotic & eukaryotic organisms
o Genetic code has 64 possible codons
 ex. CAU encodes histidine
o Code is “degenerate” → some AAs are encoded by more than one codon
 leu, arg and ser have 6 codons each
 met and trp only 1
 all others 2-4
o AUG = initiation codon
o 3 stop codons (UAA UAG and UGA) -- do not code for an amino acid
o “wobble base” = 3rd base in a codon.
 advantage of the wobble base = <61 tRNAs needed to cover genetic code
 For example, Ile-tRNA has the anticodon IAU, so it base pairs to all three
isoleucine codons AUA, AUU and AUC.
Mutations Affecting the Genetic Code
Key Concept: Mutations occur in DNA, but can affect the sequence of a protein
Single-base substitutions:
o nonsense mutation introduces a stop codon. (i.e. GGA →UGA)
 “stop” the nonsense!
o missense mutation replaces one AA codon with another. (i.e. GGA →GUA)
 missense  misplace  replace!
o silent mutation changes DNA sequence w/o altering encoded protein. (i.e. GGA →GGU)
 “silent killer” (no alteration)!
Frameshift Mutations
Frameshifts = caused by insertion/deletion w/i coding sequence, but change isn’t multiple of 3 bases
o frameshift often leads to an early stop codon truncating a protein
Mutations that affect RNA splicing often generate frameshifts
In-frame deletion of UUC (phe) codon in CFTR = most frequent mutation associated w/ CF
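A short sketch tying together the substitution classes (nonsense, missense, silent) and frameshift vs. in-frame deletions from the notes. The codon table is the standard genetic code; the helper names and the 15-nt example mRNA are invented for illustration. A stop codon mutated to a sense codon is not covered in the notes and would be labeled "missense" by this simple classifier.

```python
from itertools import product

# Standard genetic code in U, C, A, G order; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon_before, codon_after):
    """Label a single-codon change as silent, nonsense, or missense."""
    aa_before = CODON_TABLE[codon_before]
    aa_after = CODON_TABLE[codon_after]
    if aa_before == aa_after:
        return "silent"
    if aa_after == "*":
        return "nonsense"
    return "missense"


def translate(mrna):
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def delete_bases(mrna, start, count):
    """Return mrna with `count` bases removed beginning at 0-based `start`."""
    return mrna[:start] + mrna[start + count:]


# The three substitution examples used in the notes
print(classify_substitution("GGA", "UGA"))  # nonsense
print(classify_substitution("GGA", "GUA"))  # missense
print(classify_substitution("GGA", "GGU"))  # silent

normal = "AUGCUGACUGGCUAA"                      # AUG CUG ACU GGC UAA
print(translate(normal))                        # MLTG
print(translate(delete_bases(normal, 3, 1)))    # M    (frameshift creates early UGA)
print(translate(delete_bases(normal, 3, 3)))    # MTG  (in-frame loss of one codon)
```

Deleting one base shifts every downstream codon and here produces an immediate stop, while deleting a whole codon removes one residue and leaves the rest of the protein intact, as with the in-frame phe deletion in CFTR.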
Transfer RNA (tRNA) is an Adaptor Molecule
tRNAs deliver amino acids to the ribosome. tRNAs:
o 75-93 nt long
o form a “cloverleaf” secondary structure
o have many modified bases
o have AAs bound at the 3’ end.
Primary determinants for AA selection are bases #1 and #2 of the mRNA codon
 tRNA anticodon base pairs to codon in antiparallel orientation.
Wobble Bases
Base pairing between tRNA-mRNA:
o Only Watson-Crick base pairs occur between codon bases #1 and #2 and anticodon
bases #2 and #3.
o Non-canonical base pairs are allowed at wobble bases (codon #3 to anticodon #1)
o anticodon U pairs with codon A or G
o anticodon G pairs with codon C or U
o anticodon inosinate (I) pairs with codon A, U or C
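A sketch of the wobble rules listed above: given an anticodon written 5’→3’, list the codons it can read. Anticodon bases #2 and #3 use strict Watson-Crick pairing; anticodon base #1 uses the wobble table. The entries for C and A in the wobble position are not in the notes and are assumed here to pair only with their Watson-Crick partners.

```python
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Anticodon base #1 (5' end) -> allowed codon base #3
WOBBLE = {
    "U": {"A", "G"},
    "G": {"C", "U"},
    "I": {"A", "U", "C"},
    "C": {"G"},  # assumption: Watson-Crick only
    "A": {"U"},  # assumption: Watson-Crick only
}


def codons_read_by(anticodon):
    """Return the set of codons (5'->3') an anticodon (5'->3') can pair with."""
    a1, a2, a3 = anticodon.upper()
    # Pairing is antiparallel: anticodon #3 pairs codon #1, #2 pairs #2,
    # and anticodon #1 (wobble) pairs codon #3.
    codon_1 = WATSON_CRICK[a3]
    codon_2 = WATSON_CRICK[a2]
    return {codon_1 + codon_2 + codon_3 for codon_3 in WOBBLE[a1]}


print(sorted(codons_read_by("IAU")))  # ['AUA', 'AUC', 'AUU']  (all three ile codons)
print(sorted(codons_read_by("CAU")))  # ['AUG']
```

The IAU result matches the isoleucine example in the notes: one tRNA covers three codons, which is why fewer than 61 tRNAs are needed.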
Tertiary Structure of tRNA
Base pairing between the D and TΨC arms generates a relatively rigid twisted L conformation of tRNA
L shape → essential for positioning of AA into catalytic site of a ribosome.
Aminoacyl-tRNA Synthetases
Each amino acid requires its own aminoacyl-tRNA synthetase.
o Reaction: amino acid + tRNA + ATP → aminoacyl-tRNA + AMP + PPi (requires Mg2+)
“The Second Genetic Code”
Key concept: Aminoacyl-tRNA synthetases read variant sequence elements to select the correct
tRNA to be charged with a particular AA
Locations of determinants on tRNAs recognized by amino acyl-tRNA synthetases are not fixed
o anticodons contribute to AA selection
o Other sites also contribute
Prokaryotes have a single type of ribosome -- the 70S ribosome
Eukaryotic cells have at least 2 types
o 80S ribosomes located in the cytoplasm and endoplasmic reticulum
Polysomes = multiple ribosomes bound to a single mRNA. Each ribosome within a polysome
synthesizes a copy of the same protein
o provides for very efficient utilization of an mRNA
Organization of Bacterial Genomes
Operons are coordinately regulated gene clusters
o Open reading frames arranged 5’ → 3’ in a transcription unit
o Use of one promoter and terminator yields a polycistronic mRNA
 atp operon encodes nine proteins
Ribosome meets mRNA Molecule
Shine-Dalgarno / ribosome binding sequence (RBS) = site on a mRNA where ribosomes bind to
initiate translation
o RBS = 8-13 nt purine-rich element in mRNA
o Start codon = AUG encoding N-formylmethionine (fMet)
 Key concept: Although fMet is always the first AA in a prokaryotic protein, the
second and each subsequent amino acids can be any one of the 20
L-51: Translation “Protein Synthesis” (part 2)
Prokaryotic 70S Ribosome
Ribosomes have three tRNA binding sites:
o A site for aminoacyl tRNA
o P site for peptidyl tRNA
o E (exit) site for empty (uncharged) tRNA
Prokaryotic mRNA Binds to 30S Subunit
Key concept: Initiation Factors (IF) act on the 30S ribosomal subunit
o IF-1 blocks premature tRNA binding at the A site
o IF-3 blocks premature binding of 50S subunit
o mRNA binds to the 30S subunit by base pairing between the 16S rRNA and the RBS
o As a result, the AUG start codon is positioned at the P-site
o IF-2-GTP binds charged initiator tRNA (fMet-tRNA) and escorts it into the P-site
 mRNA start codon base pairs with the fMet-tRNA anticodon
Completion of the Translation Initiation Complex
50S subunit binds forming the intact 70S ribosome
o complex is now a complete translation initiation complex:
 A site is vacant
 P site has fMet-tRNAfMet
 E site is vacant
Aminoacyl-tRNAs bind to the A Site for Elongation
Key concept: Elongation factors (EFs) act on the 70S ribosome. EF-Tu-GTP binds an aminoacyl-tRNA in the cytoplasm and delivers this charged tRNA to the ribosomal A site.
o Base pairing occurs between the aminoacyl-tRNA anticodon and the second codon in the mRNA.
o EF-Tu-GTP hydrolyzes GTP → GDP + Pi and exits.
o A site has an aminoacyl-tRNA
o P site contains fMet-tRNAfMet
o E site remains vacant
o EF-Tu-GDP is recycled to EF-Tu-GTP by EF-Ts.
First Peptide Bond
Key concept: ribosome is a “peptidyl transferase”.
o tRNA positions the second residue of the chain into the catalytic site.
o The free amino group on the aminoacyl-tRNA in the A site attacks fMet-tRNA in the P site.
o fMet is transferred to nascent chain in the A site forming the first peptide bond.
o Uncharged tRNA is left in the P site.
Ribosome Translocation
The EF-G-GTP “translocase” moves the mRNA through the cleft in the ribosome.
o EF-G-GTP binds near the A site.
o GTP → GDP results in shifting the ribosome one codon along the mRNA.
o A site is again vacant.
o P site contains dipeptidyl-tRNA.
o E site has an uncharged tRNA that will soon dissociate
Key concept: Process is repeated for each AA in the nascent protein.
o EF-Tu brings the next aminoacyl-tRNA into the A site.
o A new peptide bond is formed.
o EF-G translocase repositions the next codon into the A site.
o Translation elongation occurs at 20 residues/sec.
Termination at a Stop Codon
Key concept: Translation is terminated when ribosome reaches a stop codon (UAA, UAG, UGA).
At a stop codon, the ribosome pauses with an empty A site: no charged tRNA recognizes a stop codon
Release Factor (RF) binds in the A site. -- RF-1 recognizes UAG or UAA. -- RF-2 recognizes UAA or
UGA. -- Release factors activate peptidyl-transferase, hydrolyzing the peptidyl-tRNA bond and
terminating translation. -- Ribosome recycling factor (RRF) and EF-G dissociate the complex. -- IF-3 rebinds to 30S subunit.
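A minimal sketch of reading an open reading frame codon by codon until a stop codon, then reporting which release factor(s) recognize it. The release factor assignments come from the notes above. The codon table is deliberately partial (five standard codons chosen for illustration), and the function name is hypothetical.

```python
# Stop codons and the release factors that recognize them (from the notes).
RELEASE_FACTORS = {
    "UAG": ["RF-1"],
    "UAA": ["RF-1", "RF-2"],
    "UGA": ["RF-2"],
}

# Partial codon table, for illustration only (standard genetic code).
CODON_TABLE = {
    "AUG": "Met",
    "UUU": "Phe",
    "GGC": "Gly",
    "GCU": "Ala",
    "AAA": "Lys",
}


def translate_orf(orf):
    """Translate an ORF that begins at its AUG start codon.

    Returns (peptide, stop_codon, release_factors).
    """
    if not orf.startswith("AUG"):
        raise ValueError("ORF must begin with the AUG start codon")
    peptide = []
    for i in range(0, len(orf) - 2, 3):
        codon = orf[i:i + 3]
        if codon in RELEASE_FACTORS:
            # No tRNA reads a stop codon; a release factor binds instead.
            return peptide, codon, RELEASE_FACTORS[codon]
        # In bacteria the initiator residue is formyl-methionine.
        residue = "fMet" if i == 0 else CODON_TABLE[codon]
        peptide.append(residue)
    raise ValueError("no in-frame stop codon found")
```

For example, translate_orf("AUGUUUGGCUAA") reads AUG, UUU, GGC and then stops at UAA, which either RF-1 or RF-2 can recognize.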
Antibiotics Target Translation
Puromycin is a very effective translation inhibitor
Eukaryotic Translation is in the Cytoplasm
Key concept: Eukaryotic translation is very similar to bacterial translation, but important
differences exist
o mRNAs synthesized & RNA processed in the nucleus
o mRNA transferred to the cytoplasm -- may be intercepted by miRNA-RISC
o 80S ribosome carries out translation
RNA Interference (RNAi)
Key Concept: RNAi is a group of mechanisms in eukaryotes involving small ncRNAs that reduce
expression of specific genes. MicroRNA (miRNA) & small interfering RNA (siRNA) act on mature mRNA
in the cytoplasm
miRNA regulates both whether an mRNA can be translated and its stability
siRNA regulates mRNA levels by direct endonuclease cleavage
miRNA Synthesis
pri-miRNA stem-loop structure is excised by the Drosha-DGCR8 complex. -- Pre-miRNA is
exported to cytoplasm. -- Pre-miRNA undergoes a second cleavage by Dicer (22-25 base
pairs). -- The single-stranded miRNA / siRNA is then loaded into the RNA Induced
Silencing Complex (RISC).
miRNA-RISC Action
miRNA/RISC represses translation and increases turnover of target mRNAs. -- A miRNA
has imperfect base pairing to the target mRNA 3’ UTR
siRNA/RISC cleaves a target mRNA using an endonuclease activity. -- siRNA cleavage
requires perfect base pairing between the siRNA and the target mRNA coding sequence.
[Figure: complementary pairing leads to mRNA cleavage; partially complementary pairing leads to translational repression]
Target mRNA Recognition
siRNA action occurs if there is ideal base pairing between siRNA and a target mRNA (“si”)
o Usually target site is located in the coding sequence of an mRNA
miRNA action occurs if the base pairing is imperfect (“mal”)
o Usually the target site is in the 3’ UTR of an mRNA
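The perfect versus imperfect pairing rule can be sketched as a simple comparison. This is a simplification under stated assumptions: the small RNA and target site are the same length, G:U wobble pairs are ignored, and any mismatch at all is treated as "imperfect" because the notes give no threshold. Function names are hypothetical.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA written 5'->3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna))


def classify_pairing(small_rna, target_site):
    """Classify pairing between a small RNA and an mRNA site (both 5'->3')."""
    if len(small_rna) != len(target_site):
        raise ValueError("this sketch compares equal-length sequences only")
    ideal_site = reverse_complement(small_rna)
    mismatches = sum(a != b for a, b in zip(ideal_site, target_site))
    if mismatches == 0:
        # Perfect pairing: siRNA-style endonuclease cleavage.
        return "siRNA-like: mRNA cleavage"
    # Imperfect pairing: miRNA-style repression and increased turnover.
    return "miRNA-like: translational repression"
```

A real miRNA still needs substantial complementarity to its target, so a sequence with many mismatches would not be a target at all; the sketch does not model that.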
Eukaryotic Preinitiation Complex
In eukaryotes, the details of translation initiation differ. The major differences are: -- eIF2 brings
charged tRNAi to the 40S subunit before mRNA arrives. -- eIF-4 complex binds to the mRNA 5’
cap and brings it to the 40S. -- The ribosome scans the mRNA 5’→3’ to locate the AUG within a
Kozak sequence. -- 60S binds to complete 80S ribosome.
Eukaryotic Initiation Complex
Translation in eukaryotes initiates at a Kozak sequence
o Most often, first AUG in an mRNA is selected as the translation start site
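A rough sketch of 5’→3’ scanning for the first AUG. The notes do not give the Kozak consensus, so the context check below relies on the commonly cited textbook features (a purine at position -3 and a G at position +4, counting the A of AUG as +1); treat that part as an assumption added for illustration. The function name and return format are hypothetical.

```python
def scan_for_start(mrna):
    """Scan an mRNA 5'->3' and report the first AUG and its context.

    Returns None if there is no AUG, otherwise a dict with the 0-based
    index of the A in AUG and two context flags.
    """
    index = mrna.find("AUG")
    if index == -1:
        return None
    # Position -3 is three bases upstream of the A in AUG.
    minus3 = mrna[index - 3] if index >= 3 else None
    # Position +4 is the base immediately after the G in AUG.
    plus4 = mrna[index + 3] if index + 3 < len(mrna) else None
    return {
        "index": index,
        "purine_at_minus3": minus3 in ("A", "G"),
        "g_at_plus4": plus4 == "G",
    }
```

With the input "GCCGCCACCAUGGCU", the first AUG is at index 9 and both context flags are true.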
Eukaryotic Polysomes
Translation initiation occurs in succession as each new ribosome initiates at the Kozak sequence.
o Many ribosomes on a single mRNA = polysome
o Each ribosome is making a copy of the same protein
Ricin Toxin
Ricin = among most toxic natural substances known
o LD50 ~ 22μg/kg (or 1.76 mg for an adult)
o enters as single polypeptide, then cleaved into ricin toxin A & B
o “RTA” depurinates 28S rRNA at A4324 eliminating an elongation factor binding site.
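The LD50 arithmetic can be checked with a short calculation. The 80 kg adult body mass is not stated in the notes; it is inferred here because 1.76 mg divided by 22 μg/kg gives 80 kg. The function name is hypothetical.

```python
LD50_UG_PER_KG = 22  # ricin LD50 from the notes, in micrograms per kilogram


def lethal_dose_mg(body_mass_kg):
    """Convert the per-kilogram LD50 into a total dose in milligrams."""
    dose_ug = LD50_UG_PER_KG * body_mass_kg
    return dose_ug / 1000  # 1000 micrograms per milligram
```

For an assumed 80 kg adult, 22 μg/kg × 80 kg = 1760 μg, which matches the 1.76 mg quoted above.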
L-53: DNA Damage and Repair
Key concept: A mutation is an accidental change in the sequence of bases in the genome.
o Result from DNA damage that is an ongoing threat to the cell
o Some damage is spontaneous, but often caused by environmental factors
o Vast majority of DNA damage is single-strand breaks!
 Double strand breaks are RARE
Mutagens are compounds that promote changes in DNA sequences
o Often these chemicals are also cancer-causing carcinogens.
“Ames test”
o Salmonella typhimurium (defect in a his gene) → medium lacking his ± test compound
disc at varying concentrations → score growth to detect reversion of mutation
Deamination of DNA Bases
Deamination of C→U and 5-meC→T occur spontaneously in a human cell.
Deamination of A & G occurs spontaneously, but at a slower rate.
Common chemicals can accelerate the rate of deamination
o EX: sodium nitrate, sodium nitrite, nitrosamine
DNA Damage Must be Fixed!
Key concept: Mutations in genome = permanent and subsequently inherited by daughter cells.
Oxidation of DNA
Reactive oxygen species “ROS” = generated by respiration
Hydroxyl free radical (•OH) adds to either G or T
Depurination is caused by hydrolysis of the N-β-glycosyl bond linking a purine base to the sugar-phosphate backbone
Depurination yields an “abasic” site (“AP” site)
Spontaneous Methylation of G in Cells
Non-enzymatic methylation of G by an S-adenosylmethionine-dependent mechanism yielding 7-methylguanine.
DNA Damage: Alkylating Agents
Alkylating agents covalently modify bases in DNA
o Alkylation distorts DNA double helix
Thymine Dimer
Ultraviolet radiation is a common cause of DNA damage (UV = bad)
o UV results in the formation of a cyclobutane ring between two adjacent pyrimidine rings
o especially common for pairs of thymines forming a thymine-dimer
o cyclobutane ring kinks the axis of the DNA helix
DNA Strand Breakage
Ionizing radiation causes DNA strand breakage (IR = bad)
Major sources of ionizing radiation are cosmic rays, X-rays, and radioactive materials
Radiation can cause either:
o single strand break (nick)
o double strand break -- may be staggered!
Radiation also Damages Bases in DNA
DNA Repair
Key concept: A cell invests enormous resources to protect integrity of its genome. Collectively
these systems are called “DNA repair”
Characteristics of DNA Repair Mechanisms
Most DNA repair mechanisms can be broken down into 4 distinct phases:
o 1) Recognition of the lesion
o 2) Excision of the lesion.
o 3) Resynthesis of the DNA
o 4) Ligation of loose ends
Eukaryotic Mismatch Repair
MutS binds to single base pair mismatches or “indels” of 1-3 bases
o Alpha = single mismatch
MutS binds larger indels of up to 13 bases
o Beta = bigger indels
MutL binds along with PCNA activating MutL endonuclease to nick the new strand
o Lalpha = activates endo
Base Excision Repair
DNA glycosylase cleaves the N-β-glycosyl bond making an abasic site (AP site).
Abasic sites arise from activity of DNA glycosylase/depurination
o AP endonuclease initiates repair of abasic sites by cleaving the sugar-phosphate
backbone at the abasic site
Base Excision Repair (continued)
DNA Pol I has both the 5’ →3’ exonuclease & DNA synthesis activities
o In eukaryotes DNA Pol  serves this purpose
o DNA Pol I and DNA Pol  are high fidelity polymerases
DNA ligase seals the nick!
Prokaryotic Nucleotide Excision Repair
“NER” repairs lesions that distort the DNA double helix, such as thymine dimers or alkylation
o UvrABC excinuclease makes two nicks in the damaged strand flanking the lesion
o UvrD helicase excises the damaged DNA leaving a gap
o DNA Pol I fills the gap
o DNA ligase seals the nick
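The four repair phases (recognition, excision, resynthesis, ligation) can be sketched on a toy duplex. This is a string-level illustration only. The lesion marker "X", the flank size, and the function name are assumptions made for the sketch, and the two strands are written position-aligned so that index i on one strand pairs with index i on the other.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def excision_repair(damaged, template, flank=2, lesion="X"):
    """Repair one lesion in `damaged` using the intact `template` strand."""
    # 1) Recognition: locate the lesion.
    position = damaged.find(lesion)
    if position == -1:
        return damaged  # nothing to repair

    # 2) Excision: nick on both sides of the lesion and remove the segment.
    start = max(0, position - flank)
    end = min(len(damaged), position + flank + 1)
    left, right = damaged[:start], damaged[end:]

    # 3) Resynthesis: fill the gap by copying the template strand.
    patch = "".join(DNA_COMPLEMENT[base] for base in template[start:end])

    # 4) Ligation: join the new patch to the flanking DNA.
    return left + patch + right
```

With template "TACGGATC" and damaged strand "ATGXCTAG", the sketch removes the bases at positions 1 to 5, resynthesizes "TGCCT" from the template, and returns "ATGCCTAG".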