Biochemistry
Chen Yonggang
Zhejiang University School of Medicine

Regulation of Gene Expression: Putting information to work

Information encoded in DNA is of no use unless it is expressed
• Each cell has many genes, but few are active at any time
• Although all cells have the total complement of genes, only a portion is expressed
• Cells must respond to the situation at hand and do what is needed for survival
• Cells have only so much energy and must use it efficiently

Bacteria are simpler than eucaryotic cells but have similar processes
• In trying to understand the design of living things, we study bacteria as models for life
• Once processes are understood in bacteria, we find that the same processes function in eucaryotes, but with much more complexity
• Bacteria don’t complain, need no sleep and have no protectors. In fact, no one worries whether they live or die

Gene expression systems can be understood by studying two models
• Biochemical processes can be subdivided into two categories
  – Anabolism
    • The process of building complexity
  – Catabolism
    • The process of breaking down complexity
• Living things oxidize fuels and use the energy to synthesize molecules and do work

Life processes are controlled by enzymes: gene products
• For each process activated in the cell, genes must be expressed
• The gene is transcribed to form mRNA, and proteins (enzymes) are made
• The enzymes carry out catabolic (degradative) steps or, in other cases, anabolic (synthetic) steps

The easiest bacterium to study is E. coli
• Since it lives in the human colon, there is an easy source
• It is easy to isolate and lives on almost any fuel source
• It grows at a comfortable temperature (37 °C)
• A lot is known about it, including its entire chromosomal DNA sequence

The energy for growth and function comes from food
• E. coli prefers glucose as a fuel.
Glucose, a sugar, can be oxidized to provide the energy for ATP synthesis and to generate the Proton Motive Force, the energy sources used by the bacterium to drive its life processes
• However, in the colon it gets a variety of foods depending on the whims of its host
• When we drink milk, E. coli needs to use lactose (milk sugar) in place of glucose

E. coli does not normally express the enzymes needed to use lactose
• The genes required to utilize lactose are located together in a segment of DNA and are transcribed together to make a polycistronic mRNA containing the information for making all the enzymes needed
• Such a coordinately expressed set of genes is called an Operon

The lac operon, an inducible operon (McKee 18.28)

The lac operon contains regulatory and structural genes
The Operon is organized in the chromosome:
Lac I - CAP - Lac P - Lac O - Lac Z - Lac Y - Lac A
Lac P is the Promoter for the operon
Lac I, CAP and Lac O are regulatory elements
Lac Z, Lac Y and Lac A are structural genes for the proteins needed to metabolize lactose

Lactose is a disaccharide

To use lactose, 3 structural genes are required at the same time
• Lac Z encodes β-galactosidase, which cleaves the β-galactosidic bond joining the glucose and the galactose
• Lac Y encodes a lactose permease, a membrane protein which transports lactose into the bacterium
• Lac A encodes a transacetylase, an enzyme whose role is not well understood

Structure of Lac operon
1 Regulatory elements and a set of structural genes
2 Promoter: Lac P
3 Regulatory elements: Lac I (encodes a repressor), Lac O (can be bound by the repressor), CAP site
Structural genes: Lac Z (encodes β-galactosidase), Lac Y (encodes a lactose permease), Lac A (encodes a transacetylase)

The Lac I gene encodes a control protein, a repressor
• Lac I is located upstream of the lac promoter Lac P and thus is not under its regulation; it encodes a constitutive protein, the repressor, which has high affinity for the Lac O site, called the operator
• Normally the repressor binds to the operator, forming a repressor/operator complex

The repressor/operator complex interferes with RNA polymerase binding
• The operator overlaps the promoter Lac P, so when the repressor binds it blocks access of RNA polymerase to the promoter
• The repressor blocks the transcription of the Lac operon
• Under normal circumstances the lac operon is not expressed: no proteins are synthesized, and it is a silent region of the chromosome

When lactose is available, the Lac operon is activated
• An enzyme (β-galactosidase) converts a small portion of the available lactose to a modified molecule, allolactose (a β-1,6 isomer of lactose)
• Allolactose is an inducer of the Lac operon
• As an inducer, it turns on the operon by virtue of its affinity for the repressor
• When allolactose binds, the repressor undergoes a conformational change

Induction of the lac operon (McKee 18.29)

The repressor/inducer complex does not bind to the operator
• Removal of the repressor from the operator opens the promoter for RNA polymerase binding, allowing transcription of the lac operon and the utilization of lactose
• In the laboratory, IPTG (isopropylthiogalactoside), a gratuitous inducer (it is not cleaved by β-galactosidase), is used to study the process

Control of a catabolic operon (Devlin 8-3)

Mutations in genes affect operon expression
• Mutations in the Lac I gene can yield a repressor which cannot bind the operator; with such Repressor Constitutive Mutations the Lac operon is always expressed. (A mutant repressor which cannot bind inducer instead always occupies the operator and never allows Lac activation.)
• Mutations in the Lac O site yield an operator which cannot bind repressor; such Operator Constitutive Mutations never allow the Lac operon to be repressed

Glucose is a breakdown product of lactose and the preferred fuel
• E.
coli prefers to use glucose
• The presence of glucose inactivates another protein, Adenyl Cyclase
Normally Adenyl Cyclase mediates the reaction ATP → 3′,5′-cAMP + PPi
• The availability of glucose reduces cAMP
• When glucose is gone, Adenyl Cyclase is activated and cAMP is formed

CAP is a protein, the Catabolite gene Activator Protein
• When lactose is the only fuel, cAMP is synthesized
• cAMP binds the Catabolite gene Activator Protein
• The cAMP-Catabolite Activator Protein complex is a DNA binding protein which creates an abrupt kink in the DNA at the promoter, Lac P
• The kink increases RNA polymerase binding to the promoter, turning on the operon

cAMP-CAP exerts positive control, enhancing transcription of the operon (Devlin 8-6)

The Lac operon is an inducible system
• Many catabolic pathways employ induction as a regulatory motif
• An inducer, either the molecule to be catabolized or a related molecule, induces the transcription of the genes needed for its catabolism

The lac operon, an inducible operon (McKee 18.28)

Anabolic, biosynthetic operons are regulated differently
• Tryptophan is an important amino acid; further, the synthesis of tryptophan uses energy and critical starting materials
• When tryptophan is available, E. coli does not need to engage in its synthesis
• Under these conditions the Trp operon is repressed
• Anabolic processes are repressed when not needed

The Trp operon is similar to the Lac operon in structure
• It has regulatory genes and structural genes
• There are 5 structural genes encoding 3 enzymes needed for Tryptophan synthesis (multiple subunits for 2 enzymes)
• The genes are arranged as seen before:
  TrpP - TrpO - Trpa - TrpE - TrpD - TrpC - TrpB - TrpA - Trpf
• The promoter Trp P and operator Trp O are upstream from the structural genes

Trp R, not associated with the operon, constitutively synthesizes a repressor
• The trp repressor binds to the operator only when complexed with a corepressor, a small molecule (in this case, tryptophan)
• Thus, normally the operon is turned on; however, when the corepressor is available, the operon will be repressed (turned off)
• The Trp operon is a repressible operon, as are many anabolic, biosynthetic operons
• In the absence of corepressor it is derepressed

The trp operon is repressible

In addition to gene regulation, the pathway is regulated at the level of enzyme activity
• The first enzyme in the pathway is Anthranilate Synthetase, encoded by genes TrpD and TrpE
• As is common in anabolic pathways, this first committed enzyme in the pathway is regulated by Feedback Inhibition
• Thus, if Tryptophan is present in the medium, the enzyme is inhibited and this stops the synthesis of new tryptophan

Such enzymes, called Flux-Determining Enzymes, are regulated
• Flux-determining enzymes have an active site which carries out the enzymatic function
• They have another (allo) site (steric), the allosteric site, which can bind the end product of the pathway
• Binding the feedback inhibitor to the allosteric site alters the conformation of the enzyme and diminishes its activity

Tryptophan synthesis is controlled by gene expression and enzyme activity
• Ability to synthesize tryptophan is controlled by the ability to transcribe an mRNA: Repressible
• The flux through the pathway is controlled by
allosteric feedback inhibition: Inhibitable
• The amount of tryptophan synthesized meets the needs of the bacterium

Eucaryotic Regulation
• Most eucaryotic gene expression is regulated by the same kinds of processes seen in simpler organisms
• Induction and Repression are common models in eucaryotes
• Because of the differences in gene organization, expression does not involve operons in eucaryotes

Scientists used to believe that, based on evolution, simpler is better
• As we study gene expression we find more and more complexity
• The complexity does not appear to be careless or repetitive
• Novel designs for precise control seem to be necessary and are almost unimaginable until discovered
• Makes one wonder?

Eucaryotic genes are organized in logical arrays
• As in procaryotes, many eucaryotic genes are clustered by function and need
• Ribosomal RNA genes are clustered into multiple tandemly repeated arrays
• Histone genes are also clustered and tandemly repeated in some organisms
• Most genes are present in single copies

Gene clustering may be related to expression
• The different globin genes, expressing the various polypeptides of hemoglobin, are clustered
• Thus during development, moving from early to fetal to adult hemoglobin synthesis utilizes co-located genes

Gene expression in eucaryotes occurs from chromatin, not naked DNA
• During expression of genes in Drosophila, segments of DNA in the polytene chromosomes become puffed during development
• Puffing is related to the transition from condensed chromatin (heterochromatin) to dispersed chromatin (euchromatin)
• Euchromatin is transcriptionally active
• Transcription is hormonally induced

The structure of transcriptionally active chromatin is unique
• DNase I digestion patterns of euchromatin and heterochromatin are different
• When chromatin is condensed, the fragments produced are larger than when it is dispersed
• Covalent modification of histones, such as phosphorylation and acetylation, alters digestion patterns
• Digestion hypersensitivity is related to bound regulatory proteins

The DNA of active genes may be altered either in structure or access
• Transcriptionally active globin genes are more sensitive than quiescent genes to DNase I digestion
• Methylation of cytosines is lower in actively transcribed genes. Since methylation blocks access to the major groove in DNA, demethylation could open the DNA to the binding of regulatory proteins

Gene expression is primarily controlled by transcription
• As in procaryotes, the expression of genes depends upon the transcription of information
• Regulation of eucaryotic transcription is more complex than that for procaryotes, with more components
• Transcription requires access to DNA
• Nucleosomes must be disrupted prior to transcription

Eucaryotic RNA polymerases cannot act alone
• Eucaryotic RNA polymerase II transcribes DNA to form hnRNA, which is processed to mRNA and translated to form active proteins
• For RNA polymerase II to function, a preinitiation complex must be formed at the TATA box immediately upstream of the RNA Pol binding site
• Formation of the preinitiation complex is dependent upon binding domains for general transcription factors (TFs) which require access to the major groove of the DNA

DNA access can be modified
• Acetylation of histones reduces the lysine-derived positive charge on histones.
Since the negative charges on the DNA backbone bind to positively charged histones, DNA binding is reduced
• Protein (SWI/SNF) complexes interact with RNA Polymerase II and use ATP to open access to DNA sequences for TF binding
• Methylation of cytosines blocks TF access to the DNA

Activation of transcription depends on forming an initiation complex
• Activators and inhibitors of transcription alter the probability of forming the complex
• In contrast to procaryotes, where one or two proteins promote transcription at a promoter, eucaryotic regulation depends on many TFs, both general and specific factors, acting at many different sites in the DNA
• Each transcription factor has a specific role

In general, all transcription factors have two domains
• Transcription factors have a protein binding domain and a DNA binding domain and may also bind co-activators
• DNA binding domains are sequence specific
• DNA binding domains are conserved across species
• Common motifs are seen throughout all eucaryotes

There are many general TFs required for transcription
• One of these factors (TFIID) is a large complex which contains, among other proteins, a TATA binding protein (TBP)
• This complex serves as the foundation for the assembly of the initiation complex at the TATA box (-27 bp)
• Binding of this complex causes a large distortion in the DNA double helix

The TATA box anchors the initiation complex (Devlin 8-22)

While binding to the TATA box is essential, it is not sufficient
• Eucaryotic promoters are defined as all sequences which affect gene transcription
• Thus eucaryotic promoters require multiple transcription factor binding sites
• CAAT and GC boxes are often components
• Other sequences which bind effectors, such as hormone receptor binding sites, are called response elements
• Enhancers are distant factor binding sites

Eucaryotic genes can have multiple promoters
• Since “promoter” denotes all sequences involved in transcription, one gene activated by multiple events will have multiple response elements and even enhancers
• Thus a gene may be induced or repressed in concert with other genes in response to varying stimuli
• Stimuli are transduced by specific transcription factors which bind to response elements and generate the induction or repression

Four common DNA binding motifs are used in TFs
• Helix-turn-helix (H-T-H) motif proteins bind in the grooves of DNA, straddling the strand
• Zinc finger motif proteins bind in the major groove of DNA and recognize specific sequences
• Leucine zipper motif proteins form a dimer with another protein that can scissor across the DNA and recognize its sequences
• Helix-Loop-Helix (H-L-H) motif proteins are similar to HTH

Transcription factor DNA binding domains are sequence specific
• All appear to bind in the major groove
• Most have dyad symmetry
• All have strong secondary structure character
• Each has variation in its expression
• Most are very small in comparison with the transcription factor as a whole

Helix-turn-helix proteins are found in both procaryotes and eucaryotes
• The cro protein, which regulates viral lysogeny/lytic phases, employs an H-T-H motif
• All H-T-H proteins employ a 20 amino acid sequence organized into a 7 aa α-helix, a 4 aa non-helical turn, and a 9 aa α-helix
• Generally, these are repeated across a dyad axis to form a symmetrical DNA binding domain

HTH domains bind in the major groove of DNA (Devlin 8-24)

Each domain has a specific role
• The 9 amino acid helix is the DNA binding domain, which recognizes the DNA sequence through interaction with hydrophobic amino acids such as valine or leucine
• The 4 amino acid turn drapes over the polynucleotide strand
• The 7 amino acid helix stabilizes the binding to the DNA

Specific binding occurs in the major groove (Devlin 9-19,20)

The Zinc finger motif binds specific sequences in the DNA
• Zinc fingers are so named because of the looping nature of the structure, which can wrap in the groove of the DNA
• Zinc is coordinated between 4 cysteines or 2
histidines and 2 cysteines
• Many zinc fingers can exist within a single transcription factor, increasing the specificity of binding

Zinc fingers can wrap in the major groove of DNA (Devlin 9-22)

Zinc fingers mediate hydrogen bonding
• Sequential multiple zinc fingers can probe multiple turns of the DNA double helix, binding specific patterns of H bonds
• Some zinc finger motifs use one finger to recognize and an adjacent finger to stabilize binding

Leucine zippers use dimeric helical molecules to bind DNA
• Leucine zippers use paired helical proteins to scissor across the DNA double helix
• They recognize DNA sequences and allow specific binding in the major groove
• The leucines give rise to strong α-helical structures
• They form hydrophobic heterodimeric or homodimeric binding structures

Leucine zippers are common motifs (Devlin 8-26, 9-25)

The Helix-Loop-Helix is a variation on the HTH motif
• HLH proteins such as TFIID (TATA binding) use a dyad symmetric HLH to bind specific sequences
• Conformational changes induced by interaction with other transcription factors shift the attachment to the specific DNA sequences

HLH proteins bind to DNA sequences (Devlin 8-27)

Transcription is controlled through common mechanisms
• Primary binding of TFIID to the TATA box provides the foundation of the preinitiation complex
• Specific initiation factors bound to recognition sequences more remote from the gene interact through DNA folding to form the initiation complex
• Binding of cofactors activates the transcription factors

Initiation of transcription
• Each transcription factor has one or multiple specific DNA binding domains
• Each specific transcription factor is activated in response to a cellular event
• The initiation complex recruits chromatin modifying enzymes such as acetylases to loosen the DNA structure
• The complex recruits RNA polymerase II to initiate and elongate the mRNA

Transcription is precisely regulated to meet the needs of the cell (Devlin 8-30)

Questions for you to think about
• How could proto-oncogenes such as ras, fos and myc modify cell function to give rise to cancer?
• Why are we concerned about the effects of carcinogens, UV light and retroviral infection with regard to tumor suppression?
• What are heat shock effector elements?
• What do Immunoglobulin heavy chain M and G genes and hemoglobin ε, γ, and β chain genes have in common? (hint: different genes are expressed under different developmental and functional conditions)
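
Operon control logic as a sketch
The on/off rules for the Lac operon (inducible, with positive control by cAMP-CAP) and the Trp operon (repressible) can be written as a small truth table. The Python below is only a minimal sketch: the names LacState, lac_operon and trp_operon_transcribed are hypothetical, the labels "off", "low" and "high" are illustrative assumptions rather than measured expression levels, and treating the repressed operon as fully "off" is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LacState:
    repressor_on_operator: bool  # True when no inducer has freed the operator
    camp_cap_bound: bool         # True when the cAMP-CAP complex can form
    transcription: str           # "off", "low" or "high" (illustrative labels)


def lac_operon(lactose: bool, glucose: bool) -> LacState:
    """Boolean model of Lac operon control (hypothetical helper)."""
    # Lactose yields allolactose, which pulls the repressor off the operator.
    repressor_on_operator = not lactose
    # Glucose lowers cAMP, so the cAMP-CAP complex forms only without glucose.
    camp_cap_bound = not glucose
    if repressor_on_operator:
        level = "off"    # promoter blocked (simplification: treated as silent)
    elif camp_cap_bound:
        level = "high"   # operator free and cAMP-CAP enhancing transcription
    else:
        level = "low"    # operator free but no positive control
    return LacState(repressor_on_operator, camp_cap_bound, level)


def trp_operon_transcribed(tryptophan: bool) -> bool:
    """Repressible operon: the repressor binds only with its corepressor."""
    return not tryptophan


if __name__ == "__main__":
    for lactose in (False, True):
        for glucose in (False, True):
            state = lac_operon(lactose, glucose)
            print(f"lactose={lactose} glucose={glucose} -> {state.transcription}")
    for tryptophan in (False, True):
        print(f"tryptophan={tryptophan} -> transcribed={trp_operon_transcribed(tryptophan)}")
```

Running the script prints one row per combination of fuels; only lactose without glucose gives the "high" label, which mirrors the statement that E. coli turns to lactose when glucose is gone.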