A Symbolic and Graphical Gene Regulation Model of the lac Operon

IMS 2003, 5th International Mathematica
Symposium, London, Great Britain.
Garret Suen and Christian Jacob
Department of Computer Science
University of Calgary, Canada
{sueng, jacob}@cpsc.ucalgary.ca
1 Abstract
We present a symbolic, grammar-based model for the classic lac operon gene regulation
system implemented in Mathematica. This functional model focuses on the information
processing aspect of gene regulation through pattern matching on symbolic expressions.
Our lac operon notebook provides a viewer component for animated, two-dimensional
visualization of the simulated gene interaction processes, and is also connected to a 3D
visualization engine.
2 Introduction
Biological research has changed drastically over the last decade. Nowadays, in an effort to
minimize the time spent in the laboratory, data gathering and analysis are done primarily on
computers. This has heralded new interdisciplinary fields between biology and computer
science. Biological computing and bioinformatics seek a better understanding of
biological phenomena through innovative programming techniques and algorithmic
analysis. Classic biological models that are well understood today can provide the basis
for investigating larger, more complex models. For example, models of gene regulation in
prokaryotic cells (without a nucleus) lead to a better understanding of gene regulation in
more complex eukaryotic cells (with a nucleus). In this paper, we present a symbolic
model of the lactose operon, one of the simplest and best understood models of gene
regulation in the bacterium Escherichia coli [1, 6].
We will show how gene regulation mechanisms, which mainly rely on key-lock matching
among cell components, can be immediately implemented through pattern matching on
symbolic expressions. We will give examples of the basic data structures we use to encode
an E. coli cell proteome, demonstrate selected rules that model interactions among proteins
and other cell units, and show a simple way to visualize the dynamics of these interactions
over time, both in 2D and 3D.
3 The Lactose Operon
Escherichia coli (E. coli) is a single-celled bacterium that resides in the gut of humans. It is
a prokaryotic organism, i.e., it does not have a nucleus to enclose its circular DNA. Hence,
the DNA in E. coli is free to interact with all other elements within the cell. The lac
operon, in particular, is a group of genes found on the E. coli genome, the constituents of
which represent a classic and intensively studied model for gene regulation [1, 6, 7].
3.1 Gene Regulation in the lac Operon: The Key Players
In a lactose-rich environment E. coli uses the sugar lactose (short: lac) as its primary food
source, which is converted into glucose, the bacterium's major source of energy, and
galactose. The process of converting lactose into its constituents is controlled by the
regulatory mechanisms of the lac operon structure, which encodes for proteins that
facilitate the breakdown and conversion of lactose. The gene itself can regulate the
creation of these molecules dependent on the amount of lactose present in the cell.
More specifically, the lac operon consists of four genes (Fig. 1): the lacI gene, the lacZ
gene, the lacY gene, and the lacA gene. The lacZ, lacY and lacA genes are adjacent to one
another on the operon. They are preceded by a control complex consisting of an operator
region and a promoter region. This promoter-operator complex allows the binding of a
specific protein, RNA polymerase, which initiates the synthesis of the enzyme
β-galactosidase, which in turn breaks down lactose into glucose and galactose (Fig. 1a).
Figure 1. Key components involved in the lac operon gene regulation process [4]: (a) docking of the
repressor complex at the operator site turns the lac operon off; (b) repressor inhibition, through a
conformational change, turns the lac operon on.
The lacI gene, which encodes the repressor protein, is located upstream of the
main gene complex and is preceded by its own control complex consisting of a single
promoter region. The repressor serves as the basic control mechanism for the lac operon
(Fig. 1b).
3.2 The lac Operon Model: DNA and Cell Cytoplasm
In contrast to a probabilistic or logical network approach [2], our symbolic,
grammar-based model of the lac operon uses explicit representations of the components involved in
the regulation process. The interaction and regulation mechanisms among these
components are implemented as rewriting rules in Mathematica [5]. A similar
grammar-based approach is suggested in [3], which, to our knowledge, has never been implemented
as a working, computational model. More specifically, we model the DNA strand, the
cytoplasm, the energy compounds such as lactose, glucose, etc., and enzymes by explicit
symbolic expressions. The following list data structure represents the two operon sections
on the DNA and its surrounding cytoplasm (compare Fig. 1):
    Cytoplasm = {
      Operon[Promoter[], LacI[]],
      Operon[Promoter[], Operator[], LacZ[], LacY[]],
      RNAPolymerase[], Lactose, Glucose, LactosePermease[],
      Bgal[], RepressorTetramer[S1[], S2[], S3[], S4[]]}
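As a minimal sketch of how such a state list can be inspected, the following counts elements at the top level of the list and extracts the genes of the operon that carries an operator site. The names `cell` and `countOf` are assumptions made for illustration; they are not taken from the lac operon notebook.

```mathematica
(* Hypothetical example state, mirroring the list structure described above *)
cell = {
   Operon[Promoter[], LacI[]],
   Operon[Promoter[], Operator[], LacZ[], LacY[]],
   RNAPolymerase[], Lactose, Glucose, LactosePermease[],
   Bgal[], RepressorTetramer[S1[], S2[], S3[], S4[]]};

(* countOf is an assumed helper: it counts top-level elements that are
   either the bare symbol s or an expression with head s *)
countOf[state_List, s_Symbol] := Count[state, s | _s]

countOf[cell, Lactose]   (* bare symbol: gives 1 *)
countOf[cell, Operon]    (* expressions with head Operon: gives 2 *)

(* Extract the structure genes of the operon that has an operator site *)
Cases[cell, Operon[Promoter[___], Operator[___], genes___] :> {genes}]
```

Because every component is an ordinary symbolic expression, the same pattern language serves both for querying the state and for rewriting it.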
We have made the following simplifications in our model of the lac operon: (1) As the
Watson-Crick complementarity of the double-strand encoding does not have any influence
on the functional aspect of gene regulation, we represent DNA as a single-stranded list of
Operon[...] expressions. (2) We have not included explicit transcription and
translation of proteins as this has no influence on the regulatory aspect of the gene. Hence,
proteins are transcribed and translated immediately after RNA polymerase has docked onto
and read the single DNA strand. (3) As we primarily focus on energy compounds and
proteins that have a direct correlation to the operon, we do not consider galactose, a
by-product of lactose cleaving, and thiogalactoside transacetylase, an enzyme encoded by the
lacA gene, whose function is still unknown. (4) As we primarily focus on the
lactose-β-galactosidase interaction, we also do not consider the CAP catabolite repressor section,
which regulates the production of β-galactosidase based on glucose concentrations.
4 lac Operon Gene Regulation: Step-by-Step
In this section we give a detailed description of the simulation steps required in our
grammar-based model of the lac operon system and how the gene regulatory interactions
are visualized. These iconic representations are automatically generated in the form of a
frame-by-frame animation. Each frame in the animation represents the application of a
single interaction rule among elements in the system. Elements—i.e., proteins, enzymes,
and energy compounds—are represented as circles, squares or diamonds of different
colours or graylevels (Fig. 2). DNA elements are simple rectangle bands at the bottom of
the display. All elements involved in the functional, grammar-based model are included in
this visualization. In order to enhance the understanding of the dynamic interactions,
arrows indicate which elements are interacting in each animation frame according to a
particular rule.
[Figure 2, panels (a)-(f): animation frames of the simulated cell, each labelled with the mutation state i+ o+ z+ and the applied rule: TetramerBindOperator in (a, b), TetramerBindLactose in (c, d), BgalBindLactose in (e, f). Cytoplasm elements shown: Bgal, LactosePermease, RNAPolymerase, Repressor, RepressorLactose, Lactose, Bgal-Lactose Complex. DNA bands shown: Promoter, LacI, Promoter, Operator, LacZ, LacY.]
Figure 2. (a, b): binding of a repressor tetramer to the lacZ operator; (c, d): binding of four lactose
molecules to a repressor tetramer, with subsequent conformational change; (e, f): β-galactosidase breaks
down lactose into a β-gal-lactose complex.
4.1 Transcription and Translation of Structure Genes
RNA polymerase enzymes are present in large quantities in the cell. Therefore, we
explicitly list them in the cytoplasm. RNA polymerase has a direct affinity for the
promoter sites located on the operon. Once docking has occurred, RNA polymerase will
move along the operon and transcribe/translate the associated structure gene (Section 6).
Hence, the repressor is synthesized from the lacI gene, lactose permease is synthesized
from the lacY gene, and β-galactosidase is synthesized from the lacZ gene. This model
skips the more complicated process of mRNA (messenger RNA) creation and its
subsequent conversion into a protein through the action of ribosomes. Here is an example
of the rules we use to capture the docking of RNA polymerase onto a promoter site and its
subsequent reading of the lacI gene:
    Cytoplasm /. {x___, Operon[Promoter[], LacI[]],
        y___, RNAPolymerase[], z___} :>
      {x, Operon[Promoter[RNAPolymerase[]], LacI[]], y, z}
    % /. Operon[Promoter[RNAPolymerase[]], LacI[]] :>
      Operon[Promoter[], LacI[RNAPolymerase[]]]
From the lacI gene, which RNA polymerase reads, a repressor is synthesized:
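    (* the polymerase leaves the lacI gene; a repressor and the freed
       polymerase are returned to the cytoplasm *)
    % /. {x___, Operon[Promoter[], LacI[RNAPolymerase[]]], y___} :>
      {x, Operon[Promoter[], LacI[]],
       y, Repressor[], RNAPolymerase[]}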
We use similar rules for the other interactions among cell components described in the
following section, where we use graphical representations instead.
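The docking, reading and release steps for the lacI gene can be sketched as three named rules applied in sequence. The rule names `dockAtLacI`, `readLacI` and `releaseRepressor`, the start state, and the use of `FoldList` are assumptions for illustration, not the notebook's actual code.

```mathematica
(* Assumed rule names; each rule paraphrases one step described in the text *)
dockAtLacI =
  {x___, Operon[Promoter[], LacI[]], y___, RNAPolymerase[], z___} :>
   {x, Operon[Promoter[RNAPolymerase[]], LacI[]], y, z};
readLacI =
  Operon[Promoter[RNAPolymerase[]], LacI[]] :>
   Operon[Promoter[], LacI[RNAPolymerase[]]];
releaseRepressor =
  {x___, Operon[Promoter[], LacI[RNAPolymerase[]]], y___} :>
   {x, Operon[Promoter[], LacI[]], y, Repressor[], RNAPolymerase[]};

(* Hypothetical reduced start state *)
start = {Operon[Promoter[], LacI[]], RNAPolymerase[], Lactose};

(* FoldList applies one rule per step and keeps every intermediate state,
   which corresponds to one entry per animation frame *)
states = FoldList[ReplaceAll, start, {dockAtLacI, readLacI, releaseRepressor}];

Last[states]
(* {Operon[Promoter[], LacI[]], Lactose, Repressor[], RNAPolymerase[]} *)
```

Keeping the intermediate states rather than only the final one is a design choice that suits the frame-by-frame visualization described in this section.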
4.2 Binding of Repressor Tetramer to the lacZ Operator
Four repressor molecules synthesized through the lacI gene form a repressor tetramer,
which has an affinity for binding to the operator region that precedes the lacZ gene (Fig.
2a, 2b). Once binding of the repressor tetramer has occurred, transcription of the lacZ gene
cannot be accomplished, as the docking site for the RNA polymerase is blocked by the
repressor. Consequently, in the absence of lactose this mechanism turns the lacZ gene off
in order to preserve cellular resources.
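A minimal sketch of this blocking mechanism as rewrite rules follows. Only the rule name TetramerBindOperator appears in Fig. 2; the rule bodies, the names `formTetramer` and `dockAtLacZ`, and the encoding of a bound tetramer as `Operator[RepressorTetramer[...]]` are assumptions chosen by analogy with `Promoter[RNAPolymerase[]]`.

```mathematica
(* Assumed rule: four free repressors combine into one tetramer *)
formTetramer =
  {a___, Repressor[], b___, Repressor[], c___, Repressor[], d___, Repressor[], e___} :>
   {a, b, c, d, e, RepressorTetramer[S1[], S2[], S3[], S4[]]};

(* Assumed body for TetramerBindOperator: the tetramer leaves the cytoplasm
   and occupies a free operator. The pattern expects the operon to precede
   the tetramer in the state list. *)
tetramerBindOperator =
  {a___, Operon[p_Promoter, Operator[], g___], b___, t_RepressorTetramer, c___} :>
   {a, Operon[p, Operator[t], g], b, c};

(* Assumed rule: docking at the lacZ/lacY operon requires an empty operator *)
dockAtLacZ =
  {a___, Operon[Promoter[], Operator[], g___], b___, RNAPolymerase[], c___} :>
   {a, Operon[Promoter[RNAPolymerase[]], Operator[], g], b, c};

cell = {Operon[Promoter[], Operator[], LacZ[], LacY[]], RNAPolymerase[],
   Repressor[], Repressor[], Repressor[], Repressor[]};

cell /. dockAtLacZ   (* operator is free: the polymerase docks *)

blocked = cell /. formTetramer /. tetramerBindOperator;
blocked /. dockAtLacZ   (* operator is occupied: no rule matches, state unchanged *)
```

In this encoding the repression needs no explicit "off" flag: the docking pattern simply fails to match while the operator holds a tetramer.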
4.3 Lactose Entering the Cell
Lactose is free to enter the cell from the outside environment through the use of lactose
permease, a protein encoded by the lacY gene. In the presence of lactose the lacZ gene is
turned on. This is accomplished through the binding action of four lactose molecules to a
single repressor tetramer (Fig. 2c, 2d). This binding causes structural deformation of the
repressor, such that it is no longer able to bind to the operator region (Fig. 1b). As such,
RNA polymerase is again free to bind to the promoter-operator complex and synthesize β-galactosidase.
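The release of the operator by lactose binding can be sketched as a function on the state list. The name TetramerBindLactose and the label RepressorLactose appear in Fig. 2; the function body, the requirement of a docked tetramer, and the state `blocked` are assumptions for illustration.

```mathematica
(* Assumed implementation of the TetramerBindLactose step. It fires only if a
   tetramer is docked at the operator and at least four free lactose
   molecules are present at the top level of the state list. *)
tetramerBindLactose[state_List] /;
  (Count[state, Lactose] >= 4 && ! FreeQ[state, Operator[_RepressorTetramer]]) :=
 Append[
  (* remove exactly four lactose molecules, then free the operator *)
  DeleteCases[state, Lactose, {1}, 4] /. Operator[_RepressorTetramer] -> Operator[],
  RepressorLactose[]]

(* Otherwise the state is left unchanged *)
tetramerBindLactose[state_List] := state

(* Hypothetical blocked state with five lactose molecules *)
blocked = {
   Operon[Promoter[], Operator[RepressorTetramer[S1[], S2[], S3[], S4[]]],
    LacZ[], LacY[]],
   RNAPolymerase[], Lactose, Lactose, Lactose, Lactose, Lactose};

tetramerBindLactose[blocked]
(* {Operon[Promoter[], Operator[], LacZ[], LacY[]], RNAPolymerase[],
    Lactose, RepressorLactose[]} *)
```

After this step the operator is empty again, so a docking rule that requires `Operator[]` can match once more.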
4.4 β-Galactosidase Breaks Down Lactose
The release of the repressor tetramer from the operator allows the lacZ gene to be
transcribed and translated again, resulting in an increase in the synthesis of
β-galactosidase, which subsequently cleaves lactose into its constituent parts (glucose and
galactose) and reduces the concentration of lactose in the cell (Fig. 2e, 2f). The removal of
lactose from the cell allows the repressor to bind to the operator region again and reduce
the production of β-galactosidase, thus closing the regulatory loop.
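The cleaving step can be sketched with two rules: one that forms the β-gal-lactose complex and one that releases glucose. Only the rule name BgalBindLactose appears in Fig. 2; the rule bodies, the name `bgalCleave`, and the encoding of the complex as `Bgal[Lactose]` are assumptions. Galactose is omitted, in line with simplification (3) of Section 3.2.

```mathematica
(* Assumed body for BgalBindLactose: a free enzyme takes up one lactose.
   The pattern expects Bgal[] to precede the lactose in the state list. *)
bgalBindLactose =
  {a___, Bgal[], b___, Lactose, c___} :> {a, Bgal[Lactose], b, c};

(* Assumed rule: the complex is resolved into the free enzyme and glucose *)
bgalCleave = Bgal[Lactose] :> Sequence[Bgal[], Glucose];

cell = {Bgal[], Lactose, Lactose};

(* ReplaceRepeated (//.) applies the rules until the state stops changing *)
cell //. {bgalBindLactose, bgalCleave}
(* {Bgal[], Glucose, Glucose} *)
```

Once all lactose has been converted, neither rule matches any longer, which mirrors the falling lactose concentration that lets the repressor rebind.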
Figure 3. A snapshot during the synthesis of β-galactosidase: several RNA polymerases are attached to
the circular double helix of the DNA. Codons, i.e., triplets of nucleotides (A, G, C, T), are represented by
colour-coded spheres. Both mRNA strands and chains of amino acids are represented as long cylinders.
5 The lac Operon Notebook: Animations and Data Plots
Our lac operon gene regulation system iteratively applies the rewriting rules described in
the previous section. We simulate interactions among the elements by incorporating
rounds of rule application. Each round allows for rewrite rules to execute on the current
state of the cell. The order of the rewrite rules applied in each round is chosen randomly,
so as to model the randomness found in the biological system of the lac operon. Users of
the lac operon notebook can interactively run the simulation by choosing the number of
rounds the simulation runs. The output for each round consists of a list of rewrite rules
and their associated graphics, which display the current state of the cell (Fig. 2). In
addition, data plots are generated for each simulation run, which chart the concentrations
of the following four elements over the simulation time period: Lactose, Glucose,
Repressor, and β-galactosidase.
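The round-based scheme can be sketched as follows, assuming a toy rule set of two rules. The rule bodies, the function `round`, the start state, and the fixed random seed are all assumptions for illustration; the notebook's actual rule set is larger.

```mathematica
SeedRandom[1];  (* fixed seed so that a run can be repeated *)

(* Assumed toy rule set *)
rules = {
   (* a free beta-galactosidase converts one lactose into glucose *)
   {a___, Bgal[], b___, Lactose, c___} :> {a, Bgal[], b, Glucose, c},
   (* lactose permease imports one lactose molecule from the environment *)
   {a___, LactosePermease[], b___} :> {a, LactosePermease[], b, Lactose}
   };

(* One round: every rule is tried once on the whole state, in random order *)
round[state_List] := Fold[Replace, state, RandomSample[rules]]

(* Run six rounds and keep all intermediate states *)
states = NestList[round, {Bgal[], LactosePermease[], Lactose, Lactose}, 6];

(* Concentrations per round, counted at the top level of each state *)
lactose = Count[#, Lactose] & /@ states;
glucose = Count[#, Glucose] & /@ states;

ListLinePlot[{lactose, glucose},
 PlotLegends -> {"Lactose", "Glucose"},
 AxesLabel -> {"round", "count"}]
```

`Replace` is used instead of `ReplaceAll` so that each rule acts at most once per round on the state list as a whole, which keeps the number of interactions per round bounded.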
6 Visualization in 3D
In order to enhance our functional, grammar-based model of gene regulation by a
visualization component, we have designed a 3D visualization engine, based on the
Java 3D™ library. To realistically model the lac operon, important coding regions are
included in the DNA strand. The lacZ gene, with its associated control complex, is
incorporated into the DNA structure. In addition, the lacI gene along with its control
complex is also part of the visualization. All other interactive elements including RNA
polymerase, repressor molecules, and β-galactosidase molecules are rendered as spheres of
different colours (Fig. 3).
There are two primary processes not yet considered in our symbolic, functional lac operon
model, namely transcription and translation. One major objective of our 3D visualization is
to model these processes—on an appropriate abstract level—as they occur in the cell.
These intermediate processes ultimately are of interest to understand gene regulation
processes in general and the lac operon, in particular. Transcription is the process of
converting a DNA template strand into an intermediate single strand of messenger RNA
(mRNA). RNA polymerase will read a given template strand of DNA and transcribe the
codons into mRNA. The use of colour-coded codons allows for ease of interpretation when
viewing the various DNA and RNA structures. Translation is the process of converting an
mRNA strand into the appropriate protein. It is facilitated by two components: ribosomes and
transfer RNA (tRNA). Ribosomes and tRNA assemble proteins based on the
mRNA encoding. The colour-coded amino acids directly relate to specific codons
on both mRNA and DNA strands.
7 Conclusion and Future Work
We have presented a symbolic, grammar-based model of the classic lac operon gene
regulation system. The outlined functional model focuses on the information processing
aspect of gene regulation. As we are using symbolic expressions to represent all involved
structures and functions, it is relatively straightforward to combine the model with a genetic
programming engine. This will enable us to actually evolve gene regulation mechanisms
and compare our results with other regulatory techniques evolved by nature. We are
currently working on the integration of the described gene regulation model into our
evolutionary computation environment, Evolvica [5]. We will also include
activators, such as CAP catabolites, in the model, which regulate the production of
β-galactosidase based on glucose concentrations. Finally, we are working on an extension of
the current 3D visualization engine, so that all aspects of the functional model are directly
translated into animated, three-dimensional visualizations, which are automatically
generated from the modeled genomic interaction processes.
Further information is available at: http://www.cpsc.ucalgary.ca/~jacob/LacOperon.
Acknowledgement
We would like to thank Julie Andreotti and Ian Burleigh for their help with the
implementation of the 3D visualization.
References
[1] Beckwith, J. R., Zipser, D. (eds.): The Lactose Operon. The Operon. Cold Spring
Harbor Laboratory Press, Cold Spring Harbor, New York (1970).
[2] Bower, J. M., Bolouri, H. (eds.): Computational Modeling of Genetic and Biochemical
Networks. MIT Press, Cambridge, MA (2001).
[3] Collado-Vides, J.: Towards a grammatical paradigm for the study of the regulation of
gene expression. In: Goodwin, B., Saunders, P. (eds.): Theoretical Biology. Epigenetic
and Evolutionary Order from Complex Systems. Johns Hopkins University Press,
Baltimore, MD (1992): 211-224.
[4] Crotty, S., Basu, A., Onufryk, C., Ingram, V.: MIT Biology Hypertextbook.
http://cyberbio.mit.edu/esgbio (1996).
[5] Jacob, C.: Illustrating Evolutionary Computation with Mathematica. Morgan
Kaufmann Publishers, San Francisco (2001).
[6] Müller-Hill, B.: The lac Operon. A Short History of a Genetic Paradigm. de Gruyter,
Berlin (1996).
[7] Ptashne, M., Gann, A.: Genes & Signals. Cold Spring Harbor Laboratory Press, Cold
Spring Harbor, New York (2002).