Gene Regulatory Networks

CS5314
Paradigms in Bioinformatics
Midterm #2
Alexandru Cioaca
While prokaryotes are simple, mostly unicellular organisms concerned with basic
interactions with the environment, eukaryotes have a much more complex organization, with
cells that specialize in different functions (differentiation) and self-assemble in higher order
structures such as tissues and organs. By employing these varied functions in a more or less
coordinated and intentional fashion, the cells synergize towards building upon the basic needs
of survival and reproduction. They benefit from the apparent advantage of a larger set of skills
through which they can interact with other cells, individuals, species or the environment.
However, almost all the cells in a eukaryotic organism contain a copy of the DNA molecule of
the individual. In other words, each one of these cells contains a copy of the genome so it has
immediate access to all the biological information regarding the totality of aspects of the
particular individual. This information is structured in basic units known as genes, one gene
being nothing more than a specific section of the DNA, encoded as the order in which
nucleotides are laid out in linear sequence. Through biochemical reactions, this information is
used to synthesize proteins (mostly) which accomplish various tasks inside the organism in
order to support the processes of life.
But if each body cell contains the information describing not only what that cell is designated to
do but also what every other type of cell is designated to do, then how do cells differentiate
in the first place, and how does the organism know which subset of genes has to be active for
each cell type? At the same time, it is intuitive that gene expression doesn’t happen all the
time for every gene. While it is true that there are some genes (called housekeeping genes)
responsible for the permanent cycle of routines sustaining metabolism, it is also true that
there are genes that get expressed only under particular circumstances, usually when their
products are really needed inside the cell. This suggests that at the cellular level, there must
exist mechanisms regulating the activity of genes. The activity of these mechanisms falls under
the broad term of “gene regulation”. If life can be seen as a complex system of biochemical
processes consisting, at the lowest level, of using genomic information to act in a specific
manner, then gene regulation consists in controlling these processes and the way they are
interconnected.
Since the pathway of gene expression has several steps, it is useful to review their order
for a better understanding of where gene regulation can intervene. The DNA
molecule resides in the nucleus of the cell and contains all the genes. When a particular gene is
about to be expressed, a temporary copy of its information is created in the form of an RNA
molecule (more specifically, mRNA). This process is called “transcription” and takes place in the
nucleus. The mRNA molecule is then processed through a couple of other chemical reactions
responsible for improving its robustness. Then, the mRNA moves from the nucleus to the
cytoplasm and, based on the information transported from the nucleus, it instructs the
ribosomes to synthesize chains of amino acids called polypeptides, which combine to form the
protein product. This process is called “translation”. The pathway can be conceptually extended
at both the beginning and the end. At the end, the protein product is again subject to
transport or various other reactions which might stop it from accomplishing the task it is
meant for. At the beginning, the DNA molecule, although
identical in information in each body cell throughout the organism, differs from cell to cell in
the set of genes that are active.
These steps in gene expression where control mechanisms can act are:
- Pre-transcription (before transcription is initiated, e.g. active genes)
- Transcription (copying DNA into mRNA might be blocked through certain mechanisms or might
  require some auxiliary cellular activity)
- Post-transcription (mRNA might not be robust enough to make it to the ribosomes)
- Translation (the structure of mRNA might not be suitable for use by ribosomes)
- Post-translation (the protein product might be obstructed from pursuing its actions)
We can say there are two big categories of control mechanisms: some that facilitate or
induce a certain event and some that deny or inhibit a certain event.
An important observation that has to be made is that we cannot consider as control
mechanisms those various breakdowns inside the intermediary steps in protein synthesis that
have a temporary or non-deterministic origin, such as random faults or insufficient resources.
Regulation implies steadiness: under similar circumstances, similar results have to be
obtained, with little to no variation acceptable. Regulation does not happen by chance but
through means which are as deeply embedded in the organism as the processes under control.
At the same time, this tells us that regulation is a product of evolution as well, so it
appeared through the same natural principles of hit’n’miss, from mutation to mutation, until it
conferred an advantage in survival.
What follows is a list of gene expression regulatory mechanisms in chronological order,
from pre-transcription to post-translation:
- Chromatin structure
In eukaryotic organisms, the DNA strand can reach a length of 2 meters, but its diameter is on the
molecular scale. In order for it to fit inside the microscopic nucleus of a cell, it is packaged by
being wound around proteins called histones in a tight thread-and-spool fashion. These structural units
are called “nucleosomes” and are, in turn, tightly packed into structures known as
chromosomes. Studies done on DNA in vitro have shown that it makes a difference whether
key sites of a gene are wrapped around histones or suspended between them.
These sites allow certain proteins to bind to them, proteins which play an important role in
transcription to RNA. Two relevant examples are the transcriptional activator protein (TAP) and
the TATA-box-binding protein (TBP). In the attached figures we can see the unfavorable case
where the binding sites for these two proteins are inaccessible. However, there are several multiprotein
complexes called chromatin-remodeling complexes (CRC) that can be employed in order to
loosen DNA from the nucleosomes so that the TAP and TBP binding sites become accessible. This is shown in
the attached figures. Once this happens, the transcription factor can attach close to the gene
marked by these sites and is joined by RNA polymerase, which transcribes the particular
gene. The distribution of DNA over the nucleosomes is hard to predict from one cell
to another, but due to the tight compression of DNA it is very likely that there will be genes
whose binding sites are not accessible. In this case, CRCs act as regulators that make the
genes transcribable. Since CRCs are nothing but protein complexes, they are themselves synthesized from
information contained in other genes.
- Altering of DNA structure
Sometimes, changes occur in the DNA sequence of somatic cells and are transmitted to their
descendants. These changes are programmed and are either deletions or transpositions. They
have direct effects on gene expression, since the sequence of nucleotides is no longer the
same. An example of programmed deletions takes place in the bone-marrow-derived cells (B cells) and
thymus-derived cells (T cells) of vertebrate immune systems. They have complementary roles: B cells
produce antibodies that mark antigens for destruction, while T cells recognize antigens displayed
on the surface of cells and help eliminate infected cells. Each B cell is able to synthesize only one type of antibody, and
it has been discovered that this is the expression of a particular gene. However, each one of the
particular genes responsible for synthesizing antibodies was found to be a subsequence of a
longer initial sequence. This long sequence is cut and rejoined as the B cell matures, so that
different cells carry different antibody genes, each matching a type of antigen. One of the most commonly found antibodies in the
organism is immunoglobulin G, whose structure was found to resemble the letter Y. Across antibodies
targeting different antigens, most of the structure is chemically similar, except for the upper
ends. Their configuration was shown to be due to the way programmed deletions occur, which
bring the DNA sequence associated with the antibody in question next to the constant part of
the gene.
Another example of altering DNA structure is that of programmed transpositions in regulating
yeast mating type. This organism has two mating types, a and α. The difference lies only in
phenotype, as studies have revealed that the genotype contains biological information
about both mating types in the form of interchangeable cassettes. Through DNA rearrangement,
yeast can switch to either a or α in the lineage of a particular cell and mate accordingly.
- Alternative promoters
There are genes that have more than one associated promoter. From the same protein-coding
regions, depending on which one of the promoters is active, different transcripts can be
obtained. In this case, the control mechanism is the choice of active promoter, which
depends on the state of the cell or the stage of development. For example, the gene for alcohol dehydrogenase in Drosophila
uses one promoter in the larval stage and another one in the adult stage. This is a
fascinating and elegant solution, comparable to that of dynamic pointers in high-level
programming languages.
- Epigenetic control
“Epigenetic” is a term that means “on genes”. It refers to a type of control over gene expression
that is not caused by altering the sequence of bases in DNA, but by an external factor that
prevents a particular sequence from being read (transcribed) for what it is supposed to be. One
example is the addition of a methyl (CH3) group to the number-5 carbon atom of cytosine
bases. This process is called methylation and it causes a lower transcription rate of the
methylated sequence. Another type of epigenetic mechanism involves specialized proteins that
bind at a particular sequence of the DNA molecule, with the same effect on the transcription
rate. Heavy methylation is associated with the inactivation of genes on the X chromosome, for
example. As cells undergo division in females, there is a moment in the cell lineage where one
of the X chromosomes becomes inactive, and all descendants of that cell inherit this
particularity. Another example is that of “genomic imprinting” in mammals, where hundreds of
genes are heavily methylated in the germ line, but in a different pattern in males and females.
This is retained throughout embryonic development but it can be reversed later on in
development. Various theories suggest that this is a parental conservation mechanism acting at the
expense of the fetus, so that the exchange of resources between the
mother and the progeny stays balanced.
- Transcriptional initiation
The initiation of transcription is probably the most used regulatory strategy. Transcription takes
place in the nucleus and produces an mRNA molecule that carries information about a gene
encoded in the DNA molecule to the cytoplasm. This is achieved by RNA polymerase, which
copies a sequence of nucleotides into mRNA. Thus, RNA polymerase has to know where this
sequence of interest is located and when to start copying it. The latter issue usually involves
proteins known as inducers and repressors. Inducers correspond to positive regulation and
activate transcription of a certain gene. Without the activity or presence of an inducer, the gene
is inactive. When the cell needs the gene to be expressed, a chain of events unfolds so that the
inducer attaches at a location close to the gene (upstream), which signals the start of
transcription. If the gene is constantly active, then it is probably regulated negatively, through
proteins called repressors that bind upstream (by themselves or along with a protein complex)
and disable the expression of the gene until further events deem it necessary to resume.
The most important factor in transcriptional initiation is a protein called the transcriptional
activator protein (TAP). It binds upstream of the gene and recruits the transcription complex,
which in turn triggers the recruitment of the RNA polymerase holoenzyme. Transcriptional
activator proteins are mostly gene-specific; their action can be negatively regulated by proteins
that bind to them and block the transcription complex. Some categories of TAPs are those with
helix-turn-helix motifs and those with zinc fingers. As there are two types of regulation, positive and negative, there is
a large variety of possibilities for interacting with the environment. A protein product
that is required in special circumstances could be associated with a gene that is normally
inactive. The lack of the protein product might negatively regulate another gene that is
responsible for producing the TAP which enables the transcription of the gene associated with
our missing protein product. Another plausible scenario of regulation deals with synthesizing
products that defend against a high concentration of an unwanted molecule in the cell. When
the unwanted molecule is present, a normally inactive gene might be positively regulated by
the intruder and its associated product will start being produced.
Another class of regulatory mechanisms consists of DNA sequences found at a variety of locations
around genes, called enhancers and silencers. As the names suggest, their role
is either to hasten or strengthen transcription (enhancers) by binding the
transcriptional complex or, on the contrary, to prevent transcription (silencers).
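The positive and negative regulation described above can be sketched as a small Boolean rule. This is only an illustration under simplifying assumptions: the function `is_transcribed` and its flags are hypothetical names introduced here, and real promoters respond in a graded rather than strictly on-off manner.

```python
def is_transcribed(activator_bound: bool, repressor_bound: bool,
                   needs_activator: bool = True) -> bool:
    """Hypothetical Boolean rule for transcriptional initiation.

    A positively regulated gene (needs_activator=True) stays off unless an
    activator is bound. A constantly active gene (needs_activator=False)
    stays on unless a repressor is bound. A bound repressor always wins.
    """
    if repressor_bound:
        return False
    return activator_bound or not needs_activator


# Enumerate all combinations for a positively regulated gene.
for activator in (False, True):
    for repressor in (False, True):
        state = is_transcribed(activator, repressor)
        print(f"activator={activator!s:5} repressor={repressor!s:5} -> {state}")
```

Letting the repressor override the activator is a modeling choice made for this sketch; other precedence rules are equally easy to encode.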
- Transcript Processing
Transcription of the same gene under the same promoter can still yield different
mRNA molecules. This is due to an important feature of the genome called “alternative
splicing”. Since most eukaryotic genes consist of non-contiguous blocks of coding
sequence, the first draft of mRNA contains two types of sequences: exons, which make up the final
form of mRNA, and introns, which are removed. However, by varying the selection of exons
and introns in the post-transcription processing of mRNA, the cell can come up with more than
one product from the same gene. For example, the 30000 human genes can encode 64000
to 90000 proteins, based on this alternation. Thus, gene expression can be regulated to keep
certain sections of the initial mRNA molecule and discard others. This is governed by decisional
factors from within the cell as it processes the mRNA in order to obtain sequences that are
viable for translation. These decisional factors are means to regulate gene expression and act
through the same biochemical algorithms developed by evolution.
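To see how one gene can yield several transcripts, the selection of exons can be sketched as a simple enumeration. The exon names and the assumption that only some exons are optional (while their order is always preserved) are illustrative choices, not data from a real gene.

```python
from itertools import combinations


def splice_isoforms(exons, optional):
    """Return every isoform obtained by keeping or skipping optional exons.

    exons    -- exon names in genomic order
    optional -- set of exon names that may be skipped
    Exon order is always preserved; only inclusion varies.
    """
    optional_in_order = [e for e in exons if e in optional]
    isoforms = []
    for k in range(len(optional_in_order) + 1):
        for skipped in combinations(optional_in_order, k):
            isoforms.append(tuple(e for e in exons if e not in skipped))
    return isoforms


exons = ["E1", "E2", "E3", "E4"]  # hypothetical gene with four exons
for isoform in splice_isoforms(exons, optional={"E2", "E3"}):
    print("-".join(isoform))
```

With two optional exons the sketch yields four isoforms, which hints at how a limited number of genes can encode a much larger number of proteins.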
- RNA Transport
Once DNA has been transcribed into mRNA and the mRNA has been processed for translation, it
heads towards the ribosomes in the cytoplasm. Regulation factors have been
found that can stop it on its way, RNA interference being one of them. RNA interference works
through small RNA molecules that can cleave mRNA in non-translated sequences or even
block it from being translated by the ribosome. These molecules are of two types: small
interfering RNA (siRNA) and micro RNA (miRNA). They are produced in the cytoplasm from a
special molecule called double-stranded RNA, which is first chopped into smaller sequences
by the Dicer enzyme. These cleavage products are recruited by the RNA-induced silencing
complex (RISC), which then targets mRNA with complementary sequences. Their effect on the
mRNA is different: RISC with siRNA cleaves mRNA, while miRNA attaches to it and prevents
translation.
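The targeting of mRNA by complementary small RNAs can be illustrated with a short search for perfectly complementary sites. The sequences below are made up for the example, and requiring a perfect match is a simplification: it is closer to how siRNA is described above than to miRNA, which tolerates mismatches.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the sequence that base-pairs with `rna`, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_target_sites(mrna: str, guide: str):
    """Return 0-based start positions where the guide pairs perfectly with the mRNA."""
    target = reverse_complement(guide)
    return [i for i in range(len(mrna) - len(target) + 1)
            if mrna[i:i + len(target)] == target]


mrna = "AUGGCUACGUUAGCCGAUAA"   # hypothetical transcript
guide = "CGGCUAAC"              # hypothetical small RNA (real ones are longer)
print(find_target_sites(mrna, guide))
```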
- Transcript Stability
The mRNA molecule has a lifetime of about 3 hours in most eukaryotes and it is meant to
be translated in the same cell. This is due to the fact that each cell differentiates through the
active set of genes that describe the function of that cell. Under special circumstances, this rule
does not hold and the mechanisms that ensure a certain destination and lifetime for the
mRNA molecule are overridden. An example occurs in newly fertilized eggs, whose metabolism
translates preexisting cytoplasmic mRNAs transcribed by the mother. This is definitely not
common practice in mature organisms. In Drosophila, for example, this becomes possible
through the elongation of the poly-A tail of the mRNA. Another relevant example
is that of silkworm fibroin mRNA. During cocoon formation, the silk gland synthesizes silk
fibroin in large amounts. There are three factors controlling this unusual behavior: cells become
highly polyploid, accumulating a large number of chromosomes and hence copies of the silk gene;
the promoter of this gene is strong and enhances the rate of transcription; and the transcribed
mRNA is very stable, with a lifetime of days. At the same time, there are factors that can
speed up the degradation of mRNA. One of them is the deadenylation-dependent pathway,
through which an enzyme trims the poly-A tail of the mRNA, which makes it
susceptible to a decapping enzyme that removes the 5’ cap. Without the cap, the mRNA is unable to
initiate translation and is rapidly degraded by exonucleases. The other one is the
deadenylation-independent pathway, which either decaps or cleaves mRNA. These regulation
mechanisms are useful to prevent the synthesis of incomplete polypeptides in the cell.
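The effect of transcript stability on how much mRNA remains available can be sketched with first-order decay. This assumes, purely for illustration, that the quoted lifetimes behave like half-lives and that “days” means a half-life of 72 hours; neither assumption comes from the text above.

```python
def fraction_remaining(hours: float, half_life_hours: float) -> float:
    """Fraction of an mRNA pool left after `hours`, assuming exponential decay."""
    return 0.5 ** (hours / half_life_hours)


# Half-lives are assumptions: 3 h for a typical transcript, 72 h for a stable one.
for label, half_life in (("typical mRNA", 3.0), ("stable mRNA", 72.0)):
    left = fraction_remaining(12.0, half_life)
    print(f"{label}: {left:.1%} left after 12 h")
```

Under these assumptions the typical transcript is almost gone after half a day, while the stable one is still largely intact, which is the contrast the fibroin example relies on.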
- Initiation of Translation
Translation is the process through which mRNA is used by the ribosome to synthesize the
polypeptides that compose the protein. This process takes place outside of the nucleus and it is
independent of transcription. Eukaryotes can regulate gene expression at this level too. The
two basic types of regulation that can be imposed here are the obstruction or facilitation of
mRNA translation and the control of the rate at which proteins are produced. In contrast with the
examples given above in the case of transcript stability, here we are dealing with a regular
messenger RNA transcript, but with an intensification or relaxation of the translation process. The
most interesting example of regulation at this level is given by recently discovered small
regulatory RNA molecules complementary in sequence to mRNA. These are called “antisense
RNA” and they act by pairing with mRNA over short sequences, the consequence being either
inhibition or activation of translation. An example of inhibition can be found in E. coli,
where the OxyS regulatory RNA affects the gene fhlA (a TAP). This molecule has the
ability to bind at critical sites, rendering the mRNA unable to bind to the ribosome. On the
other hand, the DsrA regulatory RNA activates the translation of the gene rpoS, which
encodes a sigma factor for RNA polymerase that allows transcription of a new set of RNAs
from a special set of promoters at stationary phase in cell cultures, when the cell density is high
and the intensity of cell proliferation is low. The 5’ end of rpoS mRNA is self-complementary
and it folds into a hairpin, trapping the ribosome-binding site and the
translational start site. These sites become exposed under the effect of DsrA on the rpoS mRNA,
so translation can begin. See attached figures.
- Post-Translational Modification
After the protein has been synthesized by the ribosome in the form of a series of
polypeptides (chains of amino acids), its functions can be extended through further chemical
modifications consisting in joining other molecules to it, cleaving its structure at different sites
or changing some of its amino acid groups. These operations are usually performed by
specialized enzymes which can be considered the control mechanism active at this level. One of
the organelles responsible for this type of regulation is the Golgi apparatus.
- Protein Stability
Some proteins degrade faster than others. The rate at which they decay can be a consequence
of external molecules acting upon them, for example to correct an excess of the protein in question
or a fault which caused a protein to be active in the wrong cell. Another factor of decay can
be embedded in the protein, in the form of amino acid sequences that break down more
easily over time. This means that once a protein is synthesized, there are still ways of controlling its
behavior.
- Protein Transport
The last step in expressing a gene consists in transporting the protein to its designated
“workplace”. Responsible for moving proteins are, unsurprisingly, other proteins, called
carrier proteins. The most challenging transport occurs through the cellular membrane,
between two neighboring cells. This also hints at the possible reasons why protein transport
should be controlled: there has to exist a mechanism to check outgoing or incoming proteins
and make sure they are eligible for transport. Since these kinds of verifications can only occur
biochemically, the structure of the carrier protein enables it to verify the
compatibility of the protein to be transported with its destination.
A gene regulatory network (GRN), as the name implies, represents a set of genes
responsible for influencing the expression of another target gene whose product is required by
the cell. The term “network” suggests that the effect they have on the gene to be expressed is
similar to that of a network of on-off switches and potentiometers, so both digital and analog
controls. The scientific approach towards studying and modeling GRNs employs mathematical
concepts such as graph theory and combinatorial logic. At a basic level, a GRN can be thought
of as a black box that exerts one specific action of control on the gene to be expressed, an action
that can be represented as the resultant action of all the genes that are part of the network. But looking
at what happens at the molecular level, the situation is far more complicated. The individual
regulatory genes come into play at different times and determine different characteristics of
the protein synthesis process. Some of them are interconnected as adjacent units, where the
product of one gene directly communicates (reacts) with the next one in line, while some of
them can be considered on far off branches of the network; they work in parallel and appear to
be independent but their products cumulate after other conditions (part of the same GRN) are
met.
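The “network of on-off switches” view can be made concrete with a tiny Boolean network. The three genes, their names and their update rules below are hypothetical and chosen only to show how regulatory genes acting at different times converge on a target; the analog (“potentiometer”) side of regulation is deliberately left out.

```python
# Hypothetical three-gene network: names and rules are illustrative only.
RULES = {
    "activator": lambda s: s["signal"],            # switched on by an input signal
    "repressor": lambda s: not s["activator"],     # shut off once the activator is present
    "target":    lambda s: s["activator"] and not s["repressor"],
}


def step(state):
    """Synchronous update: every gene reads the previous state."""
    new_state = dict(state)
    for gene, rule in RULES.items():
        new_state[gene] = bool(rule(state))
    return new_state


def run(signal, steps=4):
    """Return the list of states visited, starting from a repressed target."""
    state = {"signal": signal, "activator": False, "repressor": True, "target": False}
    history = [state]
    for _ in range(steps):
        state = step(state)
        history.append(state)
    return history


for t, s in enumerate(run(signal=True)):
    print(t, {gene: int(on) for gene, on in s.items()})
```

In this sketch the target does not switch on immediately: the activator has to appear and the repressor has to disappear first, which mirrors the idea that individual regulatory genes come into play at different times.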
Considering the fact that gene expression has the ultimate single goal of providing a
finite product under the form of a protein, GRN can be seen as a converging network of
regulatory genes (like a funnel towards protein synthesis), orchestrating the intermediate steps
by enabling or disabling, amplifying or attenuating certain biochemical reactions. Most
definitions of GRN place their action in the transcriptional scope, as the GRN determines when
and how much RNA is transcribed for synthesis in the ribosome. However, considering the
various types of gene regulation presented above, an even larger scope of GRN extends to all
the steps undertaken inside the cell towards protein synthesis and usage.
Just as genes vary greatly in DNA sequence and proteins in their structure, GRNs can
come in different forms. Their common features are those given by their general role of
regulation. First of all, GRNs need to be able to read the features of interest in the environment
(cell, tissue, etc.) through input signals. These input signals could be the concentrations of
particular molecules such as proteins and hormones. Then, GRNs need to be able to generate the
appropriate output signals through which they influence the outcome of the target gene
expression process. Since we are working in the same molecular context, it comes as no
surprise that these output signals are molecules as well, mostly proteins. It is interesting to
note, from an engineering perspective, that this type of communication is neither asynchronous
nor synchronous and it involves neither closed nor open channels. The cell functions
so well exactly because there are no constraints imposed on communication. Input signals are
read from the “wild” and generate the release of output signals into the “wild”; other similar
mechanisms are responsible for transporting this output signal to its place of action where it
will act as a decisional factor. Also, like all regulatory structures, GRNs need a feedback loop
which can be seen as nothing more than input signals (from the GRN perspective) that were
generated by the environment after the GRN started taking action. As we can see, input, output
and feedback signals are not that different in concept, all of them being molecules. It is their
particular structure that makes the difference, but their particular structure is more than a
symbol or a tag, it is also the actual function they are designated for. We are literally talking
about a permanent circle of life, where nothing is enforced or requested, except that everybody
plays a small part, binding with the molecules with which it is compatible and treating them (or
the event of binding) as an input signal; based on this, other molecules are synthesized which
will serve as output signal and influence the expression of other “coworkers”. Only when we
place cellular life in an abstract framework and start classifying genes as regulatory or structural
are we able to see causalities.
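The feedback loop mentioned above can be sketched as a gene whose product represses its own synthesis. The equation, the parameter names (`beta`, `K`, `n`, `alpha`) and their values are assumptions chosen for illustration; the sketch uses simple Euler integration rather than a proper ODE solver.

```python
def simulate(beta=10.0, K=1.0, n=2, alpha=1.0, x0=0.0, dt=0.01, t_end=20.0):
    """Euler integration of dx/dt = beta / (1 + (x/K)**n) - alpha * x.

    x is the concentration of the gene product, which inhibits its own
    production (first term) and is degraded at rate alpha (second term).
    """
    x = x0
    trajectory = [x]
    for _ in range(int(t_end / dt)):
        production = beta / (1.0 + (x / K) ** n)
        x += dt * (production - alpha * x)
        trajectory.append(x)
    return trajectory


for start in (0.0, 5.0):
    levels = simulate(x0=start)
    print(f"start={start}: final level {levels[-1]:.3f}")
```

With these parameters the steady state satisfies x³ + x = 10, that is x = 2, and the simulation settles there whether it starts below or above that level. This is the “steadiness” expected from a regulated process.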
The relation of gene regulatory networks with the gene regulation mechanisms
presented above is the fact that GRNs can employ any of them as nodes of the network.
Moreover, as we can see from above, most of these mechanisms contain more than one step
so we can say that they are gene regulatory networks as well. Taking the most trivial example of
regulation, we could have a housekeeping gene that is always active and whose biochemical
pathway isn’t influenced by the action of any other regulatory gene. Even in this case, we still
have more than one regulatory mechanism involved, as transcription has to be started by a
transcriptional activator protein which is the expression of another gene, and mRNA has to be
processed and then translated into a protein. But even if we choose not to consider this a
network, no other regulatory mechanism is as straightforward; they all
increase the complexity of the pathway, so even under that assumption there is more than
one gene contributing to the final product. In my opinion as a control engineer, “gene
regulatory networks” is nothing more than the appropriate term for what actually happens
during gene expression. It is true, however, that we can establish degrees of complexity in
these networks, based on the number of participant genes (number of nodes) and the
interactions between them (number of arcs). For example, it would be unfair to place under the
same category housekeeping genes with the complex system of genes controlling early
development in embryos. The threshold between simple networks and complex networks is
debatable, so approaches such as complexity theory and chaos theory could also be
suitable for their study; they would also eliminate the bias in classifying some of the genes
as structural and some as functional, which is only a matter of perspective.
The two papers studying chromatin structures and gene regulation I chose are:
“Mechanism of Protein Access to Specific DNA Sequences in Chromatin: A Dynamic Equilibrium
Model for Gene Regulation”,
K. J. Polach and J. Widom
J. Mol. Biol. (1995) 254, 130–149
“High-throughput mapping of the chromatin structure of human promoters”,
Fatih Ozsolak, Jun S Song, X Shirley Liu and David E Fisher
Nature Biotechnology, Vol. 25, No. 2, Feb. 2007
The first paper deals with the problem that certain DNA sequences tightly wrapped around
nucleosomes are not accessible to transcription regulatory proteins, yet they are still
transcribed. The authors present three candidate explanations: proteins bind before DNA is
packaged in chromosomes, the DNA sequences of interest are never actually packaged, or there
are mechanisms of active invasion in order to transcribe those sequences. They provide
counter-arguments for all three models. For the first one, cells that are prevented from
replicating their DNA still undergo transcription. The second one is dismissed through
physical considerations, as there is nothing in the structure of DNA that could enforce the same
distribution along the nucleosomes in every cell of the organism. And for the third one, the problem is
that it lacks an explanation for how proteins are able to target the right nucleosome.
The authors try to extend the third model by considering the nucleosomes as dynamic
structures that temporarily expose stretches of their DNA. They use an approach based on
mathematically modeling the kinetics of nucleosomes and trying to prove the correctness of
their model through the laws of energy conservation. In parallel, they perform
experiments in vitro with sea urchin cells by replacing the regulatory protein with a restriction
enzyme and engineering nucleosomes with sites for E. In this way, they expect to detect
through gel electrophoresis the effect of the enzyme on the nucleosomes. From their
experiments they observe that all restriction sites are cleaved, hence accessible. At the same
time, they estimate an equilibrium constant that quantifies how deep inside the
nucleosome the restriction site is located; the further it is from exposure, the more energy
is needed to dislodge it. The authors conclude that their assumption is true and the model is a
good approximation of the underlying mechanics. Nucleosomes are not static and temporarily
expose the binding sites needed for regulatory proteins. These proteins use that short window
of time to attach to the promoter and they recruit other protein complexes which help
displace the nucleosomes for proper transcription.
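The link between an equilibrium constant for site exposure and the energy needed to expose the site can be sketched with the standard thermodynamic relation ΔG = -RT ln K. The equilibrium constants below are hypothetical values picked for illustration, not figures reported in the paper, and the function name is an assumption.

```python
import math

R = 8.314462618  # gas constant, J/(mol*K)


def exposure_free_energy(k_eq: float, temperature_k: float = 298.15) -> float:
    """Free energy cost (kJ/mol) of exposing a site, from delta_G = -R*T*ln(K)."""
    return -R * temperature_k * math.log(k_eq) / 1000.0


# Hypothetical equilibrium constants: a smaller K means a less often exposed site.
for k_eq in (1e-2, 1e-4):
    print(f"K_eq = {k_eq:g}: {exposure_free_energy(k_eq):.1f} kJ/mol")
```

The smaller the equilibrium constant, the larger the free energy cost, which matches the observation that sites buried deeper in the nucleosome need more energy to become accessible.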
The second paper tries to address the problem of observing the motion of nucleosomes in
experiments. The authors present a high-throughput microarray approach and an analysis
algorithm for examining nucleosome positioning in the promoters of 3600 human genes.
First, they perform an in vivo footprinting experiment on the DNA of human
cancer cells and hybridize isolated nucleosomal and input DNA to the microarray. They study
the data using signal-processing techniques such as wavelet decomposition against
high-frequency noise and edge detection for curve profiling. These curves have oscillatory shapes
and based on the peaks of the oscillations, the location of each nucleosome is inferred. Then
they focus on extrapolating from their data the locations of transcription factors and discover these
to be mostly between nucleosomes. The paper was interesting to read as it involved the use of
signal processing and statistical analysis.
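The idea of inferring nucleosome locations from the peaks of a denoised, oscillating profile can be sketched as follows. This is not the authors' algorithm: a plain moving average stands in for their wavelet denoising, a local-maximum search stands in for their edge detection, and the profile is synthetic.

```python
import math


def moving_average(signal, window=5):
    """Smooth with a centered window; edges use a shortened window."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        chunk = signal[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def find_peaks(signal, min_height=0.0):
    """Indices of strict local maxima that rise above min_height."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i] > min_height and signal[i - 1] < signal[i] > signal[i + 1]]


# Synthetic profile: a slow oscillation (period 40 probes) plus a fast ripple as noise.
profile = [math.cos(2 * math.pi * (i - 20) / 40) + 0.05 * math.cos(math.pi * i)
           for i in range(81)]

print("raw peaks:     ", find_peaks(profile))
print("smoothed peaks:", find_peaks(moving_average(profile)))
```

On the raw profile the ripple produces several spurious maxima, while after smoothing only the two maxima of the slow oscillation, at positions 20 and 60, remain.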