Synthetic Biology

Project Proposal
Qaiser Habib (1308149)
Savasteeva Natasha
Raheel Malik (9755551)
Table of Contents
Abstract
Chapter 1: Random Number Generators
1.1. Usage and Application of Random numbers
1.2. Random number generators
1.3. Pseudo-Random Number Generators (PRNG)
1.4. True Random Number Generators (TRNGs)
1.5. Generation Methods
1.5.1. Physical Methods
1.5.2. Computational Methods
1.5.3. Generation from probability distribution
Chapter 2: Molecular Biology
DNA
DNA Structure
Transcription
Translation
Regulatory Elements
Transformation
Synthetic Biology
Chapter 3: Objectives
Chapter 4: Proposal
Construction
Functioning
Implementation
Risks
Conclusion
References
Abstract
Random numbers are important in many fields of science and the arts, with applications ranging
from telecommunications to gambling and data sampling. The literature describes various methods
for generating them. Pseudorandom numbers are generated by a deterministic mathematical formula,
made as complex as the specific usage requires. True random numbers are generated by physical or
mechanical processes and follow no predetermined pattern. Pseudorandom number generators are
widely used in computational applications. True random number generators are preferred where
unpredictability matters more than speed.
This report presents one method for generating random numbers: exploiting noise in biological
systems. We present the construction of a gene regulatory network that exploits the race
condition between two repressors competing for establishment. Whichever repressor becomes active
produces a ‘0’ or a ‘1’ state. The two states are reported by expression of the gene encoding
either Green Fluorescent Protein or Red Fluorescent Protein. The network is designed so that both
states are virtually equally likely. Each run thus yields one random bit, and bits can be cascaded
to form n-bit random numbers.
Chapter 1: Random Number Generators
Although defining a random number may look simple at first sight, it proves difficult in
practice. A random number is one generated by a process whose outcome is unpredictable and cannot
be reliably reproduced. The generating process is therefore itself a probabilistic process. Given
a single number, it is impossible to verify whether it was produced by a random number generator.
To study the randomness of a generator's output, one must therefore consider sequences of numbers.
First, however, we should ask why random numbers are needed at all. The following section
discusses the need for random numbers, their usage and their applications.
1.1. Usage and Application of Random numbers
Random numbers are useful for a variety of purposes, such as generating data encryption
keys, simulating and modeling complex phenomena, and selecting random samples from
larger data sets. They have also been used aesthetically, for example in literature and music,
and are of course ever popular for games. When discussing single numbers, a random
number is one drawn from a set of possible values, each of which is equally probable,
i.e. from a uniform distribution. When discussing a sequence of random numbers, each number
drawn must also be statistically independent of the others.
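A common empirical check of the uniformity property is Pearson's chi-square statistic, one of the statistical tests covered at length in Knuth [2]. It counts how often each possible value occurs and compares those counts with the counts expected under a uniform distribution. The Python sketch below is illustrative only. The function name `chi_square_uniform` and the die-roll data are our own. The statistic checks uniformity only, not independence.

```python
from collections import Counter

def chi_square_uniform(counts):
    """Pearson's chi-square statistic for observed counts against a uniform distribution."""
    total = sum(counts)
    expected = total / len(counts)  # every possible value is equally probable
    return sum((observed - expected) ** 2 / expected for observed in counts)

# Example: 60 rolls of a six-sided die (hypothetical data).
rolls = [1, 2, 3, 4, 5, 6] * 9 + [1, 1, 2, 3, 4, 6]
tally = Counter(rolls)
counts = [tally[face] for face in range(1, 7)]
stat = chi_square_uniform(counts)
# With 6 faces there are 5 degrees of freedom; values above about 11.07
# would reject uniformity at the 5% significance level.
print(counts, round(stat, 2))  # [11, 10, 10, 10, 9, 10] 0.2
```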
The many applications of randomness have led to the development of several different
methods for generating random data. Many of these have existed since ancient times,
including dice, coin flipping, the shuffling of playing cards, and many other techniques.
Because of the mechanical nature of these techniques, generating large amounts of
sufficiently random numbers (important in statistics) required considerable time and effort.
Results were therefore sometimes collected and distributed as random number tables.
1.2. Random number generators
Since the advent of computational random number generators (RNGs), a growing number of
government-run lotteries and lottery games use them instead of more traditional drawing
methods. RNGs are also used today to determine the outcomes of modern slot machines.
A random number generator (RNG) is a computational or physical device designed to generate a
sequence of numbers or symbols that lack any pattern, i.e. appear random. Random number
generators have applications in gambling, statistical sampling, computer cryptography, completely
randomized design, and other areas where producing an unpredictable result is desirable. The
generation of pseudo-random numbers is an important and common task in computer programming.
Cryptography and certain numerical algorithms require a very high degree of apparent randomness.
Weaker forms of randomness are used in hash algorithms and in amortized searching and sorting
algorithms. With the advent of computers, programmers recognized the need for a means of
introducing randomness into a computer program. However, surprising as it may seem, it is
difficult to get a computer to do something by chance. A computer follows its instructions
blindly and is therefore completely predictable. (A computer that does not follow its
instructions in this manner is broken.) There are two main approaches to generating random
numbers with a computer: Pseudo-Random Number Generators (PRNGs) and True Random Number
Generators (TRNGs). The two approaches have quite different characteristics, and each has its
pros and cons. The following sections discuss these approaches.
1.3. Pseudo-Random Number Generators (PRNG)
A Pseudo-Random Number Generator (PRNG), also known as a deterministic random bit
generator (DRBG) [1], is an algorithm for generating a sequence of numbers that
approximates the properties of random numbers. The sequence cannot be called truly
random, because it is entirely determined by a relatively small set of initial values, called
the PRNG's seed or initial state. Hardware RNGs, by contrast, can generate sequences that are
close to truly random. The most common classes of PRNGs are linear congruential generators,
lagged Fibonacci generators, linear feedback shift registers, feedback with carry shift
registers, and generalised feedback shift registers. Modern examples of pseudorandom
algorithms include Blum Blum Shub, Fortuna, and the Mersenne Twister.
Pseudorandom numbers are not truly random because the generation process is initialized with a
seed value (or seed state), the initial state from which the generator starts. A PRNG
initialized with a given seed always produces the same sequence thereafter. The maximum
length of the sequence before it begins to repeat is bounded by the size of the state, measured
in bits. However, since the maximum period can double with each bit of state added, it is easy
to build PRNGs with periods long enough for many practical applications.
If a PRNG's internal state contains $n$ bits, its period can be no longer than $2^n$ results. For
some PRNGs, the period length can be calculated without going through the whole period.
Linear Feedback Shift Registers (LFSRs) are usually chosen to have periods of exactly
$2^n - 1$, and linear congruential generators have periods that can be calculated by factoring.
Although PRNGs repeat their results after they reach the end of their period, a repeated
result does not imply that the end of the period has been reached, since the internal state may
be larger than the output. This is particularly obvious for PRNGs with a 1-bit output.
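The period bound can be made concrete with a small Fibonacci-style LFSR, stepped until its state returns to the seed. The Python sketch below uses tap positions chosen purely for illustration; they are not taken from the text.

- Taps (0, 1) on a 4-bit register implement the primitive feedback polynomial $x^4 + x + 1$ and reach the maximal period $2^4 - 1 = 15$.
- Taps (0, 2) correspond to the non-primitive polynomial $x^4 + x^2 + 1$ and give a much shorter cycle.

```python
def lfsr_step(state: int, taps: tuple[int, ...], nbits: int) -> int:
    """One step of a Fibonacci LFSR: XOR the tapped bits and shift the result in at the top."""
    feedback = 0
    for t in taps:
        feedback ^= (state >> t) & 1
    return (state >> 1) | (feedback << (nbits - 1))

def lfsr_period(seed: int, taps: tuple[int, ...], nbits: int) -> int:
    """Count steps until the state returns to the (non-zero) seed.

    Including tap 0 makes the update invertible, so every state lies on a cycle
    and the loop is guaranteed to terminate.
    """
    state, steps = lfsr_step(seed, taps, nbits), 1
    while state != seed:
        state, steps = lfsr_step(state, taps, nbits), steps + 1
    return steps

print(lfsr_period(0b0001, (0, 1), 4))  # 15 = 2**4 - 1, the maximal period
print(lfsr_period(0b0001, (0, 2), 4))  # 6: non-primitive taps give a short cycle
```

The 1-bit output of the maximal register (the low bit of each state) contains runs such as 0, 0, 0. A repeated output bit therefore says nothing about whether the 15-step period has ended.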
1.4. True Random Number Generators (TRNGs)
A hardware or true random number generator is a piece of electronic hardware, attached to a
computer, that generates genuine random numbers. These contrast with the pseudo-random numbers
produced by a computer program.
TRNG devices are often based on microscopic physical phenomena that generate a low-level,
statistically random "noise" signal, such as thermal noise or the photoelectric effect. These
processes are, in theory, completely random because they do not depend on any initial or seed
value.
A quantum-based hardware random number generator typically consists of three parts:
a transducer that converts some aspect of the physical noise phenomenon into an electrical
signal; an amplifier and other circuitry that bring the transducer's output into a measurable
range; and some type of analog-to-digital converter that converts the output into a digital
representation, often a single binary digit 0 or 1. By repeatedly sampling the randomly
varying signal, a sequence of random numbers is obtained.
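The sampling stage can be sketched in Python as follows. This is only a simulation. Python's `random.Random.gauss`, itself a PRNG, stands in for the amplified analog noise voltage. A comparator with a configurable threshold plays the role of a one-bit analog-to-digital converter. The function name and parameter values are illustrative. The second call shows why the threshold matters: an offset of half a standard deviation biases the output towards 0.

```python
import random

def sample_bits(n: int, rng: random.Random, threshold: float = 0.0, sigma: float = 1.0) -> list[int]:
    """Simulate a 1-bit digitizer: emit 1 whenever the noise sample exceeds the threshold."""
    # rng.gauss(...) is a pseudo-random stand-in for an amplified physical noise voltage.
    return [1 if rng.gauss(0.0, sigma) > threshold else 0 for _ in range(n)]

rng = random.Random(42)
balanced = sample_bits(10_000, rng)
offset = sample_bits(10_000, rng, threshold=0.5)
print(sum(balanced) / len(balanced))  # roughly 0.5
print(sum(offset) / len(offset))      # roughly 0.31, since P(Z > 0.5) is about 0.3085
```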
1.5. Generation Methods
Some of the earliest methods of random number generation, such as rolling dice and flipping
coins, are still in use. Methods for random number generation can be grouped into the
following general categories.
Fig. 1: Realization of RNGs. Generation methods for random numbers: physical methods, computational methods, and generation from a probability distribution.
1.5.1. Physical Methods
The original methods for generating random numbers (dice, coin flipping, roulette wheels) are
still used today, mainly in games. They tend to be too slow for most applications in statistics
and cryptography.
A physical random number generator can be based on an essentially random atomic or
subatomic physical phenomenon. It is then a true RNG, whose unpredictability can be traced to
the laws of quantum mechanics. Sources of entropy include radioactive decay, thermal noise,
shot noise, avalanche noise in Zener diodes, clock drift, and radio noise. A team at Bar-Ilan
University in Israel has built a physical random bit generator operating at 300 Gbit/s, the
fastest reported at the time [4].
Various imaginative ways of collecting this entropic information have been devised. Another
common entropy source is the behaviour of the human users of a system. People are not
considered good randomness generators upon request, but they generate random behaviour quite
well in the context of playing mixed-strategy games [5]. Some security-related computer
software requires the user to make a lengthy series of mouse movements or keyboard inputs.
This creates enough entropy to generate random keys or to initialize pseudorandom number
generators [6].
1.5.2. Computational Methods
Computational methods are a class of pseudo-random number generators (PRNGs). They are based on
algorithms that can automatically create long sequences of numbers with good random properties,
although the sequence eventually repeats. The sequence of values generated by such an algorithm
is determined by a fixed number called a seed. One of the most common PRNGs is the linear
congruential generator, which uses the recurrence

$$X_{n+1} = (a X_n + b) \bmod m \qquad (1)$$

to generate numbers, where $a$ is the multiplier, $b$ the increment and $m$ the modulus. The
maximum number of distinct values Equation (1) can produce, and hence its longest possible
period, is the modulus $m$. A single linear congruential generator has certain non-random
properties. To avoid them, several such generators with slightly different values of the
multiplier $a$ can be run in parallel, with a "master" random number generator selecting
among them.
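Equation (1) can be sketched in a few lines of Python. The parameters below are toy values chosen for illustration, not recommendations.

- With $m = 16$, $a = 5$ and $b = 3$, the generator reaches the maximum period $m$, as predicted by the Hull–Dobell theorem. For $b \neq 0$, that theorem gives full period if and only if $b$ and $m$ are coprime, $a - 1$ is divisible by every prime factor of $m$, and $a - 1$ is divisible by 4 when $m$ is.
- Changing the multiplier to $a = 4$ violates the second condition, and the sequence collapses to a fixed point.

```python
from itertools import islice

def lcg(seed: int, a: int, b: int, m: int):
    """Yield X_1, X_2, ... from X_{n+1} = (a * X_n + b) mod m, starting at X_0 = seed."""
    x = seed
    while True:
        x = (a * x + b) % m
        yield x

def lcg_period(seed: int, a: int, b: int, m: int) -> int:
    """Length of the cycle the sequence eventually enters (at most m)."""
    first_seen = {}
    x, n = seed, 0
    while x not in first_seen:
        first_seen[x] = n
        x, n = (a * x + b) % m, n + 1
    return n - first_seen[x]

print(list(islice(lcg(0, 5, 3, 16), 5)))  # [3, 2, 13, 4, 7]
print(lcg_period(0, 5, 3, 16))            # 16: full period, equal to m
print(lcg_period(0, 4, 3, 16))            # 1: 0 -> 3 -> 15 -> 15 -> ...
```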
1.5.3. Generation from probability distribution
Some methods generate random numbers that follow a given probability density function.
These methods transform uniform random numbers in some way. Because they only transform
their input, they work equally well with true and pseudo-random numbers.
One of them is the inversion method. It integrates the density to obtain the cumulative
distribution function, then returns the smallest value whose cumulative probability is greater
than or equal to the uniform random number [7].
A second method, the acceptance-rejection method, draws a candidate x value and a y value
between zero and an upper bound of the density, then tests whether the density at x is greater
than y. If it is, the x value is accepted. Otherwise, it is rejected and the algorithm tries
again [8].
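Both methods can be sketched in a few lines of Python. The target distributions are illustrative choices, not taken from the text.

- Inversion is applied to the exponential distribution. Its cumulative distribution function $F(x) = 1 - e^{-\lambda x}$ inverts in closed form to $x = -\ln(1 - u)/\lambda$.
- Acceptance-rejection is applied to the density $f(x) = 2x$ on $[0, 1]$, which is bounded above by 2.

Any uniform source, true or pseudo-random, could be plugged in as `rng`.

```python
import math
import random

def sample_exponential(lam: float, rng: random.Random) -> float:
    """Inversion method: solve F(x) = u for x, where F(x) = 1 - exp(-lam * x)."""
    u = rng.random()                 # uniform on [0, 1)
    return -math.log(1.0 - u) / lam  # 1 - u lies in (0, 1], so the log is defined

def sample_triangular(rng: random.Random) -> float:
    """Acceptance-rejection for f(x) = 2x on [0, 1], using the bound f(x) <= 2."""
    while True:
        x = rng.random()             # candidate x, uniform on the domain
        y = 2.0 * rng.random()       # y, uniform between 0 and the bound
        if 2.0 * x > y:              # accept x if f(x) exceeds y
            return x
        # otherwise reject x and try again

rng = random.Random(7)
exp_samples = [sample_exponential(2.0, rng) for _ in range(100_000)]
tri_samples = [sample_triangular(rng) for _ in range(100_000)]
print(sum(exp_samples) / len(exp_samples))  # close to 1 / lam = 0.5
print(sum(tri_samples) / len(tri_samples))  # close to 2/3
```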
Chapter 2: Molecular Biology
Molecular biology is the branch of biology that explains biological systems at the molecular
level. It overlaps with other areas of biology and chemistry, such as genetics and
biochemistry. It is concerned with the interactions between the various systems of a cell, e.g.
between DNA, RNA and protein biosynthesis, as well as with the regulation of these
interactions.
DNA
Deoxyribonucleic acid (DNA) is a nucleic acid that contains the hereditary information used
in the development and functioning of all known living organisms. The main role of
DNA molecules is the long-term storage of information. The DNA segments that carry
genetic information are called genes. Other DNA sequences have structural purposes or
are involved in regulating the use of this genetic information [9].
DNA Structure
A DNA molecule consists of two antiparallel helical strands, each a polymer of simple units
called nucleotides [10]. DNA contains four types of bases, and it is the sequence of these
bases that encodes the hereditary information.
This information is interpreted using the genetic code, which specifies the sequence of
amino acids within proteins. The code is read by copying stretches of DNA into the
related nucleic acid RNA, in a process called transcription.
Transcription
Transcription is the process of creating a complementary RNA copy of a DNA sequence.
RNA and DNA are both nucleic acids, whose complementary base pairing acts as a common
language. Information can therefore be converted back and forth between DNA and RNA by the
appropriate enzymes. During transcription, a DNA sequence is read by RNA polymerase, which
produces a complementary, antiparallel RNA strand. Unlike DNA replication, transcription
yields an RNA complement that contains uracil (U) wherever thymine (T) would have occurred
in a DNA complement [10]. A transcript that encodes a protein is called messenger RNA (mRNA).
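The base-pairing rule can be expressed as a short Python function. As a simplifying assumption, the input is the template strand written 5'→3'. The mRNA is then its reverse complement with U in place of T, so the output is also written 5'→3'. Real transcription additionally depends on promoters and terminators, which this sketch ignores.

```python
# RNA base paired with each DNA template base; U replaces T in RNA.
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_5_to_3: str) -> str:
    """Return the mRNA (5'->3') complementary and antiparallel to a DNA template strand (5'->3')."""
    # Reading the template backwards reproduces the antiparallel orientation.
    return "".join(DNA_TO_RNA_PAIR[base] for base in reversed(template_5_to_3.upper()))

print(transcribe("ATGC"))  # GCAU
```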
Transcription can be summarised in the following steps, each moving like a wave along the DNA.
The fifth step applies only to cells with a nucleus.
1. RNA polymerase unwinds/"unzips" the DNA by breaking the hydrogen bonds between
complementary nucleotides.
2. RNA nucleotides are paired with complementary DNA bases.
3. RNA sugar-phosphate backbone forms with assistance from RNA polymerase.
4. Hydrogen bonds of the untwisted RNA+DNA helix break, freeing the newly synthesized
RNA strand.
5. If the cell has a nucleus, the RNA is further processed and then moves through the small
nuclear pores to the cytoplasm.
Translation
Translation is the process in which the messenger RNA is read three bases (one codon) at a time
and converted into a chain of amino acids, a polypeptide, which forms a protein. This process
can thus be considered as converting a coding sequence into an actual protein. Different coding
sequences give rise to different proteins, so by altering those sequences we can control which
proteins are synthesized.
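Codon-by-codon reading can be illustrated in Python with a deliberately tiny codon table. Only five entries of the standard genetic code are included:

- AUG: methionine, which also serves as the start codon
- UUU: phenylalanine
- GGC: glycine
- AAA: lysine
- UAA: stop codon

A real translator would need all 64 codons. The function name is our own.

```python
# A small subset of the standard genetic code (one-letter amino acid symbols).
CODON_TABLE = {"AUG": "M", "UUU": "F", "GGC": "G", "AAA": "K", "UAA": "*"}

def translate(mrna: str) -> str:
    """Translate an mRNA from its first AUG, three bases at a time, until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, no protein
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]  # KeyError for codons outside this toy table
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)

print(translate("GCAUGUUUGGCAAAUAAGC"))  # MFGK
```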
Regulatory Elements
Regulatory elements are agents that control gene expression at different levels, such as
transcription and translation. Transcriptional regulation can be accomplished using repressors
and attenuation. Repressors are proteins that bind specific DNA sequences and prevent RNA
polymerase activity, so that no mRNA is made. Examples include LacI, C1 and TetR.
Translational regulation controls the translation of mRNA into proteins. It is usually achieved
through mechanisms such as RNAi and ribozymes.
Transformation
Transformation is the uptake of exogenous genetic material by cells from their surroundings
through the cell membrane. Transformation occurs in nature but can also be effected by
artificial means. Bacteria that are capable of being transformed, whether naturally or
artificially, are called competent. Artificial transformation is usually carried out using
plasmids: DNA molecules that are separate from, and can replicate independently of, the
chromosomal DNA.
Synthetic Biology
Synthetic biology is the design of biological systems using the tools of molecular biology. All
living cells contain DNA, a double-helix molecule made up in part of genes, each of which
encodes a specific protein. Using regulatory elements, we can therefore manipulate the
expression of these genes to some extent as desired. Synthetic biology is a relatively new
field. So far it has led to the synthesis of various chemical compounds and enzymes that are
not naturally produced by a particular cell type or species.
Chapter 3: Objectives
The project’s objectives can be outlined as follows:
1. To understand the concepts behind implementing physical and mathematical systems
based on biological designs.
2. To illustrate the applicability of biological systems for real-life scenarios.
3. To design a novel biological system to generate true random numbers based on noise in
gene regulatory networks.
4. To illustrate the boundaries between predictable and unpredictable aspects of biological
systems.
Chapter 4: Proposal
In this work, we propose the construction of a genetic random number generator module using
techniques of genetic engineering. This can be accomplished by introducing into Escherichia
coli cells an artificial network based on the naturally occurring λ-bacteriophage toggle
switch [11].
Fig. 2: State I. The Cro repressor is active and hence GFP is expressed. Plac* denotes the lac promoter carrying the OR3 operator sequence. Plac# denotes the lac promoter carrying the OR1 operator sequence.
Diagram elements:
- λ fragment: PL, PR, PRM and PRE promoters; CIII, N, C1, CII and Cro genes; tetracycline resistance marker (Tetr)
- Plac-driven, loxP-flanked EGFP followed by Cre
- Plac*-driven antisense gfp RNA
- Plac#-driven antisense Cre RNA
Fig. 3: State II. The C1 repressor is active and hence RFP is expressed.
Diagram elements:
- C1
- Plac with loxP-flanked ERFP and Cre
- Plac*-driven antisense gfp RNA
- Plac#-driven antisense Cre RNA
Construction
To build our system, we use a fragment of DNA derived from the lambda phage genome. It contains
the genes encoding the C1 repressor, the CII and CIII assistant proteins, the N antiterminator
protein and a mutated version of the Cro repressor. The fragment is flanked by specific
sequences that allow it to be integrated into the bacterial chromosome via in vivo
recombination, and it carries a tetracycline resistance selection marker. To make the system
inducible, a LacI binding site is to be inserted immediately downstream of the −10 box of the
PR promoter. As a result, no Cro transcription will take place until isopropyl
β-D-1-thiogalactopyranoside (IPTG) is added to the medium.
The other network components include a gene encoding Enhanced Green Fluorescent Protein (EGFP),
followed by the gene for Cre recombinase fused to an ssrA tag that causes rapid protein
degradation. Both genes are under control of the Plac promoter. A transcriptional terminator is
situated after the Cre gene. The corresponding antisense [12] RNAs for EGFP and Cre are
controlled by hybrid promoters:
- The promoter for the recombinase antisense RNA carries the OR1 operator site, so its
expression is turned off upon C1 binding.
- The promoter upstream of the EGFP antisense RNA carries the OR3 operator site, so its
transcription is prevented upon Cro binding.
The last circuit element is a gene encoding Enhanced Red Fluorescent Protein (ERFP), situated
between the recombinase recognition sites. It lacks a promoter but has a transcriptional
terminator. Once flipped by the recombinase, however, it can be expressed from the Plac
promoter that drives EGFP. The loxP sequences recognized by Cre are placed in the 5'
untranslated region (UTR) of the EGFP gene and between the EGFP stop codon and the Cre
ribosome binding site (RBS).
Functioning
To generate true random numbers, we first have to adjust the expression of both lambda
repressors to the system requirements. In nature, they are not expressed simultaneously. C1
synthesis is significantly delayed because its establishment depends on the presence of the CII
and CIII proteins, whereas Cro is produced as soon as the phage DNA enters the host cell. To
equalize the chances of Cro and C1, Cro polypeptides bearing mutations in the DNA-binding
domain, the dimerization domain, or both have to be created first. Such mutations would weaken
the binding affinity of Cro for the operator sequence at low concentrations. The mutant whose
outcome is closest to the desired 50:50 ratio is then to be used in the circuit.
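The effect of balancing the two repressors can be explored with a toy stochastic model, sketched below in Python. This is a simplification of our own, not a model of the actual λ circuit:

- Cro and C1 are each synthesized at a maximal rate (`a_cro`, `a_c1`) that is repressed by the other protein through a Hill function with threshold `k` and cooperativity `n`.
- Each molecule degrades at rate `d`.
- The race is declared won once one protein outnumbers the other by `threshold` molecules.

Reactions are chosen with probability proportional to their rates, as in Gillespie's stochastic simulation algorithm. Time is not tracked because only the winner matters. All parameter values are hypothetical. With equal synthesis rates, each repressor wins about half the time. Giving Cro a head start, modelled here simply as a higher synthesis rate, skews the outcome strongly.

```python
import random

# Changes in (Cro, C1) counts for: Cro synthesis, C1 synthesis, Cro decay, C1 decay.
DELTAS = ((1, 0), (0, 1), (-1, 0), (0, -1))

def race(a_cro: float, a_c1: float, rng: random.Random,
         k: float = 10.0, n: int = 2, d: float = 1.0, threshold: int = 20) -> int:
    """Simulate one Cro/C1 race; return 0 if Cro wins (GFP) and 1 if C1 wins (RFP)."""
    cro = c1 = 0  # molecule counts start at zero after induction
    while abs(cro - c1) < threshold:
        rates = (
            a_cro / (1 + (c1 / k) ** n),  # Cro synthesis, repressed by C1
            a_c1 / (1 + (cro / k) ** n),  # C1 synthesis, repressed by Cro
            d * cro,                      # Cro degradation
            d * c1,                       # C1 degradation
        )
        # Pick the next reaction with probability proportional to its rate.
        d_cro, d_c1 = rng.choices(DELTAS, weights=rates)[0]
        cro, c1 = cro + d_cro, c1 + d_c1
    return 1 if c1 > cro else 0

rng = random.Random(1)
balanced = [race(50.0, 50.0, rng) for _ in range(400)]
skewed = [race(80.0, 40.0, rng) for _ in range(200)]
print(sum(balanced) / len(balanced))  # roughly 0.5
print(sum(skewed) / len(skewed))      # well below 0.5: Cro usually wins
```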
Leaky expression from the lac promoter could produce Cre recombinase and cause spontaneous
inversion. To diminish this effect, the ribosome binding site upstream of the Cre coding region
should be weak enough that binding of the antisense RNA to the Cre mRNA takes precedence over
ribosome assembly and translation initiation.
The first step of the model building is the introduction of the race unit of the circuit, the
lambda phage DNA fragment, into the bacterial chromosome. Chromosomal insertion ensures that
only one copy of this piece of DNA is present in the cell. Otherwise, the outcome of the logic
would be ambiguous. With multiple copies, each would reach its own end result. Since all the
proteins involved in the network act in trans, the DNA fragment responsible for visualizing
the outcome would likely be flipped repeatedly, and both proteins would be synthesized at the
same time.
Unlike the race element, the other network component is present in several copies. It contains
the fluorescent protein genes as well as both the antisense RNA and recombinase sequences, and
its multiple copies allow fast accumulation of the reporter protein. The next step is therefore
to transform the cells with low-copy plasmids bearing this processing part of the circuit.
Upon induction by IPTG, Cro is transcribed and translated. Which of the two repressors becomes
established cannot be predicted, but it can be visualized post factum. Cro establishment blocks
synthesis of the EGFP antisense RNA. This permits translation of the EGFP messenger RNA, which
results in accumulation of the protein. C1 establishment, by contrast, triggers a cascade of
events. First, it blocks production of the Cre antisense RNA, so Cre recombinase is eventually
synthesized. As soon as active recombinase is present, it mediates inversion of the DNA segment
enclosed between the loxP recognition sites.
Even though the inversion does not alter the Cre gene directly, it stops Cre transcription.
Unlike EGFP, the ERFP gene has a transcriptional terminator at the end of its sequence, so
transcription of Cre recombinase from the Plac promoter ceases. Without transcripts, no new
recombinase can be synthesized to act again. The orientation of the inverted DNA fragment is
thus fixed.
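The chain of cause and effect described in this section can be condensed into a Boolean abstraction, sketched below in Python. It is a simplification of our own: it ignores kinetics, leakiness and plasmid copy number, and it assumes that exactly one repressor is established. The function and variable names are hypothetical.

```python
def reporter(cro_established: bool) -> str:
    """Boolean abstraction of the readout: which fluorescent protein accumulates."""
    c1_established = not cro_established        # assumption: the race has exactly one winner
    egfp_antisense_made = not cro_established   # Cro binds OR3 and shuts this promoter off
    cre_antisense_made = not c1_established     # C1 binds OR1 and shuts this promoter off
    if not cre_antisense_made:
        # Cre mRNA is translated and Cre inverts the loxP-flanked segment, placing ERFP
        # behind Plac; ERFP's terminator then stops further Cre transcription.
        return "RFP"
    if not egfp_antisense_made:
        # Cre stays silenced, the segment keeps its orientation, and EGFP mRNA is translated.
        return "GFP"
    return "none"  # unreachable under the one-winner assumption; kept for completeness

for cro_wins in (True, False):
    print("Cro" if cro_wins else "C1", "->", reporter(cro_wins))
```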
Implementation
Note that the choice of bacterial strain is limited to strains that do not carry a
tetracycline resistance gene in their genome, since we use it as a selection marker.
The single-cell plating technique is used to grow the transformed E. coli cells. The cells are
grown under optimal conditions on minimal medium supplemented with IPTG and the appropriate
antibiotics (to select transformed cells). Once colonies have formed, the plates are examined by
fluorescence microscopy. The cells composing a colony are clones of a single parent cell and
therefore share the same phenotype. Each plate represents one bit of the random number to be
generated. The bit is taken to be 1 if red colonies outnumber green ones and 0 if the majority
of colonies are green.
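The read-out rule can be written as a small Python function. It turns per-plate colony counts into bits and concatenates the bits into an n-bit number, as described in the Abstract. Two details are assumptions not fixed by the text. First, the first plate is taken as the most significant bit. Second, a plate with equal numbers of red and green colonies is treated as inconclusive (an error) rather than assigned a value. The function names and colony counts are hypothetical.

```python
def plate_bit(red: int, green: int) -> int:
    """One plate -> one bit: 1 if red colonies outnumber green ones, 0 if green are the majority."""
    if red == green:
        raise ValueError("inconclusive plate: equal red and green colony counts")
    return 1 if red > green else 0

def plates_to_number(plates: list[tuple[int, int]]) -> int:
    """Concatenate plate bits (first plate = most significant bit) into an integer."""
    value = 0
    for red, green in plates:
        value = (value << 1) | plate_bit(red, green)
    return value

# Hypothetical colony counts (red, green) for a 3-bit number: bits 1, 0, 1.
print(plates_to_number([(30, 10), (5, 25), (40, 2)]))  # 5
```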
Risks
First, even with a mutated version of Cro, the fraction of cells in which C1 is established may
not reach the desired 50%. It might be substantially lower, e.g. 35% C1 versus 65% Cro.
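The cost of such a bias can be quantified. The Python sketch below uses the 35%/65% split above purely as an example. It computes two standard measures for a single plate's bit: the Shannon entropy, and the min-entropy, which is the more conservative measure commonly used to assess random number generators. A perfectly balanced bit scores 1 on both. The function names are our own.

```python
import math

def shannon_entropy(p: float) -> float:
    """Shannon entropy, in bits, of a bit that equals 1 with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def min_entropy(p: float) -> float:
    """Min-entropy, in bits: set by the probability of the likelier outcome."""
    return -math.log2(max(p, 1 - p))

p_c1 = 0.35  # example probability that C1 is established (red plate, bit 1)
print(round(shannon_entropy(p_c1), 3))  # 0.934
print(round(min_entropy(p_c1), 3))      # 0.621
```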
Another problem that can affect the circuit outcome concerns the recombinase. Like other
regulatory proteins, it acts in trans on its specific recognition sequences wherever it meets
them. It does not distinguish between DNA fragments that have already been inverted and those
that have not. Therefore, some EGFP/ERFP sequences might be flipped several times, while others
(on other plasmids) might not be affected at all. As a result, a cell may produce both
fluorescent proteins simultaneously.
Conclusion
This work presented a novel technique for true random number generation: using biological
systems, namely the race condition between two repressors in a gene regulatory network. We
argue that this method generates numbers of a quality as high as any other true random number
generator, since each outcome is designed to be equally likely and independent of the other
outcomes. A brief implementation procedure was also presented. Possible risks and hurdles that
might be faced in the process were highlighted, together with possible ways to overcome them.
Biological systems are still relatively undeveloped for this purpose, but there is hope for
significant improvement in applicability, speed and reliability. We expect these systems to
prove much more versatile than prevailing ones, since they are based on components occurring
in nature.
References
[1] Michael Luby. Pseudorandomness and Cryptographic Applications. Princeton University Press, 1996. A definitive
source of techniques for provably random sequences.
[2] Donald Knuth. The Art of Computer Programming, Volume 2: Seminumerical Algorithms, 3rd edition.
Addison-Wesley, 1997. ISBN 0-201-89684-2. Chapter 3, pp. 1–193. Extensive coverage of statistical tests for nonrandomness.
[3] R. Matthews. Maximally Periodic Reciprocals. Bulletin of the Institute of Mathematics and its Applications 28,
pp. 147–148, 1992.
[4] Ido Kanter, Yaara Aviad, Igor Reidler, Elad Cohen and Michael Rosenbluh. An optical ultrafast random bit generator.
Nature Photonics 4(1), pp. 58–61, 2010.
[5] Ran Halprin and Moni Naor. Games for Extracting Randomness. Department of Computer Science and
Applied Mathematics, Weizmann Institute of Science. Retrieved 2009-06-27. Main site: http://math166pc.weizmann.ac.il/
[6] TrueCrypt Foundation. "TrueCrypt Beginner's Tutorial, Part 3". Retrieved 2009-06-27.
[7] Qian Xie and Nan-xian Chen. Unified inversion technique for fermion and boson integral equations. Vol. 52,
No. 1, July 1995.
[8] Donald Knuth. The Art of Computer Programming, Volume 2: Seminumerical Algorithms, 3rd edition,
Chapter 3: Random Numbers. 1997.
[9]
[10] Richard R. Sinden. DNA Structure and Function. Academic Press, 1994. ISBN 0-12-645750-6.
[11] Timothy S. Gardner, Charles R. Cantor and James J. Collins. Construction of a genetic toggle switch in
Escherichia coli. Nature 403, pp. 339–342, 20 January 2000.
[12] Sabine Brantl and E. Gerhart H. Wagner. An Antisense RNA-Mediated Transcriptional Attenuation Mechanism
Functions in Escherichia coli. Journal of Bacteriology 184(10), pp. 2740–2747, May 2002.