Synthetic Biology Project Proposal

Qaiser Habib (1308149)
Savasteeva Natasha
Raheel Malik (9755551)

Table of Contents

Abstract
Chapter 1: Random Number Generators
  1.1. Usage and Application of Random numbers
  1.2. Random number generators
  1.3. Pseudo-Random Number Generators (PRNG)
  1.4. True Random Number Generators (TRNGs)
  1.5. Generation Methods
    1.5.1. Physical Methods
    1.5.2. Computational Methods
    1.5.3. Generation from probability distribution
Chapter 2: Molecular Biology
  DNA
  DNA Structure
  Transcription
  Translation
  Regulatory Elements
  Transformation
  Synthetic Biology
Chapter 3: Objectives
Chapter 4: Proposal
  Construction
  Functioning
  Implementation
  Risks
Conclusion
References

Abstract

Random numbers are vitally important in many fields of science and the arts, with applications ranging from telecommunications to gambling and data sampling. Various methods for generating random numbers exist in the literature. Pseudorandom numbers are generated from a mathematical formula, made as complex as the specific usage requires, whereas true random numbers are generated using physical or mechanical methods and do not follow a pre-determined pattern. Pseudorandom number generators are in widespread use in computational applications, whereas true random number generators are used where speed matters less and unpredictability matters more. This report presents one method for generating random numbers: using noise in biological systems. We present the construction of a gene regulatory network that exploits the race condition between two repressors, each competing for establishment; whichever becomes established yields a '0' or a '1' state. The two states are reported by expression of the gene encoding Green Fluorescent Protein or Red Fluorescent Protein, such that each state is virtually equally likely to occur. This gives one bit of randomness, and units can be cascaded to form n-bit random numbers.

Chapter 1: Random Number Generators

Although defining a random number may look simple at first sight, it proves quite difficult in practice. A random number is generated by a process whose outcome is unpredictable and which cannot be reliably reproduced.
This means that random number generation is itself a random process with a certain probability distribution; any probabilistic process with a random outcome can serve. Given a single number, it is impossible to verify whether it was produced by a random number generator. To study the randomness of a generator's output, it is therefore essential to consider sequences of numbers. First, however, we should ask why random numbers are needed. The following section covers the need for random numbers, their usage and their applications.

1.1. Usage and Application of Random numbers

Random numbers are useful for a variety of purposes, such as generating data encryption keys, simulating and modeling complex phenomena, and selecting random samples from larger data sets. They have also been used aesthetically, for example in literature and music, and are of course ever popular for games. For a single number, a random number is one drawn from a set of possible values, each of which is equally probable, i.e., from a uniform distribution. For a sequence of random numbers, each number drawn must also be statistically independent of the others.

The many applications of randomness have led to the development of several different methods for generating random data. Many of these have existed since ancient times, including dice, coin flipping, and the shuffling of playing cards. Because of the mechanical nature of these techniques, generating large amounts of sufficiently random numbers (important in statistics) required a lot of work and time. Thus, results would sometimes be collected and distributed as random number tables.

1.2. Random number generators

Since the advent of computational random number generators, a growing number of government-run lotteries and lottery games use RNGs instead of more traditional drawing methods.
RNGs are also used today to determine the odds of modern slot machines. A random number generator (often abbreviated as RNG) is a computational or physical device designed to generate a sequence of numbers or symbols that lacks any pattern, i.e. appears random. Random number generators have applications in gambling, statistical sampling, computer cryptography, completely randomized design, and other areas where producing an unpredictable result is desirable.

The generation of pseudo-random numbers is an important and common task in computer programming. While cryptography and certain numerical algorithms require a very high degree of randomness, weaker forms of randomness are also closely associated with hash algorithms and are used in creating amortized searching and sorting algorithms.

With the advent of computers, programmers recognized the need for a means of introducing randomness into a computer program. However, surprising as it may seem, it is difficult to get a computer to do something by chance. A computer follows its instructions blindly and is therefore completely predictable. (A computer that doesn't follow its instructions in this manner is broken.) There are two main approaches to generating random numbers using a computer: Pseudo-Random Number Generators (PRNGs) and True Random Number Generators (TRNGs). The approaches have quite different characteristics, and each has its pros and cons. The following text discusses both.

1.3. Pseudo-Random Number Generators (PRNG)

A Pseudo-Random Number Generator (PRNG), also known as a deterministic random bit generator (DRBG) [1], is an algorithm for generating a sequence of numbers that approximates the properties of random numbers. The sequence is not truly random because it is entirely determined by a relatively small set of initial values, called the PRNG's state; hardware RNGs, by contrast, can be used to generate sequences that are close to truly random.
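
To illustrate how a PRNG's output is fixed entirely by its state, here is a minimal sketch using a linear congruential generator, one common kind of PRNG. The constants `A`, `B` and `M` and the function name `lcg` are illustrative assumptions, not parameters taken from this report; any valid choice would show the same behaviour.

```python
# Illustrative constants (assumed for this sketch, not from the report).
A = 1103515245
B = 12345
M = 2 ** 31


def lcg(seed, count):
    """Return `count` values of x -> (A*x + B) mod M, starting from `seed`."""
    values = []
    state = seed % M
    for _ in range(count):
        state = (A * state + B) % M
        values.append(state)
    return values


first = lcg(seed=42, count=5)
second = lcg(seed=42, count=5)   # same initial state -> identical sequence
other = lcg(seed=43, count=5)    # different initial state -> different sequence
print(first == second, first == other)  # True False
```

The two runs seeded with 42 agree element for element, which is exactly why such a sequence cannot be called truly random.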
The most common classes of PRNGs are linear congruential generators, lagged Fibonacci generators, linear feedback shift registers, feedback with carry shift registers, and generalised feedback shift registers. More recent examples of pseudorandom algorithms include Blum Blum Shub, Fortuna, and the Mersenne twister.

Pseudorandom numbers are not truly random because a seed value is used to initiate the generation process; the seed value, or seed state, is the initial state from which generation starts. A PRNG will always produce the same sequence when initialized with the same state. The maximum length of the sequence before it begins to repeat is determined by the size of the state, measured in bits. However, since the maximum period potentially doubles with each bit of state added, it is easy to build PRNGs with periods long enough for many practical applications.

If a PRNG's internal state contains n bits, its period can be no longer than 2^n results. For some PRNGs the period length can be calculated without going through the whole period. Linear Feedback Shift Registers (LFSRs) are usually chosen to have periods of exactly 2^n - 1. Linear congruential generators have periods that can be calculated by factoring.

Although PRNGs will repeat their results after they reach the end of their period, a repeated result does not imply that the end of the period has been reached, since the internal state may be larger than the output; this is particularly obvious for PRNGs with a 1-bit output.

1.4. True Random Number Generators (TRNGs)

A hardware, or true, random number generator is an electronic device that is attached to a computer and generates genuine random numbers, as opposed to the pseudo-random numbers produced by a computer program.
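
The period bound for PRNGs discussed in Section 1.3 can be checked on a toy generator. The sketch below is a 4-bit linear feedback shift register; its tap positions and the function names are illustrative choices made here, not something specified in this report. It counts how many steps the register takes to return to its starting state.

```python
def lfsr4_step(state):
    """Advance a 4-bit Fibonacci LFSR by one step (feedback from bits 0 and 1)."""
    feedback = (state ^ (state >> 1)) & 1
    return (state >> 1) | (feedback << 3)


def period(start):
    """Count the steps until the register returns to `start`."""
    state = lfsr4_step(start)
    steps = 1
    while state != start:
        state = lfsr4_step(state)
        steps += 1
    return steps


print(period(0b0001))  # 15, i.e. 2**4 - 1
```

With n = 4 state bits the register passes through all 15 non-zero states before repeating. The all-zero state maps to itself, which is why the period is 2^n - 1 rather than 2^n.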
TRNG devices are often based on atomic phenomena that generate a low-level, statistically random "noise" signal, such as thermal noise, the photoelectric effect or other comparable phenomena. These processes are, in theory, completely random because they do not depend on any initial or seed value. A quantum-based hardware random number generator typically consists of a transducer to convert the physical noise phenomenon into an electrical signal, an amplifier and other electronic circuitry to bring the output of the transducer into a measurable range, and some type of analog-to-digital converter to convert the output into a digital representation, often a simple binary digit 0 or 1. By repeatedly sampling the randomly varying signal, a sequence of random numbers is obtained.

1.5. Generation Methods

Some of the earliest methods of random number generation are still in use, e.g. rolling dice or flipping a coin. The methods for random number generation can be categorised as follows.

Fig 1. Realization of RNGs. Generation methods for random numbers: physical methods, computational methods, and generation from a probability distribution.

1.5.1. Physical Methods

The original methods for generating random numbers (dice, coin flipping, roulette wheels) are still used today, mainly in games, as they tend to be too slow for most applications in statistics and cryptography. A physical random number generator can be based on an essentially random atomic or subatomic physical phenomenon whose unpredictability can be traced to the laws of quantum mechanics; this makes it a true RNG method. Sources of entropy include radioactive decay, thermal noise, shot noise, avalanche noise in Zener diodes, clock drift, and radio noise. Recently a team at Bar-Ilan University in Israel created a physical random bit generator running at 300 Gbit/s, the fastest at the time.
[4] Various imaginative ways of collecting this entropic information have been devised. Another common entropy source is the behaviour of human users of the system. While people are not considered good randomness generators upon request, they generate random behaviour quite well in the context of playing mixed-strategy games [5]. Some security-related computer software requires the user to make a lengthy series of mouse movements or keyboard inputs to create the entropy needed to generate random keys or to initialize pseudorandom number generators [6].

1.5.2. Computational Methods

Computational methods are Pseudo-Random Number Generators (PRNGs): algorithms that can automatically create long sequences of numbers with good random properties, although eventually the sequence repeats. The string of values generated by such algorithms is generally determined by a fixed number called a seed. One of the most common PRNGs is the linear congruential generator, which uses the recurrence

X_{n+1} = (a X_n + b) mod m    (Equation 1)

to generate numbers. The maximum number of values in a sequence that the above equation can produce is the modulus, m. To avoid certain non-random properties of a single linear congruential generator, several such generators with slightly different values of the multiplier coefficient a can be used in parallel, with a "master" random number generator that selects from among them.

1.5.3. Generation from probability distribution

Some methods generate random numbers distributed according to a given probability density function. These methods work by transforming uniform random numbers in some way, and therefore work equally well with true and pseudo-random numbers. One of them is the inversion method, which involves integrating the density up to an area greater than or equal to the random number itself [7].
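
As a minimal sketch of the inversion method, assume an exponential density with rate `lam` (an illustrative choice, not a distribution used in this report). Integrating the density gives the cumulative area F(x) = 1 - exp(-lam x), and solving F(x) = u for a uniform random number u gives x = -ln(1 - u) / lam. The function names below are hypothetical.

```python
import math
import random


def exponential_cdf(x, lam):
    """Area under the exponential density from 0 to x."""
    return 1.0 - math.exp(-lam * x)


def sample_exponential(lam, uniform=random.random):
    """Inversion method: find x whose cumulative area equals a uniform number u."""
    u = uniform()                      # uniform number in [0, 1)
    return -math.log(1.0 - u) / lam    # closed-form solution of F(x) = u


# The uniform source is pluggable: a PRNG here, but a TRNG would work as well.
rng = random.Random(0)
samples = [sample_exponential(2.0, rng.random) for _ in range(5)]
print(samples)
```

Because the uniform source is passed in as a parameter, the same transformation applies unchanged to true and pseudo-random inputs.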
A second method, called the acceptance-rejection method, involves choosing an x and a y value and testing whether the function of x is greater than the y value. If it is, the x value is accepted. Otherwise, the x value is rejected and the algorithm tries again [8].

Chapter 2: Molecular Biology

Molecular biology is the biological discipline that explains biological systems in terms of molecular concepts. The field overlaps with areas of biology and chemistry such as genetics and biochemistry. It entails understanding the interactions between cellular processes, e.g. between DNA, RNA and protein biosynthesis, as well as the regulation of these processes.

DNA

Deoxyribonucleic acid, or DNA, is a nucleic acid that contains the hereditary information used in the growth and functioning of all known living organisms. The main role of DNA molecules is the long-term storage of information. The DNA segments that carry the genetic information are called genes, but other DNA sequences have structural purposes or are involved in regulating the use of this genetic information [9].

DNA Structure

A DNA molecule consists of two anti-parallel strands wound into a helix, each a polymer of simple units called nucleotides [10]. DNA contains four types of bases; it is the sequence of these four bases that encodes the hereditary information. This information is interpreted using the genetic code, which specifies the sequence of the amino acids within proteins. The code is read by copying stretches of DNA into the related nucleic acid RNA, in a process called transcription.

Transcription

Transcription is the process of creating a complementary RNA copy of a sequence of DNA. Both RNA and DNA are nucleic acids, which use base pairs of nucleotides as a complementary language that can be converted back and forth from DNA to RNA by the action of the correct enzymes.
During transcription, a DNA sequence is read by RNA polymerase, which produces a complementary, antiparallel RNA strand. As opposed to DNA replication, transcription results in an RNA complement that includes uracil (U) in all instances where thymine (T) would have occurred in a DNA complement [10]. This transcript is usually called messenger RNA, or mRNA. Transcription can be described in four or five simple steps (the fifth applies only to cells with a nucleus), each moving like a wave along the DNA:

1. RNA polymerase unwinds ("unzips") the DNA by breaking the hydrogen bonds between complementary nucleotides.
2. RNA nucleotides are paired with complementary DNA bases.
3. The RNA sugar-phosphate backbone forms with assistance from RNA polymerase.
4. Hydrogen bonds of the untwisted RNA+DNA helix break, freeing the newly synthesized RNA strand.
5. If the cell has a nucleus, the RNA is further processed and then moves through the small nuclear pores to the cytoplasm.

Translation

Translation is the process of reading the messenger RNA and converting it into a chain of amino acids (a polypeptide); amino acids are the building blocks of proteins. The process can thus be viewed as converting a coding sequence into an actual protein. Different coding sequences give different proteins, so by altering those sequences we can control which proteins are synthesized.

Regulatory Elements

Regulatory elements are agents that control the expression of genes at different levels, such as the transcriptional and translational levels. Transcriptional regulation can be accomplished using repressors and attenuation. Repressors are specific proteins that bind the DNA and block RNA polymerase activity, so that no mRNA is made. Examples of repressors include LacI, C1 and TetR. Translational regulation involves controlling the translation of mRNA into proteins; this is usually done by mechanisms such as RNAi and ribozymes.

Transformation

Transformation is the uptake of exogenous genetic material by cells from their surroundings through the cell membrane. Transformation occurs in nature but may also be effected by artificial means. Bacteria that are capable of being transformed, whether naturally or artificially, are called competent. Artificial transformation is usually carried out using plasmids, which are DNA molecules that are separate from, and can replicate independently of, the chromosomal DNA.

Synthetic Biology

Synthetic biology refers to the field of system design using molecular biology tools. All living cells contain DNA, a double-helix molecule that carries genes, and each gene encodes a specific protein. We can therefore manipulate, to some extent, the expression of genes as desired through regulatory elements. Synthetic biology is a relatively new field, and so far it has led to the synthesis of various chemical compounds and enzymes that are not naturally produced by the particular cell type or species used.

Chapter 3: Objectives

The project's objectives can be outlined as follows:

1. To understand the concepts behind implementing physical and mathematical systems based on biological designs.
2. To illustrate the applicability of biological systems to real-life scenarios.
3. To design a novel biological system to generate true random numbers based on noise in gene regulatory networks.
4. To illustrate the boundaries between predictable and unpredictable aspects of biological systems.

Chapter 4: Proposal

In this work, we propose the construction of a genetic random number generator module using techniques of genetic engineering. This can be accomplished by introducing into Escherichia coli cells an artificial network based on the naturally existing λ-bacteriophage toggle switch [11].

Fig 2: State I. Cro repressor activated and hence GFP is expressed. Plac* represents the hybrid lac promoter with the OR3 operator sequence. Plac# represents the hybrid lac promoter with the OR1 operator sequence. (Diagram labels: PRM, CIII, Plac, RBS, N, loxP, PL, EGFP, CI, PRE, PR, Cre, Cro, CII, TetR, antisense gfp RNA, antisense Cre RNA.)

Fig 3: State II. C1 repressor activated and hence RFP is expressed. (Diagram labels: C1, Plac, RBS, loxP, ERFP, Cre, Plac*, antisense gfp RNA, Plac#, antisense Cre RNA.)

Construction

To design our system, we use a fragment of DNA derived from the lambda phage genome. It contains genes encoding the C1 repressor, the CII and CIII auxiliary proteins, the N antiterminator protein and a mutated version of the Cro repressor. It is flanked by specific sequences that allow the fragment to be integrated into the bacterial chromosome via in vivo recombination. Additionally, it carries a tetracycline resistance selection marker. To make the system inducible, a LacI binding site is to be inserted immediately downstream of the -10 box of the PR promoter. Therefore, no transcription of Cro will take place until isopropyl β-D-1-thiogalactopyranoside (IPTG) is added to the medium.

The other network components include the gene encoding Enhanced Green Fluorescent Protein (EGFP), followed by the Cre recombinase gene fused to an ssrA tag that causes rapid protein degradation, both under control of the Plac promoter. The transcriptional terminator is situated after the Cre gene. Their corresponding antisense [12] RNAs are controlled by hybrid promoters. The promoter for the recombinase antisense RNA has an OR1 operator site, so its expression can be turned off upon C1 binding. Meanwhile, the promoter upstream of the EGFP antisense RNA contains an OR3 operator site, so its transcription can be prevented upon Cro binding.

The last circuit element is a gene encoding Enhanced Red Fluorescent Protein (ERFP), situated between the recombinase recognition sites. It lacks a promoter but has a transcriptional terminator.
However, it can be expressed from the promoter downstream of EGFP once it has been flipped by the recombinase. The loxP recognition sequences for Cre recombinase are placed in the 5' untranslated region (UTR) of the EGFP-encoding gene and between the EGFP stop codon and the Cre RBS.

Functioning

To generate true random numbers we first have to adjust the expression of both lambda repressors to the system requirements. In nature, they are not expressed simultaneously. There is a significant delay in C1 synthesis, since its establishment depends on the presence of the CII and CIII proteins, while Cro is produced as soon as phage DNA enters the host cell. To even out the chances between Cro and C1, a Cro polypeptide bearing a mutation in the DNA-binding domain, the dimerization domain, or both has to be created first. This mutation would weaken the binding affinity of Cro for the operator sequence at low concentrations. The mutant giving the result closest to the desired one is to be implemented in the circuit.

To diminish the effect of the leakiness of the LacI-repressed promoter on the expression of Cre recombinase, and to avoid spontaneous inversion, the ribosome binding site upstream of the Cre coding region should be weak enough that antisense RNA binds the Cre mRNA before ribosome assembly can initiate translation.

The first step of building the model is to introduce the race unit of the circuit, the lambda phage DNA fragment, into the bacterial chromosome. Chromosomal insertion is needed to ensure that only one copy of this piece of DNA is present in the cell. Otherwise the outcome of the logic would be ambiguous: with multiple copies, each would reach its own end result, and since all the proteins involved in the network act in trans, the DNA fragment responsible for visualizing the outcome would likely be flipped constantly and both fluorescent proteins would be synthesized at the same time.
Unlike the race element of the circuit, the other network component, which contains the fluorescent protein genes as well as both antisense RNA sequences and the recombinase sequence, is present in several copies, allowing fast accumulation of the outcome reporter protein. So, the next step is transformation of the cells with low-copy plasmids bearing the processing part of the circuit.

Upon induction by IPTG, Cro transcription and translation take place. Which repressor becomes established cannot be predicted, but it can be visualized after the fact. Cro establishment will block synthesis of the EGFP antisense RNA and thus permit translation of the EGFP messenger RNA, which results in accumulation of the protein. C1 establishment, in contrast, will set off a cascade of events. First it blocks production of the Cre recombinase antisense RNA, so that recombinase protein is eventually synthesized. As soon as active recombinase is present, it mediates inversion of the DNA segment enclosed between the loxP recognition sites. Even though inversion does not affect the Cre gene directly, it terminates transcription of that gene. The reason is that, in contrast to EGFP, the ERFP-encoding gene has a transcriptional terminator at the end of its sequence, so transcription of Cre recombinase from the Plac promoter ceases. In the absence of transcripts, no further recombinase can be synthesized to act again. Thus the orientation of the inverted DNA fragment is fixed.

Implementation

Note that the choice of bacterial strain is limited to strains that do not carry a tetracycline resistance gene in their genome, since we use it as a selection marker. The single-cell plating technique is used to grow transformed E. coli cells. The cells are grown under optimal conditions in minimal medium supplemented with appropriate antibiotics (to select transformed cells) and IPTG.
Once colonies have formed (the cells composing a colony are clones of a single parent cell and therefore share the same phenotype), the plates are visualized by fluorescence microscopy. Each plate represents one bit of the random number to be generated: the bit is taken as 1 if red colonies outnumber green ones and 0 if the majority of colonies are green.

Risks

First of all, even with a mutated version of Cro, the percentage of C1 establishment may not reach the desired 50% and might be substantially lower, e.g. 35% C1 and 65% Cro.

The other problem that can affect the circuit outcome arises from the fact that the recombinase, like the regulatory proteins, acts in trans on its specific sequences wherever it meets them and does not distinguish between DNA fragments that have already undergone inversion and those that have not. Therefore, some EGFP/ERFP-encoding sequences might be flipped several times, while others (on other plasmids) might not be affected at all. Because of that, such cells will produce both fluorescent proteins simultaneously.

Conclusion

This work presented a novel technique for true random number generation, using biological systems and a race condition between two repressors in a gene regulatory network. We argue that this method generates numbers of a quality as high as that of any other true random number generator: each outcome is equally likely and independent of the others. A brief implementation procedure was also presented, and some possible risks and hurdles that might be faced in the process were highlighted, together with possible ways to overcome them. Although engineered biological systems are relatively undeveloped, there is hope for significant improvement in applicability, speed and reliability, and it can be argued that these systems will prove much more versatile than prevailing systems since they are based on components occurring in nature.
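
The plate read-out described under Implementation, combined with the cascading of single bits into an n-bit number, can be sketched as follows. This is an illustration only: the function names, the colony counts, the most-significant-bit-first ordering and the treatment of ties are all assumptions made here, since the report does not specify them.

```python
def plate_to_bit(red_colonies, green_colonies):
    """Map one plate to a bit: 1 if red colonies outnumber green, 0 otherwise."""
    if red_colonies == green_colonies:
        # The report does not define a tie; rejecting it is an assumption.
        raise ValueError("tie between red and green colonies: bit undefined")
    return 1 if red_colonies > green_colonies else 0


def plates_to_number(plates):
    """Combine per-plate bits into one integer (first plate = most significant bit)."""
    value = 0
    for red, green in plates:
        value = (value << 1) | plate_to_bit(red, green)
    return value


# Hypothetical (red, green) colony counts for four plates -> bits 1, 0, 1, 0.
plates = [(12, 5), (3, 9), (8, 2), (1, 7)]
print(plates_to_number(plates))  # 10
```

Four plates therefore yield one 4-bit number; generating an n-bit number requires n independently grown plates.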

References

[1] Michael Luby. Pseudorandomness and Cryptographic Applications. Princeton Univ Press, 1996. A definitive source of techniques for provably random sequences.
[2] Donald Knuth. The Art of Computer Programming, Volume 2: Seminumerical Algorithms, Third Edition. Addison-Wesley, 1997. ISBN 0-201-89684-2. Chapter 3, pp. 1-193. Extensive coverage of statistical tests for nonrandomness.
[3] R. Matthews. Maximally Periodic Reciprocals. Bulletin of the Institute of Mathematics and its Applications 28, 147-148, 1992.
[4] Ido Kanter, Yaara Aviad, Igor Reidler, Elad Cohen and Michael Rosenbluh. An optical ultrafast random bit generator. Nature Photonics, Volume 4, Issue 1, pp. 58-61 (2010).
[5] Ran Halprin and Moni Naor. Games for Extracting Randomness (PDF). Department of Computer Science and Applied Mathematics, Weizmann Institute of Science. Retrieved 2009-06-27. Main site: http://math166pc.weizmann.ac.il/
[6] TrueCrypt Foundation. "TrueCrypt Beginner's Tutorial, Part 3". Retrieved 2009-06-27.
[7] Qian Xie and Nan-xian Chen. Unified inversion technique for fermion and boson integral equations. Vol. 52, No. 1, July 1995.
[8] Donald Knuth (1997). "Chapter 3 - Random Numbers". The Art of Computer Programming. Vol. 2: Seminumerical Algorithms (3 ed.).
[9] [10] Richard R. Sinden (1994). DNA Structure and Function. Academic Press Inc. ISBN: 0-12-645750-6.
[11] Timothy S. Gardner, Charles R. Cantor and James J. Collins. Construction of a genetic toggle switch in Escherichia coli. Nature 403, 339-342 (20 January 2000).
[12] Sabine Brantl and E. Gerhart H. Wagner. An Antisense RNA-Mediated Transcriptional Attenuation Mechanism Functions in Escherichia coli. Journal of Bacteriology, May 2002, pp. 2740-2747, Vol. 184, No. 10.