Homework Assignment for Phys 498 Biological Information and Complexity - Submitted by Anoush Aghajani-Talesh

Computing with DNA

In the following I will discuss the paper "Breaking DES Using a Molecular Computer" by Dan Boneh, Christopher Dunworth and Richard Lipton [2], which describes an algorithm for breaking the Data Encryption Standard (DES) using DNA and standard biotechnological procedures. The paper is a description of a thought experiment rather than the design of a DNA computer. To my knowledge, no successful experiments based on the described DNA computer have been carried out so far.

The DES algorithm encrypts a 64-bit block of data (plain text) into a 64-bit block (encrypted message) using a 56-bit key. To understand the ideas presented in the paper of Boneh et al., a detailed description of DES is not required. All we have to know is that DES repeatedly uses rather simple binary operations to mix the bits of the plain text, much like shuffling cards in a game. Given the key for an encrypted text, it is possible to reverse the encryption and obtain the plain text from the encrypted message. Therefore the key must be kept secret, and it is the objective of a code breaker to find it.

Given a pair consisting of a 64-bit plain text block and its encryption, it is possible to find the key by brute force. For every possible key (there are 2^56), one computes the encrypted message from the given plain text and compares it with the known encrypted message. At the time the discussed paper was published, a brute force attack on DES required very expensive computers and was presumably only carried out by secret services. Note that 56-bit DES was invented in the 1970s and is no longer state of the art. DES was replaced by the 112-bit 3DES, which is going to be followed by the Advanced Encryption Standard (AES).

The molecular computer described by Boneh et al.
breaks DES by brute force as well, but with the remarkable difference that it computes and compares all possible keys and encryptions in parallel. The attack on DES can be summarized as follows:

Step 1: Let the keys be represented in a suitable way by DNA strands. Randomly generate all possible keys. The authors estimated that all 2^56 necessary strands would fit into a 1-liter test tube.

Step 2: By applying biomolecular procedures, run the DES encryption algorithm on the DNA strands in such a way that for each key the corresponding encrypted text is appended to the DNA strand.

Step 3: The test tube now contains all pairs of keys and encrypted messages. Extract the DNA strand that contains the sequence belonging to the known encrypted message. The desired key can be obtained by sequencing this strand.

In order to run this algorithm, it is necessary to find a representation of binary numbers as DNA strands that allows biomolecular operations to be performed on them. The idea is to let bit #i of an n-bit binary number be represented by two unique DNA strands Bi(0) and Bi(1), one for each possible value. The representation of any n-bit number therefore requires an alphabet of 2n unique strands (one for each value, 0 or 1, of each bit). In addition, for technical reasons each of these sequences must be separated from the next by a specific separator sequence Si. Boneh et al. proposed to use strands with a length of 30 base pairs. The strands should be as short as possible, to avoid breakage and to make the biological operations faster and less expensive. On the other hand, the oligomers must be unique: they should differ from one another in subsequences that are not too short.

It remains to clarify how binary operations can be executed on DNA strands. Boneh et al.
proposed to use the following methods:

Extraction: With a technique called streptavidin bead separation, it is possible to extract from the test tube all DNA strands that contain a short specific nucleotide sequence. This is done by using many copies of the complementary sequence, which are bound to the surface of tiny magnetic beads. In the test tube, DNA containing the specific sequence anneals to its complementary sequence. It can then be extracted with a magnetic field.

Amplification via PCR: The polymerase chain reaction makes it possible to duplicate DNA in the test tube. To identify the sequence to be replicated, it requires a beginning and an end subsequence (usually about 20 bp long), which are called primers. Copies of the primers anneal to the DNA strand. The enzyme polymerase then rebuilds the complementary part of the sequence between the two primers. In a process called melting, the original DNA and the complementary copy of the sequence between the primers are separated. Both strands can then be copied again and used for further replication.

Tagging: This appends a new short sequence to the end of every DNA strand in a tube. It is done by annealing a short strand to a longer strand so that the short strand extends beyond the end of the longer strand. The complement of the extending part can then be synthesized using polymerase. By melting, the short strand and the now extended strand can be separated.

Given these operations one can describe breaking DES in terms of manipulations on DNA:

Step 1: (Generating keys) Using PCR it is possible to obtain numerous copies of the oligomers Bi(0), Bi(1) and Si and of their complementary sequences. Starting with S0 and randomly appending B1(0) or B1(1), followed by S1, and so on, one can successively generate all possible keys, provided that one uses a sufficient number of oligomers. Note that getting all the oligomers appended in the right order requires several substeps that I have left out for brevity.

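The strand representation and the random key generation of Step 1 can be imitated in software. The following is only a minimal sketch under stated assumptions: the oligomers are drawn at random instead of being designed for pairwise dissimilarity, and the function names (make_alphabet, encode, decode, random_key) are hypothetical and do not come from the paper.

```python
import random

BASES = "ACGT"
OLIGO_LEN = 30  # oligomer length proposed by Boneh et al.


def make_alphabet(n_bits, seed=0):
    """Draw distinct random oligomers: separators S0..Sn and Bi(0), Bi(1) for i = 1..n.

    Assumption: random 30-mers stand in for carefully designed sequences.
    """
    rng = random.Random(seed)
    used = set()

    def fresh_oligo():
        while True:
            oligo = "".join(rng.choice(BASES) for _ in range(OLIGO_LEN))
            if oligo not in used:
                used.add(oligo)
                return oligo

    separators = [fresh_oligo() for _ in range(n_bits + 1)]
    bit_oligos = {(i, v): fresh_oligo()
                  for i in range(1, n_bits + 1) for v in (0, 1)}
    return separators, bit_oligos


def encode(bits, separators, bit_oligos):
    """Build the strand S0 B1(b1) S1 B2(b2) S2 ... Bn(bn) Sn for a list of bits."""
    strand = separators[0]
    for i, value in enumerate(bits, start=1):
        strand += bit_oligos[(i, value)] + separators[i]
    return strand


def decode(strand, bit_oligos):
    """Read the bits back from a strand (a software stand-in for sequencing)."""
    lookup = {oligo: value for (_, value), oligo in bit_oligos.items()}
    chunks = [strand[k:k + OLIGO_LEN] for k in range(0, len(strand), OLIGO_LEN)]
    return [lookup[chunk] for chunk in chunks[1::2]]  # every second chunk is a bit oligomer


def random_key(n_bits, separators, bit_oligos, rng):
    """Mimic Step 1: append Bi(0) or Bi(1) at random for each bit position."""
    bits = [rng.randint(0, 1) for _ in range(n_bits)]
    return encode(bits, separators, bit_oligos)


if __name__ == "__main__":
    separators, bit_oligos = make_alphabet(8)
    strand = encode([1, 0, 1, 1, 0, 0, 1, 0], separators, bit_oligos)
    print(len(strand), decode(strand, bit_oligos))
```

An n-bit number becomes a strand of (2n + 1) oligomers in this layout, so a 56-bit key would occupy 113 x 30 bases before any encrypted bits are appended.
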
Step 2: (Encryption of the plain text) To execute the DES algorithm, one must be able to perform an exclusive-OR operation (XOR) and a table lookup that assigns a 4-bit value to a 6-bit value. The authors claimed that DES encryption can be run with only these two operations. DES consists of 16 levels, or rounds. Each round maps a 64-bit value to a 64-bit value by evaluating 48 XORs, 8 table lookups and another 32 XORs.

Evaluating the XOR of bit #i and bit #j means appending the value i XOR j to each strand. First one extracts all strands that contain the sequence Bi(1). Out of these one extracts the strands that also contain Bj(0). Analogously, one extracts all strands that contain both Bi(0) and Bj(1). To these two sets of strands one then appends, using the tagging technique, the oligomers Sx and Bx(1). To all other strands one appends Sx and Bx(0). Here x denotes the position of the appended bit.

The 48 appended bits are then divided into 8 groups of 6 bits each and used for the 8 table lookups. A table lookup is a function that maps a 6-bit value to a 4-bit value, which means that for each 6-bit sequence it appends a corresponding 4-bit sequence. To simplify the table lookup, the authors claimed that the bits can be appended in such a way that only 6 consecutive bits have to be evaluated for each lookup. Thus, by using 64 different types of magnetic beads (one for each 6-bit pattern), it is possible to separate the strands according to their sequence of 6 consecutive bits. The corresponding 4-bit value is then appended using the tagging technique. This step has to be repeated 8 times. After the table lookups, another 32 XOR operations are executed.

At the end of each round the strands contain redundant bits, which have to be removed using restriction enzymes. After all 16 DES rounds have been completed, the DNA strands in the test tube hopefully consist of the 56-bit key and the encrypted message.

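The XOR procedure of Step 2 uses nothing but extraction and tagging, which can be made explicit in a small simulation. This is a sketch under simplifying assumptions: a strand is modelled as a tuple of bits rather than a nucleotide sequence, positions are counted from 0, and the helper names extract and xor_step are hypothetical.

```python
from itertools import product


def extract(tube, position, value):
    """Split a tube into the strands whose bit at `position` equals `value`, and the rest."""
    hit = [s for s in tube if s[position] == value]
    miss = [s for s in tube if s[position] != value]
    return hit, miss


def xor_step(tube, i, j):
    """Append (bit i XOR bit j) to every strand, using only extraction and tagging."""
    i_one, i_zero = extract(tube, i, 1)
    one_zero, one_one = extract(i_one, j, 0)     # Bi(1) with Bj(0), and Bi(1) with Bj(1)
    zero_one, zero_zero = extract(i_zero, j, 1)  # Bi(0) with Bj(1), and Bi(0) with Bj(0)
    tagged_one = [s + (1,) for s in one_zero + zero_one]    # tag with Bx(1)
    tagged_zero = [s + (0,) for s in one_one + zero_zero]   # tag with Bx(0)
    return tagged_one + tagged_zero


if __name__ == "__main__":
    # A toy tube holding every 3-bit strand once.
    tube = list(product((0, 1), repeat=3))
    for strand in xor_step(tube, 0, 2):
        print(strand)
```

Each call handles the whole tube at once, which mirrors the point of the molecular approach: the number of laboratory operations depends on the number of XORs in DES, not on the number of strands.
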
Step 3: (Isolation of the key) By extraction, the DNA strand carrying the known encryption and its corresponding key can be isolated from the test tube. By amplifying and sequencing this strand, the key can be read.

Boneh et al. roughly estimated that four months of lab time are required to break DES by this method. However, the solution created in this way, which matches keys to encrypted texts, can be reused to find other keys in an estimated one day, which is a remarkably shorter time. Another amazing fact is that DNA apparently makes it possible to store the complete table, consisting of about 10^17 entries (10^6 terabytes of data), in a volume of 1 liter. In addition, all steps of the algorithm are performed massively in parallel (2^56 operations at the same time), which in certain applications might outweigh the long time (several hours, compared to conventional computers) needed to perform a single step.

The discussed paper was supposed to demonstrate the possibilities of DNA computation, but it also demonstrated its limitations. The admittedly astonishing construction of a code-breaking algorithm was in many ways only possible due to some special properties of DES. Firstly, DES can be attacked by a brute force method; other encryption algorithms cannot be broken in such a simple way. Secondly, DES has a comparatively short key length of 56 bits. 2^56 DNA strands in a test tube appear to be an upper limit (breaking 112-bit 3DES would require a 10^17-liter test tube). Finally, it was of great importance that DES uses few and rather simple operations. Even a slightly more demanding algorithm might require a far more complex implementation.

One has to note that everything above was said under the assumption that no errors occur during the whole procedure. In fact, the technical difficulties are immense and far from being under control. A detailed discussion of the technical problems and of error control in molecular computing can be found in [3] and [1].

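The size estimates quoted in the discussion above can be checked with a few lines of arithmetic. The sketch below rests on assumptions that are mine, not the paper's: each table entry is taken to hold a 56-bit key plus a 64-bit encrypted block, a terabyte is taken as 10^12 bytes, and the 3DES estimate assumes the same density of 2^56 strands per liter.

```python
KEY_BITS = 56
BLOCK_BITS = 64
TRIPLE_DES_KEY_BITS = 112

# Number of keys, and hence of table entries and parallel operations.
n_keys = 2 ** KEY_BITS

# Assumed entry size: key plus encrypted block, 120 bits = 15 bytes.
bytes_per_entry = (KEY_BITS + BLOCK_BITS) // 8
table_terabytes = n_keys * bytes_per_entry / 1e12

# Liters needed for all 3DES keys at 2^56 strands per liter.
liters_for_3des = 2 ** TRIPLE_DES_KEY_BITS // n_keys

if __name__ == "__main__":
    print(f"keys:            {n_keys:.2e}")
    print(f"table size (TB): {table_terabytes:.2e}")
    print(f"liters for 3DES: {liters_for_3des:.2e}")
```

Under these assumptions 2^56 is about 7.2 x 10^16, so the figures of 10^17 entries, 10^6 terabytes and 10^17 liters are order-of-magnitude roundings.
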
References:

[1] Adleman, L. M., Rothemund, P. W. K., Roweis, S., and Winfree, E. On Applying Molecular Methods to the Data Encryption Standard. Proceedings of the Second Annual Meeting on DNA Based Computers, Princeton University, June 10-12, 1996.

[2] Boneh, D., Dunworth, C., and Lipton, R. J. Breaking DES Using a Molecular Computer. Technical Report CS-TR-489-95, Princeton University, Princeton, N.J., 1995.

[3] Maley, C. C. DNA Computation: Theory, Practice, and Prospects. Evolutionary Computation 6(3), 201-229, 1998.