1 Molecular Computations Using Self-Assembled DNA Nanostructures and Autonomous Motors John H. Reif Department of Computer Science, Duke University DNA Nanostructures: DNA tiles: composed of a few strands of DNA that self-assemble (via DNA annealing) into a roughly rectangular shape. TX tiles: 3 double stranded DNA with Holiday junctions Approx 20 Angstroms wide DNA TX tile Nanostructures: • 3 double stranded DNA with Holiday junctions • Approx 20 Angstroms wide 2 4 GTTCAGCCTTAGT CCACAGTCACGGATGG ACTCGATAGCCAA CAAGTCGGAATCA GGTGTCAGTGCCTACC TGAGCTATCGGTT ACTCC TGGCATCTCATTCGCA GGACA TGAGG ACCGTAGAGTAAGCGT CCTGT T TCTGG T T AGACC T 1 4 CATCTCGT CCTTGCGTTTCGCCAATCCAGAAGCC GTAGAGCA GGAACGCAAAGCGGTTAGGTCTTCGG TAO35 3 1 T T T CCATC T GGTAG 3 TGCGAGCA ACGCTCGT 2 TAE 44 Strand topology traces of TX tiles. •‘T’ denotes three DNA helices •‘A’ denotes anti-parallel crossovers (strand changes direction of propagation at crossover points) •‘O’ and ‘E’ are odd and even, respectively: refer to the number of helical half-turns between adjacent crossover points. DNA Hybridization and Ligation Operations. Hybridization of sticky single-strand DNA segments. Ligation: If the sticky single-strand segments that anneal abut doubly stranded segments of DNA, you can use an enzymic reaction known as ligation to concatenate these segments. Unique Sticky Ends on DNA tiles. Input layers can be assembled via unique sticky-ends at each tile joint thereby requiring one tile type for each position in the input layer. Tiling self-assembly: proceeds by the selective hybridization of the pads of distinct tiles, which allows tiles to compose together to form a controlled tiling lattice (these pads determine the form of the tiling that self-assembles). Technical Challenge: Optimal Design of strands composing tiles. 2D DNA Self-Assembled Tilings: Rendering Simple Banded Images B* Tiles with Loops Atomic Force Microscope Image Bands Generated by B* Tiles with Attached Beads Major Goals of Duke DNA Nanotechnology (Reif’s Group) 1) Patterned DNA Arrays. • Set of specific tiles which form patterns. • Assembly around scaffold strands. • Molecular fabric. 2) Computation via DNA Self-Assembly • Reporter strand output (requires ligation). • Microscopic readout (via AFM, TEM, SEM, etc.). 3) Applications of DNA-Based Assemblies. • Molecular and nano-scale electronics. • Molecular motors and actuators. 4) Software for Design and Simulation of DNA Assemblies. Background Literature on DNA Self-Assembled Tiling Lattices. First Paper: [Winfree and Seeman,98] The first experimental demonstration of self-assembly of DNA to construct 2D lattices consisting of up to a hundred thousand DNA tiles. Recent Review papers: • [Reif, LaBean, Seeman, 2000] Challenges and Applications for Self-Assembled DNA Nanostructures. • [J. H. Reif, IEEE Computer and Scientific Engineering Magazine, 2002] DNA Lattices: A Method for Molecular Scale Patterning and Computation. • [LaBean, Yan, Park, Feng, Yin, Li, Ahn, Liu, Guan, and Reif, Information Sciences, 2004] Overview of New Structures for DNA-Based Nanofabrication and Computation. Reif Group’s Papers on DNA Self-Assembled Tiling Lattices. • [LaBean, Winfree, Reif, & Seeman, J. Am. Chem. Soc. 2000] Improved DNA nanostructures: TX tiles that have multiple DNA strands running them. • [Mao, LaBean, Reif, Seeman, Nature 2000] Experimentally demonstrated for the first time a computation using self-assembled DNA lattices of TX tiles that self-assembled around input strands running through the tiles. • [Yan, Feng, LaBean, and Reif, JACS, 2003] DNA Nanotubes, Parallel Molecular Computation of Pair-Wise XOR Using DNA String Tile. • [Yan, LaBean, Feng, and Reif, PNAS, 2003] Directed Nucleation Assembly of Barcode Patterned DNA Lattices. • [Yan, Park, Finkelstein, Reif, & LaBean, Science, 2003] DNA-Templated Self-Assembly of Protein Arrays and Highly Conductive Nanowires. • [Feng, Park, Reif, and Yan, Angewandte Chemie, 2003] A Two State DNA Lattice Actuated by DNA Motors. • [Li, Park, Reif, LaBean, Yan, JACS 2004] DNA Templated Self-Assembly of Protein and Nanoparticle Linear Arrays. • [Liu, Reif, LaBean, PNAS 2004] Improved DNA nanostructures: DNA nanotubes selfassembled from triple-crossover tiles as templates for conductive nanowires. Molecular Pattern Formation using Scaffold Strands for Directed Nucleation: • Multiple tiles of an input layer can be assembled around a single, long DNA strand we refer to as a scaffold strand (shown as black lines in the figures). A B C o Examples of Arrangements of Scaffold Strands : – (A) Diagonal TAO layer which partially defines binding slots for tiles of the next successive layer. – (B) Horizonal layer of alternating TAE and DAE tiles. – (C) crenellated horizontal layer which could be comprised of TAE or DAE tiles. Structures in B and C completely define binding slot for tiles on next layers. Directed Nucleation Assembly of Barcode Patterned DNA Lattices (PNAS 2003) Hao Yan, Thomas H. LaBean, Liping Feng, John H. Reif Aperiodic patterned DNA lattice (Barcode Lattice) by directed nucleation self-assembly of DNA tiles around a scaffold DNA strand. A first step toward implementation of a visual readout system capable of converting information encoded on a 1-D DNA strand into a 2-D form readable by advanced microscopic techniques. 12 TileSoft: Sequence Optimization Software for Designing DNA Secondary Structures P. Yin*, B. Guo*, C. Belmore*, W. Palmeri*, E. Winfree†, T. H. LaBean* and J. H. Reif* * Department of Computer Science, Duke University † Department of Computer Science, Caltech 13 Abstract DNA is a crucial construction material for molecular scale objects with nano-scale features. Diverse synthetic DNA objects hold great potential for applications such as nano-fabrication, nano-robotics, nano-computing, and nano-electronics. The construction of DNA objects is generally carried out via self-assembly. During self-assembly, DNA strands are guided by their sequence information into secondary structures to maximize Watson-Crick pairing of their bases and thus minimize the free energy of the resultant structures. A crucial computational problem in constructing DNA objects is the design of DNA sequences that can correctly assemble into desired DNA secondary structures. However, existing software packages only provide unintuitive text-line interfaces and generally require the user to step through the entire sequence selection process, which could be time-consuming and tedious. In contrast, TileSoft described here deliver the following features: • Its graphical user interface renders the molecular architect the ability to define DNA secondary structure and accompanying designing constraints directly on the interface as well as the ability to view the optimized sequence information pictorially. • Its fully automatic optimization module relieves the user of the drudgery of manually dictating the sequence selection process, and its evolutionary algorithm produces satisfactory results efficiently. • Its graphical user interface and its optimization module are smoothly integrated from user's perspective, while they are at the same time well separated in terms of software architecture, making each amenable to future improvements without negatively affecting the other. 14 Motivation: Designing DNA tiles DX lattice • • • • Rhombus lattice TX lattice 4x 4 lattice Barcode lattice DNA as nano-construction material Self-assembly as bottom-up nano-construction method DNA lattices made of DNA tiles, i.e. smaller DNA secondary nanostructure units Design process: 1. 2. Template design Sequence selection (optimization) Tile template to be optimized 15 Prior work v.s. TileSoft • DNA word design software: • • Sequin: • • • • Produce a pool of DNA sequences such that each sequence is of maximal difference from others Generate DNA sequences that uniquely assemble into desired secondary structures Text line interface for inputting template and displaying optimized sequences Sequence optimization process only semi-automated TileSoft: Available at http://www.cs.duke.edu/~py/dnaTileSoft/. • • • • Generate DNA sequences that uniquely assemble into desired secondary structures Graphical interface for inputting template and displaying optimized structure (GUI Module) Sequence optimization process fully automated (Optimization Module) GUI Module and Optimization Module smoothly integrated for end users, yet well separated in software architecture 16 GUI: Default window 17 GUI: Define Crossover • The user can define crossovers between helices, by clicking sequentially the two bases to be connected in 5' to 3' order. 18 GUI: Set 5’ end; set 3’ end Set 5’ end • Set 3’ end By setting the 5' end and 3' end of a DNA strand, the user specifies the length of the strand, and the unused segment of the strand is deleted automatically (showed in color gray). 19 GUI: Edit base • The user can directly input the base values for a strand by typing; Typing more than one character edits consecutive bases in the 5' to 3' direction along the strand 20 GUI: Set non-Waston-Crick base pairing • The user can define the subsequences that are not required to be Watson-Crick base paired by clicking on the starting and ending bases of the subsequences. 21 GUI: Set and show EQ constraint Set EQ constraint Show EQ constraint • Set EQ constraint: Clicking on two bases in one strand defines the starting and ending points of the first sub-sequence, and a click on another base delineates the second sub-sequence with the same length and direction as the first one. • Show EQ Constraint: A small window is brought up that contains multiple buttons, with each representing a set of equal sub-sequences. When one of these buttons is clicked, the corresponding sub-sequences will be highlighted in purple. 22 Optimization module • The optimization module employs an evolutionary algorithm to find the best solution to the optimization objective function. • Evolutionary algorithm: • An evolutionary algorithm maintains a population of DNA sequences, which are generated randomly during initialization. During selection, the fittest DNA sequences are chosen for reproduction, based on their score according to the objective function. These individuals are used to generate new individuals via mutations and crossovers, and the newly produced individuals are reinserted into the population. The process is repeated until meeting some termination condition. • Objective function: • The objective function consists of two weighted factors, the count of unwanted complementary sequences spurious matches (as the sequence-symmetry minimization algorithm used in Sequin) and the count of to-be-avoided subsequences, e.g. long AT runs. 23 Future work • GUI: • Geometrically more flexible structures • Make the number of helices and number of bases per helix user specifiable • Optimization: • Multiple parallel traces of optimization process with different starting points • A pre-optimized library • Incorporate parameters such as hybridization temperature and software modules such as BIND • A new heuristic that performs optimization based on existing pre-optimized duplex libraries • Other: • Curvature analyzer (S. H. Park, Duke Physics ) • Make the software more robust 4 x 4 Tile A Lattice of 4 x 4 Tiles QuickTime™ and a TIFF (U ncompressed) decompressor are needed to see this picture. Uncorrugated 4 x 4 tile QuickTi me™ and a TIFF (Uncompressed) decompressor are needed to see this picture. 27 Software for Designing and Simulating DNA Nanorobotical Devices John Reif, Sudheer Sahu, Peng Yin Department of Computer Science, Duke University 28 DNA-Based Nano-Engineering: DNA and its Enzymes as the Engines of Creation at the Molecular Scale John H. Reif Department of Computer Science, Duke University 29 Motivation-Device I-Device II-Device III-Conclusion Motivation DNA based nanorobotics devices Rotation Open/close Open/close Open/close (Mao et al 99) (Yurke et al 00) (Simmel et al 01) (Simmel et al 02) Rotation Extension/contraction Extension/contraction Extension/contraction (Yan et al 02) (Li et al 02) (Alberti et al 03) (Feng et al 03) Motivation-Device I-Device II-Device III-Conclusion Motivation DNA nanorobotics Rotation, open/close extension/contraction mediated by environmental changes Autonomous, unidirectional motion along an extended linear track Kinesin (R. Cross Lab) Synthetic unidirectional DNA walker that moves autonomously along a linear route over a macroscopic structure ? (Recent work: non-autonomous DNA walking device by Seeman’s group, autonomous DNA tweezer by Mao’s group) 30 31 Structural Components • Design Part • Sequence Optimization Part • Simulation Part 32 I/O Specification for Design • Input: – Symbolic relation between various parts of DNA sequences • Output: – Suitable restriction enzymes, sequences and optimal values of experimental conditions Design Sequence Optimization Simulation 33 Design Part • 3-parameter model for restriction enzymes – r,d,l Design Sequence Optimization Simulation 34 Design Part (continued….) • All restriction enzymes can be expressed via these parameters. • The cleaving actions on the DNA strands in the device are translated into these parameters, and as output we get a list of all restriction enzymes that can be used for the purpose. Design Sequence Optimization Simulation 35 Example • r=6, d=16, l=14 • Restriction enzymes that can be used – – – – Acu I Bpm I BpuE I Bsg I Design Sequence Optimization Simulation 36 I/O for Sequence Optimization • Input – Topologies of system in various conformations – Template sequences and constraints on them • Output – Exact optimized sequences for these template sequences Design Sequence Optimization Simulation 37 Procedure • Find out the constraints for WC matches, for the given conformation. • Assign bases randomly to degenerate bases. • Calculate SCORE on the basis of: – Count of bad oligos like GGG, GGGG, TTTT, AAAA – Counts of complementary match regions (with and without one mismatch) at intra-molecular, inter-molecular intra-complex and inter-complex regions. • The goal is to find an optimal structure that minimizes the score. Design Sequence Optimization Simulation 38 Issues • We can mutate one base at a time and evaluate the new structure. – Local minima. • An evolutionary algorithm can be used for that with a fitness function similar to the score function defined earlier. Design Sequence Optimization Simulation 39 Dynamic case • • Number of conformations small – Easy to extend this method (just add few more constraints) Number of conformations large – Difficult problem – Consider one (or more) basic conformation(s) and the local changes in it (them) and apply the method on this group as a whole • C0 is initial conformation • Ci is local change in conformation in going from Ci-1 to Ci • Consider the group C0,C1,C2,…,Cn for the optimization as a whole. Design Sequence Optimization Simulation 40 I/O for Simulation • Input – Initial conformation of the system – Various external factors like • temperature • Na+, Mg++ concentrations • Output – Graphical display of the simulation Design Sequence Optimization Simulation 41 Simulation Software • DNA molecules represented as – A string representing the sequence of 1st strand from 5’ to 3’ end. – Another string representing the sequence of 2nd strand from 3’ to 5’ end. – Offset between the two sequences. • With each DNA molecule we associate the numerical value of its concentration. Design Sequence Optimization Simulation 42 Assumptions • Discrete time events. • Restriction requires an exact match of the recognition sites. • For simplification, we assume that ligation requires an exact compliment of the sticky end. Design Sequence Optimization Simulation 43 Ligation Events • For every DNA molecule, calculate the probability of it ligating with any other molecule, based upon the concentration of the two molecules (only if they have complementary sticky ends). • Allow that ligation to occur with that probability. Design Sequence Optimization Simulation 44 Restriction Events • In every molecule, find out if there is any restriction site, then depending upon the activity of the restriction enzyme, find the probability of that restriction. • Allow that restriction to occur with that probability. Design Sequence Optimization Simulation 45 Simulation Steps 1. 2. 3. 4. 5. Start with the initial configuration of the system. Calculate the probability that next event will be a ligation or a restriction. Calculate the probabilities of various ligations and restrictions. Perform those events with the calculated probabilities. Update the configuration of the system and the concentrations of the molecules and goto step 2. Design Sequence Optimization Simulation 46 Motivation-Device I-Device II-Device III-Conclusion Example Design of DNA Walker Restriction enzymes PflM I Walker Anchorage A* B Track C BstAP I D A 47 DNA Walker: Operation • Valid hybridization: A* + B = A + B* => A*B C* + D = C + D* => C*D • Valid cut: A*B => A + B* C*D => C + D* Walker B*C => B + C* D*A => D + A* Anchorage A* B Track B* + C = B + C* => B*C D* + A = D + A* => D*A C D A 48 DNA Walker: Operation • Valid hybridization: A* + B = A + B* => A*B C* + D = C + D* => C*D • Valid cut: A*B => A + B* C*D => C + D* A*B C B* + C = B + C* => B*C D* + A = D + A* => D*A B*C => B + C* D*A => D + A* D A 49 DNA Walker: Operation • Valid hybridization: A* + B = A + B* => A*B C* + D = C + D* => C*D • Valid cut: A*B => A + B* C*D => C + D* C A*B B* + C = B + C* => B*C D* + A = D + A* => D*A B*C => B + C* D*A => D + A* D A 50 DNA Walker: Operation • Valid hybridization: A* + B = A + B* => A*B C* + D = C + D* => C*D • Valid cut: A*B => A + B* C*D => C + D* PflM I A*B C B* + C = B + C* => B*C D* + A = D + A* => D*A B*C => B + C* D*A => D + A* D A 51 DNA Walker: Operation • Valid hybridization: A* + B = A + B* => A*B C* + D = C + D* => C*D • Valid cut: A*B => A + B* C*D => C + D* B* + C = B + C* => B*C D* + A = D + A* => D*A B*C => B + C* D*A => D + A* B* A C D A 52 DNA Walker: Operation • Valid hybridization: A* + B = A + B* => A*B C* + D = C + D* => C*D • Valid cut: A*B => A + B* C*D => C + D* A B* + C = B + C* => B*C D* + A = D + A* => D*A B*C => B + C* D*A => D + A* D B*C A 53 DNA Walker: Operation • Valid hybridization: A* + B = A + B* => A*B C* + D = C + D* => C*D • Valid cut: A*B => A + B* C*D => C + D* A B* + C = B + C* => B*C D* + A = D + A* => D*A B*C => B + C* D*A => D + A* D B*C A 54 DNA Walker: Operation • Valid hybridization: A* + B = A + B* => A*B C* + D = C + D* => C*D • Valid cut: A*B => A + B* C*D => C + D* B* + C = B + C* => B*C D* + A = D + A* => D*A B*C => B + C* D*A => D + A* BstAP I A D B*C A 55 DNA Walker: Operation • Valid hybridization: A* + B = A + B* => A*B C* + D = C + D* => C*D • Valid cut: A*B => A + B* C*D => C + D* B* + C = B + C* => B*C D* + A = D + A* => D*A B*C => B + C* D*A => D + A* C* A B D A 56 DNA Walker: Operation • Valid hybridization: A* + B = A + B* => A*B C* + D = C + D* => C*D • Valid cut: A*B => A + B* C*D => C + D* A B C B* + C = B + C* => B*C D* + A = D + A* => D*A B*C => B + C* D*A => D + A* D*A 57 DNA Walker: Operation • Valid hybridization: A* + B = A + B* => A*B C* + D = C + D* => C*D • Valid cut: A*B => A + B* C*D => C + D* A B B* + C = B + C* => B*C D* + A = D + A* => D*A B*C => B + C* D*A => D + A* C*D A 58 DNA Walker: Operation • Valid hybridization: A* + B = A + B* => A*B C* + D = C + D* => C*D • Valid cut: A*B => A + B* C*D => C + D* B* + C = B + C* => B*C D* + A = D + A* => D*A B*C => B + C* D*A => D + A* D* A B C A 59 DNA Walker: Operation • Valid hybridization: A* + B = A + B* => A*B C* + D = C + D* => C*D • Valid cut: A*B => A + B* C*D => C + D* B* + C = B + C* => B*C D* + A = D + A* => D*A B*C => B + C* D*A => D + A* A* A B C D 60 DNA Walker: Experimental Design 61 Autonomous Motion of the Walker 62 DNA Turing Machine: Structure Turing machine Transitional rules: Rule molecules Turing head: Head molecules Data tape: Symbol molecules Autonomous universal DNA Turing machine: 2 states, 5 colors