A Computational Model for the Isothermal Assembly of Tiled DNA Nanostructures by Benjamin Steele B.S., Biology California Institute of Technology, 2010 Submitted to the Department of Biology in Partial Fulfillment of the Requirements for the Degree of ,NASSCHSETS uI4STrJE CFTECHNOLOGY Master of Science in Biology at the IBRARIES MASSACHUSETTS INSTITUTE OF TECHNOLOGY February 2014 0 2014 Massachusetts Institute of Technology. All rights reserved. Signature of Author ...................................................................... . ............................................ Department of Biology January 31, 2014 Certified by .................................................................................. Mark Bathe Assistant Professor of Biological Engineering Thesis Supervisor Accepted by...................................................................... / Amy KeatinV Associate Professor of Biology Chair, Department of Biology Graduate Committee A Computational Model for the Isothermal Assembly of Tiled DNA Nanostructures by Benjamin Steele B.S., Biology California Institute of Technology, 2010 Submitted to the Department of Biology on January 31, 2014 in Partial Fulfillment of the Requirements for the Degree of Master of Science in Biology ABSTRACT Complex DNA nanostructures have proven difficult to assemble from starting materials. Inefficient nanostructure assembly constitutes a barrier to the widespread use of DNA nanotechnology and is difficult to investigate experimentally due to the complicated nature of the assembly. This work introduces a type of tile assembly model, the isothermal tile assembly model (iTAM). The iTAM seeks to capture the behavior of assembling DNA tile nanostructures to identify design factors and reaction conditions which improve assembly yields. Simulations using the iTAM model explain the experimental observation that only a narrow range of temperatures permit optimal isothermal assembly of tile-based DNA nanostructures. This narrow temperature range reflects a balance between the stabilization of non-designed interactions at low temperatures and the destabilization of the overall designed structure at high temperature. Simulations based on the iTAM are effective at estimating the temperature of optimal assembly unique to 25 two-dimensional tile designs, with an mean error of estimation of 4.6 degrees C. Results from the iTAM indicate that optimal assembly temperatures are determined largely by the strength of tile-tile domain interactions. For a given tile design, tile concentration and the length of time represent convenient axes of control over tile assembly. Kinetic trapping that blocks complete assembly of a tile design is likely to be overcome by increasing the both temperature and tile concentration in the assembly reaction. Such a change also substantially decreases the computationally predicted time required for complete assembly. Thesis Supervisor: Mark Bathe Title: Assistant Professor of Biological Engineering 2 Acknowledgements Thanks are due to Mark Bathe and Amy Keating for their guidance during my studies; to Cameron Myhrvold and Peng Yin, whose enthusiasm for DNA engineering was infectious; and to all the members of the Laboratory for Computational Biology and Biophysics (especially Matthew Adendorff, Aprotim Mazumder, and Keyao Pan). To my parents and family, and to Arshed - this is for you. 3 Contents Abstract 2 Acknowledgements 3 1. 5 2. 3. Introduction to DNA Nanotechnology and Self-Assembly 1.1 Basics of DNA Nanostructures............................................................................... 1.2 DNA Nanostructures: Applications......................................................................... 1.3 Assembly of DNA Nanostructures........................................................................ 5 8 11 Models for DNA Tiling Self-Assembly 2.1 The Abstract Tile Assembly Model............................................................. 2.2 The Kinetic Tile Assembly Model............................................................. 14 14 17 2.3 2.4 20 21 The Isothermal Tile Assembly Model........................................................ The Thermodynamics of Self-Assembly.......................................................... The iTAM and the SST System 23 3.1 Description of the SST System ........................................................................ 23 3.2 Results from the SST System .......................................................................... 27 Im plem entation of the iTAM ............................................................................. 3.3 30 3.4 Simulation Results: Optimal Assembly Temperatures......................................34 3.5 3.6 3.7 Nucleation........................................................................................................ Tim e and Concentration..................................................................................... Conclusions...................................................................................................... 40 43 46 References 47 Supplemental 50 4 Chapter 1. Introduction to DNA Nanotechnology and Self-Assembly 1.1 Basics of DNA Nanostructures DNA nanotechnology uses DNA as a building material to create structures and active devices with nanoscale sizes. Though simple in concept, DNA nanotechnology has attracted interest from such disparate fields as biochemistry, computer science, and mechanical engineering, with a corresponding range of suggested applications. DNA: informationstore and building material DNA is a basic building block of life on Earth. Nature uses DNA to store and copy information that encodes the basic macromolecules of the cell: DNA, RNA, and proteins. Though genome sizes and organization differ between organisms, the physical structure and basic patterns of DNA layout are shared between all known forms of life. DNA nanotechnology uses the same materials toward a different purpose. In DNA nanotechnology, the specific pairings and stable structure of DNA are used not for information storage but rather to create architectural patterns on the nanoscale. DNA consists of a long strand of monomer units, or bases, that are joined together by covalent bonds. Four different types of monomers are found in nature, each of which can form stable hydrogen bonds with only one of the other monomer species (adenine with thymine; cytosine with guanine). Such base pair interactions underpin the form in which DNA is most commonly found: the familiar double helix, where two strands of DNA run parallel to each other in opposite orientations and every base is hydrogen-bonded to its appropriate complement on the other strand. This form is further stabilized by the exclusion of water molecules from the core of the helix and especially by stacking effects between adjacent bases on the same strand. One specific form of such interaction, called B-DNA, describes the substantial majority of all DNA found in natural and designed DNA applications. B-DNA is characterized by a rise per base pair (bp) along the helix of 0.332 nm, with 10.5 bp constituting a full turn of the helix. This double-helical structure is stable against thermal dissociation if the strands are at least 15 bp 5 long [1]. Two helical grooves run along the exterior 2.0 nm of the double helix. The larger of these grooves, the G C A major groove, is frequently used as a binding site for protein complexes that copy DNA or regulate T gene activity. 10.5 bp (3.5 nm) DNA nanotechnology is feasible because the interaction of two strands of DNA is in general both readily predictable and highly programmable. Under appropriate assembly conditions, two (0.33 nm) strands will specifically and stably bind only if they share a long stretch of complementary Figure 1: B-DNA. The view is perpendicular to the helical axis, and the first three base pairs are labeled. nucleotides. The resulting strand-strand binding region is typically a simple double helix that can be modeled as a stiff rod over lengths of around 10 nm [1]. Complex structures can be built by introducing breaks between double-stranded rods, either by placing a nick in which the backbone of one strand is discontinuous between successive base pairs or by placing a region of single-stranded DNA between double-stranded segments. Such breaks introduce flexibility into the structure and permit the design of bends [2]. The simple design properties of DNA nanostructures contrast with those for designed protein-protein or protein-ligand interactions, where tertiary structure and interactions are more variable. Compared to inorganic nanostructures produced through methods like photolithography, DNA allows the formation of structures at smaller scales, can be made to undergo conformational changes, and exhibits optimal function only in a biological environment. This last is a significant hinderance to DNA nanotechnology's widespread utilization: DNA nanostructures require an aqueous environment and are frequently intolerant of significant changes in environmental temperature or salinity. 6 DNA nanotechnology The first suggestion that DNA could be used as a building material for nanotechnology dates to Nadrian Seeman in 1982 [3,4]. Seeman suggested that a three-dimensional cubic DNA lattice could be used to fix the position of covalently-bound particles difficult to crystallize, conferring a ordered pattern that could be used for X-ray structure determination of the bound particles. In practice it is difficult to build a structure from DNA with sufficient size and rigidity for this purpose, though recently a DNA-based liquid crystal has been successfully used to orient membrane proteins for NMR structure determination [5,6]. More fruitful was the basic concept that synthetic DNA strands with designed regions of base pairing can self-assemble into nanostructures. In 1991 Seeman's lab succeeded in producing a cube, sparking significant interest in DNA nanotechnology. Work along similar lines has produced other polyhedra, such as tetrahedra and octahedra [7,8]. The desire to produce design patterns with broader application led Seeman and others to utilize a tiling pattern to generate larger structures from simpler components, a strategy described in greater detail below. The earliest designs used DNA tiles made rigid through the inclusion of multiple crossovers to generate two-dimensional planar structures, which were later shown to be generalizable into hollow tubular forms [9,10]. Though successful, the tendencies toward largescale disorder, the relative unsuitability for three-dimensional applications, and the low yields of complex structures have constrained the use of tile-based designs in favor of DNA origami, discussed below, in many settings. Recent work addresses some of these drawbacks, using single-stranded tiles (with fewer synthesized bases) to generate arbitrary 3D shapes on a grid pattern [11,12]. The introduction of DNA origami by Paul Rothemund in 2006 greatly enhanced the design, assembly, and yields of DNA nanostructures. In DNA origami designs, a single-stranded DNA virus genome called the scaffold strand is folded up using small synthetic staple strands that bind to multiple regions of the scaffold [13]. While the necessary inclusion of the large scaffold strand places constraints on the size, sequence, and layout of the DNA nanostructure design, DNA origami designs have been used to great success in a wide range of applications. 7 1.2 DNA Nanostructures: Applications Engineeringapplications Proposed engineering applications for DNA nanostructures have focused on the precise positioning and controllable nature of designed base pairing patterns. In one approach, DNA nanostructures are used as a foundation to orient linked proteins and functional groups. Such demonstrated designs include "printing" an array of DNA-linked gold nanoparticles onto a surface and constructing an efficient light-harvesting antenna by positioning dyes onto a DNA cylinder [14,15]. One recent design of particular interest created an artificial membrane channel permeable to ions and single-stranded DNA by positioning covalently linked cholesterol molecules at the sides of a multihelix bundle made with DNA origami [16]. a. _ b. flllfff C. Eu..... Figure 2: Designed DNA nanostructures. (a) Planar DNA origami designs (left column) and corresponding AFM images from [13]. Structures are 100 nm wide. (b) A variety of 2D shapes assembled from single-stranded DNA tiles [12]. Each AFM image shown is 150 nm x 150 nm. (c) A DNA nanorobot that shifts from a closed barrel conformation (top left) to a open conformation (right), releasing a protein payload, when unlocked by an antigen "key" [17]. Alternatively, DNA can be used as an active, mobile element. Changes in DNA base pairing have been used to drive conformational changes in a nanostructure, provide motive power to a DNA "walker," and sense alterations in the local chemical environment [17,18,19]. 8 Such conformational changes have been harnessed to produce "smart" drug delivery vehicles that exhibit enhanced drug delivery and improved cytotoxicity [20]. Computationalapplications Computation through nanostructure assembly was first demonstrated by Len Adelman in 1994 and represents a distinct area of interest in tiled DNA structures [21]. Any desired structure or, equivalently, computation result, can be generated using an appropriately chosen set of tiles and temperatures, given certain assumptions [22]. A tiling DNA computer implemented in this fashion could be enormously parallel, utilizing -1015 logic gates (while as of 2013 microprocessors feature >1010 transistors)'. No calculation approaching this size has been attempted, though computation through tiling has been demonstrated for proof-of-principle problems such as assembling an XOR logic gate from tiles, performing binary operations, and constructing a Sierpenski triangle [23-26]. DNA tiling computation is constrained by the large size of the tile set needed to carry out nontrivial problems, by the time and temperature dependence required for a calculation, and by the significant error rate of tile addition, all of which have been the subject of theoretical and experimental investigation. Since the number of strands needed to encode a calculation grows exponentially with the size of the problem, scaling up DNA computing to address interesting problems demands prohibitive numbers of DNA species in practice [27]. Error rates further limit calculation size: an error rate of 0.1%, comparable to the best achieved, requires that structures not exceed 50 tiles in size to ensure 95% confidence that no error has occurred [25]. Finally, tile computations are inefficient: only the growing edge of a tiled structure actively performs calculations, and when growth is complete it is difficult to read out the computation result or reuse tiles for a different problem. These considerations mean that DNA tile computation cannot practically solve problems intractable on existing computers. IA 1 mL volume with total DNA tile concentration of 10 pM corresponds to 1015 molecules. Faster operation is not implied: DNA binding is slower than transistor switching, and many molecules may perform the same calculation. 9 Figure 3: A Sierpinski triangle grown from self-assembled planar DAO-E tiles, from [26]. The scale bar is 100 nm. (a) An tile placement error rate of 5% produces disorder visible at larger scales. (b) Detail from (a), with misplaced tiles shown with red crosses. (c) A structure with growth initiated by an error and terminated by 3 errors (crosses). More significant for future applications are comparatively trivial calculations for which DNA as a material is uniquely suited. Relevant uses could include assembling a patterned layout as a template for a nanodevice or performing a computation within a biological environment. In a proof of principle, nanodevices composed of DNA alone have been built which can perform signal processing based on multiple simultaneous environmental inputs [28, 29]. Similar networks could be used in vivo to determine the state of a cell or biological process and used to trigger a response - for instance, a caged toxin could be released into a cell if a DNA-based logic network determined that the cell exhibited high levels of cancer metabolites. 10 1.3 Assembly of DNA nanostructures Nanostructuredesign and assembly Though structures as small as Seeman's cube are simple to design by hand, complex structures do not possess a single obvious correct design. DNA designs are difficult to scale up in practice: more or longer strands are required, which increases the cost of reactants. Further, complex designs are frequently more sensitive to assembly conditions and exhibit poor yields of correctly formed structures. To simplify nanostructure layout and assembly, large DNA structures are today typically designed using either a tiling pattern or a DNA origami design. In a DNA tiling design, a simple DNA architectural element is repeated throughout a design to form a larger tiled structure. A tile can be any shape that fills space - triangles, hexagons, aperiodic tilings, and patterns incorporating multiple shapes are all possible. Most designs to date have used a rectangular "pixel" or cubical "voxel" unit that assembles to form a grid pattern. Unlike DNA origami structures, tiled structures are not constrained by the size of a scaffold strand: structures can be as small as a single tile and in theory have no maximum size. Tiled structures generally exhibit lower yields than comparably sized DNA origami structures: more strands must interact to form tiled structures, and errors in tiled structures can propagate throughout the structure because information on tile placement is derived only from local interactions. With a DNA origami design, a careful choice of folds produces any structure obtainable by bending a wire or loop: barrels and planar folded patterns resembling an antiparallel B-pleated sheet are common design motifs. In practice, DNA origami is a robust assembly method that produces large structures with good yield and requires synthesized DNA only for the short staple strands. A key concept for overall origami assembly, which contrasts with that for DNA tiles, is that the presence of the scaffold strand imposes a top-down organization that reduces large-scale disorder. The assembly of a DNA nanostructure typically proceeds by placing all strands required into solution in a single reaction vessel, heating all strands to break any existing base pairs, and then gradually cooling the reaction in a slow thermal annealing ramp to room temperature. 11 Assembly requires the presence of salts, as DNA's negative electrostatic charge must be neutralized by cations for folding to occur. Small divalent cations (Mg 2 +) are especially effective at stabilizing DNA nanostructures, since such cations can enter tightly packed structures to form coordination complexes with strands' negatively charged phosphate backbones. Kinetic barriersaffect DNA nanostructureassembly Kinetic effects govern the assembly process of DNA nanostructures, as demonstrated by slow rates of folding and the observed hysteresis between folding and unfolding. The fully folded state is hugely favored at equilibrium (each 1 kcal/mol difference between states corresponds to a 5-fold factor at equilibrium), such that folded or nearly-folded states alone should be populated at equilibrium. This is not observed for large nanostructures, which instead exhibit low yields (10% yields are common) and a substantial degree of incomplete assembly [30]. These effects can only be explained by the existence of a kinetic barrier to assembly that prevents equilibrium occupation of the folded state. DNA nanostructures exhibit substantial hysteresis between folding and unfolding, a kinetic effect. Measured nanostructure folding temperatures are around 10 0 C lower than unfolding temperatures [31, 32]. processes to be out of equilibrium. This requires one or both of the folding and unfolding Evidence points to the folding temperature being time- dependent, indicating folding is subject to kinetic control [31]. Separate work has also demonstrated that the folding rate of DNA origami is greatly enhanced at low temperatures in the presence of denaturant, as is expected if kinetic trapping governs folding rates [33]. Isothermal assembly Recent work indicates DNA nanostructures' folding is subject to another kinetic effect: folding rates have substantial temperature dependence and are maximal below the structures' melting temperatures [31]. This phenomenon is well-known in polymer crystallization, and is explained by lower- and upper-temperature bounds to assembly (Tgiass and Tmelt, respectively) [34]. Tgiass is a kinetic barrier below which incorrect interactions that impede crystallization are stable over the time length studied. Tmeit is an equilibrium barrier, above which the ordered state 12 is unstable. Assembly rates are non-negligible only between Tgiass and Tmeit, and peak at an intermediate temperature Toptimal. Several DNA nanostructure designs have been demonstrated to exhibit this peaked folding rate at a specific temperature, including both DNA origami and tiled DNA structures [31,35]. When structure assembly is carried out isothermally at Toptimai, yields are greatly enhanced relative to older protocols requiring a gradual temperature ramp and incubation times can be shortened from weeks to hours. Understanding the nature of the kinetic barriers that block DNA nanostructure assembly and being able to set the temperature of optimum folding has the potential to greatly increase the yield, speed of folding, and range of environmental conditions under which DNA nanostructures are employed. This work seeks to do so by applying a model to analyze the assembly of a set of tiled DNA nanostructures that have been experimentally characterized. 13 Chapter 2. Models for DNA Tiling Self-Assembly DNA nanostructure assembly occurs through a complex series of coupled reactions between intermediates. Though nanostructure assembly is not understood in great detail, work over the course of two decades has produced models through which aspects of the assembly process can be analyzed. Such models are reduced models that rely on a number of simplifying assumptions about the nature of DNA assembly. Sections 2.1 and 2.2 review two such models developed by Erik Winfree, the aTAM and kTAM, and describe prior research into these models' properties. Section 2.3 provides motivation for and outlines the iTAM, a variant of the kTAM developed for this work. The goal of the iTAM is to provide a model that can computationally recapitulate the isothermal assembly behavior experimentally observed in DNA tiles [35]. 2.1 The Abstract Tile Assembly Model The abstract tile assembly model (aTAM), introduced by Erik Winfree in 1998, extends the mathematical notion of Wang tiling to the problem of two-dimensional DNA nanostructure assembly [22]. The aTAM takes as input the definition of a tile set, an initial "seed" structure, and a "temperature" T, together termed the tile assembly system. Using these starting parameters, the aTAM then executes a Monte Carlo algorithm that simulates the irreversible growth of a tiled structure. The following review of the aTAM is from Winfree, and is important to reproduce here because the model's features and results underlie the kTAM and iTAM models discussed in sections 2.2 and 2.3 upon which this work is directly based [22]. Description The tile set comprises n square tiles, or monomers, {ai, a2, ..., an}. Each tile in the tile set is defined by eight parameters: four edge labels {JON, aIE, ui,s, aiw} and four associated integer edge strengths {gijo, g i_, gi_2, g_3}. Two adjacent tile edges interact with strength gidir if the 14 I 1 U a. if I 2 2 oH I 2 S1 2 I 1i ifliI OkJ2 2 0 2 b. T =2 Figur 4: An example tileset for the abstract tile assembly model. (a) An example tile set is shown. Different colors represent the four edge labels, and numbers represent the edge strengths. (b) Assembly of this tile system is shown at T = 2. Under this condition assembly is deterministic and produces a variant of the Sierpinski triangle (a portion shown at right). For T = 0 or T = 1 incorrect additions can occur and assembly does not yield a set pattern. Figure 5: An example step in the kTAM. A three-tile structure will lose a tile at rate koff and gain a tile at rate p(off) = kof f kon+koff p(on) = 7 on kon+kof] k0 n. At left, possible structures after tile dissociation. At right, the 7 possible sites for tile addition are shown with dashed outlines. Tiles and binding strengths (units of kBT) are as in Figure 4(a). The corner tile is a fixed seed tile and cannot koff = kf[e-'I+e -2] kon =7kf[mono.] dissociate. 15 edges have identical labels and interact with strength 0 otherwise. Structure assembly begins with a grid that is initially occupied by the input "seed" structure. In each step of the simulation, as illustrated in Figure 4, a tile a is chosen at random from the tile set and placed in a grid square that borders the existing structure. The new tile's edge interaction strengths with the existing structure are scored and summed as Gnew. If Gnew ! T, the new tile is retained and the structure grows in size by 1 tile. If Gnew < 'r, the new tile is removed and no growth occurs. Per Winfree, any desired structure or computation result can be generated with an appropriately chosen tile assembly system [22]. A tile assembly system may or may not be deterministic, meaning it leads to a unique terminal assembly, and may or may not produce a structure of finite size. Though unattractive for constructing physical nanostructures, nondeterministic systems have been formally shown to possess greater computational power than deterministic systems [36]. As described by Winfree, a typical tileset in the aTAM has tiles possessing edge interaction strengths gi = 1. This results in programmed structure growth that occurs at T= 2, where a tile adds only if it attaches to correct partner tiles along two edges of the structure. (At T= 1 a single correct edge match will yield tile addition.) If all edge labels are used only twice in the final structure, a single correct edge match describes a unique position for the adding tile and programmed growth will occur at T = 1. If edge labels are reused throughout the tileset, a single edge match will not uniquely describe a position. Growth will then occur, but some errors are stable and will be incorporated into the structure. At T= 0, all tile additions are stable and the structure will grow indefinitely and without order; at Tr> 2, no additions are stable and growth cannot occur. These bounds have been shown to be true only for rectangular tiles in two dimensions: extension of the aTAM to three dimensions permits algorithmic tile assembly at T= 1 [37]. aTAMextensions Numerous groups have sought to extend the aTAM by relaxing specific model assumptions [38,39,40,4 1]. Of greatest relevance for the isothermal tile assembly model, outlined 16 in section 2.3, are extensions of the aTAM in which properties of the tiles are changed. Relaxing the constraint that tiles must be confined to interact in two dimensions leads to a model for flexible tiles which can efficiently encode certain graph-theoretic problems [40]. Models can be generated in which edges which do not share the same edge label nonetheless interact with nonzero strength [39]. The assumption that only monomer tiles are allowed to interact can be reduced through the staged assembly model, in which selected strands are assembled in separate reactions into subcomponents that are then mixed for final assembly [41], or the 2HAM model, where two structures assemble separately but at each step have a chance of merging [39]. Though of these only the edge-label constraint is relaxed in the isothermal tile assembly model, these extensions are significant in that they represent routes through which the assembly models discussed in Chapter 2 can be generalized to more complex tile systems and more realistic assembly conditions. 2.2 The Kinetic Tile Assembly Model The kinetic tile assembly model (kTAM), introduced by Winfree concurrently with the aTAM, provides a more physically realistic description of DNA tile structure assembly than the aTAM [22]. In the kTAM, tile addition is considered to be a reversible chemical reaction. The properties and treatment of tile assembly of the kTAM have been previously described, but are key to this work and so are reviewed in the remainder of this section. The kTAM takes as input a tile set, a seed structure, a physical temperature T, a tile concentration [monomer] = e-Gmc, and a kinetic rate constant kf. The edge strengths {gi o, ... , gi3} are now given in energy units (e.g. kcal/mol) and T is in Kelvin. Here, the ratio Gmc/gi is approximately equivalent to r as defined for the aTAM. An assembly simulation with kTAM begins with the input "seed" structure. In each step, the m unfilled grid squares bordering the current structure are determined and placed in the set U = {uJ, ... , um}. The h filled grid squares in the structure (the seed tile is excepted) have their interaction energies with other tile squares counted and summed, creating a set of h tile binding energies B = {b 1 , ... , bh}. 17 The tile monomer on-rate is assumed to be constant for each possible binding site. The net on-rate is then given by: kon = mkijmonomer] The net off-rate is given by: koff = kfh e-bi The overall rate of events is ktot = k0 n + koff. Using this, the time interval At until the next event is determined by sampling from an exponential distribution with probability density function f(At) = ktote-ktotAt. The next event is chosen at random to be an on-event with p(on-event) = ko 0 /kto and to be an off-event with p(off-event) = 1 - p(on-event) = kof/ktot. If an on-event is chosen, a tile is sampled randomly from the tile set and added to an unfilled grid square sampled randomly from U. If an off-event is chosen, a tile is sampled randomly from the h filled tiles with a probability based on its sum binding strength: the probability that the ith tile is chosen is given by e-b/(h e-b). The sampled tile is then removed from the structure. Unlike the aTAM, the kTAM can model structure shrinkage under conditions of high temperature or low tile concentration and can permit the incorporation of errors into a growing structure. Errors in the kTAM are introduced when an inappropriate tile adds to an edge but is stabilized by the addition of tiles complementary to it before it can dissociate. Errors are significant as a factor limiting the assembly of both real and simulated systems, and numerous strategies to minimize their impact (including appropriate choice of assembly temperature and tile set design) have been explored [42,43,44]. Applying the kTAM to describe an experimental system requires that a number of model assumptions hold. Though these assumptions are not true in general for physical systems, the degree to which they hold must be a consideration when kTAM-type models are used to study a tile system. A listing of some assumptions implicit in these models is given below. 18 i. The tiles interact solely in a two-dimensional plane. Physical tiles successfully used to construct two-dimensional nanostructures readily generate tubes, indicating that out-of-plane tiled structure curvature is quite possible under physical conditions [9,12]. ii. added. Monomer concentration is constant and equal to the concentration of initial monomer This is not expected to be the case over the course of an assembly run, since free monomer concentration should decrease as tiles are incorporated into growing structures. Winfree's xgrow program seeks to account for this effect by reducing the concentrations of tile species incorporated into the structure [45]. A further consideration is that any kinetic trapping not captured by the model will reduce free monomer concentrations. iii. Temperature is constant during assembly. This constraint is fairly straightforward to relax during simulation. In particular, it is not a concern for simulating experimental systems that assemble at a single isothermal temperature. iv. Only monomers can add to a growing structure. Certain aTAM models like the 2HAM and staged assembly model simulate the merger of tile complexes [39,41], but there is no efficient method that can simulate the combinatorial variety of possible intermediate structures, assess their concentrations, and then produce realistic merger events at realistic rates. It is unclear how significant this effect may be for assembling tile structures. Electron micrographs of experimentally generated tile-based structures sometimes depict objects that appear to arise from two conjoined subassemblies, though similar structures can be generated in simulations purely through errors incorporated during monomer addition. v. The rate constant for tile addition and dissociation is constant and identical between all tiles. This term should vary with temperature and with the hydrodynamic radius of the tile. Larger tiles or groups of tiles diffuse more slowly, with decreased rate constants. vi. Tile addition rates are proportional to the number of unoccupied sites bordering the 19 structure. This fails to take into account steric hindrances to binding or any dependence on the number of edges at a site available for binding, which could affect on-rates. vii. Tile edges that match interact with an assigned energy g and tile edges that do not match interact with energy 0. Relaxing this assumption has been considered in an aTAM-based model [39] and is discussed for the iTAM in section 2.3. 2.3 The Isothermal Tile Assembly Model The isothermal tile assembly model (iTAM) is a variant of the kTAM intended to model thermodynamic features of DNA tile assembly, and was developed specifically for this work. Two changes distinguish the iTAM from previously used kTAM models and are intended to improve the quantitative prediction of DNA tiling assembly properties [22, 43]. In the first change, independent dissociation events at a single tile are treated independently. This slows growth by increasing off-rates and reduces the incorporation of incorrect tiles through growth errors. In the second change, the interaction strengths between all tileset edges, rather than just edges intended to pair, are calculated using the nearest-neighbor model for DNA thermodynamics. This change is motivated by the consideration that two non-partner tile edges may be partially complementary and interact, contributing additional error during assembly. The iTAM can account for multiple independent dissociation events at a single tile, while the kTAM requires that a single tile have only a single dissociation event. This was motivated by considering the aphysical nature of some errors produced by the kTAM: a tile can be added to the structure with an (incorrect) bond of strength 0 and then stabilized by the addition of an adjacent tile. This internally stable tile dimer is linked to the remainder of the structure by a strength-0 bond only; in a physical system, the dimer would freely dissociate. Considering both possible independent modes of dissociation yields a more realistic off-rate for the dimer. The number of independent dissociations possible at a tile site is the number of tile blocks produced when that tile is removed (a block is a group of 1 or more tiles that has no bonds 20 to any other block). The rate of each independent dissociation is kor_din = kfe-blocki, with blocki the binding energy between that block and the tile. Analogous to the kTAM case, the net off-rate for all tiles in the structure is given by: koff = kfh e-blocki where h is taken over all independent dissociation events possible in the structure. The iTAM also calculates the interaction energies between all edges in the tileset, while the aTAM and kTAM assume that interactions between non-partnered edges have energy 0. This is needed to permit errors and kinetic trapping in tilesets where each edge pair is used only once in the intended design. The kTAM predicts that such tilesets should assemble without error even at low temperatures: correctly placed tiles will not dissociate under these conditions, while incorrectly placed tiles must have 0 binding energy and will rapidly dissociate. This is in contrast to expectation and experimental results, which favor the idea that non-designed interactions between tiles should be stable at low temperatures and should contribute toward kinetic trapping. This is also important to take into account orthogonal partially complementary interactions within the tileset that may contribute trapping near assembly temperatures. 2.4 The Thermodynamics of Self-Assembly The self-assembly of DNA nanostructures is a thermodynamically favorable reaction that is slowed by kinetic trapping. While the origin and effects of this kinetic trapping have been previously discussed in section 1.3, a tile assembly model like those discussed in sections 2.1 through 2.3 must also consider the thermodynamics of DNA hybridization driving tile structure growth. Accordingly, the following section briefly reviews the nearest-neighbor method, which is the standard literature technique used to estimate the free energy of hybridization between two strands of DNA. As is later described in section 3.3, this method is used to calculate tile-tile binding energies in this work's implemented version of the iTAM. 21 The nearest-neighbormodelfor DNA hybridization thermodynamics The thermodynamic driving force for DNA nanostructures' assembly is the free energy of hybridization of complementary stretches of DNA, which can be estimated with the nearestneighbor parameters. This estimation has an average error of ±1.6*C for an isolated DNA duplex, and works by decomposing the hybridized region into neighboring pairs of base pairs [47]. Each such nearest-neighbor pair is assigned a AHNm and ASNm value, which are added to additional AH and AS contributions due to hybridization and secondary structural features to determine an overall AHhyb and AShyb. The free energy of hybridization for that region is then given by: AGhyb = AHhyb - TAShyb These values are given for DNA in a 1 M NaCl solution. Using an equation empirically derived by Owczarzy et al, these energy parameters can be adjusted to account for the high Mg 2+ conditions typically used for nanostructure assembly [48]. DNA nanostructures are also subject to additional effects not captured in this treatment, like mechanical strain energies and electrostatic interactions between adjacent double helices. 22 Chapter 3. The iTAM and the SST System DNA nanostructures are limited in their applications by the constrained range of environmental conditions under which they can assemble and persist. Recent work by Sobczak et al. [31], which demonstrated isothermal assembly for DNA origami structures, shows that an improved understanding of the assembly process is likely to lead to a significant increase in yields and concomitant broadening of the range of conditions tolerated during assembly. A series of experiments by Myhrvold et al. [35] extend the concept of isothermal assembly described by Sobczak et al. to tile-based systems, with the explicit aim of increasing such designs' yield under biological conditions. The tile designs used, which will be referred to as the SST system, systematically explore the influence of various features of tile design on temperatures of maximal isothermal assembly. Below the SST system developed by Myrvold et. al is reviewed in section 3.1 and 3.2. The proposed explanation for observed trends in the system at the end of 3.2, as well as the iTAM implementation and results described in sections 3.3 through 3.7, constitute new work. The iTAM model was employed to describe the assembly behavior observed for the designs of the SST system. The goals of doing so were to determine the degree to which a kinetic assembly model can accurately describe the observed features of tile structure growth and to provide insight into the mechanistic details of the assembly process. 3.1 Description of the SST System The tile sets The SST system studied by Myhrvold et al. consists of 27 different tile sets that utilize design rules previously shown to produce arbitrary tubes and two-dimensional structures on a finite grid [12]. Schematically the tile layout in the assembled structure resembles that used for the tile assembly model: tiles have four edges, and for 25 of 27 tile sets each tile edge binds a different partner. 23 While Yin's previous work with the tile design used the same tiles to construct different shapes, the SST system constructs shapes with similar or identical patterns using tiles with different sequences and structural features in order to relate difference in design choices to different observed assembly behaviors. The 25 designs with rectangular layout are separated along three principle axes: i. The connectivity pattern of the tiles in the fully assembled structure. Three different connectivity patterns were used across the 25 designs: ml/m14, m4, and m3. (See Figure 6.) The m10 designs, not considered, constitute the remaining 2 of 27 designs and represent a fourth connectivity pattern. ii. The presence or absence of a linker region between the binding domains. About half of all designs incorporated a flexible linker consisting of repeated thymine nucleotides (typically 10 thymines, or lOT) placed between tile binding domains. Such linkers remained single- stranded in the fully assembled structure. iii. The nucleotide sequences of the tile edges. Within the ml class of designs, domain lengths were varied from 8 nucleotides/edge to 21 nucleotides/edge. Outside of this class, sequences differed between the ml and m14 designs, between designs with different GC content, and between a design that incorporated a single mismatch base and those that did not. 24 6 12 18 17 24 30 3 1016 23 29 36 421 a. b. 12 9 15122128135141 48 54: - -= - 14121127134140147153 601661 r- - 1 1 17 11 - C. 2 7 13 11 18 4 4391 11 5 20 2631 37 101161211271321381431491 15 =. - I- I- U- I- -- . S -9-I 16 111 17122128133139144150155161 1 11218 I40O455156621 14I30I35I4146Is215713J. 119251321381451511581 1137 44 50 57 63 47 53 48 54 59 6 5 0 0 60 66 :9J65 43 49 56 62 55 61 a. C. 4111181 2 I 8 7 14 13 20 3 10 916 15 22 21 28 27134 17 24 23130 29 36 35 42 41 48 19 26 33 40 47 54 25 32 39 46 53 60 31 38 45 52 59 66 Figure 6: The intended designs for the SST structures tested. Designs are composed 66 separate tiles each used once in the final structure. (a) The design used for the ml and m14 tile sets. (b) The design used for the m4 tile sets. (c) The design used for the m3 tile sets. b. 12i o ** o d. 37 44 51 58 65 43150157 64 49 56 63 5562 61 _____ I . ~# ### I 4F7j(7t I ~' I I Figure 7: The linkage patterns of single-stranded tiles in the designed structures. Domains are shown as solid lines. Dashed lines indicate linkers for designs where linkers are present. (a) Four tiles in the ml or m14 structures. The junction is equivalent to a multibranched loop. (b) Four tiles in the m4 design; (c), four tiles in the m3 design. The four-tile junction in both designs is distinct from that in the ml design. (d) Eight tiles in the ml0 design. The connectivity of the ml0 design differs substantially from the other designs, incorporating a crossover and using two strands in the basic unit tile. 25 varied domain length ml_9mer m1 ml_13mer ml-6mer ml_19mer ml_21mer ml_8mer_10T ml_9mer_10T m1_10mer_10T ml10T ml_13mer_10T ml_1T ml_2T ml_4T ml_7T mllOT ml_13T varied linker length varied connectivity ml/m14 m3 m4 varied sequences m1 m14 ml1OT/m14_1OT m3_10T m4_10T m10 mlOhighGC m14_10T m14_1OT_lowGC m4_10T m4_10T_split Table 1: Design variables in the SST system and resulting tile sets tested. Tile sets listed within the same cell can be contrasted for comparisons across that variable. Some tile sets are listed more than once. Table 2 contains additional information on the tested designs. Tile assembly experiments Myhrvold et al. measured the assembly rate of various tile sets over a range of temperatures [35]. In these experiments, all strands of a single tile set were added to a single reaction mixture (200 nM of each tile type to 0.5x TE buffer supplemented with 10 mM Mg 2+). The mixture was then held constant at a set isothermal temperature for 1 hour 2 , the assembly period, and was then placed at 4*C for the remaining steps. The reaction mixture was loaded into an agarose gel and underwent gel electrophoresis, which separates the structures present by their size (directly related to charge) and their degree of folding (fully folded structures are compact compared to aggregated or partially formed structures, and so should exhibit high gel mobility). The assembly yield was calculated using band densitometry: a high-mobility band presumably indicating the folded structure was located on the gel. The amount of DNA in this band, from band density, was divided by the total amount 2 Except for tile sets with domain lengths > 16 nt, where assembly times were 12 hours each. 26 of DNA in the gel lane, from total lane density, to determine the calculated yield. Note that the yield as defined by this method is higher than the yield as defined by the percentage of starting DNA incorporated into correctly assembled structures: structures with high mobility are likely to be substantially "correct" but need not be perfectly assembled, and monomeric strands remaining after assembly may have been run off the end of the gel and as such are not accounted for. To verify that high-mobility bands on the agarose gel did in fact correspond to fully assembled structures, reaction mixtures were adsorbed onto a surface for atomic force microscopy. Though limited detail into the tile makeup of these structures can be gleaned from such methods, but the structures formed appear to generally follow the design pattern, albeit with a substantial degree of shape heterogeneity. Each tile set underwent assembly reactions in parallel at different temperatures. These temperatures typically spanned about a 20'C range, with data points separated by around 3"C. A Gaussian curve was fit to the data, which was displayed as assembly yield versus isothermal assembly temperature. The peak and full width at half maximum of this Gaussian fit were calculated and given as the mean and spread of the isothermal assembly temperatures. 3.2 Results from the SST System The isothermal temperatures at which peak assembly was observed to occur were directly related to predicted tile domain-domain binding strength and inversely related to the presence of a single-stranded linker region between tile binding domains. Most aspects of the analysis below are unique to this work, as domain-domain binding strengths were not previously calculated. Domain-domain binding strengths for a design are closely and directly correlated with its optimal assembly temperature, as would be expected. The relationship is strongest when considering the melting temperatures of the domains as predicted from the nearest-neighbor model for DNA duplex melting temperatures [47], and factors that influence this calculated duplex melting temperature are predictably related to the observed temperatures for SST tile design optimal assembly. Increasing domain length is positively correlated with optimal assembly temperature, expected since longer domains have higher melting temperatures, directly 27 correlated with GC content in binding domains, and negatively correlated with a domain incorporating a single mismatch, expected since this destabilizes domain duplex pairing. The link between linker status and optimal assembly temperature is strong: the presence of a linker shifts optimal assembly temperatures downward for a design by about 8"C relative to a linkerless design. Curiously, there is a minimal observed relationship between the length of the linker region and designs' optimal assembly temperatures. The trends between linker presence and optimal assembly temperature cannot be explained by differences in stacking effects, since the coaxial stacking lost through inclusion of a linker is largely compensated for by dangling-end stacking of the linker region against the two adjacent domains. A more plausible explanation is that the observed relationship is due to an entropic effect. Linkerless designs are not forced to constrain the ends of a relatively flexible region, which would be expected to contribute a destabilizing entropic penalty in a folded structure. 28 m1_9mer_0T M1 9 0 35.90 ±6.42 mlOT m1 10.5 0 47.38 ± 6.09 ml_l3mer_OT m1 13 0 54.77 ± 3.69 ml_16mer_0T m1 16 0 60.07 ± 4.60 ml_19mer_0T m1 19 0 63.80 ± 2.70 ml_21mer_0T m1 21 0 64.80 ± 6.09 ml_8mer_10T m1 8 10 18.42 ± 6.49 ml9mer_10T m1 9 10 26.81 ± 12.71 mll10merl10T m1 10 10 34.98 ± 4.47 ml13mer_10T m1 13 10 48.66 ±5.51 mlT m1 10.5 1 40.66 ± 5.56 m1_2T m1 10.5 2 39.37 ± 4.79 ml_4T m1 10.5 4 39.14 ± 4.81 ml_7T m1 10.5 7 39.73 ± 4.73 ml_10T m1 10.5 10 38.51± 4.87 ml_13T m1 10.5 13 40.31± 5.41 ml_16T m1 10.5 16 37.01 ± 7.46 m14_OT m14 10.5 0 46.83 ± 5.95 m14_10T m14 10.5 10 37.92 ± 5.37 ml4jowGCUOT m14 10.5 10 26.66 ± 7.06 m4_OT m4 10.5 0 44.39 ± 7.90 m4_10T m4 10.5 10 37.84 ± 4.57 m4_split_10T m4 10.5 10 24.44 ± 7.04 m3_OT m3 10.5 0 45.63 ± 5.63 m3_10T m3 10.5 10 39.34 ± 4.36 Table 2: Isothermal assembly temperatures and design features for the SST system. Data shown are given in Supp. Info. 2 of [35]. 29 3.3 Implementation of the iTAM The iTAM model was implemented for this work in Python as a Monte Carlo algorithm and run using representations of the 25 tile sets of the SST system as input. The assembly conditions used were those present in the physical system: 10 mM Mg2+, 200 nM each tile, and the temperature experimentally used. Assembly times were 1 hour for most tile sets and 12 hours for tile sets with domains > 16 nucleotides in length, as was used for the experimental system. A fuller description of the program is given below, and a link to the program files is provided in the supplemental material. Tile-tile standardfree energies andMg2+ correction In a step preparatory to running the simulation, the free energy of interaction between each pair of domains in the tile system was calculated. This was done using a program ("feedermg.py," see supplemental) that took as input text files containing the tile set's sequences as given in Suppl. 2 [35]. The spacing of binding domains along these designs' sequences was hard-coded into the program. When the sequences were read from text files, the program extracted the binding domains' sequences and wrote them to a separate list 220 entries in length, for the 220 domains present per tile set. These 220 sequences were read into the UNAFold program in a pairwise fashion [49]. Using UNAFold's "melt.pl" script, the standard enthalpy and entropy of binding for the minimum free energy structure in which the two strands are bonded is computed. This calculated entropy value, set by default to be that for DNA in a 1 M NaCl solution, is then corrected by the "feedermg.py" program using equation 22 in Owczarzy et al. to account for the 0 M NaCl, 10 mM Mg2+ experimental conditions [48]. This correction is most significant when considering domains with domains 13 nucleotides or greater in length, and strengthens binding. The output is a 220 x 220 matrix in which each entry is a pair (AHij, ASij) giving the enthalpy and entropy of interaction between domain i and domainj under experimental conditions. 30 Secondary structure correctionof tile-tilefree energies The domain-domain energies calculated by UNAFold and subsequently adjusted do not account for secondary structure features that impact the overall structure's energetics. This is not a minor effect: the enthalpies and entropies calculated by UNAFold do not distinguish between a structure with 10-nucleotide linkers or without such linkers. Such a change is experimentally observed to shift the assembly temperature of a design by about 6-9*C - quite significant when peak assembly for these designs occurs only over a 5"C range. To account for such effects, the iTAM model performs energy corrections based on the tile structure design. Ideally, energy corrections would be performed dynamically during the simulation based on the secondary structures present at each step in the assembly. The iTAM implementation of these corrections uses a simpler scheme: domain-domain energies are assigned an energy penalty before the beginning of the simulation that is based on the secondary structural elements created by the formation of such bonds within the context of an otherwise fully-assembled structure. The iTAM accounts for the secondary structure in designs with no linkers or with linkers 52 nucleotides in length by considering that when these designs assemble, they form structures patterned with four-arm junctions, also known as multibranch loops: see Figure 7. (Technically, this is only true for the ml and m14 designs: as shown in Figure 7, the m4 and m3 junctions possess distinct designs, but are treated identically for simplicity.) Each internal tile contributes corners toward 4 separate junctions, and so removing it from a fully assembled structure eliminates four junctions. Because tile-tile energies in this model are assigned at the domain level, this four-junction penalty is distributed evenly between the four domains of the tile: one four-arm junction penalty (+1.6 kcal/mol for a 0-nt linker) is assigned per domain [47]. This rule is modified for edge tiles that contain 1, 2, or 3 domains. Tiles with 1 domain will not form additional loops on binding and do not contribute additional secondary structural features. Consequently, interactions involving a domain of a 1-domain tile are assessed no secondary structure penalty. Tiles containing two domains contribute toward the formation of only a single junction in the final assembled structure. This single junction-formation penalty is divided between 2 domains, for 0.5 junction penalties per domain, for all interactions involving 31 2-domain tiles. 3-domain tiles appear only in the m3 designs, and are assigned 1 junction penalty for interactions with 4-domain tiles (for consistency) and 0.5 junction penalties for interactions with 3-domain tiles such that 3-domain tiles correctly contribute toward 2 junctions in the fully assembled structure. Designs with linkers >2 nucleotides in length form junctions that are similar in connectivity, but experimental evidence for assigning energies to such expanded junctions is lacking. Instead, the iTAM model assigns penalties to these structures by considering that forming a domain-domain bond within a fully-assembled structure produces two single-stranded regions that are bound at both ends by double-stranded regions, which is also the case in an internal loop. Accordingly, domain-domain pairings are penalized as though they created internal loops. For an internal tile, the penalty is 1 internal loop per domain-domain pairing (+4.9 kcal/mol for a 10-nt linker) [47]. Edge tiles create fewer internal loops, as was previously the case with junctions, and fractional internal loop penalties are assigned for domain-domain pairings involving such tiles as in the junction case. This overall treatment of tile-tile binding energies requires the following assumptions: i. The minimum free-energy structure for domain-domain pairing predicted by UNAFold makes sense within the context of the overall tile structure. This will be the case for correct domain-domain pairs, which are most important for simulating tile assembly. ii. The energy of the minimum free-energy structure is accurate. Melting temperatures as predicted by the nearest-neighbor model are typically around ±2*C of experimentally measurements, placing a bound on the accuracy of assembly temperature predictions [47]. iii. Each domain-domain interaction can be treated independently of its environment; overall energies are simply the sum of domain-domain energies, with the domains as defined. The formation of DNA secondary structures violates this assumption, and (as seen through the inclusion of linkers) has a considerable effect on binding energies. 32 The secondary structure adjustment to domain-domain energies attempts to correct for this violation to allow tiled structures' accurate simulation by a kinetic tile assembly model. Simulation The main program, "itam.py," took as input a tile set name, set of temperatures for assembly, entropy/enthalpy values, and simulation-specific parameters such as the number of tile assembly runs to perform at a specific temperature (nruns) and the simulated tile cutoff value for halting simulations. Experimental values for the temperature, tile concentration (200 nM for all simulations), and time given to run the assembly reaction (1 hour for most tile sets, 12 hours for tile sets with domains > 16 nt) were used unless otherwise specified. When run, itam.py began by importing the relevant list of tileset enthalpy/entropy values and calculating the 220 x 220 matrix of tile-tile interaction AG values, as described above. The simulation was initialized by selecting a tile at random from the tileset, assigning it a random rotation, and placing it onto a grid. The simulation was then run as described in sections 2.2 and 2.3. No tile was designated fixed, though the structure was not allowed to shrink below size 1. If a dissociation event broke the structure into two discontiguous blocks of tiles, the smaller of those blocks was deleted. A simulation was run until any one of three stopping conditions was met: i. The structure was of size 66 tiles and had no free edges for further tile addition. If this stopping condition was met, the simulation was ended and a structure was considered to have successfully assembled. This condition does not require that the structure be error-free, but it does effectively require that the structure consist of a single "domain" of tiles aligned per the design layout. Simulating such structures for the full time period studied experimentally substantially increased computation times and did not obviously alter simulation results. 33 ii. The structure reached size 80. Any structure that reached this size was considered an aggregate likely to grow in an unbounded fashion beyond the capability for simulation. iii. 1 hour elapsed in simulation time (or 12 hours for tile sets with domains > 16 nt). Upon stopping, the state of the simulation was recorded, including the final structure, number of addition/dissociation steps performed, and simulation time elapsed. The simulation was then rerun at the same temperature (starting anew with 0 elapsed simulation time and with a single seed tile) repeatedly in this fashion until the number of runs equalled nns. When that occurred, the tileset was rerun at the next temperature to be tested with AG values recalculated for the new temperature. Simulation yields at a specific temperature were calculated as yield nsuccess = nsuccess/nruns, where is the number of runs terminated by stopping condition (i). A completed simulation in which a tile design was tested at m temperatures generated m+l data files. The first of these data files was titled "simdata design-namedate.txt," and recorded the variables used for the simulation (temperatures, design, assumptions). For each run in the simulation, the outcome of a run (success or failure), the simulation time before the end of the run, the fraction of tiles in the mode tile orientation, and the size of the final structure were recorded. The remaining m data files, titled "extradatadesign-name temperature.txt," record the size and pattern of the assembling structure through snapshots taken at set points during the simulation. 3.4 Simulation Results: Optimal Assembly Temperatures The iTAM reproduces the main features of SST system assembly experimentally observed: yields from structure assembly are only significant for a relative narrow temperature range typically spanning ~10*C. Simulation trends in this temperature range generally matched those found in experiment, with an average error in predicting the experimentally observed 34 optimal assembly temperature of 4.60 C. The temperature of maximal yield for a design (Toptimai), as expected, bears a direct relationship to both the time and the concentration of tile monomers provided for the assembly reaction. Isothermalassembly temperatures Isothermal assembly behavior as predicted by iTAM was in general a good match for that observed through experiment. Tiled designs tested in the iTAM exhibit a yield with respect to temperature that approximately follows a Gaussian distribution centered around Toptimai, the temperature at which peak assembly is observed, and with a standard deviation of around 3-5*C. As expected from experiment and polymer crystallization theory, yields are negligible at temperatures sufficiently far from simulation Tassembly < Toptimai Toptimal. This is due to kinetic trapping for and to destabilization of the assembled state for Tassembly > Toptima. Low temperatures For Tassembly < Toptimai, low yields are a reflection of the high likelihood that an error stably incorporates into the structure. Such an error is likely to act as a "seed" for subsequent incorrect tile additions, resulting in a structure with multiple different patches of aligned tiles. Further addition events to such structure will not yield a structure that follows the intended design. At sufficiently low temperatures, many promiscuous non-designed interactions are stable and a single assembly run, when halted at the aggregation stopping condition of size 80, contains many patches of tiles of different orientations. As temperature approaches Toptimal, these patches are reduced in number and grow in size, as fewer erroneous tile additions occur during assembly. (See Figure 9a.) Promiscuous interactions are most stable near Toptimai for designs with shorter domain lengths because close matches to non-designed partners are more likely when domains are short. For instance, a 220-domain design with 8-nt domains should contain multiple 5-nt promiscuous interactions by chance alone, stable near Toptimai, while a 16-nt domain would be unlikely to harbor a non-designed match of even 9 nt. 35 Around Toptimal Assembly near Toptimal occurred efficiently for most designs, with a probability quite close to 1 that the designed structure would be formed during the course of the simulation. Optimal or near-optimal assembly was typically observed over a range of temperatures, with an average simulated FWHM (the range of temperatures over which assembly was at least half-maximal) of 7. 1*C. (The corresponding average experimental FWHM was 11.5*C.) Skew was evident on the curves of simulated yield vs. temperature, with the longer "tail" of successful assembly apparent for temperatures Tassembly < Toptimai. This can be attributed to successful structure formation being energetically favorable, just unlikely, at low Tassembly, while at high Tassembly all structure growth is unfavorable and a structure will effectively never form. 70 -- experiment 60 -csimulation 50 40 - ml._nmerOT =cc 30 mlnT 20 mlnmer_1OT 1- I 0 I I. c ., I W- =4 -I oW, 04 I o- I o I c' I '-4 r- I i I I -W m14 r-4 I I - r- r- r- I M r- ~ deigs Teprauesar aas~ I . -0 I -I- I- C I m4 I l m3 I V4 I I 0 - D C Y I. r- E alult nhe menoE ausa i totedta.o2aiy h Fig= .. 8: Optimal assembly temperatures from iTAM and from experiment for 25 SST designs. Temperatures are calculated as the mean of a Gaussian fit to the data. For clarity, the 25 designs are placed into 5 groupings of similar designs. General trends in experimental Toptimal values are reflected well in simulated Toptimai values - both increase with increasing domain length and decrease with the presence of linker regions. 1) 36 The iTAM simulations consistently underestimated the experimentally observed T0 ptiwa values. The Toptimal was underestimated compared to experiment for 22/25 designs, with the mean Toptimai +4.21*C higher in experiment. The average Jerror of prediction was 4.60*C. Trends in Toptimai were accurately captured by the iTAM model: R2 = 0.928, so over 90% of variation in experimental Toptimai is explained by variation in simulated Toptim. One possible interpretation of the general underestimate of Toptimi is that physical tile-tile interactions are stronger than modeled; e.g., secondary structural penalties assigned by the model are too high. Another interpretation that may be significant for some designs (particularly the designs grouped under mlnmerOT in Figure 8) is that the structures measured for the experimental definition of yield may differ from those used in the experimental definition of success. In particular, these designs invariably form large structures (-60 tiles) during assembly runs at temperatures where complete assembly is rare or absent. Their corresponding physical structures would not be reliably distinguished from a size-66 structure when run on an agarose gel, so at least some experimentally calculated yields may count near-complete structures as 0i ~0 0 00000. .0 . - . . . . . gur9: iTAM assembly of ml_8mer_lOT structures. - -- -. . . ... -..m .: .- .:(a) Colors indicate patches of tiles that form designed bonds. Assembly at 1 0*C < Toptimai. Numerous errors stably incorporate into the final structure. Stopped at size 80. (b) Assembly at 22*C = Toptimat. Stopped as a success. (c) Assembly at 28*C > Toptimai. Multiple tile layers are missing at top and left. Stopped after 1 hour. 37 temperature (5C) ml designs ml_9merOT mlOT ml_13mer_OT ml16merOT ml_19merOT ml_21merOT ml_8mer_10T ml_9mer_10T ml_10mer_10T ml13mer_10T 10 20 40 30 50 60 70 NOW"ma M U - - varied linker mliT mI_2T ml_4T m1_7T 6 IN ml_10T ml13T ml16T - I I - A simulation M experiment other designs m14_0T m14_10T m14_lowGC_10T m4_OT m4_10T m4-split_10T m3_OT m3_10T Figu 10: Optimal assembly temperatures from iTAM and from experiment for 25 SST designs. Bars are centered at the mean of a Gaussian fit to the data, as in Figure 8, and are 1 FWHM (1.7a) in width. Each bar spans the range where yields are at least half maximal, assuming yields are normally distributed with respect to temperature. successfully assembled. This would be most influential at high temperatures, and like the observed trend would push experimental up compared to simulated Toptimat. High temperatures For Tassembly > Toptimal, low yields are a reflection that high temperatures weaken the tile- tile interactions sufficiently that fully assembled structures have a low equilibrium probability. 38 Assemblies of reasonable size are likely to be present under these conditions, but the likelihood that they sample one of the fully assembled 66-tile states during the course of a finite simulation becomes negligible with rising temperature. For Tassembly greater than, but close to, Toptimai, an assembly reaction for a number of designs results in structures of substantial size (-60 tiles) that fail to reach a fully assembled state over the course of the simulation. These structures otherwise obey the planned design, with all tiles oriented and bonded correctly. The missing tiles in these structures are typically those at the corners of the intended design, which become increasingly squared-off at higher temperatures. This can be attributed in large part to the diamond and parallelogram shapes of the tested designs (see Figure 6); incomplete assembly would be reduced with a rectangular design. Corners in diagonal designs fray because corner layers contain the fewest tiles, and layer removal is fastest (and layer establishment is slowest) for the narrowest layers. Consider the case of a 2-dimensional horizontal layer of n tiles at Tassembly > Toptimai, where tiles bound at 3 edges do not dissociate, where tiles bound at 2 edges frequently exchange with monomer and have probability p of being bound, and where tiles bound at 1 edge rapidly dissociate. This layer grows and shrinks horizontally, but can disappear entirely only by shrinking down to two tiles in width, past which point a single dissociation event will induce the rapid dissociation of the last remaining tile. The rate of layer disappearance is then directly proportional to the occupancy of the width-2 state. Considering tile layer lengths from 2 to n to be in equilibrium, the relative probability of a width-2 layer is given by the partition function:, n- I Prelative(width-2) = n (n - k)pk k= I where n is the design width, in tiles, of the layer. Prelative decreases with increasing n, especially for large p. The rate of (rare) layer addition is proportional to the number of sites at which a new width-2 layer can be added, which is just n-1. Since layer production is fastest with large n and layer removal is slowest with large n, a layer is most likely to be present, if not necessarily intact, when designed to include many tiles. No design tested features a square of tiles larger than 6 x 6, so this can be regarded as the 39 maximum layer width present in all designs. The temperature at which a layer of width 6 becomes disfavored at equilibrium should coincide with an abrupt loss of all structure. In fact, the ml_19merOT and ml_21merOT structures appear to follow this pattern: stable structures of size -60 form for temperatures Tassembly > Toptimai and initially shrink only slightly with temperature. Increasing temperature by a further 2*C leads to a complete melting of structures, indicating that even a maximally "squared-off' structure is no longer stable. 3.5 Nucleation If polymer crystallization is a useful model to describe the isothermal assembly of DNA tiles, might the assembly of tiled DNA structures also be governed by a rate-determining nucleation step? Nucleation acts as a rate-determining step when the initial association of several subunits to form a crystal seed for further growth is slow and energetically disfavored compared to subsequent growth at the crystal edges. Simulation results from the iTAM do not indicate that nucleation of a small initial complex is generally a rate-limiting step in tiled structure growth. This was determined by monitoring the size of structures as they grew during the course of an assembly simulation. Whenever a structure grew larger than it had previously been during the simulation, its size and the elapsed simulation time were recorded. Graphing all structures' maximum sizes versus simulation time illustrates the extent to which a critical seed limits structure growth. If there is a critical nucleus of size n that forms slowly and governs assembly rate, structure growth should be observed to pause below size n for long periods of assembly time. Once a structure reaches size n, it should then rapidly proceed to complete assembly. This was not generally observed for structures undergoing assembly: structures generally aggregated, assembled quickly, or failed to undergo significant assembly over the simulation time course. Structures were not paused below a certain cutoff size under aggregation or optimal assembly conditions. (See Figure 11(a) and Figure 11(b) for typical assembly reactions illustrating this point.) 40 Conditions where nucleation was observed impact yields were observed only in the rare cases for assembly reactions Tassembly > Toptimal where significant structure was formed during the reaction but few structures successfully assembled. One straightforward instance of nucleation was apparent in the ml _19merOT structure at 67.3"C, where a nucleus of size 20 is only slowly assembled (there is only a 50% chance that this nucleus will be assembled in a 3-hour period), but the remaining growth from size 20 to an equilibrium structure of size 60 can occur in 30 minutes or less. The mlOT structure, which assembled poorly even at its Toptimal, demonstrates an even more complicated behavior: nucleus sizes of 10, 20, and 50, are all supported by assembly data. (The equilibrium states are 50 and 60 tiles.) The addition of preformed nuclei would not be expected to effectively seed assembly of most designs under most conditions. Kinetic trapping through destabilization of a crystal seed is predicted by the iTAM model as limited to a narrow range of temperatures above Toptimal for particular designs. In the limited number of cases where seeding could dramatically speed the approach toward equilibrium, the nuclei required are a substantial size of the full design, at 20 tiles in size. 41 aggregation a. b. 80 70 assembly 8Cr1 1 _r_ 30 40 70 F 50 50 40 40 30 "' 30 2 20 20 10 10 4U- U 0 0.5 1.0 1.5 2.0 2.5 3.0 time (min) single nucleus C. 0 3.5 rn. 10 20 50 time (min) d. 7 01 60 60- ~50 50 N040 40- 30 3 0- 20 2 2 0 -- 10 1 0'0 100 200 300 400 500 600 700 800 time (min) 60 multiple nuclei 1 0- 0 10 20 30 40 time (min) 50 50 Figure HI: Maximum structure size attained versus assembly time for several designs and conditions. Pauses in growth at intermediate structures may indicate rate-determining nuclei. Each line depicts an independent assembly reaction performed at a set temperature and tile set. At least 20 reactions are shown in each graph. (a) In aggregation conditions, growth to size 80 is rapid and pausing is not observed. Assembly is of ml_13merOT at 40.1*C. (b) At Toptimai, assembly is rapid and accurate. Nucleation does not impact assembly rates. Assembly is of ml_8merIOT at 22*C. (c) Assembly of ml_I9merOT at 67.3*C is rate-limited by a slow-forming 20-tile nucleus. (d) The assembly of mlOT at 47.8*C is influenced by nuclei of sizes 10 and 20 tiles. Apparent pausing at size 50 may represent a third nucleus or may reflect an equilibrium. 42 3.6 Time and Concentration Time Because tile assembly is a kinetic process, the time for which the assembly reaction is run must influence the results of assembly. Because the lower bound for successful assembly is not significantly affected by time (it is mainly set by the high likelihood of aggregation at low temperatures), altering assembly times affects only the placement of upper bound of the Toptimai window. Longer assembly times are expected to raise yields, especially around this upper bound, and shift the upper assembly bound higher in temperature. This effect is observed in the iTAM, where simulated assembly outcomes are moderately For the ml_8merlOT and ml _13merOT designs, the sensitive to the time provided. temperature of peak assembly is shifted downward by 3*C when assembly time is reduced from 1 hour to 10 minutes. Assembly times significantly longer than one hour (i.e. 12 hours) would be expected to raise the calculated temperature of peak assembly success for most designs by a similar amount. a. 1i., - .-. ,--r---- 0.9 0.8t ---- - 0.7 10.6 06 - - - . Is, 0.4 0.3 0.2 0.1 0 0* 0.6 0.6 -- - 0. 0.4 0.3 0.2 - - -- 0.8 -0.7--- -- 0.9 - -- -- 5 10 15 20 25 30 temperature (2C) time mn 0min =1110 min - - -- -- 30 min - 0.1 0--0 40 assembly ---- -- -6O 45 50 55 min 60 temperature (2C) Figur 12: Yields had simulations been halted at times prior to 60 minutes (>70 runs/point). (a) Yields for the ml_8merlOT design. (b) Yields for the ml 13merOT design. Shorter assembly times reduce yields and lower the temperature at which maximal assembly is achieved during simulation. Caution should be exerted in interpreting physical significance to the precise position of the upper bound. Since a success is defined as the structure entering an assembled state at any point during the simulation and not by the equilibrium probability that a structure is assembled, a simulation of infinite length will assign a yield of 1 to an arbitrarily high temperature. Most 43 designs have reached an equilibrium after 1 hour at or above Toptimai, and increasing simulation yields with increased simulation time demonstrate only that a structure of significant size is present in this equilibrium. Concentration Optimal assembly temperatures are expected to be positively correlated with tile monomer concentration: higher tile concentrations increase all on-rates, while off-rates are unchanged. This makes the formation of kinetically trapped and fully assembled structures more likely at higher temperatures, shifting higher the temperature range in which near-optimal isothermal assembly occurs. Optimal assembly curves simulated using the iTAM are shifted with respect to temperature as expected. Based on the data shown in Figure 13, a 5-fold decrease or increase in tile monomer concentration for such designs can be expected to shift Toptimai downward or upward, respectively, by 5-6*C. a. b 0.8 0.7 0.9- -0.8 0 0.6 o. - 0.4 0.3 0.7 0.6 - - + 0.5 - --------- - - - 0.4 - 0.2 0.3 0.2 -- 5 10 15 20 25 30 35 00-40 nM -. - - 200 nM - 1000nM 0.1 0.1 - - - 35 40 45 50 55 60 temperature ('C) temperature (2C) Figure 13: Simulated assembly yields as a function of tile monomer concentration. (a) Yields for the ml_8mer_10T design. (b) Yields for the ml_13merOT design. [Monomer] is directly associated with Toptimai. Experimental [monomer] was 200 nM. Simulation time was 1 hour/assembly reaction, with >50 runs/point. Increasing or decreasing tile monomer concentration shifts the assembly equilibrium, and so (unlike changing assembly times) can lead to high yields at temperatures far from the Tptimai calculated for the 200 nM case. The kinetics of assembly also differ for assembly reactions at different tile monomer concentrations performed at their respective Toptimai values: high 44 temperature, high tile monomer conditions promote rapid exchange of tiles. Optimal assembly at high temperatures is thus closer to equilibrium (Table 3). 40 169C 0.72 69.9 ± 12.2 40.3 ± 8.1 200 229C 0.98 107.1 ± 37.8 15.4 ± 7.8 1000 289C 0.98 201.9 ± 82.1 6.0 ± 4.3 Table 3: Optimal assembly for the ml_8mer_1OT design approaches equilibrium most rapidly under high-temperature, high-monomer conditions. Each 5 x [monomer], 6*C rise in assembly increases the number of association or dissociation steps before successful assembly by 50% and reduces the average time to assembly by 60%. The iTAM results indicate that modification the tile concentration of an isothermal assembly reaction may be an effective at setting a desired temperature for designs' optimal assembly. When this temperature and tile monomer concentrations are high, the model indicates that kinetic barriers to structure assembly are reduced and assembly reactions are substantially reduced in time. One important caveat to these conclusions is that the iTAM model assumes that tiles add only as monomers. Since high tile concentrations favor the formation of oligomers, if oligomeric additions influence assembly their impact is likely to be greater (and the model less accurate at describing the physical system) under these conditions. 45 3.7 Conclusions Simulations using the iTAM model explain the experimental observation that only a narrow range of temperatures nanostructures. permit optimal isothermal assembly of tile-based DNA This narrow temperature range reflects a balance between the stabilization of non-designed interactions that occurs at low temperatures and the destabilization of the overall designed structure that occurs at high temperature. Simulations based on the iTAM are effective at estimating the temperature of optimal assembly (Toptimi) unique to 25 two-dimensional tile design, with an average error of estimation of 4.6*C. The iTAM can be used to improve the design and assembly properties of tiled DNA nanostructures by tailoring a design's temperature of optimal assembly to the environment of its use. It provides a concrete estimate of the extent to which non-designed internal interactions will impede assembly. Control over optimal assembly temperatures is most directly set by the strength of tile-tile domain interactions. Results from the iTAM suggest that large, blunt designs with edge layers that incorporate many tiles will be most stable at high temperatures. Edge and corner tiles in a finite design make few contacts and so are less stable than interior tiles. The bonds of edge tiles can be strengthened by increasing GC content or domain length to ensure that edges are completely formed under assembly conditions. Nucleation is not in general a rate-limiting step for assembly of the tiled DNA structures studied through simulation. Adding preformed seed nuclei to reaction mixtures or designing structural cores with greater stability is not expected to significantly alter assembly kinetics for most tile designs. Tile concentration and the length of time may be used as convenient axes of control over tile assembly. Kinetic trapping that blocks complete assembly of a tile design is likely to be overcome by increasing the both temperature and tile concentration in the assembly reaction. Such a change also substantially decreases the time required for complete assembly. 46 References [1] Lu, Y, B. Weers, N. Stellwagen. "DNA Persistence Length Revisited," Biopolymers [2] Tinland, B., A. Pluen, J. Sturm, G. Weill. "Persistence Length of Single-Stranded DNA," [3] [4] Macromolecules (1997) 30: 5763-5765. Seeman, N. "Nucleic acid junctions and lattices," J. Theor Biol. (1982) 99: 237-247. Seeman, N., N. Kallenbach. "Design of Immobile Nucleic Acid Junctions," Biophysical Journal(1983) 44: 201-209. [5] Douglas, S., J. Chou, W. Shih. "DNA-nanotube-induced alignment of membrane proteins (2002) 61: 261-275. for NMR structure determination," PNAS (2007) 104: 6644-6648. [6] [7] Berardi, M., W. Shih, S. Harrison, J. Chou. "Mitochondrial uncoupling protein 2 structure determined by NMR molecular fragment searching," Nature (2011) 476: 109-113. Chen, J., N.C. Seeman. "Synthesis from DNA of a molecule with the connectivity of a [8] cube," Nature (1991) 350: 631-633. Shih, W., Quispe, J., Joyce, G. "A 1.7-kilobase single-stranded DNA that folds into a nanoscale octahedron," Nature (2004) 427: 618-62 1. [9] [10] Winfree, E., F. Liu, L. Wenzler, N.C. Seeman. "Design and self-assembly of twodimensional DNA crystals," Nature (1998) 394: 539-544. Rothemund, P., A. Ekani-Nkodo, N. Papadakis, A. Kumar, D. Fygenson, E. Winfree. "Design and Characterization of Programmable DNA Nanotubes," JACS (2004) 126: 16344-16352. [11] [12] [13] [14] [15] [16] Ke, Y, L. Ong, W. Shih, P. Yin. "Three-Dimensional Structures Self-Assembled from DNA Bricks," Science (2012) 338: 1177-1183. Wei, B., M. Dai, P. Yin, "Complex Shapes Self-Assembled from Single-Stranded DNA Tiles," Nature (2012) 485: 623-626. Rothemund, P. "Folding DNA to Create Nanoscale Shapes and Patterns," Nature (2006) 440: 297-302. Yan, H., S. Park, G. Finkelstein, J. Reif, T. LaBean. "DNA-Templated Self-Assembly of Protein Arrays and Highly Conductive Nanowires," Science (2003) 301: 1882-1884. Dutta, P., R. Varghese, J. Nangreave, S. Lin, H. Yan, Y Liu. "DNA-Directed Artificial Light-Harvesting Antenna," JACS(2011) 133: 11985-11993. Langecker, M., V. Arnaut, T. Martin, J. List, S. Renner, M. Mayer, H. Dietz, F. Simmel. "Synthetic Lipid Membrane Channels Formed by Designed DNA Nanostructures," Science (2012) 338: 932-936. [17] Douglas, S., I. Bachelet, G. Church. "A logic-gated nanorobot for targeted transport of [18] molecular payloads," Science (2012) 335: 831-834. Lund, K., A. Manzo, N. Dabby, N. Michelotti, A. Johnson-Buck, J. Nangreave, S. Taylor, [19] R. Pei, M. Stojanovic, N. Walter, E. Winfree, H. Yan. "Molecular Robots Guided by Prescriptive Landscapes," Nature (2010) 465: 206-2 10. Liu, D., S. Balasubramanian. "A Proton-Fueled DNA Nanomachine," Angew. Chem., Int. Ed. (2003) 42: 5734-5736. 47 [20] Zhao, Y., A. Shaw, X. Zeng, E. Benson, A. Nystrnm, B. H6gberg. "DNA Origami Delivery System for Cancer Therapy with Tunable Release Properties," A CS Nano (2012) 6: 8684-8691. [21] Adleman, L. "Molecular Computation of Solutions to Combinatorial Problems," Science [22] (1994) 266: 1021-1024. Winfree, E. "Algorithmic Self-Assembly of DNA," California Institute of Technology, thesis (1998). [23] Mao, C., T. LaBean, J. Reif, N. Seeman. "Logical computation using algorithmic self assembly of DNA triple-crossover molecules," Nature (2000) 407: 493-496. [24] Yan, H., L. Feng, T. LaBean, J. Reif. "Parallel Molecular Computations of Pairwise Exclusive-Or (XOR) Using DNA "String Tile" Self-Assembly," JACS (2003) 125 14246-14247. [25] Barish, R., R. Schulman, P. Rothemund, E. Winfree. "An Information-Bearing Seed for [26] Nucleating Algorithmic Self-Assembly," PNAS (2009) 106: 6054-6059. Rothemund, P., N. Papadakis, E. Winfree. "Algorithmic Self-Assembly of DNA Sierpinski Triangles," PLoS Biology (2004) 2: e424.2 [27] Reif, J. "DNA Computation - Perspectives: Successes and Challenges," Science (2002) 296: 478-479. [28] Qian, L., E. Winfree, J. Bruck. "Neural Network Computation with DNA Strand Displacement Cascades," Nature (2011) 475: 368-372. [29] Chen, Y., N. Dalchau, N. Srinivas, A. Phillips, L. Cardelli, D. Soloveichik, G. Seelig. "Programmable Chemical Controllers Made from DNA," Nature Nanotechnology (2013) [30] 8: 755-762. Pinheiro, A., D. Han, W. Shih, H. Yan. "Challenges and opportunities for structural DNA nanotechnology," Nature Nanotechnology (2011) 6: 763-772. [31] Sobczak, J., T. Martin, T. Gerling, H. Dietz. "Rapid folding of DNA into nanoscale shapes at constant temperature," Science (2012) 338: 1458-1461. [32] Myhrvold C., personal communication. [33] Jungmann, R., T. Liedl, T. Sobey, W. Shih, F. Simmel. "Isothermal Assembly of DNA Origami Structures Using Denaturing Agents," JACS (2008) 130: 10062-10063. [34] Mandelkern, L., Crystallization of Polymers, Volume 2: Kinetics and Mechanisms [35] Cambridge University Press (2004). Myhrvold, C., M. Dai, P. Silver, P. Yin. "Isothermal Self-Assembly of Complex DNA Structures under Diverse and Biocompatible Conditions," Nano Letters (2013) 9: [36] [37] 4242-4248. Bryans, N., E. Chiniforooshan, D. Doty, L. Kari, S. Seki. "The power of nondeterminism in self-assembly," SODA 2011: Proceedings of the 22nd Annual ACM-SIAM Symposium on DiscreteAlgorithms (2011) 590-602. Cook, M., Y. Fu, R. Schweller. "Temperature 1 self-assembly: Deterministic assembly in 3D and probabilistic assembly in 2D," SODA 2011: Proceedingsof the 22nd Annual ACM-SIAM Symposium on Discrete Algorithms (2011) 570-589. 48 [38] Becker, I. Rapaport, E. Rdmila. "Self-assembling classes of shapes with a minimum number of tiles, and in optimal time," FoundationsofSoftware Technology and TheoreticalComputer Science (2006) 45-56. [39] [40] [41] [42] Cheng, Q., G. Aggarwal, M. Goldwasser, M. Kao, R. Schweller, P. de Espands. "Complexities for generalized models of self-assembly," SIAM Journalon Computing (2005) 34: 1493-1515. Jonoska, N., G McColm. "Complexity classes for self-assembling flexible tiles," Theoretical Computer Science (2009) 410: 332-346. Demaine, E., M. Demaine, S. Fekete, M. Ishaque, E. Rafalin, R. Schweller, D. Souvaine. "Staged self-assembly: nanomanufacture of arbitrary shapes with 0(1) glues," Natural Computing (2008) 7: 347-370. Winfree, E., R. Bekbolatov. "Proofreading Tile Sets: Error Correction for Algorithmic Self-Assembly," Lecture Notes in ComputerScience, (2003) 2943: 126-144. [43] [44] [45] [46] Fujibayashi, K., D. Zhang, E. Winfree, S. Murata. "Error suppression mechanisms for DNA tile self-assembly and their simulation," NaturalComputing (2009) 8: 589-612. Schulman, R., E. Winfree. "Programmable control of nucleation for algorithmic selfassembly," SIAM J.Comput. (2009), 39: 1581-1616. Winfree, E., R. Schulman, C. Evans. The xgrow simulator. (2003) http://www.dna.caltech.edu/Xgrow/ Fujibayashi, K., S. Murata. "Precise Simulation Model for DNA Tile Assembly." IEEE Transactionson Nanotechnology (2009) 8: 361-368. [47] SantaLucia J., D. Hicks. "The Thermodynamics of DNA Structural Motifs," Ann Rev Biophys Biomol Str. (2004) 33: 415-440. [48] [49] Owczarzy, R., B. Moreira, Y. You, M. Behlke, J. Walder. "Predicting Stability of DNA Duplexes in Solutions Containing Magnesium and Monovalent Cations," Biochemistry (2008) 47: 5336-5353. Markham, N., M. Zuker. "UNAFold: Software for Nucleic Acid Folding and Hybridization," Methods in Molecular Biology 453: Bioinformatics, Volume II. Structure, Function andApplications (2008) 3-31. 49 Supplemental Program files are available at: http://sourceforge.net/projects/itam-tile-simulator/ Python 2.7, NumPy, SciPy, and UNAFold (http://mfold.ma.albany.edu/) are required. Assembly curves as a function of temperature for all 25 designs tested are reproduced below. Blue lines are iTAM simulation (error bars indicate 95% binomial confidence interval), while green lines are experimental observations. mi-9merOT mlOT 1.0 i a. 0.8- 0.8 o 0.6- 0.6 .1 0.4- 0.4 0.2 0.2 [ 30 35 temp. *C 40 45 5C 40 vS7 ml-13merOT 45 temp *C 1 .1I 50 -Y. 5 ml_16merOT 1.0. 1n 0.8 0.8 0.6 0.6 0.4 0.4 0.2- 0.2 T temp, *C 55 60 40 45 50 55 temp. IC 50 60 65 ml 21mer ml19merOT 1.0 . . , . 0.8- 0.8 0.6 0.6 1- .4 0.4 0.4 0.2- 0.2 45 40 65 60 55 temp. IC 50 7 70 45 40 0.8- 0.8 0.6 0.6 0.4 0.4 0.2 0.2 10 15 20 temp. C 25 15 0.0 3! 30 20 0.8 0.8 0.6 0.6 0.4 0.4 0.2 0.2 I . I 20 25 . m C 75 25 temp, IC 30 35 4( ml-13merlOT ml-10mer lOT 15 70 ml_9merlOT l.c L 65 60 55 temp. *C 50 ml Smer_10T 1.0 0.0 5 OT 1.c 35 40 I n 45 i 20 51 2 25 30 35temp 45 50 55 ml-2T 1.0 1.0 0.8- 0.8 0.6- 0.6 0.4- 0.4 0.2 0.2 I 5 30 35 40 temp. C 45 50 u. 55 25 30 30 35 ml-4T T0 4015 40 temp, *C 15 45 5 50 5 ml7T 1Cr 0.8 0.8 0.6 0.6 0.4- 0.4 0.2 0.2 . 0.T S 30 35 40 temp. IC 45 50 A 55 . 30 35 m110T 40 temp, *C 5 45 50 5 45 50 55 ml_13T 1.0 0.8- 0.8 0.6 0.6 0.4- 0.4 0.2 -1 0.2 0 2 25 30 35 temp, TC 40 45 ~* 2 50 52 30 35 40 temp, *C ml16T m14_OT .1 1.0 0.8 0.8 0.6 0.6 0.4 0.4 0.2 0.2 20 30 25 35 40 temp, IC 50 45 55 61 35 'A3 40 m14_lOT 1.0 0.8- 0.8 0.6- 0.6- 0.4- 0.4 0.2 0.2 30 25 35 55 60 6! m14_lowGClOT 1.0 '93 45 50 temp, IC 45 40 temp, *C 50 5 3 10 15 05 1 55 5 20 20 m4.1OT 25 25 temp, IC 30 30 T0 3 35' 4 4 4. m4.1_10T 1'T - 0.8 0.8 0.6 0.6 32 0.4 0.4 0.2 0.2 I I .'0 25 30 35 40 45 temp, IC 50 55 60 65 -m 30 30 40 35 35 40 temp, *C 53 1 .1 45 50 m3.1_OT m4.1_split_10T 1.0 1.. 0.8 0.8- 0.6 0.6- 0.4 0.4- 0.2 0.2 I 10 15 20 25 temp, IC 30 35 40 0.T 4! m3.1_10T Thy 0.8 0.6 0.4 0.2 I 25 30 35 e20 40 temp, TC T .I 45 50 55 54 35 40 45 temp, *C 50 55 ) 6