Avogadro Scale Engineering: 'COMPLEXITY'
Physics of Information Technology, MIT, Spring 2006, Part II
jacobson@media.mit.edu

Homework
I] Nanotech Design: Find an error function for which it is optimal to divide a logic area A into more than one redundant sub-area.
II] Design Life:
(a) Design a biological system that self-replicates with error correction (either genome copy redundancy with majority voting, or error-correcting coding). Assume that copying each nucleotide consumes one unit of energy. Show the tradeoff between energy consumption and copy fidelity.
(b) Comment on the choice biology has made: 64 three-nucleotide codons coding for 20 amino acids. Why has biology chosen this encoding? What metric does it optimize? Could one build a biological system with 256 four-nucleotide codons?
Questions: jacobson@media.mit.edu

Scaling Properties of Redundant Logic (to first order)
[Figure: probability of correct functionality P versus area A]
Probability of correct functionality: $p[A] \approx \epsilon A$ (small A).
Single area A: $P_1 = p[A] = \epsilon A$.
Two redundant sub-areas of A/2 each (total area $2 \cdot A/2 = A$), of which at least one must work:
$P_2 = 2p[A/2](1 - p[A/2]) + p[A/2]^2 = \epsilon A - (\epsilon A)^2/4$
Conclusion: $P_1 > P_2$. Under this linear model, a single undivided area is more reliable than two redundant halves.

Designing Life
• Fault-tolerant redundancy: error correcting, fault tolerant
• Other coding (e.g. parity): error correcting

Designing Life I] Fault-Tolerant Redundancy
[Diagram: a genome (Gene1 Gene2 Gene3) and its redundant copy (Gene1 Gene2 Gene3)]
1. Replicate linearly with proofreading and error correction; fold to 3D functionality.
Error rate: 1:10^6; 100 steps per second.
• Template-dependent 5'-3' primer extension
• 3'-5' proofreading exonuclease
• 5'-3' error-correcting exonuclease
• MutS repair system
Beese et al. (1993), Science, 260, 352-355. http://www.biochem.ucl.ac.uk/bsm/xtal/teach/repl/klenow.html

Approach 1b] Redundant Genomes
Deinococcus radiodurans (3.2 Mb, 4-10 copies of the genome) [Nature Biotechnology 18, 85-90 (January 2000)]
Radiation tolerance, D. radiodurans vs. E.
coli (source: Uniformed Services University of the Health Sciences; http://www.ornl.gov/hgmis/publicat/microbial/image3.html):
• D. radiodurans: 1.7 million rads (17 kGy), 200 double-strand (DS) breaks
• E. coli: 25 thousand rads, 2 or 3 DS breaks

Combining Error-Correcting Polymerase and Error-Correcting Codes, One Can Replicate a Genome of Arbitrary Complexity N
Basic idea: make M strands of N bases each and take a consensus vote at every base.
Result: by carrying out a consensus vote, one requires only $M \propto \ln N$ strands to replicate with error below some $\epsilon$, so that the global replication error satisfies $P_E < \epsilon$.
[Figure: required number of strands M versus genome length N, at fixed P_E]

II] Coding
[Diagram: mRNA, ribosome, amino acid]
4-Base Parity Genetic Code
Let A = 0, U/T = 1, G = 2, C = 3. Use a 3+1 base code: codon XYZ followed by the parity base Sum(X+Y+Z, mod 4).
Example, Leu: UUA -> UUAG (1 + 1 + 0 = 2 = G).
http://schultz.scripps.edu/Research/UnnaturalAAIncorporation/research.html

Error Correction in Biological Systems
Fault-tolerant translation codes (Hecht):
• NTN encodes 5 different nonpolar residues (Met, Leu, Ile, Val and Phe)
• NAN encodes 6 different polar residues (Lys, His, Glu, Gln, Asp and Asn). Strictly, this requires excluding T/U from the first position (VAN, with V = A, C or G), because TAN codes for Tyr and two stop codons.
Local error correction:
• Ribozyme: 1:10^3
• Error-correcting polymerase: 1:10^8 fidelity
• DNA repair systems: MutS system; recombination (retrieval, post-replication repair); thymine dimer bypass; many others
E. coli retrieval system (Lewin)
Biology employs error-correcting fabrication + error-correcting codes.

4/10
1] Von Neumann / McCulloch / Winograd / Cowan Threshold Theorem and Fault-Tolerant Chips
2] Simple Proofs in CMOS Scaling and Fault Tolerance
3] Fault-Tolerant Self-Replicating Systems
4] Fault-Tolerant Codes in Biology
4/24
1] Introduction of the concept of fabricational complexity
2] Examples, numbers and mechanisms from native biology: error-correcting polymerase, and comparison to the best current chemical synthesis using protection-group (~feedforward) chemistry
3] Examples from our error-correcting de novo DNA synthesis (with, hopefully, a demo from our DNA synthesis simulator)
4] Error-correcting chip synthesis
5] Saul's self-replicating system with and without error correction

Fabricational Complexity
• Total complexity
• Complexity per unit volume
• Complexity per unit time × energy
• Complexity per unit cost
$F_{fab} = \ln(W) / [a^3 t_{fab} E_{fab}]$
$F_{fab} = \ln(M)-1 / [a^3 t_{fab} E_{fab}]$

Fabricational Complexity
Total complexity accessible to a fabrication process with per-step success probability p (error 1 - p per step) and m types of parts, after n steps:
$F_{FAB} = p^n \ln m^n$
[Figure: F_FAB versus n; a strand grown base by base (n = 1, 2, 3, ...) is correct with probability p, p^2, p^3, ...]
Complexity per unit cost, for a given complexity n*:
$f_{FAB} = p^{n^*} \ln m / C$, where C is the cost per step.

Fabricational Complexity
Non-error-correcting: $f_{FAB} = p^{n^*} \ln m / C$
Triply error-correcting (three copies, majority vote at each step): $f_{FAB3} = [3p^2(1-p) + p^3]^{n^*} \ln m / (3C)$
[Figure: ratio f_FAB3 / f_FAB versus n (p = 0.9 and p = 0.85) and versus p (n = 300)]

Resources for Exponential Scaling
Resources that increase the complexity of a system exponentially with a linear addition of resources:
1] Quantum phase space
2] Error-correcting fabrication
3] Fault-tolerant hardware architectures
4] Fault-tolerant software or codes

Fabricational Complexity (SOA = state of the art)
Columns: (1) Genome (natural); (2) Gene chip (parallel chemical synthesis); (3) Semiconductor chip; (4) High-speed offset web; (5) TFT; (6) DVD-6; (7) Liquid embossing

Quantity                              (1)      (2)      (3)      (4)      (5)      (6)      (7)
Design rule, smallest dimension (µm)  0.0003   0.0003   0.1      10       2        0.25     0.2
Number of types of elements           4        4        8        6        8        2        4
Area of SOA artifact (µm²)            NA       7E+08    7E+10    2E+12    1E+12    1E+10    8E+09
Volume of SOA artifact (µm³)          6E+01    5E+06    7E+09    2E+12    1E+11    7E+12    8E+08
Number of elements in SOA artifact    3E+09    7E+04    7E+12    2E+10    3E+11    2E+11    2E+11
Volume per element (µm³)              2E-08    8E+01    1E-03    1E+02    4E-01    4E+01    4E-03
Fabrication time (s)                  4E+03    2E+04    9E+04    1E-01    7E+02    3        6E+01
Time per element (s)                  1E-06    3E+02    1E-08    7E-12    2E-09    2E-11    3E-10
Fabrication cost of SOA artifact ($)  1E-07    1E+02    1E+02    1E-01    2E+03    3E-02    2E-01
Cost per element ($)                  3E-17    2E-03    2E-11    6E-12    6E-09    2E-13    1E-12
Complexity                            4E+09    9E+04    2E+13    4E+10    6E+11    1E+11    3E+11
Complexity per unit volume (per µm³)  7E+07    2E-02    2E+03    2E-02    5E+00    2E-02    3E+02
Complexity per unit time (per s)      1E+06    6E+00    2E+08    3E+11    9E+08    4E+10    5E+09
Complexity per unit cost (per $)      4E+16    9E+02    1E+11    3E+11    3E+08    4E+12    1E+12
Cost per area ($/µm²)                 NA       2E-07    2E-09    6E-14    2E-09    3E-12    3E-11

Can we use this map as a guide to future directions in fabrication?

1. Replicate linearly with proofreading and error correction; fold to 3D functionality.
Error rate: 1:10^6; 100 steps per second; template-dependent 5'-3' primer extension; 3'-5' proofreading exonuclease.
Beese et al. (1993), Science, 260, 352-355.
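The fabricational-complexity formulas above can be evaluated numerically. The sketch below rests on a few assumptions. It takes p to be the per-step success probability and reads the total complexity as $F_{FAB} = p^n \ln m^n = n p^n \ln m$. The function names, the unit cost per step, and the sample values of p and n are hypothetical choices made for illustration; none of them comes from the slides. The helper table_complexity encodes an inference drawn from the table rather than anything the slides state: the Complexity row appears to equal (number of elements) × ln(number of types), for example 3E+09 × ln 4 ≈ 4E+09 for the genome.

```python
import math


def total_complexity(p: float, m: int, n: int) -> float:
    """F_FAB = p**n * ln(m**n) = n * p**n * ln(m).

    p: probability that one fabrication step succeeds (assumed meaning)
    m: number of part types available at each step
    n: number of steps (parts placed in sequence)
    """
    return n * p**n * math.log(m)


def best_length(p: float) -> float:
    """Continuous maximizer of n * p**n, i.e. n* = -1 / ln(p)."""
    return -1.0 / math.log(p)


def f_fab(p: float, m: int, n: int, cost_per_step: float) -> float:
    """Complexity per unit cost with no error correction: p**n * ln(m) / C."""
    return p**n * math.log(m) / cost_per_step


def f_fab3(p: float, m: int, n: int, cost_per_step: float) -> float:
    """Triple redundancy with a majority vote at every step.

    A step survives if at least 2 of its 3 copies are correct, which happens
    with probability 3p^2(1-p) + p^3; each step now costs 3C.
    """
    p_vote = 3 * p**2 * (1 - p) + p**3
    return p_vote**n * math.log(m) / (3 * cost_per_step)


def table_complexity(n_elements: float, m: int) -> float:
    """Hypothetical helper: complexity n * ln(m) of an artifact with n correct elements."""
    return n_elements * math.log(m)


if __name__ == "__main__":
    # Genome column: 3E+09 elements of 4 types; the table lists 4E+09.
    print(f"genome complexity ~ {table_complexity(3e9, 4):.1e}")
    # Illustrative values only: p = 0.9, m = 4, unit cost per step.
    for n in (1, 10, 50, 100):
        ratio = f_fab3(0.9, 4, n, 1.0) / f_fab(0.9, 4, n, 1.0)
        print(f"n = {n:>3}  f_FAB3 / f_FAB = {ratio:.3g}")
```

The per-step vote probability 3p^2(1-p) + p^3 exceeds p whenever 1/2 < p < 1. As a result, the ratio f_FAB3 / f_FAB grows geometrically with n and eventually outweighs the factor-of-3 cost once n is large enough.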
DNA Synthesis
Caruthers synthesis: error rate 1:10^2; 300 seconds per step
http://www.med.upenn.edu/naf/services/catalog99.pdf

Avogadro Scale Engineering, Molecular Machine (Jacobson) Group, MIT, May 2005
Gene-Level Error Removal (Nucleic Acids Research 2004 32(20):e162)
Error rate: 1:10^4
In vitro error correction yields a >10x reduction in errors
Error reduction: GFP gene synthesis

Autonomous Self-Replicating Machines from Random Building Blocks

HOMEWORK (DUE 5/1/06)
1] Consider biological cells that are able to copy their genome using appropriate pieces of molecular machinery (e.g. polymerase). Assume that the total probability of correctly copying each nucleotide is p = 0.999. Calculate the total fabricational complexity accessible to this system, assuming that there are 4 types of nucleotides (i.e. A, G, C, T). Now assume that we have created a new type of cell whose genome has six different types of nucleotides (i.e. A, G, C, T, X, Y). If we wish to keep the total fabricational complexity the same, what must the probability per nucleotide addition, p, now be?
2] Now consider the fabricational complexity per unit cost, f. Calculate the threshold probability p for which it is advantageous to use a redundant error-correction scheme (such as triple redundancy) with majority voting rather than no error correction. Into which regime does biology fall?
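The consensus-vote result quoted earlier (M ∝ ln N strands suffice for a genome of length N) and the majority voting in homework problem 2 both rest on a per-base vote over redundant copies. The result can be checked numerically. The sketch below makes several stated assumptions:

- Each of the M copies is wrong at a given base independently, with probability q.
- The vote at each base is a simple majority over an odd number of copies.
- A base counts as lost whenever more than half of the copies are wrong at it.

The per-copy error q = 0.1, the target ε = 0.01, and the function names are illustrative choices, not figures from the slides.

```python
import math


def majority_fail(q: float, m: int) -> float:
    """Probability that more than half of m independent copies are wrong at one base.

    Pessimistic simplification (an assumption): any base with more than m // 2
    erroneous copies is counted as a lost vote, even if the wrong copies
    disagree with one another.
    """
    return sum(math.comb(m, k) * q**k * (1 - q) ** (m - k) for k in range(m // 2 + 1, m + 1))


def global_error(q: float, m: int, n: int) -> float:
    """P_E: probability that the consensus is wrong at one or more of n bases."""
    return 1.0 - (1.0 - majority_fail(q, m)) ** n


def strands_needed(q: float, n: int, eps: float) -> int:
    """Smallest odd M (odd avoids ties) that keeps P_E below eps."""
    m = 1
    while global_error(q, m, n) >= eps:
        m += 2
    return m


if __name__ == "__main__":
    q, eps = 0.1, 0.01  # illustrative per-copy error rate and target P_E
    for n in (10**2, 10**3, 10**4, 10**5, 10**6):
        m = strands_needed(q, n, eps)
        print(f"N = {n:>8}  ln N = {math.log(n):5.2f}  M = {m:>3}")
```

In this model the required M rises by a roughly constant amount each time N grows tenfold, i.e. roughly linearly in ln N. For large N it behaves like ln(N/ε) divided by a constant that depends on q.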