Appendix: Formula for the expected amino acid composition $\mathbf{p} = (p_{\mathrm{Ala}}, p_{\mathrm{Arg}}, \ldots, p_{\mathrm{term}})$ considering single-nucleotide deletion.

We consider a synthesized DNA that has a 5'-constant region $n$ bases long, a random-sequence region $m$ bases long, and a 3'-constant region. We assume that the probability of a single-base deletion at any site $i$ is $a$ (independent of $i$). Because $a$ is in practice much less than unity, we neglect multiple deletions in the same strand.

The amino acid composition vectors $\mathbf{c}_f$, $\mathbf{d}_1$, $\mathbf{d}_2$, $\mathbf{d}_3$, $\mathbf{e}_1$, $\mathbf{e}_2$, and $\mathbf{e}_3$ defined below can be calculated from the mixing ratio $\mathbf{x} = (x_1, x_2, x_3, \ldots, x_{12L})$ defined in Materials & Methods.

$\mathbf{c}_f$: expected amino acid composition when no deletion occurs. An example of an element is given by Eq. (1) in the text.
$\mathbf{d}_1$: expected amino acid composition in the frame-shifted region.
$\mathbf{d}_2$: expected amino acid composition for codons with a deletion at the 2nd letter.
$\mathbf{d}_3$: same as above, except at the 3rd letter.
$\mathbf{e}_1$: expected amino acid composition for the last codon in the random region with a deletion at the 1st letter.
$\mathbf{e}_2$: same as above, except at the 2nd letter.
$\mathbf{e}_3$: same as above, except at the 3rd letter.

We introduce the following parameters:

\[
Z = \left[\frac{i}{3}\right], \qquad Y = \left[\frac{i-n-1}{3}\right], \qquad R = \left[\frac{m}{3}\right],
\]

where $[s]$ denotes Gauss's symbol, that is, the greatest integer not exceeding $s$. Finally, we obtain the formula for the expected amino acid composition $\mathbf{p}$ as a function of $\mathbf{x}$, $a$, $n$, and $m$:

\[
Q(\mathbf{x}, a, n, m) = \sum_{i=n+1}^{n+m-3} \frac{a(1-a)^{i-1}}{R-1} \Bigl\{ Y\,\mathbf{c}_f + (R-Y-2)\,\mathbf{d}_1 - (i-3Z-2)(i-3Z)\,\mathbf{d}_1 + \tfrac{1}{2}(i-3Z-1)(i-3Z)\,\mathbf{d}_2 + \tfrac{1}{2}(i-3Z-1)(i-3Z-2)\,\mathbf{d}_3 \Bigr\}
\]

\[
\mathbf{p}(\mathbf{x}, a, n, m) = (1-a)^{n+m-1}\,\mathbf{c}_f + \bigl\{1-(1-a)^{n-1}\bigr\}\,\mathbf{d}_1 + Q(\mathbf{x}, a, n, m) + a(1-a)^{n+m-4}\,\mathbf{e}_1 + a(1-a)^{n+m-3}\,\mathbf{e}_2 + a(1-a)^{n+m-2}\,\mathbf{e}_3 \tag{A1}
\]

As expected from the definition, when $a$ is around 0.005 and $m$ is around 50, we can neglect the terms concerning $\mathbf{d}_2$, $\mathbf{d}_3$, $\mathbf{e}_1$, $\mathbf{e}_2$, and $\mathbf{e}_3$. We incorporated the effect of single-nucleotide deletion into the GA calculation by using Eq. (A1) instead of Eq. (1) in the text.
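The index bookkeeping inside Q can be illustrated with a short Python sketch. It is not the original GA code, and it rests on several assumptions: Gauss's symbol is taken as floor division; n is a multiple of 3, so that codons begin at sites 1, 4, 7, ...; and the operators in Q are read so that the three products in i - 3Z act as selectors equal to 1 when the deleted base is the 1st, 2nd, or 3rd letter of its codon and 0 otherwise. The function names `gauss`, `codon_selectors`, and `q_weights` and the values n = 18 and m = 51 are hypothetical. The sketch accumulates only the scalar weights that Q places on c_f, d1, d2, and d3, not the composition vectors themselves.

```python
def gauss(num, den):
    """Gauss's symbol [num/den]: greatest integer not exceeding num/den."""
    return num // den


def codon_selectors(i):
    """Selector values for (d1, d2, d3) when the base at site i is deleted."""
    r = i - 3 * gauss(i, 3)        # r = i - 3Z, one of 0, 1, 2
    w1 = -(r - 2) * r              # 1 only if r == 1 (1st codon letter)
    w2 = (r - 1) * r // 2          # 1 only if r == 2 (2nd codon letter)
    w3 = (r - 1) * (r - 2) // 2    # 1 only if r == 0 (3rd codon letter)
    return w1, w2, w3


def q_weights(a, n, m):
    """Total weight that Q assigns to c_f, d1, d2 and d3 (assumed reading)."""
    R = gauss(m, 3)                # number of codons in the random region
    totals = {"cf": 0.0, "d1": 0.0, "d2": 0.0, "d3": 0.0}
    for i in range(n + 1, n + m - 2):          # sites i = n+1 .. n+m-3
        Y = gauss(i - n - 1, 3)                # intact codons before site i
        w1, w2, w3 = codon_selectors(i)
        prob = a * (1 - a) ** (i - 1) / (R - 1)
        totals["cf"] += prob * Y
        totals["d1"] += prob * (R - Y - 2 + w1)
        totals["d2"] += prob * w2
        totals["d3"] += prob * w3
    return totals


# Hypothetical example values close to those quoted in the text.
weights = q_weights(a=0.005, n=18, m=51)
for name, value in weights.items():
    print(f"{name}: {value:.5f}")
```

Under this reading, the coefficients inside the braces sum to R - 1 at every site, so each term of Q is a normalized mixture of c_f, d1, d2, and d3, and the weights on d2 and d3 can be compared directly with those on c_f and d1.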