Supplementary Materials

GENESUS: A two-step sequence design program for DNA nanostructure self-assembly

Takanobu Tsutsumi1, Takeshi Asakawa1, Akemi Kanegami2, Takao Okada2, Tomoko Tahira1, and Kenshi Hayashi1

1 Division of Genome Analysis, Research Center for Genetic Information, Medical Institute of Bioregulation, Kyushu University.
2 Research Institute of Biomolecule Metrology Co., Ltd.

CONTENTS
GUSP algorithm
Seed pair tables
Maximum number of USPs obtainable by GUSP
Longest unique sequence pairs obtainable by modified GUSP
CESS algorithm
Design file
DPAL_AB
Output files
Designing a regular octahedron
Hinge length
Assembly of T5-hinged octahedron
Efficiency of assembly
Design and assembly of octahedron multimers
Strand sequences
References

GUSP algorithm

In GUSP, a specified number of unique segment pairs (USPs) of specified lengths is generated by extending non-redundant (r + 1)-mer seed pair tiles with r-mer overlaps, and the USPs are evaluated by melting temperature, as shown in Figure S1.

Figure S1. Flowchart of GUSP
“r-mer overlap” means that the last r-mer of the most recently picked seed pair matches the first r-mer of the next seed pair. The most recently picked seed pair is identified by referring to the Picked Seed Pair List (PSPL). When the PSPL is empty, any r-mer is accepted as the overlap. SPT: seed pair table.

Seed pair tables

The seed pair table file is generated by a program, Seed Maker, which first lists the integers 0 to $4^{r+1} - 1$ as quaternary numbers of (r + 1) digits. The digits are then converted into nucleotides: 0, 1, 2, and 3 become A, C, G, and T, respectively. This yields an exhaustive, non-redundant list of (r + 1)-mers, called seeds, in alphabetical order. After removal of palindromic seeds (for odd-numbered r) and seeds containing simple repeats (homo- or co-polymeric tetranucleotides), the table is arranged so that each seed and its complement are placed at the same relative positions in the top and bottom halves of the table.
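The enumeration and filtering steps described for Seed Maker can be sketched as follows. This is a minimal illustration, not the Seed Maker source code: the function names are hypothetical, "palindromic" is taken to mean a seed equal to its own reverse complement, "complement" is taken to mean the reverse complement, and a "simple repeat" is assumed to be any four-base window of the form XXXX or XYXY.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def int_to_seed(n, length):
    """Write n as a base-4 number of `length` digits, mapping 0,1,2,3 to A,C,G,T."""
    bases = []
    for _ in range(length):
        n, digit = divmod(n, 4)
        bases.append("ACGT"[digit])
    return "".join(reversed(bases))


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_simple_repeat(seq):
    """Assumed filter: True if any 4-base window has the form XXXX or XYXY."""
    return any(
        seq[i] == seq[i + 2] and seq[i + 1] == seq[i + 3]
        for i in range(len(seq) - 3)
    )


def make_seed_pair_table(r):
    """Return (top, bottom): top[i] and bottom[i] are reverse complements."""
    length = r + 1
    # Counting 0 .. 4^(r+1) - 1 in base 4 gives the seeds in alphabetical order.
    seeds = [int_to_seed(n, length) for n in range(4 ** length)]
    kept = [
        s for s in seeds
        if s != reverse_complement(s) and not has_simple_repeat(s)
    ]
    # Keep one member of each complementary pair in the top half and place
    # its partner at the same relative position in the bottom half.
    top = [s for s in kept if s < reverse_complement(s)]
    bottom = [reverse_complement(s) for s in top]
    return top, bottom


if __name__ == "__main__":
    top, bottom = make_seed_pair_table(4)
    print(len(top), top[0], bottom[0])
```

Under these assumptions both filters are symmetric under reverse complementation, so every retained seed has its partner retained as well and the two halves have equal length.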
The resulting arrangement is a seed pair table. Each seed carries a usage status value: 0 for unused and 1 for used. Seed pair tables for r = 4 to 7 and the source code of Seed Maker are available at the GENESUS site (http://crane.gen.kyushu-u.ac.jp/genesus/).

Maximum number of USPs obtainable by GUSP

The sizes of USP sets were determined by 1,000 GUSP trials in which the program was run without Tm range restriction until no more USPs could be collected, for l from 16 to 25 and r from 4 to 7. The mean number of generated USPs reached approximately 60 to 70% of the upper limit (Rate 0.58 to 0.70), with a narrow distribution (Table S1). Without the retry step at extension (see Figure S1), i.e., with the sudden death method, the number of obtainable USPs fell to less than half (data not shown).

Table S1. Size of USP sets obtainable by GUSP

r      l           16      17      18      19      20      21      22      23      24      25
r = 4  Limit       42      39      36      34      32      30      28      26      25      24
       Mean      29.6    27.3    25.3    23.6    22.1    20.8    19.6    18.5    17.6    16.7
       cv       0.028   0.028   0.029   0.029   0.030   0.029   0.031   0.032   0.032   0.033
       Max         32      30      28      25      24      23      21      20      19      18
       Rate      0.70    0.70    0.70    0.69    0.69    0.69    0.70    0.71    0.70    0.70
r = 5  Limit      186     170     157     146     136     128     120     113     107     102
       Mean     121.8   111.7   103.1    95.7    89.5    83.9    78.9    74.6    70.8    67.3
       cv       0.014   0.015   0.015   0.016   0.015   0.015   0.016   0.016   0.016   0.016
       Max        127     116     107     100      93      87      83      78      74      70
       Rate      0.66    0.66    0.66    0.66    0.66    0.66    0.66    0.66    0.66    0.66
r = 6  Limit      819     744     682     630     585     546     512     481     455     431
       Mean     512.4   465.4   426.1   393.1   364.9   340.6   319.1   300.3   283.4   268.3
       cv       0.009   0.009   0.010   0.011   0.011   0.011   0.012   0.012   0.013   0.014
       Max        526     479     437     403     375     351     328     310     292     276
       Rate      0.63    0.63    0.63    0.62    0.62    0.62    0.62    0.62    0.62    0.62
r = 7  Limit     3640    3276    2978    2730    2520    2340    2184    2048    1927    1820
       Mean    2138.4  1920.0  1742.5  1595.0  1470.1  1363.7  1270.9  1189.0  1117.9  1054.6
       cv       0.006   0.007   0.007   0.008   0.008   0.010   0.010   0.011   0.012   0.012
       Max       2171    1957    1778    1627    1500    1393    1301    1218    1149    1082
       Rate      0.59    0.59    0.59    0.58    0.58    0.58    0.58    0.58    0.58    0.58

1,000 USP sets were generated by GUSP (l from 16 to 25, r from 4 to 7) without Tm range restriction. Limit is the USP count if all seed pairs are used, calculated as $4^{r+1} / [2(l - r)]$. Max is the largest number of USPs obtained. Rate is Mean divided by Limit.

Longest unique sequence pairs obtainable by modified GUSP

In scaffold/staple designs of DNA nanostructures that do not insert special sequences such as hinges at the junctions, all that is needed is a long stretch of unique sequence free of redundancy and symmetry. We modified GUSP so that, when extension stops because of seed pair exhaustion, it is retried once by reversing the extension and resuming with the alternative choice. After 1,000 trials for each of r = 5, 6 and 7, unique sequence pairs free of redundancy and symmetry were obtained with average lengths of 1,310 bp, 4,333 bp and 14,700 bp, respectively, when GC content was restricted to 0 to 100% in a sliding window of 10 bases, and 1,027 bp, 3,464 bp and 11,060 bp, respectively, when GC content was restricted to 40 to 60% in the same window, again with narrow size distributions (Figure S2, Table S2). The longest unique sequence at r = 7 was 17,999 bp (see “Strand sequences”), and its most stable local base pairing is -13.1 kcal/mol, as estimated using Mfold (1). Its longest redundancy is 7 bases, much shorter than the 42 bases of M13mp18.

Figure S2. Length distribution of single USP
The longest USPs obtained in sudden death mode (SD mode) with GC content restricted to 0 to 100% (green) or 40 to 60% (red) in sliding windows of 10 bases, and in alternative choice mode (AC mode) with GC content restricted to 0 to 100% (yellow) or 40 to 60% (blue) in sliding windows of 10 bases, are shown for r = 5 to 7 (a to c, respectively). A summary of these trials is presented in Table S2.

Table S2.
Size of unique sequence pairs obtainable by GUSP

r                        5        5        6        6        7        7
GC content (%)       0-100    40-60    0-100    40-60    0-100    40-60
Limit                 2053     2053     8198     8198    32775    32775
Max length (bp)       1510     1206     5177     4112    17999    13402
Mean (bp)           1310.2   1026.6   4333.2   3463.7  14698.9  11059.8
cv                   0.064    0.078    0.092    0.090    0.094    0.106
Rate (%)              63.8     50.0     52.9     42.3     44.8     33.7

Limit is the USP length if all seed pairs are used. Max length is the maximum length of USP obtained by GUSP in 1,000 trials of segment pair extension without limitation by l. r is the maximum length allowed for redundancy. Segment pair finding is retried once when the extension is stopped by segment pair exhaustion. Rate is Mean divided by Limit.

CESS algorithm

In CESS, USPs are allocated to helices, and a set of candidate strand sequences is produced by linking the USPs (generated by GUSP) according to the strand definition (see “Design file”, Figure S4). Each strand set is evaluated by its worst aberrant pairing using a dynamic programming algorithm, DPAL_AB, which calculates the lowest free energy of aberrant pairing (ΔGab,min) over all combinations of strands within the set (Figure S3). CESS then picks the strand set whose ΔGab,min is the highest among the examined strand sets.

Figure S3. Flowchart of CESS
ΔGab,min is the free energy of the most stable aberrant pairing. DPAL_AB is a dynamic programming algorithm modified to find ΔGab,min (see “DPAL_AB”). ΔGab,min,best is the ΔGab,min of the best strand set.

Design file

Helices are numbered, and the strand segments that constitute helices are named by integers or primed integers indicating the helix to which they belong. The design file consists of two parts: the helix definition and the strand definition. The helix definition specifies the properties of each numbered helix group (HG). These properties are the helix length (HL), an integer; the lowest and highest Tm (TL and TH), rational numbers; and the group members (GM), given as integer helix numbers.
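As an illustration of the helix definition format, the following minimal sketch parses a line written like the octahedron example used in this section (HG1=HL21/TL65/TH72/GM1-12). It is not part of GENESUS: the class and function names are hypothetical, and the sketch assumes that group members are always given as a single range such as 1-12.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class HelixGroup:
    group: int            # HG: helix group number
    helix_length: int     # HL: helix length in bp
    tm_low: float         # TL: lowest allowed Tm
    tm_high: float        # TH: highest allowed Tm
    members: List[int]    # GM: helix numbers in the group


# Assumed syntax: HG<n>=HL<int>/TL<number>/TH<number>/GM<first>-<last>
_HELIX_DEF = re.compile(
    r"HG(\d+)=HL(\d+)/TL(\d+(?:\.\d+)?)/TH(\d+(?:\.\d+)?)/GM(\d+)-(\d+)"
)


def parse_helix_definition(line):
    """Parse one helix definition line into a HelixGroup."""
    match = _HELIX_DEF.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not a helix definition: {line!r}")
    group, length, tm_low, tm_high, first, last = match.groups()
    return HelixGroup(
        group=int(group),
        helix_length=int(length),
        tm_low=float(tm_low),
        tm_high=float(tm_high),
        members=list(range(int(first), int(last) + 1)),
    )


if __name__ == "__main__":
    print(parse_helix_definition("HG1=HL21/TL65/TH72/GM1-12"))
```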
An example of preparing a design file from a 3D sketch is shown in Figure S4. The helix definition of the structure shown in this figure is

HG1=HL21/TL65/TH72/GM1-12

This helix definition means “helix group 1 has a helix length of 21 bp, the lower limit of Tm is 65 °C, the upper limit of Tm is 72 °C, and the members of this group are helices 1 to 12.” Note that Tm values are calculated by the nearest neighbor method under arbitrary conditions (i.e., 50 mM Na+ and 50 nM DNA) (2). They serve only as indicators of the relative stabilities of the chosen sequences and are likely to differ from the values under actual assembly conditions.

The strand definition describes strand sequences by segment names and by the junction sequences that connect segments. If a strand segment is split between two strands, the 5’-half is denoted by the segment name followed by the letter “b” and its length in bases, while the 3’-half is denoted by the segment name followed by the letter “f” and its length in bases. Junctions are given by their actual sequences, in which a homopolymer stretch is denoted by A, C, G or T followed by the length of the stretch. Segments and junctions are delimited by colons. The definition of one of the staple strands, S1, drawn in green in Figure S4, is

S1=12'f11:T5:2':T5:4':T5:6'b10

This strand definition means “Strand S1 consists of the 3’-terminal 11 bases of the complementary strand of helix 12, connected to T5, connected to the complementary strand of helix 2, connected to T5, connected to the complementary strand of helix 4, connected to T5, connected to the 5’-terminal 10 bases of the complementary strand of helix 6.”

Figure S4. Scaffold/staple design of an octahedron and definition of strands
The scaffold strand (red) and staple strands (black and green) that form a regular octahedron, with the helices numbered, are shown as a 3D sketch (a) and as an unfolded diagram (b). In this example, the chirality of the octahedron in a corresponds to “convex” folding of b.

DPAL_AB

DPAL_AB was coded with reference to Primer3 (2).
The free energy of the most stable aberrant pairing (ΔGab) between two strands (strand 1 and strand 2) is calculated as follows.

1) ΔGab(i, j) is the free energy of the secondary structure formed by aberrant binding of subsequences of strand 1 (0 ≤ i < N1) and strand 2 (0 ≤ j < N2), allowing gaps and mismatches of up to one base, where N1 and N2 are the lengths of strands 1 and 2, respectively.

2) ΔGab(k, l)(m, n) is the free energy of binding of the k-th and m-th bases of strand 1 with the l-th and n-th bases of strand 2, respectively. ΔGab(k, l)(m, n) is obtained from the nearest neighbor table of SantaLucia et al. (3).

3) If the pairing of the i-th and j-th bases is among the intended pairings, the calculation of ΔGab(i, j) is skipped.

4) If the (i-1)-th base of strand 1 matches the (j-1)-th base of strand 2, then

$$\Delta G_{ab}(i,j) = \min \left\{ \begin{array}{l} \Delta G_{ab}(i-1,j-1) + \Delta G_{ab}(i-1,j-1)(i,j) \\ \Delta G_{ab}(i-1,j-2) + \Delta G_{ab}(i-1,j-2)(i,j) \\ \Delta G_{ab}(i-2,j-1) + \Delta G_{ab}(i-2,j-1)(i,j) \end{array} \right. \qquad [1]$$

If the (i-1)-th base of strand 1 mismatches the (j-1)-th base of strand 2, then

$$\Delta G_{ab}(i,j) = \min \left\{ \begin{array}{l} \Delta G_{ab}(i-2,j-2) + \Delta G_{ab}(i-2,j-2)(i,j) \\ \Delta G_{ab}(i-1,j-2) + \Delta G_{ab}(i-1,j-2)(i,j) \\ \Delta G_{ab}(i-2,j-1) + \Delta G_{ab}(i-2,j-1)(i,j) \end{array} \right. \qquad [2]$$

5) $\Delta G_{ab} = \min_{i,j} \Delta G_{ab}(i,j)$

Output files

The output of GENESUS consists of four files: the strand sequence file, the usage status information file, the USP set file, and the history file for the chosen strand set. The strand sequence file lists the helix definitions followed by the strand names and sequences. The history file contains the project name, the name of the previous project (if any), the name of the adopted seed pair list, the name of the design file, the free energy of the worst aberrant pairing of the chosen strand set (ΔGab,min,best), and the time stamp at completion of the project.

Designing a regular octahedron

The regular octahedron designed here was made of a long scaffold strand that passes through all edges of the structure (Eulerian circuit) and four staple strands that hold the structure together (Figure S4).
We chose a regular octahedron because it resists collapse: its faces are regular triangles and all branch angles are fixed. An octahedron can be traced by many other Eulerian circuits; in this study we chose a path that is simple to draw on paper. All edges (helices) were 21 bp long, and 5-nucleotide hinges were inserted at all junctions so as to bridge the most distant positions in two contacting helices (Figures S4 and S5; detailed below in “Hinge length”). T-stretches were used as hinges because T is the smallest of the four bases and is less likely than the other bases to interfere with the assembly process. In addition, an AT pair is less stable than a GC pair and is less likely to be involved in fortuitous pairings. Homopolymeric stretches can also act as sequence punctuation, since they are excluded from the seed pair tables used in GUSP. Each staple is designed to have two full segments flanked by a half segment at each end, connected by hinges. Because staples have only two full-length segments, topological problems at assembly are avoided.

Hinge length

A helix junction can be viewed as two cylinders in contact at point C (Figure S5). The hinge length is approximated from the distance (d) between the two nucleotides located at the exit and entrance points of the two helices (arrow in the figure). The length of the hinge in nm was estimated by the following equation.

$$d = r\left\{\left[(1 - \cos\beta)\sin\theta\right]^2 + \left[\cos\alpha + (1 - \cos\beta)\cos\theta - 1\right]^2 + (\sin\alpha - \sin\beta)^2\right\}^{1/2} \qquad [3]$$

Here, r is the radius of the helix in nanometers (= 1 for a DNA double helix), α and β are the rotational angles of the exit and entrance points of the helices measured from the contact point C, and θ is the angle between the helices (Figure S5). The number of nucleotides required for the hinge is approximated by rounding up d/0.7, assuming θ = 2π/3 for the octahedron. A hinge of five nucleotides is long enough to bridge the two contacting helices in the octahedron at any rotational angles.

Figure S5.
Helix rotation and hinge length
The distance (d) between the two nucleotides at the exit and entrance points of the two helices (arrow in the figure) is determined by the rotational angles (α and β) of the two helices and the angle (θ) between the terminal surfaces of the two helices.

Assembly of T5-hinged octahedron

Oligonucleotides totaling 619 bases were designed using GENESUS (sequences in “Strand sequences”). Strands were prepared essentially as described in “Strand preparations”. Assembly conditions were established by annealing at various temperatures (Figure S6a). Optimal assembly was observed at approximately 60 °C, where a single product band was seen and the bands of dissociated scaffold and staples were absent. Assembly products from which some staple strands were omitted showed bands with mobilities distinct from that of the complete assembly (Figure S6b), confirming that the product of the complete strand mixture was composed of all strands.

Figure S6. Optimization and confirmation of T5-octahedron (T5-OCT) assembly
In a, strand mixtures were annealed at various temperatures. Lane 1: mixture of staple strands only; lane 2: scaffold strand only. Lanes 3 to 7: strand mixtures annealed for 1 h at 30 °C, 43 °C, 52.5 °C, 62 °C and 70 °C, respectively. Lane M: 100 bp ladder marker. In b, lanes 8 and 9 are annealed products of the four staples only and of the scaffold only, respectively. For lanes 10 to 13, the scaffold strand was annealed at 62 °C for 1 h with one, two, three, and four staples, respectively. Lane M: 100 bp ladder marker.

Efficiency of assembly

The efficiency of assembly of an octahedron with T5-hinges was estimated by densitometric analysis of electrophoretic bands stained with SYBR Gold. The fluorescence intensity of bands can vary with experimental factors such as staining conditions, so quantitative interpretation of bands requires calibration using internal references.
The fluorescence intensity coefficient of a structure (Fstr) is defined as follows.

$$F_{str} = A_{str} / M_{str} \qquad [4]$$

Here, Mstr and Astr are the mass in ng and the peak area in the scan in arbitrary units, respectively, for a structure. The structure consists of double-stranded helices and single-stranded junctions. Then,

$$F_{str} = F_{ds} \times f_{ds} + F_{ss} \times f_{ss} \qquad [5]$$

Here, Fds and Fss are the fluorescence coefficients of double-stranded and single-stranded DNA, respectively, and fds and fss are the fractions of double-stranded and single-stranded DNA in the structure, respectively. This estimation assumes that both Fds and Fss are sequence-independent and have the same values whether the DNA is free in solution or integrated into a particular structure. These assumptions are unlikely to hold exactly, but the deviations are unlikely to be large (4). For the octahedron studied here, fds is 0.8 and fss is 0.2. Once Fds and Fss are determined, the mass of octahedron in the gel can be estimated from the peak area of the octahedron in the scan.

Figure S7 shows an example of the gel electrophoresis used for such quantification. The assembly was carried out at various molar ratios of scaffold to staples, and the products were electrophoresed together with internal standards for single-stranded DNA (mixtures of staples at various amounts) and double-stranded DNA (bands of molecular mass markers at various amounts). Fss was determined from scans of lanes 1 to 4 (mixture of staples), and Fds was determined from lanes M1 to M4 (ds-DNA ladder marker) (example in Figure S8). From five independent trials of assembly, electrophoresis and scanning, an Fds / Fss ratio of 1.85 ± 0.32 was obtained, in agreement with the reported value (4). The three bands in lanes 10 to 13 were assumed to be, from bottom to top, unbound staples, completed octahedron, and a faint band of defective octahedron that decreased in the presence of excess staples.

Figure S7.
Quantitative analysis of assembly
Lanes 1 to 4: mixture of staples at 4 ng, 8 ng, 16 ng and 32 ng, respectively. Lane 5: scaffold only. Lane 6: scaffold and one staple. Lane 7: scaffold and two staples. Lane 8: scaffold and three staples. Lane 9: scaffold and a different set of three staples. Lane 10: scaffold with all four staples. Lanes 5 to 10 are mixtures of strands at equimolar ratio. Lanes 11 to 13: scaffold with all four staples, with a 2-fold, 4-fold and 8-fold excess of staples over scaffold, respectively. Lanes M1 to M4: ds-DNA ladder marker at 5 ng, 12.5 ng, 20 ng, and 40 ng, respectively.

Figure S8. Estimation of fluorescence coefficients
Relationships between the fluorescence intensities of bands (ordinate) and their masses (abscissa) are shown for ss-DNA (a) and 300 bp ds-DNA (b).

Using the values of Fds and Fss from each trial, the yields of assembly products were estimated (Table S3). The table shows that approximately 89% to 93% of the input strands were accounted for.

Table S3. Efficiency of octahedron assembly

Scaffold : Staple             1:1          1:2          1:4          1:8
Octahedron (ng)         6.13±1.21    7.39±0.19    7.76±0.77    7.58±1.39
Staples remained (ng)           -    2.67±0.68   10.34±0.98   24.29±2.78
Defective dimer (ng)    1.42±0.19    1.34±0.17    1.05±0.14    0.86±0.10
Total observed (ng)          7.55         11.4        19.15        32.73
Input (ng)                   8.20        12.30        20.50        36.90
Explained (%)               92.07        92.68        93.41        88.70
Efficiency (%)               74.8         90.2         94.6         92.5

Efficiency of octahedron assembly was estimated by densitometric analysis of scanned gel images, as exemplified in Figures S7 and S8. The amounts of octahedron, staples remained and defective dimer are averages of five independent determinations by densitometry. Total observed is the total mass actually observed in each lane, that is, the sum of octahedron, staples remained and defective dimer. Input is the expected mass loaded in each lane. Explained is the percentage of total observed relative to input. Efficiency is the percentage of octahedron actually made relative to input.
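The derived rows of Table S3 can be rechecked from the tabulated mean masses with a short calculation. The sketch below is illustrative only: the dictionary layout and function names are hypothetical, the "-" entry for staples remained at 1:1 is treated as 0 ng, and Efficiency is computed relative to the 1:1 input (8.20 ng). That normalization is an assumption inferred from the tabulated values, which it reproduces to within about 0.1 percentage point; it is not stated in the table legend.

```python
# Mean masses in ng from Table S3, keyed by scaffold:staple ratio.
# "staples" at 1:1 is listed as "-" in the table and is taken as 0 here.
TABLE_S3 = {
    "1:1": {"octahedron": 6.13, "staples": 0.0, "defective": 1.42, "input": 8.20},
    "1:2": {"octahedron": 7.39, "staples": 2.67, "defective": 1.34, "input": 12.30},
    "1:4": {"octahedron": 7.76, "staples": 10.34, "defective": 1.05, "input": 20.50},
    "1:8": {"octahedron": 7.58, "staples": 24.29, "defective": 0.86, "input": 36.90},
}


def summarize(row, reference_input):
    """Return (total observed in ng, explained %, efficiency %) for one lane."""
    total = row["octahedron"] + row["staples"] + row["defective"]
    explained = 100.0 * total / row["input"]
    # Assumption: efficiency is normalized to the equimolar (1:1) input mass.
    efficiency = 100.0 * row["octahedron"] / reference_input
    return total, explained, efficiency


def summarize_table(table=TABLE_S3):
    """Apply summarize() to every ratio, using the 1:1 input as reference."""
    reference_input = table["1:1"]["input"]
    return {ratio: summarize(row, reference_input) for ratio, row in table.items()}


if __name__ == "__main__":
    for ratio, (total, explained, efficiency) in summarize_table().items():
        print(f"{ratio}: total {total:.2f} ng, "
              f"explained {explained:.2f} %, efficiency {efficiency:.1f} %")
```

Total observed and Explained follow directly from the definitions in the table legend, so only the Efficiency column depends on the assumed normalization.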
Design and assembly of octahedron multimers

We designed three additional T5-hinged octahedrons and four sets of connector strands (Figure S9). See text for the design of the connector strand sets. Strands with a combined length of 2,481 bases were designed using GENESUS with the seed pair table for r = 6 (sequences in “Strand sequences”), and were prepared as described in “Strand preparation”. Assembly and verification of these structures were carried out as described in the legend to Figure 3.

Figure S9. Design of octahedron multimers
Six structures were designed and named as shown. Scaffold strands are in red. Staple strands are in black. In multimers, connector strand sets are in various other colors. All segment junctions carried T5 hinges (not shown in this figure).

Strand sequences

[Sequences used in PCR primers are underlined. Segments that are exchanged are in bold.]

<T5-hinged octahedrons>

Strand set of octahedron #1
Scaffold strand
L-1: ATACATCCCTTTATGCTCTTGTTTTTCAATTACATGGGAGTGAGATGTTTTTCGTTTCTGGGCGGCATACCTCTTTTTACGGGACAACATAAATATCGATTTTTATCCGCGTAATGTCGTTTGTCTTTTTGGCTCACCTTATACGGAGTCCTTTTTTCATCCTAATTTCCAGCTCAATTTTTGGATAGCACGGAATCTCGTGGTTTTTCTGCACAGACTCCTTTGCCCGTTTTTCGGACATCTACTGATTTGATGTTTTTCATGTTCGGTTGATGCTAGTTTTTTTGCAGCACGTTTAGCTCTAAGC
Staple strands
S-1-1: AAACGTGCTGCTTTTTCATCTCACTCCCATGTAATTGTTTTTTCGATATTTATGTTGTCCCGTTTTTTGGACTCCGTA
S-1-2: TAAGGTGAGCCTTTTTCCACGAGATTCCGTGCTATCCTTTTTCATCAAATCAGTAGATGTCCGTTTTTGCTTAGAGCT
S-1-3: AAGGGATGTATTTTTTCGGGCAAAGGAGTCTGTGCAGTTTTTGACAAACGACATTACGCGGATTTTTTCAAGAGCATA
S-1-4: AATTAGGATGATTTTTGAGGTATGCCGCCCAGAAACGTTTTTAACTAGCATCAACCGAACATGTTTTTTTGAGCTGGA

Strand set of octahedron #2
Scaffold strand
L-2: TTTACTGGCGATCGAAGTGTCTTTTTTCTTCTCAGTGGCAGTAAGAGTTTTTCTCCGAAGTAACGGGAGCATCTTTTTAGAACACCAAAGTTTCATGCTTTTTTTTCAGGTCTTTGTATCCCGTGTTTTTGCCTATATCGTTAGCGGGTATTTTTTGGGAAGACCGACAAGGGACCGTTTTTGATCTGTTCATGTCCTTTCTCTTTTTAGTATGTGCATCTTTCCCGCGTTTTTGCCGGGAGATGATCTTACCTCTTTTTATCCAGATCTTGTTAGTGTTGTTTTTCCTGTGATGTCGATTTAACGC
Staple strands
S-2-1: GACATCACAGGTTTTTCTCTTACTGCCACTGAGAAGATTTTTAGCATGAAACTTTGGTGTTCTTTTTTATACCCGCTA
S-2-2: ACGATATAGGCTTTTTGAGAAAGGACATGAACAGATCTTTTTGAGGTAAGATCATCTCCCGGCTTTTTGCGTTAAATC
S-2-3: TCGCCAGTAAATTTTTCGCGGGAAAGATGCACATACTTTTTTCACGGGATACAAAGACCTGAATTTTTGACACTTCGA
S-2-4: TCGGTCTTCCCTTTTTGATGCTCCCGTTACTTCGGAGTTTTTCAACACTAACAAGATCTGGATTTTTTCGGTCCCTTG

Strand set of octahedron #3
Scaffold strand
L-3: TCTAGCGCTCTAGATATGGGCTTTTTTTCTGCTTATGGGTATATGTTTTTTTGCCCATCGGGATGCCTTCAGGTTTTTCCACCAAAGCACACTGACCGGTTTTTGCACCGCTTAGTTCTCCCGAGTTTTTGAAGTGCAGCGACTCTTGCCCTTTTTCTATATTCATCGCCTGCACAGTTTTTCCCAGGCGGCTTATAGGCATGTTTTTCCAATGCAAAGTCTTCTCGTTTTTTTGAACCAAGAGATGGGACTTATTTTTTGAGAAAGACTAGTAATTGCCCTTTTTCCTGACTCGCAGTTAGAGTCT
Staple strands
S-3-1: TGCGAGTCAGGTTTTTAACATATACCCATAAGCAGAATTTTTCCGGTCAGTGTGCTTTGGTGGTTTTTGGGCAAGAGT
S-3-2: CGCTGCACTTCTTTTTCATGCCTATAAGCCGCCTGGGTTTTTATAAGTCCCATCTCTTGGTTCTTTTTAGACTCTAAC
S-3-3: AGAGCGCTAGATTTTTAACGAGAAGACTTTGCATTGGTTTTTCTCGGGAGAACTAAGCGGTGCTTTTTGCCCATATCT
S-3-4: GATGAATATAGTTTTTCCTGAAGGCATCCCGATGGGCTTTTTGGGCAATTACTAGTCTTTCTCTTTTTCTGTGCAGGC

Strand set of octahedron #4
Scaffold strand
L-4: AGGCCAATTAGCTCCTGTCACTTTTTGCGGCGACATAGAACGAAGTATTTTTACCAAGCTGATGATTAGTAGCTTTTTAATGATACTTATTAGCGCTTTTTTTTCGACCTACTTTCTAGCCCGAGTTTTTCGCTTCTCCGTAGAGTTTGAGTTTTTGTTGAGGCTTGCATGCTAGTATTTTTGAAGATCTGTTTGGCTGCCTGTTTTTACCCGATAGGTTGTTTCGCTCTTTTTAGCGTTTGATGTTAATGCTCGTTTTTAACTCCACGCGAAAGGGATAGTTTTTGAGACTCCTCATACGTCCCTG
Staple strands
S-4-1: TGAGGAGTCTCTTTTTTACTTCGTTCTATGTCGCCGCTTTTTAAAGCGCTAATAAGTATCATTTTTTTCTCAAACTCT
S-4-2: ACGGAGAAGCGTTTTTCAGGCAGCCAAACAGATCTTCTTTTTCGAGCATTAACATCAAACGCTTTTTTCAGGGACGTA
S-4-3: CTAATTGGCCTTTTTTGAGCGAAACAACCTATCGGGTTTTTTCTCGGGCTAGAAAGTAGGTCGTTTTTGTGACAGGAG
S-4-4: CAAGCCTCAACTTTTTGCTACTAATCATCAGCTTGGTTTTTTCTATCCCTTTCGCGTGGAGTTTTTTTTACTAGCATG

Connector strands

Connector strand set A
S-1-4-a: AATTAGGATGATTTTTGAGGTATGCCGCCCAGAAACGTTTTTCACGGGATACAAAGACCTGAATTTTTTTGAGCTGGA
S-2-3-a: TCGCCAGTAAATTTTTCGCGGGAAAGATGCACATACTTTTTTAACTAGCATCAACCGAACATGTTTTTGACACTTCGA

Connector strand set B
S-2-4-b: TCGGTCTTCCCTTTTTGATGCTCCCGTTACTTCGGAGTTTTTCTCGGGAGAACTAAGCGGTGCTTTTTCGGTCCCTTG
S-3-3-b: AGAGCGCTAGATTTTTAACGAGAAGACTTTGCATTGGTTTTTCAACACTAACAAGATCTGGATTTTTTGCCCATATCT

Connector strand set C
S-3-4-c: GATGAATATAGTTTTTCCTGAAGGCATCCCGATGGGCTTTTTCTCGGGCTAGAAAGTAGGTCGTTTTTCTGTGCAGGC
S-4-3-c: CTAATTGGCCTTTTTTGAGCGAAACAACCTATCGGGTTTTTTGGGCAATTACTAGTCTTTCTCTTTTTGTGACAGGAG

Connector strand set D
S-3-2-d: CGCTGCACTTCTTTTTCTCGGGCTAGAAAGTAGGTCGTTTTTATAAGTCCCATCTCTTGGTTCTTTTTAGACTCTAAC
S-4-3-d: CTAATTGGCCTTTTTTGAGCGAAACAACCTATCGGGTTTTTTCATGCCTATAAGCCGCCTGGGTTTTTGTGACAGGAG

Strands required for multimer constructions
(T5-OCT)2 : L-1, L-2, S-1-1, S-1-2, S-1-3, S-1-4-a, S-2-1, S-2-2, S-2-3-a and S-2-4.
(T5-OCT)3-I : L-2, L-3, L-4, S-2-1, S-2-2, S-2-3, S-2-4-b, S-3-1, S-3-2, S-3-3-b, S-3-4-c, S-4-1, S-4-2, S-4-3-c and S-4-4.
(T5-OCT)3-L : L-2, L-3, L-4, S-2-1, S-2-2, S-2-3, S-2-4-b, S-3-1, S-3-2-d, S-3-3-b, S-3-4, S-4-1, S-4-2, S-4-3-d and S-4-4.
(T5-OCT)4-I : L-1, L-2, L-3, L-4, S-1-1, S-1-2, S-1-3, S-1-4-a, S-2-1, S-2-2, S-2-3-a, S-2-4-b, S-3-1, S-3-2, S-3-3-b, S-3-4-c, S-4-1, S-4-2, S-4-3-c and S-4-4.
(T5-OCT)4-L : L-1, L-2, L-3, L-4, S-1-1, S-1-2, S-1-3, S-1-4-a, S-2-1, S-2-2, S-2-3-a, S-2-4-b, S-3-1, S-3-2-d, S-3-3-b, S-3-4, S-4-1, S-4-2, S-4-3-d and S-4-4.
<Longest USPs> r = 5 (with the restriction of GC content of 0 to 100% at sliding windows of 10 bases) 1,510 bases CTAAACAACTGGACTCCGGCCCGGAATAGCTCGTGGGTGCCAGGCGATCACGCAAGACCAGCGTTTCTTCAACATTGGGTAAATAGGCAAATCCTACCTTCGAGCAAACATGAT TTCTACCCGAAATGCGAACCCAGGTGGCTTTGGGAGTACCTATCGTCTGTCTTCCCGGTGAATTGCTTAGCGGGCTGAAGGCCATCTGTTGTATGCTCATAAGTTTCCTCAACC TCGCTTCTAAGAATAAGGTCTATTATGTTCATGCCGGGTTTCGTCGATGTAGCAAGTCGCCGGTTAAAGGAGCTTACGATTTATCAGTCGGGCAGAAAGAAGATTCCTGTAAAC CTTGAATCGAACAGTCTTTGACTAACCTGACGCTAGGGATACAGCCTGTCCGATTATCGCAAAGTTAGACGCACTTTACGTCTTAACGACCTGTTTCAATGCAAGCATGGGATC GAGGCGTGGCAGTAAGGGCCAATACCGGATTCTATGCACTGGCTGGCGAAAGTAATCAATTTAAGTGAAATACATTAGGTCATACGACTAGATCGTTGCGGTCCCAGAACGCAG GCCGTACATGCGGCCAGACTCGTTTGTTATGGGTCACTGAGGGCGTCCGTTGGAGTGCTATCCTCCTCGGATCAGAAGGGAATTTGGTCGGTCTGACCACTGTCATTGCCCTGC ACCGCCGTTTAATTGTAATGGGCGGATAACGTGACATTTAGGGCTCCAGATTAGTAGCGATGGTGAGGCCCAAGCCGTCGCTGATGATGCGTATTACCGCTGCAACAATGGTAG TGAGTCAGCAGCATCGGGAGGGACGGTCAATCCCGCCAAGGGTACGAACTATTGAGTAACGGTGGTTTGAGCCGCTCCCTGGTAACCATAGTGCCGAGTTCTAGCCTCCGTAGG TTCTTGTTCCCACTCGACCCTCGTAGTCCGCTTTCAGGCATCCGTCAACGCCACTAGCAGGTATGGAAAGGTGCGACAAATAACCCTGAGTTTGGAATGAATGGATGACTTAGG CGGCAACTAAGGACCCGCACGCTTGTCCATGAGTGATACCAACCAGTCACCCTACAAAGGCTTATCTTGGATTAACATCTTTAGCCGATACTGGGCTACGCCGAAGCTAACAGC AACCGAGCGTAACTGAATACTACTGTACCGAACGTATCTGCATCTACTCGCAGTCCTATGTAAGTACGTGGTGGGCATTAAGGCACGTCGTCCTGGATCTTACCTCATCGTGCT TCCTAGACTATCAACTTCCAGCTCACGACAGCTACCGTGCAATAATGATCTGAACTTGGTTCAAATGGCATAAAGCTGAGCGAAGTCCCGTGAACGGGCGAGCCTACTTCATAG CCATGTCTACAGAATTACGGGTGGACCTCCATCCTGCCGTGGAACTGCTCCGCAGACAACGTCCAATCAGGAGTTGGCAAGGAACGAATGCTAATAAACTAGGTGATTCGACAT ACTTGTACTGACAATCTATCTAATTTCC r = 5 (with the restriction of GC content of 40 to 60% at sliding windows of 10 bases) 1,206 bases GGTAGCAATCTGCCTACATGGAAGCAACCTGGTCGAATCCTATGGGTGACTATCACCGGAACTTGTACGCTTCAAAGACGCCATTTGTAGCGGGTTAGAAGTGCCGTACTGATC CTCCAACGACCTCCTAACTCGTCAGATCATCGGTAAGTCGAGCAAGATGGCTGAGTAAAGCGAGGTTTCTAGGACCCTCAAACCAGTGGCTTAGGCTGGACTAGCATGGCCTTA 
GCTGCTGATAAGTGGAGTATTGGACCGAACAAGGGCCTATTGCCCGGATAACATGCGAACTATGAGCAGAATTGACCTGAATAGTGGTGGCAAGTGAGTTCTGAGGACTGGAAT GGACGGATGTCGATTGGTGAACGCAAGGCTTGACTCGCCTAGTTAAGCGGAAGGAAACGTGCGATTAGGGATGGTATCGTTACCCGTCTAAGATCCACGTAACCAAGCTCGTTT AAGGAGCGGTATTTCCGTGACGTTGAACATCGCTCAATAAGGGTGGGTTTAGTAGGGCGATCAACTCCCACTCATTACGGTTCTATGCACTCCTGTAGAATCGGAATCACGACG TAGCTTTCGCCACGATTTAGGTGCTCCATACCTCATCTATCGGGACATTAACCGGGTCTTACTTCGGACGAAACTACTCGGTGCAACATTGTCCACCGTTGTCAGCGTCATAAT GCGTGGTAATTCCCTGGGAAAGATACGGGAGCTATGTACCGTGCTGTTACAACCCTATCCAAACGCTAACAAAGGGACTCAAGGTCCCAATTAGTGCGTCCTTTCACTGCGACA GATGCTGCCAGAAGCTGTCCTGATGACCATTGGCTCCGTTAGTTGCGGATCGAAATGCAGTCGCTAGATCGGCTTCGTAAGCATTGCAGGGTACTAGACAGGCATAAGAAGGCG TGAATGCCCTCGATAATCCGACAAGCACTGAGCTGAACTGAAGTCCGTAGGAAGACTGACCGTCAAATCCAGACCTACCAGCAAATAGCCCTGTTGACATGACAGTAGACTACC TGCTTTGCCTCGTAGATAGCAGTTGGCACGAATTTGGGTCACTACGCAGGATACAGTCAGGCTAGGTACGTTATGGTGCCCATTAGCCAGCGAATAACTGGGCGTATGATGGGA TTGATGCCGCATACTGGCCGAATGACTTGGAACCACCTTACAGACGACTAAGTAGCCGTTCTTGGG r = 6 (with the restriction of GC content of 0 to 100% at sliding windows of 10 bases) 5,177 bases CACGTGCGTTTATTCGTGCCTTTGAAACCGTGCTCAGGCACTGCATTTGAGTCCTCCGTTTAGTCGGACCTTCAAGCTGGAGCCGTTCCCAGGGACCCAGGCGGCGATCATTCA CTTTAGTTTCGCCGTTTGTCTACCTCGTCTTCCACGAAACCTGCACCTTGGAAACCCATAACTTTCTTACTAACGAACTTTGGCTCCTGTAGAAGTTGGAGTGGCAGATTTAGT ACCCGCCCATGGTAGTCCCTAAGTGCGGAAGATTAATACTGAACGTAGGGTAACGCAGATGGTACTTGGACTCGAACAGGGTCCAGATTCATGCAGGCGTTCAACAGGCTCAAT CAGCTCGAGTTGGTAGCAGAAGCAAATAAATCCTTCTAAACTTTATTAAGATGGCATCAACATGCTAGCGACATGACCTAACCTTGTTCCACCCGATAGTTGCCTTGTCTGACA AATGACCGGGAGCTCGTACGTGAGGCCTTCGCAACAGAAGTAAACGACGTGGTGACGGTATTCTGAACATAGACAGTATGCCTTCCGGCACGCATTTCCTCCCACCACCCTGAG GGAGTGCTACTTCACGCCGAGTAACCACCAGCGATGAAACTCCCATCAGGAGTGATACCGCACCGAATCACTTCAGTCCGAACGACCTCACCAGGCAGTGACCTTGCTAGTTGG GTGGCATAAGCTCACTATGATCTGTCGCATTCCCGGGTTGGGACCAAATACCCTCACTCGACTGGATCTGGGCCAGGTCCACCTAGTTTAATGGCGGGAAGGACCCGCTATTTG 
ATGGAGGTATGCTCACGGTGCCTCCCTGATCAAACAGGTACCCTGGCAGGATCCGTATTGAGTAGTATCTTGCAATGTTGCTACGAATGTAGGTTATCCTTGGGTAGCTCAAAC CACTTTGTCGCCTCCGGACCGGACATGGTCCGTAGACGTACCTCAAGCGGGATGGGTACAATTAGTGCACCCAGACTCGCCTGATAGACGCCTACGCCCACTGCTGCCAATAGC CATACGAGTCGTAGGTGAGCCGGCAATGCGGCCTTATCCAACCGCATAAAGACGTGCTTTGTTGGGCCTCCTTATTCAGTTCATCCAGACCAGACAGGTCAGAAGACAACCCAC GAGGCCGGTAGAATCCCTGCCCTATCTAGTAACTGGGCGATTCGATTCCGTACTGTTTAAAGATTGGTCTTCGGTTTATCGCAGCAAACCTATTCAATTTGCGTATCGTCCCAG CTGTTTCCTTAAACGTCTGGCTTGTTATGCTGTTCATGGACGCAACCCTGCGTCAAGCCTGCTTCCTGAAATCTTGGTGATAACAGGACTTTAAGCAGCGTCACGTTTCAGATT AGTCATCTTGATACTCATTAGGCGGTCACGCTCGGACGTCGCCGGAGTTCAGGTTGTACAGGCCAGTCCCATGCTTTAGATCCGGGCTAAGCCACGTTCCGTCAATTCTTGAAC AACATACGGCCGAACATGTCTTTCGCTCATGCTCGCCCTAATCCTCGATGGTGAAATGAGCTGTCTTACATCCCATACTCCTACGGCTGGTTTAACAGCGTAAAGTACGAACGG TAACCTCCCGTTATGATTCCCATTAGATTGACAGTTTCCCTTACTTAATCATTAATTCCCTCGTAGATAAATAACGCTACTGCCCATTTCGCACCCGCATCACGCAGCCTCCAA TGGTTCGTGAGTGGGATTGAACTTCGTAAACCTTACAACGGTTCTATGAAATAATCTATTTACTGCGACCAAGTGCATGCCAAACTGGCTCGTTTCTACGTGCAATAAGTATCC CTACTCGTCCACTGGCAATTTCGGAATTGCTTATCAGCGTTGCGGTGATCGGTCCCGAGGTGGCCCTACAAACCCGACCATTTGCACTGGGAGCGACGCTAATGTTCTACTGAT GACCATCCTTTACTCCCGACGCACTCAGCTTACGTTACAAATTTAGACGAAATCCACCAATAACATCGGGCGAACAATTCGAACGTCGGTTAGATACCTATGACTCCTCGCCGA AGCTATTAGGACGTGAACTACGCTCAAGTCAGATACATCGAAGCATCAGAACCAGGTTCAGCTGCAGACCTCGACCAGCAAGCACTTCTTCCCTAGGATGTTCCTACTAGGGCT ACTATCTGTTAGCCGTGAAAGCCCACGTGATCTTGTCAACCGTCCTCAGATGCTGCAATTACGCCTCGAAATACTAGCGTGGGAAATTCGTCGGAACTAACTCGGAAGCCCTGG TCAAAGCCGCACGGCACCCTCGGCTGATGTCACTAGAATAGCACCTAATTCATACGCTATCCCAAGTCCAGTACTTCCCAATCCAATTGTCCCGTCGCACTGAAAGGTTGATTA TGGTCTACAACAAGTCTAGGGTCTAGCCGCTTAGAACGGGCCCATCTACTAAGGTATTGTTGAGGCGGACAGGATGCGTTGAGTGAAGCGGTGGGCACTCGGTGAACGGACGGG TTCCGCCGCAAAGTATGTCCCTCCTCATTCCAGCAGCCGCCCTCCAGGTGGGTGACTCGGGCAAGGAGCACTGTTCTGCGAGCCCGTCTGATTACATAGGCCTGCCGTAAATCA GGTGCGTGGATAGGCAAACTACCTTCGGAGCAGTACCTAGATTTGCCACGCCACGATTCAAGGACAACAGCCTAACTGCACGGGAACCAATGACAACTCCATTGTAGCCAGGCC 
GAGCCTCGCTCGACGCCCGGCATTGGGTCTTTGGAGCAAAGCTGTACCGACTTATCTGGCGGTGCTGTAAGTGGATCACTAACATTTGGAAGTTTGACATTGAGGATCATGGTG GCGGCACTTTCAAGTGAGTCGATGCGAATCCAGCGTGCTGCTCATAATAGACCTGACCCGAGCAATCCTGGCGTACCACCGACATCTAAGTAGTCAACGTCACTTATGACAGGC AAGACTGAGTAATGGTGCCAGCGGGCCAAATTATGTACCTTTATCCCGTAATCGACAGCGGTAGGATCTTTCCTAACGCCGCTCAGCAATACAGCGAATAGAAATGTACTCACC CGTTTCCGCTGCGGTCGGTCGTACAAGACCTATCGTGATTTGTTTGCAGGAACGCTTGCAGTATCGGACAAGCTTTCGTTTGCGATTTGGGCTGACCGTTAATGCCCTCGCACG ATCTACCAGCTAGTGGAGTCTAATCGGGATTTGACTGACCACGGGCGGTACCAAGGGTTTAGCGAACTGTTACTGAGCCCATAGATCAGATCTAGACATCGCCACTTGTCGTGA CTTTCCCGAAGTCTGTATGGTAATGAATACTTGCCATAGTTCTAGGAGCGGATAATTCGGCAAAGGTCGACGGTCCTGCGGGACTTACCCGAATACAACTACAAGCATTGATTC ACCTGCCTACAGTGAAACAATCCGGTTATTCCTTGTATGTAGATTCGTTCATAAATGATAGCTGACTGCGGCATCCGGAACAGACCCGTATCCTGACTCAGGACAGCCCGACAG TCTACTCCGTGGCCAAGAAGATAACTCCTTTGGTTAGCACGAGCAGATCGGAAATAGGACTAGCCACTGTCCAGCCCTCATCACTGAGGCTTCGTTGATCGACGAGCGGTTACA GATGTTGGAACTTACGCATACATGCCTGCGAACGCAAGAACTTGGCGGAGTCCGGCCCGAACCCAAAGCGAAGTGGGTATCCATAAACATAAGGTGGTCCTAGCAGTCCACGTA TCATGTTTCTTCGCCTTCATGTAACATGGGTCGGCTATCGACCGGCAGACGATGCTATCTTTGTACTTACTGTACGTTTGAAGTCCGTCTTATGGCGTCAGGGCGTTGTTTACC CAGTAATCCATCCATGTAGCGATTGATCCCGGACGCCGGTTTGGATCGTAATGTCTGCGGACGACCCTTTAGGTAACAACCAAGCGAGTGGAAATCAAGTAGGCCCTGTCGTCA ACTAATAAGCGACCGTACCGTAGTAATAACTATTACAATCGGATGTCCAATCGAGTATTGGAATCTTAGACCGTGGAGGAGCTATGGTTATGTCGAACCGGGCCTTTAATTGGT AAATTGGCCGGGACGCTCCATACCAGGAATCCTATCACGGAAGTCAATAATGCTAAGACTCATGAAAGAATGCCAGAAAGGACTGTTGGTTTCTGAGTTAACGACATTCCGGTG GAGCTGCCTCATAGCGAGCTTGCGAGTACGCAGTTGATAATGTAAATGTCCTCGGTCTTAACGTGCCAACCTAAGCTAGGTCGCCAGCCAGCACTACGATAAGAAGGCTACGGG CACGGATACAATAGGGAGCCTTTCCGGGTATGAGTTTACAGTCGGCAGCCACCGTCTAGTCAGTACAGAATCAGTTAATCCGTTACGACGGATGATGCAGAATTACCTGCTGTC ACCTCCGCTAGGCATTTAGCTACGTACTAGTGCCATGAGGTGCCCTTTCGGCTTGGATGCAACAACTTGACCTCCTATGCTTAGTGGTCGTGGTTCACTACCCTTCCTTCAGCA GACTGGTCCAACGTATGCATTACAGGTTAACAATGTATTTAGGATAGTCTTCATCATGCGTCTGTCAGCATTCGCCAATCACCTTCCAGGACGAAGATGCATGTCAGGCTAATT 
ACTTTCGATACAGTTGGCCTTGCATGGAATACGATGAGCCATTGCTAAAGAACAAACTCGTGAATTGGGCATAGCCTATTAAATTCAGCCATGCCCGCAAGGGCCGCCGGCCGC TACCGATAACGATCGCTTCTAGCTCGGTAAGATCGCATAGTATGACGATCAGTCGCAATCATGACTTAAATCGATGACATAGCAGCTTCCATCGGCTCGAAGGGATCCATTATG CAGCGACTTGCTCAACCATCACCCTTAGGCCACTACTGGTAAGGATCGGCGGTTCCTCGTTACTTGTAAAGCGGACTATTGCCCTGCAGTCGTTGGCTGTAGCTTGGCTTTAAC TACTACATCATTGTCACGGGTAGGTAGGCTTAACCAGTTGCACCGGGTCGAATGGACTAACCCTCCGATGCACGCCTTACGGAGCTTCTTTGATTGTCGACTAGGAATGGCAGT TCTGGTGCATTCAGACCGGTAATACGCCGTACGCTTTCCAGTTCCCT

r = 6 (with the restriction of GC content of 40 to 60% at sliding windows of 10 bases) 4,112 bases

CCGTTTCAATGGGTGGAGTTTCTTGGGCAAAGTGAACGTATCGCAAACGCTCGTCTACTACTCGTCGCTACATCCATGATCGTACATCTTCGGAACTCCTAAGCATTGGGAGTC CTACCCTACCTAAGGGTACGAATGTCGAACAGGACCCTTCTTTCGACCTTAGCGGTCTGATAAGCGTCTAGGTCAACAGTCATGCCCACTAAGTCAGGTCTGGAAATCTGGAGC AGACAGGAGCTTACAAGCCCAGTGAATTCCTGAGTACGAGCATTTGTCCGGCTTTGAGCTTGATTCTGGCGGTTACATAGGGAGGGTAAGAATCCGTATCAGCCTTATGGCCGC ATTTCGTCTGTACTGAGGGCTACAATCTGCGTCCTTCTAGATGGAGGTCTTCTACAGGTGCCAAACATCCTGTAAGGAAGCTATGCAAGCCGATGGTTCCCAGAACAGATGTCG GTCTAATTCCGTGGCATAATCGCAGCACTACAGAAGGTTCCAGGATTGTCATCGCCTGATTATCCGACTACCATACAGATCGCCAGTCAACTGCAAAGACCAGATCCTATCCTG CGTTCAGACAACGTTACTGGCTTCACGACCCAATCGACAACCCTCGAGTAAAGTCCGACAGAATGCTAGTGGATGCCAAGGTCTACCTTCGGTGAGTGAGCCACTGTTACCATG TCAGGGATCCAACGCCAAATGGATCGTGCAACCTCAATGTTGCAGCCAACATAAGCCTAGGGTTAGCGAACGAGTTTGTACGGGACTGAACAGCGACCAAGACAGCCTCAGAAC CTCGCAAGAAGTTCGATCACGTCTTGATCGGCGTTAATCGAGGCAATTCGCTTCTAAGACCCACCAACTGAAGCTCGATGCTACCAGCAGGTACAACCAAGGGACAACTTTCCA CTGGTTTACTGTCCCGAACAATTGCGACTGGATGGTCCTAGAAAGCCATACGGTATGGGTACCGTTAACTTGGCCAGTTCTTTGGTGCCGATACAGGGTGCAAACTTGCGGCTA CTAGACCATGGGAAGAATACGGCCAACTACCGGGATCTAACAGCCCATCAAATCCCGCATAGAAACCGTTCCTCAGCCATTTCTTCGCCCTTCATTACCCGGGAATAAAGCCCT TTAGCCTACAGCACCTCACTTACCTGGCCTAACGTCAACGTACCATCAGAAGTGCGAGCAATACAGTGCTTCTGGATCACCTGCCTATGCCTAAGTTGGGCCTTTACCAGTCTA TGGGCGATTTCGGGCGTTTAGATGCCCGTTTGATAGCGGCAATCAACCACGACGTTCTGTTTCCAAGCCAAAGTTAGGGATACGCTGGCAATGGTATCGTCTTCATAGGACGAG 
GATTCAGTAGGTCGCAGAATTGGCGAGTGAAGTTGAGTCCGGTTTAAAGCGGATAGACAAGGACCTGTTTACAGCCGCTTACTAGTGCCGTCAATTACCGGTCTTGTTTCGAGG AGGAATCGATGGCGAACCAATCCCTTTCCTAACCTGATGATGCCTGGTGATGGCACTTAGCCAAGCTGTCGTTGCTGGACGATTGTTGGGACGACTTGCTGTCAAGGGTGATTC GCCGTTAGTATCCGGTGGATTAAGCTGCTCATAAGTGGAGGAGTACATGTCCCATAAACCTCCGCAAATACCACGGGTTACCTCGTGGAATGTTCACCTATGGAGCTAAGAACG ACGACAATGGCGTGGATAAACGGCCGAATAAGTCCCTACGAGTGGCAACTCGAACGGATGTCAACCGTGCATCAATCACGATGTATGCCCTACTATGGTGGTCTGTTATCCTCC GTCATTGGTACTCGGACAGATTCACCCACTGAGTGCCTCAACGATGGGATCGATAGCATGTCGTATCTGGCAGTGAGTTCGCACGTAAGCCGTCTATTGTCCTCCCATTAGCTG CATCGTCAGTAAGTCGACCAGTTGCTCAAGCCTTGGAGTGGTCAGGATCAGATAGGTACGGAAAGCAACCGAGTGCTATTCACGCCGTAACTCGCTGAGTCTGCAGGTTCAGCA GATAACCCACGGTAAATCGGCCCATTTAGGGTCGGTAACTTACGGGTCAAATAGGGCCACTTTCTGGTTGGTATGAGTCAGTGCGTCTGAGTTAGACAGACTAGTTGCAAGGGC AAGCTTAGTCATCCCGTTGTAACTGGGTCAGAATAGGAGCGGAAATACGTGGTCCAACCTACAAGAACCCAACGGAACATACGAGGCTAGATACTCCCGGAAACTATCGGAGGG AATGGTTTGCCGTACGTAAACGTTCCATTACGACCATAATGCGTCGGAATCTATCGCTTGCTTAACCTTGCTAGGATACCCAAACCTGGTTACGTCTGCCTTTCGTTCCGGTAC ATACTGGACTAGCGTGATGCTTTGCTTGCGTCAAGCACCATCTTACTCCGAAACTGAGCTCGTGACTCCAGGTCACTTCCCGAGTTCTGCATTGCTGAGCCTTTGATCCGTGAT AGAACGCTTGGTAGATCGGGCAATAGGTTCGAAGACTACGCCTTAACAGGGACGTTTCGCCTACGTATGACAGGCTATGACTTGGTGAAAGATCCGGCAAGATACCGATCGCAT AAGACGGGTATTACGGACTGCTAACGCTGACAAGCTACAGTAGTCAAGTCCTCGCTAGACATACCTAGTGATCTGCACTTTAGGCTGCGTATGCTGGGATTACCTACGCAACTT GAGTGGACCTTCCTACTCCATTCGACGTGCTACGAAACCAAACGTAGCTAACTGTCACGCAATAACGGGTGACATTAGGTCCCAGTACTACCCGCTTGAACCGCTTCGATAACG TGCCTACTTCAACGCATGGTTGCGAAGCAATCCGGAGCATAAAGGAGGCCTATTATCGCCGAAGTTTAGCGTCACCATGACATAGCAGCAAATCAGCGGGAAAGACGCCTAGAT TACGCTCATCTGGGTGCTAATTGCAGGCAGTACGTCACTGTCGCATGAGGTACTAGCCAGGAAGATAGCCATGAATCGTGGCCTTACTTGTCGGCTAATACGCCCAAATAACCG GCGTAAAGACTGCGAGTCAATAGCTCGGAAGACATTGCATCCGATGTTACGCATACTACGACTAACTAGGAGGTGACTAGGGCTGACTATTGGCCTCATCACTCCTTCAAAGTC GCTCGAAGTCGGTGCATGTATCACCGTAAATGTCCGTGCTTTAGTCCCACTTGATACCTGTCCTTATCCCGACGAATTTGGACGGAAGTCCACGTTCGTATTGGGCTCAACATG 
GCGGAATAGTTCGGATTTCTACCCAGGACTATGAGGGAGTTGGAAGCGTACCGAATGCCTTCAGCTATCCAAAGCGAGCTGATCGCTGCAGTAGAACCAGCGAAATAGCCCGTG AAATTCCCACCTAACATGACCCTCAAACTAGCTCATTCCGGACCAATAAGCAGGGTATGCAGTGGTTCATGTAGGATGCGAATACTCACGGCATTGTTAGCCCACGAATCGGTT TGTTCCACCTGAATGGCATGTTCGTTGGCAAATGAACGGCAGTTAATGCCAGTGCATTAGTGCGGATTAGGGCGGTATTCATCCGGGTTGAACTACGGTTAGAATCGCTATGGC AAGGAATGCATGGACAAGATCTGAGCACGACTGTACAAGTCTGACGGTCATAACGAGCCTATCAGTTCGTGCGTTACAGTTCCTAGGTTTCCTTTGCGTTTCTGCTTGGGATAG TGCAAGTGCTCCCTTACGATGAGTAGCAACAGCATCTTTGTCGCTTAAACTGCGGTTGTTGCGGTCAATGAGGAGCCATCTGACTGATTGGCTCGACTAGATCAGGACGTCTAA GGCAACCCATCCTTCGCTCAATTTCCCTGGTACAGCTCACGAAGCTGACCTTTCAAGGAGCAAGACTCGATCGTTACGAAGTGGTAAGGTACCACCTTAATGGCCTGGATTGAA GCAGTTCAGGGTTCCTTCCCTAGCTTTCGGTTCGTCAAACCCGATAAGTAGCGGAGTCTACGATAGGGTGGTTATGTCCACCAGACATCAGCACGCTTTCAGGCAAGTACGGTC CATTGGAC

r = 7 (with the restriction of GC content of 0 to 100% at sliding windows of 10 bases) 17,999 bases

ATCGGGACTTTCCCAATAGTCTAACCAATACCCTCCCACGGGAACGTGCTCGTGCTGTTTATTACATCCCGTTAAACAGACTGAACATGTAGATTAGCGGAGTGCACGTCGCTG ACCTAGCCAAAGATACTGCCTCCAACCCGGGCTGGGCTTACCACCTGCCTCACTAGACGAGCGAACTATCTGGGAGTAATGACTGTTGGAGTCATAAAGCATTGTTCTTTGCCC ATACTCCAGGAATTCTTCCTTCGCCACTAGTCGCTGTATTACGAAACGTTCATAGTGATTCACCATGACGATGGGCTCCTCGAGCAGTACCTCATTTGGTTTAAATCATCCGTT AACATTAGGTGACTAGTTGGAGCTACAGCGGCCGAAACCATAGGAGTTGGCAGGTTCGGCGTCACTGACCATCGCATCAATGAGTGACAGATAAACGATAACCATGTCGAGCGA TTTAGCACCCGTGCGGAAGGTGCAAACAATGAGGAAACTTTCTATGCCAGACATGCTATTAGCTGTCCCTCATTCAGCTGCCAGCGAAAGACCATCTTTGCTCAAGAATCAAGC GTGCGGGCTCCAGATCAATGGACTCGTCCGATGTTTGTTATCTGAACCTCACGTGCTGACCAGGTAACGACGTATTACAACTCAGCAGGGATCGATACGATTGTTGCGTATTAG CCGCTACTCATCTACAAACGTCTACATACAGCTTGAGGTAGCATAAGGCACCCGGAGTCTTTGCAAGGCCTCCGGATACTGGACTGTTTGGGTCTACGCCAGGCGTTCATCACC GACATGTATGACGTGCGTTGTTGATGCTTCCAGGGATGTCCATCAGAAGGCGACGGTGATTAATTACTCGGCTGGCCATGGGCATACCACCAGCCCAAAGTATTACCAACCTAA GCACTAATCGACATTTCTTTGGGAAGCCATGCGGCAGGAGCGAGGCCTATTCAATCTGCATTGTCGGCCAACCAACCGGATGTAGAACGAAGGATTGATTCGACCTTTAATTAT 
CTTCGTACAACTATTGAGTTCACTATTAGATAATGTAATTGATCCACTGGGATGGATCTTCAGGGCTGTTATTTGTAGCTATCTGCAGCGGGCCGTCCCTAGATGTTACCGTGA AAGAACTCGCTATGGACGAATTAGCGTCCTATCACTGAACCAACGCCTATTTACAATAGGACCCGAGCAAATACGACTGCCACTTTACTGACCTGGGCAACGGTTACAACCCAG ATCGTGAACCTGAAAGTAATTGGCATTTATCGGAACTGTTCACGTAGGACAACCCTACATCGTTTCCTGGCGACAGACGGTCGAGTGATTACTGCCGTGCATCTTACCCGCCTG TCGCATTATTGAGCTCGTGGTCTAAACTTTAGGCTATCAATCGCCCTAGTTTCATCGGTACCTCGTGCCAAAGTTGAATCAAATCTAACAATACCGGACAAACTCCTCGTGGGT CGCCTGAAATTTACTCGCCTGCCGTCGGATAATAAAGGCGGAACATCACGTCGTCCACCCAGTTTCCCTCACCATCACTTATGGCGAACATTTGGGTGGTCTGCTCAGAAATGG GAGTTTCCACCTGGTTGCCCTTACTACCTATTTCCTTTCGGGTTCTGATTATTAGCGAATAATGGGAACAGTGGTGACCGCTGCCCATTCTAGACGTGGATGGGCACTACAAAG AAGTGGCGTAATAGCCGGTGAACTATGATGGCTCGGTTTACTTTGCTGTTGTCGGAGCATTGATACCGCCCTGCCCACTTTGTCTTACAGCATTGGCGAATAGATTTACTGCTA GGAAGTCCAGGTTTAGTGCGGATCATTTACTAGATGCGGCGATAGTCCTTCGGGACATTCTGTCCCAGCACCGCCTTCCAAATTATTGGGCCAGAACCTCGGGTATCACTAGGG TACGTGAGTAACATGTCAGACGACTGAGTGGGCTTGTATTCACCTGACTTATGACCACTGGCAATACATGCACCCATGGTGCGTTAGGGTCGATGCAACTTAGGTCCCTTTACT CCGATGGCCTTATCCGCAACGATCATTGGACCCGGCTACCTTTGGTGGAGCTCACTGGTTGAGCAGCTCATTGGTTGCACTACGAACCACGATCCACGATGCGAAACGAAATCG TGGGCGATAATAGAAGTGATTGGGCTATGGTGCAGAAGGGCTAAACTGCAGAATAAGCATGGAAATCAACAGTAGGGACCGTGACAGCCTGAATGCGGAAATACGTGCATTAGG CCATGATAAGTTTCAGAAACCAACAGCGAATGGGATAGTGCCAATACTGACAGTTTCGGAATTCGGAAGGGTTAATAGCGTACAGCCGTGAATAATCCCATACCTCACTCGCAA AGTTATTAACGGCCTATCTGACGTGGTAGTGAAGACTTCATAGAACAGCAGTGGGAGTCTGCTGGGCGTGCGTCCCTGCGGGAACTCAGGCCCGGTATGATTCGTTCGGCATAG TGGTTGACATAACGGCTGAGTGCGATTGCTTCTGGCGTGGCCTTTACGTGCTTGCTGTACCATGGATCCTATCGGTTGGGACGGGCAATTCCCTTACCCTGTCGTGGAAAGGGA TAAGTACGCACCTCCTTCCCGAGTTAGAAGCACTCGATGATAATACCCATACAGGCTCGAGTGGACGCCTACTCAAGGTCCAGGCGATTATGTCGGGCCAATCAAGACGATCTT AAGGAGCTCGATCACTGCCAGATACGGGTTCGAAGTTTGATGAAATGCCCTACTATTAAGTTGATAATTGTCTTCGGCGGAGTCCAGAAGACGTCGTATGCACGACCCGTGGCA TTCCGGCCCTAAGTCAAGGGCTGGCGGGCGAGCGACGGAGTACAAGTGCGGCATTCTACTGATTCAGCACTATGTTCGGGCTCGTCCAACCACGCAGCAGTCTGGCGGTGATCT 
GCGGGCCCATGCTGCATCAAGGGTGCAGCGTACGTTGTAAACGAAGTACATGGTAATCGCAAGACCTAAACCCAAAGCCGGTCTGTTGAGTGCATTGGTCTGAAGACGAGTAAC CAGGCGGCTCCGTGGCTTCACCTCCCAAGAAGAAGGGTCGGGCTATTGACTTGGCCGGTATCGAGCGGTATTCCGGATGCCCAGCGGGTGACTGATCTGTAATTCTGCTGAAGC GGGCGTGGGTTACATCTACTTACGCCCACTGTAAGGGTATCTGCCAGTCTGCCATCCTAATAACGAACCTTCCTTTGCCTAATCAACCCTTTGCGTTAATGAAATTAAGGGCGG GCCAGGTCGGTTGTCGTCTTAAGCCAAGACAACAAGACTCGCCATCTGGTAGGATCACCAGCTCAAGCGGAGCAAACTATTCATCGAAACCCAGGCAGTGCTGCGGCGTAAAGC CCTTTATGGCTTGGTGGGTTTGACGTTTAAGTCGCCACGTAACTTCATTGCGGTAGATGGTTGCTGGTATGTCGTCGAGTTGATTGCCCAAGTTGACTTCGGAAATCCAGGTGA GTCAGCCATCCCTCAAGTCGTACAGTCTTCTTACCGGGTCAACGTAGCCTGGATAGTATTGCAGTTATCTTAGACTTATTACTTCAATGGCGACGCCCATTTAGTATTCTGTTC CACGCCGCCTCAGAAGATCGAAGATGTATCCACGCTTTGAATCCTGGCCGGCTAGTCGTTCTTAAAGCGGTAGTCCATCGAACCTCCGCTCAGTCCGAGTTGCTCATCGACGTA GGGATAGGAGGGCTAGACTCCATCGCCCGTAGTACATACTTCAGGATCCGCTTCTACGACCACGTAGTCCGCCAATAGGCATCGCACCCTAGACTGTCAAGGATAGCATCCTAT GCCCGGGATGCGGAGCGAAGAAGTAAGCATCCCAAATCAGCCGAAGGGAAATGCGATAAACTCAAACCGTGATCGGGCTTCAAGCAATACCAGGGTTTCTTCTGCTCCTCAATA CTTACTCGAGGGACCATTCGTCGCTTCTGAAGGAATAGGTTGAACTGTCCTGATAAATTGGCCTGAAGCTCAGTTAGGGCTATCCGCCGGTAGCGGTGGGTCTTGACTAGATCT TGGCCTAGCATCGTCTAATTGCATTACTGGAGTGAGTTCTACGCATTAACAGCATGAACCCACTCCTCATCTGTCTGGAACGCACTCCTGTTGCCGATTGCATCTACGAATAAC CGTGCTGCCATTCTTGGCACCATCTGACAGCGTGCCGTACAATCATCTGAGGGTCCACTTACGTGACGCTCAAGGCGGCACGTTGCCGGCTTGTCCTCGACCTACTGGCACCCT GCGAGGCTCGTGACTCCCGCTGAGGGCTTAACGGTACCATTTAATAGGCTCAGCGTGAGGAGGCCGCTATCACGGATTTCGGCGACTACTATGCGAGGGCTGCGAAAGCCCGCC GCCCTCCCGTATTCTTCATACCCACCCGATTCCTGAACTTCGGCAGTCAGTCCACTACAGCAGCAGGCTGGGTAAGCGGGTATGGCGGCTTGATACTCGTGATTGTCACCGATC CGACCATCAACCTGATAGATGAAGTACTAAGACGCCTTCATAATCACGCAGGGCCGCTCGTTCATGTATCGAAATTACAAACCTTAATGGAGTAAGATTGGGAGCCTTCAAATA ATTACGGTTGACCGCCTGACATCCAGCCATACGGCATTATGGATGATGTCTGTCTTTCATCTGGGCGAAAGGTAAGACTTGGAGTACTGCAACCGAGGCGTGCCTTGGCGTCAT TTGACCGGAATCGCATACGTGGACGGAAGATTGACTGCGTTACAAATCTGCCGAGGGCGTCCTGCTCACTTGCTGAGCATTTAGCCTAGGTACTAGCGTTGAGGTGACCTTTCC 
GTGGTGCGACTATGTATTTCAGTTTCACCTATGGCACCGCTGGTGCGGCTGGTTGGTCACGCAACGCAACCTCCAAGATTCGCTCAGACCTGTCGAACCAGAACAGTAACTCAT TCCATAAATCAATTTCGTGGGATGACGGAACTCGGCCCTTGTAAGTACTGACGCCCGGCATCACTAAATCACTGGAGGGCCACGATAGACGTCCGCTTAAGGCTAGCGTCAGCA TAGCAGGTGACGCAGTGAAACGTCCTCAAATCAAGGTAGATCGCCGACCCGACTATTTGTTCAACGGATTAGGTCGCTGCGTTGATTCCCTGACGTCTGCAGGAGGACTAATTA CCGTCTACCCAGGGTGAAGGCCCTTAGATGGCCGGGAGTAGACTCGATTCTAACGCTCGGCATGGAGTTGCGTCCGGTCGTGATCTTGCTACGCTTCGCTGCCACCGGACTAGG CCCATCTTGTCCCTATGAAAGTCGACGGGACAACATGACAAGCTGCGGGTCCGTAACTAGTGCACCACTAGGCGATGAACCTTGACCCTAGCCGCATTTGCTTCACTTGGGAAC TGGCTCGCATGGTATCTAAAGTCCGCACCTAGTATTAGTGGATAGGCAGCGTCAAGTATCGCTGGATACGTGATTCCATGAGCGTTCCGTTTAACCAACTAACCTATCTACGGG TCAGCCGCCGGGCTACAGGACAAATGTCCAAATGGAGCATGGTGGTTCTTATTAGACTCAAGCCTTCTAAGAAAGCCATCTTAGGTGCTCCGTCAAATAGAAATGCTTGCGTCT GAACGATTGGGTATCGTTCTGACCCAATTTCCACTTTCGTGCTTCCGGCTCCTGGTCCACCTTTGTTGCATTTGATACGAAAGAATAACATTTACCGTTCATTACCGACCGTAG TTACATAAAGAAAGGAGCACTAGGTTTACGAGGGAATGTCAGTCTTATCAAGGCCAACAGGTCCCGAGGGTGACGGACCTCCTGGCTGGCTTGAATTACTAAACTCCGCAGACA TAAACTAGTATGGCAGGGTCTGACGGATACGACAGTGGAACGGAAATGGAATTTACGGGCCCTCGTTGCCTAGATTTCGAATTACCCGGGACAGTATTCGTGGTGATACGGCCG AGGTCCATCCGATGATCGACTTGTCCGTTTGGGAGTGCTAGATTCCTTGGCTCAAACGCATCACGCCGGCCCAAGGTCGCAATAGACGCATTCCACCCTAATCGTGCCCAGTTG GTACCTGGGTGCTGCTCGTCAACCTCGAATCCAGATGTAGTCAAACGTGATACATACGTTGGTGCATTCTTTGACAAAGATCGGACCCATGTCATTGAGCCTGGGACTCAGACA AATACCTCGCTGGGAGGCGTCGAGCACTTATTCAGCGGTAACCACCCGTACCGGCACTGTTGCAGCTGTCAACTCACCCGGCGTATTGGATAGACAAACAGATGCTACGAGTTG TCCAACGCTCCGGCAGCTCGTCGGTCACGGGCGGCTATGTCCGACTACGGACAGCAATCTGATCGTGCGTGAACGGCGACCTGCTCGCCAAACATTGCCGTTCGCAGGCAACTG ATCATAGACATGAATGTTCATAAGACCCGCCCGAGGCCCAAATTTCCGGTAACTCCCTTTGTAAAGGAATTAACTGGGAGCTGCGATTTCTGGAAACGTGGTGGCGTCTAACGT CCCTTCTAGATAGAATAAATGGCGGAGGAGTACGTTCGGTGGTCCTGCAATTCTTTACGGTACAACATCGCAGTTACGATCCCGTCCATCTTCTAGTCCCGCCCAGTCATAGCA ACTTCACTCGGATGAGTCTGACCTTAGAAAGGCCCGAGTGCGGGTAGAAGACATACTCGCACTTTATCCGTTTCGGTAGTTAGGATCGTCGTGGCTATCGTATGAACTTAAATA 
GTAAGTGAGTGGTTAGTAGGCCTGTACTGGGCCTTGGTGAAGATTTCATGCGACCGTTAGGTTCAAACAGCTATTTCTGCCAATCGCTTCGTCGACATAGTTAGTTAAGCTGAC TTCTGTCAAATCGGCTACTGTCTGTACATTCCAACGTCAGCTGGGTGAAAGTGCATAATTAGGGAGTACCTTCGATGCCATAATCCTCCCATGAAACCTGCGTTCGGATACCTT TACCTACAAGTTGTTACTGGCTGCCCTCCGCAAAGCGGCACCCAACGACGGGCGTTTCGAAGGCCTGACGGTAGCACGGAGGAAGTTATGGGTTCCAGCGGACAGACCCAGTAT GTTATGAATGCCTGACCTCAGGCTCCGGACTTACAGACGCCGCTCAACGAGTAGTGAGCCATTTCGCACGGATAGTCGATAATTTCTGTTTCATACAATGCGAGTCAACATCTG TATGATAGCAAGGCTTCAGATCATTAATCCCGGAGCAGGCACGGACTGCTTCAACAGCCAGATTTATTGGCTGAGCTGATGGGATCATACTAAGCATTATCCCACTCGCCCAAA CTTGTCACTTTAAATGTCAAGTGGCTAGCTGAGGACTGCAAGTATGCGTTATGCCCTTTGGATGGCGTCCCGTAACGAAGACTCAGCGAGTAGCTAACCAGCACTTCCGTTGGT AGGGAGGGATACAAGGCTGATACTAACGGCAGAATGCGAACCGACTGAAACCGTCGTCGCCGCTGGCGATTCAAGACATTCGTAAAGTCATCCTCAGACGGGCCTCAATGCCGG CGGTAGCCTCATACGGACGATCACCTCGACTAATGTCGTACTTGCCGAAAGTATCATGCGGACAATGTAAAGACCCTCGTCGAATGTATGGTGGCACGCACTGGTCCTTGGAAG TAGTTCTGTAAATTAATGGTAAACCCGCTCGCTGTCTTAGATAGCGTCGCTAGTCAGACAGGTACGCCGTATTTAGTTGTTTAGGAAGCTGAACGCTGGCCTGGAGCAATCAGA CTGTTAGACATTAAACAAGGCCGTCGCATAAACAAAGACATCGTGACATCAAATACAGGATACCAGACTTTATTTACCCTTGCTGCTTAAGTACCCGAATTGAGGGTAGATTCT ACCACCGCCCAATTCTAGTTCAGTTCTAAGCCCGGATGAACTGCACCTACCCTGATACCTCCTCCCTTGGCAGCCTTACCTAAGGTTGGCGTACTCACCGGTACATCCAATTGC GGCACTAGAAATTCTTAGGGTTTGGCATGAGTAAGGATTTAACCTGCTTTAATAATAACAATTTGTCATCATCAGGCACCTGTTATGGACAGCGGTTTGTTGGCAAGTCGGCTC GTTGTTTCACGAGGCGGGAATAGACCGCATAAGCCGTCAATCAGGGCCAAATTCGCAGACTATGCTAGAAGCCAACTATGCATCTGCGACCCGGACAGGGCACCTATTGATGGA AAGACGTTAAGTTTATTCAAGGCATCAAAGTCACCGGGAGCGACCCTCCTTGATCCTTTAGCCGTCTATGGGTGGGCGGCAACCTGGTGAACAAGCATGTTCGTTCAAAGATTG TTAAATTCAACAAATGGTTCCTCAGGTCCTGTTAGGCCTTTGCTTAGATCACTACATTTAGAATGTTACGGCTCGGAACGACATCCGTAATCAAAGGATGTTATCCTCGATAGC AGTCACGCCAATTGTACTAATCTGCTGTCGGGTTTGCTGATAGTACAGAATCAGCATTCGGCCACTAACAAATCATGGACTTATCTACCAAGGTAAACTTCTTGCGTGCATGAA GTGACTCGGCGGGTTGTTAACCTTTAGTAGTAGGTGGAGGGTTTAAGACATGGCTGTAATGATTGGTTCTGCGTTTAAAGGGTCAGTCAACGCCCAAGCGGGATAAATCCAAAT 
AAATTTGTTTGATCGTAGTCATGGCGATACCTACGGTTTATCAATAACCAGTCACTCGTCGCAAATGAATACCCGGTTAGTGCAGCAATAAGCGGCTCGATGGCTAGTGCGACG CAATGGAAACAAGCTAACGGGTAAAGTATGAATCCGGCCGCAGGACTCCTTGGTCAATGTAGCTTTACTTGTATGTCCCTTATTCCCTACAATCCTGCGTAGGCCCTCATCCCT TAACATCAATCTATCGATTCCAGGAGTTCCTCCCGAACCGGAGCTTCGTTGGTCTTCGACATCTATGGAAGCCGGACCGTCCGTTAGATCTGAGCCGATCGTTTGTCAGGTATG CGGATTGAAACATAAGTATCTGATGTAGCCTTGACGTGAAATTGCTGGGTTAACGAAAGGCGAAGCAACTATCCTAGAATCCGTAGGATGTCTACGTACAGATACAGCACTCAT CAGTCTACCGATAATGGCGTATGTTTAAGCGTGAAGTAGCGAACAGCCGGCGATGTCGGATTGCAAGTGCTACTACAACAGAACATACCGCAGGTCCGGGAGGTCACTTAGAAT TTAGCTTACTAATGACCCATTCATTCGGATCAGCAGCCTATCGCTCCAAGGCGTAACAAGTCCGTACATGCCTTCGGTCTACCTAAAGGCACTGGCTAGAAAGCGAATTCTGGG ATACGCAGCCGGGCCATTCGGTGCGTCATCAGCACCAGTGGGTCCATACGTACTGGTCTTGGTGCTTAACCACGGCAAAGTCTTGGGCTCGGCTTAGTTGCGAATGTTGAGGCA CGAAGTCTAGATCATGCTTCTTTCCTTGAAATCGAAGTAACAGAAGCCCAACTACAATACAAAGCTTGGCGGCAGTGGAGGATCCCTCGGTGGGATCTTTATTGCACCGCATTG AATGCTCGTTACTTACCTGCCCTTCAGTTAAACTGACGGGCTGATGCTAATTAACAAGCGATGGCACTCACCAAACGTATTCGAACGACGACCGATTTACCAAGTCGCAGGAGT AACGGTGGTTTGTCGATGTACAATGTCTGAATCCCGCCACGGCCTCGATCCATTTACATACCCTGCTTACATTCTTCTAATCGGGCCGCACGAGTGGTGGATCTGTCGGCTTTA ACATGAGCTACGGCTGGGATCGGGTTATGAGCTTGCGAACTTAGACAGGAGCTGGACGGCTCCCTCAGGGTTGCGGGCGGAGCTAAACCGGCGAGTTTACATGACCGTATGTCA CTATCTATGCGGGCATTAGACAACCACCTAGAACCATACTTAACCCGCATCATCTTGCACGAATCGTGAGCACGGCCGGCAGGTCATGGGCGGACCTAGATCGGCGAGCTACCG ACTCCTAAATGACTAGGAACAATTCCACGTTGTCCGATAGTGGAGCGTAGTGGCATGGTCACCGTTTGAATTGGCTCCACGGACCAACATACATAACAACAGGGCTCCCGAGCC TACTTAAAGGACAATCTAGTTGCCGTAATAATTCTATGTTTGCTTTCCTCCTTAGACGTAGCTTCAATCGGTGGCGAGGATGTATGCTCACCTTTATCATTATTATGCTTCGGT AAAGATGCGACTCCAGAATGGAGGACAGATTCAGTTGTATGCCAAGCGAGGGTAATCCTTTCAATTAACGTAGATCAGCTGTTGCGGTTCAACTACCCATGATCGCTACGCCCG AAATTCGTGATAAGCGTCCCAGTCCTATGTAGTTTCTGCGGCTCACGTTCTAACCTGTATCGCAGAATCTTTCTACGGCCCACTATTCTGAATAGAATGACCAAACTCGAACTC ATCGCTAACGTTGAGTACTATGGATAGCGACCAAGATGTTCGACTGCAGGCGGACATCACTCACGGTGAGGTGCATAGACCTATCCTCACCCAATGTATCTATTGAATTCGTTA 
GGCTTATCCTGCACGTGGAGTAGTCCCTCCAATTTGGTATGAGTCGTGCTCCCGCCGGACGACCTGACCACGCCTAAATCTTATGGAGGTGGATGAGCCACTGCATCGTTAGCC CAGTGCTAATCATGCCGCTAACTTGCTAACAGCTTCTGCCTTGTCCACTGACTTTCTTCGTGGCAAGGACATAAGGAACTGAGTCGGGCGACAAAGGAGGCGATCCGCCCTATC ATCATTCGATGTTCTAAACGTACATAGCATTCCTTAGGCGACTGGAAGGATACTATTGTCCCAATGACTTCAGTGGTTCCATGGGAAGTGGATGTTCACTGTCACCCACCTTCA GGTCTAGGCGTACGATACGTCAATGCATCCCTAGGCGGTTCGTCGGGAACAACCATTGGCCATCACCCTATGCGTCCTCGCAACATAATGGTGGGACGACAGCTCACCACCGGG TGGCGGCGAGGGAGTGATAATCGATCTAAATTGCCGAGTGGCCTATGGGAGCGGTTGTAGGAATACGCTACATGAGTTTGCCGGAACCGCTAGACGCTGGAGTTCGTTGATCAG ACGTGACCTGTACCGAGCTTTAGGGACTGATTAACTAGGCTGTAGAAGTCGTTGCTTTAGAAATACCATCCGGCATACGGGCACTCCAATCAGTTATTGCCCTATTGCTATTCA CTTTCTGTATTGGGTTAGTCTTTCTTGCCTATGTTAATACCGTGGGCATGTTGCTCCCACCTGTAGTATTTCGGTGAATGTAGGTTCTTCCATGTAAGGAAGCGTTTACGCTTA CAACAATTAGCTTCTATCAAAGACGAAACTCAGATTATGCCATGAACAATAAGTAGCCCACGTTCGACCAGACGAACAGACGAGGCTGGCACGGCTCATTCGTTTGATTGTTCG AGCGTGACGTAAATCAGATGCCTCCCGCACCCGAGTACATTTGCATCCGCCTCCGTCCCGGTCACTAACTCCTTATCTTTAGGATTAGTCTAGTTTGAAGTCGAAACAGGTATT GAAGGGATCTAATCCCTATCTTGTTAGGAATCATCACTGTTTAGCGACTCGAAACTATGGGCCATCCGCAGCCATTAAATACTCACTGAGCCCTATGTCATAATTGAGCATCTA TTAATGCTACAACGGACCGATACCAACAACTCCCGTCACTCCTAGACATAGACTGCCTAAGTGCGTCGGGTAACTATCGTCTGCCTAGCCCGTGGTAACCGGAAGGACCTTCGC ACCGGTTTGCGAAGCCTAGCGTAGATTGGATCACGTGGGACATCGAGGGCACGTCTTACGCTAACATAACCGAATTTATGCCTCAGGAGCAAGCGACAACTAGGAGCTTGGATA CTCAATTTACCTGTAAACCATGCAACCCAAGGATCGGAAGTATCCCGTGCAACATTTCGACGCTGAACTCCGTGACTACTTTGTACTTAGTAACCCGTGATTTGCGTAGCTGGG CCGCCTACCTGAATACTCGGTGAGCGGTCGAAGCTTATCGCTAGGTAGCTGTACAAATTCTGACTTGCTTGCATCATTAGCAGTTAAGGCAATTATTCCAGCCTTCCCAGACTA AGCTCAACTCCGAGTAGGTTTCTATCGCAATCATGACAGACTACTGGAACAAACTTACGGCCTGGTGCAACGGAGCTGTTCCTCGGCGTTTATGGTAGCAATCCCTGTTAACTC CACCGCAAGTAGAACAACATTGTTACCAGCGACATGATCCAGCATACTGCAGCTTATTAAGCGATCTGATAACAGTATGAGCAGTCGGAAGACGCTTGCGGCGGCATAAGTTAG GTCATCAACTGAGGTAACAGCGGATACAGACCAGGGACTTAGTTACTTTATGCTCCTACGAAATAGTGCATCACCAACCCTCGACAACGCTTAGACCTTGTTGAGCCGGTTCTA 
CCTGCTGGAAAGCTCACGGGAGCTTATGTACCACTGCGGTGACGAGGACTCATGGCTCCGATACTGTATCAGTACAACCTGCAGCAGCGATTAGCTAGACCATGTTCTTCGGGT GAACGCATGTTTAGTTCTTAGTTCATGACCAGTCCAAACAAGTTCACGGAATGATCACCCATTGCCTCCTAATCTTTAAGCCGCAACTGTAGCAAACGCTAAGCCACTCAATGT CCTCATAGATTAACGAGTTTCTAACATCGTAATCCATTGTCTACAACTTCGCCGAGCTAGTCTACAGTAATTTAAGCTCGTTTAGGCATTGCAGCCGACGTCCATGCATTTCGT CGTAGGCTGAGGCCACCGTGCAGCCCGCTAAAGCGAGCACCCTCATGGTCCATTCGAAGAATCTGAATTTGAGCGACTTTAATCTGGAATTAGATGACGAAGTGGTTTCCAAAC GCCTCAAAGGCTTAATCCTGATGGTACTATTCGTTGTATTTATCTATCCGGCGAAACTTGAGTGAGCGTGCAGACCTCGTCCTCCGAAGGTTGAGGGCGACTCACTAATTCTTG CTTTGCATGTTAGGGAACCCGGTCTAGTCGACCGGCCCGGAAGTCACGTTGGGCCCTGTACGGGATTTCTTGGACAGGACGTGGCACTTCGGTTGATCTATCTGTTAGTTCGAC GGATCCAACTGCCTTAGTGCCTGTAGGAGGTTTAATCATTTCCAAGCCTATGACGCTTTACCAGGTCCACGGTCAGCTAACAACGGCGGTCCGAGCGGAACGTAGTGACGACTA CAAGACAGGGAAGGGCCATAGCACTACCCTTTCATGATCAATACGGATCTATGTAAGATCAGGTGCACGGAACCAAGAACGGGTTGCCAGAATCGTCACGAAGGTGATGTAAGT GCAATTGACAGGATTTGTCGGTCTGGACGCAGGCTTTGAGCTATGCCTTTCAGCCCTAGGAAATCGGACTAAGTGGCCAAAGGACCCAGCGTAATTTGCGGTCGTTGACAACCT CAGCATGTAGTAACTTAAGAAGCGTCGGTATTTACGCAATTACGACAAGGAGCCAGCTGCTAGACAGAAACAAACGGGTGATTCGGAGTCAGGAGTCAAAGTACAGTTATGCAT GCGTCAACAGGCAATCTAAGTCTTTATCTTGAACTACGAGCTAACTAAAGAACCCAGTCTTGAGGAATGTTTATCCATGCGTGGACAGTCGCTTTGGAATTCACGCCTGTAACA AAGTCCAACTTGACCTATTATTGCGGGTGCAAGCCCTACGAGGATCATCCTGTTTACCGGCTGGACTAAAGTAGGGTCCTCGGACGGATGGGTTATTTCTTATCCAACAATGGT AGACAGTTGGCGGGTGGATAATCCGGGTTTCCCATAAGCAATGTCGCCGGGATAGCTCCTCCAAAGTCGGTGCTTTCACTAAGCGAAGGCTAACCGACAGTCCCGTTGCTCGCA GTACATCAACGTCCGGGCGGTCTTCAAAGGTTTCCTTAACGACCAGTTTGAGGTTTCGCTTGACCATTGACTCAGGGCGTTACCCTAAACTATCAAGTCACTAGCGGCGGTTGC CTTGCGTTTCTACCGTTTACCCAACCTGTTCGAAAGATGTCCTAATTCACTAGTAGGACTCGGTTACTGCACTGCGACATCAGTTCAGAACTTGGAACCGTATTATTTCGAGGC TTTCCGGGACTCCCTGAGCAATTCGTATTAATCGGAGGGAACTAGGTCGGCCCGTGCTTTACAGTAGATACTAGCTACGAAGGGTACCACGCTGGTCTATCATGTACCCTGGAT GACATCGGCCTTCTTAGCCGGATCGACCACCACCTCGCCTTGCCCTCACTTCGCAGTCGCAACCGTCTTCGCCCGGTCGCCAAGTAACCTACCACTCCACTTGGTTTGGTAAGG 
ACCGCTAAGTCGGGACGTCAACTAGCTTGCCAGGCCCACCGGTCTTTGGTACAAGCTTAATTCAGGAGGGAGGCCCTAGAAACATTAAGTGCCCTTGGACCTATGCAGAAAGAA ACAGCCTCCGACGATGAGGCGAATGACGTAGAATTCAGTCCTTATTGAAATTCATCTTAATACATTTCAAAGTGCCTAGTAGCTGCCCGCCTATGCTCGAAAGGGCGAGGCATG GCGTTGCTACTTGAGCCCAGGGAAAGATCAAGCTTCATCCCGATGTATTCTAGCGTGATCAGGGACCCAAGTCCACCGTACTGCCAAATCCGGTGATAACCCTCACGAAACCGC AGCGGCAATAGCAGCCGCTGATGTTGCCACCCACGTACAAGGTACGTCTTGCAACCTTTCTGACGACGCATCGCTGAAATCCTAGCGGAATCCACGGCTACGACTAACTTTAAG GCCGCCAGGTATCTTCAATTTGAATACGAGGCACTCGTATGGTCTGGGCCACCTCATCATGCATCGCCAACGTTCCATCCTGCCAGGGTCATGCCCTGACAGGCATGAATAACT CCGGTAGGGCCAGTATTAAACGACAATTTAATCGTTGTCAGTTACAATGGCTAACTGCCGGTGGATTAACCAGATTCGGTATGGTAAGTCGTCGGCCGGGCAAACAGGGTTACG GTGCTCAGTAGTACTGGCGGATTACAGTCAGCGTTATTCCGTTATTTAACTCAGTGGCGGAAGAAGATAGTCTTGCCGGGAAGACTACGCACGGCGATTTGGAGCAGCGGAAGC GGCCAAACGACGCCAGTTTACTGTCGACTGGCGAAATCACCTGTCATCGCAACGTTTGCCATTGGATTCTTAACCGGCATGCGATCAGTGCCAGGTTACCTAGTCTTAACTCGC AGCGTTGCCAAGTTTAATTGGACGGTGCCGTTGATAGGTAATGCTGGCCGTTTCCGGCGTTCGCCGTAGGCATGCTAGTCCTGGATCCCAATTGATGATCTTCGCATGTAACGG CGTTAGCACGATGAAGGGTGGGAAGATGACCGCACCGTTGTCTGTTAAACCGCCATCAGTGAAGGTAACTTGAAAGATAGCACCACCCTCCGTAGACTTTCGACCCAGACAATC GACTGTAGGTCCTAGGCCGCAAACTGCTTATCTGCTTGATGGGTACCGCTATTTGCCGACATTCAAGTCCTAAGGCGTTATCTAGGAATTGAATCTTAACGCCACGAGTCGAGT AGATCCTCATGTCCGTGAGGCTTACGAAGTTAAGTGGGTAGGACCATCCATCACGATACAAACTAATGCGGGACGATACTACGCCTGAGCTAGATCCAAACTACTCAGAATGAT AGTCAGTAGCAGCATCGGGTCTAACTGTTTCCGACCTCCGGGCAGTAGACGAATCTAAACCAGTTAGTACCCACGGATCAACATAGCGTTCGTCAATAAACGTTAAAGTACCAG AAGGACAGGCCGGTGCATCGGAAAGTAGCTTGGGACTAACATTGAACCCTATTCTTATGCAACGTAACCTTATGGTGACGTCGAGGCGAGCCGGGACCCGTTGACGGGTCTGCG TGCTATTTATGTTCCTTCCTCACTGTACCTTAGCCTGTTACAAGATCGCACGAACCCAACTGACCGATGCCGACCGAGGATTGGCTATGATAACGACAGAACCGTGGTCCGACG CCGGTCAGGGTGGCTGGTCAAAGCTAAGTAGTATGCAGTCATGTCAATGAATCGTTATGATCGGTCTTAATGACAACTGTTACGAGCCACGTTACTGAGTATGACTTAAGGGTC CGGAACGGGCAGGAACGCCAGCACGCCATAGAAGGTAGGTCTGGCCGCCGACGGGTAGGGTAGCCCTGCTAACTCAACCGGTCGGGAGTGGATTGGACTATTCCGAGGTGCCGA 
ATACCTATGAGTTGGGATTCCACTCATAAGGGTTCCGACAGGAAATTCCGGGTAGCGACAGCCAACCGCACTGCTACCCAAGATACGTAATCTGTTGGGCAGATGACTCATTTA TGAAATAAGTTCTAGCTTAGGGATCAAATTAGCCACCTATCGACGGTATCCATTACAAGCGTATGCTAAAGACAAGGGAACATTACGAGTACGAACTGGTGGCCGGACTGTATT CCAAGACGGGAAGCTATTATGGGCTGCTTTCGGATGTTGGACCACTTATCGTGCAGTGACTATTACAGGAACTACCATGATGCCCGATAACTCACGACTGGTACGACGGAATCT GCGTAACCAAGCGTTCTGCTAAACGGCTAGATACGCTGTTCAGCATCATACGTCCGTCGGTTTCCGCTTTCCCGCTACAAGGATTCCTCGTTAGTTTACGGAAGGCCAGGGAGG TTAATGTACGCAGTATTTGGTCGTCAAGAACAGGCCATTTGCCAGCTATCGGGAAACGCTGTACTCATTAAAGAAGGTCAATCGTCGATCTGTTCATTTCGGACGTGCTAGGGC AGTCCCATTGGGCACCGATGGATTTGAACTTTGAGTAATAGACTTGACTGGCTTTAGTGGTCTTACTGCGGCAAGCAGGATCAGGAAAGTTCGGAGCGGCTTCATTATGCGTGC GAGGTTTGATAACTTAGTGGGTGCGTGGCAGGCATCCAAGGTTGTTTGTACCTCCGTTCGTAACGTCGGGCACCAATCTGTAGGGCGTGAGTCCGACCCAACAAGTAGTGGGAA TTATGTAGGCACGTACTTCCTGCTACGGAACAAGTGGTAATTGTATCAACCAAAGCTGAGCGTAACTCGTCAGGAACCTGGGAACCGGTAATCTTGCGGACCGGTAGTACCGAA CCCTGCACCGAGCATCCGACTCGGAAGCTCGACTGACGTACGGTCTAATCAGCGTAGGTTGCTTACCGCACGTTTACTACGTATTGCGTTCTTCAGTCAGATTTCCCGGTTGTT GTTGCTTGGTCTACTTCTAGGCTCCTATTTGGCGGTCGCTTATGATTACCTTACTTAATTGCTAGTGGAAGCGAACGTTACAGTTCGGTCCCGCTCATAGTCCGTTCTATCCTG AGCCAGGGCAGAACTGCGGACGAAAGCTTACCTCCACGAACTAGCAGGCGTCAATTAGTAGCGTGGGCCAACTTCTGGGCAGTTCCAGGCAAGTTCCTACGCCGATTACAATTT CTTCCCTATTAGTTCCGCAGGCCTCATTAGATTATTTATTAATTCGGCTTCGAGTACCGCATGTCCACCATCGACTCCGTTACCGATTATTGTATTATGAGGTTACAGCTGGAT TGACAAGGTGGCATCCTCCGGTCCTAACGTGAAGAAGCCGTTGGGAATACTTCGCTATTGTTTCGTTCCCACGCAATCCTACATGGGATGTTTACTCAGGT

r = 7 (with the restriction of GC content of 40 to 60% at sliding windows of 10 bases) 13,402 bases

TTCAGTGGAGCTCAATGGTACTCGCAAATCAAGGCCGTATTGCTGGGTGATTATCGGCTAGATAACCGAAGTAGGCCTGTCTACTCGGCACTTTCGTGCTCCTATCTGTCCCTC AGAACGTGATCGGTTGGGTTCGTGACGCATCATTACCGTCATCAAGACTGCGTGAGTGATCAACAGGTAACATGTCCCGTGAACATCTGGCAGCATCAAACGCTGTATGGCTGG GTATTCTGCGGACTAGATTGCACGCCTTTAATGGCGGTTCATTCCGCCACTTTGAACGTGCACCATCCAACGACGAGGATTGTACAGGAAAGGTCATGGGTCTGACTACCTCAT 
CGTCGTCGAAAGTTGCTCCCATGATCGGGCAATTCTACCGGACAGAAATGGCCAATCCTGATGATTCCAGGGCAAAGGACATCAGGCATACAATGGGACAAACTTCGTGCCATC TTGTAGGGCTATGAAGTCCAATCCGAGTGGTTCCTCGAAACTGCTCCGTTAAACTGCGGAAGATAGCTGAACCTACGGATTGAGGCGGATTTAACCGCCGATTAGCTCCCGAAT TTGGGTCTTACGTGAAGACTCGGCTTGAGTTGCTACGAGTTACGTACCGTTCGGTAGGCAAGGTGAGTTTAGGTGAGGACTCCACTATGGTACCCTGAGCAAAGATGCCATCCG TAAACATGGGCTAAGATAGGCTTACAACGCCGAACATCAGTGATCTTGAGGGCTGAAATTGGCACTGTACCGCTGTAGACAACGATTCCTGAAGGCTCGAATTTCGGCCAGATT TGCACCGTGATACCGTGCTTTCGGAGGCATTAAGCCCGCTTTATTCCGCAAGTGAATCGTAGTCTGAGGTTACCATTCCACGACGAACATAGACGGGTGAAGAAACGAGGGAGT ACAATGCAGTTCTGCGACCTTATTCCAGCTGCATTAACCGTCCATGTCCAGGAGTGACCTATTGCCGCTGTTTCTGCAGTGGAATAGGGTCATAGGTCTAGATCCCAGGACAAT CCGCAGGTAATACGACGGCATTTGGATGCGGATCAACGGGCATAATAGGGCGACATTCGAAGACTGTTCACCCTTCTATTGGCAGGTTGTCCCTGTCTGTTTCCGACCCATAGC ACTGAAGATGTCGTAGAAACCTCAACGGTCTAGTTCACGACAACATCGCCCAAGTTTGAGCTACCTGAGCCATGCACTTAGAACCCGCATGAAGTACGTGGGTCATCTAGTGGA TCGATTCGACATGGGAACAATGTCGATCTGAGGCTTTAAGTCCCTACAGATTCCCTTTGAGGGAGCATTCTGCTCAGTGACCCTGAATCTACCACGACATTTGCGTACGTATCC TCACTCCATGTTCGCATAGTGCTGAAGCAGTTACGGACAGTATCTGAGCTATTGAGTGCTCCAGAATCGTCGGGATAGGACCGTTTCGTAGCCGTAGAAGTCGCCAGATCTTCT GACCGTTCCTGATTGACCTCACCAGATCGTATGCTCAAGAACAGACGACCATGTCAATGCGACATACGACTAGGACGGTATTGTTGCGAGGTTCTACGTACTAGCAACAAGATC GATGATCTAGCACTACCTAACAGTACCACCGTTGTTATGCGTCCCTAATGAGCAGTGGTTAGCATACCCGAATGCTTCCAGGACGTAGCTCCTCCTATTACGAGGAAGCACCAA TAACCTCCAGCTACGACGACAGACAAACCAAGCTTAGACCTCGCTAGTTAGACCGGACTTCTGGGCTGATCTTTGGTGGACAGACCTCCTAGTGCAATGACCCGATGAGTGACA GATCCACCGAGTTCGAGGGAACAGTGAAGCTTGTCCATCTAAGTCCTAGCTGGTCGTTAGGATTGCCATAGTTACCGGATTTGCTACCATTACGAAGCAGCATTTACCTGGAAA CTTGCTCAATCACGCCAAGACTCCAATCTTGGACCTTACCATCCCTCACTTAACGGGCCAATTCGCTTCTGTCATGCCGATGGATCGCATTACCAAGCGATAGAAAGGCTTTGT TACGGGACCATTCGGACATCCTCGGTCTACCTCGGATGATTTCGCACCGTAAGGTAGGATTTGGGAGGTGAACCATCAAGCATGCGTGAACGAAACCTGGGCTTTGCTTTCCTA CTCAGACGCTGAAAGGATCTGGGTGCTATTCACTGCACTTTACCGCCGTTAATTCGCCGGATTAGTCACGACGTCAAATGAGCCGAACTAGCTTCAGGTTGCGTCTATCTAGGG 
TTCGCAAGTATGTCCAAGGGTCTGGTTCCCACTTACTACGACCCTAGTAACGTGCTGCTAATCTGTCGCCTTGACAGGTTGACTCACGTGGTATGATCTGCGATGTCCACTACC GTAAAGGTTGGGCTACCTTCCGGATCCATGGGTATCCATAGCCATGGACAGCGTAATACTGCCCTGATTTCTGGGACTTGCACGAGTAGGTTAGACGTCGGTTACGGTTGGCAA GCAATGCGGTGCTAACAAGCCTTGCTTAGCGTTACTCGCTTGAAATCGCACGGAATGTAACCGTATCCACGGACAAGTCATGTCCGTTACCTCCCGATACGAGCTTCATCTTGG GATGCTAAAGCGAACGACAATCTACGTGAGGGTTGCAATCGCATCTTAGTCCCTTAATCTGCGTCCTTGTTACTCCAGGAACTTCGCAACCGAAATGTAGCCGCATGTAACAGC AGGGTAATCTAGCGGGTATCATCGTGGTTAACGGTCGCATAAGCCGCCTTAATAGCCGTGCTAAGACAGCACGATTTAGCTGGCTGTAAACGAGTGCGTAAATCGCTGGCATTG CATCCATGACGCTCACTATTCGGTGCTGATTCAACCGAGTATGTACGCACCTGTTCGCTATTCGAACCCGATAGCTTTGGCTGCACTACTGTTGGGCAGACATAAGCTGCACGT ATCTACAGGCGTAAAGTTCGCCGATCAACCACGCAGAAGAAAGCCCTAGCTTGTAGCGGGAACTAGGTTGCATAAGGTTCCATACCATGACATGCACGTGGATTACGCAGTGCA ACCGTCTGTCAGAAACTCGGCCTAATTTCCGCCCTTATGGAAGGGCAACAAAGACGCCTCATGAAGACAGTACATGCAACAGTCGTAGGACTGCATCACGACTATCGGCACCTT TCACGAAGACATCACCGGTTTCTTGGAGCTATCAGATGGGACGAGTATTGAGGGTATCGACTTCGAACTGATGCGGTATCTAACCCGTGAGTAATGGACCCGTTACTGTAGGCG ACTATGAGGGAAGAAATGCGGAGTTTATCGCCGCAAATAACCGCATAAAGCTCCAGGTTAGCCGTCCAATTCCAGTGACTTTCCGTTAGCCAGCAGATTTCGTCACCTTATCAC TCGCTATCTGGTAGAACTCGACCTGTCTTTACGTGGTTTCGATCCGTACAGACATCCGCACTCATGGGCAACTGTTACGTCCGCAAAGACTCACTCAGCGTTGAACATGCCCAG ACTTGGCCTCACTAAAGGGAGTGCATTGAACCTCACTGCTAGCTCGTTAAAGCTGCCGAAACAGTCCGAACAGATGAGTCATGAACTGACTGTACGATAGGTAACCGCTAACAG CCTCCATGATGCACCTAGGATAGACCTGCTATGCAGGGCATACTTCGTCATCTGCGGTCAATTGCGGCACTAATGTCGGATCCTCAACCTATGGGACTCCTTCTTTCAGGCCCT AAGAAGCGTGCAGTTGCTGTTGGACGTATTTGGGCTGCTATCGTTTGCAGCTCAGTTATCCTGGCTTATCTTGCCGCATAGACCGCCAATTATCCCGCTCATAGATCGTGCCTG AATAGGCTCATGTATGGACCGTCAAGGAAGCCTAGATTTGGCCCTACTTTCCCATACCGTAGGCTTTCGTTGCTTCGCTACCGTTTAGATGCGGGATTTCTACCCAGGTCTTAG AAGGACAGGCATGTCGTCATTTCCAGCGAATGACATCGGCTTTAGAACGCTCCTAGACATGGCCAGTGATTGTCACGCTACTCCAACGGTTGTAATGGCTCCCAATAAGGGAGC CTAAACCACCGACTTACGATGGACGTGCAAACCCGTACTGATGGCTTCTTCGTCTTAGGTTGGTGCAGTACGTCCCAGTAGCATCATCACCTAGACGTGGCATTCGTGATCCAT 
TCCGGACATTTACGGCTTCGATGAAGAACGCCTGATAATGCGTGGGTATGGCACCTCATAACAGGGAGTCTGTCTGAGCGAGTTCCAACATCATCCTTGATGTACGGCCATCAT TTGGCAACATGCATCCCGTAGACATCTGAACCGAATCAGGTACCCATACGTGGCGAATACATGCCGGATGTAACTCCGAGCAACAGACCAGCACTGTTTACCACGCCATACGAG GTATTAGGGTGCGTTTAACAGCGTGGATCATTCCCTGGCTATCGCATACCGAGGCATAACGGGATCGTAAAGCCACGTTATGACGCCGTAAGATTGCCCTCATCTACGATCACT TGGTTGAGTCCCTGAAATGGTGCTAGGTCTTTGATCGTCACTAGTTCTGGCTGAGTCACGTTGAAAGCCAAGGTTTCAGGTCTGATACAGACGTGCTTGTTGGTCTGGCAATTT CGCTGTACATAGCCTTGGCAATACCCTGGATCTTACTGCATGAAAGCGGCTTGTATGGGCGAGTTATGGTGGTAACAACCTCCTGCTACTCGAGGCTTAGGAAGTCATCGTTCC GGTACCTTAAGCGAGGGTAAGCCACCTAGTATCCTGTAGGTCTGGGTTCCAAGCCGTTAGACTCCTGTAAGCATCACCCTCAGTACAGCAGCGTACTTTCGCTAAGTAGTCCTC GCATTTAGACGCCCAAAGAAAGGGTGGGTAACCCAGTTGGACCGAAGAACTCCGTACGATTTCCTTGGGACTACCATGGAACATCGTTGGGTAGTTCTAGGCCTCAAGTGCCTA GTGGTCTGCTAAGTGACCGATTGTTTGCCTGCATTCAGGAGTTGAGGCTATCCGAGCGTATGGATTCGGATGCGAATGCGAAAGCGAGCCAAACATTCGGCGACAAATACCCGG ACAAATTCCCAGGGTTTCCTCAGGTCACGATGGGATGTCGAGCGTTGTCCTCCTGTTTAGCACTCGCATCACTGTAGTCCATTGAAGTCGTTCATGGACCAGAAAGGTACAGGC AGGTACTGAATGGTTCGGAACCCTTTACTGTCGTGCTTATCAGTGGCAACCTCATGTCGCTACTTGCATCGTTTAGGGCAGCTATGCTTCTGGCACCATAGCGGTGCATTCGAT CACCTTTGCATGGGATCCTGTTCCTACAACATGGTTCCAGAAACCCGGTCTTTATGGGTGGGATTCATGCCAGTTTACCCGCCTATTGACTCCCATCCACTCGTCTAGCATTGA GGAGGCAATTACCTCGACAGTGACGGACTCAGAATCACTCCCGTTGTATCACCATGGCACTCAGTGGGTTCTTATGGCTCGATAGTGCCCATCTACAACGTTCCACCACGAACG ATCGCCAAACCTCCGAGTACATCGTGCATTTCGCCCTGTATTAGCGGCTGAATGAGGCAAAGCATCCTGCTCAACCGTGGAAGTTCACTGACGTTTCGACCTTGGATCTACATG CTACCCTAGCCAACGTAGGTAAGCTCCCAGTTCCATTGGCCTATTTCACGGCTTAGTGCGGTCTTGAGTCAGCAATGGGCTTATTCTGGGTCCAAAGGCGAACGTTATCCAGAC TGAGGTGAGCATGATACTCGAACGAGTCACTCGTGGAGTACGATCGACGAATCAAGTCGTGCACTAAACCTGCACGAACAAGTACCCGCAACTTAGTCGGCGTAAGAACGAGCA TGCATTTGCAGTGCTTAACCCATGTACAAGCGTGGTCATCATCGGACCTACTGTAACCTGTACTCATGACGTCTGATTAGCGTCTTCCTACGGCAAGCTACCAAAGGTATGGTT ACGTTGCGATCTTCCAGACCGAACAAAGTCGCTGCAAGATCTAGGTTCTGTATGCTTGTCAAGACAGATGTAGTGAGTAGTCATCCCTGTACGTGATAGACTCAGGCAGATTCT 
TGGGCAAATGACCTCGTGAAGTCTGCGTATCCCTGCAACTAGCATCGACAAAGAACGGATTCGCTAGCAATTCGTTGGCCTTGCATTGGGTGACATTACCCAGTGAAACCCATT CGACTTGTAGACGCTCCAATGTCAGCCATGAGGATTCCCGATGGTACATGATGGACCTCAATTCAGCGGAAGTAGACCTTCCCATTCTACTGGTCGCAACTCCATTGTCCACGA TTAGTGACCAGACGTAAACCAGCCTAGTACAAGGGACATTCCTCAAGGTAACTCACTGAATCCTGGTCTAATGACGGCTGTAGTAGCGGCATCTAGAACGTAGCCATAGATAGC CCGTTCTATTCCCTCCTTCAGTAACCGGCTACTTCACTCGAGCTTAATCCTCCCGTATGTTCTGGATTCTAGGGCGTACTATTGCGGTTGTCAATCGAACAGTTACCCGGTTTG ATCCTCCTCCATTTCCGGCTAATGTTCCGGGTTACAAAGGCCACGATACGCTTGCAAACTGTCGGCAAGTCACGCCTACTGATCGACCTCGATGGAGCTGGATTATGGTCCTGC TTACGTTACCGTTACAGGGTAGTCGTTGCAGTCGCAAAGCGACGCTTTGGAGTGGCTATTTCGGGTTTATGGCGGCTAAACTACGGACGTGATGGAGTCGCTTACATGGTGCGA TAACGCTGCATCTTGATCCCTCCACTTCGGACGTCTTTCTGCTGCGAAGACCTTTGATGAGGACCTGATCATCCATCCGGTTATGAACCCATCATCTTCGCTGCTAGTCCAGCA GTTCGGATTCCGTACTTGCTAGGGAATTCAGACCGTCTACTAAGCACCGGTATCCTACGCCTTGGAATACCTACCTGGTACGATGTCGGTTCGTCTACCGATGCGTACTCGTAC CGAGTGCAAAGTGCGTGATCTGGACCCTTAGTACGGTCGTTGATTCACGTCTTGGGTAAAGGACCTCCAAGTCCGACGTTCAGACTCAATGTCTGACCTACGCAACGCATACAT CAGCATACTCCGGATTGCTTCCCGAAGATCAAGGTGCGAGTTTGACGCAAACTATCGCCACCTTTATCGAGGCACTAGCTACCCATCGCTATGTTGAGCCTTAGTGGGACAGTT CGCAGACCTTGACTGTCTGCTTCTTTGCAGACAATTGGCTTGTCGAGTCAAATCGCCATGAATGCTGGCCAAAGTGACGCTTCCTAGGGTGAGTATTCGTCAGGCTTGCAGTAT GACAGTGGATTGTTGTCCAAACGAACTCCCAACGTTTGTCGGGAACATAACGCAGCGATACCACCCGTAATAAGGCGAGCAATAGACCCGGATAAACCTCGGGTTATCATCCCA CGAAATGGGTCAAAGGAACGCTAGCCATCTAGGATGGTCCACTCAAAGTCCAGACAGCTCGTCAGTAGATCAGTCTAGTCGATCCTGGATGCTCCATAATGCTCGGGTAAGTGC GAAGGTCAAACGGAACGCAACAGCTACCGACTCAGTTCCCAAACTTACGCAATCTGCAGGCTAGACAGCCCATAAGCAGCGGTTTCCATGAGCATTGGTTACCGCACGAATCTT CGAGTTGCCTTCACTAGACAAGTGCTTTGTAGCCTTTAGATCCGGGATTCTTTCGCCAAGCATCGCTGTCGATTTCGACAGAATTGGTGGCCTAAAGTGGTCGCTAGACCAATC TGGGCCATTGTTGACCACCCTACTCCTCGATTCCGCTCCTTAGCATGCCCTTTAACCTGCGTAGATAAGGCTAGGAGTTTCGCAATGATGAGCGATAAGCGTCCAAGTTAGTCG TCAGAATTCGGCCGTAATTTGGCGGAGTAAGGGCACGTTTGAGTTCCGTCGTCTATTCGCAGTGATAGGAAGGAGCTTCCTGCTGCAATTCACTCCGCTAGTGATACGCCGTGA 
TTCAGCTGCGATTCTAAGCGAAGTCTTGCATAGTCAAGCGTTAACTGACGGGATTGGTTTCCGGTGACTTCTTCCAAGTGGTACCTCCACCATCAGATTCGCCCACTTTCCAAG ACCCTCCAAACAACGAGCGAACTCACGAAACAACCAGGTCATAAGGAGGAAATACCTGGCACTACGAGCCAGATAACTCCCGGAATAGCCTAGGGATAAGACGCAGTAGGACCC TAACTCCTCCCTAAGGGATAGCGTGCGAATTCCAACCCTATCAAAGCCGGTCAAATAGGGACCCAATTACAGGCCCATAGAAACGCTCATTCTAGCGTAAGCGTACAAGACCTC AGCATCCGTGATCAGTTGTCTTGGTCACCGAACCTTTCGTATCCGCTTGCTTTGGTCCGCTTTGAATGCGTTTCCGTGGGTTAGTGCTTGAGCCAGGTTCTTCTTGGTGCCTGT TCATCGCCTTCTAGGACCTATCCTTCTGTAAGGCTTCGCATGGTAAAGCTTGCGGTACTACGGTCACGTATTCCGTCAACGAGTAAGCGGGACAATTACGCCATGCTAGGCCAT CCTTAACCTAGCTCATACGTTCCCGTACAAATGCGTCGAGTAATCTGGAGTGCTGTAACATCGGATACAGGCTAAGGTTTGCTTGACATCCATTGCGGAGGATAAAGCAGGTTC GTTATGGGAGGACTTGAAGCATGGTTATGCATCGCAATACTACGCACGTCATTGTAGGCCGTTTGTTGGAGGCAGTTTGAAGCCCTGAGTAAAGCACCCGTTTAATCCCGAACG TAATCGACGCCTTACTATGCCCAATACGCAAGTTCTGACTTCGGGTCTGTAATCCTGTCCGATAGGGATGTTAGGGAACCTGACTAAACGCCACGAATAACGGTGACAGTAGAC TGTAGAAGACGAGGTGATTGGTGAGTGGTGGTTACTCATCCCGAGTCATTGACGAGTTCAGGCGATCAGAAGTGATGACTCCAGTATGCATGACTAAGCTGGGTCTACGTCTTA CTTCCCTATGGTTCTGCCTCGTACAACCCTGTCCAATAGTCCTGAGTTAGTGGAACAACGTCGTCAAAGTTCCCTTGGTCTTCCGTCTGACAAAGCTTCCCAGTCCATGCCTCA GTTGCGTGGACTTATCCGAAGGTTAGGGCTCGTTTGAACAGCGAGCTATGGCCACTAATACGGACTAAATGGCTGTCGTTAAGACTGGAGGATGAACCAGGAAGGTTGATGTCA CTACTACCCGTCATGTAGCAGCTGTTTGACTGCCGTTGAAGGGATTTGATGGCAAAGTAGTGGCATGCAAGCTCAAGTAAGGAGTACTGACCGACAATTCCTGCCGATTTCTTG CGGCTTCTACCTGTCGTATCTTGACCTGTTAGTTCGACGGTCATCCTACATCCTGGGTTGACGATAGTAGTCGGGTTGTCTGATCTGTACCCTCCTACTACTCGCCATCACTAC CAGCTCATGCCTACATAAGGCACTCCTTTCGAGTCGAACGCTTAACAGTGGCGTAGTTGACGCCTATCATAGCGTACGACAAGACGACTTTCATCCGCCATGTTATCCGCCCAA TTGAGCCGGTATTTACCGGTCCTTAATGGAGCGACTAATTCGGAGTCAGATTGACGTACGGTGAATAACCCTCCGATTTGGACTGTCCTCGACTATTCCTGTCGGTATCGTTGC GGATAGTCTGGTACCGAATTGATGGGCTCACTTTCAGTCGGCATCAGACTGTCAGCGTGAAGCACGCTCAAAGCAGTAGTTACGAGCAGTCACCAGGACTGTAATGCCAACGCT ATTGCACCCGATTAACCCTGCGATACAGCCCGATTGATAGCGGAACAGAAAGCGTGAGGTACATTGACCCGTATCTGCTTGAATCCGCTCGAAATGAGTGGCAGAAATTCCGGC 
GTTAATACGCCCGAATAATCGCCCGTATTCACGGTAAACCGTGCAACATACCGGGTGATAGCATGACCGAATGAGTTGGCGTTCACTTCCATTCACCTACCACCTCAAACCTAG GCGAATGTACGTCTACGCTAACTCGATCTAAGCAGAAGTTGGCCCATTATTGGCGTCAGATAGTCCCGGTTAGAATGCCGAAGTTAGGTGGGAGTAATTCCGACAGCAAAGTTG TCCGAGCTTTCCCGAGGAATCTTGTCCTTGAACAAGCTCGCACTACAAACGACTGAGTACCTACATGTTGGTGATTCCGGTTTAGCTACGGGTACATAACCCAAGGTACGAATC GTGAGGAACTGCCAGGAATGACCGGATCAGATCCCGATCTACGGTTCACGTTAGATCTGCCCTTGTTCTTCCGGCAAACTTTGGGATACCATCTGCTACGCTCAGGATCACCGA CGAAACTACCACTGACATGAACCTTCTTCTAGCCCGAACTTCATCCAAGTACAACGAACGGGTTGATAGTGGCTCCTAAATCCACCTATTAGCCACCGATTACAAGGCAATGAG TAGCCTATGCATTCCATAGGAGTCGGAAGCTACAACTTGGTCCCAATCCAGGCATTGGCAAACGTCGCCATTTGGTGCGGAAACTGAACGCACTAGAAGCAAGACATAGGACAG CCTTGAAAGACCATGCCAAACGTAGTGAATGCCCACTATTACCCGTGCATAGATGTCCCAGCTATTTGCCAGGCAATCGTTCTGGTTTCATCAGCGAACCAACTTGTCGGCTTC AGTCCGGATGACATACCTAGTCCTAAGTTCGTCCTCGTTAGCGAGCGATTGGGTTGGTTGCTGGAAGGTGCTGGATGTACCGATCGAACGTGGAAAGACATGCGTATGTAACGC ACGATAAGTGCCCTATCCATGCAGCACTTAAGACGGTAGGTTTCCCAACTTTAGCGGATGGTTAGGCCTTCAAAGGCTCCTGATCGGATCATAGGCACCTAAGTGGCCCAAACA GAACGGTGCTCATCGAGTCTGCACCTTACTGGATCCTTGTAGTTCCAGGCTGATTACTGGGCGTTTCAGAACCCTCGATCATCAACGCAGGAACAAACTCCAGCCTTCTTACCT GACCATTGGATAGCTAGCCGTTTCCAATGCGTAGCGATCTGTTACCGACGCAATAAGTCGCATGAGTTTGGAAGGACGAGCAAGAACGTCACCATCTATCGGTAAGTCTAGGCA GCAGTAACAGGCGGTATGAAGGACCCACTAGCACGTGATTAGGGCGGTTTATTGCCCACGTTTCCCTTACCTTCTGCAATGTAGACCACGTCAACATTGCCACCGTATGACGTA GTAGATGCTTCGGAAATCTTGCGAGCACTAGTCAACAGCCCTTTCCATCGCACTGTCTATCGATCGCTTATGAAGCCTCCTAAGATCGGTCCTGTTGCCTAGGACAAAGGAGGC CATAATACCGCTAGGATTAGGCTTGGGAAGTTGTTCGACCGACTAGCTGACGATTGGCTACAGACTGCCAACAACGCAAAGGGTATGTATCCCGGTGAAATGCCTAGCGATTCG CAAACATGCGATGATTACCGCTCAGTAGCCCTTCAGACGATAACCTTGCGTTTGCTGGCAGATGCTCAGAAAGTCCGCTCAACTGAAGTGCTCAGCTTGCTCCTGCATCGAGCA GCAATTTGGTCGGATTAACTCGGGCATTGATACGGCTGCATAAACGACCCAGTATCGAACCGCTTCGAAAGCTGAAGGTCGTAAGGCCATGATAGCCGACGTATGCCAATCATT GGCTCCAGTTTGTACCAGCAAGTCTATGGTCACGGAACCTAAAGGTCGGAGTAGTTCAGTCAGCGAGTCTAGAAGTGGCTTGGTGACAAGGCCTGACAATACGAGTCGTACTAC 
CAACGCCTTCATGGTGGCAGTAGCTATGAGTCCAAGCAGCTTCGTCCATAGTCCAACCATCGATAAGGACTCGGAAGGATGTTGAAGCTGACCAGTGAGCAGCTAGAACAACCC ACCATTAGGCAGGGAAATGTTGGCCGTTATCAGCAGTCTAAGGTGGGTGAAACTGGTCTTGCGTGCATCTGGGAAATCACGAGTCCATCCAGTGCATCCTCAGTGCACCTGACA GCTTTGCAACGCCATCGAAAGACTAGCCTCGTTCAGGTGGCATAAGTCAGGGTTCAGTAGTCTTCTGTTGAGTACGTTTCACGCCGTTTACTACCGCTTAAACCCAGCGTATTC TACGCCCATTTAAGGGCTTGCTATTGGTCAGATGCGTGCTACAGTTGACTGGACCACTGTAAAGCGTTGCCATGACTGACTAGGGAGTTGTCAGTCCTTGCGATTACCTAGGGC AAGAAGCTGGCAACTAACGCCTAGCTAATGGCAGGAGTATCCGTATGCAGACGTTACGGCATCCTATCGACCAGATTCATCGGTGAACGTACGAACTACGAACGCAGTTGTACC CACTTCACCTTCTACTAGCGTCCTCAAATCAGCTCGGTTTGGTCATGTTTCGCTTGTTTCCTGCAGCTACTAACTGGGCCTTAACTACCGGGAAATAACGCCCTTGAATACCGA AGCCAACCACTACAGGTCCCTTTCGGTCGGTTTCAACCTCGAAGGTATTGGTTCCGCATACTAGACGGAGGAACATGATCCTTCGTAGACTAAGGTCAGCCTTAAAGGAGCCCA ATGACAACGGATCTGCACTCACTTCAACTGGCTGCTTTCACCTGTATGTAGCGTCAGCATTGCCGAGCTAAAGACGGGACTTTAAGCCTGGTATTGACAGCCGAATACCACTCC GATCCAATCAGTCGCTAAATGAGGGTCAGTTTGGTACAGCCTGCTAACTGCGAGTCAGTAAGCCGGATAGAAGCTCAGCGATCCTACTTGAACGGGAAGTCTACCCGACATGTA TCGAGTACTTGACGCTGGATCAATGAGGTCGCTTCAATCGCCTCGAATGAAGGGTTGAACTCCTTGGTTAGTCTGCCGTATGGTAGGGCATCATACGCCTAATAAGCGGCCTTT ACATCGGTTAACTTGCCATTGGTCCAATGAACGGAAGGCTAAAGGATGCGTTAAGGTTGCTCACGGTTAGCTGCGTCATTCTGGTGCTGTTCGTTCTACTTGGTAGGCTGAAGT AACCCTTGCTAAGCGGAGGTATCCGGTGGAAACATCGACGGATAATGGCCTGCAATAACGTCCTAATGCTTGCACCTCCTTGCACTTGCCTTAAGATGGGTGCATACCAAGGCA TCAACTCAGATGATCCCACTCCTAACCCTAAGCTAGACTACCGAGCAGATCGGACTACGTAAGT

REFERENCES
1. Zuker, M. 2003. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31:3406-3415.
2. Rozen, S., and H.J. Skaletsky. 2000. Primer3 on the WWW for general users and for biologist programmers. Bioinformatics Methods and Protocols Vol. 132, p. 365-386. Humana Press, New York, NY.
3. SantaLucia Jr, J., and D. Hicks. 2004. The thermodynamics of DNA structural motifs. Annu. Rev. Biophys. Biomol. Struct. 33:415-440.
4. Tuma, R.S., M.P. Beaudet, X. Jin, L.J. Jones, C.Y. Cheung, S. Yue, and V.L. Singer. 1999. Characterization of SYBR Gold Nucleic Acid Gel Stain: a dye optimized for use with 300-nm ultraviolet transilluminators. Anal. Biochem. 268:278-288.