Supplementary Material

Supplementary Materials
GENESUS: A two-step sequence design program for DNA nanostructure self-assembly
Takanobu Tsutsumi1, Takeshi Asakawa1, Akemi Kanegami2, Takao Okada2, Tomoko Tahira1, and Kenshi Hayashi1
1 Division of Genome Analysis, Research Center for Genetic Information, Medical Institute of Bioregulation, Kyushu University.
2 Research Institute of Biomolecule Metrology Co., Ltd.
CONTENTS
GUSP algorithm
Seed pair tables
Maximum number of USPs obtainable by GUSP
Longest unique sequence pairs obtainable by modified GUSP
CESS algorithm
Design file
DPAL_AB
Output files
Designing a regular octahedron
Hinge length
Assembly of T5-hinged octahedron
Efficiency of assembly
Design and assembly of octahedron multimers
Strand sequences
References
GUSP algorithm
In GUSP, a specified number of unique segment pairs (USPs) of specified lengths are generated by
extending non-redundant (r + 1)-mer seed pair tiles with r-mer overlaps, and are evaluated by melting
temperature (Tm), as shown in Figure S1.
Figure S1. Flowchart of GUSP
“r-mer overlap” means that the last r-mer of the most recently picked seed pair matches the first r-mer of the next seed pair. The most
recently picked seed pair is identified by referring to the Picked Seed Pair List (PSPL). When the PSPL is empty, any r-mer is
accepted as the overlap. SPT: Seed pair table.
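The overlap rule can be illustrated with a minimal sketch. It grows a single segment one base at a time from unused (r + 1)-mer seeds and, on success, marks the picked seeds and their complements as used. The function names are hypothetical, and the Tm evaluation, the retry step and the removal of simple repeats are all omitted, so this is not the GUSP implementation itself.

```python
import random
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def all_seeds(r):
    """All (r + 1)-mers except self-complementary (palindromic) ones."""
    seeds = {"".join(p) for p in product("ACGT", repeat=r + 1)}
    return {s for s in seeds if s != reverse_complement(s)}


def extend_segment(length, r, unused, rng):
    """Grow one segment from unused seeds with r-mer overlaps.

    Returns the segment, or None when no usable seed overlaps the last
    r bases (no retry is attempted in this sketch).
    """
    picked = []      # plays the role of the Picked Seed Pair List (PSPL)
    segment = ""
    while len(segment) < length:
        candidates = sorted(
            s for s in unused
            if (not picked or s[:r] == segment[-r:])   # any r-mer if PSPL is empty
            and s not in picked
            and reverse_complement(s) not in picked
        )
        if not candidates:
            return None
        seed = rng.choice(candidates)
        picked.append(seed)
        segment = seed if len(picked) == 1 else segment + seed[-1]
    for seed in picked:                                # mark the seed pairs as used
        unused.discard(seed)
        unused.discard(reverse_complement(seed))
    return segment


pool = all_seeds(4)
print(extend_segment(16, 4, pool, random.Random(0)))
```

A segment of length l consumes l − r seed pairs, because the first seed contributes r + 1 bases and every later seed contributes one.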
Seed pair tables
A seed pair table is generated by a program, Seed Maker, which first makes a list of the integers 0 to (4^(r+1) − 1) as
quaternary numbers written in (r + 1) digits. The numbers are then converted into nucleotide
sequences; that is, 0, 1, 2, and 3 are converted into A, C, G, and T, respectively. This produces an exhaustive
list of non-redundant (r + 1)-mers, called seeds, in alphabetical order. After removal of palindromic seeds (for
odd-numbered r) and seeds containing simple repeats (homo- or co-polymeric tetranucleotides), the table is
arranged so that each seed and its complement are placed at the same relative positions in the top
and bottom halves of the table. This makes a seed pair table. Each seed has a usage status value of 0 for unused
or 1 for used. Seed pair tables for r = 4 to 7 and the source code of Seed Maker are available
at the GENESUS site (http://crane.gen.kyushu-u.ac.jp/genesus/).
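The construction described above can be sketched as follows. This is not the Seed Maker source; the function names are hypothetical, and a "simple repeat" is assumed here to mean any tetranucleotide of the form XYXY (which includes the homopolymer XXXX).

```python
BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def int_to_seed(n, length):
    """Write n as a quaternary number of `length` digits, mapped 0123 -> ACGT."""
    digits = []
    for _ in range(length):
        n, d = divmod(n, 4)
        digits.append(BASES[d])
    return "".join(reversed(digits))


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def has_simple_repeat(seq):
    # Assumed definition: any XYXY window (X == Y gives a homopolymer).
    return any(seq[i] == seq[i + 2] and seq[i + 1] == seq[i + 3]
               for i in range(len(seq) - 3))


def make_seed_pairs(r):
    """Return (seed, complement) pairs in alphabetical order of the seed."""
    length = r + 1
    pairs, seen = [], set()
    for n in range(4 ** length):
        seed = int_to_seed(n, length)
        partner = reverse_complement(seed)
        if seed == partner or has_simple_repeat(seed) or seed in seen:
            continue
        seen.update((seed, partner))
        pairs.append((seed, partner))
    return pairs


pairs = make_seed_pairs(4)
# Top half holds the seeds, bottom half their complements at the same offsets.
table = [p[0] for p in pairs] + [p[1] for p in pairs]
usage = [0] * len(table)   # 0 = unused, 1 = used
```

With this layout, the partner of the seed at index k in the top half is found at index k + len(pairs) in the bottom half.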
Maximum number of USPs obtainable by GUSP
The sizes of USP sets were determined by 1,000 GUSP trials in which the program was run without Tm range
restriction until no more USPs could be collected, for l from 16 to 25 and r from 4 to 7. The mean number of
generated USPs reached about 60 to 70% of the upper limit, with a narrow distribution (Table S1). Without the retry
step at the extension (see Figure S1), i.e., with the sudden death method, the number of obtainable USPs fell to less than
half (data not shown).
Table S1. Size of USP sets obtainable by GUSP
r     l       16      17      18      19      20      21      22      23      24      25
4     Limit   42      39      36      34      32      30      28      26      25      24
      Mean    29.6    27.3    25.3    23.6    22.1    20.8    19.6    18.5    17.6    16.7
      cv      0.028   0.028   0.029   0.029   0.030   0.029   0.031   0.032   0.032   0.033
      Max     32      30      28      25      24      23      21      20      19      18
      Rate    0.70    0.70    0.70    0.69    0.69    0.69    0.70    0.71    0.70    0.70
5     Limit   186     170     157     146     136     128     120     113     107     102
      Mean    121.8   111.7   103.1   95.7    89.5    83.9    78.9    74.6    70.8    67.3
      cv      0.014   0.015   0.015   0.016   0.015   0.015   0.016   0.016   0.016   0.016
      Max     127     116     107     100     93      87      83      78      74      70
      Rate    0.66    0.66    0.66    0.66    0.66    0.66    0.66    0.66    0.66    0.66
6     Limit   819     744     682     630     585     546     512     481     455     431
      Mean    512.4   465.4   426.1   393.1   364.9   340.6   319.1   300.3   283.4   268.3
      cv      0.009   0.009   0.010   0.011   0.011   0.011   0.012   0.012   0.013   0.014
      Max     526     479     437     403     375     351     328     310     292     276
      Rate    0.63    0.63    0.63    0.62    0.62    0.62    0.62    0.62    0.62    0.62
7     Limit   3640    3276    2978    2730    2520    2340    2184    2048    1927    1820
      Mean    2138.4  1920.0  1742.5  1595.0  1470.1  1363.7  1270.9  1189.0  1117.9  1054.6
      cv      0.006   0.007   0.007   0.008   0.008   0.010   0.010   0.011   0.012   0.012
      Max     2171    1957    1778    1627    1500    1393    1301    1218    1149    1082
      Rate    0.59    0.59    0.59    0.58    0.58    0.58    0.58    0.58    0.58    0.58
1,000 USP sets were generated by GUSP (l from 16 to 25, r from 4 to 7) without Tm range restriction. Limit is the USP count if all seed
pairs are used, calculated as 4^(r+1)/[2(l − r)]. Max is the largest number of USPs obtained. Rate is Mean divided by Limit.
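As a worked check of the Limit formula, the short sketch below reproduces the r = 4 row of Table S1. The function name is hypothetical, and rounding down to an integer is an assumption inferred from the tabulated values.

```python
def usp_limit(r, l):
    """Upper limit of USP count: 4^(r+1) / [2(l - r)], rounded down (assumed)."""
    return 4 ** (r + 1) // (2 * (l - r))


row_r4 = [usp_limit(4, l) for l in range(16, 26)]
print(row_r4)

# Rate is Mean divided by Limit, e.g. for r = 4, l = 16:
rate = 29.6 / usp_limit(4, 16)
print(round(rate, 2))
```

Each USP of length l uses l − r seed pairs, and 4^(r+1)/2 seed pairs exist before any seeds are removed, which is where the quotient comes from.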
Longest unique sequence pairs obtainable by modified GUSP
In the scaffold/staple design of DNA nanostructures that do not insert special sequences such as hinges at
the junctions, all that is needed is a long stretch of unique sequence without redundancy or symmetry.
We modified GUSP so that when extension stops because of seed pair exhaustion, the extension is
retried once by reversing the extension and resuming it using the alternative choice. After a
thousand trials for each of r = 5, 6 and 7, unique sequence pairs without redundancy or symmetry were obtained with
average lengths of 1,310 bp, 4,333 bp and 14,700 bp, respectively, when the GC content was restricted to 0 to 100% in a sliding
window of 10 bases, and 1,027 bp, 3,464 bp and 11,060 bp, respectively, when the GC content was restricted to 40 to 60% in a
sliding window of 10 bases, again with narrow size distributions (Figure S2, Table
S2). The longest unique sequence at r = 7 was 17,999 bp (“Strand sequences”), whose most stable local base
pairing is -13.1 kcal/mol, as estimated using Mfold (1). Its longest redundancy is 7 bases, which is much
shorter than the 42 bases of M13mp18.
Figure S2. Length distribution of single USP
Lengths of the longest USPs obtained in sudden death mode (SD mode) with the GC content restricted to 0 to 100% (green)
and 40 to 60% (red), and in alternative choice mode (AC mode) with the GC content restricted to
0 to 100% (yellow) and 40 to 60% (blue), in sliding windows of 10 bases, are
shown for r = 5 to 7 (a to c, respectively). A summary of these trials is presented in Table S2.
Table S2. Size of unique sequence pairs obtainable by GUSP
r                 5        5        6        6        7        7
GC content (%)    0-100    40-60    0-100    40-60    0-100    40-60
Limit             2053     2053     8198     8198     32775    32775
Max length (bp)   1510     1206     5177     4112     17999    13402
Mean (bp)         1310.2   1026.6   4333.2   3463.7   14698.9  11059.8
cv                0.064    0.078    0.092    0.090    0.094    0.106
Rate (%)          63.8     50.0     52.9     42.3     44.8     33.7
Limit is the USP length if all seed pairs are used. Max length is the maximum length of USP obtained by GUSP in 1,000 trials
of segment pair extension without limitation by l. r is the maximum length allowed for redundancy. Segment pair
finding is retried once when the extension is stopped by segment pair exhaustion. Rate is Mean divided by Limit.
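The Limit and Rate rows can be reproduced with a short calculation. The expression used here, 4^(r+1)/2 + r, is not stated in the text; it is an assumption that follows from chaining all 4^(r+1)/2 seed pairs with r-mer overlaps (the first seed contributes r + 1 bases, every later seed one base), and it agrees with the tabulated limits.

```python
def sequence_limit(r):
    """Length of a single sequence that uses every seed pair once (assumed formula)."""
    return 4 ** (r + 1) // 2 + r


def rate_percent(mean_length, r):
    """Rate (%) = Mean / Limit * 100."""
    return 100.0 * mean_length / sequence_limit(r)


for r, mean_bp in [(5, 1310.2), (6, 4333.2), (7, 14698.9)]:
    print(r, sequence_limit(r), round(rate_percent(mean_bp, r), 1))
```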
CESS algorithm
In CESS, USPs are allocated to helices, and a set of candidate strand sequences is produced by linking the USPs
(generated by GUSP) according to the strand definition (see “Design file”, Figure S4). The strand set is evaluated
by its worst aberrant pairing using a dynamic programming algorithm, “DPAL_AB”, which calculates the lowest
free energy of aberrant pairing (ΔGab,min) over all combinations of strands within the set (Figure S3). CESS then
picks the strand set whose ΔGab,min is the highest among the strand sets examined.
Figure S3. Flow chart of CESS
ΔGab,min is the free energy of the most stable aberrant pairing. DPAL_AB is a dynamic
programming algorithm modified to find ΔGab,min (see “DPAL_AB”). ΔGab,min,best is the ΔGab,min
of the best strand set.
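The selection rule of CESS (keep the strand set whose worst aberrant pairing is the least stable) can be sketched as a max-min choice. The scoring function below is a hypothetical stand-in for DPAL_AB that simply returns minus the length of the longest complementary run; it ignores intended pairings, and including each strand paired with itself is also an assumption.

```python
from itertools import combinations_with_replacement

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def toy_pair_energy(a, b):
    """Hypothetical score: minus the longest stretch of a complementary to b."""
    target = reverse_complement(b)
    best = 0
    for i in range(len(a)):
        for j in range(len(target)):
            k = 0
            while i + k < len(a) and j + k < len(target) and a[i + k] == target[j + k]:
                k += 1
            best = max(best, k)
    return -float(best)


def worst_aberrant_energy(strands, pair_energy):
    """Lowest (most stable) pairing energy over all strand pairs in the set."""
    return min(pair_energy(a, b)
               for a, b in combinations_with_replacement(strands, 2))


def pick_best_strand_set(candidates, pair_energy):
    """Choose the candidate set whose worst pairing energy is the highest."""
    return max(candidates, key=lambda s: worst_aberrant_energy(s, pair_energy))


candidates = [["AAAAAA", "TTTTTT"], ["ACAC", "CACA"]]
best_set = pick_best_strand_set(candidates, toy_pair_energy)
print(best_set, worst_aberrant_energy(best_set, toy_pair_energy))
```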
Design file
Helices are numbered, and the strand segments that constitute the helices are named by integers or primed integers
indicating the helix to which they belong. The design file consists of two parts, the helix definition and the strand
definition.
The helix definition defines the properties of each numbered helix group (HG). The properties of a
group are its helix length (HL) as an integer, its lowest and highest Tm (TL and TH) as rational numbers, and its group
members (GM) denoted by integer helix numbers.
An example of making a design file starting from a 3D sketch is shown in Figure S4. The helix definition of
the structure shown in this figure is
HG1=HL21/TL65/TH72/GM1-12
This helix definition means “helix group 1 has a helix length of 21 bp, the lower limit of Tm is 65 °C, the upper limit of
Tm is 72 °C, and the members of this group are helices 1 to 12.” Note that the Tm values are calculated by
the nearest neighbor method under arbitrary conditions (i.e., 50 mM Na+ and 50 nM DNA) (2).
They serve only as indicators of the relative stabilities of the chosen sequences and are likely to differ
from the values under the conditions of actual assembly.
The strand definition describes each strand sequence by segment names and by the junction sequences that
connect the segments. If a strand segment is split into two strands, the 5’-half is denoted by the segment name
followed by the letter “b” and the length in bases, while the 3’-half is denoted by the segment name followed by the letter “f” and
the length in bases. Junctions are indicated by their actual sequences, where a homopolymer stretch is denoted
by A, C, G or T followed by the length of the stretch. Segments and junctions are delimited by colons.
The definition of one of the staple strands, named S1 and drawn in green in Figure S4, is
S1=12'f11:T5:2':T5:4':T5:6'b10
This strand definition means “Strand S1 consists of the 3’-terminal 11 bases of the complementary strand of helix
12, connected to T5, connected to the complementary strand of helix 2, connected to T5, connected to the
complementary strand of helix 4, connected to T5, connected to the 5’-terminal 10 bases of the complementary
strand of helix 6.”
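To make the two definition formats concrete, a minimal parser is sketched below. It is not the GENESUS parser: the function names are hypothetical, the group member field is assumed to be a single number or a single range, and the S1 definition is used with T5 at every junction, as in the verbal description.

```python
import re


def parse_helix_group(line):
    """Parse e.g. 'HG1=HL21/TL65/TH72/GM1-12' into a dictionary."""
    name, body = line.split("=")
    fields = {f[:2]: f[2:] for f in body.split("/")}
    first, _, last = fields["GM"].partition("-")
    return {
        "group": int(name[2:]),
        "length": int(fields["HL"]),
        "tm_low": float(fields["TL"]),
        "tm_high": float(fields["TH"]),
        "members": list(range(int(first), int(last or first) + 1)),
    }


def parse_strand(line):
    """Parse e.g. S1=12'f11:T5:2' into a name and a list of parts."""
    name, body = line.split("=")
    parts = []
    for token in body.split(":"):
        if token[0].isdigit():   # segment: helix number, optional prime, optional half
            m = re.fullmatch(r"(\d+)('?)(?:([bf])(\d+))?", token)
            parts.append({"helix": int(m[1]), "complement": m[2] == "'",
                          "half": m[3], "bases": int(m[4]) if m[4] else None})
        else:                    # junction: expand homopolymer runs such as T5
            seq = re.sub(r"([ACGT])(\d+)", lambda h: h[1] * int(h[2]), token)
            parts.append({"junction": seq})
    return name, parts


def strand_length(parts, helix_lengths):
    total = 0
    for part in parts:
        if "junction" in part:
            total += len(part["junction"])
        else:
            total += part["bases"] or helix_lengths[part["helix"]]
    return total


group = parse_helix_group("HG1=HL21/TL65/TH72/GM1-12")
lengths = {h: group["length"] for h in group["members"]}
name, parts = parse_strand("S1=12'f11:T5:2':T5:4':T5:6'b10")
print(name, strand_length(parts, lengths))
```

The computed length, 11 + 5 + 21 + 5 + 21 + 5 + 10 bases, gives the size of a staple built from two half segments, two full segments and three T5 hinges.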
Figure S4. Scaffold/staple design of an octahedron and definition of strands
The scaffold strand (red) and the staple strands (black and green) that form a regular octahedron are shown, with the helices
numbered, in a 3D sketch (a) and in an unfolded diagram (b). In this example, the chirality of the octahedron
in a corresponds to the “convex” folding of b.
DPAL_AB
Coding of DPAL_AB was done by referring to Primer3 (2). The free energy of the most stable aberrant
pairing (ΔGab) between two strands (strand 1 and strand 2) is calculated as follows.
1) ΔGab(i, j) is the free energy of the secondary structure formed by aberrant binding of subsequences of
strand 1 (0 ≤ i < N1) and strand 2 (0 ≤ j < N2), allowing for up to a one-base gap or mismatch, where N1
and N2 are the lengths of strands 1 and 2, respectively.
2) ΔGab(k, l)(m, n) is the free energy of binding of the k-th and m-th bases of strand 1 with the l-th and n-th
bases of strand 2, respectively. ΔGab(k, l)(m, n) is obtained from the nearest neighbor table of
SantaLucia et al. (3).
3) If the pairing of the i-th and j-th bases is among the intended pairings, the calculation of ΔGab(i, j) is skipped.
4) If the (i-1)-th base of strand 1 matches the (j-1)-th base of strand 2, then
\[
\Delta G_{ab}(i,j) = \min
\begin{cases}
\Delta G_{ab}(i-1,j-1) + \Delta G_{ab}(i-1,j-1)(i,j) \\
\Delta G_{ab}(i-1,j-2) + \Delta G_{ab}(i-1,j-2)(i,j) \\
\Delta G_{ab}(i-2,j-1) + \Delta G_{ab}(i-2,j-1)(i,j)
\end{cases}
\tag{1}
\]
If the (i-1)-th base of strand 1 mismatches the (j-1)-th base of strand 2, then
\[
\Delta G_{ab}(i,j) = \min
\begin{cases}
\Delta G_{ab}(i-2,j-2) + \Delta G_{ab}(i-2,j-2)(i,j) \\
\Delta G_{ab}(i-1,j-2) + \Delta G_{ab}(i-1,j-2)(i,j) \\
\Delta G_{ab}(i-2,j-1) + \Delta G_{ab}(i-2,j-1)(i,j)
\end{cases}
\tag{2}
\]
5) \(\Delta G_{ab} = \min_{i,j} \Delta G_{ab}(i,j)\)
Output files
The output of GENESUS consists of four files: a strand sequence file, a usage status information file,
a USP set file and a history file for the chosen strand set. The strand sequence file lists the helix definitions followed by
the strand names and sequences. The history file contains the project name, the name of the previous project if
any, the name of the adopted seed pair list, the name of the design file, the free energy of the worst aberrant pairing of
the chosen strand set (ΔGab,min,best) and the time stamp at the completion of the project.
Designing a regular octahedron
The regular octahedron designed here was made of a long scaffold strand that passes through all edges of the
structure (Eulerian circuit) and four staple strands that hold the structure together (Figure S4). We chose a regular
octahedron because it resists collapse: its faces are regular triangles and all branch angles are fixed.
An octahedron can be traced by many other Eulerian circuits. In this study we chose the path that is
simple to draw on paper. The length of all edges (helices) was 21 bp, and 5-nucleotide hinges were inserted
at all junctions to bridge the most distant positions in two contacting helices (Figures S4 and S5, detailed below
in “Hinge length”).
T-stretches were used as hinges because T is the smallest of the four bases and is less likely than the other
bases to interfere with the assembly process. Also, an AT pair is less stable than a GC pair and is less likely to be
involved in fortuitous pairings. Homopolymeric stretches can act as sequence punctuation, since they are
excluded from the seed pair tables used in GUSP.
Each staple is designed to have two full segments in the middle and a half segment at each end, connected by
hinges. Because staples have only two full-length segments, topological problems during assembly are avoided.
Hinge length
A helix junction can be seen as two cylinders in contact at point C (Figure S5). The hinge length is approximated
from the distance (d) between the two nucleotides located at the exit and entrance points of the two helices
(arrow in the figure). The length of the hinge in nm was estimated by the following equation.
\[
d = r\left\{\left[(1-\cos\beta)\sin\theta\right]^{2} + \left[\cos\alpha + (1-\cos\beta)\cos\theta - 1\right]^{2} + (\sin\alpha - \sin\beta)^{2}\right\}^{1/2}
\tag{3}
\]
Here, r is the radius of the helix in nanometers (= 1 for a DNA double helix), α and β are the rotational angles of the
exit and entrance points of the helices measured from the contact point C, and θ is the angle between the helices
(Figure S5). The number of nucleotides required for the hinge is approximated by rounding up d/0.7, assuming θ
= 2π/3 for the octahedron. A hinge of five nucleotides is long enough to bridge the two contacting helices in the
octahedron at any rotational angles.
Figure S5. Helix rotation and hinge length
The distance (d) between the two nucleotides at the exit and entrance points of the two helices (arrow in
the figure) is determined by the rotational angles (α and β) of the two helices and the angle (θ)
between the terminal surfaces of the two helices.
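The claim that five nucleotides suffice at any rotational angle can be checked numerically with equation [3]. In the sketch below the two rotational angles are written alpha and beta and the inter-helix angle theta; these symbol names, the function names and the 5-degree scanning grid are choices made here for illustration.

```python
import math


def hinge_distance(alpha, beta, theta, r=1.0):
    """Distance d (nm) between exit and entrance nucleotides, equation [3]."""
    dz = (1 - math.cos(beta)) * math.sin(theta)
    dx = math.cos(alpha) + (1 - math.cos(beta)) * math.cos(theta) - 1
    dy = math.sin(alpha) - math.sin(beta)
    return r * math.sqrt(dz * dz + dx * dx + dy * dy)


def hinge_nucleotides(alpha, beta, theta, r=1.0, nm_per_nucleotide=0.7):
    """Number of hinge nucleotides: d / 0.7 rounded up."""
    return math.ceil(hinge_distance(alpha, beta, theta, r) / nm_per_nucleotide)


theta = 2 * math.pi / 3                      # value assumed for the octahedron
angles = [2 * math.pi * k / 72 for k in range(73)]
longest = max(hinge_nucleotides(a, b, theta) for a in angles for b in angles)
print(longest)
```

The largest distance on this grid occurs when both nucleotides lie opposite the contact point (alpha = beta = π), where d = √12 ≈ 3.46 nm.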
Assembly of T5-hinged octahedron
Oligonucleotides totaling 619 bases were designed using GENESUS (sequences in “Strand sequences”).
Strands were prepared essentially as described in “Strand preparations”. Assembly conditions were
established by annealing at various temperatures (Figure S6a). Optimal assembly was observed at
approximately 60 °C, where a single product band was seen and the bands of dissociated scaffold and
staples were absent. Assembly products from which some staple strands were omitted showed bands with
mobilities distinct from that of the complete assembly (Figure S6b), confirming that the product of the complete
strand mixture was composed of all strands.
Figure S6. Optimization and confirmation of T5-octahedron (T5-OCT) assembly
In a, strand mixtures were annealed at various temperatures. Lane 1: mixture of staple strands only; lane 2:
scaffold strand only. Lanes 3 to 7: strand mixtures annealed for 1 h at 30 °C, 43 °C, 52.5 °C, 62 °C
and 70 °C, respectively. Lane M: 100 bp ladder marker. In b, lanes 8 and 9 are the annealed products of the four
staples only and of the scaffold only, respectively. In lanes 10 to 13, the scaffold strand was annealed at 62 °C for 1 h with one staple, two
staples, three staples, and four staples, respectively. Lane M: 100 bp ladder marker.
Efficiency of assembly
The efficiency of assembly of the octahedron with T5 hinges was estimated by densitometric analysis of
electrophoretic bands stained with SYBR Gold. The fluorescence intensity of bands can vary with
experimental factors such as staining conditions, so quantitative interpretation of bands requires calibration
using internal references.
The fluorescence intensity coefficient of a structure (Fstr) is defined as follows.
\[
F_{\mathrm{str}} = A_{\mathrm{str}} / M_{\mathrm{str}}
\tag{4}
\]
Here, Mstr and Astr are the mass in ng and the peak area in the scan in arbitrary units, respectively, for a structure.
The structure consists of double-stranded helices and single-stranded junctions. Then,
\[
F_{\mathrm{str}} = F_{\mathrm{ds}} \times f_{\mathrm{ds}} + F_{\mathrm{ss}} \times f_{\mathrm{ss}}
\tag{5}
\]
Here, Fds and Fss are the fluorescence coefficients of double-stranded and single-stranded DNA, respectively, and fds and fss
are the fractions of double-stranded and single-stranded DNA in the structure, respectively. This estimation assumes that
both Fds and Fss are sequence-independent and have the same values whether the DNA is free in
solution or integrated into a particular structure. These assumptions are unlikely to hold exactly, but the deviations are
unlikely to be large (4).
For the octahedron studied here, fds is 0.8 and fss is 0.2. Once Fds and Fss are determined, the mass of
octahedron in the gel can be estimated from the peak area of the octahedron in the scan.
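Equations [4] and [5] combine into a simple mass estimate, sketched below. The numerical coefficients and the peak area are hypothetical values chosen only for illustration; they are not measurements from this study.

```python
def structure_coefficient(f_ds_coeff, f_ss_coeff, frac_ds, frac_ss):
    """Equation [5]: Fstr = Fds * fds + Fss * fss."""
    return f_ds_coeff * frac_ds + f_ss_coeff * frac_ss


def mass_from_area(peak_area, coefficient):
    """Equation [4] rearranged: Mstr = Astr / Fstr."""
    return peak_area / coefficient


# Hypothetical calibration: Fss = 100 and Fds = 185 area units per ng.
F_SS = 100.0
F_DS = 185.0

f_oct = structure_coefficient(F_DS, F_SS, frac_ds=0.8, frac_ss=0.2)
mass_ng = mass_from_area(1260.0, f_oct)     # hypothetical peak area of 1260 units
print(f_oct, mass_ng)
```

Because the octahedron is mostly double-stranded, its coefficient lies close to Fds, and an error in Fss affects the mass estimate only through the 0.2 weight.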
Figure S7 shows an example of the gel electrophoresis used for such quantification. The assembly was
carried out at various molar ratios of scaffold to staples, and the products were electrophoresed together with
internal standards for single-stranded DNA (mixtures of staples at various amounts) and double-stranded DNA (bands of
molecular mass markers at various amounts).
Fss was determined from scans of lanes 1 to 4 (mixture of staples), and Fds was determined from lanes M1
to M4 (ds-DNA ladder marker) (example in Figure S8). From five independent trials of assembly,
electrophoresis and scanning, a ratio Fds / Fss of 1.85 ± 0.32 was obtained, in agreement with the reported value (4).
The three bands in lanes 10 to 13 were assumed to be, from bottom to top, unbound staples, completed
octahedron, and a faint band of defective octahedron that decreased in the presence of excess staples.
Figure S7. Quantitative analysis of assembly
Lanes 1 to 4: Mixture of staples at 4 ng, 8 ng, 16 ng and 32 ng, respectively. Lane 5: Scaffold only. Lane 6: Scaffold
and one staple. Lane 7: Scaffold and two staples. Lane 8: Scaffold and three staples. Lane 9: Scaffold and a different set of
three staples. Lane 10: Scaffold with all four staples. Lanes 5 to 10 are mixtures of strands at equimolar ratio. Lanes
11 to 13: Scaffold with all four staples, with a 2-fold, 4-fold and 8-fold excess of staples over scaffold,
respectively. Lanes M1 to M4: ds-DNA ladder marker at 5 ng, 12.5 ng, 20 ng, and 40 ng, respectively.
Figure S8. Estimation of fluorescence coefficients
Relationship between fluorescence intensities of bands (ordinate) and their masses (abscissa) for ss-DNA (a) and
300 bp ds-DNA (b) are shown.
Using the values of Fds and Fss from each trial, the yields of assembly products were estimated (Table S3). The
table shows that about 89% to 93% of the input strands were accounted for.
Table S3. Efficiency of octahedron assembly
Scaffold : Staple        1:1          1:2          1:4          1:8
Octahedron (ng)          6.13±1.21    7.39±0.19    7.76±0.77    7.58±1.39
Staples remained (ng)    -            2.67±0.68    10.34±0.98   24.29±2.78
Defective dimer (ng)     1.42±0.19    1.34±0.17    1.05±0.14    0.86±0.10
Total observed (ng)      7.55         11.4         19.15        32.73
Input (ng)               8.20         12.30        20.50        36.90
Explained (%)            92.07        92.68        93.41        88.70
Efficiency (%)           74.8         90.2         94.6         92.5
Efficiency of octahedron assembly was estimated by densitometric analysis of scanned images of gels as
exemplified in Figures S7 and S8. The amounts of octahedron, staples remained and defective dimer are
averages of five independent determinations by densitometry. Total observed is the total mass actually
observed in each lane, that is, the sum of octahedron, staples remained and defective dimer. Input is the
expected mass loaded in each lane. Explained is the percentage of total observed relative to input. Efficiency is the
percentage of octahedron actually made relative to input.
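The derived rows of Table S3 can be recomputed from the measured rows, as sketched below. Two readings are assumptions made here: the "-" entry for staples remained at 1:1 is taken as 0 ng, and the Efficiency row is computed relative to the 1:1 input of 8.20 ng, which is the denominator that matches the tabulated efficiencies.

```python
ratios = ["1:1", "1:2", "1:4", "1:8"]
octahedron = [6.13, 7.39, 7.76, 7.58]        # ng
staples_remained = [0.0, 2.67, 10.34, 24.29]  # ng; "-" taken as 0 (assumption)
defective = [1.42, 1.34, 1.05, 0.86]          # ng
input_ng = [8.20, 12.30, 20.50, 36.90]

total = [o + s + d for o, s, d in zip(octahedron, staples_remained, defective)]
explained = [100 * t / i for t, i in zip(total, input_ng)]
# Assumed denominator: the mass loaded at equimolar ratio (8.20 ng).
efficiency = [100 * o / input_ng[0] for o in octahedron]

for row in zip(ratios, total, explained, efficiency):
    print("%s  total=%.2f ng  explained=%.2f%%  efficiency=%.1f%%" % row)
```

Small differences in the last digit of the efficiencies are expected, since the tabulated masses are already rounded.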
Design and assembly of octahedron multimers
We designed three additional T5-hinged octahedrons and four sets of connector strands (Figure S9). See the main text
for the design of the connector strand sets. Strands with a combined length of 2,481 bases were designed using
GENESUS with the seed pair table for r = 6 (sequences in “Strand sequences”), and were prepared as described
in “Strand preparation”. Assembly and verification of these structures were carried out as described in the
legend to Figure 3.
Figure S9. Design of octahedron multimers
Six structures were designed and named as shown. Scaffold strands are in red. Staple strands are in
black. In multimers, connector strand sets are in various other colors. All segment-junctions carried T5 as
hinges (not shown in this figure).
Strand sequences
[Sequences used in PCR primers are underlined. Segments that are exchanged are in bold.]
<T5-hinged octahedrons>
Strand set of octahedron #1
Scaffold strand
L-1: ATACATCCCTTTATGCTCTTGTTTTTCAATTACATGGGAGTGAGATGTTTTTCGTTTCTGGGCGGCATACCTCTTTTTACGGGACAACATAAATATCGATTTTTATCCGCG
TAATGTCGTTTGTCTTTTTGGCTCACCTTATACGGAGTCCTTTTTTCATCCTAATTTCCAGCTCAATTTTTGGATAGCACGGAATCTCGTGGTTTTTCTGCACAGACTCCT
TTGCCCGTTTTTCGGACATCTACTGATTTGATGTTTTTCATGTTCGGTTGATGCTAGTTTTTTTGCAGCACGTTTAGCTCTAAGC
Staple strands
S-1-1: AAACGTGCTGCTTTTTCATCTCACTCCCATGTAATTGTTTTTTCGATATTTATGTTGTCCCGTTTTTTGGACTCCGTA
S-1-2: TAAGGTGAGCCTTTTTCCACGAGATTCCGTGCTATCCTTTTTCATCAAATCAGTAGATGTCCGTTTTTGCTTAGAGCT
S-1-3: AAGGGATGTATTTTTTCGGGCAAAGGAGTCTGTGCAGTTTTTGACAAACGACATTACGCGGATTTTTTCAAGAGCATA
S-1-4: AATTAGGATGATTTTTGAGGTATGCCGCCCAGAAACGTTTTTAACTAGCATCAACCGAACATGTTTTTTTGAGCTGGA
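As a consistency check on the listed sequences, the sketch below splits staple S-1-1 into the fixed-width parts implied by the design (11, 5, 21, 5, 21, 5 and 10 bases) and compares its first full segment with helix 2 of scaffold L-1. Only the first 47 bases of L-1 are used, and the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def split_fixed(seq, widths):
    """Cut seq into consecutive pieces of the given widths."""
    if sum(widths) != len(seq):
        raise ValueError("widths do not add up to the sequence length")
    pieces, start = [], 0
    for w in widths:
        pieces.append(seq[start:start + w])
        start += w
    return pieces


# First 47 bases of scaffold L-1: helix 1, T5 hinge, helix 2.
SCAFFOLD_START = "ATACATCCCTTTATGCTCTTGTTTTTCAATTACATGGGAGTGAGATG"
STAPLE_S_1_1 = ("AAACGTGCTGCTTTTTCATCTCACTCCCATGTAATTGTTTTT"
                "TCGATATTTATGTTGTCCCGTTTTTTGGACTCCGTA")

helix1, hinge, helix2 = split_fixed(SCAFFOLD_START, [21, 5, 21])
parts = split_fixed(STAPLE_S_1_1, [11, 5, 21, 5, 21, 5, 10])

print(parts)
print(reverse_complement(parts[2]) == helix2)
```

Fixed-width slicing is used instead of splitting on TTTTT because a segment may itself begin or end with T.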
Strand set of octahedron #2
Scaffold strand
L-2: TTTACTGGCGATCGAAGTGTCTTTTTTCTTCTCAGTGGCAGTAAGAGTTTTTCTCCGAAGTAACGGGAGCATCTTTTTAGAACACCAAAGTTTCATGCTTTTTTTTCAGGT
CTTTGTATCCCGTGTTTTTGCCTATATCGTTAGCGGGTATTTTTTGGGAAGACCGACAAGGGACCGTTTTTGATCTGTTCATGTCCTTTCTCTTTTTAGTATGTGCATCTT
TCCCGCGTTTTTGCCGGGAGATGATCTTACCTCTTTTTATCCAGATCTTGTTAGTGTTGTTTTTCCTGTGATGTCGATTTAACGC
Staple strands
S-2-1: GACATCACAGGTTTTTCTCTTACTGCCACTGAGAAGATTTTTAGCATGAAACTTTGGTGTTCTTTTTTATACCCGCTA
S-2-2: ACGATATAGGCTTTTTGAGAAAGGACATGAACAGATCTTTTTGAGGTAAGATCATCTCCCGGCTTTTTGCGTTAAATC
S-2-3: TCGCCAGTAAATTTTTCGCGGGAAAGATGCACATACTTTTTTCACGGGATACAAAGACCTGAATTTTTGACACTTCGA
S-2-4: TCGGTCTTCCCTTTTTGATGCTCCCGTTACTTCGGAGTTTTTCAACACTAACAAGATCTGGATTTTTTCGGTCCCTTG
Strand set of octahedron #3
Scaffold strand
L-3: TCTAGCGCTCTAGATATGGGCTTTTTTTCTGCTTATGGGTATATGTTTTTTTGCCCATCGGGATGCCTTCAGGTTTTTCCACCAAAGCACACTGACCGGTTTTTGCACCGC
TTAGTTCTCCCGAGTTTTTGAAGTGCAGCGACTCTTGCCCTTTTTCTATATTCATCGCCTGCACAGTTTTTCCCAGGCGGCTTATAGGCATGTTTTTCCAATGCAAAGTCT
TCTCGTTTTTTTGAACCAAGAGATGGGACTTATTTTTTGAGAAAGACTAGTAATTGCCCTTTTTCCTGACTCGCAGTTAGAGTCT
Staple strands
S-3-1: TGCGAGTCAGGTTTTTAACATATACCCATAAGCAGAATTTTTCCGGTCAGTGTGCTTTGGTGGTTTTTGGGCAAGAGT
S-3-2: CGCTGCACTTCTTTTTCATGCCTATAAGCCGCCTGGGTTTTTATAAGTCCCATCTCTTGGTTCTTTTTAGACTCTAAC
S-3-3: AGAGCGCTAGATTTTTAACGAGAAGACTTTGCATTGGTTTTTCTCGGGAGAACTAAGCGGTGCTTTTTGCCCATATCT
S-3-4: GATGAATATAGTTTTTCCTGAAGGCATCCCGATGGGCTTTTTGGGCAATTACTAGTCTTTCTCTTTTTCTGTGCAGGC
Strand set of octahedron #4
Scaffold strand
L-4: AGGCCAATTAGCTCCTGTCACTTTTTGCGGCGACATAGAACGAAGTATTTTTACCAAGCTGATGATTAGTAGCTTTTTAATGATACTTATTAGCGCTTTTTTTTCGACCTA
CTTTCTAGCCCGAGTTTTTCGCTTCTCCGTAGAGTTTGAGTTTTTGTTGAGGCTTGCATGCTAGTATTTTTGAAGATCTGTTTGGCTGCCTGTTTTTACCCGATAGGTTGT
TTCGCTCTTTTTAGCGTTTGATGTTAATGCTCGTTTTTAACTCCACGCGAAAGGGATAGTTTTTGAGACTCCTCATACGTCCCTG
Staple strands
S-4-1: TGAGGAGTCTCTTTTTTACTTCGTTCTATGTCGCCGCTTTTTAAAGCGCTAATAAGTATCATTTTTTTCTCAAACTCT
S-4-2: ACGGAGAAGCGTTTTTCAGGCAGCCAAACAGATCTTCTTTTTCGAGCATTAACATCAAACGCTTTTTTCAGGGACGTA
S-4-3: CTAATTGGCCTTTTTTGAGCGAAACAACCTATCGGGTTTTTTCTCGGGCTAGAAAGTAGGTCGTTTTTGTGACAGGAG
S-4-4: CAAGCCTCAACTTTTTGCTACTAATCATCAGCTTGGTTTTTTCTATCCCTTTCGCGTGGAGTTTTTTTTACTAGCATG
Connector strands
Connector strand set A
S-1-4-a: AATTAGGATGATTTTTGAGGTATGCCGCCCAGAAACGTTTTTCACGGGATACAAAGACCTGAATTTTTTTGAGCTGGA
S-2-3-a: TCGCCAGTAAATTTTTCGCGGGAAAGATGCACATACTTTTTTAACTAGCATCAACCGAACATGTTTTTGACACTTCGA
Connector strand set B
S-2-4-b: TCGGTCTTCCCTTTTTGATGCTCCCGTTACTTCGGAGTTTTTCTCGGGAGAACTAAGCGGTGCTTTTTCGGTCCCTTG
S-3-3-b: AGAGCGCTAGATTTTTAACGAGAAGACTTTGCATTGGTTTTTCAACACTAACAAGATCTGGATTTTTTGCCCATATCT
Connector strand set C
S-3-4-c: GATGAATATAGTTTTTCCTGAAGGCATCCCGATGGGCTTTTTCTCGGGCTAGAAAGTAGGTCGTTTTTCTGTGCAGGC
S-4-3-c: CTAATTGGCCTTTTTTGAGCGAAACAACCTATCGGGTTTTTTGGGCAATTACTAGTCTTTCTCTTTTTGTGACAGGAG
Connector strand set D
S-3-2-d: CGCTGCACTTCTTTTTCTCGGGCTAGAAAGTAGGTCGTTTTTATAAGTCCCATCTCTTGGTTCTTTTTAGACTCTAAC
S-4-3-d: CTAATTGGCCTTTTTTGAGCGAAACAACCTATCGGGTTTTTTCATGCCTATAAGCCGCCTGGGTTTTTGTGACAGGAG
Strands required for multimer constructions
(T5-OCT)2 : L-1, L-2, S-1-1, S-1-2, S-1-3, S-1-4-a, S-2-1, S-2-2, S-2-3-a and S-2-4.
(T5-OCT)3-I : L-2, L-3, L-4, S-2-1, S-2-2, S-2-3, S-2-4-b, S-3-1, S-3-2, S-3-3-b, S-3-4-c, S-4-1, S-4-2, S-4-3-c and S-4-4.
(T5-OCT)3-L : L-2, L-3, L-4, S-2-1, S-2-2, S-2-3, S-2-4-b, S-3-1, S-3-2-d, S-3-3-b, S-3-4, S-4-1, S-4-2, S-4-3-d and S-4-4.
(T5-OCT)4-I : L-1, L-2, L-3, L-4, S-1-1, S-1-2, S-1-3, S-1-4-a, S-2-1, S-2-2, S-2-3-a, S-2-4-b, S-3-1, S-3-2, S-3-3-b, S-3-4-c, S-4-1, S-4-2, S-4-3-c and S-4-4.
(T5-OCT)4-L : L-1, L-2, L-3, L-4, S-1-1, S-1-2, S-1-3, S-1-4-a, S-2-1, S-2-2, S-2-3-a, S-2-4-b, S-3-1, S-3-2-d, S-3-3-b, S-3-4, S-4-1, S-4-2, S-4-3-d and S-4-4.
<Longest USPs>
r = 5 (with the restriction of GC content of 0 to 100% at sliding windows of 10 bases)
1,510 bases
CTAAACAACTGGACTCCGGCCCGGAATAGCTCGTGGGTGCCAGGCGATCACGCAAGACCAGCGTTTCTTCAACATTGGGTAAATAGGCAAATCCTACCTTCGAGCAAACATGAT
TTCTACCCGAAATGCGAACCCAGGTGGCTTTGGGAGTACCTATCGTCTGTCTTCCCGGTGAATTGCTTAGCGGGCTGAAGGCCATCTGTTGTATGCTCATAAGTTTCCTCAACC
TCGCTTCTAAGAATAAGGTCTATTATGTTCATGCCGGGTTTCGTCGATGTAGCAAGTCGCCGGTTAAAGGAGCTTACGATTTATCAGTCGGGCAGAAAGAAGATTCCTGTAAAC
CTTGAATCGAACAGTCTTTGACTAACCTGACGCTAGGGATACAGCCTGTCCGATTATCGCAAAGTTAGACGCACTTTACGTCTTAACGACCTGTTTCAATGCAAGCATGGGATC
GAGGCGTGGCAGTAAGGGCCAATACCGGATTCTATGCACTGGCTGGCGAAAGTAATCAATTTAAGTGAAATACATTAGGTCATACGACTAGATCGTTGCGGTCCCAGAACGCAG
GCCGTACATGCGGCCAGACTCGTTTGTTATGGGTCACTGAGGGCGTCCGTTGGAGTGCTATCCTCCTCGGATCAGAAGGGAATTTGGTCGGTCTGACCACTGTCATTGCCCTGC
ACCGCCGTTTAATTGTAATGGGCGGATAACGTGACATTTAGGGCTCCAGATTAGTAGCGATGGTGAGGCCCAAGCCGTCGCTGATGATGCGTATTACCGCTGCAACAATGGTAG
TGAGTCAGCAGCATCGGGAGGGACGGTCAATCCCGCCAAGGGTACGAACTATTGAGTAACGGTGGTTTGAGCCGCTCCCTGGTAACCATAGTGCCGAGTTCTAGCCTCCGTAGG
TTCTTGTTCCCACTCGACCCTCGTAGTCCGCTTTCAGGCATCCGTCAACGCCACTAGCAGGTATGGAAAGGTGCGACAAATAACCCTGAGTTTGGAATGAATGGATGACTTAGG
CGGCAACTAAGGACCCGCACGCTTGTCCATGAGTGATACCAACCAGTCACCCTACAAAGGCTTATCTTGGATTAACATCTTTAGCCGATACTGGGCTACGCCGAAGCTAACAGC
AACCGAGCGTAACTGAATACTACTGTACCGAACGTATCTGCATCTACTCGCAGTCCTATGTAAGTACGTGGTGGGCATTAAGGCACGTCGTCCTGGATCTTACCTCATCGTGCT
TCCTAGACTATCAACTTCCAGCTCACGACAGCTACCGTGCAATAATGATCTGAACTTGGTTCAAATGGCATAAAGCTGAGCGAAGTCCCGTGAACGGGCGAGCCTACTTCATAG
CCATGTCTACAGAATTACGGGTGGACCTCCATCCTGCCGTGGAACTGCTCCGCAGACAACGTCCAATCAGGAGTTGGCAAGGAACGAATGCTAATAAACTAGGTGATTCGACAT
ACTTGTACTGACAATCTATCTAATTTCC
r = 5 (with the restriction of GC content of 40 to 60% at sliding windows of 10 bases)
1,206 bases
GGTAGCAATCTGCCTACATGGAAGCAACCTGGTCGAATCCTATGGGTGACTATCACCGGAACTTGTACGCTTCAAAGACGCCATTTGTAGCGGGTTAGAAGTGCCGTACTGATC
CTCCAACGACCTCCTAACTCGTCAGATCATCGGTAAGTCGAGCAAGATGGCTGAGTAAAGCGAGGTTTCTAGGACCCTCAAACCAGTGGCTTAGGCTGGACTAGCATGGCCTTA
GCTGCTGATAAGTGGAGTATTGGACCGAACAAGGGCCTATTGCCCGGATAACATGCGAACTATGAGCAGAATTGACCTGAATAGTGGTGGCAAGTGAGTTCTGAGGACTGGAAT
GGACGGATGTCGATTGGTGAACGCAAGGCTTGACTCGCCTAGTTAAGCGGAAGGAAACGTGCGATTAGGGATGGTATCGTTACCCGTCTAAGATCCACGTAACCAAGCTCGTTT
AAGGAGCGGTATTTCCGTGACGTTGAACATCGCTCAATAAGGGTGGGTTTAGTAGGGCGATCAACTCCCACTCATTACGGTTCTATGCACTCCTGTAGAATCGGAATCACGACG
TAGCTTTCGCCACGATTTAGGTGCTCCATACCTCATCTATCGGGACATTAACCGGGTCTTACTTCGGACGAAACTACTCGGTGCAACATTGTCCACCGTTGTCAGCGTCATAAT
GCGTGGTAATTCCCTGGGAAAGATACGGGAGCTATGTACCGTGCTGTTACAACCCTATCCAAACGCTAACAAAGGGACTCAAGGTCCCAATTAGTGCGTCCTTTCACTGCGACA
GATGCTGCCAGAAGCTGTCCTGATGACCATTGGCTCCGTTAGTTGCGGATCGAAATGCAGTCGCTAGATCGGCTTCGTAAGCATTGCAGGGTACTAGACAGGCATAAGAAGGCG
TGAATGCCCTCGATAATCCGACAAGCACTGAGCTGAACTGAAGTCCGTAGGAAGACTGACCGTCAAATCCAGACCTACCAGCAAATAGCCCTGTTGACATGACAGTAGACTACC
TGCTTTGCCTCGTAGATAGCAGTTGGCACGAATTTGGGTCACTACGCAGGATACAGTCAGGCTAGGTACGTTATGGTGCCCATTAGCCAGCGAATAACTGGGCGTATGATGGGA
TTGATGCCGCATACTGGCCGAATGACTTGGAACCACCTTACAGACGACTAAGTAGCCGTTCTTGGG
r = 6 (with the restriction of GC content of 0 to 100% at sliding windows of 10 bases)
5,177 bases
CACGTGCGTTTATTCGTGCCTTTGAAACCGTGCTCAGGCACTGCATTTGAGTCCTCCGTTTAGTCGGACCTTCAAGCTGGAGCCGTTCCCAGGGACCCAGGCGGCGATCATTCA
CTTTAGTTTCGCCGTTTGTCTACCTCGTCTTCCACGAAACCTGCACCTTGGAAACCCATAACTTTCTTACTAACGAACTTTGGCTCCTGTAGAAGTTGGAGTGGCAGATTTAGT
ACCCGCCCATGGTAGTCCCTAAGTGCGGAAGATTAATACTGAACGTAGGGTAACGCAGATGGTACTTGGACTCGAACAGGGTCCAGATTCATGCAGGCGTTCAACAGGCTCAAT
CAGCTCGAGTTGGTAGCAGAAGCAAATAAATCCTTCTAAACTTTATTAAGATGGCATCAACATGCTAGCGACATGACCTAACCTTGTTCCACCCGATAGTTGCCTTGTCTGACA
AATGACCGGGAGCTCGTACGTGAGGCCTTCGCAACAGAAGTAAACGACGTGGTGACGGTATTCTGAACATAGACAGTATGCCTTCCGGCACGCATTTCCTCCCACCACCCTGAG
GGAGTGCTACTTCACGCCGAGTAACCACCAGCGATGAAACTCCCATCAGGAGTGATACCGCACCGAATCACTTCAGTCCGAACGACCTCACCAGGCAGTGACCTTGCTAGTTGG
GTGGCATAAGCTCACTATGATCTGTCGCATTCCCGGGTTGGGACCAAATACCCTCACTCGACTGGATCTGGGCCAGGTCCACCTAGTTTAATGGCGGGAAGGACCCGCTATTTG
ATGGAGGTATGCTCACGGTGCCTCCCTGATCAAACAGGTACCCTGGCAGGATCCGTATTGAGTAGTATCTTGCAATGTTGCTACGAATGTAGGTTATCCTTGGGTAGCTCAAAC
CACTTTGTCGCCTCCGGACCGGACATGGTCCGTAGACGTACCTCAAGCGGGATGGGTACAATTAGTGCACCCAGACTCGCCTGATAGACGCCTACGCCCACTGCTGCCAATAGC
CATACGAGTCGTAGGTGAGCCGGCAATGCGGCCTTATCCAACCGCATAAAGACGTGCTTTGTTGGGCCTCCTTATTCAGTTCATCCAGACCAGACAGGTCAGAAGACAACCCAC
GAGGCCGGTAGAATCCCTGCCCTATCTAGTAACTGGGCGATTCGATTCCGTACTGTTTAAAGATTGGTCTTCGGTTTATCGCAGCAAACCTATTCAATTTGCGTATCGTCCCAG
CTGTTTCCTTAAACGTCTGGCTTGTTATGCTGTTCATGGACGCAACCCTGCGTCAAGCCTGCTTCCTGAAATCTTGGTGATAACAGGACTTTAAGCAGCGTCACGTTTCAGATT
AGTCATCTTGATACTCATTAGGCGGTCACGCTCGGACGTCGCCGGAGTTCAGGTTGTACAGGCCAGTCCCATGCTTTAGATCCGGGCTAAGCCACGTTCCGTCAATTCTTGAAC
AACATACGGCCGAACATGTCTTTCGCTCATGCTCGCCCTAATCCTCGATGGTGAAATGAGCTGTCTTACATCCCATACTCCTACGGCTGGTTTAACAGCGTAAAGTACGAACGG
TAACCTCCCGTTATGATTCCCATTAGATTGACAGTTTCCCTTACTTAATCATTAATTCCCTCGTAGATAAATAACGCTACTGCCCATTTCGCACCCGCATCACGCAGCCTCCAA
TGGTTCGTGAGTGGGATTGAACTTCGTAAACCTTACAACGGTTCTATGAAATAATCTATTTACTGCGACCAAGTGCATGCCAAACTGGCTCGTTTCTACGTGCAATAAGTATCC
CTACTCGTCCACTGGCAATTTCGGAATTGCTTATCAGCGTTGCGGTGATCGGTCCCGAGGTGGCCCTACAAACCCGACCATTTGCACTGGGAGCGACGCTAATGTTCTACTGAT
GACCATCCTTTACTCCCGACGCACTCAGCTTACGTTACAAATTTAGACGAAATCCACCAATAACATCGGGCGAACAATTCGAACGTCGGTTAGATACCTATGACTCCTCGCCGA
AGCTATTAGGACGTGAACTACGCTCAAGTCAGATACATCGAAGCATCAGAACCAGGTTCAGCTGCAGACCTCGACCAGCAAGCACTTCTTCCCTAGGATGTTCCTACTAGGGCT
ACTATCTGTTAGCCGTGAAAGCCCACGTGATCTTGTCAACCGTCCTCAGATGCTGCAATTACGCCTCGAAATACTAGCGTGGGAAATTCGTCGGAACTAACTCGGAAGCCCTGG
TCAAAGCCGCACGGCACCCTCGGCTGATGTCACTAGAATAGCACCTAATTCATACGCTATCCCAAGTCCAGTACTTCCCAATCCAATTGTCCCGTCGCACTGAAAGGTTGATTA
TGGTCTACAACAAGTCTAGGGTCTAGCCGCTTAGAACGGGCCCATCTACTAAGGTATTGTTGAGGCGGACAGGATGCGTTGAGTGAAGCGGTGGGCACTCGGTGAACGGACGGG
TTCCGCCGCAAAGTATGTCCCTCCTCATTCCAGCAGCCGCCCTCCAGGTGGGTGACTCGGGCAAGGAGCACTGTTCTGCGAGCCCGTCTGATTACATAGGCCTGCCGTAAATCA
GGTGCGTGGATAGGCAAACTACCTTCGGAGCAGTACCTAGATTTGCCACGCCACGATTCAAGGACAACAGCCTAACTGCACGGGAACCAATGACAACTCCATTGTAGCCAGGCC
GAGCCTCGCTCGACGCCCGGCATTGGGTCTTTGGAGCAAAGCTGTACCGACTTATCTGGCGGTGCTGTAAGTGGATCACTAACATTTGGAAGTTTGACATTGAGGATCATGGTG
GCGGCACTTTCAAGTGAGTCGATGCGAATCCAGCGTGCTGCTCATAATAGACCTGACCCGAGCAATCCTGGCGTACCACCGACATCTAAGTAGTCAACGTCACTTATGACAGGC
AAGACTGAGTAATGGTGCCAGCGGGCCAAATTATGTACCTTTATCCCGTAATCGACAGCGGTAGGATCTTTCCTAACGCCGCTCAGCAATACAGCGAATAGAAATGTACTCACC
CGTTTCCGCTGCGGTCGGTCGTACAAGACCTATCGTGATTTGTTTGCAGGAACGCTTGCAGTATCGGACAAGCTTTCGTTTGCGATTTGGGCTGACCGTTAATGCCCTCGCACG
ATCTACCAGCTAGTGGAGTCTAATCGGGATTTGACTGACCACGGGCGGTACCAAGGGTTTAGCGAACTGTTACTGAGCCCATAGATCAGATCTAGACATCGCCACTTGTCGTGA
CTTTCCCGAAGTCTGTATGGTAATGAATACTTGCCATAGTTCTAGGAGCGGATAATTCGGCAAAGGTCGACGGTCCTGCGGGACTTACCCGAATACAACTACAAGCATTGATTC
ACCTGCCTACAGTGAAACAATCCGGTTATTCCTTGTATGTAGATTCGTTCATAAATGATAGCTGACTGCGGCATCCGGAACAGACCCGTATCCTGACTCAGGACAGCCCGACAG
TCTACTCCGTGGCCAAGAAGATAACTCCTTTGGTTAGCACGAGCAGATCGGAAATAGGACTAGCCACTGTCCAGCCCTCATCACTGAGGCTTCGTTGATCGACGAGCGGTTACA
GATGTTGGAACTTACGCATACATGCCTGCGAACGCAAGAACTTGGCGGAGTCCGGCCCGAACCCAAAGCGAAGTGGGTATCCATAAACATAAGGTGGTCCTAGCAGTCCACGTA
TCATGTTTCTTCGCCTTCATGTAACATGGGTCGGCTATCGACCGGCAGACGATGCTATCTTTGTACTTACTGTACGTTTGAAGTCCGTCTTATGGCGTCAGGGCGTTGTTTACC
CAGTAATCCATCCATGTAGCGATTGATCCCGGACGCCGGTTTGGATCGTAATGTCTGCGGACGACCCTTTAGGTAACAACCAAGCGAGTGGAAATCAAGTAGGCCCTGTCGTCA
ACTAATAAGCGACCGTACCGTAGTAATAACTATTACAATCGGATGTCCAATCGAGTATTGGAATCTTAGACCGTGGAGGAGCTATGGTTATGTCGAACCGGGCCTTTAATTGGT
AAATTGGCCGGGACGCTCCATACCAGGAATCCTATCACGGAAGTCAATAATGCTAAGACTCATGAAAGAATGCCAGAAAGGACTGTTGGTTTCTGAGTTAACGACATTCCGGTG
GAGCTGCCTCATAGCGAGCTTGCGAGTACGCAGTTGATAATGTAAATGTCCTCGGTCTTAACGTGCCAACCTAAGCTAGGTCGCCAGCCAGCACTACGATAAGAAGGCTACGGG
CACGGATACAATAGGGAGCCTTTCCGGGTATGAGTTTACAGTCGGCAGCCACCGTCTAGTCAGTACAGAATCAGTTAATCCGTTACGACGGATGATGCAGAATTACCTGCTGTC
ACCTCCGCTAGGCATTTAGCTACGTACTAGTGCCATGAGGTGCCCTTTCGGCTTGGATGCAACAACTTGACCTCCTATGCTTAGTGGTCGTGGTTCACTACCCTTCCTTCAGCA
GACTGGTCCAACGTATGCATTACAGGTTAACAATGTATTTAGGATAGTCTTCATCATGCGTCTGTCAGCATTCGCCAATCACCTTCCAGGACGAAGATGCATGTCAGGCTAATT
ACTTTCGATACAGTTGGCCTTGCATGGAATACGATGAGCCATTGCTAAAGAACAAACTCGTGAATTGGGCATAGCCTATTAAATTCAGCCATGCCCGCAAGGGCCGCCGGCCGC
TACCGATAACGATCGCTTCTAGCTCGGTAAGATCGCATAGTATGACGATCAGTCGCAATCATGACTTAAATCGATGACATAGCAGCTTCCATCGGCTCGAAGGGATCCATTATG
CAGCGACTTGCTCAACCATCACCCTTAGGCCACTACTGGTAAGGATCGGCGGTTCCTCGTTACTTGTAAAGCGGACTATTGCCCTGCAGTCGTTGGCTGTAGCTTGGCTTTAAC
TACTACATCATTGTCACGGGTAGGTAGGCTTAACCAGTTGCACCGGGTCGAATGGACTAACCCTCCGATGCACGCCTTACGGAGCTTCTTTGATTGTCGACTAGGAATGGCAGT
TCTGGTGCATTCAGACCGGTAATACGCCGTACGCTTTCCAGTTCCCT
r = 6 (with the restriction of GC content of 40 to 60% at sliding windows of 10 bases)
4,112 bases
CCGTTTCAATGGGTGGAGTTTCTTGGGCAAAGTGAACGTATCGCAAACGCTCGTCTACTACTCGTCGCTACATCCATGATCGTACATCTTCGGAACTCCTAAGCATTGGGAGTC
CTACCCTACCTAAGGGTACGAATGTCGAACAGGACCCTTCTTTCGACCTTAGCGGTCTGATAAGCGTCTAGGTCAACAGTCATGCCCACTAAGTCAGGTCTGGAAATCTGGAGC
AGACAGGAGCTTACAAGCCCAGTGAATTCCTGAGTACGAGCATTTGTCCGGCTTTGAGCTTGATTCTGGCGGTTACATAGGGAGGGTAAGAATCCGTATCAGCCTTATGGCCGC
ATTTCGTCTGTACTGAGGGCTACAATCTGCGTCCTTCTAGATGGAGGTCTTCTACAGGTGCCAAACATCCTGTAAGGAAGCTATGCAAGCCGATGGTTCCCAGAACAGATGTCG
GTCTAATTCCGTGGCATAATCGCAGCACTACAGAAGGTTCCAGGATTGTCATCGCCTGATTATCCGACTACCATACAGATCGCCAGTCAACTGCAAAGACCAGATCCTATCCTG
CGTTCAGACAACGTTACTGGCTTCACGACCCAATCGACAACCCTCGAGTAAAGTCCGACAGAATGCTAGTGGATGCCAAGGTCTACCTTCGGTGAGTGAGCCACTGTTACCATG
TCAGGGATCCAACGCCAAATGGATCGTGCAACCTCAATGTTGCAGCCAACATAAGCCTAGGGTTAGCGAACGAGTTTGTACGGGACTGAACAGCGACCAAGACAGCCTCAGAAC
CTCGCAAGAAGTTCGATCACGTCTTGATCGGCGTTAATCGAGGCAATTCGCTTCTAAGACCCACCAACTGAAGCTCGATGCTACCAGCAGGTACAACCAAGGGACAACTTTCCA
CTGGTTTACTGTCCCGAACAATTGCGACTGGATGGTCCTAGAAAGCCATACGGTATGGGTACCGTTAACTTGGCCAGTTCTTTGGTGCCGATACAGGGTGCAAACTTGCGGCTA
CTAGACCATGGGAAGAATACGGCCAACTACCGGGATCTAACAGCCCATCAAATCCCGCATAGAAACCGTTCCTCAGCCATTTCTTCGCCCTTCATTACCCGGGAATAAAGCCCT
TTAGCCTACAGCACCTCACTTACCTGGCCTAACGTCAACGTACCATCAGAAGTGCGAGCAATACAGTGCTTCTGGATCACCTGCCTATGCCTAAGTTGGGCCTTTACCAGTCTA
TGGGCGATTTCGGGCGTTTAGATGCCCGTTTGATAGCGGCAATCAACCACGACGTTCTGTTTCCAAGCCAAAGTTAGGGATACGCTGGCAATGGTATCGTCTTCATAGGACGAG
GATTCAGTAGGTCGCAGAATTGGCGAGTGAAGTTGAGTCCGGTTTAAAGCGGATAGACAAGGACCTGTTTACAGCCGCTTACTAGTGCCGTCAATTACCGGTCTTGTTTCGAGG
AGGAATCGATGGCGAACCAATCCCTTTCCTAACCTGATGATGCCTGGTGATGGCACTTAGCCAAGCTGTCGTTGCTGGACGATTGTTGGGACGACTTGCTGTCAAGGGTGATTC
GCCGTTAGTATCCGGTGGATTAAGCTGCTCATAAGTGGAGGAGTACATGTCCCATAAACCTCCGCAAATACCACGGGTTACCTCGTGGAATGTTCACCTATGGAGCTAAGAACG
ACGACAATGGCGTGGATAAACGGCCGAATAAGTCCCTACGAGTGGCAACTCGAACGGATGTCAACCGTGCATCAATCACGATGTATGCCCTACTATGGTGGTCTGTTATCCTCC
GTCATTGGTACTCGGACAGATTCACCCACTGAGTGCCTCAACGATGGGATCGATAGCATGTCGTATCTGGCAGTGAGTTCGCACGTAAGCCGTCTATTGTCCTCCCATTAGCTG
CATCGTCAGTAAGTCGACCAGTTGCTCAAGCCTTGGAGTGGTCAGGATCAGATAGGTACGGAAAGCAACCGAGTGCTATTCACGCCGTAACTCGCTGAGTCTGCAGGTTCAGCA
GATAACCCACGGTAAATCGGCCCATTTAGGGTCGGTAACTTACGGGTCAAATAGGGCCACTTTCTGGTTGGTATGAGTCAGTGCGTCTGAGTTAGACAGACTAGTTGCAAGGGC
AAGCTTAGTCATCCCGTTGTAACTGGGTCAGAATAGGAGCGGAAATACGTGGTCCAACCTACAAGAACCCAACGGAACATACGAGGCTAGATACTCCCGGAAACTATCGGAGGG
AATGGTTTGCCGTACGTAAACGTTCCATTACGACCATAATGCGTCGGAATCTATCGCTTGCTTAACCTTGCTAGGATACCCAAACCTGGTTACGTCTGCCTTTCGTTCCGGTAC
ATACTGGACTAGCGTGATGCTTTGCTTGCGTCAAGCACCATCTTACTCCGAAACTGAGCTCGTGACTCCAGGTCACTTCCCGAGTTCTGCATTGCTGAGCCTTTGATCCGTGAT
AGAACGCTTGGTAGATCGGGCAATAGGTTCGAAGACTACGCCTTAACAGGGACGTTTCGCCTACGTATGACAGGCTATGACTTGGTGAAAGATCCGGCAAGATACCGATCGCAT
AAGACGGGTATTACGGACTGCTAACGCTGACAAGCTACAGTAGTCAAGTCCTCGCTAGACATACCTAGTGATCTGCACTTTAGGCTGCGTATGCTGGGATTACCTACGCAACTT
GAGTGGACCTTCCTACTCCATTCGACGTGCTACGAAACCAAACGTAGCTAACTGTCACGCAATAACGGGTGACATTAGGTCCCAGTACTACCCGCTTGAACCGCTTCGATAACG
TGCCTACTTCAACGCATGGTTGCGAAGCAATCCGGAGCATAAAGGAGGCCTATTATCGCCGAAGTTTAGCGTCACCATGACATAGCAGCAAATCAGCGGGAAAGACGCCTAGAT
TACGCTCATCTGGGTGCTAATTGCAGGCAGTACGTCACTGTCGCATGAGGTACTAGCCAGGAAGATAGCCATGAATCGTGGCCTTACTTGTCGGCTAATACGCCCAAATAACCG
GCGTAAAGACTGCGAGTCAATAGCTCGGAAGACATTGCATCCGATGTTACGCATACTACGACTAACTAGGAGGTGACTAGGGCTGACTATTGGCCTCATCACTCCTTCAAAGTC
GCTCGAAGTCGGTGCATGTATCACCGTAAATGTCCGTGCTTTAGTCCCACTTGATACCTGTCCTTATCCCGACGAATTTGGACGGAAGTCCACGTTCGTATTGGGCTCAACATG
GCGGAATAGTTCGGATTTCTACCCAGGACTATGAGGGAGTTGGAAGCGTACCGAATGCCTTCAGCTATCCAAAGCGAGCTGATCGCTGCAGTAGAACCAGCGAAATAGCCCGTG
AAATTCCCACCTAACATGACCCTCAAACTAGCTCATTCCGGACCAATAAGCAGGGTATGCAGTGGTTCATGTAGGATGCGAATACTCACGGCATTGTTAGCCCACGAATCGGTT
TGTTCCACCTGAATGGCATGTTCGTTGGCAAATGAACGGCAGTTAATGCCAGTGCATTAGTGCGGATTAGGGCGGTATTCATCCGGGTTGAACTACGGTTAGAATCGCTATGGC
AAGGAATGCATGGACAAGATCTGAGCACGACTGTACAAGTCTGACGGTCATAACGAGCCTATCAGTTCGTGCGTTACAGTTCCTAGGTTTCCTTTGCGTTTCTGCTTGGGATAG
TGCAAGTGCTCCCTTACGATGAGTAGCAACAGCATCTTTGTCGCTTAAACTGCGGTTGTTGCGGTCAATGAGGAGCCATCTGACTGATTGGCTCGACTAGATCAGGACGTCTAA
GGCAACCCATCCTTCGCTCAATTTCCCTGGTACAGCTCACGAAGCTGACCTTTCAAGGAGCAAGACTCGATCGTTACGAAGTGGTAAGGTACCACCTTAATGGCCTGGATTGAA
GCAGTTCAGGGTTCCTTCCCTAGCTTTCGGTTCGTCAAACCCGATAAGTAGCGGAGTCTACGATAGGGTGGTTATGTCCACCAGACATCAGCACGCTTTCAGGCAAGTACGGTC
CATTGGAC
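
The GC-content restriction stated in the header of this sequence (40 to 60% in sliding windows of 10 bases) can be checked on any listed sequence with a short script. The following Python sketch is illustrative only and is not part of GENESUS. The function name `gc_window_violations` and its parameters are hypothetical, and it assumes the bounds are inclusive and that the wrapped lines of a listing are joined into one continuous sequence before checking.

```python
def gc_window_violations(seq, window=10, min_gc=4, max_gc=6):
    """Return start positions (0-based) of windows whose G+C count
    falls outside [min_gc, max_gc].

    Assumption: the 40-60% restriction is inclusive, i.e. 4 to 6 G/C
    bases in every window of 10 bases.
    """
    # Join wrapped lines and normalise case before scanning.
    seq = "".join(seq.split()).upper()
    violations = []
    for start in range(len(seq) - window + 1):
        gc = sum(1 for base in seq[start:start + window] if base in "GC")
        if gc < min_gc or gc > max_gc:
            violations.append(start)
    return violations


if __name__ == "__main__":
    # Short synthetic example, not taken from the listings.
    print(gc_window_violations("ACGTACGTAC"))
```

An empty list means every window satisfies the restriction. For the sequences generated with a 0 to 100% restriction, the check is trivially satisfied with `min_gc=0, max_gc=10`.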
r = 7 (with the restriction of GC content of 0 to 100% at sliding windows of 10 bases)
17,999 bases
ATCGGGACTTTCCCAATAGTCTAACCAATACCCTCCCACGGGAACGTGCTCGTGCTGTTTATTACATCCCGTTAAACAGACTGAACATGTAGATTAGCGGAGTGCACGTCGCTG
ACCTAGCCAAAGATACTGCCTCCAACCCGGGCTGGGCTTACCACCTGCCTCACTAGACGAGCGAACTATCTGGGAGTAATGACTGTTGGAGTCATAAAGCATTGTTCTTTGCCC
ATACTCCAGGAATTCTTCCTTCGCCACTAGTCGCTGTATTACGAAACGTTCATAGTGATTCACCATGACGATGGGCTCCTCGAGCAGTACCTCATTTGGTTTAAATCATCCGTT
AACATTAGGTGACTAGTTGGAGCTACAGCGGCCGAAACCATAGGAGTTGGCAGGTTCGGCGTCACTGACCATCGCATCAATGAGTGACAGATAAACGATAACCATGTCGAGCGA
TTTAGCACCCGTGCGGAAGGTGCAAACAATGAGGAAACTTTCTATGCCAGACATGCTATTAGCTGTCCCTCATTCAGCTGCCAGCGAAAGACCATCTTTGCTCAAGAATCAAGC
GTGCGGGCTCCAGATCAATGGACTCGTCCGATGTTTGTTATCTGAACCTCACGTGCTGACCAGGTAACGACGTATTACAACTCAGCAGGGATCGATACGATTGTTGCGTATTAG
CCGCTACTCATCTACAAACGTCTACATACAGCTTGAGGTAGCATAAGGCACCCGGAGTCTTTGCAAGGCCTCCGGATACTGGACTGTTTGGGTCTACGCCAGGCGTTCATCACC
GACATGTATGACGTGCGTTGTTGATGCTTCCAGGGATGTCCATCAGAAGGCGACGGTGATTAATTACTCGGCTGGCCATGGGCATACCACCAGCCCAAAGTATTACCAACCTAA
GCACTAATCGACATTTCTTTGGGAAGCCATGCGGCAGGAGCGAGGCCTATTCAATCTGCATTGTCGGCCAACCAACCGGATGTAGAACGAAGGATTGATTCGACCTTTAATTAT
CTTCGTACAACTATTGAGTTCACTATTAGATAATGTAATTGATCCACTGGGATGGATCTTCAGGGCTGTTATTTGTAGCTATCTGCAGCGGGCCGTCCCTAGATGTTACCGTGA
AAGAACTCGCTATGGACGAATTAGCGTCCTATCACTGAACCAACGCCTATTTACAATAGGACCCGAGCAAATACGACTGCCACTTTACTGACCTGGGCAACGGTTACAACCCAG
ATCGTGAACCTGAAAGTAATTGGCATTTATCGGAACTGTTCACGTAGGACAACCCTACATCGTTTCCTGGCGACAGACGGTCGAGTGATTACTGCCGTGCATCTTACCCGCCTG
TCGCATTATTGAGCTCGTGGTCTAAACTTTAGGCTATCAATCGCCCTAGTTTCATCGGTACCTCGTGCCAAAGTTGAATCAAATCTAACAATACCGGACAAACTCCTCGTGGGT
CGCCTGAAATTTACTCGCCTGCCGTCGGATAATAAAGGCGGAACATCACGTCGTCCACCCAGTTTCCCTCACCATCACTTATGGCGAACATTTGGGTGGTCTGCTCAGAAATGG
GAGTTTCCACCTGGTTGCCCTTACTACCTATTTCCTTTCGGGTTCTGATTATTAGCGAATAATGGGAACAGTGGTGACCGCTGCCCATTCTAGACGTGGATGGGCACTACAAAG
AAGTGGCGTAATAGCCGGTGAACTATGATGGCTCGGTTTACTTTGCTGTTGTCGGAGCATTGATACCGCCCTGCCCACTTTGTCTTACAGCATTGGCGAATAGATTTACTGCTA
GGAAGTCCAGGTTTAGTGCGGATCATTTACTAGATGCGGCGATAGTCCTTCGGGACATTCTGTCCCAGCACCGCCTTCCAAATTATTGGGCCAGAACCTCGGGTATCACTAGGG
TACGTGAGTAACATGTCAGACGACTGAGTGGGCTTGTATTCACCTGACTTATGACCACTGGCAATACATGCACCCATGGTGCGTTAGGGTCGATGCAACTTAGGTCCCTTTACT
CCGATGGCCTTATCCGCAACGATCATTGGACCCGGCTACCTTTGGTGGAGCTCACTGGTTGAGCAGCTCATTGGTTGCACTACGAACCACGATCCACGATGCGAAACGAAATCG
TGGGCGATAATAGAAGTGATTGGGCTATGGTGCAGAAGGGCTAAACTGCAGAATAAGCATGGAAATCAACAGTAGGGACCGTGACAGCCTGAATGCGGAAATACGTGCATTAGG
CCATGATAAGTTTCAGAAACCAACAGCGAATGGGATAGTGCCAATACTGACAGTTTCGGAATTCGGAAGGGTTAATAGCGTACAGCCGTGAATAATCCCATACCTCACTCGCAA
AGTTATTAACGGCCTATCTGACGTGGTAGTGAAGACTTCATAGAACAGCAGTGGGAGTCTGCTGGGCGTGCGTCCCTGCGGGAACTCAGGCCCGGTATGATTCGTTCGGCATAG
TGGTTGACATAACGGCTGAGTGCGATTGCTTCTGGCGTGGCCTTTACGTGCTTGCTGTACCATGGATCCTATCGGTTGGGACGGGCAATTCCCTTACCCTGTCGTGGAAAGGGA
TAAGTACGCACCTCCTTCCCGAGTTAGAAGCACTCGATGATAATACCCATACAGGCTCGAGTGGACGCCTACTCAAGGTCCAGGCGATTATGTCGGGCCAATCAAGACGATCTT
AAGGAGCTCGATCACTGCCAGATACGGGTTCGAAGTTTGATGAAATGCCCTACTATTAAGTTGATAATTGTCTTCGGCGGAGTCCAGAAGACGTCGTATGCACGACCCGTGGCA
TTCCGGCCCTAAGTCAAGGGCTGGCGGGCGAGCGACGGAGTACAAGTGCGGCATTCTACTGATTCAGCACTATGTTCGGGCTCGTCCAACCACGCAGCAGTCTGGCGGTGATCT
GCGGGCCCATGCTGCATCAAGGGTGCAGCGTACGTTGTAAACGAAGTACATGGTAATCGCAAGACCTAAACCCAAAGCCGGTCTGTTGAGTGCATTGGTCTGAAGACGAGTAAC
CAGGCGGCTCCGTGGCTTCACCTCCCAAGAAGAAGGGTCGGGCTATTGACTTGGCCGGTATCGAGCGGTATTCCGGATGCCCAGCGGGTGACTGATCTGTAATTCTGCTGAAGC
GGGCGTGGGTTACATCTACTTACGCCCACTGTAAGGGTATCTGCCAGTCTGCCATCCTAATAACGAACCTTCCTTTGCCTAATCAACCCTTTGCGTTAATGAAATTAAGGGCGG
GCCAGGTCGGTTGTCGTCTTAAGCCAAGACAACAAGACTCGCCATCTGGTAGGATCACCAGCTCAAGCGGAGCAAACTATTCATCGAAACCCAGGCAGTGCTGCGGCGTAAAGC
CCTTTATGGCTTGGTGGGTTTGACGTTTAAGTCGCCACGTAACTTCATTGCGGTAGATGGTTGCTGGTATGTCGTCGAGTTGATTGCCCAAGTTGACTTCGGAAATCCAGGTGA
GTCAGCCATCCCTCAAGTCGTACAGTCTTCTTACCGGGTCAACGTAGCCTGGATAGTATTGCAGTTATCTTAGACTTATTACTTCAATGGCGACGCCCATTTAGTATTCTGTTC
CACGCCGCCTCAGAAGATCGAAGATGTATCCACGCTTTGAATCCTGGCCGGCTAGTCGTTCTTAAAGCGGTAGTCCATCGAACCTCCGCTCAGTCCGAGTTGCTCATCGACGTA
GGGATAGGAGGGCTAGACTCCATCGCCCGTAGTACATACTTCAGGATCCGCTTCTACGACCACGTAGTCCGCCAATAGGCATCGCACCCTAGACTGTCAAGGATAGCATCCTAT
GCCCGGGATGCGGAGCGAAGAAGTAAGCATCCCAAATCAGCCGAAGGGAAATGCGATAAACTCAAACCGTGATCGGGCTTCAAGCAATACCAGGGTTTCTTCTGCTCCTCAATA
CTTACTCGAGGGACCATTCGTCGCTTCTGAAGGAATAGGTTGAACTGTCCTGATAAATTGGCCTGAAGCTCAGTTAGGGCTATCCGCCGGTAGCGGTGGGTCTTGACTAGATCT
TGGCCTAGCATCGTCTAATTGCATTACTGGAGTGAGTTCTACGCATTAACAGCATGAACCCACTCCTCATCTGTCTGGAACGCACTCCTGTTGCCGATTGCATCTACGAATAAC
CGTGCTGCCATTCTTGGCACCATCTGACAGCGTGCCGTACAATCATCTGAGGGTCCACTTACGTGACGCTCAAGGCGGCACGTTGCCGGCTTGTCCTCGACCTACTGGCACCCT
GCGAGGCTCGTGACTCCCGCTGAGGGCTTAACGGTACCATTTAATAGGCTCAGCGTGAGGAGGCCGCTATCACGGATTTCGGCGACTACTATGCGAGGGCTGCGAAAGCCCGCC
GCCCTCCCGTATTCTTCATACCCACCCGATTCCTGAACTTCGGCAGTCAGTCCACTACAGCAGCAGGCTGGGTAAGCGGGTATGGCGGCTTGATACTCGTGATTGTCACCGATC
CGACCATCAACCTGATAGATGAAGTACTAAGACGCCTTCATAATCACGCAGGGCCGCTCGTTCATGTATCGAAATTACAAACCTTAATGGAGTAAGATTGGGAGCCTTCAAATA
ATTACGGTTGACCGCCTGACATCCAGCCATACGGCATTATGGATGATGTCTGTCTTTCATCTGGGCGAAAGGTAAGACTTGGAGTACTGCAACCGAGGCGTGCCTTGGCGTCAT
TTGACCGGAATCGCATACGTGGACGGAAGATTGACTGCGTTACAAATCTGCCGAGGGCGTCCTGCTCACTTGCTGAGCATTTAGCCTAGGTACTAGCGTTGAGGTGACCTTTCC
GTGGTGCGACTATGTATTTCAGTTTCACCTATGGCACCGCTGGTGCGGCTGGTTGGTCACGCAACGCAACCTCCAAGATTCGCTCAGACCTGTCGAACCAGAACAGTAACTCAT
TCCATAAATCAATTTCGTGGGATGACGGAACTCGGCCCTTGTAAGTACTGACGCCCGGCATCACTAAATCACTGGAGGGCCACGATAGACGTCCGCTTAAGGCTAGCGTCAGCA
TAGCAGGTGACGCAGTGAAACGTCCTCAAATCAAGGTAGATCGCCGACCCGACTATTTGTTCAACGGATTAGGTCGCTGCGTTGATTCCCTGACGTCTGCAGGAGGACTAATTA
CCGTCTACCCAGGGTGAAGGCCCTTAGATGGCCGGGAGTAGACTCGATTCTAACGCTCGGCATGGAGTTGCGTCCGGTCGTGATCTTGCTACGCTTCGCTGCCACCGGACTAGG
CCCATCTTGTCCCTATGAAAGTCGACGGGACAACATGACAAGCTGCGGGTCCGTAACTAGTGCACCACTAGGCGATGAACCTTGACCCTAGCCGCATTTGCTTCACTTGGGAAC
TGGCTCGCATGGTATCTAAAGTCCGCACCTAGTATTAGTGGATAGGCAGCGTCAAGTATCGCTGGATACGTGATTCCATGAGCGTTCCGTTTAACCAACTAACCTATCTACGGG
TCAGCCGCCGGGCTACAGGACAAATGTCCAAATGGAGCATGGTGGTTCTTATTAGACTCAAGCCTTCTAAGAAAGCCATCTTAGGTGCTCCGTCAAATAGAAATGCTTGCGTCT
GAACGATTGGGTATCGTTCTGACCCAATTTCCACTTTCGTGCTTCCGGCTCCTGGTCCACCTTTGTTGCATTTGATACGAAAGAATAACATTTACCGTTCATTACCGACCGTAG
TTACATAAAGAAAGGAGCACTAGGTTTACGAGGGAATGTCAGTCTTATCAAGGCCAACAGGTCCCGAGGGTGACGGACCTCCTGGCTGGCTTGAATTACTAAACTCCGCAGACA
TAAACTAGTATGGCAGGGTCTGACGGATACGACAGTGGAACGGAAATGGAATTTACGGGCCCTCGTTGCCTAGATTTCGAATTACCCGGGACAGTATTCGTGGTGATACGGCCG
AGGTCCATCCGATGATCGACTTGTCCGTTTGGGAGTGCTAGATTCCTTGGCTCAAACGCATCACGCCGGCCCAAGGTCGCAATAGACGCATTCCACCCTAATCGTGCCCAGTTG
GTACCTGGGTGCTGCTCGTCAACCTCGAATCCAGATGTAGTCAAACGTGATACATACGTTGGTGCATTCTTTGACAAAGATCGGACCCATGTCATTGAGCCTGGGACTCAGACA
AATACCTCGCTGGGAGGCGTCGAGCACTTATTCAGCGGTAACCACCCGTACCGGCACTGTTGCAGCTGTCAACTCACCCGGCGTATTGGATAGACAAACAGATGCTACGAGTTG
TCCAACGCTCCGGCAGCTCGTCGGTCACGGGCGGCTATGTCCGACTACGGACAGCAATCTGATCGTGCGTGAACGGCGACCTGCTCGCCAAACATTGCCGTTCGCAGGCAACTG
ATCATAGACATGAATGTTCATAAGACCCGCCCGAGGCCCAAATTTCCGGTAACTCCCTTTGTAAAGGAATTAACTGGGAGCTGCGATTTCTGGAAACGTGGTGGCGTCTAACGT
CCCTTCTAGATAGAATAAATGGCGGAGGAGTACGTTCGGTGGTCCTGCAATTCTTTACGGTACAACATCGCAGTTACGATCCCGTCCATCTTCTAGTCCCGCCCAGTCATAGCA
ACTTCACTCGGATGAGTCTGACCTTAGAAAGGCCCGAGTGCGGGTAGAAGACATACTCGCACTTTATCCGTTTCGGTAGTTAGGATCGTCGTGGCTATCGTATGAACTTAAATA
GTAAGTGAGTGGTTAGTAGGCCTGTACTGGGCCTTGGTGAAGATTTCATGCGACCGTTAGGTTCAAACAGCTATTTCTGCCAATCGCTTCGTCGACATAGTTAGTTAAGCTGAC
TTCTGTCAAATCGGCTACTGTCTGTACATTCCAACGTCAGCTGGGTGAAAGTGCATAATTAGGGAGTACCTTCGATGCCATAATCCTCCCATGAAACCTGCGTTCGGATACCTT
TACCTACAAGTTGTTACTGGCTGCCCTCCGCAAAGCGGCACCCAACGACGGGCGTTTCGAAGGCCTGACGGTAGCACGGAGGAAGTTATGGGTTCCAGCGGACAGACCCAGTAT
GTTATGAATGCCTGACCTCAGGCTCCGGACTTACAGACGCCGCTCAACGAGTAGTGAGCCATTTCGCACGGATAGTCGATAATTTCTGTTTCATACAATGCGAGTCAACATCTG
TATGATAGCAAGGCTTCAGATCATTAATCCCGGAGCAGGCACGGACTGCTTCAACAGCCAGATTTATTGGCTGAGCTGATGGGATCATACTAAGCATTATCCCACTCGCCCAAA
CTTGTCACTTTAAATGTCAAGTGGCTAGCTGAGGACTGCAAGTATGCGTTATGCCCTTTGGATGGCGTCCCGTAACGAAGACTCAGCGAGTAGCTAACCAGCACTTCCGTTGGT
AGGGAGGGATACAAGGCTGATACTAACGGCAGAATGCGAACCGACTGAAACCGTCGTCGCCGCTGGCGATTCAAGACATTCGTAAAGTCATCCTCAGACGGGCCTCAATGCCGG
CGGTAGCCTCATACGGACGATCACCTCGACTAATGTCGTACTTGCCGAAAGTATCATGCGGACAATGTAAAGACCCTCGTCGAATGTATGGTGGCACGCACTGGTCCTTGGAAG
TAGTTCTGTAAATTAATGGTAAACCCGCTCGCTGTCTTAGATAGCGTCGCTAGTCAGACAGGTACGCCGTATTTAGTTGTTTAGGAAGCTGAACGCTGGCCTGGAGCAATCAGA
CTGTTAGACATTAAACAAGGCCGTCGCATAAACAAAGACATCGTGACATCAAATACAGGATACCAGACTTTATTTACCCTTGCTGCTTAAGTACCCGAATTGAGGGTAGATTCT
ACCACCGCCCAATTCTAGTTCAGTTCTAAGCCCGGATGAACTGCACCTACCCTGATACCTCCTCCCTTGGCAGCCTTACCTAAGGTTGGCGTACTCACCGGTACATCCAATTGC
GGCACTAGAAATTCTTAGGGTTTGGCATGAGTAAGGATTTAACCTGCTTTAATAATAACAATTTGTCATCATCAGGCACCTGTTATGGACAGCGGTTTGTTGGCAAGTCGGCTC
GTTGTTTCACGAGGCGGGAATAGACCGCATAAGCCGTCAATCAGGGCCAAATTCGCAGACTATGCTAGAAGCCAACTATGCATCTGCGACCCGGACAGGGCACCTATTGATGGA
AAGACGTTAAGTTTATTCAAGGCATCAAAGTCACCGGGAGCGACCCTCCTTGATCCTTTAGCCGTCTATGGGTGGGCGGCAACCTGGTGAACAAGCATGTTCGTTCAAAGATTG
TTAAATTCAACAAATGGTTCCTCAGGTCCTGTTAGGCCTTTGCTTAGATCACTACATTTAGAATGTTACGGCTCGGAACGACATCCGTAATCAAAGGATGTTATCCTCGATAGC
AGTCACGCCAATTGTACTAATCTGCTGTCGGGTTTGCTGATAGTACAGAATCAGCATTCGGCCACTAACAAATCATGGACTTATCTACCAAGGTAAACTTCTTGCGTGCATGAA
GTGACTCGGCGGGTTGTTAACCTTTAGTAGTAGGTGGAGGGTTTAAGACATGGCTGTAATGATTGGTTCTGCGTTTAAAGGGTCAGTCAACGCCCAAGCGGGATAAATCCAAAT
AAATTTGTTTGATCGTAGTCATGGCGATACCTACGGTTTATCAATAACCAGTCACTCGTCGCAAATGAATACCCGGTTAGTGCAGCAATAAGCGGCTCGATGGCTAGTGCGACG
CAATGGAAACAAGCTAACGGGTAAAGTATGAATCCGGCCGCAGGACTCCTTGGTCAATGTAGCTTTACTTGTATGTCCCTTATTCCCTACAATCCTGCGTAGGCCCTCATCCCT
TAACATCAATCTATCGATTCCAGGAGTTCCTCCCGAACCGGAGCTTCGTTGGTCTTCGACATCTATGGAAGCCGGACCGTCCGTTAGATCTGAGCCGATCGTTTGTCAGGTATG
CGGATTGAAACATAAGTATCTGATGTAGCCTTGACGTGAAATTGCTGGGTTAACGAAAGGCGAAGCAACTATCCTAGAATCCGTAGGATGTCTACGTACAGATACAGCACTCAT
CAGTCTACCGATAATGGCGTATGTTTAAGCGTGAAGTAGCGAACAGCCGGCGATGTCGGATTGCAAGTGCTACTACAACAGAACATACCGCAGGTCCGGGAGGTCACTTAGAAT
TTAGCTTACTAATGACCCATTCATTCGGATCAGCAGCCTATCGCTCCAAGGCGTAACAAGTCCGTACATGCCTTCGGTCTACCTAAAGGCACTGGCTAGAAAGCGAATTCTGGG
ATACGCAGCCGGGCCATTCGGTGCGTCATCAGCACCAGTGGGTCCATACGTACTGGTCTTGGTGCTTAACCACGGCAAAGTCTTGGGCTCGGCTTAGTTGCGAATGTTGAGGCA
CGAAGTCTAGATCATGCTTCTTTCCTTGAAATCGAAGTAACAGAAGCCCAACTACAATACAAAGCTTGGCGGCAGTGGAGGATCCCTCGGTGGGATCTTTATTGCACCGCATTG
AATGCTCGTTACTTACCTGCCCTTCAGTTAAACTGACGGGCTGATGCTAATTAACAAGCGATGGCACTCACCAAACGTATTCGAACGACGACCGATTTACCAAGTCGCAGGAGT
AACGGTGGTTTGTCGATGTACAATGTCTGAATCCCGCCACGGCCTCGATCCATTTACATACCCTGCTTACATTCTTCTAATCGGGCCGCACGAGTGGTGGATCTGTCGGCTTTA
ACATGAGCTACGGCTGGGATCGGGTTATGAGCTTGCGAACTTAGACAGGAGCTGGACGGCTCCCTCAGGGTTGCGGGCGGAGCTAAACCGGCGAGTTTACATGACCGTATGTCA
CTATCTATGCGGGCATTAGACAACCACCTAGAACCATACTTAACCCGCATCATCTTGCACGAATCGTGAGCACGGCCGGCAGGTCATGGGCGGACCTAGATCGGCGAGCTACCG
ACTCCTAAATGACTAGGAACAATTCCACGTTGTCCGATAGTGGAGCGTAGTGGCATGGTCACCGTTTGAATTGGCTCCACGGACCAACATACATAACAACAGGGCTCCCGAGCC
TACTTAAAGGACAATCTAGTTGCCGTAATAATTCTATGTTTGCTTTCCTCCTTAGACGTAGCTTCAATCGGTGGCGAGGATGTATGCTCACCTTTATCATTATTATGCTTCGGT
AAAGATGCGACTCCAGAATGGAGGACAGATTCAGTTGTATGCCAAGCGAGGGTAATCCTTTCAATTAACGTAGATCAGCTGTTGCGGTTCAACTACCCATGATCGCTACGCCCG
AAATTCGTGATAAGCGTCCCAGTCCTATGTAGTTTCTGCGGCTCACGTTCTAACCTGTATCGCAGAATCTTTCTACGGCCCACTATTCTGAATAGAATGACCAAACTCGAACTC
ATCGCTAACGTTGAGTACTATGGATAGCGACCAAGATGTTCGACTGCAGGCGGACATCACTCACGGTGAGGTGCATAGACCTATCCTCACCCAATGTATCTATTGAATTCGTTA
GGCTTATCCTGCACGTGGAGTAGTCCCTCCAATTTGGTATGAGTCGTGCTCCCGCCGGACGACCTGACCACGCCTAAATCTTATGGAGGTGGATGAGCCACTGCATCGTTAGCC
CAGTGCTAATCATGCCGCTAACTTGCTAACAGCTTCTGCCTTGTCCACTGACTTTCTTCGTGGCAAGGACATAAGGAACTGAGTCGGGCGACAAAGGAGGCGATCCGCCCTATC
ATCATTCGATGTTCTAAACGTACATAGCATTCCTTAGGCGACTGGAAGGATACTATTGTCCCAATGACTTCAGTGGTTCCATGGGAAGTGGATGTTCACTGTCACCCACCTTCA
GGTCTAGGCGTACGATACGTCAATGCATCCCTAGGCGGTTCGTCGGGAACAACCATTGGCCATCACCCTATGCGTCCTCGCAACATAATGGTGGGACGACAGCTCACCACCGGG
TGGCGGCGAGGGAGTGATAATCGATCTAAATTGCCGAGTGGCCTATGGGAGCGGTTGTAGGAATACGCTACATGAGTTTGCCGGAACCGCTAGACGCTGGAGTTCGTTGATCAG
ACGTGACCTGTACCGAGCTTTAGGGACTGATTAACTAGGCTGTAGAAGTCGTTGCTTTAGAAATACCATCCGGCATACGGGCACTCCAATCAGTTATTGCCCTATTGCTATTCA
CTTTCTGTATTGGGTTAGTCTTTCTTGCCTATGTTAATACCGTGGGCATGTTGCTCCCACCTGTAGTATTTCGGTGAATGTAGGTTCTTCCATGTAAGGAAGCGTTTACGCTTA
CAACAATTAGCTTCTATCAAAGACGAAACTCAGATTATGCCATGAACAATAAGTAGCCCACGTTCGACCAGACGAACAGACGAGGCTGGCACGGCTCATTCGTTTGATTGTTCG
AGCGTGACGTAAATCAGATGCCTCCCGCACCCGAGTACATTTGCATCCGCCTCCGTCCCGGTCACTAACTCCTTATCTTTAGGATTAGTCTAGTTTGAAGTCGAAACAGGTATT
GAAGGGATCTAATCCCTATCTTGTTAGGAATCATCACTGTTTAGCGACTCGAAACTATGGGCCATCCGCAGCCATTAAATACTCACTGAGCCCTATGTCATAATTGAGCATCTA
TTAATGCTACAACGGACCGATACCAACAACTCCCGTCACTCCTAGACATAGACTGCCTAAGTGCGTCGGGTAACTATCGTCTGCCTAGCCCGTGGTAACCGGAAGGACCTTCGC
ACCGGTTTGCGAAGCCTAGCGTAGATTGGATCACGTGGGACATCGAGGGCACGTCTTACGCTAACATAACCGAATTTATGCCTCAGGAGCAAGCGACAACTAGGAGCTTGGATA
CTCAATTTACCTGTAAACCATGCAACCCAAGGATCGGAAGTATCCCGTGCAACATTTCGACGCTGAACTCCGTGACTACTTTGTACTTAGTAACCCGTGATTTGCGTAGCTGGG
CCGCCTACCTGAATACTCGGTGAGCGGTCGAAGCTTATCGCTAGGTAGCTGTACAAATTCTGACTTGCTTGCATCATTAGCAGTTAAGGCAATTATTCCAGCCTTCCCAGACTA
AGCTCAACTCCGAGTAGGTTTCTATCGCAATCATGACAGACTACTGGAACAAACTTACGGCCTGGTGCAACGGAGCTGTTCCTCGGCGTTTATGGTAGCAATCCCTGTTAACTC
CACCGCAAGTAGAACAACATTGTTACCAGCGACATGATCCAGCATACTGCAGCTTATTAAGCGATCTGATAACAGTATGAGCAGTCGGAAGACGCTTGCGGCGGCATAAGTTAG
GTCATCAACTGAGGTAACAGCGGATACAGACCAGGGACTTAGTTACTTTATGCTCCTACGAAATAGTGCATCACCAACCCTCGACAACGCTTAGACCTTGTTGAGCCGGTTCTA
CCTGCTGGAAAGCTCACGGGAGCTTATGTACCACTGCGGTGACGAGGACTCATGGCTCCGATACTGTATCAGTACAACCTGCAGCAGCGATTAGCTAGACCATGTTCTTCGGGT
GAACGCATGTTTAGTTCTTAGTTCATGACCAGTCCAAACAAGTTCACGGAATGATCACCCATTGCCTCCTAATCTTTAAGCCGCAACTGTAGCAAACGCTAAGCCACTCAATGT
CCTCATAGATTAACGAGTTTCTAACATCGTAATCCATTGTCTACAACTTCGCCGAGCTAGTCTACAGTAATTTAAGCTCGTTTAGGCATTGCAGCCGACGTCCATGCATTTCGT
CGTAGGCTGAGGCCACCGTGCAGCCCGCTAAAGCGAGCACCCTCATGGTCCATTCGAAGAATCTGAATTTGAGCGACTTTAATCTGGAATTAGATGACGAAGTGGTTTCCAAAC
GCCTCAAAGGCTTAATCCTGATGGTACTATTCGTTGTATTTATCTATCCGGCGAAACTTGAGTGAGCGTGCAGACCTCGTCCTCCGAAGGTTGAGGGCGACTCACTAATTCTTG
CTTTGCATGTTAGGGAACCCGGTCTAGTCGACCGGCCCGGAAGTCACGTTGGGCCCTGTACGGGATTTCTTGGACAGGACGTGGCACTTCGGTTGATCTATCTGTTAGTTCGAC
GGATCCAACTGCCTTAGTGCCTGTAGGAGGTTTAATCATTTCCAAGCCTATGACGCTTTACCAGGTCCACGGTCAGCTAACAACGGCGGTCCGAGCGGAACGTAGTGACGACTA
CAAGACAGGGAAGGGCCATAGCACTACCCTTTCATGATCAATACGGATCTATGTAAGATCAGGTGCACGGAACCAAGAACGGGTTGCCAGAATCGTCACGAAGGTGATGTAAGT
GCAATTGACAGGATTTGTCGGTCTGGACGCAGGCTTTGAGCTATGCCTTTCAGCCCTAGGAAATCGGACTAAGTGGCCAAAGGACCCAGCGTAATTTGCGGTCGTTGACAACCT
CAGCATGTAGTAACTTAAGAAGCGTCGGTATTTACGCAATTACGACAAGGAGCCAGCTGCTAGACAGAAACAAACGGGTGATTCGGAGTCAGGAGTCAAAGTACAGTTATGCAT
GCGTCAACAGGCAATCTAAGTCTTTATCTTGAACTACGAGCTAACTAAAGAACCCAGTCTTGAGGAATGTTTATCCATGCGTGGACAGTCGCTTTGGAATTCACGCCTGTAACA
AAGTCCAACTTGACCTATTATTGCGGGTGCAAGCCCTACGAGGATCATCCTGTTTACCGGCTGGACTAAAGTAGGGTCCTCGGACGGATGGGTTATTTCTTATCCAACAATGGT
AGACAGTTGGCGGGTGGATAATCCGGGTTTCCCATAAGCAATGTCGCCGGGATAGCTCCTCCAAAGTCGGTGCTTTCACTAAGCGAAGGCTAACCGACAGTCCCGTTGCTCGCA
GTACATCAACGTCCGGGCGGTCTTCAAAGGTTTCCTTAACGACCAGTTTGAGGTTTCGCTTGACCATTGACTCAGGGCGTTACCCTAAACTATCAAGTCACTAGCGGCGGTTGC
CTTGCGTTTCTACCGTTTACCCAACCTGTTCGAAAGATGTCCTAATTCACTAGTAGGACTCGGTTACTGCACTGCGACATCAGTTCAGAACTTGGAACCGTATTATTTCGAGGC
TTTCCGGGACTCCCTGAGCAATTCGTATTAATCGGAGGGAACTAGGTCGGCCCGTGCTTTACAGTAGATACTAGCTACGAAGGGTACCACGCTGGTCTATCATGTACCCTGGAT
GACATCGGCCTTCTTAGCCGGATCGACCACCACCTCGCCTTGCCCTCACTTCGCAGTCGCAACCGTCTTCGCCCGGTCGCCAAGTAACCTACCACTCCACTTGGTTTGGTAAGG
ACCGCTAAGTCGGGACGTCAACTAGCTTGCCAGGCCCACCGGTCTTTGGTACAAGCTTAATTCAGGAGGGAGGCCCTAGAAACATTAAGTGCCCTTGGACCTATGCAGAAAGAA
ACAGCCTCCGACGATGAGGCGAATGACGTAGAATTCAGTCCTTATTGAAATTCATCTTAATACATTTCAAAGTGCCTAGTAGCTGCCCGCCTATGCTCGAAAGGGCGAGGCATG
GCGTTGCTACTTGAGCCCAGGGAAAGATCAAGCTTCATCCCGATGTATTCTAGCGTGATCAGGGACCCAAGTCCACCGTACTGCCAAATCCGGTGATAACCCTCACGAAACCGC
AGCGGCAATAGCAGCCGCTGATGTTGCCACCCACGTACAAGGTACGTCTTGCAACCTTTCTGACGACGCATCGCTGAAATCCTAGCGGAATCCACGGCTACGACTAACTTTAAG
GCCGCCAGGTATCTTCAATTTGAATACGAGGCACTCGTATGGTCTGGGCCACCTCATCATGCATCGCCAACGTTCCATCCTGCCAGGGTCATGCCCTGACAGGCATGAATAACT
CCGGTAGGGCCAGTATTAAACGACAATTTAATCGTTGTCAGTTACAATGGCTAACTGCCGGTGGATTAACCAGATTCGGTATGGTAAGTCGTCGGCCGGGCAAACAGGGTTACG
GTGCTCAGTAGTACTGGCGGATTACAGTCAGCGTTATTCCGTTATTTAACTCAGTGGCGGAAGAAGATAGTCTTGCCGGGAAGACTACGCACGGCGATTTGGAGCAGCGGAAGC
GGCCAAACGACGCCAGTTTACTGTCGACTGGCGAAATCACCTGTCATCGCAACGTTTGCCATTGGATTCTTAACCGGCATGCGATCAGTGCCAGGTTACCTAGTCTTAACTCGC
AGCGTTGCCAAGTTTAATTGGACGGTGCCGTTGATAGGTAATGCTGGCCGTTTCCGGCGTTCGCCGTAGGCATGCTAGTCCTGGATCCCAATTGATGATCTTCGCATGTAACGG
CGTTAGCACGATGAAGGGTGGGAAGATGACCGCACCGTTGTCTGTTAAACCGCCATCAGTGAAGGTAACTTGAAAGATAGCACCACCCTCCGTAGACTTTCGACCCAGACAATC
GACTGTAGGTCCTAGGCCGCAAACTGCTTATCTGCTTGATGGGTACCGCTATTTGCCGACATTCAAGTCCTAAGGCGTTATCTAGGAATTGAATCTTAACGCCACGAGTCGAGT
AGATCCTCATGTCCGTGAGGCTTACGAAGTTAAGTGGGTAGGACCATCCATCACGATACAAACTAATGCGGGACGATACTACGCCTGAGCTAGATCCAAACTACTCAGAATGAT
AGTCAGTAGCAGCATCGGGTCTAACTGTTTCCGACCTCCGGGCAGTAGACGAATCTAAACCAGTTAGTACCCACGGATCAACATAGCGTTCGTCAATAAACGTTAAAGTACCAG
AAGGACAGGCCGGTGCATCGGAAAGTAGCTTGGGACTAACATTGAACCCTATTCTTATGCAACGTAACCTTATGGTGACGTCGAGGCGAGCCGGGACCCGTTGACGGGTCTGCG
TGCTATTTATGTTCCTTCCTCACTGTACCTTAGCCTGTTACAAGATCGCACGAACCCAACTGACCGATGCCGACCGAGGATTGGCTATGATAACGACAGAACCGTGGTCCGACG
CCGGTCAGGGTGGCTGGTCAAAGCTAAGTAGTATGCAGTCATGTCAATGAATCGTTATGATCGGTCTTAATGACAACTGTTACGAGCCACGTTACTGAGTATGACTTAAGGGTC
CGGAACGGGCAGGAACGCCAGCACGCCATAGAAGGTAGGTCTGGCCGCCGACGGGTAGGGTAGCCCTGCTAACTCAACCGGTCGGGAGTGGATTGGACTATTCCGAGGTGCCGA
ATACCTATGAGTTGGGATTCCACTCATAAGGGTTCCGACAGGAAATTCCGGGTAGCGACAGCCAACCGCACTGCTACCCAAGATACGTAATCTGTTGGGCAGATGACTCATTTA
TGAAATAAGTTCTAGCTTAGGGATCAAATTAGCCACCTATCGACGGTATCCATTACAAGCGTATGCTAAAGACAAGGGAACATTACGAGTACGAACTGGTGGCCGGACTGTATT
CCAAGACGGGAAGCTATTATGGGCTGCTTTCGGATGTTGGACCACTTATCGTGCAGTGACTATTACAGGAACTACCATGATGCCCGATAACTCACGACTGGTACGACGGAATCT
GCGTAACCAAGCGTTCTGCTAAACGGCTAGATACGCTGTTCAGCATCATACGTCCGTCGGTTTCCGCTTTCCCGCTACAAGGATTCCTCGTTAGTTTACGGAAGGCCAGGGAGG
TTAATGTACGCAGTATTTGGTCGTCAAGAACAGGCCATTTGCCAGCTATCGGGAAACGCTGTACTCATTAAAGAAGGTCAATCGTCGATCTGTTCATTTCGGACGTGCTAGGGC
AGTCCCATTGGGCACCGATGGATTTGAACTTTGAGTAATAGACTTGACTGGCTTTAGTGGTCTTACTGCGGCAAGCAGGATCAGGAAAGTTCGGAGCGGCTTCATTATGCGTGC
GAGGTTTGATAACTTAGTGGGTGCGTGGCAGGCATCCAAGGTTGTTTGTACCTCCGTTCGTAACGTCGGGCACCAATCTGTAGGGCGTGAGTCCGACCCAACAAGTAGTGGGAA
TTATGTAGGCACGTACTTCCTGCTACGGAACAAGTGGTAATTGTATCAACCAAAGCTGAGCGTAACTCGTCAGGAACCTGGGAACCGGTAATCTTGCGGACCGGTAGTACCGAA
CCCTGCACCGAGCATCCGACTCGGAAGCTCGACTGACGTACGGTCTAATCAGCGTAGGTTGCTTACCGCACGTTTACTACGTATTGCGTTCTTCAGTCAGATTTCCCGGTTGTT
GTTGCTTGGTCTACTTCTAGGCTCCTATTTGGCGGTCGCTTATGATTACCTTACTTAATTGCTAGTGGAAGCGAACGTTACAGTTCGGTCCCGCTCATAGTCCGTTCTATCCTG
AGCCAGGGCAGAACTGCGGACGAAAGCTTACCTCCACGAACTAGCAGGCGTCAATTAGTAGCGTGGGCCAACTTCTGGGCAGTTCCAGGCAAGTTCCTACGCCGATTACAATTT
CTTCCCTATTAGTTCCGCAGGCCTCATTAGATTATTTATTAATTCGGCTTCGAGTACCGCATGTCCACCATCGACTCCGTTACCGATTATTGTATTATGAGGTTACAGCTGGAT
TGACAAGGTGGCATCCTCCGGTCCTAACGTGAAGAAGCCGTTGGGAATACTTCGCTATTGTTTCGTTCCCACGCAATCCTACATGGGATGTTTACTCAGGT
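
Because GUSP builds sequences from non-redundant (r + 1)-mer seed pairs, one way to sanity-check a listed sequence is to look for any (r + 1)-mer that occurs more than once, either directly or as the complement of an earlier one. The Python sketch below is a hedged illustration and not the authors' code. The names `reverse_complement` and `repeated_kmers` are hypothetical, and it assumes that "complement" means the reverse complement and that wrapped lines are joined into one continuous sequence.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def repeated_kmers(seq, k):
    """Return (first_pos, later_pos, kmer) for each k-mer that repeats an
    earlier k-mer or matches the reverse complement of an earlier one.

    Assumption: uniqueness is judged over k = r + 1 bases, on both strands.
    """
    seq = "".join(seq.split()).upper()
    seen = {}      # k-mer -> position of its first occurrence
    clashes = []
    for pos in range(len(seq) - k + 1):
        kmer = seq[pos:pos + k]
        rc = reverse_complement(kmer)
        if kmer in seen:
            clashes.append((seen[kmer], pos, kmer))
        elif rc in seen:
            clashes.append((seen[rc], pos, kmer))
        else:
            seen[kmer] = pos
    return clashes


if __name__ == "__main__":
    # Short synthetic example, not taken from the listings.
    print(repeated_kmers("AACGGTT", 3))
```

For a sequence generated with r = 7, the call would use `k=8`. An empty result indicates that no 8-mer is reused on either strand under these assumptions.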
r = 7 (with the restriction of GC content of 40 to 60% at sliding windows of 10 bases)
13,402 bases
TTCAGTGGAGCTCAATGGTACTCGCAAATCAAGGCCGTATTGCTGGGTGATTATCGGCTAGATAACCGAAGTAGGCCTGTCTACTCGGCACTTTCGTGCTCCTATCTGTCCCTC
AGAACGTGATCGGTTGGGTTCGTGACGCATCATTACCGTCATCAAGACTGCGTGAGTGATCAACAGGTAACATGTCCCGTGAACATCTGGCAGCATCAAACGCTGTATGGCTGG
GTATTCTGCGGACTAGATTGCACGCCTTTAATGGCGGTTCATTCCGCCACTTTGAACGTGCACCATCCAACGACGAGGATTGTACAGGAAAGGTCATGGGTCTGACTACCTCAT
CGTCGTCGAAAGTTGCTCCCATGATCGGGCAATTCTACCGGACAGAAATGGCCAATCCTGATGATTCCAGGGCAAAGGACATCAGGCATACAATGGGACAAACTTCGTGCCATC
TTGTAGGGCTATGAAGTCCAATCCGAGTGGTTCCTCGAAACTGCTCCGTTAAACTGCGGAAGATAGCTGAACCTACGGATTGAGGCGGATTTAACCGCCGATTAGCTCCCGAAT
TTGGGTCTTACGTGAAGACTCGGCTTGAGTTGCTACGAGTTACGTACCGTTCGGTAGGCAAGGTGAGTTTAGGTGAGGACTCCACTATGGTACCCTGAGCAAAGATGCCATCCG
TAAACATGGGCTAAGATAGGCTTACAACGCCGAACATCAGTGATCTTGAGGGCTGAAATTGGCACTGTACCGCTGTAGACAACGATTCCTGAAGGCTCGAATTTCGGCCAGATT
TGCACCGTGATACCGTGCTTTCGGAGGCATTAAGCCCGCTTTATTCCGCAAGTGAATCGTAGTCTGAGGTTACCATTCCACGACGAACATAGACGGGTGAAGAAACGAGGGAGT
ACAATGCAGTTCTGCGACCTTATTCCAGCTGCATTAACCGTCCATGTCCAGGAGTGACCTATTGCCGCTGTTTCTGCAGTGGAATAGGGTCATAGGTCTAGATCCCAGGACAAT
CCGCAGGTAATACGACGGCATTTGGATGCGGATCAACGGGCATAATAGGGCGACATTCGAAGACTGTTCACCCTTCTATTGGCAGGTTGTCCCTGTCTGTTTCCGACCCATAGC
ACTGAAGATGTCGTAGAAACCTCAACGGTCTAGTTCACGACAACATCGCCCAAGTTTGAGCTACCTGAGCCATGCACTTAGAACCCGCATGAAGTACGTGGGTCATCTAGTGGA
TCGATTCGACATGGGAACAATGTCGATCTGAGGCTTTAAGTCCCTACAGATTCCCTTTGAGGGAGCATTCTGCTCAGTGACCCTGAATCTACCACGACATTTGCGTACGTATCC
TCACTCCATGTTCGCATAGTGCTGAAGCAGTTACGGACAGTATCTGAGCTATTGAGTGCTCCAGAATCGTCGGGATAGGACCGTTTCGTAGCCGTAGAAGTCGCCAGATCTTCT
GACCGTTCCTGATTGACCTCACCAGATCGTATGCTCAAGAACAGACGACCATGTCAATGCGACATACGACTAGGACGGTATTGTTGCGAGGTTCTACGTACTAGCAACAAGATC
GATGATCTAGCACTACCTAACAGTACCACCGTTGTTATGCGTCCCTAATGAGCAGTGGTTAGCATACCCGAATGCTTCCAGGACGTAGCTCCTCCTATTACGAGGAAGCACCAA
TAACCTCCAGCTACGACGACAGACAAACCAAGCTTAGACCTCGCTAGTTAGACCGGACTTCTGGGCTGATCTTTGGTGGACAGACCTCCTAGTGCAATGACCCGATGAGTGACA
GATCCACCGAGTTCGAGGGAACAGTGAAGCTTGTCCATCTAAGTCCTAGCTGGTCGTTAGGATTGCCATAGTTACCGGATTTGCTACCATTACGAAGCAGCATTTACCTGGAAA
CTTGCTCAATCACGCCAAGACTCCAATCTTGGACCTTACCATCCCTCACTTAACGGGCCAATTCGCTTCTGTCATGCCGATGGATCGCATTACCAAGCGATAGAAAGGCTTTGT
TACGGGACCATTCGGACATCCTCGGTCTACCTCGGATGATTTCGCACCGTAAGGTAGGATTTGGGAGGTGAACCATCAAGCATGCGTGAACGAAACCTGGGCTTTGCTTTCCTA
CTCAGACGCTGAAAGGATCTGGGTGCTATTCACTGCACTTTACCGCCGTTAATTCGCCGGATTAGTCACGACGTCAAATGAGCCGAACTAGCTTCAGGTTGCGTCTATCTAGGG
TTCGCAAGTATGTCCAAGGGTCTGGTTCCCACTTACTACGACCCTAGTAACGTGCTGCTAATCTGTCGCCTTGACAGGTTGACTCACGTGGTATGATCTGCGATGTCCACTACC
GTAAAGGTTGGGCTACCTTCCGGATCCATGGGTATCCATAGCCATGGACAGCGTAATACTGCCCTGATTTCTGGGACTTGCACGAGTAGGTTAGACGTCGGTTACGGTTGGCAA
GCAATGCGGTGCTAACAAGCCTTGCTTAGCGTTACTCGCTTGAAATCGCACGGAATGTAACCGTATCCACGGACAAGTCATGTCCGTTACCTCCCGATACGAGCTTCATCTTGG
GATGCTAAAGCGAACGACAATCTACGTGAGGGTTGCAATCGCATCTTAGTCCCTTAATCTGCGTCCTTGTTACTCCAGGAACTTCGCAACCGAAATGTAGCCGCATGTAACAGC
AGGGTAATCTAGCGGGTATCATCGTGGTTAACGGTCGCATAAGCCGCCTTAATAGCCGTGCTAAGACAGCACGATTTAGCTGGCTGTAAACGAGTGCGTAAATCGCTGGCATTG
CATCCATGACGCTCACTATTCGGTGCTGATTCAACCGAGTATGTACGCACCTGTTCGCTATTCGAACCCGATAGCTTTGGCTGCACTACTGTTGGGCAGACATAAGCTGCACGT
ATCTACAGGCGTAAAGTTCGCCGATCAACCACGCAGAAGAAAGCCCTAGCTTGTAGCGGGAACTAGGTTGCATAAGGTTCCATACCATGACATGCACGTGGATTACGCAGTGCA
ACCGTCTGTCAGAAACTCGGCCTAATTTCCGCCCTTATGGAAGGGCAACAAAGACGCCTCATGAAGACAGTACATGCAACAGTCGTAGGACTGCATCACGACTATCGGCACCTT
TCACGAAGACATCACCGGTTTCTTGGAGCTATCAGATGGGACGAGTATTGAGGGTATCGACTTCGAACTGATGCGGTATCTAACCCGTGAGTAATGGACCCGTTACTGTAGGCG
ACTATGAGGGAAGAAATGCGGAGTTTATCGCCGCAAATAACCGCATAAAGCTCCAGGTTAGCCGTCCAATTCCAGTGACTTTCCGTTAGCCAGCAGATTTCGTCACCTTATCAC
TCGCTATCTGGTAGAACTCGACCTGTCTTTACGTGGTTTCGATCCGTACAGACATCCGCACTCATGGGCAACTGTTACGTCCGCAAAGACTCACTCAGCGTTGAACATGCCCAG
ACTTGGCCTCACTAAAGGGAGTGCATTGAACCTCACTGCTAGCTCGTTAAAGCTGCCGAAACAGTCCGAACAGATGAGTCATGAACTGACTGTACGATAGGTAACCGCTAACAG
CCTCCATGATGCACCTAGGATAGACCTGCTATGCAGGGCATACTTCGTCATCTGCGGTCAATTGCGGCACTAATGTCGGATCCTCAACCTATGGGACTCCTTCTTTCAGGCCCT
AAGAAGCGTGCAGTTGCTGTTGGACGTATTTGGGCTGCTATCGTTTGCAGCTCAGTTATCCTGGCTTATCTTGCCGCATAGACCGCCAATTATCCCGCTCATAGATCGTGCCTG
AATAGGCTCATGTATGGACCGTCAAGGAAGCCTAGATTTGGCCCTACTTTCCCATACCGTAGGCTTTCGTTGCTTCGCTACCGTTTAGATGCGGGATTTCTACCCAGGTCTTAG
AAGGACAGGCATGTCGTCATTTCCAGCGAATGACATCGGCTTTAGAACGCTCCTAGACATGGCCAGTGATTGTCACGCTACTCCAACGGTTGTAATGGCTCCCAATAAGGGAGC
CTAAACCACCGACTTACGATGGACGTGCAAACCCGTACTGATGGCTTCTTCGTCTTAGGTTGGTGCAGTACGTCCCAGTAGCATCATCACCTAGACGTGGCATTCGTGATCCAT
TCCGGACATTTACGGCTTCGATGAAGAACGCCTGATAATGCGTGGGTATGGCACCTCATAACAGGGAGTCTGTCTGAGCGAGTTCCAACATCATCCTTGATGTACGGCCATCAT
TTGGCAACATGCATCCCGTAGACATCTGAACCGAATCAGGTACCCATACGTGGCGAATACATGCCGGATGTAACTCCGAGCAACAGACCAGCACTGTTTACCACGCCATACGAG
GTATTAGGGTGCGTTTAACAGCGTGGATCATTCCCTGGCTATCGCATACCGAGGCATAACGGGATCGTAAAGCCACGTTATGACGCCGTAAGATTGCCCTCATCTACGATCACT
TGGTTGAGTCCCTGAAATGGTGCTAGGTCTTTGATCGTCACTAGTTCTGGCTGAGTCACGTTGAAAGCCAAGGTTTCAGGTCTGATACAGACGTGCTTGTTGGTCTGGCAATTT
CGCTGTACATAGCCTTGGCAATACCCTGGATCTTACTGCATGAAAGCGGCTTGTATGGGCGAGTTATGGTGGTAACAACCTCCTGCTACTCGAGGCTTAGGAAGTCATCGTTCC
GGTACCTTAAGCGAGGGTAAGCCACCTAGTATCCTGTAGGTCTGGGTTCCAAGCCGTTAGACTCCTGTAAGCATCACCCTCAGTACAGCAGCGTACTTTCGCTAAGTAGTCCTC
GCATTTAGACGCCCAAAGAAAGGGTGGGTAACCCAGTTGGACCGAAGAACTCCGTACGATTTCCTTGGGACTACCATGGAACATCGTTGGGTAGTTCTAGGCCTCAAGTGCCTA
GTGGTCTGCTAAGTGACCGATTGTTTGCCTGCATTCAGGAGTTGAGGCTATCCGAGCGTATGGATTCGGATGCGAATGCGAAAGCGAGCCAAACATTCGGCGACAAATACCCGG
ACAAATTCCCAGGGTTTCCTCAGGTCACGATGGGATGTCGAGCGTTGTCCTCCTGTTTAGCACTCGCATCACTGTAGTCCATTGAAGTCGTTCATGGACCAGAAAGGTACAGGC
AGGTACTGAATGGTTCGGAACCCTTTACTGTCGTGCTTATCAGTGGCAACCTCATGTCGCTACTTGCATCGTTTAGGGCAGCTATGCTTCTGGCACCATAGCGGTGCATTCGAT
CACCTTTGCATGGGATCCTGTTCCTACAACATGGTTCCAGAAACCCGGTCTTTATGGGTGGGATTCATGCCAGTTTACCCGCCTATTGACTCCCATCCACTCGTCTAGCATTGA
GGAGGCAATTACCTCGACAGTGACGGACTCAGAATCACTCCCGTTGTATCACCATGGCACTCAGTGGGTTCTTATGGCTCGATAGTGCCCATCTACAACGTTCCACCACGAACG
ATCGCCAAACCTCCGAGTACATCGTGCATTTCGCCCTGTATTAGCGGCTGAATGAGGCAAAGCATCCTGCTCAACCGTGGAAGTTCACTGACGTTTCGACCTTGGATCTACATG
CTACCCTAGCCAACGTAGGTAAGCTCCCAGTTCCATTGGCCTATTTCACGGCTTAGTGCGGTCTTGAGTCAGCAATGGGCTTATTCTGGGTCCAAAGGCGAACGTTATCCAGAC
TGAGGTGAGCATGATACTCGAACGAGTCACTCGTGGAGTACGATCGACGAATCAAGTCGTGCACTAAACCTGCACGAACAAGTACCCGCAACTTAGTCGGCGTAAGAACGAGCA
TGCATTTGCAGTGCTTAACCCATGTACAAGCGTGGTCATCATCGGACCTACTGTAACCTGTACTCATGACGTCTGATTAGCGTCTTCCTACGGCAAGCTACCAAAGGTATGGTT
ACGTTGCGATCTTCCAGACCGAACAAAGTCGCTGCAAGATCTAGGTTCTGTATGCTTGTCAAGACAGATGTAGTGAGTAGTCATCCCTGTACGTGATAGACTCAGGCAGATTCT
TGGGCAAATGACCTCGTGAAGTCTGCGTATCCCTGCAACTAGCATCGACAAAGAACGGATTCGCTAGCAATTCGTTGGCCTTGCATTGGGTGACATTACCCAGTGAAACCCATT
CGACTTGTAGACGCTCCAATGTCAGCCATGAGGATTCCCGATGGTACATGATGGACCTCAATTCAGCGGAAGTAGACCTTCCCATTCTACTGGTCGCAACTCCATTGTCCACGA
TTAGTGACCAGACGTAAACCAGCCTAGTACAAGGGACATTCCTCAAGGTAACTCACTGAATCCTGGTCTAATGACGGCTGTAGTAGCGGCATCTAGAACGTAGCCATAGATAGC
CCGTTCTATTCCCTCCTTCAGTAACCGGCTACTTCACTCGAGCTTAATCCTCCCGTATGTTCTGGATTCTAGGGCGTACTATTGCGGTTGTCAATCGAACAGTTACCCGGTTTG
ATCCTCCTCCATTTCCGGCTAATGTTCCGGGTTACAAAGGCCACGATACGCTTGCAAACTGTCGGCAAGTCACGCCTACTGATCGACCTCGATGGAGCTGGATTATGGTCCTGC
TTACGTTACCGTTACAGGGTAGTCGTTGCAGTCGCAAAGCGACGCTTTGGAGTGGCTATTTCGGGTTTATGGCGGCTAAACTACGGACGTGATGGAGTCGCTTACATGGTGCGA
TAACGCTGCATCTTGATCCCTCCACTTCGGACGTCTTTCTGCTGCGAAGACCTTTGATGAGGACCTGATCATCCATCCGGTTATGAACCCATCATCTTCGCTGCTAGTCCAGCA
GTTCGGATTCCGTACTTGCTAGGGAATTCAGACCGTCTACTAAGCACCGGTATCCTACGCCTTGGAATACCTACCTGGTACGATGTCGGTTCGTCTACCGATGCGTACTCGTAC
CGAGTGCAAAGTGCGTGATCTGGACCCTTAGTACGGTCGTTGATTCACGTCTTGGGTAAAGGACCTCCAAGTCCGACGTTCAGACTCAATGTCTGACCTACGCAACGCATACAT
CAGCATACTCCGGATTGCTTCCCGAAGATCAAGGTGCGAGTTTGACGCAAACTATCGCCACCTTTATCGAGGCACTAGCTACCCATCGCTATGTTGAGCCTTAGTGGGACAGTT
CGCAGACCTTGACTGTCTGCTTCTTTGCAGACAATTGGCTTGTCGAGTCAAATCGCCATGAATGCTGGCCAAAGTGACGCTTCCTAGGGTGAGTATTCGTCAGGCTTGCAGTAT
GACAGTGGATTGTTGTCCAAACGAACTCCCAACGTTTGTCGGGAACATAACGCAGCGATACCACCCGTAATAAGGCGAGCAATAGACCCGGATAAACCTCGGGTTATCATCCCA
CGAAATGGGTCAAAGGAACGCTAGCCATCTAGGATGGTCCACTCAAAGTCCAGACAGCTCGTCAGTAGATCAGTCTAGTCGATCCTGGATGCTCCATAATGCTCGGGTAAGTGC
GAAGGTCAAACGGAACGCAACAGCTACCGACTCAGTTCCCAAACTTACGCAATCTGCAGGCTAGACAGCCCATAAGCAGCGGTTTCCATGAGCATTGGTTACCGCACGAATCTT
CGAGTTGCCTTCACTAGACAAGTGCTTTGTAGCCTTTAGATCCGGGATTCTTTCGCCAAGCATCGCTGTCGATTTCGACAGAATTGGTGGCCTAAAGTGGTCGCTAGACCAATC
TGGGCCATTGTTGACCACCCTACTCCTCGATTCCGCTCCTTAGCATGCCCTTTAACCTGCGTAGATAAGGCTAGGAGTTTCGCAATGATGAGCGATAAGCGTCCAAGTTAGTCG
TCAGAATTCGGCCGTAATTTGGCGGAGTAAGGGCACGTTTGAGTTCCGTCGTCTATTCGCAGTGATAGGAAGGAGCTTCCTGCTGCAATTCACTCCGCTAGTGATACGCCGTGA
TTCAGCTGCGATTCTAAGCGAAGTCTTGCATAGTCAAGCGTTAACTGACGGGATTGGTTTCCGGTGACTTCTTCCAAGTGGTACCTCCACCATCAGATTCGCCCACTTTCCAAG
ACCCTCCAAACAACGAGCGAACTCACGAAACAACCAGGTCATAAGGAGGAAATACCTGGCACTACGAGCCAGATAACTCCCGGAATAGCCTAGGGATAAGACGCAGTAGGACCC
TAACTCCTCCCTAAGGGATAGCGTGCGAATTCCAACCCTATCAAAGCCGGTCAAATAGGGACCCAATTACAGGCCCATAGAAACGCTCATTCTAGCGTAAGCGTACAAGACCTC
AGCATCCGTGATCAGTTGTCTTGGTCACCGAACCTTTCGTATCCGCTTGCTTTGGTCCGCTTTGAATGCGTTTCCGTGGGTTAGTGCTTGAGCCAGGTTCTTCTTGGTGCCTGT
TCATCGCCTTCTAGGACCTATCCTTCTGTAAGGCTTCGCATGGTAAAGCTTGCGGTACTACGGTCACGTATTCCGTCAACGAGTAAGCGGGACAATTACGCCATGCTAGGCCAT
CCTTAACCTAGCTCATACGTTCCCGTACAAATGCGTCGAGTAATCTGGAGTGCTGTAACATCGGATACAGGCTAAGGTTTGCTTGACATCCATTGCGGAGGATAAAGCAGGTTC
GTTATGGGAGGACTTGAAGCATGGTTATGCATCGCAATACTACGCACGTCATTGTAGGCCGTTTGTTGGAGGCAGTTTGAAGCCCTGAGTAAAGCACCCGTTTAATCCCGAACG
TAATCGACGCCTTACTATGCCCAATACGCAAGTTCTGACTTCGGGTCTGTAATCCTGTCCGATAGGGATGTTAGGGAACCTGACTAAACGCCACGAATAACGGTGACAGTAGAC
TGTAGAAGACGAGGTGATTGGTGAGTGGTGGTTACTCATCCCGAGTCATTGACGAGTTCAGGCGATCAGAAGTGATGACTCCAGTATGCATGACTAAGCTGGGTCTACGTCTTA
CTTCCCTATGGTTCTGCCTCGTACAACCCTGTCCAATAGTCCTGAGTTAGTGGAACAACGTCGTCAAAGTTCCCTTGGTCTTCCGTCTGACAAAGCTTCCCAGTCCATGCCTCA
GTTGCGTGGACTTATCCGAAGGTTAGGGCTCGTTTGAACAGCGAGCTATGGCCACTAATACGGACTAAATGGCTGTCGTTAAGACTGGAGGATGAACCAGGAAGGTTGATGTCA
CTACTACCCGTCATGTAGCAGCTGTTTGACTGCCGTTGAAGGGATTTGATGGCAAAGTAGTGGCATGCAAGCTCAAGTAAGGAGTACTGACCGACAATTCCTGCCGATTTCTTG
CGGCTTCTACCTGTCGTATCTTGACCTGTTAGTTCGACGGTCATCCTACATCCTGGGTTGACGATAGTAGTCGGGTTGTCTGATCTGTACCCTCCTACTACTCGCCATCACTAC
CAGCTCATGCCTACATAAGGCACTCCTTTCGAGTCGAACGCTTAACAGTGGCGTAGTTGACGCCTATCATAGCGTACGACAAGACGACTTTCATCCGCCATGTTATCCGCCCAA
TTGAGCCGGTATTTACCGGTCCTTAATGGAGCGACTAATTCGGAGTCAGATTGACGTACGGTGAATAACCCTCCGATTTGGACTGTCCTCGACTATTCCTGTCGGTATCGTTGC
GGATAGTCTGGTACCGAATTGATGGGCTCACTTTCAGTCGGCATCAGACTGTCAGCGTGAAGCACGCTCAAAGCAGTAGTTACGAGCAGTCACCAGGACTGTAATGCCAACGCT
ATTGCACCCGATTAACCCTGCGATACAGCCCGATTGATAGCGGAACAGAAAGCGTGAGGTACATTGACCCGTATCTGCTTGAATCCGCTCGAAATGAGTGGCAGAAATTCCGGC
GTTAATACGCCCGAATAATCGCCCGTATTCACGGTAAACCGTGCAACATACCGGGTGATAGCATGACCGAATGAGTTGGCGTTCACTTCCATTCACCTACCACCTCAAACCTAG
GCGAATGTACGTCTACGCTAACTCGATCTAAGCAGAAGTTGGCCCATTATTGGCGTCAGATAGTCCCGGTTAGAATGCCGAAGTTAGGTGGGAGTAATTCCGACAGCAAAGTTG
TCCGAGCTTTCCCGAGGAATCTTGTCCTTGAACAAGCTCGCACTACAAACGACTGAGTACCTACATGTTGGTGATTCCGGTTTAGCTACGGGTACATAACCCAAGGTACGAATC
GTGAGGAACTGCCAGGAATGACCGGATCAGATCCCGATCTACGGTTCACGTTAGATCTGCCCTTGTTCTTCCGGCAAACTTTGGGATACCATCTGCTACGCTCAGGATCACCGA
CGAAACTACCACTGACATGAACCTTCTTCTAGCCCGAACTTCATCCAAGTACAACGAACGGGTTGATAGTGGCTCCTAAATCCACCTATTAGCCACCGATTACAAGGCAATGAG
TAGCCTATGCATTCCATAGGAGTCGGAAGCTACAACTTGGTCCCAATCCAGGCATTGGCAAACGTCGCCATTTGGTGCGGAAACTGAACGCACTAGAAGCAAGACATAGGACAG
CCTTGAAAGACCATGCCAAACGTAGTGAATGCCCACTATTACCCGTGCATAGATGTCCCAGCTATTTGCCAGGCAATCGTTCTGGTTTCATCAGCGAACCAACTTGTCGGCTTC
AGTCCGGATGACATACCTAGTCCTAAGTTCGTCCTCGTTAGCGAGCGATTGGGTTGGTTGCTGGAAGGTGCTGGATGTACCGATCGAACGTGGAAAGACATGCGTATGTAACGC
ACGATAAGTGCCCTATCCATGCAGCACTTAAGACGGTAGGTTTCCCAACTTTAGCGGATGGTTAGGCCTTCAAAGGCTCCTGATCGGATCATAGGCACCTAAGTGGCCCAAACA
GAACGGTGCTCATCGAGTCTGCACCTTACTGGATCCTTGTAGTTCCAGGCTGATTACTGGGCGTTTCAGAACCCTCGATCATCAACGCAGGAACAAACTCCAGCCTTCTTACCT
GACCATTGGATAGCTAGCCGTTTCCAATGCGTAGCGATCTGTTACCGACGCAATAAGTCGCATGAGTTTGGAAGGACGAGCAAGAACGTCACCATCTATCGGTAAGTCTAGGCA
GCAGTAACAGGCGGTATGAAGGACCCACTAGCACGTGATTAGGGCGGTTTATTGCCCACGTTTCCCTTACCTTCTGCAATGTAGACCACGTCAACATTGCCACCGTATGACGTA
GTAGATGCTTCGGAAATCTTGCGAGCACTAGTCAACAGCCCTTTCCATCGCACTGTCTATCGATCGCTTATGAAGCCTCCTAAGATCGGTCCTGTTGCCTAGGACAAAGGAGGC
CATAATACCGCTAGGATTAGGCTTGGGAAGTTGTTCGACCGACTAGCTGACGATTGGCTACAGACTGCCAACAACGCAAAGGGTATGTATCCCGGTGAAATGCCTAGCGATTCG
CAAACATGCGATGATTACCGCTCAGTAGCCCTTCAGACGATAACCTTGCGTTTGCTGGCAGATGCTCAGAAAGTCCGCTCAACTGAAGTGCTCAGCTTGCTCCTGCATCGAGCA
GCAATTTGGTCGGATTAACTCGGGCATTGATACGGCTGCATAAACGACCCAGTATCGAACCGCTTCGAAAGCTGAAGGTCGTAAGGCCATGATAGCCGACGTATGCCAATCATT
GGCTCCAGTTTGTACCAGCAAGTCTATGGTCACGGAACCTAAAGGTCGGAGTAGTTCAGTCAGCGAGTCTAGAAGTGGCTTGGTGACAAGGCCTGACAATACGAGTCGTACTAC
CAACGCCTTCATGGTGGCAGTAGCTATGAGTCCAAGCAGCTTCGTCCATAGTCCAACCATCGATAAGGACTCGGAAGGATGTTGAAGCTGACCAGTGAGCAGCTAGAACAACCC
ACCATTAGGCAGGGAAATGTTGGCCGTTATCAGCAGTCTAAGGTGGGTGAAACTGGTCTTGCGTGCATCTGGGAAATCACGAGTCCATCCAGTGCATCCTCAGTGCACCTGACA
GCTTTGCAACGCCATCGAAAGACTAGCCTCGTTCAGGTGGCATAAGTCAGGGTTCAGTAGTCTTCTGTTGAGTACGTTTCACGCCGTTTACTACCGCTTAAACCCAGCGTATTC
TACGCCCATTTAAGGGCTTGCTATTGGTCAGATGCGTGCTACAGTTGACTGGACCACTGTAAAGCGTTGCCATGACTGACTAGGGAGTTGTCAGTCCTTGCGATTACCTAGGGC
AAGAAGCTGGCAACTAACGCCTAGCTAATGGCAGGAGTATCCGTATGCAGACGTTACGGCATCCTATCGACCAGATTCATCGGTGAACGTACGAACTACGAACGCAGTTGTACC
CACTTCACCTTCTACTAGCGTCCTCAAATCAGCTCGGTTTGGTCATGTTTCGCTTGTTTCCTGCAGCTACTAACTGGGCCTTAACTACCGGGAAATAACGCCCTTGAATACCGA
AGCCAACCACTACAGGTCCCTTTCGGTCGGTTTCAACCTCGAAGGTATTGGTTCCGCATACTAGACGGAGGAACATGATCCTTCGTAGACTAAGGTCAGCCTTAAAGGAGCCCA
ATGACAACGGATCTGCACTCACTTCAACTGGCTGCTTTCACCTGTATGTAGCGTCAGCATTGCCGAGCTAAAGACGGGACTTTAAGCCTGGTATTGACAGCCGAATACCACTCC
GATCCAATCAGTCGCTAAATGAGGGTCAGTTTGGTACAGCCTGCTAACTGCGAGTCAGTAAGCCGGATAGAAGCTCAGCGATCCTACTTGAACGGGAAGTCTACCCGACATGTA
TCGAGTACTTGACGCTGGATCAATGAGGTCGCTTCAATCGCCTCGAATGAAGGGTTGAACTCCTTGGTTAGTCTGCCGTATGGTAGGGCATCATACGCCTAATAAGCGGCCTTT
ACATCGGTTAACTTGCCATTGGTCCAATGAACGGAAGGCTAAAGGATGCGTTAAGGTTGCTCACGGTTAGCTGCGTCATTCTGGTGCTGTTCGTTCTACTTGGTAGGCTGAAGT
AACCCTTGCTAAGCGGAGGTATCCGGTGGAAACATCGACGGATAATGGCCTGCAATAACGTCCTAATGCTTGCACCTCCTTGCACTTGCCTTAAGATGGGTGCATACCAAGGCA
TCAACTCAGATGATCCCACTCCTAACCCTAAGCTAGACTACCGAGCAGATCGGACTACGTAAGT
REFERENCES
1. Zuker, M. 2003. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids
Res. 31:3406-3415.
2. Rozen, S., and H.J. Skaletsky. 2000. Primer3 on the WWW for general users and for biologist
programmers. Bioinformatics Methods and Protocols Vol. 132, p. 365-386. Humana Press, New York, NY.
3. SantaLucia Jr, J., and D. Hicks. 2004. The thermodynamics of DNA structural motifs. Annu. Rev.
Biophys. Biomol. Struct. 33:415-440.
4. Tuma, R.S., M.P. Beaudet, X. Jin, L.J. Jones, C.Y. Cheung, S. Yue, and V.L. Singer. 1999.
Characterization of SYBR Gold Nucleic Acid Gel Stain: a dye optimized for use with 300-nm ultraviolet
transilluminators. Anal. Biochem. 268:278-288.