DNA Codes Design

IMPORTANT:
Applications of metaheuristics to the Sequential Ordering Problem will be
covered first.
Applications of metaheuristics to DNA codes design will be treated only if
time permits.
Roberto
DNA Codes Design
Roberto Montemanni
Dalle Molle Institute for Artificial Intelligence
University of Applied Science of Southern Switzerland
Email: roberto@idsia.ch
Tel: +41 58 666 666 7
Outline
•  Introduction
•  The DNA Codes Design problem
•  Approaches in the literature
•  Construction heuristics
•  Simple local searches
•  Metaheuristics
–  Intro to Stochastic Local Search
–  Applications to the DNA codes design problem
–  Intro to Variable Neighbourhood Search
–  Applications to the DNA codes design problem
•  Bibliography
Contributions to slides
•  Dan C. Tulpan
NRC Institute for Information Technology, Canada
(Introduction, Applications, Stochastic Local Searches)
•  Marco Chiarandini
University of Southern Denmark, Denmark
(Introduction to Stochastic Local Search)
•  Thomas Stuetzle
Darmstadt University of Technology, Germany
(Introduction to Variable Neighbourhood Search)
DNA – The Blueprint of Life
[Figure: clip-art pictures labelled bacteria, human, DNA, chimp, worm, fish, cow, dinosaur and bird; 9 pictures taken from ClipArt]
What is DNA?
•  All organisms on this planet are made of the same type of
genetic blueprint.
Real Applications
•  DNA computing => using DNA for massively
parallel computations.
•  DNA Chemical libraries => for the development
and test of new drugs
•  DNA Microarrays => for profiling genes and
tracing genes within long DNA strands
•  DNA Nanotechnologies => for the development
of new materials/devices
http://en.wikipedia.org/wiki/DNA_computing
What is DNA?
•  genetic material
•  four letter alphabet (nucleotides, bases):
–  A (adenine),
–  C (cytosine),
–  G (guanine),
–  T (thymine)
•  complementary base pairs CG, AT
•  hybridization via base pairing
[Figure: pairs of DNA strands with their 5' and 3' ends labelled, one pair showing perfect hybridization and one showing imperfect hybridization. Image: DNA, Wikimedia Commons]
Desired properties
•  Desired properties coming from real applications
•  Notice that properties are not the same for all applications
[Figure: diagram labelled Modeling and Design Goals (Uniform Stability, Non-interaction), with an example pair of DNA strands whose 5' and 3' ends are labelled]
DNA Codes Design Problem description
Input data:
•  The alphabet {A, C, G, T}
•  A fixed length n for the codewords
•  A required distance d among codewords (used by constraints
in Z)
• A set Z of constraints (explained in the next slides)
Optimization objective:
•  Find the largest possible set of codewords (= code) of length
n on alphabet {A, C, G, T}, feasible with respect to constraints
Z (based on d)
Why maximize the size of the code? To have
more flexibility in the applications seen before!
DNA Codes Design Problem description
Code (solution)
ACCTGATT
TCACCATG
ATTCCCAG
CTACTACG
ACCTTTTT
GGCTTTTA
TATATATA
TTGGCCAA
CATTCACC
CTATTCAC
GATTCAAT
GCGCGCGC
GCTTATTC
CCGTTACA
Example
Codeword
AATTCCGG
Word Length n = 8
The solution respects a given constraint set Z
(we do not know Z at this stage!)
DNA Codes Design Problem description
Requirements of a DNA Code
•  Success in specific hybridization between a
DNA codeword and its complement.
•  No hybridization between DNA codewords from
the same DNA code, or between a DNA
codeword and the complement of another codeword.
How do these requirements translate into our
constraint set Z?
DNA Codes Design Problem description
Constraints considered (set Z):
•  Requirement: the distance between two codewords must be large (no
hybridization).
•  Answer: HD (Hamming Distance)
-  Given two codewords w1 and w2
-  H(w1, w2) = number of positions i in which the ith letter of w1
differs from the ith letter of w2
-  example: w1 = GCTA, w2 = ATTA, H(w1, w2) = 2
-  Constraint: H(w1, w2) ≥ d
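The HD constraint above can be sketched in a few lines of Python. The function names `hamming_distance` and `satisfies_hd` are illustrative choices, not taken from the slides.

```python
def hamming_distance(w1: str, w2: str) -> int:
    """Number of positions at which two equal-length words differ."""
    if len(w1) != len(w2):
        raise ValueError("codewords must have the same length")
    return sum(a != b for a, b in zip(w1, w2))


def satisfies_hd(w1: str, w2: str, d: int) -> bool:
    """HD constraint: H(w1, w2) >= d."""
    return hamming_distance(w1, w2) >= d


# The example from the slide: GCTA and ATTA differ in the first two letters.
print(hamming_distance("GCTA", "ATTA"))
```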
DNA Codes Design Problem description
Constraints considered (set Z):
•  Requirement: the number of G or C of each codeword must be the
same (uniform stability) [=> self-hybridization is likely]
•  Answer: GC (GC-content constraint)
-  A fixed number of the letters of each word has to be
either G or C: floor(n/2) in our case
-  example: ATA is not feasible, AGA is feasible
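A minimal sketch of the GC-content check, assuming (as on the slide) that the required number of G or C letters is floor(n/2). The helper names are hypothetical.

```python
def gc_content(word: str) -> int:
    """Number of letters of the word that are G or C."""
    return sum(ch in "GC" for ch in word)


def satisfies_gc(word: str) -> bool:
    """GC constraint with the fixed value floor(n/2) used in these slides."""
    return gc_content(word) == len(word) // 2


# The two words from the slide: ATA has no G or C, AGA has exactly one.
print(satisfies_gc("ATA"), satisfies_gc("AGA"))
```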
DNA Codes Design Problem description
•  Requirement: the distance between a codeword and the complement of
another codeword must be large.
Watson-Crick complement of a DNA codeword
wcc(w) = Watson-Crick complement of a DNA codeword w,
obtained by reversing w and then replacing each A
by T (and vice versa) and each C by G (and vice versa)
-  example: wcc(ATGC) = GCAT
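The definition of wcc translates directly into Python: reverse the word, then swap A with T and C with G. The function name `wcc` follows the slide; the translation table is an implementation detail of this sketch.

```python
# Letter-wise complement: A <-> T, C <-> G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def wcc(word: str) -> str:
    """Watson-Crick complement: reverse the word, then complement each letter."""
    return word[::-1].translate(COMPLEMENT)


# The example from the slide.
print(wcc("ATGC"))
```

Applying `wcc` twice gives back the original word, so it does not matter which of the two codewords is complemented when a distance is computed.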
DNA Codes Design Problem description
Constraints considered (set Z):
• Requirement: the distance between a codeword and the complement of
another codeword must be large.
•  Answer: RC (Reverse Complement Hamming distance)
-  Given two codewords w1 and w2
-  example: GCTA, ATGC
H(GCTA, wcc(ATGC)) = H(GCTA,GCAT) = 2
-  Constraint: H(w1, wcc(w2)) ≥ d
Example of a problem and its solution
•  Input data: n = 4, d = 3.
•  Constraints considered: HD, GC, RC
•  Solution:
the largest possible code with the characteristics above contains
6 codewords.
Optimal code with respect to the constraints considered (not
unique!):
CTTC
GGTT
GTCA
AGGA
ACTG
TTGG
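The three constraints can be combined into one feasibility check and applied to the code listed above. This is a sketch under one assumption that the slides do not state explicitly: the RC constraint is also applied to a codeword and its own Watson-Crick complement. The function names are illustrative.

```python
from itertools import combinations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hamming(w1, w2):
    return sum(a != b for a, b in zip(w1, w2))


def wcc(word):
    return word[::-1].translate(COMPLEMENT)


def is_feasible_code(code, d):
    """Check GC, HD and RC for every codeword and every pair of codewords."""
    for w in code:
        if sum(ch in "GC" for ch in w) != len(w) // 2:   # GC constraint
            return False
        if hamming(w, wcc(w)) < d:                       # RC on the word itself (assumption)
            return False
    for w1, w2 in combinations(code, 2):
        if hamming(w1, w2) < d:                          # HD constraint
            return False
        if hamming(w1, wcc(w2)) < d:                     # RC constraint
            return False
    return True


example_code = ["CTTC", "GGTT", "GTCA", "AGGA", "ACTG", "TTGG"]
print(is_feasible_code(example_code, d=3))
```

The check only confirms feasibility of a given code; it says nothing about whether a larger code exists.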
Problem description
Important observation
•  Other kinds of constraints are possible.
•  They depend on the real-world application
considered
•  In this mini-course we limit ourselves to the
constraints on the previous slides
Approaches from the literature
TEMPLATE-MAP DESIGN
•  Find the largest possible set of 8-mers with
–  50% GC content in each word
–  at least four mismatches between each word and the complement of each distinct word
(reverse-complement constraint)
–  at least four mismatches between each pair of words (direct Hamming constraint)
–  based on template-map design
Frutos A.G., Liu, Q., Thiel A.J., Sanner A.M.W., Condon A.E., Smith L.M., Corn
R.M. Demonstration of a word design strategy for DNA computing on surfaces.
Nucleic Acids Res. 25, 4748-4757 (1997)
Arita, M., Kobayashi, S. DNA sequence design using templates. New Generation
Computing, 20, 263-277 (2002).
Kobayashi, S., Kondo, T., Arita, M. On template methods for DNA sequence
design. Lecture Notes in Computer Science, 2568, 205-214 (2003).
Koul, N. Heuristic Algorithms for Construction of Constant GC content DNA
codes. Master thesis, USI (2010).
Approaches from the literature
TEMPLATE-MAP DESIGN
•  The selection of maps and
templates is based on
reasoning and theoretical
results
•  Difficult to apply results to
different problems: not a
general approach
Approaches from the literature
MATHEMATICAL CONSTRUCTIONS
•  Approaches adapted from classic Coding Theory
•  Theoretical results, based on the characteristics of the
desired code, are used to produce mathematical
constructions leading to (very regular) codes
•  Example:
Theorem If C0 is a code that is fixed by reverse
permutation R, then the subcode C1 of C0 consisting
of the codewords that are unchanged by R is obtained
as the intersection of C0 and the code R(C0).
•  Not a general
method. Results
typically hold for the
problem under
investigation only
•  The codes obtained
are very regular. For
many applications this
is not desirable
King, O. D. Bounds for DNA codes with constant GC-content. Electronic
Journal of Combinatorics, 10, #R33 (2003).
Gaborit P., King O. D. Linear construction for DNA codes. Theoretical
Computer Science, 334, 99-113 (2005).
Neelakandan, I. New Approaches for Constructing Constant Weight Binary
Codes. Master thesis, USI (2010).
Approaches from the literature
HEURISTIC ALGORITHMS
•  Many of the classic heuristic algorithms have been adapted, implemented and tested
•  We will see some of them in detail…
Construction Heuristics
Construction Heuristic (CH)
All possible codewords with the required GC-content are examined in a
given order.
Codewords are incrementally accepted if feasible with respect to the
already accepted ones.
Smith, D.H., Hughes L.A., Perkins S. A new table of constant weight binary
codes of length greater than 28. Electron. J. of Combinatorics, 13(1), #A2 (2006).
Montemanni R., Smith D.H. Construction of constant GC-content DNA codes
via a variable neighbourhood search algorithm. Journal of Math. Modelling and
Algorithms 7, 311-326 (2008).
Montemanni, R., Smith, D.H. Heuristic algorithms for constructing binary
constant weight codes. IEEE Transactions on Information Theory 55(10),
4651-4656 (2009)
Construction Heuristics
Example: n = 4, d = 3.
Constraints: HD, GC, RC
Lexicographic order:
AACC AACG AAGC AAGG ACAC ACAG ACCA ACCT ACGA ACGT ACTC ACTG AGAC AGAG
AGCA AGCT AGGA AGGT AGTC AGTG ATCC ATCG ATGC ATGG CAAC CAAG CACA CACT
CAGA CAGT CATC CATG CCAA CCAT CCTA CCTT CGAA CGAT CGTA CGTT CTAC CTAG
CTCA CTCT CTGA CTGT CTTC CTTG GAAC GAAG GACA GACT GAGA GAGT GATC GATG
GCAA GCAT GCTA GCTT GGAA GGAT GGTA GGTT GTAC GTAG GTCA GTCT GTGA GTGT
GTTC GTTG TACC TACG TAGC TAGG TCAC TCAG TCCA TCCT TCGA TCGT TCTC TCTG
TGAC TGAG TGCA TGCT TGGA TGGT TGTC TGTG TTCC TTCG TTGC TTGG
Solution: AACC ACAG AGGA CCTA GTCA
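The example can be reproduced with a short sketch of the construction heuristic. It rests on one assumption not spelled out on the slides: a codeword must also be at distance at least d from its own Watson-Crick complement. The function names are illustrative, not the authors' implementation.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hamming(w1, w2):
    return sum(a != b for a, b in zip(w1, w2))


def wcc(word):
    return word[::-1].translate(COMPLEMENT)


def all_gc_words(n):
    """All words of length n with GC content floor(n/2), in lexicographic order."""
    words = ("".join(t) for t in product("ACGT", repeat=n))
    return [w for w in words if sum(ch in "GC" for ch in w) == n // 2]


def compatible(w1, w2, d):
    """HD and RC constraints for a pair of words."""
    return hamming(w1, w2) >= d and hamming(w1, wcc(w2)) >= d


def construction_heuristic(order, d):
    """Scan the words in the given order, accepting each one that is feasible."""
    code = []
    for w in order:
        if hamming(w, wcc(w)) < d:      # RC on the word itself (assumption)
            continue
        if all(compatible(w, c, d) for c in code):
            code.append(w)
    return code


print(construction_heuristic(all_gc_words(4), d=3))
```

Passing a shuffled copy of `all_gc_words(n)` instead gives the random-order variant of the same heuristic.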
Construction Heuristics
•  The method works over any possible order of the codewords
(lexicographic, reverse lexicographic, random) => different
algorithms in fact…
•  Computational experiments suggest that random orders tend to give
better results on DNA code design problems
•  Slow for large problems (all possible codewords have to be
examined!)
Montemanni R., Smith D.H. Construction of constant GC-content DNA codes
via a variable neighbourhood search algorithm. J. of Math. Modelling and
Algorithms 7, 311-326 (2008).
Seed Building local search
Seed Building (SB)
Iterative approach
A set of seed codewords is considered
The set of seed codewords is dynamically adapted through iterations
During each iteration:
•  All possible codewords with the required GC-content are examined in a given
order.
•  Codewords are incrementally accepted if feasible with those already accepted in
the current iteration and with the seed codewords.
Statistics are used to expand or contract the set of seed codewords every ItrSeed
iterations, based on the quality of the solutions built.
Brouwer A.E., Shearer J.B., Sloane N.J.A., Smith W.D. A new table of constant
weight codes. IEEE Trans. Inf. Theory 36, 1334-1380 (1990).
Montemanni R., Smith D.H. Construction of constant GC-content DNA codes
via a variable neighbourhood search algorithm. J. of Math. Modelling and
Algorithms 7, 311-326 (2008).
Seed Building local search
[Figure: diagram of the Seed Building method, with the seed codewords management step highlighted]
Seed Building local search
Example: n = 4, d = 3.
Constraints: HD, GC, RC
Seed codewords: AACC ACAG
Random order:
CTTC CTTG CTCA CTCT CTGA CTGT CTAC CTAG CATC CATG CACA CACT CAGA
CAGT CAAC CAAG CCTA CCTT CCAA CCAT CGTA CGTT CGAA CGAT GTTC GTTG
GTCA GTCT GTGA GTGT GTAC GTAG GATC GATG GACA GACT GAGA GAGT GAAC
GAAG GCTA GCTT GCAA GCAT GGTA GGTT GGAA GGAT TTCC TTCG TTGC TTGG
TACC TACG TAGC TAGG TCTC TCTG TCCA TCCT TCGA TCGT TCAC TCAG TGTC
TGTG TGCA TGCT TGGA TGGT TGAC TGAG ATCC ATCG ATGC ATGG AACC AACG
AAGC AAGG ACTC ACTG ACCA ACCT ACGA ACGT ACAC ACAG AGTC AGTG AGCA
AGCT AGGA AGGT AGAC AGAG
Solution: AACC ACAG CCTA GTCA TCCT
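One iteration of Seed Building on this example can be sketched as a greedy scan that starts from the seeds. The management of the seed set across iterations is not shown. As an assumption, a codeword is also required to be at distance at least d from its own Watson-Crick complement. `SLIDE_ORDER` is the order listed above, and the function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# The "random order" listed on the slide.
SLIDE_ORDER = """
CTTC CTTG CTCA CTCT CTGA CTGT CTAC CTAG CATC CATG CACA CACT CAGA
CAGT CAAC CAAG CCTA CCTT CCAA CCAT CGTA CGTT CGAA CGAT GTTC GTTG
GTCA GTCT GTGA GTGT GTAC GTAG GATC GATG GACA GACT GAGA GAGT GAAC
GAAG GCTA GCTT GCAA GCAT GGTA GGTT GGAA GGAT TTCC TTCG TTGC TTGG
TACC TACG TAGC TAGG TCTC TCTG TCCA TCCT TCGA TCGT TCAC TCAG TGTC
TGTG TGCA TGCT TGGA TGGT TGAC TGAG ATCC ATCG ATGC ATGG AACC AACG
AAGC AAGG ACTC ACTG ACCA ACCT ACGA ACGT ACAC ACAG AGTC AGTG AGCA
AGCT AGGA AGGT AGAC AGAG
""".split()


def hamming(w1, w2):
    return sum(a != b for a, b in zip(w1, w2))


def wcc(word):
    return word[::-1].translate(COMPLEMENT)


def compatible(w1, w2, d):
    """HD and RC constraints for a pair of words."""
    return hamming(w1, w2) >= d and hamming(w1, wcc(w2)) >= d


def seed_building_pass(seeds, order, d):
    """Extend the seeds greedily; the seeds are assumed mutually feasible."""
    code = list(seeds)
    for w in order:
        if hamming(w, wcc(w)) < d:      # RC on the word itself (assumption)
            continue
        if all(compatible(w, c, d) for c in code):
            code.append(w)
    return code


print(seed_building_pass(["AACC", "ACAG"], SLIDE_ORDER, d=3))
```

A seed that reappears in the scan is rejected automatically, because its Hamming distance to itself is zero.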
Seed Building local search
•  The method works over any possible order of the codewords
(lexicographic, reverse lexicographic, random).
•  Experiments clearly show that a random order has to be
preferred for DNA codes design problems.
•  The process of identifying a good set of seed codewords is
intrinsically difficult => codes produced are sometimes very
good and sometimes very poor => not a very robust method
•  Slow for large problems (all possible codewords are
examined at each iteration!)
Clique Search local search
•  Clique
Given an undirected graph G, a clique is a set of the vertices in
which every vertex is connected to every other vertex of the clique
•  Maximum clique problem
Given an undirected graph G, identify the largest (in number of vertices)
clique of G
•  Complexity
Classic NP-hard problem
[Figure: example graph in which {0, 3, 4} is a clique and {2, 3, 4, 5} is a maximum clique]
Clique Search local search
Clique Search (CS)
Iterative approach
A partial code can be completed by solving a subproblem (which is a
maximum clique problem) to optimality
During each iteration:
•  All possible codewords with the required GC-content are examined in a
random order.
•  Codewords are accepted for the second phase if feasible with those of the
partial code.
•  A maximum clique problem is solved on the set of accepted codewords to
complete the partial code
Montemanni R., Smith D.H. Construction of constant GC-content DNA codes
via a variable neighbourhood search algorithm. Journal of Math. Modelling and
Algorithms 7, 311-326 (2008).
Montemanni, R., Smith, D.H. Heuristic algorithms for constructing binary
constant weight codes. IEEE Transactions on Information Theory 55(10),
4651-4656 (2009)
Clique Search local search
Example: n = 4, d = 3. Constraints: HD, GC, RC
Partial code: CTTC CGAA TGGT GTGA
Maximum clique problem on feasible extensions of the partial
solution:
CACT
AGTG
AAGC
GCTT
Solution: CTTC CGAA TGGT GTGA CACT GCTT
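The second phase of this example can be sketched as a maximum clique search on the compatibility graph of the four candidate extensions, where two candidates are adjacent when they satisfy HD and RC with each other. The brute-force enumeration below is only meant for tiny candidate sets like this one; as the slides point out, heuristics are needed in practice. The function names are illustrative.

```python
from itertools import combinations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hamming(w1, w2):
    return sum(a != b for a, b in zip(w1, w2))


def wcc(word):
    return word[::-1].translate(COMPLEMENT)


def compatible(w1, w2, d):
    """Edge of the compatibility graph: HD and RC both hold for the pair."""
    return hamming(w1, w2) >= d and hamming(w1, wcc(w2)) >= d


def maximum_clique(candidates, d):
    """Exhaustive search, largest subsets first (exponential time)."""
    for size in range(len(candidates), 0, -1):
        for subset in combinations(candidates, size):
            if all(compatible(a, b, d) for a, b in combinations(subset, 2)):
                return list(subset)
    return []


candidates = ["CACT", "AGTG", "AAGC", "GCTT"]
print(maximum_clique(candidates, d=3))
```

The maximum clique is not unique here, so the pair returned by this sketch may differ from the pair CACT, GCTT chosen on the slide while having the same size.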
Clique Search local search
•  Solving a maximum clique problem (sub-procedure) is an NP-hard problem itself!
•  Heuristics have to be used for the maximum clique problem
=> no optimality is guaranteed for the sub-problem solutions
•  The choice of the number of codewords to eliminate is crucial
–  too many codewords eliminated => very large maximum
clique problem => high probability of suboptimality
–  not enough codewords eliminated => very likely to find a
code with the same number of codewords as the original
–  This aspect deserves a deeper study to tackle large problems!
Hybrid Search local search
Hybrid Search (HS)
Iterative approach
Merges the concepts of the two methods analyzed before.
A set of seed codewords is managed exactly as in Seed Building.
Seed codewords represent the partial code in the context of the Clique
Search.
A relaxed distance d' < d is introduced.
A candidate codeword has to be at distance at least d from the seeds, and at least d' from
the other candidate codewords (this keeps the maximum clique problem to a
reasonable size!)
Montemanni R., Smith D.H. Construction of constant GC-content DNA codes
via a variable neighbourhood search algorithm. Journal of Mathematical
Modelling and Algorithms 7, 311-326 (2008).
Hybrid Search local search
[Figure: diagram of Hybrid Search, with its Seed Building and Clique Search components labelled]
Hybrid Search local search
Example: n = 4, d = 3. Constraints: HD, GC, RC
Partial code (seed codewords): CAAC AGAG
Maximum clique problem on feasible extensions of the partial solution (heuristic
distance d'=1 to reduce the codewords considered):
TGGT
TCTC
TGTC
TTGC
TAGG
TACG
ATGC
ACTC
Solution: CAAC AGAG TCTC TGGT TACG ATGC
Hybrid Search local search
•  Sums the advantages of Seed Building to those of Clique Search
but…
•  There is the risk of summing up drawbacks instead!
•  The method deserves a further detailed study for larger problems
Experimental comparison of some of the heuristic
algorithms
Experimental settings
Methods coded in ANSI C
Experiments on Dual AMD Opteron 250 2.4GHz / 4GB RAM
machines
Maximum computation times: 10'000 seconds (2.8 hours)
Statistics over 5 runs for each combination problem/method
A_4^{Cstrs}(5,3,2) identifies the problem with constraints Cstrs (HD is always
present, and therefore not listed), and with n = 5, d = 3, and GC content
= floor(n/2) = 2. [this funny notation comes from coding theory…]
Montemanni R., Smith D.H. Construction of constant GC-content DNA codes
via a variable neighbourhood search algorithm. Journal of Mathematical
Modelling and Algorithms 7, 311-326 (2008).
Experimental comparison of some of the heuristic
algorithms
[Results not recoverable from the source]
•  SB = Seed Building
•  CS = Clique Search
•  HS = Hybrid Search
Experimental comparison of some of the heuristic
algorithms
Comments
•  No clear ranking is possible among the methods considered:
Seed Building, Clique Search, and Hybrid Search
•  Methods are therefore likely to represent different
neighbourhoods
Idea
•  All the methods seen until now work on the search space of
feasible solutions (we never have constraints violated…)
•  What if we move into the search space of infeasible solutions?
=> we will have to minimize (i.e. bring down to zero!) a
measure of infeasibility!
•  This makes it possible to develop a completely different kind
of local search!
•  It is likely that the search space is visited in a different way by
such a family of algorithms…
Iterated Greedy Search local search
Iterated Greedy Search (IGS)
Iterative approach working on an infeasible code W, trying to make it feasible.
Measure of the infeasibility of W: Inf(W) [formula not recoverable from the source],
where w = floor(n/2)
Montemanni R., Smith D.H. Construction of constant GC-content DNA codes
via a variable neighbourhood search algorithm. Journal of Mathematical
Modelling and Algorithms 7, 311-326 (2008).
Iterated Greedy Search local search
Iterated Greedy Search (IGS)
An infeasible solution is obtained by adding a random codeword to a perturbed feasible
solution
During each iteration:
•  A codeword σ is selected at random and the optimal (according to Inf(W)) change of one
letter of σ is carried out.
•  If Inf(W) = 0, the code is feasible, and a new random codeword can be added
Iterated Greedy Search local search
[Figure: scheme of Iterated Greedy Search, with two phases labelled "Perturbation of the solution" and "Optimization of the solution"]
Iterated Greedy Search local search
Example: n = 4, d = 3. Constraints: HD, GC, RC
W                                        Inf(W)
...
TGGT GACC CGAA TCAC CCTT                 1
TGGT GACT CGAA TCAC CCTT                 0
TGGT GGCA CGAA TCAC CCTT TTTG            8
TGGT GGCA CGTA TCAC CCTT TTTG            8
TGGT GGCA CGTA TCAC GCTT TTTG            7
TGGT GGCA CGTC TCAC GCTT TTTG            7
…
TGGT AGTG CGTC TCAC GCTT TTTG            4
TGGT AGTG CGTC TCAC GCTT TTCG            3
TGGT AGTG CTTC TCAC GCTT TTCG            0
TGGT AGTG GTAG TCAC GGTT TTCG AACT       9
TGGT AGTG GTAG TCTC GGTT TTCG AACT       9
...
Iterated Greedy Search local search
•  We change exactly one letter of a random codeword at each
iteration: more complex neighbourhoods could be considered…
•  We never accept changes that make the solution worse: accepting them might be
a way to escape from local minima
•  Further investigation is warranted…
Experimental comparison of some of the heuristic
algorithms
[Results not recoverable from the source]
•  SB = Seed Building
•  CS = Clique Search
•  HS = Hybrid Search
•  IGS = Iterated Greedy Search
Experimental comparison of some of the heuristic
algorithms
Comments
•  No clear ranking is possible among the methods considered:
Seed Building, Clique Search, Hybrid Search and Iterated
Greedy Search
•  Methods are likely to represent different neighbourhoods
Stochastic Local Search: Simple SLS methods
Goal:
Effectively escape from local minima of given evaluation function.
General approach:
For fixed neighbourhood, use step function that permits
worsening search steps.
Specific methods:
•  Randomized Iterative Improvement
•  Simulated Annealing
•  Attribute Based Hill Climber
•  Dynamic Local Search
•  Iterated Local Search
•  Tabu Search
Stochastic Local Search:
Randomized Iterative Improvement
Key idea:
In each search step, with a fixed probability perform an uninformed random walk
step instead of an iterative improvement step.
Randomized Iterative Improvement (RII):
determine initial candidate solution s
while termination condition is not satisfied do
    With probability p:
        choose a neighbor s' of s uniformly at random
    Otherwise:
        choose a neighbor s' of s such that g(s') < g(s) or,
        if no such s' exists, choose s' such that g(s') is minimal
    s := s'
Where g(s) is the objective function value (fitness) of solution s
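The pseudocode above can be sketched generically in Python. The arguments `neighbours`, `g`, `max_steps` and `rng`, and the choice to return the best solution seen so far, are assumptions of this sketch. When several improving neighbours exist, one of them is picked at random, which is one of several possible readings of the improvement step.

```python
import random


def randomized_iterative_improvement(s, neighbours, g, p, max_steps, rng):
    """RII: with probability p take a random walk step, otherwise an improving step."""
    best = s
    for _ in range(max_steps):
        candidates = neighbours(s)
        if rng.random() < p:
            s = rng.choice(candidates)                 # uninformed random walk step
        else:
            improving = [c for c in candidates if g(c) < g(s)]
            # Improving neighbour if one exists, otherwise the least bad neighbour.
            s = rng.choice(improving) if improving else min(candidates, key=g)
        if g(s) < g(best):
            best = s
    return best


# Toy usage: minimise (x - 7)^2 over the integers, moving by +1 or -1.
best = randomized_iterative_improvement(
    0, lambda x: [x - 1, x + 1], lambda x: (x - 7) ** 2,
    p=0.2, max_steps=200, rng=random.Random(0),
)
print(best)
```

The step budget `max_steps` plays the role of the termination condition discussed on the next slide.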
Stochastic Local Search:
Randomized Iterative Improvement
Observations:
•  No need to terminate search when local minimum is encountered.
Instead: Impose limit on number of search steps or CPU time, from
beginning of search or after last improvement.
•  Probabilistic mechanism permits arbitrarily long sequences of random walk steps
Therefore: When run sufficiently long, RII is guaranteed to find (optimal)
solution to any problem instance with arbitrarily high probability.
•  RII is often outperformed by more complex LS methods.
Stochastic Local Search for the DNA codes
design problem
Target: a code with k codewords
1.  Start with k random codewords
2.  Mark unsatisfied constraints (conflicts)
3.  If no unsatisfied constraints go to 8
4.  Pick 2 codewords involved in a conflict
5.  With probability p replace one of them with the neighbouring word that minimizes the number of conflicts
6.  Otherwise replace it with a randomly selected word
7.  Go to step 3.
8.  Display all k codewords
It is a Randomized Iterative Improvement!
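The eight steps can be sketched as follows. This is not the implementation of Tulpan et al. The neighbourhood used here (swapping one letter A<->T or C<->G, which keeps the GC content unchanged), the treatment of a word that is too close to its own Watson-Crick complement as a conflict, and all function names are assumptions made for illustration.

```python
import random
from itertools import combinations

COMPLEMENT = str.maketrans("ACGT", "TGCA")
FLIP = {"A": "T", "T": "A", "C": "G", "G": "C"}   # GC-preserving letter swaps


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def wcc(w):
    return w[::-1].translate(COMPLEMENT)


def conflicts(code, d):
    """Index pairs violating HD or RC; (i, i) marks a word too close to its own wcc."""
    bad = [(i, i) for i, w in enumerate(code) if hamming(w, wcc(w)) < d]
    bad += [(i, j) for i, j in combinations(range(len(code)), 2)
            if hamming(code[i], code[j]) < d or hamming(code[i], wcc(code[j])) < d]
    return bad


def random_gc_word(n, rng):
    """Random word of length n with exactly floor(n/2) letters in {G, C}."""
    gc_positions = set(rng.sample(range(n), n // 2))
    return "".join(rng.choice("GC") if i in gc_positions else rng.choice("AT")
                   for i in range(n))


def neighbours(word):
    """1-exchange neighbours that keep the GC content (assumed neighbourhood)."""
    return [word[:i] + FLIP[word[i]] + word[i + 1:] for i in range(len(word))]


def sls_word_design(k, n, d, p, max_iters, rng):
    code = [random_gc_word(n, rng) for _ in range(k)]            # step 1
    for _ in range(max_iters):
        bad = conflicts(code, d)                                 # step 2
        if not bad:                                              # step 3
            return code                                          # step 8
        i = rng.choice(rng.choice(bad))                          # step 4: one word of a conflict
        options = neighbours(code[i])
        if rng.random() < p:                                     # step 5: best improvement
            code[i] = min(options,
                          key=lambda w: len(conflicts(code[:i] + [w] + code[i + 1:], d)))
        else:                                                    # step 6: random walk
            code[i] = rng.choice(options)
    return None                                                  # no feasible code found


print(sls_word_design(k=3, n=4, d=3, p=0.8, max_iters=2000, rng=random.Random(0)))
```

The function returns `None` when the iteration budget runs out, so a caller should be prepared for a failed run.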
Tulpan, D.C., Hoos, H.H., Condon, A.E. Stochastic local search algorithms for
DNA word design. Lecture Notes in Computer Science, Springer, Berlin, 2568,
229-241 (2002).
Stochastic Local Search for the DNA codes design problem
[Figure: state diagram with states SI, SBI, SRW and SF; a Best Improvement step is taken with probability p, a Random Walk step with probability 1-p]
[Figure: flowchart with the blocks Initialization, Evaluate, Pick Conflict, Best Improvement (probability p), Random Walk (probability 1-p) and Return Result; the Yes branch leads to Return Result, the No branch to Pick Conflict]
[Figure: components of the method: Select Conflicts, Neighbourhood, Random Walk, Iterative / Best Improvement]
Stochastic Local Search for the DNA codes design problem
Given: a fixed set of constraints C, strand length n=8, set size k=14.
Current Set
 1. ACCTGATT      8. TCACCATG
 2. TTTCTCAG      9. CTACTACG      (word 2 is shown together with ATTCTCAG)
 3. ACCTTTTT     10. GGCTTTTA
 4. TATATATA     11. TTGGCCAA
 5. CATTCACC     12. CTATTCAC
 6. GATTCAAT     13. GCGCGCGC
 7. ATTCTCAA     14. CCGTTACA
Conflicts: (1,3) (1,5) (2,7) (2,9) (12,14)
Conflicts: (1,3) (1,5) (2,11)
Pick Conflict
Neighbors: TTTCTCAG, AATCTCAG, …
Best Improvement (probability p) or Random Walk (probability 1-p)
Stochastic Local Search for the DNA codes design problem - results
Simple SLS without Random Replacement
k = {100, 120, 140}, n = 8, d = 4
HD constraint only
1000 successful runs
[Figure: distribution of the number of iterations required to have a feasible solution for different values of k (target number of codewords)]
Comments:
•  The number of iterations required increases with k
•  The increase is more dramatic when k is high => risk of stagnation
Stochastic Local Search for the DNA codes design problem - results
SLS with Random Replacement vs Simple SLS
k = 70, n = 8, d = 4
HD, RC, GC constraints
1000 successful runs
[Figure: distribution of the number of iterations required to have a feasible solution for k = 70 (target number of codewords)]
Comments:
•  Random Replacement helps!
•  Stagnation reduced
•  Better robustness
Stochastic Local Search for the DNA codes design problem
Scaling of SLS with Random Replacement
n = 8, d = 4
[Figure: number of search iterations (logarithmic axis, 10 to 100000) against DNA set size (20 to 160), with one curve each for HD, HD+GC and HD+GC+RC]
Comments:
•  SLS scales up better when fewer constraints are considered
•  Why? Because fewer constraints => easier problem, intuitively
Stochastic Local Search for the DNA codes design problem - results
New bounds on the size of DNA codes
n     d     Previous best     SLS
6     3     56                85
10    5     132               256
14    7     240               500
18    9     380               1200
20    10    1520              2193
Note: HD, GC constraints.
Stochastic Local Search for the DNA codes design problem - results
Comments:
•  There are improvements over previous best.
•  The method is still extremely simple and intuitive
[good quality in general but...]
•  Is it possible to improve it with some refinement?
•  Where should we work to refine the method?
IDEA: trying different neighbourhoods!
Improved Stochastic Local Search for
the DNA codes design problem
Instead of simple 1-exchange !
• Combinatorial problem Π: DNA Word Design
•  Problem instance π : DNA/quaternary code design [ particular (n,d) combinations ]
•  Search space S(π): set of (code word) sets s
•  Neighborhood relation N(π): k-exchange + random based neighborhoods
•  Initialization function init(π): chosen at random or predefined
•  Step function step(π): chooses with probability p between best improvement and
random walk
•  Termination predicate terminate(π): a function depending on the number of iterations
performed or on whether a solution has been found
Tulpan, D.C., Hoos, H.H. Hybrid randomised neighbourhoods improve stochastic
local search for DNA code design. Lecture Notes in Computer Science, Springer,
Berlin, 2671, 418-433 (2003).
Improved Stochastic Local Search for
the DNA codes design problem
Neighbourhoods
Simple neighbourhoods
•  k-exchange / k-point mutation neighbourhoods
•  rotation-based neighbourhoods
•  random neighbourhoods
Complex neighbourhoods
•  1-exchange / 1-point mutation + rotation neighbourhoods
•  k-exchange / k-point mutation + random words neighbourhoods
•  1-exchange / 1-point mutation + rotations + random words neighbourhoods
Improved Stochastic Local Search for
the DNA codes design problem
Simple neighbourhoods
v-exchange / v-point mutation neighbourhoods
Example:
some of the codewords in the 2-exchange neighbourhood of CTA are:
ACA
GTT TTG TCA
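The example can be checked with a small generator for the v-exchange neighbourhood. The sketch assumes that a v-exchange neighbour differs from the input word in exactly v positions and ignores the GC-content constraint; both points are assumptions, and the function name is illustrative.

```python
from itertools import combinations, product


def v_exchange_neighbours(word, v, alphabet="ACGT"):
    """All words that differ from `word` in exactly v positions."""
    result = []
    for positions in combinations(range(len(word)), v):
        # For each chosen position, every letter except the current one.
        choices = [[a for a in alphabet if a != word[i]] for i in positions]
        for letters in product(*choices):
            chars = list(word)
            for i, a in zip(positions, letters):
                chars[i] = a
            result.append("".join(chars))
    return result


nbrs = v_exchange_neighbours("CTA", 2)
print(len(nbrs), all(w in nbrs for w in ["ACA", "GTT", "TTG", "TCA"]))
```

Filtering the result with a GC-content check would restrict the neighbourhood to words that keep the required number of G and C letters.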
Improved Stochastic Local Search for
the DNA codes design problem
Simple neighbourhoods
rotation-based neighbourhoods
Applying the neighbourhood to a given codeword, we get the
codewords obtained by cyclically shifting the input codeword
to the right by 1 to n-1 positions.
Example:
CTA
=>
TAC, ACT
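A minimal sketch of the rotation-based neighbourhood, reading "shifting right" as a cyclic shift (an assumption consistent with the example). The function name is illustrative.

```python
def rotation_neighbours(word):
    """Cyclic right shifts of the word by 1 to n-1 positions."""
    n = len(word)
    return [word[-s:] + word[:-s] for s in range(1, n)]


print(rotation_neighbours("CTA"))
```

A rotation never changes the letters of a word, so the GC content is preserved automatically.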
Improved Stochastic Local Search for
the DNA codes design problem
Simple neighbourhoods
random neighbourhoods
Example: some of the codewords in the random neighbourhood of CTA
are:
CAA CTT TTC TCA
Improved Stochastic Local Search for
the DNA codes design problem
Complex neighbourhoods
1-exchange + rotation neighbourhoods
v-exchange + random words neighbourhoods
1-exchange + rotations + random words neighbourhoods
•  These neighbourhoods are obtained by applying all the neighbourhoods
involved sequentially (repeated codewords have to be avoided)
•  When rotation is involved, it is applied to all the codewords obtained by the
neighbourhoods previously applied
Improved Stochastic Local Search for
the DNA codes design problem
[Figure: scheme of the search, with one step highlighted by the note "The difference is here!"]
Improved Stochastic Local Search for the DNA codes design problem - results
k-exchange Neighbourhoods
k = 70, n = 8, d = 4
HD, RC, GC constraints
1000 successful runs
{1, 2, 3}-exchange neighbourhoods
[Figure: distribution of the number of iterations required to have a feasible solution for different v-exchange methods]
Comments:
•  Using a larger neighbourhood seems to help but…
•  The difference between 2-exchange and 3-exchange is not dramatic
•  A larger neighbourhood means more time at each iteration…
Note on the size of the 1-exchange neighbourhood: why 16? 2 words, and the GC content has to be respected
Improved Stochastic Local Search for the DNA codes design problem - results
k-exchange Neighbourhoods
Time for 1 iteration
Neighbourhood     CPU Time
1-exchange        .0017
2-exchange        .0088
3-exchange        .0314
[Figure: distribution of the CPU time required to have a feasible solution for different v-exchange methods]
Comments:
•  1-exchange is still the best in terms of run times => not what we hoped!
Improved Stochastic Local Search for the DNA codes design problem - results
Hybrid Randomized Neighbourhoods
k = 70, n = 8, d = 4
HD, RC, GC constraints
1000 successful runs
random, hybrid neighbourhoods
[Figure: distribution of the number of iterations required to have a feasible solution for different hybrid neighbourhoods]
Comments:
•  Pure random performs surprisingly well
•  1-exchange + random is however the best method
Improved Stochastic Local Search for the DNA codes design problem
All combinations of neighbourhoods together (usual benchmark)
[Figure: distribution of the number of iterations required to have a feasible solution for different neighbourhoods]
Comments:
•  1-exchange + rotation + random is the most promising combination in terms of number of iterations
•  Methods including the random neighbourhood are definitely better
Improved Stochastic Local Search for
the DNA codes design problem
Approximate CPU Cost per Iteration for all the combinations of neighbourhoods considered
Neighbourhood Type                 Neighbourhood size   CPU Time [sec]
1-exchange                         16                   .002184
2-exchange                         72                   .008830
3-exchange                         184                  .031493
1-exchange + rotations             128                  .017294
random                             128                  .015100
1-exchange + random                16 + 112             .022889
2-exchange + random                72 + 112             .029167
3-exchange + random                184 + 112            .040833
1-exchange + rotations + random    128 + 100            .043333
Comments:
•  1-exchange + random is a good compromise between speed and quality of the solutions
•  Let’s now see what happens if we consider both the time spent on each iteration and the number of iterations required to converge… [next slide]
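A worked calculation on the figures of this slide: dividing each CPU time per iteration by that of 1-exchange gives the factor by which a neighbourhood must reduce the number of iterations before it pays off in total CPU time. The times are copied from the table on this slide; the dictionary and variable names are only illustrative.

```python
# CPU time per iteration [sec], copied from the table on this slide.
cpu_per_iteration = {
    "1-exchange": 0.002184,
    "2-exchange": 0.008830,
    "3-exchange": 0.031493,
    "1-exchange + rotations": 0.017294,
    "random": 0.015100,
    "1-exchange + random": 0.022889,
    "2-exchange + random": 0.029167,
    "3-exchange + random": 0.040833,
    "1-exchange + rotations + random": 0.043333,
}

baseline = cpu_per_iteration["1-exchange"]

# Break-even factor: how many times fewer iterations are needed to match
# the total CPU time of plain 1-exchange.
break_even = {name: t / baseline for name, t in cpu_per_iteration.items()}

for name, factor in break_even.items():
    print(f"{name:33s} break-even factor: {factor:5.2f}")
```

For example, 1-exchange + random costs about 10.5 times more per iteration than 1-exchange, so it wins on CPU time only if it converges in less than about a tenth of the iterations.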
85
Improved Stochastic Local Search for
the DNA codes design problem
All combinations of neighbourhood together (usual benchmark)
Comments:
Distribution of the CPU time required to have a feasible solution for different
neighbourhoods
•  Rotation is time consuming => methods with rotation are not so convenient anymore
•  1-exchange + random neighbourhood is by far the most promising combination in terms of CPU time
86
Improved Stochastic Local Search for
the DNA codes design problem
Is this randomized step still interesting?
87
Improved Stochastic Local Search for
the DNA codes design problem
k = 70, n = 8, d = 4
HD, RC, GC constraints
1000 successful runs
random, hybrid
neighbourhoods
Number of iterations to have a feasible solution for different
values of the randomizing parameter
Comments:
•  The randomized step is useless when the hybrid randomized neighbourhood is used!
•  This happens because the neighbourhood already does the “random work”
88
Improved Stochastic Local Search for
the DNA codes design problem
89
Improved Stochastic Local Search for
the DNA codes design problem - results
Scaling of the Improved SLS
n = 8, d = 4
HD, RC, GC constraints
1000 successful runs
1-exchange, random, hybrid
neighbourhoods
Comments:
•  It is surprising how well the pure random neighbourhood scales up
•  However, the 1-exchange + random neighbourhood is the best
90
Improved Stochastic Local Search for
the DNA codes design problem
SLS Results and Analysis
•  New bounds for DNA set sizes
•  Improved SLS using various neighborhoods
Combinatorial constraints: HD, RC, GC
Length (n)   Hamming dist. (d)   Existing Bounds (k)   Simple SLS (k)   Improved SLS (k)
4            3                   -                     5                6
8            4                   108                   112*             128
10           5                   -                     127              158
12           6                   -                     210              240
[Tulpan et al., 2002]
[Tulpan et al., 2003]
[Frutos et al., 1997]
Thesis Contributions: C1 Development of novel optimization algorithms
91
Improved Stochastic Local Search for
the DNA codes design problem
Conclusions
•  Random neighbourhoods => increased SLS performance
•  1-exchange + random neighbourhood is the best
combination
•  Larger DNA codes have been obtained
92
Another Stochastic Local Search for
the DNA codes design problem
•  A different SLS algorithm has been presented in the literature.
•  It can be seen as a Simulated Annealing algorithm without a
cooling schedule (constant temperature).
•  The current code L is always feasible
•  At each iteration a new (feasible) codeword s is added, and all
the codewords of L that are not compatible with s are
removed, leading to a new code L’
•  Code L’ is accepted with a certain probability depending on
|L’| - |L| (difference in the cardinalities of the two sets)
Chee, Y. M, Ling, S. Improved lower bounds for constant GC-content DNA
codes. IEEE Transactions on Information Theory, 54(1), 391-394 (2008).
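The iteration described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the acceptance rule (a Metropolis-style test at constant temperature) is an ASSUMPTION, since the slide only says that the probability depends on |L’| - |L|. The helper names, the `temperature` parameter and the pairwise definitions of the HD and RC constraints are likewise assumed. The candidate s is expected to be feasible on its own (GC content and its own reverse complement), which the caller has to check.

```python
import math
import random

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def hamming(a, b):
    """Number of positions in which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def reverse_complement(word):
    return "".join(COMPLEMENT[c] for c in reversed(word))


def compatible(a, b, d):
    """HD and RC constraints between two codewords (usual definitions, assumed)."""
    return hamming(a, b) >= d and hamming(a, reverse_complement(b)) >= d


def sls_step(code, s, d, temperature, rng=random):
    """Add s to the code, drop the words incompatible with s, accept or reject."""
    new_code = [w for w in code if compatible(w, s, d)] + [s]
    delta = len(new_code) - len(code)  # |L'| - |L|
    # ASSUMPTION: never reject a code that is not smaller; accept a smaller
    # one with probability exp(delta / temperature).
    if delta >= 0 or rng.random() < math.exp(delta / temperature):
        return new_code
    return code


print(sls_step(["AACC"], "CAGT", d=3, temperature=1.0))  # ['AACC', 'CAGT']
```

In the call above nothing is removed, because AACC and CAGT are at Hamming distance 3 from each other and from each other's reverse complement, so the code simply grows by one word.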
93
Another Stochastic Local Search for
the DNA codes design problem
Max number of iterations
Code
Target number of codewords (k before)
Set of incompatible codes
Acceptance probability of the new code:
94
Another Stochastic Local Search for
the DNA codes design problem
Improvements over previous bests in the literature (theoretical methods, other SLSs and a
few more)
HD, GC and RC constraints
95
Stochastic Local Searches for
the DNA codes design problem
•  Different methods based on a similar idea
lead to very different codes
•  No method dominates the others
•  The methods seem to explore the search space
in different ways
•  Is it possible to combine the good properties of
(some of) the different approaches into a
single method?
96
Outline
•  Introduction
•  The DNA Codes Design problem
•  Approaches in the literature
•  Construction heuristics
•  Simple local searches
•  Metaheuristics
–  Intro to Stochastic Local Search
–  Applications to the DNA codes design problem
–  Intro to Variable Neighbourhood Search
–  Applications to the DNA codes design problem
•  Bibliography
97
VNS
98
VNS
99
VNS
100
VNS
101
VNS
102
Outline
•  Introduction
•  The DNA Codes Design problem
•  Approaches in the literature
•  Construction heuristics
•  Simple local searches
•  Metaheuristics
–  Intro to Stochastic Local Search
–  Applications to the DNA codes design problem
–  Intro to Variable Neighbourhood Search
–  Applications to the DNA codes design problem
•  Future research
•  Acknowledgment
•  Bibliography
103
A VNS algorithm for DNA codes design
A primitive Variable Neighbourhood Search (VNS) algorithm is
introduced.
It repeatedly runs, in turn, the local search algorithms (basic
ingredients) seen before.
The reference solution for the local searches is always the best solution
found so far.
This is a Variable Neighbourhood Descent!
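The scheme just described can be sketched in Python as below. It is a generic outline under stated assumptions, not the published algorithm: `local_searches`, `quality` and `max_rounds` are hypothetical names, and the stopping rule (stop after a full round without improvement) suits the deterministic toy searches used here, whereas the experiments in this deck use a time limit. The two toy searches act on integers only so that the control flow is easy to follow; for DNA codes a solution would be a code and its quality the number of codewords.

```python
def variable_neighbourhood_descent(initial, local_searches, quality, max_rounds):
    """Run the local searches in turn, always restarting from the best solution."""
    best = initial
    for _ in range(max_rounds):
        improved = False
        for search in local_searches:
            candidate = search(best)  # reference solution = best found so far
            if quality(candidate) > quality(best):
                best = candidate
                improved = True
        if not improved:
            break  # no local search could improve the best solution
    return best


# Toy stand-ins for the local searches (hypothetical, for illustration only).
def add_one_up_to_5(x):
    return x + 1 if x < 5 else x


def double_up_to_20(x):
    return 2 * x if x < 20 else x


result = variable_neighbourhood_descent(
    initial=1,
    local_searches=[add_one_up_to_5, double_up_to_20],
    quality=lambda x: x,
    max_rounds=10,
)
print(result)  # 20
```

Neither toy search alone gets from 1 to 20 in this loop as quickly as the two together, which is the effect hoped for when the local searches visit the search space in different ways.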
Montemanni R., Smith D.H. Construction of constant GC-content DNA codes via a
variable neighbourhood search algorithm. Journal of Mathematical Modelling and
Algorithms 7, 311-326 (2008).
Montemanni, R., Smith, D.H. Heuristic algorithms for constructing binary constant
weight codes. IEEE Transactions on Information Theory 55(10), 4651-4656 (2009)
Montemanni, R., Smith, D.H., Koul, N. Three metaheuristics for the construction of
constant GC-content DNA codes. Post-proceedings of the VIII Metaheuristic
International Conference. S. Voss and M. Caserta eds., Springer (to appear)
104
A VNS algorithm for DNA codes design
Methods involved in
our implementation
105
A VNS algorithm for DNA codes design
•  We hope to take advantage of the different philosophies behind the
local search methods listed before
•  From previous experiments we know that the basic local searches
visit the search space in different ways
•  We hope the basic local searches will help each other to escape from
local minima within a VNS framework
106
Experimental comparison of some of the heuristic
algorithms
Experimental settings
Methods coded in ANSI C
Experiments on Dual AMD Opteron 250 2.4GHz / 4GB RAM
machines
Maximum computation times: 10'000 seconds (2.8 hours)
Statistics over 5 runs for each combination problem/method
A_4^{Cstrs}(5,3,2) identifies the problem with constraints Cstrs (HD is always
present, and therefore not listed), and with n = 5, d = 3, and GC content
= floor(n/2) = 2. [this funny notation comes from coding theory…]
Montemanni R., Smith D.H. Construction of constant GC-content DNA codes
via a variable neighbourhood search algorithm. Journal of Mathematical
Modelling and Algorithms 7, 311-326 (2008).
107
Experimental comparison of some of the heuristic
algorithms
•  SB = Seed Building
•  CS = Clique Search
•  HS = Hybrid Search
•  IGS = Iterated Greedy Search
•  VNS = Variable Neighbourhood
Search
108
Experimental comparison of some of the heuristic
algorithms
•  SB = Seed Building
•  CS = Clique Search
•  HS = Hybrid Search
•  IGS = Iterated Greedy Search
•  VNS = Variable Neighbourhood
Search
109
Experimental comparison of some of the heuristic
algorithms
Comments
•  No clear ranking is possible among the basic methods considered:
Seed Building, Clique Search, Hybrid Search and Iterated Greedy
Search (as seen before…)
⇒  Methods are likely to represent different neighbourhoods
•  Variable Neighbourhood Search clearly dominates the other
methods
⇒  VNS takes advantage of the different neighbourhoods
⇒  VNS is likely to be competitive against all the other methods!
110
Experimental results of VNS
The VNS algorithm discussed in:
•  Montemanni, R., Smith, D.H. (2008). Construction of constant GC-content DNA codes via a
Variable Neighbourhood Search Algorithm. Journal of Mathematical Modelling and
Algorithms, 7, 311-326.
is compared with the methods discussed in the following 6 papers [which provide all the best
known codes]:
•  Li, M., Lee, H. J., Condon, A. E., and Corn, R. M. (2002). DNA word design strategy for
creating sets of non-interacting oligonucleotides for DNA microarrays. Langmuir, 18, 805-812.
•  Tulpan, D. C., Hoos, H. H., and Condon, A. E. (2002). Stochastic local search algorithms for
DNA word design. Lecture Notes in Computer Science, Springer, 2568, 229-241.
•  Tulpan, D. C. and Hoos, H. H. (2003). Hybrid randomised neighbourhoods improve
stochastic local search for DNA code design. Lecture Notes in Computer Science, Springer,
2671, 418-433.
•  King, O. D. (2003). Bounds for DNA codes with constant GC-content. Electronic Journal of
Combinatorics, 10, #R33.
•  Gaborit, P. and King, O. D. (2005). Linear construction for DNA codes. Theoretical
Computer Science, 334, 99-113.
•  Chee, Y. M. and Ling, S. (2008). Improved lower bounds for constant GC-content DNA
codes. IEEE Transactions on Information Theory, 54(1), 391-394.
Reference algorithm
Theor. Constructions
Heuristic Algorithms
111
Experimental results of VNS
Experimental settings
•  Methods coded in ANSI C
•  Experiments on Dual AMD Opteron 250 2.4GHz / 4GB RAM
machines
•  Maximum computation times: 100'000 seconds (27.8 hours)
=> Comparable with that of other heuristic algorithms
•  Best over 5 runs for each combination problem/method
Montemanni R., Smith D.H. Construction of constant GC-content DNA codes
via a variable neighbourhood search algorithm. Journal of Mathematical
Modelling and Algorithms 7, 311-326 (2008).
112
Experimental results of VNS
•  We will consider 254 problems with
-  4 ≤ n ≤ 20
-  3 ≤ d ≤ n ≤ 20
-  Case 1: HD and GC constraints
-  Case 2: HD, RC and GC constraints
•  These settings match those of the state-of-the-art tables
maintained at http://llama.med.harvard.edu/~king/dnacodes.html by O.D.
King (last checked November 2009)
•  We left out problems corresponding to very large codes (the
current VNS algorithm cannot tackle them)
113
Experimental results of VNS
•  over 254 problems considered:
•  in 128 cases the best known result is matched
•  in 52 cases a new best result is found
Montemanni R., Smith D.H. Construction of constant GC-content DNA codes
via a variable neighbourhood search algorithm. Journal of Mathematical
Modelling and Algorithms 7, 311-326 (2008).
114
Detailed results of VNS
115
Detailed results of VNS
116
Detailed results of VNS
117
Detailed results of VNS
118
Experimental results of VNS
•  After the publication of the paper we have been improving
the VNS algorithms in many ways (work still in progress!)
•  Over the 254 problems considered:
•  in 132 cases (previously 128) the best known result is matched
•  in 87 cases (previously 52) a new best result is found
•  We miss the best known solution in only 13.8% of the cases (35 out of 254)!
•  We feel there is room for further improvements…
Montemanni, R., Smith D.H. Metaheuristics for the construction of constant GC-content
DNA codes. Proceedings of the MIC 2009 Conference (2009)
Montemanni, R., Smith, D.H., Koul, N. Three metaheuristics for the construction of
constant GC-content DNA codes. Post-proceedings of the VIII Metaheuristic
International Conference. S. Voss and M. Caserta eds., Springer (to appear)
119
Detailed results of VNS
Comments
•  VNS works (slightly) better on problems with RC constraints
•  This result is also confirmed by our latest improved implementations
•  Is this because the other methods are more competitive
without RC constraints?
YES => we might not have much chance to improve
on problems without RC constraints
NO => we probably have a chance to improve on problems
without RC constraints
=> Worth investigating!
120
Outline
•  Introduction
•  Real applications
•  The DNA Codes Design problem
•  Approaches in the literature
•  Construction heuristics
•  Simple local searches
•  Metaheuristics
–  Intro to Stochastic Local Search
–  Applications to the DNA codes design problem
–  Intro to Variable Neighbourhood Search
–  Applications to the DNA codes design problem
•  Future research
•  Acknowledgment
•  Bibliography
121
Essential bibliography (1/4)
[HEUR] => Heuristics related publication.
Brenner, S., Lerner, R.A. (1992). Encoded combinatorial chemistry. Proceedings of the
National Academy of Sciences USA, 89, 5381-5383.
Adleman, L. (1994) Molecular computation of solutions to combinatorial problems. Science,
266, 1021-1024.
Frutos, A.G., Liu, Q., Thiel, A.J., Sanner, A.M.W., Condon, A.E., Smith, L.M., Corn, R.M.
(1997). Demonstration of a word design strategy for DNA computing on surfaces. Nucleic
Acids Research, 25, 4748-4757.
Hansen, P., Mladenovic, N. (2001). Variable neighbourhood search: principles and
applications. European Journal of Operational Research, 130, 449-467. [HEUR]
Marathe, A., Condon, A.E., Corn, R.M. (2001). On combinatorial DNA word design.
Journal of Computational Biology, 8, 201-219.
Arita, M., Kobayashi, S. (2002). DNA sequence design using templates. New Generation
Computing, 20, 263-277.
122
Essential bibliography (2/4)
Li, M., Lee, H.J., Condon, A.E., Corn, R.M. (2002). DNA word design strategy for creating
sets of non-interacting oligonucleotides for DNA microarrays. Langmuir, 18, 805-812.
Tulpan, D.C., Hoos, H.H., Condon, A.E. (2002). Stochastic local search algorithms for DNA
word design. Lecture Notes in Computer Science, Springer, Berlin, 2568, 229-241.
[HEUR]
Tulpan, D.C., Hoos, H.H. (2003). Hybrid randomised neighbourhoods improve stochastic
local search for DNA code design. Lecture Notes in Computer Science, Springer, Berlin,
2671, 418-433. [HEUR]
King, O.D. (2003). Bounds for DNA codes with constant GC-content. Electronic Journal of
Combinatorics, 10, #R33. [HEUR]
Kobayashi, S., Konto, T., Arita, M. (2003). On template methods for DNA sequence design.
Lecture Notes in Computer Science, 2568, 205-214.
Hoos, H.H., Stuetzle, T. (2004). Stochastic Local Search: foundations and applications.
Morgan Kaufmann/Elsevier. [HEUR]
123
Essential bibliography (3/4)
Gaborit, P., King, O.D. (2005). Linear construction for DNA codes. Theoretical Computer
Science, 334, 99-113. [HEUR]
Tulpan, D.C. (2006). Effective heuristic methods for DNA strand design. PhD thesis,
University of British Columbia. [HEUR]
King, O.D. (2006). Tables of lower bounds for DNA codes with constant GC-content.
http://llama.med.harvard.edu/~king/dnacodes.html, last checked: November 2009. [HEUR]
Chee, Y.M., Ling, S. (2008). Improved lower bounds for constant GC-content DNA codes.
IEEE Transactions on Information Theory, 54(1), 391-394. [HEUR]
Montemanni, R., Smith, D.H. (2008). Construction of constant GC-content DNA codes via a
Variable Neighbourhood Search Algorithm. Journal of Mathematical Modelling and
Algorithms, 7, 311-326. [HEUR]
Montemanni, R., Smith, D.H. (2009). Heuristic algorithms for constructing binary constant
weight codes. IEEE Transactions on Information Theory 55(10), 4651-4656. [HEUR]
Montemanni, R., Smith D.H. (2009). Metaheuristics for the construction of constant GC-content
DNA codes. Proceedings of the MIC 2009 Conference. [HEUR]
124
Essential bibliography (4/4)
Montemanni, R., Smith D.H., Koul, N. (2010). Three metaheuristics for the construction of
constant GC-content DNA codes. Post-proceedings of the VIII Metaheuristic International
Conference. S. Voss and M. Caserta eds., Springer. [HEUR]
Tulpan, D., Montemanni, R., Ghiggi, A. (2010). Computational Sequence Design
Techniques for DNA Microarray Technologies. Submitted for publication. [HEUR]
Ghiggi, A. (2010). DNA strand design with thermodynamic constraints. Master thesis, USI.
[HEUR]
Koul, N. (2010). Heuristic Algorithms for Construction of Constant GC content DNA codes.
Master thesis, USI. [HEUR]
Neelakandan, I. (2010). New Approaches for Constructing Constant Weight Binary Codes.
Master thesis, USI. [HEUR]
125
Exercises 1
1.  We have the following code with n=4:
CGTA
GGAA
AATG
TAGA
a.  Does it respect the GC-content constraint?
b.  Does it respect the Hamming distance constraint for a DNA codes
design problem with d=2?
c.  Does it respect the Reverse Complement Hamming distance constraint
for a DNA codes design problem with d=2?
2.  Given the settings n=4, d=3 and constraints HD, GC, RC, consider the
following code:
AACC
CAGT
GAAG
TCCT
TGAC
a.  Is it feasible?
b.  Can it be extended?
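To check answers to exercises of this kind, a small Python helper along the following lines can be used. It is a sketch under stated assumptions: the GC-content target is taken as floor(n/2), as in the experimental settings of this deck, and the RC constraint is checked for every pair of codewords, a word paired with itself included. The function names and the format of the returned list are hypothetical.

```python
from itertools import combinations

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def hamming(a, b):
    """Number of positions in which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def reverse_complement(word):
    return "".join(COMPLEMENT[c] for c in reversed(word))


def gc_content(word):
    return sum(c in "GC" for c in word)


def violations(code, d, check_gc=True, check_rc=True):
    """List the violated constraints of a code; an empty list means feasible."""
    n = len(code[0])
    found = []
    if check_gc:
        # ASSUMPTION: required GC content is floor(n/2).
        found += [("GC", w) for w in code if gc_content(w) != n // 2]
    found += [("HD", a, b) for a, b in combinations(code, 2) if hamming(a, b) < d]
    if check_rc:
        # ASSUMPTION: RC is checked for each pair, a word with itself included.
        found += [
            ("RC", a, b)
            for i, a in enumerate(code)
            for b in code[i:]
            if hamming(a, reverse_complement(b)) < d
        ]
    return found


print(violations(["AACC", "AACG"], d=3))  # [('HD', 'AACC', 'AACG')]
```

The two words in the call differ in one position only, so the Hamming distance constraint with d=3 is the single violation reported.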
126
Exercises 2
1.  Given the settings n=3, d=2 and constraints HD, GC, show an execution of
the Construction Heuristic working on top of the inverse lexicographic order.
2.  Given the settings n=2, d=1 and constraints HD, GC, RC, show an
execution of the Construction Heuristic working on top of the lexicographic
order.
3.  Given the settings n=3, d=2, constraints HD, GC, RC, and the following
partial code:
CTT
CAA
TGT
GTA
show an iteration of the Clique Search algorithm.
4.  Given the settings n=4, d=2, constraints HD, GC, RC, and the following
code:
TGGT
GACC
CGAA
TCTC
CGTT
calculate its measure of infeasibility Inf(W) according to the definition given
in slide 75 (Iterative Greedy Search)
127
Exercises 3
1.  Write the rotation neighbourhood of codeword CATGA.
2.  Write 5 of the codewords of the 3-exchange neighbourhood of codeword
CATGA.
3.  Write 5 of the codewords of the random neighbourhood of codeword
CATGA.
4.  Write 5 of the codewords of the 2-exchange + random neighbourhood of
codeword CATGA.
5.  Consider the SLS method described from slide 119 on, with input
parameters n=4, d=3, and constraints HD, GC, RC.
At a given iteration we have the following code L
CTTC
GGTT GTCA AGGA ACTG TTGG
and the selected random codeword is TTGC.
Write down code L’ (we do not care if it will be accepted or not)
128