A Comparative Study of Techniques for Protein
Side-chain Placement

by

Eun-Jong Hong

B.S., Electrical Engineering
Seoul National University
(1998)

Submitted to the Department of Electrical Engineering and Computer Science
in partial fulfillment of the requirements for the degree of
Master of Science in Electrical Engineering and Computer Science
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
September 2003

© Massachusetts Institute of Technology 2003. All rights reserved.

Author: Department of Electrical Engineering and Computer Science, September 2, 2003

Certified by: Tomas Lozano-Perez, Professor, Thesis Supervisor

Accepted by: Arthur C. Smith, Chairman, Department Committee on Graduate Students
A Comparative Study of Techniques for Protein Side-chain
Placement
by
Eun-Jong Hong
Submitted to the Department of Electrical Engineering and Computer Science
on September 2, 2003, in partial fulfillment of the
requirements for the degree of
Master of Science in Electrical Engineering and Computer Science
Abstract
The prediction of energetically favorable side-chain conformations is a fundamental
element in homology modeling of proteins and the design of novel protein sequences.
The space of side-chain conformations can be approximated by a discrete space of
probabilistically representative side-chain orientations (called rotamers). The problem is then to find a rotamer selection for each amino acid that minimizes a potential
function. This is an NP-hard optimization problem. Dead-end elimination (DEE) combined with the A* algorithm has been successfully applied to the
problem. However, DEE fails to converge for some classes of complex problems.
In this research, we explore three different approaches to find alternatives to
DEE/A* in solving the GMEC problem. We first use an integer programming formulation and the branch-and-price algorithm to obtain exact solutions. There are
known ILP formulations that can be directly applied to the GMEC problem. We review these formulations and compare their effectiveness using CPLEX optimizers. At
the same time, we do preliminary work on applying branch-and-price to the GMEC
problem. We suggest possible decomposition schemes and assess the feasibility of the
decomposition schemes through computational experiments.
As the second approach, we use nonlinear programming techniques. This work
mainly relies on the continuation approach by Ng [31]. Ng's continuation approach
finds a suboptimal solution to a discrete optimization problem by solving a sequence
of related nonlinear programming problems. We implement the algorithm and do a
computational experiment to examine the performance of the method on the GMEC
problem.
We finally use probabilistic inference methods to infer the GMEC, using the
energy terms translated into probability distributions. We implement probabilistic relaxation labeling, the max-product belief propagation (BP) algorithm, and the MIME
double loop algorithm, and test them on side-chain placement examples and some sequence
design cases. The performance of the three methods is also compared with that of the ILP
method and DEE/A*.
The preliminary results suggest that among all the tested methods, probabilistic inference methods, especially the max-product BP algorithm, are the most effective and fastest. Though the max-product BP algorithm is an approximate method, its speed and
accuracy are comparable to those of DEE/A* in side-chain placement, and overall superior in sequence design. Probabilistic relaxation labeling shows slightly weaker performance than max-product BP, but the method works well up to medium-sized
cases. On the other hand, the ILP approach, the nonlinear programming approach,
and the MIME double loop algorithm turn out not to be competitive. Though these
three methods have different appearances, they are all based on mathematical formulations and optimization techniques. We find that such traditional approaches require
a good understanding of the methods and considerable experimental effort and expertise. However, we also present the results from these methods as a reference
for future research.
Thesis Supervisor: Tomas Lozano-Perez
Title: Professor
Acknowledgments
I would like to thank my advisor, Prof. Tomas Lozano-Perez, for his kind guidance
and continuous encouragement. Working with him was a great learning experience
and taught me the way to do research. He introduced me to the topic of side-chain
placement as well as computational methods such as integer linear programming and
probabilistic inference. He has been so much involved in this work himself, and most
of ideas and implementations in Chapter 4 are his contributions. Without his handson advice and considerate mentorship, this work was not able to be finished.
I also would like to thank Prof. Bruce Tidor and members of his group. Prof.
Tidor allowed me to access the group's machines, and use protein energy data and
the DEE/A* implementation. Alessandro Senes, formerly a postdoc in the
Tidor group, helped set up the environment for my computational experiments.
I appreciate his friendly responses to my numerous requests and his answers to elementary chemistry questions. I thank Michael Altman and Shaun Lippow for providing
sequence design examples and for help in using their programs, and Bambang Adiwijaya for useful discussions on delayed column generation.
I thank Prof. Piotr Indyk and Prof. Andreas Schultz for their helpful correspondence and suggestions on the problem, and Prof. James Orlin and Prof. Dimitri Bertsekas
for their time and opinions. Junghoon Lee was a helpful source on numerical methods, and Yongil Shin became a valuable partner in discussing statistical physics and
probability. Prof. Ted Ralphs of Lehigh University and Prof. Kien-Ming Ng of NUS
gave helpful comments on their software and algorithms.
Last but not least, I express my deep gratitude to my parents and sister for
their endless support and love, which sustained me throughout all the hard work.
Contents

1 Introduction
  1.1 Global minimum energy conformation
    1.1.1 NP-hardness of the GMEC problem
    1.1.2 Purpose and scope
  1.2 Related work
    1.2.1 Integer linear programming (ILP)
    1.2.2 Probabilistic methods
  1.3 Our approaches

2 Integer linear programming approach
  2.1 ILP formulations
    2.1.1 The first classical formulation by Eriksson, Zhou, and Elofsson
    2.1.2 The second classical formulation
    2.1.3 The third classical formulation
    2.1.4 Computational experience
  2.2 Branch-and-price
    2.2.1 Formulation
    2.2.2 Branching and the subproblem
    2.2.3 Implementation
    2.2.4 Computational results and discussions
  2.3 Summary

3 Nonlinear programming approach
  3.1 Ng's continuation approach
    3.1.1 Continuation method
    3.1.2 Smoothing algorithm
    3.1.3 Solving the transformed problem
  3.2 Algorithm implementation
    3.2.1 The conjugate gradient method
    3.2.2 The adaptive linesearch algorithm
  3.3 Computational results
    3.3.1 Energy clipping and preconditioning the reduced-Hessian system
    3.3.2 Parameter control and variable elimination
    3.3.3 Results
  3.4 Summary

4 Probabilistic inference approach
  4.1 Methods
    4.1.1 Probabilistic relaxation labeling
    4.1.2 BP algorithm
    4.1.3 Max-product BP algorithm
    4.1.4 Double loop algorithm
    4.1.5 Implementation
  4.2 Results and discussions
    4.2.1 Side-chain placement
    4.2.2 Sequence design
  4.3 Summary

5 Conclusions and future work
List of Figures

2-1 Problem setting
2-2 A feasible solution to the GMEC problem
2-3 A set of minimum weight edges
2-4 Path starting from rotamer 2 of residue 1 and arriving at the same rotamer
2-5 An elongated graph to calculate shortest paths
2-6 Fragmentation of the maximum clique
4-1 An example factor graph with three residues
4-2 The change in the estimated entropy distribution from MP for 1amm-80
4-3 The change in the estimated entropy distribution from MP for 256b-80
4-4 Histogram of estimated entropy for 1amm-80 and 256b-80 at convergence of MP
4-5 Execution time for MP in seconds vs log total conformations
List of Tables

2.1 Comparison of three classical formulations (protein: PDB code, #res: number of modeled residues, LP solution: I - integral, F - fractional, T_LP: time taken for the CPLEX LP Optimizer, T_IP: time taken for the CPLEX MIP Optimizer, symbol -: skipped, symbol *: failed)
2.2 Test results for BRP1 (|S*|: total number of feasible columns, #nodes: number of explored branch-and-bound nodes, #LP: number of solved LPs until convergence, #cols: number of added columns until convergence, (frac): ratio of #cols to |S*|, #LP_op: number of solved LPs until reaching the optimal value, #cols_op: number of added columns until reaching the optimal value, T_LP: time taken for solving LPs in seconds, T_sub: time taken for solving subproblems in seconds, symbol -: |S*| calculation overflow, symbol *: stopped while running)
2.3 Test results for BRP2
2.4 Test results for BRP3
2.5 Test results with random energy examples (#nodes: number of explored branch-and-bound nodes, LB: lower bound from LP-relaxation, T: time taken to solve the instance exactly, symbol *: stopped while running)
3.1 Logarithmic barrier smoothing algorithm
3.2 The preconditioned CG method for possibly indefinite systems
3.3 The adaptive linesearch algorithm
3.4 The parameter settings used in the implementation
3.5 Results for SM2 (protein: PDB code, #res: number of residues, #var: number of variables, optimal: optimal objective value, smoothing: objective value from the smoothing algorithm, #SM: number of smoothing iterations, #CG: number of CG calls, time: execution time in seconds, #NC: number of negative curvature directions used, mu_0: initial value of the quadratic penalty parameter)
3.6 Results for SM1
4.1 The protein test set for the side-chain placement (log10conf: log total conformations, optimal: optimal objective value, T_LP: LP solver solution time, T_IP: IP solver solution time, T_DEE: DEE/A* solution time, symbol -: skipped case, symbol *: failed, symbol F: fractional LP solution)
4.2 Results for the side-chain placement test (dE: the difference from the optimal objective value, symbol -: skipped)
4.3 The fraction of incorrect rotamers from RL, MP, and DL
4.4 Performance comparison of MP and DEE/A* on side-chain placement (optimal: solution value from DEE/A*, E_MP: solution value from MP, dE: difference between 'optimal' and E_MP, IR_MP: number of incorrectly predicted rotamers by MP, T_DEE: time taken for DEE/A* in seconds, T_MP: time taken for MP in seconds)
4.5 Fraction of incorrectly predicted rotamers by MP and statistics of the estimated entropy for the second side-chain placement test (IR fraction: fraction of incorrectly predicted rotamers, avg S_i: average estimated entropy, max S_i: maximum estimated entropy, min S_i: minimum estimated entropy, min_{i in IR} S_i: minimum entropy of incorrectly predicted rotamers)
4.6 The protein test set for sequence design (symbol ?: unknown optimal value, symbol *: failed)
4.7 Results for sequence design test cases whose optimal values are known
4.8 Results for sequence design test cases whose optimal values are unknown (E: solution value the method obtained)
Chapter 1
Introduction
1.1 Global minimum energy conformation
The biological function of a protein is determined by its three-dimensional (3D) structure. Therefore, an understanding of the 3D structures of proteins is a fundamental
element in studying the mechanism of life. The widely used experimental techniques
for determining the 3D structures of proteins are X-ray crystallography and NMR
spectroscopy, but their use is difficult and sometimes impossible because of
high cost and technical limitations. On the other hand, advances in genome sequencing techniques have produced an enormous number of amino-acid sequences whose
folded structures are unknown. Researchers have made remarkable achievements in
exploiting these sequences to predict the structures computationally. Currently, due
to the increasing number of experimentally known structures and computational
prediction techniques such as threading, we can obtain approximate structures corresponding to new sequences in many cases. With this trend, we expect to have
approximate structures for all the available sequences in the near future.
However, approximate structures are not useful enough for many practical purposes, such as understanding the molecular mechanism of a protein, or designing
an amino-acid sequence compatible with a given target structure. Homology modeling of proteins [7] and design of novel protein sequences [6] are often based on the
prediction of energetically favorable side-chain conformations. The space of side-chain conformations can be approximated by a discrete space of probabilistically representative side-chain orientations (called rotamers) [34].
The discrete model of the protein energy function we use in this work is described
in terms of:

1. the self-energy of a backbone template (denoted $e_{backbone}$),

2. the interaction energy between the backbone and residue $i$ in its rotamer conformation $r$, in the absence of other free side-chains (denoted $e_{i_r}$), and

3. the interaction energy between residue $i$ in rotamer conformation $r$ and residue $j$ in rotamer conformation $s$, $i \neq j$ (denoted $e_{i_r j_s}$).

In the discrete conformation space, the total energy of a protein in a specific conformation $C$ can be written as follows:

$$E_C = e_{backbone} + \sum_i e_{i_r} + \sum_i \sum_{j>i} e_{i_r j_s} \qquad (1.1)$$

The problem is then to find a rotamer selection for the modeled residues that minimizes the energy function $E_C$; the minimizing conformation is often called the global minimum energy conformation (GMEC). In this work, we refer to this problem as the GMEC problem.
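To make the objective concrete, here is a minimal sketch of evaluating (1.1) and of exhaustive GMEC search. The data layout (nested lists for self-energies and a dict of pair tables) is an illustrative assumption, not the data format used in this work, and the exhaustive search is of course only practical for tiny instances.

```python
import itertools

def total_energy(conf, e_backbone, e_self, e_pair):
    """Evaluate E_C of eq. (1.1): conf[i] is the chosen rotamer of residue i,
    e_self[i][r] is e_{i_r}, and e_pair[(i, j)][r][s] is e_{i_r j_s} for i < j."""
    n = len(conf)
    energy = e_backbone
    energy += sum(e_self[i][conf[i]] for i in range(n))
    energy += sum(e_pair[(i, j)][conf[i]][conf[j]]
                  for i in range(n) for j in range(i + 1, n))
    return energy

def gmec_brute_force(n_rotamers, e_backbone, e_self, e_pair):
    """Exhaustive search over all rotamer selections; exponential in the
    number of residues, shown only to pin down the optimization problem."""
    best = min(itertools.product(*(range(m) for m in n_rotamers)),
               key=lambda conf: total_energy(conf, e_backbone, e_self, e_pair))
    return best, total_energy(best, e_backbone, e_self, e_pair)
```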
1.1.1 NP-hardness of the GMEC problem

The GMEC problem is a hard optimization problem. We can easily see that it is
equivalent to the maximum capacity representatives (MCR) problem, an NP-hard optimization
problem. A Compendium of NP Optimization Problems by Crescenzi [5] describes the
MCR as follows:
Instance: Disjoint sets $S_1, \ldots, S_m$ and, for any $i \neq j$, $x \in S_i$, and $y \in S_j$, a nonnegative capacity $c(x, y)$.

Solution: A system of representatives $T$, i.e., a set $T$ such that, for any $i$, $|T \cap S_i| = 1$.

Measure: The capacity of the system of representatives, i.e., $\sum_{x, y \in T} c(x, y)$.
To reduce the MCR to the GMEC problem, we regard each set $S_i$ as a residue
and its elements as the residue's rotamers. Then, we take the negative of the
capacity between two elements in two different sets as the interaction energy between
the corresponding rotamers, and switch the maximization problem to a minimization problem. This is a GMEC problem with $e_{i_r}$ equal to 0 for all $i$ and $r$. A rigorous proof of
NP-hardness of the GMEC problem can be found in [38].
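The reduction is mechanical, as the following sketch shows (function and variable names are illustrative; `capacity` is assumed to be a nonnegative function on pairs of elements from different sets):

```python
def mcr_to_gmec(sets, capacity):
    """Turn an MCR instance into a GMEC instance: each set S_i becomes a
    residue, its elements become rotamers, all self-energies e_{i_r} are 0,
    and each pairwise capacity becomes a negated interaction energy, so that
    maximizing total capacity equals minimizing total energy."""
    n_rotamers = [len(s) for s in sets]
    e_self = [[0.0] * len(s) for s in sets]
    e_pair = {(i, j): [[-capacity(x, y) for y in sets[j]] for x in sets[i]]
              for i in range(len(sets)) for j in range(i + 1, len(sets))}
    return n_rotamers, e_self, e_pair
```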
As illustrated by the fact that the GMEC problem is an NP-hard optimization
problem, the general form of the GMEC problem is a challenging combinatorial problem that has many interesting aspects in both theory and practice.

1.1.2 Purpose and scope
The purpose of this study is two-fold. First, we aim to find a better method
to solve the GMEC problem efficiently. Despite the theoretical hardness,
we often find that many instances of the GMEC problem are easily solved by exact methods
such as Dead-End Elimination (DEE) combined with A* (DEE/A*) [8, 26]. However,
DEE's elimination criteria are not always powerful enough to significantly reduce a problem's complexity. Though there have been modifications and improvements over the
original elimination conditions [12, 33, 11], the method is still not quite a general
solution to the problem, especially in the application of sequence design. In this
work, we explore both analytical and approximate methods through computational
experiments. We compare their performance with one another to identify possible
alternatives to DEE/A*. There exists a comparative work by Voigt et al. [40], which
examines the performance of DEE against other well-known methods such as Monte
Carlo, genetic algorithms, and self-consistent mean-field, and which concludes that DEE is
the most feasible method. We intend to investigate new approaches that have not
been used or studied well, but that have good standing as appropriate techniques for the
GMEC problem.
Second, we want to understand the working mechanisms of the methods. There are methods that are not theoretically well understood, but show extraordinary performance in practice. On the other hand, some methods are effective only
for a specific type of instance. For example, the belief propagation and LP approaches
are not generally good solutions to the GMEC problem with random artificial energy
terms, but they are very accurate for GMEC instances with protein energy terms.
Our ultimate goal is to be able to explain why or when a specific method succeeds
or fails.
The scope of this work is mainly the computational aspects of the side-chain placement problem using rotamer libraries. Therefore, we leave issues such as protein
energy models and the characteristics of rotamer libraries out of our work.
1.2 Related work

1.2.1 Integer linear programming (ILP)
The polyhedral approach is a popular technique for solving hard combinatorial optimization problems. The main idea behind the technique is to iteratively strengthen
the LP formulation by adding violated valid inequalities.

Althaus et al. [1] presented an ILP approach for side-chain demangling in the rotamer
representation of side-chain conformations. Using an ILP formulation, they identified classes of facet-defining inequalities and devised a separation algorithm for a
subclass of inequalities. On average, the branch-and-cut algorithm was about five
times slower than their heuristic approach.
Eriksson et al. [9] also formulated the side-chain positioning problem as an ILP
problem. However, in their computational experiments, they found that the LP-relaxation of every test instance has an integral solution; therefore, integer
programming (IP) techniques were not necessary. They conjectured that the GMEC
problem always has integral solutions in LP-relaxation.
1.2.2 Probabilistic methods
A seminal work using self-consistent mean field theory was done by Koehl and
Delarue [21]. The method calculates the mean field energy as the sum of interaction
energies weighted by the conformational probabilities. The conformational probabilities are related to the mean field energy by the Boltzmann law. Iterative updates
of the probabilities and the mean field energy are performed until they converge. At
convergence, the rotamer with the highest probability at each residue is selected
as the conformation. The method is not exact, but has linear time complexity.
Yanover and Weiss [42] applied belief propagation (BP), generalized belief propagation (GBP), and the mean field method to finding minimum energy side-chain configurations, and compared the results with those from SCWRL, a protein-folding program.
Their energy function is approximate in that only local interactions between neighboring
residues are considered, which results in graphical models with incomplete structures.
The energies found by the methods are compared with one another, rather
than with optimal values from exact methods.
1.3 Our approaches
In this research, we use roughly three different approaches to solve the GMEC problem. In Chapter 2, we use an integer programming formulation and the branch-and-price
algorithm to obtain exact solutions. There are known ILP formulations that can be
directly applied to the GMEC problem. We review these formulations and compare
their effectiveness using CPLEX optimizers. At the same time, we do preliminary
work on applying branch-and-price to the GMEC problem. We review the algorithm,
suggest possible decomposition schemes, and assess the feasibility of the decomposition schemes through computational experiments.
As the second approach, we use nonlinear programming techniques in Chapter 3.
This work mainly relies on the continuation approach by Ng [31]. Ng's continuation
approach finds a suboptimal solution to a discrete optimization problem by solving a
sequence of related nonlinear programming problems. We implement the algorithm
and do a computational experiment to examine the performance of the method on
the GMEC problem.
In Chapter 4, we use probabilistic inference methods to infer the GMEC using
the energy terms translated into probability distributions. We implement probabilistic relaxation labeling, the max-product belief propagation (BP) algorithm, and the
MIME double loop algorithm, and test them on side-chain placement examples as well as
some sequence design cases. The performance of the three methods is also compared with that of the ILP method and DEE/A*.

The preliminary results suggest that among all the tested methods, probabilistic inference methods, especially the max-product BP algorithm, are the most effective and fastest. Though the max-product BP algorithm is an approximate method, its speed and
accuracy are comparable to those of DEE/A* in side-chain placement, and overall superior in sequence design. Probabilistic relaxation labeling shows slightly weaker performance than max-product BP, but the method works well up to medium-sized
cases. On the other hand, the ILP approach, the nonlinear programming approach,
and the MIME double loop algorithm turn out not to be competitive. Though these
three methods have different appearances, they are all based on mathematical formulations and optimization techniques. We find that such traditional approaches require
a good understanding of the methods and considerable experimental effort and expertise. However, we also present the results from these methods as a reference
for future research.
Chapter 2
Integer linear programming approach
In this chapter, we describe different forms of integer linear programming (ILP) formulations of the GMEC problem. We first present a classical ILP formulation by
Eriksson et al., and two similar formulations adopted from related combinatorial
optimization problems. Based on these classical ILP formulations, we review the
mathematical notion of the branch-and-price algorithm and consider three different
decomposition schemes for the column generation. Finally, some preliminary results
from implementations of the branch-and-price algorithm are presented.
2.1 ILP formulations

2.1.1 The first classical formulation by Eriksson, Zhou, and Elofsson
In the ILP formulation of the GMEC problem by Eriksson et al. [9], the self-energy
of each rotamer is evenly distributed to every interaction energy involving the rotamer. A residue's chosen rotamer interacts with every other residue's chosen rotamer. Therefore, the self-energies can be completely incorporated into the interaction energies without affecting the total energy by modifying the interaction energies
as follows:

$$e'_{i_r j_s} = e_{i_r j_s} + \frac{e_{i_r} + e_{j_s}}{n - 1} \qquad (2.1)$$

where $n$ is the number of residues in the protein. Then, the total energy of a given
conformation $C$ can be written as

$$E_C = \sum_i \sum_{j>i} e'_{i_r j_s}. \qquad (2.2)$$
Since the total energy now depends only on the interaction energies, $E_C$ can be
expressed through value assignments to binary variables that decide whether an
interaction between two residues in a certain conformation exists or not. We let $x_{i_r j_s}$
be a binary variable whose value is 1 if residue $i$ is in rotamer conformation $r$
and residue $j$ is in rotamer conformation $s$, and 0 otherwise. We also let $R_i$ denote
the set of all possible rotamers of residue $i$. Then, the total energy in conformation
$C$ is given by

$$E_C = \sum_i \sum_{r \in R_i} \sum_{j>i} \sum_{s \in R_j} e'_{i_r j_s} x_{i_r j_s}. \qquad (2.3)$$

On the other hand, there should exist exactly one interaction between any pair of
residues. Therefore, we have the following constraints on the values of $x_{i_r j_s}$:

$$\sum_{r \in R_i} \sum_{s \in R_j} x_{i_r j_s} = 1 \quad \text{for all } i \text{ and } j,\ i < j. \qquad (2.4)$$
Under the constraints of (2.4), more than one rotamer can be chosen for a residue. To
make the choice of rotamers consistent throughout all residues, we need the following
constraints on $x_{i_r j_s}$:

$$\sum_{q \in R_h} x_{h_q i_r} = \sum_{s \in R_j} x_{i_r j_s}, \qquad (2.5)$$

$$\sum_{p \in R_g} x_{g_p i_r} = \sum_{t \in R_k} x_{i_r k_t}, \qquad (2.6)$$

for all $g, h, i, j, k$ such that $g, h < i < j, k$, and for all $r \in R_i$.
Finally, by adding the integrality constraints

$$x_{i_r j_s} \in \{0, 1\}, \qquad (2.7)$$

we have an integer program that minimizes the cost function (2.3) under the
constraints (2.4) - (2.7). We denote (2.3) - (2.7) by F1.
Eriksson et al. devised this formulation to be used within the framework of integer
programming algorithms, but they found that the LP relaxation of this formulation
always has an integral solution, and hypothesized that every instance of the GMEC
problem can be solved by linear programming. However, in our experiments, we found
that some instances have fractional solutions to the LP relaxation of F1, which is not surprising since the GMEC problem is an NP-hard optimization problem. Nonetheless,
except for two cases, all solved LP relaxations of F1 had integral solutions.
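As a concrete illustration, the following sketch builds F1 with the open-source PuLP modeler and the CBC solver; this choice is an assumption made for illustration only (the experiments in this chapter used the CPLEX Callable Library from C), and the consistency constraints (2.5) - (2.6) are written in an equivalent chained form: for each residue $i$ and rotamer $r$, the sums of edge variables toward every other residue must agree.

```python
import pulp

def solve_f1(n_rotamers, e_self, e_pair):
    """Formulation F1 (eqs. 2.3-2.7): edge variables only, with the
    self-energies folded into the pair energies as in eq. (2.1)."""
    n = len(n_rotamers)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    prob = pulp.LpProblem("F1", pulp.LpMinimize)
    x = {(i, r, j, s): pulp.LpVariable(f"x_{i}_{r}_{j}_{s}", cat="Binary")
         for (i, j) in pairs
         for r in range(n_rotamers[i]) for s in range(n_rotamers[j])}
    # (2.3) objective, using the modified energies e' of (2.1)
    prob += pulp.lpSum(
        (e_pair[(i, j)][r][s] + (e_self[i][r] + e_self[j][s]) / (n - 1))
        * x[i, r, j, s] for (i, r, j, s) in x)
    # (2.4): exactly one interaction per residue pair
    for (i, j) in pairs:
        prob += pulp.lpSum(x[i, r, j, s]
                           for r in range(n_rotamers[i])
                           for s in range(n_rotamers[j])) == 1
    # (2.5)-(2.6), chained: the rotamer choice at i is consistent toward all j
    for i in range(n):
        for r in range(n_rotamers[i]):
            sums = [pulp.lpSum(x[j, p, i, r] for p in range(n_rotamers[j]))
                    if j < i else
                    pulp.lpSum(x[i, r, j, s] for s in range(n_rotamers[j]))
                    for j in range(n) if j != i]
            for a, b in zip(sums, sums[1:]):
                prob += a == b
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return pulp.value(prob.objective)
```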
2.1.2 The second classical formulation
Here, we present the second classical ILP formulation of the GMEC problem. This
is a minor modification of the formulation for the maximum edge-weighted clique
problem (MEWCP) by Hunting et al. [17]. The goal of the MEWCP is to find a
clique with the maximum sum of edge weights.
If we take a graph-theoretic approach to the GMEC problem, we can model each
rotamer $r$ of residue $i$ as a node $i_r$, and the interaction between two rotamers $r$ and
$s$ of residues $i$ and $j$, respectively, as an edge $(i_r, j_s)$. Then, the GMEC problem
reduces to finding the minimum edge-weighted maximum clique of the graph.

We introduce binary variables $x_{i_r}$ for the nodes, for all $i$ and $r \in R_i$. We also adopt
binary variables $y_{i_r j_s}$ for the edges $(i_r, j_s)$, for all $i$ and $j$, and for all $r \in R_i$ and
$s \in R_j$. Variable $x_{i_r}$ takes value 1 if node $i_r$ is included in the selected clique, and 0
otherwise. Variable $y_{i_r j_s}$ takes value 1 if edge $(i_r, j_s)$ is in the clique, and 0 otherwise.
We define $V$ and $E$ as follows:

$$V = \{i_r \mid \forall r \in R_i,\ \forall i\}, \qquad (2.8)$$

$$E = \{(i_r, j_s) \mid \forall i_r, j_s \in V,\ i \neq j\}. \qquad (2.9)$$
Then, the adapted ILP formulation of the MEWCP is given by

$$\min \sum_{i_r \in V} e_{i_r} x_{i_r} + \sum_{(i_r, j_s) \in E} e_{i_r j_s} y_{i_r j_s} \qquad (2.10)$$

$$y_{i_r j_s} - x_{i_r} \leq 0, \quad \forall (i_r, j_s) \in E, \qquad (2.11)$$

$$y_{i_r j_s} - x_{j_s} \leq 0, \quad \forall (i_r, j_s) \in E, \qquad (2.12)$$

$$x_{i_r} + x_{j_s} - y_{i_r j_s} \leq 1, \quad \forall (i_r, j_s) \in E, \qquad (2.13)$$

$$x_{i_r} \in \{0, 1\}, \quad \forall i_r \in V, \qquad (2.14)$$

$$y_{i_r j_s} \geq 0, \quad \forall (i_r, j_s) \in E. \qquad (2.15)$$

To complete the formulation of the GMEC problem, we add the following set of
constraints, which implies that exactly one rotamer should be chosen for each residue:

$$\sum_{r \in R_i} x_{i_r} = 1. \qquad (2.16)$$
We denote (2.10) - (2.16) by F2. When the CPLEX MIP solver was used to solve
a given GMEC instance in both F1 and F2, F1 was faster than F2. This is mainly
because F2 depends heavily on the integrality of the variables $x_{i_r}$. On the other hand,
F2 has an advantage when used in the branch-and-cut framework, since polyhedral
results and Lagrangean relaxation techniques are available for the MEWCP that can
be readily applied to F2 [17, 28].
2.1.3 The third classical formulation
Koster et al. [25] presented another formulation that captures characteristics of the two
previous formulations. Koster et al. studied the solution of the frequency assignment
problem (FAP) via tree-decomposition. There are many variants of the FAP, and, interestingly, the minimum interference frequency assignment problem (MI-FAP) studied by Koster et al. has exactly the same problem setting as the GMEC problem,
except that the node weights and edge weights of the MI-FAP are positive integers.
The formulation suggested by Koster et al. uses node variables and combines the
two styles of constraints from the previous formulations. If we transform the problem
setting of the FAP into that of the GMEC problem, we obtain the following ILP
formulation for the GMEC problem:
$$\min \sum_{i_r \in V} e_{i_r} x_{i_r} + \sum_{(i_r, j_s) \in E} e_{i_r j_s} y_{i_r j_s} \qquad (2.17)$$

$$\sum_{r \in R_i} x_{i_r} = 1, \quad \forall i, \qquad (2.18)$$

$$\sum_{s \in R_j} y_{i_r j_s} = x_{i_r}, \quad \forall j \neq i,\ \forall i,\ \forall r \in R_i, \qquad (2.19)$$

$$x_{i_r} \in \{0, 1\}, \quad \forall i_r \in V, \qquad (2.20)$$

$$y_{i_r j_s} \geq 0, \quad \forall (i_r, j_s) \in E. \qquad (2.21)$$
We denote (2.17) - (2.21) by F3. In F3, (2.18) restricts the number of selected
rotamers for each residue to one, as it does in F2. On the other hand, (2.19) enforces
that the selection of interactions be consistent with the selection of rotamers.
Koster et al. studied the polytope represented by the formulation and developed
facet-defining inequalities [22, 23, 24].
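F3 is the easiest of the three formulations to state in code. The sketch below again assumes PuLP/CBC purely for illustration; note that (2.19) is imposed in both directions of each residue pair, and the y variables stay continuous as in (2.21).

```python
import pulp

def solve_f3(n_rotamers, e_self, e_pair):
    """Formulation F3 (eqs. 2.17-2.21): binary node variables x_{i_r} and
    continuous edge variables y_{i_r j_s} tied to the nodes by (2.19)."""
    n = len(n_rotamers)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    prob = pulp.LpProblem("F3", pulp.LpMinimize)
    x = {(i, r): pulp.LpVariable(f"x_{i}_{r}", cat="Binary")
         for i in range(n) for r in range(n_rotamers[i])}
    y = {(i, r, j, s): pulp.LpVariable(f"y_{i}_{r}_{j}_{s}", lowBound=0)
         for (i, j) in pairs
         for r in range(n_rotamers[i]) for s in range(n_rotamers[j])}
    # (2.17): node energies plus edge energies
    prob += (pulp.lpSum(e_self[i][r] * x[i, r] for (i, r) in x)
             + pulp.lpSum(e_pair[(i, j)][r][s] * y[i, r, j, s]
                          for (i, r, j, s) in y))
    # (2.18): exactly one rotamer per residue
    for i in range(n):
        prob += pulp.lpSum(x[i, r] for r in range(n_rotamers[i])) == 1
    # (2.19): edge selection consistent with node selection, both directions
    for (i, j) in pairs:
        for r in range(n_rotamers[i]):
            prob += pulp.lpSum(y[i, r, j, s]
                               for s in range(n_rotamers[j])) == x[i, r]
        for s in range(n_rotamers[j]):
            prob += pulp.lpSum(y[i, r, j, s]
                               for r in range(n_rotamers[i])) == x[j, s]
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    conf = {i: next(r for r in range(n_rotamers[i]) if x[i, r].value() > 0.5)
            for i in range(n)}
    return conf, pulp.value(prob.objective)
```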
2.1.4 Computational experience
We performed an experimental comparison of the classical ILP formulations F1, F2,
and F3. The formulations were implemented in C using the CPLEX Callable Library. The test cases for 1bpi, 1amm, 1arb, and 256b were generated using a common
rotamer library (called LIB1 throughout this work), but a denser library (LIB2)
was used to generate the test cases for 2end. The choice of modeled proteins followed
the work by Voigt et al. [40]. We made several cases with different sequence lengths
from the same protein to control the complexity of the test cases.
Table 2.1: Comparison of three classical formulations (protein: PDB code, #res:
number of modeled residues, LP solution: I - integral, F - fractional, T_LP: time taken
for the CPLEX LP Optimizer, T_IP: time taken for the CPLEX MIP Optimizer, symbol -:
skipped, symbol *: failed).

protein  #res  LP solution      T_LP (sec)             T_IP (sec)
               F1   F2   F3     F1    F2      F3       F1    F3
1bpi     10    I    F    I      0     1       0        0     0
1bpi     20    I    F    I      0     8       1        0     0
1bpi     25    I    F    I      0     134     0        3     1
1bpi     46    I    F    I      29    6487    14       37    16
1amm     10    I    F    I      0     0       0        0     0
1amm     20    I    F    I      1     96      0        1     0
1amm     70    I    F    I      42    33989   29       45    48
1amm     80    I    F    I      97    *       69       99    102
1arb     10    I    I    I      0     0       0        0     0
1arb     20    I    F    I      0     0       0        0     0
1arb     30    I    F    I      0     15      0        1     0
1arb     78    I    F    I      24    10384   21       27    25
256b     30    I    -    I      3     -       2        6     3
256b     40    I    -    I      10    -       4        14    6
256b     50    I    -    I      26    -       12       15    36
256b     60    I    -    I      60    -       36       87    37
256b     70    F    -    F      116   -       98       156   112
2end     15    I    -    I      31    -       35       34    41
2end     25    I    -    I      281   -       214      343   240
The energy terms were calculated by the CHARMM22 script. All experiments were done on a Debian workstation with a 2.20 GHz Intel Xeon processor and 1 GByte of memory. We used both
the CPLEX LP Optimizer and the Mixed Integer Optimizer to compare solution times
between the formulations. The results are listed in Table 2.1.
The results from the LP Optimizer show that F2 is far less effective in terms of both
the lower bounds it provides and the time needed by the LP Optimizer. F2 obtained
only fractional solutions, whereas F1 and F3 solved most of the cases optimally. F1
and F3 show similar performance, though F3 is slightly more effective than F1 in
LP-relaxation. Since F2 showed inferior performance in LP-relaxation, we measured
the Mixed Integer Optimizer's running time only for F1 and F3. As shown in the last
two columns of Table 2.1, the Mixed Integer Optimizer mostly took more time for F1
than for F3, but by no more than a constant factor, as was the case with the LP Optimizer.

The CPLEX optimizers were able to solve the small- to medium-sized cases, but using the optimizers only with a fixed formulation turns out not to be very efficient.
On the other hand, DEE/A* is mostly faster than the CPLEX optimizers using the
classical formulations, which makes an ILP approach that relies entirely on general optimizers look somewhat futile. We also observe cases where both F1 and F3 obtain
fractional solutions in LP-relaxation, while solving large IPs usually takes significantly more time than solving the corresponding LP-relaxations. This suggests that we
should explore the development of a problem-specific IP solver that exploits
the special structure of the GMEC problem. As an effort in this direction, in
Section 2.2, we investigate the possible use of the branch-and-price algorithm based
on the F1 formulation.
2.2 Branch-and-price
Branch-and-price (also known as column generation IP) is a branch-and-bound
algorithm that performs the bounding operation using LP-relaxation [3]. In particular, the LP relaxation at each node of the branch-and-bound tree is solved using the
delayed column generation technique. It is usually based on an IP formulation that
introduces a huge number of variables but has a tighter convex hull in LP relaxation
than a simpler formulation with fewer variables. The key idea of the method is to split the given integer programming problem into the master problem (MP) and the
subproblem (SP) by Dantzig-Wolfe decomposition and then to exploit the problem-specific structure to solve the subproblem.
2.2.1 Formulation

In this section, we first review the general notion of the branch-and-price formulation
developed by Barnhart et al. [3] and devise an appropriate Dantzig-Wolfe decomposition
of F1.
All the previous classical ILP formulations can be captured in the following general
form of ILP:

$$\min c' x \qquad (2.22)$$

$$A x \leq b, \qquad (2.23)$$

$$x \in S,\ x \in \{0, 1\}^n, \qquad (2.24)$$

where $c \in \mathbb{R}^n$ is a constant vector and $A$ is an $m \times n$ real-valued matrix. The basic
idea of the Dantzig-Wolfe decomposition involves splitting the constraints into two
separate sets of constraints, (2.23) and (2.24), and representing the set

$$S^* = \{x \in S \mid x \in \{0, 1\}^n\} \qquad (2.25)$$

by its extreme points. $S^*$ is represented by a finite set of vectors. In particular, if $S$ is
bounded, $S^*$ is a finite set of points such that $S^* = \{y^1, \ldots, y^p\}$, where $y^k \in \mathbb{R}^n$,
$k = 1, \ldots, p$.

Now, if we are given $S^* = \{y^1, \ldots, y^p\}$, any point $y \in S^*$ can be represented as

$$y = \sum_{1 \leq k \leq p} \lambda_k y^k, \qquad (2.26)$$

subject to the convexity constraint

$$\sum_{1 \leq k \leq p} \lambda_k = 1, \qquad (2.27)$$

$$\lambda_k \in \{0, 1\}, \quad k = 1, \ldots, p. \qquad (2.28)$$

Let $c_k = c' y^k$ and $a_k = A y^k$. Then, we obtain the general form of the branch-and-price
formulation for the ILP given by (2.22) - (2.24) as follows:

$$\min \sum_{1 \leq k \leq p} c_k \lambda_k \qquad (2.29)$$

$$\sum_{1 \leq k \leq p} a_k \lambda_k \leq b, \qquad (2.30)$$

$$\sum_{1 \leq k \leq p} \lambda_k = 1, \qquad (2.31)$$

$$\lambda_k \in \{0, 1\}, \quad k = 1, \ldots, p. \qquad (2.32)$$
Figure 2-1: Problem setting
The fundamental difference between the classical ILP formulation (2.22) - (2.24)
and the branch-and-price formulation (2.29) - (2.32) is that $S^*$ is replaced by a finite
set of points. Moreover, any fractional solution to the LP relaxation of (2.22) -
(2.24) is a feasible solution of the LP relaxation of (2.29) - (2.32) if and only if the
solution can be represented by a convex combination of extreme points of conv($S^*$).
Therefore, it is easily inferred that the LP relaxation of (2.29) - (2.32) is at least as
tight as that of (2.22) - (2.24), and more effective in branch-and-bound.
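The schematic below shows how (2.29) - (2.32) is solved in LP-relaxed form by delayed column generation: a restricted master over the known columns is solved, its duals are handed to a pricing subproblem, and the loop stops when no column has negative reduced cost. PuLP/CBC and the `price_out` oracle interface are illustrative assumptions; the actual implementation described in Section 2.2.3 is built on SYMPHONY.

```python
import pulp

def solve_rmp_by_column_generation(initial_cols, b, price_out, max_iters=1000):
    """Delayed column generation for the LP relaxation of (2.29)-(2.32).
    A column is a pair (c_k, a_k); price_out(p, q) must return a column whose
    reduced cost c_k - p'a_k - q is negative, or None if none exists."""
    cols = list(initial_cols)
    for _ in range(max_iters):
        master = pulp.LpProblem("RMP", pulp.LpMinimize)
        lam = [pulp.LpVariable(f"lam_{k}", lowBound=0) for k in range(len(cols))]
        master += pulp.lpSum(c * l for (c, a), l in zip(cols, lam))
        for i in range(len(b)):                          # rows (2.30)
            master += (pulp.lpSum(a[i] * l for (c, a), l in zip(cols, lam))
                       <= b[i]), f"row_{i}"
        master += pulp.lpSum(lam) == 1, "convexity"      # row (2.31)
        master.solve(pulp.PULP_CBC_CMD(msg=False))
        p = [master.constraints[f"row_{i}"].pi for i in range(len(b))]
        q = master.constraints["convexity"].pi
        new_col = price_out(p, q)
        if new_col is None:          # dual feasible: the LP bound is proven
            return pulp.value(master.objective)
        cols.append(new_col)         # re-solve with the enlarged column pool
    raise RuntimeError("column generation did not converge")
```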
Now we recognize that the success of the branch-and-price algorithm will, to a great
extent, depend on the choice of base formulation used for (2.22) - (2.24) and on how we
decompose it. In this work, we use F1 as the base ILP formulation, for it usually has
the smallest number of variables and constraints among F1, F2, and F3, and also has
good lower bounds. Designing the decomposition scheme is equivalent to defining $S^*$.
Naturally, the constraints corresponding to (2.23) will be the complement of the constraints
of $S^*$ within (2.4) - (2.7). We consider three different definitions of $S^*$ below.
Set of edge sets
Figure 2-1 is a graphical illustration of the problem setting, and Figure 2-2 shows
a feasible solution when all constraints are used. In fact, any maximum clique of
the graph is a feasible solution. In this decomposition scheme, we consider only the
constraint that one interaction is chosen between every pair of residues. Formally, we
have

$$S^* = \{Q \mid Q \text{ is a set of edges } (i_r, j_s),\ r \in R_i,\ s \in R_j, \text{ one for each pair of residues } i, j\}. \qquad (2.33)$$

We denote this definition of $S^*$ by $S_1$.

Figure 2-2: A feasible solution to the GMEC problem

Figure 2-3: A set of minimum weight edges

The subproblem is finding the smallest-weight edge between each pair of
residues. A solution to the subproblem will generally look like Figure 2-3. The size
of $S_1$ is exponentially larger than the original solution space, but each subproblem can be solved
within $O(n)$ time.
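A minimal sketch of this subproblem, assuming the reduced costs are supplied as per-pair tables (an illustrative layout): the total reduced cost of the returned edge set, compared against the convexity dual, decides whether the column enters the master.

```python
def price_s1(reduced_cost, n_rotamers):
    """S_1 pricing: independently pick the minimum reduced-cost edge for
    every residue pair.  reduced_cost[(i, j)][r][s] is the reduced cost of
    edge (i_r, j_s) for i < j.  Returns the edge set and its total cost."""
    n = len(n_rotamers)
    choice, total = {}, 0.0
    for i in range(n):
        for j in range(i + 1, n):
            r, s = min(((r, s) for r in range(n_rotamers[i])
                        for s in range(n_rotamers[j])),
                       key=lambda rs: reduced_cost[(i, j)][rs[0]][rs[1]])
            choice[(i, j)] = (r, s)
            total += reduced_cost[(i, j)][r][s]
    return choice, total
```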
Figure 2-4: Path starting from rotamer 2 of residue 1 and arriving at the same rotamer
Set of paths
We can define $S^*$ to be the set of paths that start from a node $i_r$ and arrive at the
same node via a sequence of edges connecting each pair of residues, as illustrated by
Figure 2-4. A column generated this way is a feasible solution to the GMEC problem
only if the number of visited rotamers at each residue is one. We denote $S^*$ defined
this way as $S_2$.

The size of $S_2$ is approximately the square root of the size of $S_1$. In this case, the
subproblem becomes the shortest path problem if we replicate a residue together with
its rotamers and add them to the path every time the residue is visited. Figure 2-5 shows how we can view the graph to apply the shortest path algorithm. The
subproblem is less simple than in the case $S^* = S_1$, but we can efficiently generate
multiple columns at a time if we use the k-shortest path algorithm.
The inherent hardness of the combinatorial optimization problem is independent
of the base ILP formulation or the decomposition scheme used in the branch-and-price algorithm. Therefore, the decomposition can be regarded as the design of a
balance point between the size of $S^*$ and the hardness of the subproblem.
Figure 2-5: An elongated graph to calculate shortest paths

The size of $S_2$ is huge because more than one rotamer can be chosen from every
residue on the path but the first. To make the column generation more efficient, we
can try fixing several residues' rotamers on the path, as we do for the first residue.
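As a sketch of the $S_2$ subproblem, the following computes one shortest closed path over the elongated graph by dynamic programming, given a fixed residue tour in which every pair of residues appears consecutively exactly once (the construction whose existence motivates the prime-number requirement discussed in Section 2.2.3). The data layout is illustrative, and the actual implementation used the Recursive Enumeration Algorithm to obtain k shortest paths rather than just one.

```python
def shortest_closed_path(reduced_cost, n_rotamers, tour, start_rot):
    """Shortest path through the elongated graph of Figure 2-5, starting and
    ending at rotamer start_rot of residue tour[0].  tour is a sequence of
    residues (with repetitions) whose consecutive elements cover every
    residue pair once; reduced_cost[(i, j)][r][s] is keyed with i < j."""
    def cost(i, r, j, s):
        return (reduced_cost[(i, j)][r][s] if i < j
                else reduced_cost[(j, i)][s][r])

    dist = {start_rot: 0.0}           # best cost to each rotamer of a layer
    for a, b in zip(tour, tour[1:]):
        dist = {s: min(d + cost(a, r, b, s) for r, d in dist.items())
                for s in range(n_rotamers[b])}
    # close the path back to the fixed start rotamer
    first, last = tour[0], tour[-1]
    return min(d + cost(last, r, first, start_rot) for r, d in dist.items())
```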
Set of cliques
Based on the idea of partitioning a feasible solution, or a maximum clique, into small
cliques, the third decomposition we consider is to define $S_3$ as a set of cliques. Figure 2-6 illustrates this method. A maximum clique consisting of five nodes is fragmented
into two three-node cliques, composed of nodes $1_2, 2_2, 3_3$ and nodes $1_2, 3_3, 5_3$, respectively. To complete the maximum clique, four edges represented by dashed lines are
added (an edge can also be regarded as a clique of two nodes).

The branch-and-price formulation for this decomposition differs from (2.29) -
(2.32) in that (2.31) and (2.32) are not necessary. All the assembly of small cliques
is done by (2.30), and it is possible to obtain an integral solution even when (2.29) -
(2.32) has a fractional optimal solution in LP-relaxation. Therefore, the branching
is also determined by examining the integrality of the original variables $x$ of (2.22) -
(2.24).
The size of $S_3$ is obviously smaller than those of $S_1$ and $S_2$. If we let $n$ be the
number of residues and $m$ be the maximum number of rotamers per residue, the
size of $S_2$ is $O(m^{n(n-1)/2})$; in comparison, the size of $S_3$ is $O((m + 1)^n)$.
The subproblem turns out to be the minimum edge-weighted clique problem
(MEWCP), which is an NP-hard optimization problem. Macambira and de Souza [28]
have investigated the MEWCP with polyhedral techniques, and there is also a Lagrangian relaxation approach to the problem by Hunting et al. [17]. However, it is
more efficient to solve the subproblem by heuristics such as probabilistic relaxation
labeling [32, 15, 16], which will be examined in Chapter 4, and to use the analytical techniques only if the heuristics fail to find any new column.

Figure 2-6: Fragmentation of the maximum clique
On the other hand, there exists a pitfall in the application of this decomposition: by
generating edges as columns, we may end up with the same basis as for the LP relaxation of the base formulation. To obtain a better lower bound, we need to find a
basis that better approximates the convex hull of integral feasible points than that of
the LP-relaxed base formulation. However, by generating as columns the edges necessary
to connect the small cliques, the column generation will possibly converge only after
exploring the edge columns. One easily conceivable remedy is to generate only cliques
with at least three nodes, but it is a question whether this is a better decomposition
scheme in practice than, say, $S^* = S_2$.
The Held-Karp heuristic and the branch-and-price algorithm
Since we do not have separable structures in the GMEC problem, it seems more
appropriate to first relax the constraints to generate columns and then to sieve the columns
with the rest of the constraints by LP. Defining $S^*$ to be $S_2$ is closer to this scheme
than defining it to be $S_3$. We show that this scheme is, in fact, compatible with
the idea behind the Held-Karp heuristic, and suggest that we should take a similar
approach to develop an efficient branch-and-price algorithm for the GMEC problem.
Let $G$ be a graph with node set $V$ and edge set $E$, and let $a_{ij}$ be the weight on
edge $(i, j)$. Then, the traveling salesman problem (TSP) is often represented as

$$\min \sum_{i \in V} \sum_{j \in V, j \neq i} a_{ij} x_{ij} \qquad (2.34)$$

$$\sum_{j \in V, j \neq i} x_{ij} = 1, \quad \forall i \in V, \qquad (2.35)$$

$$\sum_{i \in V, i \neq j} x_{ij} = 1, \quad \forall j \in V, \qquad (2.36)$$

$$\sum_{i \in S, j \notin S} (x_{ij} + x_{ji}) \geq 2, \quad \forall S \subset V,\ S \neq \emptyset,\ S \neq V, \qquad (2.37)$$

$$x_{ij} \in \{0, 1\}, \quad \forall (i, j) \in E. \qquad (2.38)$$

(2.35) and (2.36) are conservation-of-flow constraints. (2.37) is a subtour elimination
constraint [4]. In the Held-Karp heuristic, (2.37) and (2.38) are replaced by

$$(V, \{(i, j) \mid x_{ij} = 1\}) \text{ is a 1-tree}. \qquad (2.39)$$

If we assign the Lagrangian multipliers $u$ and $v$, the Lagrangian function is

$$L(x, u, v) = \sum_{i, j \in V,\ i \neq j} (a_{ij} + u_i + v_j) x_{ij} - \sum_{i \in V} u_i - \sum_{j \in V} v_j. \qquad (2.40)$$

Finally, if we let

$$S^* = \{x \mid x_{ij} \in \{0, 1\} \text{ such that } (V, \{(i, j) \mid x_{ij} = 1\}) \text{ is a 1-tree}\}, \qquad (2.41)$$

the value produced by the Held-Karp heuristic is essentially the optimal dual value
given by

$$q^* = \sup_{u, v} \inf_{x \in S^*} L(x, u, v). \qquad (2.42)$$
When the cost and the inequality constraints are linear, the lower bounds obtained
by Lagrangian relaxation and by relaxation of the integer constraints are equal [4]. Therefore,
if $S^*$ is a finite set of points, say $S^* = \{y^1, \ldots, y^p\}$, $y^k \in \mathbb{R}^{|E|}$, $k = 1, \ldots, p$, the
optimal value of the following LP is also $q^*$:

$$\min \sum_{i \in V} \sum_{j \in V, j \neq i} a_{ij} \sum_{1 \leq k \leq p} \lambda_k y^k_{ij} \qquad (2.43)$$

$$\sum_{j \in V, j \neq i} \sum_{1 \leq k \leq p} \lambda_k y^k_{ij} = 1, \quad \forall i \in V, \qquad (2.44)$$

$$\sum_{i \in V, i \neq j} \sum_{1 \leq k \leq p} \lambda_k y^k_{ij} = 1, \quad \forall j \in V, \qquad (2.45)$$

$$\sum_{1 \leq k \leq p} \lambda_k = 1, \quad y^k \in S^*, \quad 0 \leq \lambda_k \leq 1, \quad k = 1, \ldots, p. \qquad (2.46)$$

Note that (2.43) - (2.46) form an LP relaxation of the branch-and-price formulation
of the original ILP, or simply a Dantzig-Wolfe decomposition. Thus, we confirm that
the Held-Karp heuristic is essentially equivalent to the branch-and-price algorithm
applied to the TSP.
2.2.2 Branching and the subproblem

For all decomposition schemes, we use branching on the original variables. In other words,
when the column generation converges for node $U$ of the branch-and-bound tree and
the LP-relaxation of the restricted master problem (RMP) at $U$ is found to have
a fractional solution, we branch on a variable $x_t$ of (2.22) - (2.24) rather than on a
variable $\lambda_k$ of (2.29) - (2.32). Formally, if we have the RMP for node $U$ in the
form of (2.29) - (2.32), $U$ is branched on $x_t$ into two nodes $U_0$ and $U_1$, where the RMP
at $U_\iota$, $\iota \in \{0, 1\}$, is given by

$$\min \sum_{k:\ 1 \leq k \leq p,\ y^k_t = \iota} c_k \lambda_k \qquad (2.47)$$

$$\sum_{k:\ 1 \leq k \leq p,\ y^k_t = \iota} a_k \lambda_k \leq b, \qquad (2.48)$$

$$\sum_{k:\ 1 \leq k \leq p,\ y^k_t = \iota} \lambda_k = 1, \qquad (2.49)$$

$$\lambda_k \in \{0, 1\}, \quad k = 1, \ldots, p. \qquad (2.50)$$

In our implementation, the branching variable $x_t$ is an edge variable, for we use (2.3)
- (2.7) as the base formulation. The branching variable $x_t$ is determined by calculating
the value of $x$ from the fractional solution $\lambda$ and taking the variable whose value is closest
to 1. Ties are broken by the order of the indices.
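A sketch of this branching rule, under the illustrative assumption that each column is stored as a sparse 0/1 map over the original edge variables:

```python
def pick_branching_variable(cols, lam_values, tol=1e-6):
    """Recover x from a fractional master solution via eq. (2.26) and return
    the fractional original variable whose value is closest to 1; ties are
    broken by index order.  cols[k] maps edge indices to the 0/1 entries of
    column y^k, and lam_values[k] is the LP value of lambda_k."""
    x = {}
    for col, lam in zip(cols, lam_values):
        for edge, val in col.items():
            x[edge] = x.get(edge, 0.0) + lam * val
    fractional = [(edge, v) for edge, v in sorted(x.items())
                  if tol < v < 1.0 - tol]
    if not fractional:
        return None                  # solution already integral
    return max(fractional, key=lambda ev: ev[1])[0]
```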
On the other hand, if we let the $m$-dimensional vector $p$ be the dual variable for
the master constraints (2.48) and the scalar $q$ be the dual variable for the convexity
constraint (2.49), the pricing subproblem for node $U_\iota$ is given by

$$\min\ (c - p' A)' y - q \qquad (2.51)$$

$$y \in S^*,\ y_t = \iota. \qquad (2.52)$$

Regarding $p$ and $q$ of (2.51) as constants, the reduced cost vector $d = (c - p' A)'$ represents edge weights for the graph of the GMEC problem. Therefore, the subproblem
becomes finding an element of $S^*$ with $y_t = \iota$ that has the minimum sum of edge
weights when calculated with $d$.
2.2.3 Implementation

To find out empirically how efficient the decomposition schemes of Section 2.2.1
are, we implemented the branch-and-price algorithm for the GMEC problem using
SYMPHONY [35]. SYMPHONY is a generic framework for branch, cut, and price
that handles branching control, subproblem tree management, LP solver interfacing, cut
pool management, etc. The framework, however, does not have the complete functionality necessary for the branch-and-price algorithm. Thus, we had to augment
the framework to allow column pool control by a non-enumerative scheme, and
branching on the original variables. This required more than trivial changes, especially
in the branching control and the column generation control. As a result, the functionality was only roughly implemented, and we could not give enough consideration to
making the implementation efficient in memory or time. We tested the implementation only on small cases, to get an idea of whether column generation is a viable
option for the GMEC problem.
Branch-and-price requires a number of feasible solutions to the base formulation, both to use as LP bases for the initial RMP and to set the initial upper
bound. We obtained a set of feasible solutions using an implementation of probabilistic relaxation labeling (RL). For a detailed description and the theory of probabilistic
relaxation labeling, see [32, 15, 16]. Chapter 4 also briefly describes the method and
the implementation. We started with a random label distribution and ran 200 iterations of RL to find a maximum of 20 feasible solutions. This scheme was successful in
the small cases we tested the branch-and-price implementations on, in that it gave
the optimal solution as the initial upper bound and the rest of the branch-and-price
effort was spent on confirming optimality. Using RL to obtain the initial feasible
solutions may have reduced the total number of LP-relaxations solved, or the CPU
time, to some extent, but the general trend of performance as the complexity grows
will not be affected much. In fact, most implementations of either the branch-and-cut
or the branch-and-price algorithm opt to use the most effective heuristic for the problem
to find initial feasible solutions and an upper bound.
Another issue in the implementation is the use of an artificial variable to prevent
infeasibility of the child RMP in branching. When the parent RMP reaches dual
feasibility with a fractional optimal solution and we decide to branch, its direct children
can become infeasible because a variable of the parent RMP will be fixed to 0 or 1. Rather
than heuristically finding new feasible solutions that satisfy the fixed values of
the branching variables, we can add an artificial variable with a huge coefficient to
make the child RMP always feasible.
We add some implementation details specific to each decomposition scheme below.
Set of edge sets
We implemented only the basic idea of $S^* = S_1$, so that, after solving the LP-relaxation
of the RMP, the program collects, between each pair of residues, the edge that has the
minimum reduced cost. To do this more efficiently, we could also
sort the edges between each pair of residues and take $k$ of them, generating $k \cdot \binom{n}{2}$
candidate columns at each column generation, where $n$ is the total number of residues.
We denote the implementation by BRP1.
Set of paths
For the implementation of $S^* = S_2$, we had to assume that the number of residues
is a prime number. This is because we can make a tour of a complete graph's edges
of the required kind only if the complete graph has a prime number of nodes. For more general cases
that do not have a prime number of nodes, we can augment the graph by adding
a proper number of nodes. In our experiment, we only used test cases that have a
prime number of residues.

To solve the subproblem when $S^* = S_2$, we used the Recursive Enumeration
Algorithm (REA), an algorithm for the k shortest paths problem [18]. The residue
that has the minimum number of rotamers was chosen as the starting residue of
the paths. For each rotamer of the starting residue, we calculated 20 shortest paths
and priced the results, adding the paths with negative reduced cost as new columns.
We denote the implementation by BRP2.
Set of cliques
To avoid obtaining the same lower bound as from the base formulation, we restrict
the column generation to cliques with four nodes. Therefore, $S^*$ is given by

$$S^* = \{Q \mid Q \text{ is a clique consisting of } i_r, j_s, k_t, l_u \text{ such that } r \in R_i,\ s \in R_j,\ t \in R_k,\ u \in R_l, \text{ and } i \neq j \neq k \neq l\}. \qquad (2.53)$$

We took every possible quadruple of residues and used RL on them to solve the
MEWCP approximately. For example, if we have a total of 11 residues in the graph, $\binom{11}{4}$
different subgraphs, or instances of the MEWCP, can be made from it. By setting the
initial label distribution randomly, we ran RL four times on each subgraph and
priced out the resulting cliques to generate new columns. The test cases were restricted
to those that have more than four nodes.
We denote the implementation by BRP3.
2.2.4 Computational results and discussions

We tested the implementations of the three decomposition schemes on small cases of
side-chain placement. We wanted to see the difference in effectiveness of the three decomposition schemes by examining the lower bounds they obtain, the number of LP-relaxations, the number of generated columns, and the running time. We are also
interested in the number of branchings, and in the latency between the point where the optimal solution value is attained and the point where the column generation actually converges. The test
cases were generated from small segments of six different proteins. LIB1 was used to
generate energy files for 1bpi, 1amm, 1arb, and 256b. Test cases for 2end and 1ilq
were generated using the denser library LIB2. All program code was written in C and
compiled with the GNU compiler. The experiments were performed on a Debian workstation
with a 2.20 GHz Intel Xeon processor and 1 GByte of memory.
The results for BRP1 are summarized in Table 2.2. For reference, we included in
the table the total number of feasible columns, $|S^*|$, whenever it was possible to calculate,
so that the number of explored columns can be compared to it. For most cases, however, we had overflow during the calculation of $|S^*|$. The number of columns in the
table includes one artificial variable for the feasibility guarantee. We stopped running the
branch-and-price programs if they did not find a solution within two hours.

All test cases that were solved were fathomed in the root node of the branch-and-bound tree. This is not unexpected, since the base formulation also provides the
optimal integral solution when its LP-relaxation is solved. Unfortunately, we could
not find any small protein example that has a fractional solution for the LP-relaxed
base formulation. Since our implementations of the decomposition schemes were too
slow to solve either of the two cases that have fractional solutions, we were not able
to compare the effectiveness of the base formulation and the branch-and-price formulations on protein examples.

In all solved test cases but one, the optimal values were found from the first LP-relaxation. This is because RL, used as a heuristic to find the initial feasible solutions,
actually finds the optimal solution, and these solutions are used as LP bases of the RMP. However, since RL uses a random initial label distribution, this is not always the case, as
we can confirm in the third case of 1amm.
Another point to note, given the early discovery of the optimal value, is the subsequent number of column generations until the column generation actually converges. Confirming optimality by reaching dual feasibility is a different matter from finding the optimal value. In fact, one of the well-known issues in column generation is
the tailing-off effect, which refers to the tendency of the improvement in the objective
value to slow down as the column generation progresses, often resulting in
as many iterations to prove optimality as were needed to come across the optimal value.
There are techniques to deal with the tailing-off effect, such as early termination with
a guarantee on the LP-relaxation lower bound [39], but we did not go into them.
Table 2.3 and Table 2.4 list the results from BRP2 and BRP3, respectively. Comparing the results from BRP2 with those from BRP1, BRP2 obviously performs better
than BRP1. This seems to be mainly due to the efficient column generation using the k
shortest paths algorithm and the smaller size of $S^*$. It is interesting that BRP1 manages to work comparably to BRP2 for small cases, considering that $|S_1|$ is huge even for
the small cases. Note that some cases that could not be solved within two hours by
BRP1 were solved by BRP2. Looking at the results from BRP2 and BRP3, BRP2 shows
slightly better performance in CPU time and in the number of generated columns for
the cases solved by both BRP2 and BRP3, but BRP3 was able to find optimal solutions
for some cases where BRP2 failed to do so within two hours. We suspect that the size of $S^*$, rather than the column generation method, is the main factor determining the rate of convergence.

Since the base formulation has the smallest $S^*$, its convergence will be faster than
that of any other branch-and-price formulation; yet the purpose of adopting the branch-and-price algorithm is to obtain integral solutions more efficiently by using a tighter LP-relaxation. Unfortunately, we could not validate this concept by testing the branch-and-price implementations on protein energy examples. Instead, we performed a few
additional tests of BRP2 and BRP3 with artificial energy examples whose number of
rotamers at each position is the same as in the previous test cases, but whose energy
terms are replaced with random numbers between 0 and 1. We ran the programs on
each case for no more than 20 minutes and stopped them unless they converged. The
results are summarized in Table 2.5.
Table 2.2: Test results for BRP1 (IS*I: total number of feasible columns, #nodes:
number of explored branch-and-bound nodes, #LP: number of solved LPs until convergence, #cols: number of added columns until convergence, (frac): #cols to IS*I
ratio, #LPOP: number of solved LPs until reaching the optimal value, #colsop: number of added columns until reaching the optimal value, TLP: time taken for solving
LPs in seconds, T,b: time taken for solving subproblems in seconds, symbol - : IS*I
calculation overflow, symbol * : stopped while running).
protein
lamm
lbpi
256B
larb
2end
lilq
#res
3
5
7
11
[
I #nodes I #LP
S*I
2.6x10
-
4
1
1
1
1
2
1
1403
228
13
-
*
*
3
5
7
11
5.7x10 5
-
1
1
1
1
1
1
20
220
13
-
*
*
3
5
576
1.7x10 6
1
4
*
1
27
1
#cols (frac)
5 (0.02%)
7(0.00%)
1423 (0.00%)
240 (0.00%)
#LPOp
1
1
1381
1
6
4
25
225
I #colsop
TLP
[
Tsub
4
7
1401
13
0
0
0
1
0
0
0
1
*
*
*
*
*
(0.00%)
(0.00%)
(0.00%)
(0.00%)
1
1
1
1
6
4
6
6
0
0
0
1
0
0
0
2
*
1
20
*
*
1
1
*
1
1
1
6
7
*
5
12
15
0
0
*
0
0
0
0
0
*
0
0
0
*
*
*
*
1
4
0
0
7
-
3
5
7
64
-
1
1
*
1
1
1
11
-
*
*
3
-
1
1
5
-
*
*
*
1
11
*
*
7
-
1
1603
1608(0.00%)
1
6
33
2
11
-
*
*
*
1
13
*
*
6 (1.04%)
10 (0.00%)
*
5 (7.81%)
38 (0.00%)
15 (0.00%)
*
4 (0.00%)
37
Table 2.3: Test results for BRP2 (columns as in Table 2.2).
[Table body not reproduced: per-case results for 1amm, 1bpi, 256b, 1arb, 2end, and 1ilq at 3 to 13 residues.]
Table 2.4: Test results for BRP3 (columns as in Table 2.2).
[Table body not reproduced: per-case results for 1amm, 1bpi, 256b, 1arb, 2end, and 1ilq at 5 to 13 residues.]
Table 2.5: Test results with random energy examples (#nodes: number of explored branch-and-bound nodes, LB: lower bound from LP-relaxation, T: time taken to solve the instance exactly, symbol * : stopped while running).
[Table body not reproduced: for 1amm, 256b, and 1arb at 3 to 11 residues, the optimal value and, for the base formulation F1 solved by CPLEX, BRP2, and BRP3, the lower bound LB, #nodes, and T.]
We only listed the cases where the base formulation has fractional solutions in the LP-relaxation. All of BRP2, BRP3, and the CPLEX optimizers took more time to solve the random energy examples than the protein energy examples. The results illustrate the value of the branch-and-price formulations to some extent: all cases of Table 2.5 have weaker lower bounds with the base formulation than with BRP2 or BRP3. BRP3 mostly found optimal solutions at the initial node, whereas BRP2 had overall weaker lower bounds than BRP3 or failed to attain convergence of the column generation. From our preliminary experiment, BRP3 turns out to be more efficient than BRP2. We think the performance of BRP2 can be improved by changing the size of the cliques it builds or the residue combination rule.
2.3 Summary
In this chapter, we examined the application of integer programming techniques to the GMEC problem. We reviewed three known classical ILP formulations that capture the structure of the GMEC problem. We compared the three formulations by letting CPLEX optimizers use each formulation on protein energy examples. From this experiment, we found that Koster et al.'s formulation is more efficient than the others, yet simply choosing one of the formulations to use with a general solver becomes more and more inefficient as the problem size grows, as shown in Table 4.1.
Motivated to find an efficient ILP method that can exploit the problem-specific structure, we investigated the use of the branch-and-price algorithm, an exact method for large-scale hard combinatorial problems. We reviewed the basics of the method and developed three decomposition schemes for Eriksson et al.'s ILP formulation. We implemented the methods using SYMPHONY, a generic branch-cut-and-price framework, and tested them on protein energy examples. The implementations were able to handle small cases and found optimal solutions at the root node of the tree when the column generation converged, but the results could illustrate no more than the convergence properties of the different decomposition schemes, since the LP relaxation of Eriksson et al.'s formulation also yielded integral solutions for the same cases.
To validate the use of the decomposition schemes, we also performed tests with random energy examples, where the branch-and-price formulations often had tighter lower bounds than the base formulation. Though we were not able to obtain practical performance with our implementations, we believe that a more thorough examination of the problem will reveal a better way to apply the branch-and-price algorithm or similar methods, and will contribute to understanding the combinatorial structure of the GMEC problem.
Chapter 3
Nonlinear programming approach
In this chapter, we explore a nonlinear programming approach to the GMEC problem. The LP- or ILP-based methods of Chapter 2 have the advantage that they can exploit fast and accurate LP solvers. However, LP or ILP formulations often involve a large number of variables and constraints. Considering that the necessary number of variables is roughly at least the square of the total number of rotamers, an LP- or ILP-based method can be of little practical use when the problem grows very large, unless enormous computing power is available. As a natural extension of this concern, we turn our interest to the rich theory and techniques of nonlinear programming. We use a quadratic formulation of the GMEC problem that contains only as many variables as the total number of rotamers and whose number of constraints equals the number of residues. Since the continuous version of the formulation is not generally convex, we expect that obtaining the optimal solution will be hard, but we aim to evaluate nonlinear programming as an efficient candidate method for computing sub-optimal solutions to the GMEC problem.
There have been several attempts to apply nonlinear programming approaches to discrete optimization problems [20, 41, 37]. However, some of them are effective only for special classes of problems, or must be used in conjunction with other heuristics and combinatorial optimization frameworks. In this work, we mainly rely on Ng's framework for nonlinear nonconvex discrete optimization [31], which is simple and purely based on nonlinear programming techniques.
The rest of this chapter is organized as follows. Section 3.1 reviews Ng's smoothing algorithm based on the continuation approach. Section 3.2 presents tailored versions of the preconditioned conjugate gradient method and the adaptive linesearch method. Section 3.3 describes the application of the algorithm to the GMEC problem and discusses the computational results. Finally, Section 3.4 concludes the chapter.
3.1 Ng's continuation approach
In this section, we present and review Ng's work [31].
3.1.1 Continuation method
The continuation method solves a system of nonlinear equations by solving a sequence of simpler equations that eventually converges to the original problem. Suppose we need to solve a system of equations F(x) = 0, where x ∈ Rⁿ and F : Rⁿ → Rⁿ. For some G : Rⁿ → Rⁿ, a new function H : Rⁿ × [0, 1] → Rⁿ can be defined by

    H(x, λ) = λF(x) + (1 − λ)G(x).    (3.1)

If we can easily find a root of H(x, 0) = G(x) = 0, or an x⁰ such that H(x⁰, λ⁰) = 0 for some λ⁰ < 1, we can incrementally approximate the solution of the original equations H(x, 1) = F(x) = 0 by starting from x⁰ and solving H(x, λ) = 0 as we increase λ from λ⁰ to 1. Solving H(x, λ) = 0 is more advantageous than directly solving F(x) = 0 because iterative methods like Newton's method behave well for H(x, λ^{k+1}) = 0 when the initial point is a solution of H(x, λ^k) and λ^{k+1} is sufficiently close to λ^k.
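To make the idea concrete, the following sketch (an illustration we add here, not code from Ng's work; the cubic F, the auxiliary G, and the ten-step λ schedule are arbitrary choices) traces a continuation path, warm-starting each Newton solve at the previous root:

    import numpy as np

    def newton(h, dh, x0, iters=20):
        # Plain Newton iteration for a scalar equation h(x) = 0.
        x = x0
        for _ in range(iters):
            x = x - h(x) / dh(x)
        return x

    # Target system F(x) = 0 (hard from a poor start) and an easy
    # auxiliary G(x) = 0 whose root is known.
    F  = lambda x: x**3 - 2.0 * x - 5.0
    dF = lambda x: 3.0 * x**2 - 2.0
    G  = lambda x: x - 1.0             # root at x = 1
    dG = lambda x: 1.0

    x = 1.0                            # root of H(x, 0) = G(x)
    for lam in np.linspace(0.1, 1.0, 10):
        # H(x, lam) = lam*F(x) + (1 - lam)*G(x), as in (3.1).
        H  = lambda x, l=lam: l * F(x) + (1.0 - l) * G(x)
        dH = lambda x, l=lam: l * dF(x) + (1.0 - l) * dG(x)
        x = newton(H, dH, x)

    print(x)   # close to the root of F, about 2.0946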
3.1.2 Smoothing algorithm
There have been active studies on the optimization of convex functions over a convex region; both the theory and the practice are well-established in this area. However, minimization of a nonconvex function is still very difficult due to the existence of local minima. The smoothing method modifies the original problem by adding a strictly convex function to eliminate poor local minima while trying to preserve significant ones. As in the continuation method, the smoothing method iteratively minimizes the modified function as it incrementally reduces the proportion of the convex term.
Suppose, for a convex region X ⊂ Rⁿ, we want to minimize f(x) : Rⁿ → R, and Φ(x) : Rⁿ → R is strictly convex over X. Then we define F(x, μ) by

    F(x, μ) = f(x) + μΦ(x).    (3.2)

If we let λ = 1/(1 + μ) and H(x, λ) = λF(x, (1 − λ)/λ), we can easily observe that it is of the same form as (3.1). F(x, μ) is strictly convex over X if ∇²Φ(x) is sufficiently positive definite for all x ∈ X (for a more rigorous statement of the claim and its proof, see [4]). Therefore, minimization of F(x, μ) for a large positive value μ is a relatively easy task. Once the solution x(μ^k) for F(x, μ^k) is obtained, the smoothing method subsequently minimizes F(x, μ^{k+1}) for 0 < μ^{k+1} < μ^k, starting from the point x(μ^k).
One concern about the smoothing method is that it may generate a sequence of x(μ^k) that is close to minimizers of Φ(x) regardless of their quality as minimizers of f(x). A remedy for this problem is to use a Φ(x) whose minimizers are close to none of the minimizers of f(x). This can be achieved, when x is a discrete variable, by using a Φ(x) that resembles a penalty function of the barrier method. Let us consider a nonlinear binary optimization problem:

    minimize    f(x)    (3.3)
    subject to  Ax = b,    (3.4)
                x ∈ {0, 1}ⁿ    (3.5)

where x ∈ Rⁿ, A ∈ R^{m×n}, and b ∈ R^m. By relaxing the discrete variable constraint, we obtain

    minimize    f(x)    (3.6)
    subject to  Ax = b,    (3.7)
                x ∈ [0, 1]ⁿ.    (3.8)
To solve the above problem by the smoothing method, we define Φ(x) as

    Φ(x) = − Σ_{j=1}^{n} ln x_j − Σ_{j=1}^{n} ln(1 − x_j).    (3.9)

The function is well-defined for x ∈ (0, 1)ⁿ. It is strictly convex, and Φ(x) → ∞ as x_j ↓ 0 or x_j ↑ 1. Therefore, Φ(x) also functions as a logarithmic barrier function that eliminates the inequality constraints on x and allows the results from the barrier method to be applied.
The transformed problem for the smoothing method is as follows:

    minimize    f(x) − μ Σ_{j=1}^{n} {ln x_j + ln(1 − x_j)}    (3.10)
    subject to  Ax = b,    (3.11)
                x ∈ (0, 1)ⁿ.    (3.12)

The barrier method approximately finds the solution x(μ) of the problem for a sequence of decreasing μ until μ is very close to 0, and the limit point of x(μ) is the global minimum of the original problem (3.6)-(3.8) [4].
Since the variables of the converged solution may not be close enough to 0 or 1 for binary rounding, we use a quadratic penalty function that guides the convergence to binary values. The transformed problem after adding the extra penalty function is as follows:

    minimize    F(x) = f(x) − μ Σ_j {ln x_j + ln(1 − x_j)} + γ Σ_{j∈J} x_j(1 − x_j)    (3.13)
    subject to  Ax = b,    (3.14)
                x ∈ (0, 1)ⁿ    (3.15)

where J is the set of variable indices that need the enforcement of the quadratic penalty term. This approach is justified by the fact that the problem

    minimize    g(x)    (3.16)
    subject to  Ax = b,    (3.17)
                x ∈ {0, 1}ⁿ    (3.18)

and the problem

    minimize    g(x) + γ Σ_{j=1}^{n} x_j(1 − x_j)    (3.19)
    subject to  Ax = b,    (3.20)
                x ∈ [0, 1]ⁿ    (3.21)

have the same minimizers for a sufficiently large γ [31].
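As an illustration of (3.13), the sketch below (ours; the thesis implementation was written in MATLAB, and the quadratic f(x) = x'Qx and the choice J = {1, ..., n} are our assumptions) evaluates the transformed objective and its gradient:

    import numpy as np

    def barrier_penalty_objective(x, Q, mu, gamma):
        """F(x) = x'Qx - mu*sum(ln x + ln(1-x)) + gamma*sum(x*(1-x)),
        with the quadratic penalty applied to every coordinate."""
        assert np.all((x > 0.0) & (x < 1.0)), "barrier needs x in (0,1)^n"
        F = (x @ Q @ x
             - mu * np.sum(np.log(x) + np.log(1.0 - x))
             + gamma * np.sum(x * (1.0 - x)))
        grad = ((Q + Q.T) @ x
                - mu * (1.0 / x - 1.0 / (1.0 - x))
                + gamma * (1.0 - 2.0 * x))
        return F, grad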
3.1.3 Solving the transformed problem
By transforming the original discrete nonlinear optimization problem, we obtain a sequence of optimization problems in a continuous domain. Nonetheless, the transformed optimization problem in the continuous domain is never less hard than the original problem, because there exist many local minimizers and saddle points. We use a second-order method to ensure that the solution satisfies the second-order optimality conditions.

Once we obtain a solution x⁰ of (3.13)-(3.15) for the initial choice of penalty parameters μ⁰ and γ⁰, we can work in the null space of A for the rest of the algorithm. Suppose Z is a matrix whose columns form a basis of the null space of A. Then AZ = 0, and the feasible region can be described as {x | x = x⁰ + Zy, y ∈ R^{n−m}, such that x ∈ (0, 1)ⁿ}. Therefore, problem (3.13)-(3.15) reduces to an optimization over a subset of R^{n−m}.
The quadratic approximation of F(x) around the current point x^k is given by

    F_k(x) = F(x^k) + ∇F(x^k)'(x − x^k) + (1/2)(x − x^k)'∇²F(x^k)(x − x^k).    (3.22)

Setting the derivative of F_k(x) to zero, we obtain an x^{k+1} that satisfies the first-order optimality condition:

    ∇F(x^k) + ∇²F(x^k)(x^{k+1} − x^k) = 0.    (3.23)

In addition, x^{k+1} should satisfy x^{k+1} = x^k + Zy for some y ∈ R^{n−m}. Therefore, by substituting Zy and premultiplying both sides by Z', we obtain a reduced-Hessian system for a descent direction:

    Z'∇²F(x^k)Z y = −Z'∇F(x^k).    (3.24)

Newton's method is iteratively applied to (3.13)-(3.15) until Z'∇F(x^k) is sufficiently close to 0 and Z'∇²F(x^k)Z is positive semi-definite. The Newton's method embedded in the smoothing algorithm is given in Table 3.1.
Since Z'∇²F(x^k)Z is usually a large dense matrix, (3.24) is solved by an iterative method such as the conjugate gradient (CG) method rather than by explicitly forming the inverse. In particular, we use the preconditioned conjugate gradient (PCG) method. CG is an algorithm for solving a positive semidefinite system; in case Z'∇²F(x^k)Z is not positive definite, the algorithm should terminate. On the other hand, to satisfy the second-order optimality condition, we need to find a negative curvature direction when one exists. This can be done within the CG method by finding a vector q such that q'Z'∇²F(x^k)Zq < 0. Since we want the eigenvalue corresponding to the negative curvature direction to be as close to the smallest eigenvalue as possible, we may opt to use the inverse iteration method to enhance the q obtained by the CG method.
Table 3.1: Logarithmic barrier smoothing algorithm.

Input:
    F(x): transformed function for the smoothing algorithm,
    Z: matrix whose columns constitute the null space of A,
    ε_F: tolerance for function evaluation,
    ε_μ: minimum allowed value of μ,
    M: maximum allowed value of γ,
    N: maximum number of times the Newton's method is applied,
    θ_μ: reduction ratio for μ,
    θ_γ: increase ratio for γ,
    μ⁰: initial value of μ,
    γ⁰: initial value of γ,
    x⁰: any feasible starting point.
Output:
    x: a local minimum of the problem (3.6)-(3.8).

Smoothing iteration:
    Set μ := μ⁰, γ := γ⁰, and x := x⁰.
    while γ < M or μ > ε_μ do:
        for k = 1 to N do:
            if ||Z'∇F(x)|| < ε_F μ then leave the inner loop.
            else
                Apply the CG method to Z'∇²F(x)Z y = −Z'∇F(x).
                Do a linesearch: obtain a descent direction d and a step length α.
                Set x := x + αd.
            endif
        endfor
        Set μ := θ_μ μ and γ := θ_γ γ.
    endwhile
3.2 Algorithm implementation
In this section, we describe implementation aspects of the smoothing algorithm. Since we deal with large-scale problems, an efficient implementation of each step based on the numerical properties of the problems is important for obtaining reasonable performance. We first describe a modified CG method that solves the reduced-Hessian system and also finds a negative curvature direction. The adaptive linesearch algorithm then decides a descent direction from the results of the CG method and calculates a step length. We also describe a modified penalty parameter control scheme that exploits the bundled structure of the variables and the early elimination of converged variables.
3.2.1 The conjugate gradient method
We use the CG method to solve reduced-Hessian systems. For indefinite systems, the standard CG method may still converge if all its search directions lie in a subspace where the quadratic form is convex. If it ends up with a search direction that has negative curvature, we use both the intermediate result and the negative curvature direction to determine the descent direction and the step length. The modified preconditioned CG method in Table 3.2 solves Ax = b or outputs a negative curvature direction when it finds one [31].
3.2.2 The adaptive linesearch algorithm
The linesearch algorithm should choose the search direction from the Newton-type direction and the negative curvature direction and then calculate a suitable step length in that direction. Pursuing the Newton-type direction brings fast convergence to a point satisfying the first-order optimality condition, while the negative curvature direction is necessary to escape from local non-convexity. Moré and Sorensen [29] and Lucidi et al. [27] describe a curvilinear linesearch algorithm that uses a combination of the two directions. However, the relative scaling of the two direction vectors can be an issue, especially when the negative curvature direction is given too little weight even though it may yield a significant reduction in the objective value.
Table 3.2: The preconditioned CG method for possibly indefinite systems.

Input:
    A, b,
    M: preconditioner,
    N: maximum number of iterations,
    x⁰: starting point,
    ε_r: residual error tolerance,
    ε_c: negative curvature error tolerance.
Output:
    a solution x of Ax = b with tolerance ε_r, or a negative curvature direction c.

PCG iteration:
    Set x := x⁰, i := 0, r := b − Ax⁰, d := M⁻¹r, δ_new := r'd, δ₀ := δ_new.
    while i < N and δ_new > ε_r² δ₀ do:
        Set q := Ad.
        if d'q < ε_c ||d||² then
            Set c := d and stop.
        else
            Set α := δ_new / d'q,
                x := x + αd,
                r := r − αq,
                s := M⁻¹r,
                δ_old := δ_new,
                δ_new := r's,
                β := δ_new / δ_old,
                d := s + βd,
                i := i + 1.
        endif
    endwhile
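A direct transcription of Table 3.2 into Python might look as follows (our sketch; in a real implementation the explicit matrix products would be replaced by matrix-free products with Z'∇²F(x)Z). The example at the bottom uses the diagonal preconditioner of Section 3.3.1, built from the absolute values of the matrix diagonal:

    import numpy as np

    def pcg_indefinite(A, b, M_inv, x0, max_iter=400, eps_r=1e-5, eps_c=1e-5):
        """PCG on A x = b; returns ('solution', x), or ('negcurv', d)
        when a direction of negative curvature is detected."""
        x = x0.copy()
        r = b - A @ x
        d = M_inv @ r
        delta_new = r @ d
        delta0 = delta_new
        for _ in range(max_iter):
            if delta_new <= eps_r**2 * delta0:
                break
            q = A @ d
            if d @ q < eps_c * (d @ d):      # curvature test on d'Ad
                return 'negcurv', d
            alpha = delta_new / (d @ q)
            x += alpha * d
            r -= alpha * q
            s = M_inv @ r
            delta_old, delta_new = delta_new, r @ s
            d = s + (delta_new / delta_old) * d
        return 'solution', x

    H = np.array([[4.0, 1.0], [1.0, 3.0]])
    M_inv = np.diag(1.0 / np.abs(np.diag(H)))
    print(pcg_indefinite(H, np.array([1.0, 2.0]), M_inv, np.zeros(2)))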
In our work, we used the adaptive linesearch algorithm by Gould et al. [13], in which only one of the two directions is used at a time. Moreover, before we exploit the negative curvature direction from the conjugate gradient method, we try to enhance it by the inverse iteration method. Table 3.3 summarizes the algorithm.

In steps 3 and 4 of the algorithm, an upper bound α_max on the possible step length needs to be calculated because we have the constraints 0 < x_i < 1 for each i. The upper bound is determined by α_max = min_i β_i, where

    β_i = (1 − x_i)/p_i   if p_i > 0,
        = −x_i/p_i        if p_i < 0,    (3.25)
        = ∞               if p_i = 0.

The parameters θ, τ, μ_l, and σ^k control the behavior of the algorithm. We let σ^k be the step length calculated from the last linesearch in the negative curvature direction.
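The bound (3.25) amounts to a standard ratio test; a minimal sketch (ours):

    import numpy as np

    def alpha_max(x, p):
        """Largest step with 0 < x + alpha*p < 1 componentwise, per (3.25)."""
        with np.errstate(divide='ignore', invalid='ignore'):
            beta = np.where(p > 0, (1.0 - x) / p,
                   np.where(p < 0, -x / p, np.inf))
        return float(beta.min())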
3.3 Computational results
In this section, we formulate the GMEC problem as a nonlinear discrete optimization problem and then solve it using the smoothing algorithm presented in the previous sections.

As we did in the second classical formulation of Section 2.1.2, we introduce binary variables x_{ir} for all residues i and all rotamers r ∈ R_i. Variable x_{ir} takes value 1 if rotamer r is selected for the conformation of residue i, and 0 otherwise. Then the problem can be written as

    minimize    Σ_i Σ_{r∈R_i} E(i_r) x_{ir} + Σ_{i<j} Σ_{r∈R_i} Σ_{s∈R_j} E(i_r j_s) x_{ir} x_{js}    (3.26)
    subject to  Σ_{r∈R_i} x_{ir} = 1, ∀i,    (3.27)
                x_{ir} ∈ {0, 1}, ∀i, ∀r ∈ R_i.    (3.28)
Table 3.3: The adaptive linesearch algorithm.

Input:
    s^k: a Newton-type direction,
    d^k: a negative curvature direction,
    g^k = ∇F(x^k), H^k = ∇²F(x^k), N, σ^k, θ ∈ (0, 1), and μ_l ∈ (0, 1/2),
    function F(x),
    function ΔF(y) = g^k'y + (1/2) y'H^k y.
Output:
    α^k: step length.

1. Negative curvature enhancement:
    if (Zs)'H^k(Zs) < 0 then
        if (Zs)'H^k(Zs) > −1 then set δ := −1, else set δ := (Zs)'H^k(Zs). endif
        Apply the inverse iteration on Z'H^kZ with shift value δ
        to obtain an eigenvalue λ and its eigenvector v.
        if λ < (Zs)'H^k(Zs) then set s := v. endif
    endif

2. Selection of the search direction:
    if d^k = 0 then set p^k := s^k and go to 3,
    else set d^k := d^k / ||d^k||. endif
    if ΔF(s^k) ≤ τ ΔF(d^k) then set p^k := s^k and go to 3,
    else set p^k := d^k and go to 4. endif

3. Linesearch in a gradient-related direction:
    Find α_max, and set α^k := α_max, c^k := min{0, p^k'H^kp^k}, i := 1.
    while i < N and F(x^k + α^kp^k) > F(x^k) + μ_l (α^k g^k'p^k + (1/2)(α^k)² c^k) do:
        Set α^k := θα^k and i := i + 1.
    endwhile

4. Linesearch in a negative curvature direction:
    Find α_max, and set α^k := σ^k, i := 1.
    if F(x^k + α^kp^k) < F(x^k) + μ_l ΔF(α^kp^k) then
        while i < N and α^k/θ ≤ α_max and F(x^k + (α^k/θ)p^k) < F(x^k) + μ_l ΔF((α^k/θ)p^k) do:
            Set α^k := α^k/θ and i := i + 1.
        endwhile
    else
        while i < N and F(x^k + α^kp^k) ≥ F(x^k) + μ_l ΔF(α^kp^k) do:
            Set α^k := θα^k and i := i + 1.
        endwhile
    endif
We define a symmetric matrix Q = (Q_{ir,js}) as

    Q_{ir,js} = E(i_r j_s)   if i ≠ j,
              = E(i_r)       if i = j and r = s,    (3.29)
              = 0            otherwise,

and an n × m matrix A, where m = Σ_i (# rotamers of residue i), by

    A_{j,ir} = 1 if j = i, and 0 otherwise;    (3.30)

that is, the column of A corresponding to x_{ir} is the ith column of the n × n identity matrix I_n. If we also let 1_p be the p × 1 one-vector and x be the vector (x_{ir}), then (3.26)-(3.28) simplifies to

    minimize    x'Qx    (3.31)
    subject to  Ax = 1_n,    (3.32)
                x_{ir} ∈ {0, 1}, ∀i, ∀r ∈ R_i.    (3.33)

(3.31)-(3.33) has the same form as (3.3)-(3.5). Therefore, we only need to calculate the null-space matrix Z of A to apply the smoothing algorithm of Table 3.1. By the simple structure of A, Z can be given in the form

    Z = (Z₁ Z₂ ... Z_n),

where n is the number of residues and, with m_i the number of rotamers of residue i, Z_i is an m × (m_i − 1) matrix given by

    Z_i = [ 0_{(Σ_{j<i} m_j) × (m_i−1)} ; −1'_{m_i−1} ; I_{m_i−1} ; 0_{(Σ_{j>i} m_j) × (m_i−1)} ],    (3.34)

where 0_{p×q} is the p × q zero matrix, 1_p is the p × 1 one-vector, I_p is the p × p identity matrix, and the semicolons denote vertical stacking. When m_i is one, Z_i is not defined by this formula, but the corresponding residue can obviously be excluded from the computation because it has only one possible rotamer choice.
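For concreteness, the following sketch (ours; the thesis code was written in MATLAB) builds A and the null-space basis Z of (3.34) for a toy instance with rotamer counts (3, 2, 4) and checks that AZ = 0:

    import numpy as np

    def build_null_space(m_i):
        """Null-space basis Z of A for rotamer counts m_i, following (3.34);
        residues with a single rotamer are assumed to have been removed."""
        m = sum(m_i)
        blocks, row = [], 0
        for mi in m_i:
            Zi = np.zeros((m, mi - 1))
            Zi[row, :] = -1.0                       # first rotamer row: -1'
            Zi[row + 1: row + mi, :] = np.eye(mi - 1)
            blocks.append(Zi)
            row += mi
        return np.hstack(blocks)

    m_i = [3, 2, 4]                    # three residues
    A = np.zeros((len(m_i), sum(m_i)))
    col = 0
    for i, mi in enumerate(m_i):
        A[i, col:col + mi] = 1.0       # row i sums residue i's variables
        col += mi

    Z = build_null_space(m_i)
    print(np.allclose(A @ Z, 0.0))     # True: columns of Z lie in null(A)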
3.3.1 Energy clipping and preconditioning the reduced-Hessian system
The numerical properties play a key role in determining the performance of the algorithm. Since the constraints are in a simple fixed form and the function is given by a pure quadratic form, all the numerical properties come from the matrix Q. We perform a simple preprocessing on Q by clipping too-large values in the matrix to a smaller fixed value. By building a cumulative distribution of the values in Q, we find a number Q_max that is larger than most of the values in Q, say 99% of them, and replace any value larger than that with Q_max. In this way, we can adaptively prevent extreme ill-conditioning while not affecting the original nature of the problem too much.
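A minimal sketch of this clipping step (ours), assuming the energies are held in a dense numpy array:

    import numpy as np

    def clip_energies(Q, pct=99.0, floor=5.0):
        # Q_max exceeds most entries (the 99th percentile plus one,
        # but at least `floor`, as in Table 3.4) and caps the outliers.
        q_max = max(floor, np.percentile(Q, pct) + 1.0)
        return np.minimum(Q, q_max)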
One of the parts of the algorithm whose performance is most strongly influenced by the numerical properties is the iterative solution of the reduced-Hessian system by the conjugate gradient method. Since the reduced-Hessian system is dense and particularly large for our problem, preconditioning can be an important factor in improving the performance of the conjugate gradient method. Nash and Sofer [30] describe a set of techniques for positive definite matrices of the form Z'GZ to construct approximations to (Z'GZ)⁻¹ and suggest that the simple approximation W'M⁻¹W can be effective for general use, where W' is a left inverse of Z and M is a positive definite matrix such that M ≈ G. However, Z'HZ in our problem is not guaranteed to be positive definite, and we are more interested in the quality of the solution we obtain from the algorithm than in fine-tuning the performance. In our application, we simply took the absolute values of the diagonal terms of Z'HZ to form a diagonal matrix and used its inverse as the preconditioner.
3.3.2 Parameter control and variable elimination
We introduced minor modifications to the algorithm to make it work more efficiently on the GMEC problem. A condition of the GMEC problem that differs from the applications in Ng's thesis [31] is n ≪ m, that is, m_i is very large for some i and not for others. By applying Ng's smoothing algorithm, we could find optimal solutions, or ones fairly close to them, for the cases where m_i is small for all i. However, when we applied it to a case where the difference between max_i m_i and min_i m_i is large, the solution we obtained was not good even though n is small. We suspect the problem arises from the monotonic decrease of a single barrier penalty parameter μ for every variable. Variables for residues with a small number of rotamers tend to converge at an early stage of the algorithm, while those with a large number of rotamers take longer to converge. Therefore, using a single μ seems to force a premature convergence that is more likely to end in a poor local minimum. We instead assign a separate barrier penalty parameter μ_i to each set of variables corresponding to the rotamers of residue i. In the same fashion, we also assign a separate quadratic penalty parameter γ_i to force the binary rounding to a different degree for each residue. The transformed function is then given by

    F(x) = f(x) − Σ_i μ_i Σ_{r∈R_i} {ln x_{ir} + ln(1 − x_{ir})} + Σ_i γ_i Σ_{r∈R_i} x_{ir}(1 − x_{ir}).    (3.35)

We can devise various schemes to control the decrease of μ_i or the increase of γ_i. In our experiment, we decreased μ_i only if the deviation of the variables x_{ir} stays the same or increases after an iteration of the smoothing algorithm. We let SM1 and SM2 denote Ng's smoothing algorithm without and with this parameter control scheme, respectively.
However, by introducing a separate barrier penalty parameter for each residue and letting residues with a small number of rotamers converge faster, we face another numerical problem: as the variables corresponding to a residue approach 0 or 1 while some others are still quite far from binary values, the calculation of the gradient and Hessian, as well as most other matrix computations, grows numerically unstable because the magnitude of 1/x_{ir} or 1/(1 − x_{ir}) becomes boundlessly large. Therefore, we need to eliminate the variables that have converged to one of the binary values from the subsequent computation. The criterion for elimination can be ambiguous and several schemes can be tried. We assumed a variable x_{ir} has reached 0 if its value is smaller than a factor 10⁻⁴ of its initial value.
Table 3.4: The parameter settings used in the implementation.

Smoothing:
    ε_F = 10⁻⁵
    M = 1000
    N = 300
    θ_μ = 0.95
    θ_γ = 1.02
    μ⁰ = 100
    γ⁰ = 0 or 0.002
    x⁰_{ir} = 1/(# rotamers of residue i)
    τ = 1000
    Q_max = max{5, (99% point of the cumulative distribution) + 1}
The CG method:
    N = 400
    ε_r = 10⁻⁵
    ε_c = 10⁻⁵
Linesearch:
    N = 50
    θ = 0.8
    μ_l = 10⁻³
After eliminating the variables corresponding to a residue, if there are still remaining variables for it, we need to normalize the values of the remaining variables so that their sum equals 1 and the computation can proceed in the null space of A. On the other hand, if any of the variables corresponding to a residue is close to 1, say larger than 0.9, then we regard the residue as having converged to a rotamer and exclude all of the variables for that residue from the subsequent computation. Eliminating variables in the middle of the algorithm reduces the dimension of the matrices and may contribute to speeding up the overall performance.
3.3.3 Results
We implemented the algorithm using MATLAB 6.5 on a PC with a 1.7 GHz Pentium 4 processor, 256 MB of memory, and Windows XP as its OS. Throughout the experiment, the parameter settings shown in Table 3.4 were used. The results are summarized in Table 3.5.
Table 3.5: Results for SM2 (protein: PDB code, #res: number of residues, #var: number of variables, optimal: optimal objective value, smoothing: objective value from the smoothing algorithm, #SM: number of smoothing iterations, #CG: number of CG calls, time: execution time in seconds, #NC: number of negative curvature directions used, γ⁰: initial value of the quadratic penalty parameter).

protein  #res  #var  optimal      smoothing      #SM  #CG   time     #NC  γ⁰
256b     5     11    -54.16980    -53.72762      165  431   1.11     0    0
256b     6     13    -70.95571    -70.51478      165  467   1.28     0    0
256b     7     148   -70.18879    -68.52338      309  926   33.40    8    0
256b     8     24    -87.68481    -87.24479      196  561   2.02     1    0
256b     10    137   -110.18409   -90.40045      285  954   27.48    6    0
256b     11    94    -94.69729    -82.68505      248  829   11.30    7    0
256b     15    110   -116.8101    -114.58994     235  715   14.19    10   0
256b     19    194   -152.34157   -144.27939     364  1166  70.81    12   0
256b     20    211   -209.37536   -173.51333     275  840   64.14    13   0
256b     25    350   -242.85081   -158.74741     286  928   333.23   42   0
256b     5     11    -54.16980    -53.72762      165  433   1.15     0    0.002
256b     6     13    -70.95571    -70.51478      165  468   1.28     0    0.002
256b     7     148   -70.18879    -68.52338      310  934   33.05    8    0.002
256b     8     24    -87.68481    -87.24479      196  563   2.16     3    0.002
256b     10    137   -110.18409   -90.40045      332  1090  28.35    8    0.002
256b     11    94    -94.69729    -81.97510      268  866   11.48    4    0.002
256b     15    110   -116.8101    -115.08332     245  731   14.28    11   0.002
256b     19    194   -152.34157   -144.27939     358  1144  70.04    14   0.002
256b     20    211   -209.37536   -202.06752     274  863   65.22    12   0.002
256b     25    350   -242.85081   -156.88740     278  894   324.30   30   0.002
1amm     10    78    -62.64278    -59.63623      378  1621  13.86    6    0.002
1amm     15    153   -134.63023   -127.46733     377  1632  37.28    14   0.002
1amm     20    232   -167.31388   999841.81683   276  937   106.58   27   0.002
1amm     25    277   138.61011    1063.02470     830  2374  1100.66  50   0.002
1bpi     10    128   -75.37721    -74.10404      286  794   19.42    8    0.002
1bpi     15    144   -57.13359    -54.72438      408  1867  42.67    19   0.002
1bpi     20    214   -75.8774     14.94464       405  1453  99.17    31   0.002
1bpi     25    303   -93.65443    -69.08175      396  2154  262.39   29   0.002
We first experimented with the initial quadratic penalty parameter γ⁰. A very small γ⁰ and an increase rate θ_γ close to 1 are desirable because the quadratic penalty term should not significantly affect the initial behavior of the algorithm. Table 3.5 shows the results from using γ⁰ = 0.002 and from not using the quadratic penalty (γ⁰ = 0) when the other parameter settings are identical. There is no noticeable difference in either the quality of the solution or the performance for the data from 256b, which implies that the algorithm was able to converge to a solution close to binary without the enforcement of the quadratic penalty. However, the algorithm did not converge to binary values when it was applied to the data from 1amm and 1bpi. For the rest of the experiment, we used γ⁰ = 0.002.
The objective values from the smoothing algorithm are fairly close to optimal when the problem size is small, but the results were not stable for the cases with more than 200 variables. We suspect this is an implementation-specific problem that can be addressed by careful inspection of and experimentation with the numerical aspects of the algorithm, such as improving the negative curvature direction obtained, the termination criteria of the CG method, and the kind of linesearch method used. At the same time, though we used rather conservative parameter settings, they are far from suitable for every case. For example, merely setting the barrier penalty parameter reduction rate θ_μ to 0.98 instead of 0.95 resulted in an improved solution with objective value -64.12139 for 20 residues of 1bpi, compared to 14.94464 when θ_μ = 0.95.
We also ran the SM1 algorithm on the same data set with identical parameter settings for comparison. The results are summarized in Table 3.6. Interestingly, SM1 showed more stable behavior for large cases and was able to obtain reasonable solutions where SM2 failed to do so; for 20 residues of 1amm, SM1 obtained an objective value of -164.31605 while SM2 produced an absurd value of 999841.81683. However, SM1 also failed to produce any solution close to optimal for the cases of 19 and 25 residues of 256b, where SM2 obtained near-optimal results. SM1's solution was also often inferior to the one from SM2, and SM1 usually took a larger number of CG calls and consequently a longer execution time. The larger number of CG calls is mainly due to the ill-conditioning caused by variables approaching binary values, as described in Section 3.3.2.
Table 3.6: Results for SM1 (columns as in Table 3.5).

protein  #res  #var  optimal      smoothing    #SM  #CG    time     #NC  γ⁰
256b     5     11    -54.16980    -53.72762    147  373    0.92     0    0.002
256b     6     13    -70.95571    -70.51478    147  412    1.12     0    0.002
256b     7     148   -70.18879    -68.13366    246  742    39.63    2    0.002
256b     8     24    -87.68481    -87.24479    159  460    1.732    1    0.002
256b     10    137   -110.18409   -86.12662    246  1023   44.11    10   0.002
256b     11    94    -94.69729    -91.90795    211  708    16.69    1    0.002
256b     15    110   -116.8101    -116.81010   210  639    18.74    14   0.002
256b     19    194   -152.34157   32.82828     259  1104   165.45   3    0.002
256b     20    211   -209.37536   -199.53027   246  797    105.04   5    0.002
256b     25    350   -242.85081   -67.10875    251  964    602.09   20   0.002
1amm     10    78    -62.64278    -59.63623    377  14717  291.35   8    0.002
1amm     15    153   -134.63023   -127.46733   376  20080  1341.82  8    0.002
1amm     20    232   -167.31388   -164.31605   292  1083   284.22   7    0.002
1amm     25    277   138.61011    340.53520    781  2612   1879.61  52   0.002
1bpi     10    128   -75.37721    -67.25647    383  6480   266.95   5    0.002
1bpi     15    144   -57.13359    -44.67575    367  5576   286.67   6    0.002
1bpi     20    214   -75.8774     -66.64213    375  4731   723.93   13   0.002
1bpi     25    303   -93.65443    -72.48917    349  2780   1326.31  14   0.002
3.4 Summary
In this chapter, we explored a nonlinear programming approach to the GMEC problem. We relaxed the binary quadratic formulation of the GMEC problem to obtain a continuous version of it. Then we applied Ng's continuation approach to solve the resulting nonlinear nonconvex minimization problem. We presented a modified version of the preconditioned CG method for possibly indefinite systems, and an adaptive linesearch algorithm that exploits the negative curvature direction. We also suggested minor modifications to Ng's algorithm, such as separate parameter control and variable elimination. The computational results suggest that the continuation approach works well for small instances where each residue does not have many rotamer choices, but the quality of the local minima it finds for large instances is not guaranteed. The modifications we made to Ng's algorithm were not effective for large problems in terms of the quality of the solutions found, but they contributed to reducing the ill-conditioning and making the algorithm work more efficiently.
Chapter 4
Probabilistic inference approach
In previous chapters, we investigated discrete and continuous optimization methods
to solve the GMEC problem formulated as a mathematical programming problem.
In this chapter, we use probabilistic inference methods to infer the GMEC using the
energy terms transformed into probability distributions. The probabilistic inference
methods are generally approximate methods, but empirically their computation time
is short and the solutions found are usually very close to optimal. We test three
different probabilistic inference methods on side-chain placement and sequence design
to compare their performances among themselves and with the results from CPLEX
optimizers and DEE/A*. The main result of this chapter is the promising performance
of the max-product belief propagation (BP) algorithm as a tool for protein sequence
design.
4.1 Methods
In this section, we review the probabilistic inference methods that will be used in the computational experiments of the next section.
4.1.1 Probabilistic relaxation labeling
Relaxation labeling is a method for the graph labeling problem, which often arises as a fundamental problem in image analysis and computer vision. The objective of graph labeling is to assign the best label to each node of a graph equipped with compatibility relations that indicate how compatible pairs of labels are. Obviously, the GMEC problem is closely related to graph labeling. Probabilistic relaxation labeling is an iterative method looking for a fixed point of (4.1), which formalizes how the compatibilities between two nodes, weighted by the information from neighboring nodes, update the local probability distribution at each iteration:

    P_i^{k+1}(x_i) = P_i^k(x_i) Π_{j∈N(i)} Σ_{x_j} r_{ij}(x_i, x_j) P_j^k(x_j)
                     / Σ_{x_i'} [ P_i^k(x_i') Π_{j∈N(i)} Σ_{x_j} r_{ij}(x_i', x_j) P_j^k(x_j) ].    (4.1)

The local probability distribution P_i^k(x_i) over the possible labels x_i of node i at iteration k is updated with the compatibility r_{ij}(x_i, x_j) and the probability distributions P_j^k(x_j) of the neighboring nodes N(i). The updated probability distribution is kept at the node and used to update the distributions of the node itself and of its neighbors at the next iteration. For a more detailed description and theory of probabilistic relaxation labeling, see [32, 15, 16].
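A minimal sketch of one update sweep of (4.1) on a complete graph (our illustration; the dictionary layout of the compatibilities is an assumption, not the thesis's data structure):

    import numpy as np

    def rl_sweep(P, r):
        """One update of (4.1). P[i]: label distribution of node i;
        r[(i, j)]: compatibility matrix r_ij(x_i, x_j), for all i != j."""
        n = len(P)
        P_new = []
        for i in range(n):
            support = np.ones_like(P[i])
            for j in range(n):
                if j != i:
                    # sum over x_j of r_ij(x_i, x_j) P_j(x_j), for each x_i
                    support *= r[(i, j)] @ P[j]
            q = P[i] * support
            P_new.append(q / q.sum())    # renormalize to a distribution
        return P_new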
4.1.2 BP algorithm
In the BP algorithm, node s sends to its neighboring node i the message m_{si}(x_i), which can be interpreted as how likely node s thinks node i is in state x_i. The BP algorithm computes exact marginal probabilities in a singly-connected factor graph by an iterative message passing and update protocol. Since our energy function is a sum of self and pairwise terms, through the Boltzmann mapping we can view it as a probability function that is a product of factors each referencing only two variables. Therefore, the factor graph for a problem with three residues looks like Figure 4-1. Then, as shown in [19], the belief at node i is given by the product of the singleton factor φ_i(x_i)
[Figure 4-1: An example factor graph with three residues: variable nodes X1, X2, X3 connected through pairwise factor nodes f(X1,X2), f(X1,X3), and f(X2,X3).]
associated with variable x_i and all the messages coming into it:

    P(x_i) = k φ_i(x_i) Π_{s∈N'(i)} m_{si}(x_i),    (4.2)

where k is a normalization constant and N'(i) denotes the neighboring nodes of i except the singleton factor node associated with x_i. The message update rule is given by

    m_{si}(x_i) ← Σ_{x_j} φ_j(x_j) ψ_{ij}(x_i, x_j) Π_{t∈N'(j)\s} m_{tj}(x_j),    (4.3)

where ψ_{ij}(x_i, x_j) is the pairwise factor of x_i and x_j. Due to the form of the message update rule, the BP algorithm is also called the sum-product algorithm. BP is not guaranteed to work in an arbitrary topology, but Yedidia et al. [43] have shown that BP can only converge to a stationary point of the Bethe free energy.
4.1.3 Max-product BP algorithm
Suppose X is the set of all possible configurations of the random variables {x}. Then the MAP (maximum a posteriori) assignment is given by

    x_MAP = arg max_{{x}∈X} P({x}).    (4.4)

Finding the MAP assignment is generally NP-hard. However, if the factor graph is singly connected, the max-product BP algorithm can efficiently solve the problem. The max-product BP algorithm is identical to the standard BP algorithm except that the message update rule takes the maximum of the product instead of the sum, i.e.,

    m_{si}(x_i) ← max_{x_j} φ_j(x_j) ψ_{ij}(x_i, x_j) Π_{t∈N'(j)\s} m_{tj}(x_j).    (4.5)

The belief update is given by (4.2). When the belief update converges, the belief at any node i satisfies

    b_i(x_i) = k max_{{x}∈X} P({x} | x_i),    (4.6)

and the MAP assignment can be found by

    (x_MAP)_i = arg max_{x_i} b_i(x_i).    (4.7)

The max-product BP has no guarantee of finding the MAP in graphs with loops. Freeman and Weiss have shown that the assignment from the fixed points of the max-product BP is a neighborhood maximum of the posterior probability [10].
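To make the update concrete, the following sketch (ours, not the thesis's C implementation; it assumes a dense pairwise model over a complete graph, a parallel message schedule, and message normalization for numerical stability) implements (4.5) and the decoding (4.7):

    import numpy as np

    def max_product(phi, psi, iters=50):
        """phi[i]: evidence vector of node i; psi[(i, j)], i < j: pairwise
        factor with axes (x_i, x_j). Returns one decoded state per node."""
        n = len(phi)
        def pairwise(i, j):              # factor with axes (x_i, x_j)
            return psi[(i, j)] if i < j else psi[(j, i)].T
        msg = {(s, i): np.ones(len(phi[i]))
               for s in range(n) for i in range(n) if s != i}
        for _ in range(iters):
            new = {}
            for (s, i) in msg:
                prod = phi[s].copy()     # phi_s times messages into s, except from i
                for t in range(n):
                    if t != s and t != i:
                        prod *= msg[(t, s)]
                m = (pairwise(i, s) * prod).max(axis=1)   # max over x_s, per (4.5)
                new[(s, i)] = m / m.sum()
            msg = new
        beliefs = [phi[i] * np.prod([msg[(s, i)] for s in range(n) if s != i],
                                    axis=0) for i in range(n)]
        return [int(np.argmax(b)) for b in beliefs]       # decoding (4.7)

    # Toy 3-node example, 2 states each; agreement is favored, so all
    # nodes decode to the higher-evidence state 0.
    phi = [np.array([1.0, 0.5]) for _ in range(3)]
    agree = np.array([[1.0, 0.2], [0.2, 1.0]])
    psi = {(0, 1): agree, (0, 2): agree, (1, 2): agree}
    print(max_product(phi, psi))         # -> [0, 0, 0]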
4.1.4 Double loop algorithm
BP is guaranteed to work correctly in a singly-connected factor graph, but its performance on loopy graphs has not been well understood. In practical applications, however, such as decoding turbo codes, the method has been very successful even on loopy graphs. An important discovery about the behavior of BP was made by relating it to Gibbs free energy approximation [43], and this elucidated BP as a method for approximate inference in general graphs. Despite this theoretical progress in understanding the fixed points of BP, its applications are limited without reliable convergence properties. The convergence problem of BP, together with BP's analogy to Bethe free energy minimization, motivated the development of algorithms that attempt to minimize the Bethe free energy itself. In our application of the max-product BP algorithm to the GMEC problem, the method performed surprisingly well, but we also found cases where it converges to an unreasonably large energy value. In this work, to evaluate an approximate inference method based on Bethe free energy minimization as an alternative to the max-product BP when the latter does not show good convergence, we test the double loop algorithm derived from the mutual information minimization and marginal entropy maximization (MIME) principle [36].
4.1.5 Implementation
We implemented probabilistic relaxation labeling (RL), the max-product BP (MP), and the MIME double loop algorithm (DL). When applying the probabilistic methods, self-energies and pairwise energies need to be transformed into probability distributions. In RL, the initial estimate of P_i(x_i) was given by

    P_i(x_i = r) = (1/Z)(e_{ir} − min_{j,s} e_{js}),    (4.8)

where Z is a normalization constant. A uniform distribution as the initial estimate for RL was found to perform much worse than (4.8).
The significant link connecting energy minimization and probabilistic inference is the Boltzmann law, which is generally accepted in statistical physics. The compatibility for RL was calculated by

    r_{ij}(x_i = r, x_j = s) = e^{−e'_{ir,js}},    (4.9)

where e'_{ir,js} is given by (2.1). In MP and DL, both the compatibility ψ_{ij}(x_i, x_j) and the evidence φ_i(x_i) are necessary. These were calculated by the following relations:

    ψ_{ij}(x_i = r, x_j = s) = e^{−(e_{ir,js} − e_min)},    (4.10)
    φ_i(x_i = r) = e^{−(e_{ir} − e_min)},    (4.11)

where e_min is defined by e_min = min{min_{i,r} e_{ir}, min_{i<j,r,s} e_{ir,js}}. The initial estimates of the messages m_{ij}(x_i, x_j) and of the probability distributions P_i(x_i) or P_{ij}(x_i, x_j) turned out to be almost irrelevant to the performance of MP and DL.
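A small sketch of the mapping (4.9)-(4.11) (ours; the container layout is an assumption), which produces factors that can be fed to a max-product routine like the one sketched in Section 4.1.3:

    import numpy as np

    def boltzmann_factors(e_self, e_pair):
        """Map energies to evidence phi and compatibilities psi per
        (4.10)-(4.11); subtracting e_min keeps the exponents bounded."""
        e_min = min(min(v.min() for v in e_self),
                    min(v.min() for v in e_pair.values()))
        phi = [np.exp(-(v - e_min)) for v in e_self]
        psi = {k: np.exp(-(v - e_min)) for k, v in e_pair.items()}
        return phi, psi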
The convergence of RL, MP, and DL was determined by the change in the sum of minus log probabilities; we accepted convergence when the rate of change fell below a fixed small threshold. The convergence of DL's inner loop is determined by the parameter c_thr, for which we used the value 10⁻⁴; DL's remaining two parameters were both set to zero. The rest of the implementation is fairly straightforward for all three methods. One possible problem in the implementation is numerical underflow. In particular, in the double loop algorithm, the update ratio of the joint probability P_{ij}(x_i, x_j) and that of the marginal probability P_i(x_i) are reciprocal; therefore, care should be taken that update ratios near zero or infinity are not used in the iteration.
4.2 Results and discussions
We tested the implementations of the three methods presented in Section 4.1 on the protein data examples. Since RL and MP are fast and converge in most cases, we were able to carry out fairly large side-chain placement tests as well as some sequence design. All tests were performed on a Debian workstation with a 2.20 GHz Intel Xeon processor and 1 GB of memory, and the programs were written in C and compiled with the GNU compiler.
4.2.1 Side-chain placement
The protein test set for side-chain placement is presented in Table 4.1. Each test case is identified by the protein's PDB code and the number of residues modeled. For brevity, from now on we will refer to each test case by combining the PDB code and the number of residues modeled, for example 256b-50. Note that all the energy calculations for 1bpi, 1amm, 256b, and 1arb were done using the same rotamer library (call it LIB1), and the test cases of 2end and 1ilq were generated using a different library (call it LIB2). LIB2 consists of denser samples of χ angles and therefore has a larger number of rotamers for each amino acid than LIB1. As a measure of the combinatorial complexity of each test case, we added a column for the log of the total number of conformations. We know the optimal value of each test case from one or more of the CPLEX optimizers and Altman's DEE/A* implementation [2]. We have extraordinarily large optimal values for some cases of 1amm and 1arb. We believe this results from clashes between rotamers, which can happen because of the non-ideal characteristics of a rotamer library that does not always represent the actual conformation of a side-chain very well. We nevertheless included these cases in our test to see whether the characteristics of the rotamer library affect the performance of the methods.
For reference, we included the time taken to solve each test case by different solvers. We used the IP solver (CPLEX MIP Optimizer) only if we could not obtain integral solutions using the LP solver (CPLEX LP Optimizer). For example, in the cases of 256b-70 and 256b-80, the optimal solutions of the LP relaxation are fractional. On the other hand, for 2end-35, 2end-40, and 2end-49, the LP solver broke down because of the system's limited memory; the IP solver is not expected to handle these cases, considering its more intensive effort to find integral solutions. Altman's DEE/A* implementation was much faster than the other two solvers in all test cases of Table 4.1.
The test results for RL, MP, and DL are shown together in Table 4.2. Noticeably, MP outperforms the other methods and calculates optimal solutions for all test cases but two of 256b. An interesting point is that MP failed only for the cases where the LP relaxation does not have integral solutions, but we do not have a good insight into whether there exists a more general connection between the two methods. (Throughout our entire experiments, including those not presented in this thesis, we did not observe any other protein energy example with a fractional LP-relaxation solution.) On the other hand, the consistently worst performer is DL. DL was able to find approximate solutions only for small cases and took longer than both RL and MP. The probability update mechanism, which uses Lagrangean multipliers to minimize the Bethe free energy approximation, turned out to be not very effective in finding the GMEC. DL was accordingly not tested on the larger cases of side-chain placement or on any of the sequence design cases.
Table 4.1: The protein test set for the side-chain placement (log10conf: log total conformations, optimal: optimal objective value, T_LP: LP solver solution time, T_IP: IP solver solution time, T_DEE: DEE/A* solution time, all times in seconds, symbol - : skipped case, symbol * : failed, symbol F : fractional LP solution).

protein  #res  log10conf  optimal        T_LP  T_IP  T_DEE
1bpi     10    6.9        -75.37721      0     -     0
1bpi     15    9.2        -57.13359      0     -     0
1bpi     20    13.4       -75.87743      0     -     0
1bpi     25    18.4       -93.65443      2     -     0
1bpi     46    37.9       -205.52529     30    -     1
1amm     10    8.2        -62.64278      0     -     0
1amm     15    13.1       -134.63023     0     -     0
1amm     20    19.0       -167.31388     1     -     0
1amm     25    22.0       138.61011      0     -     0
1amm     70    67.1       238.12890      44    -     1
1amm     80    74.3       17222.13193    100   -     2
256b     10    4.7        -110.18409     0     -     0
256b     15    9.2        -116.81010     0     -     0
256b     20    14.7       -209.37536     1     -     0
256b     25    22.3       -242.85081     2     -     0
256b     30    25.7       -285.57884     3     -     0
256b     35    30.9       -352.56265     8     -     0
256b     40    34.7       -379.45106     10    -     0
256b     50    44.5       -459.25473     26    -     1
256b     60    56.5       -564.94275     62    -     1
256b     70    66.8       -633.09146     F     170   3
256b     80    97.4       -626.06002     F     974   11
1arb     10    4.6        -65.10481      0     -     0
1arb     20    9.0        -128.35604     0     -     0
1arb     30    18.7       999715.70565   0     -     0
1arb     78    59.5       -460.69454     23    -     0
2end     15    22.5       -182.56152     30    -     2
2end     25    36.5       -285.07644     286   -     6
2end     35    53.0       -347.29088     *     -     27
2end     40    60.1       -402.92974     *     -     30
2end     49    76.8       -540.60516     *     -     46
1ilq     7     7.3        -81.23579      0     -     0
1ilq     11    16.3       -172.50967     2     -     0
RL's performance was generally better than DL's and fairly accurate up to medium-sized cases. Table 4.3 lists the fractions of incorrectly predicted rotamers by each method as the log complexity changes. It is interesting that RL's incorrect prediction rate stays around 0.1 despite the huge ΔE's for some cases in Table 4.2. We can also observe that both RL and MP are considerably faster than LP but slower than DEE/A*, and that RL and MP are almost equal in speed.

Considering speed and prediction accuracy, only MP can be an alternative to DEE/A* in side-chain placement. To compare MP and DEE/A* more closely, we tested only these two methods on larger side-chain placement cases. We generated test cases consisting of 35-60 residues using LIB2. The lengths of the modeled protein sequences are around the average of those in Table 4.1, but the cases have a larger number of rotamer conformations. The description of the test cases and the results are presented in Table 4.4.
All of the optimal values in Table 4.4 were obtained by DEE/A*. However, we found cases where the method fails to find the optimal solution during the A* search because it exhausts the available memory. The failure during the A* search implies that DEE's elimination was not effective enough to make the A* search tractable, and it also suggests that DEE is not the ultimate solution even for side-chain placement. MP was able to find four of the known optimal solutions, but its overall performance was not as impressive as in the previous cases.
The graphical model used in MP is a complete graph whose nodes represent the residues. Therefore, the convergence of MP, and the quality of the solution it finds, will depend solely on the numerical properties of the energies. As an indirect measure of those numerical properties, we analyze the estimated entropy of each side-chain i, which is given by

    S_i = − Σ_{x_i} p_i(x_i) log p_i(x_i).    (4.12)
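In code, (4.12) is a one-liner (our sketch; the small constant guards against log 0):

    import numpy as np

    def side_chain_entropy(p, eps=1e-12):
        # S_i = -sum_x p_i(x) log p_i(x)
        p = np.asarray(p, dtype=float)
        return float(-(p * np.log(p + eps)).sum())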
Table 4.2: Results for the side-chain placement test (ΔE: the difference from the optimal objective value, symbol - : skipped).
[Table body not reproduced: for each test case of Table 4.1, the number of incorrect rotamers, ΔE, and the run time in seconds for RL, MP, and DL.]
Table 4.3: The fraction of incorrect rotamers from RL, MP, and DL (symbol - : not tested).

log10conf   RL     MP     DL
4.6 - 10    0.008  0      0.139
10 - 20     0.047  0      0.212
20 - 30     0.080  0      0.306
30 - 40     0.102  0      -
40 - 60     0.082  0      -
60 - 97.4   0.151  0.041  -
Table 4.4: Performance comparison of MP and DEE/A* on side-chain placement (optimal: solution value from DEE/A*, E_MP: solution value from MP, ΔE: difference between optimal and E_MP, IR_MP: number of incorrectly predicted rotamers by MP, T_DEE: time taken for DEE/A* in seconds, T_MP: time taken for MP in seconds, symbol ? : optimal value unknown, symbol * : failed).

protein  #res  log10conf  optimal       E_MP          ΔE        IR_MP  T_DEE  T_MP
1cbn     35    50.3       999802.90413  999818.01809  15.1139   3      14     58
1isu     49    76.7       -329.62462    -329.62462    0         0      875    1406
1igd     50    87.0       -376.93007    -349.05877    27.87193  7      282    762
9rnt     50    91.6       -391.89270    -391.89270    0         0      70     435
1whi     50    92.1       -450.82517    -450.82517    0         0      467    1568
1ctj     50    95.5       -439.59678    -439.59678    0         0      162    878
1aac     50    98.5       -549.36958    -547.33491    2.03467   2      289    1396
1cex     50    100.3      -519.22413    -518.85970    0.36443   5      179    851
1xnb     60    110.1      ?             -437.26643    *         *      *      1087
1pic     60    113.3      -331.36163    -330.80530    0.55633   1      130    1342
2hbg     60    119.0      -560.59013    -560.28258    0.30755   4      646    2865
2ihl     60    118.1      ?             -527.61321    *         *      *      3254
An S_i close to 0 implies that essentially one rotamer is a good candidate for the GMEC at residue i, whereas a high S_i suggests that there are many rotamers with almost equal chances of being the conformation. To connect this intuition with MP's performance, we suspect that cases easily solved by MP will have overall lower entropy than those that are not. As a rough examination of this conjecture, we take a look at MP's estimated entropy of each side-chain.
Figure 4-2 shows the initial entropy distribution for 1amm-80 and its change by the time MP converged after 4 iterations. The entropy of each side-chain generally decreased, from an average of 0.888 to 0.627. However, contrary to our expectation that most side-chains would have entropy close to 0 because MP converged so easily for this case, we observe that more than 20% of the side-chains have entropy larger than 1 at the end. For comparison, we also plotted the change of entropy for 256b-80. 256b-80 has similar complexity to 1amm-80, but it is one of the only two cases of Table 4.1 on which MP failed. Figure 4-3 shows the result. 256b-80's average entropy dropped from 1.123 to 0.719, but the general trend of the entropy change looks similar to that of 1amm-80. Another view of the two distributions is given by the entropy histogram of Figure 4-4, where the final entropy distribution of 256b-80 is only slightly more extended toward large entropy values. This observation and the statistics of estimated entropy listed
[Figure 4-2: The change in the estimated entropy distribution from MP for 1amm-80; two panels (initial entropy distribution; after 4 iterations) plot entropy against residue number.]
together with the fraction of incorrectly predicted rotamers in Table 4.5 suggests that it is hard to find a direct correlation between MP's prediction error and the estimated entropy. However, the last column of Table 4.5 at least tells us that a residue position with an incorrect rotamer prediction may have a threshold minimum of the estimated entropy (in our test results, approximately 1.00). This fact may provide a way to use MP as a preprocessing procedure that fixes the conformations of residues with low estimated entropy before an exact method such as DEE/A* solves the rest of the problem.

Figure 4-5 shows the empirical relation between the log complexity and the time taken by MP in seconds. It can be seen that the time complexity is roughly polynomial in the log complexity.
[Figure 4-3: The change in the estimated entropy distribution from MP for 256b-80; two panels (initial entropy distribution; after 5 iterations) plot entropy against residue number.]
[Figure 4-4: Histogram of estimated entropy for 1amm-80 and 256b-80 at convergence of MP.]
Table 4.5: Fraction of incorrectly predicted rotamers by MP and statistics of the estimated entropy for the second side-chain placement test (IR fraction: fraction of incorrectly predicted rotamers, avg S_i: average estimated entropy, max S_i: maximum estimated entropy, min S_i: minimum estimated entropy, min_{i∈IR} S_i: minimum entropy over incorrectly predicted rotamer positions).

protein  #res  IR fraction  avg S_i  max S_i  min S_i  min_{i∈IR} S_i
1cbn     35    0.09         1.77     3.78     0.00     1.66
1isu     49    0            1.80     5.05     0.00     -
1igd     50    0.14         2.16     4.06     0.00     1.03
9rnt     50    0            2.14     4.43     0.00     -
1whi     50    0            1.88     4.33     0.24     -
1ctj     50    0            2.32     4.29     0.00     -
1aac     50    0.04         2.17     4.34     0.70     3.31
1cex     50    0.10         2.16     4.13     0.47     1.71
1pic     60    0.02         2.22     4.04     0.58     4.04
2hbg     60    0.07         2.25     4.55     0.58     2.54
[Figure 4-5: Execution time for MP in seconds vs. log total conformations.]
4.2.2 Sequence design
Table 4.6 shows the protein test set for sequence design. Optimal solutions were obtained for only three cases; the other three cases could be solved by neither LP nor DEE/A*. Both the CPLEX LP Optimizer and Altman's DEE/A* implementation failed on R15R23, sweet7-NOW, and P2P2prime because of the system's limited memory. On the other hand, the LP solver managed to solve 1ilq-design3 after five hours; DEE/A* broke down for this case during the A* search.
We let RL and MP solve the sequence design cases. The results are summarized in Table 4.7. For all the cases with known optimal solutions, MP computed the solutions exactly in extremely short time. In protein design, unlike side-chain placement, the measure of accuracy is usually the fraction of amino acids that are predicted incorrectly, but MP's results are correct down to the rotamers. RL's performance was slightly worse than MP's, but it was also impressive: RL produced two and one incorrect rotamers for 1ilq-design2 and 1ilq-design3 respectively, but its predictions of the amino acids were exact.
In the other three extremely large cases of sequence design, whose optimal values are unknown, MP's accuracy drops somewhat. We know that MP's solutions for R15R23 and P2P2prime are not optimal because RL's solutions have lower energy values. Since both RL and MP are probabilistic inference methods, the probability that RL or MP converges to the correct rotamer will decrease as the number of allowed rotamers at each position increases. Despite this expected limitation, RL and MP prove themselves to be good alternatives to DEE/A* in sequence design. Seeing that DEE/A* fails on design cases whose sequence length and log complexity are even smaller than those of the side-chain placement cases it solved, DEE/A*'s elimination criteria seem less powerful in sequence design. We think that combining the two approaches, say by feeding DEE's incompletely eliminated output to MP or RL as input, may make a good scheme.
Table 4.6: The protein test set for sequence design (symbol ? : unknown optimal value, symbol * : failed; times in seconds).

Case ID        log10conf  optimal      T_LP   T_DEE
1ilq-design1   19.1       -186.108150  11     2
1ilq-design2   21.2       -190.791630  616    40
1ilq-design3   23.6       -194.170360  19522  *
R15R23         22.5       ?            *      *
sweet7-NOW     43.6       ?            *      *
P2P2prime      60.1       ?            *      *
Table 4.7: Results for sequence design test cases whose optimal values are known.

               #amino acids     #rotamers        ΔE                  Time (sec)
               incorrect        incorrect
Case ID        RL     MP        RL     MP        RL         MP       RL    MP
1ilq-design1   0      0         0      0         0          0        3     1
1ilq-design2   0      0         2      0         0.22507    0        9     3
1ilq-design3   0      0         1      0         0.23092    0        13    24
Table 4.8: Results for sequence design test cases whose optimal values are unknown
(E: solution value the method obtained).

               E                               Time (sec)
Case ID        RL             MP               RL     MP
R15R23         999874.33767   999926.45661     63     157
sweet7-NOW     999664.69920   -178.83174       300    285
P2P2prime      -384.30067     -382.10387       248    396
4.3 Summary

In this chapter, we took a probabilistic inference approach to the GMEC problem.
By transforming the self and pairwise energies into probability distributions,
we were able to apply techniques such as relaxation labeling, the max-product
belief propagation, and the MIME double loop algorithm.
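For concreteness, the transformation and one round of max-product message passing on a toy two-position problem might look as follows. This is a minimal sketch: psi = exp(-E) is one common convention for turning energies into potentials, and all data is made up; it is not this work's exact construction.

    # Sketch: energies -> potentials, then one synchronous round of max-product
    # messages on a toy problem with two positions of two rotamers each.
    import numpy as np

    E1 = np.array([0.0, 1.0])                 # self energies at position 1
    E2 = np.array([0.5, 0.0])                 # self energies at position 2
    E12 = np.array([[0.0, 2.0], [2.0, 0.0]])  # pairwise energies

    psi1, psi2, psi12 = np.exp(-E1), np.exp(-E2), np.exp(-E12)

    m_1to2 = np.max(psi1[:, None] * psi12, axis=0)  # message from 1 to 2
    m_2to1 = np.max(psi2[None, :] * psi12, axis=1)  # message from 2 to 1

    b1 = psi1 * m_2to1                        # max-marginal ("belief") at 1
    b2 = psi2 * m_1to2
    print("chosen rotamers:", int(np.argmax(b1)), int(np.argmax(b2)))

On this two-node problem a single round already gives the exact max-marginals; on loopy problems the messages are iterated to (hoped-for) convergence.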
We had the most satisfactory results with the max-product BP algorithm in both
side-chain placement and sequence design. On the other hand, DEE/A* had difficulty
dealing with sequence design and complex side-chain placement. Probabilistic
relaxation labeling was reasonably fast and often found good approximate solutions,
but the MIME double loop algorithm mostly ended up in poor local minima. We also
attempted an analysis of the correlation between the max-product BP's error and
the estimated entropy of side-chains.
Chapter 5
Conclusions and future work
In this work, we investigated three different approaches to the global minimum
energy conformation (GMEC) problem. We first compared the effectiveness of three
known ILP formulations on protein energy examples using CPLEX optimizers.
Interestingly, most of the solved LP relaxations of Eriksson et al.'s and Koster
et al.'s formulations had integral solutions for protein energy examples. We then
devised three decomposition schemes for Eriksson et al.'s ILP formulation to solve
the GMEC problem using the branch-and-price algorithm. From the computational
experiments, we observed that implementations of the branch-and-price formulations
can obtain tight lower bounds and often find the optimal solution at the root node
for protein energy examples as well as random energy examples. Though we could not
obtain practical performance with our simple implementations, we believe the
performance can be improved by adopting various known techniques for the
branch-and-price algorithm, such as control of the tailing-off effect. The results
from random energy examples suggest that the branch-and-price approach can be more
favorable for hard instances. Developing different decomposition schemes or
subproblem solution methods is an interesting topic for further investigation.
In the context of the ILP approach, another interesting direction not pursued
here is a randomized rounding scheme. We note that the approximation results for
the DENSE-k-SUBGRAPH problem (DSP) can be used to obtain an approximation ratio
for rounding a fractional LP solution of the GMEC problem. However, we need to
obtain a good ratio for the case k < n, a different condition from that studied
in Han et al.'s work [14].
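The rounding step itself is simple; the analysis is the hard part. As an illustration only (not the scheme analyzed in [14]), one could independently sample a rotamer at each position with the probabilities given by the fractional LP solution:

    # Sketch: independent randomized rounding of a fractional LP solution.
    # x[i] holds the normalized fractional weights over rotamers at position i;
    # the values are illustrative, not an LP solution from this work.
    import numpy as np

    rng = np.random.default_rng(0)
    x = {0: np.array([0.6, 0.4]), 1: np.array([0.1, 0.9])}

    def round_once(x, rng):
        """Pick rotamer r at position i with probability x[i][r]."""
        return {i: int(rng.choice(len(p), p=p)) for i, p in x.items()}

    # In practice: repeat, score each rounded assignment with the energy
    # function, and keep the best.
    print(round_once(x, rng))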
As the second approach, we used nonlinear programming techniques. We took Ng's
continuation approach to transform the discrete optimization problem into a
sequence of nonlinear continuous optimization problems. We described the detailed
procedures necessary to solve each step, and tested the implementation on protein
energy examples. The implementation was able to obtain good solutions for small
cases quickly, but accuracy and speed were not guaranteed for larger cases.
However, the method could be made faster through implementation in a lower-level
language such as C, and used to provide upper bounds in conjunction with other
exact methods.
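The general flavor of a continuation scheme can be illustrated as follows; this is a generic sketch, not Ng's exact formulation [31]: relax the 0/1 rotamer indicators onto the simplex and solve a sequence of smooth problems with a growing concave penalty that pushes the iterate toward an integral vertex, warm-starting each subproblem from the previous solution.

    # Sketch of a generic continuation scheme (not Ng's exact method [31]):
    # minimize c.x over the simplex while a growing penalty mu * sum(x*(1-x))
    # drives x toward a 0/1 vertex. Illustrative single-position problem.
    import numpy as np
    from scipy.optimize import minimize

    c = np.array([0.3, 0.1, 0.5])            # illustrative rotamer energies

    def solve_sequence(c, mus=(0.0, 0.5, 2.0, 8.0)):
        n = len(c)
        x = np.full(n, 1.0 / n)              # start at the simplex center
        cons = ({"type": "eq", "fun": lambda x: np.sum(x) - 1.0},)
        for mu in mus:
            f = lambda x, mu=mu: c @ x + mu * np.sum(x * (1.0 - x))
            res = minimize(f, x, bounds=[(0.0, 1.0)] * n, constraints=cons)
            x = res.x                        # warm-start the next subproblem
        return x

    print(np.round(solve_sequence(c), 3))    # concentrates on the argmin of c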
In the probabilistic inference approach, we presented results of applying
probabilistic relaxation labeling, the max-product belief propagation (BP), and
the MIME double loop algorithm to the GMEC problem. The max-product BP algorithm
was by far the most effective method throughout our work. It turned out to be
comparable to DEE/A* in side-chain placement and superior in sequence design and
very large side-chain placement. While the MIME double loop algorithm did not do
well, probabilistic relaxation labeling usually found good approximate solutions
for both side-chain placement and sequence design. We also investigated the
correlation between the max-product BP's error and the estimated entropy of
side-chains, but we could not relate high estimated entropy to prediction failure
at a position.
Various approaches can be taken to combine several efficient methods, such as
DEE/A* and the max-product BP, in a serial or parallel fashion to obtain an
accurate and efficient method. On the other hand, one of the most interesting
and important issues in using the max-product BP for protein side-chain placement
will be characterizing its convergence behavior and optimality conditions. We
suspect the concurrence of fractional LP solutions and errors of the max-product
BP mentioned in Section 4.2.1 might serve as a guide in finding the answers.
Bibliography
[1] Ernst Althaus, Oliver Kohlbacher, Hans-Peter Lenhof, and Peter Müller. A
combinatorial approach to protein docking with flexible side-chains. In R. Shamir,
S. Miyano, S. Istrail, P. Pevzner, and M. Waterman, editors, Proceedings of the
4th Annual International Conference on Computational Molecular Biology, pages
15-24. ACM Press, 2000.

[2] Michael D. Altman. DEE/A* implementation. The Tidor group, AI Lab, MIT, 2003.

[3] Cynthia Barnhart, Ellis L. Johnson, George L. Nemhauser, Martin W. P.
Savelsbergh, and Pamela H. Vance. Branch-and-price: column generation for solving
huge integer programs. Operations Research, 46:316-329, 1998.

[4] Dimitri P. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont, MA,
1999.

[5] Pierluigi Crescenzi and Viggo Kann. A compendium of NP optimization problems.
Available at http://www.nada.kth.se/viggo/problemlist/compendium.html.

[6] Bassil I. Dahiyat and Stephen L. Mayo. Protein design automation. Protein
Science, 5:895-903, 1996.

[7] Marc De Maeyer, Johan Desmet, and Ignace Lasters. All in one: a highly
detailed rotamer library improves both accuracy and speed in the modeling of
side-chains by dead-end elimination. Folding & Design, 2:53-66, 1997.

[8] Johan Desmet, Marc De Maeyer, Bart Hazes, and Ignace Lasters. The dead-end
elimination theorem and its use in protein side-chain positioning. Nature,
356:539-542, 1992.

[9] Olivia Eriksson, Yishao Zhou, and Arne Elofsson. Side chain-positioning as
an integer programming problem. In WABI, volume 2149 of Lecture Notes in Computer
Science, pages 128-141. Springer, 2001.

[10] William T. Freeman and Yair Weiss. On the fixed points of the max-product
algorithm. Technical Report TR-99-39, MERL, 2000.

[11] D. Ben Gordon and Stephen L. Mayo. Branch-and-terminate: a combinatorial
optimization algorithm for protein design. Structure with Folding and Design,
7(9):1089-1098, 1999.

[12] D. Benjamin Gordon and Stephen L. Mayo. Radical performance enhancements
for combinatorial optimization algorithms based on the dead-end elimination
theorem. Journal of Computational Chemistry, 13:1505-1514, 1998.

[13] Nicholas I. M. Gould, Stefano Lucidi, Massimo Roma, and Philippe L. Toint.
Exploiting negative curvature directions in linesearch methods for unconstrained
optimization. Technical Report RAL-TR-97-064, Rutherford Appleton Laboratory,
1997.

[14] Qiaoming Han, Yinyu Ye, and Jiawei Zhang. Approximation of dense-k-subgraph.
2000.

[15] Robert M. Haralick. An interpretation for probabilistic relaxation. Computer
Vision, Graphics, and Image Processing, 22:388-395, 1983.

[16] Robert A. Hummel and Steven W. Zucker. On the foundations of relaxation
labeling processes. IEEE Transactions on Pattern Analysis and Machine
Intelligence, PAMI-5(3):267-287, 1983.

[17] Olivia Hunting, Ulrich Faigle, and Walter Kern. A Lagrangian relaxation
approach to the edge-weighted clique problem. European Journal of Operational
Research, 131(1):119-131, May 2001.

[18] Victor M. Jiménez and Andrés Marzal. An algorithm for efficient computation
of k shortest paths. Technical Report DSCI-I/38/94, Depto. de Sistemas
Informáticos y Computación, Universidad Politécnica de Valencia, Valencia, Spain,
1994.

[19] Michael I. Jordan. An introduction to probabilistic graphical models.
University of California, Berkeley, 2003.

[20] Narendra Karmarkar, Mauricio G. C. Resende, and K. G. Ramakrishnan. An
interior point algorithm to solve computationally difficult set covering
problems. Mathematical Programming, 52:597-618, 1991.

[21] Patrice Koehl and Marc Delarue. Application of a self-consistent mean field
theory to predict protein side-chains conformation and estimate their
conformational entropy. Journal of Molecular Biology, 239:249-275, 1994.

[22] Arie M.C.A. Koster. Frequency Assignment: Models and Algorithms. PhD thesis,
Maastricht University, November 1999.

[23] Arie M.C.A. Koster, Stan P.M. van Hoesel, and Antoon W.J. Kolen. The partial
constraint satisfaction problem: facets and lifting theorems. Operations Research
Letters, 23(3-5):89-97, 1998.

[24] Arie M.C.A. Koster, Stan P.M. van Hoesel, and Antoon W.J. Kolen. Lower
bounds for minimum interference frequency assignment problems. Technical Report
RM 99/026, Maastricht University, October 1999.

[25] Arie M.C.A. Koster, Stan P.M. van Hoesel, and Antoon W.J. Kolen. Solving
frequency assignment problems via tree-decomposition. Technical Report RM 99/011,
Maastricht University, 1999.

[26] Andrew R. Leach and Andrew P. Lemon. Exploring the conformational space of
protein side chains using dead-end elimination and the A* algorithm. PROTEINS:
Structure, Function, and Genetics, 33:227-239, 1998.

[27] Stefano Lucidi, Francesco Rochetich, and Massimo Roma. Curvilinear
stabilization techniques for truncated Newton methods in large scale
unconstrained optimization. SIAM Journal on Optimization, 8(4):916-939, 1998.

[28] Elder Magalhães Macambira and Cid Carvalho de Souza. The edge-weighted
clique problem: valid inequalities, facets and polyhedral computations. European
Journal of Operational Research, 123(2):346-371, June 2000.

[29] Jorge J. Moré and Danny C. Sorensen. On the use of directions of negative
curvature in a modified Newton method. Mathematical Programming, 16:1-20, 1979.

[30] Stephen G. Nash and Ariela Sofer. Preconditioning reduced matrices. SIAM
Journal on Matrix Analysis and Applications, 17(1):47-68, 1996.

[31] Kien-Ming Ng. A continuation approach for solving nonlinear optimization
problems with discrete variables. PhD thesis, Stanford University, 2002.

[32] Shmuel Peleg. A new probabilistic relaxation scheme. IEEE Transactions on
Pattern Analysis and Machine Intelligence, PAMI-2(4):362-369, 1980.

[33] N. A. Pierce, J. A. Spriet, J. Desmet, and S. L. Mayo. Conformational
splitting: a more powerful criterion for dead-end elimination. Journal of
Computational Chemistry, 21:999-1009, 2000.

[34] J. W. Ponder and F. M. Richards. Tertiary templates for proteins. Journal
of Molecular Biology, 193(4):775-791, 1987.

[35] Ted K. Ralphs. SYMPHONY version 3.0 user's guide. Available at
http://www.branchandcut.org/SYMPHONY, 2003.

[36] Anand Rangarajan and Alan L. Yuille. MIME: mutual information minimization
and entropy maximization for Bayesian belief propagation. In T. G. Dietterich,
S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing
Systems 14, pages 873-880, Cambridge, MA, 2002. MIT Press.

[37] Mohit Tawarmalani and Nikolaos V. Sahinidis. Global optimization of
mixed-integer nonlinear programs: a theoretical and computational study.
Mathematical Programming, submitted, 1999.

[38] Ron Unger and John Moult. Finding the lowest free energy conformation of a
protein is an NP-hard problem: proof and implications. Bulletin of Mathematical
Biology, 55(6):1183-1198, 1993.

[39] François Vanderbeck and Laurence Wolsey. An exact algorithm for IP column
generation. Operations Research Letters, 19(4):151-159, 1996.

[40] Christopher A. Voigt, D. Benjamin Gordon, and Stephen L. Mayo. Protein
design automation. Protein Science, 5:895-903, 1996.

[41] J. P. Warners, T. Terlaky, C. Roos, and B. Jansen. Potential reduction
algorithms for structured combinatorial optimization problems. Operations
Research Letters, 21:55-64, 1997.

[42] Chen Yanover and Yair Weiss. Approximate inference and protein-folding. In
Proceedings of Neural Information Processing Systems, 2002.

[43] Jonathan S. Yedidia, William T. Freeman, and Yair Weiss. Bethe free energy,
Kikuchi approximations and belief propagation algorithms. Technical Report
TR2001-16, MERL, 2001.