HEURISTIC & SPECIAL CASE ALGORITHMS FOR DISPERSION PROBLEMS
RAVI, ROSENKRANTZ, TAYI
ROB CHURCHILL
(THANKS TO BEHZAD)
Problem:
Given V = {v1, v2, …, vn}, find a subset of p nodes (2 ≤ p ≤ n) such that some function of the pairwise distances between the chosen nodes is maximized.
My first reaction: this sounds like a Max k-Cover problem, except that instead of covering elements, we are maximizing distances.
Max-Min Facility Dispersion (MMFD)
Given a non-negative, symmetric distance function w(x, y), where x, y ∈ V.
Find a subset P = {vi1, vi2, …, vip} of V with |P| = p,
s.t. f(P) = min{w(x, y) : x, y ∈ P, x ≠ y} is maximized.
Max-Avg Facility Dispersion (MAFD)
Given a non-negative, symmetric distance function w(x, y), where x, y ∈ V.
Find a subset P = {vi1, vi2, …, vip} of V with |P| = p,
s.t. f(P) = [2 / (p(p − 1))] · Σ{w(x, y) : {x, y} ⊆ P, x ≠ y} is maximized (the average pairwise distance).
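To make the two objectives concrete, here is a minimal Python sketch (the function names are mine, not from the paper) that evaluates f(P) under each definition:

from itertools import combinations

def max_min_value(P, w):
    # MMFD objective: smallest pairwise distance within P.
    return min(w(x, y) for x, y in combinations(P, 2))

def max_avg_value(P, w):
    # MAFD objective: average pairwise distance within P.
    p = len(P)
    return 2.0 / (p * (p - 1)) * sum(w(x, y) for x, y in combinations(P, 2))

# Example: three points on a line, w = absolute difference.
w = lambda x, y: abs(x - y)
P = [0.0, 1.0, 5.0]
print(max_min_value(P, w))   # 1.0  (the pair 0, 1)
print(max_avg_value(P, w))   # 10/3 ≈ 3.33  (pair distances 1, 5, 4)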
MMFD & MAFD are NP-Hard
Even when the distance function is a metric.
Shown by a reduction from the NP-complete problem CLIQUE, which asks whether a given graph G = (V, E) contains a clique of size ≥ J.
Reduction
w(x, y) = 1 if x and y are connected, 0 otherwise; set J = p.
(For the metric case, the paper instead uses weights in {1, 2}, which do satisfy the triangle inequality.)
For MAFD: the optimal value is 1 if and only if G contains a clique of size J; if the optimal value is less than 1, no such clique exists.
For MMFD: the optimal value is 1 if there is a clique of size J, and 0 if there is not.
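A quick sanity check of the reduction in Python (the example graph and helper names are hypothetical, chosen for illustration):

from itertools import combinations

def clique_weights(edges):
    # w(x, y) = 1 if {x, y} is an edge, 0 otherwise.
    edge_set = {frozenset(e) for e in edges}
    return lambda x, y: 1 if frozenset((x, y)) in edge_set else 0

# A triangle on {0, 1, 2} plus a pendant vertex 3; take J = p = 3.
w = clique_weights([(0, 1), (0, 2), (1, 2), (2, 3)])

def min_pairwise(P, w):
    return min(w(x, y) for x, y in combinations(P, 2))

print(min_pairwise([0, 1, 2], w))  # 1 -> {0, 1, 2} is a clique of size 3
print(min_pairwise([1, 2, 3], w))  # 0 -> {1, 2, 3} is not a clique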
How do we solve these?
If we can’t get an optimal solution, we will settle for a good approximation.
There are no absolute approximation algorithms for MMFD or MAFD unless P = NP.
So we want a relative (performance-ratio) approximation algorithm.
Use greedy algorithms.
“Greed is good.” - Gordon Gekko
Max-Min Greedy Algorithm
Step 1. Let vi and vj be the endpoints of an edge of maximum weight.
Step 2. P ← {vi, vj}.
Step 3. while ( |P| < p ) do
begin
a. Find a node v ∈ V \ P such that min{w(v, v′) : v′ ∈ P} is maximum among the nodes in V \ P.
b. P ← P ∪ {v}.
end
Step 4. Output P.
Provides a 2-approximation to the optimal value when w is a metric (i.e., satisfies the triangle inequality).
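A direct Python transcription of Steps 1-4 (a sketch under my own naming; V is a list of hashable nodes and w a symmetric two-argument function):

def greedy_max_min(V, w, p):
    # Steps 1-2: start with the endpoints of a maximum-weight edge.
    P = list(max(((a, b) for i, a in enumerate(V) for b in V[i + 1:]),
                 key=lambda e: w(*e)))
    # Step 3: repeatedly add the node farthest from the current set,
    # where "farthest" means its minimum distance to P is largest.
    while len(P) < p:
        v = max((u for u in V if u not in P),
                key=lambda u: min(w(u, q) for q in P))
        P.append(v)
    return P  # Step 4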
Max-Avg Greedy Algorithm
Step 1. Let vi and vj be the endpoints of an edge of maximum weight.
Step 2. P ← {vi, vj}.
Step 3. while ( |P| < p ) do
begin
a. Find a node v ∈ V \ P such that Σv′ ∈ P w(v, v′) is maximum among the nodes in V \ P.
b. P ← P ∪ {v}.
end
Step 4. Output P.
Provides a 4-approximation of the optimal value when w is a metric.
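The same skeleton with the selection rule changed to total distance gives the Max-Avg variant (again a sketch, same assumptions as above):

def greedy_max_avg(V, w, p):
    # Steps 1-2: start with the endpoints of a maximum-weight edge.
    P = list(max(((a, b) for i, a in enumerate(V) for b in V[i + 1:]),
                 key=lambda e: w(*e)))
    # Step 3: repeatedly add the node whose summed distance to P is largest.
    while len(P) < p:
        v = max((u for u in V if u not in P),
                key=lambda u: sum(w(u, q) for q in P))
        P.append(v)
    return P  # Step 4

# Usage: five points on a line, pick 4.
V = [0.0, 1.0, 3.0, 7.0, 12.0]
print(greedy_max_avg(V, lambda x, y: abs(x - y), 4))
# -> [0.0, 12.0, 1.0, 7.0] (ties broken by list order)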
Special Cases
For one-dimensional point sets, MMFD & MAFD can be solved optimally in polynomial time.
For two-dimensional point sets, MAFD can be approximated in polynomial time with a better ratio (π/2 ≈ 1.57) than the greedy algorithm’s 4.
2-D MMFD is NP-Hard; the complexity of 2-D MAFD is open.
1-D MAFD & MMFD
Restricting the points to one dimension allows an optimal dynamic programming solution in polynomial time: O(max{n log n, pn}).
V = {x1, x2, …, xn}
How it works
Sort the points in V (O(n log n) time).
w(x, y) = |x − y|, the distance from x to y.
OPT(j, k) = the optimal solution value with k points picked from x1, …, xj.
OPT(n, p) = the optimal value for the whole set.
Recursive Statement
OPT(j, k) = max{ OPT(j−1, k), value of OPT(j−1, k−1) ∪ {xj} }
i.e., either xj is skipped, or the best (k−1)-point solution on x1, …, xj−1 is extended with xj.
Runtime: MAFD
OPT(j−1, k) and OPT(j−1, k−1) are constant-time lookups.
Store a representative of OPT(j−1, k−1), its centroid, in μ(j−1, k−1).
Then the value of OPT(j−1, k−1) ∪ {xj} is computable in constant time: since xj lies to the right of every chosen point, the k−1 new pairs contribute (k−1) · w(xj, μ(j−1, k−1)) to the total, so the new average distance is
[ (k−1)(k−2)/2 · OPT(j−1, k−1) + (k−1) · w(xj, μ(j−1, k−1)) ] / [ k(k−1)/2 ]
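In code, the constant-time average update looks like this (a sketch of the arithmetic; I am assuming the stored representative μ is the mean of the chosen points, which is what makes a one-number summary work in 1-D):

def extend_avg(old_avg, mu, x_j, k):
    # old_avg: average over the C(k-1, 2) pairs of the (k-1)-point solution.
    # Since x_j lies right of all chosen points, the k-1 new pairs sum to
    # (k-1) * (x_j - mu), where mu is the centroid of the chosen points.
    old_pairs = (k - 1) * (k - 2) / 2
    new_sum = old_avg * old_pairs + (k - 1) * (x_j - mu)
    return new_sum / (k * (k - 1) / 2)  # average over C(k, 2) pairs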
Runtime: MMFD
Store the most recently picked (rightmost) element of the optimal solution in f(j−1, k−1).
This gives a constant-time computation of the value of OPT(j−1, k−1) ∪ {xj}:
min{ OPT(j−1, k−1), w(xj, f(j−1, k−1)) }
(the new minimum is either the old minimum gap or the gap to the new rightmost point).
Runtime
Both are O(n log n + pn), since the computation per table entry is constant once the right information is stored.
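For 1-D MMFD specifically, a standard alternative to the DP, binary search on the answer with a greedy feasibility check, is easy to verify and makes a good reference implementation. This is not the paper’s algorithm, and it runs in O(n² log n) because it enumerates candidate gaps:

def mmfd_1d(points, p):
    xs = sorted(points)
    n = len(xs)
    # The optimal min-gap is some pairwise difference of input points.
    gaps = sorted({xs[j] - xs[i] for i in range(n) for j in range(i + 1, n)})

    def feasible(d):
        # Greedily take the leftmost point, then each next point >= last + d.
        count, last = 1, xs[0]
        for x in xs[1:]:
            if x - last >= d:
                count, last = count + 1, x
        return count >= p

    # feasible() is monotone in d: binary search for the largest feasible gap.
    lo, hi, best = 0, len(gaps) - 1, 0.0
    while lo <= hi:
        mid = (lo + hi) // 2
        if feasible(gaps[mid]):
            best, lo = gaps[mid], mid + 1
        else:
            hi = mid - 1
    return best  # optimal min pairwise distance of a p-point placement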
The Dynamic Programming Algorithm
(*- - In the following, array F represents the function f in the formulation. - -*)
Step 1. Sort the given points, and let {x1, x2, …, xn} denote the points in increasing order.
Step 2. for j := 1 to n do F[0, j] ← 0;
Step 3. F[1, 1] ← 0.
Step 4. (*- - Compute the value of an optimal placement - -*)
for j := 2 to n do
for k := 1 to min(p, j) do
begin
t1 ← F[k, j − 1] + k(p − k)(xj − xj−1);
t2 ← F[k − 1, j − 1] + (k − 1)(p − k + 1)(xj − xj−1);
if t1 > t2 then (*- - do not include xj - -*)
F[k, j] ← t1;
else (*- - include xj - -*)
F[k, j] ← t2;
end;
The Algorithm cont.
Step 5. (*- - Construct an optimal placement - -*)
P ← {x1}; k ← p; j ← n;
while k > 1 do
begin
if F[k, j] = F[k − 1, j − 1] + (k − 1)(p − k + 1)(xj − xj−1)
then (*- - xj is to be included in the optimal placement - -*)
begin
P ← P ∪ {xj}; k ← k − 1;
end;
j ← j − 1;
end;
Step 6. Output P.
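A Python transcription of Steps 1-6 (my variable names; F is indexed [k][j] as in the pseudocode, with -inf guarding states that cannot hold k points):

def mafd_1d(points, p):
    xs = sorted(points)                         # Step 1
    n = len(xs)
    NEG = float("-inf")
    F = [[NEG] * (n + 1) for _ in range(p + 1)]
    for j in range(1, n + 1):                   # Step 2
        F[0][j] = 0.0
    F[1][1] = 0.0                               # Step 3
    for j in range(2, n + 1):                   # Step 4
        gap = xs[j - 1] - xs[j - 2]             # xj - xj-1
        for k in range(1, min(p, j) + 1):
            t1 = F[k][j - 1] + k * (p - k) * gap                 # skip xj
            t2 = F[k - 1][j - 1] + (k - 1) * (p - k + 1) * gap   # take xj
            F[k][j] = max(t1, t2)
    P, k, j = [xs[0]], p, n                     # Step 5: reconstruction
    while k > 1:
        gap = xs[j - 1] - xs[j - 2]
        if F[k][j] == F[k - 1][j - 1] + (k - 1) * (p - k + 1) * gap:
            P.append(xs[j - 1])                 # xj is in the placement
            k -= 1
        j -= 1
    # F[p][n] is the total of pairwise distances; convert to the average.
    return sorted(P), 2.0 * F[p][n] / (p * (p - 1))   # Step 6

The terms k(p − k) and (k − 1)(p − k + 1) arise because each gap (xj−1, xj) contributes (chosen points to its left) × (chosen points to its right) to the total pairwise distance.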
2-D MAFD Heuristic
Uses the 1-D MAFD algorithm as its base.
Gives a π/2-approximation (≈ 1.571).
How it works
Given V = {v1, v2, …, vn},
vi = (xi, yi) (coordinates),
p ≤ n = |V|.
The Algorithm
Step 1. Obtain the projections of the given set V of points
on each of the four axes defined by the equations
y = 0, y = x, x = 0, and y = -x
Step 2. Find optimal solutions to each of the four resulting
instances of 1-D MAFD.
Step 3. Return the placement corresponding to the best of
the four solutions found in Step 2.
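A sketch of Steps 1-3 in Python. It assumes a helper solve_1d(values, p) that returns the indices of an optimal 1-D MAFD placement (e.g., an index-reporting adapter around the DP above; the adapter is not shown), and it reads “best” as the candidate with the largest true 2-D average distance, which can only help the bound:

import math

def mafd_2d_heuristic(points, p, solve_1d):
    # Step 1: project onto the axes y = 0, x = 0, y = x, and y = -x.
    projections = [
        [x for x, y in points],                        # y = 0
        [y for x, y in points],                        # x = 0
        [(x + y) / math.sqrt(2) for x, y in points],   # y = x
        [(x - y) / math.sqrt(2) for x, y in points],   # y = -x
    ]

    def avg_dist_2d(idxs):
        # True 2-D average pairwise Euclidean distance of a candidate.
        pts = [points[i] for i in idxs]
        s = sum(math.dist(a, b) for i, a in enumerate(pts) for b in pts[i + 1:])
        return 2.0 * s / (p * (p - 1))

    # Step 2: solve the four 1-D instances; Step 3: return the best placement.
    candidates = [solve_1d(proj, p) for proj in projections]
    best = max(candidates, key=avg_dist_2d)
    return [points[i] for i in best]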
Relation to Study Group Formation & High-Variance Clusters
These algorithms create one maximum-dispersion group, not k maximum-dispersion groups.
If you want k high-variance clusters, set p = n/k and run the algorithm (whichever you choose) k − 1 times, removing each selected group from the pool (the remaining n/k points form the last cluster).
This could guarantee that the first few groups have high variance, but not the later ones.
Study group formation
Most study groups only study one subject.
If you wanted to assign students one study group per subject, you could reduce their attributes to one dimension per subject and solve each subject optimally.
Instead of the exact algorithm described, minimize the distance from the mean, but stay on the opposite side of the mean from the teacher node.
Maybe use positive & negative distances to reflect which side of the mean a point is on.
This would ensure that people who would learn (below the mean) are picked before people who would not.
You want multiple study groups and the highest amount of learning.
Not sure how to do this…
References
S. S. Ravi, D. J. Rosenkrantz, and G. K. Tayi. 1994. Heuristic and Special Case Algorithms for Dispersion Problems. Operations Research 42(2), 299-310.