HEURISTIC & SPECIAL CASE ALGORITHMS FOR DISPERSION PROBLEMS
RAVI, ROSENKRANTZ, TAYI
ROB CHURCHILL (THANKS TO BEHZAD)

Problem
  Given V = {v1, v2, …, vn}, find a subset of p nodes (2 <= p <= n) such that some distance function between the chosen nodes is maximized.
  My first reaction: sounds like a Max-k Cover problem, except that instead of covering, we maximize distances.

Max-Min Facility Dispersion (MMFD)
  Given a non-negative, symmetric distance function w(x, y), where x, y ∈ V,
  find a subset P = {vi1, vi2, …, vip} of V with |P| = p such that
  f(P) = min_{x,y ∈ P, x ≠ y} w(x, y) is maximized.

Max-Avg Facility Dispersion (MAFD)
  Given a non-negative, symmetric distance function w(x, y), where x, y ∈ V,
  find a subset P = {vi1, vi2, …, vip} of V with |P| = p such that
  f(P) = 2/[p(p-1)] * Σ_{x,y ∈ P} w(x, y) is maximized (sum over unordered pairs x ≠ y).

MMFD & MAFD are NP-Hard
  Even when the distance function is a metric.
  Reduction from the NP-complete problem CLIQUE, which asks whether a given graph G = (V, E) contains a clique of size >= J.
  Reduction
    w(x, y) = 1 if x and y are connected by an edge, 0 otherwise
    Set p = J
    For MAFD: the optimal value is 1 if G has a clique of size J, and less than 1 if it does not.
    For MMFD: the optimal value is 1 if G has a clique of size J, and 0 if it does not.

How do we solve these?
  If we can't get an optimal solution, we will settle for a good approximation.
  There are no absolute approximation algorithms for MMFD or MAFD unless P = NP.
  We want a relative approximation algorithm.
  Use greedy algorithms.
  “Greed is good.” - Gordon Gekko

Max-Min Greedy Algorithm
  Step 1. Let vi and vj be the endpoints of an edge of maximum weight.
  Step 2. P <- {vi, vj}.
  Step 3. while ( |P| < p ) do
          begin
            a. Find a node v ∈ V \ P such that min_{v' ∈ P} w(v, v') is maximum among the nodes in V \ P.
            b. P <- P ∪ {v}.
          end
  Step 4. Output P.
  Provides a 2-approximation to the optimal value when w satisfies the triangle inequality.

Max-Avg Greedy Algorithm
  Step 1. Let vi and vj be the endpoints of an edge of maximum weight.
  Step 2. P <- {vi, vj}.
  Step 3. while ( |P| < p ) do
          begin
            a. Find a node v ∈ V \ P such that Σ_{v' ∈ P} w(v, v') is maximum among the nodes in V \ P.
            b. P <- P ∪ {v}.
          end
  Step 4. Output P.
  Provides a 4-approximation to the optimal value when w satisfies the triangle inequality.

Special Cases
  For one-dimensional data points, you can solve MMFD & MAFD optimally in polynomial time.
  For two-dimensional data points, you can approximate MAFD with a better guarantee than the greedy algorithm in polynomial time.
  2-D MMFD is NP-Hard; 2-D MAFD is open.

1-D MAFD & MMFD
  Restricting the points to 1-D allows for an optimal dynamic programming solution in polynomial time: O(max{n log n, pn}).
  V = {x1, x2, …, xn}

How it works
  Sort the points in V (n log n time).
  w(x, y) = distance from x to y
  OPT(j, k) = the best solution value with k points picked from x1, …, xj
  OPT(n, p) = optimal solution for the whole set

Recursive Statement
  OPT(j, k) = max {OPT(j-1, k), OPT(j-1, k-1) ∪ xj}

Runtime MAFD
  OPT(j-1, k) and OPT(j-1, k-1) are constant-time lookups.
  Store the representative of OPT(j-1, k-1) in μ(j-1, k-1).
  OPT(j-1, k-1) ∪ xj is constant time: w(xj, μ(j-1, k-1)) + OPT(j-1, k-1)*(k-1) / k = average distance

Runtime MMFD
  Store the most recently picked element of the optimal solution in f(j-1, k-1).
  This gives a constant-time computation of OPT(j-1, k-1) ∪ xj: min {OPT(j-1, k-1), w(xj, f(j-1, k-1))}

Runtime
  Both are O(n log n + pn), since the computation time per iteration is constant if the right information is stored.

The Dynamic Programming Algorithm
  (*-- In the following, array F represents the function f in the formulation. --*)
  Step 1. Sort the given points, and let {x1, x2, …, xn} denote the points in increasing order.
  Step 2. for j := 1 to n do F[0, j] <- 0;
  Step 3. F[1, 1] <- 0.
  Step 4. (*-- Compute the value of an optimal placement --*)
          for j := 2 to n do
            for k := 1 to min(p, j) do
            begin
              (*-- F[k, j-1] is not defined when k > j-1; treat it as -∞ --*)
              t1 <- F[k, j-1] + k(p - k)(x_j - x_{j-1});
              t2 <- F[k-1, j-1] + (k - 1)(p - k + 1)(x_j - x_{j-1});
              if t1 > t2 then (*-- do not include xj --*)
                F[k, j] <- t1;
              else (*-- include xj --*)
                F[k, j] <- t2;
            end;
  Step 5. (*-- Construct an optimal placement --*)
          P <- {x1}; k <- p; j <- n;
          while k > 1 do
          begin
            if F[k, j] = F[k-1, j-1] + (k - 1)(p - k + 1)(x_j - x_{j-1}) then
            (*-- xj to be included in optimal placement --*)
            begin
              P <- P ∪ {xj};
              k <- k - 1;
            end;
            j <- j - 1;
          end;
  Step 6. Output P.

2-D MAFD Heuristic
  Uses the 1-D MAFD algorithm as the base.
  Gives a π/2-approximation.

How it works
  Given V = {v1, v2, …, vn}
  vi = (xi, yi) (coordinates)
  p <= n = |V|

The Algorithm
  Step 1. Obtain the projections of the given set V of points on each of the four axes defined by the equations y = 0, y = x, x = 0, and y = -x.
  Step 2. Find optimal solutions to each of the four resulting instances of 1-D MAFD.
  Step 3. Return the placement corresponding to the best of the four solutions found in Step 2.

Relation to Study Group Formation & High Variance Clusters
  These algorithms create one maximum-distance group, not k maximum-distance groups.
  If you want k high-variance clusters, set p = n/k and run the algorithm (whichever you choose) k-1 times, removing the chosen points each time (the last n/k points form the last cluster).
  This could guarantee that the first couple of groups have a high variance, but not the later ones.

Study group formation
  Most study groups only study one subject.
  If you wanted to assign students one study group per subject, you could simplify their attributes to one dimension per subject and solve each subject optimally.
  Instead of the exact algorithm described, minimize the distance from the mean, but stay on the opposite side of the mean from the teacher node.
  Maybe have positive & negative distances to reflect which side of the mean a point is on.
  This would ensure that people who would learn (under the mean) are picked before people who would not learn.
  You want multiple study groups and the highest amount of learning.
  Not sure how to do this…

References
  S.S. Ravi, D.J. Rosenkrantz, and G.K. Tayi. 1994. Heuristic and Special Case Algorithms for Dispersion Problems.
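
Appendix sketch: the two greedy heuristics and the 1-D MAFD dynamic program from the slides above, written as a minimal Python illustration rather than the authors' implementation. The function names (greedy_dispersion, min_dispersion, avg_dispersion, mafd_1d) are hypothetical. Ties are broken by lowest index, which is an assumption the slides do not specify. The DP records its include/skip decisions in a table called take instead of re-testing F for equality as Step 5 does, a swapped-in variation that avoids floating-point comparisons.

```python
from itertools import combinations


def greedy_dispersion(points, p, w, score):
    """Greedy heuristic: score=min gives Max-Min, score=sum gives Max-Avg."""
    n = len(points)
    if not 2 <= p <= n:
        raise ValueError("need 2 <= p <= n")
    # Steps 1-2: start with the endpoints of a maximum-weight edge.
    i, j = max(combinations(range(n), 2),
               key=lambda e: w(points[e[0]], points[e[1]]))
    chosen = [i, j]
    remaining = set(range(n)) - {i, j}
    # Step 3: add the node whose score against the current P is largest.
    while len(chosen) < p:
        best = max(sorted(remaining),  # sorted: ties go to the lowest index
                   key=lambda v: score(w(points[v], points[u]) for u in chosen))
        chosen.append(best)
        remaining.remove(best)
    return [points[v] for v in chosen]


def min_dispersion(P, w):
    """MMFD objective: smallest pairwise distance in P."""
    return min(w(x, y) for x, y in combinations(P, 2))


def avg_dispersion(P, w):
    """MAFD objective: average pairwise distance in P."""
    pairs = list(combinations(P, 2))
    return sum(w(x, y) for x, y in pairs) / len(pairs)


def mafd_1d(xs, p):
    """Optimal 1-D MAFD placement via the F[k, j] dynamic program."""
    xs = sorted(xs)
    n = len(xs)
    if not 2 <= p <= n:
        raise ValueError("need 2 <= p <= n")
    neg_inf = float("-inf")
    # F[k][j]: best weighted gap sum with k points chosen among x1..xj.
    F = [[neg_inf] * (n + 1) for _ in range(p + 1)]
    take = [[False] * (n + 1) for _ in range(p + 1)]  # was xj included?
    for j in range(n + 1):
        F[0][j] = 0.0
    F[1][1] = 0.0
    take[1][1] = True
    for j in range(2, n + 1):
        gap = xs[j - 1] - xs[j - 2]
        for k in range(1, min(p, j) + 1):
            skip = F[k][j - 1] + k * (p - k) * gap  # -inf when k > j-1
            incl = F[k - 1][j - 1] + (k - 1) * (p - k + 1) * gap
            if skip > incl:
                F[k][j] = skip
            else:
                F[k][j] = incl
                take[k][j] = True
    # Walk back from (p, n) following the recorded decisions.
    chosen, k = [], p
    for j in range(n, 0, -1):
        if k > 0 and take[k][j]:
            chosen.append(xs[j - 1])
            k -= 1
    return chosen[::-1]


pts = [0.0, 1.0, 2.0, 5.0, 9.0, 10.0]
dist = lambda x, y: abs(x - y)
print(greedy_dispersion(pts, 3, dist, min))  # [0.0, 10.0, 5.0]
print(mafd_1d(pts, 4))
```

Passing min or sum as the score argument keeps the two greedy variants in one function, since they differ only in how a candidate's distances to the current set P are aggregated.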