Closest Pair Problem
Given a set P of n >= 2 points in 2D space, find the "closest" pair of points. Closest usually refers to Euclidean distance: for p1 = (x1, y1) and p2 = (x2, y2), the distance is sqrt((x1 - x2)^2 + (y1 - y2)^2).
Why would I want to do something like this?
Consider air or sea traffic control. We continually recalculate the speed, direction and position of vehicles in relation to one another to detect potential collisions.
How many pairs of points are there in a set containing n points?

n = 3 (points a, b, c):       {ab, ac, bc} = 3 pairs
n = 4 (points a, b, c, d):    {ab, ac, ad, bc, bd, cd} = 6 pairs
n = 5 (points a, b, c, d, e): {ab, ac, ad, ae, bc, bd, be, cd, ce, de} = 10 pairs

In general, nC2 = n(n - 1)/2 = (n^2 - n)/2, which is O(n^2).

A brute force method would check all pairs in an exhaustive search, at a cost of O(n^2).
Closest Pair Algorithm
Data Structures

    struct Point
    {
        int x;
        int y;
    };

    struct ClosestPair
    {
        float dMin;
        Point p1;
        Point p2;
    };

    Point pX[], pY[], pXL[], pYL[], pXR[], pYR[], pYC[];
    ClosestPair clPL, clPR, clP;
    int n, middle, x, y;
Important preprocessing step
Before the first call to ClosestPair, pX and pY must be sorted: pX is sorted in ascending order by x-coordinate; pX is then copied to pY, and pY is sorted in ascending order by y-coordinate.
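A minimal C sketch of this preprocessing step, using the standard library's qsort (the comparator and function names are illustrative):

```c
#include <stdlib.h>
#include <string.h>

typedef struct { int x; int y; } Point;

/* Compare points by x-coordinate, ascending */
static int cmp_x(const void *a, const void *b) {
    return ((const Point *)a)->x - ((const Point *)b)->x;
}

/* Compare points by y-coordinate, ascending */
static int cmp_y(const void *a, const void *b) {
    return ((const Point *)a)->y - ((const Point *)b)->y;
}

/* Sort pX by x, copy it into pY, then sort pY by y. */
void preprocess(Point pX[], Point pY[], int n) {
    qsort(pX, (size_t)n, sizeof(Point), cmp_x);
    memcpy(pY, pX, (size_t)n * sizeof(Point));
    qsort(pY, (size_t)n, sizeof(Point), cmp_y);
}
```

Doing both sorts once up front, rather than inside each recursive call, is what keeps the overall algorithm at O(n log n).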
General Idea

(Figure: a set of points divided by vertical borders into numbered regions.)

In any pair of partitions, the closest pair lies to the left of the border, to the right of the border, or has one point on the left of the border and one point on the right of the border (i.e. straddling).
Divide

Start by partitioning the points in "half" using the 1st partition, giving a left region and a right region.

Further partition the left and right "halves" using the 2nd and 3rd partitions, giving four regions in total.
Conquer / Combine

In the two leftmost regions, calculate the distance of the closest pair in each.

Combine the two leftmost regions. The candidate distance is the minimum of the two regions' closest-pair distances.

Determine whether there are any closer points that straddle the border between the two regions (one need only check those points less than the candidate distance from the border). The distance is now the minimum from the two regions and the border, so the closest pair in the left half has been found.

Repeat for the two rightmost regions, as done for the leftmost two, to find the closest pair in the right half.

Combine the left and right halves in the same way. The resulting distance is the minimum for the whole set of points. The closest pair has been found.
Algorithm
ClosestPair(pX, pY, n)
    if n <= 3
        clP = BruteForce(pX, n)
    else
        middle = (n - 1) / 2
        x = pX[middle].x
        y = pX[middle].y
        // split pX into left and right halves
        for i = 0 to middle
            pXL[i].x = pX[i].x
            pXL[i].y = pX[i].y
        j = 0
        for i = middle + 1 to n - 1
            pXR[j].x = pX[i].x
            pXR[j].y = pX[i].y
            j++
        // split pY, preserving y-order within each half
        j = 0
        k = 0
        for i = 0 to n - 1
            if pY[i].x < x or (pY[i].x == x and pY[i].y <= y and j <= middle)
                pYL[j].x = pY[i].x
                pYL[j].y = pY[i].y
                j++
            else
                pYR[k].x = pY[i].x
                pYR[k].y = pY[i].y
                k++
        clPL = ClosestPair(pXL, pYL, j)
        clPR = ClosestPair(pXR, pYR, k)
        clP = Min(clPL, clPR)
        // collect the points within clP.dMin of the border, in y-order
        j = 0
        for i = 0 to n - 1
            if Abs(pY[i].x - x) <= clP.dMin
                pYC[j].x = pY[i].x
                pYC[j].y = pY[i].y
                j++
        k = j
        // check straddling pairs; stop once the y-gap exceeds dMin
        for i = 0 to k - 2
            for j = i + 1 to k - 1
                if Abs(pYC[j].y - pYC[i].y) > clP.dMin
                    break
                d = Dist(pYC[i], pYC[j])
                if d < clP.dMin
                    clP.dMin = d
                    clP.p1.x = pYC[i].x
                    clP.p1.y = pYC[i].y
                    clP.p2.x = pYC[j].x
                    clP.p2.y = pYC[j].y
    return clP
Complexity
Start with pX and pY. Both arrays need to be sorted in a pre-processing step. This can be done at a cost of O(n log n).

For each recursive step, pX and pY need to be split in "half", and this happens O(log n) times. Copying values from pY into pYL and pYR is done sequentially, placing each element of pY into pYL or pYR, at a cost of O(n). Therefore, the total cost of the recursive splitting of points is O(n log n).

For the merge step, where points are checked to determine whether any pairs straddle the border, the cost is actually O(n). The idea is that for each point within clP.dMin of the border, only a few points need to be checked.

The overall complexity of the closest pair algorithm is therefore O(n log n).
Selection Problem
Given an unsorted collection of n elements, select the K th smallest.
One approach would be to sort the collection using one of the best sorting algorithms (taking O(n log n) time) and then select the element in the Kth position.

Can we achieve O(n) running time for any value of K?

An algorithm called Randomized Quickselect runs in O(n) in the average case. The algorithm works much like the Quicksort algorithm. The difference comes after the partition phase: we do not need to examine both partitions, but rather just one, because we know which one contains the Kth element.
Randomized Quickselect Algorithm
RandomizedQuickSelect(a, left, right, K)
    if left == right
        return a[left]
    pivot = RandomizedPartition(a, left, right)
    i = pivot - left + 1
    if K == i
        return a[pivot]
    else if K < i
        return RandomizedQuickSelect(a, left, pivot - 1, K)
    else
        return RandomizedQuickSelect(a, pivot + 1, right, K - i)
Additional Function:

RandomizedPartition(a, left, right)
    r = rand(left, right)
    Exchange(a[right], a[r])
    pivot = a[right]
    i = left - 1
    j = right
    while true
        while a[++i] < pivot
            ; // empty body
        while pivot < a[--j]
            ; // empty body
        if i < j
            Swap(a[i], a[j])
        else
            break
    Exchange(a[i], a[right])
    return i
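The two routines above can be turned into runnable C. This sketch uses Lomuto partitioning for brevity (the slides use a Hoare-style scan); the function names are illustrative:

```c
#include <stdlib.h>

static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Move a random pivot to the end, partition around it (Lomuto scheme),
   and return the pivot's final index. */
static int randomized_partition(int a[], int left, int right) {
    int r = left + rand() % (right - left + 1);
    swap_int(&a[r], &a[right]);
    int pivot = a[right];
    int i = left - 1;
    for (int j = left; j < right; j++)
        if (a[j] < pivot)
            swap_int(&a[++i], &a[j]);
    swap_int(&a[i + 1], &a[right]);
    return i + 1;
}

/* K is 1-based: K = 1 selects the minimum of a[left..right]. */
int randomized_quickselect(int a[], int left, int right, int K) {
    if (left == right) return a[left];
    int pivot = randomized_partition(a, left, right);
    int i = pivot - left + 1;   /* rank of the pivot in this subarray */
    if (K == i) return a[pivot];
    if (K < i)  return randomized_quickselect(a, left, pivot - 1, K);
    return randomized_quickselect(a, pivot + 1, right, K - i);
}
```

Note that only one side of each partition is ever recursed into, which is where the average-case O(n) comes from.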
The worst case running time is O(n^2) and occurs when we repeatedly partition around the largest or smallest elements.
Example
Randomized Quickselect, the 5th smallest element of:

    13 81 92 43 31 65 57 26 75 0

Select pivot p = 65. After partitioning, {13, 43, 31, 57, 26, 0} lie to its left and {81, 92, 75} to its right, so the pivot's rank is 7. Since 5 < 7, recurse on the left partition.

Select pivot p = 26. After partitioning, {0, 13} lie to its left and {57, 43, 31} to its right, so the pivot's rank is 3. Since 5 > 3, recurse on the right partition for the (5 - 3) = 2nd smallest element there.

Select pivot p = 57. {43, 31} lie to its left and nothing to its right, so its rank is 3. Since 2 < 3, recurse on the left partition for the 2nd smallest.

Select pivot p = 43. {31} lies to its left, so its rank is 2, which matches K = 2: the answer is 43.

One possible final state of the array:

    13 0 26 31 43 57 65 92 75 81