Colorado School of Mines, Computer Vision
Professor William Hoff
Dept. of Electrical Engineering & Computer Science
http://inside.mines.edu/~whoff/

RANSAC

Identifying Correct Matches
• Let's say that we want to match points from one image to another.
• We detect feature points in each image (e.g., SIFT points) and then match their descriptors.
• Now we have a set of matching pairs, but some are incorrect.
• How do we identify the correct matches?
Graffiti image dataset from http://kahlan.eps.surrey.ac.uk/featurespace/web/data.htm

SIFT Points from each image
Number of features detected: 1741 (first image), 2052 (second image)
[Figure: the two images with all detected SIFT features overlaid.]

Tentative matches (based on feature descriptors)
[Figure: the two images side by side, with numbered labels marking the tentatively matched feature points.]
Find SIFT points in images (1 of 3)

% Demonstrate the matching of a plane, from one image to another.
clear variables
close all
%rng(0);    % Reset random number generator

% Read in test images.
I1 = imread('img1.png');
I2 = imread('img3.png');

% If color, convert them to grayscale
if size(I1,3)>1
    I1 = rgb2gray(I1);
end
if size(I2,3)>1
    I2 = rgb2gray(I2);
end

% Extract SIFT features.
% First make sure the vl_sift code is in the path
if exist('vl_sift', 'file')==0
    run('C:\Users\William\Documents\Research\vlfeat-0.9.20\toolbox\vl_setup');
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% First image
I1 = single(I1);    % Convert to single precision floating point
figure(1), imshow(I1,[]);

% These parameters limit the number of features detected
peak_thresh = 0;    % increase to limit; default is 0
edge_thresh = 10;   % decrease to limit; default is 10
[f1,d1] = vl_sift(I1, ...
    'PeakThresh', peak_thresh, ...
    'edgethresh', edge_thresh );
fprintf('Number of frames (features) detected: %d\n', size(f1,2));

% Show all SIFT features detected
h = vl_plotframe(f1);
set(h,'color','y','linewidth',1.5);

Find SIFT points in images (2 of 3)

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Second image
I2 = single(I2);
figure(2), imshow(I2,[]);
[f2,d2] = vl_sift(I2, ...
    'PeakThresh', peak_thresh, ...
    'edgethresh', edge_thresh );
fprintf('Number of frames (features) detected: %d\n', size(f2,2));

% Show all SIFT features detected
h = vl_plotframe(f2);
set(h,'color','y','linewidth',1.5);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Match features
% The index of the original match and the closest descriptor is stored in
% each column of matches, and the distance between the pair is stored in
% scores.
% Define threshold for matching.
% Descriptor D1 is matched to a descriptor D2 only if the distance
% d(D1,D2) multiplied by THRESH is not greater than the distance of D1 to
% all other descriptors.
thresh = 1.5;    % default = 1.5; increase to limit matches
[matches, scores] = vl_ubcmatch(d1, d2, thresh);
fprintf('Number of matching frames (features): %d\n', size(matches,2));

% Get matching features
indices1 = matches(1,:);
f1match = f1(:,indices1);
d1match = d1(:,indices1);
indices2 = matches(2,:);
f2match = f2(:,indices2);
d2match = d2(:,indices2);

Find SIFT points in images (3 of 3)

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Show matches
figure(3), imshow([I1,I2],[]);
o = size(I1,2);
line([f1match(1,:); f2match(1,:)+o], ...
     [f1match(2,:); f2match(2,:)]);
for i=1:size(f1match,2)
    x = f1match(1,i);
    y = f1match(2,i);
    text(x,y,sprintf('%d',i), 'Color', 'r');
end
for i=1:size(f2match,2)
    x = f2match(1,i);
    y = f2match(2,i);
    text(x+o,y,sprintf('%d',i), 'Color', 'r');
end

Identifying incorrect matches
• We find a transformation that explains the movement of the greatest number of matches.
  – The points that agree with that transformation are correct (inliers).
  – All other matches are outliers and are discarded.
• What transformation should we use?
  – It depends on your application; for a planar object, you could use a homography (projective transform).
  – Fitting a homography requires at least 4 corresponding image points.
How do we find this transformation?
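As an aside, fitting a homography to 4 (or more) corresponding points can be sketched with the standard direct linear transform (DLT). The Python/NumPy version below is an illustration only, not the lecture's MATLAB code (which uses cp2tform); the function names are mine:

```python
import numpy as np

def fit_homography(p1, p2):
    """Fit a 3x3 homography H mapping p1 -> p2 from >= 4 point pairs (DLT).

    p1, p2: (N,2) arrays of corresponding (x,y) points, N >= 4.
    Each correspondence contributes two rows of the 2N x 9 system A h = 0;
    the solution is the right singular vector of A with the smallest
    singular value.
    """
    rows = []
    for (x, y), (u, v) in zip(p1, p2):
        rows.append([-x, -y, -1, 0, 0, 0, u*x, u*y, u])
        rows.append([0, 0, 0, -x, -y, -1, v*x, v*y, v])
    A = np.array(rows, dtype=float)
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]          # fix the arbitrary scale so H[2,2] = 1

def apply_homography(H, pts):
    """Apply H to (N,2) points and return (N,2) mapped points."""
    ph = np.hstack([pts, np.ones((len(pts), 1))])   # homogeneous coordinates
    q = ph @ H.T
    return q[:, :2] / q[:, 2:3]                     # divide out the third row
```

With exactly 4 correspondences in general position the system has a one-dimensional null space, so the fit is exact; with more points the SVD gives a least-squares solution.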
Algorithm (RANSAC)
1) Randomly select a minimal set of points and solve for the transformation.
2) Count the number of points that agree with this transformation.
3) If this transformation has the highest number of inliers so far, save it.
4) Repeat for N trials and return the best transformation.

Example: line fitting
We will randomly select subsets of n = 2 points (since 2 is the minimum number of points needed to fit a line).
[Figures: one RANSAC trial, step by step: sample 2 points, fit the model (a line), measure the distances of all points to it, and count the inliers (c = 3); a second trial also gives c = 3; the best model found has c = 15 inliers.]

RANSAC Algorithm
• "Random Sample Consensus"
  – Can be used to find inliers for any type of model fitting.
• Widely used in computer vision.
• Requires two parameters:
  – The number of trials N (how many do we need?)
  – The agreement threshold (how close does an inlier have to be?)
Fischler, M. A., & Bolles, R. C. (1981). Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6), 381-395.

Number of samples (trials) required
• Assume we know e, the probability that a point is an outlier.
• The probability that a sample of size s is all inliers is (1 - e)^s.
• With N samples (trials), the probability that all samples have at least one outlier is p_alloutliers = (1 - (1 - e)^s)^N.
• We want at least one sample to have no outliers. That probability is p_notalloutliers = 1 - p_alloutliers.
• We'll pick p_notalloutliers to be some high value, like 0.99.
Try the example: e = 0.5, p = 0.99, s = 2.
• If we know e, we can solve for N:
  – (1 - e) is the probability that any point is an inlier.
  – So 1 - (1 - e)^s is the probability of getting at least one outlier in a sample.

    N = log(1 - p_notalloutliers) / log(1 - (1 - e)^s)

  N is the number of trials needed to get at least one sample of all inliers, with probability p_notalloutliers.

Number of samples required
[Figure: table of the required number of trials N for various sample sizes s and outlier fractions e.]

Better estimate for required number of trials
• Again, assume we know e, the probability that a point is an outlier.
  – So (1 - e) is the probability that any point is an inlier.
• Assume we have M total points. The estimated number of inliers is n_inlier = (1 - e) * M.
• "n choose k" is the number of ways to choose k items from a set of n items: C(n, k) = n! / (k! (n - k)!).
• The number of ways we can pick a sample of s points out of M points is m = C(M, s).
• The number of ways we can choose s points that are all inliers is n = C(n_inlier, s).
• The probability that any sample of s points is all inliers is p = n/m.
• As before, N = log(1 - p_notalloutliers) / log(1 - p) is the number of trials needed to get at least one sample of all inliers, with probability p_notalloutliers.

Estimating outlier probability
• If you don't know the outlier probability e in advance, you can estimate it from the data:
  – Start with a very conservative guess for e (say, 0.9).
  – Compute the number of trials required ...
it will be very large.
  – Each time you get a fit, count the number of inliers.
  – If the number of inliers is the best so far, revise your estimate of e: e = 1 - n_inlier/M.
  – Recompute the number of trials required using the new estimate of e, and continue.

Deciding if a point is an inlier
• You can simply pick a threshold on a point's residual error for it to count as an inlier (e.g., you could determine this empirically).
• Another strategy:
  – Compute the conditional probability of the residual assuming it is an inlier, and compare it to the conditional probability of the residual assuming it is an outlier.
  – Namely, test whether P(r | inlier) > P(r | outlier).
  – Typically, we assume inliers have Gaussian-distributed noise (with standard deviation s), and outliers have uniformly distributed noise.
  – This approach is called MSAC ("m-estimator sample consensus").
P. H. S. Torr & A. Zisserman, "MLESAC: A new robust estimator with application to estimating image geometry," Computer Vision and Image Understanding, 78(1), April 2000, pp. 138-156. http://www.robots.ox.ac.uk/~vgg/publications/2000/Torr00/torr00.pdf

MLESAC code (1 of 5)

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Fit homography to corresponding points. Use the MLESAC (m-estimator
% sample consensus) algorithm to find inliers. See the paper: "MLESAC: A
% new robust estimator with application to estimating image geometry" by
% Torr and Zisserman, at
% http://www.robots.ox.ac.uk/~vgg/publications/2000/Torr00/torr00.pdf

N = size(f1match,2);      % Number of corresponding point pairs
pts1 = f1match(1:2,:)';   % Get x,y coordinates from image 1, size is (N,2)
pts2 = f2match(1:2,:)';   % Get x,y coordinates from image 2, size is (N,2)

% Estimate the uncertainty of point locations in the second image. We'll
% assume Gaussian distributed residual errors, with a sigma of S pixels.
% Actually, points found in higher scales will have proportionally larger
% errors.
S = 1.0;    % Sigma of errors of points found in lowest scale

% Create a 2xN matrix of sigmas for each point, where each column is the
% sigmax, sigmay values for that point.
sigmas = [S * f2match(3,:); S * f2match(3,:)];   % Assume same for x,y

% We will need the sigmas as a Nx2 matrix.
sigs = sigmas';

% Now, if a point is an outlier, its residual error can be anything; i.e.,
% any value between 0 and the size of the image. Let's assume a uniform
% probability distribution.
pOutlier = 1/(size(I1,1)*size(I1,2));

% Estimate the fraction of inliers. We'll actually estimate this later
% from the data, but start with a worst case scenario assumption.
inlierFraction = 0.1;

% Let's say that we want to get a good sample with probability Ps.
Ps = 0.99;

MLESAC code (2 of 5)

% Determine the required number of iterations
nInlier = round(N*inlierFraction);   % number of inliers among the N points

% Calculate the number of ways to pick a set of 4 points, out of N points.
% This is "N-choose-4", which is N*(N-1)*(N-2)*(N-3)/factorial(4).
m = nInlier*(nInlier-1)*(nInlier-2)*(nInlier-3)/factorial(4);   % #ways to choose 4 inliers
n = N*(N-1)*(N-2)*(N-3)/factorial(4);                           % #ways to choose 4 points
p = m/n;    % probability that any sample of 4 points is all inliers
nIterations = log(1-Ps) / log(1 - p);
nIterations = ceil(nIterations);
fprintf('Initial estimated number of iterations needed: %d\n', nIterations);

sample_count = 0;    % The number of Ransac trials
pBest = -Inf;        % Best probability found so far (actually, the log)
while sample_count < nIterations
    % Grab 4 matching points at random
    v = randperm(N);
    p1 = pts1(v(1:4),:);
    p2 = pts2(v(1:4),:);

    % Try fitting a homography that transforms p1 to p2. Fitting will
    % fail if points are collinear, so use a try/catch so the program
    % doesn't quit.
    % Also, Matlab will display a warning if the result is close to
    % singular, so turn warnings temporarily off.
    warning('off', 'all');
    try
        tform_1_2 = cp2tform(p1,p2,'projective');
    catch
        continue;
    end
    warning('on', 'all');

    % Ok, we were able to fit a homography to this sample.
    sample_count = sample_count + 1;

    % Use that homography to transform all pts1 to pts2
    pts2map = tformfwd(pts1, tform_1_2);

    % Look at the residual errors of all the points.
    dp = (pts2map - pts2);   % Size is Nx2

MLESAC code (3 of 5)

    % Compute the probability that each residual error, dpi, could be an
    % inlier. This is the equation of a Gaussian; i.e.,
    %   G(dpi) = exp(-dpi' * Ci^-1 * dpi / 2)/(2*pi*det(Ci)^0.5)
    % where Ci is the covariance matrix of residual dpi:
    %   Ci = [ sxi^2 0; 0 syi^2 ]
    % Note that det(Ci)^0.5 = sxi*syi
    dp = dp ./ sigs;                  % Scale by the sigmas
    rsq = dp(:,1).^2 + dp(:,2).^2;    % residual squared distance errors
    numerator = exp(-rsq/2);          % Numerator of Gaussians, size is Nx1
    denominator = 2*pi*sigs(:,1).*sigs(:,2);    % Denominator, size Nx1

    % These are the probabilities of each point, if they are inliers (ie,
    % this is just the Gaussian probability of the error).
    pInlier = numerator./denominator;

    % Let's define inliers to be those points whose inlier probability is
    % greater than the outlier probability.
    indicesInliers = (pInlier > pOutlier);
    nInlier = sum(indicesInliers);

    % Update the number of iterations required
    if nInlier/N > inlierFraction
        inlierFraction = nInlier/N;
        m = nInlier*(nInlier-1)*(nInlier-2)*(nInlier-3)/factorial(4);   % #ways to choose 4 inliers
        n = N*(N-1)*(N-2)*(N-3)/factorial(4);   % #ways to choose 4 points
        p = m/n;    % probability that any sample of 4 points is all inliers
        nIterations = log(1-Ps) / log(1 - p);
        nIterations = ceil(nIterations);
        fprintf('(Iteration %d): New estimated number of iterations needed: %d\n', ...
            sample_count, nIterations);
    end

    % Net probability of the data (log)
    p = sum(log(pInlier(indicesInliers))) + (N-nInlier)*log(pOutlier);

MLESAC code (4 of 5)

    % Keep this solution if probability is better than the best so far.
    if p>pBest
        pBest = p;
        nBest = nInlier;
        indicesBest = indicesInliers;
        fprintf(' (Iter %d): best so far with %d inliers, p=%f\n', ...
            sample_count, nInlier, p);
        %disp(find(indicesBest)');

        % Show inliers
        figure(4), imshow([I1,I2],[]);
        o = size(I1,2);
        for i=1:size(f1match,2)
            x1 = f1match(1,i);  y1 = f1match(2,i);
            x2 = f2match(1,i);  y2 = f2match(2,i);
            if indicesBest(i)
                text(x1,y1,sprintf('%d',i), 'Color', 'g');
                text(x2+o,y2,sprintf('%d',i), 'Color', 'g');
            else
                text(x1,y1,sprintf('%d',i), 'Color', 'r');
                text(x2+o,y2,sprintf('%d',i), 'Color', 'r');
            end
        end
        pause
    end
end
fprintf('Final number of iterations used: %d\n', sample_count);
fprintf('Estimated inlier fraction: %f\n', inlierFraction);

% Show the final inliers
f1Inlier = f1match(:,indicesBest);
f2Inlier = f2match(:,indicesBest);
figure, subplot(1,2,1), imshow(I1,[]);
h = vl_plotframe(f1Inlier);
set(h,'color','y','linewidth',1.5);
subplot(1,2,2), imshow(I2,[]);
h = vl_plotframe(f2Inlier);
set(h,'color','y','linewidth',1.5);

MLESAC code (5 of 5)

% Ok, refit homography using all the inliers.
p1 = pts1(indicesBest,:);
p2 = pts2(indicesBest,:);
tform_1_2 = cp2tform(p1,p2,'projective');

% Use that homography to transform all p1 to p2
p2map = tformfwd(p1, tform_1_2);

% Look at the residual errors of all the points.
dp = (p2map - p2);
rsq = dp(:,1).^2 + dp(:,2).^2;   % residual squared distance errors
rmsErr = sqrt( sum(rsq)/nBest );
disp('RMS error: '), disp(rmsErr);

% Get the 3x3 homography matrix, such that p2 = H*p1.
Hom_1_2 = tform_1_2.tdata.T';

I1warp = imtransform(I1, tform_1_2, ...
    'XData',[1,size(I1,2)], ...
    'YData',[1,size(I1,1)]);

% Overlay the images
RGB(:,:,1) = (I1warp+I2)/2;
RGB(:,:,2) = (I1warp+I2)/2;
RGB(:,:,3) = I2/2;
RGB = uint8(RGB);
figure, imshow(RGB);

% Also show a difference image
Idiff = I2 - I1warp;
figure, imshow(abs(Idiff),[]);

Results: 372 inliers
[Figure: the two images with numbered labels for the surviving matches, and the difference image between image 2 and the warped image 1.]
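The inlier decision at the heart of the MLESAC code above compares each residual's Gaussian inlier likelihood against a uniform outlier density. A minimal Python sketch of that single decision (function name and the example image size are mine, chosen for illustration):

```python
import math

def is_inlier(dx, dy, sigma_x, sigma_y, img_h, img_w):
    """Classify one residual (dx, dy) as inlier or outlier.

    Inlier model: zero-mean Gaussian with per-point sigmas, so
      p_in = exp(-((dx/sx)^2 + (dy/sy)^2)/2) / (2*pi*sx*sy)
    Outlier model: uniform over the image, p_out = 1/(h*w).
    The point counts as an inlier when p_in > p_out, which is
    equivalent to thresholding the normalized squared residual.
    """
    rsq = (dx / sigma_x)**2 + (dy / sigma_y)**2
    p_in = math.exp(-rsq / 2) / (2 * math.pi * sigma_x * sigma_y)
    p_out = 1.0 / (img_h * img_w)
    return p_in > p_out
```

Note that the comparison implies a residual threshold that grows with the image area: for sigma = 1 and a 480x640 image, residuals out to roughly 4.6 pixels still count as inliers.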
[Figure: the fused (overlay) image of the warped image 1 and image 2.]
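The two trial-count formulas derived earlier (the simple per-point version and the refined sampling-without-replacement version, which the MLESAC code uses with s = 4) can be evaluated directly; a small Python sketch with function names of my choosing:

```python
import math
from math import comb

def n_trials(e, s, p_success=0.99):
    """Trials N so that, with probability p_success, at least one sample
    of size s is all inliers, given outlier fraction e:
      N = log(1 - p_success) / log(1 - (1 - e)**s)
    """
    return math.ceil(math.log(1 - p_success) / math.log(1 - (1 - e)**s))

def n_trials_exact(M, n_inlier, s, p_success=0.99):
    """Refined count: the probability that a size-s sample drawn without
    replacement from M points is all inliers is C(n_inlier, s) / C(M, s)."""
    p = comb(n_inlier, s) / comb(M, s)
    return math.ceil(math.log(1 - p_success) / math.log(1 - p))
```

For the slide's example (e = 0.5, p = 0.99, s = 2) this gives N = 17; for a homography (s = 4) at the same outlier rate it gives 72. The refined count is always at least as large, since drawing without replacement makes an all-inlier sample slightly less likely.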