Key Points Often we are unable to detect the complete contour of an object or characterize the details of its shape. In such cases, we can still represent shape information in an implicit way. The idea is to find key points in the image of the object that can be used to identify the objects and that do not change dramatically when the orientation or lighting of the object change. A good choice for this purpose are corners in the image. November 18, 2014 Computer Vision Lecture 18: Object Recognition II 1 Corner Detection with FAST A very efficient algorithm for corner detection is called FAST (Features from Accelerated Segment Test). For any given point c in the image, we can test whether it is a corner by: • Considering the 16 pixels at a radius of 3 pixels around c and • finding the longest run (i.e., uninterrupted sequence) of pixels whose intensity is either • greater than that of c plus a threshold or • less that that of c minus the same threshold. • If the run is at least 12 pixels long, then c is a corner. November 18, 2014 Computer Vision Lecture 18: Object Recognition II 2 Corner Detection with FAST November 18, 2014 Computer Vision Lecture 18: Object Recognition II 3 Corner Detection with FAST The FAST algorithm can be made even faster by first checking only pixels 1, 5, 9, and 13. If not at least three of them fulfill the intensity condition, we can immediately rule out that the given point is a corner. In order to avoid detecting multiple corners near the same pixel, we can require a minimum distance between corners. If two corners are too close, we only keep the one with the higher corner score. Such a score can be computed as the sum of intensity differences between c and the pixels in the run. November 18, 2014 Computer Vision Lecture 18: Object Recognition II 4 Key Point Description with BRIEF Now that we have identified interesting points in the image, how can we describe them so that we can detect them in other images? A very efficient method is to use BRIEF (Binary Robust Independent Elementary Features) descriptors. They can be described with minimal memory requirement (32 bytes per point). Their comparison only requires 256 binary operations. November 18, 2014 Computer Vision Lecture 18: Object Recognition II 5 Key Point Description with BRIEF First, smooth the input image with a 9×9 Gaussian filter. Then choose 256 pairs of points within a 35×35 pixel area, following a Gaussian distribution with = 7 pixels. Center the resulting mask on a corner c. For every pair of points, if intensity at the first point is greater than at the second one, add a 0 to the bitstring, otherwise add a 1. The resulting bit string of length 256 is the descriptor for point c. November 18, 2014 Computer Vision Lecture 18: Object Recognition II 6 Key Point Description with BRIEF November 18, 2014 Computer Vision Lecture 18: Object Recognition II 7 Key Point Matching with BRIEF In order to compute the matching distance between the descriptors of two different points, we can simply count the number of mismatching bits in their description (Hamming distance). For example, the bit strings 100110 and 110100 have a Hamming distance of 2, because they differ in their second and fifth bits. In order to find the match for point c in another image, we can find the pixel in that image whose descriptor has the smallest Hamming distance to the one for c. November 18, 2014 Computer Vision Lecture 18: Object Recognition II 8 Key Point Matching with BRIEF November 18, 2014 Computer Vision Lecture 18: Object Recognition II 9 How do Neural Networks (NNs) work? • NNs are able to learn by adapting their connectivity patterns so that the organism improves its behavior in terms of reaching certain (evolutionary) goals. • The strength of a connection, or whether it is excitatory or inhibitory, depends on the state of a receiving neuron’s synapses. • The NN achieves learning by appropriately adapting the states of its synapses. November 18, 2014 Computer Vision Lecture 18: Object Recognition II 10 Supervised Function Approximation In supervised learning, we train an artificial NN (ANN) with a set of vector pairs, so-called exemplars. Each pair (x, y) consists of an input vector x and a corresponding output vector y. Whenever the network receives input x, we would like it to provide output y. The exemplars thus describe the function that we want to “teach” our network. Besides learning the exemplars, we would like our network to generalize, that is, give plausible output for inputs that the network had not been trained with. November 18, 2014 Computer Vision Lecture 18: Object Recognition II 11 An Artificial Neuron synapses x1 neuron i x2 Wi,1 Wi,2 … … xi Wi,n xn n net input signal net i (t ) wi , j (t ) x j (t ) j 1 output November 18, 2014 x i (t ) f i (neti (t )) Computer Vision Lecture 18: Object Recognition II 12 Linear Separability n w x f ( x1 , x2 ,..., xn ) 1, if i i i 1 0, otherwise Input space in the two-dimensional case (n = 2): x2 -3 -2 -1 0 3 2 1 1 1 -1 -2 -3 2 1 3 x1 w1 = 1, w2 = 2, =2 November 18, 2014 x2 -3 -2 -1 3 2 1 1 1 -1 -2 -3 2 0 3 x1 w1 = -2, w2 = 1, =2 Computer Vision Lecture 18: Object Recognition II x2 -3 -2 -1 3 2 1 1 -1 -2 -3 2 0 3 x1 w1 = -2, w2 = 1, =1 13