Pattern Recognition
Professor William Hoff
Dept of Electrical Engineering & Computer Science, Colorado School of Mines
Image and Multidimensional Signal Processing
http://inside.mines.edu/~whoff/

Pattern Recognition
• The process by which patterns in data are found, recognized, and discovered
– Usually aims to classify data (patterns) based either on a priori knowledge or on statistical information extracted from the patterns
– The patterns to be classified are observations, defining points in a multidimensional space
• Classification is usually based on a set of patterns that have already been classified (e.g., by a person)
– This set of patterns is termed the training set
– The learning strategy is called supervised
• Learning can also be unsupervised
– In this case there is no training set
– Instead, the system establishes the classes itself based on the statistical regularities of the patterns
• Resources
– "Statistics Toolbox" in Matlab
– Journals include "Pattern Recognition", "IEEE Trans. Pattern Analysis & Machine Intelligence"
– Good book: "Pattern Recognition and Machine Learning" by Bishop

Approaches
• Statistical Pattern Recognition
– We assume that the patterns are generated by a probabilistic system
– The data is reduced to vectors of numbers and statistical techniques are used for classification
• Structural Pattern Recognition
– The process is based on the structural interrelationships of features
– The data is converted to a discrete structure (such as a grammar or a graph) and classification techniques such as parsing and graph matching are used
• Neural
– The model simulates the behavior of biological neural networks

Unsupervised Pattern Recognition
• The system must learn the classifier from unlabeled data
• It is related to the problem of estimating the underlying probability density function of the data
• Approaches to unsupervised learning include
– clustering (e.g., k-means, mixture models, hierarchical clustering)
– techniques for dimensionality reduction (e.g., principal component analysis, independent component analysis, non-negative matrix factorization, singular value decomposition)

k-means Clustering
• Given a set of n-dimensional vectors
• Also specify k (the number of desired clusters)
• The algorithm partitions the vectors into k clusters, such that it minimizes the sum, over all clusters, of the within-cluster sums of point-to-cluster-centroid distances

k-means Algorithm
1. Given a set of vectors {xi}
2. Randomly choose a set of k means {mi} as the center of each cluster
3. For each vector xi, compute the distance to each mi; assign xi to the closest cluster
4. Update the means to get a new set of cluster centers
5. Repeat steps 3 and 4 until there is no more change in cluster centers
• k-means is guaranteed to terminate, but may not find the global optimum in the least squares sense
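To make the steps above concrete, here is a minimal sketch of the algorithm written directly in Matlab. It is not from the original slides (the later demo uses the Statistics Toolbox kmeans function instead), and the function and variable names are illustrative.

% Minimal k-means sketch.  X is an N-by-d matrix of vectors, k the number of clusters.
function [idx, m] = simple_kmeans(X, k)
    N = size(X, 1);
    m = X(randperm(N, k), :);                    % step 2: pick k random points as initial means
    idx = zeros(N, 1);
    while true
        oldIdx = idx;
        % step 3: assign each vector to the closest cluster center
        for i = 1:N
            d = sum((repmat(X(i,:), k, 1) - m).^2, 2);   % squared distance to each center
            [~, idx(i)] = min(d);
        end
        % step 4: recompute each cluster center as the mean of its members
        for j = 1:k
            if any(idx == j)
                m(j,:) = mean(X(idx == j, :), 1);
            end
        end
        % step 5: stop when the assignments no longer change
        if isequal(idx, oldIdx)
            break;
        end
    end
end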
Example: Indexed Storage of Color Images
• If an image uses 8 bits for each of R, G, B, there are 2^24 possible colors
• Most images don't use the entire color space of possible values – we can get by with fewer
• We'll use k-means clustering to find the reduced set of colors
(Figures: the image using the full color space, and the same image using only 32 discrete colors.)

Indexed Storage of Color Images
• For each image, we find a set of colors that are a good approximation of the entire set of pixels in the image, and put those into a colormap
• Then for each pixel, we just store the index into the colormap
(Figures: the image of indices (0..63), and the color image using 64 colors.)

Indexed Storage of Color Images
• Use a colormap
– Image f(x,y) stores indices into a lookup table (colormap)
– Colormap specifies RGB for each index
• Also see rgb2ind, ind2rgb

[img,cmap] = imread('kids.tif');
imshow(img,cmap);

>> cmap(1:20,:)
ans =
    0.0980  0.0941  0.1020
    0.1451  0.1020  0.1098
    0.1608  0.1412  0.1804
    0.2196  0.1216  0.1020
    0.2431  0.1569  0.1373
    0.2196  0.1843  0.2118
    0.2471  0.2353  0.2824
    0.3137  0.1490  0.1020
    0.3294  0.2039  0.1569
    0.4118  0.1725  0.0902
    0.4235  0.2314  0.1373
    0.3176  0.2471  0.2431
    0.4039  0.2784  0.2118
    0.4078  0.3137  0.2627
    0.3255  0.3059  0.3490
    0.5176  0.2039  0.0157
    0.5059  0.2275  0.0902
    0.6039  0.2392  0.0471
    0.6392  0.3059  0.0353
    0.5098  0.2706  0.1529
    :
(Annotation on the slide: "Index = 17" – the display system will display these values of RGB.)

Matlab demo – color quantization with kmeans:

clear all
close all

% Read image
RGB = im2double(imread('peppers.png'));  RGB = imresize(RGB, 0.5);
% RGB = im2double(imread('pears.png'));  RGB = imresize(RGB, 0.5);
% RGB = im2double(imread('tissue.png')); RGB = imresize(RGB, 0.5);
% RGB = im2double(imread('robot.jpg'));  RGB = imresize(RGB, 0.15);
figure, imshow(RGB);

% Convert the 3-dimensional (M,N,3) array to 2D (M*N,3)
X = reshape(RGB, [], 3);

k = 16;    % Number of clusters to find

% Call kmeans. It returns:
%   IDX: for each point in X, which cluster (1..k) it was assigned to
%   C:   the k cluster centers
[IDX, C] = kmeans(X, k, ...
    'EmptyAction', 'drop');    % if a cluster becomes empty, drop it

% Reshape the index array back to a 2-dimensional image
I = reshape(IDX, size(RGB,1), size(RGB,2));

% Show the reduced color image
figure, imshow(I, C);

% Plot pixels in color space
figure
hold on
for i = 1:20:size(X,1)
    plot3(X(i,1), X(i,2), X(i,3), ...
        '.', 'Color', C(IDX(i),:));
end
hold off

% Also plot cluster centers
hold on
for i = 1:k
    plot3(C(i,1), C(i,2), C(i,3), 'ro', 'MarkerFaceColor', 'r');
end
hold off

(Figure: 3D scatter plot of the sampled pixels in RGB color space, each drawn in its cluster's color, with the k cluster centers overlaid.)
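As a small follow-up (not on the original slides), the quantized image can also be converted back to an ordinary RGB array by looking up each pixel's cluster center; the variable names below are those from the script above.

% Rebuild an RGB image from the cluster indices and centers (sketch).
RGBq = reshape(C(IDX, :), size(RGB));   % replace each pixel by its cluster-center color
figure, imshow(RGBq);

% The built-in rgb2ind / ind2rgb functions provide a similar indexed representation:
% [Iq, cmap] = rgb2ind(RGB, k);   RGBq2 = ind2rgb(Iq, cmap);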
Supervised Statistical Methods
• A class is a set of objects having some important properties in common
– We might have a known description for each
– We might have a set of samples for each
• A feature extractor is a program that inputs the data (image) and extracts features that can be used in classification; these values are put into a feature vector
• A classifier is a program that inputs the feature vector and assigns it to one of a set of designated classes or to the "reject" class
• We will look at these classifiers:
– Decision tree
– Nearest class mean
• Another powerful classifier is "support vector machines"

Feature Vector Representation
• A feature vector is a vector x = [x1, x2, ..., xn], where each xj is a real number
• The elements xj may be object measurements
• For example, xj may be a count of object parts or properties
• Example: an object region can be represented by [#holes, #strokes, moments, ...]
(from Shapiro & Stockman)

Possible Features for Character Recognition
(figure from Shapiro & Stockman)

Discriminant Functions
• Functions f(x, K) perform some computation on feature vector x
• Knowledge K from training or programming is used
• Final stage determines class
(from Shapiro & Stockman)

Decision Trees
• Strength: easy to understand
• Weakness: overtraining
(example character features from Shapiro & Stockman)

Class   #holes   #strokes   Best axis   Moment of inertia
'A'     1        3          90          Med
'B'     2        1          90          Large
'8'     2        0          90          Med
'0'     1        0          90          Large
'1'     0        1          90          Low
'W'     0        4          90          Large
'X'     0        2          ?           Large
'*'     0        0          ?           Large
'-'     0        1          0           Low
'/'     0        1          60          Low

Entropy-Based Automatic Decision Tree Construction
• Training set S: x1 = (f11, f12, ..., f1m), x2 = (f21, f22, ..., f2m), ..., xn = (fn1, fn2, ..., fnm)
• At Node 1: what feature should be used? What values?
• Choose the feature which results in the most information gain, as measured by the decrease in entropy
(from Shapiro & Stockman)

Entropy
• Given a set of training vectors S, if there are c classes,

  Entropy(S) = -\sum_{i=1}^{c} p_i \log_2(p_i)

  where p_i is the proportion of category i examples in S.
• If all examples belong to the same category, the entropy is 0.
• If the examples are equally mixed (1/c examples of each class), the entropy is at its maximum (1.0 when c = 2).
• e.g., for c = 2:  -0.5 log2(0.5) - 0.5 log2(0.5) = -0.5(-1) - 0.5(-1) = 1
(from Shapiro & Stockman)
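The following small sketch (not from the original slides) shows how the entropy and the information gain of one candidate threshold split could be computed in Matlab; the toy labels, feature values, and threshold are made up for illustration, and labels are assumed to be positive integers.

% Toy example: information gain of a threshold split, in bits.
y = [1 1 1 2 2 3 3 3 3 3]';    % class labels of ten training samples
f = [0 1 1 0 2 3 4 3 2 4]';    % values of one candidate feature
t = 1.5;                       % candidate threshold on that feature

% Entropy of a vector of positive-integer labels: -sum(p .* log2(p)) over nonempty classes
H = @(labels) -sum(nonzeros(accumarray(labels, 1) / numel(labels)) .* ...
                   log2(nonzeros(accumarray(labels, 1) / numel(labels))));

yl = y(f <= t);   yr = y(f > t);            % labels falling on each side of the split
gain = H(y) - numel(yl)/numel(y)*H(yl) - numel(yr)/numel(y)*H(yr);
fprintf('Entropy(S) = %.3f bits, information gain of split = %.3f bits\n', H(y), gain);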
Decision-Tree Classifier
• Uses subsets of features in sequence
• Feature extraction may be interleaved with classification decisions
• Can be easy to design and efficient in execution
(from Shapiro & Stockman)

Matlab Demo
• See Help -> Demos -> Toolboxes -> Statistics -> Multivariate Analysis, then Classification
• Fisher's iris data consists of measurements on the sepal length, sepal width, petal length, and petal width of 150 iris specimens. There are 50 specimens from each of three species.

load fisheriris
X = meas(:, 1:2);    % just use 2 features

% Classes
y(1:50,1)    = 1;    % class 'setosa'
y(51:100,1)  = 2;    % class 'versicolor'
y(101:150,1) = 3;    % class 'virginica'

figure
hold on
plot(X(1:50,1),    X(1:50,2),    '+r');
plot(X(51:100,1),  X(51:100,2),  '+g');
plot(X(101:150,1), X(101:150,2), '+b');
xlabel('Sepal length'), ylabel('Sepal width');
hold off

(Figure: scatter plot of sepal width vs. sepal length, with the three species in red, green, and blue.)

• tree = treefit(X, y);
• treedisp(tree, 'names', {'SL' 'SW'});

% Visualize tree
xmin = min(X(:,1));  xmax = max(X(:,1));
ymin = min(X(:,2));  ymax = max(X(:,2));

figure, hold on;
dx = (xmax - xmin)/40;
dy = (ymax - ymin)/40;
for xg = xmin:dx:xmax            % grid variables xg,yg so the class labels y are not overwritten
    for yg = ymin:dy:ymax
        class = treeval(tree, [xg yg]);
        if class == 1
            plot(xg, yg, '+r');
        elseif class == 2
            plot(xg, yg, '+g');
        else
            plot(xg, yg, '+b');
        end
    end
end
hold off;

(Figure: the decision regions of the tree over the sepal length / sepal width plane.)

The input parameter "splitmin" has default value 10. Setting splitmin = 1 will cause the decision tree to split (make a new node) if there are any instances that are still not correctly labeled.

(Figures: decision trees and decision-region plots for "splitmin" = 1, 20, and 40.)
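A hedged sketch of how the trees behind these plots could be generated; it assumes the same older Statistics Toolbox treefit/treeval interface used above (newer releases use fitctree instead) and that "splitmin" is passed as a name/value pair.

% Sketch: vary the minimum node size required before a split is allowed.
for smin = [1 20 40]
    tree = treefit(X, y, 'splitmin', smin);   % smaller splitmin -> deeper, more complex tree
    treedisp(tree, 'names', {'SL' 'SW'});     % display the resulting tree structure
end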
Classification Using Nearest Class Mean
• Compute the Euclidean distance between feature vector x and the mean of each class:

  \| x_1 - x_2 \| = \sqrt{ \sum_{i=1}^{d} ( x_1[i] - x_2[i] )^2 }

• Choose the closest class, if close enough (reject otherwise)
• Low error rate (intersection)
(from Shapiro & Stockman)

Scaling Distance Using Standard Deviations
• Scale the distance to the mean of class c according to the measured standard deviation s_i in each direction i:

  d(x, x_c) = \sqrt{ \sum_i \left( ( x[i] - x_c[i] ) / s_i \right)^2 }

• Otherwise, a point near the top of class 3 will be closer to the class 2 mean
(from Shapiro & Stockman)

If the Ellipses Are Not Aligned with the Axes
• Instead of using the standard deviation along each separate axis, use the covariance matrix C
(Figure: scatter plot of two classes of points, 'x' and 'o', whose spread is not aligned with the coordinate axes.)
• Variance (of a single variable Δx) is defined as

  s^2 = \frac{1}{N-1} \sum_{i=1}^{N} ( \Delta x_i - \overline{\Delta x} )^2

• Covariance (of two variables, Δx and Δy):

  C_{xy} = \begin{pmatrix} s_{xx}^2 & s_{xy}^2 \\ s_{yx}^2 & s_{yy}^2 \end{pmatrix}

  where
  s_{xx}^2 = \frac{1}{N-1} \sum_{i=1}^{N} ( \Delta x_i - \overline{\Delta x} )^2
  s_{yy}^2 = \frac{1}{N-1} \sum_{i=1}^{N} ( \Delta y_i - \overline{\Delta y} )^2
  s_{xy}^2 = s_{yx}^2 = \frac{1}{N-1} \sum_{i=1}^{N} ( \Delta x_i - \overline{\Delta x} )( \Delta y_i - \overline{\Delta y} )

Examples
(Figures: two scatter plots with their covariance matrices.)

  C = [ 0.0497  0.0123
        0.0123  0.8590 ]

  C = [ 0.8590  0.8836
        0.8836  1.1069 ]

• Notes
– Off-diagonal values are small if the variables are independent
– Off-diagonal values are large if the variables are correlated (they vary together)
– See the Matlab "cov" function

Probability Density
• Let's assume that the errors are Gaussian
• The probability density for a 2-dimensional error vector Δx is

  p(\Delta x) = \frac{1}{ 2\pi \, |C_x|^{1/2} } \exp\!\left( -\tfrac{1}{2}\, \Delta x^T C_x^{-1} \Delta x \right)

(Figure: surface plot of the 2D Gaussian density over the x1 and x2 axes.)

Probability Density
• Look at where the probability is a constant. This is where the exponent is a constant:

  \Delta x^T C_x^{-1} \Delta x = z^2

• This is the equation of an ellipse.
• For example, with uncorrelated errors this reduces to

  \frac{ (\Delta x)^2 }{ s_{xx}^2 } + \frac{ (\Delta y)^2 }{ s_{yy}^2 } = z^2

• Can choose z to get the desired probability. For z = 3, the cumulative probability is about 97%.
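The constant-probability ellipse can also be drawn directly from the covariance matrix by scaling the principal axes by z. The sketch below is not from the original slides; it reuses the second example covariance from above, and the variable names are illustrative.

% Sketch: draw the z = 3 constant-probability ellipse for a 2x2 covariance C.
C = [0.8590 0.8836; 0.8836 1.1069];    % example covariance from the earlier slide
z = 3;
theta = linspace(0, 2*pi, 100);
circle = [cos(theta); sin(theta)];      % points on the unit circle
[V, D] = eig(C);                        % principal axes (columns of V) and variances (diag of D)
ellipse = V * (z * sqrt(D)) * circle;   % points where dx' * inv(C) * dx = z^2
plot(ellipse(1,:), ellipse(2,:));
axis equal;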
Plotting
• Contours of constant probability
(Figures: two example point sets plotted in the x1-x2 plane with their contours of constant probability overlaid.)

Matlab code:

% Show covariance of two variables
clear all
close all

randn('state', 0);
yp = randn(40,1);
xp = 0.25 * randn(40,1);
% xp = randn(40,1);
% yp = xp + 0.5*randn(40,1);
plot(xp, yp, '+'), axis equal;
axis([-3.0 3.0 -3.0 3.0]);

C = cov(xp, yp)
Cinv = inv(C);
detCsqrt = sqrt(det(C));

% Plot the probability density,
% p(x,y) = (1/(2*pi*det(C)^0.5)) * exp(-x'*Cinv*x/2)
L = 3.0;
delta = 0.1;
[x1, x2] = meshgrid(-L:delta:L, -L:delta:L);
for i = 1:size(x1,1)
    for j = 1:size(x1,2)
        x = [x1(i,j); x2(i,j)];
        fX(i,j) = (1/(2*pi*detCsqrt)) * exp( -0.5*x'*Cinv*x );
    end
end

hold on
% meshc(x1,x2,fX);       % this does a surface plot
contour(x1,x2,fX);       % this does a contour plot
xlabel('x1 - axis');
ylabel('x2 - axis');

Example – Flower Data from Matlab

% This loads in the measurements:
%   meas(N,4) are the feature values
%   species{N} are the species names
load fisheriris;

% There are three classes
X1 = meas(strmatch('setosa',     species), 3:4);   % use features 3,4
X2 = meas(strmatch('virginica',  species), 3:4);
X3 = meas(strmatch('versicolor', species), 3:4);

hold on
plot( X1(:,1), X1(:,2), '.r' );
plot( X2(:,1), X2(:,2), '.g' );
plot( X3(:,1), X3(:,2), '.b' );

m1 = sum(X1)/length(X1);
m2 = sum(X2)/length(X2);
m3 = sum(X3)/length(X3);
plot( m1(1), m1(2), '*r' );
plot( m2(1), m2(2), '*g' );
plot( m3(1), m3(2), '*b' );

(Figure: scatter plot of the two features for the three classes, with each class mean marked.)

Overlaying Probability Contours

% Plot the contours of equal probability
[f1, f2] = meshgrid( min(meas(:,3)):0.1:max(meas(:,3)), ...
                     min(meas(:,4)):0.1:max(meas(:,4)) );
C1 = cov(X1);
Cinv = inv(C1);
detCsqrt = sqrt(det(C1));
for i = 1:size(f1,1)
    for j = 1:size(f1,2)
        x = [f1(i,j) f2(i,j)];
        fX(i,j) = (1/(2*pi*detCsqrt)) * exp( -0.5*(x-m1)*Cinv*(x-m1)' );
    end
end
contour(f1, f2, fX);
% (repeat for the other two classes)

(Figure: the same scatter plot with contours of equal probability drawn around each class.)

Mahalanobis Distance
• Given an unknown feature vector x, which class is it closest to?
– Assume you know the class centers (centroids) z_i and their covariances C_i
– We find the class that has the smallest distance from its center to the point in feature space
• The distance is weighted by the covariance – this is called the "Mahalanobis distance"
• For example, the Mahalanobis distance of feature vector x to the ith class is

  d_i = ( x - z_i )^T C_i^{-1} ( x - z_i )

  where C_i is the covariance matrix of the feature vectors in the ith class
• (A small classification sketch based on this rule follows the summary below.)

Summary / Questions
• In pattern recognition, we classify "patterns" (usually in the form of vectors) into "classes".
• Training of the classifier can be supervised (i.e., we have to provide labeled training data) or unsupervised.
– k-means clustering is an example of unsupervised learning
• Approaches to classification include
– Statistical
– Structural
– Neural
• Name some statistical pattern recognition methods.
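To close the loop on the nearest-class-mean and Mahalanobis distance slides, here is a minimal, hedged sketch (not from the original slides) of a minimum-Mahalanobis-distance classifier on the same iris features; the query point and variable names are made up for illustration.

% Sketch: classify a point by minimum Mahalanobis distance to each class.
load fisheriris;
X1 = meas(strmatch('setosa',     species), 3:4);
X2 = meas(strmatch('virginica',  species), 3:4);
X3 = meas(strmatch('versicolor', species), 3:4);

% Class centroids z_i and covariances C_i
Z  = {mean(X1); mean(X2); mean(X3)};
Cs = {cov(X1);  cov(X2);  cov(X3)};

x = [4.5 1.5];                       % an unknown feature vector [petal length, petal width]
d = zeros(3,1);
for i = 1:3
    dx = x - Z{i};
    d(i) = dx * inv(Cs{i}) * dx';    % Mahalanobis distance of x to class i (as defined above)
end
[~, bestClass] = min(d);
fprintf('Assigned to class %d\n', bestClass);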