Colorado School of Mines
Image and Multidimensional Signal Processing
Professor William Hoff
Dept. of Electrical Engineering & Computer Science
http://inside.mines.edu/~whoff/

Pattern Recognition
• The process by which patterns in data are found, recognized, and discovered
  – Usually aims to classify data (patterns) based either on a priori knowledge or on statistical information extracted from the patterns
  – The patterns to be classified are observations, defining points in a multidimensional space
• Classification is usually based on a set of patterns that have already been classified (e.g., by a person)
  – This set of patterns is termed the training set
  – The learning strategy is called supervised
• Learning can also be unsupervised
  – In this case there is no training set
  – Instead, the system establishes the classes itself based on the statistical regularities of the patterns
• Resources
  – "Statistics Toolbox" in Matlab
  – Journals include "Pattern Recognition" and "IEEE Trans. Pattern Analysis & Machine Intelligence"
  – Good book: "Pattern Recognition and Machine Learning" by Bishop
Approaches
• Statistical Pattern Recognition
  – We assume that the patterns are generated by a probabilistic system
  – The data is reduced to vectors of numbers, and statistical techniques are used for classification
• Structural Pattern Recognition
  – The process is based on the structural interrelationships of features
  – The data is converted to a discrete structure (such as a grammar or a graph), and classification techniques such as parsing and graph matching are used
• Neural
  – The model simulates the behavior of biological neural networks
Unsupervised Pattern Recognition
• The system must learn the classifier from unlabeled data
• It is related to the problem of trying to estimate the underlying probability density function of the data
• Approaches to unsupervised learning include
  – clustering (e.g., k-means, mixture models, hierarchical clustering)
  – techniques for dimensionality reduction (e.g., principal component analysis, independent component analysis, non-negative matrix factorization, singular value decomposition)
k-means Clustering
• Given a set of n-dimensional vectors
• Also specify k (the number of desired clusters)
• The algorithm partitions the vectors into k clusters such that it minimizes the sum, over all clusters, of the within-cluster sums of point-to-cluster-centroid distances
k-means Algorithm
1. Given a set of vectors {xi}
2. Randomly choose a set of k means {mi} as the centers of the clusters
3. For each vector xi, compute the distance to each mi; assign xi to the closest cluster
4. Update the means to get a new set of cluster centers
5. Repeat steps 3 and 4 until there is no more change in the cluster centers

k-means is guaranteed to terminate, but may not find the global optimum in the least-squares sense.
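A minimal MATLAB sketch of these steps (hand-rolled for illustration; the examples later in these slides use the Statistics Toolbox function kmeans instead, and the function name simple_kmeans is mine):

function [idx, M] = simple_kmeans(X, k)
% X is an (n x d) matrix of n d-dimensional vectors; k is the number of clusters.
% Save as simple_kmeans.m.  No handling of empty clusters here; the Statistics
% Toolbox kmeans has the 'EmptyAction' option for that.
n = size(X,1);
p = randperm(n);
M = X(p(1:k), :);                          % step 2: k random points as initial means
idx = zeros(n,1);
while true
    D = zeros(n,k);
    for j = 1:k                            % step 3: squared distance to each mean
        D(:,j) = sum(bsxfun(@minus, X, M(j,:)).^2, 2);
    end
    [~, newIdx] = min(D, [], 2);           % assign each vector to the closest cluster
    if isequal(newIdx, idx), break; end    % step 5: stop when assignments no longer change
    idx = newIdx;
    for j = 1:k                            % step 4: recompute each cluster center
        M(j,:) = mean(X(idx==j, :), 1);
    end
end
end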
Example: Indexed Storage of Color Images
• If an image uses 8 bits for each of R, G, B, there are 2^24 possible colors
• Most images don't use the entire color space of possible values – we can get by with fewer
• We'll use k-means clustering to find the reduced set of colors
[Figure: the image using the full color space vs. the image using only 32 discrete colors]
Indexed Storage of Color Images
• For each image, we find a set of colors that are a good approximation of the entire set of pixels in the image, and put those into a colormap
• Then for each pixel, we just store the indices into the colormap
[Figure: image of indices (0..63) and the corresponding color image using 64 colors]
Indexed Storage of Color Images
• Use a colormap
  – The image f(x,y) stores indices into a lookup table (the colormap)
  – The colormap specifies the RGB values for each index

[img,cmap] = imread('kids.tif');
imshow(img,cmap);

Also see rgb2ind, ind2rgb

>> cmap(1:20,:)
ans =
    0.0980    0.0941    0.1020
    0.1451    0.1020    0.1098
    0.1608    0.1412    0.1804
    0.2196    0.1216    0.1020
    0.2431    0.1569    0.1373
    0.2196    0.1843    0.2118
    0.2471    0.2353    0.2824
    0.3137    0.1490    0.1020
    0.3294    0.2039    0.1569
    0.4118    0.1725    0.0902
    0.4235    0.2314    0.1373
    0.3176    0.2471    0.2431
    0.4039    0.2784    0.2118
    0.4078    0.3137    0.2627
    0.3255    0.3059    0.3490
    0.5176    0.2039    0.0157
    0.5059    0.2275    0.0902    <-- Index = 17: the display system will show these values of RGB
    0.6039    0.2392    0.0471
    0.6392    0.3059    0.0353
    0.5098    0.2706    0.1529
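As a sketch of how a true-color image can be converted to indexed form and back (assuming the Image Processing Toolbox functions rgb2ind and ind2rgb; the image name is just an example):

RGB = imread('peppers.png');       % true-color image (M x N x 3, uint8)
[ind, cmap] = rgb2ind(RGB, 32);    % quantize to at most 32 colors
imshow(ind, cmap);                 % display the indexed image through its colormap
RGB2 = ind2rgb(ind, cmap);         % convert back to a true-color (double) image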
clear all
close all

% Read image
RGB = im2double(imread('peppers.png'));  RGB = imresize(RGB, 0.5);
%RGB = im2double(imread('pears.png'));   RGB = imresize(RGB, 0.5);
%RGB = im2double(imread('tissue.png'));  RGB = imresize(RGB, 0.5);
%RGB = im2double(imread('robot.jpg'));   RGB = imresize(RGB, 0.15);
figure, imshow(RGB);

% Convert 3-dimensional (M,N,3) array to 2D (M*N,3)
X = reshape(RGB, [], 3);

k = 16;    % Number of clusters to find

% Call kmeans. It returns:
%   IDX: for each point in X, which cluster (1..k) it was assigned to
%   C:   the k cluster centers
[IDX,C] = kmeans(X, k, ...
    'EmptyAction', 'drop');    % if a cluster becomes empty, drop it

% Reshape the index array back to a 2-dimensional image
I = reshape(IDX, size(RGB,1), size(RGB,2));

% Show the reduced color image
figure, imshow(I, C);

% Plot pixels in color space
figure
hold on
for i=1:20:size(X,1)
    plot3(X(i,1), X(i,2), X(i,3), ...
        '.', 'Color', C(IDX(i),:));
end
hold off

% Also plot cluster centers
hold on
for i=1:k
    plot3(C(i,1), C(i,2), C(i,3), 'ro', 'MarkerFaceColor', 'r');
end
hold off
[Figure: image pixels plotted in RGB color space, colored by their assigned cluster, with the k cluster centers marked in red]
Supervised Statistical Methods
• A class is a set of objects having some important properties in common
  – We might have a known description of each class
  – We might have a set of samples from each class
• A feature extractor is a program that inputs the data (image) and extracts features that can be used in classification; these values are put into a feature vector
• A classifier is a program that inputs the feature vector and assigns it to one of a set of designated classes, or to the "reject" class
• We will look at these classifiers:
  – Decision tree
  – Nearest class mean
• Another powerful classifier is the support vector machine
Feature Vector Representation
• A feature vector is a vector x = [x1, x2, …, xn], where each xj is a real number
• The elements xj may be object measurements
• For example, xj may be a count of object parts or properties
• Example: an object region can be represented by [#holes, #strokes, moments, …]

(from Shapiro & Stockman)
Possible Features for Character Recognition

[Figure: example features for character recognition (from Shapiro & Stockman)]
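As a sketch of feature extraction in MATLAB (assuming the Image Processing Toolbox; the particular features, and the variable BW, a binary image assumed to contain a single character, are illustrative):

% Illustrative features for a binary character image BW (true = ink)
stats = regionprops(BW, 'EulerNumber', 'Orientation', 'Eccentricity', 'Area');
numHoles = 1 - stats(1).EulerNumber;   % Euler number = #objects - #holes for one region
bestAxis = stats(1).Orientation;       % orientation of the best-fit axis, in degrees
x = [numHoles, bestAxis, stats(1).Eccentricity, stats(1).Area];   % feature vector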
Discriminant Functions
• Functions f(x, K) perform some computation on the feature vector x
• Knowledge K from training or programming is used
• The final stage determines the class

(from Shapiro & Stockman)
Decision Trees
• Strength: easy to understand
• Weakness: overtraining

(from Shapiro & Stockman)

Class   #holes   #strokes   Best axis   Moment of inertia
'A'        1        3          90            Med
'B'        2        1          90            Large
'8'        2        0          90            Med
'0'        1        0          90            Large
'1'        0        1          90            Low
'W'        0        4          90            Large
'X'        0        2           ?            Large
'*'        0        0           ?            Large
'-'        0        1           0            Low
'/'        0        1          60            Low
Entropy-Based Automatic Decision Tree Construction

Training set S:
  x1 = (f11, f12, …, f1m)
  x2 = (f21, f22, …, f2m)
  ...
  xn = (fn1, fn2, …, fnm)

At node 1: what feature should be used? What values?

Choose the feature which results in the most information gain, as measured by the decrease in entropy.

(from Shapiro & Stockman)
Entropy

Given a set of training vectors S, if there are c classes,

Entropy(S) = \sum_{i=1}^{c} -p_i \log_2 p_i

where p_i is the proportion of category i examples in S.

If all examples belong to the same category, the entropy is 0.
If the examples are equally mixed (1/c examples of each class), the entropy is at its maximum, log2(c) (equal to 1.0 for c = 2).

e.g., for c = 2:  -0.5 log2(0.5) - 0.5 log2(0.5) = -0.5(-1) - 0.5(-1) = 1

(from Shapiro & Stockman)
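A small MATLAB sketch of these quantities (the helper names entropyOf and infoGain are mine, not from the slides; each would go in its own .m file):

function H = entropyOf(labels)
% Entropy of a set of class labels (a vector of integer class IDs)
classes = unique(labels);
H = 0;
for i = 1:numel(classes)
    p = mean(labels == classes(i));   % proportion of category i examples
    H = H - p * log2(p);
end
end

function G = infoGain(labels, left)
% Information gain of a binary split; 'left' is a logical index of samples sent left
H  = entropyOf(labels);
HL = entropyOf(labels(left));
HR = entropyOf(labels(~left));
G  = H - (mean(left)*HL + mean(~left)*HR);   % decrease in entropy after the split
end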
Decision-Tree Classifier
• Uses subsets of features in sequence
• Feature extraction may be interleaved with classification decisions
• Can be easy to design and efficient in execution

(from Shapiro & Stockman)
Matlab demo
• See Help -> Demos -> Toolboxes -> Statistics -> Multivariate Analysis, then Classification
• Fisher's iris data consists of measurements of the sepal length, sepal width, petal length, and petal width of 150 iris specimens. There are 50 specimens from each of three species.

load fisheriris
X = meas(:, 1:2);      % just use 2 features (sepal length, sepal width)

% Classes
y(1:50,1)    = 1;      % class 'setosa'
y(51:100,1)  = 2;      % class 'versicolor'
y(101:150,1) = 3;      % class 'virginica'

figure
hold on
plot(X(1:50,1),    X(1:50,2),    '+r');
plot(X(51:100,1),  X(51:100,2),  '+g');
plot(X(101:150,1), X(101:150,2), '+b');
xlabel('Sepal length'), ylabel('Sepal width');
hold off
[Figure: sepal width vs. sepal length for the three iris species]
• tree = treefit(X, y);
• treedisp(tree, 'names', {'SL' 'SW'});
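Note that treefit, treeval, and treedisp are the older Statistics Toolbox interface. A sketch of the roughly equivalent calls in recent releases (assuming the Statistics and Machine Learning Toolbox):

tree = fitctree(X, y, 'PredictorNames', {'SL','SW'});
view(tree, 'Mode', 'graph');          % graphical display, like treedisp
class = predict(tree, [6.0 3.0]);     % classify a new sample, like treeval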
% Visualize tree
xmin = min(X(:,1));
xmax = max(X(:,1));
ymin = min(X(:,2));
ymax = max(X(:,2));

figure, hold on;
dx = (xmax-xmin)/40;
dy = (ymax-ymin)/40;
for x=xmin:dx:xmax
    for y=ymin:dy:ymax
        class = treeval(tree, [x y]);
        if class==1
            plot(x,y,'+r');
        elseif class==2
            plot(x,y,'+g');
        else
            plot(x,y,'+b');
        end
    end
end
hold off;
[Figure: decision regions of the fitted tree over the sepal length / sepal width plane]
The input parameter "splitmin" has a default value of 10. Setting splitmin = 1 will cause the decision tree to split (make a new node) whenever there are any instances that are still not correctly labeled.

[Figure: decision tree obtained with "splitmin" = 1]
[Figure: iris training data and the resulting decision regions for "splitmin" = 1 (sepal width vs. sepal length)]
[Figure: decision regions for "splitmin" = 20]
[Figure: decision regions for "splitmin" = 40]
Classification Using Nearest Class Mean
• Compute the Euclidean distance between the feature vector x and the mean of each class:

  \| \mathbf{x}_1 - \mathbf{x}_2 \|^2 = \sum_{i=1}^{d} ( x_1[i] - x_2[i] )^2

• Choose the closest class, if close enough (reject otherwise)
• Low error rate (intersection)
(from Shapiro & Stockman)
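A minimal nearest-class-mean classifier in MATLAB (a sketch using the iris features that appear later in these slides; the query vector and variable names are illustrative):

load fisheriris
X = meas(:, 3:4);                                % petal length, petal width
y = [ones(50,1); 2*ones(50,1); 3*ones(50,1)];    % class labels (setosa, versicolor, virginica)
M = [mean(X(y==1,:)); mean(X(y==2,:)); mean(X(y==3,:))];   % the three class means

xq = [4.5 1.5];                                  % an unknown feature vector
d = sum(bsxfun(@minus, M, xq).^2, 2);            % squared Euclidean distance to each mean
[~, class] = min(d)                              % assign xq to the closest class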
Scaling Distance Using Standard Deviations
• Scale the distance to the mean of class c according to the measured standard deviation s_i in each direction i:

  \sum_{i=1}^{d} \left( \frac{ x[i] - \bar{x}_c[i] }{ s_i } \right)^2

• Otherwise, a point near the top of class 3 will be closer to the class 2 mean

(from Shapiro & Stockman)
If Ellipses Are Not Aligned with the Axes
• Instead of using the standard deviation along each separate axis, use the covariance matrix C

[Figure: two clusters of points ('x' and 'o') whose spread is not aligned with the coordinate axes]

• Variance (of a single variable \Delta x) is defined as

  s^2 = \frac{1}{N-1} \sum_{i=1}^{N} (\Delta x_i)^2

• Covariance (of two variables, \Delta x and \Delta y) is

  C_{xy} = \begin{bmatrix} s_{xx}^2 & s_{xy}^2 \\ s_{yx}^2 & s_{yy}^2 \end{bmatrix}

  where

  s_{xx}^2 = \frac{1}{N-1} \sum_{i=1}^{N} (\Delta x_i - \mu_x)^2
  s_{yy}^2 = \frac{1}{N-1} \sum_{i=1}^{N} (\Delta y_i - \mu_y)^2
  s_{xy}^2 = s_{yx}^2 = \frac{1}{N-1} \sum_{i=1}^{N} (\Delta x_i - \mu_x)(\Delta y_i - \mu_y)
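A quick MATLAB check of these definitions against the built-in cov function (a sketch; the example data are arbitrary):

x = randn(100,1);
y = 0.5*x + 0.3*randn(100,1);      % correlated example data
dx = x - mean(x);  dy = y - mean(y);
N = numel(x);
sxx = sum(dx.^2)  / (N-1);
syy = sum(dy.^2)  / (N-1);
sxy = sum(dx.*dy) / (N-1);
Cmanual  = [sxx sxy; sxy syy]
Cbuiltin = cov(x, y)               % should match Cmanual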
Examples

[Figure: two scatter plots of 2-D data with their covariance matrices (computed with the Matlab "cov" function)]

Nearly uncorrelated data:
C =
    0.0497    0.0123
    0.0123    0.8590

Correlated data:
C =
    0.8590    0.8836
    0.8836    1.1069

• Notes
  – Off-diagonal values are small if the variables are independent
  – Off-diagonal values are large if the variables are correlated (they vary together)
Probability Density
• Let's assume that errors are Gaussian
• The probability density for a 2-dimensional error vector \Delta x is

  p(\Delta x) = \frac{1}{2\pi |C_x|^{1/2}} \exp\left( -\tfrac{1}{2}\, \Delta x^T C_x^{-1} \Delta x \right)
[Figure: surface plot of the 2-D Gaussian probability density over the (x1, x2) plane]
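The Statistics Toolbox also provides mvnpdf for evaluating this density directly; a sketch (the covariance values here are just illustrative):

C  = [0.0625 0; 0 1];                   % illustrative 2-D covariance
mu = [0 0];
[x1,x2] = meshgrid(-3:0.1:3, -3:0.1:3);
p = mvnpdf([x1(:) x2(:)], mu, C);       % evaluate the Gaussian density at each grid point
p = reshape(p, size(x1));
surf(x1, x2, p);                        % or contour(x1,x2,p) for constant-probability contours
xlabel('x1 - axis'); ylabel('x2 - axis');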
Probability Density
• Look at where the probability is a constant. This is where the exponent is a constant:

  \Delta x^T C_x^{-1} \Delta x = z^2

• This is the equation of an ellipse.
• For example, with uncorrelated errors this reduces to

  \frac{\Delta x^2}{s_{xx}^2} + \frac{\Delta y^2}{s_{yy}^2} = z^2
• We can choose z to get a desired probability. For z = 3, the cumulative probability (for a 2-D Gaussian) is about 99%.
Plotting
• Contours of constant probability

[Figure: two contour plots of constant probability over the (x1, x2) plane]
Matlab code

% Show covariance of two variables
clear all
close all

randn('state',0);
yp = randn(40,1);
xp = 0.25 * randn(40,1);
% xp = randn(40,1);
% yp = xp + 0.5*randn(40,1);

plot(xp,yp, '+'), axis equal;
axis([-3.0 3.0 -3.0 3.0]);

C = cov(xp,yp)
Cinv = inv(C);
detCsqrt = sqrt(det(C));

% Plot the probability density,
% p(x,y) = (1/(2*pi*det(C)^0.5)) * exp(-x'*Cinv*x/2)
L = 3.0;
delta = 0.1;
[x1,x2] = meshgrid(-L:delta:L, -L:delta:L);
for i=1:size(x1,1)
    for j=1:size(x1,2)
        x = [x1(i,j); x2(i,j)];
        fX(i,j) = (1/(2*pi*detCsqrt)) * exp( -0.5*x'*Cinv*x );
    end
end

hold on
% meshc(x1,x2,fX);   % this does a surface plot
contour(x1,x2,fX);   % this does a contour plot
xlabel('x1 - axis');
ylabel('x2 - axis');
Example – flower data from Matlab

% This loads in the measurements:
%   meas(N,4) are the feature values
%   species{N} are the species names
load fisheriris;

% There are three classes
X1 = meas(strmatch('setosa', species), 3:4);       % use features 3,4
X2 = meas(strmatch('virginica', species), 3:4);
X3 = meas(strmatch('versicolor', species), 3:4);

hold on
plot( X1(:,1), X1(:,2), '.r' );
plot( X2(:,1), X2(:,2), '.g' );
plot( X3(:,1), X3(:,2), '.b' );

m1 = sum(X1)/length(X1);
m2 = sum(X2)/length(X2);
m3 = sum(X3)/length(X3);
plot( m1(1), m1(2), '*r' );
plot( m2(1), m2(2), '*g' );
plot( m3(1), m3(2), '*b' );
[Figure: petal length vs. petal width for the three iris classes, with each class mean marked]
Overlaying probability contours

% Plot the contours of equal probability
[f1,f2] = meshgrid( min(meas(:,3)):0.1:max(meas(:,3)), ...
                    min(meas(:,4)):0.1:max(meas(:,4)) );
C1 = cov(X1);
Cinv = inv(C1);
detCsqrt = sqrt(det(C1));
for i=1:size(f1,1)
    for j=1:size(f1,2)
        x = [f1(i,j) f2(i,j)];
        fX(i,j) = (1/(2*pi*detCsqrt)) * exp( -0.5*(x-m1)*Cinv*(x-m1)' );
    end
end
contour(f1,f2,fX);

(repeat for the other two classes)

[Figure: constant-probability contours for each class overlaid on the iris scatter plot]
Mahalanobis distance
• Given an unknown feature vector x, which class is it closest to?
  – Assume you know the class centers (centroids) z_i and their covariances C_i
  – We find the class that has the smallest distance from its center to the point in feature space
• The distance is weighted by the covariance – this is called the "Mahalanobis distance"
• For example, the Mahalanobis distance of feature vector x to the ith class is

  d_i = \sqrt{ (\mathbf{x} - \mathbf{z}_i)^T C_i^{-1} (\mathbf{x} - \mathbf{z}_i) }

• where C_i is the covariance matrix of the feature vectors in the ith class
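A minimal MATLAB sketch of a Mahalanobis-distance classifier for the iris example above (X1, X2, X3 and m1, m2, m3 are the per-class feature matrices and means computed earlier; the query vector is illustrative):

% Classify a new feature vector using the Mahalanobis distance to each class
x = [4.5 1.5];                           % petal length, petal width of an unknown flower
means = {m1, m2, m3};
covs  = {cov(X1), cov(X2), cov(X3)};
d = zeros(1,3);
for i = 1:3
    dx = x - means{i};
    d(i) = sqrt( dx * inv(covs{i}) * dx' );   % Mahalanobis distance to class i
end
[~, class] = min(d)                      % assign x to the class with the smallest distance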
Summary / Questions
• In pattern recognition, we classify "patterns" (usually in the form of vectors) into "classes".
• Training of the classifier can be supervised (i.e., we have to provide labeled training data) or unsupervised.
  – k-means clustering is an example of unsupervised learning
• Approaches to classification include
  – Statistical
  – Structural
  – Neural
• Name some statistical pattern recognition methods.