Classification
ECE 847:
Digital Image Processing
Stan Birchfield
Clemson University
Acknowledgment
Many slides are courtesy of Frank Dellaert and Jim Rehg at Georgia Tech, from http://www-static.cc.gatech.edu/classes/AY2007/cs4495_fall/html/materials.html
Classification problems
• Detection – Search set, find all instances of class
• Recognition – Given instance, label its identity
• Verification – Given instance and hypothesized identity, verify whether correct
• Tracking – Like detection, but local search and fixed identity
Classification issues
• Feature extraction – needed for practical reasons; the distinction is somewhat arbitrary:
  – Perfect feature extraction → classification is trivial
  – Perfect classifier → no need for feature extraction
• occlusion (missing features)
• mereology – study of part/whole relationships
  e.g., POLOPONY parses as POLO PONY, and BEATS as one word (not BE EATS)
• segmentation – how can we classify before segmenting? how can we segment before classifying?
• context
• computational complexity: a 20x20 binary input has 2^400 ≈ 10^120 possible patterns!
Mereology example
What does this say?
Decision theory
• Decision theory – goal is to make a decision (i.e., set a decision boundary) so as to minimize cost
• Pattern classification is perhaps the most important subfield of decision theory
• Supervised learning: features, data sets, algorithm
[Figure: decision boundary separating two classes in feature space]
Overfitting
Could separate perfectly using nearest neighbors, but poor generalization (overfitting) – will not work well on new data.
[Figure: decision boundary]
Occam’s razor – The simplest explanation is the best
(Philosophical principle based upon the orderliness of the creation)
Bayes decision theory
Problem: Given a feature x, determine the most likely class: w1 or w2
[Figure: class-conditional pdfs p(x|w1) and p(x|w2)]
Easy to measure with enough examples
Bayes’ rule
P(wi|x) = p(x|wi) P(wi) / p(x)
posterior = likelihood (class-conditional pdf) × prior / evidence (normalization factor)
[Figure: class-conditional pdfs and the resulting posteriors]
What is this P(w1|x) ?
• Probability of class 1 given data x
• Likewise, P(w2|x) is the probability of class 2 given data x
• P(w1|x) + P(w2|x) = 1 !
• Note: Area under each curve is not 1
[Figure: posteriors P(w1|x) and P(w2|x) plotted against x, between 0.0 and 1.0]
Bayes Classifier
• Classifier: Select w1 if P(w1|x) > P(w2|x); otherwise select w2
• Decision boundaries occur where P(w1|x) = P(w2|x)
[Figure: posteriors P(w1|x) and P(w2|x); the x-axis divides into regions labeled "select w2", "select w1", "select w2"]
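A minimal sketch of this decision rule, using class-conditional histograms and priors estimated from labeled 1-D samples (the data, bin edges, and names below are illustrative, not from the slides):

    import numpy as np

    def bayes_classify(x, h1, h2, prior1, prior2, edges):
        # h1, h2: normalized class-conditional histograms p(x|w1), p(x|w2) on common bin edges
        b = int(np.clip(np.digitize(x, edges) - 1, 0, len(h1) - 1))
        post1 = h1[b] * prior1              # unnormalized posterior for w1
        post2 = h2[b] * prior2              # unnormalized posterior for w2
        return 1 if post1 > post2 else 2    # select the class with the larger posterior

    # estimate the class-conditional pdfs by counting labeled examples
    x1 = np.random.normal(0.0, 1.0, 500)    # samples from class w1
    x2 = np.random.normal(2.0, 1.0, 500)    # samples from class w2
    edges = np.linspace(-4.0, 6.0, 41)
    h1, _ = np.histogram(x1, bins=edges, density=True)
    h2, _ = np.histogram(x2, bins=edges, density=True)
    print(bayes_classify(0.2, h1, h2, 0.5, 0.5, edges))   # most likely class 1
    print(bayes_classify(2.5, h1, h2, 0.5, 0.5, edges))   # most likely class 2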
Bayes Risk
The total risk is the expected loss when using the classifier:
R = ∫ P(error|x) p(x) dx, where P(error|x) = min( P(w1|x), P(w2|x) )
(We’re assuming 0-1, i.e. constant, loss here)
[Figure: posteriors P(w1|x) and P(w2|x); the shaded area under the lower curve is the Bayes risk]
Discriminative vs. Generative
Finding a decision boundary is not the same as modeling a conditional density.
Note: Bug in Forsyth-Ponce book: P(1|x) + P(2|x) != 1
Histograms
• One way to compute class-conditional pdfs is to collect a bunch of examples and store a histogram
• Then normalize so that the bins sum to one
Application: Skin Histograms
• Skin has a very small range of (intensity-independent) colours, and little texture
  – Compute colour measure, check if colour is in this range, check if there is little texture (median filter)
  – See this as a classifier – we can set up the tests by hand, or learn them
  – Get class-conditional densities (histograms) and priors from data (counting)
• Classifier: label a pixel as skin if P(skin|x) > P(not skin|x), i.e., if the likelihood ratio p(x|skin)P(skin) / p(x|not skin)P(not skin) exceeds 1 (or a chosen threshold); see the sketch below
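A rough sketch of such a histogram-based skin classifier; the bin count, threshold, and names are illustrative assumptions, not the exact Jones-Rehg implementation:

    import numpy as np

    BINS = 32                                   # assumed bins per RGB channel

    def build_hist(pixels):
        # pixels: N x 3 array of RGB values in [0, 255]; returns a normalized 3-D histogram
        idx = (pixels // (256 // BINS)).astype(int)
        h = np.zeros((BINS, BINS, BINS))
        np.add.at(h, (idx[:, 0], idx[:, 1], idx[:, 2]), 1)
        return h / h.sum()

    def is_skin(rgb, h_skin, h_nonskin, theta=1.0):
        # label a pixel skin if the likelihood ratio exceeds a threshold
        i = tuple(int(c) // (256 // BINS) for c in rgb)
        return h_skin[i] > theta * h_nonskin[i]

    skin_px = np.random.randint(0, 256, (1000, 3))      # stand-ins for labeled training pixels
    nonskin_px = np.random.randint(0, 256, (5000, 3))
    print(is_skin((200, 140, 120), build_hist(skin_px), build_hist(nonskin_px)))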
Finding skin color
3D histogram in RGB space
M. J. Jones and J. M. Rehg, Statistical Color Models with Application to Skin Detection, Int. J. of Computer Vision, 46(1):81-96, Jan 2002.
[Figure: skin and non-skin color histograms]
Results
Note: We have assumed that all pixels are independent! Context is ignored.
Confusion matrix
• true positive = hit
• false negative = miss = false dismissal = Type II error
• false positive = false alarm = false detection = Type I error
• sensitivity = true positive rate = hit rate = recall
  TPR = TP / (TP+FN)
• false negative rate
  FNR = FN / (TP+FN),  TPR + FNR = 1
• false positive rate = false alarm rate = fallout
  FPR = FP / (FP+TN)
• specificity
  SPC = TN / (FP+TN),  FPR + SPC = 1
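A small sketch computing these rates from raw counts (the counts in the example call are made up):

    def rates(TP, FN, FP, TN):
        TPR = TP / (TP + FN)   # sensitivity, hit rate, recall
        FNR = FN / (TP + FN)   # miss rate; TPR + FNR = 1
        FPR = FP / (FP + TN)   # false alarm rate, fallout
        SPC = TN / (FP + TN)   # specificity; FPR + SPC = 1
        return TPR, FNR, FPR, SPC

    print(rates(TP=90, FN=10, FP=5, TN=95))   # (0.9, 0.1, 0.05, 0.95)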
Receiver operating characteristic (ROC) curve
Plot the TPR (vertical axis) against the FPR (horizontal axis) as the decision threshold is varied.
[Figure: ROC curve, with equal error rate (EER) = 88% marked; confusion matrix for an image classifier]
Cross-validation
Naïve Bayes
• Quantize image patches, then compute a histogram of patch types within a face
• But histograms suffer from the curse of dimensionality
• Histogram in N dimensions is intractable with N > 5
• To solve this, assume independence among the pixels
• Features are the patch types:
  P(image|face) = P(label 1 at (x1,y1)|face) × ... × P(label k at (xk,yk)|face)
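A minimal sketch of this naïve Bayes factorization over quantized patch labels, done in log space to avoid underflow; the toy probability tables and the prior are illustrative, not the Schneiderman-Kanade model:

    import numpy as np

    def log_score(patch_labels, label_probs, log_prior):
        # patch_labels[k] = quantized label observed at patch position k
        # label_probs[k][lab] = P(label lab at position k | class), estimated by counting
        return log_prior + sum(np.log(label_probs[k][lab]) for k, lab in enumerate(patch_labels))

    def classify(patch_labels, probs_face, probs_nonface, prior_face=0.5):
        sf = log_score(patch_labels, probs_face, np.log(prior_face))
        sn = log_score(patch_labels, probs_nonface, np.log(1.0 - prior_face))
        return "face" if sf > sn else "non-face"

    # toy tables: 2 patch positions, 3 possible labels at each
    probs_face = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
    probs_nonface = [[0.3, 0.3, 0.4], [0.4, 0.3, 0.3]]
    print(classify([0, 1], probs_face, probs_nonface))   # -> "face"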
Histograms applied to faces and cars
H. Schneiderman, T. Kanade. "A Statistical Method for 3D Object Detection Applied to Faces and Cars".
IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2000)
Alternative: Kernel density estimation (Parzen windows)
p(x) ≈ (k/N) / V, where k/N is the fraction of samples that fall into a volume V around x
Parzen windows
• Non-parametric technique
• Center a kernel at each data point, sum the results (and normalize) to get the pdf
Parzen windows
Gaussian Parzen Windows
Parzen Window Density Estimation
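A 1-D Gaussian Parzen-window estimate; the bandwidth h and the sample data are illustrative:

    import numpy as np

    def parzen_pdf(x, samples, h=0.5):
        # center a Gaussian kernel of width h at each sample and average the results
        return np.mean(np.exp(-0.5 * ((x - samples) / h) ** 2) / (h * np.sqrt(2 * np.pi)))

    samples = np.random.normal(0.0, 1.0, 200)
    print(parzen_pdf(0.0, samples))   # roughly the true N(0,1) density at 0 (about 0.4)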
Comparison
Histograms
• non-parametric
• smoothing parameter = # of bins
• discard data afterwards
• discontinuous
• boundaries arbitrary
• d dimensions → M^d bins (curse of dimensionality)
Parzen windows
• non-parametric
• smoothing parameter = size of kernel
• need data always
• discontinuous (box) or continuous (Gaussian)
• boundaries data driven (box) or no boundaries (Gaussian)
• dimensionality not as much of a curse
Another alternative: Locally Weighted Averaging (LWA)
• Keep an instance database
• At each query point, form a locally weighted average of the stored labels, with f(i) = 1 for positive examples and 0 for negative examples
• Equivalent to Parzen windows
• Memory based, lazy learning, applicable to any kernel, can be slow
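A sketch of LWA with a Gaussian kernel, returning the weighted average of f(i) as an estimate of P(positive | query); the kernel width and data are assumptions:

    import numpy as np

    def lwa_posterior(query, X, f, sigma=1.0):
        # X: N x d stored instances, f: N labels (1 = positive, 0 = negative)
        d2 = np.sum((X - query) ** 2, axis=1)
        w = np.exp(-0.5 * d2 / sigma ** 2)    # kernel weight for each stored instance
        return np.dot(w, f) / np.sum(w)       # locally weighted average

    X = np.array([[0.0, 0.0], [0.2, 0.1], [2.0, 2.0], [2.1, 1.9]])
    f = np.array([1.0, 1.0, 0.0, 0.0])
    print(lwa_posterior(np.array([0.1, 0.0]), X, f))   # near 1
    print(lwa_posterior(np.array([2.0, 2.0]), X, f))   # near 0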
LWA Classifier, Circular Kernel
[Figure panels: data with 2 classes; kernel weights; all data; LWA posterior]
K-Nearest Neighbors
Classification = majority vote of K nearest neighbors
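A minimal brute-force K-nearest-neighbor classifier (names and data are illustrative):

    import numpy as np
    from collections import Counter

    def knn_classify(query, X, labels, K=3):
        # X: N x d training points; labels: N class labels
        d = np.sum((X - query) ** 2, axis=1)   # squared distances to all training points
        nearest = np.argsort(d)[:K]            # indices of the K closest points
        return Counter(labels[i] for i in nearest).most_common(1)[0][0]

    X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [1.1, 0.9]])
    labels = ["a", "a", "b", "b"]
    print(knn_classify(np.array([0.9, 1.0]), X, labels))   # "b"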
Recognition by finding patterns
• We have seen very simple template matching (under filters)
• Some objects behave like quite simple templates
  – Frontal faces
• Strategy:
  – Find image windows
  – Correct lighting
  – Pass them to a statistical test (a classifier) that accepts faces and rejects non-faces
Finding faces
• Faces “look like” templates (at least when they’re frontal).
• General strategy:
  – Search image windows at a range of scales
  – Correct for illumination
  – Present corrected window to classifier
• Issues
  – How corrected?
  – What features?
  – What classifier?
[Figure: training pipeline in which training images from a training database go through feature extraction to a learner, which produces a classifier; a test image goes through feature extraction to the classifier, which produces a decision]
Face detection
http://ocw.mit.edu/NR/rdonlyres/Brain-and-Cognitive-Sciences/9-913Fall-2004/B89E6E21-3DDA-4E70-9107-C66F7B8C7DED/0/class1_2_2004.pdf
Face recognition
http://ocw.mit.edu/NR/rdonlyres/Brain-and-Cognitive-Sciences/9-913Fall-2004/B89E6E21-3DDA-4E70-9107-C66F7B8C7DED/0/class1_2_2004.pdf
Linear discriminant functions
• g(x) = wTx + w0
• decision surface is a hyperplane
• w is perpendicular to the hyperplane
• neural network: combination of linear discriminant functions
• sigmoid function is differentiable, enables backpropagation
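A small sketch of evaluating g(x) = wTx + w0 and passing it through a sigmoid, as a single neural-network unit would; the weights here are arbitrary:

    import numpy as np

    def g(x, w, w0):
        return np.dot(w, x) + w0            # positive on one side of the hyperplane, negative on the other

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))     # differentiable squashing function (enables backpropagation)

    w, w0 = np.array([1.0, -2.0]), 0.5
    x = np.array([0.5, 0.1])
    label = 1 if g(x, w, w0) > 0 else 2     # classify by which side of the hyperplane x falls on
    print(label, sigmoid(g(x, w, w0)))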
Neural networks for detecting faces
Henry A. Rowley, Shumeet Baluja, and Takeo Kanade, Neural Network-Based Face Detection,
IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 20, number 1, pages 23-38, January 1998.
Neural networks for detecting faces
positive training images: scaled, rotated, translated, and mirrored
negative training images
Neural networks for detecting faces
Arbitration
Bootstrapping
• Hardest examples to classify are those near the decision boundary
• These are also the most useful for training
• Approach: Run detector, find examples of misclassification, feed back into training process
Results
Real-time face detection
• Components
– Cascade architecture
– Box sum features (integral image)
[Figure: cascade of classifiers H1, H2, ..., Hn; each stage can reject a window as Non-face, and only windows passing all stages are labeled Face]
Viola and Jones, CVPR 2001
Haar-like features
(Integral image makes
computation fast)
More features
Example
• A feature’s value is calculated as the difference between the sum of the pixels within the white and black rectangle regions:
  f_i = Sum(r_i, white) − Sum(r_i, black)
• The weak classifier thresholds the feature value:
  h_i(x) = +1 if f_i > threshold, −1 otherwise
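A sketch of computing a two-rectangle feature with an integral image; the image, rectangle coordinates, and threshold are illustrative:

    import numpy as np

    def integral_image(img):
        return img.cumsum(axis=0).cumsum(axis=1)   # ii[y, x] = sum of img[0..y, 0..x]

    def box_sum(ii, top, left, bottom, right):
        # sum of img[top..bottom, left..right] from at most 4 lookups
        s = ii[bottom, right]
        if top > 0:  s -= ii[top - 1, right]
        if left > 0: s -= ii[bottom, left - 1]
        if top > 0 and left > 0: s += ii[top - 1, left - 1]
        return s

    img = np.arange(36, dtype=float).reshape(6, 6)
    ii = integral_image(img)
    # two-rectangle feature: white region (rows 0-2) minus black region (rows 3-5)
    f = box_sum(ii, 0, 0, 2, 5) - box_sum(ii, 3, 0, 5, 5)
    h = 1 if f > 0 else -1                         # weak classifier output for threshold 0
    print(f, h)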
Boosting
Adaboost
F = sign( w1 h1 + w2 h2 + ... + wn hn )
where h_i(x) = +1 if f_i > θ_i, −1 otherwise
The more distinctive the feature, the larger the weight.
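A sketch of evaluating the strong classifier as a weighted vote of thresholded weak classifiers; the feature values, thresholds, and weights are made up:

    def weak(f_value, theta):
        return 1 if f_value > theta else -1        # h_i(x) = +1 if f_i > theta_i, else -1

    def strong(feature_values, thetas, weights):
        # F = sign( sum_i w_i h_i(x) ); more distinctive features get larger weights
        s = sum(w * weak(f, t) for f, t, w in zip(feature_values, thetas, weights))
        return 1 if s >= 0 else -1

    print(strong([0.8, -0.3, 0.1], thetas=[0.5, 0.0, 0.2], weights=[0.9, 0.4, 0.2]))   # 1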
Training images
Results
Training
Viola-Jones
Direct Feature Selection (two orders of magnitude faster)
Jianxin Wu, James M. Rehg, Matthew D. Mullin. Learning a Rare Event Detection Cascade by Direct Feature Selection, NIPS 2003.
Using OpenCV detector
1. Collect a database of positive samples and a database of negative samples.
2. Mark objects using objectmarker.exe
3. Build a vec file out of positive samples using createsamples.exe
4. Run haartraining.exe to build the classifier.
5. Run performance.exe to evaluate the classifier.
6. Run haarconv.exe to convert the classifier to an .xml file
Using OpenCV detector
1. Mark positive samples: info.txt
2. Use createsamples.exe to pack the positive samples into an “hw.vec” file:
   createsamples -info info.txt -vec hw.vec -w 15 -h 12
   (The minimum size of the marked object was 15 by 12)
3. Use haartraining.exe to train the classifier:
   haartraining -data hw -vec hw.vec -bg background.txt -mem 100 -w 15 -h 12 -nstages 18
4. Convert the classifier to xml: convert hw hw.xml 15 12
5. Use performance.exe to check the performance:
   performance -data hw.xml -info info.txt -w 15 -h 12 -ni
6. Use the PatternDetector class in Blepo to display the results:
   m_Detector = new PatternDetector(xml_file_name);
7. In the results, you will see an object detected twice or more, with overlap.
from Zhichao Chen
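For comparison, a minimal sketch of running a trained cascade from OpenCV's Python interface (cv2); the xml and image filenames are placeholders:

    import cv2

    cascade = cv2.CascadeClassifier("hw.xml")   # classifier produced by the training steps above
    img = cv2.imread("test.png")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # scan windows over a range of scales; overlapping detections are merged by minNeighbors
    boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3)
    for (x, y, w, h) in boxes:
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imwrite("result.png", img)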
Using OpenCV detector
Result from checking performance:
Here you can see that the classifier detected 469 positive objects and missed 36. The number of false positives is larger (1991), because:
• A positive object might be detected many times at slightly different positions, so some “good” detections are regarded as “false”.
• We only used 18 stages. More stages would reduce the false positives, at the expense of more training time.
• No background image was included for training.
Conclusions:
• Use the proper sample size for training. Basically, the sample size should be similar to the minimum size of the marked object.
• If the FPR is too high, increase the number of stages.
from Zhichao Chen
OpenCV detector links
• Original Viola-Jones paper:
  http://research.microsoft.com/~viola/Pubs/Detect/violaJones_CVPR2001.pdf
• OpenCV library:
  http://sourceforge.net/projects/opencvlibrary
• How to build a cascade of boosted classifiers based on Haar-like features:
  http://lab.cntl.kyutech.ac.jp/~kobalab/nishida/opencv/OpenCV_ObjectDetection_HowTo.pdf
• Objectmarker.exe and haarconv.exe, *.dll:
  http://www.iem.pw.edu.pl/~domanskj/haarkit.rar
from Zhichao Chen
Fisher linear discriminant
http://ocw.mit.edu/NR/rdonlyres/Brain-and-Cognitive-Sciences/9-913Fall-2004/B89E6E21-3DDA-4E70-9107-C66F7B8C7DED/0/class1_2_2004.pdf
Linear SVMs
http://ocw.mit.edu/NR/rdonlyres/Brain-and-Cognitive-Sciences/9-913Fall-2004/B89E6E21-3DDA-4E70-9107-C66F7B8C7DED/0/class1_2_2004.pdf
Non-linear SVMs
http://ocw.mit.edu/NR/rdonlyres/Brain-and-Cognitive-Sciences/9-913Fall-2004/B89E6E21-3DDA-4E70-9107-C66F7B8C7DED/0/class1_2_2004.pdf
Eigenfaces