Real Time Recognition of Room Numbers Chris Card

Overview
 The project’s goal is to correctly recognize a specified
room number without falsely detecting other room
numbers
 This project involved two central steps:
I. Processing the captured frame with a classifier
II. Verifying the classifier’s output with normalized cross
correlation
Classification of Captured Frame
 The classification process has two parts:
 Haar classifier
 OpenCV’s CascadeClassifier class
Haar Classifier
 Primarily used for face detection but works just as
well for static objects (e.g. pencils, room numbers)
 A Haar classifier is built with a set of rectangular
features:
 Two-rectangle features also known as edge features
 Three-rectangle features also known as line features
 Four-rectangle features
 There are more rectangle features but the ones mentioned
are the most common
 The aforementioned rectangular features (or groups of
these features) are evaluated with the help of an
Integral Image
 An Integral Image is an intermediate representation of
the input image: each entry holds the sum of all pixels
above and to the left of it, so the sum over any
rectangle takes only four lookups
 Ex: To calculate the value of a three-rectangle
feature:
 Take the sum of the two outer rectangles
 Then find their difference with the sum of the center
rectangle
 The rectangular feature can also be rotated (usually by
45˚), in which case a second, rotated integral image is
calculated
 This process creates a classifier that is not reliant on
exact pixel intensity values but on the difference of
intensity values
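The integral image and the three-rectangle example above can be sketched in plain C++ without OpenCV. This is a minimal illustration, not the code used in the project: the `Gray` type and the function names are hypothetical, and the feature uses the plain "outer minus center" difference described above, whereas trained cascades usually apply per-rectangle weights.

```cpp
#include <vector>

// Hypothetical minimal grayscale image type for this sketch (row-major pixels).
struct Gray {
    int rows;
    int cols;
    std::vector<int> px;
    int at(int r, int c) const { return px[r * cols + c]; }
};

using Integral = std::vector<std::vector<long long>>;

// Integral image padded with one zero row and one zero column:
// ii[r][c] is the sum of all pixels in rows [0, r) and columns [0, c).
Integral integral_image(const Gray& img) {
    Integral ii(img.rows + 1, std::vector<long long>(img.cols + 1, 0));
    for (int r = 0; r < img.rows; ++r) {
        for (int c = 0; c < img.cols; ++c) {
            ii[r + 1][c + 1] = img.at(r, c) + ii[r][c + 1] + ii[r + 1][c] - ii[r][c];
        }
    }
    return ii;
}

// Sum of the rectangle with top-left corner (x, y), width w and height h,
// obtained from four lookups in the integral image.
long long rect_sum(const Integral& ii, int x, int y, int w, int h) {
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x];
}

// Three-rectangle (line) feature made of three side-by-side strips of width w:
// sum of the two outer strips minus the sum of the center strip.
long long three_rect_feature(const Integral& ii, int x, int y, int w, int h) {
    const long long outer = rect_sum(ii, x, y, w, h) + rect_sum(ii, x + 2 * w, y, w, h);
    const long long center = rect_sum(ii, x + w, y, w, h);
    return outer - center;
}
```

Because every rectangle sum costs four lookups regardless of its size, the feature can be evaluated at any position and scale in constant time once the integral image has been built.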
How to train the classifier
 For static objects typically only one to four initial
positive samples are needed, but for faces
approximately 5,000 initial positive samples are
needed
 697 negative samples were then collected
 Using the opencv_createsamples application, 2000
positive samples were created from the 4 initial
positive samples and the 697 negative samples
 This performs one or more transformations on
one of the initial positive samples and then
overlays the result onto one or more negative
samples
 After the 2000 positive samples are created they
are passed into opencv_traincascade with the
negative samples
 opencv_traincascade trains a cascade classifier
using the Haar feature set
 It was trained with 13 stages, a min hit rate of
98% and a max false alarm rate of 45%
 One problem I found using this is that the
classifier can be overtrained, at which point training
can no longer find new features to compute
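To put the training parameters in perspective: the per-stage rates compound across the cascade, so the overall rates are commonly estimated as the per-stage rate raised to the number of stages. The helper below is a hypothetical sketch of that arithmetic, assuming the stages behave independently; it gives the targets training aims for, not measured performance.

```cpp
#include <cmath>

// Estimated overall rate of a cascade in which every stage meets
// per_stage_rate. Assumes independent stages, which is only an approximation.
double overall_rate(double per_stage_rate, int num_stages) {
    return std::pow(per_stage_rate, num_stages);
}
```

With the values above, `overall_rate(0.98, 13)` is roughly 0.77 for the hit rate and `overall_rate(0.45, 13)` is roughly 0.00003 for the false alarm rate.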
<!-- stage 12 -->
<_>
  <maxWeakCount>6</maxWeakCount>
  <stageThreshold>-1.8979565799236298e-001</stageThreshold>
  <weakClassifiers>
    <_>
      <internalNodes>
        0 -1 39 -7.0739118382334709e-004</internalNodes>
      <leafValues>
        -8.7951809167861938e-001 5.3361600637435913e-001</leafValues></_>
    <_>
      <internalNodes>
        0 -1 39 1.0181085672229528e-003</internalNodes>
      <leafValues>
        3.3164915442466736e-001 -8.6216866970062256e-001</leafValues></_>
    <_>
      <internalNodes>
        0 -1 40 -8.0450221896171570e-002</internalNodes>
      <leafValues>
        6.6421937942504883e-001 -4.3154355883598328e-001</leafValues></_>
    <_>
      <internalNodes>
        0 -1 26 -8.2689840346574783e-003</internalNodes>
      <leafValues>
        3.3077424764633179e-001 -8.1963634490966797e-001</leafValues></_>
    <_>
      <internalNodes>
        0 -1 9 4.5741321519017220e-003</internalNodes>
      <leafValues>
        -8.6653584241867065e-001 2.4597480893135071e-001</leafValues></_>
    <_>
      <internalNodes>
        0 -1 5 -1.5729553997516632e-003</internalNodes>
      <leafValues>
        -9.4442540407180786e-001 2.1286776661872864e-001</leafValues></_></weakClassifiers></_></stages>
<features>
  <_>
    <rects>
      <_>
        0 0 12 24 -1.</_>
      <_>
        4 8 4 8 9.</_></rects>
    <tilted>0</tilted></_>
Using the Classifier
 After creating the classifier it was then loaded into
OpenCV’s CascadeClassifier class
 CascadeClassifier class has a method called
detectMultiScale() which performs the object
detection. It can even detect multiple of the same
object in the frame
room.detectMultiScale(frame_gray,          // The input frame converted to grayscale
                      rooms,               // Vector of rectangles to draw if classification is successful
                      1.1,                 // The scale factor to perform classification by
                      4,                   // Minimum number of neighbors to be classified as a hit
                      CV_HAAR_SCALE_IMAGE, // Flag to control how it searches the input image to perform classification
                      Size(24,24));        // The size of the images that the classifier was trained with
                                           // THE SIZE MUST BE THE SAME VALUE OR ELSE IT WILL FAIL
Using the Classifier (cont)
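To give a feel for what the scale factor of 1.1 and the 24x24 window mean in practice, the sketch below lists the window sizes a multi-scale search would visit. It is an illustration only: `window_sizes` is a hypothetical helper, not an OpenCV function, the 640x480 frame size is an assumption, and OpenCV's own implementation may round sizes or rescale the image instead of the window.

```cpp
#include <algorithm>
#include <vector>

// Side lengths of a square search window that starts at `base` pixels and
// grows by `scale_factor` until it no longer fits inside the frame.
// Hypothetical helper for illustration; not part of OpenCV.
std::vector<double> window_sizes(int frame_w, int frame_h, int base, double scale_factor) {
    std::vector<double> sizes;
    if (base <= 0 || scale_factor <= 1.0) {
        return sizes;  // guard against an empty window or a loop that never ends
    }
    const double limit = std::min(frame_w, frame_h);
    for (double s = base; s <= limit; s *= scale_factor) {
        sizes.push_back(s);
    }
    return sizes;
}
```

For an assumed 640x480 frame, `window_sizes(640, 480, 24, 1.1)` yields 32 window sizes, starting at 24 pixels. A smaller scale factor gives more sizes and a finer search at the cost of more computation per frame.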
 I found a problem with the classifier:
 It will also detect other objects whose %
similarity to the desired object is ≥ the min
hit rate
 In my case there are signs that are 98% similar
to the sign I trained my classifier for
 Therefore, the positive hit has to be verified with
normalized cross correlation
Verifying the Classifier’s Detection
 Normalized cross correlation only gives a score > .72
when the template image is parallel with the matching
part of the input image, so the input image must be
rotated
 Then normalized cross correlation is performed on all the
rotated images using OpenCV’s matchTemplate()
function
 Even though my program has to do all of this to verify
the classifier’s hit, it still runs in close to real
time because I use OpenMP to parallelize the work across 4
threads
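For reference, the score that normalized cross correlation assigns to a single template position can be sketched in plain C++. This is a simplified illustration of the zero-mean, normalized variant (the one selected by CV_TM_CCOEFF_NORMED), not OpenCV's implementation: `ncc` is a hypothetical function that compares one image patch with a template of the same size, both given as flat pixel vectors.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Zero-mean normalized cross correlation of two equally sized patches.
// Returns a value in [-1, 1]; returns 0 for mismatched, empty or flat patches.
double ncc(const std::vector<double>& patch, const std::vector<double>& templ) {
    if (patch.size() != templ.size() || patch.empty()) {
        return 0.0;
    }
    const double n = static_cast<double>(patch.size());
    double mean_p = 0.0;
    double mean_t = 0.0;
    for (std::size_t i = 0; i < patch.size(); ++i) {
        mean_p += patch[i];
        mean_t += templ[i];
    }
    mean_p /= n;
    mean_t /= n;

    double num = 0.0;
    double var_p = 0.0;
    double var_t = 0.0;
    for (std::size_t i = 0; i < patch.size(); ++i) {
        const double dp = patch[i] - mean_p;
        const double dt = templ[i] - mean_t;
        num += dp * dt;
        var_p += dp * dp;
        var_t += dt * dt;
    }
    const double denom = std::sqrt(var_p * var_t);
    return denom > 0.0 ? num / denom : 0.0;
}
```

Subtracting the means and dividing by the norms makes the score insensitive to uniform brightness and contrast changes, but not to rotation, which is why the rotated copies of the input image are needed.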
vector<Mat> _rotate_image(Mat src, int lower, int upper)
{
    vector<Mat> rotated_img;
    Mat rotated, rotated2;
    const int len = max(src.cols, src.rows);
    const Point2d center(len/2., len/2.);
    #pragma omp parallel num_threads(4)
    {
        #pragma omp for private(rotated,rotated2)
        for(int i = lower; i < upper; i++)
        {
            if(omp_debug)
                cout<<"cols: "<<src.cols<<" :: rows: "<<src.rows<<endl;
            //rotation matrices for i degrees and i + 0.5 degrees
            Mat matrix = getRotationMatrix2D(center,i,1.0);
            Mat matrix2 = getRotationMatrix2D(center,(double)(i+.5),1.0);
            //rotates the images
            warpAffine(src,            //source image to rotate
                       rotated,        //location to store the rotated image
                       matrix,         //the matrix representing the rotation
                       Size(len,len)); //size of the output image
            warpAffine(src,rotated2,matrix2,Size(len,len));
            //only one thread at a time may append to the shared vector
            #pragma omp critical
            {
                rotated_img.push_back(rotated.clone());
                rotated_img.push_back(rotated2.clone());
            }
        }
    }
    return rotated_img;
}

bool _match_template(vector<Mat> rotated_imgs, Mat target, double &maxVal, Point &maxLoc)
{
    Mat score;
    double minVal_local, maxVal_local;
    Point minLoc_local, maxLoc_local;
    bool abort = false; //set once any rotation produces a match
    #pragma omp parallel num_threads(4)
    {
        #pragma omp for private(score,minVal_local, maxVal_local,minLoc_local,maxLoc_local)
        for(int i = 0; i < (int)rotated_imgs.size(); i++)
        {
            #pragma omp flush(abort)
            if(!abort)
            {
                matchTemplate(rotated_imgs[i],target, score,CV_TM_CCOEFF_NORMED);
                //scores above .7 become 1, everything else becomes 0
                threshold(score,score,.7, 1,THRESH_BINARY);
                minMaxLoc(score,&minVal_local,&maxVal_local,&minLoc_local,&maxLoc_local);
                #pragma omp critical
                {
                    if(1 == maxVal_local)
                    {
                        maxVal = maxVal_local;
                        maxLoc = maxLoc_local;
                        abort = true;
                        #pragma omp flush(abort)
                    }
                    else if(!abort)
                    {
                        maxVal = maxVal_local;
                        maxLoc = maxLoc_local;
                    }
                }
            }
        }
    }
    return abort;
}
Conclusion
 With verification, my program doesn’t report any false
hits to the user: 100% of the detections it reports are
the correct sign
 However my classifier’s output, without validation, will
classify the object correctly ≈26% of the time and will
get it wrong ≈73% of the time (This is only from one
run of the program)
 I would like to improve the accuracy and reliability of my
classifier by trying different types of classifiers (e.g. HOG
or LDP) or changing how I train the Haar classifier
Output from live
webcam feed
Questions?