Real-Time Recognition of Room Numbers
Chris Card

Overview

The project's goal is to correctly recognize a specified room number without falsely detecting other room numbers. The project involved two central steps:
I. Processing the captured frame with a classifier
II. Verifying the classifier's output with normalized cross-correlation

Classification of the Captured Frame

The classification process has two parts:
- A Haar classifier
- OpenCV's CascadeClassifier class

Haar Classifier

Haar classifiers are primarily used for facial recognition, but they work just as well for static objects (e.g., pencils, room numbers, etc.). A Haar classifier is built from a set of rectangular features:
- Two-rectangle features, also known as edge features
- Three-rectangle features, also known as line features
- Four-rectangle features
There are other rectangle features, but the ones above are the most common.

These rectangular features (or groups of them) are evaluated using an Integral Image, an intermediate representation of the input image in which the sum of any rectangle can be computed in constant time. Ex: to evaluate a three-rectangle feature, take the sum (or mean) of the intensities in the two outer rectangles, then find the difference between that and the sum (or mean) of the center rectangle. A rectangular feature can also be rotated (usually by 45°), in which case a second, rotated integral image is calculated. This process creates a classifier that relies not on exact pixel intensity values but on differences of intensity values. (A small code sketch of this computation appears after the training discussion below.)

How to train the classifier

- For static objects, typically only one to four initial positive samples are needed; for faces, approximately 5,000 initial positive samples are needed.
- 697 negative samples were collected.
- Using the opencv_createsamples application, 2,000 positive samples were created from the 4 initial positive samples and the 697 negative samples. The application performs one or more transformations on an initial positive sample and then overlays the result onto one or more negative samples.
- After the 2,000 positive samples are created, they are passed to opencv_traincascade along with the negative samples. opencv_traincascade trains a cascade classifier using the Haar feature set.
- The cascade was trained with 13 stages, a minimum hit rate of 98%, and a maximum false-alarm rate of 45%.
- One problem I found: the classifier can be overtrained, to the point where it can no longer find new features to compute.

Excerpt from the trained cascade XML (stage 12 and one feature):

<!-- stage 12 -->
<_>
  <maxWeakCount>6</maxWeakCount>
  <stageThreshold>-1.8979565799236298e-001</stageThreshold>
  <weakClassifiers>
    <_>
      <internalNodes>0 -1 39 -7.0739118382334709e-004</internalNodes>
      <leafValues>-8.7951809167861938e-001 5.3361600637435913e-001</leafValues></_>
    <_>
      <internalNodes>0 -1 39 1.0181085672229528e-003</internalNodes>
      <leafValues>3.3164915442466736e-001 -8.6216866970062256e-001</leafValues></_>
    <_>
      <internalNodes>0 -1 40 -8.0450221896171570e-002</internalNodes>
      <leafValues>6.6421937942504883e-001 -4.3154355883598328e-001</leafValues></_>
    <_>
      <internalNodes>0 -1 26 -8.2689840346574783e-003</internalNodes>
      <leafValues>3.3077424764633179e-001 -8.1963634490966797e-001</leafValues></_>
    <_>
      <internalNodes>0 -1 9 4.5741321519017220e-003</internalNodes>
      <leafValues>-8.6653584241867065e-001 2.4597480893135071e-001</leafValues></_>
    <_>
      <internalNodes>0 -1 5 -1.5729553997516632e-003</internalNodes>
      <leafValues>-9.4442540407180786e-001 2.1286776661872864e-001</leafValues></_>
  </weakClassifiers></_></stages>
<features>
  <_>
    <rects>
      <_>0 0 12 24 -1.</_>
      <_>4 8 4 8 9.</_></rects>
    <tilted>0</tilted></_>
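For reference, the training pipeline described above corresponds roughly to invocations like the following. This is a sketch, not the project's exact commands: the file names (room.jpg, negatives.txt, samples.vec, cascade/) are placeholders, and the numeric flags simply restate the figures quoted above.

# create ~2000 distorted positives from one seed image over the negatives
# (with 4 seed images, this would run once per seed, merging the .vec files)
opencv_createsamples -img room.jpg -bg negatives.txt -vec samples.vec \
    -num 2000 -w 24 -h 24

# train a 13-stage Haar cascade with a 98% per-stage hit rate and a
# 45% per-stage false-alarm rate on 24x24 windows
opencv_traincascade -data cascade/ -vec samples.vec -bg negatives.txt \
    -numPos 1800 -numNeg 697 -numStages 13 -featureType HAAR \
    -minHitRate 0.98 -maxFalseAlarmRate 0.45 -w 24 -h 24

Note that -numPos is set below the total number of samples in the .vec file because opencv_traincascade consumes extra positives at each stage. Also, the per-stage rates compound across the 13 stages: the overall hit rate is about 0.98^13 ≈ 0.77 and the overall false-alarm rate about 0.45^13 ≈ 3×10⁻⁵, which is why a seemingly high 45% per-stage false-alarm rate is acceptable.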
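To make the integral-image computation concrete, here is a minimal C++ sketch (my own illustration, not project code; the input file name and rectangle coordinates are placeholders). It evaluates one three-rectangle "line" feature using cv::integral, so each rectangle sum costs only four table lookups regardless of the rectangle's size:

#include <opencv2/opencv.hpp>
using namespace cv;

// Sum of the pixel intensities inside rect r, read from integral image ii
// in O(1): four corner lookups instead of a loop over every pixel.
static double rectSum(const Mat &ii, Rect r)
{
    return ii.at<double>(r.y, r.x)
         + ii.at<double>(r.y + r.height, r.x + r.width)
         - ii.at<double>(r.y, r.x + r.width)
         - ii.at<double>(r.y + r.height, r.x);
}

int main()
{
    Mat gray = imread("frame.png", 0); // placeholder input, loaded as grayscale
    if (gray.empty()) return 1;
    Mat ii;
    integral(gray, ii, CV_64F); // ii has one extra row and column

    // A three-rectangle (line) feature: two outer rectangles vs. the center one.
    Rect left(10, 10, 8, 24), center(18, 10, 8, 24), right(26, 10, 8, 24);
    double feature = rectSum(ii, left) + rectSum(ii, right) - rectSum(ii, center);

    // The cascade thresholds values like this one, so classification depends
    // on differences of intensities, not on exact pixel values.
    return feature > 0;
}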
Using the Classifier

After creating the classifier, it is loaded into OpenCV's CascadeClassifier class. CascadeClassifier has a method called detectMultiScale(), which performs the object detection; it can even detect multiple instances of the same object in the frame.

room.detectMultiScale(frame_gray,          // the input frame, converted to grayscale
                      rooms,               // vector of rectangles, filled on a successful classification
                      1.1,                 // scale factor between search scales
                      4,                   // minimum number of neighbors required to report a hit
                      CV_HAAR_SCALE_IMAGE, // flag controlling how the classifier searches the input image
                      Size(24, 24));       // the size the classifier was trained with;
                                           // IT MUST BE THE SAME VALUE OR ELSE DETECTION WILL FAIL

Using the Classifier (cont.)

I found a problem with the classifier: it will also detect other objects whose similarity to the desired object is ≥ the minimum hit rate. In my case there are signs that are 98% similar to the sign I trained my classifier on. Therefore, a positive hit has to be verified with normalized cross-correlation.

Verifying the Classifier's Detection

Because the template image must be parallel to the matching part of the input image to score a hit of > 0.72, the input image must be rotated through a range of angles. Normalized cross-correlation is then performed on all of the rotated images using OpenCV's matchTemplate() method. Even though the program has to do all of this to verify each of the classifier's hits, it still runs in near real time because OpenMP is used to parallelize the work across 4 threads. (A sketch of how these routines fit into the main loop appears after the code below.)

// Assumed surrounding context for these excerpts:
#include <opencv2/opencv.hpp>
#include <omp.h>
#include <iostream>
#include <vector>
using namespace cv;
using namespace std;
extern bool omp_debug; // project-wide debug flag

vector<Mat> _rotate_image(Mat src, int lower, int upper)
{
    vector<Mat> rotated_img;
    Mat rotated, rotated2;
    const int len = max(src.cols, src.rows);
    const Point2d center(len / 2., len / 2.);
    #pragma omp parallel num_threads(4)
    {
        #pragma omp for private(rotated, rotated2)
        for (int i = lower; i < upper; i++) {
            if (omp_debug)
                cout << "cols: " << src.cols << " :: rows: " << src.rows << endl;
            // rotation matrices for angles i and i + 0.5 degrees
            Mat matrix  = getRotationMatrix2D(center, i, 1.0);
            Mat matrix2 = getRotationMatrix2D(center, i + 0.5, 1.0);
            // rotate the image
            warpAffine(src,             // source image to rotate
                       rotated,         // destination for the rotated image
                       matrix,          // the matrix representing the rotation
                       Size(len, len)); // size of the output image
            warpAffine(src, rotated2, matrix2, Size(len, len));
            #pragma omp critical
            {
                rotated_img.push_back(rotated.clone());
                rotated_img.push_back(rotated2.clone());
            }
        }
    }
    return rotated_img;
}

bool _match_template(vector<Mat> rotated_imgs, Mat target, double &maxVal, Point &maxLoc)
{
    Mat score;
    double minVal_local, maxVal_local;
    Point minLoc_local, maxLoc_local;
    bool abort = false;
    #pragma omp parallel num_threads(4)
    {
        #pragma omp for private(score, minVal_local, maxVal_local, minLoc_local, maxLoc_local)
        for (int i = 0; i < (int)rotated_imgs.size(); i++) {
            #pragma omp flush(abort)
            if (!abort) {
                // normalized cross-correlation of the template against this rotation
                matchTemplate(rotated_imgs[i], target, score, CV_TM_CCOEFF_NORMED);
                // binarize: any score above 0.7 becomes exactly 1
                threshold(score, score, .7, 1, THRESH_BINARY);
                minMaxLoc(score, &minVal_local, &maxVal_local, &minLoc_local, &maxLoc_local);
                #pragma omp critical
                {
                    if (1 == maxVal_local) {
                        // confident hit: record it and tell the other threads to stop
                        maxVal = maxVal_local;
                        maxLoc = maxLoc_local;
                        abort = true;
                        #pragma omp flush(abort)
                    } else if (!abort && maxVal_local > maxVal) {
                        // keep the best score so far (caller initializes maxVal to 0)
                        maxVal = maxVal_local;
                        maxLoc = maxLoc_local;
                    }
                }
            }
        }
    }
    return abort;
}
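To show how the pieces fit together, here is a sketch of the capture, classify, and verify loop. It is my reconstruction, not the project's actual main(): the cascade path, template file, ±5° angle range, and window name are assumptions, and the template must be no larger than the detected patch for matchTemplate() to work.

#include <opencv2/opencv.hpp>
#include <vector>
using namespace cv;
using namespace std;

// Defined above: rotate a patch through small angles, then template-match it.
vector<Mat> _rotate_image(Mat src, int lower, int upper);
bool _match_template(vector<Mat> rotated_imgs, Mat target,
                     double &maxVal, Point &maxLoc);

int main()
{
    CascadeClassifier room;
    if (!room.load("cascade.xml")) return 1;     // trained cascade (assumed path)
    Mat target = imread("room_template.png", 0); // desired sign's template (assumed)
    if (target.empty()) return 1;

    VideoCapture cap(0);                         // live webcam feed
    Mat frame, frame_gray;
    while (cap.read(frame)) {
        cvtColor(frame, frame_gray, CV_BGR2GRAY);

        vector<Rect> rooms;
        room.detectMultiScale(frame_gray, rooms, 1.1, 4,
                              CV_HAAR_SCALE_IMAGE, Size(24, 24));

        for (size_t i = 0; i < rooms.size(); i++) {
            // verify each candidate with rotated normalized cross-correlation
            vector<Mat> rotations = _rotate_image(frame_gray(rooms[i]), -5, 5);
            double maxVal = 0;
            Point maxLoc;
            if (_match_template(rotations, target, maxVal, maxLoc))
                rectangle(frame, rooms[i], Scalar(0, 255, 0), 2); // verified hit
        }

        imshow("room detection", frame);
        if (waitKey(1) == 27) break;             // Esc exits
    }
    return 0;
}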
Conclusion

With verification in place, my program reports no false hits to the user; when it reports a detection, it is the correct sign 100% of the time. Without verification, however, the classifier's raw output identifies the object correctly only ≈26% of the time and incorrectly ≈73% of the time (measured over a single run of the program). I would like to improve the accuracy and reliability of my classifier by trying different types of classifiers (e.g., HOG or LBP) or by changing how I train the Haar classifier.

Output from live webcam feed

Questions?