BABEŞ-BOLYAI UNIVERSITY CLUJ-NAPOCA
FACULTY OF MATHEMATICS AND COMPUTER SCIENCE
SPECIALIZATION: FORMAL METHODS IN PROGRAMMING

Pattern Recognition

Author: Mezei Sergiu-Vlad
2010

Table of Contents

1. Introduction
   1.1. Pattern Recognition
   1.2. Computer Vision
2. OpenCV (Open Source Computer Vision)
3. Pattern Recognition
   3.1. Classifiers
      3.1.1. Decision Functions
      3.1.2. Statistical Approach
      3.1.3. Fuzzy Classifiers
      3.1.4. Syntactic Approach
      3.1.5. Neural Networks
   3.2. Traffic Sign Recognition Using OpenCV
4. Recent Applications
5. Conclusions
6. Bibliography

Abstract

Pattern recognition is an activity at which humans normally excel. We do it almost all the time, without realizing it. We receive information through our various senses, and it is processed almost instantaneously by our mind so that we can identify its source without any perceptible effort. What is even more impressive is our ability to perform recognition tasks accurately even in non-ideal conditions, for example when the received information is ambiguous, imprecise or incomplete. In fact, most of our day-to-day activities depend on our success at various pattern recognition tasks. For example, when we read, we recognize letters, words and, ultimately, concepts and notions from the visual signals received by our mind, which processes them extremely fast, probably through a neurobiological implementation of template matching. This paper intends to provide insight into the way a machine can perform recognition tasks in order to perceive the environment as a human being does.

1. Introduction

1.1. Pattern Recognition

The subject of Pattern Recognition (PR), or pattern recognition by machine, deals with the problem of constructing methods and algorithms that enable the machine implementation of various recognition tasks that a human normally performs. The motivation is to find ways in which these tasks can be performed faster, more accurately and perhaps more economically than a human can.
The purpose of PR also includes tasks at which humans are not good, like reading bar codes. The goal of pattern recognition research is to devise ways and means of automating certain decision-making processes that lead to classification and recognition. For example, a sorting factory can use these algorithms to separate red apples from green ones travelling on a conveyor belt, using a video camera as the "eye" of the machine and a computer that processes the input from the camera to determine whether each apple is red or green. Pattern recognition has been an intensely studied field for the past few decades, as is amply borne out by numerous books (see, for example, References [1, 2, 3, 4, 5, 6, 7]).

The object inspected in the recognition process is called a pattern. We normally refer to a pattern as a description of an object that we want to recognize. In this paper we are interested in spatial patterns such as humans, apples, fingerprints, electrocardiograms or chromosomes. In most cases a pattern recognition problem is a problem of discriminating between different populations, so the recognition process turns into classification. To determine which class an object belongs to, we must first find the features that will determine this classification.

In constructing a pattern recognition system, i.e. a system able to take an unknown incoming pattern and classify it into one (or more) of several given classes, we clearly want to employ all the related information accumulated previously. We assume that some sample patterns with known classification are available. These patterns, with their typical attributes, form a training set, which provides relevant information on how to associate input information with decision making. By using the training set, the pattern recognition system may learn various types of information, such as statistical parameters, relevant features, etc.

1.2. Computer Vision

Computer vision is the science that helps an artificial system sense the surrounding environment the way a human does through sight. It develops a basis through which we can automatically process and obtain data from an image or a sequence of images (a video stream). The artificial vision system, implemented in software and hardware, goes hand in hand with research on biological vision. This is an extremely difficult task requiring an understanding of human perception, so we can say that computer vision is a polyvalent domain: sciences such as psychology, neurophysiology, cognitive science, geometry and areas of physics such as optics, artificial intelligence and pattern recognition are included and, at some level, considered prerequisites for computer vision.

Computer vision is a wide and relatively new field of study. In the early days it was difficult to process even small sets of image data. Earlier work exists, starting from the 1950s, but it was done in a simpler and more theoretical way; it was not until the late 1970s, when computers could manage the processing of large data sets such as images and video streams, that a more focused study of the field appeared on the computing stage. Computer vision covers a wide range of topics that are often related to other disciplines, and consequently there is no standard formulation of "the computer vision problem". Moreover, there is no standard formulation of how computer vision problems should be solved. There is an abundance of methods for solving various well-defined computer vision tasks, but the methods are often very task specific and can seldom be generalized over a wide range of applications.
Many of the methods and applications are still at the stage of basic research, but more and more methods have found their way into commercial products, where they often constitute part of a larger system that solves complex tasks (e.g., in the area of medical images, or quality control and measurements in industrial processes). In most practical computer vision applications the computers are pre-programmed to solve a particular task, but methods based on learning are now becoming increasingly common.

As a technological discipline, computer vision seeks to apply its theories and models to the construction of computer vision systems. Examples of applications of computer vision include systems for:

- controlling processes (e.g. an industrial robot or an autonomous vehicle);
- detecting events (e.g. for visual surveillance or people counting);
- organizing information (e.g. for indexing databases of images and image sequences);
- modeling objects or environments (e.g. industrial inspection, medical image analysis or topographical modeling);
- interaction (e.g. as the input to a device for computer-human interaction).

Sub-domains of computer vision include scene reconstruction, event detection, video tracking, object recognition, learning, indexing, motion estimation, and image restoration.

2. OpenCV (Open Source Computer Vision)

OpenCV is an open source computer vision library. The library is written in C and C++ and runs under Linux, Windows and Mac OS X. There is active development on interfaces for Python, Ruby, Matlab, and other languages. OpenCV was designed for computational efficiency and with a strong focus on real-time applications; it is written in optimized C and can take advantage of multicore processors [13, 14]. The OpenCV library has over 500 functions that span many fields of vision, including factory product inspection, medical imaging, security, user interfaces, camera calibration, stereo vision, and robotics.
Officially launched in 1999, the OpenCV project was initially an Intel Research initiative to advance CPU-intensive applications, part of a series of projects that included real-time ray tracing and 3D display walls. The main contributors to the project included Intel's Performance Library Team, as well as a number of optimization experts at Intel Russia. In the early days of OpenCV, the goals of the project were described as:

- advance vision research by providing not only open but also optimized code for basic vision infrastructure (no more reinventing the wheel);
- disseminate vision knowledge by providing a common infrastructure that developers could build on, so that code would be more readily readable and transferable;
- advance vision-based commercial applications by making portable, performance-optimized code available for free, with a license that did not require commercial applications to be open or free themselves.

The first alpha version of OpenCV was released to the public at the IEEE Conference on Computer Vision and Pattern Recognition in 2000, and five betas were released between 2001 and 2005. The first 1.0 version was released in 2006. In mid-2008 OpenCV obtained corporate support from Willow Garage and is now again under active development. A version 1.1 "pre-release" was released in October 2008, and a book by two authors of OpenCV, published by O'Reilly Media, went on the market that same month (see Learning OpenCV: Computer Vision with the OpenCV Library). The second major release of OpenCV came in October 2009: OpenCV 2 includes major changes to the C++ interface, aiming at easier, more type-safe patterns, new functions, and better implementations of existing ones in terms of performance (especially on multi-core systems). Since then, OpenCV has been under heavy development and is currently at version 2.1.
Now anybody can contribute to OpenCV development through the public SVN repository, which developers can update with their contributions. Here is a list of the most important domains that are covered:

- statistics and moments computing
- structural analysis
- motion analysis and object tracking
- pattern recognition
- graphic interface
- general image processing and analysis functions
- segmentation
- geometric descriptors
- transforms
- machine learning: detection, recognition
- matrix math
- image pyramids
- camera calibration, stereo, 3D
- utilities and data structures
- fitting

Currently there are no standard APIs for computer vision; the only software to be found is research code (which is slow and unstable), expensive commercial toolkits (like Halcon or Matlab with Simulink) or specialized solutions made for video surveillance, medical equipment, etc. There is therefore no standard library that simplifies the development of computer vision applications, and OpenCV was created to serve as a solution to this problem. Created by the Intel Corporation, it is especially optimized for Intel processors but, as an open source project, it is usable on any architecture. The algorithms are fast in part because the library also has the purpose of demonstrating the capabilities of Intel processors.

3. Pattern Recognition

3.1. Classifiers

The final goal in pattern recognition is the classification of a pattern. From the original information that we obtain about the pattern we first identify the features and then use a feature extractor to measure them. These measurements are then sent to a classifier, which performs the actual classification (determines to which of the existing classes the pattern belongs). In this section we assume the presence of natural grouping, meaning that we have some knowledge about the classes and the data.
For example, we may know the exact or approximate number of classes and the correct classification of some given patterns, which are called the training patterns.

3.1.1. Decision Functions

When the number of classes is known, and when the training patterns are such that there is geometrical separation between the classes, we can often use a set of decision functions to classify an unknown pattern. Consider, for example, the case of two classes, Class1 and Class2, which lie in R^n, and a hyperplane d(x) = 0 that separates them. Then we can use the decision function d(x) as a linear classifier and classify a new pattern x by:

d(x) > 0 => x belongs to Class1
d(x) < 0 => x belongs to Class2

The hyperplane d(x) = 0 is called a decision boundary. When a set of hyperplanes can separate m given classes in R^n, these classes are linearly separable. Sometimes we cannot classify elements using only linear decision functions; in that case we can either use generalized (nonlinear) decision functions, use a nonlinear classifier, or transform the problem into a space of much higher dimension where the classification can be done with linear boundaries.

3.1.2. Statistical Approach

The training patterns of various classes often overlap, for example when they originate from some statistical distributions [3, 4]. In this case a statistical approach is needed, especially when the distribution functions of the classes are known. A statistical classifier must also evaluate the risk associated with every classification, which measures the probability of misclassification. The Bayes classifier [3, 5, 6, 10], based on Bayes' formula from probability theory, minimizes the total expected risk. In order to use the Bayes classifier one must know beforehand the pattern distribution function for each class; if these distributions are unknown, they must at least be approximated using the training patterns.

3.1.3. Fuzzy Classifiers

There are times when classification is performed with some degree of uncertainty. Either the classification outcome itself may be in doubt, or the classified pattern x may belong to some degree to more than one class. For example, a medium apple cannot be considered to fully belong to the class "big", but at the same time it cannot be fully accepted in the class "small" (provided only these two classes exist). So we naturally introduce fuzzy classification, where a pattern is a member of every class with some degree of membership between 0 and 1 [1, 2, 9].

3.1.4. Syntactic Approach

Unlike the previous approaches, syntactic pattern recognition utilizes the structure of the patterns [7]. Instead of carrying out an analysis based strictly on quantitative characteristics of the pattern, we emphasize the interrelationships between the primitives, the components that compose the pattern. Typical patterns that are subject to syntactic pattern recognition research are characters, fingerprints, chromosomes, etc.

3.1.5. Neural Networks

The neural network approach assumes, like the approaches before it, that a set of training patterns and their correct classification is given [11, 12]. The architecture of the net, which includes an input layer, an output layer and hidden layers, may be very complex. It is characterized by a set of weights and an activation function that determine how the input information is transmitted to the output layer. The neural network is trained on the training patterns, adjusting the weights until the correct classifications are obtained; it is then used to classify arbitrary unknown patterns. There are several neural net classifiers, but the most used and simplest one is the perceptron.

Pattern recognition and classification have been used in numerous applications. In the following pages of this paper I would like to present an application that could revolutionize the way we travel.
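The linear decision function of Section 3.1.1 and the perceptron of Section 3.1.5 can be tied together in a short sketch: the perceptron learning rule adjusts the weights of d(x) = w·x + b until every training pattern falls on the correct side of the decision boundary. This is a minimal illustration, not part of the application described later; the toy data, learning rate and epoch count are assumptions chosen for the example.

```python
# Minimal perceptron: learns a linear decision function d(x) = w.x + b
# on a small, linearly separable toy data set (illustrative values).

def train_perceptron(samples, labels, epochs=100, lr=0.1):
    """Learn weights w and bias b so that sign(d(x)) matches labels (+1/-1)."""
    n = len(samples[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            d = sum(wi * xi for wi, xi in zip(w, x)) + b  # decision function d(x)
            if y * d <= 0:                  # misclassified: adjust the weights
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:                     # every training pattern classified
            break
    return w, b

def classify(w, b, x):
    """d(x) > 0 -> Class1 (+1), otherwise Class2 (-1)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Two linearly separable classes in R^2 (hypothetical feature vectors).
X = [(2.0, 2.0), (3.0, 3.0), (-2.0, -1.0), (-3.0, -2.0)]
y = [1, 1, -1, -1]
w, b = train_perceptron(X, y)
assert all(classify(w, b, x) == yi for x, yi in zip(X, y))
```

Because the classes here are linearly separable, the perceptron convergence theorem guarantees the loop terminates with all training patterns correctly classified; on non-separable data one of the nonlinear alternatives from Section 3.1.1 would be needed.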
3.2. Traffic Sign Recognition Using OpenCV

Traffic signs regulate traffic, warn the driver, and command or forbid certain actions. Fast, robust, real-time automatic traffic sign detection and recognition can support and simplify the tasks a driver has to perform and significantly increase driving safety and comfort. Generally, traffic signs provide the driver with a variety of information for safe and efficient navigation. Automatic recognition of traffic signs is therefore important for automated intelligent vehicles and for driver assistance systems. However, identification of traffic signs under the various natural background and viewing conditions still remains a challenging task.

Traffic sign recognition systems have usually been developed in two specific phases [15-21]. The first phase deals with the detection of traffic signs in a video sequence or an image using image processing; the second deals with the recognition of the detected signs, typically performed with an artificial neural network. The detection algorithms are normally based on shape or color segmentation, and the segmented potential regions are extracted as input to the recognition stage. The efficiency and speed of the detection play important roles in the system. Various methods for automatic traffic sign identification have been developed and show promising results; neural networks in particular are a technology widely used in traffic sign recognition [15, 17, 20, 21], and the automatic recognition of signs is one specific area in which many neural network applications have been developed. Fig. 1 illustrates the concept of an automated intelligent vehicle or driver assistance system.
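The color-segmentation side of the detection phase can be illustrated with a deliberately simplified sketch: flag the pixels whose red channel clearly dominates, then take the bounding box of the flagged region as a candidate sign location. The dominance ratio and the tiny synthetic image are illustrative assumptions, not values from the system described here.

```python
# Crude colour segmentation: mark "red-dominant" pixels and report the
# bounding box of the marked region as a candidate traffic sign location.

def red_mask(image, ratio=1.5):
    """Binary mask of pixels where red clearly dominates green and blue."""
    return [[1 if r > ratio * max(g, b, 1) else 0 for (r, g, b) in row]
            for row in image]

def bounding_box(mask):
    """Smallest (row0, col0, row1, col1) rectangle containing all 1-pixels."""
    coords = [(i, j) for i, row in enumerate(mask)
              for j, v in enumerate(row) if v]
    if not coords:
        return None
    rows = [i for i, _ in coords]
    cols = [j for _, j in coords]
    return min(rows), min(cols), max(rows), max(cols)

# 4x4 synthetic image: a 2x2 red patch on a grey background.
grey, red = (80, 80, 80), (200, 30, 30)
img = [[grey, grey, grey, grey],
       [grey, red,  red,  grey],
       [grey, red,  red,  grey],
       [grey, grey, grey, grey]]
assert bounding_box(red_mask(img)) == (1, 1, 2, 2)
```

A real detector would of course work in a more robust color space and combine this with the shape checks described below, but the principle of reducing the search space to a few candidate regions is the same.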
[Fig. 1: The concept of an automated intelligent vehicle or driver assistance system. Human visual recognition processing is mirrored by an automatic road sign detection and recognition system, in which neural networks handle the traffic sign recognition task (guiding, warning, or regulating) to make driving safer and easier.]

The difficulties of traffic sign detection and recognition are tied to the performance of the system in real time: highly efficient algorithms and powerful hardware are required [17]. Furthermore, the environmental constraints include lighting, shadows, occlusion, air pollution and weather conditions (sunny, rainy, foggy, etc.), as well as additional image distortions such as motion blur, vehicle vibration, and abrupt contrast changes, which can occur frequently in an actual system [17, 21].

The detection and recognition of traffic signs has recently been studied in many research centers. A vision system for traffic sign recognition integrated into an autonomous vehicle was developed as part of the European research project PROMETHEUS at the DAIMLER-BENZ Research Center [17]. Many other techniques have been developed for road sign recognition. For example, Pacheco et al. [22] used special color barcodes under road signs to detect them in a vision-based system; however, this took a lot of time and resources. A genetic algorithm was proposed by Aoyagi and Asakura [23] to identify road signs in gray-level images, but because of the limitations of its crossover and mutation operators it is not guaranteed to reach an optimal solution. Color indexing was proposed by Lalonde and Li [24] to identify road signs; unfortunately, its computation time was not acceptable in complex traffic scenes.

This application represents a real implementation using an intelligent vehicle.
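Color indexing of the kind attributed to Lalonde and Li [24] compares a candidate region against stored sign models by color-histogram similarity. A minimal sketch using histogram intersection follows; the coarse RGB quantization and the sample pixel lists are illustrative assumptions, not their actual parameters.

```python
# Colour indexing sketch: compare normalized colour histograms of a stored
# sign model and candidate regions using histogram intersection.

def histogram(pixels, bins=4):
    """Normalized histogram over coarsely quantized RGB colours."""
    counts = {}
    for r, g, b in pixels:
        key = (r * bins // 256, g * bins // 256, b * bins // 256)
        counts[key] = counts.get(key, 0) + 1
    total = len(pixels)
    return {k: v / total for k, v in counts.items()}

def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical colour distributions."""
    return sum(min(h1.get(k, 0.0), h2.get(k, 0.0)) for k in set(h1) | set(h2))

# Hypothetical pixel samples: a mostly red sign model, a similar region,
# and an unrelated blue region.
model = histogram([(200, 30, 30)] * 3 + [(240, 240, 240)])
same  = histogram([(210, 40, 20)] * 3 + [(250, 250, 250)])
other = histogram([(30, 30, 200)] * 4)
assert intersection(model, same) > intersection(model, other)
```

The appeal of the method is its invariance to small shifts and rotations; its weakness, as noted above, is the cost of matching every candidate region against many models in a complex scene.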
The main objective is to reduce the search space and indicate only potential regions, thereby increasing the efficiency and speed of the system. A more robust and faster intelligent algorithm is required to provide the necessary accuracy in recognizing traffic signs. In the detection phase, the acquired image is preprocessed, enhanced, and segmented according to the color and shape properties of the signs. The traffic sign images are analyzed to detect potential pixel regions that could be recognized as possible road signs against the complex background. The potential objects are then normalized to a specified size and passed as input to the recognition phase. This study investigates only circular and hexagonal objects, because these shapes are present in many types of traffic signs. A multilayer perceptron (MLP) trained with the backpropagation learning algorithm is the method chosen to recognize the signs in this work. The image processing tool used for this application is the free, non-commercial Intel Open Source Computer Vision Library (OpenCV) [14].

The proposed application follows these steps to achieve the end result:

1. The images are preprocessed in stages with image processing techniques such as thresholding, Gaussian filtering, Canny edge detection, contour extraction and ellipse fitting. Each image is reduced to a small region containing the traffic sign, called a "blob". The main reason for this is to reduce the computational cost and thus facilitate real-time implementation. This step is carried out with the OpenCV library, which provides the algorithms needed to process the acquired images.
2. The neural network stages are performed to recognize the traffic sign patterns.
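Two of the preprocessing stages named above, Gaussian filtering and thresholding, can be sketched in a few lines. This is a toy illustration of what those stages compute; the kernel, the threshold value and the tiny grayscale image are assumptions for the example, while the real application uses the corresponding optimized OpenCV routines.

```python
# Preprocessing sketch: 3x3 Gaussian smoothing followed by thresholding
# into a binary image, on a tiny synthetic grayscale image.

KERNEL = [[1, 2, 1],
          [2, 4, 2],
          [1, 2, 1]]  # 3x3 Gaussian approximation; weights sum to 16

def gaussian_blur(img):
    """Smooth interior pixels; border pixels are kept unchanged for brevity."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            acc = sum(KERNEL[di + 1][dj + 1] * img[i + di][j + dj]
                      for di in (-1, 0, 1) for dj in (-1, 0, 1))
            out[i][j] = acc // 16
    return out

def threshold(img, t=128):
    """Binary image: 1 where the pixel is brighter than t."""
    return [[1 if p > t else 0 for p in row] for row in img]

img = [[0, 0, 0, 0],
       [0, 255, 255, 0],
       [0, 255, 255, 0],
       [0, 0, 0, 0]]
binary = threshold(gaussian_blur(img))
assert binary[1][1] == 1 and binary[0][0] == 0
```

The smoothing suppresses pixel noise before the binary decision, which is why the Gaussian filter precedes thresholding and edge detection in the pipeline above.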
The first strategy is to reduce the number of MLP inputs by preprocessing the traffic sign image; the second is to search for the best network architecture by selecting a suitable error criterion for training. The system is trained on a training data set and validated on a validating data set to find a suitable network architecture; the cross-validation technique is implemented with a training set, a validating set, and a test set. The experiments show consistent results and accurate classification of traffic sign patterns against complex background images, and the per-frame processing time is low enough for the system to be applied in a real intelligent vehicle or driver assistance application.

4. Recent Applications

In the last few years there has been a lot of interest in the field of computer vision and pattern recognition from large organizations like DARPA, the research and development office of the U.S. Department of Defense. DARPA's mission is to maintain the technological superiority of the U.S. military and prevent technological surprise from harming U.S. national security. DARPA funds unique and innovative research through the private sector, academic and other non-profit organizations, as well as government labs.

In 2007 DARPA hosted an autonomous vehicle race through traffic called the Urban Challenge. The DARPA Urban Challenge was held on November 3rd, 2007, at the former George AFB in Victorville, California. Building on the success of the 2004 and 2005 Grand Challenges, this event required teams to build an autonomous vehicle capable of driving in traffic and performing complex maneuvers such as merging, passing, parking and negotiating intersections. The event was truly groundbreaking as the first time autonomous vehicles interacted with both manned and unmanned vehicle traffic in an urban environment.
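The recognition phase described above, an MLP trained with backpropagation and checked against a held-out validation set, can be sketched at toy scale. The network size, learning rate and the two-class toy data below are assumptions for the example and not the architecture used in the actual application; the point is only to show the backpropagation update and the train/validate split behind the cross-validation strategy.

```python
# Toy MLP with one hidden layer, trained by backpropagation on a small
# training set and evaluated on a separate validation set.
import math
import random

random.seed(0)  # deterministic illustrative run

def init(n_in, n_hidden, n_out):
    """Small random weights for a 1-hidden-layer network (bias folded in)."""
    rnd = lambda n: [random.uniform(-0.5, 0.5) for _ in range(n)]
    W1 = [rnd(n_in + 1) for _ in range(n_hidden)]
    W2 = [rnd(n_hidden + 1) for _ in range(n_out)]
    return W1, W2

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(W1, W2, x):
    h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in W1]
    o = [sigmoid(sum(w * v for w, v in zip(row, h + [1.0]))) for row in W2]
    return h, o

def train_step(W1, W2, x, target, lr=0.5):
    """One backpropagation update for a single pattern (squared error)."""
    h, o = forward(W1, W2, x)
    d_out = [(t - oi) * oi * (1 - oi) for t, oi in zip(target, o)]
    d_hid = [hi * (1 - hi) * sum(d * W2[k][j] for k, d in enumerate(d_out))
             for j, hi in enumerate(h)]
    for k, row in enumerate(W2):
        for j, v in enumerate(h + [1.0]):
            row[j] += lr * d_out[k] * v
    for j, row in enumerate(W1):
        for i, v in enumerate(x + [1.0]):
            row[i] += lr * d_hid[j] * v

def mse(W1, W2, data):
    return sum((tgt[0] - forward(W1, W2, x)[1][0]) ** 2
               for x, tgt in data) / len(data)

# Toy two-class problem (labels 0/1), split into training and validation sets.
data = [([0.0, 0.0], [0.0]), ([1.0, 1.0], [1.0]),
        ([0.1, 0.2], [0.0]), ([0.9, 0.8], [1.0])]
train, valid = data[:2], data[2:]
W1, W2 = init(2, 3, 1)
before = mse(W1, W2, valid)
for _ in range(2000):
    for x, t in train:
        train_step(W1, W2, x, t)
after = mse(W1, W2, valid)
assert after < before  # the validation error drops as the net learns
```

Monitoring the validation error, rather than the training error alone, is what lets the architecture search mentioned above reject networks that merely memorize the training patterns.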
Teams from around the world were whittled down through a series of qualifying steps, beginning with technical papers and videos, then advancing to actual vehicle testing at team sites. Of the 89 teams that initially applied, 35 were invited to the National Qualification Event (NQE), a rigorous eight-day vehicle testing period. The NQE was co-located with the Final Event in Victorville, CA, where DARPA transformed the roads of the former George AFB into an autonomous vehicle testing ground. After tallying all of the NQE scores, DARPA announced on November 1, 2007, that 11 teams would compete in the Final Event.

And so at sunrise on November 3, in front of a crowd of thousands on hand to witness history being made, Dr. Tether, the DARPA Director, raised the green flag and the race was on. One by one, all 11 finalist robots were released from their starting chutes, each followed by a chase vehicle equipped with an emergency stop control. The event was not just a timed race, however: the robots were also being judged on their ability to follow California driving rules. DARPA officials pored through reams of data throughout the night, analyzing each team's infractions and elapsed run times. At the awards ceremony the next morning, DARPA announced the winning order:

1st Place: $2,000,000 - Tartan Racing, Pittsburgh, PA
2nd Place: $1,000,000 - Stanford Racing Team, Stanford, CA
3rd Place: $500,000 - Victor Tango, Blacksburg, VA

OpenCV was of key use in the vision system of "Stanley", the vehicle that the Stanford Racing team used [25].

5. Conclusions

In this paper I have discussed the approaches to pattern recognition in computer vision that have emerged over the past few decades. In my opinion, these advances in pattern recognition and computer vision are the building blocks for the near future of a technology-based world. I hope I have provided some insight into the world of robots that were still a matter of science fiction at the end of the 20th century.
I can only think of advantages in continuing research into this fascinating subject, the main one being the preservation of human life by assigning life-threatening tasks to robots. There continues to be great demand for pattern recognition applications in today's technology-based world, so future research in pattern recognition can be expected to continue and intensify in this century.

6. Bibliography

[1] J.C. Bezdek, Pattern Recognition with Fuzzy Objective Functions. New York: Plenum Press, 1981.
[2] J.C. Bezdek, J. Keller, R. Krisnapuram, and N.R. Pal, Fuzzy Models and Algorithms for Pattern Recognition and Image Processing. Boston: Kluwer Academic, 1999.
[3] P.A. Devijver and J. Kittler, Pattern Recognition: A Statistical Approach. London: Prentice-Hall, 1982.
[4] L. Devroye, L. Gyorfi, and G. Lugosi, A Probabilistic Theory of Pattern Recognition. New York: Springer, 1996.
[5] R.O. Duda and P.E. Hart, Pattern Classification and Scene Analysis. New York: Wiley, 1973.
[6] R.O. Duda, D.G. Stork, and P.E. Hart, Pattern Classification and Scene Analysis, 2nd ed. New York: John Wiley and Sons, 2000.
[7] K.S. Fu, Syntactic Pattern Recognition and Applications. Englewood Cliffs, NJ: Prentice-Hall, 1982.
[8] K. Fukunaga, Introduction to Statistical Pattern Recognition. New York: Morgan Kaufmann, 1990.
[9] S.K. Pal and D. Dutta Majumder, Fuzzy Mathematical Approach to Pattern Recognition. New York: Wiley (Halsted Press), 1986.
[10] S. Theodoridis and K. Koutroumbas, Pattern Recognition. San Diego: Academic Press, 1999.
[11] B.D. Ripley, Pattern Recognition and Neural Networks. Cambridge: Cambridge University Press, 1996.
[12] C.M. Bishop, Neural Networks for Pattern Recognition. Oxford: Oxford University Press, 1995.
[13] G. Bradski and A. Kaehler, Learning OpenCV. O'Reilly Media, Inc., USA, 2008.
[14] Open Source Computer Vision Library: Reference Manual, Intel.
[15] R. Vicen-Bueno, R. Gil-Pita, M.P. Jarabo-Amores, and F. López-Ferreras, "Complexity Reduction in Neural Networks Applied to Traffic Sign Recognition", Proceedings of the 13th European Signal Processing Conference, Antalya, Turkey, September 4-8, 2005.
[16] R. Vicen-Bueno, R. Gil-Pita, M. Rosa-Zurera, M. Utrilla-Manso, and F. López-Ferreras, "Multilayer Perceptrons Applied to Traffic Sign Recognition Tasks", LNCS 3512, IWANN 2005, J. Cabestany, A. Prieto, and D.F. Sandoval (Eds.), Springer-Verlag, Berlin, Heidelberg, 2005.
[17] H.X. Liu and B. Ran, "Vision-Based Stop Sign Detection and Recognition System for Intelligent Vehicle", Transportation Research Board (TRB) Annual Meeting 2001, Washington, D.C., USA, January 7-11, 2001.
[18] H. Fleyeh and M. Dougherty, "Road and Traffic Sign Detection and Recognition", Proceedings of the 16th Mini-EURO Conference and 10th Meeting of EWGT.
[19] D.S. Kang, N.C. Griswold, and N. Kehtarnavaz, "An Invariant Traffic Sign Recognition System Based on Sequential Color Processing and Geometrical Transformation", Proceedings of the IEEE Southwest Symposium on Image Analysis and Interpretation, 21-24 April 1994.
[20] M. Rincon, S. Lafuente-Arroyo, and S. Maldonado-Bascon, "Knowledge Modeling for the Traffic Sign Recognition Task", Springer Berlin/Heidelberg, Volume 3561/2005.
[21] C.Y. Fang, C.S. Fuh, P.S. Yen, S. Cherng, and S.W. Chen, "An Automatic Road Sign Recognition System Based on a Computational Model of Human Recognition Processing", Computer Vision and Image Understanding, Vol. 96, Issue 2, November 2004.
[22] L. Pacheco, J. Batlle, and X. Cufi, "A New Approach to Real Time Traffic Sign Recognition Based on Color Information", Proceedings of the Intelligent Vehicles Symposium, Paris, 1994.
[23] Y. Aoyagi and T. Asakura, "A Study on Traffic Sign Recognition in Scene Image Using Genetic Algorithms and Neural Networks", Proceedings of the IEEE IECON Int. Conf. on Industrial Electronics, Control, and Instrumentation, Taipei, Taiwan, vol. 3, 1996.
[24] M. Lalonde and Y. Li, Detection of Road Signs Using Color Indexing, Technical Report CRIMIT95/12-49, Centre de Recherche Informatique de Montreal, 1995. Available from: http://www.crim.ca/sbc/english/cime/publications.html
[25] http://tech.groups.yahoo.com/group/OpenCV