Computer Vision: history and applications

Albert Alemany Font
Helsinki Metropolia University of Applied Sciences
Media Engineering
April 2014

Table of contents

1. Introduction
2. Understanding what is computer vision
2.1. What is “vision”?
2.2. Computer vision and its related disciplines
3. History of computer vision
4. Applications of computer vision
4.1. Face and smile detection
4.2. Optical character recognition (OCR)
4.3. Smart cars
4.4. Medical imaging
4.5. Video-based interaction: gaming
4.6. Computer vision as a barrier
5. Conclusions
6. References

1. Introduction

According to Aristotle, vision is knowing what is where by looking, a definition that remains essentially valid.
Our vision and brain identify, from the information that reaches our eyes, the objects we are interested in and their position in the environment, which is very important for many of our activities. Computer vision tries to emulate that capacity in computers, so that by interpreting acquired images, for example from a camera, the different objects in the environment can be recognized, as well as their position in space.

The ease with which we “see” led the first artificial intelligence researchers, around 1960, to believe that making a computer interpret images would be relatively easy, but this turned out not to be the case [4]. Many years of research have proven that it is a very complex subject. However, over the last few years there has been considerable progress.

Computer vision brings together different fields such as mathematics, physics, biology and engineering. It gives us a better understanding of human vision and of how we perceive and interpret things. Our world is full of images and videos, and more and more useful applications are being developed; applications that are touching our lives, making them easier, safer and more fun.

The goal of this thesis is to investigate how computer vision has evolved since it first appeared, and to explore the different applications that have been developed and how they have improved our lives. In this thesis I will also reflect on where computer vision is heading in the coming years and discuss how we should address it from an ethical point of view.

2. Understanding what is computer vision

2.1. What is “vision”?

Vision is the window to the world for many organisms. Its main function is to recognize and localize objects in the environment through image processing. Computational vision is the study of these processes, in order to understand them and to build machines with similar capacities.
There are different definitions of vision. The following are among the most important:

“Vision is knowing what is where by looking”, Aristotle.
“Vision is to get, from the information of our senses, valid properties of the external world”, Gibson [3].
“Vision is a process that produces, from images of the external world, a description that is useful to the observer and that does not contain irrelevant information”, Marr [7].

All of these definitions are essentially valid, but the one closest to the current idea of computer vision is probably Marr's. His definition contains three important aspects that we have to consider: (i) vision is a computational process, (ii) the description obtained depends on the observer and (iii) it is necessary to remove the information that is not useful (information reduction).

2.2. Computer vision and its related disciplines

The term “computer vision” has been used a lot in the last few years and it is often mistaken for other concepts. Figure 1 shows the different disciplines and fields related to computer vision.

Figure 1: Computer vision related disciplines

Digital image processing is the process of taking an image and producing a modified version of it. Figures 2 and 3 illustrate two examples. The first one shows segmentation, where the goal is to identify the pixels of an image that belong to an object. In that case, the output is a binary image formed by white and black pixels, which mean “object” and “no-object” respectively. The second example is the restoration of an image. In that case, a blurry image becomes clearer.

Figure 2: Image processing - segmentation: the goal is to separate the object under study from the background of the image [6].

Figure 3: Image processing - restoration: the goal is to remove the blur caused by the movement of the camera when the photograph was taken [5].
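
The segmentation idea described above, where every pixel is labelled as “object” or “no-object”, can be illustrated with a minimal sketch. It assumes a fixed global brightness threshold, which is only the simplest possible approach and not the algorithm used in [6]; the function name segment, the threshold value and the small sample image are all hypothetical and chosen for illustration.

```python
def segment(image, threshold):
    """Return a binary mask of the same size as the input image.

    A pixel becomes 255 (white, "object") when it is brighter than the
    threshold, and 0 (black, "no-object") otherwise.
    """
    return [[255 if pixel > threshold else 0 for pixel in row] for row in image]


# Hypothetical 4x4 grayscale image (0 = black, 255 = white):
# a bright object in the centre on a dark background.
image = [
    [10, 12, 11, 9],
    [13, 200, 210, 12],
    [11, 205, 198, 10],
    [9, 10, 12, 11],
]

# The threshold of 128 is an assumption that suits this sample image only.
mask = segment(image, threshold=128)

for row in mask:
    print(row)
```

In real images a single fixed threshold rarely separates the object cleanly, because illumination and background vary across the picture, so practical segmentation methods are usually more elaborate than this sketch.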

Machine vision is similar to computer vision, but it is more practical, whereas computer vision is more academic. Machine vision is not as advanced in a theoretical sense as computer vision. There is a lot of deep mathematics in computer vision, while in machine vision practical issues such as cost and speed of processing are likely to dominate over academic matters [8].

3. History of computer vision

In the 1960s Larry Roberts wrote his thesis about the possibility of extracting 3D geometric information from 2D views. This led to a lot of research in MIT's artificial intelligence labs as well as in other research institutions. In the 1970s MIT's artificial intelligence lab started a course in computer vision. In the 1980s OCR (optical character recognition) systems started to be used in various industrial applications to read and verify letters, symbols and numbers. Smart cameras were developed in the late 1980s. In the 1990s the first face recognition systems appeared [1] [4].

4. Applications of computer vision

4.1. Face and smile detection

In the 1990s the first face recognition systems appeared. Nowadays almost any digital camera is able to detect faces and adjust the exposure and flash in order to obtain the best results. Figure 4 shows an example of how a camera detects the faces of the people standing in front of it and draws a rectangle around each of them. Some cameras also have an “auto trigger” option, where the photo is taken automatically when the person in front of the camera is smiling.

Figure 4: Automatic face detection

4.2. Optical character recognition (OCR)

Optical character recognition is the technology that converts scanned documents into text that a computer can read. As Figure 5 shows, OCR software is used for car license plate recognition. The recognition system must be able to locate the license plate of a vehicle under varying conditions of illumination, perspective and environment.

Figure 5: License Plate Recognition OCR software

Another application of OCR is converting handwriting in real time to control a computer. This is called pen computing. A tablet is used to replace a keyboard, and commands are sent to the computer using gesture recognition.

This technology is also used in database indexing. Printed documents are converted into electronic copies, becoming searchable documents. This is what Google Books does. Google has scanned and converted into text a large number of books and magazines, and now people can perform searches on them.

4.3. Smart cars

With the help of computer vision, our society has been able to develop cars that can effectively drive by themselves. An autonomous vehicle can imitate human driving capacities. It is able to sense its surrounding environment and to act accordingly. In order to do that, it uses technologies such as radar, lidar, GPS and computer vision.

This type of vehicle not only brings the possibility of a driverless trip, but also offers other advantages. Such vehicles would reduce the number of car accidents, because these autonomous systems increase safety compared to a human driver. They could also increase the capacity of highways and decrease traffic congestion, thanks to the reduction of the safety distance between vehicles. Another advantage is the possible reduction of traffic signs, because these vehicles could receive the information electronically.

4.4. Medical imaging

In order to help physicians in the diagnostic process, computers create 3D models by combining different 2D scans such as CT (computerized tomography) and MRI (magnetic resonance imaging) [2]. Also, by processing a magnetic resonance image, the internal structures can be easily located, giving the surgeon a kind of “X-ray vision”, which is a step towards minimally invasive surgery [2].

4.5. Video-based interaction: gaming

Vision-based interfaces have been developed lately, allowing players to move their body to interact with the game. The interface can sense the position of the body, the orientation of the head and the direction of gaze, as well as the different gestures produced by the player. The character in the game may then respond accordingly. These interfaces provide a much more exciting and fun experience overall.

The application of computer vision to computer games faces some challenges. It is important that the response time is as short as possible. Also, the hardware cost needs to be very low.

4.6. Computer vision as a barrier

For several decades, many artists such as Salvador Dalí and M.C. Escher have worked with optical illusions. An optical illusion is any illusion of the sense of sight that makes us perceive reality erroneously. A computer would not have any difficulty solving an optical illusion, because optical illusions are based on physiological and cognitive matters. However, there are other images whose content a human can interpret rather easily, while for current computers it is impossible.

Figure 6 shows a Google CAPTCHA. A CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is a test used in computing to determine whether the user is a human or not. These CAPTCHAs are used as a barrier on the Internet, and they have been getting harder over the last few years. That is because OCR systems are getting better.

Figure 6: Google CAPTCHA

5. Conclusions

Computer vision should not be thought of in terms of when computers will have enough artificial intelligence to do what humans can do. It is not trying to mimic human behavior, but to extend beyond it.
Despite the fact that there is still a lot of ground to cover in the field of computer vision, and that some books on the topic become obsolete as soon as they are published, computer vision is reaching a point where it is changing our lives, and will change them even more drastically. Ranging from image inspection and assembly tasks to motion controller devices, computer vision is touching our lives in every area. Object recognition, face detection, smart cars and medical imaging are only a small list of applications that are changing how we as humans and as a society live and coexist.

Computer vision is leading us to a sensor-driven world, which is going to help us improve as a society in many aspects, but probably not in all of them. Currently the majority of security cameras rely on human intervention to detect strange behavior or anomalies, but that is going to change. There will come a point where these cameras, by using biometric identification techniques, will be able to recognize, identify and track people. Until now, computer vision has provided only useful tools, but soon some questions will need to be answered about where the limits should be set to separate what is ethically and morally correct from what is not, especially considering that the current laws regarding data collection and management, privacy and surveillance are unclear.

6. References

[1] Y. Aloimonos, Special Issue on Purposive and Qualitative Active Vision, 1992.
[2] C. H. Chen, Computer Vision in Medical Imaging, Oct. 15, 2013.
[3] J. J. Gibson, The Ecological Approach to Visual Perception. Boston: Houghton Mifflin, 1979.
[4] J. Gribbin, Historia de la Ciencia (1543-2001). Barcelona: Editorial Crítica, 2003.
[5] D. Mery and D. Filbert, A fast non-iterative algorithm for the removal of blur caused by uniform linear motion in X-ray images. In Proceedings of the 15th World Conference on Non-Destructive Testing (15th WCNDT), Rome, Oct. 15-21, 2000.
[6] D. Mery and F. Pedreschi, Segmentation of colour food images using a robust algorithm. Journal of Food Engineering, 2004 (accepted April 2004).
[7] D. Marr, Vision. San Francisco: Freeman, 1982.
[8] E. Trucco and A. Verri, Introductory Techniques for 3-D Computer Vision, 1998.