Computer Vision: history and applications
Albert Alemany Font
Helsinki Metropolia University of Applied Sciences
Media Engineering
April 2014
Table of contents
1. Introduction
2. Understanding what is computer vision
2.1. What is “vision”?
2.2. Computer vision and its related disciplines
3. History of computer vision
4. Applications of computer vision
4.1. Face and smile detection
4.2. Optical character recognition (OCR)
4.3. Smart cars
4.4. Medical imaging
4.5. Video-based interaction: gaming
4.6. Computer vision as a barrier
5. Conclusions
6. References
1. Introduction
According to Aristotle, vision is knowing what is where by looking, a definition that remains
essentially valid. Our eyes and brain identify, from the information that arrives at our eyes,
the objects we are interested in and their position in the environment, which is essential for
many of our activities. Computer vision tries to emulate that capacity in computers, so that by
interpreting acquired images, for example from a camera, the different objects in the
environment can be recognized along with their position in space.
The ease with which we “see” led the first artificial intelligence researchers, around 1960, to
think that making a computer interpret images would be relatively easy, but it turned out
otherwise [4]. Many years of research have shown that it is a very complex subject. However,
over the last few years there has been considerable progress.
Computer vision brings together different fields such as mathematics, physics, biology and
engineering. It gives us a better understanding of human vision: how we perceive and
interpret things. We are surrounded by images and video, and more and more useful
applications are being developed; applications that are touching our lives, making them
easier, safer and more fun.
The goal of this thesis is to investigate how computer vision has evolved since it first
appeared, and to explore the different applications that have been developed and how they
have improved our lives. I will also reflect on where computer vision is heading in the coming
years and discuss how we should address it from an ethical point of view.
2. Understanding what is computer vision
2.1. What is “vision”?
Vision is the window to the world for many organisms. Its main function is to recognize and
localize objects in the environment through image processing. Computational vision is the
study of these processes, in order to understand them and to build machines with similar
capabilities.
There are different definitions of vision. The following ones are among the most
important:
“Vision is knowing what is where by looking” (Aristotle)
“Vision is obtaining, from the information of our senses, valid properties of the external
world” (Gibson [3])
“Vision is a process that, from images of the external world, produces a description that is
useful to the observer and that does not contain irrelevant information” (Marr [7])
All of these definitions are essentially valid, but the one closest to the current idea of
computer vision is probably Marr's. His definition has three important aspects that we have to
consider: (i) vision is a computational process, (ii) the description obtained depends on the
observer and (iii) it is necessary to remove the information that is not useful (information
reduction).
2.2. Computer vision and its related disciplines
The term “Computer Vision” has been used a lot in the last few years and is often confused
with other concepts. Figure 1 shows the different disciplines and fields related to computer
vision.
Figure 1: Computer vision related disciplines
Digital image processing is the process of taking an image and producing a modified version
of it. Figures 2 and 3 illustrate two examples. The first shows segmentation, where the goal is
to identify the pixels of an image that belong to an object. In that case, the output is a binary
image made of white and black pixels, meaning “object” and “no object” respectively. The
second example shows image restoration, in which a blurry image becomes clearer.
Figure 2: Image processing – segmentation: the goal is to separate the study object from the background
of the image [6].
Figure 3: Image processing – restoration: the goal is to remove the blur caused by the movement of the
camera when the photograph was taken [5].
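The segmentation idea described above can be sketched with the simplest possible method, a fixed brightness threshold. This is only an illustration under stated assumptions, not the algorithm used in [6]: the function name `segment`, the threshold value and the small sample image are all hypothetical, and the object is assumed to be brighter than the background.

```python
def segment(image, threshold):
    """Return a binary image: 255 (white, "object") where a pixel is
    brighter than the threshold, 0 (black, "no object") elsewhere."""
    return [[255 if pixel > threshold else 0 for pixel in row] for row in image]


# Hypothetical 4x4 grayscale image (0 = black, 255 = white) with a bright
# object in the centre and a dark background.
image = [
    [12,  20,  15, 10],
    [18, 200, 210, 14],
    [11, 190, 220, 16],
    [13,  17,  19, 12],
]

# The threshold of 128 is an assumption chosen for this sample image.
binary = segment(image, threshold=128)

for row in binary:
    print(row)
```

Real images rarely separate this cleanly, which is why practical segmentation methods choose the threshold adaptively or use colour and texture information as well.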
Machine vision is similar to computer vision but more practical, whereas computer vision is
more academic. Machine vision is not as advanced in a theoretical sense as computer vision.
Computer vision involves a lot of deep mathematics, while in machine vision practical issues
such as cost and processing speed are likely to dominate over academic matters [8].
3. History of computer vision
In the 1960s Larry Roberts wrote his thesis on the possibility of extracting 3D geometric
information from 2D views. This led to a lot of research in MIT’s artificial intelligence lab as
well as in other research institutions. In the 1970s MIT’s artificial intelligence lab started a
course in computer vision. In the 1980s OCR (optical character recognition) systems started
to be used in various industrial applications to read and verify letters, symbols and numbers.
Smart cameras were developed in the late 1980s. In the 1990s, the first face recognition
systems appeared [1] [4].
4. Applications of computer vision
4.1. Face and smile detection
In the 1990s the first face recognition systems appeared. Nowadays almost any digital camera
is able to detect faces and adjust the exposure and flash in order to obtain the best results.
Figure 4 shows an example of how a camera detects the faces of the people standing in front
of it and draws a rectangle around each of them. Some cameras also have an “auto trigger”
option, where the photo is taken automatically when the person in front of the camera is
smiling.
Figure 4: Automatic face detection
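As a rough sketch of how a detected face rectangle could drive the exposure, the code below measures the average brightness inside the rectangle and derives a correction factor. Everything here is an assumption made for illustration: the functions `mean_brightness` and `exposure_gain`, the target brightness of 128 and the gain limit are hypothetical, the face rectangle is taken as given (a real camera would obtain it from its face detector), and actual cameras use their own metering methods.

```python
def mean_brightness(image, box):
    """Average pixel value inside box = (x, y, width, height)."""
    x, y, w, h = box
    pixels = [image[r][c] for r in range(y, y + h) for c in range(x, x + w)]
    return sum(pixels) / len(pixels)


def exposure_gain(image, face_box, target=128.0, max_gain=4.0):
    """Factor by which the exposure would be scaled so that the face
    region reaches the target brightness, limited to a safe range."""
    mean = mean_brightness(image, face_box)
    if mean == 0:
        return max_gain
    return max(1.0 / max_gain, min(max_gain, target / mean))


# Hypothetical backlit scene: bright background, dark face in the centre.
image = [
    [200, 200, 200, 200],
    [200,  60,  68, 200],
    [200,  64,  64, 200],
    [200, 200, 200, 200],
]
face_box = (1, 1, 2, 2)  # assumed output of a face detector

gain = exposure_gain(image, face_box)
print(gain)  # 2.0: the face is half as bright as the target
```

Metering on the face rectangle rather than on the whole frame is what keeps a bright background from leaving the faces underexposed.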
4.2. Optical character recognition (OCR)
Optical character recognition is the technology that converts scanned documents into text
that a computer can read. As Figure 5 shows, optical character recognition software is used
for car license plate recognition. Such systems must be able to locate the license plate of a
vehicle under varying conditions of illumination, perspective and environment.
Figure 5: License Plate Recognition OCR software
Another application of OCR is converting handwriting in real time to control a computer. This
is called pen computing. A tablet is used to replace a keyboard and commands are sent to
the computer using gesture recognition.
This technology is also used in database indexing. Printed documents are converted into
electronic copies, becoming searchable documents. This is what Google Books does.
Google has scanned and converted into text a large number of books and magazines, and
people can now search their contents.
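Once OCR has turned scanned pages into text, making them searchable is commonly done with an inverted index, which maps each word to the pages that contain it. The sketch below is a minimal illustration of that idea and makes no claim about how Google Books is implemented; the functions `build_index` and `search` and the sample page texts are hypothetical, and the OCR step itself is assumed to have already produced the text.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map every lower-cased word to the set of page ids containing it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the ids of the pages that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical OCR output for three scanned pages.
pages = {
    "page-1": "Vision is knowing what is where by looking",
    "page-2": "Computer vision and its related disciplines",
    "page-3": "Optical character recognition converts scanned documents into text",
}

index = build_index(pages)
print(sorted(search(index, "vision")))           # ['page-1', 'page-2']
print(sorted(search(index, "computer vision")))  # ['page-2']
```

Because the index is built once, each search only has to intersect a few sets instead of rereading every page.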
4.3. Smart cars
With the help of computer vision, our society has been able to develop cars that can
effectively drive by themselves. An autonomous vehicle can imitate human driving
capabilities. It is able to sense its surrounding environment and to act accordingly. In order to
do that, it uses technologies such as radar, lidar, GPS and computer vision.
This type of vehicle not only makes a driverless trip possible, but also offers other
advantages. It would reduce the number of car accidents, because autonomous systems
increase safety compared to a human driver. It could also increase the capacity of highways
and decrease traffic congestion by reducing the safety distance between vehicles. Another
advantage is the possible reduction of traffic signs, because these vehicles could receive the
information electronically.
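The link between following distance and highway capacity can be made concrete with a back-of-the-envelope calculation: the number of vehicles passing a point per hour is the speed divided by the road space each vehicle occupies. The sketch below is only illustrative. The function `lane_capacity`, the vehicle length of 4.5 m, the 2-second gap for a human driver (the common "two-second rule") and the 0.5-second gap for an automated vehicle are all assumptions, not measured values.

```python
def lane_capacity(speed_kmh, time_gap_s, vehicle_length_m=4.5):
    """Vehicles per hour passing a point on one lane, assuming every
    vehicle keeps the same speed and the same time gap to the one ahead."""
    speed_ms = speed_kmh / 3.6                      # km/h to m/s
    spacing_m = vehicle_length_m + speed_ms * time_gap_s
    return speed_ms * 3600 / spacing_m


# Assumed time gaps: 2.0 s for a human driver, 0.5 s for an automated vehicle.
human = lane_capacity(100, time_gap_s=2.0)
automated = lane_capacity(100, time_gap_s=0.5)

print(round(human), round(automated))
print(f"capacity ratio: {automated / human:.1f}")
```

Under these assumed numbers a lane carries roughly three times as many vehicles per hour, which is the effect the paragraph above refers to.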
4.4. Medical imaging
In order to help physicians in the diagnostic process, computers create 3D models by
combining different 2D scans such as CT (computerized tomography) and MRI (magnetic
resonance imaging) [2].
Also, by processing a magnetic resonance image, the internal structures can be easily
located, granting the surgeon a kind of x-ray vision, which is a step towards minimally
invasive surgery [2].
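The basic idea of building a 3D model from 2D scans can be sketched as stacking equally sized slices into a volume, which can then be cut along a direction that was never scanned directly. This is a minimal sketch under stated assumptions and not the method described in [2]: the functions `stack_slices` and `coronal_section` and the tiny 3x3 slices are hypothetical, and the slices are assumed to be already aligned and evenly spaced.

```python
def stack_slices(slices):
    """Combine equally sized 2D slices into a volume indexed as
    volume[slice][row][column]."""
    height, width = len(slices[0]), len(slices[0][0])
    for s in slices:
        if len(s) != height or any(len(row) != width for row in s):
            raise ValueError("all slices must have the same dimensions")
    return [[list(row) for row in s] for s in slices]


def coronal_section(volume, row):
    """Cut the volume along a plane perpendicular to the scanned slices
    by taking the same row from every slice."""
    return [s[row] for s in volume]


# Hypothetical 3x3 slices through a small cross-shaped structure.
slices = [
    [[0, 0, 0], [0, 1, 0], [0, 0, 0]],
    [[0, 1, 0], [1, 1, 1], [0, 1, 0]],
    [[0, 0, 0], [0, 1, 0], [0, 0, 0]],
]

volume = stack_slices(slices)
section = coronal_section(volume, 1)
for row in section:
    print(row)
```

The new section shows the structure from a viewpoint that none of the original slices provides, which is what makes the stacked volume more useful than the separate 2D images.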
4.5. Video-based interaction: gaming
Vision-based interfaces have been developed lately, allowing players to move their bodies to
interact with the game. The interface can sense the position of the body, the orientation of
the head, the direction of gaze and the different gestures made by the player. The character
in the game may then respond accordingly. These interfaces provide a much more exciting
and fun experience overall.
The application of computer vision to computer games faces some challenges. It is important
that the response time is as short as possible. Also, the hardware cost needs to be very low.
4.6. Computer vision as a barrier
For several decades, many artists such as Salvador Dalí and M.C. Escher have worked with
optical illusions. An optical illusion is any illusion of the sense of sight that makes us perceive
reality erroneously. A computer would not have any difficulty solving an optical illusion,
because optical illusions are based on physiological and cognitive matters.
However, there are other images whose content a human can interpret rather easily, while
current computers cannot. Figure 6 shows a Google CAPTCHA. A CAPTCHA (Completely
Automated Public Turing test to tell Computers and Humans Apart) is a test used in
computing to determine whether the user is a human or not. CAPTCHAs are used as a
barrier on the Internet, and they have been getting harder over the last few years because
OCR systems are getting better.
Figure 6: Google CAPTCHA
5. Conclusions
Computer vision should not be thought of in terms of when computers will hold enough
artificial intelligence to do what humans can do. It is not trying to mimic human behavior, but
to extend beyond it.
Although there is still a lot of ground to cover in the field of computer vision, and some books
on the topic become obsolete as soon as they are published, computer vision is reaching a
point where it is changing our lives, and will change them even more drastically.
Ranging from image inspection and assembly tasks to motion controller devices, computer
vision is touching our lives in every area. Object recognition, face detection, smart cars and
medical imaging are only a few of the applications that are changing how we live and
coexist, as humans and as a society.
Computer vision is leading us to a sensor-driven world, which is going to help us improve as
a society in many respects, but probably not in all of them. Currently the majority of security
cameras rely on human intervention to detect strange behavior or anomalies, but that is
going to change. There will come a point where these cameras, using biometric identification
techniques, will be able to recognize, identify and track people.
Until now, computer vision has provided only useful tools, but soon some questions will need
to be answered about where to draw the line between what is ethically and morally
acceptable and what is not, especially considering that the current laws on data collection
and management, privacy and surveillance are blurry.
6. References
[1] Y. Aloimonos, Special Issue on Purposive and Qualitative Active Vision, 1992.
[2] C. H. Chen, Computer Vision in Medical Imaging, Oct. 15, 2013.
[3] J. J. Gibson, The Ecological Approach to Visual Perception. Boston: Houghton Mifflin, 1979.
[4] J. Gribbin, Historia de la Ciencia (1543-2001). Barcelona: Editorial Crítica, 2003.
[5] D. Mery and D. Filbert, A fast non-iterative algorithm for the removal of blur caused by
uniform linear motion in X-ray images. In Proceedings of the 15th World Conference on
Non-Destructive Testing (15th WCNDT), Rome, Oct. 15-21, 2000.
[6] D. Mery and F. Pedreschi, Segmentation of colour food images using a robust algorithm.
Journal of Food Engineering, 2004 (accepted April 2004).
[7] D. Marr, Vision. San Francisco: Freeman, 1982.
[8] E. Trucco and A. Verri, Introductory Techniques for 3-D Computer Vision, 1998.