CV2_CH1b.Introduction - Digital Camera and Computer Vision

UW Computer Vision (CSE 576)
Staff
Steve Seitz
Rick Szeliski
seitz@cs.washington.edu szeliski@microsoft.com
TA: Jiun-Hung Chen
jhchen@cs.washington.edu
Web Page
• http://www.cs.washington.edu/education/courses/cse576/08sp/
Today
• Introduction
• Computer vision overview
• Course overview
• Image processing
Readings
• Book: Richard Szeliski, Computer Vision: Algorithms and Applications
– (please check Web site weekly for updated drafts)
– Introduction: Chapter 1.0
1.1 What Is Computer Vision?
• Optical character recognition (OCR): reading
handwritten postal codes on letters (Figure 1.4a) and
automatic number plate recognition (ANPR);
• Machine inspection: rapid parts inspection for quality
assurance using stereo vision with specialized
illumination to measure tolerances on aircraft wings or
auto body parts (Figure 1.4b) or looking for defects in
steel castings using X-ray vision;
• Retail: object recognition for automated checkout lanes
(Figure 1.4c);
• 3D model building (photogrammetry): fully automated
construction of 3D models from aerial photographs used
• Medical imaging: registering pre-operative and intra-
operative imagery (Figure 1.4d) or performing long-term
studies of people’s brain morphology as they age;
• Automotive safety: detecting unexpected obstacles
such as pedestrians on the street, under conditions
where active vision techniques such as radar or lidar do
not work well (Figure 1.4e; see also Miller, Campbell,
Huttenlocher et al. (2008); Montemerlo, Becker, Bhat et
al. (2008); Urmson, Anhalt, Bagnell et al. (2008) for
examples of fully automated driving);
• Match move: merging computer-generated imagery
(CGI) with live action footage by tracking feature points
in the source video to estimate the 3D camera motion
and shape of the environment.
• Such techniques are widely used in Hollywood (e.g., in
movies such as Jurassic Park) (Roble 1999; Roble and
Zafar 2009);
• They also require the use of precise matting to insert
new elements between foreground and background
elements (Chuang, Agarwala, Curless et al. 2002).
• Motion capture (mocap): using retro-reflective markers
viewed from multiple cameras or other vision-based
techniques to capture actors for computer animation;
• Surveillance: monitoring for intruders, analyzing
highway traffic (Figure 1.4f), and monitoring pools for
drowning victims;
• Fingerprint recognition and biometrics: for automatic
access authentication as well as forensic applications.
What Is Computer Vision?
Does human vision (left) or
computer vision (right)
perform better?
Terminator 2
How can we write a computer program to tell that the left side is a
real human and the right side is a robot?
Every Picture Tells a Story
Black-and-white photo by H. Roger-Viollet of a train accident at Gare
Montparnasse in Paris on 22 October 1895, when engine 120-721
failed to stop at the platform, went
through a first-floor window and
crashed down onto the street.
The goal of computer vision is to write computer programs
that can interpret images.
Can Computers Match (or Beat) Human Vision?
Yes and no (but mostly no!)
• Humans are much better at “hard” and versatile things.
• Computers can be better at “easy,” specific, and repetitive things.
Human Perception Has Its Shortcomings…
Although this image appears to
be a fairly run-of-the-mill
picture of Bill Clinton and Al
Gore, a closer inspection
reveals that both men have been
digitally given identical inner
face features and their mutual
configuration. Only the external
features are different. It
appears, therefore, that the
human visual system makes
strong use of the overall head
shape in order to determine
facial identity.
Sinha and Poggio, Nature, 1996
Copyright A.Kitaoka 2003
Rotating Snakes
Circular snakes appear to rotate spontaneously.
Current State of the Art
The next slides show some examples of what
current vision systems can do.
Earth Viewers (3D Modeling)
Image from Microsoft’s Virtual Earth
(see also: Google Earth)
Photosynth
http://photosynth.net/
Based on Photo Tourism technology developed here in CSE!
by Noah Snavely, Steve Seitz, and Rick Szeliski
CSE: Computer Science and Engineering
Optical Character Recognition (OCR)
Technology to convert scanned documents to text
• If you have a scanner, it probably came with OCR software
Digit Recognition, AT&T Laboratories
AT&T: American Telephone and Telegraph
http://www.research.att.com/~yann/
License plate readers
http://en.wikipedia.org/wiki/Automatic_number_plate_recognition
Face Detection
Many new digital cameras now detect faces
• Canon, Sony, Fuji, …
• Smile shutter
• Subject metering for auto focus, auto exposure, and auto
white balance
Smile Detection?
Sony Cyber-shot® T70 Digital Still Camera
Object Recognition (in Supermarkets)
LaneHawk by EvolutionRobotics
“A smart camera is flush-mounted in the checkout lane, continuously watching
for items. When an item is detected and recognized, the cashier verifies the
quantity of items that were found under the basket, and continues to close the
transaction. The item can remain under the basket, and with LaneHawk, you are
assured to get paid for it…”
Face Recognition
Who is she?
Vision-Based Biometrics
“How the Afghan Girl was Identified by Her Iris Patterns” Read the story
Login without a Password…
Fingerprint scanners on
many new laptops,
other devices
Face recognition systems now
beginning to appear more widely
http://www.sensiblevision.com/
Object Recognition (in Mobile Phones)
This is becoming real:
• Microsoft Research
• Point & Find, Nokia
Special Effects: Shape Capture
The Matrix movies, ESC Entertainment, XYZ RGB, NRC
NRC: National Research Council
Special Effects: Motion Capture
Pirates of the Caribbean, Industrial Light and Magic
Sports
Sportvision first down line
Nice explanation on www.howstuffworks.com
• The system has to know the orientation of the field with respect to
the camera so that it can paint the first-down line with the correct
perspective from that camera's point of view.
• The system has to know, in that same perspective framework, exactly
where every yard line is.
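The two requirements above can be sketched in code. This is a minimal illustration, assuming the field is treated as a flat plane and the camera's pose has already been summarized as a 3x3 homography; the slide does not say how Sportvision does this. The function name `project`, the matrix `H`, and all numbers are hypothetical.

```python
def project(H, X, Y):
    """Map a field-plane point (X, Y) to image pixel coordinates using homography H."""
    u = H[0][0] * X + H[0][1] * Y + H[0][2]
    v = H[1][0] * X + H[1][1] * Y + H[1][2]
    w = H[2][0] * X + H[2][1] * Y + H[2][2]
    return (u / w, v / w)  # perspective divide


# Hypothetical homography for one camera pose (made-up numbers, not calibrated).
H = [
    [10.0, 0.0, 100.0],
    [0.0, 5.0, 50.0],
    [0.0, 0.01, 1.0],
]

# Endpoints of the yard line at X = 30 on a field 50 units wide (illustrative units).
near_end = project(H, 30.0, 0.0)
far_end = project(H, 30.0, 50.0)
print(near_end, far_end)
```

Drawing a segment between the two projected endpoints gives the line in that camera's perspective. The far endpoint is pulled toward the image centre by the perspective divide, which is why the same yard line looks different from each camera.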
Smart Cars
Slide content courtesy of Amnon Shashua
Mobileye
• Vision systems currently in high-end BMW, GM, Volvo models
• By 2010: 70% of car manufacturers.
• Video demo
BMW: Bavarian Motor Works
GM: General Motors
Vision-Based Interaction (and Games)
Digimask: put your face on a 3D avatar.
Nintendo Wii has camera-based IR
tracking built in. See Lee’s work at
CMU on clever tricks on using it to
create a multi-touch display!
IR: Infra-Red
CMU: Carnegie Mellon University
“Game turns moviegoers into Human Joysticks”, CNET
Camera tracking a crowd, based on this work.
Vision in Space
NASA's Mars Exploration Rover Spirit captured this westward view from atop
a low plateau where Spirit spent the closing months of 2007.
Vision systems (JPL) used for several tasks
• Panorama stitching
• 3D terrain modeling
• Obstacle detection, position tracking
• For more, read “Computer Vision on Mars” by Matthies et al.
JPL: Jet Propulsion Laboratory
Robotics
NASA’s Mars Spirit Rover
http://en.wikipedia.org/wiki/Spirit_rover
NASA: National Aeronautics and Space Administration
http://www.robocup.org/
Medical Imaging
3D imaging
MRI, CT
MRI: Magnetic Resonance Imaging
CT: Computed Tomography
Image guided surgery
Grimson et al., MIT
MIT: Massachusetts Institute of Technology
Current State of the Art
You just saw examples of current systems.
• Many of these are less than 5 years old.
This is a very active research area, and rapidly changing.
• Many new applications in the next 5 years
To learn more about vision applications and companies:
• David Lowe maintains an excellent overview of vision
companies
– http://www.cs.ubc.ca/spider/lowe/vision.html
Consumer-Level Applications
• Stitching: turning overlapping photos into a
single seamlessly stitched panorama (Figure
1.5a), as described in Chapter 9;
• Exposure bracketing: merging multiple
exposures taken under challenging lighting
conditions (strong sunlight and shadows) into
a single perfectly exposed image (Figure
1.5b), as described in Section 10.2;
• Morphing: turning a picture of one of your
friends into another, using a seamless morph
transition (Figure 1.5c);
• 3D modeling: converting one or more
snapshots into a 3D model of the object or
person you are photographing (Figure 1.5d),
as described in Section 12.6;
• Video match move and stabilization:
inserting 2D pictures or 3D models into your
videos by automatically tracking nearby
reference points (see Section 7.4.2) or using
motion estimates to remove shake from your
videos (see Section 8.2.1);
• Photo-based walkthroughs: navigating a large
collection of photographs, such as the interior
of your house, by flying between different
photos in 3D (see Sections 13.1.2 and
13.5.5);
• Face detection: for improved camera focusing
as well as more relevant image searching
(see Section 14.1.1);
• Visual authentication: automatically logging
family members onto your home computer as
they sit down in front of the webcam (see
Section 14.2).
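To make the exposure bracketing item in the list above concrete, here is a rough sketch of merging two grayscale exposures by a per-pixel weighted average. It is a deliberately simple stand-in, not the method of Section 10.2. The function names `weight` and `merge_exposures` and the hat-shaped weighting are assumptions made for illustration.

```python
def weight(v):
    """Hat-shaped weight: largest near mid-gray, smallest near 0 and 255."""
    return 1 + min(v, 255 - v)


def merge_exposures(images):
    """Merge same-sized grayscale images (lists of rows, values 0..255)."""
    height, width = len(images[0]), len(images[0][0])
    merged = []
    for y in range(height):
        row = []
        for x in range(width):
            values = [img[y][x] for img in images]
            weights = [weight(v) for v in values]
            total = sum(wt * v for wt, v in zip(weights, values))
            row.append(total / sum(weights))
        merged.append(row)
    return merged


# Toy 1x2 images: the dark shot crushes pixel 0, the bright shot saturates pixel 1.
dark = [[0, 100]]
bright = [[120, 255]]
fused = merge_exposures([dark, bright])
print(fused)
```

At each pixel the result leans toward whichever exposure is better exposed there: the under-exposed 0 and the saturated 255 get almost no weight.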
1.2 A Brief History
• Computational theory: What is the goal of the
computation (task) and what are the
constraints that are known or can be brought
to bear on the problem?
• Representations and algorithms: How are
the input, output, and intermediate
information represented and which algorithms
are used to calculate the desired result?
• Hardware implementation: How are the
representations and algorithms mapped onto
actual hardware, e.g., a biological vision
system or a specialized piece of silicon?
• Conversely, how can hardware constraints be
used to guide the choice of representation
and algorithm?
• With the increasing use of graphics chips
(GPUs) and many-core architectures for
computer vision (see Section C.2), this
question is again becoming quite relevant.
1.3 Book Overview
This Course
http://www.csie.ntu.edu.tw/~fuh/vcourse/szeliski
http://www.cs.washington.edu/education/courses/cse576/08sp/
Optional Project 1: Features
Optional Project 2: Panorama Stitching
http://www.cs.washington.edu/education/courses/cse576/05sp/projects/proj2/artifacts/winners.html
Indri Atmosukarto, 576 08sp
Optional Project 3: Face Recognition
Final Project
Open-ended project of your choosing
General Comments
Prerequisites—these are essential!
• Data structures
• A good working knowledge of C and C++ programming
– (or willingness/time to pick it up quickly!)
• Linear algebra
• Vector calculus
Course does not assume prior imaging experience
• computer vision, image processing, graphics, etc.
Project due Mar. 15
Use correlation to do image matching:
find dx, dy to minimize
\sum_{(x, y) \in R} \left| \mathrm{PIX}(ima, x, y) - \mathrm{PIX}(imb, x + dx, y + dy) \right|
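The matching rule above can be sketched as a brute-force search: try every shift (dx, dy) within a small window and keep the one with the smallest summed absolute pixel difference over the region R. This sketch assumes the shifted pixel is read from imb at (x + dx, y + dy), stores images as lists of rows, and skips shifts that would fall outside imb. The names `sad` and `best_shift` and the `max_shift` parameter are illustrative, not part of the project handout.

```python
def sad(ima, imb, region, dx, dy):
    """Sum of absolute differences between ima on region and imb shifted by (dx, dy)."""
    return sum(abs(ima[y][x] - imb[y + dy][x + dx]) for x, y in region)


def best_shift(ima, imb, region, max_shift):
    """Return (cost, dx, dy) for the shift in [-max_shift, max_shift] with the lowest cost."""
    height, width = len(imb), len(imb[0])
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # Skip shifts that move any region pixel outside imb.
            inside = all(0 <= x + dx < width and 0 <= y + dy < height
                         for x, y in region)
            if not inside:
                continue
            cost = sad(ima, imb, region, dx, dy)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# Toy data: imb is ima moved 1 pixel right and 2 pixels down (zero fill at the border).
SIZE = 8
ima = [[3 * x + 10 * y + 1 for x in range(SIZE)] for y in range(SIZE)]
imb = [[ima[y - 2][x - 1] if y >= 2 and x >= 1 else 0 for x in range(SIZE)]
       for y in range(SIZE)]
region = [(x, y) for y in range(2, 5) for x in range(2, 5)]
result = best_shift(ima, imb, region, max_shift=2)
print(result)
```

For this toy pair the search returns a zero-cost match at dx = 1, dy = 2. The exhaustive loop costs on the order of (2 * max_shift + 1)^2 evaluations of the sum, which is fine for small windows.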
DC & CV Lab.
CSIE NTU