Image Understanding
Xuejin Chen
University of Science and Technology of China
xjchen99@ustc.edu.cn
63600852
Webpage:
http://staff.ustc.edu.cn/~xjchen99/teaching/ImageUnderstanding/ImageUnderstanding.html
Computer Vision (CSE 576)
• Staff
Steve Seitz
Rick Szeliski
seitz@cs.washington.edu szeliski@microsoft.com
• Web Page
– http://www.cs.washington.edu/education/courses/cse576/08sp/
• Book
– http://szeliski.org/Book/
– http://szeliski.org/Book/drafts/SzeliskiBook_20100903_draft.pdf
Today
• Content
– Intros
– Overview
– Image formation
• Readings
– Book: Richard Szeliski, Computer Vision: Algorithms and Applications
• Intro: Ch 1.0
Image Understanding
Digital image
2D Image
RGBD Image
Image Sequence
Image Understanding
Computer
Human
Like?
Same as?
Better than?
…
Computer Vision
• How does a computer interpret images?
– To describe the world that we see in one or more images, and
– to reconstruct its properties, such as shape, illumination, and color distributions.
• Semantic Interpretation
What do you see?
• Purple flower
• Individual petal
• Green background
• Plant
• Layers, depth
• Shading, texture
• …
How about a computer?
• Image Formation
– MxN matrix with color of each pixel
– Vector images…
– Example pixel values (R,G,B): 18,34,9 and 132,115,211
• Difficult to interpret
– Inverse problem
– Infer unknowns
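A minimal sketch of the "MxN matrix with color of each pixel" idea. The two color triples are the sample values shown on the slide; the 2x2 layout, the color names, and the helper functions are assumptions made up for illustration. The point is that the computer starts from raw numbers only: nothing in the matrix says "flower" or "background", which is why interpretation is an inverse problem.

```python
# A tiny image stored as an M x N grid of (R, G, B) tuples, each channel 0-255.
# The two triples come from the slide; their names and placement are assumed.
DARK_GREEN = (18, 34, 9)
LIGHT_PURPLE = (132, 115, 211)

image = [
    [DARK_GREEN, LIGHT_PURPLE],
    [LIGHT_PURPLE, DARK_GREEN],
]


def shape(img):
    """Return (M, N): the number of rows and columns."""
    return len(img), len(img[0])


def to_gray(img):
    """Convert to grayscale with an unweighted channel average (integer)."""
    return [[sum(px) // 3 for px in row] for row in img]


def dominant_channel(px):
    """Return 'R', 'G', or 'B' for the strongest channel of one pixel."""
    return "RGB"[px.index(max(px))]


if __name__ == "__main__":
    print(shape(image))
    print(to_gray(image))
    print(dominant_channel(DARK_GREEN), dominant_channel(LIGHT_PURPLE))
```

Even simple per-pixel statistics like these are easy to compute; deciding that the purple pixels form a petal is the hard part.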
Human perception has its shortcomings…
Copyright A.Kitaoka 2003
Dragon Optical Illusion
http://www.grandillusions.com/images/articles/opticalillusions/dragon_illusion/dragon_illusion.wmv
Every picture tells a story
• Goal of computer vision is to write computer programs that can
interpret images
Can computers match (or beat) human vision?
• Yes and no (but mostly no!)
– humans are much better at “hard” things
– computers can be better at “easy” things
Timeline of the most active topics
1970
1980
1990
Digital image processing
Blocks world, line labeling
Generalized cylinders
Pictorial structures
Stereo correspondence
Intrinsic images
Optical flow
Structure from motion
Image pyramids
Scale-space processing
Shape from shading, texture, and focus
Physically-based modeling
Regularization
Markov random fields
Kalman filters
3D range data processing
Projective invariants
Factorization
Timeline of the most active topics
1990
2000
2010
3D range data processing
Projective invariants
Factorization
Physics-based vision
Graph cuts
Particle filtering
Energy-based segmentation
Face recognition and detection
Subspace methods
Image-based modeling and rendering
Texture synthesis and inpainting
Computational photography
Feature-based recognition
MRF inference algorithms
Category recognition
Learning
Image retrieval
RGBD image processing
Scene understanding
1970s
1980s
1990s
2000s
2010s
(a) Scene understanding
(b) Image-based rendering/modeling
(c) Recognition by 3d models
(d) Group profiling/tracking
New Algorithms and Applications
• Data-Driven Algorithms
– Large database
– Statistical algorithms
• Interactive Vision
– Human interaction
– User interface
• Internet Applications
Chapter Contents
Image formation
Segmentation
Image processing
Motion
Features
Stitching
Recognition
Current state of the art
• The next slides show some examples of what
current vision systems can do
Earth viewers (3D modeling)
Image from Microsoft’s Virtual Earth
(see also: Google Earth)
Photosynth
http://labs.live.com/photosynth/
Based on Photo Tourism technology developed here in CSE!
by Noah Snavely, Steve Seitz, and Rick Szeliski
Recognition
• Barcode Recognition
Object recognition (in mobile phones)
• This is becoming real:
– Microsoft Research
– Point & Find, Nokia
Optical character recognition (OCR)
Technology to convert scanned docs to text
• If you have a scanner, it probably came with OCR software
Digit recognition, AT&T labs
http://www.research.att.com/~yann/
License plate readers
http://en.wikipedia.org/wiki/Automatic_number_plate_recognition
Face detection
• Many new digital cameras now detect faces
– Canon, Sony, Fuji, …
Smile detection?
Sony Cyber-shot® T70 Digital Still Camera
Object recognition (in supermarkets)
LaneHawk by EvolutionRobotics
“A smart camera is flush-mounted in the checkout lane, continuously watching
for items. When an item is detected and recognized, the cashier verifies the
quantity of items that were found under the basket, and continues to close the
transaction. The item can remain under the basket, and with LaneHawk, you are
assured to get paid for it…”
Face recognition
Who is she?
Vision-based biometrics
“How the Afghan Girl was Identified by Her Iris Patterns” Read the story
Login without a password…
Fingerprint scanners on many new laptops, other devices
Face recognition systems now beginning to appear more widely
http://www.sensiblevision.com/
Special effects: shape capture
The Matrix movies, ESC Entertainment, XYZRGB, NRC
Special effects: motion capture
Pirates of the Caribbean, Industrial Light and Magic
Click here for interactive demo
Sports
Sportvision first down line
Nice explanation on www.howstuffworks.com
Slide content courtesy of Amnon Shashua
Smart cars
• Mobileye
– Vision systems currently in high-end BMW, GM,
Volvo models
– By 2010: 70% of car manufacturers.
– Video demo
Vision-based interaction (and games)
Digimask: put your face on a 3D avatar.
Nintendo Wii has camera-based IR
tracking built in. See Lee’s work at
CMU on clever tricks on using it to
create a multi-touch display!
“Game turns moviegoers into Human Joysticks”, CNET
Camera tracking a crowd, based on this work.
Vision-based interaction
• Microsoft Kinect
3D Modeling using Kinect
• KinectFusion
• …
Vision in space
NASA's Mars Exploration Rover Spirit captured this westward view from atop
a low plateau where Spirit spent the closing months of 2007.
Vision systems (JPL) used for several tasks
• Panorama stitching
• 3D terrain modeling
• Obstacle detection, position tracking
• For more, read “Computer Vision on Mars” by Matthies et al.
Curiosity Rover
Mast Camera (MastCam)
Provides multiple spectra and true-color imaging with two cameras
Chemistry and Camera complex (ChemCam)
Navigation cameras (NavCams)
Two pairs of black and white navigation cameras mounted on the mast to support
ground navigation. The cameras have a 45 degree angle of view and use visible
light to capture stereoscopic 3-D imagery.
Hazard avoidance cameras (hazcams)
Four pairs of black and white navigation cameras called hazcams: two pairs in
the front and two pairs in the back. The cameras use visible light to capture
stereoscopic three-dimensional (3-D) imagery. The cameras have a 120 degree
field of view and map the terrain at up to 3 m (9.8 ft) in front of the rover.
http://en.wikipedia.org/wiki/Curiosity_rover
Robotics
NASA’s Mars Spirit Rover
http://en.wikipedia.org/wiki/Spirit_rover
http://www.robocup.org/
Medical imaging
3D imaging
MRI, CT
Image guided surgery
Grimson et al., MIT
Current state of the art
• You just saw examples of current systems.
– Many of these are less than 10 years old
• This is a very active research area, and rapidly changing
• To learn more about vision applications and companies
– David Lowe maintains an excellent overview of vision
companies
• http://www.cs.ubc.ca/spider/lowe/vision.html
Many Interesting Problems
Structure From Motion
Object Tracking
Reconstruction
Face Detection
Ref: http://www.cs.ubc.ca/spider/lowe/vision.html
This course
• Webpage:
– http://staff.ustc.edu.cn/~xjchen99/teaching/teaching.html
• Ref:
– http://www.cs.washington.edu/education/courses/cse576/08sp/
Grading
• Based on 4 projects
• 3 tests before you start 3 projects
Project 1: features
Project 2: panorama stitching
• http://www.cs.washington.edu/education/courses/cse576/05sp/projects/proj2/artifacts/winners.html
Indri Atmosukarto, 576 08sp
Project 3: Face Recognition
Final Project
• Open-ended project
– in teams of two to three students
General Comments
• Prerequisites—these are essential!
– Data structures
– A good working knowledge of C and C++
programming
• (or willingness/time to pick it up quickly!)
– Linear algebra
– Vector calculus
• Course does not assume prior imaging experience
– computer vision, image processing, graphics, etc.