Intro

16-721: Advanced Machine Perception
Staff:
• Instructor: Alexei (Alyosha) Efros
(efros@cs), 4207 NSH
• TA: David Bradley
(dbradley@cs), 2216 NSH
Web Page:
• http://www.cs.cmu.edu/~efros/courses/AP06/
Today
Introduction
Why Perception?
Administrative stuff
Overview of the course
Image Datasets
A bit about me
Alexei (Alyosha) Efros
Relatively new faculty (RI/CSD)
Ph.D. 2003 from UC Berkeley (signed by Arnie!)
Research Fellow, University of Oxford, ’03-’04
Teaching
I am still learning…
The plan is to have fun and learn cool things, both
you and me!
Social warning: I don’t see well
Research
Vision, Graphics, Data-driven “stuff”
PhD Thesis on Texture and Action Synthesis
Smart Erase button in Microsoft Digital Image Pro:
Antonio Criminisi’s son cannot walk but he can fly
The story begins…
“All happy families are alike; each unhappy family is
unhappy in its own way.”
-- Lev Tolstoy, Anna Karenina
“What does it mean, to see? The plain man's answer (and
Aristotle's, too) would be, to know what is where by
looking.”
-- David Marr, Vision (1982)
Vision: a split personality
“What does it mean, to see? The plain man's answer (and
Aristotle's, too) would be, to know what is where by looking. In
other words, vision is the process of discovering from images
what is present in the world, and where it is.”
depth map
Answer #1: pixel of brightness 243 at position (124,54)
…and depth .7 meters
Answer #2: looks like bottom edge of whiteboard
showing at the top of the image
Is the difference just a matter of scale?
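A minimal sketch of what "Answer #1" amounts to, assuming a hypothetical 200x200 grayscale image and a matching depth map stored as nested lists. The array names, the image size, and the (column, row) indexing convention are illustrative assumptions; only the values 243, (124,54), and .7 meters come from the slide.

```python
# Hypothetical 200x200 scene; every value here is made up so that the
# lookup reproduces the numbers quoted in "Answer #1".
WIDTH, HEIGHT = 200, 200
brightness = [[0] * WIDTH for _ in range(HEIGHT)]   # 8-bit gray levels
depth_m = [[0.0] * WIDTH for _ in range(HEIGHT)]    # depth map, in meters

# Assumed convention: position (x, y) means column x, row y.
brightness[54][124] = 243
depth_m[54][124] = 0.7


def measure(x, y):
    """Return the raw measurement at one pixel: (brightness, depth in meters)."""
    return brightness[y][x], depth_m[y][x]


print(measure(124, 54))
```

Nothing in these arrays says "bottom edge of whiteboard". That label is Answer #2, and it is not obtained by looking up more pixels.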
Measurement vs. Perception
Brightness: Measurement vs. Perception
Proof!
Lengths: Measurement vs. Perception
Müller-Lyer Illusion
http://www.michaelbach.de/ot/sze_muelue/index.html
Vision as Measurement Device
Real-time stereo on Mars
Physics-based Vision
Structure from Motion
Virtualized Reality
…but why?
Reason #1:
• Semester too short, can’t cover everything
• Other great classes offered at CMU, e.g.:
– Appearance Modeling (Srinivas Narasimhan, every fall)
– Medical Vision (Yanxi Liu)
– Structure from Motion (Martial Hebert, sometime?)
“But what if I don’t care about this wishy-washy human
perception stuff? I just want to make my robot go!”
Reason #2:
• For measurement, other sensors are often better (in DARPA
Grand Challenge, vision was barely used!)
Reason #3:
The goals of computer vision (what + where)
are in terms of what humans care about.
So what do humans care about?
slide by Fei-Fei, Fergus & Torralba
Verification: is that a bus?
slide by Fei-Fei, Fergus & Torralba
Detection: are there cars?
slide by Fei-Fei, Fergus & Torralba
Identification: is that a picture of Mao?
slide by Fei-Fei, Fergus & Torralba
Object categorization
sky
building
flag
banner
face
wall
street lamp
bus
bus
cars
slide by Fei-Fei, Fergus & Torralba
Scene and context categorization
• outdoor
• city
• traffic
•…
slide by Fei-Fei, Fergus & Torralba
Rough 3D layout, depth ordering
Challenges 1: viewpoint variation
Michelangelo 1475-1564
Challenges 2: illumination
slide credit: S. Ullman
Challenges 3: occlusion
Magritte, 1957
Challenges 4: scale
slide by Fei-Fei, Fergus & Torralba
Challenges 5: deformation
Xu, Beihong 1943
Challenges 6: background clutter
Klimt, 1913
Challenges 7: object intra-class variation
slide by Fei-Fei, Fergus & Torralba
Challenges 8: local ambiguity
slide by Fei-Fei, Fergus & Torralba
Challenges 9: the world behind the image
In this course, we will:
Take a few baby steps…
Course Organization
Requirements:
1. Paper Presentations (50%)
• Paper Advocate
• Paper Demo Presenter
• Paper Opponent
2. Class Participation (20%)
• Keep annotated bibliography
• Post questions / comments on QuickTopic
• Ask questions / debate / fight / be involved!
3. Final Project (30%)
• Do something with lots of data (at least 500 images)
• Groups of 1, 2, or 3
Paper Advocate
1. Pick a paper from list
• One that you like and are willing to defend
• Sometimes I will make you do two papers, or background
2. Meet with me before starting to talk about
how to present the paper(s)
3. Prepare a good, conference-quality
presentation (20-45 min, depending on
difficulty of material)
4. Meet with me again 2 days before class to
go over the presentation
• Office hours at end of each class
5. Present and defend the paper in front of
class
Paper Demo Presenter
For some papers, we will have separate demo
presentations
1. Sign up for a paper you find interesting
2. Get the code online (or implement if easy)
3. Run it on a toy problem, play with
parameters
4. Run it on a new dataset
5. Prepare short 5-10 min presentation
detailing results
6. Can cooperate with Paper Advocate
Paper Opponent
1. Sign up for a paper you don’t like /
are suspicious about
2. Prepare an argument (with or without slides)
against the paper:
• Paper weaknesses
• Relevance to real problems
• Existence of better alternative approaches
• Etc.
3. Present in front of class (5-10 min)
Class Participation
Keep annotated bibliography of papers you
read (always a good idea!). The format is up
to you. At least, it needs to have:
• Summary of key points
• A few interesting insights, “aha moments”, keen
observations, etc.
• Weaknesses of approach. Unanswered
questions. Areas of further investigation,
improvement.
Submit your thoughts for current paper(s)
before each class (printout)
Class Participation
In addition, submit interesting observations or
questions to QuickTopic before class for
public discussion.
Be active in class. Voice your ideas, concerns.
You need to participate: either in class or in
QuickTopic every week!
Dave will be watching and keeping track!
Final Project
Can grow out of paper presentation, or your
own research
But it needs to use large amounts of data!
1-3 people per project.
Project proposals in a few weeks.
Project presentations at the end of semester.
Results presented as a CVPR-format paper.
Hopefully, a few papers may be submitted to
conferences.
End of Semester Awards
We will vote for:
• Best Paper Presenter
• Best Paper Opponent
• Best Demo
• Best Project
Prize: dinner in a nice restaurant
Course Outline
Physiology of Vision (1 lecture)
Overview of Human Visual Perception (1 lecture)
• Need presenter for Monday!
Part I: Low-level vision (images as texture)
• Texture segmentation, image retrieval, scene models, “Bag of
words” representations
Part II: Mid-level vision (segmentation)
• Principles of grouping, Normalized Cuts, Mean-shift, DDMCMC, Graph-cut, super-pixels
Part III: 2D Recognition
• Window scanning (Schneiderman+Kanade, Viola+Jones)
• Correspondence Matching (chamfer matching, Hausdorff
distance, shape contexts, invariant features, active appearance
models)
• Recognition with Segmentation (top-down + bottom-up)
• Words and Pictures
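As a rough preview of two of the correspondence-matching measures named in Part III, here is a brute-force sketch of the Hausdorff distance and one common chamfer variant (the mean nearest-neighbor distance) between two 2D point sets. The point sets `template` and `edges` are made-up toy data, and the function names are illustrative. The papers covered in class may use different variants, such as distance transforms or partial Hausdorff distances.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from any point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance: the worse of the two directed distances."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def chamfer(a, b):
    """Mean distance from each point in `a` to its nearest point in `b`."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)


# Toy data: a 3-point template and an edge map with one outlier at (5, 1).
template = [(0, 0), (1, 0), (2, 0)]
edges = [(0, 1), (1, 1), (2, 1), (5, 1)]

print(directed_hausdorff(template, edges))  # every template point has a neighbor at distance 1
print(hausdorff(template, edges))           # dominated by the outlier (5, 1)
print(chamfer(template, edges))
```

The single outlier in `edges` drives the symmetric Hausdorff distance, while the template-to-edges chamfer score ignores it. That sensitivity to outliers is one reason matching methods often prefer averaged or partial variants.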
Course Outline (cont.)
Part IV: Intrinsic Images
• Shading vs. reflectance
• Recovering surface orientations and depth
• Style vs. content
Part V: Dealing with Data
• Isomap, LLE, Non-negative Matrix Factorization
Part VI: Tracking and Motion Segmentation
• Particle filtering, exemplar-based, layers
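To give a flavor of the particle filtering topic in Part VI, here is a minimal sketch of a bootstrap particle filter tracking a 1D position. Everything specific is an assumption for illustration: the constant-step motion model, the Gaussian noise levels, the initial range, and the synthetic observations. Papers covered in class may use quite different models.

```python
import math
import random


def particle_filter(observations, n_particles=500, step=1.0,
                    motion_std=0.5, obs_std=1.0, seed=0):
    """Bootstrap particle filter for a 1D position; returns one estimate per observation."""
    rng = random.Random(seed)
    # Start with particles spread uniformly over an assumed range of positions.
    particles = [rng.uniform(-10.0, 10.0) for _ in range(n_particles)]
    estimates = []
    for z in observations:
        # Predict: move every particle by the assumed step plus motion noise.
        particles = [p + step + rng.gauss(0.0, motion_std) for p in particles]
        # Weight: Gaussian likelihood of the observation given each particle.
        weights = [math.exp(-0.5 * ((z - p) / obs_std) ** 2) for p in particles]
        # Resample: draw particles with probability proportional to weight.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        # Estimate: mean of the resampled particle set.
        estimates.append(sum(particles) / n_particles)
    return estimates


# Synthetic track: true position t at time t, observed with a small +/-0.3 offset.
true_positions = [float(t) for t in range(1, 21)]
observations = [x + 0.3 * (-1) ** i for i, x in enumerate(true_positions)]
estimates = particle_filter(observations)
print(round(estimates[-1], 2), "vs. true position", true_positions[-1])
```

The predict, weight, resample loop is the core of the method. Tracking in images replaces the 1D position with an object state (location, scale, pose) and the Gaussian likelihood with an image-based appearance score.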
Sign up to present one paper on Wed on
QuickTopic
Datasets
See web page