COGS 300 Lecture Notes 13W Term 2

COGS 300
Notes
February 27, 2014
Today’s “Person” Example: George Wald
George Wald was the biologist who first worked out the biochemistry of the role Vitamin A plays
in colour vision, for which he received the 1967 Nobel prize in physiology or medicine.
On March 4, 1969, he gave a speech at an anti-Vietnam War teach-in at MIT. The speech was
printed in full by the Boston Globe and eventually translated into over forty languages, reprinted
millions of times, and even made into a record album.
The Boston Globe itself distributed 87,000 reprints of the speech in response to requests. Editorial
page editor Charles L. Whipple said it was his impression that about half the requests received by
the Globe were from young people who wanted to send it to their parents, and the other half from
parents who wanted to send it to their children.
The speech is still worth a read today. . .
George Wald, A Generation in Search of a Future, March 4, 1969
Recall: This Week’s Learning Goals
1. To consider Marr’s “three levels” at which any machine carrying out an information-processing
task must be understood
2. To provide a “fun” overview of human colour vision
— including its deficiencies (compared to some other animals)
3. To explore two state-of-the-art computer vision systems:
(a) UBC’s Curious George: a successful, embodied robot vision system
(b) CMU’s “big data” NEIL (Never Ending Image Learner)
Take Away Points from Tuesday
• Human vision occurs in the 400nm to 700nm wavelength range
• Colour perception must account for the effects of
— the spectral distribution of the light source(s)
— the spectral absorption/scattering properties of any medium (other than a vacuum) through
which the light travels
— the spectral reflectance of an object’s surface material
— the spectral sensitivity of the sensor (a numeric sketch of these four factors follows this list)
• The human retina normally has three distinct cone types, nominally referred to as R (red), G
(green) and B (blue)
— The B system is different, both functionally and anatomically, from the RG system
• Some teleost fish, amphibians and aquatic reptiles alter the spectral absorption properties of
their visual pigments based on metabolism with two different forms of vitamin A
• Aquatic environments impose environmental constraints on the “computational theory” of
the vision problem that biological systems solve
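To make the four factors in the “colour perception” bullet concrete, here is a minimal numeric sketch (an added illustration, not part of the original notes). The Gaussian curves standing in for the illuminant, surface reflectance, and cone sensitivities are made-up placeholders, not measured spectra; only the structure of the computation (multiply the four factors wavelength by wavelength, then integrate) is the point.

```python
import numpy as np

# Made-up illuminant, medium, surface, and cone sensitivities: placeholders,
# not measured spectra. Only the structure of the computation matters here.
wavelengths = np.linspace(400, 700, 301)            # visible range, in nm

def gaussian(peak, width):
    return np.exp(-0.5 * ((wavelengths - peak) / width) ** 2)

illuminant    = gaussian(560, 120)                   # spectral distribution of the light source
transmittance = np.ones_like(wavelengths)            # medium (vacuum/air: no absorption)
reflectance   = gaussian(620, 60)                    # surface reflects mostly long wavelengths
sensitivity   = {"R": gaussian(565, 35),             # nominal cone sensitivities
                 "G": gaussian(535, 35),
                 "B": gaussian(445, 25)}

# Each cone's response: multiply the four factors wavelength by wavelength,
# then integrate over the visible range.
for cone, s in sensitivity.items():
    response = np.trapz(illuminant * transmittance * reflectance * s, wavelengths)
    print(cone, round(float(response), 1))
```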
Sample Exam Question (short answer):
None of Marr’s three levels explicitly takes into account speed of computation. Do you think this is
a shortcoming of his analysis? (Briefly justify your answer.)
Quiz 1
Which of these is an example of an Object-Attribute Relationship?
A) Bus is found in bus depot
B) Ocean is blue
C) Alleys are narrow
D) Sunflower is yellow
E) None of the above
Quiz 2
According to Chen et al., object-object relationships include:
A) Partonomy relationships
B) Taxonomy relationships
C) Similarity relationships
D) All of the above
Quiz 3
‘Object category recognition’ is when robots recognize visual objects based on their
A) Similarity to a prototype
B) Shape
C) Size
D) Semantic meaning
E) B&C
Quiz 4
Which of the following is not one of the phases of the Semantic Robot Vision Challenge?
A) Mapping phase
B) Exploration phase
C) Training phase
D) Recognition phase
E) None of the above
Quiz 5
Chen et al. suggested that semantic drift can be avoided by. . .
A) Extracting visual data from highly reliable sources
B) Using more constraints on visual data
C) Comparing data labels with wiki references
D) Aligning another NEIL system
E) None of the above
Simultaneous Localization and Mapping (SLAM)
SLAM is the problem in mobile robotics of building a geometrically accurate map of an unknown
environment while at the same time navigating the environment using the map.
SLAM is considered one of the “successes” of recent robotics research. It is being used in unmanned aerial vehicles, autonomous underwater vehicles, planetary rovers and domestic robots.
Part of the reason for SLAM’s success is the use of computer vision technology for landmark
extraction.
Simultaneous Localization and Mapping (SLAM)
Basic components of a SLAM system for a mobile robot:
1. landmark extraction
2. data association
3. state estimation
4. state update
5. landmark update
SLAM Hardware
Typical hardware components for SLAM:
1. mobile robot platform
2. odometry (to measure rotation of wheels, degrees turned)
3. range measurement
— laser scanner
— sonar
4. camera(s)
SLAM: Overview of the Process
[Process diagram: sensor input feeds Landmark Extraction and odometry change feeds Odometry Update; Data Association then sorts extracted landmarks into Old Observations (previously seen) and New Observations.]
The robot moves, causing its odometry to change. Odometry information is updated accordingly.
Landmarks are extracted from the environment via sensor input (laser and/or vision) at the robot’s
new position. The robot then attempts to associate currently seen landmarks with previously observed landmarks. Previously seen landmarks are used to update the estimate of the robot’s current
position. Landmarks not previously seen are added to the database of landmarks as new
observations (so they can be used again later).
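A structural sketch of this cycle follows (an added illustration, not the notes’ own code and not any particular published SLAM algorithm). All function bodies are placeholder stubs; a real system would estimate the pose and map jointly with an EKF, particle filter, or graph optimizer.

```python
import numpy as np

def extract_landmarks(scan):
    """Stand-in: return landmark positions detected in a laser scan or image."""
    return [np.asarray(p, dtype=float) for p in scan]

def associate(observed, known, max_dist=0.5):
    """Nearest-neighbour data association between observed and known landmarks."""
    matches, new = [], []
    for obs in observed:
        if known:
            dists = [np.linalg.norm(obs - k) for k in known]
            i = int(np.argmin(dists))
            if dists[i] < max_dist:
                matches.append((obs, i))
                continue
        new.append(obs)
    return matches, new

pose = np.zeros(3)          # x, y, heading
landmarks = []              # database of previously observed landmarks

def slam_step(odometry_delta, scan):
    global pose
    pose = pose + odometry_delta                    # odometry update
    observed = extract_landmarks(scan)              # landmark extraction
    matches, new = associate(observed, landmarks)   # data association
    for obs, i in matches:                          # re-observed landmarks refine the
        landmarks[i] = 0.5 * (landmarks[i] + obs)   # map (and, in a real filter, the pose)
    landmarks.extend(new)                           # store new landmarks for later use

# One step: a small forward motion and two "scanned" points.
slam_step(np.array([0.1, 0.0, 0.0]), [(1.0, 2.0), (3.0, -1.0)])
print(pose, landmarks)
```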
SLAM Landmark Properties
Desirable properties for SLAM landmarks:
1. Landmarks are detectable from a range of distances and orientations
2. Individual landmarks are distinguishable from each other
3. Landmarks are abundant in the environment
4. Landmarks are stationary (i.e., permanent fixtures of the environment)
5. Landmarks are well localized in the environment
LCI’s Curious George Robots
Curious George I, II, . . . are a series of UBC LCI mobile robots (see https://www.cs.ubc.ca/labs/lci/curious_george/)
• Curious George I won the robot league of the 2007 Semantic Robot Vision Challenge (SRVC)
at the AAAI 2007 conference in Vancouver, July, 2007
• Curious George II won the robot league of the 2008 Semantic Robot Vision Challenge (SRVC)
at the CVPR 2008 conference in Anchorage, Alaska, June, 2008
• Owing to a Wi-Fi technical glitch, Curious George “did not finish” in the robot league of
the 2009 competition at the 5th International Symposium on Visual Computing (ISVC),
Las Vegas, Nevada, December 2009. The team did, however, take first place in the software
league of the competition
Example 1: LCI’s Curious George I
ActiveMedia PowerBot. SICK LMS200 range finder. Directed Perception PTU-D46-17.5 pan-tilt
unit. Point Grey Research Bumblebee colour stereo camera. Canon PowerShot G7 (10MPix, 6x
optical zoom)
People in LCI are equipping a wheelchair with SLAM and other computer vision capability as
part of a new national network called ICAST (Intelligent Computational Assistive Science and
Technology).
Building a Map
video: SLAM in action
Semantic Robot Vision Challenge (SRVC)
• A robot is given a list of names of objects, both particular and generic, which it must find
in a small test room. The robot can use its broad knowledge about object classes to find the
objects
• Or it could download images of the named objects from the Web and construct classifiers to
recognize those objects
• The robot enters the test area and searches for the objects. The robot returns a list of the
objects’ names and the best image matching the name. In each image it must outline the
object with a bounding box (i.e., a tight-fitting rectangle)
SRVC Contest: Phase 1
Automated analysis of the web to learn previously unseen objects. Sample query: “shoe”
SRVC Contest: Phase 2
Explore contest environment, collect imagery
SRVC Contest: Phase 3
Perform recognition
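A rough sketch of how the three phases could fit together in code, under strong simplifying assumptions: download_images(), extract_features(), and capture_images_while_exploring() are hypothetical helpers standing in for a web crawler, a real feature pipeline (e.g., SIFT/HOG), and the robot’s exploration and camera stack. None of this is the actual Curious George implementation, and bounding-box localization is omitted.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_from_web(object_names, download_images, extract_features):
    """Phase 1: learn a one-vs-rest classifier per named object from web images."""
    classifiers = {}
    for name in object_names:
        positives = [extract_features(img) for img in download_images(name)]
        negatives = [extract_features(img)
                     for other in object_names if other != name
                     for img in download_images(other)[:10]]
        X = np.vstack(positives + negatives)
        y = np.array([1] * len(positives) + [0] * len(negatives))
        classifiers[name] = LinearSVC().fit(X, y)
    return classifiers

def recognize(classifiers, captured_images, extract_features):
    """Phase 3: for each object name, return the captured image it scores highest."""
    best = {}
    for name, clf in classifiers.items():
        scores = [clf.decision_function(extract_features(img).reshape(1, -1))[0]
                  for img in captured_images]
        best[name] = captured_images[int(np.argmax(scores))]
    return best

# Phase 2 (exploration) would fill captured_images while the robot maps the room,
# e.g. captured_images = capture_images_while_exploring()   # hypothetical helper
```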
But. . . Consider the “Banana Problem”
Data Sources
• Google Image Search
– Large numbers of images
– Lowest noise ratio of common search engines
• Walmart product images
– Fewer images per category
– Very high image quality
• In the future, IKEA world?
– Well structured, rich context, organized by place
• ImageNet, an image database organized according to the WordNet hierarchy
SRVC 2009 List of Objects
The official list of objects consisted of:
1. pumpkin
2. orange
3. red ping pong paddle
4. white soccer ball
5. laptop
6. dinosaur
7. bottle
8. toy car
9. frying pan
10. book “I am a Strange Loop” by Douglas Hofstadter
SRVC 2009 List of Objects (cont’d)
11. book “Fugitive from the Cubicle Police”
12. book “Photoshop in a Nutshell”
13. CD “And Winter Came” by Enya
14. CD “The Essential Collection” by Karl Jenkins and Adiemus
15. DVD “Hitchhiker’s Guide to the Galaxy” widescreen
16. game “Call of Duty 4” box
17. toy Domo-kun
18. Lay’s Classic Potato Chips
19. Pepperidge Farm Goldfish Baked Snack Crackers
20. Pepperidge Farm Milano Distinctive Cookies
Curious George Results:
Scoring:
• 8 of 12 specific instances
• 4 of 8 generic categories
What are the Challenges?
• How to navigate autonomously in the environment
• How to see (i.e., acquire images of) all the objects
• How to deal with conflicting meaning and appearance in web images
• How to properly assign semantic labels to the objects seen by the robot at different times
(and from different viewpoints)
• Other?
Example 2: Never Ending Image Learner (NEIL)
NEIL is an attempt to develop the world’s largest visual structured knowledge base with minimum
human effort
NEIL uses web data to extract:
1. Labeled examples of object categories with bounding boxes
2. Labeled examples of scenes
3. Labeled examples of attributes
4. Visual subclasses for object categories
5. Common sense relationships about scenes, objects and attributes
NEIL: Key “Insight”
“Our key insight is that at a large scale one can simultaneously label the visual instances and extract
common sense relationships in a joint semi-supervised learning framework”
Contributions:
1. A “never ending” learning algorithm for gathering visual knowledge from the Internet (via
macro-vision)
2. Automatic generation of a large visual structured knowledge base
— labeled instances of scenes, objects, and attributes
— relationships among them
3. Demonstration of how joint discovery of relationships and labeling of instances at a gigantic
scale can provide constraints for improving semi-supervised learning
NEIL’s core semi-supervised learning (SSL) algorithm works with a fixed vocabulary. The authors
state that noun phrases from the never-ending language learning (NELL) ontology are used to grow
the vocabulary.
NEIL cycles between extracting global relationships, labeling data and learning classifiers/detectors
for building visual knowledge from the Internet.
Note: For objects NEIL learns detectors and for scenes NEIL builds classifiers.
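A sketch of the loop structure only (an added illustration, not the algorithm from Chen et al.): the three helper functions are trivial stand-ins for NEIL’s actual detector/classifier training, relationship extraction, and consistency checking.

```python
def train_model(category, examples):
    """Stand-in 'model': scores an image by how many labeled examples it matches."""
    return lambda image: sum(1 for ex in examples if ex == image)

def extract_relationships(labeled):
    """Stand-in: the real system mines co-occurrence statistics across categories."""
    return []

def consistent_with(category, image, relationships):
    """Stand-in: the real system rejects labels that contradict known relationships."""
    return True

def never_ending_loop(seed_images, unlabeled_web_images, iterations=3):
    labeled = {cat: list(exs) for cat, exs in seed_images.items()}
    relationships = []
    remaining = list(unlabeled_web_images)
    for _ in range(iterations):
        # 1. Learn classifiers (scenes) / detectors (objects) from current labels.
        models = {cat: train_model(cat, exs) for cat, exs in labeled.items()}
        # 2. Extract global relationships from the currently labeled pool.
        relationships = extract_relationships(labeled)
        # 3. Label new instances, keeping only labels the relationships agree
        #    with: the joint constraint meant to limit semantic drift.
        still_unlabeled = []
        for img in remaining:
            best = max(models, key=lambda cat: models[cat](img))
            if consistent_with(best, img, relationships):
                labeled.setdefault(best, []).append(img)
            else:
                still_unlabeled.append(img)
        remaining = still_unlabeled
    return labeled, relationships
```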
NEIL: Relationships
Four types of relationships:
1. Object–Object (e.g., Wheel is a part of Car)
2. Object–Attribute (e.g., Sheep is/has White)
3. Scene–Object (e.g., Car is found in Raceway)
4. Scene–Attribute (e.g., Alley is/has Narrow)
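One simple way such relationships could be stored is as typed triples; the tuple layout below is an illustrative assumption, not NEIL’s actual data structure.

```python
from collections import defaultdict

relationships = [
    ("object-object",    "Wheel", "is a part of", "Car"),
    ("object-attribute", "Sheep", "is/has",       "White"),
    ("scene-object",     "Car",   "is found in",  "Raceway"),
    ("scene-attribute",  "Alley", "is/has",       "Narrow"),
]

# Index the triples by relationship type for quick lookup.
by_type = defaultdict(list)
for rel_type, subject, predicate, obj in relationships:
    by_type[rel_type].append((subject, predicate, obj))

print(by_type["scene-object"])   # [('Car', 'is found in', 'Raceway')]
```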
NEIL: Implementation Details
To train scene and attribute classifiers, NEIL extracts a 3,912-dimensional feature vector from each
image:
Feature                          Dimension
GIST                                   512
Bag of words (BOW): SIFT             1,000
Bag of words (BOW): HOG              1,000
Bag of words (BOW): Lab colour         400
Bag of words (BOW): Texton           1,000
Total                                3,912
(The GIST vector and the four BOW histograms are concatenated.)
“Features of randomly sampled windows from other categories are used as negative examples for
SVM training. . . For the object and attribute section, we use CHOG features with a bin size of 8.
We train the detectors using latent SVM model (without parts)”
NEIL: Summary Statistics
As of October 20, 2013, NEIL has an ontology of:
• 1,152 object categories
• 1,034 scene categories
• 87 attributes
Bootstrapped using “seed” images from ImageNet, SUN and top images from Google Image Search
More than 2 million images downloaded
NEIL has labeled more than 400K visual instances (including 300,000 objects with their bounding
boxes)
NEIL also has extracted 1,703 common sense relationships
NEIL: Explore the Web Site
Browse the current visual knowledge base at
www.neil-kb.com