COGS 300 Notes, February 27, 2014

Today's "Person" Example: George Wald

George Wald was the biologist who first worked out the biochemistry of the role Vitamin A plays in colour vision, for which he received the 1967 Nobel Prize in Physiology or Medicine. On March 4, 1969, he gave a speech at an anti-Vietnam War teach-in at MIT. The speech was printed in full by the Boston Globe and eventually translated into over forty languages, reprinted millions of times, and even made into a record album. The Boston Globe itself distributed 87,000 reprints of the speech in response to requests. Editorial page editor Charles L. Whipple said it was his impression that about half the requests received by the Globe were from young people who wanted to send it to their parents, and the other half from parents who wanted to send it to their children. The speech is still worth a read today...

George Wald, "A Generation in Search of a Future", March 4, 1969

Recall: This Week's Learning Goals

1. To consider Marr's "three levels" at which any machine carrying out an information-processing task must be understood
2. To provide a "fun" overview of human colour vision, including its deficiencies (compared to some other animals)
3. To explore two state-of-the-art computer vision systems:
   (a) UBC's Curious George: a successful, embodied robot vision system
   (b) CMU's "big data" NEIL (Never Ending Image Learner)

Take Away Points from Tuesday

• Human vision occurs in the 400 nm to 700 nm wavelength range
• Colour perception must account for the effects of (summarized in the equation after this list):
  – the spectral distribution of the light source(s)
  – the spectral absorption/scattering properties of any medium (other than a vacuum) through which the light travels
  – the spectral reflectance of an object's surface material
  – the spectral sensitivity of the sensor
• The human retina normally has three distinct cone types, nominally referred to as R (red), G (green) and B (blue)
  – The B system is different, both functionally and anatomically, from the RG system
• Some teleost fish, amphibians and aquatic reptiles alter the spectral absorption properties of their visual pigments by metabolically switching between two different forms of vitamin A
• Aquatic environments impose environmental constraints on the "computational theory" of the vision problem that biological systems solve
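One common way to summarize the four factors in the second bullet above (added here as a sketch; it is not taken from the slides) is the image-formation equation for the response ρ_k of sensor class k (for humans, the R, G and B cones):

\[
\rho_k \;=\; \int_{400\,\mathrm{nm}}^{700\,\mathrm{nm}} E(\lambda)\, M(\lambda)\, S(\lambda)\, C_k(\lambda)\, d\lambda, \qquad k \in \{R, G, B\},
\]

where E(λ) is the spectral power distribution of the light source, M(λ) the transmission of the intervening medium, S(λ) the spectral reflectance of the object's surface, and C_k(λ) the spectral sensitivity of sensor class k. Because the three cone responses confound all four factors, recovering a stable surface colour from them alone is an underconstrained problem.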
Sample Exam Question (short answer)

None of Marr's three levels explicitly takes into account speed of computation. Do you think this is a shortcoming of his analysis? (Briefly justify your answer.)

Quiz 1

Which of these is an example of an Object-Attribute Relationship?
A) Bus is found in bus depot
B) Ocean is blue
C) Alleys are narrow
D) Sunflower is yellow
E) None of the above

Quiz 2

According to Chen et al., object-object relationships include:
A) Partonomy relationships
B) Taxonomy relationships
C) Similarity relationships
D) All of the above

Quiz 3

"Object category recognition" is when robots recognize visual objects based on their
A) Similarity to a prototype
B) Shape
C) Size
D) Semantic meaning
E) B & C

Quiz 4

Which of the following is not one of the phases of the Semantic Robot Vision Challenge?
A) Mapping phase
B) Exploration phase
C) Training phase
D) Recognition phase
E) None of the above

Quiz 5

Chen et al. suggested that semantic drift can be avoided by...
A) Extracting visual data from highly reliable sources
B) Using more constraints on visual data
C) Comparing data labels with wiki references
D) Aligning another NEIL system
E) None of the above

Simultaneous Localization and Mapping (SLAM)

SLAM is the problem in mobile robotics of building a geometrically accurate map of an unknown environment while at the same time navigating the environment using the map. SLAM is considered one of the "successes" of recent robotics research. It is being used in unmanned aerial vehicles, autonomous underwater vehicles, planetary rovers and domestic robots. Part of the reason for SLAM's success is the use of computer vision technology for landmark extraction.

Basic components of a SLAM system for a mobile robot:
1. landmark extraction
2. data association
3. state estimation
4. state update
5. landmark update

SLAM Hardware

Typical hardware components for SLAM:
1. mobile robot platform
2. odometry (to measure rotation of wheels, degrees turned)
3. range measurement
   – laser scanner
   – sonar
4. camera(s)

SLAM: Overview of the Process

[Figure: flow of the SLAM process, in which sensor input and odometry change feed landmark extraction, the odometry update and data association, separating landmarks into old and new observations.]

The robot moves, causing its odometry to change, and the odometry information is updated accordingly. Landmarks are then extracted from the environment via sensor input (laser and/or vision) at the robot's new position. The robot attempts to associate these currently seen landmarks with previously observed landmarks. Previously seen landmarks are used to update the estimate of the robot's current position. Landmarks that have not been seen before are added to the database of landmarks as new observations (so they can be used again later).
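To make the process above concrete, here is a minimal, schematic Python sketch of these components (landmark extraction is assumed to have already produced robot-frame landmark positions). This is a toy illustration, not the Curious George implementation: there is no uncertainty modelling, and the class name, association gate and blending factor are invented for the example. A real system would typically use an extended Kalman filter or particle filter for the state estimation and update steps.

```python
import math

class ToySLAM:
    def __init__(self):
        self.pose = [0.0, 0.0, 0.0]   # robot state estimate: x, y, heading
        self.landmarks = []           # map: list of [x, y] landmark estimates
        self.gate = 1.0               # data-association gate in metres (invented value)

    def predict(self, dist, dtheta):
        """Odometry update: dead-reckon the pose from wheel odometry."""
        x, y, th = self.pose
        th += dtheta
        self.pose = [x + dist * math.cos(th), y + dist * math.sin(th), th]

    def to_global(self, rel):
        """Express an extracted landmark (robot-frame x, y) in the map frame."""
        x, y, th = self.pose
        rx, ry = rel
        return (x + rx * math.cos(th) - ry * math.sin(th),
                y + rx * math.sin(th) + ry * math.cos(th))

    def associate(self, pt):
        """Data association: index of the nearest known landmark within the gate, else None."""
        best, best_dist = None, self.gate
        for i, (lx, ly) in enumerate(self.landmarks):
            d = math.hypot(pt[0] - lx, pt[1] - ly)
            if d < best_dist:
                best, best_dist = i, d
        return best

    def update(self, observations, alpha=0.3):
        """State update and landmark update from one set of extracted landmarks."""
        err_x = err_y = 0.0
        matched = 0
        new_points = []
        for rel in observations:
            pt = self.to_global(rel)
            i = self.associate(pt)
            if i is None:
                new_points.append(list(pt))            # landmark not previously seen
            else:
                err_x += self.landmarks[i][0] - pt[0]  # residual w.r.t. the stored estimate
                err_y += self.landmarks[i][1] - pt[1]
                matched += 1
        if matched:                                    # old observations correct the pose
            self.pose[0] += alpha * err_x / matched
            self.pose[1] += alpha * err_y / matched
        self.landmarks.extend(new_points)              # new observations extend the map

slam = ToySLAM()
slam.predict(dist=1.0, dtheta=0.05)        # odometry says: drove ~1 m, turned slightly
slam.update([(2.0, 0.5), (0.5, -1.0)])     # two landmarks extracted from a laser scan
print(slam.pose, slam.landmarks)
```

Re-observed landmarks nudge the pose estimate toward agreement with the stored map, while unmatched observations are simply appended as new landmarks, mirroring the old/new observation split in the process above.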
SLAM Landmark Properties

Desirable properties for SLAM landmarks:
1. Landmarks are detectable (from a range of distances/orientations)
2. Individual landmarks are distinguishable from each other
3. Landmarks are abundant in the environment
4. Landmarks are stationary (i.e., permanent fixtures of the environment)
5. Landmarks are well localized in the environment

LCI's Curious George Robots

Curious George I, II, ... are a series of UBC LCI mobile robots (see https://www.cs.ubc.ca/labs/lci/curious_george/)

• Curious George I won the robot league of the 2007 Semantic Robot Vision Challenge (SRVC) at the AAAI 2007 conference in Vancouver, July 2007
• Curious George II won the robot league of the 2008 Semantic Robot Vision Challenge (SRVC) at the CVPR 2008 conference in Anchorage, Alaska, June 2008
• Owing to a Wi-Fi technical glitch, Curious George "did not finish" in the robot league of the 2009 competition at the 5th International Symposium on Visual Computing (ISVC), Las Vegas, Nevada, December 2009. First place was, however, won in the software league of the competition

Example 1: LCI's Curious George I

• ActiveMedia PowerBot
• SICK LMS200 range finder
• Directed Perception PTU-D46-17.5 pan-tilt unit
• Point Grey Research Bumblebee colour stereo camera
• Canon PowerShot G7 (10 MPix, 6x optical zoom)

People in LCI are equipping a wheelchair with SLAM and other computer vision capabilities as part of a new national network called ICAST (Intelligent Computational Assistive Science and Technology).

Building a Map

[video: SLAM in action]

Semantic Robot Vision Challenge (SRVC)

• A robot is given a list of names of objects, both particular and generic, which it must find in a small test room.
• The robot can use its broad knowledge about object classes to find the objects
• Or it could download images of the named objects from the Web and construct classifiers to recognize those objects
• The robot enters the test area and searches for the objects. The robot returns a list of the objects' names and the best image matching each name. In each image it must outline the object with a bounding box (i.e., a tight-fitting rectangle)

SRVC Contest: Phase 1

Automated analysis of the web to learn previously unseen objects. Sample query: "shoe"

SRVC Contest: Phase 2

Explore the contest environment and collect imagery

SRVC Contest: Phase 3

Perform recognition

But... consider the "Banana Problem"

Data Sources

• Google Image Search
  – Large numbers of images
  – Lowest noise ratio of the common search engines
• Walmart product images
  – Fewer images per category
  – Very high image quality
• In the future, IKEA world?
  – Well structured, rich context, organized by place
• ImageNet, an image database organized according to the WordNet hierarchy

SRVC 2009 List of Objects

The official list of objects consisted of:
1. pumpkin
2. orange
3. red ping pong paddle
4. white soccer ball
5. laptop
6. dinosaur
7. bottle
8. toy car
9. frying pan
10. book "I Am a Strange Loop" by Douglas Hofstadter

SRVC 2009 List of Objects (cont'd)

11. book "Fugitive from the Cubicle Police"
12. book "Photoshop in a Nutshell"
13. CD "And Winter Came" by Enya
14. CD "The Essential Collection" by Karl Jenkins and Adiemus
15. DVD "Hitchhiker's Guide to the Galaxy" (widescreen)
16. game "Call of Duty 4" box
17. toy Domo-kun
18. Lay's Classic Potato Chips
19. Pepperidge Farm Goldfish Baked Snack Crackers
20. Pepperidge Farm Milano Distinctive Cookies

Curious George Results

Scoring:
• 8 of 12 specific instances
• 4 of 8 generic categories

What are the Challenges?

• How to navigate autonomously in the environment
• How to see (i.e., acquire images of) all the objects
• How to deal with conflicting meaning and appearance in web images
• How to properly assign semantic labels to the objects seen by the robot at different times (and from different viewpoints)
• Other?

Example 2: Never Ending Image Learner (NEIL)

NEIL is an attempt to develop the world's largest visual structured knowledge base with minimum human effort.

NEIL uses web data to extract:
1. Labeled examples of object categories with bounding boxes
2. Labeled examples of scenes
3. Labeled examples of attributes
4. Visual subclasses for object categories
5. Common sense relationships about scenes, objects and attributes

NEIL: Key "Insight"

"Our key insight is that at a large scale one can simultaneously label the visual instances and extract common sense relationships in a joint semi-supervised learning framework"

Contributions:
1. A "never ending" learning algorithm for gathering visual knowledge from the Internet (via macro-vision)
2. Automatic generation of a large visual structured knowledge base
   – labeled instances of scenes, objects, and attributes
   – relationships among them
3. Demonstration of how joint discovery of relationships and labeling of instances at a gigantic scale can provide constraints for improving semi-supervised learning

NEIL's core semi-supervised learning (SSL) algorithm works with a fixed vocabulary. The authors state that noun phrases from the Never Ending Language Learning (NELL) ontology are used to grow the vocabulary.

NEIL cycles between extracting global relationships, labeling data, and learning classifiers/detectors for building visual knowledge from the Internet. (Note: for objects NEIL learns detectors; for scenes NEIL builds classifiers.)
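As a rough illustration of that cycle, the control flow looks something like the sketch below. This is not the authors' code; every helper function is a trivial placeholder standing in for NEIL's web crawling, detector/classifier training, constrained labeling and relationship-mining components.

```python
def download_images(concept):
    # Placeholder: the real system queries Google Image Search and similar sources.
    return ["%s_img_%d" % (concept, i) for i in range(3)]

def train_model(concept, images):
    # Placeholder: NEIL trains detectors for objects and classifiers for scenes/attributes.
    return {"concept": concept, "examples": images}

def label_instances(models, relationships):
    # Placeholder: label newly downloaded images, using the current relationships as
    # constraints (e.g., "Car is found in Raceway" makes a car label in a raceway
    # scene more plausible).
    return {m["concept"]: m["examples"] for m in models}

def extract_relationships(labeled):
    # Placeholder: mine co-occurrences of labels for object-object, object-attribute,
    # scene-object and scene-attribute relationships.
    return [("wheel", "part of", "car")] if "car" in labeled and "wheel" in labeled else []

vocabulary = ["car", "wheel", "raceway"]   # grown over time, e.g., from the NELL ontology
relationships = []
for iteration in range(2):                 # "never ending" in the real system
    models = [train_model(c, download_images(c)) for c in vocabulary]
    labeled = label_instances(models, relationships)
    relationships = extract_relationships(labeled)
    print("iteration %d: %d concepts labeled, %d relationships" %
          (iteration, len(labeled), len(relationships)))
```

The point of the sketch is only the loop structure: classifiers/detectors, instance labels and relationships are re-estimated together on each pass, which is how the relationships can act as constraints on the semi-supervised labeling.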
NEIL: Relationships

Four types of relationships:
1. Object–Object (e.g., Wheel is a part of Car)
2. Object–Attribute (e.g., Sheep is/has White)
3. Scene–Object (e.g., Car is found in Raceway)
4. Scene–Attribute (e.g., Alley is/has Narrow)

NEIL: Implementation Details

To train scene and attribute classifiers, NEIL extracts a 3,912-dimensional feature vector from each image:

Feature                                 Dimension
GIST                                    512
Concatenated bag of words (BOW) for:
  SIFT                                  1,000
  HOG                                   1,000
  Lab colour                            400
  Texton                                1,000
Total                                   3,912

"Features of randomly sampled windows from other categories are used as negative examples for SVM training... For the object and attribute section, we use CHOG features with a bin size of 8. We train the detectors using latent SVM model (without parts)"

NEIL: Summary Statistics

As of October 20, 2013, NEIL has an ontology of:
• 1,152 object categories
• 1,034 scene categories
• 87 attributes

NEIL is bootstrapped using "seed" images from ImageNet, SUN and top images from Google Image Search. More than 2 million images have been downloaded. NEIL has labeled more than 400K visual instances (including 300,000 objects with their bounding boxes), and it has also extracted 1,703 common sense relationships.

NEIL: Explore the Web Site

Browse the current visual knowledge base at www.neil-kb.com