CORNELL UNIVERSITY
CS 764
Seminar in Computer Vision
Ramin Zabih
Fall 1998

Course mechanics
Meeting time will be Tue/Thu 11-12, here
• Starting a week from today
Home page is now up www/CS764
Assignment: present one paper
• You’ll have a lot of freedom, but you need to talk to me in advance
• Some possible papers will be posted shortly

Topic of this seminar
The use of “knowledge” in the analysis of visual data
• Sometimes called “context”
Clearly this is vital
• On both psychological and technical grounds
• But how? No one has much of an idea…
What is the interface between reasoning and perception? (Or, mind and body?)

What is the visual system’s “contract”
Two standard (bad) answers
Answer 1: describe the scene in terms of surfaces [low-level vision]
• There is a green patch 2” wide 1’ away
Answer 2: describe the scene in terms of objects [model-based recognition]
• Start with a set of 3D models (modelbase)
• Determine position and pose

Why are these answers wrong?
They are almost purely data-driven
• Bottom-up (from the data) versus top-down (from somewhere else)
They report “objective fact”, with no room for the task at hand
• For a given image, there is only one right answer
Other problems as well
• Not very useful, etc.

Technical and psychological arguments
There are technical arguments against this
• Vision is an inverse problem
  – Many 3D scenes could explain a single 2D image
• On engineering grounds, this makes no sense
  – Ultimately, perception is used for some task
The human perceptual system has both top-down and bottom-up elements
• Various optical illusions
  – Two people can look at the same picture and see something completely different

Your vision system doesn’t listen

It makes “reasonable” assumptions

Low-level vision has its solution
Inverse problems require assumptions
The assumptions for low-level vision are extremely general (i.e., weak)
• Reflect the physics of the visible world
• For example, motion or depth or intensity tend to be “coherent”
  – Saying that every pixel is moving differently from its neighbors is a very unlikely answer
  – The world we live in tends not to do that
  – Helmholtz’s “unconscious inference”

We’ll need high-level vision
Most of the field is low-level vision or model-based recognition
• Partly to avoid the confusion CS764 is about
Key question: how to avoid brittleness?
• Can make the visual system compute just what we need for our task (e.g., berries)
• But how to handle the unexpected (e.g., lions)?
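
Two of the technical points above (vision as an inverse problem, and "coherence" as a weak assumption) can be made concrete with a small Python sketch. This is an illustration under stated assumptions, not material from the course: the pinhole camera model, the helper names `project` and `incoherence`, and the sum-of-squared-neighbor-differences score are all hypothetical choices made for this example.

```python
import random


def project(point, focal_length=1.0):
    """Pinhole projection (assumed camera model) of a 3D point with z > 0."""
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)


def incoherence(field):
    """Sum of squared differences between adjacent values in a 2D grid.

    A hypothetical stand-in for a "coherence" assumption: a lower score
    means neighboring pixels agree more with each other.
    """
    rows, cols = len(field), len(field[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # right-hand neighbor
                total += (field[r][c] - field[r][c + 1]) ** 2
            if r + 1 < rows:  # neighbor below
                total += (field[r][c] - field[r + 1][c]) ** 2
    return total


# Inverse problem: two different 3D points on the same ray through the
# camera center land on the same 2D image point.
near = (1.0, 2.0, 4.0)
far = (2.0, 4.0, 8.0)  # twice as far along the same ray
same_pixel = project(near) == project(far)

# Coherence: a uniform motion field versus one where every pixel moves
# differently from its neighbors (values drawn at random for illustration).
rng = random.Random(0)
smooth_field = [[1.0] * 8 for _ in range(8)]
noisy_field = [[rng.uniform(-5.0, 5.0) for _ in range(8)] for _ in range(8)]

print("same image point:", same_pixel)
print("incoherence (smooth):", incoherence(smooth_field))
print("incoherence (noisy): ", incoherence(noisy_field))
```

The image alone cannot distinguish `near` from `far`, so some assumption has to pick among the candidate scenes. A score like `incoherence` is one simple way to prefer answers in which neighboring pixels behave alike.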

A short historical perspective
1960’s vision was completely task-specific
• A black blob in the center of the image is a telephone
• These efforts are now considered “hacks”
1970’s vision became completely general
• Marr pushed the field towards precise technical questions
• Low-level vision and recognition became dominant

Tasks strike back
In the mid-1980’s, several attempts were made to re-introduce a notion of task
• Active/animate/purposive vision
These attempts are widely viewed as failures, for good reasons
• We’ll look at them a bit next week
It’s not enough to have good intuitions
• There needs to be technical merit as well

Desiderata
Technical solutions (algorithms) that are very roughly consistent with human data
• Goal is not AI, psychology or philosophy
Provide visual summaries useful for tasks, but degrade gracefully
• Handle open/unstructured environments
• Deal with expectations and breakdown

Our path for 764
No good computational work to read
• Perhaps Vera will fix this?
We will examine papers along these lines:
• Computational approaches that failed
• Psychological data that is highly suggestive
• Neurologically inspired architectures
• Cognitive scientists and philosophers
  – Their goal is argument, not algorithm!
  – They’ve thought the most about these issues