G52HPA: History and Philosophy of Artificial Intelligence
Lecture 6: Vision & Reality
Tony Pridmore & Natascha Alechina
School of Computer Science
{tpp,nza}@cs.nott.ac.uk

Outline of this lecture
• What is Vision?
• Early days - blocks world semantics
• David Marr - representations and assumptions
• The cycle of perception
• Probabilistic models
• Illusions
• Relation to philosophy

G52HPA Lecture 5: Vision & Reality 2

What is Vision?
• “to know what is where by seeing” – Aristotle
• Extraction of symbolic descriptions of the viewed environment from an individual image or a sequence of images
  – not image processing
  – image processing takes an image and produces a (hopefully) better one, usually for human consumption
[Figure: pixel values: Red: 33, Green: 11, Blue: 14]

What is Vision?
• Backprojection: the value of a pixel depends upon
  – illumination
  – viewpoint
  – surface reflectance
  – surface geometry
“Vision is putting the toothpaste back in the tube” – John Mayhew

What is Vision?
• At a higher level, the goal is to provide semantic information about the viewed world
• Requires a transformation (implicit or explicit) to a world, as opposed to camera, coordinate frame
• Work at this level usually involves time-varying image sequences

Early days – blocks world semantics
• Vision was initially considered to be easy: Marvin Minsky famously hired Sussman (a 1st-year undergraduate) to build a vision system over the summer
• Attempts with a real camera, etc. quickly led the AI community to retreat to the “core” problem: extracting semantics
• Also restricted attention to a toy environment: the blocks world
  – Polyhedral (trihedral) solids
  – Line drawings, not real images
  – No illumination effects, texture, shading
  – Goal is to interpret in terms of surfaces and objects

Early days – blocks world semantics
• Labelling approach – list possible interpretations of each line (convex, concave, etc.), aim to produce a consistent labelling of the drawing
• Guzman’s (1968) first attempt was a hack
• Huffman & Clowes identified core labels in the early 1970s
• David Waltz added shadows and, in 1975, introduced constraint-propagation labelling (Waltz filtering), a forerunner of relaxation labelling

David Marr
• Physiologist and mathematician
• Argued (1977) for the study of real vision systems
• Understanding vision, or any component of AI, requires
  – Computational theory: what is computed and why?
  – Algorithm: how is it computed?
  – Mechanism: what is it computed on?
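The labelling approach from the blocks-world slides can be sketched as constraint propagation, in the spirit of Waltz's procedure: each junction starts with a list of candidate labellings, and a candidate is discarded when no neighbouring junction can agree with it on a shared line. This is a minimal sketch, not the historical algorithm. The function name `waltz_filter`, the data layout and the three-junction drawing are hypothetical, and the candidate lists are a toy catalogue invented for illustration, not the Huffman & Clowes catalogue. Labels are `+` (convex) and `-` (concave).

```python
def waltz_filter(junction_edges, candidates):
    """Prune junction labellings that no neighbouring junction can support.

    junction_edges: {junction: tuple of edge names}
    candidates:     {junction: set of tuples, one label per edge, same order}
    """
    domains = {j: set(labellings) for j, labellings in candidates.items()}
    changed = True
    while changed:  # repeat until a full pass removes nothing
        changed = False
        for j, edges in junction_edges.items():
            for k, other_edges in junction_edges.items():
                shared = [e for e in edges if e in other_edges]
                if j == k or not shared:
                    continue
                # Keep a labelling of j only if some labelling of k
                # assigns the same label to every shared edge.
                supported = {
                    lab for lab in domains[j]
                    if any(
                        all(lab[edges.index(e)] == other[other_edges.index(e)]
                            for e in shared)
                        for other in domains[k]
                    )
                }
                if supported != domains[j]:
                    domains[j] = supported
                    changed = True
    return domains


# Toy drawing (hypothetical): three junctions joined in a triangle.
junction_edges = {"A": ("e1", "e2"), "B": ("e2", "e3"), "C": ("e3", "e1")}
# Toy catalogue (hypothetical): the labellings each junction type allows.
candidates = {
    "A": {("+", "+"), ("-", "+"), ("+", "-")},
    "B": {("+", "+"), ("-", "-")},
    "C": {("+", "+")},
}
result = waltz_filter(junction_edges, candidates)
print(result)
```

In this toy case the single option at junction C forces B, which in turn forces A, so every line ends up labelled convex. An empty set for any junction would mean the drawing has no consistent interpretation under the given catalogue.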

David Marr
• Marr proposed a representational framework that moved steadily from the image towards higher-level interpretations

David Marr
• Marr emphasised computational theory, particularly the clear statement of the assumptions made, because
  – Vision is impossible without prior knowledge
  – Many possible, but hugely unlikely, worlds could generate any image
  – We only get a clear, instantaneous and usually unambiguous percept because of the assumptions our visual system makes
• Led to work in the late 70s/early 80s on computational theories for
  – Binocular stereo
  – Motion recovery (optic flow)
  – Shape from texture
  – Shape from shading…

The 1980s: Expert Systems and the Cycle of Perception
• Marr’s representational framework was essentially linear
• Recognition of the role of prior knowledge and the expert systems (ES) boom led to a large body of work on knowledge-based vision
  – Rule-based systems
  – Blackboard architectures
  – Greater emphasis on domain knowledge
• Most implicitly adopt Neisser’s (1976) cycle of perception architecture
Results now explicitly depend on prior knowledge

Modern Computer Vision: Probabilistic Models
• Computer vision now recognises
  – the importance of prior knowledge
  – that there is no unique solution to any problem, only a set of possible solutions with associated probabilities
• Probabilistic models now dominate, from low-level image segmentation to high-level event recognition
  – Expectation maximisation (EM) algorithm
  – Hidden Markov models
  – Bayesian filtering to track moving objects

Modern Computer Vision: Probabilistic Models
[Figure: three classes 1, 2, 3 and an observation x, with probabilities P(1 | x), P(2 | x), P(3 | x)]
e.g.
Segmentation using EM

Illusions as Evidence
[Figures: The Ames Room; The Ponzo Illusion]

Illusions as Tools
• Local vs global
• High-level vs low-level

Relation to Philosophy
• Perception is not a passive but an active process
  – as true of e.g. language understanding as of vision
• We are rarely aware of it; it may be an 'art concealed in the depths of the human soul' (Kant)
• In Critique of Pure Reason, Kant suggested that
  – all our empirical knowledge is made up of both 'what we receive through impressions' and of what 'our own faculty of knowledge supplies from itself'
• This casts doubt on both the representations we have of the world and the validity of any reasoning and/or planning we do over them
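
Returning to the "Segmentation using EM" slide: the idea of assigning each pixel a probability P(class | x) for every class, rather than a single hard answer, can be sketched with a one-dimensional Gaussian mixture fitted by EM. This is a minimal sketch under stated assumptions, not the method used in the lecture. The function names, the grey-level data, the three initial means and the `init_var` starting variance are all hypothetical, and the data are chosen to be well separated so that the fit is stable.

```python
import math


def gaussian(x, mean, var):
    """Density of a 1-D normal distribution at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def posteriors(x, weights, means, variances):
    """E-step for one pixel: P(class | x) for every class, by Bayes' rule."""
    joint = [w * gaussian(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]


def em_segment(pixels, init_means, init_var=100.0, n_iter=20, min_var=1e-3):
    """Fit a Gaussian mixture to grey levels and label each pixel.

    init_var and min_var are illustrative defaults (assumptions), and the
    sketch assumes every class keeps some support during fitting.
    """
    k = len(init_means)
    means = list(init_means)
    variances = [init_var] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: soft assignment of every pixel to every class.
        resp = [posteriors(x, weights, means, variances) for x in pixels]
        # M-step: re-estimate each class from its weighted pixels.
        for c in range(k):
            n_c = sum(r[c] for r in resp)
            weights[c] = n_c / len(pixels)
            means[c] = sum(r[c] * x for r, x in zip(resp, pixels)) / n_c
            spread = sum(r[c] * (x - means[c]) ** 2
                         for r, x in zip(resp, pixels)) / n_c
            variances[c] = max(spread, min_var)  # avoid a zero variance
    resp = [posteriors(x, weights, means, variances) for x in pixels]
    labels = [max(range(k), key=lambda c: r[c]) for r in resp]
    return labels, means, resp


# Hypothetical grey levels: dark, mid and bright regions.
pixels = [10, 12, 11, 9, 100, 102, 98, 101, 200, 198, 202, 201]
labels, means, resp = em_segment(pixels, init_means=[0, 128, 255])
print(labels)
print([round(m, 2) for m in means])
```

The hard labels are only a final read-out; the fitted model itself is the set of per-pixel probabilities in `resp`, which is the "set of possible solutions with associated probabilities" that the probabilistic-models slide describes.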