UW Computer Vision (CSE 576) Staff Steve Seitz Rick Szeliski seitz@cs.washington.edu szeliski@microsoft.com TA: Jiun-Hung Chen jhchen@cs.washington.edu Web Page • http://www.cs.washington.edu/education/courses/cse576/08sp/ Today • • • • Introduction Computer vision overview Course overview Image processing Readings • Book: Richard Szeliski, Computer Vision: Algorithms and Applications – (please check Web site weekly for updated drafts) – Introduction: Chapter 1.0 1.1 What Is Computer Vision? • Optical character recognition (OCR): reading handwritten postal codes on letters (Figure 1.4a) and automatic number plate recognition (ANPR); • Machine inspection: rapid parts inspection for quality assurance using stereo vision with specialized illumination to measure tolerances on aircraft wings or auto body parts (Figure 1.4b) or looking for defects in steel castings using X-ray vision; • Retail: object recognition for automated checkout lanes (Figure 1.4c); • 3D model building (photogrammetry): fully automated construction of 3D models from aerial photographs used 1.1 What Is Computer Vision? • Medical imaging: registering pre-operative and intra- operative imagery (Figure 1.4d) or performing long-term studies of people’s brain morphology as they age; • Automotive safety: detecting unexpected obstacles such as pedestrians on the street, under conditions where active vision techniques such as radar or lidar do not work well (Figure 1.4e; see also Miller, Campbell, Huttenlocher et al. (2008); Montemerlo, Becker, Bhat et al. (2008); Urmson, Anhalt, Bagnell et al. (2008) for examples of fully automated driving); 1.1 What Is Computer Vision? • Match move: merging computer-generated imagery (CGI) with live action footage by tracking feature points in the source video to estimate the 3D camera motion and shape of the environment. • Such techniques are widely used in Hollywood (e.g., in movies such as Jurassic Park) (Roble 1999; Roble and Zafar 2009); • They also require the use of precise matting to insert new elements between foreground and background elements (Chuang, Agarwala, Curless et al. 2002). 1.1 What Is Computer Vision? • Motion capture (mocap): using retro-reflective markers viewed from multiple cameras or other vision-based techniques to capture actors for computer animation; • Surveillance: monitoring for intruders, analyzing highway traffic (Figure 1.4f), and monitoring pools for drowning victims; • Fingerprint recognition and biometrics: for automatic access authentication as well as forensic applications. What Is Computer Vision? Does human vision (left) or computer vision (right) perform better? Terminator 2 How can we write a computer program to tell the left side is a real human and right side a robot? Every Picture Tells a Story Black and white photo by H RogerViollet of a train accident at La Gare Montparnasse station in Paris on 22 October, 1895 when engine 120-721 failed to stop at the platform, went through a first-floor window and crashed down onto the street. Goal of computer vision is to write computer programs that can interpret images. Can Computers Match (or Beat) Human Vision? Yes and no (but mostly no!) • Humans are much better at “hard” and versatile things. • Computers can be better at “easy,” specific, and repetitive things. Human Perception Has Its Shortcomings… Although this image appears to be a fairly run-of-the-mill picture of Bill Clinton and Al Gore, a closer inspection reveals that both men have been digitally given identical inner face features and their mutual configuration. Only the external features are different. It appears, therefore, that the human visual system makes strong use of the overall head shape in order to determine facial identity. Sinha and Poggio, Nature, 1996 Copyright A.Kitaoka 2003 Rotating Snakes Circular snakes appear to rotate spontaneously. Current State of the Art The next slides show some examples of what current vision systems can do. Earth Viewers (3D Modeling) Image from Microsoft’s Virtual Earth (see also: Google Earth) Photosynth http://photosynth.net/ Based on Photo Tourism technology developed here in CSE! by Noah Snavely, Steve Seitz, and Rick Szeliski CSE: Computer Science and Engineering Optical Character Recognition (OCR) Technology to convert scanned documents to text • If you have a scanner, it probably came with OCR software Digit Recognition, AT&T Laboratories AT&T: American Telephone and Telegraph http://www.research.att.com/~yann/ License plate readers http://en.wikipedia.org/wiki/Automatic_number_plate_recognition Face Detection Many new digital cameras now detect faces • Canon, Sony, Fuji, … • Smile shutter • Subject metering for auto focus, auto exposure, and auto white balance Smile Detection? Sony Cyber-shot® T70 Digital Still Camera Object Recognition (in Supermarkets) LaneHawk by EvolutionRobotics “A smart camera is flush-mounted in the checkout lane, continuously watching for items. When an item is detected and recognized, the cashier verifies the quantity of items that were found under the basket, and continues to close the transaction. The item can remain under the basket, and with LaneHawk, you are assured to get paid for it… “ Face Recognition Who is she? Vision-Based Biometrics “How the Afghan Girl was Identified by Her Iris Patterns” Read the story Login without a Password… Fingerprint scanners on many new laptops, other devices Face recognition systems now beginning to appear more widely http://www.sensiblevision.com/ Object Recognition (in Mobile Phones) This is becoming real: • Microsoft Research • Point & Find, Nokia Special Effects: Shape Capture The Matrix movies, ESC Entertainment, XYZ RGB, NRC NRC: National Research Council Special Effects: Motion Capture Pirates of the Carribean, Industrial Light and Magic Click here for interactive demo Sports Sportvision first down line Nice explanation on www.howstuffworks.com • The system has to know the orientation of the field with respect to the camera so that it can paint the first-down line with the correct perspective from that camera's point of view. • The system has to know, in that same perspective framework, exactly where every yard line is. Smart Cars Slide content courtesy of Amnon Shashua Mobileye • • • • Vision systems currently in high-end BMW, GM, Volvo models By 2010: 70% of car manufacturers. Video demo BMW: Bavarian Motor Works GM: General Motors Vision-Based Interaction (and Games) Digimask: put your face on a 3D avatar. Nintendo Wii has camera-based IR tracking built in. See Lee’s work at CMU on clever tricks on using it to create a multi-touch display! IR: Infra-Red CMU: Carnegie Mellon University “Game turns moviegoers into Human Joysticks”, CNET Camera tracking a crowd, based on this work. Vision in Space NASA'S Mars Exploration Rover Spirit captured this westward view from atop a low plateau where Spirit spent the closing months of 2007. Vision systems (JPL) used for several tasks • • • • • Panorama stitching 3D terrain modeling Obstacle detection, position tracking For more, read “Computer Vision on Mars” by Matthies et al. JPL: Jet Propulsion Laboratory Robotics NASA’s Mars Spirit Rover http://en.wikipedia.org/wiki/Spirit_rover NASA: National Aeronautics and Space Administration http://www.robocup.org/ Medical Imaging 3D imaging MRI, CT MRI: Magnetic Resonance Imaging CT: Computer Tomography Image guided surgery Grimson et al., MIT MIT: Massachusetts Institute of Technology Current State of the Art You just saw examples of current systems. • Many of these are less than 5 years old. This is a very active research area, and rapidly changing. • Many new applications in the next 5 years To learn more about vision applications and companies. • David Lowe maintains an excellent overview of vision companies – http://www.cs.ubc.ca/spider/lowe/vision.html Consumer-Level Applications • Stitching: turning overlapping photos into a single seamlessly stitched panorama (Figure 1.5a), as described in Chapter 9; • Exposure bracketing: merging multiple exposures taken under challenging lighting conditions (strong sunlight and shadows) into a single perfectly exposed image (Figure 1.5b), as described in Section 10.2; • Morphing: turning a picture of one of your friends into another, using a seamless morph transition (Figure 1.5c); • 3D modeling: converting one or more snapshots into a 3D model of the object or person you are photographing (Figure 1.5d), as described in Section 12.6 • Video match move and stabilization: inserting 2D pictures or 3D models into your videos by automatically tracking nearby reference points (see Section 7.4.2) or using motion estimates to remove shake from your videos (see Section 8.2.1); • Photo-based walkthroughs: navigating a large collection of photographs, such as the interior of your house, by flying between different photos in 3D (see Sections 13.1.2 and 13.5.5) • Face detection: for improved camera focusing as well as more relevant image searching (see Section 14.1.1); • Visual authentication: automatically logging family members onto your home computer as they sit down in front of the webcam (see Section 14.2). 1.2 A Brief History • Computational theory: What is the goal of the computation (task) and what are the constraints that are known or can be brought to bear on the problem? • Representations and algorithms: How are the input, output, and intermediate information represented and which algorithms are used to calculate the desired result? 1.2 A Brief History • Hardware implementation: How are the representations and algorithms mapped onto actual hardware, e.g., a biological vision system or a specialized piece of silicon? • Conversely, how can hardware constraints be used to guide the choice of representation and algorithm? • With the increasing use of graphics chips (GPUs) and many-core architectures for computer vision (see Section C.2), this question is again becoming quite relevant. 1.3 Book Overview This Course http://www.csie.ntu.edu.tw/~fuh/vcourse/szeliski http://www.cs.washington.edu/education/courses/cse576/08sp/ Optional Project 1: Features Optional Project 2: Panorama Stitching http://www.cs.washington.edu/education/courses/cse576/05sp/projects/proj2/artifacts/winners.html Indri Atmosukarto, 576 08sp Optional Project 3: Face Recognition Final Project Open-ended project of your choosing General Comments Prerequisites—these are essential! • Data structures • A good working knowledge of C and C++ programming – (or willingness/time to pick it up quickly!) • Linear algebra • Vector calculus Course does not assume prior imaging experience • computer vision, image processing, graphics, etc. Project due Mar. 15 use correlation to do image matching find dx, dy to minimize | PIX(ima, x, y) PIX(imb, x dx, y dy) | ( x , y )R DC & CV Lab. DC & CV Lab. CSIE NTU DC & CV Lab. CSIE NTU DC & CV Lab. CSIE NTU