Image Understanding Xuejin Chen University of Science and Technology of China xjchen99@ustc.edu.cn 63600852 Webpage: http://staff.ustc.edu.cn/~xjchen99/teaching/ImageUnderstanding/ImageUnderstanding.html Computer Vision (CSE 576) • Staff Steve Seitz Rick Szeliski seitz@cs.washington.edu szeliski@microsoft.com • Web Page – http://www.cs.washington.edu/education/courses/cse576/08sp/ – http://www.cs.washington.edu/education/courses/cse576/08sp/ • Book – http://szeliski.org/Book/ – http://szeliski.org/Book/drafts/SzeliskiBook_20100903_draft.pdf Today • Content – Intros – Overview – Image formation • Readings – Book: Richard Szeliski, Computer Vision: Algorithms and Applications • Intro: Ch 1.0 Image Understanding Image Understanding Digital image Image Understanding Digital image 2D Image RGBD Image Image Sequence Image Understanding Image Understanding Computer Image Understanding Computer Image Understanding Computer Human Image Understanding Computer Human Image Understanding Computer Human Like? Same as? Better than? … Computer Vision • How does computer interpret images? – To describe the world that we see in one or more images and – to reconstruct the properties, such as shape, illumination, and color distributions. • Semantic Interpretation Computer Vision • How does computer interpret images? – To describe the world that we see in one or more images and – to reconstruct the properties, such as shape, illumination, and color distributions. • Semantic Interpretation What do you see? What do you see? What do you see? • • • • • • • Purple flower Individual petal Green background Plant Layers, depth Shading, texture … How about computer? • Image Formation – MxN matrix with color of each pixel – Vector images… 18,34,9 • Difficult to interpret – Inverse problem – Infer unknowns 132,115,211 Human perception has its shortcomings… Human perception has its shortcomings… Copyright A.Kitaoka 2003 Dragon Optical Illusion http://www.grandillusions.com/images/articles/opticalillusions/dragon_illusion/dragon_illusion.wmv Every picture tells a story • Goal of computer vision is to write computer programs that can interpret images Can computers match (or beat) human vision? • Yes and no (but mostly no!) – humans are much better at “hard” things – computers can be better at “easy” things Timeline of the most active topics 1970 1980 1990 Digital image processing Blocks world, line labeling Generalized cylinders Pictorial structure Stereo correspondence Intrinsic images Optical flow Structure from motion Image pyramids Scale-space processing Shape from shading, texture, and focus Physically-based modeling Regularization Markov Random fields Kalman filters 3D range data processing Projective invariants Factorization Timeline of the most active topics 1990 2000 2010 3D range data processing Projective invariants Factorization Physical-based vision Graph cuts Particle filtering Energy-based segmentation Face recognition and detection Subspace methods image-based modeling and rendering Texture synthesis and inpainting Computational photography Feature-based recognition MRF inference algorithms Category recognition Learning Image retrieval RGBD image processing Scene understanding 1970s 1980s 1990s 2000s 2010s (b) Image-based rendering/modeling (a) Scene understanding (c) Recognition by 3d models (d) Group profiling/tracking New Algorithms and Applications • Data-Driven Algorithms – Large database – Statistic algorithms • Interactive Vision – Human interaction – User interface • Internet Applications Chapter Contents Image formation Segmentation Image processing Motion Features Stitching Recognition Current state of the art • The next slides show some examples of what current vision systems can do Earth viewers (3D modeling) Image from Microsoft’s Virtual Earth (see also: Google Earth) Photosynth http://labs.live.com/photosynth/ Based on Photo Tourism technology developed here in CSE! by Noah Snavely, Steve Seitz, and Rick Szeliski Recognition • Barcode Recognition Object recognition (in mobile phones) • This is becoming real: – Microsoft Research – Point & Find, Nokia Optical character recognition (OCR) Technology to convert scanned docs to text • If you have a scanner, it probably came with OCR software Digit recognition, AT&T labs http://www.research.att.com/~yann/ License plate readers http://en.wikipedia.org/wiki/Automatic_number_plate_recognition Face detection • Many new digital cameras now detect faces – Canon, Sony, Fuji, … Smile detection? Sony Cyber-shot® T70 Digital Still Camera Object recognition (in supermarkets) LaneHawk by EvolutionRobotics “A smart camera is flush-mounted in the checkout lane, continuously watching for items. When an item is detected and recognized, the cashier verifies the quantity of items that were found under the basket, and continues to close the transaction. The item can remain under the basket, and with LaneHawk,you are assured to get paid for it… “ Face recognition Who is she? Vision-based biometrics “How the Afghan Girl was Identified by Her Iris Patterns” Read the story Login without a password… Fingerprint scanners on many new laptops, other devices Face recognition systems now beginning to appear more widely http://www.sensiblevision.com/ Special effects: shape capture The Matrix movies, ESC Entertainment, XYZRGB, NRC Special effects: motion capture Pirates of the Carribean, Industrial Light and Magic Click here for interactive demo Sports Sportvision first down line Nice explanation on www.howstuffworks.com Slide content courtesy of Amnon Shashua Smart cars • Mobileye – Vision systems currently in high-end BMW, GM, Volvo models – By 2010: 70% of car manufacturers. – Video demo Vision-based interaction (and games) Digimask: put your face on a 3D avatar. Nintendo Wii has camera-based IR tracking built in. See Lee’s work at CMU on clever tricks on using it to create a multi-touch display! “Game turns moviegoers into Human Joysticks”, CNET Camera tracking a crowd, based on this work. Vision-based interaction • Microsoft kinect 3D Modeling using Kinect • KinectFusion • … Vision in space NASA'S Mars Exploration Rover Spirit captured this westward view from atop a low plateau where Spirit spent the closing months of 2007. Vision systems (JPL) used for several tasks • • • • Panorama stitching 3D terrain modeling Obstacle detection, position tracking For more, read “Computer Vision on Mars” by Matthies et al. Curiosity Rover Master Camera(MastCam) provides multiple spectra and true-color imaging with two cameras Chemistry and Camera complex (ChemCam) Navigation cameras (NavCams) Two pairs of black and white navigation cameras mounted on the mast to support ground navigation. The cameras have a 45 degree angle of view and use visible light to capture stereoscopic 3-D imagery. Hazard avoidance cameras (hazcams) Four pairs of black and white navigation cameras called hazcams—two pairs in the front and two pairs in the back. The cameras use visible light to capture stereoscopic three-dimensional (3-D) imagery.[74] The cameras have a 120 degree field of view and map the terrain at up to 3 m (9.8 ft) in front of the rover. http://en.wikipedia.org/wiki/Curiosity_rover Robotics NASA’s Mars Spirit Rover http://en.wikipedia.org/wiki/Spirit_rover http://www.robocup.org/ Medical imaging 3D imaging MRI, CT Image guided surgery Grimson et al., MIT Current state of the art • You just saw examples of current systems. – Many of these are less than 10 years old • This is a very active research area, and rapidly changing • To learn more about vision applications and companies – David Lowe maintains an excellent overview of vision companies • http://www.cs.ubc.ca/spider/lowe/vision.html Many Interesting Problems Structure From Motion Object Tracking Ref: http://www.cs.ubc.ca/spider/lowe/vision.html Reconstruction Face Detection This course • Webpage: – http://staff.ustc.edu.cn/~xjchen99/teaching/teaching.html • Ref: – http://www.cs.washington.edu/education/courses/cse576/08sp/ Grading • Based on 4 projects • 3 tests before you start 3 projects Project 1: features Project 2: panorama stitching • http://www.cs.washington.edu/education/courses/cse576/05sp/projects/proj2/artifacts/winners.html Indri Atmosukarto, 576 08sp Project 3: Face Recognition Final Project • Open-ended project – in teams of two~three students General Comments • Prerequisites—these are essential! – Data structures – A good working knowledge of C and C++ programming • (or willingness/time to pick it up quickly!) – Linear algebra – Vector calculus • Course does not assume prior imaging experience – computer vision, image processing, graphics, etc.