CENG 483: Introduction to Computer Vision
Fatoş T. Yarman Vural, Ü. Ruşen Aktaş
Spring 2011

Textbook:
1. L. Shapiro and G. Stockman, Computer Vision

Recommended Books:
1. R. Szeliski, Computer Vision: Algorithms and Applications, Dec 23, 2008
2. D. Forsyth, J. Ponce, Computer Vision: A Modern Approach
3. B. Jähne, H. Haußecker, Computer Vision and Applications

Enjoying the course:
A Research Project: 30% with your partner
Quizzes: 10% with your partner
Midterm: 30%
Final: 30%

Project: Segmentation by Fusion
Domain: Medical, Remote Sensing, etc.
Due Dates:
1-page summary: March 1
Literature survey: March 31
Algorithm development: April 15
Paper submission: May 15

• What is Computer Vision?
Make the computer SEE
SEE: Extracting visual information from any sensed data
Goal: Make useful decisions about objects and scenes based on sensed data

OBJECT: perceptible, vision, material, thing

Object According to Plato
• Things consisting of forms and matter
• Forms are proper subjects of philosophical investigation, for they have the highest degree of reality.
• Matter is the ordinary substance

[Figure: object category tree with nodes OBJECTS, ANIMALS, PLANTS, INANIMATE, NATURAL, MAN-MADE, VERTEBRATE, MAMMALS, BIRDS, TAPIR, BOAR, GROUSE, CAMERA, …]

How many object categories are there?
Biederman 198

SCENE
Consists of multiple objects
Goal: Make useful decisions about objects and scenes based on sensed data
[Image: Bruegel]

Sensed Data: Images
All sorts of sensor data carrying visual info: Optic, Thermal, IR, MR, SAR, …
Goal: Make useful decisions about objects and scenes based on sensed data

IMAGES: Satellite, CT, SAR, Thermal, scientific

Useful Decisions
Recognize, classify, detect, localize, retrieve, annotate, verify
Goal: Make useful decisions about objects and scenes based on sensed data

So what does recognition involve?
Verification: is that a lamp?
Detection: are there people?
Identification: is that Potala Palace?
Object categorization: mountain, tree, building, banner, street lamp, vendor, people
Scene and context categorization
• outdoor
• city
• …

APPLICATION DOMAINS OF COMPUTER VISION

Traffic
Pedestrian and car detection
Lane detection
[Figure: detections labeled Ped, Ped, Car; both axes in meters]

Assisted driving
• Collision warning systems with adaptive cruise control
• Lane departure warning systems
• Rear object detection systems

Retrieval: Improving online search
Query: STREET
Digital Album

Similarity Retrieval of Brain Data

Image Databases: Content-Based Retrieval
Images from my Ground-Truth collection.
What categories of image databases exist today?

Abstract Regions for Object Recognition
Original Images, Color Regions, Texture Regions, Line Clusters

Insect Identification for Ecology Studies
Calineuria (Cal), Doroneuria (Dor), Yoraperla (Yor)

Document Analysis

Surveillance: Object and Event Recognition in Aerial Videos
Original Video Frame, Color Regions, Structure Regions

Video Analysis
What are the objects?
What are the events?

3D Reconstruction of the Blood Vessel Tree

Recognition of 3D Object Classes from Range Data

3D Scanning
Scanning Michelangelo’s “The David”
• The Digital Michelangelo Project - http://graphics.stanford.edu/projects/mich/
• UW Prof. Brian Curless, collaborator
• 2 BILLION polygons, accuracy to .29mm

The Digital Michelangelo Project, Levoy et al.

Tasks in Computer Vision
• Segment an image into useful regions (figure labels: liver, kidney, spleen)
• Perform measurements on certain areas
• Determine what object(s) are in the scene
• Calculate the precise location(s) of objects
• Visually inspect a manufactured object
• Construct a 3D model of the imaged object
• Find “interesting” events in a video

HISTORY OF COMPUTER VISION
1970s, 1980s, 1990s, 2000s

Why is it Difficult? What are the Challenges?

Challenges 1: viewpoint variation
Michelangelo 1475-1564

Challenges 2: illumination
slide credit: S.
Ullman

Challenges 3: occlusion
Magritte, 1957

Challenges 4: scale

Challenges 5: deformation
Xu, Beihong 19

Challenges 6: background clutter
Klimt, 1913

Challenges 7: intra-class variation

The Three Stages of Computer Vision
• low-level: image → image
• mid-level: image → features
• high-level: features → analysis

Low-Level
sharpening, blurring

Low-Level
Canny: original image → edge image

Mid-Level
ORT: edge image → data structure (circular arcs and line segments)

Mid-Level
K-means clustering (followed by connected component analysis): original color image → data structure (regions of homogeneous color)

Low- to High-Level
low-level: edge image
mid-level: consistent line clusters
high-level: Building Recognition

Recognition
Scale / orientation range to search over
Speed
Context

Course content
Image representation
  Matrices, functions
  Image file formats
Binary Image Analysis
  Pixel and neighborhood
  Masks and convolution
  Counting and labeling
  Morphological operations
  Thresholding
Object Recognition concepts
  Representation
  Classification
  Measures
Gray-level Image Analysis
  Gray level mapping
  Noise removal, Smoothing
Color and shading
  Color spaces
  Shades
Texture
  Texels, texture description
  Texture measure
Segmentation
  Clustering
  Region Growing
Content Based Image Retrieval

Imaging and Image Representation
Ch. 2, Shapiro et al.

Classical Imaging Process
Light reaches surfaces in 3D
Surfaces reflect
Sensor element receives light energy
Intensity counts
Angles count
Material counts
What are radiance and irradiance?

Radiometry and Computer Vision*
• Radiometry is a branch of physics that deals with the measurement of the flow and transfer of radiant energy.
• Radiance is the power of light that is emitted from a unit surface area into some spatial angle; the corresponding photometric term is brightness.
• Irradiance is the amount of energy that an image-capturing device gets per unit of an efficient sensitive area of the camera. Quantizing it gives image gray tones.
* From Sonka, Hlavac, and Boyle, Image Processing, Analysis, and Machine Vision, ITP, 1999.

Sensors: Image Acquisition Devices
CCD (Charge-Coupled Device)
X-Ray Devices
Microwave Devices
UV Devices
Thermal Cameras
IR Devices
3-D scanners

CCD type camera: commonly used in industrial applications
Array of small fixed elements
Each element converts the light energy to electric charge
1x1 cm
Can add refracting elements to get color in 2x2 neighborhoods
8-bit intensity common

Computer Vision Algorithms
Main concern of CV is to develop algorithms

LIDAR also senses surfaces
Single sensing element scans scene
Laser light reflected off surface and returned
Phase shift codes distance
Brightness change codes albedo (surface reflectance)
Stockman MSU/CSE Fall 2008

2.5D face image from Minolta Vivid 910 scanner
A rotating mirror scans a laser stripe across the object. 320x240 rangels obtained in about 2 seconds.
[x,y,z,R,G,B] image.

3D scanning technology
3D image of voxels obtained
Usually computationally expensive reconstruction of 3D from many 2D scans (CAT, computer-aided tomography)

Magnetic Resonance Imaging
Sense density of certain chemistry
S slices x R rows x C columns
Volume element (voxel) about 2mm per side
At left is shaded 2D image created by “volume rendering” a 3D volume: darkness codes depth

Single slice through human head
MRIs are computed structures, computed from many views.
At left is MRA (angiograph), which shows blood flow.
CAT scans are computed in much the same manner from X-ray transmission data.
Problems in Image Acquisition

Human eye as a spherical camera
75-150 million rods sense intensity
6-7 million cones sense color
Fovea has tightly packed area, more cones
Periphery has more rods
Focal length is about 20mm
Pupil/iris controls light entry
• Eye scans, or saccades, to image details on fovea
• 100M sensing cells funnel to 1M optic nerve connections to the brain

RODS AND CONES
Cones

Image Formation

Problems in HVS
Mach Band Effect
Contrast
Illusions

Images: 2D projections of 3D
The 3D world has color, texture, surfaces, volumes, light sources, temperature, reflectance, …
A 2D image is a projection of a scene from a specific viewpoint.

Digital Images form arrays

Digitizing: Sampling, Quantization
Digital Image: Sampled and quantized

Sampling at different resolutions

Sampling
Quantization
What are the appropriate sampling and quantization rates?

Resolution
• resolution: precision of the sensor
• nominal resolution: size of a single pixel in scene coordinates (e.g., meters, mm)
• common use of resolution: num_rows X num_cols (e.g., 515 x 480)
• field of view (FOV): size of the scene a sensor can sense

Images as Functions
• A gray-tone image is a function: g(x,y) = val or f(row, col) = val
• A color image is just three functions or a vector-valued function: f(row,col) = (r(row,col), g(row,col), b(row,col))
• Multi-spectral image: f(row,col) = (f1(row,col), f2(row,col), …, fn(row,col))

Gray-tone Image as Function

Image vs Matrix

There are many different file formats.
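The images-as-functions, sampling, and quantization slides above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: the image is stored as a plain list of rows, the helper names f, subsample, and quantize are hypothetical, and subsampling simply keeps every k-th pixel without any smoothing.

```python
# A tiny 4x4 gray-tone image stored as a matrix (list of rows); values are 8-bit.
image = [
    [0, 63, 64, 255],
    [10, 70, 130, 200],
    [20, 80, 140, 210],
    [30, 90, 150, 220],
]


def f(row, col):
    """Gray-tone image viewed as a function f(row, col) = val."""
    return image[row][col]


def subsample(img, k):
    """Sample at a lower resolution: keep every k-th row and column (assumed: no smoothing)."""
    return [row[::k] for row in img[::k]]


def quantize(img, levels, max_value=255):
    """Map each value in 0..max_value to one of `levels` equal-width bins (0..levels-1)."""
    step = (max_value + 1) / levels
    return [[int(v // step) for v in row] for row in img]


coarse = subsample(image, 2)   # 2x2 image built from rows 0, 2 and columns 0, 2
two_bit = quantize(image, 4)   # 4 gray levels, i.e. 2 bits per pixel
```

Lowering either the sampling rate (k) or the number of levels loses information, which is the trade-off the "appropriate sampling and quantization rates" question asks about.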
Digital Image Terminology:
0  0  0  0  0  0  0
0  0  1  0  0  1  0
0  1 95 92 93 92 94
0  1 96 93 93 93 95
1  1 94 93 94 93 95
0  0 93 92 92 93 96
0  0 92 92 93 93 95
• binary image
• gray-scale (or gray-tone) image
• color image
• multi-spectral image
• range image
• labeled image
Figure labels: pixel (with value 94), its 3x3 neighborhood, region of medium intensity, resolution (7x7)

Image File Formats
Portable Gray Map (PGM): older form
GIF: early commercial version
JPEG (JPG): modern version
MPEG: for motion
Many others exist: header plus data
Do they handle color?
Do they provide for compression?
Are there good packages that use them or at least convert between them?

Compression: Reduce the redundancy
1. Lossy
2. Lossless

Run Coding
Row 1: 0001001000000
Row 2: 0001111000000
Row 3: 0001001000000
Code 1 (Row 1): 3(0)1(1)2(0)1(1)6(0)
or
Code 2 (Row 1): (4,4)(7,7)

PGM image with ASCII info.
P2 means ASCII gray
Comments
W=16; H=8
192 is max intensity
Can be made with editor
Large images are usually not stored as ASCII

PBM/PGM/PPM Codes
• P1: ascii binary (PBM)
• P2: ascii grayscale (PGM)
• P3: ascii color (PPM)
• P4: byte binary (PBM)
• P5: byte grayscale (PGM)
• P6: byte color (PPM)

JPG: current popular form
Public standard
Allows for image compression; ratios of 10:1 or 30:1 are often easily possible
8x8 intensity regions are fit with basis of cosines
Error in cosine fit coded as well
Parameters then compressed with Huffman coding
Common for most digital cameras

From 3D Scenes to 2D Images
• Object
• World
• Camera
• Real Image
• Pixel Image

Binary Image Analysis

Binary image analysis
• consists of a set of image analysis operations that are used to produce or process binary images, usually images of 0’s and 1’s.
0 represents the background
1 represents the foreground
00010010001000
00011110001000
00010010001000

Binary Image Analysis is used in a number of practical applications, e.g.
• part inspection
• object counting
• connected component labeling
• document processing

What kinds of operations?
Separate objects from background and from one another
Aggregate pixels for each object
Compute features for each object

Example: red blood cell image
Many blood cells are separate objects
Many touch – bad!
Salt and pepper noise from thresholding
How usable is this data?

Results of analysis
63 separate objects detected
Single cells have area about 50
Noise spots
Gobs of cells

Useful Operations
1. Thresholding a gray-tone image
2. Determining good thresholds
3. Connected components analysis
4. Binary mathematical morphology
5. All sorts of feature extractors (area, centroid, circularity, …)

1. Thresholding
• Convert gray level or color image into binary image
• Use histogram

Histogram
Background is black
Healthy cherry is bright
Bruise is medium dark
Histogram shows two cherry regions (black background has been removed)
[Figure: histogram; x-axis: gray-tone values 0 to 256, y-axis: pixel counts]

Histogram-Directed Thresholding
How can we use a histogram to separate an image into 2 (or several) different regions?
Is there a single clear threshold? 2? 3?

Automatic Thresholding: Otsu’s Method
Assumption: the histogram is bimodal
[Figure: bimodal histogram split at threshold t into Grp 1 and Grp 2]
Method: find the threshold t that minimizes the weighted sum of within-group variances for the two groups that result from separating the gray tones at value t.

Thresholding Example
original gray tone image, binary thresholded image

2. Connected Components Labeling
Once you have a binary image, you can identify and then analyze each connected set of pixels.
The connected components operation takes in a binary image and produces a labeled image in which each pixel has the integer label of either the background (0) or a component.
binary image after morphology, connected components

Methods for CC Analysis
1. Recursive Tracking (almost never used)
2. Parallel Growing (needs parallel hardware)
3.
Row-by-Row (most common)
• Classical Algorithm (see text)
• Efficient Run-Length Algorithm (developed for speed in real industrial applications)

Equivalent Labels
Original Binary Image
0001110000111100001
0001111000111100011
0001111100111100111
0001111110111100111
0001111111111100111
0001111111111100111
0001111111111111111
0001111111111111111
0001111110000011111

Equivalent Labels
The Labeling Process
0001110000222200003
0001111000222200033
0001111100222200333
0001111110222200333
0001111111111100333
0001111111111100333
0001111111111111111
0001111111111111111
0001111110000011111
Equivalences: 1 ≡ 2, 1 ≡ 3

Run-Length Data Structure

Binary Image (columns 0-4 across, rows 0-4 down)
     0 1 2 3 4
0    1 1 0 1 1
1    1 1 0 0 1
2    1 1 1 0 1
3    0 0 0 0 0
4    0 1 1 1 1

Row Index
row  Rstart  Rend
0    1       2
1    3       4
2    5       6
3    0       0
4    7       7

Runs
run  row  scol  ecol  label
0    UNUSED
1    0    0     1     0
2    0    3     4     0
3    1    0     1     0
4    1    4     4     0
5    2    0     2     0
6    2    4     4     0
7    4    1     4     0

Run-Length Algorithm

Procedure run_length_classical
{
  initialize Run-Length and Union-Find data structures
  count <- 0

  /* Pass 1 (by rows) */
  for each current row and its previous row
  {
    move pointer P along the runs of current row
    move pointer Q along the runs of previous row

    Case 1: No Overlap
    [Figure: two diagrams of non-overlapping runs P and Q]
      /* new label */
      count <- count + 1
      label(P) <- count
      P <- P + 1

      /* check Q’s next run */
      Q <- Q + 1

    Case 2: Overlap
      Subcase 1: P’s run has no label yet
        label(P) <- label(Q)
        move pointer(s)
      Subcase 2: P’s run has a label that is different from Q’s run
        union(label(P), label(Q))
        move pointer(s)
  }

  /* Pass 2 (by runs) */
  /* Relabel each run with the name of the equivalence class of its label */
  for each run M
  {
    label(M) <- find(label(M))
  }
}

where union and find refer to the operations of the Union-Find data structure, which keeps track of sets of equivalent labels.
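The two passes of the run-length procedure above can be sketched in Python as follows. This is a simplified sketch, not the textbook implementation: instead of advancing the pointers P and Q, it compares each run with every run of the previous row; it assumes 4-connectivity (runs are connected when their column ranges overlap), which the slides do not state; and the function names and the list-based run record [row, scol, ecol, label] are hypothetical. The 5x5 test image follows my reading of the Run-Length Data Structure figure.

```python
def find(parent, x):
    """Return the representative label of the equivalence class containing x."""
    while parent[x] != x:
        x = parent[x]
    return x


def union(parent, a, b):
    """Merge the classes of labels a and b; the smaller label becomes the root."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)


def extract_runs(image):
    """Return the runs of 1s as [row, scol, ecol, label]; label 0 means 'no label yet'."""
    runs = []
    for r, row in enumerate(image):
        c = 0
        while c < len(row):
            if row[c]:
                scol = c
                while c + 1 < len(row) and row[c + 1]:
                    c += 1
                runs.append([r, scol, c, 0])
            c += 1
    return runs


def label_runs(image):
    runs = extract_runs(image)
    parent = [0]                      # parent[k] == k when label k is created
    by_row = {}
    for run in runs:
        by_row.setdefault(run[0], []).append(run)
    # Pass 1 (by rows): assign provisional labels and record equivalences.
    for r in sorted(by_row):
        for p in by_row[r]:
            for q in by_row.get(r - 1, []):
                if q[1] <= p[2] and p[1] <= q[2]:    # Case 2: column ranges overlap
                    if p[3] == 0:
                        p[3] = q[3]                  # Subcase 1: P has no label yet
                    elif p[3] != q[3]:
                        union(parent, p[3], q[3])    # Subcase 2: labels differ
            if p[3] == 0:                            # Case 1: no overlap, new label
                parent.append(len(parent))
                p[3] = len(parent) - 1
    # Pass 2 (by runs): replace each label by its class representative.
    for run in runs:
        run[3] = find(parent, run[3])
    return runs


image = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 0, 1],
    [1, 1, 1, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1],
]
runs = label_runs(image)
labels = [run[3] for run in runs]
```

For this image the sketch finds seven runs in three components: the left block, the right column, and the bottom row, which is separated from the rest by the empty row 3.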
Labeling shown as Pseudo-Color
connected components of 1’s from thresholded image
connected components of cluster labels

Mathematical Morphology
Binary mathematical morphology consists of two basic operations
  dilation and erosion
and several composite relations
  closing and opening
  conditional dilation
  ...

Dilation
Dilation expands the connected sets of 1s of a binary image. It can be used for
1. growing features
2. filling holes and gaps

Erosion
Erosion shrinks the connected sets of 1s of a binary image. It can be used for
1. shrinking features
2. removing bridges, branches and small protrusions

Structuring Elements
A structuring element is a shape mask used in the basic morphological operations. Structuring elements can be any shape and size that is digitally representable, and each has an origin.
box, disk, hexagon, something
box(length,width), disk(diameter)

Dilation with Structuring Elements
The arguments to dilation and erosion are
1. a binary image B
2. a structuring element S
dilate(B,S) takes binary image B, places the origin of structuring element S over each 1-pixel, and ORs the structuring element S into the output image at the corresponding position.

B:
0000
0110
0000

S (origin at the lower-left 1):
1
11

B ⊕ S (dilate):
0110
0111
0000

Erosion with Structuring Elements
erode(B,S) takes a binary image B, places the origin of structuring element S over every pixel position, and ORs a binary 1 into that position of the output image only if every position of S (with a 1) covers a 1 in B.
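The definitions of dilate(B,S) and erode(B,S) above translate almost directly into code. The sketch below is one possible reading, with assumptions that are not in the slides: a structuring element is represented as a list of (row, column) offsets of its 1-pixels relative to its origin, pixels that would fall outside the image are dropped during dilation, and anything outside the image counts as 0 during erosion. The offsets used for S are my reading of the flattened dilation figure (origin at the lower-left 1).

```python
def dilate(B, S):
    """OR a copy of S into the output at every 1-pixel of B (S given as origin-relative offsets)."""
    rows, cols = len(B), len(B[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if B[r][c]:
                for dr, dc in S:
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:   # assumed: clip at the border
                        out[rr][cc] = 1
    return out


def erode(B, S):
    """Output 1 where S, with its origin at that pixel, covers only 1-pixels of B."""
    rows, cols = len(B), len(B[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            fits = all(
                0 <= r + dr < rows and 0 <= c + dc < cols and B[r + dr][c + dc]
                for dr, dc in S
            )   # assumed: outside the image counts as 0
            if fits:
                out[r][c] = 1
    return out


# Dilation example: S has 1s at its origin, one pixel above it, and one pixel to its right.
B = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
S = [(0, 0), (-1, 0), (0, 1)]
dilated = dilate(B, S)
```

With these offsets, the dilation result has rows 0110, 0111, 0000, which agrees with the output shown in the dilation figure.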
Erosion example:

B:
00110
00110
00110
11111

S (origin at the center):
1
1
1

B ⊖ S (erode):
00000
00110
00110
00000

Example to Try
B:
00100100
00111110
11111100
11111111
00111100
00111100
00111100

S:
111
111
111

erode; dilate with same structuring element

Opening and Closing
• Closing is the compound operation of dilation followed by erosion (with the same structuring element)
• Opening is the compound operation of erosion followed by dilation (with the same structuring element)

Use of Opening
Original, Opening, Corners
1. What kind of structuring element was used in the opening?
2. How did we get the corners?

Gear Tooth Inspection
original binary image, detected defects
How did they do it?

Some Details

Region Properties
Properties of the regions can be used to recognize objects.
• geometric properties (Ch 3)
• gray-tone properties
• color properties
• texture properties
• shape properties (a few in Ch 3)
• motion properties
• relationship properties (1 in Ch 3)

Geometric and Shape Properties
• area
• centroid
• perimeter
• perimeter length
• circularity
• elongation
• mean and standard deviation of radial distance
• bounding box
• extremal axis length from bounding box
• second order moments (row, column, mixed)
• lengths and orientations of axes of best-fit ellipse
Which are statistical? Which are structural?

Region Adjacency Graph
A region adjacency graph (RAG) is a graph in which each node represents a region of the image and an edge connects two nodes if the regions are adjacent.
[Figure: an image divided into regions 1, 2, 3, 4 and its region adjacency graph on nodes 1, 2, 3, 4]
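A region adjacency graph can be built from a labeled image with one scan over the pixels. The sketch below is a minimal illustration under assumptions not stated in the slides: two regions are adjacent when some pair of their pixels are 4-neighbors, every label (including a background label, if present) is treated as a region, and the function name and the small labeled image are hypothetical, not the figure from the slide.

```python
def region_adjacency_graph(labels):
    """Return (nodes, edges) of the RAG for a labeled image given as a list of rows."""
    rows, cols = len(labels), len(labels[0])
    nodes = set()
    edges = set()
    for r in range(rows):
        for c in range(cols):
            a = labels[r][c]
            nodes.add(a)
            # Looking only down and right visits every 4-neighbor pair exactly once.
            for rr, cc in ((r + 1, c), (r, c + 1)):
                if rr < rows and cc < cols:
                    b = labels[rr][cc]
                    if a != b:
                        edges.add((min(a, b), max(a, b)))
    return nodes, edges


# Hypothetical labeled image with four regions.
labeled = [
    [1, 1, 2],
    [4, 3, 2],
    [4, 3, 2],
]
nodes, edges = region_adjacency_graph(labeled)
```

In this hypothetical image, regions 2 and 4 never touch, so the graph has every possible edge except (2, 4).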