METU Computer Engineering
CENG 483: Int. To Computer Vision
Fatoş T. Yarman Vural
Ü. Ruşen Aktaş
Spring 2011
1
Textbook:
1. L. Shapiro and G. Stockman, Computer Vision
Recommended Books:
1. R. Szeliski, Computer Vision: Algorithms and Applications (draft, Dec 23, 2008)
2. D. Forsyth and J. Ponce, Computer Vision: A Modern Approach
3. B. Jähne and H. Haussecker, Computer Vision and Applications
2
Enjoying the course:
Research Project: 30% (with your partner)
Quizzes: 10% (with your partner)
Midterm: 30%
Final: 30%
3
Project: Segmentation by Fusion
Domain: Medical, Remote Sensing, etc.
Due Dates:
• 1-page summary: March 1
• Literature survey: March 31
• Algorithm development: April 15
• Paper submission: May 15
4
What is Computer Vision?
• Make the computer SEE
• SEE: extracting visual information from any sensed data
• Goal: make useful decisions about objects and scenes based on sensed data
5
OBJECT
[Diagram: an object as a perceptible, material thing, accessed through vision]
Object According to Plato
• Things consist of forms and matter.
• Forms are the proper subjects of philosophical investigation, for they have the highest degree of reality.
• Matter is the ordinary substance.
OBJECTS
[Figure: object category hierarchy: animals (vertebrates: mammals such as tapir and boar, birds such as grouse), plants, inanimate (natural; man-made, such as a camera), ...]
How many object categories are there?
Biederman 198
SCENE
• Consists of multiple objects
Goal: make useful decisions about objects and scenes based on sensed data
10
[Image: a Bruegel painting]
Sensed Data: Images
• All sorts of sensor data carrying visual information:
  • Optic
  • Thermal
  • IR
  • MR
  • SAR
  • ...
Goal: make useful decisions about objects and scenes based on sensed data
12
IMAGES: satellite, CT, SAR, thermal, scientific
13
Useful Decisions
• Recognize, classify, detect, localize, retrieve, annotate, verify
Goal: make useful decisions about objects and scenes based on sensed data
14
So what does recognition involve?
• Verification: is that a lamp?
• Detection: are there people?
• Identification: is that the Potala Palace?
• Object categorization: mountain, tree, building, banner, street lamp, vendor, people
• Scene and context categorization: outdoor, city, ...
APPLICATION DOMAINS OF
COMPUTER VISION
21
Traffic
Pedestrian and car detection
[Figure: detections labeled Ped, Ped, Car, with axes in meters]
Lane detection
Assisted driving
• Collision warning systems with adaptive cruise control
• Lane departure warning systems
• Rear object detection systems
Retrieval: Improving online search
Query:
STREET
Digital Album
Similarity Retrieval of Brain Data
24
Image Databases: Content-Based Retrieval
Images from my Ground-Truth collection.
What categories of image databases exist today?
25
Abstract Regions for Object Recognition
Original Images
Color Regions
Texture Regions
Line Clusters
26
Insect Identification for Ecology Studies
Calineuria (Cal)
Doroneuria (Dor)
Yoraperla (Yor)
27
Document Analysis
28
Surveillance: Object and Event Recognition in Aerial Videos
Original Video Frame
29
Color Regions
Structure Regions
Video Analysis
What are the objects? What are the events?
30
3D Reconstruction of the Blood Vessel Tree
31
Recognition of 3D Object Classes from Range Data
32
3D Scanning
Scanning Michelangelo’s “The David”
• The Digital Michelangelo Project
- http://graphics.stanford.edu/projects/mich/
• UW Prof. Brian Curless, collaborator
• 2 BILLION polygons, accuracy to .29mm
33
The Digital Michelangelo Project, Levoy et al.
34
35
36
37
Tasks in Computer Vision
• Segment an image into useful regions
• Perform measurements on certain areas
• Determine what object(s) are in the scene
liver
kidney
spleen
• Calculate the precise location(s) of objects
• Visually inspect a manufactured object
• Construct a 3D model of the imaged object
• Find “interesting” events in a video
38
HISTORY OF COMPUTER VISION
1970s, 1980s, 1990s, 2000s
Why is it Difficult?
What are the Challenges?
44
Challenges 1: view point variation
Michelangelo 1475-1564
Challenges 2: illumination
slide credit: S. Ullman
Challenges 3: occlusion
Magritte, 1957
Challenges 4: scale
Challenges 5: deformation
Xu, Beihong 19
Challenges 6: background clutter
Klimt, 1913
Challenges 7: intra-class variation
The Three Stages of Computer Vision
• low-level: image → image
• mid-level: image → features
• high-level: features → analysis
52
Low-Level
sharpening, blurring
53
Low-Level
Canny: original image → edge image
Mid-Level
ORT data structure: edge image → circular arcs and line segments
54
Mid-Level
K-means clustering (followed by connected component analysis): original color image → regions of homogeneous color
55
Low- to High-Level
low-level: edge image
mid-level: consistent line clusters
high-level: building recognition
56
Recognition
• Scale / orientation range to search over
• Speed
• Context
Course content
• Image representation
  • Matrices, functions
  • Image file formats
• Binary Image Analysis
  • Pixel and neighborhood
  • Masks and convolution
  • Counting and labeling
  • Morphological operations
58
• Thresholding
• Object Recognition concepts
  • Representation
  • Classification
  • Measures
• Gray-level Image Analysis
  • Gray-level mapping
  • Noise removal
  • Smoothing
59
• Color and shading
  • Color spaces
  • Shades
• Texture
  • Texels, texture description
  • Texture measures
• Segmentation
  • Clustering
  • Region growing
• Content-Based Image Retrieval
60
Imaging and Image Representation
(Ch. 2, Shapiro et al.)
61
Classical Imaging Process
• Light reaches surfaces in 3D
• Surfaces reflect
• Sensor element receives light energy
• Intensity counts
• Angles count
• Material counts
What are radiance and irradiance?
62
Radiometry and Computer Vision*
• Radiometry is a branch of physics that deals with the measurement of the flow and transfer of radiant energy.
• Radiance is the power of light that is emitted from a unit surface area into some spatial angle; the corresponding photometric term is brightness.
• Irradiance is the amount of energy that an image-capturing device gets per unit of an efficient sensitive area of the camera. Quantizing it gives image gray tones.
* From Sonka, Hlavac, and Boyle, Image Processing, Analysis, and Machine Vision, ITP, 1999.
63
Sensors: Image Acquisition Devices
• CCD (Charge-Coupled Device) cameras
• X-ray devices
• Microwave devices
• UV devices
• Thermal cameras
• IR devices
• 3-D scanners
64
CCD-type camera:
Commonly used in industrial applications
• Array of small fixed elements
• Each element converts the light energy to electric charge
• 1x1 cm
• Can add refracting elements to get color in 2x2 neighborhoods
• 8-bit intensity common
65
Computer Vision Algorithms
The main concern of computer vision is to develop algorithms.
66
LIDAR also senses surfaces
• A single sensing element scans the scene
• Laser light is reflected off the surface and returned
• Phase shift codes distance
• Brightness change codes albedo (surface reflectance)
Stockman MSU/CSE Fall 2008
67
2.5D face image from Minolta Vivid 910 scanner
A rotating mirror scans a laser stripe across the object. 320x240 rangels obtained in about 2 seconds.
[x,y,z,R,G,B] image.
Stockman MSU/CSE Fall 2008
68
3D scanning technology
• 3D image of voxels obtained
• Usually computationally expensive reconstruction of 3D from many 2D scans (CAT: computer-aided tomography)
Stockman MSU/CSE Fall 2008
69
Magnetic Resonance Imaging
• Senses density of certain chemistry
• S slices x R rows x C columns
• Volume element (voxel) is about 2 mm per side
• At left is a shaded 2D image created by "volume rendering" a 3D volume: darkness codes depth
Stockman MSU/CSE Fall 2008
70
Single slice through human head
• MRIs are computed structures, computed from many views.
• At left is an MRA (angiograph), which shows blood flow.
• CAT scans are computed in much the same manner from X-ray transmission data.
Stockman MSU/CSE Fall 2008
71
Problems in Image Acquisition
72
73
Human eye as a spherical camera
• 75-150 million rods sense intensity
• 6-7 million cones sense color
• Fovea is a tightly packed area with more cones
• Periphery has more rods
• Focal length is about 20 mm
• Pupil/iris controls light entry
• Eye scans, or saccades, to image details on the fovea
• 100M sensing cells funnel to 1M optic nerve connections to the brain
Stockman MSU/CSE Fall 2008
74
RODS AND CONES
Cones
Image Formation
Problems in HVS: Mach Band Effect
Contrast
Illusions
Images: 2D projections of 3D
• The 3D world has color, texture, surfaces, volumes, light sources, temperature, reflectance, ...
• A 2D image is a projection of a scene from a specific viewpoint.
82
Digital Images form arrays
Digitizing: Sampling
Quantization
Digital Image: Sampled and quantized
Sampling at different resolutions
Sampling
Quantization
What are the appropriate sampling and quantization rates?
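The sketch below is not from the slides; it assumes NumPy, and the sampling step and number of gray levels are arbitrary. It shows the two operations side by side: spatial sampling keeps every k-th pixel, and quantization maps the gray values onto a small number of levels.

import numpy as np

def resample(img, step):
    # Spatial sampling: keep every step-th pixel in each direction.
    return img[::step, ::step]

def quantize(img, levels):
    # Amplitude quantization: map 0-255 gray values onto `levels` bins,
    # then scale the bin index back to a representative gray value.
    bins = np.floor(img.astype(float) / 256.0 * levels)
    return (bins * (255.0 / (levels - 1))).astype(np.uint8)

# Example: an 8x8 ramp image, sampled by 2 and quantized to 4 gray levels.
ramp = np.tile(np.arange(0, 256, 32, dtype=np.uint8), (8, 1))
print(resample(ramp, 2).shape)   # (4, 4)
print(quantize(ramp, 4))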
Resolution
• resolution: precision of the sensor
• nominal resolution: size of a single pixel in scene coordinates (e.g., meters, mm)
• common use of resolution: num_rows x num_cols (e.g., 515 x 480)
• field of view (FOV): size of the scene a sensor can sense
91
92
Images as Functions
• A gray-tone image is a function: g(x,y) = val or f(row, col) = val
• A color image is three functions or a vector-valued function:
  f(row,col) = (r(row,col), g(row,col), b(row,col))
• A multi-spectral image:
  f(row,col) = (f1(row,col), f2(row,col), ..., fn(row,col))
93
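As a small illustration of these definitions, assuming NumPy (the array sizes and values are made up): a gray-tone image is one function of (row, col), a color image stacks three such functions, and a multi-spectral image stacks n of them.

import numpy as np

rows, cols = 4, 6

# Gray-tone image: f(row, col) -> one value
gray = np.zeros((rows, cols), dtype=np.uint8)
gray[2, 3] = 94                      # the pixel at row 2, column 3

# Color image: f(row, col) -> (r, g, b), i.e. three functions
color = np.zeros((rows, cols, 3), dtype=np.uint8)
color[2, 3] = (255, 0, 0)            # a red pixel

# Multi-spectral image with n bands: f(row, col) -> (f1, ..., fn)
n_bands = 5
multi = np.zeros((rows, cols, n_bands), dtype=np.float32)

print(gray[2, 3], color[2, 3], multi[2, 3].shape)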
Gray-tone Image as Function
94
Image vs Matrix
There are many different file formats.
95
Digital Image Terminology:
[Figure: a 7x7 image grid mixing binary values (0 and 1) with gray-tone values (92-96); callouts mark a pixel (with value 94), its 3x3 neighborhood, a region of medium intensity, and the resolution (7x7)]
• binary image
• gray-scale (or gray-tone) image
• color image
• multi-spectral image
• range image
• labeled image
96
Image File Formats
• Portable Gray Map (PGM): older form
• GIF: early commercial version
• JPEG (JPG): modern version
• MPEG for motion
• Many others exist: header plus data
• Do they handle color?
• Do they provide for compression?
• Are there good packages that use them, or at least convert between them?
97
Compression:
Reduce the redundancy
1. Lossy
2. Lossless
98
Run Coding
Row 1: 0001001000000
Row 2: 0001111000000
Row 3: 0001001000000
Code 1: 3(0)1(1)2(0)1(1)6(0)
or
Code 2: (4,4)(7,7)
99
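A sketch of the two codes as read here, in Python (the helper names are mine): Code 1 stores the lengths of alternating runs of 0s and 1s, and Code 2 stores the (start, end) columns of each run of 1s.

def run_lengths(row):
    # Code 1: lengths of alternating runs, each tagged with its pixel value.
    runs = []
    count = 1
    for prev, cur in zip(row, row[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append((count, prev))
            count = 1
    runs.append((count, row[-1]))
    return runs

def one_runs(row):
    # Code 2: (start, end) column pairs (1-based) for each maximal run of 1s.
    runs, start = [], None
    for i, v in enumerate(row, start=1):
        if v == 1 and start is None:
            start = i
        if v == 0 and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(row)))
    return runs

row1 = [int(c) for c in "0001001000000"]
print(run_lengths(row1))   # [(3, 0), (1, 1), (2, 0), (1, 1), (6, 0)]
print(one_runs(row1))      # [(4, 4), (7, 7)]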
PGM image with ASCII info
• P2 means ASCII gray
• Comments
• W=16; H=8
• 192 is the max intensity
• Can be made with an editor
• Large images are usually not stored as ASCII
100
PBM/PGM/PPM Codes
• P1: ascii binary (PBM)
• P2: ascii grayscale (PGM)
• P3: ascii color (PPM)
• P4: byte binary (PBM)
• P5: byte grayscale (PGM)
• P6: byte color (PPM)
101
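A minimal sketch of writing the P2 (ASCII grayscale) format described above; the helper name, file name, and pixel values are made up.

import numpy as np

def write_pgm_ascii(path, img, maxval=255):
    # Write a P2 (ASCII) PGM: magic number, comment, width height, maxval, then pixel values.
    h, w = img.shape
    with open(path, "w") as f:
        f.write("P2\n# written by a toy script\n")
        f.write(f"{w} {h}\n{maxval}\n")
        for row in img:
            f.write(" ".join(str(int(v)) for v in row) + "\n")

img = np.array([[0, 64, 128, 192],
                [192, 128, 64, 0]], dtype=np.uint8)
write_pgm_ascii("tiny.pgm", img, maxval=192)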
JPG: current popular form
• Public standard
• Allows for image compression; 10:1 or 30:1 are often easily possible
• 8x8 intensity regions are fit with a basis of cosines
• Error in the cosine fit is coded as well
• Parameters are then compressed with Huffman coding
• Common for most digital cameras
102
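A rough sketch of the 8x8 cosine-fit step, assuming NumPy and SciPy's dctn/idctn; real JPEG additionally uses a standard quantization table, zig-zag ordering, and Huffman coding, none of which appears here.

import numpy as np
from scipy.fft import dctn, idctn

block = np.arange(64, dtype=float).reshape(8, 8)   # a stand-in 8x8 intensity region

# Fit the block with a basis of cosines (2D DCT), keep only the largest coefficients.
coeffs = dctn(block - 128.0, norm="ortho")          # level shift, then transform
mask = np.abs(coeffs) >= np.sort(np.abs(coeffs).ravel())[-10]   # keep ~10 coefficients
approx = idctn(coeffs * mask, norm="ortho") + 128.0

print(np.max(np.abs(block - approx)))   # reconstruction error of the truncated fit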
103
From 3D Scenes to 2D Images
• Object
• World
• Camera
• Real Image
• Pixel Image
104
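Not from the slides, but a minimal sketch of how a point passes through these frames under a simple pinhole model; the intrinsics K and the world-to-camera pose (R, t) are made-up numbers.

import numpy as np

# Camera intrinsics (focal length in pixels, principal point); illustrative values.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# World -> camera: identity rotation, camera 5 units away from the scene (made up).
R = np.eye(3)
t = np.array([0.0, 0.0, 5.0])

def project(point_world):
    # Object/world point -> camera frame -> real image plane -> pixel coordinates.
    p_cam = R @ point_world + t
    p_img = K @ p_cam
    return p_img[:2] / p_img[2]        # perspective divide

print(project(np.array([0.5, -0.2, 0.0])))   # pixel (u, v) of a world point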
Binary Image Analysis
105
Binary image analysis
• consists of a set of image analysis operations that are used to produce or process binary images, usually images of 0's and 1's.
  0 represents the background
  1 represents the foreground
00010010001000
00011110001000
00010010001000
106
Binary Image Analysis
• is used in a number of practical applications, e.g.
  • part inspection
  • object counting
  • connected component labeling
  • document processing
107
What kinds of operations?
• Separate objects from background and from one another
• Aggregate pixels for each object
• Compute features for each object
108
Example: red blood cell image
• Many blood cells are separate objects
• Many touch – bad!
• Salt and pepper noise from thresholding
• How usable is this data?
109
Results of analysis
• 63 separate objects detected
• Single cells have area about 50
• Noise spots
• Gobs of cells
110
Useful Operations
1. Thresholding a gray-tone image
2. Determining good thresholds
3. Connected components analysis
4. Binary mathematical morphology
5. All sorts of feature extractors
(area, centroid, circularity, …)
111
1. Thresholding
• Convert a gray-level or color image into a binary image
• Use the histogram
112
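A minimal sketch with NumPy; the threshold value here is arbitrary (the following slides discuss choosing it from the histogram).

import numpy as np

gray = np.random.randint(0, 256, size=(4, 6), dtype=np.uint8)   # stand-in gray image

t = 128                                  # an arbitrary fixed threshold
binary = (gray >= t).astype(np.uint8)    # 1 = foreground, 0 = background

hist, _ = np.histogram(gray, bins=256, range=(0, 256))   # histogram used to pick t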
Histogram
• Background is black
• Healthy cherry is bright
• Bruise is medium dark
• Histogram shows two cherry regions (black background has been removed)
[Plot: pixel counts vs. gray-tone values, 0 to 256]
113
Histogram-Directed Thresholding
How can we use a histogram to separate an
image into 2 (or several) different regions?
Is there a single clear threshold? 2? 3?
114
Automatic Thresholding: Otsu's Method
Assumption: the histogram is bimodal.
[Figure: bimodal histogram split into Group 1 and Group 2 at threshold t]
Method: find the threshold t that minimizes the weighted sum of within-group variances for the two groups that result from separating the gray tones at value t.
115
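A sketch of this criterion with NumPy: for every candidate t, compute the weighted sum of the within-group variances of the two groups and keep the minimizing t (equivalently, Otsu's method maximizes the between-group variance). The test data at the bottom is synthetic.

import numpy as np

def otsu_threshold(gray):
    # Histogram of gray tones 0..255, normalized to probabilities.
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist.astype(float) / hist.sum()
    levels = np.arange(256)

    best_t, best_score = 0, np.inf
    for t in range(1, 256):
        w1, w2 = p[:t].sum(), p[t:].sum()
        if w1 == 0 or w2 == 0:
            continue
        m1 = (levels[:t] * p[:t]).sum() / w1
        m2 = (levels[t:] * p[t:]).sum() / w2
        v1 = ((levels[:t] - m1) ** 2 * p[:t]).sum() / w1
        v2 = ((levels[t:] - m2) ** 2 * p[t:]).sum() / w2
        score = w1 * v1 + w2 * v2        # weighted sum of within-group variances
        if score < best_score:
            best_t, best_score = t, score
    return best_t

gray = np.concatenate([np.random.normal(60, 10, 500),
                       np.random.normal(180, 15, 500)]).clip(0, 255).astype(np.uint8)
t = otsu_threshold(gray)
binary = gray >= t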
Thresholding Example
original gray tone image
binary thresholded image
116
2. Connected Components Labeling
Once you have a binary image, you can identify and
then analyze each connected set of pixels.
The connected components operation takes in a binary image
and produces a labeled image in which each pixel has the
integer label of either the background (0) or a component.
binary image after morphology
connected components
117
Methods for CC Analysis
1. Recursive Tracking (almost never used)
2. Parallel Growing (needs parallel hardware)
3. Row-by-Row (most common)
• Classical Algorithm (see text)
• Efficient Run-Length Algorithm
(developed for speed in real
industrial applications)
118
Equivalent Labels
Original Binary Image
0001110000111100001
0001111000111100011
0001111100111100111
0001111110111100111
0001111111111100111
0001111111111100111
0001111111111111111
0001111111111111111
0001111110000011111
119
Equivalent Labels
The Labeling Process
0001110000222200003
0001111000222200033
0001111100222200333
0001111110222200333
0001111111111100333
0001111111111100333
0001111111111111111
0001111111111111111
0001111110000011111
120
12
13
Run-Length Data Structure
01234
0 11 11
1 11
1
2 1 1 1 1 Binary
Image
3
4
1111
Rstart Rend
0
1
2
3
4
1
3
5
0
7
2
4
6 Row Index
0
7
row
0
1
2
3
4
5
6
7
0
0
1
1
2
2
4
121
scol
ecol
label
UNUSED
0
1
3
4
0
1
4
4
0
2
4
4
1
4
0
0
0
0
0
0
0
0
Runs
Run-Length Algorithm
Procedure run_length_classical
{
initialize Run-Length and Union-Find data structures
count <- 0
/* Pass 1 (by rows) */
for each current row and its previous row
{
move pointer P along the runs of current row
move pointer Q along the runs of previous row
122
Case 1: No Overlap
Subcase: P's run ends before Q's run begins
  /* new label */
  count <- count + 1
  label(P) <- count
  P <- P + 1
Subcase: Q's run ends before P's run begins
  /* check Q's next run */
  Q <- Q + 1
123
Case 2: Overlap
Subcase 1: P's run has no label yet
  label(P) <- label(Q)
  move pointer(s)
Subcase 2: P's run has a label that is different from Q's run
  union(label(P), label(Q))
  move pointer(s)
}
124
Pass 2 (by runs)
/* Relabel each run with the name of the
equivalence class of its label */
For each run M
{
label(M) <- find(label(M))
}
}
where union and find refer to the operations of the
Union-Find data structure, which keeps track of sets
of equivalent labels.
125
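A compact sketch of the same two-pass idea written per pixel rather than per run, assuming NumPy and 4-connectivity; the union-find here is the plain array version and all names are mine.

import numpy as np

def find(parent, x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]    # path halving
        x = parent[x]
    return x

def union(parent, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)

def label_components(binary):
    rows, cols = binary.shape
    labels = np.zeros((rows, cols), dtype=int)
    parent = [0]                          # parent[i] for each label i; 0 is background
    count = 0
    # Pass 1: assign provisional labels, record equivalences (4-connected).
    for r in range(rows):
        for c in range(cols):
            if binary[r, c] == 0:
                continue
            up = labels[r - 1, c] if r > 0 else 0
            left = labels[r, c - 1] if c > 0 else 0
            if up == 0 and left == 0:
                count += 1
                parent.append(count)
                labels[r, c] = count
            else:
                labels[r, c] = min(l for l in (up, left) if l > 0)
                if up > 0 and left > 0 and up != left:
                    union(parent, up, left)
    # Pass 2: replace each label by the name of its equivalence class.
    for r in range(rows):
        for c in range(cols):
            if labels[r, c] > 0:
                labels[r, c] = find(parent, labels[r, c])
    return labels

img = np.array([[1, 1, 0, 1],
                [0, 1, 0, 1],
                [1, 1, 0, 0]])
print(label_components(img))   # two components: labels 1 and 2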
Labeling shown as Pseudo-Color
[Images: connected components of 1's from the thresholded image; connected components of cluster labels]
126
Mathematical Morphology
Binary mathematical morphology consists of two
basic operations
dilation and erosion
and several composite relations
closing and opening
conditional dilation
...
127
Dilation
Dilation expands the connected sets of 1s of a binary image.
It can be used for
1. growing features
2. filling holes and gaps
128
Erosion
Erosion shrinks the connected sets of 1s of a binary image.
It can be used for
1. shrinking features
2. Removing bridges, branches and small protrusions
129
Structuring Elements
A structuring element is a shape mask used in
the basic morphological operations.
They can be any shape and size that is
digitally representable, and each has an origin.
box
disk
hexagon
box(length,width)
disk(diameter)
130
Dilation with Structuring Elements
The arguments to dilation and erosion are
1. a binary image B
2. a structuring element S
dilate(B,S) takes binary image B, places the origin
of structuring element S over each 1-pixel, and ORs
the structuring element S into the output image at
the corresponding position.
B:
0000
0110
0000
S (origin at the lower-left 1):
1
11
dilate(B,S):
0110
0111
0000
131
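A direct transcription of that definition, assuming NumPy: place the origin of S over each 1-pixel of B and OR S into the output, clipping pixels of S that fall outside the image. Using the B, S, and origin from the slide above reproduces the result shown there.

import numpy as np

def dilate(B, S, origin):
    out = np.zeros_like(B)
    oy, ox = origin
    ys, xs = np.nonzero(B)                 # every 1-pixel of B
    sy, sx = np.nonzero(S)                 # every 1-pixel of S
    for y, x in zip(ys, xs):
        for dy, dx in zip(sy, sx):
            yy, xx = y + dy - oy, x + dx - ox
            if 0 <= yy < B.shape[0] and 0 <= xx < B.shape[1]:
                out[yy, xx] = 1            # OR S into the output at this position
    return out

B = np.array([[0, 0, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 0, 0]])
S = np.array([[1, 0],
              [1, 1]])
print(dilate(B, S, origin=(1, 0)))         # matches the slide's result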
Erosion with Structuring Elements
erode(B,S) takes a binary image B, places the origin
of structuring element S over every pixel position, and
ORs a binary 1 into that position of the output image only if
every position of S (with a 1) covers a 1 in B.
[Figure: a worked erosion example: a binary image B, a structuring element S with its origin marked, and erode(B,S); only pixels whose entire structuring-element footprint falls on 1s of B survive]
Example to Try
B: [Figure: a binary image]
S:
111
111
111
erode B by S, then dilate the result with the same structuring element
133
Opening and Closing
• Closing is the compound operation of dilation followed
by erosion (with the same structuring element)
• Opening is the compound operation of erosion followed
by dilation (with the same structuring element)
134
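A sketch of these operations using SciPy's ndimage module (binary_erosion, binary_dilation, binary_opening, binary_closing); the test image and the 3x3 box structuring element are made up.

import numpy as np
from scipy import ndimage

B = np.zeros((9, 9), dtype=bool)
B[2:7, 2:7] = True            # a 5x5 square of 1s
B[4, 0:3] = True              # a thin protrusion attached to the square
B[0, 0] = True                # an isolated noise pixel

S = np.ones((3, 3), dtype=bool)   # box(3,3) structuring element

eroded  = ndimage.binary_erosion(B, structure=S)
dilated = ndimage.binary_dilation(B, structure=S)
opened  = ndimage.binary_opening(B, structure=S)   # erosion then dilation: removes noise and protrusions
closed  = ndimage.binary_closing(B, structure=S)   # dilation then erosion: fills small holes and gaps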
Use of Opening
Original
Opening
Corners
1. What kind of structuring element was used in the opening?
2. How did we get the corners?
135
Gear Tooth Inspection
original binary image → detected defects
136
How did they do it?
Some Details
137
Region Properties
Properties of the regions can be used to recognize objects.
• geometric properties (Ch 3)
• gray-tone properties
• color properties
• texture properties
• shape properties (a few in Ch 3)
• motion properties
• relationship properties (1 in Ch 3)
138
Geometric and Shape Properties
• area
• centroid
• perimeter
• perimeter length
• circularity
• elongation
• mean and standard deviation of radial distance
• bounding box
• extremal axis length from bounding box
• second-order moments (row, column, mixed)
• lengths and orientations of axes of best-fit ellipse
Which are statistical? Which are structural?
139
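A sketch of computing a few of the statistical properties above for a single region, assuming NumPy and a boolean mask of the region's pixels; the crude border-pixel perimeter count and the circularity formula used here (4*pi*area/perimeter^2) are common choices, not necessarily the ones the text uses.

import numpy as np

def region_properties(mask):
    rows, cols = np.nonzero(mask)
    area = len(rows)
    centroid = (rows.mean(), cols.mean())

    # Second-order central moments (row, column, mixed).
    mu_rr = ((rows - centroid[0]) ** 2).mean()
    mu_cc = ((cols - centroid[1]) ** 2).mean()
    mu_rc = ((rows - centroid[0]) * (cols - centroid[1])).mean()

    # Bounding box and a crude perimeter count (border pixels of the region).
    bbox = (rows.min(), rows.max(), cols.min(), cols.max())
    interior = np.zeros_like(mask)
    interior[1:-1, 1:-1] = (mask[1:-1, 1:-1] & mask[:-2, 1:-1] & mask[2:, 1:-1]
                            & mask[1:-1, :-2] & mask[1:-1, 2:])
    perimeter = area - interior.sum()
    circularity = 4 * np.pi * area / perimeter ** 2   # compactness-style circularity

    return dict(area=area, centroid=centroid, bbox=bbox,
                mu_rr=mu_rr, mu_cc=mu_cc, mu_rc=mu_rc,
                perimeter=perimeter, circularity=circularity)

mask = np.zeros((20, 20), dtype=bool)
yy, xx = np.mgrid[:20, :20]
mask[(yy - 10) ** 2 + (xx - 10) ** 2 <= 36] = True   # a disk of radius 6
print(region_properties(mask))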
Region Adjacency Graph
A region adjacency graph (RAG) is a graph in which
each node represents a region of the image and an edge
connects two nodes if the regions are adjacent.
[Figure: a segmented image with regions 1-4 and its region adjacency graph]
140
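A sketch of building a RAG from a labeled image, assuming NumPy and 4-adjacency: every horizontally or vertically neighboring pixel pair with two different labels contributes an edge; the labeled image in the example is made up.

import numpy as np

def region_adjacency_graph(labels):
    edges = set()
    # Horizontally adjacent pixel pairs with different labels.
    a, b = labels[:, :-1], labels[:, 1:]
    for x, y in zip(a[a != b], b[a != b]):
        edges.add((min(x, y), max(x, y)))
    # Vertically adjacent pixel pairs with different labels.
    a, b = labels[:-1, :], labels[1:, :]
    for x, y in zip(a[a != b], b[a != b]):
        edges.add((min(x, y), max(x, y)))
    nodes = set(np.unique(labels))         # includes the background label if present
    return nodes, edges

labels = np.array([[1, 1, 2, 2],
                   [1, 4, 4, 2],
                   [3, 3, 4, 2]])
print(region_adjacency_graph(labels))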