An Introduction to Image Segmentation
Chien-Chi Chen
E-mail: zdadadaz@yahoo.com.tw
Graduate Institute of Communication Engineering
National Taiwan University, Taipei, Taiwan, ROC
Abstract
Segmentation is used to identify the objects in an image that we are interested in. There are three basic approaches: the first is edge detection, the second is thresholding, and the third is region-based segmentation. These three methods cannot solve every problem we meet, but they are the basic methods of segmentation.
1. Introduction
We first discuss the case of monochrome, static images. The fundamental problem in segmentation is to partition an image into regions. Segmentation algorithms for monochrome images are generally based on one of two basic categories: edge-based segmentation and region-based segmentation. Another method is thresholding, which in this report we group with edge-based segmentation.
There are three goals that we want to achieve:
1. Speed. We want to save time in segmentation so that more time is left for the more complicated compression steps that follow.
2. Good shape matching, even under a small amount of computation time.
3. Intact rather than fragmentary segmented shapes; that is, we want good connectivity.
2. Edge-Based Segmentation
The focus of this section is on segmentation methods based on the detection of sharp, local changes in intensity. The three types of image features in which we are interested are isolated points, lines, and edges. Edge pixels are pixels at which the intensity of the image function changes abruptly.
2.1. Fundamentals
We know that local changes in intensity can be detected using derivatives. For reasons that will become evident, first- and second-order derivatives are particularly well suited for this purpose.
Figure 2.1 The intensity histogram of an image
We have the following conclusions from Figure 2.1:
1. First-order derivatives generally produce thicker edges in an image.
2. Second-order derivatives have a stronger response to fine detail, such as thin lines, isolated points, and noise.
3. Second-order derivatives produce a double-edge response at ramp and step transitions in intensity.
4. The sign of the second derivative can be used to determine whether a transition into an edge is from light to dark or dark to light.
2.2. Isolated Points
Point detection is based on the conclusions reached in the preceding section: it should rely on the second derivative, so we expect to use a Laplacian mask.
Figure 2.2 The mask for isolated-point detection
We use the mask to scan every point of the image and compute the response at each point. If the magnitude of the response at a point exceeds T (the threshold we set), we label the point 1 (light); otherwise, we set it to 0 (dark).
G(x, y) = 1 if |R(x, y)| ≥ T, and 0 otherwise.    (2.1)
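As a minimal illustration of Eq. (2.1), the following sketch scans a small image with the standard 3×3 Laplacian mask; the image, mask values, and threshold T are illustrative assumptions, not taken from the text.

```python
# Sketch of isolated-point detection with a 3x3 Laplacian mask (Eq. 2.1).
# The mask, test image, and threshold T here are illustrative assumptions.

LAPLACIAN = [[-1, -1, -1],
             [-1,  8, -1],
             [-1, -1, -1]]

def detect_points(image, T):
    """Return a binary map: 1 where |mask response| >= T, else 0."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):          # skip the border for simplicity
        for x in range(1, w - 1):
            r = sum(LAPLACIAN[j][i] * image[y + j - 1][x + i - 1]
                    for j in range(3) for i in range(3))
            out[y][x] = 1 if abs(r) >= T else 0
    return out

# A flat image with one bright isolated pixel in the middle.
img = [[10] * 5 for _ in range(5)]
img[2][2] = 100
print(detect_points(img, 200))   # only the isolated pixel at (2, 2) is flagged
```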
2.3. Line Detection
As discussed in Section 2.1, the second-order derivative has a stronger response and produces thinner lines than the first derivative. We can construct masks for four different directions.
Figure 2.3
Line detection masks.
Let us discuss how to use the four masks to decide which direction fits best. Let R 1 , R 2 , R 3 and R 4 denote the responses of the masks in Figure 2.3.
If, at a point in the image, |R k | > |R j | for all j ≠ k (for example, |R 1 | > |R j | for j = 2, 3, 4), that point is said to be more likely associated with a line in the direction of mask k.
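The mask-comparison rule above can be sketched as follows; the four mask value sets are the standard 3×3 line-detection masks, and the test neighborhood is an invented example.

```python
# Sketch of line detection (Section 2.3): apply the four directional masks
# and keep the direction whose response |R_k| is largest.

MASKS = {
    "horizontal": [[-1, -1, -1], [ 2,  2,  2], [-1, -1, -1]],
    "+45":        [[-1, -1,  2], [-1,  2, -1], [ 2, -1, -1]],
    "vertical":   [[-1,  2, -1], [-1,  2, -1], [-1,  2, -1]],
    "-45":        [[ 2, -1, -1], [-1,  2, -1], [-1, -1,  2]],
}

def best_direction(nbhd):
    """Return the direction whose mask response |R_k| is largest."""
    best, best_r = None, -1.0
    for name, m in MASKS.items():
        r = abs(sum(m[j][i] * nbhd[j][i] for j in range(3) for i in range(3)))
        if r > best_r:
            best, best_r = name, r
    return best

# A neighborhood containing a bright horizontal line through its middle row.
nbhd = [[0, 0, 0], [9, 9, 9], [0, 0, 0]]
print(best_direction(nbhd))   # the horizontal mask responds strongest
```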
2.4. Edge Detection
Figure 2.4 (a) Two regions of constant intensity separated by an ideal vertical ramp edge. (b) Detail near the edge, showing a horizontal intensity profile.
We conclude from this observation that the magnitude of the first derivative can be used to detect the presence of an edge at a point in an image. The second derivative has two properties: (1) it produces two values for every edge in an image; (2) its zero crossings can be used for locating the centers of thick edges.
2.4.1. Basic Edge Detection (Gradient)
The image gradient finds the edge strength and direction at location (x, y) of an image, and is defined as the vector
∇f = grad(f) = [g x , g y ]^T = [∂f/∂x, ∂f/∂y]^T    (2.2)
The magnitude (length) of the vector ∇f, denoted M(x, y), is
M(x, y) = mag(∇f) = sqrt(g x ² + g y ²)    (2.3)
The direction of the gradient vector is given by the angle
α(x, y) = tan⁻¹(g y / g x )    (2.4)
The direction of an edge at an arbitrary point (x, y) is orthogonal to the direction of the gradient vector at that point. We are dealing with digital quantities, so a digital approximation of the partial derivatives over a neighborhood about a point is required.
1. Roberts cross-gradient operators (Roberts [1965])
Figure 2.5 Roberts masks
g x = ∂f/∂x = z 9 − z 5    (2.5)
g y = ∂f/∂y = z 8 − z 6    (2.6)
2. Prewitt operators
Figure 2.6 Prewitt masks
g x = ∂f/∂x = (z 7 + z 8 + z 9 ) − (z 1 + z 2 + z 3 )    (2.7)
g y = ∂f/∂y = (z 3 + z 6 + z 9 ) − (z 1 + z 4 + z 7 )    (2.8)
3. Sobel operators
Figure 2.7 (a)~(g) A region of an image and the various masks used to compute the gradient at the point labeled z 5
g x = ∂f/∂x = (z 7 + 2z 8 + z 9 ) − (z 1 + 2z 2 + z 3 )    (2.9)
g y = ∂f/∂y = (z 3 + 2z 6 + z 9 ) − (z 1 + 2z 4 + z 7 )    (2.10)
The Sobel masks use a weight of 2 in the center location to provide image smoothing. The Prewitt masks are simpler to implement than the Sobel masks, but the Sobel masks have better noise-suppression (smoothing) characteristics, which makes them preferable.
The previous discussion obtained g x and g y separately. However, computing the full magnitude is not always desirable, so a frequently used approach is to approximate the magnitude of the gradient by absolute values:
M(x, y) ≈ |g x | + |g y |    (2.11)
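A sketch of Eqs. (2.9)-(2.11), using the Sobel masks and the |g x| + |g y| magnitude approximation; the test image is an invented vertical step edge.

```python
# Sketch of the Sobel gradient: g_x and g_y from the two 3x3 masks
# (Eqs. 2.9-2.10), magnitude approximated by |g_x| + |g_y| (Eq. 2.11).

SOBEL_X = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # bottom row minus top row
SOBEL_Y = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # right column minus left

def sobel_magnitude(image):
    """Gradient magnitude M(x,y) ~ |g_x| + |g_y| (borders left at 0)."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * image[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * image[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = abs(gx) + abs(gy)
    return out

# A vertical step edge: dark left half, bright right half.
img = [[0, 0, 10, 10]] * 4
mag = sobel_magnitude(img)
print(mag[1])   # strong response along the step, zero at the borders
```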
2.4.2. The Marr-Hildreth Edge Detector (LoG)
The methods in use at the time were based on small operators, and as discussed previously, the second derivative is better than the first derivative for small operators. We therefore use the Laplacian of a Gaussian (LoG).
Figure 2.8 A 5x5 mask of the LoG
The Marr-Hildreth algorithm consists of convolving the LoG filter with an input image f(x, y):
g(x, y) = [∇²G(x, y)] * f(x, y)    (2.12)
Because these are linear processes, Eq. (2.12) can also be written as
g(x, y) = ∇²[G(x, y) * f(x, y)]    (2.13)
The edge-detection algorithm may be summarized as follows:
1. Filter the input image with an n × n Gaussian lowpass filter (this smooths out the large number of small spatial details).
2. Compute the Laplacian of the image resulting from Step 1.
3. Find the zero crossings of the image from Step 2.
To specify the size of the Gaussian filter, recall that about 99.7% of the volume under a 2-D Gaussian surface lies within ±3σ of the mean, so we choose n ≥ 6σ.
Figure 2.9 (a) Input image. (b) Result of the LoG with threshold 200.
2.5. Edge Linking and Boundary Detection
Edge detection should yield sets of pixels lying on edges, but noise and nonuniform illumination cause breaks in the edges. Therefore, edge detection typically is followed by linking algorithms designed to assemble edge pixels into meaningful edges and/or region boundaries.
2.5.1. Local Processing
This form of edge linking analyzes the characteristics of pixels in a small neighborhood about every point (x, y) that has been declared an edge point by the previous techniques. Let S xy denote the set of coordinates of a neighborhood centered at point (x, y). The two principal properties used for establishing the similarity of edge pixels are:
1. The strength (magnitude) of the gradient:
|M(s, t) − M(x, y)| ≤ E    (2.14)
where E is a positive threshold.
2. The direction of the gradient vector:
|α(s, t) − α(x, y)| ≤ A    (2.15)
where A is a positive angle threshold.
2.5.2. Regional Processing
Often, the locations of regions of interest in an image are known or can be determined. In such situations, we can use techniques for linking pixels on a regional basis, with the desired result being an approximation of the boundary of the region. We discuss the mechanics of the procedure using Figure 2.10.
Figure 2.10 Illustration of the iterative polygonal fit algorithm
An algorithm for finding a polygonal fit may be stated as follows:
1. Specify two starting points, A and B.
2. Connect A and B, find the curve point farthest from the line AB, and define it as a vertex of the polygon if its distance is larger than T (a threshold).
3. Connect all the points we have, and repeat the comparison of Step 2 until the distance between every curve point and the connected line segments is smaller than T.
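The three steps above can be sketched as a recursive split, in the spirit of the description; the point set and threshold are invented examples.

```python
# Sketch of the iterative polygonal fit (Section 2.5.2): connect the two
# endpoints, find the curve point farthest from that chord, keep it as a
# vertex if its distance exceeds T, and recurse on both halves.

def _dist(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    num = abs((bx - ax) * (ay - py) - (ax - px) * (by - ay))
    den = ((bx - ax) ** 2 + (by - ay) ** 2) ** 0.5
    return num / den if den else ((px - ax) ** 2 + (py - ay) ** 2) ** 0.5

def polygon_fit(points, T):
    """Return the subset of points kept as polygon vertices."""
    if len(points) < 3:
        return points
    d, idx = max((_dist(p, points[0], points[-1]), i)
                 for i, p in enumerate(points))
    if d <= T:                       # everything is close enough to the chord
        return [points[0], points[-1]]
    left = polygon_fit(points[:idx + 1], T)
    return left[:-1] + polygon_fit(points[idx:], T)

# An L-shaped boundary: the corner survives, the in-between points do not.
pts = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2)]
print(polygon_fit(pts, T=0.5))   # → [(0, 0), (3, 0), (3, 2)]
```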
2.5.3. Global Processing Using the Hough Transform
In regional processing, it makes sense to link a given set of pixels only if we know that they are part of the boundary of a meaningful region. Often, however, we have to work with unstructured environments in which all we have is an edge map. In that case, we can use the Hough transform, a coordinate transformation, to find the points that lie on the same curve.
Figure 2.11 (a) xy-plane. (b) Parameter space
Consider a point (x i , y i ) in the xy-plane and the general equation of a straight line in slope-intercept form, y i = a x i + b. A second point (x j , y j ) also has a line in parameter space associated with it. A practical difficulty with this approach is that a (the slope of the line) approaches infinity as the line approaches the vertical direction. One way around this is to use the normal representation of a line:
x cos θ + y sin θ = ρ    (2.5-3)
Figure 2.12 (a) A line in the xy-plane. (b) Sinusoidal curves in the ρθ-plane; the point of intersection (ρ, θ) corresponds to the line passing through points (x i , y i ) and (x j , y j ) in the xy-plane.
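The voting idea behind the Hough transform, using the normal-form parameterization above, can be sketched as follows; the accumulator quantization (1-pixel ρ cells, 180 θ steps) and the point set are illustrative choices.

```python
# Sketch of the Hough transform with x cos(theta) + y sin(theta) = rho:
# each edge point votes for the (rho, theta) cells of all lines through
# it; the cell with the most votes is the dominant line.

import math
from collections import Counter

def hough_lines(points, theta_steps=180, rho_res=1.0):
    acc = Counter()
    for x, y in points:
        for t in range(theta_steps):
            theta = math.pi * t / theta_steps          # theta in [0, pi)
            rho = x * math.cos(theta) + y * math.sin(theta)
            acc[(round(rho / rho_res), t)] += 1        # one vote per cell
    return acc

# Five collinear points on the vertical line x = 2.
pts = [(2, y) for y in range(0, 50, 10)]
acc = hough_lines(pts)
print(acc.most_common(1)[0])   # the line x = 2: rho = 2 at theta index 0
```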
2.6. Segmentation Using Morphological Watersheds
2.6.1. Background
The concept of watershed is based on visualizing an image in three dimensions:
two spatial coordinates and intensity. We consider three types of points:
1. Points belonging to a local minimum.
2. Points at which a drop of water, if placed there, would fall to a single local minimum. The points satisfying this condition form the catchment basin or watershed of that minimum.
3. Points at which water would be equally likely to fall to more than one local minimum. These points are similar to the crest lines on the topographic surface and are termed divide lines or watershed lines.
The two main properties of watershed segmentation results are continuous boundaries and over-segmentation. As we know, the boundaries made by the watershed algorithm are exactly the watershed lines in the image. Therefore, the number of regions will basically be equal to the number of minima in the image. There are two steps to achieve a solution using markers:
1. Preprocessing.
2. Defining the criteria that the markers have to satisfy.
The following figures show the mechanism used to construct dams.
Figure 2.10 (a)~(d) Watershed algorithm.
Suppose the figure shows the input image, where the height of the "mountains" is proportional to the intensity values of the input image. We flood water from below by letting water rise through the holes at a uniform rate. In panel (b) we see that water has risen into the first and second catchment basins; we therefore construct a dam to stop the water from overflowing, and repeat this step by step.
2.6.2. The Use of Markers
Direct application of the watershed segmentation algorithm in the form discussed in the previous section generally leads to oversegmentation due to noise and other local irregularities of the gradient. An approach used to control oversegmentation is based on the concept of markers. We have internal markers, associated with the objects of interest, and external markers, associated with the background. A procedure for marker selection typically consists of two principal steps: (1) preprocessing (usually smoothing), and (2) definition of a set of criteria that markers must satisfy (for example, performing edge detection for every small region).
Figure 2.13 (a) Electrophoresis image. (b) Result of applying the watershed segmentation algorithm to the gradient image; oversegmentation is evident. (c) Image obtained from (b) by smoothing, showing internal markers (light gray regions) and external markers (watershed lines). (d) Result of segmentation. Note the improvement over (b). (Courtesy of Dr. S. Beucher, CMM/Ecole des Mines de Paris.)
2.7. Edge Detection Using the Hilbert Transform (HLT)
Compared with the derivative (differential) methods, the impulse response of the HLT is much longer. A longer impulse response reduces the sensitivity of the edge detector and at the same time reduces the influence of noise. In other words, the longer response is less sensitive but gives better detection of ramp edges and more robustness against noise.
We list the mathematics of the discrete-time version of the HLT below.
The discrete version of the HLT is
g H [n] = IDFT{ H[m] · DFT{ g[n] } }    (2.16)
where
DFT{ g[n] } = Σ_{n=0}^{N−1} g[n] e^{−j2πnm/N}    (2.17)
IDFT{ F[m] } = (1/N) Σ_{m=0}^{N−1} e^{j2πnm/N} F[m]
H[m] = −j for 0 < m < N/2,
H[m] = j for N/2 < m < N,
H[0] = H[N/2] = 0.    (2.18)
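Eqs. (2.16)-(2.18) can be checked numerically with the following sketch; the O(N²) pure-Python DFT is for clarity only, and the test signal (a cosine, whose HLT is a sine) is an invented example.

```python
# Numerical sketch of the discrete HLT (Eqs. 2.16-2.18): multiply the DFT
# of the signal by the transfer function H[m] and invert.

import cmath
import math

def dft(g):
    N = len(g)
    return [sum(g[n] * cmath.exp(-2j * math.pi * n * m / N)
                for n in range(N)) for m in range(N)]

def idft(F):
    N = len(F)
    return [sum(F[m] * cmath.exp(2j * math.pi * n * m / N)
                for m in range(N)) / N for n in range(N)]

def hilbert(g):
    N = len(g)
    H = [0j] * N                      # H[0] = H[N/2] = 0
    for m in range(1, N):
        H[m] = -1j if m < N / 2 else (1j if m > N / 2 else 0j)
    G = dft(g)
    return [z.real for z in idft([H[m] * G[m] for m in range(N)])]

# The discrete HLT of cos(2*pi*n/N) is sin(2*pi*n/N).
N = 8
g = [math.cos(2 * math.pi * n / N) for n in range(N)]
gh = hilbert(g)
```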
2.7.1. Short-Response Hilbert Transform (SRHLT)
We have seen the advantages and disadvantages of the derivative method and the HLT method for detecting edges. In 2007, S. C. Pei and J. J. Ding proposed another method combining the two to detect edges. [D-1][D-4] They combine the HLT and differentiation to define the short-response Hilbert transform (SRHLT):
g H (x) = h b (x) * g(x), where h b (x) = b csch(bx)    (2.19)
G H (f) = H b (f) G(f), where G H (f) = FT[g H (x)], G(f) = FT[g(x)], and H b (f) = −j tanh(f/b)    (2.20)
When b → 0+ (a positive number very near 0), the SRHLT becomes the HLT. When b → ∞, the SRHLT tends to the differentiation operation. In practice, we choose a suitable value of b between these two extremes.
Figure 2.14 The character of the SRHLT: b → 0 gives the HLT; b → ∞ gives differentiation.
[Figure panels omitted: time-domain impulse responses and frequency-domain transfer functions (FTs) of the Hilbert transform, the SRHLT with b = 0.25, 1, and 4, and the differentiation operation.]
Figure 2.15 Impulse responses and their FTs of the SRHLT for different b.
                   Higher b (differentiation)   Lower b (HLT)
Impulse response   shorter                      longer
Noise robustness   bad                          good
Type of edge       step                         ramp
Output             sharp                        thick
3. Thresholding
3.1. Basic Global Thresholding
Because we need only the histogram of the image to segment it, segmenting images with the threshold technique does not involve the spatial information of the images. Therefore, some problems may be caused by noise, blurred edges, or outliers in the image. That is why we say this method is the simplest concept for segmenting images.
When the intensity distributions of objects and background pixels are sufficiently
distinct, it is possible to use a single(global) threshold applicable over the entire image.
The following iterative algorithm can be used for this purpose:
1. Select an initial estimate for the global threshold, T.
2. Segment the image using T as
g(x, y) = 1 if f(x, y) > T; 0 if f(x, y) ≤ T    (3.1)
This produces two groups of pixels: G 1 , consisting of all pixels with intensity values > T, and G 2 , consisting of pixels with values ≤ T.
3. Compute the average (mean) intensity values m 1 and m 2 for the pixels in G 1 and G 2 , respectively.
4. Compute a new threshold value:
T = (1/2)(m 1 + m 2 )
5. Repeat Steps 2 through 4 until the difference between values of T in successive iterations is smaller than a predefined parameter.
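The five steps can be sketched directly; the pixel list, initial estimate, and stopping parameter below are invented examples.

```python
# Sketch of the iterative global-threshold algorithm (Section 3.1),
# applied to a flat list of pixel intensities.

def global_threshold(pixels, t0=128.0, eps=0.5):
    T = t0                             # step 1: initial estimate
    while True:
        g1 = [p for p in pixels if p > T]     # step 2: split at T
        g2 = [p for p in pixels if p <= T]
        if not g1 or not g2:           # degenerate split: stop
            return T
        m1 = sum(g1) / len(g1)         # step 3: group means
        m2 = sum(g2) / len(g2)
        T_new = 0.5 * (m1 + m2)        # step 4: new threshold
        if abs(T_new - T) < eps:       # step 5: converged
            return T_new
        T = T_new

# Two well-separated intensity populations.
pixels = [20, 22, 25, 30, 200, 205, 210]
T = global_threshold(pixels)
print(T)   # → 114.625
```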
3.2. Optimum Global Thresholding Using Otsu’s Method
Thresholding may be viewed as a statistical-decision theory problem whose objective
is to minimize the average error incurred in assigning pixels to two or more groups.
Let {0, 1, 2, …, L−1} denote the L distinct intensity levels in a digital image of size M × N pixels, and let n i denote the number of pixels with intensity i. The total number of pixels in the image is MN = n 0 + n 1 + n 2 + … + n L−1 . The normalized histogram has components p i = n i / MN, from which it follows that
Σ_{i=0}^{L−1} p i = 1,  p i ≥ 0    (3.2)
Now we select a threshold T(k) = k, 0 < k < L−1, and use it to threshold the input image into two classes, C 1 and C 2 , where C 1 consists of the pixels with intensities in the range [0, k] and C 2 of those in [k+1, L−1]. Using this threshold, the probability P 1 (k) that a pixel is assigned to C 1 is given by the cumulative sum
P 1 (k) = Σ_{i=0}^{k} p i    (3.3)
P 2 (k) = Σ_{i=k+1}^{L−1} p i = 1 − P 1 (k)    (3.4)
The validity of the following two equations can be verified by direct substitution of the preceding results:
P 1 m 1 + P 2 m 2 = m G    (3.5)
P 1 + P 2 = 1    (3.6)
To evaluate the "goodness" of the threshold at level k, we use the normalized, dimensionless metric
η = σ B ²(k) / σ G ²    (3.7)
where σ G ² is the global variance,
σ G ² = Σ_{i=0}^{L−1} (i − m G )² p i    (3.8)
and σ B ² is the between-class variance, defined as
σ B ² = P 1 (m 1 − m G )² + P 2 (m 2 − m G )²    (3.9)
     = P 1 P 2 (m 1 − m 2 )² = (m G P 1 (k) − m(k))² / [P 1 (k)(1 − P 1 (k))]    (3.10)
indicating that the between-class variance, and therefore η, is a measure of the separability between the classes.
Then the optimum threshold is the value k* that maximizes σ B ²(k):
σ B ²(k*) = max_{0 ≤ k ≤ L−1} σ B ²(k)    (3.11)
In other words, to find k* we simply evaluate Eq. (3.10) for all integer values of k. Once k* has been obtained, the input image f(x, y) is segmented as before:
g(x, y) = 1 if f(x, y) > k*; 0 if f(x, y) ≤ k*    (3.12)
for x = 0, 1, 2, …, M−1 and y = 0, 1, 2, …, N−1. The separability measure has values in the range
0 ≤ η(k*) ≤ 1    (3.13)
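A sketch of Otsu's method, evaluating Eq. (3.10) for every k and keeping the maximizing k* as in Eq. (3.11); the histogram below and L = 256 are illustrative assumptions.

```python
# Sketch of Otsu's method (Eqs. 3.2-3.11): sweep k, accumulating P1(k)
# and the cumulative mean m(k), and keep the k maximizing Eq. (3.10).

def otsu_threshold(pixels, L=256):
    n = len(pixels)
    p = [0.0] * L
    for v in pixels:
        p[v] += 1.0 / n                     # normalized histogram, Eq. (3.2)
    mG = sum(i * p[i] for i in range(L))    # global mean
    best_k, best_var = 0, -1.0
    P1 = 0.0                                # cumulative P1(k), Eq. (3.3)
    m = 0.0                                 # cumulative mean m(k)
    for k in range(L - 1):
        P1 += p[k]
        m += k * p[k]
        if 0.0 < P1 < 1.0:
            var_b = (mG * P1 - m) ** 2 / (P1 * (1.0 - P1))   # Eq. (3.10)
            if var_b > best_var:
                best_k, best_var = k, var_b
    return best_k

# A bimodal "image": two dark populations and two bright ones.
pixels = [20] * 50 + [30] * 50 + [200] * 40 + [210] * 40
k = otsu_threshold(pixels)
print(k)   # a value between the two modes
```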
3.2.1. Using Image Smoothing / Edge Detection to Improve Global Thresholding
We compare the two kinds of preprocessing. Smoothing is more suitable when the object we are interested in is large; edge detection is more suitable when the object we are interested in is small.
3.3. Multiple Thresholds
For three classes consisting of three intensity intervals, the between-class variance is given by
σ B ² = P 1 (m 1 − m G )² + P 2 (m 2 − m G )² + P 3 (m 3 − m G )²    (3.14)
The following relationships hold:
P 1 m 1 + P 2 m 2 + P 3 m 3 = m G    (3.15)
P 1 + P 2 + P 3 = 1    (3.16)
The optimum threshold values k 1 * and k 2 * are those that maximize σ B ²(k 1 , k 2 ):
σ B ²(k 1 *, k 2 *) = max_{0 < k 1 < k 2 < L−1} σ B ²(k 1 , k 2 )    (3.17)
Finally, we note that the separability measure defined in Section 3.2 for one threshold extends directly to multiple thresholds:
η(k 1 *, k 2 *) = σ B ²(k 1 *, k 2 *) / σ G ²    (3.18)
3.4. Variable Thresholding
Image partitioning
One of the simplest approaches to variable thresholding is to subdivide an image into nonoverlapping rectangles. This approach is used to compensate for non-uniformities in illumination and/or reflection.
Figure 3.1 (a) Noisy, shaded image. (b) Image subdivided into six subimages. (c) Result of applying Otsu's method to each subimage individually.
Image subdivision generally works well when the objects of interest and the background occupy regions of reasonably comparable size. When this is not the case, the method typically fails.
Variable thresholding based on local image properties
We illustrate the basic approach to local thresholding using the standard deviation and mean of the pixels in a neighborhood of every point in an image. Let σ xy and m xy denote the standard deviation and mean value of the set of pixels contained in a neighborhood S xy . Then
g(x, y) = 1 if Q(local parameters) is true; 0 if Q(local parameters) is false    (3.19)
where Q is a predicate based on parameters computed using the pixels in the neighborhood, for example
Q(σ xy , m xy ) = true if f(x, y) > a σ xy AND f(x, y) > b m xy ; false otherwise    (3.20)
Using moving averages
A moving average can be computed along the scan lines of an image. This implementation is quite useful in document processing, where speed is a fundamental requirement. The scanning typically is carried out line by line in a zigzag pattern to reduce illumination bias. Let z k+1 denote the intensity of the point encountered in the scanning sequence at step k+1, and let n denote the number of points used in computing the average. Then
m(k+1) = (1/n) Σ_{i=k+2−n}^{k+1} z i = m(k) + (1/n)(z k+1 − z k+1−n )    (3.21)
where m(1) = z 1 /n is the initial value. Segmentation is then implemented using Eq. (3.19) with T xy = b m xy , where b is a constant and m xy is the moving average at point (x, y) in the input image.
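A sketch of Eq. (3.21) along one scan line, thresholding each point against T = b·m; the values of n and b and the scan line are invented examples.

```python
# Sketch of moving-average thresholding (Eq. 3.21): keep a window of the
# last n intensities and compare each point with T = b * m.

from collections import deque

def moving_average_threshold(line, n=4, b=0.5):
    window = deque(maxlen=n)   # holds the last n intensities
    out = []
    for z in line:
        window.append(z)
        m = sum(window) / n    # early points give z1/n-style partial means
        out.append(1 if z > b * m else 0)
    return out

# A bright background with a dark "stroke" in the middle.
line = [200, 200, 200, 20, 20, 20, 200, 200]
print(moving_average_threshold(line))   # → [1, 1, 1, 0, 0, 0, 1, 1]
```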
Multivariable thresholding
So far we have been concerned with thresholding based on a single variable: gray-scale intensity. A notable example of multivariable thresholding is color imaging, where the red (R), green (G), and blue (B) components form a composite color image. Each pixel can then be represented as a 3-D vector z = (z 1 , z 2 , z 3 )^T whose components are the RGB values at that point.
Let a denote the average reddish color in which we are interested, and let D(z, a) be a distance measure between an arbitrary color point z and a; then we segment the input image as follows:
g(x, y) = 1 if D(z, a) < T; 0 otherwise    (3.22)
Note that the inequalities in this equation are the opposite of those in the equations we used before. The reason is that the equation D(z, a) = T defines a volume.
The Euclidean distance is
D(z, a) = ||z − a|| = [(z − a)^T (z − a)]^{1/2}    (3.23)
A more powerful distance measure is the so-called Mahalanobis distance,
D(z, a) = [(z − a)^T C^{−1} (z − a)]^{1/2}    (3.24)
where C is the covariance matrix of the z's; when C = I, the identity matrix, it reduces to the Euclidean distance.
4. Region-Based Segmentation
4.1. Region Growing
Region growing is an approach that examines the neighboring pixels of initial "seed points" and determines whether those neighbors should be added to the region.
Step 1. Select a set of one or more starting points (seeds); the selection can often be based on the nature of the problem.
Step 2. Grow the regions from these seed points to adjacent points depending on a threshold or criterion (e.g., 8-connectivity) that we choose.
Step 3. Stop growing when no more pixels satisfy the criteria for inclusion in that region.
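The three steps can be sketched as follows; 4-connectivity and an absolute intensity-difference criterion are assumed here, and the image, seed, and threshold are invented examples.

```python
# Sketch of region growing (Steps 1-3): grow a 4-connected region from a
# seed, accepting neighbors whose intensity differs from the seed value
# by at most a similarity threshold.

def region_grow(image, seed, threshold):
    h, w = len(image), len(image[0])
    sy, sx = seed
    seed_val = image[sy][sx]
    region = {seed}                          # step 1: the seed
    frontier = [seed]
    while frontier:                          # step 3: stop when none qualify
        y, x = frontier.pop()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):   # 4-connected
            ny, nx = y + dy, x + dx
            if (0 <= ny < h and 0 <= nx < w and (ny, nx) not in region
                    and abs(image[ny][nx] - seed_val) <= threshold):
                region.add((ny, nx))         # step 2: grow into neighbor
                frontier.append((ny, nx))
    return region

img = [[10, 10, 90],
       [10, 12, 90],
       [90, 90, 90]]
r = region_grow(img, seed=(0, 0), threshold=5)
print(sorted(r))   # → [(0, 0), (0, 1), (1, 0), (1, 1)]
```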
Figure 4.1 (a) Original image. (b) Use Step 1 to find seeds based on the nature of the problem. (c) Use Step 2 (4-connectivity here) to grow the regions and find similar points. (d)(e) Repeat Step 2 until no more pixels satisfy the criteria. (f) The final image.
Then we can draw several important conclusions about region growing:
1. Suitable selection of the seed points is important, and the selection depends on the user.
2. More information about the image is better. Obviously, connectivity or pixel-adjacency information is helpful for determining the threshold and the seed points.
3. The "minimum area threshold": no region in the region-growing result will be smaller than this threshold in the segmented image.
4. The "similarity threshold value": if the difference of pixel values, or the difference of the average gray levels of two sets of pixels, is less than the similarity threshold value, the regions are considered the same region.
5. After region growing, the image may still contain points whose gray level is higher than the threshold but which are not connected with the object in the image.
We briefly summarize the advantages and disadvantages of region growing.
Advantages:
1. Region growing can correctly separate regions that have the same properties we define.
2. Region growing can provide good segmentation results for original images that have clear edges.
3. The concept is simple. We need only a small number of seed points to represent the property we want, and then grow the region.
4. We can choose multiple criteria at the same time.
5. It performs well with respect to noise, which means its results exhibit good shape matching.
Disadvantages:
1. The computation is expensive, in both time and power.
2. The method may not distinguish the shading in real images.
In conclusion, the region-growing method performs well, with good shape matching and connectivity. Its most serious problem is the time consumed.
4.2. Simulation of Region Growing Using C++
Figure 4.2 The Lena image after region growing; 90% of the pixels have been classified. Threshold/time: 20 / 4.7 seconds.
The method yields connected regions, but it needs more time to process.
4.3. Region Splitting and Merging
An alternative method is to subdivide an image initially into a set of arbitrary, disjoint regions and then merge and/or split the regions. Using quadtrees, we subdivide each quadrant into subquadrants, as follows:
1. Split into four disjoint quadrants any region R i for which Q(R i ) = FALSE (meaning the region does not satisfy the homogeneity predicate in R i ).
2. When no further splitting is possible, merge any adjacent regions R j and R k for which Q(R j ∪ R k ) = TRUE (meaning that R j and R k have the similarity we define).
3. Stop when no further merging is possible.
Advantages of region splitting and merging:
We can split the image using any criteria we want, such as the variance or the mean of the pixel values within a segment, and the splitting criteria can be different from the merging criteria.
Disadvantages:
1. The computation is intensive.
2. It may produce blocky segments.
The blocky-segment effect can be reduced by splitting to a higher resolution, but at the same time the computational problem becomes more serious.
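The splitting half of the procedure can be sketched with a quadtree recursion; the predicate Q (max minus min within a tolerance) and the image below are illustrative assumptions.

```python
# Sketch of quadtree splitting (step 1 of split-and-merge): recursively
# quarter a square region while the predicate Q is FALSE, collecting the
# homogeneous leaf regions as (x, y, size) triples.

def split(image, x, y, size, tol, leaves):
    vals = [image[y + j][x + i] for j in range(size) for i in range(size)]
    if size == 1 or max(vals) - min(vals) <= tol:   # Q(R) is TRUE: keep leaf
        leaves.append((x, y, size))
        return
    half = size // 2
    for dy in (0, half):                            # Q(R) FALSE: quarter it
        for dx in (0, half):
            split(image, x + dx, y + dy, half, tol, leaves)

img = [[10, 10, 10, 10],
       [10, 10, 10, 10],
       [10, 10, 90, 90],
       [10, 10, 90, 90]]
leaves = []
split(img, 0, 0, 4, tol=5, leaves=leaves)
print(leaves)   # → [(0, 0, 2), (2, 0, 2), (0, 2, 2), (2, 2, 2)]
```

A merging pass would then join adjacent leaves whose union still satisfies Q.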
4.4. Data Clustering
The main idea of data clustering is to use centroids (prototypes) to represent clusters containing huge numbers of elements. This serves two goals: reducing the computational time of the image processing, and providing a better-conditioned segmented image (one that is more convenient for us to compress).
The difference between hierarchical and partitional clustering:
In hierarchical clustering, we can change the number of clusters at any time during the process if we want.
In partitional clustering, we have to decide the number of clusters before we begin the process.
4.4.1. Hierarchical Clustering
There are two kinds of hierarchical clustering. Agglomerative algorithms (which build) begin with each element as a separate cluster and merge them into successively larger clusters. Divisive algorithms (which break up) begin with the whole set and proceed to divide it into successively smaller clusters. In the cluster tree, agglomerative algorithms work from one end and divisive algorithms from the other. (In the figure, the arrows indicate an agglomerative clustering.) We introduce the former first.
Algorithm for hierarchical agglomeration:
1. Regard every single data point (for an image, every pixel) in the database (for an image, the whole image) as a cluster Ci.
2. Find the two clusters Ci and Cj whose distance is the shortest in the whole database, and agglomerate them into a new cluster.
3. Repeat Steps 1 and 2 until the number of clusters satisfies our demand.
Notice that we have to define the "distance" used in a hierarchical algorithm. The two most commonly adopted definitions are the single-linkage and complete-linkage agglomerative methods.
Single-linkage agglomerative algorithm:
D(Ci, Cj) = min d(a, b), for a ∈ Ci, b ∈ Cj    (4.1)
Complete-linkage agglomerative algorithm:
D(Ci, Cj) = max d(a, b), for a ∈ Ci, b ∈ Cj    (4.2)
Here D(Ci, Cj) is the distance between clusters Ci and Cj, and d(a, b) is the distance between data points a and b (for an image, pixels a and b).
For example, suppose we have six elements {a}, {b}, {c}, {d}, {e}, and {f}. The first step is to determine which elements to merge into a cluster; usually, we take the two closest elements. Suppose we have merged the two closest elements b and c. We now have the clusters {a}, {b, c}, {d}, {e}, and {f}, and want to merge them further. To do that, we need the distance between {a} and {b, c}, and therefore must define the distance between two clusters.
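Eqs. (4.1) and (4.2) can be sketched directly; the 1-D data and the absolute-difference metric d below are invented for brevity.

```python
# Sketch of the two linkage definitions: single-linkage (Eq. 4.1) takes
# the closest pair across two clusters, complete-linkage (Eq. 4.2) the
# farthest pair.

def single_linkage(ci, cj, d=lambda a, b: abs(a - b)):
    return min(d(a, b) for a in ci for b in cj)

def complete_linkage(ci, cj, d=lambda a, b: abs(a - b)):
    return max(d(a, b) for a in ci for b in cj)

ci, cj = [1, 2], [5, 9]
print(single_linkage(ci, cj), complete_linkage(ci, cj))   # → 3 8
```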
Algorithm for hierarchical division:
1. Regard the whole database (for an image, the whole image) as one cluster.
2. Find the cluster C having the biggest diameter among the clusters we already have.
3. Find the data point x (for an image, the pixel) in C such that d(x, C) = max d(y, C) for y ∈ C.
4. Split x out as a new cluster C1, and regard the rest of the data points of C as Ci.
5. Compute d(y, C1) and d(y, Ci) for every y ∈ Ci. If d(y, Ci) > d(y, C1), split y out of Ci and classify it into C1.
6. Go back to Step 2 and continue the algorithm until no iteration changes C1 and Ci anymore.
We define the diameter of a cluster Ci as D(Ci) = max d(a, b), for a ∈ Ci, b ∈ Ci. The distance between a point x and a cluster C is defined as d(x, C) = the mean of the distances between x and every single point in cluster C.
Figure 4.3 The simple case of hierarchical division
4.5. Partitional Clustering
Compared with hierarchical algorithms, partitional clustering cannot show the structure of the database as well. However, it saves much more computational time than the hierarchical algorithms. The most famous partitional clustering algorithm is the "K-means" method. We now examine its algorithm.
Algorithm for K-means:
1. Decide the number of clusters in the final classified result we want. We assume the number is N.
2. Randomly choose N data points (for an image, N pixels) in the whole database (for an image, the whole image) as the N centroids of N clusters.
3. For every single data point (for an image, every pixel), find the nearest centroid and classify the data point into the cluster where that centroid is located. After Step 3, all data points are classified into a specific cluster, and the total number of clusters is always N, as decided in Step 1.
4. For every cluster, calculate its centroid from the data points in it. The calculation of the centroid can also be defined by the user: it can be the median of the data within the cluster, or the true center. We again obtain N centroids of N clusters, just as after Step 2.
5. Repeat Steps 3 and 4 until there is no change between two successive iterations. (Steps 4 and 5 are the checking steps.)
We have to mention that in Step 3 we do not always decide the classification by the "distance" between the data points and the centroids; distance is used here because it is the simplest choice of criterion. We can also use other criteria, depending on the characteristics of the database or the final classified result we want.
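The five steps can be sketched for 1-D intensities; to keep the example reproducible, the initial centroids are fixed rather than random, a deviation from Step 2.

```python
# Sketch of K-means (Steps 1-5) for 1-D intensities with fixed seeds.

def kmeans(data, centroids, max_iter=100):
    centroids = list(centroids)                # step 2 (fixed, not random)
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for x in data:                         # step 3: nearest centroid
            i = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
            clusters[i].append(x)
        new = [sum(c) / len(c) if c else centroids[i]   # step 4: recompute
               for i, c in enumerate(clusters)]
        if new == centroids:                   # step 5: no change -> stop
            break
        centroids = new
    return centroids, clusters

data = [10, 12, 14, 80, 82, 84]
centroids, clusters = kmeans(data, centroids=[0, 100])
print(centroids)   # → [12.0, 82.0]
```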
Figure 4.4 (a) Original image. (b) Choose 3 clusters and 3 initial points. (c) Classify the other points using the minimum distance between each point and the center of a cluster.
There are some disadvantages of the K-means algorithm:
1. The results are sensitive to the initial randomly chosen centroids; that is, different choices of initial centroids may lead to different results.
2. It cannot show the segmentation details the way hierarchical clustering does.
3. The most serious problem is that the resulting clusters often have circular shapes, due to the distance-oriented nature of the algorithm.
4. It is important to clarify that 10 average values does not mean there are merely 10 regions in the segmentation result. See Fig. 4.4.
Figure 4.5 (a) The result of the K-means. (b) The result that we want.
To solve the initialization problem
There is a solution to overcome the initialization problem: choose just one fixed initial point in Step 2. Use the two points beside the initial point as centroids to classify the data into two clusters; then use the four points beside those two, and so on, splitting the number of clusters until the N clusters we want are classified. With this kind of improvement, we solve the initialization problem caused by randomly chosen initial data points. Whatever names such schemes go by, their concepts are all similar.
To determine the number of clusters
There has been much research on determining the number of clusters in K-means clustering. Siddheswar Ray and Rose H. Turi define a "validity" measure, the ratio of the "intra-cluster distance" to the "inter-cluster distance", as the criterion. The validity measure tells us what the ideal value of K in the K-means algorithm is:

validity = intra / inter    (4.3)
They define the two distances as:

intra = (1/N) Σ_{i=1}^{K} Σ_{x∈C_i} ||x - z_i||^2    (4.4)

where N is the number of pixels in the image, K is the number of clusters, and z_i is the cluster centre of cluster C_i.
Since the goal of the K-means clustering algorithm is to minimize the sum of squared distances from all points to their cluster centres, we first define the "intra-cluster" distance to describe the distances of the points from their cluster centre (or centroid), and then minimize it.
inter-cluster distance = min(||z_i - z_j||^2),  i = 1, 2, 3, ..., K-1;  j = i+1, ..., K    (4.5)
On the other hand, the purpose of clustering a database (or an image) is to separate the clusters from one another. We therefore define the "inter-cluster distance", which describes the difference between clusters, and we want its value to be as large as possible. Obviously, if the distances between the cluster centroids are large, only a few clusters are needed to segment the data; but if the centroids are close together, more clusters are needed to classify the data clearly.
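The validity measure (4.3)-(4.5) can be computed directly; a sketch in Python for 1-D data, where the function name and the toy clusters are illustrative only:

```python
def validity(clusters, centroids):
    """Ray and Turi's measure: intra (Eq. 4.4) over inter (Eq. 4.5).
    A smaller value indicates a better choice of K."""
    n = sum(len(c) for c in clusters)
    # Eq. 4.4: mean squared distance from each point to its cluster centre
    intra = sum((x - z) ** 2
                for c, z in zip(clusters, centroids)
                for x in c) / n
    # Eq. 4.5: minimum squared distance between any two cluster centres
    inter = min((zi - zj) ** 2
                for i, zi in enumerate(centroids)
                for zj in centroids[i + 1:])
    return intra / inter

v = validity([[10, 12, 11], [90, 95], [200, 198, 205]], [11.0, 92.5, 201.0])
```

One would evaluate `validity` for several values of K and pick the K with the smallest value.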
4.5.1. More methods to solve the initial problem without changing the K-means scheme
Particle Swarm Optimization
PSO is a population-based random search process. We assume that N "particles" appear randomly in a "solution space". Note that we are solving an optimization problem, and for data clustering there is always a criterion (for example, the squared-error function) evaluated for every single particle at its position in the solution space. The N particles keep moving and evaluating the criterion at every position they stay (called the "fitness" in PSO) until the criterion reaches some threshold we require.
Each particle keeps track of the coordinates in the solution space that are associated with the best solution (fitness) it has achieved so far. This value is called the personal best, pbest.
Another best value tracked by PSO is the best value obtained so far by any particle in the neighborhood of that particle. This value is called the global best, gbest.
The exact mathematical statement is given below:

v_{i,j}(t) = w * v_{i,j}(t-1) + c1 * r1 * (p_{i,j}(t-1) - x_{i,j}(t-1)) + c2 * r2 * (p_{g,j}(t-1) - x_{i,j}(t-1))    (4.6)

x_{i,j}(t) = x_{i,j}(t-1) + v_{i,j}(t)    (4.7)

where x_i is the current position of the particle, v_i is the current velocity of the particle, p_i is the personal best position of the particle, p_g is the global best position, w, c1, c2 are constant factors, and r1, r2 are random numbers uniformly distributed within the interval [0,1].
We use the previous velocity, together with the previous personal and global best positions, to update the current velocity. The current position is then the previous position plus the current velocity.
By using PSO, we can solve the initial problem of K-means and still maintain the whole partitional clustering scheme. The most important thing is to treat clustering as an optimization problem.
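Equations (4.6)-(4.7) translate almost line for line into code; a sketch in Python, where the parameter values w, c1, c2 are typical choices rather than values from the text:

```python
import random

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=random):
    """One PSO update: new velocity per Eq. (4.6), new position per Eq. (4.7)."""
    r1, r2 = rng.random(), rng.random()   # uniform in [0, 1]
    v_new = [w * vj + c1 * r1 * (pj - xj) + c2 * r2 * (gj - xj)
             for xj, vj, pj, gj in zip(x, v, pbest, gbest)]
    x_new = [xj + vj for xj, vj in zip(x, v_new)]
    return x_new, v_new

# One particle in a 2-D solution space, pulled toward its personal/global bests
x, v = pso_step(x=[0.0, 0.0], v=[0.1, 0.1], pbest=[2.0, 2.0], gbest=[3.0, 3.0])
```

For clustering, the "position" of a particle would encode a candidate set of centroids, and the fitness would be the squared-error function.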
4.5.2. Advantages and disadvantages of data clustering

Hierarchical algorithms
Advantages:
1. The concept is simple.
2. The result is reliable; it shows a strong correlation with the original image.
3. Instead of calculating the centroids of clusters, we only have to calculate the distances between every single pair of data points.
Disadvantages:
1. Computing time is high, so the algorithm is not suitable for large databases.

Partitional algorithms (K-means)
Advantages:
1. Computing speed is fast.
2. The number of clusters is fixed, so the concept is also simple.
Disadvantages:
1. The number of clusters is fixed, so which number of clusters is the best choice?
2. The initial problem.
3. Partitional clustering cannot show the characteristics of the database as hierarchical clustering can.
4. The clustering results are probably circular shapes.
5. We cannot improve K-means by setting fewer centroids.
We can solve the choice of the number of clusters by observing the "validity" value proposed by Siddheswar Ray and Rose H. Turi. For the initial problem, we can solve it by choosing only one initial point or by using the PSO algorithm directly.
However, we cannot solve the circular-shape problem, because it is due to the core computing scheme of partitional algorithms.
4.6. Simulation of K-means using C++
Figure 4.6 Lena image after using K-means. Clusters/time: 9 clusters / 0.1 seconds.
The top-left image is the original image; the others are the images after K-means. We observe that each cluster is not a single connected region. However, K-means is a fast method of region-based segmentation.
4.7. Cheng-Jin Kuo's method
We now introduce the method we propose and explain, in terms of "data compression", why we would like to use this kind of algorithm.
4.7.1. The ideal segmentation we would like to obtain
The ideal result we would like to obtain is something like Fig. 4.7. It is very important for us to classify similar regions together.
Figure 4.7 The ideal segmentation result we want.
We would like the whole hair section to be classified as one cluster, because after we obtain the result we can send it directly to the compression stage to compress every region. For almost all segmentation methods, it is unavoidable to over-segment a region like the hair region in the Lena image.
4.7.2. Algorithm of Cheng-Jin Kuo's method
Make the first pixel we scan (usually the top-left one) the first cluster.
1. Treat the pixel (x,1) in the image as one cluster C_i. Treat the pixel we are currently scanning as C_j.
2. In the first column, scan the next pixel (x,1+1) and decide with the threshold whether it should be merged into the first cluster or become a new cluster.
If |C_j - centroid(C_i)| <= threshold, we merge C_j into C_i and recompute the centroid of C_i.
If |C_j - centroid(C_i)| > threshold, we make C_j a new cluster C_{i+1}.
3. Repeat step 2 until all the pixels in the same column have been scanned.
4. Scan the next column starting with pixel (x+1,1) and compare it to the region C_u on its upper side, deciding whether pixel (x+1,1) has to be merged into the region C_u.
If |C_j - centroid(C_u)| <= threshold, we merge C_j into C_u and recompute the centroid of C_u.
If |C_j - centroid(C_u)| > threshold, we make C_j a new cluster C_n, where n is the cluster number so far.
5. Scan the next pixel (x+1,1+1) and compare it to the regions C_u and C_l, which are above it and to its left, respectively, deciding whether it has to be merged into either of them.
If |C_j - centroid(C_u)| <= threshold and |C_j - centroid(C_l)| <= threshold,
(1) we merge C_j into C_u and into C_l;
(2) we combine the regions C_u and C_l into region C_n, where n is the cluster number so far;
(3) we recompute the centroid of C_n.
Else if |C_j - centroid(C_u)| <= threshold and |C_j - centroid(C_l)| > threshold, we merge C_j into C_u and recompute the centroid of C_u.
Else if |C_j - centroid(C_u)| > threshold and |C_j - centroid(C_l)| <= threshold, we merge C_j into C_l and recompute the centroid of C_l.
Else we make C_j a new cluster C_n, where n is the cluster number so far.
6. Repeat steps 4 and 5 until all the pixels in the image have been scanned.
7. Process the small regions classified in the previous steps.
It is important to deal with the isolated small regions carefully, for we do not want too many fragmentary results after segmenting the image with our method. Our goal is therefore to classify each isolated small region into the big, already classified region adjacent to it.
The following is the method to merge a small region into a big region.
(a) We process the regions R_i of small size. (For 256x256 input images, we target regions smaller than 32 pixels.)
(b) If the region R_i is fully surrounded by a single bigger region C_i, then C_i <- C_i ∪ R_i.
(c) If the region R_i is surrounded by several (say k) bigger regions C_i, i = 1 ~ k, we compute the mean of R_i and classify R_i into the most similar region C_h:
if |mean(R_i) - mean(C_h)| = min_{i=1~k} |mean(R_i) - mean(C_i)|, where h is one of 1 ~ k,
then C_h <- C_h ∪ R_i.
4.8. Improvement of the Fast Algorithm: Adaptive Local Threshold Decision
In our algorithm, the threshold of section 4.7 does not change during the whole procedure. We would like a new procedure that adaptively decides the threshold from the local frequency and variance of the original image.
4.8.1. Adaptive threshold decision with local variance
We would like to select the threshold based on the local variance of the image. Here are the steps of the algorithm:
1. Separate the original image into 4x4 = 16 sections.
2. Compute the variance of each of the 16 sections.
3. Depending on the local variance, select a suitable threshold: the bigger the variance, the bigger the threshold we assign.
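Steps 1-2 can be sketched as follows (Python; the image is a list of rows, and the grid size is a parameter, with the 4x4 split of the text as the default):

```python
def section_variances(img, grid=4):
    """Split the image into grid x grid sections and return the intensity
    variance of each section (steps 1-2 above)."""
    h, w = len(img), len(img[0])
    sh, sw = h // grid, w // grid
    variances = []
    for gy in range(grid):
        row = []
        for gx in range(grid):
            vals = [img[y][x]
                    for y in range(gy * sh, (gy + 1) * sh)
                    for x in range(gx * sw, (gx + 1) * sw)]
            mean = sum(vals) / len(vals)
            row.append(sum((v - mean) ** 2 for v in vals) / len(vals))
        variances.append(row)
    return variances

# A 4x4 toy image split into a 2x2 grid: only the bottom-right section varies
v = section_variances([[0, 0, 10, 10],
                       [0, 0, 10, 10],
                       [5, 5, 0, 20],
                       [5, 5, 0, 20]], grid=2)
```

Step 3 then maps each section's variance to a threshold, as described next.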
Figure 4.8 Lena image separated into 16 sections.
The variances of the 16 sections as a matrix:

 716   447  1293  2470
 899  1579  1960  2238
1497  1822  1974  1273
2314  1129  1545  1646
    (4.8)
We can imagine that, after using the adaptive threshold method, the segmented result in sections (2,3) and (2,4), whose variances are 1960 and 1974, will be similar to that of the original method, whose global variance is 1943.
We can also imagine that the new segmented result in sections (1,1) and (1,2) will be more detailed; in other words, in these two sections we will have more clusters in the result.
Most of the time, the adaptive threshold selection helps us to segment more precisely. However, as we can see, the improvement of selecting the local threshold with the local variance alone is not really visible in the simulation results.
A small variance gives a small local threshold, which in turn produces a more detailed segmentation result.
4.8.2. Adaptive threshold decision with local variance and frequency
For example, baboon.bmp will be segmented in more detail because the adaptive local variance selects a smaller threshold in every partitioned area, owing to its low variance. However, we are not satisfied with this: we would like to segment the beard parts of this image more roughly.
Figure 4.9 Baboon.bmp
The local variance of baboon.bmp:
1625 1645 1694 1562
1405 865 757 1058
2346 1222 1256 505
606 990 1054 635
(4.9)
The local average of baboon.bmp:
14.0943 12.4850 12.1756 13.6914
12.7597 9.8058 9.4788 12.6781
11.4280 10.4072 10.6333 12.8095
11.8825 12.5687 13.1654 11.8211
(4.10)
As we can see, the bottom-left area has a small variance and a large frequency component. In this area we would choose a small threshold, so the segmentation result would be more detailed. If we want to classify it as one region, we need to set the threshold depending on the local frequency as well.
To sum up, we distinguish four situations for our improvement.
1. High frequency, high variance: set the highest threshold.
Figure 4.10 A high frequency and high variance image
2. High frequency, low variance: set the second highest threshold.
Figure 4.11 A high frequency and low variance image
3. Low frequency, high variance: set the third highest threshold.
Figure 4.12 A low frequency and high variance image
4. Low frequency, low variance: set the lowest threshold.
Figure 4.13 A low frequency and low variance image
For the first case, the reason we select a higher threshold is that such areas often contain many edges and different objects. A larger threshold may produce a rough segmentation result, but we believe the clear edges and the high variation between different objects will keep the segmentation working. The larger threshold removes some of the over-segmentation caused by the high variance and high frequency.
One might think that the smallest threshold in case four would cause an over-segmented result. However, the stable and monotonous character of case four prevents over-segmentation.
4.8.3. Deciding the local threshold
We use a formula to decide the threshold:

threshold = 16 + F + V    (4.11)

The formulas of F and V:

F = A * (local average frequency) + B    (4.12)
V = C * (local variance) + D    (4.13)

In this thesis we always try to keep the threshold between 16 and 32, since the best testing threshold with the original method (without the adaptive threshold) is 24. Accordingly, the range of F is 0 to 8, and so is the range of V; with both maxima equal to 8, the maximum threshold is 32.
if local average frequency > 9,
    F = 6;
else if local average frequency < 3,
    F = 0;
end
if local variance > 3000,
    V = 6;
else if local variance < 1000,
    V = 0;
end
With the range of F defined from 0 to 8 and the range of V defined from 0 to 8, the values of A, B, C, D are 4/3, -4, 0.004, -4, respectively. The parameter values simply follow from the linear relationships.
Equation (4.12) expresses the linear relationship between the local average frequency and F; we consider only the case 3 < local average frequency < 9.
Equation (4.13) expresses the linear relationship between the local variance and V; we consider only the case 1000 < local variance < 3000.
We can also change the range of the final threshold. All we have to do is recompute the parameters A, B, C, D with the equations below:

[A, B] = solve('(Fmin - B)/A = 3', 'B = Fmax - 9*A');    (4.14)
[C, D] = solve('(Vmin - D)/C = 1000', 'D = Vmax - 3000*C');    (4.15)
4.9. Comparison of all algorithms by data compression

Speed: region growing - bad; K-means - good (though worse than C.J.K's method); watershed - bad; Cheng-Jin Kuo's method - good.
Shape connectivity: region growing - intact; K-means - fragmentary; watershed - over-segmentation; Cheng-Jin Kuo's method - intact.
Shape match: region growing - good (better than C.J.K's method); K-means - good (equal to C.J.K's method); watershed - bad; Cheng-Jin Kuo's method - good.
5. Boundary Compression using Asymmetric Fourier Descriptor for Non-closed Boundary Segments
This chapter briefly introduces the Fourier descriptor and provides an improvement on using the Fourier descriptor to describe a boundary. We define a variable R to represent the ratio of the number of compressed terms P to the number of original terms K in the discrete Fourier transform; note that R = P/K.
5.1. Fourier Descriptor
The Fourier descriptor is a method of describing a boundary by applying the DFT to the boundary points, with the x-axis taken as the real part and the y-axis as the imaginary part. We assume that the boundary consists of the points (x_0, y_0), (x_1, y_1), ..., (x_{K-1}, y_{K-1}). These coordinates can be expressed in the form s(k) = [x(k), y(k)], k = 0, 1, 2, ..., K-1. Moreover, each coordinate pair can be treated as a complex number, so that

s(k) = x(k) + j*y(k),  for k = 0, 1, 2, ..., K-1.    (5.1)

The discrete Fourier transform (DFT) of s(k) is

a(u) = (1/K) Σ_{k=0}^{K-1} s(k) e^{-j2πuk/K},  for u = 0, 1, 2, ..., K-1.    (5.2)
The complex coefficients a(u) are called the Fourier descriptors of the boundary. The inverse Fourier transform of these coefficients is denoted by s(k). That is,

s(k) = Σ_{u=0}^{K-1} a(u) e^{j2πuk/K},  for k = 0, 1, 2, ..., K-1.    (5.3)
We discard the high-frequency terms whose index u is higher than P-1. Mathematically, this is equivalent to setting a(u) = 0 for u > P-1 in (5.3). The result is the approximation to s(k):

ŝ(k) = Σ_{u=0}^{P-1} a(u) e^{j2πuk/K},  for k = 0, 1, 2, ..., K-1.    (5.4)

In Fourier transform theory, high-frequency components account for fine detail, while low-frequency components determine global shape. Thus the smaller P is, the more detail is lost on the boundary.
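Equations (5.2) and (5.4) in Python, as a direct O(K^2) DFT using only the standard library (the function names are ours):

```python
import cmath

def fourier_descriptor(boundary):
    """Eq. (5.2): a(u) = (1/K) * sum_k s(k) e^{-j2*pi*u*k/K},
    with s(k) = x(k) + j*y(k)."""
    K = len(boundary)
    s = [complex(x, y) for x, y in boundary]
    return [sum(s[k] * cmath.exp(-2j * cmath.pi * u * k / K)
                for k in range(K)) / K
            for u in range(K)]

def reconstruct(a, P):
    """Eq. (5.4): inverse DFT keeping only the P lowest-index coefficients."""
    K = len(a)
    return [sum(a[u] * cmath.exp(2j * cmath.pi * u * k / K)
                for u in range(P))
            for k in range(K)]

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
a = fourier_descriptor(square)
exact = reconstruct(a, P=len(a))   # P = K recovers the boundary exactly
```

Shrinking P below K smooths the reconstructed boundary, which is exactly the corner-rounding problem discussed next.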
Problems of the Fourier descriptor
The Fourier descriptor has a serious problem when the compression ratio is below 20%: the corners of the boundary shape are smoothed out. Note that the corners of an image or boundary usually correspond to high-frequency components in the frequency domain. Hence, if we reconstruct the boundary from (5.4) with R less than 20%, the results around the corners of the boundaries are not very good.
5.2. Asymmetric Fourier descriptor of non-closed boundary segments
A method proposed by Ding and Huang can solve the problems mentioned above. This method, called the "asymmetric Fourier descriptor of non-closed boundary segments", improves the efficiency of the Fourier descriptor even when the value of R is below 20%. There are several approaches (steps) in this method; we introduce them below. [A-1]
5.2.1. Approach 1: Predicting and marking the corners
The first step of the method is to find the corner points of the boundary; in this step we predict and mark the corners.
As we can see in Figure 5.1, the corner points are at the regional maxima of the error value. In our experiment, we define the corner points as the places where the error value is greater than 0.5 and is maximal within the 10 nearby points.
Figure 5.1 (a) A star-shaped boundary. (b) Error between the two boundaries of (a).
In this method, predicting and marking the corners is just the first step. After this step, we segment the original boundary into several parts and convert these boundary segments with the Fourier descriptor.
5.2.2. Approach 2: Fourier descriptor of a non-closed boundary segment
Using the Fourier descriptor to describe a boundary means discarding the high-frequency components.
Figure 5.2 Using the Fourier descriptor with a non-closed boundary segment (boundary segment -> DFT -> Fourier descriptor -> truncate -> inverse DFT -> recovered boundary).
However, if we truncate the high frequencies of the frequency domain of a non-closed boundary, the reconstructed boundary will be a closed boundary. We now describe the procedure as follows:
Figure 5.3 The steps to solve the non-closed problem: (a) step 1, (b) step 2 (linear shift), (c) step 3 (add a new segment).
Step 1: We set the coordinates of the start point as (x_0, y_0) and the end point as (x_{K-1}, y_{K-1}). See Figure 5.3(a).
Step 2: We shift the boundary points linearly according to the distance along the curve between the two end points. If (x_k, y_k) is a point of the boundary segment s_1(k), for k = 0, 1, ..., K-1, it is shifted to (x_k', y_k') of s_2(k), see Figure 5.3(b), where

x_k' = x_k - x_0 - (x_{K-1} - x_0) * k / (K-1)    (5.5)
y_k' = y_k - y_0 - (y_{K-1} - y_0) * k / (K-1)    (5.6)
Step 3: We add a boundary segment that is odd-symmetric to the original one. The new boundary segment is then closed and perfectly continuous along the curve between the two end points. See Figure 5.3(c). The new boundary segment is

s_3(k) = s_2(k),  s_3(-k) = -s_2(k),    (5.7)

so that s_3 is defined for k = -(K-1), -(K-2), ..., 0, 1, ..., K-1.
Step 4: Compute the Fourier descriptor of the new boundary segment s_3(k). That is,

a(u) = (1/K) Σ_{k=-(K-1)}^{K-1} s_3(k) e^{-j2πuk/K},  for u = 0, 1, ..., 2K-2.    (5.8)
If the signal s(k) is odd-symmetric, its DFT a(u) is also odd-symmetric:

s(-k) = -s(k)  --DFT-->  a(-u) = -a(u)    (5.9)

Because the central point of s_3(k) is the origin, the DC term (the first coefficient of the DFT) is zero. We only need to record the second to the K-th coefficients of the Fourier descriptors, as illustrated in Fig. 5.4.
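A quick numeric check of (5.7)-(5.9) in Python. We use the full period 2K-1 in the exponent, reading the /K in (5.8) as shorthand for the sequence length, and an arbitrary made-up segment s2 (both are our assumptions):

```python
import cmath

K = 4
s2 = [0j, 1 + 1j, 2 + 1j, 3 + 0j]                  # s2(0) ... s2(K-1)
# Eq. (5.7): odd-symmetric closure, s3(-k) = -s2(k), for k = -(K-1) ... K-1
s3 = {k: s2[k] if k >= 0 else -s2[-k] for k in range(-(K - 1), K)}

N = 2 * K - 1                                      # period of s3
# Eq. (5.8): DFT over the centered index range, u = 0 ... 2K-2
a = [sum(s3[k] * cmath.exp(-2j * cmath.pi * u * k / N)
         for k in s3) / K
     for u in range(N)]
```

The DC term a[0] vanishes and a(-u) = -a(u) (in periodic indexing, a[N-u] = -a[u]), which is Eq. (5.9); hence only a[1] ... a[K-1] need to be recorded.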
Figure 5.4 Fourier descriptor of s_3 (odd-symmetric; the DC term is zero, and the coefficients beyond u = K are redundant).
After completing all the steps, we can take the Fourier descriptor of the processed boundary, and the problem we mentioned no longer exists.
5.2.3. Approach 3: Boundary compression
The process in approach 2 can be used for boundary compression. We reserve only P-1 coefficients and truncate the others. We then recover the whole coefficient set by stuffing zeroes and copying to the odd-symmetric part.
only reserve P1
coefficients
truncate
|a(u)|
|a(u)|
odd symmetry
u
u
0 P1
0 P1
K1
2K2
Figure 5.5 The reserve P1 coefficients
Figure 5.6 Recovering the whole coefficient set from Fig. 5.5 by zero-stuffing and odd symmetry.
5.2.4. Approach 4: Boundary encoding
In encoding the boundary segments, we have four data to record:
1. The coordinates of each corner: difference coding and Huffman encoding.
2. The segment number of each boundary: difference coding and Huffman encoding.
3. The point number of each segment: the difference from the corner distance, then Huffman encoding.
4. The coefficients of each segment: truncation and quantization, then zero-run-length and Huffman encoding.
The encoded corners and boundary segments are combined into a bit stream.
Figure 5.7 Boundary segment encoding.
For the third item, the point number of each segment is compared with the distance between the two end points of the boundary segment, where the distance used here is the sum of the two distances along the x-axis and the y-axis.
Figure 5.8 Point numbers of the boundary and distances between the two end points.
As we can see in Fig. 5.8, we have a vector n of the point numbers of each boundary segment. Similarly, dx and dy are vectors recording the distances along the x-axis and y-axis, respectively. Therefore, we can get the difference vector d, where

d = n - (dx + dy)    (5.10)

The values of the difference vector d are close to zero and are appropriate to encode with Huffman coding. In the decoder, as shown in Fig. 4.12, we can recover n as

n = d + (dx + dy)    (5.11)
For the fourth item, we combine the coefficients of each boundary segment of a whole boundary and encode them with zero-run-length and Huffman coding. When a boundary segment is a straight line, its Fourier descriptor coefficients are all zeroes; therefore zero-run-length coding is appropriate when many boundary segments are straight lines.
Because we have recorded the point number of each boundary segment, we can calculate the number of reserved coefficients and split the combined coefficient array correctly. We can then recover the original coefficients by stuffing zeroes into the truncated positions.
Figure 5.9 Result of improved boundary compression: (a) original boundary; (b) recovered boundary with R = 10% and coefficient number greater than 3.
However, if we use the modified Fourier descriptor method, which splits the boundary at the corner points into several boundary segments, the sharp corners can be preserved even when R is small. We can also see that when R = 10%, the result of the original Fourier descriptor method is obviously distorted, whereas in the modified method the character of the corners is preserved and the longer boundary segments are not obviously distorted.
Note that the shorter boundary segments are stretched out of their curves when too few coefficients are reserved for them. Therefore we force the number of reserved coefficients to be greater than three, since in our experiment three coefficients represent most of the characteristics. The improved result is shown in Fig. 5.9.
6. References
1. R. C. Gonzalez and R. E. Woods, Digital Image Processing, 3rd ed., Prentice Hall, 2010.
2. L. G. Roberts, "Machine Perception of Three-Dimensional Solids," in Optical and Electro-Optical Information Processing, J. T. Tippett (ed.), MIT Press, Cambridge, Mass., 1965.