Segmentation via Genetic Programming

a final project by Yonatan Shichel
Aug 2005
Contents

Introduction
    Segmentation
    The problem
    The Dataset
Genetic Programming
    Individuals
    Fitness
    Course of evolution
Segmentation via Genetic Programming
    Function set
    Terminal set
    Fitness measure
    Threshold
    Image test set
    Miscellaneous Evolutionary parameters
Results
    A typical Evolutionary run
    Best Individual
Summary & Discussion
    Conclusions
    Future Work
References
Introduction
In this project, Genetic Programming (GP) will be used to create segmentation maps of given images. The results will be tested and compared to existing segmentation methods.
Segmentation
Segmentation is a key problem in computational vision. Splitting an image into segments can be extremely hard, as some ‘real-world’ properties are inevitably lost when a scene is projected onto a 2D canvas.
To achieve good segmentation, every available image property should be used: colors, textures, patterns, edge and contour maps, etc. Still, the problem is conceptually ill-defined: a given image might be segmented differently by different people, and there is no way to determine which of the segmentations is more ‘correct’ than another.
The problem
The ‘traditional’ segmentation problem can be defined as the function

    seg(x, y) : R × R → {segment_1, segment_2, ..., segment_n}

which gives each image pixel a ‘tag’ telling to which segment the pixel belongs. In this project, however, I will use a slightly different function:

    seg(x, y) : R × R → {true, false}

which decides whether a given image pixel resides on a borderline between two segments or not.
To simplify the operation and reduce the required system resources, I have used only
grayscale images, in which each pixel is a real number in the range [0,1], describing
the intensity of the pixel (0 represents a completely black pixel, and 1 represents a
completely white pixel).
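For illustration, here is a minimal sketch of loading an image in this representation; it assumes Python with numpy and Pillow, neither of which was used by the project itself (which ran on a Java system):

    # Load an image as a [0,1] grayscale matrix (numpy + Pillow assumed;
    # illustrative only - the project used a Java-based GP system).
    import numpy as np
    from PIL import Image

    image = np.asarray(Image.open("input.jpg").convert("L"), dtype=np.float64) / 255.0
    # image[y, x] == 0.0 is a completely black pixel, 1.0 a completely white one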
The Dataset
In this project I have used the Berkeley Segmentation Dataset [1], a public library of 300 images that have been segmented by 30 human subjects. The purpose of this library is to provide a common basis for segmentation-related research, as well as a benchmark for segmentation algorithms.
The dataset contains a train set of 200 images and a test set of 100 images. Each image in the dataset has been segmented manually by several human subjects, in both color and grayscale versions.
Figure 1: Image #37073 and five different segmentation interpretations
Genetic Programming
Genetic Programming (GP) [2] is a bio-inspired Artificial Intelligence (AI) method and a relatively new approach within the field of Evolutionary Algorithms [3], which also includes Genetic Algorithms (GA).
GA (and GP) is inspired by Darwin’s four evolutionary principles [4]. In brief: a species lives in a given environment (the competition principle). Each individual has unique, congenital attributes that are passed on to its offspring (the variation principle). Due to limited environmental resources, more individuals are born than can live to reproduce (the overproduction principle), leading to a struggle for existence (the survival-of-the-fittest principle). According to these principles, the individuals whose attributes fit the environment will be able to reproduce; their offspring will inherit some of these attributes and hence will also fit the environment. In the long run, the species population will consist of individuals that are well fitted to the environment.
The computer model of GA/GP is similar: each individual is a candidate solution to the problem we wish to solve. The population is a list of individuals, and limited resources are simulated by assigning a fixed size to the population. We can assign a fitness measure to each individual, according to the quality of its solution to the problem. Individuals with high fitness measures will be chosen to reproduce.
In GA, each individual is a solution; in GP, each individual is a computer program (or function) that can operate on any input. This makes GP far more flexible than traditional GA, but the space of possible individuals is enormous.
Individuals
GP individuals are computer functions. The most common representation is a LISP-like program, which can be easily transformed into a tree structure. The tree contains functions (nodes) and terminals (leaves), which are the building blocks of the program.
For example, using the function set F = {+,-,*} and the terminal set T = {x,1} we can
describe the functions:
    x^2 = (* x x)
    2x - 1 = (- (* (+ 1 1) x) 1)
The tree representations of these functions are shown in Figure 2.
The tree structure enables the reproduction of two individuals by simply exchanging subtrees between the two parents. This operation, called crossover, results in two new functions that partially resemble the original ones; an example of crossover is shown in Figure 3.
Another common evolutionary operator is the mutation operator: a random node is chosen in the individual’s tree, the subtree rooted at that node is discarded, and a new subtree is created in its place. Again, this action results in an individual that resembles the original but has some new features. An example of the mutation operator is shown in Figure 4.
[Figure 2: Tree representations of the functions listed above]
[Figure 3: The original functions after crossover. The subtrees (+ 1 1) and x were exchanged, producing the functions 2x and x^2 - 1]
[Figure 4: 2x - 1 after mutation. The (+ 1 1) subtree was replaced by the new subtree (* (- 1 1) x); the new function is (- (* (* (- 1 1) x) x) 1), or simply -1]
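To make these operators concrete, here is a minimal sketch of subtree crossover and mutation, with trees represented as nested Python lists. This is illustrative only; the project itself relied on the tree operators built into its GP system.

    # Minimal, illustrative subtree crossover and mutation; trees are nested
    # lists like ['-', ['*', ['+', 1, 1], 'x'], 1]. Not the project's code.
    import copy, random

    def paths(tree, path=()):
        # enumerate the paths of all subtrees (the root has the empty path)
        yield path
        if isinstance(tree, list):
            for i, child in enumerate(tree[1:], start=1):
                yield from paths(child, path + (i,))

    def get(tree, path):
        for i in path:
            tree = tree[i]
        return tree

    def put(tree, path, subtree):
        if not path:
            return subtree                      # replacing the whole root
        get(tree, path[:-1])[path[-1]] = subtree
        return tree

    def crossover(a, b):
        # swap two randomly chosen subtrees between copies of the parents
        a, b = copy.deepcopy(a), copy.deepcopy(b)
        pa, pb = random.choice(list(paths(a))), random.choice(list(paths(b)))
        sa, sb = copy.deepcopy(get(a, pa)), copy.deepcopy(get(b, pb))
        return put(a, pa, sb), put(b, pb, sa)

    def mutate(tree, grow_random_subtree):
        # discard a randomly chosen subtree and grow a new one in its place
        tree = copy.deepcopy(tree)
        return put(tree, random.choice(list(paths(tree))), grow_random_subtree())

Applied to the trees for x^2 and 2x - 1 at the subtrees x and (+ 1 1), crossover can reproduce the Figure 3 result.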
Fitness
To determine the fitness measure of an individual, we run its function on some inputs and check the quality of the results. This action is called fitness assignment. As in real evolution, telling which individual is the fittest is not always an easy task. I’ll discuss the fitness measures used in the segmentation problem later.
Course of evolution
To apply GP to a given problem, we should first define the function and terminal sets, which will be used to build the evolved functions. Then we should define the fitness measure: a function that calculates the fitness, or quality, of each individual in the population.
The evolutionary process:
I. Generate the initial generation G0. This is usually done by building NPOP random trees using the function and terminal sets. There are several methods to grow the trees, as described by Koza [2].
II. Evaluate the fitness measure of each individual in the current generation, using the fitness function described above.
III. Create the next generation GN+1:
    i. Choose two individuals at random, but in accordance with their fitness measure. This can be achieved, for example, by performing a mini-tournament between a few randomly chosen individuals and taking the two with the highest fitness measure.
    ii. With probability PC, apply the crossover operator to the two individuals, resulting in two child individuals, and pass them to the next generation GN+1; otherwise, pass the two original individuals to GN+1.
    iii. With probability PM, apply the mutation operator to the selected individuals.
    iv. Repeat until GN+1 is full.
IV. Go back to step II.
The evolution is stopped when a fairly good individual is found, after a predefined number of generations, or when the individuals fail to show further improvement over a few generations.
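A schematic sketch of this loop, with the operators passed in as functions and a fixed generation count standing in for the stopping criteria above (Python, illustrative only; the project itself used a Java GP system):

    # Schematic generational GP loop following steps I-IV above; all operator
    # names are parameters, so nothing here is specific to the project's code.
    import random

    def evolve(pop_size, random_tree, fitness, crossover, mutate,
               generations=50, p_cross=0.9, p_mut=0.1, k=3):
        population = [random_tree() for _ in range(pop_size)]          # step I
        for _ in range(generations):
            scored = [(fitness(ind), ind) for ind in population]      # step II

            def tournament_pair():
                # mini-tournament: best two of k randomly chosen individuals (III.i)
                sample = sorted(random.sample(scored, k),
                                key=lambda s: s[0], reverse=True)
                return sample[0][1], sample[1][1]

            nxt = []
            while len(nxt) < pop_size:                                # step III
                a, b = tournament_pair()
                if random.random() < p_cross:                         # III.ii
                    a, b = crossover(a, b)
                if random.random() < p_mut:                           # III.iii
                    a = mutate(a)
                if random.random() < p_mut:
                    b = mutate(b)
                nxt.extend([a, b])
            population = nxt[:pop_size]                               # step IV
        return population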
Segmentation via Genetic Programming
I have applied GP to some images from the Berkeley Segmentation Dataset. Looking at several algorithms, I have noticed that most of them take the original image, convolve it with one or more kernels (possibly after some noise reduction) and use a threshold function to determine which of the pixels reside on a segmentation boundary.
I have decided to design the individuals as functions that take the entire image (i.e. a 2D matrix of real values) and return a new matrix representing the segmentation boundaries. Each pixel in the resulting matrix should be 0 or 1, for a non-boundary or a boundary pixel respectively.
For this purpose, I have supplied the function and terminal sets with matrix-oriented operators, as listed in the next subsection. These functions and terminals can operate on numbers (scalars) and matrices, much like MATLAB functions. To ensure that each individual represents a legal program, I have divided the tree nodes into three types: number, matrix and kernel. This approach is called Strongly-Typed Genetic Programming (STGP) [5], and is commonly used.
Even though a kernel is actually a small matrix, the two types differ in size, so they are kept distinct: the operator +, for example, cannot be applied to a matrix and a kernel, while the operator conv must receive a matrix and a kernel to perform correctly. Some operators accept more than one combination of input types; for example, + can be applied to two numbers, a number and a matrix, or two matrices.
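As a rough illustration of how this typing constrains tree construction, here is a Python sketch with a reduced primitive set (names and structure are my own, not the project’s implementation):

    # STGP sketch: every primitive declares a return type and argument types,
    # and a subtree of type t may only be rooted at a primitive returning t.
    import random

    FUNCTIONS = [                        # (name, return type, argument types)
        ("+",    "matrix", ("matrix", "matrix")),
        ("+",    "matrix", ("matrix", "number")),
        ("conv", "matrix", ("matrix", "kernel")),
        ("sqrt", "matrix", ("matrix",)),
        ("neg",  "kernel", ("kernel",)),
        ("%",    "number", ("number", "number")),
    ]
    TERMINALS = [("image", "matrix"), ("gradx", "kernel"),
                 ("grady", "kernel"), ("1", "number")]

    def grow(requested_type, depth):
        # grow a random tree whose root returns the requested type
        options = [f for f in FUNCTIONS if f[1] == requested_type]
        if depth == 0 or not options or random.random() < 0.3:
            return random.choice([t for t, ty in TERMINALS if ty == requested_type])
        name, _, args = random.choice(options)
        return [name] + [grow(t, depth - 1) for t in args]

    # grow("matrix", 4) yields a random, type-correct program tree.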
Function set
(format of each declaration: <return type> <function symbol> (<argument types>*))

matrix +    (matrix,matrix)
matrix +    (matrix,number)
kernel +    (kernel,kernel)
kernel +    (kernel,number)
number +    (number,number)
        Adds the two arguments.

-           (same signatures as +)
        Subtracts the two arguments.

*           (same signatures as +)
        Multiplies the two arguments.

number %    (number,number)
        A ‘safe divide’ operator: operates as a simple divide operator, but avoids dividing by zero.

matrix neg  (matrix)
kernel neg  (kernel)
        Returns the negative value of the given argument.

matrix conv (matrix,kernel)
        Returns the convolution of the given matrix with the given kernel.

matrix opp  (matrix)
kernel opp  (kernel)
        Returns 1 divided by the given argument, operating on each cell individually; special care was taken to avoid division by zero.

matrix sqrt (matrix)
kernel sqrt (kernel)
        Returns the square root of the given argument, operating on each cell individually.
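As a rough illustration, the two zero-safe operators (% and opp) could look like this (a Python/numpy sketch; the fallback value for division by zero is my assumption, since the report does not specify it):

    # Illustrative sketches of '%' and 'opp' (not the project's actual code);
    # the fallback value 1.0 for division by zero is an assumption.
    import numpy as np

    def safe_divide(a, b):
        # number % number: ordinary division, but avoids dividing by zero
        return a / b if b != 0 else 1.0

    def opp(m):
        # elementwise 1/x on a matrix or kernel, leaving zero cells safe
        m = np.asarray(m, dtype=float)
        out = np.ones_like(m)
        np.divide(1.0, m, out=out, where=(m != 0))
        return out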
Terminal set
Terminals provide basic information for the evolved individual. I have included some
constant and random numbers, several kernels (like Prewitt’s gradx and grady) and
random kernels, and - of course - the original image itself.
matrix image
        The original image.

number 0
number 1
number const
        0 and 1 are predefined constants. ‘const’ is an Ephemeral Random Constant (ERC), as described by Koza [2]: its value is randomly assigned when the individual is created and cannot be modified during the course of evolution (unless mutated!).

kernel gradx
kernel grady
kernel const
        Prewitt gradient kernels: [-1 0 1] and [-1 0 1]’. ‘const’ is a kernel version of the ERC: it is assigned random values when the individual is created, and cannot be modified during the course of evolution, unless mutated.
For example, we could write the gradient magnitude function using this program:
(sqrt (+ (* (conv image gradx) (conv image gradx)) (* (conv image grady)
(conv image grady))))
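For reference, this is the computation that program performs, translated into a Python/numpy sketch (illustrative only; the project did not use these libraries):

    # What the s-expression above computes: the gradient magnitude of the image.
    import numpy as np
    from scipy.signal import convolve2d

    gradx = np.array([[-1.0, 0.0, 1.0]])      # the 'gradx' terminal
    grady = gradx.T                            # the 'grady' terminal

    def gradient_magnitude(image):
        gx = convolve2d(image, gradx, mode="same")
        gy = convolve2d(image, grady, mode="same")
        return np.sqrt(gx * gx + gy * gy)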
Fitness measure
Finding the right fitness measure is extremely important for GP applications: it sets
the course of evolution to find the fittest individuals!
In the segmentation problem, the fitness function should receive an individual, run its program on an input image and compare the result to a given human-made segmentation map. As discussed earlier, segmentation is an ill-defined problem, so different people might draw different segmentation maps. To overcome this obstacle, I have used the union of the human-made segmentation maps, so a pixel is treated as a ‘true’ boundary pixel if one or more subjects included it in their maps.
I have tried several fitness measures, which are listed below. This task was not an easy one, and involved much trial and error. Some measures, however, were found to be significantly better than the others.
Berkeley’s F-measure
Given a segmentation map and a computer-generated map, this measure
calculates the precision and recall measures of the algorithm.
Precision is the probability that a machine-generated boundary pixel is a true boundary pixel, i.e. the fraction of reported pixels that appear in the human-made segmentation map. A low precision indicates that the algorithm is ‘noisy’ and tends to include many false pixels.
Recall is the probability that a true boundary pixel is detected, i.e. the fraction of ‘true’ pixels that were reported by the algorithm. A low recall indicates that the algorithm does not fully detect the human-drawn segmentation maps.
8
To describe the performance with one number, the F-measure is introduced; this is simply the harmonic mean of precision and recall: F = 2PR / (P + R).
I have tried this measure for a while, but the results were not encouraging. Individuals of the early generations don’t perform very well, yet get a rather high F-measure: an individual that reports all pixels as boundaries (or non-boundaries) will get an F-measure larger than 0.5. This contradicts the variation principle: many individuals get this (relatively high) measure in the first generations, reproduce, and eliminate other individuals that might have contributed important subprograms to the population in later generations.
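A minimal pixel-exact sketch of these measures, assuming boolean numpy boundary maps; note that Berkeley’s actual benchmark matches boundary pixels with a small spatial tolerance, which this sketch ignores:

    # Precision, recall and F-measure for boolean boundary maps (illustrative).
    import numpy as np

    def f_measure(result, segmap):
        true_pos = np.logical_and(result, segmap).sum()
        precision = true_pos / max(result.sum(), 1)  # reported pixels that are true
        recall = true_pos / max(segmap.sum(), 1)     # true pixels that were reported
        if precision + recall == 0:
            return 0.0
        return 2 * precision * recall / (precision + recall)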
Accuracy measure
This measure is the simplest, yet very effective. It is simply the number of pixels common to the generated result and the human-made segmentation map, divided by the number of pixels in the union of the two matrices:

    acc = |result ∩ segmap| / |result ∪ segmap|

This measure reflects the accuracy of the algorithm by rewarding the discovery of ‘true’ segmentation boundaries and penalizing both ‘false’ pixels reported by the algorithm and ‘true’ pixels that were not reported (i.e. both false positives and false negatives).
Since many individuals of the first generations fail to find even one ‘true’ pixel, they receive a fitness measure of zero; hence the fitness values of the first generations show little variance. To overcome this phenomenon, I have slightly modified the accuracy measure:

    acc = (1 + |result ∩ segmap|) / (1 + |result ∪ segmap|)

Now, two individuals that haven’t found any boundary pixels may still have different fitness values: the one that reported fewer false pixels will score better.
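In code, the modified measure amounts to the following (a Python/numpy sketch with boolean maps assumed; illustrative only):

    # The modified accuracy (a smoothed Jaccard index) on boolean boundary maps.
    import numpy as np

    def accuracy(result, segmap):
        intersection = np.logical_and(result, segmap).sum()
        union = np.logical_or(result, segmap).sum()
        return (1 + intersection) / (1 + union)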
Threshold
The evolved programs produce a ‘soft’ segmentation map, in which each matrix cell contains a real number instead of a Boolean value, so a thresholding function was needed. My assumption is that the best threshold value will be chosen by the user (after the evolved function has been extracted and deployed), but threshold values were still needed in the fitness evaluation phase. Since the evolved functions should be able to operate on any given image, I have avoided encoding the threshold value in the genome. Instead, I have tried two automatic threshold methods.
The first method is the one used by Berkeley: their benchmark simply divides the threshold range linearly into 30 threshold values, calculating the fitness value (in Berkeley’s case, the F-measure) for each of them. I found this method a bit crude, since on many occasions the threshold range was not linear, causing the entire image to ‘slip’ between two threshold values.
The second (and preferred) method divides the threshold range into uneven parts, so that each threshold step ‘reveals’ an equal additional number of segmentation points, calculated proportionally to the number of boundary points in the human-made segmentation map. For example, assume that the human-made map includes N boundary points; the function will produce threshold values that report 0.5N, 0.6N, ..., 1.9N, 2.0N points as boundary points. The lower and upper bounds can be modified, as well as the step; usually I have used 0.5N - 3.0N with 10 equal steps. Like Berkeley’s method, this method calculates the fitness for each of the threshold values and picks the best. It is somewhat slower, but much more accurate.
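A sketch of this quantile-style scheme, under the assumption that thresholds are taken from the sorted pixel values (Python/numpy; the names are illustrative, as the report leaves the exact implementation unspecified):

    # Pick thresholds so that the k-th threshold reports roughly a fixed
    # multiple of the N human-marked boundary points. Illustrative only.
    import numpy as np

    def candidate_thresholds(soft_map, n_human_points, lo=0.5, hi=3.0, steps=10):
        values = np.sort(soft_map.ravel())[::-1]                  # descending
        counts = (np.linspace(lo, hi, steps) * n_human_points).astype(int)
        counts = np.clip(counts, 1, values.size)
        return values[counts - 1]   # thresholding at values[k-1] reports ~k pixels

    # The fitness routine then evaluates (soft_map >= t) for each candidate t
    # and keeps the best-scoring binarization.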
Image test set
The evolutionary process is extremely time and resource consuming. Each of the individual programs must be applied to the original image and the results evaluated; this might take up to several minutes per generation, depending on the evolutionary parameters (see the Runtime subsection below).
To speed up the process, I have decided to drastically reduce the number of images
used for training. Only 3-5 images were chosen from the 200-image train set.
The resulting algorithms (the best individuals that evolved during the evolutionary process) were tested on ‘unseen’ images from Berkeley’s test set.
Miscellaneous Evolutionary parameters
Population size
I have used populations of 150 individuals.
Generation count
I did not set a limit on the generation count; I stopped the process when the average fitness had reached a peak value and did not seem able to improve any further.
Reproduction and Mutation
As described in the GP section; I have used a crossover probability of 90% and a mutation probability of 10%.
Selection
I have used a tournament selection of k=3. To choose a single individual, the system
chooses k individuals randomly, and takes the one with the highest fitness value.
Tree Depth
Tree operations might result in extremely deep trees. To avoid this ‘bloating’ [6]
phenomenon, I have limited the depth of the trees to a value in the range 6-9.
Runtime
On a standard PC, evaluation of a single individual on a single image might take up to one second. Multiply this by the population size and the dataset size to get the time needed for one generation; for example, 150 individuals × 4 images × 1 second is roughly 10 minutes per generation. A successful GP run normally takes at least several dozen generations. Given the evolutionary parameters listed above, every run took at least two hours, with the exception of several 10-hour runs.
Software
To run the GP evolutionary sessions, I have used Sean Luke’s ECJ 13, a Java-based evolutionary computation and genetic programming research system [7]. This system contains most of the infrastructure needed to run GP sessions, and enabled me to focus on research rather than on programming (though much programming and debugging were still needed!). It also provides documentation, backup and logging services, which are extremely helpful when dealing with such large populations and long running times.
Results
After running several evolutionary processes, I have selected the best individuals, simply by picking the individuals with the best fitness values from each run. This section discusses and examines these individuals.
A typical Evolutionary run
Figure 5 shows the progress of a short evolutionary run. These results are typical of many evolutionary runs; we will discuss them now.
Figure 5: Accuracy fitness vs. Generation count; fitness values are shown for the best individual
as well as the average population fitness.
The first generation (G0) was created at random, so the individual accuracies are low: 0.176 for the best individual and 0.08 for the population average.
The following generations show improvement in both the best and average values. While the average values steadily increase, the best value sometimes ‘leaps’, as in G3 and G6. This is due to the emergence of a new individual, created either by a crossover of two individuals or by a mutation of an existing individual, which is much better than its ancestors.
It is noticeable that fitness values may drop during the course of evolution; for example, the best individual of G2 is slightly worse than the best individual of G1. This is due to the nature of the genetic operators: it is not always guaranteed that the offspring will be better than its ancestors.
By the end of this run, the accuracy values seem to converge. At this stage, the population variance is usually low, so only mutation can produce genuinely new individuals.
Best Individual
The best individual, with an accuracy value of 0.262, was taken from G93. We’ll examine its performance in the following section.
Segmentation Function
The segmentation function was extracted from the best individual:
(- (- (conv (* (conv image gradx) (conv image gradx))
            (- (- (kernel 5.381114 -8.362269 8.888325 1.1866289 -6.4069843 -8.251046 -9.389916 6.183886 -7.817788) grady)
               (- (kernel -2.334486 -4.6182337 -9.115009 8.010966 3.0507333 3.22619 2.068446 -2.932576 -6.243905) 0.0)))
      (conv (* (conv image gradx) (conv image gradx))
            (- (- (kernel 2.4412537 -8.362269 8.888325 1.1866289 -6.4069843 -8.251046 -9.389916 6.183886 -7.817788) grady)
               (- (kernel 9.936699 -4.6182337 -9.115009 8.010966 3.0507333 3.22619 2.068446 -2.932576 -6.243905) 0.0))))
   (- (- (- (conv image grady) (* 1.0 9.336973))
         (% (* 1.0 9.336973) 9.336973))
      (% (% (* -3.9138038 0.0) (* 0.0 0.0))
         (% (* 1.0 9.336973) (* 1.0 9.336973)))))
As in most cases of GP, one may find it difficult to analyze the code of the function. But looking deeper into it, we can see a few interesting features:

• The code uses both the ‘gradx’ and ‘grady’ predefined kernels. In fact, a gradient magnitude term for the x-axis has emerged during the evolution: (* (conv image gradx) (conv image gradx)).

• As in human DNA, there is a lot of ‘junk genome’ in this function. For example, (% (* 1.0 9.336973) (* 1.0 9.336973)) is simply 1.0. This is a common phenomenon in GP, but it gives the genetic operators material to work with: if one of the redundant subtrees mutates, the entire individual may function differently.
Here are some examples of the output of the evolved function, shown in the original as images. The accuracy values are listed below, along with the accuracies of the ‘standard’ gradient magnitude (GM) algorithm.
[Four example output images; their captions:]
Accuracy = 0.307 (GM accuracy = 0.280)
Accuracy = 0.262 (GM accuracy = 0.245)
Accuracy = 0.126 (GM accuracy = 0.119)
Accuracy = 0.172 (GM accuracy = 0.193)
Summary & Discussion
Conclusions
In this work I have tried to use Genetic Programming to evolve segmentation functions. The results show that this is possible, and that some good results can be achieved: in most cases, the best evolved function ‘beats’ the standard gradient-magnitude algorithm. Still, gradient magnitude is not the best segmentation algorithm available, so we cannot rely on it as a benchmark.
Usually, when applying GP to a problem, it is possible to drastically improve the results using greater computational power. This problem is no different, and I believe that using more resources could have produced better results.
Fitness evaluation consumes a lot of CPU time, so faster computers (or multi-CPU architectures) would allow enlarging the population size. An individual’s tree tends to use memory that grows exponentially with tree depth, so more RAM would allow deeper (hence more complex) function trees. Koza [2], for example, uses populations of ~10K individuals on his 1000-CPU machine for GP evolutionary sessions.
Future Work

• Use more CPU and RAM resources, ideally enough to include the entire train set in the fitness calculation.

• Add some useful building blocks to the terminal set: more image-specific kernels (such as a Gaussian smoothing kernel for noise reduction).

• Use ADFs (Automatically Defined Functions) [8]: subroutines that evolve separately and may be used and reused by the evolved individuals.

• Evolve the threshold function instead of using the same one for all images. The threshold function could be integrated into the genome, or evolved by co-evolution [3], in which species are evolved separately and evaluated together.

• Create an evolving ‘kernel library’, which would be evolved separately and could be used by all evolved individuals.

• Include more inputs, like texture maps or other edge detector outputs.

• Use color images, which contain much more information.
References
[1] The Berkeley Segmentation Dataset and Benchmark. http://www.cs.berkeley.edu/projects/vision/grouping/segbench/
[2] Koza, J. R.: Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, Mass. (1992)
[3] Tomassini, M.: Evolutionary Algorithms. Swiss Scientific Computing Center, Manno.
[4] Darwin, C.: On the Origin of Species by Means of Natural Selection. John Murray, London (1859)
[5] Montana, D. J.: Strongly Typed Genetic Programming. Evolutionary Computation 3 (1995) 199–230
[6] Langdon, W. B.: Size Fair and Homologous Tree Genetic Programming Crossovers. Genetic Programming and Evolvable Machines 1 (2000) 95–119
[7] Luke, S.: ECJ 13 - a Java-based Evolutionary Computation and Genetic Programming research system. http://cs.gmu.edu/~eclab/projects/ecj/
[8] Koza, J. R.: Genetic Programming II: Automatic Discovery of Reusable Programs. MIT Press, Cambridge, Mass. (1994)