Information Retrieval
Image Retrieval: Project paper
The University of Joensuu
Department of Computer Science
20.11.2003
Ioan Cleju 157743 icleju@cs.joensuu.fi
Tuomas Hakala 149162 thakala@cs.joensuu.fi
Suvi Korpela 153927 skorpela@cs.joensuu.fi
Andrei Mihaila 157752 amihaila@cs.joensuu.fi
Risto Sarapik 142821 rsara@cs.joensuu.fi
1. Introduction
2. IR-Model
3. Technical description
3.1. K-means Clustering Algorithm
3.2. Self Organizing Maps
3.3. GUI
4. Evaluation
4.1. Test plan
4.2. The results of the tests
4.3. Conclusions
5. Description of other systems in the field
5.1. Comparison to other systems
6. Future development
7. References
1. Introduction
Among the increasing stream of published data is a growing amount of visual information. Digital image and video libraries are becoming more common and widely used. There is a huge number of visual data sources, among them various digital image collections and databases for all kinds of purposes. The fast development of computing hardware has made visual information an integral part of normal everyday computing.
In order to utilize the visual information stored in databases we need an effective image retrieval method. It can be based on textual keywords attached to the stored images, or on measurable visual properties of the images such as colors, shapes, textures and spatial relations.
Traditional text-based image retrieval systems have some obvious limitations. Attaching keywords to every image in the database has to be done manually, which is quite laborious. The other problem is the content of the images: how to find correct, all-embracing and unambiguous keywords.
Content-based image retrieval is another way to handle this problem. It is based on features extracted automatically from the content of the image. The problems of this method come from the different ways in which a human being and a computer read images.
The method we have studied is the latter one,
content-based image retrieval.
By analyzing the key features of the query image (the pattern image) we get a group of values that are typical of this image and can be compared with the corresponding features of the images in the database. The query result is a set of images similar to the query image. How similar the retrieved images look to the original pattern depends on the methods used for analyzing the pattern image and on the features that are compared.
2. IR-Model
An information retrieval model presents the idea of how the system retrieves information for its user [1]. In other words, the retrieval algorithm that the system uses is built upon the model used. Generally, different information retrieval models can be used depending on the system's purpose and the user's needs.
Our system should be able to group and rank images. The system extracts key vectors from images and uses them to do the above-mentioned tasks. Corresponding to these two tasks, we used two different models.
In the phase of ordering the vectors and measuring the distances between them we used the generalized vector space model [3]. In 1985, Wong, Ziarko and Wong proposed an interpretation in which the index term vectors are assumed to be linearly independent but not pairwise orthogonal. This interpretation leads to the generalized vector space model. In our system the generalized vector space model appears in the structure of the vector. The vector is composed of a set of elements; each element holds the amount of pixels in an image that have a specified color, together with that color represented in RGB, so each element of the vector can itself be considered a multidimensional structure.
In the phase of grouping similar vectors we used the neural network model. Since neural networks are known to be good pattern matchers, it is natural to consider them as an alternative model for information retrieval. It is now well established that our brain is composed of billions of neurons working in parallel towards the same goal. Each neuron can be viewed as a basic processing unit which, when stimulated by input signals, may emit output signals as a reaction. It has also been established that similar inputs cause neurons in the same area to react. This is the basic observation that led to the development of self-organizing maps as an instrument of neural networks [6].
3. Technical description
The system has to implement several tasks in order to become a standalone application from which images can be retrieved.
The first task is feature extraction. Given an image, the system has to extract some features that will be used for comparison with other images or for indexing. This version of the product focuses only on the color information of the image. We used color quantization as a technique to reduce the number of colors. The final number of colors depends on the type of the images in the database, the desired precision of the process and the run time. For a retrieval process, both the query image and all the other images have to be reduced to the same number of colors. During a query, feature extraction is performed online only for the query image. After the number of colors is reduced, the extracted vector contains information about the proportion of pixels having each color and about the actual colors.
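To make the vector described above concrete, here is a minimal sketch in Python. It assumes that color quantization has already produced a palette (for example with the K-means algorithm of Section 3.1) and maps every pixel to its nearest palette color by squared RGB distance, recording each palette color with the fraction of pixels assigned to it. The function names and the toy data are hypothetical, not taken from our implementation.

```python
from collections import Counter
from typing import List, Tuple

RGB = Tuple[int, int, int]


def nearest(color: RGB, palette: List[RGB]) -> int:
    """Index of the palette color closest to `color` (squared RGB distance)."""
    return min(range(len(palette)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(color, palette[i])))


def build_feature_vector(pixels: List[RGB], palette: List[RGB]) -> List[Tuple[RGB, float]]:
    """Return one (palette color, fraction of pixels) pair per palette color."""
    counts = Counter(nearest(p, palette) for p in pixels)
    return [(palette[i], counts[i] / len(pixels)) for i in range(len(palette))]


# Hypothetical 4-pixel image and a 2-color palette (red, white).
pixels = [(250, 10, 10), (240, 0, 5), (200, 30, 30), (255, 255, 250)]
palette = [(255, 0, 0), (255, 255, 255)]
print(build_feature_vector(pixels, palette))
# [((255, 0, 0), 0.75), ((255, 255, 255), 0.25)]
```

The fractions always sum to 1, so images of different sizes produce comparable vectors.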
Another task is to construct the database. All images are passed through the feature extractor, and the database stores the path to each image and its feature vector.
After the database is constructed, it has to be re-indexed. For indexing we use a self-organizing map. This structure organizes our data in a two-dimensional space, so that similar images end up in the same region. To achieve this, the SOM needs to be trained first: it takes all the vectors obtained for the images and adjusts its parameters.
To be trained, a SOM needs to know how to compute the distance between vectors. Usually the Euclidean distance is proposed. However, because we know the actual meaning of each element in the vector, we can define our own distance, and that is what we did. The distance between two vectors is computed as follows [10]. Using a greedy algorithm, each color from one image is associated with one color from the other image, aiming to minimize the sum of the Euclidean distances between associated colors. After the association, the actual distance is computed as the sum of the distances between corresponding colors, where the distance between two colors is the sum of three terms. The first term is the absolute difference between the percentages of the two colors in their images. The second is the Euclidean distance between the colors, normalized to the range between 0 and 1. The last term is the product of the other two terms.
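The distance above can be sketched as follows. This is an illustrative Python version under our own assumptions: 8-bit RGB colors, features given as (color, fraction) pairs with the same number of colors in both images, normalization by the largest possible RGB distance, and a greedy matching that repeatedly pairs the two closest unmatched colors. Such a greedy pairing only approximates the minimum total color distance; it is not guaranteed to find it. The names `image_distance` and `color_term` are hypothetical.

```python
import math
from typing import List, Tuple

RGB = Tuple[float, float, float]
Feature = List[Tuple[RGB, float]]          # (color, fraction of pixels with that color)

MAX_RGB_DIST = math.sqrt(3 * 255 ** 2)     # assumed normalizer: largest 8-bit RGB distance


def color_term(c1: RGB, p1: float, c2: RGB, p2: float) -> float:
    """Distance between two matched colors: the sum of the three terms in the text."""
    dp = abs(p1 - p2)                      # 1st term: difference of percentages
    dc = math.dist(c1, c2) / MAX_RGB_DIST  # 2nd term: normalized Euclidean distance
    return dp + dc + dp * dc               # 3rd term: product of the first two


def image_distance(f1: Feature, f2: Feature) -> float:
    """Greedily pair colors of f1 and f2 (closest pair first) and sum the color terms."""
    pairs = sorted((math.dist(c1, c2), i, j)
                   for i, (c1, _) in enumerate(f1)
                   for j, (c2, _) in enumerate(f2))
    used1, used2 = set(), set()
    total = 0.0
    for _, i, j in pairs:
        if i in used1 or j in used2:
            continue                       # one of the two colors is already matched
        used1.add(i)
        used2.add(j)
        (c1, p1), (c2, p2) = f1[i], f2[j]
        total += color_term(c1, p1, c2, p2)
    return total


a = [((255, 0, 0), 0.75), ((255, 255, 255), 0.25)]
b = [((255, 255, 255), 0.5), ((250, 0, 0), 0.5)]
print(image_distance(a, a))   # 0.0 for identical features
print(image_distance(a, b))
```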
With the SOM constructed, we can begin the actual indexing. All image vectors are shown to the map, which returns two coordinates representing the location of each vector in the SOM's space. These coordinates are registered in the database.
All the tasks described so far are done offline; this is the offline approach [2].
The online task, the actual retrieval, works as follows. The user selects a query image. The image is processed and its features are extracted. The features are shown to the SOM, which returns their location. A collection of images located in the neighborhood of that location (larger or smaller, depending on user preferences) is retrieved from the database. All these images are then compared to the query image, sorted, and shown in the GUI.
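A sketch of the online step, assuming each database entry stores its path, the SOM cell computed during indexing and its feature vector. The neighborhood is taken here as a square of `radius` grid cells around the query's cell; this is our assumption, since the text only says the neighborhood is larger or smaller depending on user preferences. `distance` can be any vector distance, such as the color distance of this section. `Entry` and `retrieve` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Entry:
    path: str
    cell: Tuple[int, int]          # SOM coordinates stored during offline indexing
    vector: Sequence[float]        # feature vector of the image


def retrieve(query_vec: Sequence[float], query_cell: Tuple[int, int],
             db: List[Entry], radius: int,
             distance: Callable[[Sequence[float], Sequence[float]], float]
             ) -> List[Tuple[float, str]]:
    """Return (distance, path) for images within `radius` cells of the query cell, closest first."""
    qx, qy = query_cell
    # Square neighborhood on the SOM grid (assumption); a larger radius returns more images.
    candidates = [e for e in db
                  if max(abs(e.cell[0] - qx), abs(e.cell[1] - qy)) <= radius]
    return sorted((distance(query_vec, e.vector), e.path) for e in candidates)


db = [Entry("a.gif", (0, 0), [0.0, 0.0]),
      Entry("b.gif", (1, 1), [1.0, 0.0]),
      Entry("c.gif", (5, 5), [0.1, 0.0])]
print(retrieve([0.0, 0.0], (0, 0), db, radius=1, distance=math.dist))
# [(0.0, 'a.gif'), (1.0, 'b.gif')]
```

Note that c.gif is skipped even though its vector is close to the query: only images in the SOM neighborhood are compared, which is what keeps the online step fast.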
3.1.
K-means Clustering Algorithm
There are many approaches to the problem of clustering: algorithms based on splitting or merging, neural networks, and randomized approaches. Among the approaches that try to minimize an objective function, the most widely used is the Generalized Lloyd Algorithm, also known as the K-means Clustering Algorithm.
Given a d-dimensional space, a set of n data points in this space, and an integer k, the algorithm has to find a set of k centers that minimizes the total squared distance from each data point to its nearest center.
Our approach is a refined version of the K-means Clustering Algorithm [4], which has the advantage of being computationally very efficient. It uses an iterative method to compute the set of centers.
We use the following convention: the neighborhood of a center z is the set of all points in the data set for which z is the nearest center. Note that, for a given set of points, the placement of a single center that minimizes the total squared distance from the points to the center is the centroid of the set.
Each step of the algorithm recomputes the neighborhoods of the centers and then moves each center to the centroid of its neighborhood. The algorithm eventually converges to a local minimum. How close this local minimum is to the global minimum depends on the choice of the initial centers.
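The basic iteration, without the tree-based speed-up described below, can be sketched in Python as plain Lloyd's algorithm. Initial centers are drawn at random from the data here, which is only one of the possible choices; the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest_center(p: Sequence[float], centers: List[Point]) -> int:
    return min(range(len(centers)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))


def lloyd(points: List[Point], k: int, max_iter: int = 100, seed: int = 0) -> List[Point]:
    """Plain K-means (Lloyd) iteration: assign points to centers, move centers to centroids."""
    centers = random.Random(seed).sample(points, k)      # random initial centers
    for _ in range(max_iter):
        # Neighborhood of a center: the points for which it is the nearest center.
        hoods: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            hoods[nearest_center(p, centers)].append(p)
        # Move every center to the centroid of its neighborhood (keep it if empty).
        new = [tuple(sum(xs) / len(h) for xs in zip(*h)) if h else c
               for c, h in zip(centers, hoods)]
        if new == centers:                               # centers are stable: stop
            break
        centers = new
    return centers


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(sorted(lloyd(pts, k=2)))
```

Every point is compared with every center in each iteration; this neighborhood computation is exactly the cost the tree structure below is meant to reduce.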
The implementation of the algorithm uses a tree structure as its key element. The branching factor of the tree depends on the number of dimensions of the space: with d dimensions, each internal node has 2^d children. The root of the tree corresponds to the entire space, which is a hypercube. The space is divided by halving each dimension, and each of the resulting hypercubes corresponds to a child node of the root. Each region is split again, and the tree becomes correspondingly deeper. A hypercube stops splitting when the number of data points in it becomes less than a certain limit (the granularity).
The algorithm is implemented for the three-dimensional case (RGB colors), so we need to construct an octree. As in the general case, the root of the tree corresponds to the entire space of data points, which is a cube. The cube is divided into 8 cubes (each of the cube's dimensions is halved), and these cubes correspond to the children of the root. The process is repeated recursively until the number of points in each cube corresponding to a leaf does not exceed the granularity. The histogram is split only once, at the beginning of the algorithm.
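A minimal sketch of the octree construction, assuming points are RGB triples, cubes are given by their lower and upper corners, and a point lying exactly on a midpoint goes to the upper half. The `max_depth` guard is our own addition: without it, more identical points than the granularity would make the recursion split forever. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float, float]


@dataclass
class Node:
    lo: Point                                   # lower corner of the node's cube
    hi: Point                                   # upper corner of the node's cube
    points: List[Point]                         # data points inside the cube
    children: List["Node"] = field(default_factory=list)


def build_octree(points: List[Point], lo: Point, hi: Point, granularity: int,
                 max_depth: int = 16, depth: int = 0) -> Node:
    """Recursively halve every dimension until a cube holds at most `granularity` points."""
    node = Node(lo, hi, points)
    if len(points) <= granularity or depth == max_depth:
        return node                             # leaf
    mid = tuple((l + h) / 2 for l, h in zip(lo, hi))
    for octant in range(8):
        # Bit d of `octant` selects the upper (True) or lower (False) half along dimension d.
        upper = [((octant >> d) & 1) == 1 for d in range(3)]
        clo = tuple(mid[d] if upper[d] else lo[d] for d in range(3))
        chi = tuple(hi[d] if upper[d] else mid[d] for d in range(3))
        inside = [p for p in points
                  if all((p[d] >= mid[d]) == upper[d] for d in range(3))]
        node.children.append(
            build_octree(inside, clo, chi, granularity, max_depth, depth + 1))
    return node


root = build_octree([(0, 0, 0), (1, 1, 1), (200, 200, 200)],
                    (0, 0, 0), (256, 256, 256), granularity=1)
print(len(root.children))   # 8
```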
After the tree is constructed, the actual algorithm can start. First, k locations for the centers are chosen (randomly or depending on the particularities of the data set). Then the first iteration may begin. Each iteration of the algorithm receives a set of centers, computes the neighborhood of each of them and then moves the centers to the centroids of their neighborhoods. Iterations are repeated until the locations of the centers stabilize.
In usual K-means implementations, most of the computation is spent constructing the neighborhood of each center. This is where the biggest improvement of this algorithm lies. With the help of the already constructed tree, the relation between each data point and a center is established more easily. The idea is to propagate the centers through the tree, with each node retaining only the centers that are candidates for the points inside its corresponding cube. This algorithm is known as the filtering algorithm. In the end, the leaves will have only a small number of centers, or even just one. If a leaf receives more than one center, it eliminates all but one. All the data points of the leaf then belong to the neighborhood of the only center left unfiltered in it. Because the union of all leaves covers all data points (in fact, the whole space), each data point has a corresponding center.
Then, for each center, the centroid of its neighborhood is computed and the center moves to that centroid. Once the new center locations have been generated, the next iteration can start.
The key point of this approach is the propagation of the centers, the filtering algorithm. There are two different cases, depending on whether the node is a leaf or not.
If the node is internal, the algorithm should select from the centers only those that might be nearest to some point contained in the node's corresponding cube. A cube (especially one whose node is close to the root) may contain a large number of data points, so the points are not examined one by one. Instead, the filtering algorithm tries to eliminate the centers that are farther from every part of the cube than some other center. Filtering works as follows. First, take the center that is nearest to the geometric center of the cube; call it z. Then take each of the other centers, z', and check whether any part of the cube is closer to z' than to z. This can be done by considering the plane bisecting the segment z-z'. If this plane intersects the cube, some points of the cube are closer to z' than to z and others are closer to z than to z'. If no part of the cube is closer to z' than to z, z' is eliminated from the candidate list. The new list of centers is passed to the children, the filter is applied to the children, and so on.
If the node is a leaf, the weighted distances from each center to all of its points are computed and only the center with the minimum distance is kept. So after a leaf filters the centers it receives, only one center remains.
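The internal-node test can be written without constructing the bisecting plane explicitly. Because the difference between the squared distances to z and to z' is a linear function of the position, it is enough to check the one cube vertex lying furthest in the direction z' - z; this vertex test is the one used by Kanungo et al. [4]. The Python sketch below is illustrative only, with hypothetical names, and covers internal nodes; at a leaf, the remaining candidates are compared directly against its points.

```python
from typing import List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def is_farther(z, z2, lo, hi) -> bool:
    """True if no point of the box [lo, hi] is closer to z2 than to z (z2 can be pruned)."""
    # The box vertex furthest in direction z2 - z is the point most favourable to z2.
    v = [h if b > a else l for a, b, l, h in zip(z, z2, lo, hi)]
    return sq_dist(z, v) <= sq_dist(z2, v)


def filter_centers(centers: List[Sequence[float]], lo, hi) -> List[Sequence[float]]:
    """Keep only the candidate centers that may be nearest to some point of the cube."""
    mid = [(l + h) / 2 for l, h in zip(lo, hi)]
    z = min(centers, key=lambda c: sq_dist(c, mid))   # center nearest the cube's midpoint
    return [c for c in centers if c is z or not is_farther(z, c, lo, hi)]


cube_lo, cube_hi = (0, 0, 0), (1, 1, 1)
centers = [(0.5, 0.5, 0.5), (10, 10, 10), (1.2, 0.5, 0.5)]
print(filter_centers(centers, cube_lo, cube_hi))
# [(0.5, 0.5, 0.5), (1.2, 0.5, 0.5)]
```

The far center (10, 10, 10) is pruned, while (1.2, 0.5, 0.5) survives because the plane bisecting it and (0.5, 0.5, 0.5) cuts through the cube at x = 0.85.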
The iterative algorithm continues until the centers stay at the same locations for two consecutive iterations.
As noted before, the way the initial centers are generated is very important both for the final result of the algorithm and for its run time.
Because of the tree structure constructed for the algorithm, we propose a way to generate the initial centers. For this, one additional piece of information is needed: each node in the tree keeps a record of the number of data points it contains.
The tree receives the number of centers it has to generate and passes this number to the root. Each of the root's children receives a number of centers to generate proportional to the number of points in its space. The number of centers to be generated by an internal node equals the total number of centers generated by its children. This process is recursive. Finally, the leaves generate the centers in their corresponding spaces. The centers can be generated randomly or in a fixed way (some applications must not contain any stochastic part). In this way we ensure that the centers are generated in regions with a high density of data points.
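The proportional allocation of initial centers can be sketched as follows. The text does not say how fractional shares are rounded, so this sketch assumes largest-remainder rounding, which guarantees that the children's shares always add up to the parent's number. Each leaf's `allotted` value is the number of centers it would then place in its cube. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TreeNode:
    count: int                                   # number of data points in the node's cube
    children: List["TreeNode"] = field(default_factory=list)
    allotted: int = 0                            # centers this node has to generate


def split_proportionally(k: int, weights: List[int]) -> List[int]:
    """Split k into integers proportional to weights (largest-remainder rounding)."""
    total = sum(weights)
    if total == 0:
        return [0] * len(weights)
    exact = [k * w / total for w in weights]
    parts = [int(x) for x in exact]
    # Give the centers lost by rounding down to the largest fractional parts.
    order = sorted(range(len(weights)), key=lambda i: exact[i] - parts[i], reverse=True)
    for i in order[: k - sum(parts)]:
        parts[i] += 1
    return parts


def allot_centers(node: TreeNode, k: int) -> None:
    """Recursively pass the number of centers to generate down to the leaves."""
    node.allotted = k
    if node.children:
        shares = split_proportionally(k, [c.count for c in node.children])
        for child, share in zip(node.children, shares):
            allot_centers(child, share)


root = TreeNode(10, [TreeNode(5, [TreeNode(4), TreeNode(1)]), TreeNode(3), TreeNode(2)])
allot_centers(root, 4)
print([c.allotted for c in root.children])   # [2, 1, 1]
```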
3.2.
Self Organizing Maps
The Self-Organizing Map (SOM) is a technique introduced by Professor Teuvo Kohonen which reduces data vectors to a small number of dimensions [5]. Not only does the SOM reduce the dimensions of the vectors, it also groups similar data together. It converts complex relations between vectors into simple geometrical relations (usually in two dimensions). The most usual SOMs are two-dimensional, and the one we are using is of this type, which is why we mostly discuss them here.
The SOM consists of a two-dimensional grid of nodes. It has an input (a vector) and an output (a vector with two dimensions). The input is connected to all the nodes of the network. Each node also has an associated model vector, which must have the same size as the input.
When an input vector is shown to the network, the output is the coordinates of the node whose model is most similar to the input vector.
Training is also an iterative algorithm. The same data set of input vectors is shown repeatedly to the network. The more iterations, the better the solution found, but at a higher cost in time.
As the iteration number increases, both the neighborhood and the rate at which a model changes decrease. We can consider a neighborhood function that gives the rate by which a node is modified. Usually the neighborhood function follows a Gaussian distribution whose parameters depend on the coordinates of the winner and on the iteration number. So the rate decreases as the distance to the winner grows and as the iteration number increases.
For the Gaussian function, the standard deviation σ is the key parameter. It should decrease over time [6]. A generally used formula for σ is
$\sigma(n) = \sigma(0)\exp(-n/T_1)$
The other parameter, the amplitude (step size) η, also decreases over time; a usual formula is
$\eta(n) = \eta(0)\exp(-n/T_2)$
Good initial parameters are:
$\sigma(0)$ = radius of the lattice
$\eta(0) = 0.1$
When considering SOMs, there are two phases. The first phase is the training of the system. During this phase, all the input vectors are shown to the network. The inner models of the nodes change so that, in the end, closer nodes correspond to closer models.
The training is based on competition. For each data vector one node wins: the node that contains the most similar model. As a reward, the winner moves itself and its neighbors a certain fraction of the way towards the input vector, so that they become more similar to it.
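One training step, covering the competition and the reward described above, might look like the following Python sketch. It assumes a Euclidean distance for finding the winner (our system uses its own color distance instead), a Gaussian neighborhood on the grid with width `sigma`, and a step size `eta`; the function name is hypothetical. The returned winner coordinates are also what the trained map outputs when it is interrogated.

```python
import math
from typing import List, Sequence, Tuple


def train_step(weights: List[List[List[float]]], x: Sequence[float],
               sigma: float, eta: float) -> Tuple[int, int]:
    """One SOM update for input x; weights[i][j] is the model vector of node (i, j)."""
    rows, cols = len(weights), len(weights[0])

    def sq_dist(model: List[float]) -> float:
        return sum((m - v) ** 2 for m, v in zip(model, x))

    # Competition: the winner is the node whose model is most similar to x.
    wi, wj = min(((i, j) for i in range(rows) for j in range(cols)),
                 key=lambda ij: sq_dist(weights[ij[0]][ij[1]]))
    # Reward: the winner and its grid neighbours move toward x, weighted by a Gaussian.
    for i in range(rows):
        for j in range(cols):
            grid_d2 = (i - wi) ** 2 + (j - wj) ** 2
            h = math.exp(-grid_d2 / (2 * sigma ** 2))
            model = weights[i][j]
            for k in range(len(model)):
                model[k] += eta * h * (x[k] - model[k])
    return wi, wj


weights = [[[0.0, 0.0], [0.9, 0.9]], [[0.0, 0.0], [0.0, 0.0]]]
print(train_step(weights, [1.0, 1.0], sigma=1.0, eta=0.5))   # (0, 1)
```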
The models in the nodes are first initialized either randomly or as a regular array of vector values (preferred especially when the SOM is developed for a specific task and more is known about the distribution of the vectors).
During the training there are two important phases:
- self-organizing and ordering
- convergence
The self-organizing phase should take about 1000 epochs, and convergence at least 500 epochs. Given these values, the training parameters are
$T_1 = 1000/\log\sigma(0)$
$T_2 = 1000$
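With these values, the schedules for sigma(n) and eta(n) can be computed as below. The sketch assumes that log is the natural logarithm and that sigma(0) > 1; with T1 chosen this way, sigma(n) shrinks from the lattice radius to exactly 1 at n = 1000, the end of the self-organizing phase. The function name is hypothetical.

```python
import math
from typing import Tuple


def som_schedule(n: int, lattice_radius: float, eta0: float = 0.1) -> Tuple[float, float]:
    """Return (sigma(n), eta(n)) for epoch n, with sigma(0) = lattice radius."""
    sigma0 = lattice_radius
    T1 = 1000 / math.log(sigma0)       # natural log assumed; needs sigma0 > 1
    T2 = 1000
    sigma = sigma0 * math.exp(-n / T1)
    eta = eta0 * math.exp(-n / T2)
    return sigma, eta


for n in (0, 500, 1000):
    print(n, som_schedule(n, lattice_radius=5.0))
```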
The second phase is the interrogation of the network, once the SOM is trained. When a vector is passed to it, the network returns the coordinates of the winning node. Similar vectors should produce responses in the same region.
The dimensions of the network depend on the application.
3.3.
GUI
The application offers a fully functional graphical user interface. The same interface provides:
- an interface for administering the system. There is an option to create a new image database by selecting the folder that contains the images; features are extracted from them and the database is populated. The neural network is trained with all the data from the newly created database, and the images are re-indexed according to their response from the SOM.
- an interface for users to query images. The working database can be chosen: the user selects the image database to query from the Settings – File menu. The user selects the query image with the File - Query image option in the main menu, which opens a file dialog. The user can query with JPEG or GIF images, which need not exist in the database. The result of the query is displayed as thumbnails ordered by relevance, and for each image its distance to the query image is also shown. The status bar displays the number of similar images found, the cluster of the query image and the cluster of the focused result image.
Example screenshot of system’s graphical user
interface
4. Evaluation
Basically, the purpose of evaluating a retrieval system, in our case the image retrieval system, is to make sure that the system works, to analyze its functionality and the resources it needs to perform its task, to evaluate how well it performs the retrieval task so that conclusions can be drawn from the answer sets it gives, and to assess how user-friendly it is.
The type of evaluation to be considered depends on the objectives of the retrieval system [3]. Usually the first type of analysis to consider is a functional analysis, in which the functions of the retrieval system are tested one by one. A functional analysis should include an error-analysis phase in which the misbehaviors and errors of the system are identified by trying to make the system fail. In our case we did the error analysis using the black-box testing method: instead of searching for errors in the source code, we focused on the results the system generated, in other words mainly on evaluating the performance of the image retrieval system.
The most common measures of system performance are time and space. The shorter the response time and the smaller the space used, the better the system is considered to be. There is an inherent tradeoff between space complexity and time complexity, which frequently allows trading one for the other. As for the resources the system uses, the space requirements include the collection of images and the collection of vectors, in which each vector represents the properties of its image. The more colours are used when creating the vectors, the more space is used, because the vector length grows as more colour information about the image is added to the vector. The time the system needs depends on the hardware and on the number of vectors (the number of images), because more images mean more comparisons between vectors.
The other approach, which this chapter mainly focuses on, is to evaluate the system's performance by analyzing the answer set of images A that the system generates for each user query. For each query we had a set of relevant images R (the ones the system would return if it worked ideally), which we judged to be similar to the given query image. Let Ra denote the set of relevant images that appear in the answer set. Two main measures in this approach show how reliable the answers of the system are. The first, recall, is the fraction of the relevant images that has been retrieved; in other words, recall tells what percentage of the relevant images are retrieved:
Recall = |Ra|/|R|.
The second, precision, is the fraction of the retrieved images in the answer set that is relevant; in other words, precision shows how well the system has managed to find images relevant to the query image:
Precision = |Ra|/|A|.
Example of precision and recall for a given
query
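As a small illustration of these definitions, the following Python sketch computes recall and precision from a set of relevant images and an answer set. The image names are made up; the counts reproduce the first 2-colour test (|R| = 7, |A| = 10, |Ra| = 1). Both sets are assumed to be non-empty.

```python
from typing import Set, Tuple


def recall_precision(relevant: Set[str], answer: Set[str]) -> Tuple[float, float]:
    """Recall = |Ra|/|R| and Precision = |Ra|/|A|, with Ra = relevant images in the answer set."""
    ra = relevant & answer
    return len(ra) / len(relevant), len(ra) / len(answer)


relevant = {f"flag{i}.gif" for i in range(7)}                       # |R| = 7
answer = {"flag0.gif"} | {f"other{i}.gif" for i in range(9)}        # |A| = 10, |Ra| = 1
r, p = recall_precision(relevant, answer)
print(f"recall {r:.2%}, precision {p:.2%}")   # recall 14.29%, precision 10.00%
```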
The average precision $P_{avg}(r)$ at recall level r is computed from the formula
$P_{avg}(r) = \sum_{i=1}^{N_q} \frac{P_i(r)}{N_q}$
where $N_q$ is the number of queries and $P_i(r)$ is the precision at recall level r for the i-th query.
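The Average rows of the result tables are plain means of the per-query recall and precision values, that is, the averaging of the formula above applied to the single recall and precision value of each query rather than at a fixed recall level r. A short Python check using the counts of the 2-colour tests reproduces 62,67 % and 15,53 %; the function name is hypothetical.

```python
from typing import List, Tuple


def average_recall_precision(counts: List[Tuple[int, int, int]]) -> Tuple[float, float]:
    """counts holds (|R|, |A|, |Ra|) per query; returns the mean recall and mean precision."""
    nq = len(counts)
    avg_recall = sum(ra / r for r, a, ra in counts) / nq
    avg_precision = sum(ra / a for r, a, ra in counts) / nq
    return avg_recall, avg_precision


# (|R|, |A|, |Ra|) of the six 2-colour test queries
two_colour = [(7, 10, 1), (9, 25, 5), (14, 59, 13), (6, 43, 5), (10, 61, 8), (20, 61, 10)]
rec, prec = average_recall_precision(two_colour)
print(f"{rec:.2%} {prec:.2%}")   # 62.67% 15.53%
```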
4.1.
Test plan
Here is a short description of the test plan for evaluating the performance of the system. We have two image databases: one with 251 images of national flags at 48x32 resolution, and the other one, 120x118, with general images: people, animals, flowers, satellite images, cars, chemicals. For the evaluation we used the first one, and we made six image queries for each number of colours used to create the vectors. We created the image vectors with the number of colours in the pictures reduced to 2, 3, 4, 5 and 6, so we had a total of 30 queries, six for each number of colours. We used the same test images for each number of colours, because in that way the differences are easier to see, including whether the system performance improves as the number of colours increases.
4.2.
The results of the tests
Results of performance evaluation are shown
below in five tables:
Test No   R    A    Ra   Recall     Precision
1         7    10   1    14,29 %    10,00 %
2         9    25   5    55,56 %    20,00 %
3         14   59   13   92,86 %    22,03 %
4         6    43   5    83,33 %    11,63 %
5         10   61   8    80,00 %    13,11 %
6         20   61   10   50,00 %    16,39 %
Average                  62,67 %    15,53 %
Result table using 2-colour vectors

Test No   R    A    Ra   Recall     Precision
1         7    16   3    42,86 %    18,75 %
2         9    38   5    55,56 %    13,16 %
3         14   7    2    14,29 %    28,57 %
4         6    15   1    16,67 %    6,67 %
5         10   46   8    80,00 %    17,39 %
6         20   32   13   65,00 %    40,63 %
Average                  45,73 %    20,86 %
Result table using 3-colour vectors

Test No   R    A    Ra   Recall     Precision
1         7    18   4    57,14 %    22,22 %
2         9    38   6    66,67 %    15,79 %
3         14   42   5    35,71 %    11,90 %
4         6    26   6    100,00 %   23,08 %
5         10   45   9    90,00 %    20,00 %
6         20   45   19   95,00 %    42,22 %
Average                  74,09 %    22,54 %
Result table using 4-colour vectors

Test No   R    A    Ra   Recall     Precision
1         7    22   5    71,43 %    22,73 %
2         9    19   4    44,44 %    21,05 %
3         14   15   3    21,43 %    20,00 %
4         6    23   6    100,00 %   26,09 %
5         10   57   9    90,00 %    15,79 %
6         20   57   20   100,00 %   35,09 %
Average                  71,22 %    23,46 %
Result table using 5-colour vectors

Test No   R    A    Ra   Recall     Precision
1         7    15   1    14,29 %    6,67 %
2         9    29   5    55,56 %    17,24 %
3         14   22   10   71,43 %    45,45 %
4         6    40   6    100,00 %   15,00 %
5         10   25   7    70,00 %    28,00 %
6         20   27   14   70,00 %    51,85 %
Average                  63,54 %    27,37 %
Result table using 6-colour vectors

From the tests above we obtained the average results for precision and recall shown in the graphs (x-axis: 1 = 2 colours, 2 = 3 colours, 3 = 4 colours, 4 = 5 colours, 5 = 6 colours).
[Figure: The graphs of average recall and precision, where the tests are represented on the x-axis]
The time the system needs to perform the following tasks is shown below:
Colours   Network learning time (2000 epochs)   Performing the query
2         3 min                                 0.4 s
3         7 min                                 0.5 s
4         13 min                                0.5 s
5         18 min                                0.6 s
6         25 min                                0.6 s
4.3.
Conclusions
The system performs best when the vectors contain information about four colours, as the average recall computed for each group of tests shows. In these test cases we used images of flags, which are quite simple and do not contain many colours. One reason for using images as simple as flags was that it made it easier to specify the relevant images for each query image, so that we could compute recall and precision more reliably.
If we had used real-world images such as photos, we would expect the system to work better as the colour information in the vectors increases, because the comparisons between the vectors would be more precise.
As the timing results show, the network learning time increases when the colour information in the vectors is increased. This illustrates the trade-off between time and space on the one hand and the precision of the system's answer set on the other.
Our tests show that the system works quite well with images that contain a relatively small number of colours. Queries made outside the testing documented in this chapter indicated that the system also works well with more complicated images. In conclusion, we are pleased with the system's behavior.
5. Description of other systems in the field
Many collections of digital images have been created by digitizing existing collections of photographs, diagrams, drawings, paintings and prints in different domains. Research in image retrieval techniques has become more active since the 1970s, as a field of study of both the database management and the computer vision communities. Corresponding to each community, two ways to retrieve images developed: text-based and visual-based.
The first approaches annotated the images with text, so that retrieval became text-based. However, there are two major limitations: first, the annotation must be done by humans, and the number of images can be huge; second, annotation is made difficult by the rich content of an image and the subjectivity of human perception.
In the 1990s, content-based image retrieval was proposed as a way to overcome the difficulties described above. Instead of being manually annotated, images are indexed by their visual content. Recently, MPEG (Moving Picture Experts Group) developed the MPEG-7 standard, formally named "Multimedia Content Description Interface", for describing multimedia content data [7].
There are two different tasks that every system should implement: feature extraction and indexing of the feature vectors. Due to the subjectivity of perception, there is no single best representation for a given feature. The most used features are color, texture, shape and color layout. For each of them there are several models, each with strengths and weaknesses. Even though feature extraction reduces the data considerably, the dimension of the vectors can still be high (typically of the order of hundreds of components). Considering also that the database can be large and that the response should be as fast as possible, most systems also implement an indexing algorithm. The high dimension of the vectors and non-Euclidean similarity measures are the biggest problems in indexing.
Three research communities contribute to this area: computational geometry, database management and pattern recognition [9]. The most used indexing techniques are the bucketing algorithm, k-d tree, priority k-d tree, quad-tree, K-D-B tree, hB-tree and R-tree. As newer techniques, clustering and neural networks have shown promising results.
Most of the systems support one or more of the
following options for search:
- random browsing;
- search by example;
- search by sketch;
- search by text (key word or speech);
- navigation with customized image categories.
Here are some of the most representative
systems that have been developed.
QBIC is the first commercial system, developed by IBM. Queries can be example images, user-constructed sketches, selected colors or textures, etc. As features, it includes color, texture and shape. It uses an R*-tree as its multidimensional indexing structure. Its demo is at http://wwwqbic.almaden.ibm.com .
Virage is a system developed at Virage Inc. It supports queries based on color, color layout, texture and structure. It goes further than QBIC: it supports combinations of those queries, with a weight for each. Its demo is at http://www.virage.com/cgi-bin/query-e .
RetrievalWare is an engine developed by Excalibur Technologies Corp. It uses neural networks in the retrieval process. As features, it uses color, shape, texture, brightness, color layout and aspect ratio. Its demo page is http://www.virage.com/cgi-bin/query-e .
Photobook is a set of interactive tools for browsing and searching images, developed at the MIT Media Lab. It uses shape, texture and face features. Its more recent versions propose to include the human in the image annotation and retrieval loop. The motivation is that no single feature can best model images from every domain.
VisualSEEk and WebSEEk: the first is a visual feature search engine and the latter a WWW-oriented search engine, both developed at Columbia University. Visual information is based on color sets and wavelet-transform-based features. The demo page is http://www.ee.columbia.edu/~sfchang/demos.html .
PicSOM was developed at the Laboratory of Computer and Information Science at Helsinki University of Technology [8]. It is the first system that uses self-organizing maps as a means of indexing images. The system tries to adapt to the user's preferences regarding the similarity of images in the database. Its features describe the color content, texture, shape and structure of the images.
5.1.
Comparison to other systems
In general, our system follows the general architecture of other systems in the field. It integrates both important parts of an image retrieval system, feature extraction and indexing, so its architecture is the same as that of the majority of systems.
Due to lack of time and knowledge, the features include only color information. However, the algorithms used for color extraction and for computing the distance between histograms are among the most advanced in their field. For indexing, we followed a very promising technique, the self-organizing map, which began to be used only in recent years. Of the other systems we studied, only PicSOM uses the same indexing technique. Their SOM is a special one (a tree-structured SOM), but since for now we work only with small databases, our implementation fits well. With the help of our initialization technique, the training times are in a normal range. A future version that works with larger amounts of data will have to address this problem.
The indexing part can also be developed further. We can keep the general SOM approach but implement a tree-structured SOM, which improves organization and training time. Another, more deterministic indexing technique, based for example on a tree structure, could be developed. The feature space itself could even be clustered, similarly to the way colors are clustered.
6. Future development
The system can be developed further; in fact, the architecture was designed specifically to permit easy development.
Feature extraction. Currently features are associated only with colors.
The next step will be to add some spatial information, which should not be too hard. The easiest way is to divide the image into sub-blocks and perform the same color extraction algorithm on each block. Another way is to extract the colors as now and then associate with each color a region in space (a bounding rectangle, for example). Of course, some statistical parameters, such as the dispersion of colors, can also be extracted.
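The sub-block idea can be sketched as follows: the image, given as rows of RGB pixels, is split into an ny x nx grid and the palette fractions are computed separately for each block. This is only an illustration of the proposal, assuming a fixed palette, at least one pixel per block, and hypothetical names.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def block_features(image: List[List[RGB]], palette: List[RGB],
                   nx: int, ny: int) -> List[List[float]]:
    """Split `image` into ny x nx blocks and return the palette fractions of each block."""
    h, w = len(image), len(image[0])

    def nearest(c: RGB) -> int:
        return min(range(len(palette)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(c, palette[i])))

    features = []
    for by in range(ny):
        for bx in range(nx):
            rows = range(by * h // ny, (by + 1) * h // ny)
            cols = range(bx * w // nx, (bx + 1) * w // nx)
            counts = [0] * len(palette)
            for y in rows:
                for x in cols:
                    counts[nearest(image[y][x])] += 1
            n = len(rows) * len(cols)      # assumes every block has at least one pixel
            features.append([c / n for c in counts])
    return features


red, white = (255, 0, 0), (255, 255, 255)
image = [[red, white], [red, white]]        # left half red, right half white
print(block_features(image, [red, white], nx=2, ny=1))
# [[1.0, 0.0], [0.0, 1.0]]
```

A whole-image color vector would give [0.5, 0.5] here and could not tell this image apart from its mirror image; the per-block vectors can.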
Some information about shape can be added: first run an edge detection algorithm and then process its output.
Texture is harder to model. Some filters can be applied to the original images and their responses analyzed to get information about texture. Alternatively, after edge detection, the general orientation of the edges and some statistics about it can be obtained.
More complex transforms, like the Fourier transform or, better, a wavelet transform, can also be applied and the result stored as part of the feature vector.
7. References
[1] E. Sutinen, "Information Retrieval, Lecture Notes", The University of Joensuu.
[2] P. Fränti, "Image Analysis, Lecture Notes", The University of Joensuu.
[3] R. Baeza-Yates, B. Ribeiro-Neto, Modern Information Retrieval, ACM, London, 1999.
[4] T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman and A. Y. Wu, "An Efficient k-Means Clustering Algorithm: Analysis and Implementation", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 7, July 2002.
[5] T. Kohonen, The Self-Organizing Map (SOM), WWW, http://www.cis.hut.fi/projects/somtoolbox/theory/somalgorithm.shtml, 19.10.2003.
[6] S. Haykin, Neural Networks: A Comprehensive Foundation.
[7] International Organization for Standardization, ISO/IEC JTC1/SC29/WG11, Coding of Moving Pictures and Audio, March 2003.
[8] M. Koskela, Content-Based Image Retrieval with Self-Organizing Maps, WWW, http://www.cis.hut.fi/picsom/thesis-koskela.pdf, 15.10.2003.
[9] Y. Rui and T. S. Huang, "Image Retrieval: Current Techniques, Promising Directions and Open Issues", Journal of Visual Communication and Image Representation, 10:39-62, January 1999.
[10] A. Mojsilovic, J. Hu and E. Soljanin, "Extraction of Perceptually Important Colors and Similarity Measurement for Image Matching, Retrieval, and Analysis", IEEE Transactions on Image Processing, Vol. 11, No. 11, November 2002.