Content-Based Retrieval Using Invariant Features, Self-Organizing Maps, Concepts and Fuzzy Interval Numbers
C. Skourlas N. Vassilas
Department of Informatics, Technological Educational Institute of Athens
{cskourlas, nvas }@teiath.gr
Abstract
In the complex cross-language document retrieval applications addressed in this paper, the goal is to
assist the document matching stage when the documents contain images. Hence, we either calculate the
similarity between a submitted bilingual query and each document in the collection, or retrieve images
that are similar to a query image based on features extracted from the images.
Fuzzy Interval Numbers (FINs) have been employed in various real-world applications involving
numeric and non-numeric data. In this paper, a FIN classifier is proposed to handle problems
of Cross Language Information Retrieval and document classification. The FIN representation of
documents uses the collection term frequency as the term identifier. Such a
representation seems suitable for Cross Language Information Retrieval without a
dictionary.
In recent years, Content-Based Image Retrieval (CBIR) has evolved into an important research domain
within the context of multimodal information retrieval. To improve image retrieval
performance even when the images are at different scales and orientations or corrupted by noise, we
propose a set of global and local invariant features. Color quantization of the images using
self-organizing maps is also used; it saves memory and, in some cases, improves image retrieval
accuracy.
Keywords: Content Based Image Retrieval, Concept Based Retrieval, Cross Language Information
Retrieval, Multimodal Information Retrieval, Self-Organizing Maps, Fuzzy Interval Numbers
1. Introduction
In complex information retrieval applications, such as cross-language document retrieval,
documents are typically represented using the vector space model. A set of N keywords
(index terms) is used to represent each document as an N-dimensional vector, with each
element representing either the presence of the corresponding keyword in the document or
its relative frequency of occurrence. To correctly translate a keyword from one language to
another, besides the use of a dictionary and a thesaurus (for synonyms), word sense
disambiguation techniques that assess the appropriate contextual meaning of the word have to
be employed [Alevisos et al., 2007]. The similarity between a submitted query and each
document in the collection is usually computed as an inner-product-based vector matching operation.
However, due to poor selection of keywords, keyword sense ambiguities, inaccurate
lexicons or incomplete thesauri, the retrieval accuracy is often rather low.
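As a minimal sketch of the vector space matching described above, the following Python snippet represents texts as relative keyword frequencies and ranks documents by their inner product with the query. The keyword list, the sample texts and all function names are illustrative assumptions, not part of the original system.

```python
from collections import Counter

def to_vector(text, keywords):
    """Relative frequency of each keyword in a whitespace-tokenized text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [counts[k] / total for k in keywords]

def inner_product(u, v):
    return sum(a * b for a, b in zip(u, v))

def rank_documents(query, documents, keywords):
    """Return document indices sorted by decreasing inner-product score."""
    q = to_vector(query, keywords)
    scores = [inner_product(q, to_vector(d, keywords)) for d in documents]
    return sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)

keywords = ["coin", "euro", "athena"]  # hypothetical index terms
documents = ["euro coin with athena", "greek ship on a coin", "olive tree"]
ranking = rank_documents("athena coin", documents, keywords)
```

A document that shares no keyword with the query scores zero, which is one way the poor keyword selection mentioned above lowers retrieval accuracy.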
In image databases, it is difficult to annotate images in a way that allows later retrieval
acceptable to the subjective perception of the various future users, which makes text-based image
retrieval an inappropriate approach [Rui et al., 1999]. It is therefore necessary to develop
tools for retrieving information based on image content and also to improve cross-language
text retrieval without dictionaries and thesauri.
In recent years, Content-Based Image Retrieval (CBIR) has evolved into an important research
domain within the context of multimodal information retrieval [Faloutsos, 1995], and a
number of CBIR systems and tools have already been developed [Rui et al., 1999], [Veltkamp, 2002],
[Eakins, 2000]. CBIR could then be used to assist the document matching stage in complex
multimodal information retrieval applications when the documents contain images. In that
case, redesigning the similarity measures to include
the contribution from CBIR systems could lead to significant improvements in document
retrieval accuracy.
Fuzzy Interval Numbers (FINs) and lattice algorithms have been employed in various
real-world applications involving numeric and non-numeric data [Kaburlasos, 2006]. A FIN may be
interpreted as a conventional Fuzzy Set; additional interpretations for a FIN are possible,
including a statistical interpretation. The use of FIN classifiers is proposed [Alevisos, 2007] to
handle problems of Cross Language Information Retrieval. The FIN representation of
documents uses the collection term frequency as the term identifier and
seems to be suitable for Cross Language Information Retrieval without a dictionary.
Section 2 of this paper illustrates the proposed experimental methodology using a Euro coin
database. The overall structure of the coin retrieval system is presented for the original coin
collection as well as for the coin collection after vector quantization with Kohonen’s
self-organizing maps. In Section 3 the features extracted for content-based coin indexing are
presented and the experimental results for three test datasets are discussed. In Section 4 a
brief introduction to the definition and construction of FINs is given. The FIN similarity is
defined and applied to the similarity of queries and documents. The evaluation of FIN
techniques is given in Section 5. Finally, conclusions and future work are discussed in the last
section.
2. The Content Based Image Retrieval
The experimental methodology consists of two distinct phases: a) the design phase, whereby
we create the database and decide upon its indexing scheme, and b) the retrieval phase, in
which, following the presentation of query-images, similar images are retrieved from the
database.
2.1 Database Design
In the first phase, a coin database is created using the following stages:
• Specification of a digital coin image collection
• Preprocessing of each image to detect the position of the coin
• Feature extraction from coin pixels
• Feature-based indexing of the database
At the first stage, we downloaded coin images from the Internet and compiled a collection of
115 Euro coin faces [Vassilas, 2006]. Among these images, several coin designs, not yet in
circulation, were included. All images were in the RGB color space and had a size of
approximately 240x240 pixels. Next, each image is preprocessed in order to detect the
position of the coin in the image and isolate it from the background. To this end, each image
is first converted to gray-scale, then the edges are found using the Sobel operators and finally
the Hough transform [Ballard, 1982] is applied on the black-and-white image of edges in
order to detect the outer circle of the coin. The circular disk mask, thus created, is then used to
isolate the coin from the background (see Fig. 1).
Fig. 1. Preprocessing steps: a) original color image, b) Sobel edge detection on gray-scale image, c) mask from
Hough transform, and d) isolated coin.
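The preprocessing steps can be sketched in plain Python as follows. This is only an illustration under stated assumptions: images are 2-D lists of gray levels, the Hough transform itself is not implemented, and the circle centre and radius passed to the hypothetical `disk_mask` helper are assumed to be its output.

```python
import math

def sobel_magnitude(gray):
    """Gradient magnitude of a 2-D list of gray levels (borders left at 0)."""
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (gray[y-1][x+1] + 2 * gray[y][x+1] + gray[y+1][x+1]
                  - gray[y-1][x-1] - 2 * gray[y][x-1] - gray[y+1][x-1])
            gy = (gray[y+1][x-1] + 2 * gray[y+1][x] + gray[y+1][x+1]
                  - gray[y-1][x-1] - 2 * gray[y-1][x] - gray[y-1][x+1])
            out[y][x] = math.hypot(gx, gy)
    return out

def disk_mask(height, width, cy, cx, radius):
    """True inside the circle (assumed to be found by the Hough step)."""
    return [[(y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2
             for x in range(width)] for y in range(height)]

def isolate(gray, mask, background=0):
    """Keep pixels inside the mask and replace the rest by the background."""
    return [[p if m else background for p, m in zip(row, mrow)]
            for row, mrow in zip(gray, mask)]

# Tiny synthetic image with a vertical step edge between columns 1 and 2.
gray = [[0, 0, 1, 1, 1] for _ in range(5)]
edges = sobel_magnitude(gray)
mask = disk_mask(5, 5, cy=2, cx=2, radius=1)
coin = isolate(gray, mask)
```

In a real pipeline the edge image would be thresholded to black and white before the circle search, as described in the text.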
Once the coin images are free from background interference, features are extracted from those
pixels that belong exclusively to the coin area. The extracted feature vector from each image
is then stored in order to form the index to the database.
2.2 Database Retrieval
Retrieval from the coin database is performed by:
• Query-image presentation to the database system
• Preprocessing to detect the position of the coin in the query-image
• Feature extraction and query encoding with its feature vector
• Specification of similarity measure
• Computation of query features similarity with those of the database index
• Retrieval of the most similar database coins
The query coin image presented to the system undergoes the same preprocessing, as that for
the database coins, to isolate the coin from the background and to extract its corresponding
feature vector. The similarity measures employed in this work between two feature vectors
are the L1 and L2 distances defined by
$$L_p = \left( \sum_i \left| f_i(I) - f_i(J) \right|^p \right)^{1/p}, \quad p = 1, 2$$
where f(·) represents the feature vector of an image, with components f_i, and I, J are the database and query
images respectively. The database coins are ranked according to their similarities to the query
image and the N most similar coins are retrieved from the database and shown to the user.
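The ranking step can be sketched as follows. The index layout (a dictionary from coin name to feature vector), the sample values and the function names are assumptions made for illustration only.

```python
def lp_distance(f_i, f_j, p=1):
    """L_p distance between two equal-length feature vectors (p = 1 or 2)."""
    return sum(abs(a - b) ** p for a, b in zip(f_i, f_j)) ** (1.0 / p)

def retrieve(query, index, n=3, p=1):
    """Return the names of the n database entries closest to the query."""
    ranked = sorted(index, key=lambda name: lp_distance(index[name], query, p))
    return ranked[:n]

# Hypothetical two-dimensional index, for illustration only.
index = {"coin_a": [0.0, 0.0], "coin_b": [3.0, 4.0], "coin_c": [1.0, 1.0]}
top_two = retrieve([0.9, 1.2], index, n=2)
```

Smaller distances mean higher similarity, so the sort is ascending.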
2.3 Quantization of Coin Collection
Kohonen’s self-organizing maps were used as the vector quantization method in this work. The
coin collection was quantized in two ways. First, Kohonen’s algorithm was applied to each
coin image separately, resulting in indexed images with distinct colormaps. Second, the same
algorithm was applied to the whole coin collection simultaneously, resulting in indexed
images with a common colormap. Fig. 2 shows some original coins (upper row) along with
their indexed representations (lower row) for the second quantization case. The coins’
common colormap is shown at the right of the figure.
Figure 2. Original (upper row) and quantized (bottom row) coins using a common colormap
(right).
The feature extraction and indexing phases for the database coins are the same as for the
previously described methodology. During retrieval, the queries are also quantized with
Kohonen’s algorithm resulting in an indexed representation with their own associated
colormap.
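A minimal sketch of color quantization with a one-dimensional Kohonen map is given below. The map size, learning-rate schedule, neighborhood schedule and number of epochs are assumptions chosen for brevity, not the settings used in the experiments.

```python
import random

def nearest(units, x):
    """Index of the map unit closest to pixel x (squared Euclidean distance)."""
    return min(range(len(units)),
               key=lambda k: sum((u - v) ** 2 for u, v in zip(units[k], x)))

def train_som(pixels, n_colors=4, epochs=20, seed=0):
    """Train a 1-D Kohonen map whose units form the colormap."""
    rng = random.Random(seed)
    units = [list(rng.choice(pixels)) for _ in range(n_colors)]
    steps = epochs * len(pixels)
    for t in range(steps):
        frac = t / steps
        rate = 0.5 * (1.0 - frac)                         # decaying learning rate
        radius = max(1.0, n_colors / 2.0) * (1.0 - frac)  # shrinking neighborhood
        x = rng.choice(pixels)
        winner = nearest(units, x)
        for k, unit in enumerate(units):
            if abs(k - winner) <= radius:
                for c in range(3):
                    unit[c] += rate * (x[c] - unit[c])
    return units

def quantize(pixels, units):
    """Indexed representation: one colormap index per pixel."""
    return [nearest(units, p) for p in pixels]

pixels = [(255, 0, 0)] * 10 + [(0, 0, 255)] * 10  # toy image: red and blue
colormap = train_som(pixels, n_colors=2)
indexed = quantize(pixels, colormap)
```

Training on the pixels of a single image yields a distinct colormap per image, while training on the pooled pixels of the whole collection yields a common colormap.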
3. Feature Extraction and experimental results
Three different kinds of features were used, namely, color, shape and wavelet features. Color
is known to be invariant to scale and orientation but is sensitive to illumination changes. Four
moments (mean, standard deviation, skewness and kurtosis) from each of the hue, saturation
and value components of the HSV color space were extracted, for a total of twelve color
features.
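The twelve color features can be sketched as below. The text does not state the exact moment definitions, so the standardized skewness and kurtosis used here, and the treatment of hue as a linear rather than circular quantity, are assumptions.

```python
import colorsys
import math

def moments(values):
    """Mean, standard deviation, skewness and kurtosis of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(var)
    if std == 0.0:
        return mean, 0.0, 0.0, 0.0  # degenerate channel: higher moments set to 0
    skew = sum(((v - mean) / std) ** 3 for v in values) / n
    kurt = sum(((v - mean) / std) ** 4 for v in values) / n
    return mean, std, skew, kurt

def color_features(rgb_pixels):
    """Twelve features: four moments for each of H, S and V."""
    hsv = [colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
           for r, g, b in rgb_pixels]
    features = []
    for channel in range(3):
        features.extend(moments([p[channel] for p in hsv]))
    return features

coin_pixels = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (10, 20, 30)]  # toy data
features = color_features(coin_pixels)
```

Only the pixels inside the coin mask would be passed to `color_features`, so the background does not affect the moments.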
The shape features were extracted from the gradient image, which was obtained by first
transforming the color image to grayscale and then applying the Sobel operators. To achieve
some invariance to illumination conditions, the [min, max] range of the gradient image was
normalized to [0, 1]. Assuming that scale and orientation changes preserve most of the edges,
the normalized polar histogram, i.e. the probability distribution of edge orientations (at 90°
with respect to the edge gradients), should not only be adequately scale and rotation
invariant but can also give an estimate of the rotation angle. The latter results from the
circular correlation of the normalized polar histograms of the query image and each of the
database coins. Fig. 3 shows the normalized polar histograms of a database image and a query
that is of a different size and orientation with respect to the original image.
Figure 3. A database (left) and a query (right) coin along with their polar histograms.
In practice, the procedure uses the L1 or L2 distance measure instead of the typical
inner-product correlation to determine the angle of rotation and the corresponding
polar similarity score. In addition, to increase the robustness of the system, we
extracted both edge magnitudes and orientations from five equal-area concentric circular
rings. The shape features extracted were: a) the mean, standard deviation, skewness and
kurtosis of the ring edge magnitudes, for a total of 5x4 = 20 features, and b) the normalized
polar histograms of each ring with 5-degree angular bins (36 features/ring), for a total of 180
polar features.
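The circular matching of polar histograms can be sketched as follows for a single ring. The function names are hypothetical, and the sign convention of the returned angle is an assumption; only the idea of trying every circular shift and keeping the one with the smallest L1 or L2 distance comes from the text.

```python
def normalize(hist):
    """Scale a histogram so that its bins sum to 1."""
    total = sum(hist)
    return [h / total for h in hist] if total else list(hist)

def circular_match(db_hist, query_hist, bin_degrees=5, p=1):
    """Best circular shift of the query histogram against a database histogram.

    Returns (distance, angle), where angle is the winning shift in degrees.
    """
    n = len(db_hist)
    best = None
    for shift in range(n):
        shifted = [query_hist[(i + shift) % n] for i in range(n)]
        d = sum(abs(a - b) ** p for a, b in zip(db_hist, shifted)) ** (1.0 / p)
        if best is None or d < best[0]:
            best = (d, shift * bin_degrees)
    return best

# Toy 36-bin histogram and a copy rotated by three bins (15 degrees).
db_hist = normalize([4, 1] + [0] * 34)
query_hist = [db_hist[(i - 3) % 36] for i in range(36)]
distance, angle = circular_match(db_hist, query_hist)
```

With five rings, one option is to sum the per-ring distances for each shift before taking the minimum, so that all rings agree on a single rotation angle.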
Finally, for comparison purposes, we also extracted features from a three-level wavelet
analysis with the ‘db3’ mother wavelet of the Daubechies family. In particular, we extracted
the mean and standard deviation of the wavelet coefficients for each detail image of the
wavelet decomposition as well as for the level-3 approximation image, for a total of 10x2 =
20 wavelet features.
3.1 Discussion of the experiments
The experiments on coin retrieval were performed using three sets of queries. The first
data set (DS1) contained queries at a lower resolution (approximately 130x130 pixels) in
order to test the effect of scale on the retrieval accuracy. The queries of the second data set
(DS2) were at a different scale (70% of the size of the original images), had a random rotation in
the [-180°, 180°] range and were corrupted with additive zero-mean Gaussian noise with 0.1
standard deviation. Finally, the queries of the third data set (DS3) were at a random scale
(between 50% and 100% of the original scale), at random orientations (in the [-180°, 180°]
range) and with a random change of color, in order to test scale, rotation and illumination
invariance. Tables 1 and 2 show the retrieval accuracy based on color and edge (ring)
magnitude features respectively (for the means, standard deviations or their combination) for
the three data sets, while Tables 3 and 4 show the retrieval accuracy for the edge (ring)
orientations and wavelet features respectively. Since the L1 distance measure gave slightly
better results than L2 in most of the experiments, all results shown in the tables assume the L1
measure.
Table 1. Retrieval based on color.
        Feature vector composed of:
        μ             σ             μ+σ
DS1     4.6 (7.0)     5.7 (5.1)     3.3 (4.9)
DS2     6.5 (6.6)     25.6 (29.7)   7.2 (7.8)
DS3     19.1 (19.4)   30.7 (27.3)   18.3 (22.2)
Table 2. Retrieval using edge strengths.
        Feature vector composed of:
        μ             σ             μ+σ
DS1     10.1 (14.6)   23.9 (28.9)   13.8 (20.7)
DS2     17.6 (19.6)   35.2 (30.6)   22.4 (23.0)
DS3     32.3 (28.8)   35.6 (30.8)   30.9 (29.4)
Table 3. Retrieval using edge angles.
        Feature vector:
        Normalized Polar Histogram
DS1     12.1 (17.1)
DS2     22.7 (27.6)
DS3     29.4 (31.9)
Table 4. Retrieval using wavelet features.
        Feature vector composed of:
        μ             σ             μ+σ
DS1     36.3 (28.2)   32.7 (26.7)   36.7 (28.9)
DS2     34.1 (27.9)   38.8 (30.3)   32.8 (28.2)
DS3     36.3 (29.3)   31.7 (27.0)   31.3 (27.9)
4. Definition and construction of FINs
A FIN can be regarded as an abstract “mathematical object” and can have various
interpretations and uses.
Consider a vector (a “population”) x = [x1, x2, …, xN] of term frequencies (“measurements”), i.e.
real numbers sorted in ascending order. The dimension of a vector x is denoted by dim(x),
e.g. dim([2,-1]) = 2, dim([-3,4,0,-1,7]) = 5. The median(x) of the vector x = [x1, x2, …, xN] is
defined to be a number such that half of the N numbers x1, x2, …, xN are smaller than median(x)
and the other half are larger than median(x); for instance, median([x1,x2,x3]) = x2, with x1 < x2
< x3, whereas median([x1,x2,x3,x4]) = (x2 + x3)/2, with x1 < x2 < x3 < x4. A FIN can be
computed for the vector x (population) by applying the following CALFIN algorithm
[Kaburlasos, 2006].
Algorithm CALFIN // Calculate FIN
Let x be a vector of term frequencies (real numbers).
Sort vector x incrementally.
Initially vector pts is empty.
function calfin(x) {
    while (dim(x) ≠ 1)
        medi := median(x)
        insert medi in vector pts
        x_left := elements in vector x less-than number median(x)
        x_right := elements in vector x larger-than number median(x)
        calfin(x_left)
        calfin(x_right)
    endwhile
} //function calfin(x)
Sort vector pts incrementally.
Store in vector val, dim(pts)/2 numbers from 0 up to 1 in steps of 2/dim(pts) followed by
another dim(pts)/2 numbers from 1 down to 0 in steps of 2/dim(pts).
Finally, Algorithm CALFIN computes two vectors, pts and val, where vector val
includes the degrees of (fuzzy) membership of the corresponding real numbers in vector pts.
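A possible Python reading of CALFIN is sketched below. Two points are assumptions rather than part of the algorithm as stated: the loop is read as one recursive step per sub-vector that stops once a single element (or, for repeated values, no element) remains, and the membership ramp rises in equal steps to a peak of 1 at the middle of pts without including the value 0.

```python
from statistics import median

def calfin_points(x):
    """Collect the recursive medians of x (the vector pts), sorted ascending."""
    pts = []

    def recurse(values):
        # Stop on a single element; an empty list can only arise when values
        # repeat, a case the text does not discuss (assumption).
        if len(values) <= 1:
            return
        m = median(values)
        pts.append(m)
        recurse([v for v in values if v < m])
        recurse([v for v in values if v > m])

    recurse(sorted(x))
    return sorted(pts)

def membership(pts):
    """Symmetric ramp of membership degrees peaking at 1 (assumed reading)."""
    n = len(pts)
    half = (n + 1) // 2
    return [min(i + 1, n - i) / half for i in range(n)]

pts = calfin_points([4, 1, 3, 2])  # -> [1.5, 2.5, 3.5]
val = membership(pts)              # -> [0.5, 1.0, 0.5]
```

The median of the whole population always receives the highest membership degree, which matches the bell-like shape of a FIN.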
In Figure 4, two queries or documents (vectors) are plotted as two
Fuzzy Interval Numbers. Each value on the term axis represents a term (stem). Any
“cut” at a given height h ∈ (0,1] defines two generalized intervals, denoted by [a’, c’]h and
[b’, d’]h. In our case the generalized intervals are positive and intersecting. As we can
see below, a bell-shaped mass function is used to calculate the distance
between two FINs (or the similarity between two queries or documents
represented by their FINs).
Figure 4. Two queries / documents (vectors) illustrated as two Fuzzy Interval Numbers. Each
value on the term axis represents a term (stem). A bell-shaped mass function is used for the
calculation of the distance (similarity) between the documents.
4.1 Fuzzy Interval Numbers and FINs similarity
The concept of Generalized Intervals was used to introduce a metric into the
lattice of the Fuzzy Interval Numbers (FINs) [Kaburlasos 2006]. The interpretation of
a generalized interval depends on the application; for instance, the presence of a feature
(a term) in a document could be indicated by a positive generalized interval
[Skourlas 2007]. The area “under” a generalized interval is a real number which can
be calculated. A metric distance and an inclusion measure function in the set (lattice)
of the generalized intervals Mh can be defined [Kaburlasos, 2006].
4.2 Calculation of the distance between documents
The FIN distance is used instead of a similarity measure between documents: the
smaller the distance, the more similar the documents. For the distance calculations
between documents, a bell-shaped mass function mh(t) could be selected.
[Equation garbled in the source: the bell-shaped mass function mh(t), with parameters ρ, σ, ν and z, where z is tied to maxctf.]
Figure 4 illustrates how the distance between two FINs F1 and F2
(representing two documents) is calculated using the mass function. The points a’, b’, c’, d’
define the distance at height h, dh(F1(h),F2(h)) = dh([a’,c’]h,[b’,d’]h), which
equals the sum of the areas of the shaded regions. The distance between the two FINs
is then the definite integral of the distance at height h from h=0 to 1:
$$\mathrm{Distance} = \int_0^1 \left( \left| \int_{a'}^{b'} f(t)\,dt \right| + \left| \int_{c'}^{d'} f(t)\,dt \right| \right) dh$$
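The double integral can be approximated numerically, as in the sketch below. Everything about the interface is an assumption: each FIN is supplied as a function that returns its interval at height h ([a’, c’]h for the first FIN and [b’, d’]h for the second), the mass function is passed in by the caller, and the midpoint rule is used for both integrals.

```python
def integrate(func, lo, hi, steps=200):
    """Midpoint-rule approximation of the integral of func from lo to hi."""
    width = (hi - lo) / steps
    return sum(func(lo + (k + 0.5) * width) for k in range(steps)) * width

def fin_distance(cut1, cut2, mass, steps=200):
    """Integrate over h the mass lying between matching ends of the h-cuts.

    cut1(h) and cut2(h) return the interval (left, right) of each FIN at
    height h; mass(t, h) is the mass function, supplied by the caller.
    """
    def at_height(h):
        a, c = cut1(h)  # interval [a', c'] of the first FIN
        b, d = cut2(h)  # interval [b', d'] of the second FIN
        left = integrate(lambda t: mass(t, h), a, b, steps)
        right = integrate(lambda t: mass(t, h), c, d, steps)
        return abs(left) + abs(right)
    return integrate(at_height, 0.0, 1.0, steps)

# Two triangular FINs one unit apart, compared under a constant unit mass.
first = lambda h: (h, 2.0 - h)
second = lambda h: (1.0 + h, 3.0 - h)
distance = fin_distance(first, second, mass=lambda t, h: 1.0)
```

With a constant unit mass the distance at every height reduces to |a’ − b’| + |c’ − d’|, so the two triangles above are at distance 2.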
5. Examples of successful document retrieval and classification
From our previous experimentation [Skourlas, 2007] we knew that the method gives better
results when the queries and the documents are longer.
Sample 1: Coin Images’ annotations
Unfortunately, the available annotations for the Euro coin images are rather short e.g.
“2-cent denomination: Depicting a Corvette, i.e. a type of ship used during the Greek War of
Independence (1821)”. «Kέρμα των 2 λεπτών: Κορβέτα, τύπος πλοίου που χρησιμοποιήθηκε
στον Εθνικό Απελευθερωτικό Αγώνα (1821-1827)».
Therefore, it was necessary to expand such annotations using relevant texts in order to
experiment successfully. For example, within the framework of the Olympic Games Coin
Program (Athens 2004), the Greek Mint struck a series of commemorative coins. For
coins representing Goddess Athena we added texts such as the following:
“Goddess Athena
Athena was Zeus' favourite daughter. She was beautiful and brave, yet very wise as well. And it was absolutely
natural that she was highly esteemed by the Greeks for it was in her figure that the most important ideals of the
ancient Greek spirit, prudence and bravery, were personified. She was the goddess of wisdom, protector of cities
who ensured the peaceful life of citizens. At the same time she was the goddess of war and power, the one who
would greatly contribute to the final victory in the battlefield. It is no surprise therefore that she is often
represented triumphant (Nikephoros), with a small figure of Nike (Victory) in hand.
Athena is always depicted in full armament - helmet, spear and shield. Her symbols are the owl and the olive. The
myth of Athena's birth is quite impressive …..”.
Sample 2: The MED collection.
The MED test set is composed of a collection of 1033 documents, a set of 30 queries and,
for each query, a list (“qrels”) of all the collection documents that are relevant to it,
e.g. “qrels14” = {23,24,25,26,28,29,454,455,456,457,459,461,463,466,467,468}.
We selected all the long queries (e.g. 14, 17, 20, 25, …). For example, query 14 is described
by the following terms:
“renal amyloidosis as a complication of tuberculosis and the effects of steroids on this
condition. only the terms kidney diseases and nephrotic syndrome were selected by the
requester. prednisone and prednisolone are the only steroids of interest”.
The terms of the queries and the frequencies of the terms (in the MED collection and in the
“qrels”) were used to extract representative features (terms) for the “qrels” classes
(some of the features are shown in Table 5).
Query      Representative features (terms)
“qrel14”   Amyloidosis, Tuberculosis, Nephrotic, Prednisone, Prednisolone
“qrel25”   Diabetes, Insipidus, Chlorothiazide, Sodium
Table 5. Representative features for some classes (“qrels”) of the MED collection
We then successfully applied various techniques, some of them based on the above statistics
(term frequencies). Table 6 summarizes some interesting experimental results. In this case
we used the representative features (terms) to construct one centroid for each class. Then we
calculated the distance of every query from these centroids (see Table 6). A query is
assigned to the class whose centroid is at the minimal FIN distance from it. For
the calculations we also used a stop-words file (see [IDOM-server]) and the following
parameters for the mass function: ρ = 0, σ = 1, ν = 2, z = 0.05. It also seems that, for the MED
collection, it is better to use TfIdf instead of simple tf (term frequency).
Qrels   Q14       Q17       Q20       Q25       Q27       Q29
14      12.7087   24.8059   28.4343   9.13131   71.354    35.1403
25      17.3849   24.7493   27.8314   6.67153   71.2515   35.5123
Table 6. Examples of successful classification
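The minimum-distance rule can be illustrated with the Q14 and Q25 columns of Table 6. The dictionary layout and the function name `classify` are assumptions made for this sketch.

```python
def classify(distances):
    """Return the class label whose centroid is closest to the query."""
    return min(distances, key=distances.get)

# FIN distances of two queries from the class centroids, taken from Table 6.
q14 = {"14": 12.7087, "25": 17.3849}
q25 = {"14": 9.13131, "25": 6.67153}

label_q14 = classify(q14)
label_q25 = classify(q25)
```

Query 14 is closer to the centroid of class 14 and query 25 to that of class 25, so both are assigned to their own “qrels” class.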
Sample 3: Bilingual medical bibliographic records
We also consider various samples of bibliographic descriptions extracted from the
bilingual bibliographic databases of the Greek National Documentation Centre (NDC). The
rationale for selecting such bibliographic records is the following: each bibliographic record
(text, document) is publicly available in various formats (e.g. text, MARC XML), and NDC
hosts various databases, bibliographic or not, that cover different research topics (e.g.
medical bibliography, dissertations, financial reports and bibliography).
For example, the records of the NDC were searched using the search terms “pediatrics” and
“mastectomy”. All records without an abstract were deleted, and the documents that
have a search term in their keywords formed two classes: Pediatrics-class and Mastectomy-class.
The remaining documents were then treated as queries. All these queries
(documents) were successfully classified using the FIN-based similarity.
6. Conclusions and future work
In this work, we developed a content-based coin retrieval system using color, shape and
wavelet features, with consideration of translation, scale and rotation invariance. A coin
collection was used during the design phase to create the index to the database. In the retrieval
phase, a weighted similarity measure was used to match the query’s feature vector to those of
the index and retrieve the most similar database coins. Several experiments have been
performed, showing improved retrieval accuracy when using feature combinations. Moreover,
contrary to intuition, vector quantization of the coin database using self-organizing maps
significantly improved the system’s performance when the scaled-down and
rotated queries were corrupted by additive noise or had color alterations, especially when a
common colormap was used.
We have also concluded that the FIN-based calculation of the similarity between queries and
documents (vectors) is a novel method for solving various problems. We verified in our
experiments that, when the documents are related to different search terms,
FIN-based techniques can easily be applied and correctly calculate the distance (similarity)
between them.
We are currently considering extending the present research to distributed
processing architectures [13] and applying the aforementioned techniques in specific use
case scenarios, such as medical environments [14].
Acknowledgments
This Project is co-funded by the European Social Fund and National Resources – (EPEAEKII)-ARXIMHDHS.
References
1. Rui, Y., T.S. Huang and S.-F. Chang, “Image Retrieval: Current Techniques, Promising
Directions and Open Issues”, J. of Visual Communication and Image Representation, Vol. 10,
pp. 1-23, 1999.
2. Faloutsos, C. and D. Oard, “A Survey of Information Retrieval and Filtering Methods”,
Technical Report, Dept. of Computer Science, Univ. of Maryland, College Park, No. CS-TR-3514, Aug. 1995.
3. Veltkamp, R.C., “Content-Based Image Retrieval Systems: A Survey”, Revision of Tech.
Rep. UU-CS-2000-34, Dept. of Computer Science, Utrecht University, 2002.
4. Eakins, J.P. and M.E. Graham, “Content-Based Image Retrieval”, Tech. Rep. JTAP-039,
JISC Technology Application Program, Newcastle upon Tyne, 2000.
5. Kaburlasos V. G., Towards a Unified Modeling and Knowledge-Representation based on
Lattice Theory - Computational Intelligence and Soft Computing Applications, Springer-Verlag, Summer 2006
6. T. Alevisos, V.G. Kaburlasos, S. Papadakis, C. Skourlas, P. Belsis, Fuzzy Interval Number
Techniques for Multilingual and cross Language Information Retrieval. International
Conference on Enterprise Information Systems ICEIS 2007, Madeira Portugal
7. Ballard, D.H. and C.M. Brown, Computer Vision, Prentice Hall, Englewood Cliffs, N.J.,
1982.
8. Kohonen, T., Self-Organizing Maps, Springer, Heidelberg, 1995.
9. Vassilas N., Skourlas C., Content-based coin retrieval using invariant features and self
organizing maps, ICANN06, Athens, 2006
10. Skourlas C, Alevizos T, Belsis P, Fragos K., V. G Kaburlasos, Papadakis S. Fuzzy
Interval Numbers (FINs) techniques and its applications in Natural Language Queries
Processing and documents classification, BCI 2007, accepted
11. Glasgow IDOM server–IRCollections, http://ir.dcs.gla.ac.uk/resources/test_collections//
12. Greek National Documentation Centre: www.ekt.gr
13. P. Belsis, Challenges and Potential Solutions for Secure and Efficient Knowledge
Leveraging in Coalitions, eJETA: The electronic Journal for E-Commerce Tools &
Applications, Vol. 2, No. 1, 2006
14. Gritzalis S., Belsis P., Katsikas S. “Interconnecting autonomous medical domains: a
security perspective” IEEE Engineering in Medicine and Biology Magazine (IEEE EMBS)
Theme Issue on: "Image, Signal and Distributed Data Processing for Networked eHealth
Applications" (accepted)