References - Imagelab - Università degli studi di Modena e Reggio

advertisement
Describing Texture Directions with Von Mises Distributions
Costantino Grana, Daniele Borghesani, Rita Cucchiara
Dipartimento di Ingegneria dell’Informazione
Università degli Studi di Modena e Reggio Emilia
name.surname@unimore.it
Abstract
A new approach for document analysis has been
proposed. The goal is to find a coherent way to
describe texture within generic documents, in order to
classify text (mainly), background and images.
Autocorrelation matrixes has been computed, and an
elegant description through mixture of Von Mises
distributions has been implemented.
1. Introduction
Document image analysis has a quite long story in
pattern recognition. Several techniques has been
proposed for content and layout segmentation to
provide the basis for semantic annotation,
classification and retrieval: for the implementation
over a large collection of digital documents, the
accuracy of the analysis and the computational effort
required are both significant. A specific use case are
ancient books or illuminated manuscripts, that cannot
be flipped through by the public due to their value and
delicacy. Computer science has the power to fill the
gap between people and all these precious libraries of
masterpieces: in fact digital versions of the artistic
works can be publicly accessible, either locally or
remotely, giving the user the freedom to choose his
personal way to navigate and enjoy it.
The quality of images of illuminated manuscripts
heavily depends on the way they have been acquired
or the preservation status of the work. Small rotations
or scaling can occur, pages can be spoiled, grayscale
or low quality acquisition is also possible, generally
resulting in a set of noisy textures. Moreover different
manuscripts have different contents and layout. For all
these reasons, a simple approach based on color, shape
or layout would not be effective enough for a large
scale implementation. In this paper, a flexible
approach to the problem is proposed based on texture
analysis. Images are analyzed by blocks through
autocorrelation, and a direction histogram is computed
for each block. Then we described it with a mixture of
Von Mises distribution (MoVM), a statistical
formulation that is more suitable for angular data than
a mixture of Gaussians. We implemented an EM
algorithm for parameters extraction and finally we
exploited Support Vector Machine (SVM) for block
classification. The goal is to provide a very fast and
compact representation for each class, in order to
make the retrieval as fast and effective as possible.
The test set used for our experiments is composed
by illuminated manuscripts provided by the Franco
Cosimo Panini S.P., precisely the photo collection of
The Borso D’Este Holy Bible.
2. Related works
Document segmentation is normally based on
partitioning of image in blocks and then texture
analysis. Several works for text segmentation has been
proposed: a clustering approach is presented in [1],
while in [2] a classification using Gabor filters has
been used. A comprehensive survey is proposed by
Busch et.al. [3] exploring and comparing several
techniques and situations. More general approaches
dealing also with background and pictures
segmentation have been proposed. Some works exploit
geometric constraints over the layout: for a literature
survey, please refer to [4]. Many others compute
specific descriptors followed by classification: an
example is provided by [5] with hidden tree Markov
models.
The major of works have been developed for
printed documents, while on illuminated manuscripts
only a limited set of works has been carried out. A
reference paper is the description of the DEBORA
system [6], which consists of a complete system for
analysis of Renaissance. In [7] an effective technique
for texture characterization in old books has been
proposed, exploiting the autocorrelation matrix in
order to extract the relevant directions within the
texture (called directional rose). In out work, we
extended this approach formulating a more elegant
Figure 1. Example of illuminated manuscripts and relative ground truth. White identifies
background, red identifies text and blue identifies pictures or decorations.
description of the different directions within the block
with the use of mixtures of Von Mises distributions.
3. Texture analysis
Documents are mainly characterized by three kind
of textures: a noisy background, the text and colored
images or decorations (Fig. 1). The quality of these
textures heavily depends on the way the image has
been acquired. Moreover each document can have
various contents and layout. For all these reasons, a
simple approach based on color, shape or layout would
not be effective enough for a large scale
implementation. A flexible approach to the problem is
necessary, so we chose to look at the texture structure
using a well known texture feature, the autocorrelation
matrix.
Autocorrelation is a very powerful and
discriminative feature in our context because textual
textures have a pronounced orientation, that heavily
differs from background, pictures or decorations. The
autocorrelation function is a typical signal processing
technique. Formally, it is a cross correlation of a signal
with itself, and it represents a measure of similarity
between two signals. Once applied to a grayscale
image, it produces a central symmetry matrix, that
gives an idea of how regular the texture is.
The image is divided into square blocks whose size
bs must be set according to the scale at which the
texture should be analyzed. The definition of the
autocorrelation for a block is:
C k, l  
bs 1 min  0, l  bs 1 min  0, k 

y  max  0, l 

x  max  0, k 
I  x, y   I  x  k , y  l  (1)
where l and k are defined in   bs 2, bs 2 .
The result of the autocorrelation can be analyzed
extracting an estimate of the relevant directions within
the texture (a similar approach has been proposed in
[7]). Each angle determines a direction, and the sum of
all the pixels along each direction is computed to form
a polar representation of the autocorrelation matrix,
called direction histogram. In this way, each direction
will be characterized by a weight, indicating its
importance within the block.
w   

r 0, bs 2 
C  r cos  , r sin   .
(2)
Since the autocorrelation matrix has a central
symmetry by definition, we consider only the first half
of the direction histogram, from 0° to 179°.  and r
are quantized: the step of  is set to 1 degree (so we
obtain 180 values, representing 180 possible
directions), r is defined as 2 bs of the block size. A
text block will be characterized by peaks around 0°
and 180° because of the dominant direction is
horizontal, and this behavior is different from image
textures (described by a generic monomodal or
multimodal distribution) and background texture
(described by a nearly uniform flat distribution).
4. Texture characterization
The polar distribution obtained by autocorrelation
in the previous steps can be easily modeled using Von
Mises distributions. Gaussian distributions are
inappropriate to model periodic datasets: setting the
origin in 0°, elements crossing this angle will be
classified into two distinct directions, even if they
express almost the same one. The choice of the origin
becomes very critical, and for this reason could be a
weak point of the fit. Instead, a Von Mises distribution
is circularly defined so it can correctly represent
angular datasets. The probability density function is
defined as follows:


V  | ,m 
1
2 I 0  m 
e

m cos  

.
(3)
8
0.5
8
0.5
7
0.45
7
0.45
0.4
6
0.35
5
0.3
4
0.25
3
0.2
0.15
2
0.1
0.4
6
0.35
5
4
3
0.2
0.5
7
0.45
0.4
6
0.35
0.3
5
0.3
0.25
4
0.25
0.2
3
0.15
2
8
0.1
0.15
2
0.1
1
0.05
1
0.05
0
0
0
0
0
0
1
10
19
28
37
46
55
64
73
82
91
100
109
118
127
136
145
154
163
172
1
10
19
28
37
46
55
64
73
82
91
100
109
118
127
136
145
154
163
172
0.05
1
10
19
28
37
46
55
64
73
82
91
100
109
118
127
136
145
154
163
172
1
Figure 2. Example of directional histograms and the corresponding fitting with Von Mises
mixtures.
The parameter m denotes how concentrate the
distribution is around the mean angle  . In our
context, we used a slightly different formulation (we
simply multiply the angles by 2) with a periodicity of
 instead of 2 , considering only angles in 0,  
representative for valuable and meaningful directions.
I0 is the modified order 0 Bessel function, and is
defined as:
1
I0  m  
2
2
e
m cos 
d .
(4)
  sV  |  s , ms 


(5)
where  k represents a weight of the distribution
within the mixture. An optimal way to get the
maximum likelihood estimates of the mixture
parameters
is
the
Expectation-Maximization
algorithm [8]. In the E step the expected values for the
likelihood are computed, then a set of parameters to
maximize such values are obtained, repeating the
process until convergence or maximum number of
iterations is reached. To maximize the likelihood, a set
of responsibilities of the bins for each Von Mises is
necessary. Let  be the index of the bin. The
responsibilities are computed as follows:
.
(6)
s 1
A new set of weights for the Von Mises of the
mixture can now be computed:
k 

  0, 
w   k


 0,
M     kV  | k , mk ,
k 1
 kV  |  k , mk 
K
0
To catch the general multimodal behavior of input
datasets, we chose a mixture of Von Mises
distributions. We used mixtures with 2 components
only, because they proved to be sufficient in order to
recognize the two most meaningful directions
(horizontal and vertical) while keeping an affordable
computational cost. An example of fitting for the three
types of texture analyzed is shown in Fig. 2.
Generally, a mixture of K Von Mises distributions
is defined as follows:
K
k 

w
.
(7)
This formulation differs from the one in [8], and the
motivation lies on the dataset we used: we do not have
a general distribution of angular data to fit, but a
sampling of directions and relative weights. For this
reason, we consider the weight as a multiplier value
for each angle, so formally we have w times the
angle theta in our dataset.
In the M step, we compute the new  and m values
for each Von Mises within the mixture. In particular,
 is computed by maximization of the relative
likelihood as follows:
  w   k sin 2
  0, 
 k  arctan 
  w   k cos 2
  0, 


.


(8)
Note the multiplication by 2 in order to relate to a
 periodicity. The retrieval of m by maximization is a
bit more complicate, due to the presence of the Bessel
functions. Given the derivative of the modified Bessel
function I1 defined as:
I1  m  
1
2
2
e
m cos 
cos  d ,
(9)
0
the problem could be mathematically solved using this
formulation:


 m 2  m 2  2m m cos 2    
1
2
1 2
1
2 
1
I 0 
(10)

I 0  m1  I 0  m2  
2



been manually annotated, focusing on the three main
w   k cos    k

characteristics of these images: text, images and
I  m   0, 
background. Images have been divided using different
. (11)
A  mk   1 k 
I 0  mk 
w   k

fixed window sizes, then a suitable single window size
  0, 
has been chosen. For each window (block) over the
image, the direction histogram has been computed. A
The value of mk can be found by the numerical
preprocessing stage is necessary to highlight the real
inversion of A  mk  . In particular we use the
shape of the distribution, besides the numerical value
approximation proposed in [9].
assumed by bins. A simple normalization approach has
At this point, we have 6 parameters to play with:
the major drawback of exalting the shape, and thus the
 k ,  k , and mk of both Von Mises. This represents a
noise in low variability data.
very consistent and compact way to describe a whole
In order to represent the data distribution
distribution, making the retrieval faster and effective.
coherently with the real shape distribution, we chose
The similarity between two Von Mises distributions
to change the baseline reference of values, subtracting
can be defined using the Bhattacharyya distance.
to each bin the minimum of the entire block. In this
Given two Von Mises distributions V1 and V2, the
way, a noisy block like the background will show a
formulation is shown in Eq. 10. No explicit form is
nearly flat distribution with a very small variance.
available for mixtures, so we propose a new metric
Instead a text block, with a dominant direction over
that also takes into account the relative weights of the
the horizontal axis, will continue to show a coherent
components of the mixture. Given two mixture
distribution, with peaks near 0° and 180°.
2
i
i
i
i
distributions M     k 1 kV  |  k , mk , we test
Moreover, we would like to keep the same scale in
all
blocks, so that the shape can be compared. For this
the Bhattacharyya distance between the two
reason
the direction histogram bins have been
components of one distribution and the two of the
quantized to a fixed step. The normalized distribution
other, select the best matching two (call them b, and
has been used to fit a mixture of Von Mises
the other two o) and then measure the distance as:
distributions.
WBb  WBo
In our experiments, we tested mixtures with
1
2
d M ,M  1 2
(12)
different numbers of Von Mises distributions, and we
 b b   o1 o2 ,
observed that even a very limited set is sufficient to
where
produce good retrieval results in terms of precision
and recall, without affecting too much the
1
2
1 2
1
2
WBx  M , M    x x B Vx ,Vx 
(13)
computational time.
To perform a first evaluation of the characterization
This metric takes into account the fact that two
proposed, a confusion matrix has been computed using
components can be very similar, but their contribution
a 1-nearest neighbor approach. The training set was
to the mixtures is quite low.
analyzed, then a test set was classified: for each block
within the test set, the classification of the most similar
5. Experimental results
block within the training set has been chosen as
classification of the block. The result are shown in
Tests have been performed using 20 uncompressed
Table 1. Results were quite promising: this feature has
24bit high definition pictures of biblical illustrated
a good discriminative power with all three kinds of
manuscripts, provided in high resolution (8373x6039)
texture used.
and with 400dpi. For each image, a ground truth has
B V1 ,V2   1 






text background image
recall
recall
precision
text
431
5
54
0.879592
text
0.931183
0.911579
background
7
446
172
0.7136
background
0.854086
0.87976
image
27
63
631
0.875173
image
0.826138
0.898477
Table 1. SVM classification using radial basis
function as kernel.
Table 2. Confusion Matrix relative to a 1nearest neighbor classification.
more than 1800 blocks can be processed in about 100
seconds on a standard PC. The result of the feature
extraction provides a very compact description of
blocks. For this reason, these features can be easily
exploited by a content-based image retrieval system:
the computational time needed for comparison and
searching will be extremely low.
Finally we thank the Franco Cosimo Panini S.P.A
that give us the possibility to analyze an invaluable
pieces of art such as The Borso D’Este Holy Bible for
this work.
References
Figure 3. Visualization of the proposed
classification. A filtering technique has been
applied to clean out the results.
In order to produce a more generic classifier, we
implemented the SVM classification. The best results
(in terms of recall and precision) have been obtained
exploiting the radial basis kernel. The results of this
classification are shown in Table 2, and a visual
representation of the classification is shown in Fig. 3.
The text is the main focus of this feature, because of
the typical horizontal orientation, and these
experiments reveal a good discriminative power for
this kind of texture. The background either has a very
peculiar characteristics that makes this feature suitable
for retrieval: a flat distribution is far more different
than the monomodal distribution of the text. Instead
this feature is not designed to be effective for image
classification: in practice, pictures present a generic
multimodal distribution, occasionally they show a
particular symmetry but it is not representative of the
entire class. For this reason, the good classification
results in this case have to be considered as a side
effect of the limited number of Von Mises within the
distribution: the algorithm tends to classify every
image with a two-dimensional distribution (horizontal
and vertical orientations), that have proved to have a
sufficiently pronounced characteristic respect of the
other two.
6. Conclusions
In this paper a new technique for document analysis
is presented, characterizing each texture with a 6vector feature. The autocorrelation computation and
the fitting of the direction histogram with the mixture
of Von Mises distributions has a reasonable processing
time for this application: a high resolution page with
[1] Bres, S.; Eglin, W.; Gagneux, A., "Unsupervised
clustering of text entities in heterogeneous grey level
documents," Pattern Recognition, 2002. Proceedings.
16th International Conference on , vol.3, no., pp. 224227 vol.3, 2002
[2] A. K. Jain and S. Bhattacharjee, "Text segmentation
using Gabor filters for automatic document processing,"
Machine Vision and Applications, vol. 5, no. 3, pp.
169--184, 1992.
[3] Busch, A.; Boles, W.W.; Sridharan, S., "Texture for
script identification," Pattern Analysis and Machine
Intelligence, IEEE Transactions on , vol.27, no.11, pp.
1720-1732, Nov. 2005
[4] Mao, S., Rosenfeld, A., Kanungo, T.: Document
structure analysis algorithms: a literature survey. Proc.
SPIE Electronic Imaging 5010 (2003) 197–207
[5] Diligenti, M.; Frasconi, P.; Gori, M., "Hidden tree
Markov models for document image classification,"
Pattern Analysis and Machine Intelligence, IEEE
Transactions on , vol.25, no.4, pp. 519-523, April 2003
[6] F. Le Bourgeois, H. Emptoz, “DEBORA: Digital
AccEss to BOoks of the RenAissance”, in International
Journal on Document Analysis and Recognition, vol. 9,
n. 2-4, pp. 193-221, 2007
[7] Journet, N.; Eglin, V.; Ramel, J.Y.; Mullot, R.,
"Dedicated texture based tools for characterisation of
old books," Document Image Analysis for Libraries,
2006. DIAL '06. Second International Conference on ,
vol., no., pp. 10 pp.-, 27-28 April 2006
[8] A. Prati, S. Calderara, R. Cucchiara, “Using Circular
Statistics for Trajectory Analysis” in Proceedings of
International Conference on Computer Vision and
Pattern Recognition (CVPR 2008), Anchorage, Alaska
(USA), June 24-26, 2008
[9] G.W. Hill, “Evaluation and Inversion of the Ratios of
Modified Bessel Functions, I 0 ( x) I1 ( x) and
I1.5 ( x) I0.5 ( x) , ACM Transactions on Mathematical
”
Software, vol. 7, n. 2, June 1981, pp. 199-208
Download