Describing Texture Directions with Von Mises Distributions

Costantino Grana, Daniele Borghesani, Rita Cucchiara
Dipartimento di Ingegneria dell’Informazione
Università degli Studi di Modena e Reggio Emilia
name.surname@unimore.it

Abstract

A new approach to document analysis is proposed. The goal is to find a coherent way to describe texture within generic documents, in order to classify text (mainly), background and images. Autocorrelation matrices are computed, and an elegant description through mixtures of Von Mises distributions is implemented.

1. Introduction

Document image analysis has a long history in pattern recognition. Several techniques have been proposed for content and layout segmentation to provide the basis for semantic annotation, classification and retrieval. When they are applied to a large collection of digital documents, both the accuracy of the analysis and the computational effort required are significant.

A specific use case is that of ancient books and illuminated manuscripts, which cannot be flipped through by the public because of their value and delicacy. Computer science can fill the gap between people and these precious libraries of masterpieces: digital versions of the artistic works can be made publicly accessible, either locally or remotely, giving users the freedom to choose their own way to navigate and enjoy them.

The quality of images of illuminated manuscripts heavily depends on the way they have been acquired and on the preservation status of the work. Small rotations or scaling can occur, pages can be spoiled, and grayscale or low quality acquisition is also possible, generally resulting in a set of noisy textures. Moreover, different manuscripts have different contents and layouts. For all these reasons, a simple approach based on color, shape or layout would not be effective enough for a large scale implementation.
In this paper, a flexible approach to the problem is proposed, based on texture analysis. Images are analyzed by blocks through autocorrelation, and a direction histogram is computed for each block. The histogram is then described with a mixture of Von Mises distributions (MoVM), a statistical formulation that is more suitable for angular data than a mixture of Gaussians. We implemented an EM algorithm for parameter estimation, and finally we used a Support Vector Machine (SVM) for block classification. The goal is to provide a very fast and compact representation for each class, in order to make retrieval as fast and effective as possible. The test set used for our experiments is composed of illuminated manuscripts provided by Franco Cosimo Panini S.P.A., specifically the photo collection of The Borso D’Este Holy Bible.

2. Related works

Document segmentation is normally based on partitioning the image into blocks, followed by texture analysis. Several works on text segmentation have been proposed: a clustering approach is presented in [1], while in [2] a classification using Gabor filters is used. A comprehensive survey is given by Busch et al. [3], exploring and comparing several techniques and situations.

More general approaches that also deal with background and picture segmentation have been proposed. Some works exploit geometric constraints on the layout: for a literature survey, please refer to [4]. Many others compute specific descriptors followed by classification: an example is provided by [5] with hidden tree Markov models.

The majority of works have been developed for printed documents, while only a limited set of works has addressed illuminated manuscripts. A reference paper is the description of the DEBORA system [6], a complete system for the analysis of Renaissance books.
In [7] an effective technique for texture characterization in old books has been proposed, exploiting the autocorrelation matrix in order to extract the relevant directions within the texture (called directional rose). In our work, we extended this approach by formulating a more elegant description of the different directions within the block through mixtures of Von Mises distributions.

Figure 1. Example of illuminated manuscripts and relative ground truth. White identifies background, red identifies text and blue identifies pictures or decorations.

3. Texture analysis

Documents are mainly characterized by three kinds of texture: a noisy background, the text, and colored images or decorations (Fig. 1). The quality of these textures heavily depends on the way the image has been acquired. Moreover, each document can have various contents and layouts. For all these reasons, a simple approach based on color, shape or layout would not be effective enough for a large scale implementation. A flexible approach to the problem is necessary, so we chose to look at the texture structure using a well known texture feature, the autocorrelation matrix. Autocorrelation is a very powerful and discriminative feature in our context, because textual textures have a pronounced orientation that differs heavily from that of background, pictures or decorations.

The autocorrelation function is a typical signal processing technique. Formally, it is the cross-correlation of a signal with itself, where cross-correlation is a measure of similarity between two signals. Applied to a grayscale image, it produces a centrally symmetric matrix that gives an idea of how regular the texture is. The image is divided into square blocks whose size $bs$ must be set according to the scale at which the texture should be analyzed. The autocorrelation of a block is defined as:

$$C(k, l) = \sum_{y=\max(0,-l)}^{bs-1+\min(0,-l)} \; \sum_{x=\max(0,-k)}^{bs-1+\min(0,-k)} I(x, y)\, I(x+k,\, y+l) \qquad (1)$$

where $l$ and $k$ are defined in $[-bs/2,\, bs/2]$.
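As an illustration of Eq. 1, the following minimal Python sketch computes the autocorrelation of a single block with plain nested loops. The function name `block_autocorrelation`, the list-of-rows layout (`block[y][x]`) and the dictionary return type are assumptions made for this example only; a practical implementation would more likely use a vectorized or FFT-based routine.

```python
def block_autocorrelation(block):
    """Autocorrelation C(k, l) of a square block, following Eq. 1.

    `block` is a list of bs rows, each a list of bs grey levels,
    indexed as block[y][x] (hypothetical layout chosen for this sketch).
    Returns a dict mapping (k, l) to C(k, l) for k, l in [-bs//2, bs//2].
    """
    bs = len(block)
    half = bs // 2
    C = {}
    for l in range(-half, half + 1):
        for k in range(-half, half + 1):
            total = 0
            # The limits keep both (x, y) and (x + k, y + l) inside the block.
            for y in range(max(0, -l), bs + min(0, -l)):
                for x in range(max(0, -k), bs + min(0, -k)):
                    total += block[y][x] * block[y + l][x + k]
            C[(k, l)] = total
    return C


# Toy block made of horizontal stripes, loosely imitating lines of text.
stripes = [[1, 1, 1, 1],
           [0, 0, 0, 0],
           [1, 1, 1, 1],
           [0, 0, 0, 0]]
C = block_autocorrelation(stripes)
```

On this toy block a horizontal shift of one pixel keeps a high correlation, while a vertical shift of one pixel gives zero, and the matrix satisfies C(k, l) = C(-k, -l), which is the central symmetry mentioned above.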
The result of the autocorrelation can be analyzed by extracting an estimate of the relevant directions within the texture (a similar approach has been proposed in [7]). Each angle $\theta$ determines a direction, and the sum of all the pixels along each direction is computed to form a polar representation of the autocorrelation matrix, called the direction histogram. In this way, each direction is characterized by a weight indicating its importance within the block:

$$w(\theta) = \sum_{r \in (0,\, bs/2]} C(r\cos\theta,\; r\sin\theta) \qquad (2)$$

Since the autocorrelation matrix is centrally symmetric by definition, we consider only the first half of the direction histogram, from 0° to 179°. Both $\theta$ and $r$ are quantized: the step of $\theta$ is set to 1 degree (so we obtain 180 values, representing 180 possible directions), while $r$ ranges up to $bs/2$, half of the block size. A text block is characterized by peaks around 0° and 180° because its dominant direction is horizontal; this behavior differs from that of image textures (described by a generic monomodal or multimodal distribution) and of background texture (described by a nearly uniform, flat distribution).

4. Texture characterization

The polar distribution obtained by autocorrelation in the previous steps can be easily modeled using Von Mises distributions. Gaussian distributions are inappropriate for modeling periodic data: with the origin set at 0°, elements on either side of this angle are assigned to two distinct directions, even if they express almost the same one. The choice of the origin becomes critical and, for this reason, could be a weak point of the fit. A Von Mises distribution, instead, is defined on the circle, so it can correctly represent angular data. Its probability density function is defined as follows:

$$V(\theta \mid \theta_0, m) = \frac{1}{2\pi I_0(m)}\, e^{m \cos(\theta - \theta_0)} \qquad (3)$$
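The direction histogram of Eq. 2 and the density of Eq. 3 can be sketched in Python as follows. This is only an illustrative sketch: the function names, the nearest-neighbour sampling of each ray, and the power-series evaluation of the Bessel function $I_0$ are assumptions of this example, not details stated in the text.

```python
import math


def direction_histogram(C, bs, n_angles=180):
    """Direction histogram w(theta) of Eq. 2.

    C maps integer offsets (k, l) to autocorrelation values and must
    cover k in [-bs//2, bs//2] and l in [0, bs//2].
    """
    half = bs // 2
    hist = []
    for a in range(n_angles):
        theta = math.pi * a / n_angles
        total = 0.0
        for r in range(1, half + 1):
            # Assumption: each ray is sampled at the nearest integer offset.
            k = int(round(r * math.cos(theta)))
            l = int(round(r * math.sin(theta)))
            total += C[(k, l)]
        hist.append(total)
    return hist


def bessel_i0(m):
    """Modified Bessel function of order 0, summed from its power series."""
    term = total = 1.0
    q = (m / 2.0) ** 2
    j = 0
    while term > 1e-17 * total:
        j += 1
        term *= q / (j * j)
        total += term
    return total


def von_mises_pdf(theta, theta0, m):
    """Von Mises density of Eq. 3 (period 2*pi, angles in radians)."""
    return math.exp(m * math.cos(theta - theta0)) / (2.0 * math.pi * bessel_i0(m))


# Angles on either side of the origin receive the same density,
# which is the property a Gaussian fit lacks.
left = von_mises_pdf(math.radians(359.0), 0.0, 2.0)
right = von_mises_pdf(math.radians(1.0), 0.0, 2.0)
```

The last two lines illustrate the motivation given above: 359° and 1° are equally close to a mean direction of 0°, and the circular density treats them identically.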
Figure 2. Example of directional histograms and the corresponding fitting with Von Mises mixtures.

The parameter $m$ denotes how concentrated the distribution is around the mean angle $\theta_0$. In our context, we used a slightly different formulation (we simply multiply the angles by 2) with a periodicity of $\pi$ instead of $2\pi$, considering only angles in $[0, \pi)$, which are representative of the valuable and meaningful directions. $I_0$ is the modified Bessel function of order 0, defined as:

$$I_0(m) = \frac{1}{2\pi} \int_0^{2\pi} e^{m \cos\theta}\, d\theta \qquad (4)$$

To capture the general multimodal behavior of the input data, we chose a mixture of Von Mises distributions. We used mixtures with 2 components only, because they proved sufficient to recognize the two most meaningful directions (horizontal and vertical) while keeping the computational cost affordable. An example of fitting for the three types of texture analyzed is shown in Fig. 2.

In general, a mixture of $K$ Von Mises distributions is defined as follows:

$$M(\theta) = \sum_{k=1}^{K} \pi_k\, V(\theta \mid \theta_k, m_k) \qquad (5)$$

where $\pi_k$ represents the weight of the $k$-th distribution within the mixture. An optimal way to obtain the maximum likelihood estimates of the mixture parameters is the Expectation-Maximization algorithm [8]. In the E step the expected values of the likelihood are computed; then a set of parameters that maximizes these values is obtained, and the process is repeated until convergence or until a maximum number of iterations is reached. To maximize the likelihood, a set of responsibilities of the bins for each Von Mises component is necessary. Let $\theta$ be the index of the bin. The responsibilities are computed as follows:

$$\gamma_{\theta k} = \frac{\pi_k\, V(\theta \mid \theta_k, m_k)}{\sum_{s=1}^{K} \pi_s\, V(\theta \mid \theta_s, m_s)} \qquad (6)$$

A new set of weights for the Von Mises components of the mixture can now be computed:

$$\pi_k = \frac{\sum_{\theta \in [0,\pi)} w(\theta)\, \gamma_{\theta k}}{\sum_{\theta \in [0,\pi)} w(\theta)} \qquad (7)$$

This formulation differs from the one in [8], and the motivation lies in the dataset we used: we do not have a general distribution of angular data to fit, but a sampling of directions and their relative weights. For this reason, we consider the weight as a multiplier for each angle, so that, formally, the angle $\theta$ appears $w(\theta)$ times in our dataset.

In the M step, we compute the new $\theta_k$ and $m_k$ values for each Von Mises component within the mixture. In particular, $\theta_k$ is computed by maximization of the relative likelihood as follows:

$$\theta_k = \frac{1}{2} \arctan \frac{\sum_{\theta \in [0,\pi)} w(\theta)\, \gamma_{\theta k} \sin 2\theta}{\sum_{\theta \in [0,\pi)} w(\theta)\, \gamma_{\theta k} \cos 2\theta} \qquad (8)$$

Note the multiplication of the angles by 2, which accounts for the $\pi$ periodicity; the factor 1/2 maps the doubled angle back to a direction. The retrieval of $m_k$ by maximization is a bit more complicated, due to the presence of the Bessel functions. Given $I_1$, the derivative of the modified Bessel function $I_0$, defined as:

$$I_1(m) = \frac{1}{2\pi} \int_0^{2\pi} e^{m \cos\theta} \cos\theta\, d\theta \qquad (9)$$

the problem can be solved using the following formulation:

$$A(m_k) = \frac{I_1(m_k)}{I_0(m_k)} = \frac{\sum_{\theta \in [0,\pi)} w(\theta)\, \gamma_{\theta k} \cos\big(2(\theta - \theta_k)\big)}{\sum_{\theta \in [0,\pi)} w(\theta)\, \gamma_{\theta k}} \qquad (11)$$

The value of $m_k$ can be found by numerical inversion of $A(m_k)$. In particular, we use the approximation proposed in [9].

At this point, we have 6 parameters to play with: $\pi_k$, $\theta_k$ and $m_k$ of both Von Mises components. This represents a very consistent and compact way to describe a whole distribution, making retrieval faster and more effective.

The similarity between two Von Mises distributions can be defined using the Bhattacharyya distance. Given two Von Mises distributions $V_1$ and $V_2$, the formulation is shown in Eq. 10:

$$B(V_1, V_2) = \sqrt{1 - \frac{1}{\sqrt{I_0(m_1)\, I_0(m_2)}}\; I_0\!\left(\frac{\sqrt{m_1^2 + m_2^2 + 2 m_1 m_2 \cos\big(2(\theta_1 - \theta_2)\big)}}{2}\right)} \qquad (10)$$

No explicit form is available for mixtures, so we propose a new metric that also takes into account the relative weights of the components of the mixture. Given two mixture distributions $M^i = \sum_{k=1}^{2} \pi_k^i\, V(\theta \mid \theta_k^i, m_k^i)$, we test the Bhattacharyya distance between the two components of one distribution and the two of the other, select the best matching two (call them $b$, and the other two $o$) and then measure the distance as:

$$d(M^1, M^2) = \frac{WB_b + WB_o}{\pi_b^1 \pi_b^2 + \pi_o^1 \pi_o^2} \qquad (12)$$

where

$$WB_x(M^1, M^2) = \pi_x^1\, \pi_x^2\, B(V_x^1, V_x^2) \qquad (13)$$

This metric takes into account the fact that two components can be very similar while their contribution to the mixtures is quite low.

5. Experimental results

Tests have been performed using 20 uncompressed 24-bit high definition pictures of biblical illustrated manuscripts, provided at high resolution (8373x6039 pixels) and 400 dpi. For each image, a ground truth has been manually annotated, focusing on the three main characteristics of these images: text, images and background. Images have been divided using different fixed window sizes, then a suitable single window size has been chosen. For each window (block) over the image, the direction histogram has been computed.

A preprocessing stage is necessary to highlight the real shape of the distribution, besides the numerical value assumed by the bins. A simple normalization approach has the major drawback of emphasizing the shape, and thus the noise, in low-variability data. In order to represent the data distribution coherently with its real shape, we chose to change the baseline reference of the values, subtracting from each bin the minimum over the entire block. In this way, a noisy block like the background will show a nearly flat distribution with a very small variance. A text block, instead, with a dominant direction along the horizontal axis, will continue to show a coherent distribution, with peaks near 0° and 180°. Moreover, we would like to keep the same scale in all blocks, so that the shapes can be compared. For this reason the direction histogram bins have been quantized to a fixed step. The normalized distribution has been used to fit a mixture of Von Mises distributions.

In our experiments, we tested mixtures with different numbers of Von Mises distributions, and we observed that even a very limited set is sufficient to produce good retrieval results in terms of precision and recall, without affecting the computational time too much.

To perform a first evaluation of the proposed characterization, a confusion matrix has been computed using a 1-nearest neighbor approach. The training set was analyzed, then a test set was classified: for each block within the test set, the class of the most similar block within the training set has been chosen as the classification of the block. The results are shown in Table 1. Results were quite promising: this feature has a good discriminative power with all three kinds of texture used.

              text   background   image   recall
text           431        5         54    0.879592
background       7      446        172    0.7136
image           27       63        631    0.875173

Table 1. Confusion matrix relative to a 1-nearest neighbor classification.

In order to produce a more generic classifier, we implemented the SVM classification. The best results (in terms of recall and precision) have been obtained using the radial basis kernel. The results of this classification are shown in Table 2, and a visual representation of the classification is shown in Fig. 3.

              recall     precision
text          0.931183   0.911579
background    0.854086   0.87976
image         0.826138   0.898477

Table 2. SVM classification using radial basis function as kernel.

Figure 3. Visualization of the proposed classification. A filtering technique has been applied to clean out the results.

The text is the main focus of this feature, because of its typical horizontal orientation, and these experiments reveal a good discriminative power for this kind of texture. The background also has very peculiar characteristics that make this feature suitable for retrieval: a flat distribution is very different from the monomodal distribution of the text. This feature, instead, is not designed to be effective for image classification: in practice, pictures present a generic multimodal distribution; occasionally they show a particular symmetry, but it is not representative of the entire class. For this reason, the good classification results in this case have to be considered a side effect of the limited number of Von Mises components within the mixture: the algorithm tends to describe every image with a two-component distribution (horizontal and vertical orientations), which has proved to be sufficiently distinctive with respect to the other two classes.

6. Conclusions

In this paper a new technique for document analysis is presented, characterizing each texture with a 6-dimensional feature vector. The autocorrelation computation and the fitting of the direction histogram with the mixture of Von Mises distributions have a reasonable processing time for this application: a high resolution page with more than 1800 blocks can be processed in about 100 seconds on a standard PC. The result of the feature extraction provides a very compact description of blocks. For this reason, these features can be easily exploited by a content-based image retrieval system: the computational time needed for comparison and searching will be extremely low.

Finally, we thank Franco Cosimo Panini S.P.A., which gave us the possibility to analyze an invaluable piece of art such as The Borso D’Este Holy Bible for this work.

References

[1] S. Bres, W. Eglin, A. Gagneux, "Unsupervised clustering of text entities in heterogeneous grey level documents," Proceedings of the 16th International Conference on Pattern Recognition, vol. 3, pp. 224-227, 2002.
[2] A. K. Jain, S. Bhattacharjee, "Text segmentation using Gabor filters for automatic document processing," Machine Vision and Applications, vol. 5, no. 3, pp. 169-184, 1992.
[3] A. Busch, W.W. Boles, S. Sridharan, "Texture for script identification," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 11, pp. 1720-1732, Nov. 2005.
[4] S. Mao, A. Rosenfeld, T. Kanungo, "Document structure analysis algorithms: a literature survey," Proc. SPIE Electronic Imaging, vol. 5010, pp. 197-207, 2003.
[5] M. Diligenti, P. Frasconi, M. Gori, "Hidden tree Markov models for document image classification," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 4, pp. 519-523, April 2003.
[6] F. Le Bourgeois, H. Emptoz, "DEBORA: Digital AccEss to BOoks of the RenAissance," International Journal on Document Analysis and Recognition, vol. 9, no. 2-4, pp. 193-221, 2007.
[7] N. Journet, V. Eglin, J.Y. Ramel, R. Mullot, "Dedicated texture based tools for characterisation of old books," Second International Conference on Document Image Analysis for Libraries (DIAL '06), 10 pp., 27-28 April 2006.
[8] A. Prati, S. Calderara, R. Cucchiara, "Using Circular Statistics for Trajectory Analysis," in Proceedings of International Conference on Computer Vision and Pattern Recognition (CVPR 2008), Anchorage, Alaska (USA), June 24-26, 2008.
[9] G.W. Hill, "Evaluation and Inversion of the Ratios of Modified Bessel Functions, I1(x)/I0(x) and I1.5(x)/I0.5(x)," ACM Transactions on Mathematical Software, vol. 7, no. 2, pp. 199-208, June 1981.
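As a supplementary sketch, the EM updates of Section 4 (Eqs. 6, 7, 8 and 11) could be implemented along the following lines. Several choices here are assumptions of this example rather than details given in the paper: the function names, the initialisation (equal weights, evenly spaced means, concentration 1), the normalisation $1/(\pi I_0(m))$ of the $\pi$-periodic density, the cap on $m$, and above all the inversion of $A(m)$, which uses the closed-form approximation $m \approx A(2 - A^2)/(1 - A^2)$ in place of the method of [9].

```python
import math


def bessel_i0(m):
    """Modified Bessel function of order 0, summed from its power series."""
    term = total = 1.0
    q = (m / 2.0) ** 2
    j = 0
    while term > 1e-17 * total:
        j += 1
        term *= q / (j * j)
        total += term
    return total


def vm_pi(theta, mean, m):
    """Pi-periodic Von Mises density (angles doubled), theta in [0, pi)."""
    return math.exp(m * math.cos(2.0 * (theta - mean))) / (math.pi * bessel_i0(m))


def fit_movm(w, K=2, iters=100):
    """EM fit of a K-component mixture to a weighted direction histogram.

    w[i] is the weight of direction theta_i = pi * i / len(w) and is
    assumed to have a positive sum.
    Returns a list of (pi_k, theta_k, m_k) tuples.
    """
    n = len(w)
    thetas = [math.pi * i / n for i in range(n)]
    wsum = sum(w)
    # Hypothetical initialisation: equal weights, evenly spaced means.
    params = [(1.0 / K, math.pi * k / K, 1.0) for k in range(K)]
    for _ in range(iters):
        # E step (Eq. 6): responsibility of each component for each bin.
        gamma = []
        for t in thetas:
            p = [pk * vm_pi(t, tk, mk) for pk, tk, mk in params]
            s = sum(p)
            gamma.append([x / s for x in p])
        # M step: weights (Eq. 7), means (Eq. 8), concentrations (Eq. 11).
        new_params = []
        for k in range(K):
            wg = [w[i] * gamma[i][k] for i in range(n)]
            tot = sum(wg)
            S = sum(wg[i] * math.sin(2.0 * thetas[i]) for i in range(n))
            Cc = sum(wg[i] * math.cos(2.0 * thetas[i]) for i in range(n))
            mean = (0.5 * math.atan2(S, Cc)) % math.pi
            A = sum(wg[i] * math.cos(2.0 * (thetas[i] - mean)) for i in range(n)) / tot
            A = min(A, 0.999999)
            # Assumption: closed-form approximate inversion of A(m), capped at 50.
            m = min(50.0, A * (2.0 - A * A) / (1.0 - A * A))
            new_params.append((tot / wsum, mean, m))
        params = new_params
    return params


# Synthetic text-like histogram: strong horizontal peak, weaker vertical one.
bins = [math.pi * i / 180 for i in range(180)]
hist = [100.0 * (0.7 * vm_pi(t, 0.0, 8.0) + 0.3 * vm_pi(t, math.pi / 2, 4.0)) for t in bins]
fitted = fit_movm(hist)
```

The six numbers in `fitted` (two weights, two mean directions, two concentrations) correspond to the 6-dimensional block descriptor discussed in the paper. Because the inversion of $A(m)$ is approximate, the recovered concentrations are only close to, not equal to, the values used to generate the histogram.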