Automatic Recognition of Biological Particles in Microscopic Images

M. Ranzato a,∗ P.E. Taylor b J.M. House b R.C. Flagan b Y. LeCun a P. Perona b

a The Courant Institute - New York University, 719 Broadway 12th fl., New York, NY 10003 USA
b California Institute of Technology 136-93, Pasadena, CA 91125 USA

Abstract

A simple and general-purpose system to recognize biological particles is presented. It is composed of four stages. First, if necessary, promising locations in the image are detected and small regions containing interesting samples are extracted using a feature finder. Second, differential invariants of the brightness are computed at multiple scales of resolution. Third, after point-wise nonlinear mappings to a higher dimensional feature space, this information is averaged over the whole region, thus producing a vector of features for each sample that is invariant with respect to rotation and translation. Fourth, each sample is classified using a classifier obtained from a mixture-of-Gaussians generative model. This system was developed to classify 12 categories of particles found in human urine; it achieves a 93.2% correct classification rate in this application. It was subsequently trained and tested on a challenging set of images of airborne pollen grains, where it achieved an 83% correct classification rate for the 3 categories found during one month of observation. Pollen classification is challenging even for human experts, and this performance is considered good.

Key words: biological particles, feature, non-linearity, recognition, cells, pollen, mixture of Gaussians (MoG)

1 Introduction

Microscopic image analysis is used in many fields of technology and physics. In a typical application the input image may have a resolution of 1024x1024 pixels, while objects of interest have a resolution of just 50x50 pixels.
In all these applications, detection and classification are difficult because of the poor resolution and strong variability of the objects of interest, and because the background can be very noisy and highly variable. In this work, we aim to develop a general recognition system that is able to deal with the variability between classes as well as the variability within each class. Moreover, our approach is segmentation-free and allows easy addition of new classes. Sec. 3 describes the two datasets used to train and test the system. The first is a collection of image patches centered around corpuscles found in microscopic urinalysis. These particles can be classified into 12 categories. The second dataset is a collection of airborne pollen images. On this dataset a recognition system has first to detect pollen grains and then to classify them into their correct genus. There are 27 different pollen categories to detect; only the 8 most numerous classes were then used to test the classifier. We considered this second dataset not only because automatic pollen recognition is still an unsolved problem and an interesting application, but also because we wanted to assess the ability of the classifier to adapt to other classes of particles. In sec. 4, we present a detector that extracts points of interest by looking at differences of Gaussians, and then generates patches which might contain a particle of interest. In sec. 5, we describe the key point of the system: a new set of very informative and straightforward features based on the local invariants defined by Mohr and Schmid (1997) [4]. A simple processing step then allows us to extract from the input image most of the information contained in the particle in the foreground, so that no segmentation is needed. Sec. 6 gives an overview of the mixture of Gaussians classifier we have used in our experiments.
In both datasets, we have to classify images with poor resolution; moreover, we are considering many different classes of biological particles that sometimes look very similar to each other. In addition to improving and automating the microscopic analysis of urine particles and the recognition of airborne pollen grains, the system we have developed can be applied to many other kinds of biological particles found in microscopic analysis. Indeed, this system has shown very good performance on a large variety of corpuscles, as reported by the experiments in sec. 7, and also by other recent experiments done on another set of images: a collection of small underwater animals 1.

∗ Corresponding author. Tel.: +1 212 998 3389; fax: +1 212 995 4122. Email addresses: ranzato@cs.nyu.edu (M. Ranzato), taylor@caltech.edu (P.E. Taylor), jhouse@caltech.edu (J.M. House), flagan@caltech.edu (R.C. Flagan), yann@cs.nyu.edu (Y. LeCun), perona@caltech.edu (P. Perona).

2 Previous Work

The approach formulated by Dahmen et al. (2000) [5] for the classification of red blood cells is general and segmentation-free, as is the system described in this paper. They aim to classify three different kinds of red blood cells (namely, stomatocyte, echinocyte, discocyte) in a dataset of grayscale images with a resolution of 128x128 pixels. Their idea of extracting features that are invariant with respect to shift, rotation and scale is powerful because it allows one to cluster cells that are found in different positions, orientations and scales. Given an image, they extract Fourier-Mellin based features [7] and then model the distribution of the observed training data using Gaussian mixture densities. Using these models they achieve a test error rate of about 15%.
Several studies of pollen recognition have used the scanning electron microscope (SEM), and some good results have been achieved (Vezey and Skvarla, 1990 [16]; Langford et al., 1990 [9]), but SEM analysis is expensive, slow and not suitable for a real-time application. France et al. (1997) [6] describe two methods of pollen identification in images taken by optical microscopy. This work is one of the first attempts to automate the process of pollen identification using the optical microscope. Using a collection of about 900 patches containing pollen grains and 900 patches containing particles that are not pollen, they aim to identify the pollen grains. Their best approach uses the so-called “Paradise Network”. Several small templates able to identify features in the object are generated, then linked together to form a certain number of patterns. This composition is repeated in testing, and the system looks for the best match between the assembled pattern and the pollen and non-pollen patterns. The network recognizes 79.1% of pollen as pollen, while 4% of the debris is also classified as pollen. Ronneberger et al. (2000) [14] acquire 3D volume data for each pollen grain with a fluorescence microscope. They have a dataset of 350 samples of pollen taken directly from trees (i.e. samples with very little variation in appearance), belonging to 26 categories. They extract features that are 3D invariant and use a Support Vector Machine (SVM) to classify them. Testing one image at a time they achieve an error rate of about 8%. Note that this dataset is too small to give a reliable result, that they are using “pure” samples, and that 3D analysis is not suitable for a real-time application.

1 Edgington, D.R., Kerkez, I., Cline, D.E., Ranzato, M., Perona, P. 2006. “Detecting, tracking and classifying animals in underwater video”. IEEE International Conference on Computer Vision and Pattern Recognition, demonstration, New York.

Luo et al.
(2004) [12] have developed a plankton recognition system using a dataset of images which has many similarities to ours. Their particles do not have clear contours and might be partially occluded, and there are also many unidentifiable objects that must be discarded. First, they extract features by considering invariant moments, granulometric measurements and domain-specific features. Second, they apply a feature selection procedure that reduces the number of features and happens to keep almost all the domain-specific features. Finally, they classify with an SVM to obtain an overall average accuracy of 76%, but with a high error rate on the unidentifiable particles (about 42%). Uebele et al. (1995) [15] use a neural-network-based fuzzy classifier on a dataset of blood cells. After extracting 13 features from each sample, they train a three-layered neural network. Then, they use the trained weights to cluster the feature space into non-overlapping regions, which reduces the error rate by 1%. More recently, a multi-layer convolutional neural network (LeCun et al., 1998 [10]) has been trained successfully to detect faces and to recognize objects in natural and biological images (Ning et al., 2005 [13]). This technique is very attractive because it employs end-to-end learning, which removes the need for hand-made tuning and feature selection, but it requires a very large dataset to estimate its internal parameters.

3 Datasets

Our first dataset is a collection of particles found in microscopic urinalysis. Because of the technique used to acquire images in urinalysis, and the low concentration of particles in the specimen, the only challenging task in this application is classification. Thus, the images in this dataset are already small image patches centered around a corpuscle. Some of these samples are shown in fig. 1.
They are (columns from left to right): bacteria, white blood cell clumps, yeasts, crystals, hyaline casts, pathological casts, non-squamous epithelial cells, red blood cells, sperm, squamous epithelial cells, white blood cells and mucus. All categories have 1000 gray-level images, with the exception of mucus (546 samples) and pathological casts (860 samples). The size is variable, ranging from 36x36 to 250x250 pixels at 0.68µm/pix. As shown in fig. 1, there can be a marked variability in shape and texture among items belonging to the same category, but also a similarity among cells belonging to different classes. During the tuning stage we took 470 images per class to estimate the parameters of the system and validated on 30 images. The final test was performed taking 90% of the images in each class as training images and the remaining ones as test images, selected at random. In each of these training sets, we included the images used in the tuning stage. We developed our classifier using the urine dataset and then tested our system on a completely different dataset of particles: a collection of airborne pollen images. An example of 4 images belonging to this dataset is shown in fig. 2. This dataset has 1429 images containing 3686 pollen grains belonging to 27 species. Images have a resolution of 1024x1280 pixels at 0.5µm/pix. As can be seen in fig. 2, there can be images with a high density of particles (upper left corner), with a low density of particles (upper right corner), with big objects inside (lower left corner) and with bubbles of glue (lower right corner). These images are acquired using an optical microscope to analyze a tape collecting samples from the most standard and widely used air sampler 2. In order to build a labeled dataset, we developed a GUI to enable human operators to manually collect a large training set. Using this software, we built a reference list which stores for each image like those in fig.
2 the position and the genus of each pollen particle identified by an expert. Given this information, we derived a dataset of patches centered on pollen particles. Fig. 3 shows some samples belonging to the most numerous pollen classes. They are (columns from left to right): alder (416 samples), ash (329 samples), elm (522 samples), mulberry (111 samples), oak (413 samples), olive (326 samples), jacaranda (282 samples) and pine (705 samples). Only these 8 most numerous categories will be considered in the classification experiments reported in sec. 7.2. On the pollen dataset we have tested the whole recognition system, both detector and classifier. Detection on this dataset (see fig. 2) is very challenging because pollen grains are rare in images and the background is highly variable. A segmentation-based approach has never given good results. In addition, classification is difficult because of the natural variability of pollen grains belonging to the same category and the smooth transition in appearance across items belonging to different classes (see fig. 3). Because of the shortage of available samples in this dataset of patches, we did not consider a validation set. In the classification experiments described in sec. 7.2, tuning is done by randomly taking 10% of the images for testing. The final test is an average of 100 experiments, in each of which 10% of the images are randomly extracted to build the test dataset.

4 Detection

Detection is the first stage of a recognition system. Given an input image with many particles, it aims to find key points that have a non-negligible probability of being a particle of interest.

2 This air sampler is called a Burkard 7-day volumetric spore trap.

Fig. 1. Urine dataset: samples belonging to the twelve categories (columns from left to right: BACT., WBC CL., YSTS CL., CRYST., HYAL, NHYAL, NSE, RBC, SPERM., SQEP., WBC., MUC.).
The resolution is variable (from 36x36 up to 250x250), but in this figure all the images have been rescaled to fit the grid.

Fig. 2. Examples of images in the pollen dataset; this dataset has 1429 images with 3686 pollen grains belonging to 27 species. In a typical image, only two or three pollen grains can be found, while all the other particles are dust. The resolution is 1024x1280 pixels.

Fig. 3. Examples of pollen patches extracted by experts from images like those of fig. 2. Each column shows examples belonging to the same category (columns from left to right): alder, ash, elm, mulberry, oak, olive, jacaranda and pine. These samples are taken from the 8 most numerous categories and have an average resolution of 50x50 pixels; the images are rescaled to fit the grid.

Once one of these points is detected, a patch is centered on it, and then it is passed to the classifier in the next stage. It is extremely important to detect all the objects of interest, especially when they are rarely found in images and their number is low, as typically happens in pollen detection. Though we only need to detect corpuscles in images of the pollen dataset, these are so variable in terms of particle density and shape that a general-purpose system had to be developed. Our detector is based on a filtering approach (Lowe 1999 [11]). The input image is converted to gray-level values, then blurred and sampled at two different frequencies, F1 and F2. In order to avoid artifacts near the image border, we normalize the background brightness to zero 3. We apply a filtering stage to both sampled images, and for both images we compute a difference of Gaussians (Lowe 1999 [11]). A set of interest points is extracted by looking at the extrema of the two resulting images. In this way, we focus on particles that are approximately round and within a certain size range. Next, we remove points that are too close to the image border, and for the remaining ones we create a proper bounding box (see fig. 4).
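The core of this filtering stage can be sketched as follows. This is a minimal, single-scale illustration assuming NumPy/SciPy; the actual detector works at two sampling frequencies, F1 and F2, with tuned parameters, and the function name and default values here are ours, not the paper's.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def dog_interest_points(img, sigma=2.0, k=1.6, border=10, thresh=0.05):
    """Interest points as local extrema of a difference-of-Gaussians.

    sigma, k, border and thresh are illustrative values, not the
    tuned parameters used in the paper.
    """
    img = img.astype(float)
    # Normalize the background brightness to zero: the background level
    # is estimated as the peak of the histogram of all pixel values.
    hist, edges = np.histogram(img, bins=64)
    bg = 0.5 * (edges[np.argmax(hist)] + edges[np.argmax(hist) + 1])
    img = img - bg
    # Difference of Gaussians: a band-pass response that peaks on
    # roughly round blobs whose size matches the filter scales.
    dog = gaussian_filter(img, sigma) - gaussian_filter(img, k * sigma)
    # Keep local extrema of |DoG| above a relative threshold.
    mag = np.abs(dog)
    local_max = (mag == maximum_filter(mag, size=5)) & (mag > thresh * mag.max())
    ys, xs = np.nonzero(local_max)
    # Discard points that are too close to the image border.
    keep = (ys >= border) & (ys < img.shape[0] - border) & \
           (xs >= border) & (xs < img.shape[1] - border)
    return list(zip(ys[keep], xs[keep]))
```

On a synthetic image containing a single bright blob, the returned list contains a point at (or next to) the blob center.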
To construct this bounding box, we estimate 8 points on the edge of the particle near the interest point; then we take the circle passing through 3 of these points that minimizes a fitting error, defined as the sum of the distances of the 3 closest points (among the remaining 5) to the circumference. This circle yields a square bounding box whose side and centroid are the diameter and center of the circumference, respectively. The fitting error of this circumference estimate is then used to choose which patch to discard when two bounding boxes partially overlap. After removing patches that are too small, patches that overlap too much with other patches, and patches whose average brightness is too high (because of bubbles of glue, see the bottom right picture in fig. 2), we have a set of patches that might contain a particle of interest.

3 The estimate of the background brightness is given by the peak of the histogram of all the image pixel values.

Fig. 4. During detection (1) an interest point is found inside each particle (magenta dot), (2) 8 points are estimated on the edge of the particle (red and green dots), (3) a circle passing through 3 of these points (red dots) and minimizing a fitting error is computed, and (4) a bounding box is determined by taking the diameter and center of the circumference as the side and centroid of the box.

5 Feature extraction

In order to recognize an object in an image, it must be represented by features that express the characteristics and the information contained in the image. These features should be invariant with respect to shift and rotation of the particle, and should be robust against small changes in scale as well. In this way, we are able to cluster images belonging to a certain class because we obtain the same features independent of the position and orientation of the cell.
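Stepping back to the detector of sec. 4, its circle-fitting step can be sketched as follows. The helper names (`circle_through`, `best_bounding_circle`) are ours, and the brute-force search over all triples of edge points is our illustration of the procedure described above, assuming NumPy.

```python
import numpy as np
from itertools import combinations

def circle_through(p1, p2, p3):
    """Circumcircle of three points: returns center (cx, cy) and radius r."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    d = 2 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if abs(d) < 1e-12:
        return None  # collinear points define no circle
    ux = ((x1**2 + y1**2) * (y2 - y3) + (x2**2 + y2**2) * (y3 - y1)
          + (x3**2 + y3**2) * (y1 - y2)) / d
    uy = ((x1**2 + y1**2) * (x3 - x2) + (x2**2 + y2**2) * (x1 - x3)
          + (x3**2 + y3**2) * (x2 - x1)) / d
    return (ux, uy), np.hypot(x1 - ux, y1 - uy)

def best_bounding_circle(edge_points):
    """Circle through 3 of the 8 edge points minimizing the sum of the
    distances of the 3 closest remaining points to its circumference."""
    best = None
    n = len(edge_points)
    for triple in combinations(range(n), 3):
        fit = circle_through(*(edge_points[i] for i in triple))
        if fit is None:
            continue
        (cx, cy), r = fit
        rest = [edge_points[i] for i in range(n) if i not in triple]
        dists = sorted(abs(np.hypot(px - cx, py - cy) - r) for px, py in rest)
        err = sum(dists[:3])  # 3 closest of the remaining 5 points
        if best is None or err < best[0]:
            best = (err, (cx, cy), r)
    return best  # (fitting error, center, radius)
```

The returned center and diameter give the centroid and side of the square bounding box; the fitting error breaks ties between partially overlapping boxes, as described in the text.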
We derived our features from the “local jets” originally studied by Koenderink and Van Doorn (1987) [8], and the invariants as defined by Mohr and Schmid (1997) [4]. Let I(x, y) be an image with spatial coordinates x and y. Given I(x, y), we can compute a set of local descriptors invariant with respect to shift and rotation by applying kernels Kj as described by Koenderink and Van Doorn (1987) [8]. For example, the first invariant is the local average luminance, the second one is the square of the gradient magnitude, and the fourth one is the Laplacian. In this work, the set of invariant features we consider is limited to the first 9, which are combinations of up to third order derivatives of the image. Moreover, these 9 invariants are computed at different scales. At each scale a version of the image blurred with a Gaussian kernel Gσi is considered; the width of the kernel is σi² ∈ {0, 1/2, 1, 2}. Each pixel in the image has an associated vector with 9 × 4 = 36 components given by

F_{i,j}(x, y) = (I ∗ G_{σi} ∗ K_j)(x, y),  with i ∈ {0, 1, 2, 3}, j ∈ {0, …, 8}   (1)

Let m_bg be the mean of F_{i,j}(x, y) over all the (x, y) that fall in the 10-pixel-wide frame around the image-patch border (i.e. we are estimating the background brightness of component i, j). We define the non-linear functions f_k as follows:

f_1(z) = 0 if z < m_bg;  z − m_bg if z ≥ m_bg   (positive part)
f_2(z) = z − m_bg if z < m_bg;  0 if z ≥ m_bg   (negative part)      (2)
f_3(z) = |z − m_bg| + m_bg                      (absolute value)

Thus, we define a new pixel feature as

H_{i,j,k}(x, y) = f_k( (I ∗ G_{σi} ∗ K_j)(x, y) )   (3)

with i ∈ {0, 1, 2, 3}, j ∈ {0, …, 8} and k ∈ {1, 2, 3}. Each pixel has a vector with 9 × 3 × 4 = 108 components. Finally, for each (i, j, k) in H_{i,j,k} we average all the corresponding nonzero pixel values in the image, and we construct a single image feature vector with 108 components.
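A condensed sketch of eqs. (1)-(3), assuming NumPy/SciPy: to stay short it uses only 3 of the 9 invariants (local luminance, squared gradient magnitude, and a Laplacian), so it returns 3 × 4 × 3 = 36 components rather than 108; the function name and defaults are ours.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def invariant_features(patch, variances=(1e-6, 0.5, 1.0, 2.0), frame=10):
    """Sketch of eqs. (1)-(3) with 3 of the 9 local invariants.
    variances approximates sigma_i^2 in {0, 1/2, 1, 2} (1e-6 stands in
    for the unblurred case)."""
    patch = patch.astype(float)
    feats = []
    for var in variances:
        blurred = gaussian_filter(patch, np.sqrt(var))
        gy, gx = np.gradient(blurred)
        invariants = [blurred,                                  # local luminance
                      gx**2 + gy**2,                            # squared gradient magnitude
                      gaussian_filter(blurred, 1.0, order=[2, 0])
                      + gaussian_filter(blurred, 1.0, order=[0, 2])]  # (smoothed) Laplacian
        for F in invariants:                                    # F_{i,j} of eq. (1)
            # m_bg: mean over the frame-pixel-wide border of the patch.
            mask = np.ones_like(F, dtype=bool)
            mask[frame:-frame, frame:-frame] = False
            m_bg = F[mask].mean()
            for fk in (lambda z: np.maximum(z - m_bg, 0),       # f_1, positive part
                       lambda z: np.minimum(z - m_bg, 0),       # f_2, negative part
                       lambda z: np.abs(z - m_bg) + m_bg):      # f_3, absolute value
                H = fk(F)                                       # eq. (3)
                nz = H[H != 0]
                # Average only the nonzero values, as in the text.
                feats.append(nz.mean() if nz.size else 0.0)
    return np.array(feats)
```

With all 9 kernels K_j the same loop would produce the 108-component vector described above.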
This feature vector is invariant to shift and rotation of the cell, is scale-invariant because of the averaging operation, and is robust with respect to changes in the size of the patch centered around the particle because we exclude from the mean those values that are at zero. Most importantly, it is computed without any segmentation of the cell. To summarize, the detection algorithm described in sec. 4 extracts a square bounding box around a particle, and then an invariant feature vector is computed without segmenting the particle from the background of the patch. We apply the non-linearity in eq. (2) (represented on the right panel of fig. 5) because we have found experimentally that it produces a dramatic decrease in the error rate when compared with a system using just an average of the features defined in eq. (1); the error rate is almost halved. Other non-linearities have been tested (see fig. 5), but the non-linearity introduced in eq. (2) achieves the best performance while keeping the system very simple.

Fig. 5. Non-linearities tested: non-linearity I performs a linear and quadratic mapping resetting to zero those values around the background mean; non-linearity II aims to separate the low, medium and high ranges of values; non-linearity III is a simplification of non-linearity II. The second and the third non-linearities give the best results; we have chosen the last one.

6 Classification

Given a patch with one centered object, a feature vector is extracted from the image. In training, the system learns how to distinguish among features of images belonging to different categories. In the test phase, a decision is made according to the test image feature. If the object in the image is considered a particle of interest, then its class is also identified; otherwise, it is discarded because the feature does not match any training model. We use a Bayesian classifier that models the training data as a mixture of Gaussians (MoG) (Bishop 2000 [3]).
If the number of categories is K, then we have to estimate K MoGs. At the end of training, we obtain for each category the class-conditional distribution f(x | Cj), for j = 1...K. The estimation is done by applying the expectation-maximization (EM) algorithm (Bishop 2000 [3]). An important property of mixture models is that they can approximate any continuous probability density function to arbitrary accuracy with a proper choice of the number of components and the parameters of the model. Given a test image, we compute its feature vector x. Then, we apply Bayes' rule: we want to pick the class that maximizes the probability P[Ck | x] of the class Ck given the test data x. Applying Bayes' rule for continuous densities and assuming equal prior probabilities for the classes, we only need to find the class that maximizes f(x | Ck), where f is the probability density function learned during the training stage. Note that we need to reduce the feature-space dimensionality before estimating the MoGs, since the number of parameters in the model grows very quickly with the dimension of the feature space and only a limited number of training samples is available. In particular, given the whole set of training feature vectors, we compute Fisher's linear discriminants (Bishop 2000 [3]). We project both the training and the test data along these directions in order to decrease the dimensionality of the feature space from 108 to a certain D. This parameter is found by a cross-validation procedure during the training phase. Note that we are presenting the simplest classifier that is able to give good performance. Other classifiers have been tried. Slightly better performance can be attained using this classifier as the basic weak learner in the AdaBoost algorithm, and by classifying with an SVM.
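The pipeline just described (Fisher projection followed by one EM-fitted mixture per class, with equal priors so that Bayes' rule reduces to an argmax over class-conditional densities) might look as follows. The paper used NETLAB under MATLAB, so this scikit-learn sketch is an illustration, not the original implementation, and the class and parameter names are ours.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.mixture import GaussianMixture

class MoGClassifier:
    """Per-class mixture of Gaussians after a Fisher projection."""

    def __init__(self, n_components=4, n_dims=12):
        # n_dims plays the role of D in the text; it must not exceed
        # min(K - 1, input dimension) for Fisher's discriminants.
        self.lda = LinearDiscriminantAnalysis(n_components=n_dims)
        self.n_components = n_components

    def fit(self, X, y):
        Z = self.lda.fit_transform(X, y)        # Fisher's linear discriminants
        self.classes_ = np.unique(y)
        # One full-covariance mixture per class, fitted by EM.
        self.mixtures_ = {c: GaussianMixture(self.n_components,
                                             covariance_type='full',
                                             random_state=0).fit(Z[y == c])
                          for c in self.classes_}
        return self

    def predict(self, X):
        Z = self.lda.transform(X)
        # log f(x | C_k) for every class; with equal priors, pick the argmax.
        ll = np.stack([self.mixtures_[c].score_samples(Z)
                       for c in self.classes_], axis=1)
        return self.classes_[np.argmax(ll, axis=1)]
```

Because each class has its own mixture, adding a new category only requires fitting one more MoG, as noted in the text.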
However, the improvement amounts to only a fraction of a percent in the error rate, while the simple Bayesian classifier allows easy addition of new categories by estimating a MoG for the new class independently of the others. Thus, we will refer only to the mixture of Gaussians classifier in the next section.

7 Experiments

7.1 Detection

Detection is performed only on images of the pollen dataset (see fig. 2). We evaluate the performance by measuring how well our automatic detector agrees with the set of reference labels. A detection occurs if the algorithm indicates the presence of an object at a location where a pollen grain exists according to the reference list. Similarly, a false alarm occurs if the algorithm indicates the presence of an object at a location where no pollen exists according to the reference list. We say that there is a match between the detector box and the expert's one if

√((x_c1 − x_c2)² + (y_c1 − y_c2)²) ≤ α √((A_1 + A_2)/2)   (4)

where x_ci, y_ci are the coordinates of the box centroid, A_i is the area of the box and α is a parameter to be chosen. In our experiments, we have chosen α equal to 1/√2. The overall performance can be evaluated by computing the recall and the precision. The former is the percentage of detected pollen grains with respect to the pollen particles labeled by the experts. The latter is the ratio of detected pollen grains to the total number of detected particles. After tuning the system in order to find good values for the sub-sampling frequencies (8 and 8√2) and for the variance of the Gaussian kernel (√2), we ran the detector on all the images of the dataset. We achieved a recall equal to 93.9% and a precision equal to 8.6%; that is, we are able to find a pollen grain with a probability of about 0.94, and for each pollen grain about 10 particles that are not pollen are also detected. This is a good performance because we are dealing with a highly variable background and with many other particles in each image.
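Criterion (4) and the recall/precision computation can be sketched as follows; the function names are ours, and a reference box is counted as detected if any detector box matches it.

```python
import math

def boxes_match(c1, a1, c2, a2, alpha=1 / math.sqrt(2)):
    """Eq. (4): the boxes match when the distance between centroids is
    at most alpha * sqrt((A1 + A2) / 2)."""
    dist = math.hypot(c1[0] - c2[0], c1[1] - c2[1])
    return dist <= alpha * math.sqrt((a1 + a2) / 2)

def recall_precision(detections, references):
    """detections/references: lists of (centroid, area) pairs.
    Recall = matched references / all references;
    precision = matched detections / all detections."""
    matched_refs = sum(any(boxes_match(dc, da, rc, ra) for dc, da in detections)
                       for rc, ra in references)
    matched_dets = sum(any(boxes_match(dc, da, rc, ra) for rc, ra in references)
                       for dc, da in detections)
    return matched_refs / len(references), matched_dets / len(detections)
```

With the reported recall of 93.9% and precision of 8.6%, roughly one detected particle in eleven is an actual pollen grain.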
Even an expert has to focus his attention on about 10 points before finding a pollen grain in an image. Such a high number of false alarms is not a problem as long as we are able to classify these particles well (see next section). Since we aim to assess the performance of the whole pollen recognition system, that is, detection followed by classification, we have built a dataset with these automatically detected patches in order to train and test the classifier. For this reason, we have gone through this dataset to remove those patches having an incorrect bounding box. It is possible to have a good overlap with the expert's patch even if the particle is not properly centered in the region of interest, because of dust or other particles around it. Fig. 6 shows examples of “good” and “bad” patches belonging to the alder category. If we consider only the “good” patches, the overall percentage of detection decreases to 83%, but then we are able to perform a meaningful classification on those extracted images. Note that this technique is quite fast: the analysis of one image of size 1024x1280 pixels takes 6 seconds on a 2.5GHz Pentium processor (code in Matlab).

Fig. 6. Examples of 90 patches of alder pollen extracted by our detector, grouped into good and bad patches. The first 7 rows show examples of good bounding boxes. The last 2 rows show samples that were removed from the set because they were incorrectly centered in their bounding box. Problems usually arise when there is dust attached to the pollen or when pollen grains are grouped in a single cluster. 93% of pollen grains are detected, and 83% of pollen particles are detected and correctly assigned to a region of interest. Note that images have been rescaled to a common resolution in order to fit the grid.
7.2 Classification

The classifier needs to be tuned with respect to the number of feature-space dimensions, the number of components in the mixture, and the structure of the covariance matrix of the Gaussians in the mixture. Thus, we ran many experiments varying one parameter at a time, and kept the combination of values that gave the lowest error rate. In order to estimate the parameters of the MoGs by the EM algorithm, we used functions from the NETLAB package for MATLAB [1]. Since the EM algorithm needs a random initialization, we also ran many experiments varying only the state of the random number generator in order to find the best values. We first considered the urine dataset and took 500 images per class for the tuning stage. The best results were achieved using 4 full-covariance Gaussians in a 12-dimensional feature space in each mixture. The test was performed taking 90% of the images in each class as training images and the remaining ones as test images, selected at random. The average result of 10 experiments is shown in fig. 7 in the form of a confusion matrix; the error rate on these 12 categories is equal to 6.8%. In order to assess whether the system is able to classify particles with different statistics, we applied the classifier to the dataset of pollen grain patches. Let us consider first the dataset of pollen patches extracted by experts (see fig. 3). Tuning is done by randomly taking 10% of the images for testing. The result of this stage is shown in tab. 1. To assess the classification performance, in fig. 8 we show the average confusion matrix of 100 experiments; the error rate is 21.8%. Each time, we randomly extracted 10% of the images of each class for test purposes. When the same experiment was done using the patches containing pollen grains that were found by the detection system, we achieved an error rate of 23.2%.
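The tuning loop just described can be sketched as follows. Here `build_classifier` is a hypothetical factory returning a fit/predict object; the actual search also varied the covariance structure, and, as noted, the state of the random generator used to initialize EM.

```python
import numpy as np
from itertools import product

def tune(train, val, build_classifier,
         n_components_grid=(1, 2, 4, 8), n_dims_grid=(4, 8, 12, 16),
         seeds=(0, 1, 2)):
    """Grid search over the number of mixture components, the
    feature-space dimension D and the EM initialization seed,
    keeping the combination with the lowest validation error.
    Grid values are illustrative, not the ones used in the paper."""
    best = (np.inf, None)
    for m, d, s in product(n_components_grid, n_dims_grid, seeds):
        clf = build_classifier(n_components=m, n_dims=d, seed=s)
        clf.fit(*train)
        err = np.mean(clf.predict(val[0]) != val[1])
        if err < best[0]:
            best = (err, (m, d, s))
    return best  # (lowest validation error, winning hyper-parameters)
```

For the urine dataset this kind of search selected 4 full-covariance components in a 12-dimensional projected space, as reported above.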
In order to assess the overall performance of the pollen recognition system, we need to combine the detector and the classifier. Since we train the classifier separately from the detector, however, we feed the classifier not only with the 8 most numerous pollen categories, but also with the same 10:1 ratio of false alarms to genuine particles that the detector will provide. We have found experimentally that it is better to model the false alarm class and, given the huge number of available samples, we can use a more complex model for it. Moreover, we put a threshold on the estimated probability of the class given the input data. If there is low confidence, we do not make a decision and instead assign the sample to an “unknown” category, UNK in the graphical representation of the confusion matrices shown in fig. 9 and fig. 10. Of course, this unknown class can be merged with the false alarm class; for this reason, we assume the system is correct when it assigns a false alarm patch either to the false alarm category or to the unknown class (in the following experiments the error rate is computed accordingly). The tuning stage proceeds as in the previous experiments and is summarized in tab. 1. The average performance over 100 experiments is shown in fig. 9. Note that in any practical application of airborne pollen recognition, it is useful to know which types occur together at various times of the year in the geographic area of interest. For example, since we know that only ash, alder and pine pollen grains are commonly found in January in this area [2], we can restrict the pollen categories to these three. Taking 10 false alarm patches for each pollen grain, and proceeding with an analogous tuning (see tab. 1), we have achieved an error rate of about 17%, as shown in fig. 10. Note that 20 ms are required to analyze a particle on a 2.5GHz Pentium processor (code in Matlab).
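The confidence threshold can be sketched as follows, assuming per-class log-densities coming from the MoGs and equal priors; the threshold value and function name are ours.

```python
import numpy as np

def classify_with_rejection(log_likelihoods, classes, threshold=0.5):
    """Given log f(x | C_k) for one sample (array aligned with classes),
    return the most likely class, or "UNK" when the posterior of the
    best class (equal priors assumed) falls below the threshold.
    threshold=0.5 is an illustrative value, not the tuned one."""
    ll = np.asarray(log_likelihoods, float)
    post = np.exp(ll - ll.max())   # subtract max for numerical stability
    post /= post.sum()             # posterior under equal priors
    best = int(np.argmax(post))
    return classes[best] if post[best] >= threshold else "UNK"
```

A false alarm patch then counts as correct when it lands either in the false alarm class or in UNK, as described above.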
We conclude this section by reporting classification results on the urine dataset using some of the methods referenced in sec. 2. The same mixture of Gaussians classifier has been trained on a different set of features: the Fourier-Mellin based features [7] used by Dahmen et al. (2000) [5]. These features are the amplitude spectrum of the image Fourier transform computed in a log-polar coordinate system. With these features we have achieved an error rate of about 21% for the classification of the urine particles into the 12 categories. Finally, we report a result on the same dataset using a different classifier. We trained different architectures of a convolutional neural network on raw images [10]. We reduced the images to a common resolution of 52x52 pixels by bilinear interpolation or sampling, since the original resolution is variable. The network giving the best performance has 6 feature maps in the first convolutional layer. This is followed by a sub-sampling layer with a vertical and horizontal stride equal to 2. The next convolutional layer outputs 16 feature maps, which are sub-sampled by a factor of 4 in the next sampling layer. The last convolutional layer has 120 feature maps, which are reduced to 12 (as many as the number of classes) by a 1-layer neural network. With this system we have achieved an error rate of 16%. The network seems to overfit the data and needs more training samples, especially for the classes characterized by very low contrast; a few thousand samples seem to be insufficient to properly estimate the more than 50,000 internal parameters.

Fig. 7. Classification of urine particles: averaged confusion matrix for experiments in which 90% of the images in the full dataset are used for training and 10% are randomly extracted for testing.
In fig. 7, the proportional error rate (ERp, shown in parentheses) is the average of the error rates of the single classes; the main error rate shown is the ratio between the total number of misclassified images and the total number of tested images. We refer also to the proportional error rate because we run experiments using a different number of test samples for each class; in particular, classes NHL and MUC have fewer test images. At the intersection of row i with column j, we report the percentage of items belonging to class i that have been assigned to class j; the diagonal therefore shows the percentage of correct classifications for each class.
8 Summary and Conclusions
Microscopic analysis of biological particles aims to find and to classify particles. These objects often have variable shape and texture, and very low contrast and resolution. Moreover, the background can be highly variable as well, which makes recognition in these images very challenging. In this work, we have defined a new kind of feature based on the “local jets” of Mohr and Schmid (1997) [4]. These features extract information from a patch centered on the object of interest without any segmentation. Classification performed with a simple mixture of Gaussians classifier has achieved a very low error rate on the 12 classes of cells found in urinalysis: even though these images are characterized by very low contrast and poor resolution, we reached an error rate of 6.8%. The same classifier achieves an error rate of 22% on 8 species of airborne pollen grains. This classifier outperforms previous systems dealing with analogous pollen images and makes feasible the development of a real-time instrument for counting pollen.
Fig. 8. Classification of 8 pollen categories (expert’s patches): average test confusion matrix of 100 experiments, in which 10% of the images are each time randomly extracted for testing. The proportional error rate (average of the error rates of the single classes) is shown first; the error rate in parentheses is the ratio between the total number of misclassified images and the total number of tested images. Since we test a different number of samples for each class, we mainly refer to the proportional error rate.
Fig. 9. Classification of 8 pollen species and false alarm class (av. test ER = 35.1%, σ = 3.1; abs. ER = 15.5%): average test confusion matrix of 100 experiments in which the detector patches were used. In these experiments, we have taken 10 false alarms for each pollen particle and have randomly extracted 10% of the images for testing. In the plot, f.a. collects the patches assigned to the false alarm class and UNK the patches whose classification confidence is below threshold. A false alarm patch is correctly classified when it is assigned either to f.a. or to UNK. [Confusion matrix over alder, ash, elm, jacaranda, mulberry, oak, olive, pine, f.a. and UNK omitted.]
Fig. 10. Classification of those pollen grains that can be found in January (av. test ER = 17.4%, σ = 2.6; abs. ER = 8.2%): average test confusion matrix of 100 experiments in which the detector patches were used. Even though for each pollen grain we also detect 10 false alarms, most of these particles are subsequently rejected by the classifier: 14.8 + 78.2 = 93% of these objects are correctly classified. [Confusion matrix over alder, ash, pine, f.a. and UNK omitted.]
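The two error rates quoted in the figure captions differ only in how the classes are weighted; a minimal sketch of the computation (function and variable names are ours):

```python
def error_rates(misclassified, tested):
    """Absolute and proportional error rates from per-class counts.

    misclassified[i] and tested[i] are the numbers of misclassified and
    tested samples for class i.
    """
    # absolute ER: total misclassified over total tested
    abs_er = sum(misclassified) / sum(tested)
    # proportional ER: unweighted average of the per-class error rates
    prop_er = sum(m / t for m, t in zip(misclassified, tested)) / len(tested)
    return abs_er, prop_er

# a small class with 50% error pulls the proportional rate
# well above the absolute one
abs_er, prop_er = error_rates([5, 5], [10, 90])
```

This is why the two rates in figs. 8-10 can differ substantially: classes with few test samples count as much as the large ones in the proportional rate.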
In this work, we did not aim to address one specific application; rather, we have shown through many experiments on different datasets that this system achieves very good performance on many different kinds of “small” object categories found in microscopic images. We believe that our features could be improved by considering color information and by applying more complex non-linearities. We have also presented a detection system based on a filtering approach. It gives very good results on the pollen dataset, detecting a pollen grain with a probability of 9/10. For each pollen grain, another 10 particles that are not pollen are detected, though most of these extraneous corpuscles are subsequently rejected by the classifier. Moreover, even an expert has to focus his attention on nearly 10 points in each image to establish the presence of one grain of pollen. This result is reliable because we have used 1429 images with 3686 pollen grains. Improvements could be obtained by introducing other kinds of tests to decrease the number of false alarms, by considering a more sophisticated estimation of regions of interest, and by taking color information into account as well.
Table 1
Pollen classification: parameters found during the tuning stage.
8 pollen classes, expert’s patches (fig. 8):
  dimension D: 14
  number of components in MoG: 2
  covariance in MoG: diagonal
8 pollen classes and false alarms, detector patches (fig. 9):
  dimension D: 14
  number of components in MoG (pollen): 2
  covariance in MoG (pollen): spherical
  number of components in MoG (f.a.): 4
  covariance in MoG (f.a.): full
  confidence threshold: 0.5
3 pollen classes and false alarms (January), detector patches (fig. 10):
  dimension D: 8
  number of components in MoG (pollen): 1
  covariance in MoG (pollen): full
  number of components in MoG (f.a.): 10
  covariance in MoG (f.a.): full
  confidence threshold: 0.7
Acknowledgements
The authors are very grateful to C. Poultney and H.M. Kim for their precious comments. Research described in this article was supported by Philip Morris USA Inc., Philip Morris International and the Southern California Environmental Health Sciences Center (NIEHS grant number 5P30 ES07048). P. Taylor gratefully acknowledges support from a Boswell Fellowship, a joint position intended to foster cooperation between the California Institute of Technology and the Huntington Medical Research Institute.
References
[1] http://www.ncrg.aston.ac.uk/netlab/down.php
[2] http://www.che.caltech.edu/groups/rcf/DataFrameset.html
[3] Bishop, C.M., 2000. Neural Networks for Pattern Recognition. Oxford University Press.
[4] Mohr, R., Schmid, C., 1997. Local grayvalue invariants for image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (5), 530-535.
[5] Dahmen, J., Hektor, J., Perrey, R., Ney, H., 2000. Automatic classification of red blood cells using Gaussian mixture densities. Proc. Bildverarbeitung für die Medizin, 331-335.
[6] France, I., Duller, A.W.G., Lamb, H.F., Duller, G.A.T., 1997. A comparative study of approaches to automatic pollen identification. Proceedings of the British Machine Vision Conference.
[7] Grace, A.E., Spann, M., 1991. A comparison between Fourier-Mellin descriptors and moment based features for invariant object recognition using neural networks. Pattern Recognition Letters 12, 635-643.
[8] Koenderink, J.J., van Doorn, J., 1987. Representation of local geometry in the visual system. Biological Cybernetics 55, 367-375.
[9] Langford, M., Taylor, G.E., Flenley, J.R., 1990. Computerised identification of pollen grains by texture. Review of Palaeobotany and Palynology.
[10] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., 1998. Gradient-based learning applied to document recognition.
Proceedings of the IEEE 86 (11), 2278-2324.
[11] Lowe, D.G., 1999. Object recognition from local scale-invariant features. Proceedings of the International Conference on Computer Vision.
[12] Luo, T., Kramer, K., Goldgof, D.B., Hall, L.O., Samson, S., Remsen, A., Hopkins, T., 2004. Recognizing plankton images from the shadow image particle profiling evaluation recorder. IEEE Transactions on Systems, Man, and Cybernetics 34 (4), 1753-1762.
[13] Ning, F., Delhomme, D., LeCun, Y., Piano, F., Bottou, L., Barbano, P., 2005. Toward automatic phenotyping of developing embryos from videos. IEEE Transactions on Image Processing 14 (9), 1360-1371.
[14] Ronneberger, O., Heimann, U., Schultz, E., Dietze, V., Burkhardt, H., Gehrig, R., 2000. Automated pollen recognition using gray scale invariants on 3D volume image data. Second European Symposium on Aerobiology.
[15] Uebele, V., Abe, S., Lan, M.S., 1995. A neural-network-based fuzzy classifier. IEEE Transactions on Systems, Man, and Cybernetics 25 (2), 353-361.
[16] Vezey, E.L., Skvarla, J.J., 1990. Computerised feature analysis of exine sculpture patterns. Review of Palaeobotany and Palynology.