Evolutionary Image Analysis and Signal Processing Stefano Cagnoni HEURISTIC PARAMETER OPTIMIZATION AND REAL-WORLD PROBLEMS The function to be optimized: • describes the solution to a practical problem parametrically • is typically optimized based on performance achieved on a set of examples which are representative of the problem at hand Problem Parametric solution S(p1…pn) Cost function f(p1…pn) Examples Optimization method Optimized Solution A SHIFTED VIEW UPON DESIGN From: • defining exact solutions, justified by an underlying theory To: • searching solutions which work well, by: • defining a quality criterion to measure the effectiveness (cost) of possible solutions • choosing a method that maximizes (minimizes) it. A SHIFTED VIEW UPON DESIGN From: • tuning the parameters of a specific solution to a problem using knowledge about the problem To: • tuning the parameters of a general optimization method using knowledge about the method WHEN ? • No direct solution is available • Problem specifications can be provided only qualitatively or through examples • Behaviors or phenomena can be described or measured with little precision (e.g., noisy signals) • Little a priori knowledge (none ?) on the problem • Integration of modules to which any of the previous conditions applies COMPUTER VISION The “art” of making computers see (and understand what they see) Sub-topics (the ‘vision chain’) Image Acquisition Image Enhancement Segmentation 3D-Information Recovery Image Understanding COMPUTER AND HUMAN VISION HUMAN COMPUTER Perception Image Acquisition Selective information extraction Feature Enhancement Grouping by ‘similarity’ Segmentation Extraction of spatial relationships 3D-Information Recovery Object recognition and semantic interpretation Image Understanding (signal/image processing) COMPUTER AND HUMAN VISION HUMAN COMPUTER Perception Image Acquisition Selective information extraction Feature Enhancement Grouping by ‘similarity’ Segmentation Extraction of spatial relationships 3D-Information Recovery Object recognition and semantic interpretation Image Understanding (signal/image processing) LOW-LEVEL VISION COMPUTER AND HUMAN VISION HUMAN HIGH-LEVEL Perception COMPUTER VISION Image Acquisition Selective information extraction Feature Enhancement Grouping by ‘similarity’ Segmentation Extraction of spatial relationships 3D-Information Recovery Object recognition and semantic interpretation Image Understanding (signal/image processing) APPLICATION FRAMEWORKS • Optimization of parameters of specific objective functions (e.g., using GA, ES,…): – related with a well-defined task. – for a whole system. • Generation of solutions from scratch (e.g., using GP) Optimization of performances based on: – specific objective functions – interactive qualitative comparisons between solutions • Generation of emergent collective solutions (e.g., using PSO or other Swarm Intelligence techniques) Achievement of higher-level and complex tasks from collective use of trivial, local, hard-wired behaviors APPLICATIONS TO SIGNAL/IMAGE PROCESSING AND PATTERN RECOGNITION • Optimization of filter/detector AND algorithm parameters for event detection and image segmentation • Qualitative optimization of image processing algorithms • Design of implicitly parallel binary image operators and classifiers • Object detection and segmentation APPLICATIONS • GA-based design of a QRS detector for ECG signals. • Optimization of a 3D segmentation algorithm for tomographic images based on an elastic contour model. • GP-based design of lookup tables for color processing of MR images. • SmcGP-based low-level image processing and lowresolution character recognition. • Object detection and segmentation using PSO SIGNAL PROCESSING • Signal Enhancement (filtering) and Event Detection (thresholding) ECG signal Enhanced ECG QRS complex EVOLUTIONARY DESIGN OF QRS DETECTORS Given: – Filter/detector layout – Training set – Fitness function Optimize (using a GA): – Filter coefficients – Detector threshold – Other parameters regulating the adaptive behavior of the detector FILTER LAYOUT • Linear: • Linear with selected samples: • Quadratic with selected samples: OTHER PARAMETERS EXPERIMENTAL SETUP TRAINING SET 10 10-second tracts of the ECG from each of the 48 30-minute records of the MIT-BIH Arrhythmia Database (5981 beats out of about 110,000). FITNESS FUNCTION f = fmax - (FP2 + FN2) , fmax such that f>0 FP = False Positives, FN = False Negatives RESULTS • 99.5% average sensitivity (100% on most “normal” recordings) • Much faster detection with respect to published algorithms yielding comparable results APPLICATIONS • GA-based design of a QRS detector for ECG signals. • Optimization of a 3D segmentation algorithm for tomographic images based on an elastic contour model. • GP-based design of lookup tables for color processing of MR images. • SmcGP-based low-level image processing and lowresolution character recognition. • Object detection and segmentation using PSO RATIONALE Pre-defined parameter sets hardly work in the presence of high variability, as in biology/anatomy • GA optimization of a specific structure segmentation algorithm • interactive specification of a few training contours, followed by • extraction of the contours of the structure of interest from the whole data set CONTOUR EXTRACTION The problem can be reformulated as ‘multiple 1D-edge detection and tracking’. An extension of the 1D detector SEGMENTATION • Definition of a starting contour • Iterate: – Application of the GA-designed filter to the next contour (extraction of matching edge points) – Elastic contour model-based interpolation (also optimized by the GA) of the edge points extracted by the filter TRAINING SET One slice following the one which is used to seed the iterative segmentation process FITNESS FUNCTION dyk = distance, along scan line Lh, between the actual edge point and the one detected, K = constant RESULTS APPLICATIONS • GA-based design of a QRS detector for ECG signals. • Optimization of a 3D segmentation algorithm for tomographic images based on an elastic contour model. • GP-based design of lookup tables for color processing of MR images. • SmcGP-based low-level image processing and lowresolution character recognition. • Object detection and segmentation using PSO INTERACTIVE EVOLUTION OF LOOKUP TABLES Aim: given two grey-scale images I1(x,y) and I2(x,y), produce a color image evolved using GP IC(x,y) = F (I1(x,y), I2(x,y) ) with ‘interesting’ features from the point of view of a specific application Fitness is implicitly defined by the user who acts as the referee of a tournament (of size 2) used in the selection phase Population size: 15 N. of generations : 10-15 Crossover rate: 0.7 No Mutation Results of 4 different experiments APPLICATIONS • GA-based design of a QRS detector for ECG signals. • Optimization of a 3D segmentation algorithm for tomographic images based on an elastic contour model. • GP-based design of lookup tables for color processing of MR images. • SmcGP-based low-level image processing and lowresolution character recognition. • Object detection and segmentation using PSO SUB-MACHINE CODE GENETIC PROGRAMMING (Poli) Unsigned long variables are used (32/64 bit words depending on compiler or computing architecture) to encode binary arrays of inputs The bit string may encode consecutive samples of a temporal sequence, a row or a window within an image, etc. Function set: bitwise logical operators + circular shifts A whole block of data is affected by a single bit-wise Boolean operation (SIMD paradigm) SUB-MACHINE CODE GENETIC PROGRAMMING Advantages • operating in parallel on multiple data makes fitness evaluation more efficient • fitness can be evaluated on multiple fitness cases at the same time with a single operation sequence Limitations • Impossible to apply different operations to data from the same block: the long int variable is an array of independent data which undergo the same operation, hence the need for circular shift operators EXPERIMENTAL TEST-BED APACHE PLATE-RECOGNITION SYSTEM (Adorni-Cagnoni, 2000) • Main task: car license-plate recognition • Data: 130 images of running cars • Sub-tasks: plate extraction and character recognition PRE-PROCESSING Input image Grey-level image Binarized H.gradient image The horizontal-gradient image is thresholded to obtain a binary image showing only the strongest edges. APACHE: PLATE EXTRACTION The plate region is the one where the density of vertical edges (peaks of the horizontal gradient) is highest CHARACTER RECOGNITION Recognition of digits represented by binary twodimensional patterns: 10 specialized binary classifiers have the pattern as input and produce as output: 1 if the patterns belongs to the class corresponding to the classifier 0 if the pattern belongs to another class CLASSIFICATION BY INDEPENDENT CLASSIFIERS Advantages • Each classifier is specialized and yields high performances • Classifiers are very ‘compact’: they don’t need to consider features belonging to several classes Disadvantages Possible ambiguous classifications: - The output of all classifiers is 0 - The output of more than one classifier is 1 A disambiguation mechanism is needed INPUT ENCODING (TERMINAL SET) Input Pattern: binary digits of size 13x8 104 bits may be represented using 4 32-bit words (e.g., 4 unsigned long variables in C). 72 bits of the pattern are packed into the 24 least significant bits of the first 3 unsigned long variables The remaining 32 are packed into the fourth one FUNCTION SET + ERCs Binary bitwise operators: AND OR NOT XOR Circular shift operators: SHR, SHR2, SHR4, SHL, SHL2, SHL4 Ephemeral Random Constants (ERC): 32-bit unsigned long FITNESS FUNCTION F = number of correct classifications (TP+TN) (TP = True Positive, TN = True Negatives) NB If training data are uniformly distributed, then the negative case shown to each classifier are 9 times as many as the positive ones => F strongly favors specificity EVOLUTION PARAMETERS Population : 1000 Survival rate : 17 % Crossover rate: 80% Mutation rate : 3% Tournament selection with tournament size = 7 300 to 2000 iterations (up to 3 hours on a 3 GHz PC) TEST SET Database of plate digits collected at toll booths of Italian highways about 11000 digits of size 13x8 from real plates binarized with threshold=0.5 (0=black; 1 = white) 6024 in the training set 5010 in the test set (exactly 501 per class) CLASSIFIERS Set of 10 binary classifiers (one for each class) For each classifier: Input : unsigned long pattern[4] Output : unsigned long out Each classifier actually produces 32 binary outputs (32 distinct fitness cases): the output bit which yields the highest fitness on the training set is taken as the actual output of the classifier CLASSIFICATION RULE The same pattern is input into all classifiers. 10 outputs are produced of which one (hopefully) is 1 and the others are 0. If the output is ambiguous an external tie-breaker is applied (the LVQ neural net classifier used in the APACHE system). For all-0 outputs this is the only possible remedy. When more than one output are active a second classifier set could be trained to act as tie-breakers. FITNESS FUNCTION F = TP+TN • High specificity => high positive predictivity • All-zero classifications are less than 5% • Ambiguous cases less than 1% 95.3% of the digits is directly classified by the basic classifier set (accuracy: training 99.98%, testing 98.7%). The remaining 5% is classified by the LVQ classifier. RESULTS Training set 99.65% correct classifications (99.98% over the 99% which could be directly classified) Test Set 97.43% correct classifications (98.7% over the 95.3% which could be directly classified) Performance close to the reference LVQ classifier Average execution time: 0.1 ms per classifier (1 ms per pattern) on a 3-GHz Pentium IV computer after compiling the programs. 10-fold reduction of computation time EC-BASED IMPLEMENTATIONS • GA-based design of a QRS detector for ECG signals. • Optimization of a 3D segmentation algorithm for tomographic images based on an elastic contour model. • GP-based design of lookup tables for color processing of MR images. • SmcGP-based low-level image processing and lowresolution character recognition. • Object detection and segmentation using PSO SmcGP-BASED PLATE DETECTION • Input data: 4 unsigned long words encoding a window, of size 32x4 pixels, from the binarized gradient image • Desired output: 1 if the window belongs to the plate 0 otherwise TRAINING SET • 80/130 images • Input data: binarized horizontal gradient image • 366 negative + 4824 positive = 5190 training samples • Same settings as those used for digit recognition Empty samples (that can be found both inside and outside the plate) have been purged. TYPICAL RESULTS The same algorithm that APACHE directly applies to the gradient image can be applied to this image to improve plate localization COMPARISON WITH APACHE • Increased plate-detection rate RESULTS • Improved accuracy with a limited increase of computation time • The computation efficiency of SmcGP classifiers limits the effects of the overhead added by the GP-evolved stage EC-BASED IMPLEMENTATIONS • GA-based design of a QRS detector for ECG signals. • Optimization of a 3D segmentation algorithm for tomographic images based on an elastic contour model. • GP-based design of lookup tables for color processing of MR images. • SmcGP-based low-level image processing and lowresolution character recognition. • Object detection and segmentation using PSO Particle Swarm Optimization Optimization technique inspired by the collective behavior of flocks (swarms) of birds in search of food • Each particle moves ‘randomly’ in the parameter space under the action of an attractive force field generated by: • the best point in the parameter space ever discovered by the particle itself • the best point in the parameter space ever discovered by the whole swarm Particle Swarm Optimization • Basic PSO equations XP(t) = XP(t-1) + vP(t) vP(t) = w * vP(t-1) + c1 * rand() * [BestXP – XP(t-1)] + c2 * rand() * [gBestX – XP(t-1)] c1, c2 w XP BestXP gBestX rand() positive constants inertia weight position of particle P best location reached by particle P up to time t-1 best location ever found by the whole swarm random number between 0 and 1 The swarm converges towards the fittest point. PSO and Image Analysis • Multi-objective optimization: the goal is to identify (segment out) one or MORE regions of interest (ROIs) in the search space (the image), not a single optimum. • k-mean clustering PSO (Passaro and Starita 2006): the swarm reorganizes itself in multiple sub-swarms: each sub-swarm may then ‘address’ a different target. PSO-based License Plate Detection • Plates feature high density of pixels with high horizontal gradient values due to the contrast between letters (black) and plate background (white) • The horizontal gradient can be easily and efficiently approximated using differences PSO-based License Plate Detection Two Search stages : • Global search: the most ‘promising’ areas within the whole image are grossly detected • Local search: attention is focused on the ROIs defined in the previous stage to detect good plate candidates PSO-based License Plate Detection • A repulsion term between particles is added to let the swarm spread all over the target: vP*(t) = vP(t) + repulsionp • Repulsion between two particles is computed, separately along each axis, as repulsion(i, j) = REPULSION_RANGE - |Xi-Xj| REPULSION_RANGE being the maximum distance within which interaction between particles occurs. • The global repulsion term for P is the average of all repulsion terms with particles interacting with it: repulsionp = j repulsion(p,j) / n PSO-based License Plate Detection • The goal is to find regions featuring high density of high-gradient pixels • To prevent swarms from converging towards ISOLATED pixels, fitness has two terms: • punctual fitness, depending on visual features • local fitness, proportional to the number of neighbors PSO-based License Plate Detection • The swarm explores a gray-scale image fitness(x,y) = punctual fitness(x,y) + local fitness(x,y) if ( |r(x,y)-g(x,y)|< K0 && |r(x,y)-b(x,y)|<K0 && |g(x,y)-b(x,y)| <K0 ) punctual_fitness = max(right_gradient,left_gradient); else punctual_fitness = 0; where right_gradient = |grayscale(x,y)-grayscale(x+1,y)|; left_gradient = |grayscale(x,y)-grayscale(x-1,y)|; local fitness = K1 * neighbor_number PSO-based License Plate Detection • The standard equation has been further modified, to increase stability of sub-swarms. If a particle with high punctual fitness lies within a region with high density of particles, then it has a probability, which is linearly dependent on such a density, of staying there: Prob { Xt-1=Xt } = number of particles in the neighborhood / total number of particles in the swarm PSO-based License Plate Detection GLOBAL SEARCH RESULTS PSO-based License Plate Detection Two ‘promising’ areas are detected (sub-swarms with Nparticles > T): the larger one will be explored first PSO-based License Plate Detection The whole swarm is re-initialized near the selected area and a local search is performed, using the previous algorithm. PSO-based License Plate Detection A bounding box is computed enclosing all particles with high punctual fitness. If this box has a w:h ratio = 5:1, we can assert we found the plate. PSO-based License Plate Detection Otherwise the box is expanded, letting some agent move up and down (or left and right), in order to reach the given ratio. On failure, the next region is explored. PSO-based License Plate Detection Final result PSO-based License Plate Detection TEST RESULTS Set of 98 plates, 10 runs per plate = 980 runs Correct detections Wrong detections Plate not found N 955 15 10 Avg. time 0.083 s 0.151 s 1.424 s In many cases performing the whole plate detection process has a lower computation cost than computing the gradient image PSO-based License Plate Detection Detection speed (955 correct detections / 980) N. plates detected 1200 (Single-frame) detection percentage vs. time 1000 800 600 400 200 0 < 0,05 < 0,1 < 0,15 < 0,2 < 0,3 < 0,4 < 0,5 <1 <2 Time (s) Real-time tracking can be achieved re-initializing the swarm in frame F(t+1) within the region where the plate has been detected in frame F(t): average detection time 0.03s (more than 30 fps can be analyzed). • Swarm Intelligence approaches can be very efficient in roughly detecting object positions, as they effectively sample the search space • Very interesting approaches to the analysis of stereo images for obstacle detection and robot navigation have been described by Louchet, Lutton and Olague CONCLUSIONS In many cases in which a signal/image processing/understanding problem can be reformulated as an optimization problem (and not only in those cases… ), evolutionary computation provides powerful and effective tools to search for ‘good’ solutions, hard to find otherwise. BIBLIOGRAPHY • Poli R., Cagnoni S., Valli G., ``Genetic design of optimum linear and non-linear QRS detectors'', IEEE Trans. on Biomedical Engineering, 42(11):1137-41,1995. • Poli R., Cagnoni S. ``Genetic programming with user-driven selection: experiments on the evolution of algorithms for image enhancement'', Genetic Programming 1997, pp. 269-277, 1997. • Cagnoni S., Dobrzeniecki A.B., Poli R., Yanch J.C. ``Genetic-algorithmbased interactive segmentation of 3D medical images'', Image and Vision Computing Journal, 17(12):881-896, 1999. • Cagnoni S., Bergenti F., Mordonini M., Adorni G., "Evolving binary classifiers through parallel computation of multiple fitness cases". IEEE Transactions on Systems, Man and Cybernetics Part B-Cybernetics, 35(3):548-555, 2005. • Cagnoni S., Mordonini M., Sartori J., “Particle Swarm Optimization for Image Segmentation and Object Detection”, Proc. EvoWorkshops 2007, pp. ….. UPCOMING BOOK Genetic and Evolutionary Image Processing and Computer Vision EURASIP Book Series in Signal Processing Editors: S. Cagnoni, E. Lutton, G. Olague