Vision System Based on Shifted Fovea Multiresolution Retinotopologies

Fabián Arrebola, Cristina Urdiales, Pelegrín Camacho and Francisco Sandoval
Departamento de Tecnología Electrónica, E.T.S.I. Telecomunicación, Universidad de Málaga, Campus de Teatinos, 29071 Málaga, Spain
fabian@dte.uma.es, cris@dte.uma.es, pelegrin@dte.uma.es, sandoval@dte.uma.es

Abstract
In this paper we present a foveal active vision system. It can move and fixate the fovea on any region of a scene, detect the most relevant areas of the scene and extract certain features of these regions of interest. The system segments the image, detects the possible objects in the scene, hierarchically obtains a set of features for each detected object (centroid, area, bounding box and grey level) and extracts the corners of the object contained in the fovea. The system is going to be integrated in an autonomous mobile agent, so it is important to process each object at the optimal resolution level to minimise computational load and time requirements. The most important novelty of the system is the use of Reconfigurable Shifted Fovea Retinotopologies, together with a new algorithm that obtains a curvature function by means of local histograms of the contour chain code to reliably calculate the stable corners of the object contours.

I. INTRODUCTION
Most vision systems consider perception as a reconstructive process and use the camera as a static entity. However, living beings often use vision as guidance in their activities. Thus, information acquisition and data extraction processes strongly depend on the task to accomplish. Also, such processes are conducted in an intentional and active way: we selectively focus our gaze, conditioned by the features and events of the environment, according to our needs and goals.
Several visually guided applications, especially those related to robotic vision (autonomous navigation, surveillance, tracking and object recognition), require a wide field of view, high resolution and minimal response times. Such requirements can hardly be met by a vision system relying on uniform resolution images. The main problem of such images is the enormous volume of data they yield: real time vision systems cannot deal with such a data volume [8]. Recently, Active Vision has been strongly developed in order to study and implement real time perception systems that allow a robot to interact with a dynamic, complex environment [14]. To ease the comprehension of a scene, Active Vision proposes the use of foveal sensors whose resolution changes over the field of view. A second strategy to improve the performance of the system consists of extracting and processing only the data relevant for a given task [11]. Thus, perception is selective: the scene is processed only in certain areas, at the required resolution level, so that a minimum number of operations is conducted.

0-7803-4503-7/98/$10.00 1998 IEEE

II. FOVEAL VISION
The main goal of foveal vision is to simulate the retina of biological vision systems. The main feature of these systems is that the images to be processed present a central region with the maximum density of photo-receptors (foveal region) and a peripheral one where that density decreases with the distance to the central region. These images are known as foveal images. They simultaneously present a wide field of view and a high resolution in their central region, yet they work with a reduced data volume. If this non-uniform sensor layout is combined with the possibility of moving the foveal region in a controlled way, it is possible to acquire the information required to perform any visual task without needing all the data contained in a uniform resolution image [12].
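The reduction in data volume can be illustrated with a rough arithmetic sketch. It assumes a centred Cartesian exponential lattice described by two parameters, a number of rings m and a subdivision factor d: the fovea is taken to be a grid of 4d x 4d cells at full resolution, and each ring is a 4d x 4d grid of cells twice as large as those of the previous level, whose central 2d x 2d block is covered by the finer levels. The layout, the function names and the parameter values are illustrative assumptions, not measurements from the system described here.

```python
def foveal_cell_count(m: int, d: int) -> int:
    """Sensor cells in a centred exponential lattice (assumed layout)."""
    fovea = (4 * d) ** 2                      # full-resolution central grid
    per_ring = (4 * d) ** 2 - (2 * d) ** 2    # ring grid minus its covered centre
    return fovea + m * per_ring


def uniform_pixel_count(m: int, d: int) -> int:
    """Pixels of a uniform image covering the same field at fovea resolution."""
    side = 4 * d * 2 ** m                     # cell size doubles at every ring
    return side * side


if __name__ == "__main__":
    for m, d in [(2, 2), (3, 8)]:
        foveal = foveal_cell_count(m, d)
        uniform = uniform_pixel_count(m, d)
        print(f"m={m}, d={d}: {foveal} foveal cells, {uniform} uniform pixels, "
              f"ratio {uniform / foveal:.1f}")
```

Under these assumptions, m=3 and d=8 give 3328 sensor cells against 65536 pixels for a uniform image of the same field of view, roughly a twentyfold reduction.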
The changing resolution profile presented by foveal sensors can be obtained by means of a log-polar sampling strategy [9] (Fig. 1) or a Cartesian exponential one [4,13] (Fig. 2). Figs. 1 and 2 show that the sensor size in both geometries is variable but the sensor shape is uniform. However, the log-polar geometry does present blind spots, since it has a singularity at the origin. The main advantages of Cartesian topologies when compared to log-polar ones are the following [4]:
- They allow an easier VLSI implementation of the foveal sensor.
- Most algorithms, cameras, processing hardware, image storage techniques and development tools have been developed for Cartesian topologies.
- There is a wide supply of multiresolution algorithms available to work in a fast and simple way. All these algorithms can be easily adapted to work with Cartesian foveal images.

Fig. 1. Log-polar topology.
Fig. 2. Cartesian exponential topology.
Fig. 3. a) Exponential foveal lattice (m=2, d=2). b) Foveal Polygon.

Authorized licensed use limited to: Universidad de Malaga. Downloaded on September 25, 2009 at 08:43 from IEEE Xplore. Restrictions apply.

III. EXPONENTIAL FOVEAL SENSOR GEOMETRIES
Foveal geometries proposed by Bandera [13] are defined by two parameters:
- m: number of rings or resolution levels of the grid.
- d: subdivision factor or number of subrings in each resolution level.
Figure 2 presents a grid whose parameters are d=8 and m=3. To manipulate and process these images, a multiresolution structure is desirable. The Foveal Polygon [13] has proven to be a suitable one. Fig. 3 shows how a Foveal Polygon (Fig. 3.b) can be generated from the grid (Fig. 3.a). The most relevant features of the structure are:
- The first m layers present $4d \times 4d$ cells. There are two different kinds of cells: the rexels, which are obtained from the grey level geometry, and the computed cells, which are obtained by averaging 4 cells of the lower layer.
- Level m covers the whole field of view and is usually referred to as the waist.
- There is a pyramidal structure of computed cells over the waist level.
- In the resolution levels between the fovea and the waist, the computed cells region has a size of $2d \times 2d$ and is located in the centre of each resolution level.

IV. SHIFTED FOVEA MULTIRESOLUTION GEOMETRIES (SFMG)
Shifted Fovea Multiresolution images [1,3,6,7] present a Log-Cartesian geometry where the fovea and the resolution rings associated to a certain fixation can be relocated to any point of the field of view (FOV) according to the requirements of the system. Thus, unlike static fovea systems, if a relevant shape is detected in the periphery of the FOV, it is not necessary to move the camera to examine it at the maximum resolution level available. These geometries and the hierarchical structures required for their manipulation and processing are defined by two features: a) the size or number of cells associated to each resolution level and b) the relative shiftings between the different resolution levels. Thus, three types of SFMG have been developed:
- Basic SFMG: resolution levels have a constant size ($4d \times 4d$) and constant relative shiftings (Fig. 4.a). These geometries are defined by 4 parameters: m, d, $s_h$ and $s_v$. Parameters $s_h$ and $s_v$ define the relative shifting of each ring inside the ring where it is contained [6]. The main difference of this hierarchical data structure when compared with the centred fovea one is that the shifted computed cells region keeps the same relative position in every resolution level [3].
- Extended Mobile SFMG: the size of each layer of the structure is constant, but the relative shiftings change (Fig. 4.b). This geometry allows a higher number of possible fixations, and therefore the positioning error of the foveal region in the scene is reduced [1].
In this case, the final shifting of the fovea within the field of view is determined by two arrays, SH and SV, where each pair of elements $\mathit{SH}_k$ and $\mathit{SV}_k$ gives the relative shifting of ring k with respect to ring k+1. The computed cells region in every resolution level keeps a size of $2d \times 2d$, but it is not located in the same position.
- SFMG of Adaptive Fovea Size: the shifting between the levels is constant, but the size of the fovea and of the resolution rings changes (Fig. 4.c). The most important advantage of this geometry is that the volume of data decreases when dealing with small objects, and thus the efficiency of the system is improved. These geometries are defined by 5 parameters [7]: m, Ld, Rd, Td and Bd. Ld/Rd is the number of columns of sensor elements on the left/right side of each ring and Td/Bd is the number of rows of sensor elements on the top/bottom side of each ring. In the hierarchical data structure, despite the changeable size of the computed cells region, its relative shifting with respect to the limits of the resolution level remains constant.

Fig. 4. Shifted Fovea Multiresolution geometries (SFMG) and the hierarchical structures required for their manipulation and processing. a) Basic SFMG, b) Extended Mobile SFMG, c) SFMG of Adaptive Fovea Size.

V. DESCRIPTION OF THE SFMG IMAGE PROCESSING SYSTEM
The system processes each scene selectively, conducting attentive steps and fixations in a cyclic way. An attentive step consists of a preprocessing stage divided in two parts: a) multiscale segmentation and b) determination of the regions of interest. Fixations consist of extracting the contours of the object inside the fovea and obtaining its corners.

A. Segmentation
The segmentation algorithm proposed by Burt and Rosenfeld [5,10] for pyramidal structures has been adapted to our geometries. This algorithm uses the Adaptive Link Principle between cells in successive layers, linking them according to their similarity and recalculating their grey level. After a certain number of iterations, the link values no longer change and the structure is stabilised. If a level L of the structure is chosen and the grey level of the cells of L is propagated down to the fovea, the image is divided into a controllable number of classes that depends on the height of level L in the structure. The number of classes obtained is equal to $4^{(N-L)}$, N being the number of levels in the hierarchical structure.

The adaptation of the pyramidal algorithm to our multiresolution structures is conducted by considering:
- The architecture of the previously described foveal polygons, which is in any case a very reduced set of data compared to a pyramid structure.
- The existence of two different types of cells: the computed ones and the rexels.

In the linking stage, a cell (i,j) of level k-1 must be linked to the most suitable one of its four possible father cells in level k. The coordinates of these father cells are:

Basic SFMG:
\[ \left( \lfloor (i-1)/2 \rfloor + (d+s_v) + m,\; \lfloor (j-1)/2 \rfloor + (d+s_h) + n \right), \quad 0 \le m,n < 2, \quad 1 \le i,j < 4d-1 \tag{1.a} \]

Extended Mobile SFMG:
\[ \left( \lfloor (i-1)/2 \rfloor + (d+\mathit{SV}_{k-1}) + m,\; \lfloor (j-1)/2 \rfloor + (d+\mathit{SH}_{k-1}) + n \right), \quad 0 \le m,n < 2, \quad 1 \le i,j < 4d-1 \tag{1.b} \]

SFMG of Adaptive Fovea Size:
\[ \left( \lfloor (i-1)/2 \rfloor + \mathit{Td} + m,\; \lfloor (j-1)/2 \rfloor + \mathit{Ld} + n \right), \quad 0 \le m,n < 2, \quad 1 \le i < \mathit{DimY}_{k-1}-1, \quad 1 \le j < \mathit{DimX}_{k-1}-1 \tag{1.c} \]

where DimX and DimY are the horizontal and vertical dimensions of the different resolution levels when working with a SFMG of Adaptive Fovea Size.

In the grey level recomputation stage, a cell (i,j) of level k+1 may have from 0 to 16 son cells linked in level k. The coordinates of these son cells are:

Basic SFMG:
\[ \left( 2(i-(d+s_v)) - 1 + m,\; 2(j-(d+s_h)) - 1 + n \right), \quad 0 \le m,n < 4, \quad d+s_v+1 \le i < 3d+s_v-1, \quad d+s_h+1 \le j < 3d+s_h-1 \tag{2.a} \]

Extended Mobile SFMG:
\[ \left( 2(i-(d+\mathit{SV}_k)) - 1 + m,\; 2(j-(d+\mathit{SH}_k)) - 1 + n \right), \quad 0 \le m,n < 4, \quad d+\mathit{SV}_k+1 \le i < 3d+\mathit{SV}_k-1, \quad d+\mathit{SH}_k+1 \le j < 3d+\mathit{SH}_k-1 \tag{2.b} \]

SFMG of Adaptive Fovea Size:
\[ \left( 2(i-\mathit{Td}) - 1 + m,\; 2(j-\mathit{Ld}) - 1 + n \right), \quad 0 \le m,n < 4, \quad \mathit{Td}+1 \le i < \mathit{DimY}_{k+1}-(\mathit{Td}+\mathit{Bd})-1, \quad \mathit{Ld}+1 \le j < \mathit{DimX}_{k+1}-(\mathit{Ld}+\mathit{Rd})-1 \tag{2.c} \]

B. Object Detection
Once the structure is stabilised, the objects present in the scene are detected. The first step of this second stage consists of hierarchically calculating, by a bottom-up process, the bounding boxes and centroids of every cell in the structure up to level L. The regions of interest are obtained by extracting the cells from level L to level waist+1 that can be roots of an object. Possible roots must fulfil the following requirements:
- Compactness condition: the ratio between the area of the bounding box associated to the cell and the area of the cell must be under a threshold U, as follows: $A_{\text{bounding box}} / A_{\text{root}} < U$.
- Contrast condition: the grey level of the root cell must be different enough from the background.
This test procedure is conducted in a top-down way. Thus, only cells not belonging to roots detected in upper levels are tested when working in a given level, which achieves a good reduction of the computational load.

Detections depend on the level L where they are performed and on the chosen threshold U, as well as on the fixation point of the fovea. The Gaze Control Subsystem is in charge of changing these parameters to explore the scene by deciding where to look in the next fixation. In order to simplify the task of this subsystem, it is desirable that the information provided on the set of detected roots remains as constant as possible from one fixation to the next. Thus, a second corrective process is applied to these roots to merge or reject the possible targets given by the previous step.

In order to merge two different roots, three conditions must be fulfilled:
- Their bounding boxes must overlap.
- The son cells of both roots must form a connected region.
- Their grey levels must be similar.
The rejection condition for a candidate root is that its descendant cells in level m of the hierarchical structure (the waist) do not form a connected region.

C. Feature Extraction
In order to extract features that characterise the objects in the foveal region, the system includes a new algorithm that generates a curvature function by means of local histograms of the contour chain code [2]. This function provides a fast and reliable detection of the corners of an object. Our corner detector is robust and stable under rotations, scaling and contour noise. Also, its computational time is very small when compared to that of other detectors [2]. The parameters for subsequent fixations are extracted from the regions of interest obtained in the preprocessing stage, repeating the two described stages for the rest of the objects of the image.

Fig. 5. a) Uniform resolution image with the resolution rings associated to a fixation overprinted; b) segmented foveal image, including centroids, bounding boxes and detected corners of the object inside the fovea; c) segmentation of the levels of the hierarchical data structure, presenting the lower levels (fovea plus three rings) and the upper ones (16x16 and 8x8), where detection is conducted.

VI. RESULTS
The performance of the system can be appreciated in Fig. 5. First, a 512x512 uniform resolution scene (a) is presented. Then a SFMG of Adaptive Fovea Size is used.
The results of the processing (segmentation and corner detection) can be seen in (b) and (c). Fig. 5.b shows the segmented foveal image, including the centroids and bounding boxes of every object and the corners of the object contained in the fovea. Fig. 5.c presents the foveal polygon segmentation from waist to fovea, as well as levels L=5 and L-1=4, where root cells are marked with a black spot.

ACKNOWLEDGMENTS
The present work has been partially supported by the Spanish Comisión Interministerial de Ciencia y Tecnología (CICYT), Project No. TIC095-0589.

REFERENCES
[1] F. Arrebola, P. Camacho, F. Sandoval, "Generalization of Shifted Fovea Multiresolution Geometries Applied to Object Detection", in Proc. of 9th International Conference on Image Analysis and Processing 1997, vol. 2, pp. 477-484.
[2] F. Arrebola, A. Bandera, P. Camacho, F. Sandoval, "Corner Detection by Local Histograms of Contour Chain Code", Electronics Letters, vol. 33, no. 21, Sept. 1997, pp. 1769-1771.
[3] F. Arrebola, P. Camacho, F. Sandoval, "Shifted Fovea Multiresolution Image Segmentation", in Proc. of 11th URSI 1996, vol. 2, pp. 205-208 (in Spanish).
[4] C. Bandera, S. Ghosal, A. J. Izatt, "Retinotopic Processing for Active Foveal Vision", in Proc. 2nd Asian Conference on Computer Vision 1995, vol. 2.
[5] P. J. Burt, T. H. Hong, A. Rosenfeld, "Image Smoothing Based on Neighbor Linking", IEEE Transaction SMC 1981, vol. 11, no. 12, pp. 769-780.
[6] P. Camacho, F. Arrebola, F. Sandoval, "Shifted Fovea Multiresolution Geometries", in Proc. IEEE International Conference on Image Processing 1996, vol. 1, pp. 307-310.
[7] P. Camacho, F. Arrebola, F. Sandoval, "Adaptive Fovea Structures for Space-variant Sensors", in Proc. of 12th International Conference on Image Analysis and Processing 1997, vol. 1, pp. 422-429.
[8] J. O. Eklundh, "Trends in Active Vision", in Computer Science Today: Recent Trends and Developments (J. van Leeuwen, Ed.), Springer 1995, pp. 505-517.
[9] F. Ferrari, J. Nielsen, P. Questa and G. Sandini, "Space variant imaging", Sensor Review 1995, vol. 15, no. 2, pp. 17-20.
[10] Hong, Narayanan, Peleg, Rosenfeld, Silberberg, "Image Smoothing and Segmentation by Multiresolution Pixel Linking: Further Experiments and Extension", IEEE Trans. System, Man and Cybernetics 1982, vol. 12, no. 5, pp. 611-622.
[11] R. D. Rimey and C. M. Brown, "Control of Selective Perception Using Bayes Nets and Decision Theory", International Journal of Computer Vision 1994, vol. 12, no. 2.
[12] J. Santos-Victor, G. Sandini, F. Curotto, S. Garibaldi, "Divergent Stereo in Autonomous Navigation: From Bees to Robots", International Journal of Computer Vision 1995, vol. 14, pp. 159-177.
[13] P. Scott, C. Bandera, "Hierarchical Multiresolution Data Structures and Algorithms for Foveal Vision Systems", in Proc. IEEE International Conference on System, Man and Cybernetics 1990.
[14] M. J. Swain, M. Stricker, Eds, "Promising Directions in Active Vision", International Journal of Computer Vision 1993, vol. 11, no. 2, pp. 109-126.
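As a worked illustration of the coordinate mappings used in the segmentation stage (Section V.A), the following sketch computes, for a Basic SFMG, the four candidate father cells of a cell and the sixteen candidate son cells of a computed cell. It is a minimal sketch under stated assumptions: i is taken as the row index and j as the column index, the computed cells region of a level is assumed to start at offset (d + s_v, d + s_h), and only the interior cells covered by the stated index ranges are handled. The function names and the example parameter values are hypothetical.

```python
from typing import List, Tuple

Cell = Tuple[int, int]


def father_cells(i: int, j: int, d: int, s_v: int, s_h: int) -> List[Cell]:
    """Four candidate fathers (in level k) of cell (i, j) in level k-1."""
    if not (1 <= i < 4 * d - 1 and 1 <= j < 4 * d - 1):
        raise ValueError("border cells are not covered by this sketch")
    row = (i - 1) // 2 + d + s_v
    col = (j - 1) // 2 + d + s_h
    return [(row + m, col + n) for m in range(2) for n in range(2)]


def son_cells(i: int, j: int, d: int, s_v: int, s_h: int) -> List[Cell]:
    """Sixteen candidate sons (in level k) of computed cell (i, j) in level k+1."""
    if not (d + s_v + 1 <= i < 3 * d + s_v - 1
            and d + s_h + 1 <= j < 3 * d + s_h - 1):
        raise ValueError("cell is outside the interior of the computed region")
    row = 2 * (i - (d + s_v)) - 1
    col = 2 * (j - (d + s_h)) - 1
    return [(row + m, col + n) for m in range(4) for n in range(4)]


if __name__ == "__main__":
    d, s_v, s_h = 2, 1, 0                 # illustrative values only
    father = (d + s_v + 1, d + s_h + 1)   # first interior computed cell
    sons = son_cells(father[0], father[1], d, s_v, s_h)
    # Every candidate son lists this father among its four candidate fathers.
    assert all(father in father_cells(a, b, d, s_v, s_h) for a, b in sons)
    print(len(sons), "candidate sons for father", father)
```

The final check shows that the two mappings are mutually consistent: each of the sixteen candidate sons of an interior computed cell has that cell among its four candidate fathers, which is the overlap the linking stage relies on.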