Vision System Based on Shifted Fovea Multiresolution Retinotopologies
Fabián Arrebola, Cristina Urdiales, Pelegrín Camacho and Francisco Sandoval
Departamento de Tecnología Electrónica
E. T. S. I. Telecomunicación, Universidad de Málaga,
Campus de Teatinos, 29071 Málaga - Spain
fabian@dte.uma.es, cris@dte.uma.es, pelegrin@dte.uma.es, sandoval@dte.uma.es
Abstract - In this paper we present a foveal active vision
system. It is capable of moving and fixating the fovea to
any region of a scene, detecting its most relevant areas to
extract certain features of these regions of interest. The
system conducts a segmentation of the image, detects the
possible existing objects in the scene, obtains hierarchically
a set of features for each detected object -centroid, area,
bounding box and grey level- and extracts the corners of
the object contained in the fovea. This system is going to be
integrated in an autonomous mobile agent, so it is
important to process each object in the optimal resolution
level to minimise computational load and time
requirements.
The most important novelty of the system is the use of
Reconfigurable Shifted Fovea Retinotopologies, also
including a new algorithm capable of obtaining a
curvature function by means of local histograms of the
contour chain code to reliably calculate the stable corners
of the contour of the objects.
I. INTRODUCTION
Most vision systems consider perception as a reconstructive
process and use the camera as a static entity. However, living
beings often use vision to guide their activities. Thus,
information acquisition and data extraction processes strongly
depend on the task to accomplish. Also, such processes are
conducted in an intentional and active way: we selectively
focus our gaze conditioned by the features and events of the
environment according to our needs and goals.
Several visually guided applications, especially those related
to robotic vision -autonomous navigation, surveillance, tracking
and object recognition- require a wide field of view, high
resolution and minimal response times. Obviously, such
requirements can hardly be met by a vision system relying
on uniform resolution images. The main problem of such
images is the enormous load of data that they yield: real time
vision systems cannot deal with such data volume [8].
Recently, Active Vision has developed strongly, aiming to
study and implement real time perception systems that allow
a robot to interact with a complex dynamic
environment [14].
To ease the process of comprehension of a scene, Active
Vision proposes the use of foveal sensors presenting a
changing resolution over the field of view. A second strategy to
improve the performance of the system consists of extracting
and processing only data relevant for a given task [11]. Thus,
a selective perception is conducted, processing the scene just
in certain areas with the required resolution level to conduct a
minimum number of operations.
0-7803-4503-7/98/$10.00
1998 IEEE
II. FOVEAL VISION
The main goal of foveal vision is the simulation of the
retina of biological vision systems. The main feature of these
systems is that images to be processed present a central region
yielding the maximum density of photo-receptors (foveal
region) and a peripheral one where that density decreases
according to the distance to the central region. These images
are known as foveal images and they simultaneously present a
wide field of view and a high resolution in their central regions
but nevertheless they work with a reduced data volume.
If this non-uniform sensor layout is combined with the
possibility of moving the foveal region in a controlled way, it is
possible to acquire the information required to perform any
visual task with no need of all data contained in a uniform
resolution image [12]. The changing resolution profile
presented by foveal sensors can be obtained by means of a
log-polar [9] sampling strategy -Fig. 1-, or a Cartesian exponential
one [4,13] -Fig. 2-.
In Figs. 1 and 2 it can be seen that the sensor size
in both geometries is variable but the sensor shape is uniform.
However, the log-polar geometry does present blind spots,
since it has a singularity at the origin.
The main advantages of Cartesian topologies when
compared to log-polar ones are the following [4]:
-It allows an easier VLSI implementation of the foveal
sensor.
-Most algorithms, cameras, processing hardware, image
storage techniques and development tools have been
developed for Cartesian topologies.
Fig 1. Log-Polar Topology
Authorized licensed use limited to: Universidad de Malaga. Downloaded on September 25, 2009 at 08:43 from IEEE Xplore. Restrictions apply.
Fig 2. Cartesian Exponential Topology
Fig 3. a) Exponential foveal lattice (m=2, d=2). b) Foveal Polygon
-There is a wide supply of multiresolution algorithms
available to work in a fast and simple way. All these
algorithms can be easily adapted to work with Cartesian
foveal images.
III. EXPONENTIAL FOVEAL SENSOR
GEOMETRIES
Foveal geometries proposed by Bandera [13] are defined by
two parameters:
m: number of rings or resolution levels of the grid.
d: subdivision factor or number of subrings in each
resolution level.
Figure 2 presents a grid whose parameters are d=8 and m=3.
Obviously, to manipulate and process these images a
multiresolution structure would be desirable. The Foveal Polygon
[13] has proven to be a suitable one. Fig 3 shows how a Foveal
Polygon can be generated from the grid -Fig 3.b-. The most
relevant features of the structure are:
- The first m layers present 4d x 4d cells. There are two
different kinds of cells: the rexels, which are obtained from
the grey level geometry, and the computed cells, which are
obtained by averaging 4 cells of the lower layer.
- Level m covers the whole field of view and is usually
referred to as the waist.
- There exists a pyramidal structure of computed cells over
the waist level.
- In the different resolution levels existing between the
fovea and the waist, the computed cells region has a size of
2d x 2d and is located in the center of each resolution level.
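The data reduction offered by this structure can be illustrated with a minimal sketch. It builds the 4d x 4d levels of a centred foveal polygon from a uniform image by block averaging and counts the rexels. The sketch assumes that the fovea is level 0, that m rings surround it (so the waist is level m), and that the input is a square image of side 4d·2^m. The function names are illustrative and are not taken from [13].

```python
def rexel_count(m, d):
    """Rexels of a centred foveal polygon: a 4d x 4d fovea plus m rings,
    each a 4d x 4d level whose central 2d x 2d region is computed."""
    fovea = (4 * d) ** 2
    ring = (4 * d) ** 2 - (2 * d) ** 2
    return fovea + m * ring


def uniform_count(m, d):
    """Pixels of a uniform image covering the same field of view
    at the resolution of the fovea."""
    side = 4 * d * 2 ** m
    return side * side


def build_levels(image, m, d):
    """Return levels 0 (fovea) to m (waist), each a 4d x 4d grid.

    A cell of level k is the mean of a 2^k x 2^k block of pixels taken
    from a window of side 4d * 2^k centred in the image (assumption:
    centred fovea, no shift).
    """
    side = 4 * d * 2 ** m
    if len(image) != side or any(len(row) != side for row in image):
        raise ValueError("image must be square with side 4d * 2^m")
    levels = []
    for k in range(m + 1):
        cell = 2 ** k                          # pixels per cell side
        origin = (side - 4 * d * cell) // 2    # top-left pixel of the window
        level = []
        for r in range(4 * d):
            row = []
            for c in range(4 * d):
                y0 = origin + r * cell
                x0 = origin + c * cell
                total = sum(image[y][x]
                            for y in range(y0, y0 + cell)
                            for x in range(x0, x0 + cell))
                row.append(total / (cell * cell))
            level.append(row)
        levels.append(level)
    return levels
```

Under these assumptions, d=8 and m=3 give 3328 rexels, against 65536 pixels for a uniform 256 x 256 image covering the same field of view. The central 2d x 2d cells of each level above the fovea coincide with the average of 4 cells of the level below, which is the computed-cell rule stated in the list.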
IV. SHIFTED FOVEA MULTIRESOLUTION
GEOMETRIES (SFMG)
Shifted Fovea Multiresolution images [1,3,6,7] present a
Log-Cartesian geometry where the fovea and the resolution
rings associated to a certain fixation can be relocated to any
point of the field of view (FOV) according to the requirements
of the system. Also, unlike static fovea systems, if a relevant
shape is detected in the periphery of the FOV, it is not
necessary to move the camera to examine it at the maximum
resolution level available.
These geometries and the hierarchical structures required
for their manipulation and processing are defined by two
features: a) size or number of cells associated to each resolution
level and b) relative shiftings between the different resolution
levels. Thus, three types of SFMG have been developed:
- Basic SFMG:
Where resolution levels have a constant size -4d x 4d- and
constant relative shiftings -Fig 4.a-. These geometries are
defined by 4 parameters: m, d, s_h and s_v.
Parameters s_h and s_v define the relative shifting of each ring
inside the ring where it is contained [6]. The main
difference of this hierarchical data structure when compared
with the centred fovea one is that the shifted computed cells
region keeps the same relative position in every resolution
level [3].
- Extended Mobile SFMG:
Where the size of each layer of the structure is constant, but
relative shiftings change -Fig 4.b-. This geometry allows
a higher number of possible fixations, and therefore the
positioning error of the foveal region in the scene is reduced
[1]. In this case, the final shifting of the fovea within the field
of view is determined by two arrays SH and SV, where each
pair of elements SH_k and SV_k gives the relative shifting of
ring k with respect to ring k+1. It can be observed that the
computed cells region in every resolution level keeps a size
of 2d x 2d, but it is not located in the same position.
- SFMG of Adaptive Fovea Size:
Where the shifting between the levels is constant, but the
size of the fovea and resolution rings changes -Fig 4.c-.
The most important advantage of this geometry is that the
volume of data decreases when dealing with small objects,
and thus the efficiency of the system is improved. These
geometries are defined by 5 parameters [7]: m, L_d, R_d, T_d
and B_d. L_d/R_d is the number of columns of sensor elements
on the left/right side of each ring and T_d/B_d is the number
of rows of sensor elements on the top/bottom side of each
ring.
It can be observed in the hierarchical data structure that
despite the changeable size of the computed cells region, its
relative shifting with respect to the limits of the resolution
level remains constant.
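For the Basic SFMG, the placement of each resolution level inside the field of view can be sketched as follows. This is a minimal illustration under stated assumptions: levels are numbered from 0 (fovea) to m (waist), a cell of level k spans 2^k finest-resolution pixels, the shifts s_v and s_h are expressed in cells of the containing level with 0 meaning a centred fovea, and the computed cells region of each level starts at row d+s_v and column d+s_h (the offset that appears in the father and son coordinates of Section V). The function name and the admissible range of the shifts are hypothetical.

```python
def basic_sfmg_windows(m, d, s_v, s_h):
    """Map each level k to (top, left, side, cell) in finest-resolution pixels.

    top, left: position of the level's first cell inside the field of view
    side:      width and height of the level's window
    cell:      side of one cell of that level
    """
    # Assumed range: the 2d x 2d computed region must stay inside its level.
    if not (-d <= s_v <= d and -d <= s_h <= d):
        raise ValueError("shift moves the computed region outside its level")
    windows = {}
    top, left = 0, 0
    for k in range(m, -1, -1):          # from the waist down to the fovea
        cell = 2 ** k
        windows[k] = (top, left, 4 * d * cell, cell)
        # Level k-1 fills the computed region of level k, whose first
        # cell sits at row d + s_v and column d + s_h of level k.
        top += (d + s_v) * cell
        left += (d + s_h) * cell
    return windows
```

With m=2, d=2 and no shift, the fovea window starts at pixel (12, 12) of a 32 x 32 field of view. With s_v=1 and s_h=-1 it starts at (18, 6), because the same relative shift is applied at every level. An Extended Mobile SFMG would replace the constant pair by per-level values SV_k and SH_k.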
Fig. 4. Shifted Fovea Multiresolution geometries (SFMG) and the hierarchical structures required for their manipulation and processing. a) Basic SFMG, b)
Extended Mobile SFMG, c) SFMG of Adaptive Fovea Size.
V. DESCRIPTION OF THE SFMG IMAGE
PROCESSING SYSTEM
The system processes each scene in a selective way,
alternating attentive steps and fixations in a cycle. An
attentive step consists of a preprocessing stage divided into two
parts: a) Multiscale segmentation and b) Determination of the
regions of interest. Fixations consist of extracting the contours
of the object inside the fovea and obtaining its corners.
A. Segmentation
The segmentation algorithm proposed by Burt and
Rosenfeld [5,10] for pyramidal structures has been adapted to
our geometries. This algorithm uses the Adaptive Link Principle
between cells in successive layers, linking them according to
their similarity and recalculating their grey level. After a
certain number of iterations, the link values no longer change and
the structure is stabilised. If a level L of the structure is chosen
and the grey level of the cells of L is propagated down to the
fovea, the image is divided into a controllable number of
classes that depends on the height of level L in the structure.
The number of classes obtained is equal to $4^{N-L}$, N being the
number of levels in the hierarchical structure.
The adaptation of the pyramidal algorithm to our
multiresolution structures is conducted by considering:
- The architecture of the previously described foveal
polygons, which in any case hold a very reduced set of data
compared with a pyramid structure.
- The existence of two different types of cells: the computed
ones and the rexels.
In the linking stage a level k-1 cell (i,j) must be linked to
the most suitable one of its four possible father cells in level k.
The coordinates of these father cells are:
Basic SFMG:
$$\left( \lfloor (i-1)/2 \rfloor + (d+s_v) + m,\ \lfloor (j-1)/2 \rfloor + (d+s_h) + n \right) \tag{1.a}$$
$0 \le m,n < 2$
$1 \le i,j < 4d-1$
Extended Mobile SFMG:
$$\left( \lfloor (i-1)/2 \rfloor + (d+SV_{k-1}) + m,\ \lfloor (j-1)/2 \rfloor + (d+SH_{k-1}) + n \right) \tag{1.b}$$
$0 \le m,n < 2$
$1 \le i,j < 4d-1$
SFMG of Adaptive Fovea Size:
$$\left( \lfloor (i-1)/2 \rfloor + T_d + m,\ \lfloor (j-1)/2 \rfloor + L_d + n \right) \tag{1.c}$$
$0 \le m,n < 2$
$1 \le i < DimX_{k-1}-1$ ; $1 \le j < DimY_{k-1}-1$
where DimX and DimY are the horizontal and vertical dimensions
of the different resolution levels when working with an
SFMG of Adaptive Fovea Size.
In the grey level recomputation stage a level k+1 cell (i,j)
may have from 0 to 16 son cells linked in level k. The
coordinates of these son cells are:
Basic SFMG:
$$\left( 2(i-(d+s_v)) - 1 + m,\ 2(j-(d+s_h)) - 1 + n \right) \tag{2.a}$$
$0 \le m,n < 4$
$d+s_v+1 \le i < 3d+s_v-1$
$d+s_h+1 \le j < 3d+s_h-1$
Extended Mobile SFMG:
$$\left( 2(i-(d+SV_k)) - 1 + m,\ 2(j-(d+SH_k)) - 1 + n \right) \tag{2.b}$$
$0 \le m,n < 4$
$d+SV_k+1 \le i < 3d+SV_k-1$
$d+SH_k+1 \le j < 3d+SH_k-1$
SFMG of Adaptive Fovea Size:
$$\left( 2(i-T_d) - 1 + m,\ 2(j-L_d) - 1 + n \right) \tag{2.c}$$
$0 \le m,n < 4$
$T_d+1 \le i < DimY_{k+1}-(T_d+B_d)-1$
$L_d+1 \le j < DimX_{k+1}-(L_d+R_d)-1$
B. Object Detection
Once the structure is stabilised, the objects present in the
scene are detected. The first step of this second stage consists of
hierarchically calculating, by a "bottom-up" process, the
bounding boxes and centroids of every cell in the structure up
to level L. The regions of interest are obtained by extracting the
cells from level L to level waist+1 that can be roots of an
object. Possible roots must fulfil the following requirements:
- Compactness Condition: The ratio between the area of the
bounding box associated with the cell and the area of the cell must
be under a threshold U, as follows: $A_{\text{bounding box}} / A_{\text{root}} < U$
- Contrast Condition: The grey level of the root cell must be
different enough from the background.
This test procedure is conducted in a top-down way. Thus,
only cells not belonging to detected roots in upper levels are
tested when working in a given level, which reduces the
computational load.
Obviously, detections depend on the level L where they are
performed, on the chosen threshold U, and on the
fixation point of the fovea. The Gaze Control Subsystem will
be in charge of changing these parameters to explore the scene
by deciding where to look in the next fixation. In order to
simplify the task of this subsystem, it is desirable that
the information provided on the set of detected roots remains as
constant as possible from one fixation to the next.
Thus, a second corrective process is applied to these roots to
merge or reject the possible targets given by the previous step.
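The compactness and contrast conditions for a candidate root can be sketched as below. This is a minimal illustration rather than the authors' implementation: the `Cell` record, the explicit background grey level and the `min_contrast` threshold are assumptions, since the text does not specify how "different enough" is measured.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    area: float    # area covered by the cell's linked descendants
    bbox: tuple    # (top, left, bottom, right), bottom and right exclusive
    grey: float    # grey level of the cell once the structure is stabilised


def bbox_area(bbox):
    """Area of an axis-aligned bounding box (0 if it is empty)."""
    top, left, bottom, right = bbox
    return max(0, bottom - top) * max(0, right - left)


def is_root(cell, background_grey, u, min_contrast):
    """Apply the compactness and contrast conditions to one cell.

    u:            threshold U on A_bounding_box / A_root
    min_contrast: hypothetical threshold on the grey level difference
    """
    if cell.area <= 0:
        return False
    compact = bbox_area(cell.bbox) / cell.area < u
    contrasted = abs(cell.grey - background_grey) > min_contrast
    return compact and contrasted
```

For example, a cell of area 80 inside a 10 x 10 bounding box has a ratio of 1.25 and passes a threshold U = 1.5, whereas a cell of area 20 with the same box (ratio 5) does not.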
In order to merge two different roots, three conditions must
be fulfilled:
- Their bounding boxes must overlap.
- The son cells of both roots must form a connected
region.
- Their grey levels must be similar.
The rejection condition for a cell to be considered a root is
that its descendant cells in the m-level of the hierarchical
structure (waist) do not form a connected region.
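Both attentive stages walk between adjacent levels of the structure using the father and son coordinates of the segmentation stage. For the Basic SFMG, expressions (1.a) and (2.a) can be sketched and cross-checked as follows. The function names are illustrative, and the consistency check is an addition of this sketch, not part of the original algorithm.

```python
def fathers(i, j, d, s_v, s_h):
    """Four candidate fathers in level k of cell (i, j) of level k-1, Eq. (1.a)."""
    if not (1 <= i < 4 * d - 1 and 1 <= j < 4 * d - 1):
        raise ValueError("Eq. (1.a) covers 1 <= i, j < 4d-1 only")
    row0 = (i - 1) // 2 + d + s_v
    col0 = (j - 1) // 2 + d + s_h
    return [(row0 + m, col0 + n) for m in range(2) for n in range(2)]


def sons(i, j, d, s_v, s_h):
    """Sixteen candidate sons in level k of cell (i, j) of level k+1, Eq. (2.a)."""
    if not (d + s_v + 1 <= i < 3 * d + s_v - 1
            and d + s_h + 1 <= j < 3 * d + s_h - 1):
        raise ValueError("Eq. (2.a) covers interior computed cells only")
    row0 = 2 * (i - (d + s_v)) - 1
    col0 = 2 * (j - (d + s_h)) - 1
    return [(row0 + m, col0 + n) for m in range(4) for n in range(4)]


def consistent(d, s_v, s_h):
    """Check that every father covered by Eq. (2.a) lists exactly the
    sons that name it as a candidate father through Eq. (1.a)."""
    inner = range(1, 4 * d - 1)
    for fi in range(d + s_v + 1, 3 * d + s_v - 1):
        for fj in range(d + s_h + 1, 3 * d + s_h - 1):
            expected = {(i, j) for i in inner for j in inner
                        if (fi, fj) in fathers(i, j, d, s_v, s_h)}
            if set(sons(fi, fj, d, s_v, s_h)) != expected:
                return False
    return True
```

With d=2, s_v=1 and s_h=-1, cell (1,1) has candidate fathers (3,1), (3,2), (4,1) and (4,2), and father (4,2) has the 4 x 4 block of sons from (1,1) to (4,4). The two expressions therefore describe the same overlapping links from both directions.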
C. Feature Extraction
In order to extract features to characterise the objects in the
foveal region, the system includes a new algorithm that
generates a curvature function by means of local histograms of
the contour chain code [2]. This function provides a fast and
reliable detection of the corners of an object.
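The idea of deriving a curvature estimate from local histograms of the chain code can be illustrated with a short sketch. It is not the algorithm of [2], whose details are not given here. It assumes a closed, 8-connected contour with y growing upwards, takes the histograms of the k links before and after each point, and uses the angle between their circular mean directions as the curvature value. The window length k and the corner threshold are hypothetical parameters.

```python
import math

# Freeman 8-direction codes: code c corresponds to an angle of c * 45 degrees.
FREEMAN = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
           (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}


def chain_code(contour):
    """Chain code of a closed contour given as a list of (x, y) points."""
    n = len(contour)
    codes = []
    for p in range(n):
        x0, y0 = contour[p]
        x1, y1 = contour[(p + 1) % n]
        codes.append(FREEMAN[(x1 - x0, y1 - y0)])
    return codes


def local_histogram(codes, start, k):
    """8-bin histogram of the k codes starting at index start (cyclic)."""
    hist = [0] * 8
    for t in range(k):
        hist[codes[(start + t) % len(codes)]] += 1
    return hist


def mean_direction(hist):
    """Circular mean, in radians, of the directions counted in hist."""
    sx = sum(h * math.cos(c * math.pi / 4) for c, h in enumerate(hist))
    sy = sum(h * math.sin(c * math.pi / 4) for c, h in enumerate(hist))
    return math.atan2(sy, sx)


def curvature(codes, k):
    """Turning angle at each point between the k links before and after it."""
    values = []
    for p in range(len(codes)):
        before = mean_direction(local_histogram(codes, p - k, k))
        after = mean_direction(local_histogram(codes, p, k))
        diff = after - before
        # Wrap the difference to the interval (-pi, pi].
        values.append(math.atan2(math.sin(diff), math.cos(diff)))
    return values


def corner_candidates(values, threshold):
    """Indices whose absolute curvature exceeds the threshold."""
    return [p for p, v in enumerate(values) if abs(v) > threshold]
```

On an axis-aligned square traversed counter-clockwise, this estimate is zero along the sides and reaches a quarter turn at the four vertices. A practical detector would also need to handle windows whose directions cancel out, where the mean direction is undefined.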
Our corner detector is very robust and stable under rotations,
scaling and contour noise. Also, its computational time is very
small when compared to that of other detectors [2]. The
parameters for subsequent fixations are extracted from the
regions of interest obtained in the preprocessing stage,
repeating the described two stages for the rest of the objects of
the image.
Fig 5. a) Uniform resolution image with the resolution rings associated to a fixation overprinted; b) segmented foveal image, including centroids, bounding boxes
and detected corners of the object inside the fovea; c) segmentation of the levels of the hierarchical data structure, presenting the lower levels (fovea, plus three rings)
and the upper ones (16x16 and 8x8), where detection is conducted.
VI. RESULTS
The performance of the system is illustrated in Fig.
5. First, a 512x512 uniform resolution scene (a) is presented.
Then an SFMG of Adaptive Fovea Size is used. The results of
the processing, segmentation, and corner detection can be seen
in (b) and (c). Figure 5.b shows the segmented foveal image
including centroids and bounding boxes of every object and the
corners of the object contained in the fovea. Fig. 5.c presents
the foveal polygon segmentation from waist to fovea, as well as
levels L=5 and L-1=4, where root cells are marked with a
black spot.
ACKNOWLEDGMENTS
The present work has been partially supported by the
Spanish Comisión Interministerial de Ciencia y Tecnología
(CICYT), Project No. TIC095-0589.
REFERENCES
[1] F. Arrebola, P. Camacho, F. Sandoval, "Generalization of Shifted Fovea
Multiresolution Geometries Applied to Object Detection", in Proc. of
9th International Conference on Image Analysis and Processing 1997,
vol. 2, pp. 477-484.
[2] F. Arrebola, A. Bandera, P. Camacho, F. Sandoval, "Corner Detection by
Local Histograms of Contour Chain Code", Electronics Letters, vol. 33,
no. 21, Sept. 1997, pp. 1769-1771.
[3] F. Arrebola, P. Camacho, F. Sandoval, "Shifted Fovea Multiresolution
Image Segmentation", in Proc. of 11th URSI 1996, vol. 2, pp. 205-208
(in Spanish).
[4] C. Bandera, S. Ghosal, A. J. Izatt, "Retinotopic Processing for Active
Foveal Vision", in Proc. 2nd Asian Conference on Computer Vision
1995, vol. 2.
[5] P. J. Burt, T. H. Hong, A. Rosenfeld, "Image Smoothing Based on
Neighbor Linking", IEEE Transaction SMC 1981, vol. 11, no. 12, pp. 769-780.
[6] P. Camacho, F. Arrebola, F. Sandoval, "Shifted Fovea Multiresolution
Geometries", in Proc. IEEE International Conference on Image
Processing 1996, vol. 1, pp. 307-310.
[7] P. Camacho, F. Arrebola, F. Sandoval, "Adaptive Fovea Structures for
Space-variant Sensors", in Proc. of 12th International Conference on
Image Analysis and Processing 1997, vol. 1, pp. 422-429.
[8] J. O. Eklundh, "Trends in Active Vision", in Computer Science Today:
Recent Trends and Developments (J. van Leeuwen, Ed.), Springer 1995,
pp. 505-517.
[9] F. Ferrari, J. Nielsen, P. Questa and G. Sandini, "Space variant imaging",
Sensor Review 1995, vol. 15, no. 2, pp. 17-20.
[10] Hong, Narayanan, Peleg, Rosenfeld, Silberberg, "Image Smoothing and
Segmentation by Multiresolution Pixel Linking: Further Experiments and
Extension", IEEE Trans. System, Man and Cybernetics 1982, vol. 12,
no. 5, pp. 611-622.
[11] R. D. Rimey and C. M. Brown, "Control of Selective Perception Using
Bayes Nets and Decision Theory", International Journal of Computer
Vision 1994, vol. 12, no. 2.
[12] J. Santos-Victor, G. Sandini, F. Curotto, S. Garibaldi, "Divergent Stereo
in Autonomous Navigation: From Bees to Robots", International
Journal of Computer Vision 1995, vol. 14, pp. 159-177.
[13] P. Scott, C. Bandera, "Hierarchical Multiresolution Data Structures and
Algorithms for Foveal Vision Systems", in Proc. IEEE International
Conference on System, Man and Cybernetics 1990.
[14] M. J. Swain, M. Stricker, Eds., "Promising Directions in Active Vision",
International Journal of Computer Vision 1993, vol. 11, no. 2, pp.
109-126.