Evol. Intel. (2010) 3:155–169
DOI 10.1007/s12065-010-0043-y
RESEARCH PAPER
GPU implementation of a road sign detector based on particle
swarm optimization
Luca Mussi · Stefano Cagnoni · Elena Cardarelli · Fabio Daolio · Paolo Medici · Pier Paolo Porta
Received: 8 February 2010 / Revised: 9 July 2010 / Accepted: 23 September 2010 / Published online: 15 October 2010
Springer-Verlag 2010
Abstract Road Sign Detection is a major goal of Advanced Driver Assistance Systems. Most published work on this problem shares the same approach, by which signs are first detected and then classified in video sequences, even if different techniques are used. While detection is
usually performed using classical computer vision techniques based on color and/or shape matching, most often
classification is performed by neural networks. In this work
we present a novel modular and scalable approach to road
sign detection based on Particle Swarm Optimization, which
takes into account both shape and color to detect signs.
In our approach, in particular, the optimization of a single fitness function makes it possible both to detect a sign belonging to a certain category and, at the same time, to estimate its position with respect to the camera reference frame. To speed up
processing, the algorithm implementation exploits the parallel computing capabilities offered by modern graphics
cards and, in particular, by the Compute Unified Device
Architecture by nVIDIA. The effectiveness of the approach
has been assessed on both synthetic and real video sequences, which have been successfully processed at, or close to,
full frame rate.
Keywords Particle swarm optimization · Road sign detection · GPU computing · Parallel computing

L. Mussi · S. Cagnoni (corresponding author) · E. Cardarelli · P. Medici · P. P. Porta
Dipartimento di Ingegneria dell'Informazione, University of Parma, Viale G. Usberti 181a, 43124 Parma, Italy
e-mail: cagnoni@ce.unipr.it; mussi@ce.unipr.it; cardar@ce.unipr.it; medici@ce.unipr.it; porta@ce.unipr.it

F. Daolio
Information Systems Institute (ISI) - HEC, University of Lausanne, Internef 135, CH-1015 Lausanne, Switzerland
e-mail: fabio.daolio@unil.ch
1 Introduction
Automatic traffic sign detection and classification is a very important issue for Advanced Driver Assistance Systems (ADAS) and road safety. It can both improve safety
and help navigation, by providing critical information the
driver could otherwise miss, limiting or compensating for
drivers’ distractions. Because of this, several road sign
detectors have been developed in the last 10 years [24].
In most industrial systems only speed limit signs are
detected, since these are considered to be the most relevant
for safety. Nevertheless, information provided by warning
signs, mandatory signs and all the remaining prohibitory
signs can also be extremely significant: ignoring the presence of such signs can lead to dangerous situations or even
accidents. Automatic road sign detection systems can be
used to both warn drivers in these situations and supply
additional environmental information to other on-board
systems such as the Automatic Cruise Control (ACC), the
Lane Departure Warning (LDW), etc.
Both gray-scale and color cameras can be used for this purpose: in the first case, search is mainly based on shape
and can be quite demanding in terms of computation time
[10, 19]. Using a color camera, the search can be based
mainly on chromatic information: color segmentation is, in
general, faster than shape detection, even if it requires
additional filtering; however, images acquired by inexpensive color cameras can suffer from artifacts deriving
from Bayer conversion or from other problems related, for
instance, to color balance [2].
The approaches to traffic sign detection which rely on
color images are usually based on color bases different
from RGB; the HSV/HSI color space is the most frequently
used [7, 35] but other color spaces, such as CIECAM97 [9],
can be used as well. On the one hand, these spaces separate
chromatic information from lighting information, making
detection of a specified color mostly independent of light
conditions. On the other hand, the RGB [32] and YUV [31]
color spaces require no transformations, or just very simple
ones; however, they require more sophisticated segmentation algorithms, since the boundary between colors is
fuzzier. In order to make detection more robust, both color
segmentation and shape recognition can be used in cooperation [9].
As regards sign recognition, most methods are based on
computational intelligence techniques [6], the most frequent being neural networks [8, 11] and fuzzy logic [13].
In this paper we present a novel approach to road sign
detection based on geometric transformations of basic sign
templates. Our approach projects sets of three-dimensional
points, which sample significant regions of the road sign
template to be detected and describe its shape, onto the
image plane according to a transformation which maps 3D
points in the camera reference frame onto the image; the
transformed set of points is then matched to the corresponding image pixels. The likelihood of detection is
estimated using a similarity measure between color histograms [34]. This procedure can actually estimate the pose
of any object based on a 3D model within any projection
system and with any general object model. One of the
advantages over other model-based approaches is that this approach does not need any preliminary processing of the image (like, for example, color segmentation) or any reprojection of the full three-dimensional model [34].
Another peculiar feature with respect to similar work [7],
besides the aforementioned similarity measure, is that our
method relies upon Particle Swarm Optimization (PSO) to
estimate, by a single transformation, the pose of the sign in
the 3D space at the same time as the position of the sign in
the image.
Despite being more efficient than many other metaheuristics, PSO is still rather demanding in terms of
computational resources; therefore, a sequential implementation of the algorithm would be too slow for real-time
applications. As for all other metaheuristics, this is especially true when the function to be optimized is itself
computationally complex. The PSO algorithm we have
used in this work has been implemented within the nVIDIA
CUDA environment [23, 25], to take advantage of the
computing power offered by the massively parallel architectures available nowadays even on cheap consumer video
cards. As will be shown, thanks also to the parallel nature
of PSO, this choice allowed the final system to manage
several swarms at the same time, each specialized in
detecting a specific class of road signs.
This paper is organized as follows: Sect. 2 briefly
introduces PSO and its parallel implementation within
CUDA; Sect. 3 addresses the problem of road sign detection, motivating our approach and offering further details
on how shape and color information is processed to compute fitness. Finally, in Sect. 4, we report results obtained
on both a synthetic video sequence containing two signs
and on two real video sequences, acquired on-board a car,
for a total running time of about 30 min.
2 GPU implementation of particle swarm optimization
Particle Swarm Optimization is a simple but powerful
optimization algorithm, introduced by Kennedy and Eberhart [15]. In the last decade many variants of the basic
PSO algorithm have been developed [18, 26, 29] and
successfully applied to many problems in several fields
[28], image analysis being one of the most frequent
ones. In fact, image analysis tasks can often be reformulated as the optimization of an objective function directly derived from the physical features of the problem being solved. Beyond this, PSO is often more than a way to 'tune' the parameters of another algorithm: it can directly be the main building block of an original solution. For example, [3, 5, 21, 27, 38] use PSO
to directly infer the position of an object that is sought
in the image.
2.1 PSO basics
PSO searches for the optimum of a fitness function, following rules inspired by the behavior of flocks of birds in search of food. A population of particles moves within the fitness function domain (usually termed their search space), sampling the function at the points corresponding to their positions: after each particle's move, the fitness is evaluated at its new position.
In their motion, particles preserve part of their velocity
(inertia), while undergoing two attraction forces: the first
one, called cognitive attraction, attracts a particle towards
the best position it visited so far, while the second one,
called social attraction, pulls the particle towards the best
position ever found by the whole swarm. Based on this
model, in basic PSO, the following velocity and position
update equations are computed for each particle:
V_i(t) = w V_i(t-1) + C_1 R_1 [X_i^b(t-1) - X_i(t-1)] + C_2 R_2 [X_i^{gb}(t-1) - X_i(t-1)]    (1)

X_i(t) = X_i(t-1) + V_i(t)    (2)

where the subscript i refers to the i-th dimension of the search space, V is the velocity of the particle, C_1 and C_2 are two positive constants, w is the inertia weight, X(t) is the particle position at time t, X^b(t-1) is the best-fitness position visited by the particle up to time t-1, X^{gb}(t-1) is the best-fitness point ever visited by the whole swarm, and R_1 and R_2 are two random numbers drawn from a uniform distribution in [0, 1].
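As an illustration, the two update equations translate directly into a few lines of vectorized code. The following is a minimal sequential NumPy sketch of one synchronous swarm update; the paper's actual implementation is a set of CUDA kernels, and the names and argument layout here are our own (the default parameter values are those reported in Sect. 4):

import numpy as np

rng = np.random.default_rng()

def pso_update(X, V, Xb, Xlb, w=0.723, c1=1.193, c2=1.193):
    """One synchronous application of Eqs. 1-2 to a whole swarm.

    X, V : (n_particles, n_dims) current positions and velocities
    Xb   : (n_particles, n_dims) personal best positions
    Xlb  : (n_particles, n_dims) best position in each particle's neighborhood
    """
    r1 = rng.random(X.shape)  # R1, R2 ~ U[0, 1], drawn anew per dimension
    r2 = rng.random(X.shape)
    V = w * V + c1 * r1 * (Xb - X) + c2 * r2 * (Xlb - X)  # Eq. 1
    X = X + V                                             # Eq. 2
    return X, V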
Many variants of the basic algorithm have been developed [29], some of which have focused on the algorithm
behavior when different topologies are defined for the
neighborhoods of the particles [16]. A usual variant of PSO substitutes X^{gb}(t-1) with X^{lb}(t-1), which represents the 'local' best position ever found by all particles within a pre-set neighborhood of the particle under consideration.
This formulation admits, in turn, several variants,
depending on the topology of such neighborhoods. Among
others, Kennedy and coworkers evaluated different kinds
of topologies, finding that good performance could be
achieved using random and Von Neumann neighborhoods
[16]. Nevertheless, the authors also indicated that selecting
the most efficient neighborhood structure is, in general, a
problem dependent task. Since the random topology is
usually designed such that each particle communicates
with two random neighbors, most often using a simple ring
topology is adequate for the problem at hand, while
allowing for an easier implementation.
Whatever the choices of the algorithm structure, parameters, etc., and despite good convergence properties, PSO is still an iterative process which, depending on problem difficulty, may require several thousands (if not millions) of particle updates and fitness evaluations. Therefore,
designing efficient PSO implementations is a problem of
great practical relevance. This becomes even more critical,
if one considers real-time applications to dynamic environments in which, for example, the fast convergence
properties of PSO may be used to track moving points of
interest (maxima or minima of a specific dynamicallychanging fitness function) in real time. This is the case, for
example, of computer vision applications in which PSO has
been used to track moving objects [22] or to determine
location and orientation of objects or people [12, 21].
2.2 Implementing PSO within CUDA
We implemented a standard PSO with particles organized
with the classical ring topology [23]. The rationale behind
this choice is, on the one hand, the inadequacy of PSO
with synchronous best update and global-best topology,
which would have been the most natural and easiest
parallel implementation, for optimizing multi-modal
problems [4]. On the other hand, as reported above, PSO with ring topology provides a very good compromise between quality of results, efficiency, and ease of implementation.
The parallel programming model of CUDA allows
programmers to partition the main problem in many subproblems that can be solved independently in parallel.
Each sub-problem may then be further decomposed into
many modules that can be executed cooperatively in
parallel. In CUDA, each sub-problem becomes a thread
block, which is composed of a certain number of threads
which cooperate to solve the sub-problem in parallel.
The software modules that describe the operation of each
thread are called kernels: when a program running on the
CPU invokes a kernel, a unique set of indices is assigned
to each thread, to denote to which block it belongs and
its position inside it. These indices allow each thread
to ‘personalize’ its access to the data structures and,
in the end, to achieve problem parallelization and
decomposition.
To exploit the impressive computation capabilities of graphics cards effectively within CUDA and implement a
parallel version of PSO, the best approach is probably to
consider the main phases of the algorithm as separate
tasks, parallelizing each of them separately: this way,
each phase can be implemented by a different kernel and
the whole optimization process can be performed by
iterating the basic kernels needed to perform one generational update of the swarm. Since the only way CUDA
offers to share data among different kernels is to keep
them in global memory (i.e., the RAM region, featuring
the slowest access time by far, which is shared by the
processes run by the GPU and the ones run by the CPU)
[25], the current status of our PSO must be saved there.
Data organization is therefore the first problem to tackle
to exploit the GPU read/write coalescing capability and
maximize the degree of parallelism of the implementation. With our data design, it is enough to appropriately
arrange the thread indices to run several swarms at the
same time very efficiently.
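To make the data-layout idea concrete, the sketch below mimics, on the CPU, a structure-of-arrays organization of the swarm status of the kind that favors coalesced accesses. The array names, shapes, and index decomposition are our own illustrative choices, not the paper's actual data design:

import numpy as np

N_SWARMS, N_PARTICLES, N_DIMS = 3, 64, 4  # sizes used in Sect. 4

# One contiguous array per quantity, indexed as [swarm, particle, dim]:
# the data of consecutive particles lie adjacent in memory, which is
# the access pattern GPU read/write coalescing rewards.
positions  = np.zeros((N_SWARMS, N_PARTICLES, N_DIMS), dtype=np.float32)
velocities = np.zeros_like(positions)
best_pos   = np.zeros_like(positions)
best_fit   = np.full((N_SWARMS, N_PARTICLES), -np.inf, dtype=np.float32)

def locate(global_thread_idx):
    """Decompose a flat thread index into (swarm, particle, dimension),
    the way CUDA block/thread indices let each thread find 'its' data."""
    swarm, rest = divmod(global_thread_idx, N_PARTICLES * N_DIMS)
    particle, dim = divmod(rest, N_DIMS)
    return swarm, particle, dim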
In order to have all tasks performed on the GPU, and
avoid, as much as possible, the bottleneck of data exchange
with the CPU using global memory, we generate pseudorandom numbers running the Mersenne Twister [20]
algorithm directly on the GPU using the kernel available
within the CUDA SDK: this way the CPU load is virtually
zero.
In the following we briefly describe the three kernels
into which our PSO implementation has been subdivided.
2.2.1 Position update
A computation grid, divided into a number of blocks of
threads, updates the position of all particles being simulated. Each block updates the data of one particle, while
each thread in a thread block updates one element of the
position and velocity arrays. In the beginning the particle’s
current position, personal best position, velocity and local
best information are loaded, after which the classical PSO
equations are applied.
2.2.2 Fitness evaluation
This kernel is scheduled as a computation grid composed of one block for each particle being simulated (irrespective of the swarm to which it belongs). Each block comprises a number of threads equal to the total number of points that describe a sign (three sets of 16 points each), so that the projection of all points onto the current image is performed in parallel. Then, each thread contributes to building the histograms described in Sect. 3.3: the thread index determines to which set/histogram the projected point under consideration belongs, while the sampled color value determines which bin of the histogram is to be incremented. Finally, the fitness value is computed according to Eq. 9 where, once again, histogram similarity is assessed in parallel.
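The per-thread logic just described can be sketched as follows. This is an illustrative sequential rendition with our own names, assuming HSV values normalized to [0, 1]; in the actual kernel the histogram increments would additionally need synchronization among threads:

N_SETS, PTS_PER_SET = 3, 16         # three sets of 16 model points each
N_THREADS = N_SETS * PTS_PER_SET    # one thread per projected point

def fitness_thread(thread_idx, projected, hsv_image, histograms, n_bins):
    """Work of one conceptual fitness-kernel thread."""
    set_idx = thread_idx // PTS_PER_SET        # which set/histogram it feeds
    u, v = projected[thread_idx]               # pixel hit by 'its' model point
    for channel, value in enumerate(hsv_image[v, u]):   # H, S, V in [0, 1]
        b = min(int(value * n_bins), n_bins - 1)        # target bin
        histograms[set_idx, channel, b] += 1.0 / PTS_PER_SET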
2.2.3 Bests update
For each swarm, a thread block is scheduled with a
number of threads equal to the number of particles in the
swarm. As already mentioned, in our system we have used
a ring topology with radius equal to 1. First, each thread loads into shared memory both the current and the best fitness values of its corresponding particle, to update the personal best, if needed. Then, the current local best fitness value is found by computing the best fitness of each particle's neighborhood (including the particle and the neighboring ones on both sides of the ring), comparing it to the best value found so far and updating it, when necessary.
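A minimal sequential sketch of this update, for one swarm with a ring of radius 1, could look as follows (higher fitness is assumed to be better; the function name and argument layout are our own):

import numpy as np

def update_bests(fitness, best_fit, X, best_pos, lbest_fit, lbest_pos):
    """Personal- and local-best update for one swarm (ring of radius 1)."""
    n = fitness.shape[0]
    # Personal bests: keep the better of current and stored fitness.
    improved = fitness > best_fit
    best_fit[improved] = fitness[improved]
    best_pos[improved] = X[improved]
    # Local bests: best personal best among {i-1, i, i+1} on the ring
    # (negative indices wrap around, closing the ring).
    for i in range(n):
        for j in (i - 1, i, (i + 1) % n):
            if best_fit[j] > lbest_fit[i]:
                lbest_fit[i] = best_fit[j]
                lbest_pos[i] = best_pos[j]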
3 Road sign detection
In this section, we first introduce the basics of projective
geometry which underlie the theory of image acquisition
by means of a camera. Then, we describe the road sign
detection algorithm, based on computer vision and PSO, focusing, in particular, on the fitness function, whose optimization drives the whole detection process.
3.1 The image projection model and camera calibration

The main goal of computer vision is to make a computer analyze and 'understand' the content of an image, i.e., the projection of a region of the real world which lies within the field of view of a camera onto the camera's sensor plane (the image plane), in order for it to be able to take some decision based on such an analysis.
The simplest mathematical model which describes the spatial relationships between the 3D real-world scene and its projection on the image pixels is a general affine transform of the following form:

p_i = A P_i    (3)

where P_i represents the 3D coordinates of a point in the world, while p_i represents its 2D projection on the image, expressed in homogeneous coordinates. The matrix A \in M_{3 \times 3} models a central linear projection and is usually expressed as

A = \begin{pmatrix} f_x & 0 & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{pmatrix}    (4)
Here, briefly, fx and fy are, respectively, estimates of the
focal length along the x and y direction of the image and
(u0,v0) is an estimate of the principal point (a.k.a. the
‘center of projection’) of the image. The process whose
aim is to determine the above parameters as well as the
position of the camera in the world (not needed in our case)
is usually referred to as camera calibration. The so-called
extrinsic parameters describe the camera position and
orientation while the intrinsic ones are those appearing in
Eq. 4, i.e. focal lengths and center of projection.
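As a small worked example, Eqs. 3-4 reduce to a handful of lines of code; the following sketch (names are ours) projects a 3D point given in the camera frame onto the image plane:

import numpy as np

def project(P, fx, fy, u0, v0):
    """Project a 3D point P = (X, Y, Z), expressed in the camera frame,
    onto the image plane with the intrinsic matrix of Eq. 4."""
    A = np.array([[fx, 0.0, u0],
                  [0.0, fy, v0],
                  [0.0, 0.0, 1.0]])
    p = A @ np.asarray(P, dtype=float)  # homogeneous coordinates (Eq. 3)
    return p[0] / p[2], p[1] / p[2]     # pixel coordinates (u, v)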
In the literature many algorithms for camera calibration
have been proposed for monocular [30, 36, 37] and stereo
systems [17, 39]. Many of them are based on particular hypotheses that ease the calibration step, such as short-distance perception, still scenes, or a still camera; usually, these hypotheses are not verified in automotive environments. In our case, we can consider the last two constraints to be satisfied, since we are not taking into account
the correlation between subsequent frames of a video
sequence to infer a model of the car motion and forecast
trajectories, but analyze each frame as an independent
image. We also set the origin of our ‘world’ reference
frame which, in our case, is coincident with the camera
frame, to be a fixed point in the car (see Fig. 1).

Fig. 1 The projection model used in our system: O_wX, O_wY, O_wZ are the three axes of the world reference system (which in our case is coincident with the camera reference system), f is the focal distance, (u_0, v_0) is the projection center, and p_i ≡ (u_i, v_i) is the projection of a generic point P_i ≡ (X_i, Y_i, Z_i) onto the image plane
Other issues to be tackled in general outdoor and,
especially, in automotive applications are related to the specific conditions of outdoor environments. In fact, temperature and illumination conditions can vary and can hardly be controlled. Regarding illumination, in particular,
extreme situations like direct sunlight or strong reflections
must be taken into account. Other light sources, such as car
headlights or reflectors, interfering with the external environmental light, might also be present in a typical automotive scene.
3.2 PSO-based road sign detection algorithm
Suppose that an object of known shape and color, having
any possible orientation, may appear within the field of
view of a calibrated camera. In order to detect its presence
and, at the same time, to precisely estimate its position, one
can use the following algorithm (see also [34]):
1. Consider a set of key contour points, of known coordinates with respect to a reference position, and representative of the shape and colors of the object.
2. Translate (and rotate) them to a hypothesized position visible by the camera and project them onto the image.
3. Verify that color histograms of the sets of key points match those of their projection on the image to assess the presence of the object being sought.
Road signs are relatively simple objects belonging to a few possible classes, characterized by just a few regions of
homogeneous colors. Each sign class can be described by a
model consisting of a few sets of key points which lie just
near the color discontinuities, with points belonging to the
same set being characterized by the same color. Once all
points in a set are projected onto the image plane, one must
verify that the colors of the corresponding pixels in the
image match the ones in the model. A further set of points,
lying just outside the object silhouette, can help verify
whether the object border has been detected: this is, in
general, confirmed when colors of corresponding pixels in
such a region are significantly different from those of the
object.
In Fig. 2 we show three classes of traffic signs (priority,
warning, and prohibitory signs), along with the sets of
points of the model we use to represent them. For each
model, we consider three sets of 16 points: one lies just
outside the external border (therefore, on the image background), one on the red band just inside the external border,
and one on the central white area, as close to the red border
as possible. Please notice that, for the prohibitory signs, we
use points uniformly distributed along their circular border
while, for the triangular priority and warning signs, points
are more densely distributed in proximity of the corners.
This choice reduces the chance of mistaking circular signs for triangular ones since, at a similar scale, the corners of triangular signs lie well outside the borders of the circular ones.

Fig. 2 The three different sets of points used to represent a priority sign (left), a warning sign (center), and a prohibitory sign (right). The dimensions of these models conform to the Italian standards (largest versions). All coordinates are expressed in millimeters
If a calibrated camera is available on a moving car,
given an estimate of the position and rotation of a road sign
inside the camera’s 3D field of view, the sets of points in
the world reference frame can be roto-translated to this
position and then projected onto the image plane, to verify
the likelihood of the estimate by matching color histograms. All that is needed for detection is a method to generate estimates of sign positions and refine them until the actual position of a sign is found.
When a pose estimate is available for a sign, all points
belonging to its model can then be projected onto the
image plane using the following equation:
p_i = A (R_e P_i + t_e)    (5)
where te represents the offset/position of the sign in the x,
y and z directions with respect to the camera mounted on
the car (in our case, to the world reference system, as well).
R_e is a 3 × 3 rotation matrix derived from the estimate of the sign rotation: since a free rotation in 3D space can
always be expressed with three degrees of freedom, it is
sufficient to estimate three values (e.g., the rotation angles
around the three axes) in order to represent all possible
rotations of a sign.
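A sketch of Eq. 5 for the four-dimensional case actually used (x, y, z offsets plus yaw) might look as follows; we assume, as in Fig. 1, that y is the vertical axis, and the function name and argument layout are our own:

import numpy as np

def project_model(points, yaw, t, A):
    """Project the key points of a sign hypothesis onto the image (Eq. 5).

    points : (n, 3) model points in the sign reference frame
    yaw    : rotation around the vertical (y) axis, in radians
    t      : (3,) offset t_e of the sign w.r.t. the camera frame
    A      : (3, 3) intrinsic matrix from camera calibration
    """
    c, s = np.cos(yaw), np.sin(yaw)
    Re = np.array([[  c, 0.0,   s],   # rotation about the vertical axis
                   [0.0, 1.0, 0.0],
                   [ -s, 0.0,   c]])
    p = (A @ (Re @ points.T + np.asarray(t).reshape(3, 1))).T
    return p[:, :2] / p[:, 2:3]       # (n, 2) pixel coordinates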
To this aim we apply PSO, as introduced in Sect. 2. In
our method, each swarm generates location estimates for a
specific class of signs; each particle in the swarm encodes
an estimate of the sign position by four values, which
represent its offsets along the x, y and z axes, as well as its
rotation around the vertical axis (yaw) in the camera reference frame. Although our system is already structured for
estimating all six degrees of freedom of a pose estimate, we
deliberately chose to ignore the rotation around the camera
optic axis (roll) and the horizontal axis (pitch) after some
preliminary tests. Although it makes sense to have the
system able to estimate every possible rotation of a sign,
we had no experimental evidence about this need, at least
for the general road configurations we dealt with in our
tests. In fact, introducing all three angles would not affect
the complexity of the fitness function, since the full
transformation is already computed anyway but, of course,
it would significantly increase the size of the PSO search
space.
A particle swarm can then be used to detect the presence
of signs of a specific class within the image, by assigning a
fitness value to each position estimate encoded by its particles. Such a value is proportional to the similarity
between the projections onto the image plane of the points
belonging to the sign model, obtained according to Eq. 5,
and the corresponding image pixels. If the fitness value in
the point (particle position) under evaluation is above a
given threshold, we consider that point to be the location of
a sign of the class associated to the swarm.
This is the main feature that characterizes this algorithm.
In fact, having an accurate estimation of the position and
orientation of a sign offers the possibility to rectify its
image by means of a simple Inverse Perspective Mapping
(IPM) transform [1], in order to obtain a pre-defined view.
This means it is always possible to obtain a standardized
view (in terms of size and relative orientation) of the
detected sign. This is the optimal input for a classifier
whose purpose is to recognize the content of a detected
sign irrespective of its actual orientation and distance.
Having the signs pre-classified into different classes in the
detection phase is a further significant advantage which
makes recognition easier and more accurate, since a separate classifier can then be used for each class. At the
moment, we only focus on detection: no classification of
the signs which have been detected is performed, even if
we plan to add a sign recognition module to our system in
the immediate future.
PSO is run at each new frame acquisition for a predefined number of generations. Actually, the algorithm structure and its GPU implementation make it possible to schedule more than one PSO run per frame. On the one hand, this offers a second opportunity to detect a sign which was missed in the previous run on that frame. On the other hand, it also allows each swarm to detect more signs belonging to the same class in the same frame. In fact, when a sign is detected in the first run, it is 'hidden' in order to prevent the corresponding swarm from focusing on it again during the subsequent runs.
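The paper does not detail the hiding mechanism; one simple way to realize it, sketched below under that assumption, is to blank the image region spanned by the detected sign's outermost projected points before the next run:

import numpy as np

def hide_detection(hsv_image, outer_points):
    """Blank the bounding box of a detected sign so that subsequent
    PSO runs on the same frame cannot converge on it again.
    (Illustrative assumption: the paper only says signs are 'hidden'.)"""
    pts = np.asarray(outer_points, dtype=int)  # projected outer key points
    u0, v0 = pts.min(axis=0)
    u1, v1 = pts.max(axis=0)
    hsv_image[v0:v1 + 1, u0:u1 + 1] = 0        # overwrite the region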
In the next subsection we describe in detail the fitness function we use in our PSO-based approach.
3.3 Fitness function
Let us denote the three sets of points used to describe each sign class (see, for example, the models in Fig. 2) as S_1 = {s_i^1}, S_2 = {s_i^2} and S_3 = {s_i^3}, with s_i^x \in R^2 (they all lie on the xy plane) and i \in [1, 16]. Based on the position encoded by one particle and on the projection matrix derived from the camera calibration, each set of points is roto-translated and projected onto the current frame, obtaining the corresponding three sets of points which lie on the image plane: P_1 = {p_i^1}, P_2 = {p_i^2} and P_3 = {p_i^3}.
To verify whether the estimated position is actually correct, three color histograms [33] in the HSV color space, one for each channel, are computed for each set P_x with x \in {1, 2, 3}. Let us denote each of them as H_x^c, formally defined as

H_x^c(b) = \frac{1}{n} \sum_{i=1}^{n} \delta(I_c(p_i^x) - b)    (6)

where c \in {H, S, V} specifies the color channel, x \in {1, 2, 3} identifies the set of points, b \in [1, N_{bin}] (N_{bin} being the number of bins in the histogram), n represents the number of points in the set (sixteen in our case), the function \delta(\xi) returns 1 when \xi = 0 and zero otherwise and, finally, I_c(p) : R^2 \to R maps the intensity of channel c at pixel location p to a certain bin index. The term 1/n normalizes the histogram so that \sum_{b=1}^{N_{bin}} H_x^c(b) = 1. Moreover, three additional histograms, denoted as H_{ref}^c, are used as reference histograms for the red band surrounding all three sign models taken into consideration. The Bhattacharyya coefficient \rho [14], which offers an estimate of the amount of overlap between two statistical samples, is then used to compare the histograms:

\rho(H_1, H_2) = \sum_{b=1}^{N_{bin}} \sqrt{H_1(b) H_2(b)}    (7)

The Bhattacharyya coefficient returns a real value between 0, when there is no overlap at all between the two histograms, and 1, when the two histograms are identical. Finally, if we use

S_{x,y} = \frac{1}{3} [\rho(H_x^H, H_y^H) + \rho(H_x^S, H_y^S) + \rho(H_x^V, H_y^V)]    (8)

to express the similarity of the two triplets of histograms computed for the sets of points x and y, we can express the fitness function as

f = \frac{k_0 (1 - S_{1,2}) + k_1 (1 - S_{2,3}) + k_2 S_{1,ref}}{k_0 + k_1 + k_2}    (9)

where k_0, k_1, k_2 \in R^+ are used to weigh the contributions of the three distances appearing in the above equation.
Such a fitness function requires that:

- histograms computed on the first two sets of points be as different as possible, hypothesizing that, in case the sign had been detected, the background color nearby the sign would differ significantly from the red band;
- the histogram of the points in the red band be as different as possible from the one computed on the inner area of the sign;
- histograms H_1^c resemble as much as possible the reference histograms H_{ref}^c computed for the red band surrounding the sign.
Histograms of regions having colors that differ only slightly from the model, possibly because of noise, produce high values of S_{1,ref}. The fitness function f will therefore be close to 1 only when the position of one particle is a good estimate of the sign pose in the scene captured by the camera.
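Under the assumption that each H_* below is a list of per-channel normalized histograms for one set of projected points, Eqs. 6-9 can be sketched as follows. This is a sequential illustration with our own names; the paper's kernel computes the same quantities in parallel and, as discussed next, drops the V channel and compares the red band to the reference on the H channel only:

import numpy as np

def histogram(channel_values, n_bins):
    """Eq. 6: normalized histogram of one channel over one point set."""
    h = np.zeros(n_bins)
    for v in channel_values:                   # the 16 sampled values
        h[min(int(v * n_bins), n_bins - 1)] += 1.0
    return h / len(channel_values)

def bhattacharyya(h1, h2):
    """Eq. 7: overlap between two normalized histograms, in [0, 1]."""
    return np.sum(np.sqrt(h1 * h2))

def similarity(H_x, H_y):
    """Eq. 8: mean Bhattacharyya coefficient over the channels used."""
    return np.mean([bhattacharyya(a, b) for a, b in zip(H_x, H_y)])

def fitness(H1, H2, H3, Href, k0=1.4, k1=1.0, k2=0.8):
    """Eq. 9 with the weights reported in Sect. 4."""
    s12, s23 = similarity(H1, H2), similarity(H2, H3)
    s1ref = similarity(H1, Href)
    return (k0 * (1 - s12) + k1 * (1 - s23) + k2 * s1ref) / (k0 + k1 + k2)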
Actually, despite this being the most natural and general way to express this sort of fitness function, we noticed that system performance improved if we ignored the V (Value, or Intensity) channel altogether. The reason probably lies in the fact that the lighting conditions of the region surrounding a sign are usually rather uniform, which makes intensity information useless for the discriminant properties of the fitness function, even if it affects its value. At the same time, we verified that, in evaluating the reference (red) color, it is preferable to also neglect the S (Saturation) channel. This means that, for the red band, we only use the H (Hue) channel, which is the only channel that
4 Experimental results
The PSO parameters were set to w = 0.723,
C1 = C2 = 1.193. Three swarms of 64 particles were run
for up to 200 generations per frame to detect regulatory
signs (of circular shape), warning signs (of triangular
shape), and priority signs (of reversed triangular shape),
respectively. The coefficients appearing in Eq. 9 were
empirically set as follows: k0 = 1.4, k1 = 1.0, k2 = 0.8.
The fitness threshold above which we considered a detection to have occurred was set to 0.9. The search space, in
world coordinates, ranged from -4 m to 6.5 m along the
horizontal direction (x axis), from 6 m to -1.6 m vertically
(y axis), and from 9.5 m to 27 m in the direction of the car
motion and of the camera optic axis (z axis). We finally allowed a range of [-\pi/4, \pi/4] for sign rotation with respect to the vertical axis. All previous settings were set empirically
after some preliminary tests in order for them to represent
general values, independent of a particular sequence of
images, defining a reasonable invariant region for the PSO
to explore.
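For illustration, the search region just described can be captured in a few lines; the uniform initialization is our assumption, since the paper does not state how particles are initialized:

import numpy as np

# Bounds from Sect. 4: x, y, z offsets in metres, yaw in radians.
LOW  = np.array([-4.0, -1.6,  9.5, -np.pi / 4])
HIGH = np.array([ 6.5,  6.0, 27.0,  np.pi / 4])

def init_swarm(n_particles=64, rng=np.random.default_rng()):
    """Draw initial particle positions uniformly inside the search region."""
    return rng.uniform(LOW, HIGH, size=(n_particles, LOW.size))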
The overall test of the system was divided into two separate phases. During the first one, synthetic video sequences were used to assess the ability of the system to correctly find and estimate the sign poses. During the second, more
significant phase, real-world images were processed to
assess the system’s detection performances in typical urban
and suburban environments.
4.1 Tests on synthetic images
In the first test phase, we simulated a 3D rural environment with a road and a pair of traffic signs using the public-domain raytracer POV-Ray (http://www.povray.org). We relied on the Roadsigns macros by Chris Bartlett (http://lib.povray.org/collection/roadsigns/chrisb2.0/roadsigns.html) to simulate the signs and on some ready-to-use objects by A. Lohmüller and F. A. Lohmüller (http://f-lohmueller.de/pov_tut/objects/obj_500i.htm) to simulate the road. Bumps and dirt were added to the traffic signs in order to simulate more realistic conditions.
Fig. 3 shows a sample frame from one of the synthetic sequences. As time passes, the simulated car moves forward, zigzagging from left to right. At the same time, as they get closer to the car, the two signs rotate around their vertical axis. We introduced rotations to test the ability of our system to estimate the actual roto-translation between the camera and the sign which is detected. In fact, in our case, each particle moves in R^4 and its position represents the x, y and z offsets of the sign as well as its rotation with respect to the vertical axis (yaw).

Fig. 3 Sample frame taken from our synthetic video sequence simulating a country road scenario with two differently shaped road signs
Figure 4 shows three frames from the very beginning,
from the middle, and from the end of the sequence,
respectively. Image contrast has been reduced in this figure
to better highlight the swarm positions. White points
superimposed to the images represent the best-fitness
estimate of the sign position, while black points depict the
hypotheses represented by all other individuals. In Fig. 4a it is possible to see the two swarms during the initial search
phase: in this case both are on the wrong target despite
being already in the proximity of a sign. Figure 4b, c show
how the two swarms correctly converged onto their targets.
Fig. 4 Output of a run of our road sign detection system at the very beginning (a), at middle length (b), and near the end (c) of a synthetic video sequence

For a more detailed performance analysis, Fig. 5 shows results obtained in estimating the actual position of the signs throughout the sequence.
Figure 5 (top left) shows the actual x position and the
estimated one (mean and standard deviation over one
hundred runs), versus the frame number, for both the
warning (light line) and the regulatory (dark line) signs. As
can be seen, the horizontal position for the two signs is
correctly detected until the end of the sequence with a
precision of the order of centimeters. The sinusoidal trend of the two positions reflects the zigzagging behaviour of the
car we simulated. The top right part of the figure shows the
results for the y coordinates. This time the actual position is
constant since the simulated car is supposed to have a
constant pitch. Again, the estimated position is correct, with errors of just a few centimeters. Similar considerations can
be made for the bottom left graph of Fig. 5, which reports
results of depth (z coordinate) estimation. Even if signs are rather far from the car (about fifteen meters in the beginning), estimates are very good, with errors of less than half a meter. The error is mostly due to the distance between the two most external sets of points, which introduces a tolerance in the estimation of the actual position of the target border. Tightening this distance could improve the precision of the results but, at the same time, would make it more difficult to obtain high fitness values for signs which are far from the car because, in that situation, the depth value is large and the projections of these two sets of points on the image almost overlap, producing color histograms which are very similar.

Fig. 5 Position estimation errors: (a) estimates of the horizontal position (x coordinate) for both signs; (b) vertical position estimates; (c) depth estimates; (d) estimates of sign rotation around the vertical axis
Finally, in the bottom right part of the figure we show
results of yaw estimation. In this case results are not as
precise: it is possible to see how the rotation of the
warning sign is estimated rather precisely, even if, in the
first third of the simulation, the standard deviation is very
high, while the rotation of the regulatory sign seems to be
randomly estimated. Since we are dealing with small
angles this must be due, again, to the tolerance in locating
the signs introduced by the distance between the two
external sets of points, which is more likely to affect
fitness when matching circular signs than with triangular
ones.
4.2 Real-world tests
The system was then validated on two real-world image
sequences acquired with cameras placed on-board a car.
We compared the results obtained by our system with those obtained by a road sign recognition system previously developed by some of the authors [2] within VisLab (http://www.vislab.it), the computer vision laboratory of our Department at the University of Parma. For this reason, we will refer to such a system as the VisLab system from now on.
The benchmark used for this comparison comprised two
of the sequences that were acquired during its development. The first sequence, which includes 10000 frames
having a resolution of 750 9 480 pixels, acquired at 7.5 fps
by a PointGrey Firefly color camera, and is therefore about
22 min long, was recorded while driving on the ‘Parma
orbital’ on a sunny day. Since the road circumnavigates the
town and the sequence includes a full tour of the orbital,
the sequence contains images featuring all possible light
orientations. The second sequence, having the same resolution, is little less than 5000 frames long and was acquired
by an AVT Guppy color camera at 7.5 fps on the ‘Turin
orbital’ on a cloudy day. Images in this sequence feature
more constant lighting but lower contrast. A significantly
long segment of this sequence has been acquired while
driving in an urban environment, while most of the first
sequence has been acquired on a separate-lane road. In the
two sequences, the car on which the cameras were mounted
runs at speeds ranging from 0 (a few crossings and
roundabouts are present) to more than 70 km/h.
Using the settings reported above, signs were detected at an average distance from the car of about 12 m. Figure 6 displays screenshots of the system, taken while running it:
the main window area shows the actual camera view being
currently processed, while the right vertical panel keeps
track of the last road sign detected for each one of the three
categories (regulatory, warning, and priority, see caption).

Fig. 6 Sample screen snapshots taken while processing real-world images. On the right, the images of the detected signs, correctly rectified to a standard frontal view, are shown: the top frame shows prohibitory signs, the middle frame warning signs, and the bottom frame priority signs. Images of the last detected signs remain visible on the window until a new sign of the same category is detected. The rectangle highlights the sign which was last detected
In Fig. 7 it is possible to see some examples of the signs
which could be detected, as they appear after rectification.
Signs (a)–(c) were detected with normal light conditions.
Signs (d) and (e) had direct sunlight, while sign (f) had a
strong backlight. Images (g) and (h) were the only two false
positives we ever observed during all our tests: image (g)
shows a red car partially occluded by another one, which
creates an inverted triangular shape with a partially red
border, and was detected as a priority sign, while (h), a
detail of a commercial poster showing a red line which
forms a triangle with a red border, was (correctly?)
detected as a warning sign. Sign (i) was partially covered
by a tree. Sign (j) was depicted inside a bigger yellow
panel. The internal region of signs (b) and (i)–(l) is not white, which highlights the fact that the system looks for a red band coupled with two color discontinuities on either side. Finally, signs (m)–(o) were correctly detected in extremely poor lighting conditions: in the
original frames, which are shown in Fig. 8a, they were
difficult to see even for humans. Looking again at the rectified sign (a), and comparing it to the original image shown in Fig. 8b, it is possible to notice how the rectification process almost reduced to zero the sign rotation with
respect to the vertical axis, causing it to appear as if it had
been observed from a perpendicular frontal viewpoint.
Fig. 7 Some of the signs that were detected, after rectification. Our system is able to detect signs in very different lighting conditions

Fig. 8 Original frames where signs (m)–(o) (above) and (a) (below) of Fig. 7 were detected
Validation software based on the unique code assigned to each sign made it possible to further assess the effectiveness of the system. Based on the annotations made during the
development of the VisLab system, the validation software
produced detection statistics in terms of false positives,
false negatives, and correct detections (true positives).
The above statistics were compared with those obtained,
on the same sequence, by the VisLab system. Such a system implements a three-step process (see Fig. 9): color
segmentation, shape detection, and classification based on
several neural networks. Since illumination conditions
deeply affect the first step, a particular gamma correction
method has been developed to compensate for color drifts
typical of sunset or dawn [2].
Results obtained on the sequence shot in Parma are compared in Table 1, while Table 2 compares results obtained on the sequence shot in Turin.
Since PSO is a stochastic algorithm, we also assessed
our system’s performance repeatability by executing several runs for each sequence. The tables report, for our
system, the best and worst results obtained in the test runs.

Table 1 Results for the CUDA-PSO and the VisLab (reference) systems on the 'Parma orbital' sequence

                  | VisLab (reference)                    | CUDA-PSO-RSD (min–max over runs)
Category (tot)    | False Pos | False Neg   | Correct Det | False Pos | False Neg          | Correct Det
Priority (29)     | 4         | 9 (31%)     | 20 (69%)    | 1–1       | 9–10 (31–34.5%)    | 19–20 (65.5–69%)
Prohibitory (51)  | 1         | 33 (64.7%)  | 18 (35.3%)  | 0–0       | 20–26 (39.2–51%)   | 25–31 (49–60.8%)
Warning (44)      | 0         | 21 (47.7%)  | 23 (52.3%)  | 1–1       | 10–13 (22.7–29.5%) | 31–34 (70.5–77.3%)
Total (124)       | 5         | 59 (47.6%)  | 65 (52.4%)  | 2–2       | 39–49 (31.5–39.5%) | 75–85 (60.5–68.5%)

Table 2 Results for the CUDA-PSO and the VisLab (reference) systems on the 'Turin orbital and downtown' sequence

                  | VisLab (reference)                    | CUDA-PSO-RSD (min–max over runs)
Category (tot)    | False Pos | False Neg   | Correct Det | False Pos | False Neg          | Correct Det
Priority (14)     | 0         | 1 (7.1%)    | 13 (92.9%)  | 0–0       | 4–8 (28.6–57.1%)   | 6–10 (42.9–71.4%)
Prohibitory (53)  | 4         | 22 (41.5%)  | 31 (58.5%)  | 0–0       | 20–22 (37.7–41.5%) | 31–33 (58.5–62.3%)
Warning (45)      | 0         | 10 (22.2%)  | 35 (77.8%)  | 0–0       | 7–8 (15.6–17.8%)   | 37–38 (82.2–84.4%)
Total (112)       | 4         | 33 (29.5%)  | 79 (70.5%)  | 0–0       | 31–38 (27.7–33.9%) | 74–81 (66.1–72.3%)
Fig. 9 Block diagram of the VisLab system used as reference (from [2])
In general, the performance of our system was comparable to, and often better than, that of the VisLab system,
selective and only generated two false positives (reported
in Fig. 7) which, curiously enough, were only noticed while
annotating the results by hand and were not counted as
false positives by the validation software because, by
chance, in the same frames there was actually a sign of
exactly the same kind.
Considering the conditions in which the two test
sequences have been acquired and the results yielded by
the two systems on such sequences, our system seems to be
more robust than VisLab’s with respect to light changes
and critical conditions such as backlight or blinding. In
fact, while results on the sequence shot in Turin are comparable, our system outperforms VisLab's if the sequence shot in Parma is taken into consideration. The only exception, noticeable in both sequences, regards priority signs. This might seem to contrast with the good performance obtained on the warning signs, with which they share shape and reference point sets, differing only in
orientation. However, one should notice that the pole on
which the signs are usually mounted intersects the warning
signs contour in a position (the middle of one side) where
few reference points are present, while it does so close to a
vertex of the priority sign, where reference points for such
a sign are denser. This suggests that the fitness function
may be affected negatively by such an ‘interference’ and
that a different distribution of reference points for such a
sign class may limit this problem. Another possible justification might be offered by the fact that priority signs are
often mounted above other signs, such as roundabout signs,
which could also affect the fitness function.
4.3 Computation efficiency
The CUDA implementation of the whole system described above made it possible to achieve very good execution times.
Experiments were run on two different machines. The first
one is equipped with a 64-bit Intel(R) Core(TM)2 Duo
processor running at 1.86 GHz with a moderately priced GeForce 8800GT video card by nVIDIA, equipped with 1 GB of video RAM and fourteen streaming multiprocessors (MPs), which adds up to a total of 112 processing cores. The second one is powered by a 64-bit Intel(R) Core(TM) i7 CPU running at 2.67 GHz, combined with a top-level Quadro FX5800 graphics card, also by nVIDIA, with 4 GB of video RAM and thirty MPs or, in other words, 240 processing cores.
In spite of the great disparity between the two setups, no
major differences were observed between the execution
times. This suggests that our system, in spite of the impressive amount of operations required, still does not saturate the computational power of this kind of GPU.
All our tests were performed off-line. The input video sequences were encoded as sets of single frames to be individually loaded from disk, with no streaming. Therefore, in all timing operations we took care to exclude disk input/output latency times. In fact, live recording from a camera connected to the PC would make it possible to transfer image data directly into the computer RAM in almost negligible time, avoiding slow disk accesses.
With this setup, we obtained good performance with many different system settings. For example, simultaneously employing 3 swarms (one for each sign class under consideration), composed of 64 particles and activated twice per frame for 200 generations, yielded a processing speed of about 20 frames per second (about 50 ms of actual processing time per frame). The frame rate improved to about 24 fps if each swarm was run just once per frame.
Using swarms of 32 particles resulted in about 30 fps and
48 fps when two runs or one single run per frame were
executed, respectively. In another set of tests, where only
100 generations per run were simulated, processing speed
further increased to about 50 fps executing two PSO runs
per frame and 65 fps with a single run.
Running each swarm more than once and ‘hiding’, in the
subsequent runs, any sign that has been previously detected
allows a swarm to deal effectively with the presence of
pairs of signs of the same class in the same frame, an event
which occurs rather frequently.
A sequential version of the algorithm would require up to about half a second per frame for the most demanding (and best-performing) of the settings reported above. This means that the same system would hardly process video at 4–5 fps, a
processing speed which is not acceptable for this kind of
application.
Therefore, our system is able to detect many types of road signs while processing images at a speed close to full frame rate. Considering that many existing systems actually work
at speeds that are significantly lower than full frame rate,
we can also say that some time remains available to perform sign classification. From this point of view, being able
to reconstruct a frontal view of given size for each sign
which is detected by our system allows for using rather
simple classifiers in the recognition stage. Therefore, we
expect that embedding such a stage in our system, which is
actually our final goal, will not affect processing speed
significantly.
Even if it is not possible to directly compare the processing performance of our system to that of the VisLab system (which is stated to be able to run at about 13 fps, including sign classification, on a dual-processor Pentium PC with a clock frequency of 2 GHz), the performance of our system appears to be competitive with our reference.
5 Conclusions and future directions
We have shown that PSO, provided with a suitable fitness
function, can effectively detect traffic signs in real time.
Experimental results on both synthetic and real video
sequences showed that our system is able to correctly
estimate the position of the signs it detects with a precision of about ten centimeters in all directions, with depth being (rather obviously) the least accurate.
We have considered signs of three possible classes,
which, in normal road settings, account for far more than
half of the occurrence of signs. In any case, the three classes
taken into consideration are also those which are most likely to be confused with one another. In fact, the sign classes omitted from our analysis have features that are mostly 'orthogonal' to the ones that characterize the signs presently sought, and to one another's. These considerations,
along with the modularity of our system, lead us to expect
that extending our system to those other classes could be
feasible by introducing only small changes in the system,
obtaining comparable results in terms of quality. As concerns processing speed, the considerations about scalability
made in Sect. 4.3 also induce optimistic expectations.
Finally, the way our system detects road signs permits
us to re-project all the images of the detected signs back to
a standard frontal view which represents the optimal input
for a classification step. Because of this, the introduction of
a classification module into the system has the highest
priority in our near-future agenda.
Acknowledgments We would like to express our thanks and
appreciation to Gabriele Novelli, Denis Simonazzi and Marco
Tovagliari for their help in tuning, testing, and assessing the performance of our system.
References

1. Mallot HA, Bülthoff HH, Little JJ, Bohrer S (1991) Inverse perspective mapping simplifies optical flow computation and obstacle detection. Biol Cybern 64(3):177–185
2. Broggi A, Cerri P, Medici P, Porta PP, Ghisio G (2007) Real time road signs recognition. In: Proceedings of the IEEE intelligent vehicles symposium 2007, Istanbul, Turkey, pp 981–986
3. Cagnoni S, Mordonini M, Sartori J (2007) Particle swarm optimization for object detection and segmentation. In: Applications of evolutionary computing. Proceedings of EvoWorkshops 2007, Springer, pp 241–250
4. Cagnoni S, Mussi L, Daolio F (2009) Empirical assessment of the effects of update synchronization in particle swarm optimization. In: Poster and workshop proceedings of the XI conference of the Italian association for artificial intelligence, Reggio Emilia, Italy. Electronic version, ISBN 978-88-903581-1-1
5. Anton Canalis L, Hernandez Tejera M, Sanchez Nielsen E (2006) Particle swarms as video sequence inhabitants for object tracking in computer vision. In: Proceedings of the IEEE international conference on intelligent systems design and applications (ISDA'06), pp 604–609
6. Engelbrecht AP (2007) Computational intelligence: an introduction, 2nd edn. Wiley, England
7. de la Escalera A, Armignol JM, Mata M (2003) Traffic sign recognition and analysis for intelligent vehicles. Image Vis Comput 21(3):247–258
8. de la Escalera A, Moreno LE, Puente EA, Salichs MA (1994) Neural traffic sign recognition for autonomous vehicles. In: Proceedings of the IEEE 20th international conference on industrial electronics, control and instrumentation, vol 2, pp 841–846
9. Gao X, Shevtsova N, Hong K, Batty S, Podladchikova L, Golovan A, Shaposhnikov D, Gusakova V (2002) Vision models based identification of traffic signs. In: Proceedings of the European conference on color in graphics, image and vision, Poitiers, France, pp 47–51
10. Gavrila D (1999) Traffic sign recognition revisited. In: Mustererkennung 1999, 21. DAGM-Symposium. Springer, pp 86–93
11. Hoessler H, Wöhler C, Lindner F, Kreßel U (2007) Classifier training based on synthetically generated samples. In: Proceedings of the 5th international conference on computer vision systems, Bielefeld, Germany
12. Ivekovic S, John V, Trucco E (2010) Markerless multi-view articulated pose estimation using adaptive hierarchical particle swarm optimisation. In: Di Chio C et al (eds) Applications of evolutionary computing: proceedings of EvoApplications 2010, Istanbul, Turkey, Part I, LNCS 6024, Springer, pp 241–250
13. Jiang G-Y, Choi TY (1998) Robust detection of landmarks in color image based on fuzzy set theory. In: Proceedings of the IEEE 4th international conference on signal processing, vol 2, pp 968–971
14. Kailath T (1967) The divergence and Bhattacharyya distance measures in signal selection. IEEE Trans Commun Technol 15(1):52–60
15. Kennedy J, Eberhart R (1995) Particle swarm optimization. In: Proceedings of the IEEE international conference on neural networks, vol IV, IEEE, New York, pp 1942–1948
16. Kennedy J, Mendes R (2002) Population structure and particle swarm performance. In: Proceedings of the congress on evolutionary computation (CEC), IEEE, pp 1671–1676
17. Hyukseong K, Park J, Kak A (2007) A new approach for active stereo camera calibration. In: Proceedings of the IEEE international conference on robotics and automation, pp 3180–3185
18. Liang J, Qin A, Suganthan P, Baskar S (2006) Comprehensive learning particle swarm optimizer for global optimization of multimodal functions. IEEE Trans Evol Comput 10(3):281–295
19. Loy G, Barnes N (2004) Fast shape-based road signs detection for a driver assistance system. In: Proceedings of the IEEE/RSJ international conference on intelligent robots and systems, Sendai, Japan, pp 70–75
20. Makoto M, Takuji N (1998) Mersenne Twister: a 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Trans Model Comput Simul 8(1):3–30
21. Mussi L, Cagnoni S (2008) Artificial creatures for object tracking and segmentation. In: Applications of evolutionary computing: proceedings of EvoWorkshops 2008, Springer, pp 255–264
22. Mussi L, Cagnoni S (2009) Particle swarm for pattern matching in image analysis. In: Serra R et al (eds) Proceedings of WIVACE 2008, Italian workshop on artificial life and evolutionary computing, World Scientific, pp 89–98
23. Mussi L, Daolio F, Cagnoni S (2010) Evaluation of particle swarm optimization algorithms within the CUDA architecture. Inf Sci. doi:10.1016/j.ins.2010.08.045
24. Nguwi Y, Kouzani A (2006) A study on automatic recognition of road signs. In: Proceedings of the IEEE conference on cybernetics and intelligent systems, Bangkok, Thailand, pp 1–6
25. nVIDIA Corporation (2009) nVIDIA CUDA programming guide v. 2.3. http://www.nvidia.com/object/cuda_develop.html
26. Montes de Oca M, Stützle T, Birattari M, Dorigo M (2009) Frankenstein's PSO: a composite particle swarm optimization algorithm. IEEE Trans Evol Comput 13(5):1120–1132
27. Owechko Y, Medasani S (2005) A swarm-based volition/attention framework for object recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops (CVPR'05). IEEE, pp 91–91
28. Poli R (2008) Analysis of the publications on the applications of particle swarm optimisation. J Artif Evol Appl 2008(1):1–10
29. Poli R, Kennedy J, Blackwell T (2007) Particle swarm optimization: an overview. Swarm Intell 1(1):33–57
30. Tsai RY (1987) A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses. IEEE J Robot Autom 3:323–344
31. Shadeed WG, Abu-Al Nadi DI, Mismar MJ (2003) Road traffic sign detection in color images. In: Proceedings of the IEEE 10th international conference on electronics, circuits and systems, vol 2, pp 890–893
32. Soetedjo A, Yamada K (2005) Fast and robust traffic sign detection. In: Proceedings of the IEEE international conference on systems, man and cybernetics, vol 2, pp 1341–1346
33. Sonka M, Hlavac V, Boyle R (2007) Image processing, analysis, and machine vision, 3rd edn. CL-Engineering
34. Taiana M, Nascimento J, Gaspar J, Bernardino A (2008) Sample-based 3D tracking of colored objects: a flexible architecture. In: Proceedings of the British machine vision conference (BMVC'08). BMVA, pp 1–10
35. Vitabile S, Pollaccia G, Pilato G (2001) Road signs recognition using a dynamic pixel aggregation technique in the HSV color space. In: Proceedings of the international conference on image analysis and processing, Palermo, Italy, pp 572–577
36. Wei GQ, Ma SD (1994) Implicit and explicit camera calibration: theory and experiments. IEEE Trans Pattern Anal Mach Intell 16(5):469–480
37. Zhang Z (2000) A flexible new technique for camera calibration. IEEE Trans Pattern Anal Mach Intell 22(11):1330–1334. http://research.microsoft.com/~zhang/Calib/
38. Zhang X, Hu W, Maybank S, Li X, Zhu M (2008) Sequential particle swarm optimization for visual tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR'08). IEEE, pp 1–8
39. Ziraknejad N, Tafazoli S, Lawrence P (2007) Autonomous stereo camera parameter estimation for outdoor visual servoing. In: Proceedings of the IEEE workshop on machine learning for signal processing, pp 157–162