Sensor-Based Pedestrian Protection

Intelligent Transportation Systems
Editor: Alberto Broggi
University of Pavia, Italy
broggi@ce.unipr.it
Dariu M. Gavrila, DaimlerChrysler Research
Traffic accidents worldwide kill more than 39,000
pedestrians and injure more than 430,000 yearly (see
Table 1, left). For the European Union (EU), the corresponding numbers are over 6,000 and 155,000 (see Table 1,
right). Pedestrian accidents represent the second-largest
source of traffic-related injuries and fatalities, after accidents involving car passengers. Children are especially at
risk (see Figure 1).
This problem’s magnitude has caught legislators’ attention. The EU, for example, is studying proposals for legislating maximum-tolerated impact coefficients for a vehicle
hitting a child or adult pedestrian frontally at 40 kph. Two
classes of impact coefficients are under consideration: one
involving the primary impact areas—the lower and upper
legs—and the other involving the more dangerous secondary impact area—the head. Many aspects of such a specification are still subjects of considerable debate. One issue
is whether a component-based crash test, which hurls separate impactors toward the vehicle, can adequately model a
human body’s kinematics during a crash. Another issue
involves the large variation in pedestrian kinematics between a child and an adult, who have quite different centers
of mass at impact. Optimizing for one group can make
things worse for the other.
Final test procedures and numbers have not materialized
yet. However, the very dissimilar object properties (mass
and velocity) between pedestrians and vehicles make
energy absorption during a crash difficult. What’s more,
besides being “pedestrian friendly,” vehicles should perform well in crashes with hard objects, such as other vehicles and trees, and have an attractive design. Vehicle manufacturers are addressing these challenges by looking into
extendable vehicle body structures (such as the bumper and
hood) that activate upon first impact with a pedestrian.
A complementary approach is to focus on sensor-based
solutions, which let vehicles “look ahead” and detect pedestrians in their surroundings. This article investigates
the state of the art in this domain, reviewing passive, video-based approaches and approaches involving active sensors
(radar and laser range finders).
Video-based approaches
Video sensors are a natural choice for detecting people.
Texture information at a fine angular resolution enables
quite discriminative pattern recognition techniques. The
human visual-perception system is perhaps the best example of how well such sensors might perform, if we add the
appropriate processing. Besides, video cameras are cheap,
and because they do not emit any signals, they raise no
issues regarding interference with the environment.
Table 1. 1997 deaths and injuries due to traffic accidents (source: United Nations Economic Commission for Europe).
                           Worldwide                     European Union
                   Deaths   Injuries      Total     Deaths   Injuries      Total
Passenger cars     75,615  3,751,024  3,826,639     22,502    995,026  1,017,528
Pedestrians        39,670    436,422    476,092      6,049    155,151    161,200
Bicycles            6,872    236,027    242,899      2,421    141,870    144,291
Mopeds              3,151    163,854    167,005      2,385    139,442    141,827
Motorcycles        10,972    227,946    238,918      3,821    124,023    127,844
Other              28,397  1,303,571  1,331,968      4,559    121,816    126,375
Total             164,677  6,118,844  6,283,521     41,737  1,677,328  1,719,065
NOVEMBER/DECEMBER 2001
1094-7167/01/$10.00 © 2001 IEEE
Figure 1. A typical dangerous situation: a child suddenly steps into a street.
Considerable computer vision research deals with “looking at people.”1 What makes pedestrian recognition applications on vehicles particularly challenging are the moving camera, the wide range of possible pedestrian appearances, and the cluttered (uncontrolled) backgrounds. Most research on vision-based pedestrian recognition has taken a learning-based approach, bypassing a pose recovery step altogether and describing human appearance in terms of simple low-level features from a region of interest (ROI). One line of research has dealt specifically with scenes involving people walking laterally to the viewing direction, with recognition by either using the periodicity cue2,3 or learning the characteristic lateral gait pattern.4
A crucial factor determining the success of learning methods is the availability of a good foreground region. Unlike in applications such as surveillance, where the camera is stationary, standard background subtraction techniques are of little avail here because of the moving camera. Independent motion detection techniques can help,3 but they are difficult to develop. Yet, given a correct initial foreground, we can shift some of the burden to tracking.4–9
A complementary problem is to recognize pedestrians in single images; this is particularly relevant for pedestrians standing still. One general approach involves shifting windows of various sizes over the image, extracting low-level texture features, and using standard pattern classification techniques to determine a pedestrian’s presence. For example, Constantine Papageorgiou and Tomaso Poggio combine wavelet features with a support vector machine classifier.10 More recently, Anuj Mohan and his colleagues have extended this research to involve a component-based approach.11
However, this approach’s performance–speed trade-off is currently unfavorable for use in vehicles. The Chamfer System addresses this through two-step object recognition.12 The first step applies hierarchical template matching using contour features to efficiently lock onto candidate solutions. Matching is based on correlation with distance-transformed images. By capturing the object’s shape variability through a template hierarchy and by using a combined coarse-to-fine approach in shape and parameter space, this step achieves large speedups compared to an equivalent brute-force method. The second step reverts to texture-based pattern classification of the candidate solutions that the first step provided.
Another powerful technique to establish ROIs is stereo vision. Uwe Franke and his colleagues combine stereo vision with texture-based pattern classification. I describe two other stereo vision-based approaches later.
Lately, interest has been increasing in video sensors that operate outside the visible spectrum. Having long been used exclusively in the military domain, infrared sensors are finding their way into civilian applications owing to the advent of cheaper, uncooled cameras. The principle of detecting pedestrians by the heat their bodies emit is appealing (Takayuki Tsuji and his colleagues provide one example13). Yet pedestrians are not the only heat sources in a traffic environment; vehicles generate heat too. Even the pavement can appear hotter on a summer day than a pedestrian’s body. So, rather than offering the solution for pedestrian detection per se, infrared sensors provide a means to simplify the segmentation problem. Pattern recognition techniques are still necessary.
Active-sensor approaches
Video sensors do not directly provide depth information; stereo vision derives depth by establishing feature correspondence and performing triangulation. Active sensors, on the other hand, measure distances directly.
Radar
Some commercial vehicles already
employ radar for adaptive cruise control (for
example, the Distronic System on Mercedes-Benz S-Class cars). For near-distance applications, such as pedestrian detection, ongoing investigations focus on 24-GHz radar
technology.14 Radar-based systems can
enhance object localization by placing multiple sensors on the vehicle’s relevant parts
and applying triangulation-based techniques.
They can classify objects—that is, distinguish pedestrians from other objects such as
cars and trees—by examining the power
spectral-density plot of the reflected signals.
In this context, we consider an object’s spectral content and reflectivity. Objects with
smaller spatial extents, such as pedestrians,
have narrower peaks in the plot than, say,
cars. The material properties of the object’s
surface determine the strength of reflected
radar signals. Vehicles’ metallic parts reflect
much better than human tissue, by at least an
order of magnitude. Human tissue, in turn,
reflects much better than nonconductive
materials, such as the wood in trees.
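As a rough illustration of the localization idea, the sketch below intersects the range readings of two sensors mounted a known distance apart on the vehicle front. The sensor layout, the function name `locate_from_two_ranges`, and the use of plain range-circle intersection are assumptions made for illustration; the article does not specify how the triangulation is carried out.

```python
import math


def locate_from_two_ranges(baseline_m, r_left_m, r_right_m):
    """Estimate a target position from two range readings.

    Assumed (hypothetical) layout: sensors sit at (-baseline/2, 0) and
    (+baseline/2, 0); x is lateral, y points away from the vehicle front.
    Returns (x, y) with y >= 0, or None if the two range circles
    do not intersect (inconsistent readings).
    """
    half = baseline_m / 2.0
    # Subtracting the two circle equations eliminates y and yields x.
    x = (r_left_m ** 2 - r_right_m ** 2) / (2.0 * baseline_m)
    y_squared = r_left_m ** 2 - (x + half) ** 2
    if y_squared < 0.0:
        return None
    return x, math.sqrt(y_squared)


if __name__ == "__main__":
    # A target 10 m ahead and 0.5 m to the right, sensors 1.2 m apart.
    r_left = math.hypot(0.5 + 0.6, 10.0)
    r_right = math.hypot(0.5 - 0.6, 10.0)
    print(locate_from_two_ranges(1.2, r_left, r_right))
```

With a short baseline, small range errors translate into large lateral errors, which is one reason to spread the sensors across the vehicle's relevant parts.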
Laser range finders
The main appeal of eye-safe laser range
finders lies in their fast, precise depth measurement and their large field of view. For
example, Martin Kunert, Ulrich Lages, and
I describe a laser range finder that has a
depth accuracy of ±5 cm and a range of
40 m for objects with at least 5 percent
reflectivity (this includes most, if not all,
relevant targets).14 Furthermore, its horizontal scans cover a 180-degree field of
view in increments of 0.5 degree at 20 Hz,
making the sensor especially suitable to
cover the area just in front of the vehicle.
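To make these scan figures concrete, the sketch below derives the number of beams per scan and the lateral gap between neighboring beams at a given range, and converts one polar reading to vehicle coordinates. The constants come from the text; the coordinate convention (x forward, y left, 0 degrees straight ahead) and the assumption that both end beams are included are illustrative choices, not properties of the actual sensor.

```python
import math

FOV_DEG = 180.0      # horizontal field of view, from the text
STEP_DEG = 0.5       # angular increment, from the text
MAX_RANGE_M = 40.0   # range for targets with at least 5 percent reflectivity


def beam_angles_deg():
    """Beam directions from -90 to +90 degrees (end beams included: assumption)."""
    count = int(round(FOV_DEG / STEP_DEG)) + 1
    return [-FOV_DEG / 2.0 + i * STEP_DEG for i in range(count)]


def beam_spacing_m(range_m):
    """Arc length between two neighboring beams at the given range."""
    return range_m * math.radians(STEP_DEG)


def polar_to_cartesian(angle_deg, range_m):
    """Convert one reading to (x forward, y left) in meters."""
    angle = math.radians(angle_deg)
    return range_m * math.cos(angle), range_m * math.sin(angle)


if __name__ == "__main__":
    print(len(beam_angles_deg()), "beams per scan")
    print(round(beam_spacing_m(MAX_RANGE_M), 3), "m between beams at 40 m")
```

At the 40 m range limit neighboring beams are roughly 0.35 m apart, so a pedestrian is hit by only one or two beams there, while at 10 m the same person covers several beams.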
Current systems
At least three pedestrian recognition
systems have been integrated on demonstration vehicles. Those I describe here are
video-based and employ a two-step detection–verification framework for efficient
pedestrian recognition; stereo vision provides the ROI.
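Stereo vision can supply ROIs because triangulation turns the pixel disparity between the two views into a distance. For a rectified camera pair this reduces to Z = f * B / d. The sketch below is a minimal illustration under that assumption; the focal length, baseline, and function names are hypothetical and are not taken from any of the systems described here.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth Z = f * B / d for a rectified stereo pair (pinhole model)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px


def depth_resolution(focal_px, baseline_m, depth_m, disparity_step_px=1.0):
    """Approximate depth change caused by one disparity step: Z^2 / (f * B)."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_step_px


if __name__ == "__main__":
    f_px, base_m = 800.0, 0.3  # hypothetical camera parameters
    for d in (24.0, 12.0, 6.0):
        z = depth_from_disparity(f_px, base_m, d)
        print(d, "px ->", round(z, 2), "m, step",
              round(depth_resolution(f_px, base_m, z), 2), "m")
```

Because the depth step grows with the square of the distance, stereo-derived ROIs become coarser for far-away pedestrians, which is where a verification step matters most.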
At Carnegie Mellon University’s NavLab,
Liang Zhao and Charles Thorpe developed
a system that combines stereo vision with
neural-network pattern classification.15 It
obtains the texture features for classification by applying a high-pass filter to the
ROI and normalizing for size. The system,
running at 3 to 12 Hz, aims to assist bus
drivers in urban traffic. The researchers
plan to expand it to cover the sides of the bus
and, eventually, to provide full 360-degree
coverage.
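The feature extraction described above (high-pass filtering the ROI, then normalizing for size) might look roughly like the following sketch. The 3x3 local-mean subtraction, the nearest-neighbor resampling, and the 30x15 output size are stand-ins chosen for illustration; the article does not state which filter or feature size the CMU system uses.

```python
def high_pass(img):
    """Subtract the 3x3 local mean from each pixel (window clipped at borders)."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            neigh = [img[rr][cc]
                     for rr in range(max(0, r - 1), min(h, r + 2))
                     for cc in range(max(0, c - 1), min(w, c + 2))]
            row.append(img[r][c] - sum(neigh) / len(neigh))
        out.append(row)
    return out


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbor resampling to a fixed size."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def roi_features(img, out_h=30, out_w=15):
    """Flatten the filtered, size-normalized ROI into a feature vector."""
    normalized = resize_nearest(high_pass(img), out_h, out_w)
    return [value for row in normalized for value in row]


if __name__ == "__main__":
    roi = [[(r * c) % 7 for c in range(20)] for r in range(40)]
    print(len(roi_features(roi)))
```

A fixed-length vector of this kind is what a neural-network classifier expects as input, regardless of how large the pedestrian appears in the image.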
The University of Pavia system, implemented in the ARGO experimental autonomous vehicle, combines stereo vision
with template matching for detecting pedestrian head and shoulder shapes.16 The
system searches for vertical symmetry to
verify candidate regions. The authors report good detection results in the range of
10 to 40 meters.
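A vertical-symmetry check of the kind mentioned above can be sketched as a mirror comparison about the region's central vertical axis. The scoring formula and the function name are assumptions for illustration only and are not the measure used in the ARGO system.

```python
def vertical_symmetry_score(region):
    """Score in [0, 1]: 1.0 means the region mirrors perfectly left to right.

    `region` is a list of rows of non-negative intensities or edge strengths.
    """
    width = len(region[0])
    total_diff = 0.0
    total_mag = 0.0
    for row in region:
        for c in range(width // 2):
            left, right = row[c], row[width - 1 - c]
            total_diff += abs(left - right)
            total_mag += abs(left) + abs(right)
    if total_mag == 0.0:
        return 1.0  # an empty region is trivially symmetric
    return 1.0 - total_diff / total_mag


if __name__ == "__main__":
    print(vertical_symmetry_score([[1, 2, 1], [3, 0, 3]]))
    print(vertical_symmetry_score([[5, 0, 0], [5, 0, 0]]))
```

A candidate region would be accepted only if its score exceeds a threshold, which would have to be tuned on real data.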
At DaimlerChrysler, we have been working on pedestrian recognition as part of our
multiyear effort to extend driver assistance
beyond the highway scenario into the complex urban environment.4,12,17,18 Of particular interest is the Intelligent Stop&Go
system on our Urban Traffic Assistant
demonstrator (see Figure 2). Intelligent
Stop&Go lets the UTA autonomously follow a lead vehicle, while being aware of
relevant elements of the traffic infrastructure (for example, road lanes, traffic
signs, and traffic lights) and other traffic
participants.
Our most recent pedestrian detection system consists of stereo vision-based obstacle
detection and fine localization within the
stereo ROI using the Chamfer System (see
Figure 3).12 The system tracks detected
objects over time and aggregates single-frame results. At the same time, a time-delay
neural network with local receptive fields19
constantly evaluates successive ROIs, searching for the characteristic temporal patterns
of (lateral) human gait. Visit www.gavrila.net/Computer_Vision/computer_vision.html
for a few video clips.
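The matching principle behind the Chamfer System, correlating a contour template with a distance-transformed edge image, can be illustrated with the simplified sketch below. It uses a city-block distance transform and an exhaustive search over positions; the template hierarchy and coarse-to-fine search that give the real system its speed are deliberately left out, and all names are hypothetical.

```python
INF = 10 ** 9


def distance_transform(edges):
    """Two-pass city-block distance to the nearest edge pixel (nonzero entry)."""
    h, w = len(edges), len(edges[0])
    d = [[0 if edges[r][c] else INF for c in range(w)] for r in range(h)]
    for r in range(h):                      # forward pass: top-left neighbors
        for c in range(w):
            if r > 0:
                d[r][c] = min(d[r][c], d[r - 1][c] + 1)
            if c > 0:
                d[r][c] = min(d[r][c], d[r][c - 1] + 1)
    for r in range(h - 1, -1, -1):          # backward pass: bottom-right neighbors
        for c in range(w - 1, -1, -1):
            if r < h - 1:
                d[r][c] = min(d[r][c], d[r + 1][c] + 1)
            if c < w - 1:
                d[r][c] = min(d[r][c], d[r][c + 1] + 1)
    return d


def chamfer_score(dt, template_points, top, left):
    """Mean distance under the template's contour points; lower is better."""
    total = sum(dt[top + dr][left + dc] for dr, dc in template_points)
    return total / len(template_points)


def best_match(dt, template_points, tmpl_h, tmpl_w):
    """Brute-force search over all placements; returns (score, top, left)."""
    best = None
    for top in range(len(dt) - tmpl_h + 1):
        for left in range(len(dt[0]) - tmpl_w + 1):
            score = chamfer_score(dt, template_points, top, left)
            if best is None or score < best[0]:
                best = (score, top, left)
    return best


if __name__ == "__main__":
    edges = [[0] * 6 for _ in range(6)]
    for r in (1, 2, 3):
        edges[r][4] = 1                     # a short vertical contour
    dt = distance_transform(edges)
    print(best_match(dt, [(0, 0), (1, 0), (2, 0)], 3, 1))
```

Because the distance image varies smoothly, a template placed slightly off target still scores reasonably well, which is what makes a coarse search grid feasible before refining.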
Figure 2. DaimlerChrysler’s Urban Traffic Assistant demonstrator.
Other systems will soon join these three. The EU has recently begun a major initiative for pedestrian protection under the
Fifth Framework project Protector.14,20
The project brings together major vehicle
manufacturers, sensor suppliers, and research institutions to develop intelligent
systems on vehicles for reducing accidents
involving pedestrians, bicyclists, and other
unprotected traffic participants. Among the
completed tasks are the analysis of accident statistics and the definition of relevant
traffic scenarios. The project is investigating three sensor technologies: radar, laser
range finder, and video, which we will implement on two passenger cars (Fiat and
DaimlerChrysler) and one truck (MAN).
Sometime in 2002 we will evaluate the final
systems on a test track under standardized
and realistic conditions (that is, using dummies). User interface and user acceptance
studies will conclude this project.
The road ahead
A pedestrian safety system’s success or
failure, from a technical viewpoint, will
depend largely on the rate of correct detections versus false alarms that it produces, at a
certain processing rate and on a particular
processor platform. But what rate will we
need for actual deployment of a sensor-based
pedestrian system? This question
is difficult to answer because the desired rate
will depend on the final system concept. If,
for example, the system concept involves
only a warning function, performance criteria will likely be less stringent than for a concept that involves active vehicle control.
Perhaps we can more easily establish
where we currently stand regarding performance. Consider a (fictional) video-based
pedestrian detection system that involves a
succession of three components: stereo-based obstacle detection, template-based
shape matching, and texture-based pattern
classification. Assume that each component’s performance is independent of that
of the others.
We conservatively estimate that, to
detect every pedestrian in urban traffic, the
stereo component produces one pedestrian
ROI every 10 seconds. (In lieu of hard
experimental data, we use a value derived
from our experience.) We assume that the
stereo component accomplishes this by
employing simple heuristics regarding the
sizes and locations of the rectangular
regions it detects as obstacles. Because we
cannot expect the pedestrian ROI to exactly
outline the pedestrian, we assume that we
need 10 probes to extract the pedestrian
correctly, which amounts to one candidate region per second on average.
For the shape-based and texture-based components, we estimate a detection
rate of 95 percent at false-positive rates on
the order of 10⁻³ and 10⁻¹ per candidate
region, respectively.10,12,15 All in all, we
arrive, in this best-case scenario, at a false-positive rate of 1 per 10⁴ seconds, or 1 per
2.8 hours, for a detection rate of 90 percent (0.95 × 0.95).
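The estimate above can be reproduced with a few lines of arithmetic. The sketch below only restates the article's assumed figures and its independence assumption; the function and parameter names are illustrative.

```python
def false_alarm_budget(roi_interval_s=10.0, probes_per_roi=10,
                       fp_shape=1e-3, fp_texture=1e-1, detection_per_stage=0.95):
    """Chain the per-stage rates, assuming the stages act independently."""
    candidates_per_s = probes_per_roi / roi_interval_s
    false_alarms_per_s = candidates_per_s * fp_shape * fp_texture
    seconds_between_alarms = 1.0 / false_alarms_per_s
    return {
        "seconds_between_false_alarms": seconds_between_alarms,
        "hours_between_false_alarms": seconds_between_alarms / 3600.0,
        "detection_rate": detection_per_stage ** 2,
    }


if __name__ == "__main__":
    for name, value in false_alarm_budget().items():
        print(name, round(value, 4))
```

Changing a single argument shows how sensitive the result is: halving the texture stage's false-positive rate, for example, doubles the time between false alarms.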
Integrating the results over time by tracking will improve this figure somewhat.
However, this improvement will be offset by
the lower filter ratios of the shape and texture components, which, in practice, are not
independent. On the basis of this, we can
fairly say that we’ll need to reduce the false-positive rate by at least one order of magnitude to obtain a viable pedestrian system,
while maintaining the same detection rate.
Fortunately, several ways exist to significantly reduce the false-positive rate. Improved multicue video algorithms (combining distance, shape, texture, and motion cues) could successively decimate the false alarm rate, as the description of our fictional system illustrates. Sensor fusion (for example, combining video and laser range finder approaches) will probably also produce large benefits. Finally, telematics concepts, involving communication between pedestrians and vehicles combined with GPS-based localization, could close any remaining performance gap. Although we can’t realistically expect people to buy special-purpose pedestrian protection devices, pedestrian safety systems could piggyback on the pervasiveness of the future communication infrastructure (for example, the UMTS [Universal Mobile Telecommunications System] and Bluetooth).
Challenges remain even after we solve the pedestrian detection problem. After all, we’ll need to assess the danger of a particular traffic situation. This assessment will consider the pedestrians’ and vehicles’ position and speed. But with a larger look ahead, beyond the precrash range, prediction quickly becomes unreliable; pedestrians can easily change direction. Furthermore, accurate risk assessment will increasingly require good scene understanding. For example, the danger associated with a pedestrian heading toward the street will depend largely on the placement of the road boundaries, whether a traffic light exists, and, if so, whether it is green. This suggests that, in the long run, a reliable, anticipatory pedestrian system must be aware of several types of infrastructural elements, through either perception or telematics approaches. We might reduce at least some complexity by limiting a pedestrian protection system’s scope to cover only specific traffic scenarios; this will represent a good intermediate solution.
Difficult technical challenges lie ahead, but this domain’s progress over the past few years warrants optimism. Considering the potential for saving lives and increasing safety, the goal certainly appears worthwhile.
Figure 3. Pedestrian detection results (shown in white) from the Chamfer System. Besides showing correct detections, the figure illustrates typical shortcomings, such as false detections in heavily textured image areas (for example, the left image in the bottom row) or missing detections in areas of low contrast, occlusion, or both (for example, the right image in the bottom row).
References
1. D.M. Gavrila, “The Visual Analysis of
Human Movement: A Survey,” Computer
Vision and Image Understanding, vol. 73, no.
1, Jan. 1999, pp. 82–98.
2. R. Cutler and L. Davis, “Real-Time Periodic
Motion Detection, Analysis and Applications,”
Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 2, IEEE CS Press, Los
Alamitos, Calif., 1999, pp. 326–331.
3. R. Polana and R. Nelson, “Low Level Recognition of Human Motion,” Proc. IEEE Workshop Motion of Non-rigid and Articulated
Objects, IEEE CS Press, Los Alamitos, Calif.,
1994, pp. 77–82.
4. B. Heisele and C. Wöhler, “Motion-Based
Recognition of Pedestrians,” Proc. 14th Int’l
Conf. Pattern Recognition, IEEE CS Press,
Los Alamitos, Calif., 1998, pp. 1325–1330.
5. A. Baumberg and D. Hogg, “Learning Flexible Models from Image Sequences,” Proc.
European Conf. Computer Vision, Lecture
Notes in Computer Science, vol. 800, Springer-Verlag, Heidelberg, 1994, pp. 299–308.
6. T. Cootes et al., “Active Shape Models: Their
Training and Applications,” Computer Vision
and Image Understanding, vol. 61, no. 1, Jan.
1995, pp. 38–59.
7. C. Curio et al., “Walking Pedestrian Recognition,” IEEE Trans. Intelligent Transportation
Systems, vol. 1, no. 3, Nov. 2000, pp. 155–163.
8. V. Philomin, R. Duraiswami, and L. Davis,
“Quasi-random Sampling for Condensation,”
Proc. European Conf. Computer Vision, vol.
2, Lecture Notes in Computer Science, vol.
1843, Springer-Verlag, Heidelberg, Germany,
2000, pp. 134–149.
9. G. Rigoll, B. Winterstein, and S. Müller,
“Robust Person Tracking in Real Scenarios
with Non-stationary Background Using a Statistical Computer Vision Approach,” Proc. 2nd
IEEE Int’l Workshop Visual Surveillance,
IEEE CS Press, Los Alamitos, Calif., 1999,
pp. 41–47.
10. C. Papageorgiou and T. Poggio, “A Trainable
System for Object Detection,” Int’l J. Computer
Vision, vol. 38, no. 1, June 2000, pp. 15–33.
11. A. Mohan, C. Papageorgiou, and T. Poggio,
“Example-Based Object Detection in Images
by Components,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 4,
Apr. 2001, pp. 349–361.
12. D.M. Gavrila, “Pedestrian Detection from a
Moving Vehicle,” Proc. European Conf. Computer Vision, vol. 2, Lecture Notes in Computer Science, vol. 1843, Springer-Verlag,
Heidelberg, Germany, 2000, pp. 37–49.
13. T. Tsuji et al., “Development of Night Vision
System,” Proc. IEEE Int’l Conf. Intelligent
Vehicles, IEEE Press, Piscataway, N.J., 2001,
pp. 133–140.
14. D.M. Gavrila, M. Kunert, and U. Lages, “A
Multi-sensor Approach for the Protection of
Vulnerable Traffic Participants: The PROTECTOR Project,” Proc. IEEE Instrumentation and Measurement Technology Conf., vol.
3, IEEE Press, Piscataway, N.J., 2001, pp.
2044–2048.
15. L. Zhao and C. Thorpe, “Stereo- and Neural
Network-Based Pedestrian Detection,” IEEE
Trans. Intelligent Transportation Systems,
vol. 1, no. 3, Nov. 2000, pp. 148–154.
16. A. Broggi et al., “Shape-Based Pedestrian
Detection,” Proc. IEEE Intelligent Vehicles
Symp., IEEE Press, Piscataway, N.J., 2000,
pp. 215–220.
17. U. Franke et al., “From Door to Door: Principles and Applications of Computer Vision for
Driver Assistant Systems,” Intelligent Vehicle
Technologies, L. Vlacic, F. Harashima, and M.
Parent, eds., Butterworth Heinemann, Oxford,
UK, 2001, pp. 131–188.
18. U. Franke et al., “Autonomous Driving Goes
Downtown,” IEEE Intelligent Systems, vol.
13, no. 6, Nov./Dec. 1998, pp. 40–48.
19. C. Wöhler and J. Anlauf, “An Adaptable
Time-Delay Neural-Network Algorithm for
Image Sequence Analysis,” IEEE Trans.
Neural Networks, vol. 10, no. 6, Nov. 1999,
pp. 1531–1536.
20. P. Carrea and G. Sala, “Short Range Area
Monitoring for Pre-crash and Pedestrian Protection: The Chameleon and Protector Projects,” Proc. 9th Aachener Colloquium Automobile and Engine Technology, Institut für
Kraftfahrwesen Aachen (Aachen Inst. for
Automotive Eng.) and Verbrennungs Kraftmaschinen Aachen (Aachen Inst. for Internal
Combustion Engines), Aachen, Germany,
2000, pp. 629–639.
Dariu M. Gavrila is a research scientist with DaimlerChrysler Research’s Image Understanding Group in Ulm, Germany. His research
interests include vision systems for detecting human presence and
activity, with applications in surveillance, virtual reality, and intelligent human–machine interfaces. He works on real-time vision systems for driver assistance and intelligent cruise control. He is currently responsible for the European Union’s Protector project for
pedestrian protection. He received his MS in computer science cum
laude from the Free University in Amsterdam and his PhD in computer science from the University of Maryland at College Park. Contact him at Image Understanding Systems, DaimlerChrysler Research, Ulm 89081, Germany; dariu.gavrila@daimlerchrysler.com; www.gavrila.net.