Intelligent Transportation Systems
Editor: Alberto Broggi, University of Pavia, Italy, broggi@ce.unipr.it

Sensor-Based Pedestrian Protection
Dariu M. Gavrila, DaimlerChrysler Research

Traffic accidents worldwide yearly kill more than 39,000 pedestrians and injure more than 430,000 (see Table 1, left). For the European Union (EU), the corresponding numbers are over 6,000 and 155,000 (see Table 1, right). Pedestrian accidents represent the second-largest source of traffic-related injuries and fatalities, after accidents involving car passengers. Children are especially at risk (see Figure 1).

This problem's magnitude has caught legislators' attention. The EU, for example, is studying proposals for legislating maximum-tolerated impact coefficients for a vehicle hitting a child or adult pedestrian frontally at 40 kph. Two classes of impact coefficients are under consideration: one involving the primary impact areas (the lower and upper legs) and the other involving the more dangerous secondary impact area (the head).

Many aspects of such a specification are still subjects of considerable debate. One issue is whether a component-based crash test, which hurls separate impactors toward the vehicle, can adequately model a human body's kinematics during a crash. Another issue involves the large variation in pedestrian kinematics between a child and an adult, who have quite different centers of mass at impact. Optimizing for one group can make things worse for the other. Final test procedures and numbers have not materialized yet. However, the very dissimilar object properties (mass and velocity) of pedestrians and vehicles make energy absorption during a crash difficult. What's more, besides being "pedestrian friendly," vehicles should perform well in crashes with hard objects, such as other vehicles and trees, and have an attractive design.
Figure 1. A typical dangerous situation: a child suddenly steps into a street.

Vehicle manufacturers are addressing these challenges by looking into extendable vehicle body structures (such as the bumper and hood) that activate upon first impact with a pedestrian. A complementary approach is to focus on sensor-based solutions, which let vehicles "look ahead" and detect pedestrians in their surroundings. This article investigates the state of the art in this domain, reviewing passive, video-based approaches and approaches involving active sensors (radar and laser range finders).

Table 1. 1997 deaths and injuries due to traffic accidents (source: United Nations Economic Commission for Europe).

                          Worldwide                           European Union
                 Deaths    Injuries     Total         Deaths    Injuries     Total
Passenger cars   75,615   3,751,024   3,826,639       22,502     995,026   1,017,528
Pedestrians      39,670     436,422     476,092        6,049     155,151     161,200
Bicycles          6,872     236,027     242,899        2,421     141,870     144,291
Mopeds            3,151     163,854     167,005        2,385     139,442     141,827
Motorcycles      10,972     227,946     238,918        3,821     124,023     127,844
Other            28,397   1,303,571   1,331,968        4,559     121,816     126,375
Total           161,677   6,118,844   6,283,521       41,737   1,677,328   1,719,065

NOVEMBER/DECEMBER 2001   1094-7167/01/$10.00 © 2001 IEEE

Video-based approaches

Video sensors are a natural choice for detecting people. Texture information at a fine angular resolution enables quite discriminative pattern recognition techniques. The human visual-perception system is perhaps the best example of how well such sensors might perform, if we add the appropriate processing. Besides, video cameras are cheap, and because they do not emit any signals, they raise no issues regarding interference with the environment.

Considerable computer vision research deals with "looking at people."1 What makes pedestrian recognition applications on vehicles particularly challenging is the moving camera, the wide range of possible pedestrian appearances, and the cluttered (uncontrolled) backgrounds.

Most research on vision-based pedestrian recognition has taken a learning-based approach, bypassing a pose recovery step altogether and describing human appearance in terms of simple low-level features from a region of interest (ROI). One line of research has dealt specifically with scenes involving people walking laterally to the viewing direction, with recognition by either using the periodicity cue2,3 or learning the characteristic lateral gait pattern.4 A crucial factor determining the success of learning methods is the availability of a good foreground region. Unlike with applications such as surveillance, where the camera is stationary, standard background subtraction techniques are of little avail here because of the moving camera. Independent motion detection techniques can help,3 but they are difficult to develop. Yet, given a correct initial foreground, we can shift some of the burden to tracking.4–9

A complementary problem is to recognize pedestrians in single images; this is particularly relevant for pedestrians standing still. One general approach involves shifting windows of various sizes over the image, extracting low-level texture features, and using standard pattern classification techniques to determine a pedestrian's presence. For example, Constantine Papageorgiou and Tomaso Poggio combine wavelet features with a support vector machine classifier.10 More recently, Anuj Mohan and his colleagues have extended this research to involve a component-based approach.11 However, this approach's performance–speed trade-off is currently unfavorable for use in vehicles.

The Chamfer System addresses this through two-step object recognition.12 The first step applies hierarchical template matching using contour features to efficiently lock onto candidate solutions. Matching is based on correlation with distance-transformed images. By capturing the object's shape variability through a template hierarchy and by using a combined coarse-to-fine approach in shape and parameter space, this step achieves large speedups compared to an equivalent brute-force method. The second step reverts to texture-based pattern classification of the candidate solutions that the first step provided.

Another powerful technique to establish ROIs is stereo vision. Uwe Franke and his colleagues combine stereo vision with texture-based pattern classification. I describe two other stereo vision-based approaches later.

Lately, interest has been increasing in video sensors that operate outside the visible spectrum. Having long been used exclusively in the military domain, infrared sensors are finding their way into civilian applications owing to the advent of cheaper, uncooled cameras. The principle of detecting pedestrians by the heat their bodies emit is appealing (Takayuki Tsuji and his colleagues provide one example13). Yet pedestrians are not the only heat sources in a traffic environment; vehicles generate heat too. Even the pavement can appear hotter on a summer day than a pedestrian's body. So, rather than offering the solution for pedestrian detection per se, infrared sensors provide a means to simplify the segmentation problem. Pattern recognition techniques are still necessary.

Active-sensor approaches

Video sensors do not directly provide depth information; stereo vision derives depth by establishing feature correspondence and performing triangulation. Active sensors, on the other hand, measure distances directly.

Radar

Some commercial vehicles already employ radar for adaptive cruise control (for example, the Distronic system on Mercedes-Benz S-Class cars).
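Stepping back to the Chamfer System's matching stage described earlier: correlation with a distance-transformed edge image means that, once the transform is computed, scoring any template placement reduces to a lookup-and-average. The sketch below is a minimal toy illustration, not the production system; the synthetic edge map, the template, and the function names are all assumptions for the example.

```python
import numpy as np

def distance_transform(edge_map):
    """Brute-force Euclidean distance transform: each pixel gets the
    distance to the nearest edge pixel (fine for tiny toy images)."""
    ys, xs = np.nonzero(edge_map)
    edge_pts = np.stack([ys, xs], axis=1).astype(float)
    rr, cc = np.indices(edge_map.shape)
    grid = np.stack([rr.ravel(), cc.ravel()], axis=1).astype(float)
    d = np.sqrt(((grid[:, None, :] - edge_pts[None, :, :]) ** 2).sum(-1))
    return d.min(axis=1).reshape(edge_map.shape)

def chamfer_score(dt, template_pts, offset):
    """Average distance from the shifted template's contour points to
    the nearest image edge; lower means a better shape match."""
    pts = template_pts + np.asarray(offset)
    return dt[pts[:, 0], pts[:, 1]].mean()

# Toy edge image: a vertical contour at column 12, rows 8..23.
edges = np.zeros((32, 32), dtype=bool)
edges[8:24, 12] = True
dt = distance_transform(edges)

# Hypothetical "pedestrian" template: a vertical contour of equal length.
template = np.array([(r, 0) for r in range(16)])

print(chamfer_score(dt, template, (8, 12)))  # 0.0 at the true location
print(chamfer_score(dt, template, (8, 18)))  # 6.0 when shifted off the edge
```

The cheap per-placement score is what makes the system's template hierarchy and coarse-to-fine pruning pay off: whole subtrees of templates and placements can be discarded as soon as a coarse score exceeds a threshold.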
For near-distance applications, such as pedestrian detection, ongoing investigations focus on 24-GHz radar technology.14 Radar-based systems can enhance object localization by placing multiple sensors on the vehicle's relevant parts and applying triangulation-based techniques. They can classify objects—that is, distinguish pedestrians from other objects such as cars and trees—by examining the power spectral-density plot of the reflected signals. In this context, we consider an object's spectral content and reflectivity. Objects with smaller spatial extents, such as pedestrians, have narrower peaks in the plot than, say, cars. The material properties of the object's surface determine the strength of reflected radar signals. Vehicles' metallic parts reflect much better than human tissue, by at least an order of magnitude. Human tissue, in turn, reflects much better than nonconductive materials, such as the wood in trees.

Laser range finders

The main appeal of eye-safe laser range finders lies in their fast, precise depth measurement and their large field of view. For example, Martin Kunert, Ulrich Lages, and I describe a laser range finder that has a depth accuracy of ±5 cm and a range of 40 m for objects with at least 5 percent reflectivity (this includes most, if not all, relevant targets).14 Furthermore, its horizontal scans cover a 180-degree field of view in increments of 0.5 degree at 20 Hz, making the sensor especially suitable to cover the area just in front of the vehicle.

Current systems

At least three pedestrian recognition systems have been integrated on demonstration vehicles. Those I describe here are video-based and employ a two-step detection–verification framework for efficient pedestrian recognition; stereo vision provides the ROI.
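Returning briefly to the multi-sensor radar localization mentioned above: with two range-only sensors at known positions on the bumper, triangulating a reflector reduces to intersecting two circles. This is a minimal geometric sketch; the sensor spacing, coordinates, and function name are assumptions for illustration, not a description of any deployed system.

```python
import math

def triangulate(d, r1, r2):
    """Locate a reflector from two range-only measurements taken by
    sensors at (0, 0) and (d, 0); returns the intersection ahead of
    the vehicle (y > 0), using intersecting-circles geometry."""
    x = (r1**2 - r2**2 + d**2) / (2 * d)
    y_sq = r1**2 - x**2
    if y_sq < 0:
        raise ValueError("ranges inconsistent with sensor spacing")
    return x, math.sqrt(y_sq)

# Example: sensors 1 m apart; target actually at (0.3, 4.0) m.
r1 = math.hypot(0.3, 4.0)        # range seen by the left sensor
r2 = math.hypot(0.3 - 1.0, 4.0)  # range seen by the right sensor
x, y = triangulate(1.0, r1, r2)
print(round(x, 3), round(y, 3))  # recovers (0.3, 4.0)
```

Because each sensor measures only range, a single sensor constrains the target to a circle; the second sensor's circle pins it down, up to the front/back ambiguity resolved here by keeping the solution ahead of the vehicle.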
At Carnegie Mellon University's NavLab, Liang Zhao and Charles Thorpe developed a system that combines stereo vision with neural-network pattern classification.15 It obtains the texture features for classification by applying a high-pass filter to the ROI and normalizing for size. The system, running at 3 to 12 Hz, aims to assist bus drivers in urban traffic. The researchers plan to expand it to cover the sides of the bus and, eventually, to provide full 360-degree coverage.

The University of Pavia system, implemented in the ARGO experimental autonomous vehicle, combines stereo vision with template matching for detecting pedestrian head and shoulder shapes.16 The system searches for vertical symmetry to verify candidate regions. The authors report good detection results in the range of 10 to 40 meters.

At DaimlerChrysler, we have been working on pedestrian recognition as part of our multiyear effort to extend driver assistance beyond the highway scenario into the complex urban environment.4,12,17,18 Of particular interest is the Intelligent Stop&Go system on our Urban Traffic Assistant demonstrator (see Figure 2). Intelligent Stop&Go lets the UTA autonomously follow a lead vehicle, while being aware of relevant elements of the traffic infrastructure (for example, road lanes, traffic signs, and traffic lights) and other traffic participants. Our most recent pedestrian detection system consists of stereo vision-based obstacle detection and fine localization within the stereo ROI using the Chamfer System (see Figure 3).12 The system tracks detected objects over time and aggregates single-frame results. At the same time, a time-delay neural network with local receptive fields19 constantly evaluates successive ROIs, searching for the characteristic temporal patterns of (lateral) human gait. Visit www.gavrila.net/Computer_Vision/computer_vision.html for a few video clips.

Other systems will soon join these three.
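The vertical-symmetry verification used by the Pavia system can be sketched in a few lines: compare a candidate ROI against its mirror image about the vertical axis. This is a toy version with a hypothetical scoring function; the actual system's symmetry criterion may differ.

```python
import numpy as np

def vertical_symmetry(roi):
    """Score in [0, 1]: 1.0 means the grayscale ROI is perfectly
    mirror-symmetric about its vertical axis."""
    flipped = roi[:, ::-1]  # mirror the ROI left-to-right
    diff = np.abs(roi.astype(float) - flipped.astype(float))
    return 1.0 - diff.mean() / 255.0

# Crude stand-in for a head-and-shoulders pattern: a centered blob.
roi = np.zeros((20, 16), dtype=np.uint8)
roi[4:16, 4:12] = 200
print(vertical_symmetry(roi))   # 1.0 -- perfectly symmetric

roi[:, :4] = 255                # add a bright stripe on the left only
print(vertical_symmetry(roi))   # drops below 1.0 -- asymmetric
```

A verification stage like this is cheap because it needs no training data; a threshold on the score filters out candidate regions whose appearance is too lopsided to be a frontal or rear view of a person.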
The EU has recently begun a major initiative for pedestrian protection under the Fifth Framework project Protector.14,20 The project brings together major vehicle manufacturers, sensor suppliers, and research institutions to develop intelligent systems on vehicles for reducing accidents involving pedestrians, bicyclists, and other unprotected traffic participants. Among the completed tasks are the analysis of accident statistics and the definition of relevant traffic scenarios. The project is investigating three sensor technologies: radar, laser range finder, and video, which we will implement on two passenger cars (Fiat and DaimlerChrysler) and one truck (MAN). Sometime in 2002 we will evaluate the final systems on a test track under standardized and realistic conditions (that is, using dummies). User interface and user acceptance studies will conclude this project.

Figure 2. DaimlerChrysler's Urban Traffic Assistant demonstrator.

The road ahead

A pedestrian safety system's success or failure, from a technical viewpoint, will depend largely on the rate of correct detections versus false alarms that it produces, at a certain processing rate and on a particular processor platform. But what rate will we need for actual deployment of a sensor-based pedestrian system? This question is difficult to answer because the desired rate will depend on the final system concept. If, for example, the system concept involves only a warning function, performance criteria will likely be less stringent than for a concept that involves active vehicle control.

Perhaps we can more easily establish where we currently stand regarding performance. Consider a (fictional) video-based pedestrian detection system that involves a succession of three components: stereo-based obstacle detection, template-based shape matching, and texture-based pattern classification. Assume that each component's performance is independent of that of the others.
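Under that independence assumption, the cascade's end-to-end rates follow by simple multiplication. The sketch below works through the specific estimates used in this section (10 shape probes per ROI, a 95 percent detection rate per classifier stage, per-candidate false-positive rates of 10^-3 and 10^-1, and one stereo ROI every 10 seconds); the variable names are mine, the numbers are the article's.

```python
# Back-of-envelope rates for the fictional three-component cascade.
probes_per_roi = 10                 # shape probes tried per stereo ROI
det_shape = det_texture = 0.95      # per-stage detection rates
fp_shape, fp_texture = 1e-3, 1e-1   # per-candidate false-positive rates
roi_interval_s = 10.0               # stereo stage yields one ROI every 10 s

detection_rate = det_shape * det_texture             # about 0.90
fp_per_roi = probes_per_roi * fp_shape * fp_texture  # 1e-3 per ROI
fp_per_second = fp_per_roi / roi_interval_s          # 1e-4 per second

print(f"detection rate: {detection_rate:.2%}")
print(f"one false positive per {1 / fp_per_second:,.0f} s "
      f"(~{1 / (fp_per_second * 3600):.1f} h)")
```

Running the numbers reproduces the figures quoted next in the text: roughly 90 percent detection, with one false positive per 10,000 seconds, or about every 2.8 hours.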
We conservatively estimate that, to detect every pedestrian in urban traffic, the stereo component produces one pedestrian ROI every 10 seconds. (In lieu of hard experimental data, we use a value derived from our experience.) We assume that the stereo component accomplishes this by employing simple heuristics regarding the sizes and locations of the rectangular regions it detects as obstacles. Because we cannot expect the pedestrian ROI to exactly outline the pedestrian, we assume that we need 10 probes to extract the pedestrian correctly. For the shape-based and texture-based components, we estimate a detection rate of 95 percent at a false positive rate on the order of 10^-3 and 10^-1 per candidate region, respectively.10,12,15 All in all, we arrive, in this best-case scenario, at a false-positive rate of 1 per 10^4 seconds, or 1 per 2.8 hours, for a detection rate of 90 percent. Integrating the results over time by tracking will improve this figure somewhat. However, this improvement will be offset by the lower filter ratios of the shape and texture components, which, in practice, are not independent. On this basis, we can fairly say that we'll need to reduce the false-positive rate by at least one order of magnitude to obtain a viable pedestrian system, while maintaining the same detection rate.

Fortunately, several ways exist to significantly reduce the false-positive rate. Improved multicue video algorithms (combining distance, shape, texture, and motion cues) could successively decimate the false alarm rate, as the description of our fictional system illustrates. Sensor fusion (for example, combining video and laser range finder approaches) will probably also produce large benefits. Finally, telematics concepts, involving communication between pedestrians and vehicles combined with GPS-based localization, could close any remaining performance gap. Although we can't realistically expect people to buy special-purpose pedestrian protection devices, pedestrian safety systems could piggyback on the pervasiveness of the future communication infrastructure (for example, the UMTS [Universal Mobile Telecommunications System] and Bluetooth).

Figure 3. Pedestrian detection results (shown in white) from the Chamfer System. Besides showing correct detections, the figure illustrates typical shortcomings, such as false detections in heavily textured image areas (for example, the left image in the bottom row) or missing detections in areas of low contrast, occlusion, or both (for example, the right image in the bottom row).

Challenges remain even after we solve the pedestrian detection problem. After all, we'll need to assess the danger of a particular traffic situation. This assessment will consider the pedestrians' and vehicles' position and speed. But with a larger look ahead, beyond the precrash range, prediction quickly becomes unreliable; pedestrians can easily change direction. Furthermore, accurate risk assessment will increasingly require good scene understanding. For example, the danger associated with a pedestrian heading toward the street will depend largely on the placement of the road boundaries, whether a traffic light exists, and, if so, whether it is green. This suggests that, in the long run, a reliable, anticipatory pedestrian system must be aware of several types of infrastructural elements, through either perception or telematics approaches. We might reduce at least some complexity by limiting a pedestrian protection system's scope to cover only specific traffic scenarios; this will represent a good intermediate solution.

Difficult technical challenges lie ahead, but this domain's progress over the past few years warrants optimism. Considering the potential for saving lives and increasing safety, the goal certainly appears worthwhile.

References

1. D.M. Gavrila, "The Visual Analysis of Human Movement: A Survey," Computer Vision and Image Understanding, vol. 73, no. 1, Jan. 1999, pp. 82–98.
2. R. Cutler and L. Davis, "Real-Time Periodic Motion Detection, Analysis and Applications," Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 2, IEEE CS Press, Los Alamitos, Calif., 1999, pp. 326–331.
3. R. Polana and R. Nelson, "Low Level Recognition of Human Motion," Proc. IEEE Workshop Motion of Non-rigid and Articulated Objects, IEEE CS Press, Los Alamitos, Calif., 1994, pp. 77–82.
4. B. Heisele and C. Wöhler, "Motion-Based Recognition of Pedestrians," Proc. 14th Int'l Conf. Pattern Recognition, IEEE CS Press, Los Alamitos, Calif., 1998, pp. 1325–1330.
5. A. Baumberg and D. Hogg, "Learning Flexible Models from Image Sequences," Proc. European Conf. Computer Vision, Lecture Notes in Computer Science, vol. 800, Springer-Verlag, Heidelberg, Germany, 1994, pp. 299–308.
6. T. Cootes et al., "Active Shape Models: Their Training and Applications," Computer Vision and Image Understanding, vol. 61, no. 1, Jan. 1995, pp. 38–59.
7. C. Curio et al., "Walking Pedestrian Recognition," IEEE Trans. Intelligent Transportation Systems, vol. 1, no. 3, Nov. 2000, pp. 155–163.
8. V. Philomin, R. Duraiswami, and L. Davis, "Quasi-random Sampling for Condensation," Proc. European Conf. Computer Vision, vol. 2, Lecture Notes in Computer Science, vol. 1843, Springer-Verlag, Heidelberg, Germany, 2000, pp. 134–149.
9. G. Rigoll, B. Winterstein, and S. Müller, "Robust Person Tracking in Real Scenarios with Non-stationary Background Using a Statistical Computer Vision Approach," Proc. 2nd IEEE Int'l Workshop Visual Surveillance, IEEE CS Press, Los Alamitos, Calif., 1999, pp. 41–47.
10. C. Papageorgiou and T. Poggio, "A Trainable System for Object Detection," Int'l J. Computer Vision, vol. 38, no. 1, June 2000, pp. 15–33.
11. A. Mohan, C. Papageorgiou, and T. Poggio, "Example-Based Object Detection in Images by Components," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 4, Apr. 2001, pp. 349–361.
12. D.M. Gavrila, "Pedestrian Detection from a Moving Vehicle," Proc. European Conf. Computer Vision, vol. 2, Lecture Notes in Computer Science, vol. 1843, Springer-Verlag, Heidelberg, Germany, 2000, pp. 37–49.
13. T. Tsuji et al., "Development of Night Vision System," Proc. IEEE Int'l Conf. Intelligent Vehicles, IEEE Press, Piscataway, N.J., 2001, pp. 133–140.
14. D.M. Gavrila, M. Kunert, and U. Lages, "A Multi-sensor Approach for the Protection of Vulnerable Traffic Participants: The PROTECTOR Project," Proc. IEEE Instrumentation and Measurement Technology Conf., vol. 3, IEEE Press, Piscataway, N.J., 2001, pp. 2044–2048.
15. L. Zhao and C. Thorpe, "Stereo- and Neural Network-Based Pedestrian Detection," IEEE Trans. Intelligent Transportation Systems, vol. 1, no. 3, Nov. 2000, pp. 148–154.
16. A. Broggi et al., "Shape-Based Pedestrian Detection," Proc. IEEE Intelligent Vehicles Symp., IEEE Press, Piscataway, N.J., 2000, pp. 215–220.
17. U. Franke et al., "From Door to Door: Principles and Applications of Computer Vision for Driver Assistant Systems," Intelligent Vehicle Technologies, L. Vlacic, F. Harashima, and M. Parent, eds., Butterworth Heinemann, Oxford, UK, 2001, pp. 131–188.
18. U. Franke et al., "Autonomous Driving Goes Downtown," IEEE Intelligent Systems, vol. 13, no. 6, Nov./Dec. 1998, pp. 40–48.
19. C. Wöhler and J. Anlauf, "An Adaptable Time-Delay Neural-Network Algorithm for Image Sequence Analysis," IEEE Trans. Neural Networks, vol. 10, no. 6, Nov. 1999, pp. 1531–1536.
20. P. Carrea and G. Sala, "Short Range Area Monitoring for Pre-crash and Pedestrian Protection: The Chameleon and Protector Projects," Proc. 9th Aachener Colloquium Automobile and Engine Technology, Institut für Kraftfahrwesen Aachen and Verbrennungs Kraftmaschinen Aachen, Aachen, Germany, 2000, pp. 629–639.

Dariu M. Gavrila is a research scientist with DaimlerChrysler Research's Image Understanding Group in Ulm, Germany. His research interests include vision systems for detecting human presence and activity, with applications in surveillance, virtual reality, and intelligent human–machine interfaces. He works on real-time vision systems for driver assistance and intelligent cruise control. He is currently responsible for the European Union's Protector project for pedestrian protection. He received his MS in computer science cum laude from the Free University in Amsterdam and his PhD in computer science from the University of Maryland at College Park. Contact him at Image Understanding Systems, DaimlerChrysler Research, Ulm 89081, Germany; dariu.gavrila@daimlerchrysler.com; www.gavrila.net.