Technische Hochschule Nürnberg Georg Simon Ohm
Faculty of Electrical Engineering, Precision Engineering and Information Technology
Degree programme: Software Engineering

Master's thesis by Michael Mühlbauer-Prassek

Profiling of Cognitive Algorithms for Future Automotive Driving Functions

Summer semester 2015

Author: Michael Mühlbauer-Prassek
First examiner: Prof. Dr. Bruno Lurz
Second examiner: Prof. Dr. Hans-Georg Hopf
Supervisor: Dr. Lukas Bulwahn (BMW Car IT GmbH)
Submission date: 04.09.2015

Declaration

I, Michael Mühlbauer-Prassek, matriculation no. ________________, declare that I wrote this thesis independently, that it has not been submitted elsewhere for examination purposes, and that all sources and aids used, including literal and analogous citations, have been identified as such.

Michael Mühlbauer-Prassek

Abstract

Future automated driving functions will build not only upon sensors that measure physical quantities directly related to the vehicle, such as angular and longitudinal accelerations. Additional information acquired by 3D laser scanners, radars and stereo cameras will be fed into an environmental model in order to automate driving functions and relieve the driver. In contrast to the first kind of data, the computational load imposed by environmental conditions is hard to predict. Since current research is moving towards autonomous driving, this issue is of great importance when it comes to real-time image processing.
In this area, traditional real-time system design, which is based on the worst-case execution time, lacks suitable methods because the worst-case execution time of the algorithms involved is almost unpredictable. Adaptive computational approaches that consider both the execution time and the quality of results are therefore worth investigating. Given a system capable of balancing these characteristics against each other by means of particular parameters, computationally demanding peak loads could be managed by accepting less precise but still sufficient results in favour of a shortened execution time. With reference to this approach, this thesis deals with the timing behaviour of PCL's people library, which provides people recognition based on point cloud data. To this end, parameters that are expected to have an impact on the execution time have been identified, and a test suite has been developed to provide comprehensive insight into the correlation between execution time and quality of results associated with those parameters. In order to establish practice-oriented conditions throughout the tests, an embedded platform is employed and a real-time capable environment is set up.

Table of Contents

List of Figures
1 Introduction
2 Problem Space
  2.1 Measurement Techniques
  2.2 Static Analysis
  2.3 Any-Time Approach
3 Object under Investigation
4 Test Environment
  4.1 Hardware Setup
  4.2 Software Setup
  4.3 Test Suite Setup
5 Test Approach
  5.1 Input Data
  5.2 Parameter Variation
  5.3 Test Execution
  5.4 Metrics and Classification Figures
6 Test Results
  6.1 Proof of Performance
  6.2 Load Analysis
  6.3 Impact of Parameterization
    6.3.1 Sampling Factor
    6.3.2 Voxel Size
    6.3.3 Height Limits
    6.3.4 Width Limits
7 Conclusion
Bibliography

List of Figures

Figure 1 : distribution of execution time
Figure 2 : phases of static WCET analysis
Figure 3 : example of any time computation
Figure 4 : people detection pipeline
Figure 5 : mounting position of stereo camera
Figure 6 : raw image vs. pointcloud
Figure 7 : voxel-grid over a pointcloud
Figure 8 : test directory structure
Figure 9 : reference ground truth arrangement
Figure 10 : RR policy / no resource-consumer
Figure 11 : RR policy / one resource-consumer
Figure 12 : RR policy / two resource-consumers
Figure 13 : FIFO policy / no resource-consumer
Figure 14 : FIFO policy / one resource-consumer
Figure 15 : FIFO policy / two resource-consumers
Figure 16 : NO_RT policy / no resource-consumer
Figure 17 : NO_RT policy / one resource-consumer
Figure 18 : NO_RT policy / two resource-consumers
Figure 19 : minimum-load scene
Figure 20 : low-load scene
Figure 21 : medium-load scene
Figure 22 : high-load scene
Figure 23 : maximum-load scene
Figure 24 : time per frame vs. complexity of scene
Figure 25 : sampling factor - execution time vs. detection rate
Figure 26 : voxel size - execution time vs. detection rate
Figure 27 : height limits - execution time vs. detection rate
Figure 28 : impact of lower point bound
Figure 29 : impact of upper point bound

1 Introduction

Nowadays, embedded systems serve different purposes in modern cars. Driving assistance and safety features such as navigation systems or ESP, which have become standard in the past few years, could not have been realized without the increasing spread and networking of electronic control units. In this respect, the development of smaller, cheaper and at the same time more powerful electronics enables engineers to design more sophisticated features, which will lead to autonomous driving in the foreseeable future. This trend is accompanied by an increasing complexity of the system architecture, which poses new challenges to software engineering.

Moving towards automated driving functions requires highly dependable systems because any malfunction could cause severe hazards to humans and the environment. To this end, real-time requirements have to be met reliably. Knowledge of the worst-case execution time (WCET) is a prerequisite of established real-time design methods in order to devise schedulable systems that ensure deterministic, predictable and timely computation under any conditions [1]. For demanding software components, determining the WCET is challenging because it depends on the complexity of the code, the extent and diversity of the input data, the current state of software and hardware, and the underlying hardware architecture. The goal of WCET analysis methods is to provide safe and tight upper bounds at reasonable time and cost.
The difficulties in this area of computer science are reflected by ongoing research on analysis methods that has addressed this issue since the early days of real-time applications [7]. A common objective of these methods is a more precise estimation of the WCET, because their results yield more or less pessimistic upper bounds that include safety margins to mitigate uncertainties [1]. Modern hardware makes this goal even harder to reach: caches, pipelining and other features that increase average-case performance make the timing behaviour increasingly difficult to predict [2]. In addition, input-data dependent algorithms, used to obtain information about the surroundings from sensors and cameras, make it more difficult to predict tight upper bounds for the WCET [4].

To counter these difficulties, it is worthwhile considering a decision-making scheme based on information resulting from a trade-off between quality and computation time. In this regard, the term any-time algorithms was established by Thomas Dean and Mark Boddy [3]. The idea of any-time algorithms is to provide results that are as precise as possible within the currently available time when data-dependent computation comes into play. This approach intends to improve the overall capacity utilization without relying on a predetermined WCET. At the same time, it prevents the system from violating deadlines in the few time-critical situations by providing less precise but still sufficient results, which should be enough to make safe decisions [5]. Under regular conditions, a safety margin, represented by a higher quality of results, guarantees stable responsiveness and high accuracy.

With reference to such an approach, this master's thesis deals with the people library of the Point Cloud Library project [6] (called "people tracker" in the following), which provides human detection based on stereo camera images.
The goal of this thesis is to investigate and verify the people tracker's capability to provide reliable information in scalable time by means of parameterization. If confirmed, this property could be used to limit the execution time of the people tracker under high load induced by demanding environmental conditions.

2 Problem Space

As mentioned in the introduction, the determination of the WCET is a decisive part of the development of real-time-critical applications. This chapter outlines methods employed to estimate the WCET and illustrates their problems and limitations with modern hardware and with algorithms in the area of artificial perception, whose execution time is significantly affected by the complexity of the input data. With reference to these issues, the any-time approach is presented as a measure to mitigate the effort associated with WCET estimation and the scheduling of real-time applications.

Generally speaking, the methods of WCET estimation can be divided into measurement techniques and static analysis. Both have in common that the complexity of software and hardware creates a significant burden in practical use and leads to specialized solutions for particular conditions, which increases the time and money spent in the course of development [1]. Measurement techniques are typically suited to cases in which the average-case execution time is of most interest and sporadic deadline misses can be tolerated. When it comes to timing-critical applications with crucial safety requirements, static analysis in conjunction with measurement (hybrid methods) is the means of choice. Regardless of the method chosen, the rule of thumb in the development of real-time applications is to keep the code simple and transparent, because its structure has a direct impact on analyzability [1].
Unfortunately, requirements such as statically bounded loops, minimal input-data dependency and simple code structures ([1],[2]) are hardly feasible for applications in the field of image processing and machine learning. Apart from that, modern processor architectures and their memory systems, which are designed for optimized throughput, increase the effort required for WCET estimation by adding even more uncertainty. This is reflected by execution times that depend on the instruction history, which may entail a fluctuation of several orders of magnitude for a single instruction [7].

2.1 Measurement Techniques

Measurement techniques permit the execution time of a piece of code to be determined by means of logic analyzers, hardware traces, high-resolution timers, emulators and other means [1]. However, all these techniques require a certain degree of debug and analysis features built into the hardware. Apart from that, evaluating the timing behaviour by means of measurement affects the timing itself when instrumentation is needed. This phenomenon is called the probe effect.

A general issue of measurement techniques is that they are limited by the time available to perform a bounded number of test runs and by the coverage of the selected input data in conjunction with the broad diversity of initial states that are possible on advanced processors. Apart from that, measurement techniques are known to underestimate the WCET, and there is no guarantee that the real WCET is actually observed in the course of measuring [7]. This is a significant drawback because it has to be compensated by safety margins, which, in order to guarantee reliability, lead to over-allocation of resources [1]. Moreover, the complexity of hardware and software makes it more and more difficult to deduce and compile appropriate data sets that put the system under test into its most stressful states [7]. These issues are illustrated in Figure 1.
As indicated by the light gray area, the values observed in the course of measurement represent only a subset of the possible values; the dark gray area marks values that are possible but were not observed. Of special interest are peak values arising from high-load conditions, which are hard to replicate because they depend on extraordinary conditions. Therefore, safety margins must be applied in order to deduce reliable upper bounds for the WCET.

Figure 1 : distribution of execution time [1]

2.2 Static Analysis

In comparison to measurement techniques, static analysis lends itself to a more precise approximation of the real WCET. However, the effort involved suffers from the same difficulties as hardware and software become more complex. A static analysis comprises three stages [7], as depicted in Figure 2.

Figure 2 : phases of static WCET analysis [1]

Flow Analysis Stage: At this stage, the possible paths through a piece of code are identified in order to examine the dynamic behaviour of the code with regard to interdependencies of conditions, function calls and the impact of bounds on loop iterations.

Low-Level Analysis Stage: The goal of this stage is to derive the execution time of machine instructions based on the compiled object code with respect to the underlying hardware. This is achieved by a timing model that reflects the hardware specification.

Calculation Stage: At this stage, the results of the flow analysis and the low-level analysis are merged to derive the WCET based on the identified execution paths and the associated machine instruction timing.

Although the scheme of static analysis looks straightforward, two major aspects make it fairly complex. The first is that it is almost impossible to specify all paths through a given piece of code, because bounds on loops must be applied to ensure finite execution [1].
The other aspect can be attributed to the complexity of performance-enhancing hardware features that must be taken into account by the timing model. Although the model does not need to reproduce all details of the hardware, the state space of advanced processors entails a time-consuming analysis [7]. Since the development and examination of timing models imposes considerable costs, and their dependency on a particular architecture hampers universal application to the large range of processors employed in embedded systems, the use of static analysis still represents a considerable burden. Furthermore, a survey of several tools published in [7] identified limitations regarding code structure (loop nesting, use of pointers, dynamically allocated data), flow analysis and detection of infeasible paths, programming language, accuracy with regard to overestimation, and delays caused by preemption / context switches.

2.3 Any-Time Approach

As illustrated above, both measurement techniques and static analysis suffer from a number of limitations as means of developing real-time applications. Real-time applications should be predictable and robust [2] in order to guarantee the expected system behaviour and to manage overload conditions. Measured against these features, both methods turned out to be insufficient for the data-dependent runtimes typically found in image-processing applications [5]. Apart from that, the required quality of results in such applications depends on the situation. Designing the system for a predetermined worst-case scenario would result in low system utilization [8]. Moreover, this scenario could only be an estimate, and the system itself would still be prone to rare peak loads. The any-time approach addresses this problem of WCET estimation and scheduling of input-data dependent algorithms.
It pursues the strategy of providing suitable results within a short time and improving their accuracy as long as deadlines can be met. Since this case is not characterized by a fixed WCET but rather by a dynamically determined upper bound, the term expected case execution time (ECET) has been established [5]. The ECET is determined by means of statistics and system monitoring [4], whereas the utility of results is derived from performance profiles that indicate the output quality of an algorithm depending on its computation time [8]. The flexibility of the any-time approach stems from its adaptive computation time, which is a function of task-specific parameters. These parameters should be selected by a monitoring component that is capable of optimizing the quality of service based on application demands and the current load imposed by the environment [5]. According to Dean and Boddy [3], the employed algorithms must satisfy the following specification:

- The algorithm can be interrupted and resumed with little overhead.
- The algorithm provides increasingly good answers over a range of response times.
- The algorithm can be terminated at any time.

To exemplify the idea of the any-time approach, Figure 3 refers to results from Ihme et al. [4] achieved with an extended SURF (Speeded Up Robust Features) algorithm for image processing that satisfies the requirements mentioned above. As indicated by the dark green bar, first results are available after a very short time, no matter which frame is considered. Subsequent processing stages provide improved results, which in this case show a progressive strategy regarding quality and computation time. Each stage is initiated according to the ECET. However, this can lead to the violation of deadlines, as indicated by the red bars. In this case, the algorithm terminates and the result from the previous stage is used instead.
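The staged behaviour described above can be illustrated with a minimal sketch. It is not the extended SURF implementation from [4]: the stage names, cost values and quality figures are hypothetical, and time is simulated with plain numbers instead of being measured. A stage is started only if its expected cost (standing in for the ECET) still fits into the deadline; if the simulated actual cost overruns the deadline, the stage is abandoned and the result of the previous stage is kept.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Stage:
    name: str             # hypothetical stage label
    expected_cost: float  # ECET-like estimate used to decide whether to start
    actual_cost: float    # simulated real cost, unknown to the scheduler beforehand
    quality: float        # quality of the result if the stage completes


def run_anytime(stages: List[Stage], deadline: float) -> Tuple[Optional[str], float, float]:
    """Return (last completed stage, its quality, simulated time spent)."""
    elapsed = 0.0
    best_name: Optional[str] = None
    best_quality = 0.0
    for stage in stages:
        # Start a stage only if its expected cost still fits into the budget.
        if elapsed + stage.expected_cost > deadline:
            break
        # The stage overruns its estimate: stop at the deadline and
        # keep the result of the previous stage.
        if elapsed + stage.actual_cost > deadline:
            elapsed = deadline
            break
        elapsed += stage.actual_cost
        best_name, best_quality = stage.name, stage.quality
    return best_name, best_quality, elapsed


if __name__ == "__main__":
    stages = [
        Stage("coarse", 2.0, 2.0, 0.5),
        Stage("medium", 3.0, 3.0, 0.8),
        Stage("fine", 4.0, 6.0, 0.95),  # underestimated stage
    ]
    print(run_anytime(stages, deadline=10.0))  # "fine" overruns, "medium" is kept
    print(run_anytime(stages, deadline=12.0))  # enough time for all stages
```

With a deadline of 10 time units the underestimated third stage is cut off and the second stage's result is returned; with a deadline of 12 all three stages complete.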
This fallback to the result of the previous stage allows the any-time approach to achieve a higher degree of utilization than guarantee-based scheduling and more reliability than best-effort scheduling.

Figure 3 : example of any time computation [5]

3 Object under Investigation

Based on the motivation given in the previous chapters, this thesis focuses on the people library, which is a component of the Point Cloud Library (PCL) [9]. The PCL provides a comprehensive collection of algorithms for filtering, feature estimation, surface recognition, registration, model fitting and segmentation of point cloud data [11]. This data can be obtained, for example, from RGB-D cameras, stereo cameras and laser scanners in order to gain spatial information. The people library makes use of PCL's feature set in order to provide information about the appearance of human shapes. Since obstacle evasion, among others, is a desirable application in the area of autonomous driving that could benefit from PCL, it is worth evaluating its characteristics.

This thesis focuses on the capability of the people library to provide scalable runtime in conjunction with a trade-off against the accuracy of results. This focus is motivated by the massive impact that the varying complexity of the input data has on the computation time of the people tracker's processing pipeline (see Figure 4), which makes it hard to predict. The stages of the pipeline serve the following tasks [10]:

Voxel Grid Filtering: In general, the data supplied to the pipeline contains outliers and noise that impede the processing. Hence, they need to be removed prior to the subsequent stages. Furthermore, the processing can be accelerated by using a reduced number of points. The characteristics of this step are explained in more detail in chapter 5.2 (see VoxelSize).
Ground Plane Detection and Removal: In order to detect single objects, the subset of the point cloud associated with the ground plane needs to be removed because it is the part of the scene that all objects are connected to. The estimation of the ground plane is accomplished by the RANSAC approach, which yields results in an iterative way. It is a non-deterministic algorithm in the sense that it produces a reasonable result only with a certain probability. This probability increases with every iteration (quotation from http://en.wikipedia.org/wiki/RANSAC).

3D Clustering: This stage identifies the cohesion among the remaining points based on the Euclidean clustering approach. The result is a batch of separated point clusters representing single objects that do not necessarily correspond to human shapes.

People Detection: At this stage, each cluster is examined to determine whether it corresponds to the characteristics of a human being or to a spurious object. This is accomplished by a support-vector-machine-based person classifier.

Figure 4 : people detection pipeline

4 Test Environment

This chapter outlines the hardware employed for the tests, the prerequisites and provisions made in order to achieve reliable statements, and the specification of all items needed to build the test suite.

4.1 Hardware Setup

Test Platform

The hardware platform used in the course of this thesis is the PandaBoard ES. It is a single-board computer based on the Texas Instruments OMAP4460 system on a chip. It features a dual-core ARM® Cortex™-A9 MPCore™ with SMP at 1.2 GHz each and 1 GB of low-power DDR2 RAM. The PandaBoard ES supports removable non-volatile memory storage via an onboard SD/MMC card cage. In the course of this investigation, a SanDisk Extreme Pro SD card with a capacity of 16 GB is used. For an in-depth board specification please refer to the PandaBoard ES specification [12].
Figure 5 : mounting position of stereo camera

Image Acquisition

The image acquisition system consists of two monochrome cameras manufactured by Basler. Each camera provides 30 frames per second at a resolution of 1296 x 966 px. The point clouds are generated from both image streams by a post-processing tool that considers the temporal coherence of the images. For an in-depth specification please refer to the data sheet of the camera (http://www.baslerweb.com/de/produkte/flaechenkameras/ace/aca1300-30gm). The cameras have been mounted next to each other beneath the interior mirror of a BMW 5 Series (F10), as illustrated in Figure 5. The projection of this position onto the floor represents the origin of the reference coordinate system.

4.2 Software Setup

The goal of the tests conducted in this thesis is to obtain findings about the timing behaviour of the people library. To this end, the Linux operating system is used because it supports the selected hardware platform and offers facilities to customize a distribution that is best suited to our demands. However, some additional provisions have to be made because the mainline Linux kernel and its default process management setup lack the real-time behaviour needed to provide replicable test results.

Making the kernel preemptable

When running an application under the mainline Linux kernel, there is no guarantee that extraordinary latencies are avoided, because interrupt handlers and kernel functionality can block a high-priority task. These issues can be mitigated considerably by applying the CONFIG_PREEMPT_RT patch to the standard kernel, following the RT kernel wiki [13]. The patch minimizes the amount of kernel code that is non-preemptible. Among others, K. Koolwal [20] and F. Cerqueira / B. B. Brandenburg [21] present benchmarking studies that compare standard kernel versions with patched ones. These include results obtained with diverse metrics to quantify determinism and latency. Both studies use different approaches to expose the system under test to CPU- and I/O-bound workload. Their results show a considerable improvement in terms of real-time capable behaviour, which serves as the rationale for using the CONFIG_PREEMPT_RT patch in this thesis.

Scheduling Policy

According to the Linux manual [14], Linux applies a round-robin time-sharing scheduling policy by default. That is, a process is inserted into a low-priority queue and granted CPU time based on a dynamically determined priority among the other processes within this queue. In terms of real-time requirements, this fair-play policy leads to unpredictable latencies. The scheduler, however, considers processes scheduled under real-time policies first. Hence, the test process must be run under a real-time capable policy, which can be accomplished with the sched_setscheduler system call ([15]) by setting the policy to first in, first out (FIFO) or round robin (RR). In order to prioritize processes, one can specify a priority value from 1 up to 99. According to [16], however, it is not recommended to assign a process the value 99, because there are management threads that need to run with the highest priority. Note that although Linux provides both the FIFO and the RR real-time policy, no difference between the two has been observed in the course of this thesis.

Memory Management

Another source of latencies can be attributed to page faults. Therefore, [16] advises locking the virtual address space of a real-time application into RAM. This can be achieved with the mlockall system call according to [15].

Precision of Time Measurement

In order to achieve high precision when measuring elapsed time, one can employ the clock_gettime system call to take time stamps before and after the code section of interest.
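The scheduling and time-stamping provisions can be sketched as follows. This is an illustration in Python rather than the C system-call interface the text refers to, and it is not the code of the test suite. The helper names now_ns, try_set_fifo and measure_ns as well as the priority value 80 are assumptions made for this example. Memory locking with mlockall is left out because the Python standard library does not expose it directly. The clock used here is CLOCK_MONOTONIC_RAW where the platform offers it.

```python
import os
import time

# CLOCK_MONOTONIC_RAW is not available on every platform; fall back to the
# generic monotonic clock otherwise (an assumption made to keep the sketch portable).
_RAW_CLOCK = getattr(time, "CLOCK_MONOTONIC_RAW", None)


def now_ns() -> int:
    """Return a monotonic time stamp in nanoseconds."""
    if _RAW_CLOCK is not None:
        return time.clock_gettime_ns(_RAW_CLOCK)
    return time.monotonic_ns()


def try_set_fifo(priority: int = 80) -> bool:
    """Request SCHED_FIFO for the calling process; return True on success."""
    if not hasattr(os, "sched_setscheduler"):
        return False
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
        return True
    except OSError:
        # Typically raised when the process lacks the required privileges.
        return False


def measure_ns(func, *args):
    """Run func(*args) once and return (result, elapsed nanoseconds)."""
    start = now_ns()
    result = func(*args)
    return result, now_ns() - start


if __name__ == "__main__":
    print("SCHED_FIFO active:", try_set_fifo())
    result, elapsed = measure_ns(sum, range(100_000))
    print(f"result={result}, elapsed={elapsed} ns")
```

Without sufficient privileges the request for a real-time policy is refused and the measurement simply runs under the default policy, so the sketch can be tried on an ordinary desktop system as well.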
According to [17], the parameter clk_id, that specifies the characteristics of the accuracy, is supposed to be set to CLOCK_MONOTONIC_RAW because this gives access to a raw hardware-based time avoiding any interference. 14 According to previous research ([18]) performed at BMW Car IT GmbH, this measures are suitable to achieve proven real time capabilities. In addition, this setup has been tested under increased CPU workload imposed by the generator stress3, which turned out to have negligible impact on the execution time of processes scheduled with real-time policy. 4.3 Test Suite Setup The effort to build a customized Linux image for a dedicated hardware platform, e.g. the PandaBoard ES, can be managed through the facilities provided by the yocto project. The yocto project is an open-source collaboration that provides templates and tools with the aim to support developers creating custom Linux systems for embedded devices [19]. The yocto project is based on the poky platform builder, which is the reference build system incorporating the open-embedded project and a build scheduler called BitBake. This infrastructure is based upon a set of meta-data that is composed of recipes and layers. Recipes serve the purpose to define sources, configurations, dependencies and compile instructions. Layers represent compilations of recipes put together in order to meet a certain demand. Basic setups are pooled by core layers such as openembedded-core or meta-oe. Layers developed towards particular hardware support are called board-support-packages (BSP). The meta-ti layer, applied in this thesis, is used to provide configurations specific to boards using processors manufactured by Texas Instruments in order to build a Linux image for the PandaBoard ES. Application-specific configurations can be established including custom layers. In this thesis the meta-ros layer has been added providing support through the ROS API to access point-cloud data stored as rosbag files. 
The particular specification of the test suite and the build system as realized in this thesis is shown in Table 1. The following gives a brief outline of the steps towards a custom Linux image.

1. Download the cross-compilation environment from the Yocto Project website.
2. Add the meta-oe, meta-ros and meta-ti layers according to Table 1.
3. Create a recipe as an extension of the core-image-ros-roscore.bb build recipe to include the pcl-people-tracker-timing-test suite and additional components if desired.
4. Amend the kernel build recipe of the meta-ti layer in order to incorporate the CONFIG_PREEMPT_RT patch with a configuration that enables preemption.
5. Invoke the build process using the bitbake command.
6. Install the resulting image on the storage device.
7. Check the kernel version using the uname -a command; a patched kernel version contains an -rt** identifier of the respective patch.

For detailed information on the respective steps please refer to the RT kernel wiki [13], the official Yocto Project website [19] and the meta-ros repository on GitHub [22].

3 http://people.seas.harvard.edu/~apw/stress/
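The check in step 7 can also be performed programmatically. The following sketch assumes that the kernel release string reported by uname carries the patch identifier in the form "-rt" followed by digits; the function names is_rt_release and running_kernel_is_rt are hypothetical and not part of the test suite.

```cpp
#include <string>
#include <sys/utsname.h>

// True if a release string contains "-rt" directly followed by a digit,
// which is the form assumed here for a PREEMPT_RT patch identifier.
bool is_rt_release(const std::string& release) {
    std::string::size_type pos = release.find("-rt");
    while (pos != std::string::npos) {
        const std::string::size_type digit = pos + 3;
        if (digit < release.size() &&
            release[digit] >= '0' && release[digit] <= '9') {
            return true;
        }
        pos = release.find("-rt", pos + 1);
    }
    return false;
}

// Query the running kernel through uname(2), the system call behind
// the uname command, and inspect its release field.
bool running_kernel_is_rt() {
    utsname info{};
    if (uname(&info) != 0) return false;
    return is_rt_release(info.release);
}
```

A release string such as 3.4.9-rt17, which corresponds to the patch name listed in Table 1, would be accepted, whereas a string without such a suffix would not.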
Item : Specification
Yocto Project : 1.5.4
Poky Platform Builder : 10.0.4 Dora
meta-oe : git://git.openembedded.org/meta-openembedded, SHA1 ID e75ae8f50af3effe560c43fc63cfd1f39395f011
meta-ros : git://github.com/bmwcarit/meta-ros.git, SHA1 ID d0a954d11e822b0f8be83ecaadac784770d38445
meta-ti : git://git.yoctoproject.org/meta-ti, SHA1 ID 4390f867bf883b93cf36cedbb7ef6b11e079c1e4
gcc-version : gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2
kernel : git://dev.omapzoom.org/pub/scm/integration/kernel-ubuntu.git, branch=ti-ubuntu-3.4-1485, SHA1 ID b3d5eeb10553e4bc0c3f250a4d06d43c4ab397a9; in conjunction with the CONFIG_PREEMPT_RT patch from http://hbrobotics.org/wiki/images/8/88/Patch-3-4-9-rt17.patch.doc with the following configuration: http://hbrobotics.org/wiki/images/c/c7/Config-3-4-9-rt17.doc
pcl : git://github.com/PointCloudLibrary/pcl.git, SHA1 ID ae08f0780750aae8a8b8ea0e9c82209071ffb724; in conjunction with the following patch: 0001-allow-to-run-people-library-without-visualization.patch from git://github.com/bmwcarit/pcl-people-tracker-timing-test/tree/master/res
pcl-people-tracker-timing-test : git://github.com/bmwcarit/pcl-people-tracker-timing-test, SHA1 ID 624e7aee57e74dd77f1db929910bb81d3d3f9090

Table 1 : test-suite version specification

5 Test Approach

This chapter describes the workflow and usage of the test suite. All required settings and parameters as well as the outputs are explained in detail.

5.1 Input Data

This chapter lists all input data required to execute a test session.

Pointcloud Data

As described in chapter 4.1, images are acquired by two monochrome cameras. Since each camera only provides 2D images, the two images must be combined to generate the depth information needed to obtain point-cloud data. This conversion is accomplished with the aid of a pre-processing tool that is not part of the test-suite.
Thanks to the message passing supplied by the ROS middleware (see footnote 4) throughout the test vehicle, this conversion takes camera inputs in sensor_msgs/Image format and produces outputs in sensor_msgs/PointCloud2 format. Both the input and the output data are stored in ROS's bag format. Figure 6 illustrates a sample snapshot of a monochrome image and the associated point-cloud.

Figure 6 : raw image vs. pointcloud

4 http://www.ros.org/

Reference Files

As motivated in the introduction, the aim of the underlying investigation is to identify how the execution time of the people tracker algorithm and the associated quality of results depend on parameters provided by PCL's API. In this regard, the accuracy of recognition and position detection is a proper metric that can be determined using information given by the people tracker API. To this end, the actual positions of people must be known so that they can be compared with the computed ones. This information is provided by a user-defined text file serving as reference. Its structure indicates the total number of frames in the first line and the positions of people per frame in subsequent lines as shown below. Note that the ground truth plane is parallel to PCL's X-Z plane. Hence, the first coordinate represents the X value and the second one the Z value of a particular person. The following listing shows a sample reference file:

    Frame_Count 3
    Frame_Number 1 X_1 Z_1 X_2 Z_2
    Frame_Number 2 X_1 Z_1 X_2 Z_2 X_3 Z_3
    Frame_Number 3 X_1 Z_1

Listing 1 : reference-file structure

Test Configuration

This file configures general test-suite settings that are supposed to remain unchanged throughout a batch of tests. It must nevertheless be possible to amend them since, for example, camera settings can change. The following listing shows a sample file content:

    topic perception_pcl_objects/mesh
    rgb_intrinsics_matrix 1.2 0.0 6. 0.0 1.2 1.7 0.0 0.0 1.0
    svm_filename trainedLinearSVMForPeopleDetectionWithHOG.yaml
    groundCoeffs 0.0 -1.0 0.0 1.0
    maxDeviation 0.3
    minConfidence -1.5

Listing 2 : test configuration file structure

topic : This parameter indicates the topic name that identifies the point-cloud stored within the input bag file.
rgb_intrinsics_matrix : This parameter represents the coefficients of the intrinsic camera matrix, which describe the optical, geometric and digital characteristics of the camera.
svm_filename : This parameter represents the name of the file needed by the support-vector-machine classifier (see below).
groundCoeffs : This parameter represents the coefficients of the plane equation needed to detect and remove the ground plane.
maxDeviation : This parameter defines a threshold in meters that limits the accepted deviation from a position given in the reference file. If the threshold is exceeded, a detected object is not considered a match.
minConfidence : This parameter sets the threshold for the HOG confidence. That is, only clusters that exhibit a confidence value greater than minConfidence are considered a match.

SVM Configuration File

In order to recognize human shapes and distinguish them from spurious objects, the PCL library utilizes a support-vector-machine (SVM) algorithm. The object recognition method is based on the histogram of oriented gradients (HOG) approach, which analyzes features according to the orientation and arrangement of gradients resulting from the point-clouds associated with clusters. The resulting features are characterized by descriptors. These descriptors must be classified in order to verify their coherence with human shapes. To this end, the SVM builds up a decision-making model based on patterns. To make the model specific to human shapes, an appropriate configuration is loaded that satisfies the demands of people detection.
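To illustrate the key/value layout of the test configuration file, the following sketch reads the six settings of Listing 2 into a map and reports settings that are absent. It is a sketch under assumptions and not the parser of the test suite: the function names are hypothetical, and the file is treated as a stream of whitespace-separated tokens in which each known key is followed by its values, so that it does not matter whether the matrix coefficients are written on one line or on several.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

using Settings = std::map<std::string, std::vector<std::string>>;

// The six setting names that appear in Listing 2.
const char* const kKnownKeys[] = {"topic",        "rgb_intrinsics_matrix",
                                  "svm_filename", "groundCoeffs",
                                  "maxDeviation", "minConfidence"};

bool is_known_key(const std::string& token) {
    for (const char* key : kKnownKeys) {
        if (token == key) return true;
    }
    return false;
}

// A known key opens a setting; every following token up to the next
// known key is stored as one of its values.
Settings parse_test_config(std::istream& in) {
    Settings settings;
    std::string token;
    std::string current;
    while (in >> token) {
        if (is_known_key(token)) {
            current = token;
            settings[current];  // create the entry even if no value follows
        } else if (!current.empty()) {
            settings[current].push_back(token);
        }
    }
    return settings;
}

// Convenience wrapper for configuration text held in memory.
Settings parse_test_config_text(const std::string& text) {
    std::istringstream in(text);
    return parse_test_config(in);
}

// Names of settings that are absent or have no value.
std::vector<std::string> missing_settings(const Settings& settings) {
    std::vector<std::string> missing;
    for (const char* key : kKnownKeys) {
        const Settings::const_iterator it = settings.find(key);
        if (it == settings.end() || it->second.empty()) missing.push_back(key);
    }
    return missing;
}
```

A non-empty result of missing_settings corresponds to the situation in which some settings of TestConfig.txt have not been set.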
5.2 Parameter Variation

This chapter outlines the parameters provided by the PCL that have been considered with regard to their impact on the timing behaviour of the people tracker. The associated API function as well as the functional impact of each parameter are explained in detail.

Sampling Factor

If assigned a value greater than 1, this parameter down-samples the points stored in the point cloud. Since this parameter takes effect right at the beginning of the processing pipeline (see Figure 4), all subsequent stages are affected. The value of this parameter can be set with the following API function:

    void setSamplingFactor (int sampling_factor)

Although the points of a point-cloud object are stored in a member represented by a C++ vector, that is, a one-dimensional sequence container, they represent an organized point-cloud dataset. This structure consists of rows and columns, which speeds up nearest-neighbour operations because the relations among points are known. In order to map this two-dimensional structure to a one-dimensional vector, the point-cloud is characterized by its width and height, where width corresponds to the number of columns and height to the number of rows. A particular element of the vector can be accessed by its column/row coordinates as follows, where row ranges in [0, height[ and column in [0, width[:

    Point [column, row] = Vector [row * width + column]

If applied, the sampling-factor divides both width and height. Hence, a sampling-factor of two divides the number of points by four, a sampling-factor of three by nine, and so on.

Voxel Size

The voxel-size parameter also serves the purpose of reducing the number of points in a given input cloud.
It is set with the following API function:

    void setVoxelSize (float voxel_size)

The voxel-size refers to the edge length of the 3D cubes that are laid over the point-cloud as illustrated in Figure 7.

Figure 7 : voxel-grid over a pointcloud

Note that voxel-size filtering operates on the point-cloud that has already been downsized by the sampling-factor. In contrast to sampling-factor filtering, voxel-size filtering does not reduce the point-cloud uniformly: the points within each voxel are approximated by their centroid rather than by the centre of the voxel. This way the approximation yields a better representation of the original point-cloud.

Height Limits

The height limits determine the minimum and maximum height above the ground that is allowed for a person cluster. These parameters are set using the following API function:

    template <typename PointT> void
    pcl::people::GroundBasedPeopleDetectionApp<PointT>::setPersonClusterLimits (float min_height, float max_height, float min_width, float max_width)

Listing 3 : function setPersonClusterLimits

By default, PCL sets min_height to 1.3 and max_height to 2.3 meters. Although it is not the primary purpose of the height limits, these parameters also affect the minimum and maximum number of points that belong to a person cluster, according to the following relations:

    min_points = (int) (min_height * min_width / voxel_size / voxel_size)
    max_points = (int) (max_height * max_width / voxel_size / voxel_size)

Listing 4 : min_points / max_points equation

Width Limits

The min_width and max_width parameters are set by the same API function as the height limits. As opposed to the height limits, these parameters do not refer to any characteristics of the people in the scene.
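The point-count relations of this chapter (row-major indexing, the quadratic reduction caused by the sampling factor and the relations of Listing 4) can be reproduced with a short sketch. The function names and the struct ClusterPointLimits are hypothetical, and the use of integer division for the down-sampled width and height is an assumption of this sketch rather than a statement about PCL's implementation.

```cpp
#include <cassert>
#include <cstddef>

// Row-major index of the point at (column, row) in an organized cloud.
std::size_t point_index(std::size_t column, std::size_t row, std::size_t width) {
    assert(column < width);
    return row * width + column;
}

// Number of points left after dividing width and height by the
// sampling factor (integer division is an assumption of this sketch).
std::size_t sampled_point_count(std::size_t width, std::size_t height,
                                int sampling_factor) {
    assert(sampling_factor >= 1);
    const std::size_t factor = static_cast<std::size_t>(sampling_factor);
    return (width / factor) * (height / factor);
}

struct ClusterPointLimits {
    int min_points;
    int max_points;
};

// Relations of Listing 4, evaluated in single precision.
ClusterPointLimits cluster_point_limits(float min_height, float max_height,
                                        float min_width, float max_width,
                                        float voxel_size) {
    ClusterPointLimits limits;
    limits.min_points =
        static_cast<int>(min_height * min_width / voxel_size / voxel_size);
    limits.max_points =
        static_cast<int>(max_height * max_width / voxel_size / voxel_size);
    return limits;
}
```

With the PCL default limits (heights 1.3 and 2.3, widths 0.1 and 8) and a voxel size of 0.06, cluster_point_limits yields 36 and 5111 points.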
The sole purpose of the width limits is to determine the minimum and maximum number of points that a person cluster contains (see above). By default, PCL sets min_width to 0.1 and max_width to 8. With all parameters at their defaults (voxel_size = 0.06 meters), min_points is equal to 36 and max_points is equal to 5111.

5.3 Test Execution

This chapter explains how to perform a test and what the test environment has to look like. First of all, the test environment must be set up as depicted in Figure 8. Test_Exe refers to the executable of the test-suite. TestConfig.txt and trainedLinearSVMForPeopleDetectionWithHOG.yaml are necessary to configure the test-suite as explained in chapter 5.1. The .bag file contains the point-cloud stream, and the corresponding reference file must reside in a folder named ReferFiles. Note that the names of the .bag file and its associated reference file can be arbitrary and need not follow any pattern.

    point_cloud_stream.bag
    ReferFiles
        Refer_point_cloud_stream.txt
    TestConfig.txt
    Test_Exe
    trainedLinearSVMForPeopleDetectionWithHOG.yaml

Figure 8 : test directory structure

Given that, a test run can be executed with the following command:

    ./Test_Exe file_1 file_2 parameters policy [tag]

file_1 : name of the .bag file including file extension
file_2 : name of the reference file including file extension
-minh : parameter setting the minimum height limit as float
-maxh : parameter setting the maximum height limit as float
-minw : parameter setting the minimum width as float
-maxw : parameter setting the maximum width as float
-vs : parameter setting the voxel size as float
-sf : parameter setting the down-sampling factor as float or int (note: a float value is converted to an int by truncating the fractional part)
policy : selector for the scheduling policy mode (available options: RR | FIFO | NO_RT)
[tag] : parameter used to tag output files

The following error messages have been defined for particular error conditions:

InputFile not found. : No file corresponding to the file_1 parameter was found in the test directory.
No reference file found. : No file corresponding to the file_2 parameter was found in the ReferFiles folder of the test directory.
Test parameter not set. : Some of the parameter list items have not been set.
Incorrect inputs. : The test command does not match the required pattern.
Settings in TestConfig.txt missing. : Some settings of TestConfig.txt have not been set.
No TestConfig.txt found in <path/to/root>. : There is no TestConfig.txt file in the test directory.
Wrong input for scheduling policy. Available options : FIFO | RR | NO_RT : A wrong value has been assigned to the policy parameter.
Failed to set policy. : Setting the scheduling policy using the sched_setscheduler system call failed.
mlockall failed : Locking the calling process's virtual address space into RAM using the mlockall system call failed.

An exemplary command to invoke a test run could look like this:

    ./Test_Exe scene_1.bag ref_scene_1.txt -minc -1.5 -minh 1.3 -maxh 2.3 -minw 0.9 -maxw 1.4 -vs 0.06 -sf 1 RR s1_3

Please note that this command can only be executed with root permissions. On success, the following output is generated during a test run:

    RR policy set.
    Frame Count : 36
    Invoke performTest
    Frame : 0
    Frame : 1
    Frame : 2
    ...
    Frame : 33
    Frame : 34
    Frame : 35
    performTest finished
    Invoke performEvaluation
    performEvaluation finished
    Invoke writeResultFile
    writeResultFile finished
    Invoke writeReportFile
    writeReportFile finished

Listing 5 : test-execution runtime information

In this case, a new folder named Results is created if it does not already exist. This is the location where the report and result files of each test run are stored (see chapter 5.4).

5.4 Metrics and Classification Figures

As mentioned in the previous chapter, two files are generated after every test run. These files contain the characteristics of a particular test run.
In order to evaluate and compare the results of multiple test runs, some metrics and classification figures have been introduced in the course of this thesis. In the following, both the structure of the files and their contents are explained in detail.

The first file is called the report file. Its name consists of the file_1 parameter, the file classifier Report, an optional tag and the file extension txt. For example, given a road_scene.bag input file that was tagged with 1, the associated report file is road_scene_Report_1.txt. The content of this file exhibits the following structure:

    Fri Jun 5 13:32:50 2015
    -- Frame : 0
    ---- Time : 0.261413
    ---- Cluster : 0
    ------ B_X : 2.18725
    ------ B_Y : 0.87723
    ------ B_Z : 5.28796
    ------ HOG_C : -1.10829
    ---- Cluster : 1
    ------ B_X : 0.404633
    ------ B_Y : 0.878143
    ------ B_Z : 5.5867
    ------ HOG_C : -0.944061
    ---- Cluster : 2
    ------ B_X : 0.402876
    ------ B_Y : 0.869012
    ------ B_Z : 5.40856
    ------ HOG_C : -0.810767
    -- Frame : 1
    ...

Listing 6 : report-file content

The first line contains the date and time of test execution. The subsequent lines contain the detection results separated by frame count. Each frame includes information about its execution time and the centroid position of clusters that are expected to represent persons. Each person cluster is identified by its bottom position relative to the camera position (according to the right-hand rule, where Z points into the scene and Y to the ground) and the confidence value HOG_C that is based on its histogram of oriented gradients (HOG). Both the position and HOG_C are calculated by PCL's algorithms. Note that in this thesis the parameter minConfidence (see 5.1/Test Configuration) is equal to -1.5. Hence, only clusters with a HOG_C value greater than -1.5 are considered. By means of the report file one can check the positions and the precision of detected clusters.

The second file associated with a particular test run is called the result-file.
The name of this file follows the same pattern as that of the report file. Hence, referring to the example from above, the name of the result file would be road_scene_Result_1.txt. The content of this file exhibits the following structure:

    # Fri Jun 5 13:32:50 2015
    # --- Parameterlist ---
    # minc : -1.5
    # minh : 1.3
    # maxh : 2.3
    # minw : 0.4
    # maxw : 1.4
    # vs : 0.06
    # sf : 1
    # -- Characteristics --
    # meanTime : 1.07173
    # stdDevT : 0.0336307
    # minTime : 1.00204
    # maxTime : 1.15381
    # ---- Detection ----
    # detRate : 0.944
    # maxFaSe : 1
    # meanHOG : -1
    # ---------------------
    #FRAME  EXE_TIME  FP  FN  MA  MA_DEV
    0       1.0645    1   0   2   0.72
    1       1.1136    3   0   2   0.71
    2       1.077     2   0   2   0.67
    ...

Listing 7 : result-file structure

The first line contains the date and time of test execution. The following section entitled Parameterlist lists all input parameters that characterise this test run. The second section entitled Characteristics lists time-specific classification figures: the mean execution time of all frames (meanTime), the standard deviation of the execution time (stdDevT) and its minimum (minTime) and maximum (maxTime) values. The subsequent section entitled Detection contains detection-specific figures. The detRate figure is the ratio between the number of frames in which all reference positions (see chapter 5.1/Reference Files) have been found within the specified deviation (see chapter 5.1/Test Configuration) and the number of all frames of a given stream. The maxFaSeq figure represents the maximum false sequence, that is, the longest sequence of consecutive frames that did not match their associated references. This figure is of interest in conjunction with detRate: of two results with equal detRate, the one with the smaller maxFaSeq can be considered more valuable. The meanHOG figure represents the mean HOG confidence of clusters associated with frames that contribute to the denominator of detRate. This figure is a measure of the detection accuracy. The last section contains six columns.
The first one indicates the frame count and the second one its associated execution time. The FP column represents the false positive count, which indicates the number of spurious objects, that is, objects that have a HOG confidence greater than minConfidence but no ground truth reference. The FN column represents the false negative count, which indicates the number of referenced objects that have not been recognized. The MA column indicates the number of objects that match a reference. The last column, MA_DEV, represents the ratio between the mean deviation of matching objects in the current frame and the maxDeviation parameter (see chapter 5.1/Test Configuration). The result file serves to evaluate the impact of the parameters described in chapter 5.2. Moreover, it can be used to visualize the data with gnuplot.

6 Test Results

This chapter presents the results that have been achieved in the course of this thesis regarding the timing behaviour of PCL's people library and its dependency on the parameters described in chapter 5.2. The evaluation of the timing behaviour has been conducted based on an instance of the pcl::people::GroundBasedPeopleDetectionApp<PointT> class (see footnote 5), which provides the method compute according to the following specification:

    bool compute (std::vector<pcl::people::PersonCluster<PointT> >& clusters);

This method encapsulates all operations depicted in chapter 3 in order to perform people detection based on a 3D point cloud. The execution time of each frame has been determined by taking a time stamp immediately before and after the call of the compute method (see 4.2/Precision of Time Measurement).

The following subchapters show the results observed under various conditions. With the provisions made towards a real-time capable system (see 4.2), a proof of deterministic timing behaviour is provided under steady state conditions.
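As an illustration of the detection figures defined in chapter 5.4, the following sketch derives detRate and maxFaSeq from per-frame records. It is not the evaluation code of the test suite: the type and function names are hypothetical, and the sketch assumes that a frame counts as matching its reference exactly when its false negative count is zero.

```cpp
#include <cstddef>
#include <vector>

// One row of the last section of the result file (MA_DEV omitted).
struct FrameRecord {
    double exe_time;
    int false_positives;  // FP
    int false_negatives;  // FN
    int matches;          // MA
};

struct DetectionSummary {
    double det_rate;                 // matching frames / all frames
    std::size_t max_false_sequence;  // longest run of non-matching frames
};

DetectionSummary summarize_detection(const std::vector<FrameRecord>& frames) {
    std::size_t matched = 0;
    std::size_t run = 0;
    std::size_t longest = 0;
    for (const FrameRecord& frame : frames) {
        if (frame.false_negatives == 0) {
            // Assumption: all reference positions found, frame matches.
            ++matched;
            run = 0;
        } else {
            ++run;
            if (run > longest) longest = run;
        }
    }
    DetectionSummary summary;
    summary.det_rate = frames.empty()
                           ? 0.0
                           : static_cast<double>(matched) /
                                 static_cast<double>(frames.size());
    summary.max_false_sequence = longest;
    return summary;
}
```

Under this assumption, a stream of 36 frames with two isolated non-matching frames gives a detection rate of 34/36, which is about 0.944, and a maximum false sequence of 1.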
Since a limitation to a particular load cannot be maintained in real traffic situations, the impact of fluctuating load on the execution time is presented next. With regard to predictability and safety, however, the execution time of a system must be bounded. To this end, parameters that have an impact on the execution time are evaluated. This investigation intends to quantify the extent to which the execution time can be adapted to fluctuating load in order to meet desired time constraints while maintaining a sufficient level of detection.

6.1 Proof of Performance

In order to gain reliable findings regarding the correlation between the execution time and particular parameters, deterministic system behaviour is a necessary prerequisite. To this end, the measures described in chapter 4.2 have been applied to the test-suite. With this setup, the test-suite is expected to provide reproducible and non-fluctuating results. In order to give evidence of deterministic behaviour, steady state conditions in terms of scene arrangement have been employed. Figure 9 illustrates the reference ground truth arrangement chosen for this purpose. The circle labelled C represents the position of the stereo camera system, whereas the two circles labelled 1 and 2 represent the positions of persons.

5 https://github.com/PointCloudLibrary/pcl/blob/master/people/include/pcl/people/ground_based_people_detection_app.h

Figure 9 : reference ground truth arrangement

Given this arrangement, multiple test runs have been performed with a constant parameter set (minh = 1.3, maxh = 2.3, minw = 0.4, maxw = 1.4, vs = 0.06, sf = 3 (see 5.3)) and with a resulting detection rate (see 5.4) of 94.4% among 36 frames. The proof of determinism requires a negligible deviation of the execution time of each frame of the stream across multiple test runs. A deviation of execution time between two arbitrary frames within a test run results from a fluctuating point-cloud density.
This is even the case with static scenes. To reduce this impact, a scene with low complexity has been chosen.

To emphasize the need for a real-time scheduling policy and to illustrate the consequences of disregarding this design issue, three batches of tests have been performed covering the RR, FIFO and NO_RT policies (see 5.3). In order to examine how unpredictable processing on the system affects the execution time, the workload generator stress (see footnote 6) has been used to impose additional CPU load. The following figures show the frame execution time of a stream sample for the setup described above. For reasons of clarity, each figure contains five test runs that are representative of all test runs performed. The term no_stress indicates that no additional load is imposed, whereas stress_c1 / stress_c2 indicate that one / two processes generating CPU load run concurrently. Regardless of the policy assigned to the test-suite, the processes invoked by stress are managed by the standard Linux time-sharing policy. That is, they are inserted into a low-priority queue (static priority equal to 0) together with all other processes that do not require real-time behaviour. The decision which process from this queue to run is based on a dynamically determined priority that considers only the processes residing within this queue.

Round Robin policy

As one can see from Figure 10, Figure 11 and Figure 12, there is no noticeable impact on the execution time of any frame across multiple test runs and different load conditions with the RR policy. This finding allows the test-suite to be used for further investigation when operated under RR because in this case all effects observed can be attributed to the parameter set instead of any interference.
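One possible way to quantify what a negligible deviation across test runs means is the largest per-frame spread, that is, the difference between the slowest and the fastest run of the same frame. The following sketch shows this calculation; the measure itself and the function name max_frame_spread are assumptions of this example and are not taken from the test suite.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// runs[r][f] holds the execution time of frame f in test run r.
// Returns the largest difference between the slowest and the fastest
// run observed for any single frame.
double max_frame_spread(const std::vector<std::vector<double>>& runs) {
    if (runs.empty()) return 0.0;
    const std::size_t frame_count = runs.front().size();
    double worst = 0.0;
    for (std::size_t frame = 0; frame < frame_count; ++frame) {
        double lowest = runs.front()[frame];
        double highest = lowest;
        for (const std::vector<double>& run : runs) {
            assert(run.size() == frame_count);  // same stream in every run
            lowest = std::min(lowest, run[frame]);
            highest = std::max(highest, run[frame]);
        }
        worst = std::max(worst, highest - lowest);
    }
    return worst;
}
```

A spread that is small compared with the frame execution times would support the visual impression given by the figures, whereas a large spread would point to interference by other processes.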
6 http://linux.die.net/man/1/stress

Figure 10 : RR policy / no resource-consumer

Figure 11 : RR policy / one resource-consumer

Figure 12 : RR policy / two resource-consumers

First In First Out policy

As illustrated in Figure 13, Figure 14 and Figure 15, executing the test-suite under the FIFO policy yields results similar to those of the RR policy. Hence, the FIFO policy option is as well suited as the RR option to quantify the impact of the parameters described in chapter 5.2. Since there is no notable difference between the results achieved with the RR and FIFO policies, all findings shown in the following apply to both policies equally.

Figure 13 : FIFO policy / no resource-consumer

Figure 14 : FIFO policy / one resource-consumer

Figure 15 : FIFO policy / two resource-consumers

NO_RT policy

With the NO_RT option, the test-suite process is treated the same way as any other non-real-time process such as those invoked by stress. The results arising from this configuration are depicted in Figure 16, Figure 17 and Figure 18. As one can see from Figure 16 and Figure 17, additional load of up to one resource-consumer does not noticeably affect the execution time of the test process. When the test process is exposed to two concurrently running resource-consumers, as illustrated in Figure 18, deterministic behaviour can no longer be guaranteed. This observation can be attributed to the underlying hardware (see 4.1/Test Platform), which has a dual-core CPU. Hence, the two processes invoked by stress and the test-suite must share limited resources that are managed by Linux's standard time-sharing policy as described above. With respect to a reliable investigation of parameters, this finding makes the need for a real-time scheduling policy evident.
Figure 16 : NO_RT policy / no resource-consumer

Figure 17 : NO_RT policy / one resource-consumer

Figure 18 : NO_RT policy / two resource-consumers

6.2 Load Analysis

In the previous chapter it has been shown that the test-suite complies with the requirement of repeatable results and resistance against load on the system. However, all results presented so far have been obtained under steady state conditions. That is, no impact of the environmental data acquired by the stereo cameras has been considered. In real traffic situations it is almost impossible to predict the complexity and load imposed by scenes that vary from modest highway conditions up to complex urban traffic. Since this affects the execution time, this chapter intends to demonstrate the range of possible execution times based on several scenes of different complexity. To this end, scenes ranging from less complex ones with a well-defined structure up to urban traffic situations have been evaluated using the test-suite. The scenes have been selected based on a qualitative assessment. The timing information has been obtained using a constant parameter set (minh = 1.3, maxh = 2.3, minw = 0.4, maxw = 1.4, vs = 0.06, sf = 1 (see 5.3)). In the following, five different scenes are presented. Each scene features a brief explanation of its complexity and the minimum (minTime), maximum (maxTime) and mean (meanTime) execution time achieved among all computed frames.

Minimum Load Scene

This scene has been selected for its low complexity, reflected by a small number of objects and a plain background.

Figure 19 : minimum-load scene
meanTime: 1.06766 / minTime: 0.997607 / maxTime: 1.15153

Low Load Scene

This scene has been selected because it contains a small number of less complex objects within close range.
Figure 20 : low-load scene
meanTime: 1.18073 / minTime: 1.1281 / maxTime: 1.39354

Medium Load Scene

This scene represents a medium load since it exhibits an increased number of relevant objects within an extended visible range.

Figure 21 : medium-load scene
meanTime: 1.51551 / minTime: 1.40675 / maxTime: 1.75253

High Load Scene

This scene has been identified as posing a high load since it exhibits a larger number of objects than the medium load scene.

Figure 22 : high-load scene
meanTime: 1.98782 / minTime: 1.68413 / maxTime: 2.36058

Maximum Load Scene

This scene generates the maximum load on the system, which can be attributed to a wide visible range and a high number of objects in comparison with the other scenes.

Figure 23 : maximum-load scene
meanTime: 2.2505 / minTime: 2.10289 / maxTime: 2.73859

Two significant findings can be deduced from the results above regarding increased complexity of environmental conditions. Firstly, the impact on the execution time varies within a broad range, as can be seen from the mean time results. Between the minimum and maximum load conditions, the mean execution time roughly doubles (from 1.06766 to 2.2505). Secondly, the fluctuation of the execution time rises with increased complexity, as can be seen from the range covered from minTime up to maxTime (about 0.15 for the minimum-load scene compared with about 0.64 for the maximum-load scene). Both findings are visualized in Figure 24, with the X axis representing the complexity of the scene as 1 = minimum load, 2 = low load, 3 = medium load, 4 = high load and 5 = maximum load.

Figure 24 : time per frame vs. complexity of scene

Given this insight into the timing behaviour of the people tracker, the necessity of a mechanism to adapt the execution time to a particular upper bound becomes apparent. The following chapter examines the feasibility of adaptive execution time based on results that have been achieved with the parameters described in chapter 5.2.
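The two findings of the load analysis can be recomputed from the figures reported for the five scenes. The following sketch only restates those figures; the struct SceneTiming and the helper names time_range and mean_ratio are hypothetical.

```cpp
// Timing figures of the five scenes as reported in this chapter.
struct SceneTiming {
    const char* name;
    double mean_time;
    double min_time;
    double max_time;
};

const SceneTiming kScenes[] = {
    {"minimum load", 1.06766, 0.997607, 1.15153},
    {"low load", 1.18073, 1.1281, 1.39354},
    {"medium load", 1.51551, 1.40675, 1.75253},
    {"high load", 1.98782, 1.68413, 2.36058},
    {"maximum load", 2.2505, 2.10289, 2.73859},
};

// Fluctuation of a scene: range covered from minTime up to maxTime.
double time_range(const SceneTiming& scene) {
    return scene.max_time - scene.min_time;
}

// Ratio of the mean execution times of two scenes.
double mean_ratio(const SceneTiming& scene, const SceneTiming& baseline) {
    return scene.mean_time / baseline.mean_time;
}
```

Evaluating mean_ratio for the maximum-load and minimum-load scenes gives a factor of about 2.1. The ranges returned by time_range are about 0.15, 0.27, 0.35, 0.68 and 0.64, so the fluctuation grows with complexity overall, although the high-load scene shows a slightly wider range than the maximum-load scene.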
6.3 Impact of Parameterization

This chapter presents the results that have been achieved with the aim of adapting the execution time of PCL's people tracker algorithm when less precise but still sufficient results are accepted in favour of shorter execution time. The motivation for this stems from the findings presented in the previous chapters. As exemplified by scenarios of varying complexity, the execution time may even double. However, this ratio may only be regarded as a rough guideline rather than a design benchmark. Since the complexity of real traffic situations can at best be estimated, the need to explore algorithms capable of adaptive execution time becomes apparent. To this end, the impact of the parameters described in chapter 5.2 on the execution time has been investigated. In the following, each parameter is evaluated not only regarding the execution time but also with respect to the quality of results as reflected by the detection rate (see chapter 5.4/detRate). Among the metrics presented in chapter 5.4, the detection rate turned out to be best suited to provide condensed and significant information about the quality.

The results of each parameter shown in the following refer to the ground truth setup depicted in Figure 9. Apart from this one, other ground truth configurations have been examined, which partially confirmed the findings stated below, though with much less clarity. Three major circumstances can be stated to which the more significant results of the setup according to Figure 9 can be attributed. Firstly, the recognition of persons turned out to diminish the farther a person stands from the centre of the scene. Secondly, a fall in the recognition of persons has been observed when persons were only partially visible. That is, as soon as the legs were hidden up to the knees, the recognition diminished dramatically.
Finally, too little contrast between the background and a person led to insufficient segmentation, resulting in poor recognition.

6.3.1 Sampling Factor

As mentioned in chapter 5.2, the sampling factor serves the purpose of reducing the number of points within the point cloud. This measure affects all stages of the people tracker pipeline (see Figure 4). Since fewer points result in a diminished computational effort, the execution time is expected to decrease steadily with increasing sampling factor. The results of the sampling factor analysis shown in the following have been achieved with the other parameters held constant. These have been set to:

    minh : 1.3 / maxh : 2.3 / minw : 0.4 / maxw : 1.4 / vs : 0.06

The test procedure spanned multiple test runs for each sampling factor in the range from one up to thirteen in order to verify negligible fluctuations among the test runs as introduced in chapter 6.1. For every sampling factor considered, deterministic results have been achieved. Since the maximum execution time is of primary importance, this figure has been taken into account in conjunction with the detection rate of the test runs. The line graph in Figure 25 illustrates how the execution time and the detection rate depend on the sampling factor. The sampling factor is plotted on the X axis. The Y axis on the left-hand side represents the maximum execution time, whereas the one on the right-hand side shows the detection rate. The blue line represents the course of the execution time and the red one the associated detection rate.

Figure 25 : sampling factor – execution time vs. detection rate

As one can see from the blue line, the execution time falls steadily with an increasing sampling factor, which confirms the assumption stated above.
This trend is most pronounced for sampling factors in the range from one to six, whereas a further increase of the sampling factor yields a smaller drop of the execution time.

As indicated by the red line, the detection rate remains at a stable and high level up to a sampling factor of eight. Beyond this value, the detection rate declines rapidly and reaches zero at a sampling factor of eleven. A striking feature emerges at a sampling factor of eight, where the detection rate peaks although less information is available due to the stronger downsampling. This can probably be attributed to the underlying support-vector-machine person classifier, which yields a higher recognition under this particular condition.

Comparing both curves reveals that their declines are shifted against each other: the execution time decreases most where the detection rate remains fairly stable, and vice versa. Additionally, the detection rate drops rapidly from about 90% to zero. In terms of adaptable timing behaviour, the observed characteristic deviates from the desired case, in which the execution time and the detection rate decline simultaneously over a wide range of the sampling factor. Based on these results, the sampling factor does not offer a satisfying capability to reduce the execution time.

6.3.2 Voxel Size

Similar to the sampling factor, the voxel size filtering lowers the number of points in the point cloud. Hence, this parameter also affects all stages of the people tracker pipeline (see Figure 4). Due to the reduced computational effort, the execution time is expected to fall steadily with an increasing voxel size.

The results of the voxel size analysis shown in the following have been obtained with all other parameters held constant. Since the sampling factor takes effect prior to the voxel size filtering, its value has been set to one.
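The order of the two reduction steps can be pictured with a small sketch. It assumes that the sampling factor keeps every n-th row and column of an organized cloud and that the voxel filter replaces all points falling into the same cubic cell by their centroid. Both assumptions, the function names and the toy cloud are illustrative and are not taken from the PCL sources.

```python
from collections import defaultdict

def subsample(grid, factor):
    """Keep every `factor`-th row and column of an organized cloud (assumed behaviour)."""
    return [row[::factor] for row in grid[::factor]]

def voxel_filter(points, voxel_size):
    """Replace all points that fall into the same cubic voxel by their centroid."""
    cells = defaultdict(list)
    for x, y, z in points:
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        cells[key].append((x, y, z))
    return [tuple(sum(axis) / len(pts) for axis in zip(*pts)) for pts in cells.values()]

# Hypothetical organized cloud: 12 x 12 points on a 1 cm raster at a depth of 1 m.
grid = [[(0.01 * c + 0.005, 0.01 * r + 0.005, 1.0) for c in range(12)] for r in range(12)]

kept = subsample(grid, 1)             # sampling factor one: no point is dropped
points = [p for row in kept for p in row]
reduced = voxel_filter(points, 0.06)  # voxel size of 6 cm
```

In this toy cloud all 144 points survive the sampling step and are then merged into four voxel centroids, so any reduction is caused by the voxel filter alone.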
A sampling factor of one means that no points are dropped by downsampling, so all observed effects can be attributed to the voxel size filtering. The parameter set has been assigned as follows:

minh : 1.3 / maxh : 2.3 / minw : 0.4 / maxw : 1.4 / sf : 1

The test procedure comprised multiple test-runs for each voxel size in the range from two to twenty-two cm in order to verify that the fluctuations among the test-runs are negligible, as has been done for the sampling factor. Deterministic results have been achieved for every voxel size considered. The evaluation of the voxel size is based on the maximum execution time in conjunction with the detection rate.

The line graph in Figure 26 illustrates the correlation between the execution time and the detection rate depending on the voxel size. The voxel size is plotted on the X axis. The Y axis on the left-hand side represents the maximum execution time, whereas the one on the right-hand side shows the detection rate. The blue line represents the course of the execution time and the red one the associated detection rate.

Figure 26 : voxel size – execution time vs. detection rate

As indicated by the blue line, the execution time falls steadily, but less rapidly than with the sampling factor. Generally, this timing behaviour is regarded as more suitable for adaptation than the one associated with the sampling factor (see Figure 25), because the range in which the gradient of the execution time is neither too steep nor too flat is wider. As illustrated in Figure 26, the range from four to eighteen cm in voxel size meets this demand.

The course of the detection rate, shown by the red line, exhibits steep gradients for small voxels up to five cm and for large voxels beyond fifteen cm in size.
These ranges are not suitable for execution time adaptation because a sharp drop of the detection rate is accompanied by almost no change of the execution time. Of particular interest in terms of the detection rate is the range from ten to fifteen cm in voxel size. This range features a smooth decline, which makes it well suited for scalable behaviour. Thanks to the approximately linear decline between 95% and 65%, the detection rate can be regarded as sufficient to make reasonable decisions. A striking feature can be seen at a voxel size of eighteen cm, where the detection rate peaks after having dropped to zero. This peak is likely a faulty detection produced by the underlying support-vector-machine person classifier. Similar behaviour has already been observed with the sampling factor.

Taking into account both the execution time and the detection rate, the range from ten to fifteen cm in voxel size can be considered best suited to balance execution time against quality of results. Taking the execution time at a voxel size of ten cm as the baseline, the execution time at fifteen cm is 25% lower. In spite of this remarkable reduction, the impact of the sampling factor on this result must not be neglected since the sampling factor takes effect beforehand. This interaction between both parameters is expected to lower the effect of the voxel size with an increasing sampling factor.

6.3.3 Height Limits

According to the explanations of chapter 5.2, the primary purpose of the height limits is to set the admissible height range of person clusters. Only clusters matching these limits are fed into the subsequent person classification, which is based on the trained support-vector-machine approach. Furthermore, the height limits contribute to the minimum and maximum number of points contained in a person cluster (see chapter 5.2/ Height-Limits).
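The gating role of the height limits can be sketched as follows. The snippet assumes that each cluster is reduced to its vertical extent in metres; the function name and the example clusters are hypothetical and only illustrate that clusters outside the interval from minh to maxh never reach the classifier.

```python
def clusters_for_classification(cluster_heights, minh, maxh):
    """Return the heights of all clusters that pass the height gate."""
    return [h for h in cluster_heights if minh <= h <= maxh]

# Hypothetical scene: a bollard, two persons and a van (heights in metres).
heights = [0.9, 1.78, 1.81, 2.6]

wide = clusters_for_classification(heights, minh=1.3, maxh=2.3)
narrow = clusters_for_classification(heights, minh=1.80, maxh=1.82)
```

A narrow window passes fewer clusters to the classifier and thus saves work, but it also rejects persons whose height lies outside the window.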
Since the impacts of these two effects of the height limits cannot be separated clearly, the following assessment assumes that the limitation of the height dominates over the influence on the number of points. Therefore, the height limits affect the 3D clustering and people detection stages of the people detection pipeline (see Figure 4).

The results of the height limits analysis shown in the following have been obtained with all other parameters held constant. These were set to:

minw : 0.4 / maxw : 1.4 / sf : 1 / vs : 0.06

The evaluation of the height limits is based on the maximum execution time in conjunction with the detection rate. To this end, test-runs with varying height limits have been repeated multiple times in order to confirm deterministic behaviour according to the findings of chapter 6.1. As mentioned above, the following results refer to the ground truth setup shown in Figure 9, which features two persons. Both persons are about 1.8 m tall. Therefore, the test strategy keeps this size as the mean of the minimum and maximum height, as indicated by the header of Figure 27. The impact of the height limits has been tested by applying intervals of different width bounded by the minimum and maximum height. For example, a height range of four cm corresponds to minh = 1.78 m and maxh = 1.82 m.

The line graph in Figure 27 illustrates the correlation between the execution time and the detection rate depending on the height range bounded by the minimum and maximum height. The height range is plotted on the X axis. The Y axis on the left-hand side represents the maximum execution time, whereas the one on the right-hand side shows the detection rate. The blue line represents the course of the execution time and the red one the associated detection rate.

Figure 27 : height limits – execution time vs. detection rate

As indicated by the blue line, the impact of the height limits on the execution time turns out to be much less significant than that of the sampling factor or the voxel size. The linear increase of the blue line implies a linear distribution of the clusters with regard to their size. Since the reference ground truth contains only two persons, a steeper gradient of the execution time is likely for more complex scenes with more clusters. Among other factors, this may contribute to the wide range of execution times observed for scenes of different complexity as presented in chapter 6.2.

The course of the detection rate can be subdivided into two sections. The first one covers height ranges up to four cm. This section exhibits a rapid drop of the detection rate and therefore does not provide a detection rate sufficient to make reliable decisions. The second section covers height ranges beyond four cm and is characterized by a high detection rate of at least 75%. If this section is considered appropriate for execution time adaptation, the execution time can be reduced by almost 10%, taking the time at a height range of four cm as the benchmark.

When applying this parameterization approach, it must be taken into account that people in real traffic situations vary widely in size. Hence, this measure is only applicable in conjunction with some kind of people size monitoring which ensures that minh and maxh cover all people in front of the vehicle.

6.3.4 Width Limits

By means of the width limits, one can set the minimum and maximum number of points that apply to a person cluster. This measure of execution time adaptation takes effect in the 3D clustering and people detection stages of the people tracker pipeline (see Figure 4). The number of points of person clusters is adjusted in accordance with the equations presented in Listing 4.
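Listing 4 is not reproduced in this section. The following sketch therefore assumes a relation of the kind used in PCL’s people module, in which the point bounds follow from the product of height and width limit divided by the squared voxel size, truncated to an integer. This assumption is consistent with the default bounds of 36 and 5111 points quoted in this section; the function names are hypothetical.

```python
def point_bounds(minh, maxh, minw, maxw, vs):
    """Assumed relation: limit area divided by the voxel face area, truncated."""
    min_points = int(minh * minw / vs / vs)
    max_points = int(maxh * maxw / vs / vs)
    return min_points, max_points

def width_for_points(points, height, vs):
    """Invert the assumed relation to obtain the width limit for a desired point bound."""
    return points * vs * vs / height

# Default width limits (minw = 0.1, maxw = 8) with the remaining defaults.
defaults = point_bounds(minh=1.3, maxh=2.3, minw=0.1, maxw=8.0, vs=0.06)

# Width limits used in the preceding sections (minw = 0.4, maxw = 1.4).
previous = point_bounds(minh=1.3, maxh=2.3, minw=0.4, maxw=1.4, vs=0.06)

# Minimum width that corresponds to a lower bound of 450 points.
minw_450 = width_for_points(450, height=1.3, vs=0.06)
```

Under this assumption, `defaults` evaluates to (36, 5111) and `previous` to (144, 894), and a lower bound of 450 points corresponds to a minimum width of roughly 1.25 m.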
Apart from the width limits, the height limits and the voxel size also contribute to the total number of points that characterize a person cluster. In order to confine this influencing factor to the width limits, all other parameters have been assigned constant values as follows:

minh : 1.3 / maxh : 2.3 / sf : 1 / vs : 0.06

By default, minw and maxw are set to 0.1 and 8, respectively. With the other parameters at their defaults as well, this leads to a lower bound of 36 and an upper bound of 5111 points. Since this is a very wide range, it is worth examining the impact of its extent and the effects of limiting the number of points associated with a person cluster. To this end, two test strategies have been applied. The first one focuses on the impact of the lower bound of points given a constant upper bound, whereas the second strategy investigates the reverse case. In both strategies, each parameter set has been run multiple times, which confirmed the deterministic behaviour shown in chapter 6.1.

The subsequent evaluation refers to the number of points instead of the minimum and maximum width because the width limits can only be interpreted in conjunction with the other parameters of Listing 4. Stating the number of points is therefore more general; the corresponding width limits can be derived from it and the other parameters.

For the first strategy, a constant upper bound of 1000 points and a lower bound in the range from 50 to 550 points have been taken into account. The second strategy varies the upper bound from 700 to 5000 points while keeping a constant lower bound of 450 points.

The line graphs in Figure 28 and Figure 29 illustrate the correlation between the execution time and the detection rate depending on the lower and upper bound of points, respectively. The respective constant bound is indicated in the header and the respective variable bound is plotted on the X axis.
The Y axis on the left-hand side represents the maximum execution time, whereas the one on the right-hand side shows the detection rate. The blue line represents the course of the execution time and the red one the associated detection rate.

Figure 28 : impact of lower point bound

As indicated by the blue line in Figure 28, the execution time falls steadily with an increasing lower bound. This can be attributed to fewer clusters being fed to the person detection stage (see Figure 4). As a consequence, less computational effort is needed. This effect stagnates beyond 300 points as the number of clusters does not decrease significantly any further. As indicated by the red line, the detection rate is almost unaffected up to a lower bound of 450 points. A further increase of the lower bound results in a rapid drop of the detection rate. This behaviour reveals that there are almost no clusters matching person characteristics in the range between 500 and 1000 points.

Figure 29 : impact of upper point bound

Given a constant lower bound of 450 points, the execution time rises with an increasing upper bound as shown in Figure 29, which can be attributed to a growing number of clusters. This effect stagnates beyond 2000 points because the number of potential clusters passed to the person detection stage does not grow further. As indicated by the red line, the detection rate is only affected below an upper bound of 800 points. As already noticed from Figure 28, a range from at least 450 up to 900 points is needed in order to capture all clusters regarded as persons.

All in all, both test strategies showed that the width limits are not suitable for trading detection rate for execution time.
However, the findings from both test strategies show that an equally high detection rate can be achieved while saving about 20% of the execution time if the number of points is adjusted to a suitable range that covers all person clusters without an overhead of points. This adjustment should be made by means of the width limits because the other parameters of Listing 4 primarily serve their own particular purposes.

7 Conclusion

This thesis contributes to algorithms for future automotive driving functions by addressing the issue of limited WCET predictability when 3D information of the environment is processed. Dealing with this question is of high importance for research and development because reliable responsiveness and safety will certainly be a prerequisite for the certification for traffic use. Many widespread algorithms, such as those provided by the Point Cloud Library, emerge from research at universities and from open-source projects that primarily focus on feasibility. The transition of these approaches to industrial scale requires additional investigation and in-depth validation.

To this end, a test suite has been developed that focuses on essential parameters having an impact on the execution time of the algorithms of PCL’s people library. These parameters can be divided into those affecting the number of points associated with frames and those related to the clusters recognized within these frames. The voxel size and the sampling factor belong to the first category, whereas the width and height limits belong to the latter.

As we have seen from shots of various road scenarios, the complexity of the scene leads to a broad spread of execution times. In our case, we have observed a doubling of the execution time between the simplest scenario and the most complex one. However, this result provides neither an upper bound of the execution time nor a reliable benchmark for system design.
In fact, it is almost impossible to predict the WCET when dealing with image processing and object recognition based on machine learning. A suitable approach to meet the demands of real-time applications is given by adaptive algorithms that allow scalable accuracy of results and execution time.

In this regard, the sampling factor turned out to lack a usable correlation between the execution time and the detection rate that would allow reducing the execution time while the detection rate remains at a sufficient level. Unlike the sampling factor, the voxel size has shown more promising results. When applied within a particular range, the voxel size allows reducing the execution time by up to 25% while keeping the detection rate above 70%. Similar results have been observed using the height limits, although not to the same extent. The evaluation of the width limits has not yielded any applicable usage in terms of scaling execution time against detection rate. Nevertheless, we have seen a significant reduction of the execution time when the range of points associated with person clusters is adapted in an appropriate manner.

When discussing the results of this thesis, their limitations should not be concealed. All findings and results presented in chapter 6.3 are based upon simple scenarios that have been set up with the aim of clearly indicating significant tendencies. Hence, more scenarios covering a broad range of complexity would be needed in order to provide further insights and to reinforce those already found. Additionally, each parameter has been considered without any interrelation to the others. This assumption cannot be upheld because interrelations do exist, such as the order of down-sampling and voxel filtering mentioned above. Finally, the impact of the image acquisition system has not been considered at all. A few more systems should be taken into account in order to quantify this factor.
This brief outline of open questions points the way for the follow-up activities that are needed on the way towards future automated driving functions that make use of cognitive algorithms.

Bibliography

[1] Insup Lee, Joseph Y-T. Leung, Sang H. Son / Handbook of Real-Time and Embedded Systems / Chapman & Hall 2008
[2] Giorgio C. Buttazzo / Hard Real-Time Computing Systems / Springer 2011
[3] Thomas Dean, Mark Boddy / An Analysis of Time-Dependent Planning / Department of Computer Science Brown University 1988
[4] Ihme T., Wetzelsberger K., Speckert M., Fischer J. / Real-time Image Processing based on a Task-pair Scheduling Concept / 2011 IEEE International Conference on Robotics and Automation (ICRA 2011). Shanghai International Conference Center, Shanghai, China, May 9-13, 2011, pp. 5596-5601
[5] Marschik N., Speckert M., Ihme T. / Towards Adaptive scheduling for Real-Time Image Processing / Autonome Mobile Systeme 2012 (AMS)
[6] https://github.com/PointCloudLibrary/pcl/tree/master/people
[7] Wilhelm R., Engblom J., Ermedahl A., Holsti N., Thesing S., Whalley D., Bernat G., Ferdinand C., Heckmann R., Mitra T., Mueller F., Puaut I., Puschner P., Staschulat J., Stenström P. / The Worst-Case Execution Time Problem: Overview of Methods and Survey of Tools / ACM Transactions on Embedded Computing Systems (TECS), Volume 7 Issue 3, April 2008, Article No. 36
[8] Shlomo Zilberstein / Using Anytime Algorithms in Intelligent Systems / AI Magazine, Volume 17 Number 3, 1996
[9] https://www.willowgarage.com/papers/3d-here-point-cloud-library-pcl
[10] http://pointclouds.org/media/ias2014.html / M.Munaro / RGB-D people detection
[11] http://pointclouds.org/documentation/tutorials/walkthrough.php#walkthrough
[12] http://pandaboard.org/sites/default/files/board_reference/ES/Panda_Board_Spec_DOC-21054_REV0_1.pdf
[13] https://rt.wiki.kernel.org/index.php/CONFIG_PREEMPT_RT_Patch
[14] http://linux.die.net/man/2/sched_setscheduler
[15] Michael Kerrisk / The Linux Programming Interface: A Linux and Unix System Programming Handbook / No Starch Press, 2010
[16] https://rt.wiki.kernel.org/index.php/HOWTO:_Build_an_RT-application
[17] http://linux.die.net/man/2/clock_gettime
[18] http://www.bmwcarit.com/downloads/publications/ValidatingTheRealTimeCapabilitiesOfTheROSCommunicationMiddleware.pdf
[19] https://www.yoctoproject.org/
[20] Kushal Koolwal / Investigating latency effects of the Linux real-time Preemption Patches (PREEMPT RT) on AMD’s GEODE LX Platform / VersaLogic Corporation 3888 Stewart Road, Eugene, OR 97402 USA
[21] Felipe Cerqueira, Björn B. Brandenburg / A Comparison of Scheduling Latency in Linux, PREEMPT RT, and LITMUS / Max Planck Institute for Software Systems
[22] https://github.com/bmwcarit/meta-ros