Technische Hochschule Nürnberg Georg Simon Ohm
Fakultät Elektrotechnik Feinwerktechnik Informationstechnik
Studiengang Softwareengineering

Masterarbeit von
Michael Mühlbauer-Prassek

Profiling of Cognitive Algorithms for Future Automotive Driving Functions

SS 2015
Ersteller: Michael Mühlbauer-Prassek
Erstprüfer: Prof. Dr. Bruno Lurz
Zweitprüfer: Prof. Dr. Hans-Georg Hopf
Betreuer: Dr. Lukas Bulwahn (BMW Car IT GmbH)
Abgabedatum: 04.09.2015
Declaration
Ich, Michael Mühlbauer-Prassek, Matrikel-Nr. ________________, versichere, dass ich die Arbeit selbständig verfasst, nicht anderweitig für Prüfungszwecke vorgelegt, alle benutzten Quellen und Hilfsmittel angegeben sowie wörtliche und sinngemäße Zitate als solche gekennzeichnet habe.
I herewith declare that I worked on this thesis independently. Furthermore, it was not
submitted to any other examining committee. All sources and aids used in this thesis,
including literal and analogous citations, have been identified.
Michael Mühlbauer-Prassek
Abstract
Future automated driving functions will not only build upon sensors measuring physical quantities directly related to the vehicle, such as angular and longitudinal accelerations. Additional information acquired by 3D laser scanners, radars and stereo cameras will be fed into an environmental model in order to automate driving functions and relieve the driver. In contrast to the first kind of data, the computational load imposed by environmental conditions is hard to predict. Since current research is moving towards autonomous driving, this issue is of great importance when it comes to real-time image processing. In this area, traditional real-time system design, which is based on the worst-case execution time, has turned out to lack suitable methods because the worst-case execution time of the algorithms involved is almost unpredictable. Therefore, adaptive computational approaches that consider both the execution time and the quality of results are worth considering. Given a system capable of balancing these characteristics against each other by means of particular parameters, computationally demanding peak loads could be managed using less precise but still sufficient results in favour of a shortened execution time.
With reference to this approach, this thesis deals with the timing behaviour of PCL's people library, which features people recognition based on point cloud data. To this end, parameters have been identified that are expected to have an impact on the execution time, and a test suite has been developed to provide a comprehensive insight into the correlation between execution time and the quality of results associated with those parameters. In order to establish practice-oriented conditions throughout the tests, an embedded platform is employed and a real-time capable environment is set up.
Table of Contents
List of Figures  VI
1 Introduction  1
2 Problem Space  3
  2.1 Measurement Techniques  3
  2.2 Static Analysis  5
  2.3 Any-Time Approach  6
3 Object under Investigation  9
4 Test Environment  11
  4.1 Hardware Setup  11
  4.2 Software Setup  12
  4.3 Test Suite Setup  14
5 Test Approach  16
  5.1 Input Data  16
  5.2 Parameter Variation  19
  5.3 Test Execution  22
  5.4 Metrics and Classification Figures  25
6 Test Results  28
  6.1 Proof of Performance  28
  6.2 Load Analysis  36
  6.3 Impact of Parameterization  40
    6.3.1 Sampling Factor  41
    6.3.2 Voxel Size  43
    6.3.3 Height Limits  45
    6.3.4 Width Limits  47
7 Conclusion  52
Bibliography  54
List of Figures
Figure 1 : distribution of execution time ................................................................................... 4
Figure 2 : phases of static WCET analysis .................................................................................. 5
Figure 3 : example of any-time computation ............................................................................ 8
Figure 4 : people detection pipeline ........................................................................................ 10
Figure 5 : mounting position of stereo camera ........................................................................ 11
Figure 6 : raw image vs. point cloud ......................................................................................... 16
Figure 7 : voxel grid over a point cloud .................................................................................... 20
Figure 8 : test directory structure ............................................................................................. 22
Figure 9 : reference ground truth arrangement ...................................................................... 29
Figure 10 : RR policy / no resource-consumer ......................................................................... 31
Figure 11 : RR policy / one resource-consumer ....................................................................... 31
Figure 12 : RR policy / two resource-consumers ..................................................................... 32
Figure 13 : FIFO policy / no resource-consumer ...................................................................... 33
Figure 14 : FIFO policy / one resource-consumer .................................................................... 33
Figure 15 : FIFO policy / two resource-consumers .................................................................. 34
Figure 16 : NO_RT policy / no resource-consumer .................................................................. 35
Figure 17 : NO_RT policy / one resource-consumer ................................................................ 35
Figure 18 : NO_RT policy / two resource-consumers ............................................................... 36
Figure 19 : minimum-load scene .............................................................................................. 37
Figure 20 : low-load scene........................................................................................................ 37
Figure 21 : medium-load scene ................................................................................................ 38
Figure 22 : high-load scene ...................................................................................................... 38
Figure 23 : maximum-load scene ............................................................................................. 38
Figure 24 : time per frame vs. complexity of scene ................................................................. 39
Figure 25 : sampling factor – execution time vs. detection rate ............................................. 42
Figure 26 : voxel size – execution time vs. detection rate ....................................................... 44
Figure 27 : height limits – execution time vs. detection rate .................................................. 46
Figure 28 : impact of lower point bound.................................................................................. 49
Figure 29 : impact of upper point bound ................................................................................. 50
1 Introduction
Nowadays, embedded systems serve various purposes in modern cars. Driving-assistance and safety features such as navigation systems or ESP, which have become standard in the past few years, could not be realized without the increasing spread and networking of electronic control units. In this respect, the development of smaller, cheaper and at the same time more powerful electronics enables engineers to design more sophisticated features, which will lead to autonomous driving in the foreseeable future. This trend comes along with an increasing complexity of the system architecture, which poses new challenges to software engineering. Moving towards automated driving functions
requires highly dependable systems because any malfunction could cause severe hazards to
humans and the environment. To this end, real-time requirements have to be met in a
reliable manner. The knowledge of the worst case execution time (WCET) is a prerequisite of
established real-time design methods in order to devise schedulable systems ensuring
deterministic, predictable and timely computation under any conditions [1]. The challenge of
the WCET determination in respect to demanding software components results from its
dependency on the complexity of code, extent and diversity of input data, current state of
software / hardware and the underlying hardware architecture. The goal of WCET analysis
methods is to provide safe and tight upper bounds in reasonable time and cost. The
difficulties in this area of computer science are reflected by an ongoing research on analysis
methods addressing this issue since the early days of real-time applications [7]. A common
objective of these methods is to achieve a more precise estimation of the WCET because
their results yield more or less pessimistic upper bounds including safety margins that
mitigate uncertainties [1]. Considering modern hardware, this goal becomes even more complex. This is due to the usage of caches, pipelining and other features that increase the average-case performance, which makes it increasingly difficult to predict the hardware's behavior [2]. In addition, input-data-dependent algorithms, used to obtain information about the surroundings from sensors and cameras, make it even harder to derive tight upper bounds on the WCET [4]. To counter these difficulties, it is worthwhile to consider a decision-making scheme based on information resulting from a trade-off between quality and computation time. In this regard, the term any-time algorithms was established by Thomas Dean and Mark Boddy [3]. The idea of any-time algorithms is to provide results as precise as possible within the currently available time when data-dependent computation comes into play. This approach
intends to improve overall capacity utilization without relying on a predetermined WCET. At the same time, this approach prevents the system from violating deadlines in a few time-critical situations by providing less precise but still sufficient results, which should suffice to make safe decisions [5]. Under regular conditions, a safety margin, represented by a higher quality of results, guarantees stable responsiveness and high accuracy. With reference to such an approach, this master thesis deals with the people library of the Point Cloud Library project [6] – in the following called "people tracker" – which features human detection based on stereo camera images. The goal of this thesis is to investigate and verify the people tracker's capability to provide reliable information in scalable time by means of parameterization. If confirmed, this property could be used to limit the execution time of the people tracker under high load induced by demanding environmental conditions.
2 Problem Space
As mentioned in the introduction, the determination of the WCET is a decisive part of the development of real-time-critical applications. This chapter outlines methods employed to estimate the WCET and illustrates problems and limitations in the case of modern hardware and of algorithms in the area of artificial perception, whose execution time is significantly affected by the complexity of the input data. With reference to these issues, the any-time approach is presented as a measure to mitigate the effort associated with WCET estimation and the scheduling of real-time applications.

Generally speaking, the methods of WCET estimation can be divided into measurement techniques and static analysis. Both methods have in common that the complexity of software and hardware creates a significant burden in practical use and leads to specialized solutions for particular conditions, which increases the time and money spent in the course of development [1]. Measurement techniques are typically suited when the average-case execution time is of most interest and sporadic exceedance of deadlines can be tolerated. When it comes to timing-critical applications with crucial safety requirements, static analysis in conjunction with measurement (hybrid methods) is the means of choice. Regardless of the chosen method, the rule of thumb in the development of real-time applications implies simplicity and transparency of code because its structure has a direct impact on analyzability [1]. Unfortunately, requirements such as statically bounded loops, minimal input-data dependency and simple code structures ([1],[2]) are hardly feasible for applications in the field of image processing and machine learning. Apart from that, modern processor architectures and their memory systems, designed towards optimized throughput, add to the effort required for WCET estimation by introducing even more uncertainty. This is reflected by execution times that depend on the instruction history, which may entail fluctuations of several orders of magnitude for a single instruction [7].
2.1 Measurement Techniques
Measurement techniques permit determining the execution time of a piece of code by means of logic analyzers, hardware traces, high-resolution timers, emulators and other means [1]. However, all these techniques require a certain degree of debug and analysis features built into the hardware. Apart from that, evaluating the timing behavior by means of measurement affects the timing itself when instrumentation is needed. This phenomenon is called the probe effect. A general issue of measurement techniques is the fact that they are limited by the time available to perform a bounded number of test runs and by the coverage of the selected input data in conjunction with the broad diversity of initial states that are possible on advanced processors. Apart from that, measurement techniques are known to underestimate the WCET, and there is no evidence that the real WCET can be observed in the course of measuring [7]. This fact constitutes a significant drawback of measurement techniques because it has to be compensated by safety margins, which – in order to guarantee reliability – lead to an over-allocation of resources [1]. Moreover, the complexity of hardware and software makes it more and more difficult to deduce and compile appropriate data sets that put the system under test into its most stressful states [7].
These issues are illustrated in Figure 1.
As indicated by the light gray area, the observed values of the execution time represent a subset of possible but unobserved values (dark gray area) during the course of measurement. Of special interest are peak values arising from high-load conditions, which are hard to replicate because they are subject to extraordinary conditions. Therefore, safety margins must be applied in order to deduce reliable upper bounds on the WCET.
Figure 1 : distribution of execution time [1]
2.2 Static Analysis
In comparison to measurement techniques, static analysis lends itself to a more precise approximation of the real WCET. However, the same difficulties regarding effort apply as hardware and software become more complex. A static analysis comprises three stages [7], as depicted in Figure 2.
Figure 2 : phases of static WCET analysis [1]
Flow Analysis Stage:
At this stage of the analysis, the possible paths through a piece of code are identified in order to examine the dynamic behavior of the code regarding interdependencies of conditions, function calls and the impact of bounds on loop iterations.
Low-Level Analysis Stage:
The goal of this stage of static analysis is to derive the execution time of machine
instructions based on the compiled object code with respect to the underlying
hardware. This is achieved by a timing model that reflects the hardware specification.
Calculation Stage:
At this stage of analysis the results of the flow analysis and the low-level analysis are
merged to derive the WCET based on identified execution paths and associated machine
instruction timing.
Although the scheme of static analysis looks straightforward, two major aspects make it fairly complex. The first one stems from the fact that it is almost impossible to specify all paths in a given piece of code, which is why bounds on loops must be applied to ensure finite execution [1]. The other aspect can be attributed to the complexity of performance-enhancing hardware features that must be taken into account by the timing model. Although the model does not need to reproduce all details of the hardware, the state space of advanced processors entails a time-consuming analysis effort [7]. Since the development and examination of timing models imposes considerable costs, and since their dependency on a particular architecture hampers universal application to the large range of processors employed in embedded systems, the use of static analysis still represents a remarkable burden. Furthermore, a survey of several tools published in [7] revealed limitations regarding code structure (loop nesting, use of pointers, dynamically allocated data), flow analysis and the detection of infeasible paths, the programming language, accuracy in terms of overestimation, and delays caused by preemption and context switches.
2.3 Any-Time Approach
As illustrated above, both measurement techniques and static analysis as development means for real-time applications suffer from a number of limitations. Considering desirable features of real-time applications, such as predictability and robustness [2], which are required to guarantee the expected system behavior and the capability to manage overload conditions, these methods have turned out to be insufficient when it comes to the data-dependent runtimes typically found in image-processing applications [5]. Apart from that, such applications exhibit a situation-dependent demand on the quality of results. Developing towards a predetermined worst-case scenario would result in low system utilization [8]. Moreover, this scenario could only be an estimate, and the system itself would still be prone to rare peak loads. The any-time approach addresses this problem of WCET estimation and scheduling of input-data-dependent algorithms. It pursues the strategy of providing suitable results within a short time and of improving their accuracy as long as deadlines can be met. Since this case is not characterized by a fixed WCET but rather by a dynamically determined upper bound, the term expected-case execution time (ECET) has been established [5]. The ECET is determined by means of statistics and system monitoring [4], whereas the utility of results is derived from performance profiles that indicate the output quality of an algorithm depending on its computation time [8]. The flexibility provided by the any-time approach stems from the capability of adaptive computation time, which is a function of task-specific parameters. These parameters should be selected by a monitoring component that is capable of optimizing the quality of service based on application demands and the current load imposed by the environment [5].
According to Dean and Boddy [3], the employed algorithms must satisfy the following specification:

- The algorithm can be interrupted and resumed with little overhead.
- The algorithm provides increasingly good answers over a range of response times.
- The algorithm can be terminated at any time.
To exemplify the idea of the any-time approach, Figure 3 refers to results from Ihme et al. [4], achieved with an extended SURF (Speeded Up Robust Features) algorithm for image processing that satisfies the requirements mentioned above. As indicated by the dark green bar, first results come up after a very short time, no matter which frame is considered. Subsequent processing stages provide improved results, which in this case follow a progressive strategy regarding quality and computation time. Each stage is initiated according to the ECET. However, this can lead to the violation of deadlines, as indicated by the red bars. In this case, the algorithm terminates and the result from the previous stage is kept for further use. This property of the any-time approach facilitates a higher degree of utilization compared to guarantee-based scheduling, or rather a higher reliability compared to best-effort scheduling.
Figure 3 : example of any-time computation [5]
3 Object under Investigation
Based on the motivation illustrated in the previous chapters, this thesis focuses on the people library, which is a component of the Point Cloud Library (PCL) [9]. The PCL provides a comprehensive collection of algorithms employed for filtering, feature estimation, surface reconstruction, registration, model fitting and segmentation of point cloud data [11]. This data can be obtained, e.g., from RGB-D cameras, stereo cameras and laser scanners in order to gain spatial information. The people library makes use of PCL's feature set to provide information about the appearance of human shapes. Since obstacle evasion, among others, is a desirable application in the area of autonomous driving that could benefit from PCL, it is worth evaluating its characteristics. This thesis focuses on the capability of the people library to provide a scalable runtime in conjunction with a trade-off against the accuracy of results. This area of concern emerges from the massive impact that the varying complexity of the input data has on the computation time of the people tracker's processing pipeline (see Figure 4), making it hardly predictable. The stages of the pipeline serve the following tasks [10]:
Voxel Grid Filtering:
In general, data supplied to the pipeline comprises outliers and noise that impede the processing. Hence, they need to be removed prior to the subsequent stages. Furthermore, the processing can be accelerated using a diminished number of points. The characteristics of this step are explained in more detail in chapter 5.2 (see Voxel Size).
Ground Plane Detection and Removal:
In order to detect single objects, the subset of the point cloud associated with the ground plane needs to be removed because it is the part of the scene that all objects are connected to. The estimation of the ground plane is accomplished by the RANSAC approach, which yields results in an iterative way. It is a non-deterministic algorithm in the sense that it produces a reasonable result only with a certain probability. This probability increases with every iteration1.
1 Quotation from: http://en.wikipedia.org/wiki/RANSAC
3D Clustering:
This stage identifies the cohesion among the remaining points based on the Euclidean clustering approach. The result is a batch of separated point clusters representing single objects that do not necessarily correspond to human shapes.
People Detection:
At this stage, each cluster is examined as to whether it corresponds to the characteristics of a human being or to a spurious object. This is accomplished by a support-vector-machine-based person classifier.
Figure 4 : people detection pipeline
4 Test Environment
This chapter gives an outline of the hardware employed for the tests, the prerequisites and provisions made in order to achieve reliable statements, and the specification of all items needed to build the test suite.
4.1 Hardware Setup
Test Platform
The hardware platform used in the course of this thesis is the PandaBoard ES. It is a single-board computer based on the Texas Instruments OMAP4460 system-on-chip. It features a dual-core ARM® Cortex™-A9 MPCore™ with SMP at 1.2 GHz per core and 1 GB of low-power DDR2 RAM. The PandaBoard ES supports removable non-volatile memory storage via an onboard SD/MMC card cage. In the course of this investigation, a SanDisk Extreme Pro SD card with a capacity of 16 GB is used. For an in-depth board specification, please refer to the PandaBoard ES specification [12].
Figure 5 : mounting position of stereo camera
Image Acquisition
The image acquisition system consists of two monochrome cameras manufactured by Basler. Each camera provides 30 frames per second at a resolution of 1296 x 966 px. The point clouds are generated from both image streams by a post-processing tool that considers the temporal coherence of the images. For an in-depth specification, please refer to the data sheet of the camera2. The cameras have been mounted in a BMW 5 series (F10) next to each other beneath the interior mirror, as illustrated in Figure 5. The projection of this position onto the floor represents the origin of the reference coordinate system.
4.2 Software Setup
The goal of the tests conducted in this thesis is to acquire findings on the timing behaviour of the people library. To this end, the Linux operating system is used due to its support for the selected hardware platform and its facilities to customize a Linux distribution that is best suited to our demands. However, some additional provisions have to be made because the mainline Linux kernel and its default process-management setup lack the real-time behaviour required to provide replicable test results.
Making the kernel preemptible
When running an application under the mainline Linux kernel, there is no guarantee against extraordinary latencies caused by interrupt handlers and kernel functionality that can block a high-priority task. By applying the CONFIG_PREEMPT_RT patch to the standard kernel following the RT kernel wiki [13], these issues can be considerably mitigated. The patch takes effect by minimizing the amount of kernel code that is non-preemptible. Elaborations provided by K. Koolwal [20] and by F. Cerqueira and B. B. Brandenburg [21] present benchmarking studies comparing standard kernel versions with patched ones. These include results achieved with diverse metrics to quantify determinism and latency. To this end, both elaborations make use of different approaches to expose the system under test to CPU- and I/O-bound workloads. The results of these studies show a considerable improvement in terms of real-time capable behaviour, serving as the rationale for the usage of CONFIG_PREEMPT_RT in this thesis.

2 http://www.baslerweb.com/de/produkte/flaechenkameras/ace/aca1300-30gm
Scheduling Policy
By default, Linux applies a round-robin time-sharing scheduling policy [14]. That is, a process is inserted into a low-prioritized queue and granted CPU time based on a dynamically determined priority among the other processes within this queue. In terms of real-time requirements, this fairness-oriented policy leads to unpredictable latencies. Apart from that, the scheduler considers processes scheduled under real-time policies first. Hence, the test process must be run under a real-time capable policy, which can be accomplished with the sched_setscheduler system call [15], setting the policy to first-in-first-out (FIFO) or round-robin (RR). In order to prioritize processes, one can specify a priority value ascending from 1 up to 99. According to [16] it is, however, not recommended to assign a process the value 99 because there are management threads that need to run with the highest priority. Note that although Linux provides both FIFO and RR real-time policies, no difference between the two has been observed in the course of this thesis.
Memory Management
Another source of latencies can be attributed to page faults. Therefore, [16] advises locking the virtual address space of a real-time application into RAM. This can be achieved with the mlockall system call [15].
Precision of Time Measurement
In order to achieve high precision when measuring elapsed time, one can employ the clock_gettime system call, taking time stamps before and after the code section of interest. According to [17], the parameter clk_id, which specifies the clock to be read, should be set to CLOCK_MONOTONIC_RAW because this gives access to a raw hardware-based time, avoiding any interference from clock adjustments.
According to previous research [18] performed at BMW Car IT GmbH, these measures are suitable to achieve proven real-time capability. In addition, this setup has been tested under increased CPU workload imposed by the load generator stress3, which turned out to have negligible impact on the execution time of processes scheduled under a real-time policy.
4.3 Test Suite Setup
The effort to build a customized Linux image for a dedicated hardware platform, e.g. the PandaBoard ES, can be managed with the facilities provided by the Yocto Project. The Yocto Project is an open-source collaboration that provides templates and tools with the aim of supporting developers in creating custom Linux systems for embedded devices [19]. It is based on the Poky platform builder, which is the reference build system incorporating the OpenEmbedded project and a build scheduler called BitBake. This infrastructure is based upon a set of metadata composed of recipes and layers. Recipes serve the purpose of defining sources, configurations, dependencies and compile instructions. Layers represent collections of recipes put together in order to meet a certain demand. Basic setups are pooled in core layers such as openembedded-core or meta-oe. Layers developed towards support for particular hardware are called board support packages (BSP). The meta-ti layer, applied in this thesis, provides configurations specific to boards using processors manufactured by Texas Instruments in order to build a Linux image for the PandaBoard ES. Application-specific configurations can be established by including custom layers. In this thesis, the meta-ros layer has been added, providing support through the ROS API to access point-cloud data stored as rosbag files. The particular specification of the test suite and the build system as realized in this thesis is shown in Table 1. The following gives a brief outline of the steps towards a custom Linux image.
1. download the cross-compilation environment from the Yocto Project website
2. add the meta-oe, meta-ros and meta-ti layers according to Table 1
3. create a recipe as an extension of the core-image-ros-roscore.bb build recipe to
include the pcl-people-tracker-timing-test suite and additional components if desired
3 http://people.seas.harvard.edu/~apw/stress/
4. amend the kernel build recipe of the meta-ti layer in order to incorporate the
CONFIG_PREEMPT_RT patch with a configuration which enables preemption
5. invoke the build process using the bitbake command
6. install the resulting image to the storage device
7. check the kernel version using the uname -a command; a patched kernel version
contains an -rt** identifier of the respective patch.
For detailed information on the respective steps please refer to the RT kernel wiki [13], the
official Yocto Project website [19] and the meta-ros repository on GitHub [22].
Item                        Specification
Yocto Project               1.5.4
Poky Platform Builder       10.0.4 Dora
meta-oe                     git://git.openembedded.org/meta-openembedded
                            SHA1 ID e75ae8f50af3effe560c43fc63cfd1f39395f011
meta-ros                    git://github.com/bmwcarit/meta-ros.git
                            SHA1 ID d0a954d11e822b0f8be83ecaadac784770d38445
meta-ti                     git://git.yoctoproject.org/meta-ti
                            SHA1 ID 4390f867bf883b93cf36cedbb7ef6b11e079c1e4
gcc-version                 gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2
kernel                      git://dev.omapzoom.org/pub/scm/integration/kernel-ubuntu.git
                            branch=ti-ubuntu-3.4-1485
                            SHA1 ID b3d5eeb10553e4bc0c3f250a4d06d43c4ab397a9
                            In conjunction with the CONFIG_PREEMPT_RT patch from
                            http://hbrobotics.org/wiki/images/8/88/Patch-3-4-9-rt17.patch.doc
                            with the following configuration
                            http://hbrobotics.org/wiki/images/c/c7/Config-3-4-9-rt17.doc
pcl                         git://github.com/PointCloudLibrary/pcl.git
                            SHA1 ID ae08f0780750aae8a8b8ea0e9c82209071ffb724
                            In conjunction with the following patch:
                            0001-allow-to-run-people-library-without-visualization.patch from
                            git://github.com/bmwcarit/pcl-people-tracker-timing-test/tree/master/res
pcl-people-tracker-         git://github.com/bmwcarit/pcl-people-tracker-timing-test
timing-test                 SHA1 ID 624e7aee57e74dd77f1db929910bb81d3d3f9090
Table 1 : test-suite version specification
5 Test Approach
This chapter describes the workflow and usage of the test suite. All settings and parameters
needed as well as outputs are explained in detail.
5.1 Input Data
This chapter pinpoints all input data required to execute a test session.
Pointcloud Data
According to chapter 4.1, the image is acquired by two monochrome cameras. Since
both cameras only provide 2D images, their data must be combined in order to generate
the depth information needed to obtain point-cloud data. This conversion is accomplished
with the aid of a pre-processing tool that is not part of the test-suite. Thanks to the
message-passing supplied by the ROS middleware4 throughout the test vehicle, this
conversion takes camera inputs in sensor_msgs/Image format and produces outputs in
sensor_msgs/PointCloud2 format. Both input and output data are stored in ROS's
bag format. Figure 6 illustrates a sample snapshot of a monochrome image and the
associated point-cloud.
Figure 6 : raw image vs. pointcloud
4 http://www.ros.org/
Reference Files
As motivated in the introduction, the aim of the underlying investigation is to identify a
correlation between the execution time of the people tracker algorithm, depending on
parameters provided by PCL's API, and the associated quality of results. In this regard, the
accuracy of recognition and position detection provides a suitable metric that can be
determined using information given by the people tracker API. To this end, the actual
positions of people must be known so that they can be compared to the computed ones. This
information is provided by a user-defined text file serving as reference. Its structure
indicates the total number of frames in the first line and the positions of people per
frame in subsequent lines as shown below. Note that the ground truth plane is parallel
to PCL's X-Z plane. Hence, the first coordinate represents the X value and the
second one the Z value of a particular person. The following listing shows a sample
reference file:
Frame_Count 3
Frame_Number 1
X_1 Z_1
X_2 Z_2
Frame_Number 2
X_1 Z_1
X_2 Z_2
X_3 Z_3
Frame_Number 3
X_1 Z_1
Listing 1 : reference-file structure
Test Configuration
This file is needed to configure general test-suite settings that are supposed to remain
unchanged throughout a batch of tests. Nevertheless, they can be amended if, for
example, camera settings change. The following listing shows a sample file content:
topic perception_pcl_objects/mesh
rgb_intrinsics_matrix 1.2 0.0 6. 0.0 1.2 1.7 0.0 0.0 1.0
svm_filename trainedLinearSVMForPeopleDetectionWithHOG.yaml
groundCoeffs 0.0 -1.0 0.0 1.0
maxDeviation 0.3
minConfidence -1.5
Listing 2 : test configuration file structure
topic :
This parameter indicates the topic name that identifies the point-cloud stored within the
input bag file.
rgb_intrinsics_matrix :
This parameter represents the coefficients of the intrinsic camera matrix that indicate its
optical, geometric and digital characteristics.
svm_filename :
This parameter represents the name of the file needed for the support-vector-machine
classifier (see below).
groundCoeffs :
This parameter represents the components of the plane equation needed to detect and
remove the ground plane.
maxDeviation :
This parameter defines a threshold in meters that limits the accepted deviation from a
position given in the reference file. If this threshold is exceeded, a detected object is not
considered a match.
minConfidence :
This parameter sets the threshold for the HOG confidence. That is, only clusters that
exhibit a confidence value greater than minConfidence are considered a match.
SVM Configuration File
In order to recognize and distinguish human shapes from spurious objects, the PCL library
utilizes a support-vector-machine (SVM) algorithm. The object recognition method is
based on the histogram-of-oriented-gradients approach, which analyzes features according
to the orientation and arrangement of gradients resulting from the point-clouds associated
with clusters. The resulting features are characterized by descriptors. These descriptors
must be classified in order to verify their coherence with human shapes. To this end, the
SVM builds up a decision-making model based on patterns. To make the model specific to
human shapes, an appropriate configuration is loaded that satisfies the demands of people
detection.
5.2 Parameter Variation
This chapter outlines the parameters provided by the PCL that have been considered in
terms of their impact on the timing behaviour of the people tracker. The associated API
function as well as the functional impact of each parameter will be explained in detail.
Sampling Factor
If assigned a value greater than 1, this parameter downsamples the number of points
stored in the point cloud. Since this parameter takes effect right at the beginning of the
processing pipeline (see Figure 4), all subsequent stages are affected. The value of this
parameter can be set using the following API function:
void setSamplingFactor (int sampling_factor)
Although the points of a point-cloud object are stored in a member represented by a C++
vector, i.e. a one-dimensional sequence container, they are said to represent an
organized point-cloud dataset. This structure consists of rows and columns, which speeds
up the computation of nearest-neighbour operations due to the known spatial relations
among points. In order to map this two-dimensional structure to a one-dimensional
vector, the point-cloud is characterized by its width and height. In this regard, width
corresponds to the number of columns and height to the number of rows. A particular
element of the vector can be accessed given the column/row coordinates as follows,
where row ranges in [0,height[ and column in [0,width[ :
Point [column, row] = Vector [row * width + column]
If applied, the sampling-factor downsizes both width and height by division in a linear
manner. Hence, a sampling-factor of two divides the number of points by four, a
sampling-factor of three by nine and so on.
Voxel Size
The voxel-size parameter also serves the purpose of reducing the number of points in a
given input cloud. It is set using the following API function:
void setVoxelSize (float voxel_size)
The voxel-size refers to the edge length of the 3D cubes that are put over the point-cloud
as illustrated in Figure 7.
Figure 7 : voxel-grid over a point-cloud
It must be taken into account that filtering by means of the voxel-size operates on the
point-cloud that has already been downsized by the sampling-factor. As opposed to the
sampling-factor filtering, the voxel-size filtering does not apply proportionally to the
entire point-cloud. That is, the points within each voxel are approximated by their
centroid instead of by the centre of the voxel. This way the approximation yields a better
representation of the original point-cloud.
Height Limits
The height limits determine the minimum and maximum height that is allowed for a
person cluster up from the ground. These parameters are set using the following API
function:
template <typename PointT> void
pcl::people::GroundBasedPeopleDetectionApp<PointT>::
setPersonClusterLimits
(float min_height, float max_height,
float min_width, float max_width)
Listing 3 : function setPersonClusterLimits
By default, PCL assigns min_height to 1.3 and max_height to 2.3 meters. Although it is
not the primary purpose of the height limits, these parameters also affect the minimum
and maximum number of points that belong to a person cluster, according to the
following relations:
min_points =
(int) (min_height * min_width / voxel_size / voxel_size)
max_points =
(int) (max_height * max_width / voxel_size / voxel_size)
Listing 4 : min_points / max_points equation
Width Limits
The min_width and max_width parameters are set by the same API function as the
height limits. As opposed to the height limits, these parameters do not refer to any
characteristics of the people in the scene. The sole purpose of the width limits is to
determine the minimum and maximum number of points that a person cluster contains
(see above). By default, PCL assigns min_width to 0.1 and max_width to 8.
Given all parameters as default (voxel_size = 0.06 meter), min_points is equal to 36 and
max_points is equal to 5111 points.
5.3 Test Execution
This chapter describes how to perform a test and how the test environment must be
structured. First of all, the test environment must be set up as depicted in Figure 8.
Test_Exe refers to the executable of the test-suite. TestConfig.txt and
trainedLinearSVMForPeopleDetectionWithHOG.yaml are necessary to configure the test-suite
according to the explanations of chapter 5.1. The .bag file contains the point-cloud stream,
and the corresponding reference file must reside in a folder named ReferFiles. Note that the
names of the .bag file and its associated reference file can be arbitrary and need not follow
any pattern.
point_cloud_stream.bag
ReferFiles
    Refer_point_cloud_stream.txt
TestConfig.txt
Test_Exe
trainedLinearSVMForPeopleDetectionWithHOG.yaml
Figure 8 : test directory structure
Given that, a test run can be executed according to the following command:
./Test_Exe file_1 file_2 parameters policy [tag]
file_1  : name of the .bag file including file extension
file_2  : name of the reference file including file extension
-minh   : parameter setting the minimum height limit as float
-maxh   : parameter setting the maximum height limit as float
-minw   : parameter setting the minimum width as float
-maxw   : parameter setting the maximum width as float
-vs     : parameter setting the voxel size as float
-sf     : parameter setting the down-sampling factor as float or int (note: a float value
          is converted to an int by truncating the fractional part)
policy  : selector for the scheduling policy mode (available options: RR | FIFO | NO_RT)
[tag]   : optional parameter used to tag output files
The following error messages have been defined for particular error conditions:
InputFile not found.
-> No file found corresponding to the file_1 parameter in the test directory.
No reference file found.
-> No file found corresponding to the file_2 parameter in the ReferFiles folder of the test
directory.
Test parameter not set.
-> Some of the parameter list items have not been set.
Incorrect inputs.
-> Test command does not match the required pattern.
Settings in TestConfig.txt missing.
-> Some settings of TestConfig.txt have not been set.
No TestConfig.txt found in <path/to/root>.
-> There is no TestConfig.txt file in the test directory.
Wrong input for scheduling policy. Available options : FIFO | RR | NO_RT
-> Wrong parameter assigned to the policy parameter.
Failed to set policy.
-> Setting the scheduling policy using the sched_setscheduler system call failed.
mlockall failed
-> Locking the calling process's virtual address space into RAM using the mlockall system
call failed.
An exemplary command to invoke a test run could look like this:
./Test_Exe scene_1.bag ref_scene_1.txt -minc -1.5 -minh 1.3 -maxh 2.3 -minw 0.9 -maxw 1.4 -vs 0.06 -sf 1 RR s1_3
Please note that this command can only be executed with root permissions. On success,
the following output is generated during a test run:
RR policy set.
Frame Count : 36
Invoke performTest
Frame : 0
Frame : 1
Frame : 2
...
Frame : 33
Frame : 34
Frame : 35
performTest finished
Invoke performEvaluation
performEvaluation finished
Invoke writeResultFile
writeResultFile finished
Invoke writeReportFile
writeReportFile finished
Listing 5 : test-execution runtime information
In this case, a new folder named Results will be created if it does not already exist. This is
the location where the report and result file of each test run are stored (see chapter 5.4).
5.4 Metrics and Classification Figures
As mentioned in the previous chapter, two files are generated after every test run. These
files contain the characteristics of a particular test run. In order to evaluate and compare the
results of multiple test runs, some metrics and classification figures have been introduced in
the course of this thesis. In the following, both the structure of files and their contents are
explained in detail.
The first file is the report file. Its name consists of the file_1 parameter, the file
classifier Report, an optional tag and the file extension txt. E.g. given a road_scene.bag input
file that was tagged with 1, the associated report file is road_scene_Report_1.txt. The
content of this file exhibits the following structure:
Fri Jun  5 13:32:50 2015
-- Frame : 0
---- Time : 0.261413
---- Cluster : 0
------ B_X   : 2.18725
------ B_Y   : 0.87723
------ B_Z   : 5.28796
------ HOG_C : -1.10829
---- Cluster : 1
------ B_X   : 0.404633
------ B_Y   : 0.878143
------ B_Z   : 5.5867
------ HOG_C : -0.944061
---- Cluster : 2
------ B_X   : 0.402876
------ B_Y   : 0.869012
------ B_Z   : 5.40856
------ HOG_C : -0.810767
-- Frame : 1 ...
Listing 6 : report-file content
The first line contains the date and time of test execution. The subsequent lines contain the
detection results separated by frame count. Each frame includes information about its
execution time and the centroid positions of clusters that are expected to represent persons.
Each person cluster is identified by its bottom position relative to the camera position
(according to the right-hand rule, where Z points into the scene and Y to the ground) and
the confidence value HOG_C that is based on its histogram of oriented gradients (HOG). Both
the position and HOG_C are calculated by PCL's algorithms. Note that in this thesis the
parameter minConfidence (see 5.1/Test Configuration) is equal to -1.5. Hence, only clusters
having a HOG_C value greater than -1.5 are considered. By means of the report-file one can
check the positions and the precision of detected clusters.
The second file associated with a particular test run is called the result-file. The name of this
file is subject to the same pattern as the report-file. Hence, referring to the example from
above, the name of the result file would be road_scene_Result_1.txt. The content of this file
exhibits the following structure:
# Fri Jun  5 13:32:50 2015
# --- Parameterlist ---
# minc : -1.5
# minh : 1.3
# maxh : 2.3
# minw : 0.4
# maxw : 1.4
# vs   : 0.06
# sf   : 1
# -- Characteristics --
# meanTime : 1.07173
# stdDevT  : 0.0336307
# minTime  : 1.00204
# maxTime  : 1.15381
# ---- Detection ----
# detRate  : 0.944
# maxFaSe  : 1
# meanHOG  : -1
# ---------------------
#FRAME  EXE_TIME  FP  FN  MA  MA_DEV
0       1.0645    1   0   2   0.72
1       1.1136    3   0   2   0.71
2       1.077     2   0   2   0.67
...
Listing 7 : result-file structure
The first line contains the date and time of test execution. The following section, entitled
Parameterlist, outlines all input parameters that characterise this test run.
The second section, entitled Characteristics, lists time-specific classification figures such as
the mean execution time of all frames (meanTime), the standard deviation of execution
time (stdDevT) and its minimum (minTime) and maximum (maxTime) peak values.
The subsequent section, entitled Detection, contains detection-specific figures. The detRate
figure represents the ratio between the frames in which all reference positions (see chapter
5.1/Reference Files) have been found within the specified deviation (see chapter 5.1/Test
Configuration) and all frames of a given stream. The maxFaSeq figure represents the
maximum false sequence, i.e. the longest sequence of frames that did not match their
associated references. This figure is of interest in conjunction with detRate because, given
equal detRate, a result with a smaller maxFaSeq can be considered more valuable. The
meanHOG figure represents the mean HOG confidence of clusters associated with the
frames that contribute to the denominator of detRate. This figure is a measure of the
detection accuracy.
The last section contains six columns. The first one indicates the frame count and the second
one its associated execution time. The FP column represents the false-positive count, which
indicates the number of spurious objects, i.e. objects that have a HOG confidence greater
than minConfidence but no ground truth reference. The FN column represents the false-
negative count, which indicates the number of referenced objects that have not been
recognized. The MA column indicates the number of objects that match a reference. The
last column, called MA_DEV, represents the ratio between the mean deviation of matching
objects in the current frame and the maxDeviation parameter (see chapter 5.1/Test
Configuration).
The result-file serves the purpose to evaluate the impact of parameters according to chapter
5.2. Moreover, it can be utilized to visualize the data using gnuplot.
6 Test Results
This chapter presents the results that have been achieved in the course of this thesis
regarding the timing behaviour of PCL’s people library and its dependency on parameters
according to chapter 5.2. The evaluation of the timing behaviour has been conducted based
on an instance of pcl::people::GroundBasedPeopleDetectionApp<PointT>5 class that provides
the method compute according to following specification:
bool compute
(std::vector<pcl::people::PersonCluster<PointT> >& clusters);
This method encapsulates all operations depicted in chapter 3 in order to perform people
detection based on a 3D point cloud. The execution time of each frame has been determined
by taking a time stamp immediately before and after the call of the compute method (see
4.2/Precision of Time Measurement).
The following subchapters show the results observed under various conditions. With the
provisions made towards a real-time capable system (see 4.2), a proof of deterministic timing
behaviour is provided under steady-state conditions. Since the limitation to a particular load
cannot be maintained in real traffic situations, the impact of fluctuating load on the execution
time will be presented. However, in terms of predictability and safety, the execution time of
such a system is supposed to be bounded. To this end, parameters having an impact on the
execution time are evaluated. This investigation intends to quantify the extent to which the
execution time can be adapted to fluctuating load in order to meet desired time
constraints and a sufficient level of detection.
6.1 Proof of Performance
In order to gain reliable findings regarding the correlation between the execution time and
particular parameters, deterministic system behaviour poses a necessary prerequisite. To
this end, measures have been applied to the test-suite according to chapter 4.2. With this
5 https://github.com/PointCloudLibrary/pcl/blob/master/people/include/pcl/people/ground_based_people_detection_app.h
setup, the test-suite is expected to provide reproducible and non-fluctuating results. In order
to give evidence for a deterministic behaviour, steady state conditions in terms of scene
arrangement have been employed.
Figure 9 illustrates the reference ground truth arrangement chosen for this purpose. The
circle labelled C represents the position of the stereo camera system, whereas the
two circles labelled 1 and 2 represent the positions of persons.
Figure 9 : reference ground truth arrangement
Given this arrangement, multiple test runs have been performed under a constant parameter
set (minh = 1.3, maxh = 2.3, minw = 0.4, maxw = 1.4, vs = 0.06, sf = 3 (see 5.3)) with a
resulting detection rate (see 5.4) of 94.4% among 36 frames.
The proof of determinism requires negligible deviation of execution time for each frame of
the stream during multiple test runs. A deviation of execution time between two arbitrary
frames within a test run results from a fluctuating point-cloud density. This is even the case
with static scenes. To reduce this impact, a scene with low complexity has been chosen.
To emphasize the need for a real-time scheduling policy and to illustrate the consequences
of disregarding this design issue, three batches of tests have been performed covering the
RR, FIFO and NO_RT policies (see 5.3). In order to preclude the impact on the execution
time imposed by unpredictable processing on the system, the workload generator stress6
has been used to impose additional CPU load. The following figures show the frame
execution time of a stream sample according to the setup described above. For clarity,
each figure contains five test runs reflecting all test runs performed. The term no_stress
indicates that no additional load is imposed, whereas stress_c1 / stress_c2 indicate that
one / two processes generating CPU load are run concurrently. Regardless of the policy
assigned to the test-suite, the processes invoked by stress are managed by the standard
Linux time-sharing policy. That is, they are inserted into a low-priority queue (static
priority equal to 0) together with all other processes that do not require real-time
behaviour. The decision which process to run from this queue is based on a dynamically
determined priority that considers only processes residing within this queue.
Round Robin policy
As one can see from Figure 10, Figure 11 and Figure 12, there is no noticeable impact on
the execution time of any frame during multiple test runs and different load conditions
with the RR policy. This finding allows using the test-suite for further investigation when
operated with RR because in this case all effects observed can be attributed to the
parameter set instead of any interference.
6 http://linux.die.net/man/1/stress
Figure 10 : RR policy / no resource-consumer
Figure 11 : RR policy / one resource-consumer
Figure 12 : RR policy / two resource-consumers
First In First Out policy
As illustrated in Figure 13, Figure 14 and Figure 15 executing the test-suite under FIFO
policy yields similar results compared to RR policy. Hence, the FIFO policy option is as
well-suited as the RR one in order to quantify the impact of parameters according to
chapter 5.2. Since there is no remarkable difference between results achieved with RR
and FIFO policy, all findings shown in the following apply to both policies equally.
Figure 13 : FIFO policy / no resource-consumer
Figure 14 : FIFO policy / one resource-consumer
Figure 15 : FIFO policy / two resource-consumers
NO_RT policy:
Using the NO_RT option treats the test-suite process the same way as any other
non-real-time process such as those invoked by stress. The results arising from this
configuration are depicted in Figure 16, Figure 17 and Figure 18. As one can see from
Figure 16 and Figure 17, additional load of up to one resource-consumer does not
noticeably affect the execution time of the test process. When exposed to two
concurrently running resource-consumers, as illustrated in Figure 18, deterministic
behaviour cannot be guaranteed any more. This observation can be attributed to the
underlying hardware (see 4.1/Test Platform), which exhibits a dual-core CPU. Hence, the
two processes invoked by stress and the test-suite must share limited resources managed
by Linux's standard time-sharing policy as described above. With respect to a reliable
investigation of parameters, this finding makes the need for a real-time scheduling
policy evident.
Figure 16 : NO_RT policy / no resource-consumer
Figure 17 : NO_RT policy / one resource-consumer
Figure 18 : NO_RT policy / two resource-consumers
6.2 Load Analysis
In the previous chapter it has been shown that the test-suite complies with the requirement
of repeatable results and resistance against load on the system. However, all results
presented have been conducted under steady-state conditions. That is, no impact of the
environmental data acquired by the stereo cameras has been considered. In real traffic
situations it is hardly possible to predict the complexity and load imposed by scenes
varying from modest highway conditions up to complex urban traffic. Since this affects the
execution time, this chapter intends to demonstrate the range of possible execution times
based on a couple of scenes of different complexity. To this end, scenes ranging from less
complex ones with a well-defined structure up to urban traffic situations have been
evaluated using the test-suite. The following scenes have been selected based on a
qualitative assessment. The timing information has been obtained using a constant
parameter set (minh = 1.3, maxh = 2.3, minw = 0.4, maxw = 1.4, vs = 0.06, sf = 1 (see
5.3)). In the following, five different scenes will be presented. Each scene features a brief
explanation of its complexity and the results of minimum (minTime), maximum (maxTime)
and mean (meanTime) execution time achieved among all computed frames.
Minimum Load Scene
This scene has been selected with respect to its low complexity, reflected by a small
number of objects and a plain background.
Figure 19 : minimum-load scene
meanTime: 1.06766 / minTime: 0.997607 / maxTime: 1.15153
Low Load Scene
This scene has been selected due to a small number of less complex objects within close
range.
Figure 20 : low-load scene
meanTime: 1.18073 / minTime: 1.1281 / maxTime: 1.39354
Medium Load Scene
This scene is said to represent a medium load since it exhibits an increased number of
relevant objects within an extended visible range.
Figure 21 : medium-load scene
meanTime: 1.51551 / minTime: 1.40675 / maxTime: 1.75253
High Load Scene
This scene has been identified to pose high load since it exhibits an extended number of
objects compared to the medium load scene.
Figure 22 : high-load scene
meanTime: 1.98782 / minTime: 1.68413 / maxTime: 2.36058
Maximum Load Scene
This scene generates maximum load on the system which can be attributed to a wide
visible range and a high number of objects in comparison with the other scenes.
Figure 23 : maximum-load scene
meanTime: 2.2505 / minTime: 2.10289 / maxTime: 2.73859
As one can see from the results above, two significant findings can be deduced considering
the increased complexity of environmental conditions. First, the impact on the execution
time varies within a broad range, as can be seen from the mean time results: comparing
the minimum and maximum load conditions, the execution time can even double. Second,
the fluctuation of the execution time rises with increased complexity, which can be seen
from the range covered from minTime up to maxTime. Both findings are visualized in
Figure 24, with the X axis representing the complexity of the scene as 1 = minimum load,
2 = low load, 3 = medium load, 4 = high load and 5 = maximum load.
Figure 24 : time per frame vs. complexity of scene
Given this insight into the timing behaviour of the people tracker, the necessity for a
mechanism to adapt the execution time to a particular upper bound becomes apparent. The
concern of the following chapter is to demonstrate the feasibility of adaptive execution time
based on results that have been achieved with the parameters according to chapter 5.2.
6.3 Impact of Parameterization
This chapter reveals the results that have been achieved with the aim of adapting the
execution time of PCL's people tracker algorithm when less, but still sufficient, precision is
accepted in favour of a shorter execution time. The motivation for this is driven by findings
presented in
previous chapters. As exemplified through scenarios of varying complexity, the execution
time may even double. However, this ratio may only be regarded as a rough guideline rather
than a design benchmark. Since the complexity of real traffic situations can at best only be
estimated, the need to explore algorithms capable of adaptive execution time becomes
apparent. To this end, the impacts on the execution time of parameters according to chapter
5.2 have been investigated. In the following, each parameter is evaluated not only regarding
the execution time but also with respect to the quality of results that is reflected by the
detection rate (see chapter 5.4/detRate). Among the other metrics presented in chapter
5.4, the detection rate turned out to lend itself best to providing condensed and
significant information about the quality. The results of each parameter shown in the following
refer to the ground truth setup as depicted in Figure 9. Apart from this one, other ground
truth configurations have been examined, which partially confirmed the findings stated
below, though with much less clarity. In this regard, three major circumstances can be stated
to which the more significant results of the setup according to Figure 9 can be attributed.
Firstly, the recognition of persons turned out to diminish the farther a person stands apart
from the centre of the scene. Secondly, a fall in recognition of persons has been observed
when partial visibility of the persons occurred. That is, as soon as the legs have been hidden
up to the knees, the recognition diminished dramatically. Finally, too little contrast between
the background and a person induced an insufficient segmentation resulting in bad
recognition.
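The exact definition of detRate is given in chapter 5.4; as a hedged illustration of the idea, a per-frame ratio of matched detections to ground-truth persons could be computed as follows. The function name and the counting scheme (crediting at most the ground-truth count per frame) are illustrative assumptions, not the thesis' implementation:

```cpp
#include <vector>
#include <cstddef>

// Hypothetical sketch of the detRate metric from chapter 5.4: the fraction
// of ground-truth person instances that were actually detected. Per frame,
// at most the ground-truth count is credited so that false positives do
// not inflate the rate. The exact counting scheme is an assumption here.
double detectionRate(const std::vector<int>& detectedPerFrame,
                     const std::vector<int>& groundTruthPerFrame) {
    int detected = 0, expected = 0;
    for (std::size_t i = 0; i < detectedPerFrame.size(); ++i) {
        int gt = groundTruthPerFrame[i];
        detected += (detectedPerFrame[i] < gt) ? detectedPerFrame[i] : gt;
        expected += gt;
    }
    return expected > 0 ? static_cast<double>(detected) / expected : 0.0;
}
```

For the two-person setup of Figure 9, a frame with both persons detected would contribute fully, a frame with one missed person only half.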
6.3.1 Sampling Factor
As mentioned in chapter 5.2, the sampling factor serves the purpose to reduce the number
of points within the point cloud. The effect of this measure relates to all stages of the people
tracker pipeline (see Figure 4). Since fewer points result in diminished computational effort,
the execution time is expected to decrease steadily with increasing sampling factor.
The results of the sampling factor analysis shown in the following have been achieved under
constant conditions regarding the other parameters. These have been assigned as follows:
minh : 1.3 / maxh : 2.3 / minw : 0.4 / maxw : 1.4 / vs : 0.06
The test procedure spanned multiple test-runs with varying sampling factor in the range
from one up to thirteen in order to verify negligible fluctuations among the test-runs, as
introduced in chapter 6.1. With every sampling factor considered, deterministic results have
been achieved. Since the maximum execution time is of primary importance, this figure has
been taken into account in conjunction with the detection rate of the test-runs. The line
graph in Figure 25 illustrates the correlation between the execution time and detection rate
depending on the sampling factor. The sampling factor is plotted on the X axis. The Y axis on
the left hand side represents the maximum execution time, whereas the one on the right
hand side shows the detection rate. The blue line represents the course of execution time
and the red one the associated detection rate.
Figure 25: Sampling factor – execution time vs. detection rate
As one can see from the blue line, the execution time is steadily falling with an increasing
sampling factor, which confirms the assumption stated above. This trend can especially be
observed for sampling factors in the range from one up to six, whereas further increase of
the sampling factor entails a diminished drop of execution time. As indicated by the red line,
the detection rate exhibits a stable and high level up to a sampling factor of eight. Beyond
this value a rapid decline of detection rate occurs leading to zero at a sampling factor of
eleven. A striking feature emerges at a sampling factor of eight where a peak of the
detection rate can be seen although less information is provided by further downsampling.
This can probably be attributed to the underlying support-vector-machine person classifier
that yields a higher recognition under this particular condition.
The view on the execution time and the detection rate shows a shifted decline of both
characteristics. The execution time decreases the most when the detection rate remains
fairly stable, and vice versa. Additionally, the detection rate experiences a rapid drop from
about 90% to zero. In terms of adaptable timing behaviour, the observed characteristic
deviates from the desired case where both the execution time and the detection rate
simultaneously decline over a wide range of the sampling factor. Based on these results, no
satisfying capability of execution time reduction can be assigned to the sampling factor.
6.3.2 Voxel Size
Similar to the sampling factor, the voxel size filtering lowers the number of points in the
point-cloud. Hence, this parameter also affects all stages of the people tracker pipeline (see
Figure 4). Due to diminished computational effort the resulting execution time is expected to
steadily fall with increased voxel size.
The results of the voxel size analysis shown in the following have been achieved under
constant conditions regarding the other parameters. Since the sampling factor takes effect
prior to the voxel size filtering, its value has been set to one. This implies no sampling-factor
filtering, which means that all findings observed can be attributed to the voxel size filtering.
The parameters have been assigned as follows:
minh : 1.3 / maxh : 2.3 / minw : 0.4 / maxw : 1.4 / sf : 1
The test procedure spanned multiple test-runs with varying voxel size in the range from two
up to twenty-two cm in order to verify negligible fluctuations among the test-runs, as was
done with the sampling factor. With every voxel size considered, deterministic results have
been achieved.
The evaluation of the voxel size factor is based on the maximum execution time in
conjunction with the detection rate. The line graph in Figure 26 illustrates the correlation
between the execution time and detection rate depending on the voxel size. The voxel size is
plotted on the X axis. The Y axis on the left hand side represents the maximum execution
time, whereas the one on the right hand side shows the detection rate. The blue line
represents the course of execution time and the red one the associated detection rate.
Figure 26: Voxel size – execution time vs. detection rate
As indicated by the blue line, the execution time is steadily falling, showing a less rapid
decline than that of the sampling factor. Generally, this timing behaviour is regarded as
more suitable in terms of adaptation than the one associated with the sampling factor (see
Figure 25), since the timing characteristic of the voxel size parameter is less confined to a
narrow range in which the gradient of execution time is neither too high nor too low. As
illustrated in Figure 26, a range from four up to eighteen cm in voxel size can be regarded as
matching this demand. The course of the detection rate, as shown by the red line, exhibits
high gradients associated with small voxels up to five cm and with large voxels beyond
fifteen cm in size. These ranges are not suitable for execution time
adaptation because almost no change of the execution time comes along with a sharp drop
of detection rate. Of particular interest in terms of detection rate is the range of ten up to
fifteen cm in voxel size. This range features a smooth decline making it well-suited in terms
of scalable behaviour. Thanks to an approximately linear decline between 95% and 65%, the
detection rate can be regarded as sufficient to make reasonable decisions. A striking feature
can be seen at eighteen cm in voxel size, indicated by a peak of the detection rate after it
has dropped to zero. This peak is likely induced by the underlying support-vector-machine
person classifier producing a spurious detection. Similar behaviour has already been
observed with the sampling factor.
Taking into account both the execution time and the detection rate, one can consider the
range from ten up to fifteen cm in voxel size as best suited to balance execution time and
quality of results. Taking the execution time at ten cm in voxel size as a benchmark, moving
to fifteen cm reduces the execution time by 25%. In spite of this remarkable reduction of
execution time, the impact of the
sampling factor on this result must not be neglected since the sampling factor takes effect
beforehand. This interaction between both parameters is expected to lower the effect of the
voxel size with increasing sampling factor.
6.3.3 Height Limits
According to the explanations of chapter 5.2, the height limits serve the primary purpose to
set the range of person clusters’ height. Only clusters matching these limits are fed into the
subsequent person classification that is based on the trained support-vector-machine
approach. Furthermore, the height limits contribute to the maximum or minimum number
of points contained in a person cluster (see chapter 5.2/ Height-Limits). Since no distinct
separation of the impacts associated with both effects of the height limits can be provided,
the following assessment of results assumes that the limitation of height predominates over
the influence on the number of points. Therefore, the impact of the height limits contributes to
the 3D clustering and people detection stages of the people detection pipeline (see Figure 4).
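The height gate described above can be sketched as follows. The `Cluster` type and `filterByHeight` function are illustrative assumptions; in PCL the corresponding limits are set on the people detection application, whose concrete interface differs. The sketch only shows why narrowing [minh, maxh] reduces the number of clusters reaching the SVM classifier:

```cpp
#include <vector>

// Hypothetical person-cluster summary; stands in for PCL's richer
// person-cluster representation.
struct Cluster { float height; float width; int numPoints; };

// Only clusters whose height lies within [minh, maxh] are passed on to
// the subsequent SVM person classification, so a narrower range means
// fewer candidates and less classification work.
std::vector<Cluster> filterByHeight(const std::vector<Cluster>& clusters,
                                    float minh, float maxh) {
    std::vector<Cluster> out;
    for (const Cluster& c : clusters)
        if (c.height >= minh && c.height <= maxh)
            out.push_back(c);
    return out;
}
```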
The results of the height limits analysis shown in the following have been achieved under
constant conditions regarding the other parameters. These have been assigned as follows:
minw : 0.4 / maxw : 1.4 / sf : 1 / vs : 0.06
The evaluation of the height limits is based on the maximum execution time in conjunction
with the detection rate. To this end, multiple test-runs with varying height limits have been
repeated in order to confirm deterministic behaviour according to the findings of chapter
6.1. As mentioned above, the following results refer to the ground truth setup shown in
Figure 9, which features two persons. Both persons are about 1.8 metres tall. Therefore, the
test strategy kept that size as the average of the minimum and maximum height, as
indicated by the header of Figure 27. The impact induced by the height limits has been
tested applying intervals of different ranges limited by minimum and maximum height. For
example, a height range of four cm is reflected by minh = 1.78 m and maxh = 1.82 m. The line
graph in Figure 27 illustrates the correlation between the execution time and detection rate
depending on the height range limited by minimum and maximum height. The height range
is plotted on the X axis. The Y axis on the left hand side represents the maximum execution
time, whereas the one on the right hand side shows the detection rate. The blue line
represents the course of execution time and the red one the associated detection rate.
Figure 27: Height limits – execution time vs. detection rate
As indicated by the blue line, the impact on the execution time caused by the height limits
turns out to be much less significant in comparison to the results of the sampling factor or
voxel size. The linear increase of the blue line implies a linear distribution of clusters
regarding their size. Since the reference ground truth only contains two persons, it is likely to
observe a higher gradient of the execution time in case of more complex scenes featured
with more clusters. Among other factors, this assumption may contribute to the wide range
of execution time based on scenes of different complexity as presented in chapter 6.2. The
course of the detection rate can be subdivided into two sections. The first one is identified
by a range up to four cm. This section exhibits a rapid drop of detection rate that does not
satisfy the need for a sufficient detection rate in order to make reliable decisions. The
second section extends beyond the height range of four cm and is identified by a high
detection rate of at least 75%. Considering this section as appropriate for execution-time
adaptation, one can reduce the execution time by almost 10% if the time at a height range
of four cm is taken as benchmark. Applying this parameterization approach, it must be
taken into account that real traffic situations exhibit people widely varying in size. Hence,
this measure could only be applicable in conjunction with some kind of people size
monitoring to ensure that minh and maxh cover all people being in front of the vehicle.
6.3.4 Width Limits
By means of the width limits, one can set the minimum and maximum number of points
that apply to a person cluster. This measure of execution time adaptation takes effect in the
3D clustering and people detection stage of the people tracker pipeline (see Figure 4). The
adjustment of the point number of person clusters is accomplished in accordance with the
equations presented in Listing 4. Apart from the width limits, the height limits as well as the
voxel size contribute to the total number of points that characterize a person cluster. In
order to constrain this influencing factor to the width limits all other parameters have been
assigned constant values as follows:
minh : 1.3 / maxh : 2.3 / sf : 1 / vs : 0.06
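Listing 4 itself is not reproduced in this chapter; assuming the relation implied by the numbers below (a cluster of a given width and height sampled on a voxel grid of edge length vs contributes roughly width · height / vs² points), the stated defaults reproduce the 36 and 5111 points quoted in the text. The function names are illustrative:

```cpp
// Assumed point-bound relation behind Listing 4: an area of
// width * height metres, sampled on a voxel grid with edge length vs,
// contains about width * height / vs^2 points. With minw = 0.1,
// maxw = 8, minh = 1.3, maxh = 2.3 and vs = 0.06, this yields the
// 36 and 5111 points stated in the text.
int minClusterPoints(double minw, double minh, double vs) {
    return static_cast<int>(minw * minh / (vs * vs));
}

int maxClusterPoints(double maxw, double maxh, double vs) {
    return static_cast<int>(maxw * maxh / (vs * vs));
}
```

This also makes explicit why the point bounds can only be interpreted together with the height limits and the voxel size.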
By default, minw and maxw are assigned to 0.1 and 8, respectively. With the other
parameters at their defaults as well, this leads to 36 and 5111 points in total. Since this is a
very wide range, it is worth thinking about the impact of its scope and the effects resulting
from the limitation of points associated with a person cluster. To this end, two test
strategies have been applied. The first one focuses on the impact of the lower bound of
points given a constant upper bound, whereas the second strategy investigates the reversed
case. In the course of both strategies each parameter set has been repeated multiple times,
which confirmed the deterministic behaviour shown in chapter 6.1. The subsequent
evaluation refers to the number of points instead of the minimum / maximum width, since
the width limits only take effect in conjunction with the other parameters of Listing 4.
Providing the number of points is therefore more general, as it can be deduced from these
parameters. In the course of the investigation
according to the first strategy, a constant upper bound of 1000 points and a lower bound in
the range from 50 up to 550 points has been taken into account. The second strategy
considers the upper bound ranging from 700 up to 5000 points keeping a constant lower
bound of 450 points. The line graphs in Figure 28 and Figure 29 illustrate the correlation
between the execution time and detection rate depending on the lower / upper bound of
points. The respective constant bound is indicated at the head and the respective variable
bound is plotted on the X axis. The Y axis on the left hand side represents the maximum
execution time, whereas the one on the right hand side shows the detection rate. The blue
line represents the course of execution time and the red one the associated detection rate.
Figure 28: Impact of the lower point bound
As indicated by the blue line in Figure 28, the execution time is constantly falling with an
increasing lower bound. This can be attributed to fewer clusters that are fed to the person
detection stage (see Figure 4). As a consequence less computational effort is needed. This
effect stagnates beyond 300 points as the number of clusters does not significantly decrease
further. The impact on the detection rate is hardly apparent up to a lower bound of 450
points as indicated by the red line. Further increase of the lower bound results in a rapid
drop of the detection rate. This behaviour reveals that there are almost no clusters matching
person characteristics in the range between 500 and 1000 points.
Figure 29: Impact of the upper point bound
Given a constant lower bound of 450 points, the execution time rises with an increasing
upper bound as shown in Figure 29, which can be attributed to a growing number of
clusters. This effect stagnates beyond 2000 points because the number of potential cluster
being subject to the person detection stage does not grow further. The impact on the
detection rate becomes only apparent beneath an upper bound of 800 points as indicated by
the red line. As already noticed from Figure 28, a range of at least 450 up to 900 points is
needed in order to capture all clusters regarded as persons.
All in all, both test strategies showed that the width limits provide no applicable means of
trading detection rate for execution time. However, the findings gathered from both test
strategies point out that an equally high level of detection rate can be achieved while saving
about 20% of execution time if the number of points is adjusted to a suitable
range covering all clusters without an overhead of points. This adjustment should be tuned
by means of width limits because the other parameters of Listing 4 primarily relate to their
particular purposes.
7 Conclusion
This thesis makes a contribution to algorithms for future automotive driving functions
addressing the issue of limited WCET predictability when 3D information of the environment
is processed. Dealing with this question is of high importance for research and development
because reliable responsiveness and safety will certainly pose a prerequisite for the
certification in traffic use. Many widespread algorithms, such as those provided by the Point
Cloud Library, emerge from research at universities and from open-source projects that
primarily focus on feasibility. The transition of these approaches to industrial scale requires
additional investigation and in-depth validation.
To this end, a test-suite has been developed that focuses on essential parameters having
impact on the execution time of PCL’s people library algorithms. These parameters can be
distinguished into those having impact on the number of points associated with frames and
those being related to clusters recognized within these frames. The voxel size and the
sampling factor refer to the first category, whereas the width and height limits to the latter
one. As we have seen from recordings of various road scenarios, the complexity of the scene
leads to a broad range of execution times. In our case, we have observed a doubling of
execution time between the simplest scenario and the most complex one. However, this
result provides neither an upper bound of the execution time nor a reliable benchmark for
system design. In fact, it is hardly possible to predict the WCET when dealing with image
processing and object recognition based on machine learning. A suitable approach to meet
the demands of real-time applications is given by adaptive algorithms that allow scalable
accuracy of results and execution time.
In this regard, the sampling factor turned out to lack an applicable correlation between
execution time and detection rate that would allow reducing the execution time while the
detection rate remains at a sufficient level. Unlike the sampling factor, the voxel size has
shown more promising results. When applied within a particular range, the voxel size allows
reducing the execution time by up to 25% while keeping the detection rate above 70%.
Similar results have been observed using the height limits, although not to the same extent.
The evaluation of the width limits has not yielded any applicable
usage in terms of execution time and detection rate scalability. Nevertheless, we have seen
a significant reduction of execution time when the range of points associated with person
clusters is adapted in an appropriate manner.
When discussing the results of this thesis, its limitations should not be concealed.
All findings and results presented in chapter 6.3 are based upon simple scenarios that have
been set up with the aim to clearly indicate significant tendencies. Hence, more scenarios
with a broad range of complexity would be needed in order to provide further insights and
to reinforce those already found. Additionally, each parameter has been considered without
any interrelation to the others. This assumption cannot be maintained because there are
interrelations, such as the already mentioned ordering of down-sampling and voxel filtering.
Finally, the impact of the image acquisition system has not been considered at all. A
few more systems should be taken into account in order to quantify this factor. This brief
outline of open questions points the way for follow-up activities that are needed on the path
towards future automated driving functions making use of cognitive algorithms.
Bibliography
[1] Insup Lee, Joseph Y-T. Leung, Sang H. Son / Handbook of Real-Time and Embedded Systems / Chapman & Hall, 2008
[2] Giorgio C. Buttazzo / Hard Real-Time Computing Systems / Springer, 2011
[3] Thomas Dean, Mark Boddy / An Analysis of Time-Dependent Planning / Department of Computer Science, Brown University, 1988
[4] Ihme T., Wetzelsberger K., Speckert M., Fischer J. / Real-time Image Processing based on a Task-pair Scheduling Concept / 2011 IEEE International Conference on Robotics and Automation (ICRA 2011), Shanghai International Conference Center, Shanghai, China, May 9-13, 2011, pp. 5596-5601
[5] Marschik N., Speckert M., Ihme T. / Towards Adaptive Scheduling for Real-Time Image Processing / Autonome Mobile Systeme 2012 (AMS)
[6] https://github.com/PointCloudLibrary/pcl/tree/master/people
[7] Wilhelm R., Engblom J., Ermedahl A., Holsti N., Thesing S., Whalley D., Bernat G., Ferdinand C., Heckmann R., Mitra T., Mueller F., Puaut I., Puschner P., Staschulat J., Stenström P. / The Worst-Case Execution Time Problem — Overview of Methods and Survey of Tools / ACM Transactions on Embedded Computing Systems (TECS), Volume 7, Issue 3, April 2008, Article No. 36
[8] Shlomo Zilberstein / Using Anytime Algorithms in Intelligent Systems / AI Magazine, Volume 17, Number 3, 1996
[9] https://www.willowgarage.com/papers/3d-here-point-cloud-library-pcl
[10] M. Munaro / RGB-D people detection / http://pointclouds.org/media/ias2014.html
[11] http://pointclouds.org/documentation/tutorials/walkthrough.php#walkthrough
[12] http://pandaboard.org/sites/default/files/board_reference/ES/Panda_Board_Spec_DOC-21054_REV0_1.pdf
[13] https://rt.wiki.kernel.org/index.php/CONFIG_PREEMPT_RT_Patch
[14] http://linux.die.net/man/2/sched_setscheduler
[15] Michael Kerrisk / The Linux Programming Interface: A Linux and Unix System Programming Handbook / No Starch Press, 2010
[16] https://rt.wiki.kernel.org/index.php/HOWTO:_Build_an_RT-application
[17] http://linux.die.net/man/2/clock_gettime
[18] http://www.bmwcarit.com/downloads/publications/ValidatingTheRealTimeCapabilitiesOfTheROSCommunicationMiddleware.pdf
[19] https://www.yoctoproject.org/
[20] Kushal Koolwal / Investigating latency effects of the Linux real-time Preemption Patches (PREEMPT RT) on AMD's GEODE LX Platform / VersaLogic Corporation, 3888 Stewart Road, Eugene, OR 97402 USA
[21] Felipe Cerqueira, Björn B. Brandenburg / A Comparison of Scheduling Latency in Linux, PREEMPT RT, and LITMUS / Max Planck Institute for Software Systems
[22] https://github.com/bmwcarit/meta-ros