Department of Electrical and Electronic Engineering
Khulna University of Engineering & Technology
Khulna – 9203, Bangladesh
An improved deep learning-based approach to steering angle prediction for autonomous vehicles
Supervised by:
Dr. Md. Salah Uddin Yusuf
Professor, Department of EEE
Khulna University of Engineering & Technology

Submitted by:
Md. Shakib Hasan Rudro
Roll: 1803085
Department of EEE
Khulna University of Engineering & Technology
DECLARATION
This is to certify that the thesis work, "An improved deep learning-based approach to steering angle prediction for autonomous vehicles," by Md. Shakib Hasan Rudro was performed under the supervision of Prof. Dr. Md. Salah Uddin Yusuf in the Department of Electrical and Electronic Engineering, Khulna University of Engineering & Technology, Khulna, Bangladesh. The above thesis work has not been submitted anywhere for any degree. The above claims are accurate, and this study was submitted as an undergraduate thesis.
Signature of Supervisor
Signature of Student
ACKNOWLEDGMENT
First of all, I express my sincere gratitude to the Almighty for His blessings, which have
facilitated the successful completion of this significant academic endeavor.
I extend my deepest appreciation to my honorable supervisor, Prof. Dr. Md. Salah Uddin
Yusuf, Head of the Department of Electrical and Electronic Engineering, Khulna University
of Engineering & Technology, for his invaluable guidance and unwavering support
throughout the course of this thesis. His insightful advice, encouragement, and constant
supervision have been instrumental in shaping the direction of this research.
I am also grateful to all the honorable faculty members of the Department of Electrical and
Electronic Engineering for their cooperation, encouragement, and scholarly insights, which
have enriched this work and contributed to its academic rigor.
Furthermore, I would like to extend my heartfelt thanks to my family and friends for their
unwavering support, encouragement, and understanding during this academic journey.
February 2024
Author
Md. Shakib Hasan Rudro
ABSTRACT
This research presents an improved deep learning approach for steering angle prediction in
autonomous vehicles. The study focuses on enhancing the accuracy and robustness of
steering angle prediction models, which are crucial for safe and reliable autonomous driving
systems. Leveraging advanced deep learning techniques, including convolutional neural
networks (CNNs), the proposed approach integrates image data from onboard cameras with
vehicle sensor data to predict steering angles in real-time. The research explores novel
architectures, data pre-processing methods, and training strategies to optimize model
performance and generalization capabilities. Experimental evaluations conducted on real-world driving datasets demonstrate the effectiveness and efficiency of the proposed approach
compared to existing methods. The findings of this study contribute to the advancement of
autonomous vehicle technology and have significant implications for enhancing road safety
and transportation efficiency.
Keywords: Autonomous vehicle, End-to-end learning, CNN, Semantic segmentation
Table of Contents
CHAPTER I: INTRODUCTION
1.1. Introduction
1.2. Problem Description
1.3. Motivation
1.4. Objectives
1.5. Applications

CHAPTER II: LITERATURE REVIEW
2.1. Literature Review

CHAPTER III: TERMINOLOGY
3.1. Autonomous vehicle definition
3.2. End-to-end learning
3.3. CNN
3.4. Segmentation
3.4.1. FCN-8
3.4.2. UNET
3.5. Simulator
3.5.1. AirSim
3.5.1.1. AirSim Block Diagram
3.5.1.2. Why AirSim
3.5.1.3. Graphical User Interface

CHAPTER IV: METHODOLOGY
4.1. Block Diagram
4.2. Methodology
4.2.1. Data acquisition
4.2.2. Data pre-processing
4.2.2.1. Augmentation
4.2.3. Segmentation
4.2.4. Model Architectures
4.2.4.1. Nvidia pilotnet
4.2.4.2. Model A
4.2.4.3. Model B
4.2.5. Training and testing
4.2.6. Loss function
4.2.7. Evaluation
4.3. Challenges

CHAPTER V: RESULTS AND DISCUSSION
5.1. Results and discussion
5.1.1. Actual vs predicted steering angle
5.1.2. Time comparison
5.1.3. Autonomy Comparison
5.1.4. FPS vs Maximum Speed
5.1.5. Number of data vs Autonomy

CHAPTER VI: CONCLUSION
6.1. Conclusion
6.2. Future work

References
Table of Figures
Figure 1: An autonomous vehicle with different sensors and cameras
Figure 2: Traditional vs end-to-end approach
Figure 3: Basic CNN architecture
Figure 4: Road Segmentation
Figure 5: FCN8 architecture
Figure 6: UNET architecture
Figure 7: Preview of the AirSim simulator
Figure 8: AirSim block diagram
Figure 9: Graphical user interface
Figure 10: Block diagram of the experiment
Figure 11: Input image after crop and resize
Figure 12: Random brightness
Figure 13: Random shift
Figure 14: Random shadow
Figure 15: Random flip
Figure 16: FCN8 architecture
Figure 17: Nvidia pilotnet architecture
Figure 18: Proposed model A
Figure 19: Proposed model B
Figure 20: Challenges of autonomous vehicle
Figure 21: Steering angle (actual vs prediction) for model A
Figure 22: Steering angle (actual vs prediction) for model B
Figure 23: FPS comparison of the baseline model and models A and B
Figure 24: Autonomy comparison between the models over a 1.5-hour time span, for roads with and without obstacles
Figure 25: FPS vs maximum speed

Index of Tables

Table 1: Hardware specification
CHAPTER I: INTRODUCTION
1.1. Introduction
The emergence of Deep Convolutional Neural Networks (CNNs) has ushered in a
transformative era in the field of autonomous driving. With the fusion of cutting-edge
computer vision, sensor technology, and artificial intelligence, deep CNN-based self-driving cars have made remarkable strides in recent years. [1] These vehicles represent
the pinnacle of autonomous mobility, offering the promise of safer, more efficient, and
sustainable transportation solutions.
The core premise of deep CNN-based self-driving car research is to replicate and enhance
the human ability to perceive and navigate the complex dynamics of the road
environment. [2] These vehicles leverage deep neural networks to interpret vast streams
of sensor data, including images, LiDAR, radar, and GPS information, in real-time. By
mimicking human visual perception and cognitive decision-making processes, these
intelligent machines can detect obstacles, interpret road signs, and make split-second
decisions to ensure the safety of passengers, pedestrians, and other road users.
An autonomous vehicle is a vehicle that can operate itself and perform the required driving functions without any human intervention. It is also capable of sensing its surroundings and making decisions based on them. Autonomous vehicles have the potential to transform the way we travel and commute. Successful development of an autonomous vehicle requires precise testing and evaluation.
Simulation is a critical tool for testing the behavior of autonomous vehicles in various
traffic scenarios. The use of neural networks has shown significant promise in accurately
modeling traffic dynamics. [3]
While the concept of self-driving cars holds great promise, it is not without its challenges.
Driving is an inherently complex task, and the intricacies of road regulations and
unpredictable scenarios make navigating busy roads a daunting challenge, even for
human drivers. [4] Therefore, achieving fully autonomous vehicles cannot rely solely on a
single deep learning model. Instead, it necessitates the implementation of a sophisticated
combination of separately trained neural network models. In this research landscape, I
delve into the intricacies of self-driving car systems and explore the challenges and
opportunities presented by these vehicles, ranging from robust perception and sensor
fusion to decision-making algorithms and regulatory considerations. My objective is to
establish a comprehensive pipeline incorporating carefully selected deep learning
techniques that collectively enhance safety, efficiency, and accessibility in transportation.
This vision aims to create a future where autonomous vehicles seamlessly coexist with
traditional human-driven cars. [5]
This report investigates the use of neural networks to simulate the behavior of autonomous vehicles. The primary objective is the development and implementation of a neural network-based model, which will be used to test the behavior of an autonomous vehicle in complex traffic scenarios. I will investigate the performance of the neural network model in various scenarios. The results of this study will provide valuable insights into the effectiveness of using neural networks for autonomous vehicles. The proposed study has the potential to contribute to the development of safe and efficient autonomous vehicle technologies, which can ultimately benefit society by reducing traffic accidents, improving transportation efficiency, and reducing carbon emissions. [6]
1.2. Problem Description
Conventional autonomous driving systems adopt a modular deployment strategy, wherein
each functionality, such as perception, prediction, and planning, is individually developed
and integrated into the onboard vehicle. The planning or control module, responsible for
generating steering and acceleration outputs, plays a crucial role in determining the
driving experience. The most common approach for planning in modular pipelines
involves using sophisticated rule-based designs, which are often ineffective in addressing
the vast number of situations that occur while driving. Therefore, there is a growing trend
to leverage large-scale data and to use learning-based planning as a viable alternative.
We define end-to-end autonomous driving systems as fully differentiable programs that
take raw camera data as input and produce control actions as output.
1.3. Motivation
In the classical pipeline, each model serves as a standalone component and corresponds to a
specific task (e.g., lane detection). Such a design is beneficial in terms of interpretability,
verifiability, and ease of debugging. [7] However, since the optimization goal across
modules is different, with detection in perception pursuing mean average precision (mAP)
while planning aiming for driving safety and comfort, the entire system may not be
aligned with a unified target. Errors from each module, as the sequential procedure
proceeds, could be compounded and result in an information loss for the driving system.
Moreover, the multi-task, multi-model deployment may increase the computational
burden and potentially lead to sub-optimal use of computation.
In contrast to its classical counterpart, an end-to-end autonomous system offers several
advantages. [8]
(a) The most apparent merit is its simplicity in combining perception, prediction, and
planning into a single model that can be jointly trained.
(b) The whole system, including its intermediate representations, is optimized
towards the ultimate task.
(c) Shared backbones increase computational efficiency.
(d) Data-driven optimization has the potential to offer emergent abilities that improve
the system by simply scaling training resources.
The development of self-driving cars represents one of the most transformative and
promising advancements in the field of transportation. Autonomous vehicles have the
potential to revolutionize our daily lives, making transportation safer, more efficient, and
environmentally friendly. [4] At the heart of this groundbreaking technology lies
computer vision, a field that has seen unprecedented growth and innovation in recent
years. This paper is motivated by the profound impact that computer vision techniques are
having on the realization of autonomous driving and the myriad challenges and
opportunities they present.
1. Safety and Efficiency: Human error is a leading cause of traffic accidents
worldwide. Self-driving cars, empowered by computer vision, offer the promise of
significantly reducing accidents by providing vehicles with the ability to perceive
their surroundings, make decisions, and execute maneuvers with precision. By
eliminating the risk associated with distracted or impaired driving, self-driving
cars have the potential to save countless lives and reduce the economic toll of
accidents.
2. Accessibility and Mobility: Autonomous vehicles have the potential to
revolutionize transportation for individuals with disabilities and the elderly.
3. Environmental Impact: Autonomous driving can contribute to a more sustainable
future by optimizing traffic flow, reducing congestion, and minimizing fuel
consumption.
4. Technological Advancements: The rapid evolution of computer vision techniques,
driven by deep learning and artificial intelligence, has opened new possibilities for
autonomous vehicles. My goal is to explore the latest developments in computer
vision and their application to self-driving cars, shedding light on cutting-edge
research and technological innovations.
5. Challenges and Ethical Considerations: While the potential benefits of self-driving cars are vast, there are numerous challenges to overcome, including
ethical dilemmas, regulatory frameworks, and cybersecurity concerns.
1.4. Objectives
The objectives of simulating an autonomous vehicle using neural networks are:
1. To develop an optimized deep CNN-based model to predict the steering angle for a self-driving car.
2. To evaluate the proposed models and compare them with the Nvidia PilotNet model.
3. To visualize the performance of the models using the AirSim simulator.
1.5. Applications
Applications of autonomous vehicles include the following:
1. Enhanced public transportation systems.
2. Improved ride-sharing services.
3. Increased accessibility for all.
4. Efficient emergency response.
5. Streamlined last-mile delivery.
6. Automation in agriculture.
7. Military reconnaissance and logistics.
8. Impact on urban planning and infrastructure.
CHAPTER II: LITERATURE REVIEW
2.1. Literature Review
Torabi et al. [9] introduced the term behavior cloning. The process of reconstructing a human subcognitive skill through a computer program is referred to as behavioral cloning. Here, the actions of a human performing the skill are recorded along with the situation that gave rise to each action. Human skills such as driving can be reconstructed from recorded actions, maintained in a structured way, by using learning algorithms on the resulting demonstration traces to reproduce the skilled behavior.
Bojarski et al. [10] started their research work at NVIDIA on self-driving cars, inspired by the ALVINN and DARPA projects. The motivation for their work was to create an end-to-end model that enables steering of the car without manual intervention [1], trained on recordings of human driving behavior along with the steering angle at every second.

Based on NVIDIA's proposed PilotNet architecture (as shown in Fig. 17), Viswanath et al. from Texas Instruments released JacintoNet, i.e., an end-to-end neural network for embedded vehicles such as tiny humanoid robots.
Xu et al. [11] trained a neural network for predicting discrete or continuous actions, also based on camera inputs. Codevilla et al. also trained a network using camera inputs, conditioned on high-level commands, to output steering and acceleration. This is the first model that does not just follow the lane but also incorporates high-level commands. They also evaluated their approach in realistic simulations of urban driving and on a 1/5-scale robotic truck.
Kuefler et al. [12] used Generative Adversarial Imitation Learning (GAIL) with simple affordance-style features as inputs to overcome the cascading errors typically present in behavior-cloned policies, making them more robust to perturbations.
Hecker et al. [13] used 360-degree camera inputs, instead of a single front-facing camera, together with a desired route planner to predict steering and speed.
Müller et al. [5] trained a system in simulation using CARLA, learning a driving policy from a scene segmentation network that outputs high-level control, thereby enabling transfer to the real world using a different segmentation network trained on real data.
Bansal et al. [14], in research at Waymo (Google), present another model called ChauffeurNet, which outputs a driving trajectory that is consumed by a controller translating it into steering and acceleration.
Shafiullah et al. [15] used a transformer-based approach for behavior cloning. The effectiveness of transformer models has already been demonstrated in natural language processing engines such as ChatGPT. The authors proposed a model called BeT (Behavior Transformer); they claim that this model, harnessing the power of transformers, can handle complex tasks in a human-like manner.
CHAPTER III: TERMINOLOGY
3.1. Autonomous vehicle definition
Autonomous vehicles, also known as self-driving cars or driverless cars, are vehicles
equipped with advanced technologies that enable them to navigate and operate without
human intervention. These vehicles utilize a combination of sensors, cameras, radar, lidar,
GPS, and sophisticated algorithms to perceive their surroundings, interpret sensory
inputs, and make decisions to navigate safely to their destination. [16]
Figure 1: An autonomous vehicle with different sensors and cameras
Autonomous vehicles have the potential to revolutionize transportation by offering
numerous benefits, including increased road safety, reduced traffic congestion, improved
energy efficiency, and enhanced mobility for individuals with disabilities or limited
access to transportation. They hold promise for transforming various industries, including
transportation, logistics, and urban planning.
3.2. End-to-end learning
In end-to-end learning, the neural network is trained using a large dataset of input-output
pairs, where the inputs are raw sensor data collected from the vehicle's environment, and
the outputs are corresponding control commands executed by the vehicle's actuators. The
network learns to extract relevant features from the raw input data and generate
appropriate control commands without explicit feature engineering or manual rule-based
programming. [17]
Figure 2: Traditional vs end-to-end approach
End-to-end learning offers several potential advantages for autonomous vehicles,
including simplicity, scalability, and the ability to adapt to diverse driving conditions. By
directly learning driving policies from data, end-to-end approaches have the potential to
capture complex driving behaviors and adapt to novel situations that may not have been
explicitly programmed.
3.3. CNN
Convolutional Neural Networks (CNNs) are a class of deep neural networks that have
revolutionized the field of computer vision. Inspired by the organization of the visual
cortex in animals, CNNs are particularly well-suited for tasks such as image recognition,
object detection, and image classification.
CNNs consist of multiple layers, including convolutional layers, pooling layers, and fully
connected layers. The convolutional layers apply a set of learnable filters to the input
image, detecting features such as edges, textures, and shapes. Pooling layers reduce the
spatial dimensions of the feature maps, while preserving important information. Fully
connected layers combine the features learned by the previous layers to make predictions
about the input data. [18]
One of the key advantages of CNNs is their ability to automatically learn hierarchical
representations of features directly from raw input data. This hierarchical feature learning
enables CNNs to achieve superior performance on a wide range of visual recognition
tasks compared to traditional machine learning algorithms.
CNNs have been widely adopted in various applications, including image classification,
object detection, facial recognition, medical image analysis, and autonomous vehicles.
Their success can be attributed to their ability to automatically learn relevant features
from large amounts of data, their scalability to handle complex and high-dimensional
inputs, and their ability to generalize well to new, unseen data.
Figure 3: Basic CNN architecture
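To make these layer types concrete, the following is a minimal CNN sketch in Keras (the framework is an assumption, since the thesis does not name its library); the filter counts and the 66×200 input size are illustrative rather than the thesis architecture:

```python
from tensorflow.keras import layers, models

# A minimal CNN: convolution + pooling layers followed by fully
# connected layers. Sizes are illustrative, not the thesis model.
model = models.Sequential([
    layers.Input(shape=(66, 200, 3)),              # RGB input image
    layers.Conv2D(24, (5, 5), activation='relu'),  # low-level features (edges, textures)
    layers.MaxPooling2D((2, 2)),                   # reduce spatial dimensions
    layers.Conv2D(36, (5, 5), activation='relu'),  # higher-level features (shapes)
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(100, activation='relu'),          # combine learned features
    layers.Dense(1),                               # regression output (e.g., steering angle)
])
model.summary()
```

Stacking convolution and pooling in this way builds the hierarchical feature representation described above, ending here in a single regression output.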
3.4. Segmentation
In the context of autonomous vehicles, image segmentation plays a crucial role in
understanding the vehicle's surroundings and making informed decisions for navigation
and control. [19] Here's how segmentation is utilized:
Figure 4: Road Segmentation
1. Road and Lane Segmentation: Segmenting the road and lane markings from the
surrounding environment is essential for autonomous vehicles to navigate safely
within lanes. This allows the vehicle to stay centered in its lane and make
appropriate adjustments for turns, lane changes, and other maneuvers.
2. Obstacle Detection and Segmentation: Segmentation techniques are used to
identify and delineate obstacles such as vehicles, pedestrians, cyclists, and other
objects in the vehicle's path. By segmenting obstacles from the background,
autonomous vehicles can assess potential collision risks and plan avoidance
strategies accordingly.
3. Semantic Understanding of the Environment: Semantic segmentation provides
a detailed understanding of the scene by segmenting different elements such as
roads, sidewalks, buildings, vegetation, and other infrastructure. This semantic
understanding enhances the vehicle's situational awareness and enables more
informed decision-making.
4. Localization and Mapping: Segmentation aids in generating high-definition
maps of the environment by accurately delineating various features and
landmarks. These maps, often referred to as semantic maps, facilitate precise
localization and navigation of autonomous vehicles by providing detailed
information about the surroundings.
Overall, image segmentation is a critical component of perception systems in autonomous
vehicles, enabling them to interpret and understand the visual information from onboard
sensors and navigate safely and effectively in complex real-world environments.
3.4.1. FCN-8
FCN-8, introduced by Long et al., is renowned for its capability to perform end-to-end
pixel-wise classification. It replaces the fully connected layers of traditional CNNs with
convolutional layers, enabling it to accept inputs of arbitrary sizes and produce dense
predictions. FCN-8 utilizes skip connections from earlier layers to preserve spatial
information, which helps in generating detailed segmentation maps. Additionally, it
incorporates transposed convolutions to upsample feature maps and restore the resolution
of the output. While FCN-8 is efficient and flexible, it may struggle with capturing fine
details due to downsampling operations.
Figure 5: FCN8 architecture
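The following is a hedged sketch of the FCN-8 fusion logic in Keras, assuming a VGG16 backbone (the standard choice in Long et al.'s paper); the layer names follow Keras's bundled VGG16, and the class count is a placeholder:

```python
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16

n_classes = 2  # placeholder: e.g., road vs. background
base = VGG16(include_top=False, weights=None, input_shape=(224, 224, 3))
pool3 = base.get_layer('block3_pool').output   # 1/8 of input resolution
pool4 = base.get_layer('block4_pool').output   # 1/16
pool5 = base.get_layer('block5_pool').output   # 1/32

# 1x1 convolutions replace fully connected layers (per-pixel class scores)
s5 = layers.Conv2D(n_classes, 1)(pool5)
s4 = layers.Conv2D(n_classes, 1)(pool4)
s3 = layers.Conv2D(n_classes, 1)(pool3)

# Transposed convolutions upsample; skip connections fuse earlier detail
up5 = layers.Conv2DTranspose(n_classes, 4, strides=2, padding='same')(s5)
f4 = layers.Add()([up5, s4])
up4 = layers.Conv2DTranspose(n_classes, 4, strides=2, padding='same')(f4)
f3 = layers.Add()([up4, s3])

# Final 8x upsampling back to input resolution (the "8" in FCN-8)
out = layers.Conv2DTranspose(n_classes, 16, strides=8, padding='same',
                             activation='softmax')(f3)
fcn8 = Model(base.input, out)
```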
3.4.2. UNET
On the other hand, U-Net, proposed by Ronneberger et al., is characterized by its
symmetric encoder-decoder architecture with skip connections. The contracting path of
U-Net resembles a typical CNN encoder, gradually reducing spatial dimensions and
extracting high-level features. However, unlike FCN-8, U-Net's expansive path employs
transposed convolutions for upsampling while also integrating skip connections from the
contracting path. These skip connections facilitate the fusion of low-level and high-level
features, aiding in precise localization and segmentation of objects. U-Net's architecture is
particularly advantageous for medical image segmentation tasks and scenarios where
detailed localization is crucial. Despite its effectiveness, U-Net may be computationally
expensive due to the large number of parameters, especially in deeper architectures.
Figure 6: UNET architecture
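For contrast, a toy two-level U-Net in the same style shows the symmetric encoder-decoder with concatenation-based skip connections (the original network uses four levels and many more filters; this is a reduced sketch):

```python
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    # Two 3x3 convolutions per level, as in the original U-Net
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    return layers.Conv2D(filters, 3, padding='same', activation='relu')(x)

inp = layers.Input(shape=(128, 128, 3))
# Contracting path: extract features while downsampling
c1 = conv_block(inp, 32); p1 = layers.MaxPooling2D()(c1)
c2 = conv_block(p1, 64);  p2 = layers.MaxPooling2D()(c2)
b = conv_block(p2, 128)   # bottleneck
# Expansive path: upsample and fuse skip connections by concatenation
u2 = layers.Conv2DTranspose(64, 2, strides=2, padding='same')(b)
c3 = conv_block(layers.Concatenate()([u2, c2]), 64)
u1 = layers.Conv2DTranspose(32, 2, strides=2, padding='same')(c3)
c4 = conv_block(layers.Concatenate()([u1, c1]), 32)
out = layers.Conv2D(1, 1, activation='sigmoid')(c4)  # binary mask
unet = Model(inp, out)
```

Note the design difference from FCN-8: U-Net concatenates encoder features, letting the decoder learn how to combine them, whereas FCN-8 adds class-score maps.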
3.5. Simulator
Simulators play a crucial role in various fields, including autonomous vehicles, aerospace,
robotics, and healthcare, for several reasons:
1. Cost-Effective Testing: Simulators provide a cost-effective alternative to real-world testing. They allow researchers and developers to conduct extensive testing
and validation of systems and algorithms without the need for expensive physical
prototypes or equipment. This significantly reduces development costs and risks
associated with real-world testing.
2. Safety: Simulators offer a safe and controlled environment for testing complex
systems and algorithms, especially those designed for critical applications such as
autonomous vehicles and medical devices. Simulated environments enable
researchers to identify and address potential safety issues without putting human
lives or valuable resources at risk.
3. Reproducibility: Simulators enable researchers to reproduce and control specific
scenarios and conditions with precision. This reproducibility ensures consistency
in testing procedures and results, making it easier to evaluate and compare
different approaches, algorithms, and systems.
4. Scalability: Simulators allow researchers to scale experiments and simulations to
a larger scope and complexity than would be feasible in the real world. This
scalability enables the evaluation of systems and algorithms under a wide range of
scenarios, including rare or extreme conditions that may be difficult to encounter
in reality.
5. Accessibility: Simulators make experimental setups and testing environments
accessible to a broader audience of researchers, developers, and enthusiasts. They
democratize access to advanced technologies and enable collaboration and
knowledge sharing across disciplines and geographic locations. [20]
In summary, simulators are necessary tools for research, development, and testing in
various fields due to their cost-effectiveness, safety, reproducibility, scalability, iterative
development capabilities, and accessibility. They enable researchers and developers to
accelerate innovation, mitigate risks, and advance the state-of-the-art in their respective
domains.
3.5.1. AirSim
Figure 7: Preview of the AirSim simulator
Microsoft AirSim is an open-source, cross-platform simulator for autonomous vehicles
(AVs), robotics research, and artificial intelligence (AI) development. Developed by
Microsoft Research, AirSim provides a realistic simulation environment for training and
testing autonomous systems in various scenarios without the need for physical prototypes
or real-world testing.
3.5.1.1. AirSim Block Diagram
Figure 8: AirSim block diagram
The communication between the interpreter and AirSim happens via a TCP port. AirSim sends a stream of images of the desired dimensions to the interpreter; the interpreter processes the images, computes steering commands, and sends them back to the simulator. The simulator then shows a video of the vehicle's performance based on the provided commands. Autonomy is computed by observing the output of AirSim.
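The loop below is a sketch of this exchange using AirSim's official Python client (`pip install airsim`); `predict_steering` stands in for the trained model's inference call, and the throttle value is an arbitrary assumption:

```python
import numpy as np
import airsim

client = airsim.CarClient()      # connects to the simulator over TCP
client.confirmConnection()
client.enableApiControl(True)
controls = airsim.CarControls()

while True:
    # Request one uncompressed frame from the front camera ("0")
    response = client.simGetImages([
        airsim.ImageRequest("0", airsim.ImageType.Scene, False, False)
    ])[0]
    frame = np.frombuffer(response.image_data_uint8, dtype=np.uint8)
    frame = frame.reshape(response.height, response.width, 3)

    # predict_steering is a placeholder for the trained model's inference
    controls.steering = float(predict_steering(frame))
    controls.throttle = 0.5
    client.setCarControls(controls)  # send the command back to AirSim
```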
3.5.1.2. Why AirSim
Key features of Microsoft AirSim include:
1. High-Fidelity Simulation: AirSim offers a high-fidelity physics engine and
realistic graphics, enabling researchers and developers to create immersive and
realistic simulation environments for AVs and robotics applications.
2. Support for Multiple Platforms: AirSim is designed to be compatible with
multiple platforms, including Windows, Linux, and macOS, making it accessible
to a wide range of developers and researchers.
3. Extensibility and Customization: AirSim provides a modular architecture that
allows users to extend and customize the simulator to suit their specific research
needs. Users can integrate custom vehicle models, sensors, environments, and
algorithms into AirSim for experimentation and testing.
4. Built-in Sensor Simulation: AirSim supports various sensors commonly used in
AVs and robotics, including cameras, lidar, radar, and GPS. These sensors can
generate realistic data streams that mimic real-world sensor outputs, enabling
researchers to develop and validate perception and control algorithms in a
simulated environment.
5. Integration with AI Frameworks: AirSim seamlessly integrates with popular AI
frameworks such as TensorFlow and PyTorch, allowing researchers to leverage
state-of-the-art machine learning algorithms for perception, planning, and control
tasks within the simulator.
Overall, Microsoft AirSim provides a powerful and flexible platform for researchers and
developers to accelerate innovation in autonomous systems by enabling rapid
prototyping, testing, and validation in a simulated environment. Its realistic simulation
capabilities and extensibility make it an invaluable tool for advancing the field of AVs and
robotics. [21]
3.5.1.3. Graphical User Interface
Figure 9: Graphical user interface
A graphical user interface was designed to visualize the steering angle and the speed curve of the vehicle. It also shows the FPS and the numerical value of the steering angle. It can be operated in both manual and autonomous mode; in manual mode, it works as a data collector. Using this graphical user interface (GUI), the autonomy and the maximum speed at a given FPS can be calculated.
CHAPTER IV: METHODOLOGY
4.1. Block Diagram
Figure 10: Block diagram of the experiment
4.2. Methodology
4.2.1. Data acquisition
Utilizing the AirSim simulator, I conducted data collection procedures to assemble
comprehensive datasets for training and validating autonomous vehicle algorithms. This
involved capturing images from multiple viewpoints, including the center, left, and right
perspectives, along with recording the corresponding steering angles. By incorporating
data from various viewpoints, this approach aims to enhance the robustness and
generalization capabilities of the autonomous vehicle system, thereby facilitating more
accurate and reliable navigation in diverse real-world scenarios.
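A common way to exploit the side viewpoints is to log each of the three images with a corrected steering label, so that the left and right views teach the model to recenter. The sketch below assumes a fixed correction of ±0.2, a typical but dataset-dependent value that the thesis does not specify:

```python
import csv

CORRECTION = 0.2  # assumed offset per side camera; tuned in practice

def log_sample(writer, center_img, left_img, right_img, steering):
    # Each viewpoint becomes its own training sample; the side views
    # get a steering label that would steer the car back to center.
    writer.writerow([center_img, steering])
    writer.writerow([left_img, steering + CORRECTION])
    writer.writerow([right_img, steering - CORRECTION])

with open('driving_log.csv', 'w', newline='') as f:
    log_sample(csv.writer(f), 'c_0001.png', 'l_0001.png', 'r_0001.png', 0.05)
```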
4.2.2. Data pre-processing
In the data pre-processing phase, meticulous attention was given to maintaining uniformity across all collected images. Each image underwent cropping and resizing to standard dimensions of 66×200 pixels, ensuring consistency in the dataset.

Figure 11: Input image after crop and resize

Subsequently, four types of augmentation techniques were applied: random brightness adjustments, random shifts, random shadows, and random flips. These augmentation strategies were implemented to enhance the robustness and variability of the dataset, thereby facilitating more effective training of the autonomous vehicle model. [23] Concurrently, the corresponding steering angles for each augmented image were meticulously logged in a CSV file, ensuring precise alignment between the image data and steering control inputs. This systematic pre-processing methodology ensured the preparation of a high-quality dataset conducive to the training process.
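A sketch of the crop-and-resize step with OpenCV follows; only the 66×200 output size comes from the thesis, while the crop boundaries (trimming sky and hood) are assumptions:

```python
import cv2

def preprocess(img):
    # Trim sky (top) and vehicle hood (bottom); boundaries are assumptions
    img = img[60:-25, :, :]
    # cv2.resize takes (width, height), so the 66x200 target is (200, 66)
    return cv2.resize(img, (200, 66))
```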
4.2.2.1. Augmentation
Random Brightness:
Figure 12: Random brightness
Random shift:
Figure 13: Random shift
Random shadow:
Figure 14: Random shadow
Random flip:
Figure 15: Random flip
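Hedged sketches of these four augmentations are given below using OpenCV and NumPy; the random ranges and the per-pixel steering correction are assumptions, since the thesis does not state them:

```python
import cv2
import numpy as np

def random_brightness(img):
    # Scale the V channel in HSV space by a random factor
    hsv = cv2.cvtColor(img, cv2.COLOR_RGB2HSV).astype(np.float32)
    hsv[:, :, 2] = np.clip(hsv[:, :, 2] * np.random.uniform(0.4, 1.2), 0, 255)
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2RGB)

def random_shift(img, angle, angle_per_px=0.004):
    # Translate horizontally and correct the steering label accordingly
    tx = np.random.randint(-40, 41)
    M = np.float32([[1, 0, tx], [0, 1, 0]])
    shifted = cv2.warpAffine(img, M, (img.shape[1], img.shape[0]))
    return shifted, angle + tx * angle_per_px

def random_shadow(img):
    # Darken a random vertical band to mimic a cast shadow
    x1, x2 = sorted(np.random.randint(0, img.shape[1], 2))
    out = img.copy()
    out[:, x1:x2] = (out[:, x1:x2] * 0.5).astype(img.dtype)
    return out

def random_flip(img, angle):
    # Mirror the image horizontally and negate the steering angle
    if np.random.rand() < 0.5:
        return cv2.flip(img, 1), -angle
    return img, angle
```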
4.2.3. Segmentation
Semantic segmentation, achieved through techniques like Fully Convolutional Networks (FCNs), partitions an image into meaningful regions, enhancing scene understanding and object localization. This improves perception capabilities, robustness to environmental variations, and support for end-to-end learning. FCN8, chosen for its speed, enables real-time processing, which is crucial for autonomous driving.
Figure 16: FCN8 architecture
While UNet offers superior performance, FCN8's efficiency ensures the model operates
swiftly, crucial for timely decision-making in dynamic environments. Thus, FCN8
balances performance and speed, ideal for real-time applications like autonomous driving.
These advancements contribute to the development of more effective and reliable
autonomous driving systems capable of navigating safely in complex real-world
environments. [24]
4.2.4. Model Architectures
4.2.4.1. Nvidia pilotnet
Figure 17: Nvidia pilotnet architecture
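For reference, the baseline PilotNet from Bojarski et al. [10] can be written in Keras as below; the layer sizes follow the paper, while the ELU activations follow common reimplementations (the paper does not fix the nonlinearity):

```python
from tensorflow.keras import layers, models

pilotnet = models.Sequential([
    layers.Input(shape=(66, 200, 3)),
    layers.Lambda(lambda x: x / 127.5 - 1.0),   # normalize pixels to [-1, 1]
    layers.Conv2D(24, (5, 5), strides=2, activation='elu'),
    layers.Conv2D(36, (5, 5), strides=2, activation='elu'),
    layers.Conv2D(48, (5, 5), strides=2, activation='elu'),
    layers.Conv2D(64, (3, 3), activation='elu'),
    layers.Conv2D(64, (3, 3), activation='elu'),
    layers.Flatten(),
    layers.Dense(100, activation='elu'),
    layers.Dense(50, activation='elu'),
    layers.Dense(10, activation='elu'),
    layers.Dense(1),                            # predicted steering angle
])
```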
4.2.4.2. Model A
Figure 18: Proposed model A
4.2.4.3. Model B
Figure 19: Proposed model B
4.2.5. Training and testing
The training process involves preparing and augmenting datasets, training the model using optimization algorithms, and fine-tuning hyper-parameters for optimal performance. Validation assesses the model's performance on a separate dataset, while testing evaluates its generalization to real-world scenarios. Finally, deployment in the target environment requires continuous monitoring and updating to maintain effectiveness.
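A minimal sketch of the training step, reusing the `pilotnet` model sketched in Section 4.2.4.1 with the MSE loss defined in the next subsection; the optimizer, learning rate, batch size, epoch count, and the random stand-in data are all illustrative assumptions:

```python
import numpy as np
from tensorflow.keras.optimizers import Adam

# Stand-in data; in this work the images and angles come from AirSim logs
X = np.random.rand(256, 66, 200, 3).astype(np.float32)
y = np.random.uniform(-1, 1, size=(256, 1)).astype(np.float32)

pilotnet.compile(optimizer=Adam(learning_rate=1e-4), loss='mse')
pilotnet.fit(X, y, validation_split=0.2, batch_size=64, epochs=5)
```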
4.2.6. Loss function
Mean Square Error (MSE) is a common loss function used in regression tasks, including
machine learning and neural network training. It measures the average squared difference
between the predicted values and the actual values in a dataset. [12]
In the context of training models for tasks such as regression, MSE calculates the average
squared difference between the predicted output and the ground truth labels. It penalizes
larger errors more heavily than smaller ones, making it suitable for tasks where precise
prediction accuracy is important.
Mathematically, MSE is calculated by taking the average of the squared differences
between predicted and actual values:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - y_i'\right)^2$$

where $n$ is the number of samples, $y_i$ represents the actual value, and $y_i'$ represents the predicted value.
Overall, MSE is a useful loss function for training regression models, providing a
quantitative measure of the model's performance by quantifying the average squared
difference between predicted and actual values.
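A small numeric example of the calculation (the values are illustrative):

```python
import numpy as np

y_true = np.array([0.10, -0.25, 0.40])  # actual steering angles
y_pred = np.array([0.12, -0.20, 0.35])  # model predictions

mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # (0.02**2 + 0.05**2 + 0.05**2) / 3 = 0.0018
```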
4.2.7. Evaluation
The percentage of the time the network could drive the car without human intervention is
defined as autonomy. The metric is determined by counting simulated human
interventions. These interventions occur when the simulated vehicle departs from the
center line by more than one meter. We assume that in real life an actual intervention
would require a total of six seconds: this is the time required for a human to retake control
of the vehicle, re-center it, and then restart the self-steering mode. We calculate the
percentage autonomy by counting the number of interventions, multiplying by 6 seconds,
dividing by the elapsed time of the simulated test, and then subtracting the result from 1.
[10]
$$\text{autonomy} = \left(1 - \frac{(\text{number of interventions}) \times 6\ \text{seconds}}{\text{elapsed time [seconds]}}\right) \times 100$$
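A worked example of this metric: a 1.5-hour (5400 s) run with 18 interventions scores an autonomy of (1 - 18 × 6 / 5400) × 100 = 98%.

```python
def autonomy(num_interventions, elapsed_seconds):
    # Each intervention is charged 6 seconds, following Bojarski et al. [10]
    return (1 - num_interventions * 6 / elapsed_seconds) * 100

print(autonomy(18, 5400))  # 98.0
```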
4.3. Challenges
Driving itself is a complex task that takes years to master, even for humans, so there are many challenges to overcome in autonomous vehicle implementation. One challenge that stands out is the stability issue. If the vehicle drifts from its expected trajectory, it receives slightly unfamiliar images as input. This leads to an error in the predicted steering angle, which in turn pushes the vehicle toward an even more unfamiliar region, and the cycle repeats.

As a result, the error builds up and the vehicle shows unstable behavior; a visible shaking can be noticed in such cases. In this research, I propose two possible solutions to the problem.
Figure 20: Challenges of autonomous vehicle
1. While training, three cameras were used to cover a greater region of the scene, with their corresponding steering angles adjusted properly with respect to the center image. This step ensures more coverage of the surroundings.
2. Taking the moving average of the predictions from the previous few frames together with the current frame, as sketched below. This step minimizes the shaking of the vehicle.
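A minimal sketch of the second remedy, averaging the current prediction with the previous few frames; the window size of 5 is an assumed value:

```python
from collections import deque

class SteeringSmoother:
    """Moving average over the last few steering predictions."""
    def __init__(self, window=5):        # window size is an assumption
        self.history = deque(maxlen=window)

    def smooth(self, angle):
        self.history.append(angle)
        return sum(self.history) / len(self.history)

smoother = SteeringSmoother()
for raw in [0.10, 0.30, -0.05, 0.12]:    # jittery raw predictions
    print(smoother.smooth(raw))           # damped values sent to the car
```

The averaging trades a small amount of responsiveness for stability, which is acceptable since consecutive frames arrive many times per second.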
CHAPTER V: RESULTS AND DISCUSSION
5.1. Results and discussion
The results of the steering angle prediction model demonstrate its effectiveness in
accurately estimating steering angles based on input images captured from the center, left,
and right viewpoints. Through careful training and validation, the model showcases a
strong ability to infer steering commands, crucial for guiding autonomous vehicles along
desired trajectories.
5.1.1. Actual vs predicted steering angle
Figure 21: Steering angle (actual vs prediction) for model A
The graph depicting the comparison between actual and predicted steering angles by
Model A reveals a remarkable resemblance between the two. Notably, the model's
predictions closely mirror the actual steering angles, capturing abrupt changes in direction
almost instantaneously. This close alignment between predicted and actual values
underscores the model's efficacy in accurately estimating steering commands, a critical
aspect for ensuring precise control of autonomous vehicles. The rapid response of the
model to changes in direction further highlights its capability to adapt to dynamic driving
scenarios in real-time. Overall, the graph illustrates the impressive performance of Model
A in steering angle prediction, affirming its potential for enhancing the autonomy and
safety of vehicle navigation systems.
Figure 22: Steering angle (actual vs prediction) for model B
The graphical representation of steering angle comparisons between actual and predicted
values for Model B showcases a significant resemblance between the two datasets.
Notably, Model B's predictions closely mirror the actual steering angles, capturing subtle
and abrupt changes in direction with impressive accuracy. This close correspondence
indicates the model's proficiency in accurately estimating steering commands, essential
for ensuring precise vehicle control in diverse driving conditions. [25] Moreover, the
swift response of Model B to changes in direction underscores its adaptability and
responsiveness, further enhancing its suitability for real-world applications.
5.1.2. Time comparison
The graph depicting frames per second (FPS) for three different models—Nvidia, Model
A, and Model B—holds significant importance in the context of autonomous vehicles.
Real-time processing is crucial for autonomous vehicles to make rapid decisions and
navigate safely in dynamic environments.
Figure 23: FPS comparison of the baseline model and models A and B
While Nvidia exhibits the highest FPS among the models, both Model A and Model B
demonstrate slightly lower FPS. However, these FPS values remain within a tolerable
range, indicating that they are still capable of processing frames at a sufficiently high rate
for real-time operation.
The slight sacrifice in FPS for Model A and Model B is justified by their ability to
provide greater autonomy and make accurate predictions, such as steering angle
estimation or semantic segmentation. The trade-off between FPS and autonomy is
therefore worth considering, as it ensures the models can effectively analyze and respond
to the surrounding environment in real-time, ultimately enhancing the safety and
efficiency of autonomous vehicle navigation systems.
It's important to note that FPS values are heavily dependent on the specifications of the
device used for testing. The measurements presented in the graph were obtained with the specific device specifications outlined in the following table.
Table 1: Hardware specification
CPU: Intel Core i5 11th Gen 11300H
GPU: Nvidia RTX 3050
RAM: 16 GB
It's worth emphasizing that FPS performance can vary significantly based on factors such
as CPU and GPU processing power, memory capacity, and optimization techniques
employed. Therefore, upgrading the device specifications, such as utilizing a more
powerful CPU or GPU, increasing memory capacity, or optimizing software
configurations, can lead to significant improvements in FPS for all models. [26]
By investing in hardware upgrades or utilizing more advanced computing resources, such
as high-performance GPUs or dedicated processing units, the FPS of all models can be
substantially enhanced. This underscores the importance of considering device
specifications and resource availability when evaluating FPS performance and optimizing
the performance of autonomous vehicle systems.
5.1.3. Autonomy Comparison
In comparing the autonomy between Model A and Model B with the baseline Nvidia
model, significant improvements are evident, despite utilizing the same amount of data.
Both Model A and Model B demonstrate enhanced autonomy, showcasing advancements
in their ability to make independent decisions and navigate complex environments.
Figure 24: Autonomy comparison between the models over a 1.5-hour time span, for roads with and without obstacles
Model A and Model B exhibit superior autonomy compared to the baseline Nvidia model
due to several factors. Firstly, their improved accuracy in tasks such as steering angle
prediction or semantic segmentation enables more precise and reliable decision-making,
leading to smoother and safer navigation. This increased accuracy is attributed to the
utilization of advanced neural network architectures and optimized training
methodologies.
Furthermore, Model A and Model B showcase enhanced adaptability to diverse driving
conditions and scenarios. Through robust training and validation processes, these models
have learned to generalize effectively from the provided dataset, allowing them to
respond appropriately to real-world challenges such as varying road conditions,
unexpected obstacles, and dynamic traffic patterns.
Moreover, the introduction of novel features or techniques in Model A and Model B may
contribute to their improved autonomy. For instance, the incorporation of additional
sensor modalities, advanced fusion techniques, or sophisticated planning algorithms may
further enhance their capabilities beyond those of the baseline Nvidia model.
Overall, the comparison highlights the substantial advancements in autonomy achieved
by Model A and Model B, underscoring the continuous evolution and refinement of
autonomous vehicle technology. These improvements signify a promising trajectory
towards the development of highly autonomous vehicles capable of navigating safely and
efficiently in diverse real-world environments.
5.1.4. FPS vs Maximum Speed
As we know there is a direct relationship between the number of frames that can be
processed per second and the maximum speed at which the vehicle can operate. I have
tested the maximum speed achieved in various FPS and plotted them. From the graph is is
clear that there is a linear relationship between the maximum speed and FPS.
Figure 25: FPS vs maximum speed
5.1.5. Number of data vs Autonomy
All of the models were trained on different amounts of data. The following chart shows that autonomy increased as the amount of data increased. In the lower region, however, the proposed models A and B show significantly greater autonomy than the Nvidia PilotNet model trained on the same amount of data. As the size of the dataset grows, the autonomy of all the models converges. This shows that the proposed models perform better even when trained on a small amount of data, which matters because collecting data for autonomous vehicles is expensive and risky. Especially in regions like Bangladesh, collecting large datasets covering varied scenarios is a challenge, so the proposed models are better suited to such settings.
[Chart: Autonomy (%) vs amount of training data (9k, 23k, 50k, and 92k samples) for Nvidia PilotNet, Model A, and Model B]
CHAPTER VI: CONCLUSION
6.1. Conclusion
The thesis explores the development and enhancement of autonomous vehicle technology,
focusing on key aspects such as perception, decision-making, and autonomy. [27]
Through rigorous experimentation and analysis, several significant findings and
advancements have been achieved, culminating in a comprehensive understanding of
autonomous vehicle systems and their potential impact on transportation and society.
The research begins with an investigation into perception algorithms, particularly
semantic segmentation and steering angle prediction, crucial for understanding the
vehicle's surroundings and making informed decisions. By employing advanced neural
network architectures such as FCN8 and UNet, significant improvements in accuracy and
reliability are achieved, leading to more precise perception capabilities. [28]
Furthermore, the thesis delves into the decision-making process, exploring the intricacies
of steering control and trajectory planning in autonomous vehicles. Through the
development of sophisticated decision-making algorithms and optimization techniques,
the models demonstrate enhanced autonomy and adaptability, capable of navigating
complex environments with confidence and efficiency.
Moreover, the research emphasizes the importance of real-time performance and
computational efficiency in autonomous vehicle systems. [7] By evaluating the frames
per second (FPS) of different models and considering device specifications, valuable
insights are gained into the trade-offs between autonomy and computational resources,
guiding future optimizations and advancements in system design. [13]
Overall, the thesis contributes significant insights and advancements to the field of
autonomous vehicle technology, paving the way for safer, more efficient, and more
autonomous transportation systems. By leveraging cutting-edge technologies and
methodologies, autonomous vehicles hold the potential to revolutionize mobility, reshape
urban landscapes, and enhance the quality of life for people around the world. As we
continue to push the boundaries of innovation and research in this field, the future of
autonomous vehicles remains bright, promising a world where transportation is safer,
more sustainable, and more accessible than ever before.
6.2. Future work
The proposed deep learning approach for steering angle prediction in autonomous
vehicles presents a promising foundation for further exploration and development. Here
are some potential avenues for future work:
1. Investigate the impact of deeper and more complex neural network
architectures: Explore deeper convolutional neural networks (CNNs), recurrent
neural networks (RNNs), or a combination of both (CNN-RNN) to capture
complex temporal and spatial dependencies within the sensor data.
2. Incorporate additional sensor modalities: Integrate data from various sensors like
LiDAR, radar, and GPS alongside camera images to provide a more
comprehensive understanding of the environment.
3. Conduct extensive on-road testing: Evaluate the performance of the developed
model in real-world driving scenarios under diverse weather conditions, traffic
patterns, and road infrastructure variations.
4. Develop safety and reliability measures: Implement mechanisms for anomaly
detection, fault tolerance, and explainability to ensure the safe and reliable
operation of the autonomous vehicle in real-world scenarios.
5. Address ethical concerns surrounding autonomous vehicles: Explore the ethical
implications of self-driving cars in terms of decision-making in critical
situations, liability in case of accidents, and potential biases in the training data.
By addressing these future research directions, this work can contribute to advancing the state-of-the-art in autonomous vehicle technology, ultimately paving the way towards safer,
more efficient, and more widely adopted autonomous transportation systems.
References

[1] S. OwaisAli Chishti, S. Riaz, M. BilalZaib, and M. Nauman, "Self-Driving Cars Using CNN and Q-Learning," in 2018 IEEE 21st International Multi-Topic Conference (INMIC), Nov. 2018, pp. 1–7. doi: 10.1109/INMIC.2018.8595684.
[2] J. del Egio, L. M. Bergasa, E. Romera, C. Gómez Huélamo, J. Araluce, and R. Barea, "Self-driving a Car in Simulation Through a CNN," in Advances in Physical Agents, Springer, Cham, 2019, pp. 31–43. doi: 10.1007/978-3-319-99885-5_3.
[3] A. Bhalla, M. S. Nikhila, and P. Singh, "Simulation of Self-driving Car using Deep Learning," in 2020 3rd International Conference on Intelligent Sustainable Systems (ICISS), Dec. 2020, pp. 519–525. doi: 10.1109/ICISS49785.2020.9315968.
[4] J. Y. C. Chen and J. E. Thropp, "Review of Low Frame Rate Effects on Human Performance," IEEE Trans. Syst. Man Cybern. - Part A Syst. Hum., vol. 37, no. 6, pp. 1063–1076, Nov. 2007, doi: 10.1109/TSMCA.2007.904779.
[5] M. Müller, A. Dosovitskiy, B. Ghanem, and V. Koltun, "Driving Policy Transfer via Modularity and Abstraction," arXiv, Dec. 13, 2018. doi: 10.48550/arXiv.1804.09364.
[6] J. Ni, K. Shen, Y. Chen, W. Cao, and S. X. Yang, "An Improved Deep Network-Based Scene Classification Method for Self-Driving Cars," IEEE Trans. Instrum. Meas., vol. 71, pp. 1–14, 2022, doi: 10.1109/TIM.2022.3146923.
[7] F. Codevilla, M. Müller, A. López, V. Koltun, and A. Dosovitskiy, "End-to-end Driving via Conditional Imitation Learning," arXiv, Mar. 02, 2018. doi: 10.48550/arXiv.1710.02410.
[8] K. Gauen et al., "Comparison of Visual Datasets for Machine Learning," in 2017 IEEE Int. Conf. Inf. Reuse Integr. (IRI), Aug. 2017, pp. 346–355, doi: 10.1109/IRI.2017.59.
[9] F. Torabi, G. Warnell, and P. Stone, "Behavioral Cloning from Observation," pp. 4950–4957, 2018. Accessed: Feb. 21, 2024. [Online]. Available: https://www.ijcai.org/proceedings/2018/687
[10] M. Bojarski et al., "End to End Learning for Self-Driving Cars," arXiv, Apr. 25, 2016. doi: 10.48550/arXiv.1604.07316.
[11] H. Xu, Y. Gao, F. Yu, and T. Darrell, "End-to-end Learning of Driving Models from Large-scale Video Datasets," arXiv, Jul. 23, 2017. doi: 10.48550/arXiv.1612.01079.
[12] A. Kuefler, J. Morton, T. Wheeler, and M. Kochenderfer, "Imitating Driver Behavior with Generative Adversarial Networks," arXiv, Jan. 23, 2017. doi: 10.48550/arXiv.1701.06699.
[13] S. Hecker, D. Dai, and L. Van Gool, "End-to-End Learning of Driving Models with Surround-View Cameras and Route Planners," arXiv, Aug. 06, 2018. doi: 10.48550/arXiv.1803.10158.
[14] M. Bansal, A. Krizhevsky, and A. Ogale, "ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst," arXiv, Dec. 07, 2018. doi: 10.48550/arXiv.1812.03079.
[15] N. M. M. Shafiullah, Z. J. Cui, A. Altanzaya, and L. Pinto, "Behavior Transformers: Cloning k modes with one stone," arXiv, Oct. 11, 2022. doi: 10.48550/arXiv.2206.11251.
[16] A. Faisal, M. Kamruzzaman, T. Yigitcanlar, and G. Currie, "Understanding autonomous vehicles: A systematic literature review on capability, impact, planning and policy," J. Transp. Land Use, vol. 12, no. 1, pp. 45–72, 2019. Accessed: Feb. 21, 2024. [Online]. Available: https://www.jstor.org/stable/26911258
[17] T. Glasmachers, "Limits of End-to-End Learning," in Proceedings of the Ninth Asian Conference on Machine Learning, PMLR, Nov. 2017, pp. 17–32. Accessed: Feb. 21, 2024. [Online]. Available: https://proceedings.mlr.press/v77/glasmachers17a.html
[18] Z. Li, F. Liu, W. Yang, S. Peng, and J. Zhou, "A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects," IEEE Trans. Neural Netw. Learn. Syst., vol. 33, no. 12, pp. 6999–7019, Dec. 2022, doi: 10.1109/TNNLS.2021.3084827.
[19] Y. Xiao, F. Codevilla, A. Gurram, O. Urfalioglu, and A. M. López, "Multimodal End-to-End Autonomous Driving," IEEE Trans. Intell. Transp. Syst., vol. 23, no. 1, pp. 537–547, Jan. 2022, doi: 10.1109/TITS.2020.3013234.
[20] H. Li, J. Li, X. Guan, B. Liang, Y. Lai, and X. Luo, "Research on Overfitting of Deep Learning," in 2019 15th International Conference on Computational Intelligence and Security (CIS), Dec. 2019, pp. 78–81. doi: 10.1109/CIS.2019.00025.
[21] S. Shah, D. Dey, C. Lovett, and A. Kapoor, "AirSim: High-Fidelity Visual and Physical Simulation for Autonomous Vehicles," in Field and Service Robotics, Springer, Cham, 2018, pp. 621–635. doi: 10.1007/978-3-319-67361-5_40.
[22] J. Cui, H. Qiu, D. Chen, P. Stone, and Y. Zhu, "COOPERNAUT: End-to-End Driving with Cooperative Perception for Networked Vehicles," in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2022, pp. 17231–17241. doi: 10.1109/CVPR52688.2022.01674.
[23] Z. Zhang, A. Liniger, D. Dai, F. Yu, and L. Van Gool, "End-to-End Urban Driving by Imitating a Reinforcement Learning Coach," arXiv, Oct. 04, 2021. doi: 10.48550/arXiv.2108.08265.
[24] J. Hawke et al., "Urban Driving with Conditional Imitation Learning," arXiv, Dec. 05, 2019. doi: 10.48550/arXiv.1912.00177.
[25] S. Narayan and G. Tagliarini, "An analysis of underfitting in MLP networks," in Proceedings. 2005 IEEE International Joint Conference on Neural Networks, Jul. 2005, pp. 984–988 vol. 2. doi: 10.1109/IJCNN.2005.1555986.
[26] R. Chekroun, M. Toromanoff, S. Hornauer, and F. Moutarde, "GRI: General Reinforced Imitation and its Application to Vision-Based Autonomous Driving," arXiv, May 17, 2022. doi: 10.48550/arXiv.2111.08575.
[27] J. Kolluri, V. K. Kotte, M. S. B. Phridviraj, and S. Razia, "Reducing Overfitting Problem in Machine Learning Using Novel L1/4 Regularization Method," in 2020 4th International Conference on Trends in Electronics and Informatics (ICOEI), Jun. 2020, pp. 934–938. doi: 10.1109/ICOEI48184.2020.9142992.
[28] A. Mikołajczyk and M. Grochowski, "Data augmentation for improving deep learning in image classification problem," in 2018 International Interdisciplinary PhD Workshop (IIPhDW), May 2018, pp. 117–122. doi: 10.1109/IIPHDW.2018.8388338.