Department of Electrical and Electronic Engineering
Khulna University of Engineering & Technology
Khulna – 9203, Bangladesh
An improved deep learning-based approach to steering angle prediction for autonomous vehicles
Supervised by:
Dr. Md. Salah Uddin Yusuf
Professor, Department of EEE
Khulna University of Engineering & Technology

Submitted by:
Md. Shakib Hasan Rudro
Roll: 1803085
Department of EEE
Khulna University of Engineering & Technology
DECLARATION
This is to certify that the thesis work, "An improved deep learning-based approach to steering angle prediction for autonomous vehicles," by Md. Shakib Hasan Rudro was performed under the supervision of Prof. Dr. Md. Salah Uddin Yusuf in the Department of Electrical and Electronic Engineering, Khulna University of Engineering & Technology, Khulna, Bangladesh. The above thesis work has not been submitted anywhere for any degree. The above claims are accurate, and this study was submitted as an undergraduate thesis.
Signature of Supervisor
Signature of Student
ACKNOWLEDGMENT
First of all, I express my sincere gratitude to the Almighty for His blessings, which have
facilitated the successful completion of this significant academic endeavor.
I extend my deepest appreciation to my honorable supervisor, Prof. Dr. Md. Salah Uddin
Yusuf, Head of the Department of Electrical and Electronic Engineering, Khulna University
of Engineering & Technology, for his invaluable guidance and unwavering support
throughout the course of this thesis. His insightful advice, encouragement, and constant
supervision have been instrumental in shaping the direction of this research.
I am also grateful to all the honorable faculty members of the Department of Electrical and
Electronic Engineering for their cooperation, encouragement, and scholarly insights, which
have enriched this work and contributed to its academic rigor.
Furthermore, I would like to extend my heartfelt thanks to my family and friends for their
unwavering support, encouragement, and understanding during this academic journey.
February 2024
Author
Md. Shakib Hasan Rudro
ABSTRACT
This research presents an improved deep learning approach for steering angle prediction in
autonomous vehicles. The study focuses on enhancing the accuracy and robustness of
steering angle prediction models, which are crucial for safe and reliable autonomous driving
systems. Leveraging advanced deep learning techniques, including convolutional neural
networks (CNNs), the proposed approach integrates image data from onboard cameras with
vehicle sensor data to predict steering angles in real-time. The research explores novel
architectures, data pre-processing methods, and training strategies to optimize model
performance and generalization capabilities. Experimental evaluations conducted on real-world driving datasets demonstrate the effectiveness and efficiency of the proposed approach
compared to existing methods. The findings of this study contribute to the advancement of
autonomous vehicle technology and have significant implications for enhancing road safety
and transportation efficiency.
Keywords: Autonomous vehicle, End-to-end learning, CNN, Semantic segmentation
Table of Contents
CHAPTER I: INTRODUCTION
1.1. Introduction
1.2. Problem Description
1.3. Motivation
1.4. Objectives
1.5. Applications

CHAPTER II: LITERATURE REVIEW
2.1. Literature Review

CHAPTER III: TERMINOLOGY
3.1. Autonomous vehicle definition
3.2. End-to-end learning
3.3. CNN
3.4. Segmentation
3.4.1. FCN-8
3.4.2. UNET
3.5. Simulator
3.5.1. AirSim
3.5.1.1. AirSim Block Diagram
3.5.1.2. Why AirSim
3.5.1.3. Graphical User Interface

CHAPTER IV: METHODOLOGY
4.1. Block Diagram
4.2. Methodology
4.2.1. Data acquisition
4.2.2. Data pre-processing
4.2.2.1. Augmentation
4.2.3. Segmentation
4.2.4. Model Architectures
4.2.4.1. Nvidia pilotnet
4.2.4.2. Model A
4.2.4.3. Model B
4.2.5. Training and testing
4.2.6. Loss function
4.2.7. Evaluation
4.3. Challenges

CHAPTER V: RESULTS AND DISCUSSION
5.1. Results and discussion
5.1.1. Actual vs predicted steering angle
5.1.2. Time comparison
5.1.3. Autonomy Comparison
5.1.4. FPS vs Maximum Speed
5.1.5. Number of data vs Autonomy

CHAPTER VI: CONCLUSION
6.1. Conclusion
6.2. Future work

References
Table of Figures
Figure 1: An autonomous vehicle with different sensors and cameras
Figure 2: Traditional vs end-to-end approach
Figure 3: Basic CNN architecture
Figure 4: Road Segmentation
Figure 5: FCN8 architecture
Figure 6: UNET architecture
Figure 7: Preview of the AirSim simulator
Figure 8: AirSim block diagram
Figure 9: Graphical user interface
Figure 10: Block diagram of the experiment
Figure 11: Input image after crop and resize
Figure 12: Random brightness
Figure 13: Random shift
Figure 14: Random shadow
Figure 15: Random flip
Figure 16: FCN8 architecture
Figure 17: Nvidia pilotnet architecture
Figure 18: Proposed model A
Figure 19: Proposed model B
Figure 20: Challenges of autonomous vehicle
Figure 21: Steering angle (actual vs prediction) for model A
Figure 22: Steering angle (actual vs prediction) for model B
Figure 23: FPS comparison of the baseline model and models A and B
Figure 24: Autonomy comparison between the models over a 1.5-hour time span, for roads with and without obstacles
Figure 25: FPS vs maximum speed

Index of Tables

Table 1: Hardware specification
CHAPTER I: INTRODUCTION
1.1. Introduction
The emergence of Deep Convolutional Neural Networks (CNNs) has ushered in a
transformative era in the field of autonomous driving. With the fusion of cutting-edge
computer vision, sensor technology, and artificial intelligence, deep CNN-based self-driving cars have made remarkable strides in recent years. [1] These vehicles represent
the pinnacle of autonomous mobility, offering the promise of safer, more efficient, and
sustainable transportation solutions.
The core premise of deep CNN-based self-driving car research is to replicate and enhance
the human ability to perceive and navigate the complex dynamics of the road
environment. [2] These vehicles leverage deep neural networks to interpret vast streams
of sensor data, including images, LiDAR, radar, and GPS information, in real-time. By
mimicking human visual perception and cognitive decision-making processes, these
intelligent machines can detect obstacles, interpret road signs, and make split-second
decisions to ensure the safety of passengers, pedestrians, and other road users.
An autonomous vehicle is a vehicle that can operate itself and perform the required driving functions without any human intervention. It is also capable of sensing its surroundings and making decisions based on them. Autonomous vehicles have the potential to transform the way we travel and commute. Successful development of an autonomous vehicle requires precise testing and evaluation.
Simulation is a critical tool for testing the behavior of autonomous vehicles in various
traffic scenarios. The use of neural networks has shown significant promise in accurately
modeling traffic dynamics. [3]
While the concept of self-driving cars holds great promise, it is not without its challenges.
Driving is an inherently complex task, and the intricacies of road regulations and
unpredictable scenarios make navigating busy roads a daunting challenge, even for
human drivers. [4] Therefore, achieving fully autonomous vehicles cannot rely solely on a
single deep learning model. Instead, it necessitates the implementation of a sophisticated
combination of separately trained neural network models. In this research landscape, I
delve into the intricacies of self-driving car systems and explore the challenges and
opportunities presented by these vehicles, ranging from robust perception and sensor
fusion to decision-making algorithms and regulatory considerations. My objective is to
establish a comprehensive pipeline incorporating carefully selected deep learning
techniques that collectively enhance safety, efficiency, and accessibility in transportation.
This vision aims to create a future where autonomous vehicles seamlessly coexist with
traditional human-driven cars. [5]
This report investigates the use of neural networks to simulate the behavior of autonomous vehicles. The primary objective is the development and implementation of a neural network-based model, which will be used to test the behavior of an autonomous vehicle in complex traffic scenarios. I will investigate the performance of the neural network model in various scenarios. The results of this study will provide valuable insights into the effectiveness of using neural networks for autonomous vehicles. The proposed study has the potential to contribute to the development of safe and efficient autonomous vehicle technologies, which can ultimately benefit society by reducing traffic accidents, improving transportation efficiency, and reducing carbon emissions. [6]
1.2. Problem Description
Conventional autonomous driving systems adopt a modular deployment strategy, wherein
each functionality, such as perception, prediction, and planning, is individually developed
and integrated into the onboard vehicle. The planning or control module, responsible for
generating steering and acceleration outputs, plays a crucial role in determining the
driving experience. The most common approach for planning in modular pipelines
involves using sophisticated rule-based designs, which are often ineffective in addressing
the vast number of situations that occur while driving. Therefore, there is a growing trend
to leverage large-scale data and to use learning-based planning as a viable alternative.
We define end-to-end autonomous driving systems as fully differentiable programs that
take raw camera data as input and produce control actions as output.
1.3. Motivation
In the classical pipeline, each model serves as a standalone component and corresponds to a
specific task (e.g., lane detection). Such a design is beneficial in terms of interpretability,
verifiability, and ease of debugging. [7] However, since the optimization goal across
modules is different, with detection in perception pursuing mean average precision (mAP)
while planning aiming for driving safety and comfort, the entire system may not be
aligned with a unified target. Errors from each module, as the sequential procedure
proceeds, could be compounded and result in an information loss for the driving system.
Moreover, the multi-task, multi-model deployment may increase the computational
burden and potentially lead to sub-optimal use of computation.
In contrast to its classical counterpart, an end-to-end autonomous system offers several
advantages. [8]
(a) The most apparent merit is its simplicity in combining perception, prediction, and
planning into a single model that can be jointly trained.
(b) The whole system, including its intermediate representations, is optimized
towards the ultimate task.
(c) Shared backbones increase computational efficiency.
(d) Data-driven optimization has the potential to offer emergent abilities that improve
the system by simply scaling training resources.
The development of self-driving cars represents one of the most transformative and
promising advancements in the field of transportation. Autonomous vehicles have the
potential to revolutionize our daily lives, making transportation safer, more efficient, and
environmentally friendly. [4] At the heart of this groundbreaking technology lies
computer vision, a field that has seen unprecedented growth and innovation in recent
years. This paper is motivated by the profound impact that computer vision techniques are
having on the realization of autonomous driving and the myriad challenges and
opportunities they present.
1. Safety and Efficiency: Human error is a leading cause of traffic accidents
worldwide. Self-driving cars, empowered by computer vision, offer the promise of
significantly reducing accidents by providing vehicles with the ability to perceive
their surroundings, make decisions, and execute maneuvers with precision. By
eliminating the risk associated with distracted or impaired driving, self-driving
cars have the potential to save countless lives and reduce the economic toll of
accidents.
2. Accessibility and Mobility: Autonomous vehicles have the potential to
revolutionize transportation for individuals with disabilities and the elderly.
3. Environmental Impact: Autonomous driving can contribute to a more sustainable
future by optimizing traffic flow, reducing congestion, and minimizing fuel
consumption.
4. Technological Advancements: The rapid evolution of computer vision techniques,
driven by deep learning and artificial intelligence, has opened new possibilities for
autonomous vehicles. My goal is to explore the latest developments in computer
vision and their application to self-driving cars, shedding light on cutting-edge
research and technological innovations.
5. Challenges and Ethical Considerations: While the potential benefits of self-driving cars are vast, there are numerous challenges to overcome, including
ethical dilemmas, regulatory frameworks, and cybersecurity concerns.
1.4. Objectives
The objectives of simulating an autonomous vehicle using neural networks are:
1. To develop an optimized deep CNN-based model to predict the steering angle for a self-driving car.
2. To evaluate the proposed models and compare them with the Nvidia PilotNet model.
3. To visualize the performance of the models using the AirSim simulator.
1.5. Applications
Applications of autonomous vehicles include the following:
1. Enhanced public transportation systems.
2. Improved ride-sharing services.
3. Increased accessibility for all.
4. Efficient emergency response.
5. Streamlined last-mile delivery.
6. Automation in agriculture.
7. Military reconnaissance and logistics.
8. Impact on urban planning and infrastructure.
CHAPTER II: LITERATURE REVIEW
2.1. Literature Review
Torabi et al. [9] introduced the term behavior cloning. The process of reconstructing a human subcognitive skill through a computer program is referred to as behavioral cloning. Here, the actions of a human performing the skill are recorded along with the situation that gave rise to each action. Human skills such as driving can be reconstructed from recorded actions, maintained in a structured way, by using learning algorithms on the resulting demonstration traces to reproduce the skilled behavior.
Bojarski et al. [10] started their research work at NVIDIA on self-driving cars, inspired by the ALVINN and DARPA projects. The motivation for their work was to create an end-to-end model that enables steering of the car without manual intervention [1], trained on recordings of human driving behavior along with the steering angle at every second.

Based on NVIDIA's proposed PilotNet architecture (as shown in Fig. 17), Viswanath et al. from Texas Instruments released JacintoNet, i.e., an end-to-end neural network for embedded vehicles such as tiny humanoid robots.
Xu et al. [11] trained a neural network for predicting discrete or continuous actions, also based on camera inputs. Codevilla et al. also trained a network using camera inputs, conditioned on high-level commands, to output steering and acceleration. This is the first model that does not just follow the lane but also incorporates high-level commands. They also evaluated their approach in realistic simulations of urban driving and on a 1/5-scale robotic truck.
Kuefler et al. [12] used Generative Adversarial Imitation Learning (GAIL) with simple affordance-style features as inputs to overcome the cascading errors typically present in behavior-cloned policies, making them more robust to perturbations.
Hecker et al. [13] used 360-degree camera inputs, instead of a single front-facing camera, together with a desired route planner to predict steering and speed.
Müller et al. [5] trained a system in simulation using CARLA, learning a driving policy from a scene segmentation network that outputs high-level control, thereby enabling transfer to the real world using a different segmentation network trained on real data.
Bansal et al. [14], in research at Waymo (Google), present another model called ChauffeurNet, which outputs a driving trajectory that is consumed by a controller translating it into steering and acceleration.
Shafiullah et al. [15] used a transformer-based approach for behavior cloning. The effectiveness of transformer models has already been demonstrated in natural language processing engines such as ChatGPT. The authors proposed a model called BeT (Behavior Transformer); they claim that this model, harnessing the power of transformers, can handle complex tasks in a human-like manner.
CHAPTER III: TERMINOLOGY
3.1. Autonomous vehicle definition
Autonomous vehicles, also known as self-driving cars or driverless cars, are vehicles
equipped with advanced technologies that enable them to navigate and operate without
human intervention. These vehicles utilize a combination of sensors, cameras, radar, lidar,
GPS, and sophisticated algorithms to perceive their surroundings, interpret sensory
inputs, and make decisions to navigate safely to their destination. [16]
Figure 1: An autonomous vehicle with different sensors and cameras
Autonomous vehicles have the potential to revolutionize transportation by offering
numerous benefits, including increased road safety, reduced traffic congestion, improved
energy efficiency, and enhanced mobility for individuals with disabilities or limited
access to transportation. They hold promise for transforming various industries, including
transportation, logistics, and urban planning.
3.2. End-to-end learning
In end-to-end learning, the neural network is trained using a large dataset of input-output
pairs, where the inputs are raw sensor data collected from the vehicle's environment, and
the outputs are corresponding control commands executed by the vehicle's actuators. The
network learns to extract relevant features from the raw input data and generate
appropriate control commands without explicit feature engineering or manual rule-based
programming. [17]
Figure 2: Traditional vs end-to-end approach
End-to-end learning offers several potential advantages for autonomous vehicles,
including simplicity, scalability, and the ability to adapt to diverse driving conditions. By
directly learning driving policies from data, end-to-end approaches have the potential to
capture complex driving behaviors and adapt to novel situations that may not have been
explicitly programmed.
3.3. CNN
Convolutional Neural Networks (CNNs) are a class of deep neural networks that have
revolutionized the field of computer vision. Inspired by the organization of the visual
cortex in animals, CNNs are particularly well-suited for tasks such as image recognition,
object detection, and image classification.
CNNs consist of multiple layers, including convolutional layers, pooling layers, and fully
connected layers. The convolutional layers apply a set of learnable filters to the input
image, detecting features such as edges, textures, and shapes. Pooling layers reduce the
spatial dimensions of the feature maps, while preserving important information. Fully
connected layers combine the features learned by the previous layers to make predictions
about the input data. [18]
One of the key advantages of CNNs is their ability to automatically learn hierarchical
representations of features directly from raw input data. This hierarchical feature learning
enables CNNs to achieve superior performance on a wide range of visual recognition
tasks compared to traditional machine learning algorithms.
CNNs have been widely adopted in various applications, including image classification,
object detection, facial recognition, medical image analysis, and autonomous vehicles.
Their success can be attributed to their ability to automatically learn relevant features
from large amounts of data, their scalability to handle complex and high-dimensional
inputs, and their ability to generalize well to new, unseen data.
Figure 3: Basic CNN architecture
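To make these layer types concrete, the following is a minimal CNN sketch in Keras (the framework is an assumption, since the thesis does not name its library); the filter counts and the 66×200 input size are illustrative rather than the thesis architecture:

```python
from tensorflow.keras import layers, models

# A minimal CNN: convolution + pooling layers followed by fully
# connected layers. Sizes are illustrative, not the thesis model.
model = models.Sequential([
    layers.Input(shape=(66, 200, 3)),              # RGB input image
    layers.Conv2D(24, (5, 5), activation='relu'),  # low-level features (edges, textures)
    layers.MaxPooling2D((2, 2)),                   # reduce spatial dimensions
    layers.Conv2D(36, (5, 5), activation='relu'),  # higher-level features (shapes)
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(100, activation='relu'),          # combine learned features
    layers.Dense(1),                               # regression output (e.g., steering angle)
])
model.summary()
```

Stacking convolution and pooling in this way builds the hierarchical feature representation described above, ending here in a single regression output.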
3.4. Segmentation
In the context of autonomous vehicles, image segmentation plays a crucial role in
understanding the vehicle's surroundings and making informed decisions for navigation
and control. [19] Here's how segmentation is utilized:
Figure 4: Road Segmentation
1. Road and Lane Segmentation: Segmenting the road and lane markings from the
surrounding environment is essential for autonomous vehicles to navigate safely
within lanes. This allows the vehicle to stay centered in its lane and make
appropriate adjustments for turns, lane changes, and other maneuvers.
2. Obstacle Detection and Segmentation: Segmentation techniques are used to
identify and delineate obstacles such as vehicles, pedestrians, cyclists, and other
objects in the vehicle's path. By segmenting obstacles from the background,
autonomous vehicles can assess potential collision risks and plan avoidance
strategies accordingly.
3. Semantic Understanding of the Environment: Semantic segmentation provides
a detailed understanding of the scene by segmenting different elements such as
roads, sidewalks, buildings, vegetation, and other infrastructure. This semantic
understanding enhances the vehicle's situational awareness and enables more
informed decision-making.
4. Localization and Mapping: Segmentation aids in generating high-definition
maps of the environment by accurately delineating various features and
landmarks. These maps, often referred to as semantic maps, facilitate precise
localization and navigation of autonomous vehicles by providing detailed
information about the surroundings.
Overall, image segmentation is a critical component of perception systems in autonomous
vehicles, enabling them to interpret and understand the visual information from onboard
sensors and navigate safely and effectively in complex real-world environments.
3.4.1. FCN-8
FCN-8, introduced by Long et al., is renowned for its capability to perform end-to-end
pixel-wise classification. It replaces the fully connected layers of traditional CNNs with
convolutional layers, enabling it to accept inputs of arbitrary sizes and produce dense
predictions. FCN-8 utilizes skip connections from earlier layers to preserve spatial
information, which helps in generating detailed segmentation maps. Additionally, it
incorporates transposed convolutions to upsample feature maps and restore the resolution
of the output. While FCN-8 is efficient and flexible, it may struggle with capturing fine
details due to downsampling operations.
Figure 5: FCN8 architecture
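The following is a hedged sketch of the FCN-8 fusion logic in Keras, assuming a VGG16 backbone (the standard choice in Long et al.'s paper); the layer names follow Keras's bundled VGG16, and the class count is a placeholder:

```python
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16

n_classes = 2  # placeholder: e.g., road vs. background
base = VGG16(include_top=False, weights=None, input_shape=(224, 224, 3))
pool3 = base.get_layer('block3_pool').output   # 1/8 of input resolution
pool4 = base.get_layer('block4_pool').output   # 1/16
pool5 = base.get_layer('block5_pool').output   # 1/32

# 1x1 convolutions replace fully connected layers (per-pixel class scores)
s5 = layers.Conv2D(n_classes, 1)(pool5)
s4 = layers.Conv2D(n_classes, 1)(pool4)
s3 = layers.Conv2D(n_classes, 1)(pool3)

# Transposed convolutions upsample; skip connections fuse earlier detail
up5 = layers.Conv2DTranspose(n_classes, 4, strides=2, padding='same')(s5)
f4 = layers.Add()([up5, s4])
up4 = layers.Conv2DTranspose(n_classes, 4, strides=2, padding='same')(f4)
f3 = layers.Add()([up4, s3])

# Final 8x upsampling back to input resolution (the "8" in FCN-8)
out = layers.Conv2DTranspose(n_classes, 16, strides=8, padding='same',
                             activation='softmax')(f3)
fcn8 = Model(base.input, out)
```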
3.4.2. UNET
On the other hand, U-Net, proposed by Ronneberger et al., is characterized by its
symmetric encoder-decoder architecture with skip connections. The contracting path of
U-Net resembles a typical CNN encoder, gradually reducing spatial dimensions and
extracting high-level features. However, unlike FCN-8, U-Net's expansive path employs
transposed convolutions for upsampling while also integrating skip connections from the
contracting path. These skip connections facilitate the fusion of low-level and high-level
features, aiding in precise localization and segmentation of objects. U-Net's architecture is
particularly advantageous for medical image segmentation tasks and scenarios where
detailed localization is crucial. Despite its effectiveness, U-Net may be computationally
expensive due to the large number of parameters, especially in deeper architectures.
Figure 6: UNET architecture
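For contrast, a toy two-level U-Net in the same style shows the symmetric encoder-decoder with concatenation-based skip connections (the original network uses four levels and many more filters; this is a reduced sketch):

```python
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    # Two 3x3 convolutions per level, as in the original U-Net
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    return layers.Conv2D(filters, 3, padding='same', activation='relu')(x)

inp = layers.Input(shape=(128, 128, 3))
# Contracting path: extract features while downsampling
c1 = conv_block(inp, 32); p1 = layers.MaxPooling2D()(c1)
c2 = conv_block(p1, 64);  p2 = layers.MaxPooling2D()(c2)
b = conv_block(p2, 128)   # bottleneck
# Expansive path: upsample and fuse skip connections by concatenation
u2 = layers.Conv2DTranspose(64, 2, strides=2, padding='same')(b)
c3 = conv_block(layers.Concatenate()([u2, c2]), 64)
u1 = layers.Conv2DTranspose(32, 2, strides=2, padding='same')(c3)
c4 = conv_block(layers.Concatenate()([u1, c1]), 32)
out = layers.Conv2D(1, 1, activation='sigmoid')(c4)  # binary mask
unet = Model(inp, out)
```

Note the design difference from FCN-8: U-Net concatenates encoder features, letting the decoder learn how to combine them, whereas FCN-8 adds class-score maps.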
3.5. Simulator
Simulators play a crucial role in various fields, including autonomous vehicles, aerospace,
robotics, and healthcare, for several reasons:
1. Cost-Effective Testing: Simulators provide a cost-effective alternative to real-world testing. They allow researchers and developers to conduct extensive testing
and validation of systems and algorithms without the need for expensive physical
prototypes or equipment. This significantly reduces development costs and risks
associated with real-world testing.
2. Safety: Simulators offer a safe and controlled environment for testing complex
systems and algorithms, especially those designed for critical applications such as
autonomous vehicles and medical devices. Simulated environments enable
researchers to identify and address potential safety issues without putting human
lives or valuable resources at risk.
3. Reproducibility: Simulators enable researchers to reproduce and control specific
scenarios and conditions with precision. This reproducibility ensures consistency
in testing procedures and results, making it easier to evaluate and compare
different approaches, algorithms, and systems.
4. Scalability: Simulators allow researchers to scale experiments and simulations to
a larger scope and complexity than would be feasible in the real world. This
scalability enables the evaluation of systems and algorithms under a wide range of
scenarios, including rare or extreme conditions that may be difficult to encounter
in reality.
5. Accessibility: Simulators make experimental setups and testing environments
accessible to a broader audience of researchers, developers, and enthusiasts. They
democratize access to advanced technologies and enable collaboration and
knowledge sharing across disciplines and geographic locations. [20]
In summary, simulators are necessary tools for research, development, and testing in
various fields due to their cost-effectiveness, safety, reproducibility, scalability, iterative
development capabilities, and accessibility. They enable researchers and developers to
accelerate innovation, mitigate risks, and advance the state-of-the-art in their respective
domains.
3.5.1. AirSim
Figure 7: Preview of the AirSim simulator
Microsoft AirSim is an open-source, cross-platform simulator for autonomous vehicles
(AVs), robotics research, and artificial intelligence (AI) development. Developed by
Microsoft Research, AirSim provides a realistic simulation environment for training and
testing autonomous systems in various scenarios without the need for physical prototypes
or real-world testing.
3.5.1.1. AirSim Block Diagram
Figure 8: AirSim block diagram
The communication between the interpreter and AirSim happens via a TCP port. AirSim sends a stream of images of the desired dimensions to the interpreter; the interpreter processes the images, computes steering commands, and sends them back to the simulator. The simulator then shows a video of the vehicle's performance based on the provided commands. Autonomy is computed by observing the output of AirSim.
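The loop below is a sketch of this exchange using AirSim's official Python client (`pip install airsim`); `predict_steering` stands in for the trained model's inference call, and the throttle value is an arbitrary assumption:

```python
import numpy as np
import airsim

client = airsim.CarClient()      # connects to the simulator over TCP
client.confirmConnection()
client.enableApiControl(True)
controls = airsim.CarControls()

while True:
    # Request one uncompressed frame from the front camera ("0")
    response = client.simGetImages([
        airsim.ImageRequest("0", airsim.ImageType.Scene, False, False)
    ])[0]
    frame = np.frombuffer(response.image_data_uint8, dtype=np.uint8)
    frame = frame.reshape(response.height, response.width, 3)

    # predict_steering is a placeholder for the trained model's inference
    controls.steering = float(predict_steering(frame))
    controls.throttle = 0.5
    client.setCarControls(controls)  # send the command back to AirSim
```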
3.5.1.2. Why AirSim
Key features of Microsoft AirSim include:
1. High-Fidelity Simulation: AirSim offers a high-fidelity physics engine and
realistic graphics, enabling researchers and developers to create immersive and
realistic simulation environments for AVs and robotics applications.
2. Support for Multiple Platforms: AirSim is designed to be compatible with
multiple platforms, including Windows, Linux, and macOS, making it accessible
to a wide range of developers and researchers.
3. Extensibility and Customization: AirSim provides a modular architecture that
allows users to extend and customize the simulator to suit their specific research
needs. Users can integrate custom vehicle models, sensors, environments, and
algorithms into AirSim for experimentation and testing.
4. Built-in Sensor Simulation: AirSim supports various sensors commonly used in
AVs and robotics, including cameras, lidar, radar, and GPS. These sensors can
generate realistic data streams that mimic real-world sensor outputs, enabling
researchers to develop and validate perception and control algorithms in a
simulated environment.
5. Integration with AI Frameworks: AirSim seamlessly integrates with popular AI
frameworks such as TensorFlow and PyTorch, allowing researchers to leverage
state-of-the-art machine learning algorithms for perception, planning, and control
tasks within the simulator.
Overall, Microsoft AirSim provides a powerful and flexible platform for researchers and
developers to accelerate innovation in autonomous systems by enabling rapid
prototyping, testing, and validation in a simulated environment. Its realistic simulation
capabilities and extensibility make it an invaluable tool for advancing the field of AVs and
robotics. [21]
3.5.1.3. Graphical User Interface
Figure 9: Graphical user interface
A graphical user interface was designed to visualize the steering angle and the speed curve of the vehicle. It also shows the FPS and the numerical value of the steering angle. It can be operated in both manual and autonomous mode; in manual mode, it works as a data collector. Using this graphical user interface (GUI), the autonomy and the maximum speed at a given FPS can be calculated.
CHAPTER IV: METHODOLOGY
4.1. Block Diagram
Figure 10: Block diagram of the experiment
4.2. Methodology
4.2.1. Data acquisition
Utilizing the AirSim simulator, I conducted data collection procedures to assemble
comprehensive datasets for training and validating autonomous vehicle algorithms. This
involved capturing images from multiple viewpoints, including the center, left, and right
perspectives, along with recording the corresponding steering angles. By incorporating
data from various viewpoints, this approach aims to enhance the robustness and
generalization capabilities of the autonomous vehicle system, thereby facilitating more
accurate and reliable navigation in diverse real-world scenarios.
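A common way to exploit the side viewpoints is to log each of the three images with a corrected steering label, so that the left and right views teach the model to recenter. The sketch below assumes a fixed correction of ±0.2, a typical but dataset-dependent value that the thesis does not specify:

```python
import csv

CORRECTION = 0.2  # assumed offset per side camera; tuned in practice

def log_sample(writer, center_img, left_img, right_img, steering):
    # Each viewpoint becomes its own training sample; the side views
    # get a steering label that would steer the car back to center.
    writer.writerow([center_img, steering])
    writer.writerow([left_img, steering + CORRECTION])
    writer.writerow([right_img, steering - CORRECTION])

with open('driving_log.csv', 'w', newline='') as f:
    log_sample(csv.writer(f), 'c_0001.png', 'l_0001.png', 'r_0001.png', 0.05)
```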
4.2.2. Data pre-processing
In the data pre-processing phase, meticulous attention was given to maintaining uniformity across all collected images. Each image underwent cropping and resizing to standard dimensions of 66×200 pixels, ensuring consistency in the dataset.

Figure 11: Input image after crop and resize

Subsequently, four types of augmentation techniques were applied: random brightness adjustments, random shifts, random shadows, and random flips. These augmentation strategies were implemented to enhance the robustness and variability of the dataset, thereby facilitating more effective training of the autonomous vehicle model. [23] Concurrently, the corresponding steering angles for each augmented image were meticulously logged in a CSV file, ensuring precise alignment between the image data and steering control inputs. This systematic pre-processing methodology ensured the preparation of a high-quality dataset conducive to the training process.
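A sketch of the crop-and-resize step with OpenCV follows; only the 66×200 output size comes from the thesis, while the crop boundaries (trimming sky and hood) are assumptions:

```python
import cv2

def preprocess(img):
    # Trim sky (top) and vehicle hood (bottom); boundaries are assumptions
    img = img[60:-25, :, :]
    # cv2.resize takes (width, height), so the 66x200 target is (200, 66)
    return cv2.resize(img, (200, 66))
```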
4.2.2.1. Augmentation
Random Brightness:
Figure 12: Random brightness
Random shift:
Figure 13: Random shift
Random shadow:
Figure 14: Random shadow
Random flip:
Figure 15: Random flip
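Hedged sketches of these four augmentations are given below using OpenCV and NumPy; the random ranges and the per-pixel steering correction are assumptions, since the thesis does not state them:

```python
import cv2
import numpy as np

def random_brightness(img):
    # Scale the V channel in HSV space by a random factor
    hsv = cv2.cvtColor(img, cv2.COLOR_RGB2HSV).astype(np.float32)
    hsv[:, :, 2] = np.clip(hsv[:, :, 2] * np.random.uniform(0.4, 1.2), 0, 255)
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2RGB)

def random_shift(img, angle, angle_per_px=0.004):
    # Translate horizontally and correct the steering label accordingly
    tx = np.random.randint(-40, 41)
    M = np.float32([[1, 0, tx], [0, 1, 0]])
    shifted = cv2.warpAffine(img, M, (img.shape[1], img.shape[0]))
    return shifted, angle + tx * angle_per_px

def random_shadow(img):
    # Darken a random vertical band to mimic a cast shadow
    x1, x2 = sorted(np.random.randint(0, img.shape[1], 2))
    out = img.copy()
    out[:, x1:x2] = (out[:, x1:x2] * 0.5).astype(img.dtype)
    return out

def random_flip(img, angle):
    # Mirror the image horizontally and negate the steering angle
    if np.random.rand() < 0.5:
        return cv2.flip(img, 1), -angle
    return img, angle
```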
4.2.3. Segmentation
Semantic segmentation, achieved through techniques like Fully Convolutional Networks (FCNs), partitions an image into meaningful regions, enhancing scene understanding and object localization. This improves perception capabilities, robustness to environmental variations, and support for end-to-end learning. FCN8, chosen for its speed, enables real-time processing, which is crucial for autonomous driving.
Figure 16: FCN8 architecture
While UNet offers superior performance, FCN8's efficiency ensures the model operates
swiftly, crucial for timely decision-making in dynamic environments. Thus, FCN8
balances performance and speed, ideal for real-time applications like autonomous driving.
These advancements contribute to the development of more effective and reliable
autonomous driving systems capable of navigating safely in complex real-world
environments. [24]
4.2.4. Model Architectures
4.2.4.1. Nvidia pilotnet
Figure 17: Nvidia pilotnet architecture
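For reference, the baseline PilotNet from Bojarski et al. [10] can be written in Keras as below; the layer sizes follow the paper, while the ELU activations follow common reimplementations (the paper does not fix the nonlinearity):

```python
from tensorflow.keras import layers, models

pilotnet = models.Sequential([
    layers.Input(shape=(66, 200, 3)),
    layers.Lambda(lambda x: x / 127.5 - 1.0),   # normalize pixels to [-1, 1]
    layers.Conv2D(24, (5, 5), strides=2, activation='elu'),
    layers.Conv2D(36, (5, 5), strides=2, activation='elu'),
    layers.Conv2D(48, (5, 5), strides=2, activation='elu'),
    layers.Conv2D(64, (3, 3), activation='elu'),
    layers.Conv2D(64, (3, 3), activation='elu'),
    layers.Flatten(),
    layers.Dense(100, activation='elu'),
    layers.Dense(50, activation='elu'),
    layers.Dense(10, activation='elu'),
    layers.Dense(1),                            # predicted steering angle
])
```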
4.2.4.2. Model A
Figure 18: Proposed model A
4.2.4.3. Model B
Figure 19: Proposed model B
4.2.5. Training and testing
The training process involves preparing and augmenting datasets, training the model using optimization algorithms, and fine-tuning hyper-parameters for optimal performance. Validation assesses the model's performance on a separate dataset, while testing evaluates its generalization to real-world scenarios. Finally, deployment in the target environment requires continuous monitoring and updating to maintain effectiveness.
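A minimal sketch of the training step, reusing the `pilotnet` model sketched in Section 4.2.4.1 with the MSE loss defined in the next subsection; the optimizer, learning rate, batch size, epoch count, and the random stand-in data are all illustrative assumptions:

```python
import numpy as np
from tensorflow.keras.optimizers import Adam

# Stand-in data; in this work the images and angles come from AirSim logs
X = np.random.rand(256, 66, 200, 3).astype(np.float32)
y = np.random.uniform(-1, 1, size=(256, 1)).astype(np.float32)

pilotnet.compile(optimizer=Adam(learning_rate=1e-4), loss='mse')
pilotnet.fit(X, y, validation_split=0.2, batch_size=64, epochs=5)
```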
4.2.6. Loss function
Mean Square Error (MSE) is a common loss function used in regression tasks, including
machine learning and neural network training. It measures the average squared difference
between the predicted values and the actual values in a dataset. [12]
In the context of training models for tasks such as regression, MSE calculates the average
squared difference between the predicted output and the ground truth labels. It penalizes
larger errors more heavily than smaller ones, making it suitable for tasks where precise
prediction accuracy is important.
Mathematically, MSE is calculated by taking the average of the squared differences
between predicted and actual values:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - y_i'\right)^2$$

where $n$ is the number of samples, $y_i$ represents the actual value, and $y_i'$ represents the predicted value.
Overall, MSE is a useful loss function for training regression models, providing a
quantitative measure of the model's performance by quantifying the average squared
difference between predicted and actual values.
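A small numeric example of the calculation (the values are illustrative):

```python
import numpy as np

y_true = np.array([0.10, -0.25, 0.40])  # actual steering angles
y_pred = np.array([0.12, -0.20, 0.35])  # model predictions

mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # (0.02**2 + 0.05**2 + 0.05**2) / 3 = 0.0018
```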
4.2.7. Evaluation
The percentage of the time the network could drive the car without human intervention is
defined as autonomy. The metric is determined by counting simulated human
interventions. These interventions occur when the simulated vehicle departs from the
center line by more than one meter. We assume that in real life an actual intervention
would require a total of six seconds: this is the time required for a human to retake control
of the vehicle, re-center it, and then restart the self-steering mode. We calculate the
percentage autonomy by counting the number of interventions, multiplying by 6 seconds,
dividing by the elapsed time of the simulated test, and then subtracting the result from 1.
[10]
$$\text{autonomy} = \left(1 - \frac{(\text{number of interventions}) \times 6\ \text{seconds}}{\text{elapsed time [seconds]}}\right) \times 100$$
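A worked example of this metric: a 1.5-hour (5400 s) run with 18 interventions scores an autonomy of (1 - 18 × 6 / 5400) × 100 = 98%.

```python
def autonomy(num_interventions, elapsed_seconds):
    # Each intervention is charged 6 seconds, following Bojarski et al. [10]
    return (1 - num_interventions * 6 / elapsed_seconds) * 100

print(autonomy(18, 5400))  # 98.0
```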
4.3. Challenges
Driving itself is a complex task that takes years to master, even for humans, so there are many challenges to overcome in autonomous vehicle implementation. One challenge that stands out is the stability issue. If the vehicle drifts from its expected trajectory, it receives slightly unfamiliar images as input. This leads to an error in the predicted steering angle, which in turn pushes the vehicle toward an even more unfamiliar region, and the cycle repeats.

As a result, the error builds up and the vehicle shows unstable behavior; a visible shaking can be noticed in such cases. In this research, I propose two possible solutions to the problem.
Figure 20: Challenges of autonomous vehicle
1. While training, three cameras were used to cover a greater region of the scene, with their corresponding steering angles adjusted properly with respect to the center image. This step ensures more coverage of the surroundings.
2. Taking the moving average of the predictions from the previous few frames together with the current frame, as sketched below. This step minimizes the shaking of the vehicle.
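A minimal sketch of the second remedy, averaging the current prediction with the previous few frames; the window size of 5 is an assumed value:

```python
from collections import deque

class SteeringSmoother:
    """Moving average over the last few steering predictions."""
    def __init__(self, window=5):        # window size is an assumption
        self.history = deque(maxlen=window)

    def smooth(self, angle):
        self.history.append(angle)
        return sum(self.history) / len(self.history)

smoother = SteeringSmoother()
for raw in [0.10, 0.30, -0.05, 0.12]:    # jittery raw predictions
    print(smoother.smooth(raw))           # damped values sent to the car
```

The averaging trades a small amount of responsiveness for stability, which is acceptable since consecutive frames arrive many times per second.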
CHAPTER V: RESULTS AND DISCUSSION
5.1. Results and discussion
The results of the steering angle prediction model demonstrate its effectiveness in
accurately estimating steering angles based on input images captured from the center, left,
and right viewpoints. Through careful training and validation, the model showcases a
strong ability to infer steering commands, crucial for guiding autonomous vehicles along
desired trajectories.
5.1.1. Actual vs predicted steering angle
Figure 21: Steering angle (actual vs prediction) for model A
The graph depicting the comparison between actual and predicted steering angles by
Model A reveals a remarkable resemblance between the two. Notably, the model's
predictions closely mirror the actual steering angles, capturing abrupt changes in direction
almost instantaneously. This close alignment between predicted and actual values
underscores the model's efficacy in accurately estimating steering commands, a critical
aspect for ensuring precise control of autonomous vehicles. The rapid response of the
model to changes in direction further highlights its capability to adapt to dynamic driving
scenarios in real-time. Overall, the graph illustrates the impressive performance of Model
A in steering angle prediction, affirming its potential for enhancing the autonomy and
safety of vehicle navigation systems.
Figure 22: Steering angle (actual vs prediction) for model B
The graphical representation of steering angle comparisons between actual and predicted
values for Model B showcases a significant resemblance between the two datasets.
Notably, Model B's predictions closely mirror the actual steering angles, capturing subtle
and abrupt changes in direction with impressive accuracy. This close correspondence
indicates the model's proficiency in accurately estimating steering commands, essential
for ensuring precise vehicle control in diverse driving conditions. [25] Moreover, the
swift response of Model B to changes in direction underscores its adaptability and
responsiveness, further enhancing its suitability for real-world applications.
5.1.2. Time comparison
The graph depicting frames per second (FPS) for three different models—Nvidia, Model
A, and Model B—holds significant importance in the context of autonomous vehicles.
Real-time processing is crucial for autonomous vehicles to make rapid decisions and
navigate safely in dynamic environments.
Figure 23: FPS comparison of the baseline model and models A and B
While Nvidia exhibits the highest FPS among the models, both Model A and Model B
demonstrate slightly lower FPS. However, these FPS values remain within a tolerable
range, indicating that they are still capable of processing frames at a sufficiently high rate
for real-time operation.
The slight sacrifice in FPS for Model A and Model B is justified by their ability to
provide greater autonomy and make accurate predictions, such as steering angle
estimation or semantic segmentation. The trade-off between FPS and autonomy is
therefore worth considering, as it ensures the models can effectively analyze and respond
to the surrounding environment in real-time, ultimately enhancing the safety and
efficiency of autonomous vehicle navigation systems.
It's important to note that FPS values are heavily dependent on the specifications of the
device used for testing. The measurements presented in the graph were obtained with the specific device specifications outlined in the following table.
Table 1: Hardware specification
CPU: Intel Core i5 11th Gen 11300H
GPU: Nvidia RTX 3050
RAM: 16 GB
It's worth emphasizing that FPS performance can vary significantly based on factors such
as CPU and GPU processing power, memory capacity, and optimization techniques
employed. Therefore, upgrading the device specifications, such as utilizing a more
powerful CPU or GPU, increasing memory capacity, or optimizing software
configurations, can lead to significant improvements in FPS for all models. [26]
By investing in hardware upgrades or utilizing more advanced computing resources, such
as high-performance GPUs or dedicated processing units, the FPS of all models can be
substantially enhanced. This underscores the importance of considering device
specifications and resource availability when evaluating FPS performance and optimizing
the performance of autonomous vehicle systems.
5.1.3. Autonomy Comparison
In comparing the autonomy between Model A and Model B with the baseline Nvidia
model, significant improvements are evident, despite utilizing the same amount of data.
Both Model A and Model B demonstrate enhanced autonomy, showcasing advancements
in their ability to make independent decisions and navigate complex environments.
Figure 24: Autonomy comparison between the models over a 1.5-hour time span, for roads with and without obstacles
Model A and Model B exhibit superior autonomy compared to the baseline Nvidia model
due to several factors. Firstly, their improved accuracy in tasks such as steering angle
prediction or semantic segmentation enables more precise and reliable decision-making,
leading to smoother and safer navigation. This increased accuracy is attributed to the
utilization of advanced neural network architectures and optimized training
methodologies.
Furthermore, Model A and Model B showcase enhanced adaptability to diverse driving
conditions and scenarios. Through robust training and validation processes, these models
have learned to generalize effectively from the provided dataset, allowing them to
respond appropriately to real-world challenges such as varying road conditions,
unexpected obstacles, and dynamic traffic patterns.
Moreover, the introduction of novel features or techniques in Model A and Model B may
contribute to their improved autonomy. For instance, the incorporation of additional
sensor modalities, advanced fusion techniques, or sophisticated planning algorithms may
further enhance their capabilities beyond those of the baseline Nvidia model.
Overall, the comparison highlights the substantial advancements in autonomy achieved
by Model A and Model B, underscoring the continuous evolution and refinement of
autonomous vehicle technology. These improvements signify a promising trajectory
towards the development of highly autonomous vehicles capable of navigating safely and
efficiently in diverse real-world environments.
5.1.4. FPS vs Maximum Speed
As we know there is a direct relationship between the number of frames that can be
processed per second and the maximum speed at which the vehicle can operate. I have
tested the maximum speed achieved in various FPS and plotted them. From the graph is is
clear that there is a linear relationship between the maximum speed and FPS.
Figure 25: FPS vs maximum speed
5.1.5. Number of data vs Autonomy
All of the models were trained on different amounts of data. The following chart shows that autonomy increased as the amount of data increased. In the lower region, however, the proposed models A and B show significantly greater autonomy than the Nvidia PilotNet model trained on the same amount of data. As the size of the dataset grows, the autonomy of all the models converges. This shows that the proposed models perform better even when trained on a small amount of data, which matters because collecting data for autonomous vehicles is expensive and risky. Especially in regions like Bangladesh, collecting large datasets covering varied scenarios is a challenge, so the proposed models are better suited to such settings.
[Chart: Autonomy (%) vs amount of training data (9k, 23k, 50k, and 92k samples) for Nvidia PilotNet, Model A, and Model B]
CHAPTER VI: CONCLUSION
6.1. Conclusion
The thesis explores the development and enhancement of autonomous vehicle technology,
focusing on key aspects such as perception, decision-making, and autonomy. [27]
Through rigorous experimentation and analysis, several significant findings and
advancements have been achieved, culminating in a comprehensive understanding of
autonomous vehicle systems and their potential impact on transportation and society.
The research begins with an investigation into perception algorithms, particularly
semantic segmentation and steering angle prediction, crucial for understanding the
vehicle's surroundings and making informed decisions. By employing advanced neural
network architectures such as FCN8 and UNet, significant improvements in accuracy and
reliability are achieved, leading to more precise perception capabilities. [28]
Furthermore, the thesis delves into the decision-making process, exploring the intricacies
of steering control and trajectory planning in autonomous vehicles. Through the
development of sophisticated decision-making algorithms and optimization techniques,
the models demonstrate enhanced autonomy and adaptability, capable of navigating
complex environments with confidence and efficiency.
Moreover, the research emphasizes the importance of real-time performance and
computational efficiency in autonomous vehicle systems. [7] By evaluating the frames
per second (FPS) of different models and considering device specifications, valuable
insights are gained into the trade-offs between autonomy and computational resources,
guiding future optimizations and advancements in system design. [13]
Overall, the thesis contributes significant insights and advancements to the field of
autonomous vehicle technology, paving the way for safer, more efficient, and more
autonomous transportation systems. By leveraging cutting-edge technologies and
methodologies, autonomous vehicles hold the potential to revolutionize mobility, reshape
urban landscapes, and enhance the quality of life for people around the world. As we
continue to push the boundaries of innovation and research in this field, the future of
autonomous vehicles remains bright, promising a world where transportation is safer,
more sustainable, and more accessible than ever before.
6.2. Future work
The proposed deep learning approach for steering angle prediction in autonomous
vehicles presents a promising foundation for further exploration and development. Here
are some potential avenues for future work:
1. Investigate the impact of deeper and more complex neural network
architectures: Explore deeper convolutional neural networks (CNNs), recurrent
neural networks (RNNs), or a combination of both (CNN-RNN) to capture
complex temporal and spatial dependencies within the sensor data.
2. Incorporate additional sensor modalities: Integrate data from various sensors like
LiDAR, radar, and GPS alongside camera images to provide a more
comprehensive understanding of the environment.
3. Conduct extensive on-road testing: Evaluate the performance of the developed
model in real-world driving scenarios under diverse weather conditions, traffic
patterns, and road infrastructure variations.
4. Develop safety and reliability measures: Implement mechanisms for anomaly
detection, fault tolerance, and explainability to ensure the safe and reliable
operation of the autonomous vehicle in real-world scenarios.
5. Address ethical concerns surrounding autonomous vehicles: Explore the ethical
implications of self-driving cars in terms of decision-making in critical
situations, liability in case of accidents, and potential biases in the training data.
By addressing these future research directions, this work can contribute to advancing the state-of-the-art in autonomous vehicle technology, ultimately paving the way towards safer,
more efficient, and more widely adopted autonomous transportation systems.
References

[1] S. OwaisAli Chishti, S. Riaz, M. BilalZaib, and M. Nauman, "Self-Driving Cars Using CNN and Q-Learning," in 2018 IEEE 21st International Multi-Topic Conference (INMIC), Nov. 2018, pp. 1–7. doi: 10.1109/INMIC.2018.8595684.
[2] J. del Egio, L. M. Bergasa, E. Romera, C. Gómez Huélamo, J. Araluce, and R. Barea, "Self-driving a Car in Simulation Through a CNN," in Advances in Physical Agents, Springer, Cham, 2019, pp. 31–43. doi: 10.1007/978-3-319-99885-5_3.
[3] A. Bhalla, M. S. Nikhila, and P. Singh, "Simulation of Self-driving Car using Deep Learning," in 2020 3rd International Conference on Intelligent Sustainable Systems (ICISS), Dec. 2020, pp. 519–525. doi: 10.1109/ICISS49785.2020.9315968.
[4] J. Y. C. Chen and J. E. Thropp, "Review of Low Frame Rate Effects on Human Performance," IEEE Trans. Syst. Man Cybern. - Part A Syst. Hum., vol. 37, no. 6, pp. 1063–1076, Nov. 2007, doi: 10.1109/TSMCA.2007.904779.
[5] M. Müller, A. Dosovitskiy, B. Ghanem, and V. Koltun, "Driving Policy Transfer via Modularity and Abstraction," arXiv, Dec. 13, 2018. doi: 10.48550/arXiv.1804.09364.
[6] J. Ni, K. Shen, Y. Chen, W. Cao, and S. X. Yang, "An Improved Deep Network-Based Scene Classification Method for Self-Driving Cars," IEEE Trans. Instrum. Meas., vol. 71, pp. 1–14, 2022, doi: 10.1109/TIM.2022.3146923.
[7] F. Codevilla, M. Müller, A. López, V. Koltun, and A. Dosovitskiy, "End-to-end Driving via Conditional Imitation Learning," arXiv, Mar. 02, 2018. doi: 10.48550/arXiv.1710.02410.
[8] K. Gauen et al., "Comparison of Visual Datasets for Machine Learning," in 2017 IEEE Int. Conf. Inf. Reuse Integr. (IRI), Aug. 2017, pp. 346–355, doi: 10.1109/IRI.2017.59.
[9] F. Torabi, G. Warnell, and P. Stone, "Behavioral Cloning from Observation," pp. 4950–4957, 2018. Accessed: Feb. 21, 2024. [Online]. Available: https://www.ijcai.org/proceedings/2018/687
[10] M. Bojarski et al., "End to End Learning for Self-Driving Cars," arXiv, Apr. 25, 2016. doi: 10.48550/arXiv.1604.07316.
[11] H. Xu, Y. Gao, F. Yu, and T. Darrell, "End-to-end Learning of Driving Models from Large-scale Video Datasets," arXiv, Jul. 23, 2017. doi: 10.48550/arXiv.1612.01079.
[12] A. Kuefler, J. Morton, T. Wheeler, and M. Kochenderfer, "Imitating Driver Behavior with Generative Adversarial Networks," arXiv, Jan. 23, 2017. doi: 10.48550/arXiv.1701.06699.
[13] S. Hecker, D. Dai, and L. Van Gool, "End-to-End Learning of Driving Models with Surround-View Cameras and Route Planners," arXiv, Aug. 06, 2018. doi: 10.48550/arXiv.1803.10158.
[14] M. Bansal, A. Krizhevsky, and A. Ogale, "ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst," arXiv, Dec. 07, 2018. doi: 10.48550/arXiv.1812.03079.
[15] N. M. M. Shafiullah, Z. J. Cui, A. Altanzaya, and L. Pinto, "Behavior Transformers: Cloning k modes with one stone," arXiv, Oct. 11, 2022. doi: 10.48550/arXiv.2206.11251.
[16] A. Faisal, M. Kamruzzaman, T. Yigitcanlar, and G. Currie, "Understanding autonomous vehicles: A systematic literature review on capability, impact, planning and policy," J. Transp. Land Use, vol. 12, no. 1, pp. 45–72, 2019. Accessed: Feb. 21, 2024. [Online]. Available: https://www.jstor.org/stable/26911258
[17] T. Glasmachers, "Limits of End-to-End Learning," in Proceedings of the Ninth Asian Conference on Machine Learning, PMLR, Nov. 2017, pp. 17–32. Accessed: Feb. 21, 2024. [Online]. Available: https://proceedings.mlr.press/v77/glasmachers17a.html
[18] Z. Li, F. Liu, W. Yang, S. Peng, and J. Zhou, "A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects," IEEE Trans. Neural Netw. Learn. Syst., vol. 33, no. 12, pp. 6999–7019, Dec. 2022, doi: 10.1109/TNNLS.2021.3084827.
[19] Y. Xiao, F. Codevilla, A. Gurram, O. Urfalioglu, and A. M. López, "Multimodal End-to-End Autonomous Driving," IEEE Trans. Intell. Transp. Syst., vol. 23, no. 1, pp. 537–547, Jan. 2022, doi: 10.1109/TITS.2020.3013234.
[20] H. Li, J. Li, X. Guan, B. Liang, Y. Lai, and X. Luo, "Research on Overfitting of Deep Learning," in 2019 15th International Conference on Computational Intelligence and Security (CIS), Dec. 2019, pp. 78–81. doi: 10.1109/CIS.2019.00025.
[21] S. Shah, D. Dey, C. Lovett, and A. Kapoor, "AirSim: High-Fidelity Visual and Physical Simulation for Autonomous Vehicles," in Field and Service Robotics, Springer, Cham, 2018, pp. 621–635. doi: 10.1007/978-3-319-67361-5_40.
[22] J. Cui, H. Qiu, D. Chen, P. Stone, and Y. Zhu, "COOPERNAUT: End-to-End Driving with Cooperative Perception for Networked Vehicles," in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2022, pp. 17231–17241. doi: 10.1109/CVPR52688.2022.01674.
[23] Z. Zhang, A. Liniger, D. Dai, F. Yu, and L. Van Gool, "End-to-End Urban Driving by Imitating a Reinforcement Learning Coach," arXiv, Oct. 04, 2021. doi: 10.48550/arXiv.2108.08265.
[24] J. Hawke et al., "Urban Driving with Conditional Imitation Learning," arXiv, Dec. 05, 2019. doi: 10.48550/arXiv.1912.00177.
[25] S. Narayan and G. Tagliarini, "An analysis of underfitting in MLP networks," in Proceedings. 2005 IEEE International Joint Conference on Neural Networks, Jul. 2005, pp. 984–988 vol. 2. doi: 10.1109/IJCNN.2005.1555986.
[26] R. Chekroun, M. Toromanoff, S. Hornauer, and F. Moutarde, "GRI: General Reinforced Imitation and its Application to Vision-Based Autonomous Driving," arXiv, May 17, 2022. doi: 10.48550/arXiv.2111.08575.
[27] J. Kolluri, V. K. Kotte, M. S. B. Phridviraj, and S. Razia, "Reducing Overfitting Problem in Machine Learning Using Novel L1/4 Regularization Method," in 2020 4th International Conference on Trends in Electronics and Informatics (ICOEI), Jun. 2020, pp. 934–938. doi: 10.1109/ICOEI48184.2020.9142992.
[28] A. Mikołajczyk and M. Grochowski, "Data augmentation for improving deep learning in image classification problem," in 2018 International Interdisciplinary PhD Workshop (IIPhDW), May 2018, pp. 117–122. doi: 10.1109/IIPHDW.2018.8388338.