Visually Impaired/Blind Outdoor Navigation Assistance System

Team Name and Members
Team name: Cheongju Eagles
Members:
Hatem Ibrahem (PhD student). Email: hatem@cbnu.ac.kr, hatem.hosam1991@gmail.com
Bilel Yagoub (PhD student). Email: bilel.yagoub@cbnu.ac.kr, yagobilel@gmail.com
Ahmed Salem (PhD student). Email: ahmeddiefy@cbnu.ac.kr, ahmeddiefy@gmail.com
Hyun Soo Kang (Assistant professor). Email: hskang@cbnu.ac.kr

Real-scenario video experiments are available at the links below:
Full scenario: https://youtu.be/SxVp8J_Ikdw
Vision-based navigation test: https://youtu.be/UPR_vHPTZ14
Code: https://github.com/HatemHosam/Visually-impaired-solution-OpenCV-AIcompetition2021

Problem Statement
In the real world, visually impaired and blind people cannot go outside without a partner or an assistant. They often wish to visit everyday places such as markets, shops, and restaurants, or urgent destinations such as a hospital, a pharmacy, or a police station, without depending on other people. Visually impaired people face many problems in street navigation: it is difficult to navigate the streets individually and reach a destination on foot or by public transportation while relying only on basic tools such as a white cane or common sense. The problem becomes even harder when it comes to crossing streets, walking on pavements, avoiding obstacles and cars, or reaching the nearest bus station on time. These people can benefit from a machine-vision, AI-based navigation system that helps them navigate the streets accurately, protects them from traffic accidents, and compensates for their visual impairment through accurate voice instructions. By following the voice commands, a visually impaired person can reach the destination safely and easily.

Proposed solution
We propose an AI-based computer vision hardware/software implementation to solve the problem of individual street navigation for visually impaired/blind people. Building on the current achievements of semantic segmentation and depth estimation, we integrate these computer vision algorithms into our solution to provide street-scene perception. Recent semantic segmentation methods trained on the popular ADE20K segmentation dataset provide detailed understanding of indoor and outdoor scenes, and such a model can easily be deployed on an AI neural inference unit with a camera, such as the OAK-D, or on a Raspberry Pi. GPS is also integrated into our application: the user gives the desired destination by voice, the mobile application resolves it through Google Maps, and the smartphone then provides walking directions through a custom mobile application. During GPS navigation, the OAK-D board observes the street and provides pavement-positioning and obstacle-avoidance commands until the user reaches the destination.

Implementation details
The implemented system has three main components:
1- A voice-navigation mobile application.
2- The head-mounted OAK-D kit.
3- A Raspberry Pi 4 (RPI4) as a host kit and wireless communication provider.
There are also secondary components, namely two power banks and two USB Type-C cables for powering the OAK-D and the RPI4. The hardware components of our system are shown in Figure 1.
Figure 1: Components of the proposed solution.
Voice-navigation mobile application
An Android mobile application was developed around the Google Maps API, with several additional features added to support our system. A voice-based, precise walking navigation system was implemented from scratch to provide accurate commands. In the experimental scenario, we implemented a function that calculates the navigation angle between two precise waypoints on the map (latitude and longitude with ten fractional digits) and handles several angle ranges. Since the heading in the Maps API is always measured with respect to north, our function computes the relative angle between two waypoints on the map, and the voice orders are generated from this relative angle as follows. If the bearing angle is between 0 and 30 degrees or between 0 and -30 degrees, no heading change is needed until the angle exceeds 30 or falls below -30. If the angle is between 30 and 50 or between -30 and -50, the voice command is "turn slightly left" or "turn slightly right", respectively. If the angle is between 50 and 120 or between -50 and -120, the voice command is "turn left" or "turn right". If the angle is greater than 120 or less than -120 (with some margin), the voice command is "you are out of course, you need to turn around". A sketch of this mapping is given at the end of this subsection. The mobile application also provides Bluetooth communication with the RPI4 board to receive the computer-vision-based commands for more precise navigation in the street. The Android text-to-speech API is used to generate the voice commands from text in the application, and threading is used to coordinate the voice commands coming from the RPI4 with the voice commands generated by the Maps API navigation. During navigation, a beep sound is generated when the user reaches a track-transition waypoint, as can be noticed in the video.
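The angle-to-command mapping described above can be sketched as follows. The real implementation is part of the Android application; this Python version is only an illustrative sketch under our own assumptions (the helper names, the wrapping of the relative angle, and the left/right sign convention are ours, while the thresholds follow the ranges listed above).

```python
import math


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2, in degrees measured from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(y, x))


def voice_command(user_heading_deg, target_bearing_deg):
    """Map the relative angle between the user's heading and the bearing to the
    next waypoint onto the voice commands used by the walking navigation."""
    # Wrap the relative angle into [-180, 180).
    rel = (target_bearing_deg - user_heading_deg + 180.0) % 360.0 - 180.0
    a = abs(rel)
    side = "left" if rel > 0 else "right"   # sign convention is an assumption
    if a <= 30:
        return None                          # keep walking, no correction needed
    if a <= 50:
        return f"turn slightly {side}"
    if a <= 120:
        return f"turn {side}"
    return "you are out of course, you need to turn around"
```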
The head-mounted OAK-D kit
The OAK-D kit is fitted in a head-mounted band so that it is steered by the direction of the user's head. The OAK-D is hosted by the RPI4 kit and the two are connected through USB. Two main computer vision models are used to provide the vision-based navigation. The first is a semantic segmentation model with the DeepLabV3+ architecture trained on the ADE20K dataset, and the second is stereo depth estimation, used to obtain each object's z location and hence its distance from the user. ADE20K is a large dataset containing around 20K training images of indoor and outdoor scenes, with 150 categories that cover almost all the general objects people might see in their daily life. We focus on six main classes: pavement, person, car, motorcycle, bicycle, and tree; other categories will be useful in a future extension of the project. In the current prototype, we help the user avoid nearby pedestrians who block the path and nearby vehicles, as well as common obstacles such as trees. To localize the objects in front of the user, a center-scene box is defined in the middle of the frame and used to decide which object the user is looking at. Using the segmentation output, we find the mask pixels of the desired categories and then compute the boundaries of each object from its segmentation mask. We define the pavement as being on one side (left or right) of the user if the maximum x boundary of its mask is less than the minimum x of the center box, and the opposite test is used for the other side. We also use the masks of other objects such as person, vehicle, tree, and signpost, and we check the overlap between each mask and the center box. If they overlap, the stereo depth is used to determine how far the object is from the user. If the overlapping object is less than 3 meters away, a voice command is generated to warn the user and ask him to avoid it by moving left or right ("watch out, a <object category> is in front of you"). A rough sketch of this decision logic is given below, after the description of the Raspberry Pi host.

Raspberry Pi 4 as a host kit and communication provider
The RPI4 runs the Raspbian Linux system, and the DepthAI library is used to stream the models to the OAK-D. OpenVINO is used to convert the DeepLabV3+ segmentation model to the intermediate representation, and the DepthAI model converter is then used to obtain the .blob model that runs directly on the OAK-D. The blue-dot library is used to provide Bluetooth communication with the mobile application. The voice commands are generated from the segmentation output and the depth estimation (for distance augmentation) obtained from the OAK-D board, and they are sent via Bluetooth as text to the mobile application, where they are spoken by the text-to-speech API.
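To make the pavement-localization and obstacle-warning rules above more concrete, the following is a minimal Python sketch of how they could be expressed. It assumes the segmentation map and the stereo depth map have already been read back from the OAK-D as NumPy arrays; the class ids, the center-box size, the pavement command wording, and the image-left-equals-user-left convention are illustrative assumptions rather than the exact values used in the project code.

```python
import numpy as np

# Illustrative ADE20K-style class ids; the real ids used in the project may differ.
PAVEMENT, PERSON, CAR, TREE = 12, 13, 21, 5
WARN_CLASSES = {PERSON: "person", CAR: "car", TREE: "tree"}
WARN_DIST_MM = 3000            # warn when an overlapping object is closer than 3 m


def center_box(shape, frac=0.3):
    """Return (x0, y0, x1, y1) of a box covering `frac` of the frame, centred."""
    h, w = shape[:2]
    bw, bh = int(w * frac), int(h * frac)
    x0, y0 = (w - bw) // 2, (h - bh) // 2
    return x0, y0, x0 + bw, y0 + bh


def vision_commands(seg, depth_mm):
    """seg: (H, W) class-id map; depth_mm: (H, W) stereo depth in millimetres."""
    cmds = []
    x0, y0, x1, y1 = center_box(seg.shape)

    # Pavement side: compare the pavement mask's x extent with the center box.
    _, xs = np.nonzero(seg == PAVEMENT)
    if xs.size:
        if xs.max() < x0:
            cmds.append("the pavement is on your left")    # image left assumed = user's left
        elif xs.min() > x1:
            cmds.append("the pavement is on your right")

    # Obstacles: warn-class pixels that overlap the center box and are closer than 3 m.
    box_seg = seg[y0:y1, x0:x1]
    box_depth = depth_mm[y0:y1, x0:x1]
    for cls, name in WARN_CLASSES.items():
        valid = (box_seg == cls) & (box_depth > 0)         # ignore invalid (zero) depth
        if valid.any() and float(np.median(box_depth[valid])) < WARN_DIST_MM:
            cmds.append(f"watch out, a {name} is in front of you")
    return cmds
```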
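The resulting command text is then pushed from the RPI4 to the phone over Bluetooth to be spoken by the phone's text-to-speech API. Below is a minimal sketch using the blue-dot library's Bluetooth serial server, assuming the Android application connects as a client and simply speaks whatever text it receives.

```python
from signal import pause
from bluedot.btcomm import BluetoothServer


def on_data(data):
    # The prototype only sends commands to the phone; anything received is just logged.
    print("received from phone:", data)


# RFCOMM serial server on the RPI4; the custom Android application connects to it.
server = BluetoothServer(on_data)


def send_command(text):
    """Send one voice command as plain text, to be spoken by the phone's TTS."""
    try:
        server.send(text + "\n")
    except Exception as exc:            # e.g. no client connected yet
        print("could not send command:", exc)


send_command("watch out, a person is in front of you")
pause()   # keep the Bluetooth server alive
```

In a setup like this, send_command would be called from the loop that processes each OAK-D frame, keeping the vision pipeline on the RPI4 decoupled from the speech synthesis on the phone.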
Power banks for OAK-D and RPI4 powering
We use two 10,000 mAh power banks (brand name: Next) with an output voltage of 5 V and a maximum current of 3 A, which meets the requirements of both the OAK-D kit and the RPI4 kit. These power banks can power the system for approximately 2 hours and 30 minutes. The USB-C ports of the power banks are used to supply the required power.

Experiment
We tested our implementation in a real-life scenario at Chungbuk National University, Cheongju-si, South Korea. The scenario consists of nine custom high-precision waypoints arranged in a square, as shown in Figure 2. The scenario tests the implemented custom walking navigation system, as well as the pavement guidance and object avoidance (people, cars, motorbikes, bicycles, and trees), which are shown in the attached video and images. The map image from the mobile application is shown only for testing purposes, since the on-screen map is not needed when a blind person is using the system. The generated voice commands can be clearly heard in the attached video.
Figure 2: The experiment track scenario.
The ready-to-use experimental setup is shown in Figure 3. The user wears the head-mount strap that holds the OAK-D kit. The OAK-D and the RPI4 are each connected to a battery power bank, and they are connected to each other through a USB cable. A small waist bag holds the batteries and the RPI4 board.
Figure 3: The ready-to-use setup.
The system was tested in the designed scenario to evaluate both the walking navigation system and the vision-based navigation system. The system showed good walking navigation accuracy, and the vision-based navigation showed good accuracy in pavement localization and in the detection of objects such as persons, vehicles, and trees. We tested the vision-based pavement localization and object avoidance separately to show the ability of the implemented system to avoid different objects in the scene. The system could reliably localize the pavement, whether on the right or on the left of the user, based on its location relative to the center box, as shown in Figure 4: the left image was captured while the pavement was localized on the right, and the right image while the pavement was localized on the left.
Figure 4: Pavement localization during the vision-based navigation experiment.
The system was also able to recognize nearby cars, persons, and trees during the trip, as shown in Figure 5. The first image from the left shows car detection, the middle image shows tree detection, and the right image shows person detection during the trip.
Figure 5: Car, tree, and person detection during the vision-based navigation experiment.
Examples of the voice commands received from the RPI4, based on the OAK-D vision output during the test scenario, are shown in Figure 6. The figure shows the user following the track commands coming from the walking navigation while, at the same time, receiving the vision-based commands derived from the combined segmentation and stereo-camera distance measurements. The left image shows that the segmentation of the pavement is synchronized with the received command, displayed at the bottom of the application window. The right image shows a situation in which a person appeared in front of the user; the RPI4 sent the voice command "watch out, a person is in front of you" to warn the user and prompt him to avoid the person and change his path. During the full-scenario experiment, a beep sound is generated whenever the user reaches a transition waypoint, and the tone of the vision-based voice commands differs from that of the GPS-based navigation commands, so the user can recognize the source of each voice order.
Figure 6: Full system experiment (vision-based navigation + GPS-based navigation) on the custom scenario.

Conclusion
The proposed system combines GPS-based walking navigation and vision-based navigation to guide a blind user to the destination easily and safely through voice commands, and it can detect the common objects in the street. However, the system is still a prototype under development; many features can be added in the future, such as indoor navigation and bus/taxi riding support, and more object categories can be added to the object avoidance system.

Acknowledgement
We would like to thank Bilel Yagoub, one of the group members, for his effort in the video editing, and our university lab member HeeJoo Kwon for helping with the video capturing of the proposed system experiments.

Biography of team members
Hatem Ibrahem received his B.Eng. degree in electrical engineering (electronics and communication) from Assiut University, Assiut, Egypt, in 2013. He is currently pursuing the combined master's and Ph.D. degree with the School of Information and Communication Engineering, Chungbuk National University, Chungbuk, South Korea. His research interests include multimedia, image processing, machine learning, deep learning, and computer vision.
Bilel Yagoub received his B.Sc. degree in computer science from University Abdel-hamid Ibn Badis Mostaganem, Algeria, in 2013, and the M.Sc. degree from the University Oran 1 Ahmed Ben Bella, Algeria, in 2015. He is currently pursuing the Ph.D. degree with the School of Information and Communication Engineering, Chungbuk National University, Chungbuk, South Korea. His research interests include web/mobile application development, deep learning, and computer vision.
Ahmed Salem received the B.Eng. degree in electrical engineering (electronics and communication) from Assiut University, Assiut, Egypt, in 2012, and the M.Eng. degree in electronics and communication engineering from Egypt-Japan University of Science and Technology, Alexandria, Egypt, in 2016. He is currently pursuing the Ph.D. degree with the School of Information and Communication Engineering, Chungbuk National University, Chungbuk, South Korea. His research interests include multimedia, computer vision, and machine learning.
Hyun-Soo Kang received the B.S. degree in electronic engineering from Kyungpook National University, Republic of Korea, in 1991, and the M.S. and Ph.D. degrees in electrical and electronics engineering from KAIST in 1994 and 1999, respectively. From 1999 to 2005, he was with Hynix Semiconductor Co., Ltd., the Electronics and Telecommunications Research Institute (ETRI), and Chung-Ang University. He joined the College of Electrical and Computer Engineering of Chungbuk National University, Chungbuk, Republic of Korea, in March 2005. His research interests include image compression and image processing.