Visually Impaired/Blind Outdoor Navigation Assistance System
Team Name and Members
Team name: Cheongju Eagles
Members:
Hatem Ibrahem (PhD student). Email: hatem@cbnu.ac.kr, hatem.hosam1991@gmail.com
Bilel Yagoub (PhD student). Email: bilel.yagoub@cbnu.ac.kr, yagobilel@gmail.com
Ahmed Salem (PhD student). Email: ahmeddiefy@cbnu.ac.kr, ahmeddiefy@gmail.com
Hyun Soo Kang (Assistant professor). Email: hskang@cbnu.ac.kr
Real-scenario video experiments are available at the links below:
Full scenario: https://youtu.be/SxVp8J_Ikdw
Vision based navigation test: https://youtu.be/UPR_vHPTZ14
Code: https://github.com/HatemHosam/Visually-impaired-solution-OpenCV-AIcompetition2021
Problem Statement
In the real world, visually impaired and blind people often cannot go outside without a partner or an assistant. Yet they may want to visit everyday places such as markets, shops, and restaurants, or urgent ones such as a hospital, a pharmacy, or a police station, without depending on other people. Visually impaired people face many problems in street navigation: reaching a destination on foot or by public transportation is difficult when relying only on basic tools such as a white cane and on common sense. The problem becomes even harder when it comes to crossing streets, walking on pavements, avoiding obstacles and cars, or reaching the nearest bus station on time. These people could be assisted by a machine-vision, AI-based navigation system that helps them navigate the streets accurately, protects them from traffic accidents, and compensates for the visual impairment through accurate voice commands. By following the voice commands, a visually impaired person can reach their destination safely and easily.
Proposed solution
We propose an AI-based computer vision hardware/software implementation to solve the problem of individual street navigation for visually impaired and blind people. Building on recent achievements in semantic segmentation and depth estimation, we integrate these computer vision algorithms into our solution to provide street-scene perception.
Recent semantic segmentation methods trained on the popular segmentation dataset “ADE20K” provide detailed understanding of indoor and outdoor scenes via segmentation. Such a model can easily be integrated into an AI neural inference unit with a camera, such as the OAK-D hosted by a Raspberry Pi. Another tool integrated into our application is GPS: the desired destination is received from the user as a voice command by a custom mobile application, which looks it up on Google Maps and then provides walking directions to the user by voice. During the GPS navigation, the OAK-D board observes the street and provides street-pavement positioning and obstacle-avoidance commands until the user reaches the destination.
Implementation details
The implemented system has three main components:
1- A voice-navigation-based mobile application.
2- A head-mounted OAK-D kit.
3- A Raspberry Pi 4 (RPI4) as a host kit and wireless communication provider.
It also has secondary components: two power banks and two USB Type-C cables to power the OAK-D and the RPI4. The hardware components of our system are shown below in Figure 1.
Figure 1: Components of the proposed solution.

Voice navigation based mobile application:
An Android mobile application was developed around the Google Maps API, with several additional features added to empower our system. A precise, voice-based walking navigation system was implemented from A to Z. In an experimental scenario, we implemented a function that calculates the precise navigation angle between two high-precision way-points (the latitude and longitude fractional parts have 10 digits) on the map, with different cases depending on the angle's range. Since the heading in the Maps API is always calculated with respect to north, we implemented a function that calculates the relative angle between two way-points on the map. The voice commands are generated based on this relative angle as follows:
If the bearing angle is between 0 and 30 or between 0 and -30, there is no need to change heading unless the angle becomes more than 30 or less than -30.
If the angle is between 30 and 50 or between -30 and -50, the voice command is “turn slightly left” or “turn slightly right”, respectively.
If the angle is between 50 and 120 or between -50 and -120, the voice command is “turn left” or “turn right”, respectively.
If the angle is more than 120 or less than -120 (with some margin), the voice command is “you are out of course, you need to turn around”.
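The sketch below illustrates this logic in Python; the actual application is implemented for Android, so the function names are hypothetical, the bearing formula is a standard initial-bearing calculation between two latitude/longitude points, and the sign convention (positive angle = turn left) simply follows the ranges listed above.

    import math

    def initial_bearing(lat1, lon1, lat2, lon2):
        """Bearing from point 1 to point 2, in degrees clockwise from north."""
        phi1, phi2 = math.radians(lat1), math.radians(lat2)
        dlon = math.radians(lon2 - lon1)
        y = math.sin(dlon) * math.cos(phi2)
        x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
        return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0

    def relative_angle(user_heading, bearing_to_waypoint):
        """Signed angle in [-180, 180) between the user's heading and the next way-point."""
        return (bearing_to_waypoint - user_heading + 180.0) % 360.0 - 180.0

    def voice_command(angle):
        """Map the relative angle to the voice commands listed above."""
        a = abs(angle)
        if a <= 30:
            return None        # keep walking; no heading correction needed
        if a <= 50:
            return "turn slightly left" if angle > 0 else "turn slightly right"
        if a <= 120:
            return "turn left" if angle > 0 else "turn right"
        return "you are out of course, you need to turn around"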
The mobile application also provides Bluetooth communication with the RPI4 board in order to receive the computer-vision-based commands for more precise navigation in the street.
The Android text-to-speech API is used to generate the voice commands from text in the application.
Threading is used in the application to organize the voice commands coming from the RPI4 and the voice commands generated by the Maps API navigation, as sketched below. During navigation, when the user reaches a track-transition way-point, a beep sound is generated; this can be noticed in the video.
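A minimal Python sketch of this arrangement is shown below, assuming a single speaker thread that consumes a shared queue; the real implementation uses Android threads and the Android text-to-speech API, so the names and the placeholder tts_speak function are purely illustrative.

    import queue
    import threading

    commands = queue.Queue()

    def tts_speak(text, source):
        # Placeholder for the Android text-to-speech call; in the real app a
        # different tone is used per source so the user can tell GPS commands
        # and vision commands apart.
        print(f"[{source}] {text}")

    def on_maps_command(text):
        commands.put(("gps", text))        # e.g. "turn slightly left"

    def on_rpi4_command(text):
        commands.put(("vision", text))     # e.g. "watch out, a person is in front of you"

    def speaker_loop():
        # Single consumer: commands are spoken one at a time, in arrival order,
        # so the two sources never talk over each other.
        while True:
            source, text = commands.get()
            tts_speak(text, source)
            commands.task_done()

    threading.Thread(target=speaker_loop, daemon=True).start()

    # Example: both sources push commands concurrently.
    on_maps_command("turn slightly left")
    on_rpi4_command("watch out, a person is in front of you")
    commands.join()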

The head-mounted OAK-D kit.
The OAK-D kit is fitted into a head-mount band so that it is aimed by the direction of the user's head. The OAK-D is hosted by the RPI4 kit, and the two are connected through USB. Two main computer vision models are used to provide the vision-based navigation.
The first model is a semantic segmentation model using the DeeplabV3+ architecture trained on the ADE20K dataset, and the second is a stereo depth estimation model that computes the z location of objects, which is used to calculate each object's distance from the user. ADE20K is a large dataset containing around 20K training images of indoor and outdoor scenes. It has 150 categories, which cover almost all of the general objects people might see in their daily life. We focus on six main classes: pavement, person, car, motorcycle, bicycle, and tree; other categories will be useful in a future extension of the project. In the current prototype, we help the user avoid nearby pedestrians who block the user's path and nearby vehicles, and we also help the user avoid hitting common obstacles such as trees.
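For reference, a hypothetical label map for these six classes is shown below, assuming the common 1-indexed SceneParse150 (ADE20K) label ordering; the exact indices should be checked against the label map of the deployed DeeplabV3+ model.

    # Assumed ADE20K/SceneParse150 indices for the six classes of interest.
    ADE20K_CLASSES_OF_INTEREST = {
        "tree": 5,
        "pavement": 12,     # listed as "sidewalk" in ADE20K
        "person": 13,
        "car": 21,
        "motorcycle": 117,  # listed as "minibike" in ADE20K
        "bicycle": 128,
    }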
To localize the objects in front of the user, a center-scene box is defined in the middle of the frame and used to determine which object the user is looking at. Using the segmentation output, we find the mask pixels of the desired categories and then compute the boundaries of each object from its segmentation mask. The pavement is declared to be on the user's left if the maximum x boundary of the pavement mask is less than the center box's minimum x location, and the opposite condition declares it to be on the right. We also use the masks of other objects, such as person, vehicle, tree, and sign post, and compute their overlap with the center box. If a mask overlaps the box, the stereo depth is used to determine how far the object is from the user. If the overlapping object is less than 3 meters away, a voice command is generated to warn the user and ask them to avoid it by going left or right (“watch out, a [object category] is in front of you”).
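A minimal sketch of this center-box logic is given below, assuming a per-pixel class map seg (H x W, ADE20K indices) aligned with a depth map depth_m in meters; the box size, the distance threshold, and the returned phrasing are illustrative.

    import numpy as np

    def scene_commands(seg, depth_m, classes, box_frac=0.3, warn_dist_m=3.0):
        """seg: H x W class-index map; depth_m: aligned depth in meters;
        classes: dict mapping category name -> class index (e.g. the map above)."""
        h, w = seg.shape
        bw, bh = int(w * box_frac), int(h * box_frac)
        x0, x1 = (w - bw) // 2, (w + bw) // 2
        y0, y1 = (h - bh) // 2, (h + bh) // 2
        commands = []

        # Pavement side: if the pavement mask lies entirely to the left (right)
        # of the center box, report it as being on the left (right).
        pav_cols = np.where((seg == classes["pavement"]).any(axis=0))[0]
        if pav_cols.size:
            if pav_cols.max() < x0:
                commands.append("the pavement is on your left")
            elif pav_cols.min() > x1:
                commands.append("the pavement is on your right")

        # Obstacles: any other class overlapping the center box and closer than
        # the warning distance triggers a warning command.
        box = (slice(y0, y1), slice(x0, x1))
        for name, idx in classes.items():
            if name == "pavement":
                continue
            overlap = seg[box] == idx
            if overlap.any() and float(np.median(depth_m[box][overlap])) < warn_dist_m:
                commands.append(f"watch out, a {name} is in front of you")
        return commands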

Raspberry Pi 4 as a host kit and communication provider
The RPI4 runs the Linux-based “Raspbian” OS, and the depth-ai library is used to stream the models to the OAK-D. OpenVINO is used to convert the DeeplabV3+ segmentation model to the intermediate representation, and the depth-ai model converter is then used to obtain the .blob model that runs directly on the OAK-D. The blue-dot library is used to provide Bluetooth communication with the mobile application. The voice commands are generated based on the segmentation output and the depth estimation (for distance augmentation) obtained from the OAK-D board, and they are sent via Bluetooth as text to the mobile application, where they are spoken by the text-to-speech API. A condensed sketch of this glue code is shown below.
Figure 2. The experiment track scenario
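The sketch below condenses the RPI4-side glue in Python, assuming the converted segmentation .blob and the depthai and blue-dot Python APIs; the blob file name, preview size, stream name, and the postprocess_to_commands placeholder (standing in for the center-box logic above) are assumptions, and the stereo-depth branch is omitted for brevity.

    import depthai as dai
    from bluedot.btcomm import BluetoothServer

    def postprocess_to_commands(nn_data):
        # Placeholder: decode nn_data into a class map and apply the center-box
        # logic sketched earlier; returns a list of command strings.
        return []

    server = BluetoothServer(lambda data: None)   # phone connects; we only send text

    pipeline = dai.Pipeline()

    cam = pipeline.create(dai.node.ColorCamera)
    cam.setPreviewSize(256, 256)                  # must match the blob's input size
    cam.setInterleaved(False)

    nn = pipeline.create(dai.node.NeuralNetwork)
    nn.setBlobPath("deeplabv3plus_ade20k.blob")   # produced via OpenVINO + blob conversion
    cam.preview.link(nn.input)

    xout = pipeline.create(dai.node.XLinkOut)
    xout.setStreamName("seg")
    nn.out.link(xout.input)

    with dai.Device(pipeline) as device:
        q = device.getOutputQueue("seg", maxSize=4, blocking=False)
        while True:
            seg_msg = q.get()                     # NNData holding the class map
            for text in postprocess_to_commands(seg_msg):
                server.send(text)                 # spoken by the app's text-to-speech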

Power banks for OAK-D and RPI4 powering
We used two 10,000 mAh power banks (brand name: Next) with an output voltage of 5 V and a maximum current of 3 A, which suits both the OAK-D kit and the RPI4 kit. These power banks are able to power the system for approximately 2 hours and 30 minutes. The USB-C ports of the power banks are used to provide the required power.
Experiment:
We tested our implementation in a real-life scenario at Chungbuk National University, Cheongju-si, South Korea. The scenario consists of nine custom high-precision way-points arranged in a square, as shown in the figure. The scenario exercises the implemented custom walking navigation system as well as the pavement guidance and object avoidance (people, cars, motorbikes, bicycles, and trees), as shown in the attached video and images. The map image from the mobile application is shown for testing only, since the map is not needed when a blind person is using the system. The generated voice commands can be heard clearly in the attached video.
The ready-to-use experimental setup is shown in Figure 3. The user wears the head-mount strap that holds the OAK-D kit. The OAK-D and the RPI4 are each connected to a power bank, and they are connected to each other through a USB cable. A small waist bag holds the batteries and the RPI4 board. The system was tested in the designed scenario to evaluate both the walking navigation system and the vision-based navigation system. It showed good walking navigation accuracy, and the vision-based navigation showed good accuracy in pavement localization and in detecting objects such as persons, vehicles, and trees.
Figure 3. The ready-to-use setup.
We tested the vision-based pavement localization and object avoidance separately to show the ability of the implemented system to avoid different objects in the scene. The system could reliably localize the pavement, whether on the right or on the left of the user, based on its location relative to the center box, as shown in Figure 4. The left image is captured during localization of the pavement on the right, and the right image during localization of the pavement on the left.
Figure 4. Pavement localization during the vision based navigation experiment.
The system was also able to recognize nearby cars, persons, and trees during the trip, as shown in Figure 5. The first image from the left shows car detection, the middle image shows tree detection, and the right image shows person detection during the trip.
Figure 5. Car, tree, and person detection during the vision based navigation experiment.
Examples of the voice commands received from the RPI4, based on the OAK-D vision output during the test scenario, are shown in Figure 6. The figure shows the user following the track commands coming from the walking navigation while, at the same time, receiving the vision-based commands derived from the combined segmentation and stereo-camera distance measurements. The image on the left shows that the segmentation of the pavement is synchronized with the received command, displayed at the bottom of the application window on the left of the image. The image on the right shows a situation where a person appeared in front of the user; the RPI4 sent the voice command “watch out, a person is in front of you” to warn the user and tell them to avoid the person and change their path. During the full-scenario experiment, a beep sound is generated when the user reaches a transition way-point. The tone of the vision-based navigation commands also differs from that of the GPS-based navigation, so that the user can recognize the source of each voice command.
Figure 6. Full system experiment (Vision-based Navigation + GPS-based Navigation) on the custom scenario.
Conclusion
The proposed system combines GPS-based walking navigation and vision-based navigation and delivers both to a blind user through voice commands, guiding the user to the destination easily and safely while detecting common objects in the street. However, the system is still a prototype in the development stage, and many features can be added in the future to support more functions, such as indoor navigation, bus/taxi riding, and more object categories in the object avoidance system.
Acknowledgement
We would like to thank Bilel Yagoub, one of the group members, for his effort in editing the video, and we would also like to thank our university lab member HeeJoo Kwon for helping us capture the video of the proposed system experiments.
Biography of team members
Hatem Ibrahem received his B.Eng. degree in electrical engineering (electronics and
communication) from Assiut University, Assiut, Egypt, in 2013. He is currently pursuing the
combined master’s and Ph.D. degree with the School of Information and Communication
Engineering, Chungbuk National University, Chungbuk, South Korea.
His research interests include multimedia, image processing, machine learning, deep learning
and computer vision.
Bilel Yagoub received his B.Sc. degree in computer science from University Abdel-hamid Ibn Badis
Mostaganem, Algeria, in 2013, and the M.Sc. degree from the University Oran 1 Ahmed Ben Bella,
Algeria, in 2015. He is currently pursuing the Ph.D. degree with the School of Information and
Communication Engineering, Chungbuk National University, Chungbuk, South Korea. His research
interests include web/mobile application development, deep learning, and computer vision.
Ahmed Salem received the B.Eng. degree in electrical engineering (electronics &
communication) from Assiut University, Assiut, Egypt, in 2012, and the M.Eng. degree in
electronics and communication engineering from Egypt-Japan University of Science and
Technology, Alexandria, Egypt, in 2016.
He is currently pursuing the Ph.D. degree with the School of Information and Communication
Engineering, Chungbuk National University, Chungbuk, South Korea. His research interests
include multimedia, computer vision, and machine learning.
Hyun-Soo Kang received the BS degree in electronic engineering from Kyoungpook National
University, Republic of Korea, in 1991, and the MS and PhD degrees in electrical and electronics
engineering from KAIST in 1994 and 1999, respectively. From 1999 to 2005, he was with Hynix
Semiconductor Co., Ltd., Electronics and Telecommunications Research Institute (ETRI), and
Chungang University. He joined the College of Electrical and Computer Engineering of Chungbuk
National University, Chungbuk, Republic of Korea, in Mar. 2005.
His research interests include image compression and image processing.