Mini Project Report
on
Sign Language Detection and Recognition Using Deep
Learning
By
Group ID: 09
Syead Maaz Ahmed (201900007)
Farheen (201900014)
Ebbani Thapa (201900040)
Under the guidance of
Mr. Shantanu Kumar Mishra, Assistant Professor,
Department of Computer Science and Engineering, SMIT
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
SIKKIM MANIPAL INSTITUTE OF TECHNOLOGY
(A constituent college of Sikkim Manipal University)
MAJITAR, RANGPO, EAST SIKKIM – 737136
LIST OF CONTENTS
Abstract
Introduction
Literature Survey
Problem Definition
Solution Strategy
System Requirements
Design
Implementation
Input / Output
Conclusion
Problems
Future Work
Gantt Chart
References
ABSTRACT
Conversing with a person who has a hearing disability is always a major challenge. Sign language has indelibly become the ultimate panacea: it is a very powerful tool for individuals with hearing and speech disabilities to communicate their feelings and opinions to the world, and it makes the process of integrating with others smoother and less complex. However, the invention of sign language alone is not enough. There are many strings attached to this boon: sign gestures often get mixed up and confused by someone who has never learnt the language or knows it in a different form. This communication gap, which has existed for years, can now be narrowed with the introduction of techniques that automate the detection of sign gestures.
In this report, we introduce a sign language recognition system based on American Sign Language. In this study, the user captures images of hand gestures using a web camera, and the system predicts and displays the name of the captured gesture. The images undergo a series of processing steps that include computer vision techniques such as conversion to grayscale and Gaussian blur, after which the region of interest, in our case the hand gesture, is segmented. The features extracted are the binary pixels of the images. We make use of a Convolutional Neural Network (CNN) for training and for classifying the images. We are able to recognize more than 20 letters of the American Sign Language alphabet with high accuracy; our model has achieved a remarkable accuracy of above 96%.
INTRODUCTION
As well stipulated by Nelson Mandela, “Talk to a man in a language he understands, that goes to his head. Talk to him in his own language, that goes to his heart”, language is undoubtedly essential to human interaction and has existed since human civilization began. It is the medium humans use to communicate, to express themselves and to understand notions of the real world. Without it, no books, no cell phones and certainly none of the words I am writing would have any meaning. It is so deeply embedded in our everyday routine that we often take it for granted and do not realize its importance. Sadly, in the fast-changing society we live in, people with hearing impairments are usually forgotten and left out. They have to struggle to bring up their ideas, voice their opinions and express themselves to people who are different from them.
People with impaired speech and hearing use sign language as their form of communication. They use sign language gestures as a tool of non-verbal communication to express their emotions and thoughts to other people. But those people often find it difficult to understand these expressions, so trained sign language interpreters are needed during medical and legal appointments and educational and training sessions. Over the past few years, there has been an increase in demand for such services. Other forms of service, such as video remote human interpreting over high-speed Internet connections, have been introduced and provide an easy-to-use sign language interpreting service, but they still have major limitations.
To address this, we are putting forward a sign language recognition system. It will be an ultimate tool for people with hearing disabilities to communicate their thoughts, as well as a very good interpreter for non-sign-language users to understand what the former are saying. In this recognition system, we will use a custom CNN model to recognize gestures in sign language. A convolutional neural network of 11 layers is constructed: three convolution layers, three max-pooling layers, two dense layers, one flattening layer and two dropout layers. We will use the American Sign Language (ASL) dataset in MNIST (Modified National Institute of Standards and Technology) format to train the model to identify the gestures. The dataset contains the features of different augmented gestures. We shall introduce a custom CNN (Convolutional Neural Network) model to identify the sign from a video frame using OpenCV. Initially, we will use the feature-extracted dataset to train the custom 11-layer model with a default image size. The rest of the report contains the following: Literature Survey, an elaborated Problem Definition, Solution Strategy, Implementation, Pseudo Code, Gantt Chart and finally References.
LITERATURE SURVEY
2.1 A convolutional neural network to classify American Sign Language
fingerspelling from depth and color images
Title
A convolutional neural network to classify American Sign Language fingerspelling from
depth and color images
Author
Ameen, SA and Vadera, S
Salient Features
The paper aims to utilize a deep learning architecture to recognize the kind of signs presented as images. It explains how a convolution applies kernel transformations to an image to identify relevant features, while the main goal of pooling is to introduce invariance to local translation and reduce the number of hidden units. The paper explores the use of a different architecture that recognizes that depth and intensity are inherently different types of information and that there may be advantages in keeping them separate in the initial layers of a ConvNet. An analysis of the confusion matrix identified two types of errors: (i) symmetric errors, where two letters can be misclassified as each other, and (ii) asymmetric errors, where one letter is misclassified as another but not the other way round.
Pros
The results of the empirical evaluation showed an improvement of 3% compared to their previous work, with a precision rate of over 82%.
Cons
The sign for the letter R has nearly the same shape as that for the letter U, especially when the hand moves. In both letters, the signer needs to use two fingers to convey the meaning. In addition, the distance between the camera and the fingers is nearly the same for both, which makes it difficult to recognize the differences even when using depth.
2.2 Static Sign Language Recognition Using Deep Learning
Title
Static Sign Language Recognition Using Deep Learning
Author
Lean Karlo S. Tolentino, Ronnie O. Serfa Juan, August C. Thio-ac, Maria Abigail B.
Pamahoy, Joni Rose R. Forteza, and Xavier Jet O. Garcia
Salient Features
The main objective of the project was to develop a system that can translate static sign language into its corresponding word equivalent, covering letters, numbers and basic static signs, to familiarize users with the fundamentals of sign language. The paper explains how the system was developed on the basis of a skin-colour modelling technique, i.e., explicit skin-colour space thresholding, which separates hand pixels from background pixels. The images were fed into a Convolutional Neural Network (CNN) for classification. Keras and TensorFlow were used for training, with proper lighting conditions and a uniform background provided.
Pros
The system achieved a testing accuracy of 90.04% in letter recognition, 93.44% in number recognition and 97.52% in static word recognition, giving an average of 93.667% for gesture recognition within limited time. Each system was trained using 2,400 images of size 50 × 50 for each letter/number/word gesture.
Cons
The study proposed a complex process of skin-colour thresholding; when only the bare hands of the signer were used, it was difficult for the system to recognize the gesture because of hindrances such as noise.
2.3 American Sign Language Alphabet Recognition using Deep Learning
Title
American Sign Language Alphabet Recognition using Deep Learning
Author
Nikhil Kasukurthi, Brij Rokad, Shiv Bidani and Dr. Aju Dennisan
Salient Features
For translating an image to the relevant letter, they trained a pre-trained SqueezeNet model on the Surrey Finger dataset. The trained model is then used for inference on the images fed as input. The model was trained on an NVIDIA K80 GPU with a dataset of 41,258 images. Each sample provided an RGB image (320x320 pixels), a depth map (320x320 pixels) and segmentation masks (320x320 pixels) for the classes: background, person, three classes for each finger and one for each palm.
Pros
The evaluation was a low-computation process and could also be carried out on a handheld mobile device. The maximum validation accuracy attained was 83.29% at the 9th epoch, whereas the maximum training accuracy attained was 87.47%. The correlation between the training and validation accuracy was 98.47%, which signified that the model had been trained accurately.
Cons
The model is able to give accurate predictions, but there are certain cases where it failed. This happened with similar-looking letters such as ‘a’ and ‘t’, where the only difference is that the thumb is on the side for ‘a’ whereas for ‘t’ the thumb is between the index and middle fingers. When an image with different lighting conditions was given, or the fingers were not visible, the model produced false predictions.
2.4 Sign Language Recognition Using Deep Learning and Computer Vision
Title
Sign Language Recognition Using Deep Learning and Computer Vision
Author
R.S. Sabeenian, S. Sai Bharathwaj and M. Mohamed Aadhil
Salient Features
A custom CNN model is used to recognize gestures in sign language. A convolutional neural network of 11 layers is constructed: four convolution layers, three max-pooling layers, two dense layers, one flattening layer and one dropout layer. The American Sign Language dataset in MNIST (Modified National Institute of Standards and Technology database) format is used to train the model to identify the gestures. The dataset contains the features of different augmented gestures. A custom CNN (Convolutional Neural Network) model is also introduced to identify the sign from a video frame using OpenCV.
Pros
The use of a custom CNN model makes it easy to choose the variety of convolution to utilize (3x3, 5x5) within the model itself. The validation dataset consisted of 7,172 samples, and the validation accuracy of the model was greater than 93%.
Cons
The major issue faced was due to the background of the image. As the model was trained with segmented grayscale gesture images, it did not support background subtraction when the frames were taken from a video.
PROBLEM DEFINITION
There are more than 70 million people who are hearing or speech impaired. Recognizing sign language might not be a very challenging task for a human interpreter, since all that is required is for the interpreter to learn the particular sign language. Sign languages are not just ‘natural language represented by signs’, nor merely hand representations of words as they are spoken; rather, they are representations of meaning.
There are various facts associated with sign language, a natural language, of which most of us are unaware. Some of them are listed below:
● NOT the same all over the world.
● NOT just gestures and pantomime; they have their own grammar.
● The dictionary is smaller compared to other languages.
● Finger-spelling is used for unknown words.
● Adjectives are placed after the noun in most sign languages.
● Suffixes are never used.
● Signing is always in the present tense.
● Articles are not used.
● ‘I’ is not used; ‘me’ is used instead.
● There are no gerunds.
● Eyebrows and other non-manual expressions are used.
People who do not know sign language face difficulty in understanding it. Hence, there is a need for a system which recognizes the different signs and gestures and conveys the information to them.
SOLUTION STRATEGY
The major reason for developing this system is to make communication, including over the internet, easier for the deaf and mute community.
A hand gesture recognition system can offer deaf people an opportunity to talk with hearing people without the need for an interpreter.
The working of this approach is carried out with the help of certain modules which are as
follows:
• Data Set: In this step, a set of images for each letter in the sign language is fed to a database. The number of images may vary from 50 to 100, with different angles of each particular gesture included. The input obtained is then compared with the given images in the dataset to identify the gesture made. The reason for the number of images in the dataset is to obtain the output with a good amount of accuracy and to avoid ambiguity.
• Image Detection: This is the step that comes right after camera capture. Image detection refers to detecting the hand in the image that is obtained.
• Feature Extraction: Feature extraction refers to extracting the details from the captured image. In a sign language interpreter, the image captured is a gesture made by a hand. These features are then used to recognize the gesture using certain algorithms.
• Image Recognition: Image recognition is the most crucial procedure of this project. The acquired image is converted to its vector form.
• Output: The flow of execution takes place in the following manner: the camera gets the input gesture image from the user; the detection process checks whether it is a hand or not using certain algorithms; image recognition is the next step, where the image acquired from the user is compared with the images in the dataset to interpret the shown gesture. Image recognition is done using a model known as a CNN, or Convolutional Neural Network. The final step is the output, where the recognized symbol is converted to text form.
SYSTEM REQUIREMENTS
Software Requirements
• Python 3.10
• JupyterLab
• GPU support – CUDA/cuDNN
• Object Detection API
• OpenCV
• TensorFlow
• NumPy
• Pandas
Hardware Requirements
• Processor: Intel® Core™ i5 / AMD Ryzen 5
• RAM: 8 GB
• Storage: 20 GB
• Standard devices: keyboard, monitor, webcam and mouse
DESIGN
[Figure: American Sign Language alphabet chart and the system design diagram, not reproduced in this text version.]
IMPLEMENTATION
6.1 Algorithm
The entire model is based on the concept of a Convolutional Neural Network (CNN). A CNN is a class of deep neural network, applied in many areas but most commonly to visual imagery. The steps followed are: provide the input image to a convolution layer; choose the parameters and apply filters with strides, and padding if required; perform the convolution on the image and apply ReLU activation to the resulting matrix; perform pooling to reduce the dimensionality; add as many convolutional layers as needed; flatten the output and feed it into a fully connected layer (FC layer); and output the class using an activation function (logistic regression with a cost function) to classify the image. CNNs are used for image classification and recognition because of their high accuracy, which makes them well suited for training our model, since the project revolves around images.
PSEUDO CODE
Dataset transformed to grayscale
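The listing below is a minimal sketch of this step, not the exact code from the report. It assumes the Sign Language MNIST CSV files (sign_mnist_train.csv and sign_mnist_test.csv, illustrative file names), where each row holds a label plus 784 grayscale pixel values forming a 28x28 image.

import pandas as pd
from tensorflow.keras.utils import to_categorical

# Load the CSVs (hypothetical file names); each row: label + 784 pixel values.
train = pd.read_csv("sign_mnist_train.csv")
test = pd.read_csv("sign_mnist_test.csv")

y_train = train.pop("label").values
y_test = test.pop("label").values

# Reshape the flat pixel rows into 28x28 grayscale images and scale to [0, 1].
x_train = train.values.reshape(-1, 28, 28, 1).astype("float32") / 255.0
x_test = test.values.reshape(-1, 28, 28, 1).astype("float32") / 255.0

# One-hot encode the labels; the static letters span indices 0-24 (index 9, letter J, is unused).
y_train = to_categorical(y_train, num_classes=25)
y_test = to_categorical(y_test, num_classes=25)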
Convolution Neural Network is applied for training and classification
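A sketch of the 11-layer network described in the introduction (three convolution, three max-pooling, two dropout, one flatten and two dense layers), assuming 28x28 grayscale inputs; the filter counts, kernel sizes and training hyperparameters are illustrative assumptions rather than the report's exact values.

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.4),
    layers.Dense(25, activation="softmax"),  # 25 label slots (index 9, letter J, unused)
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# Train on the preprocessed arrays from the previous step.
history = model.fit(x_train, y_train,
                    validation_data=(x_test, y_test),
                    epochs=10, batch_size=128)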
Model Summary
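The model summary shown in the report corresponds to a single Keras call on the model built above:

model.summary()  # prints each layer's output shape and parameter count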
Plotting accuracy VS val_accuracy graph
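A minimal plotting sketch, assuming the History object returned by model.fit() above:

import matplotlib.pyplot as plt

# Plot training vs. validation accuracy per epoch.
plt.plot(history.history["accuracy"], label="accuracy")
plt.plot(history.history["val_accuracy"], label="val_accuracy")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.legend()
plt.show()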
Evaluation
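Evaluation on the held-out test split can be sketched as follows:

# Report loss and accuracy on the test set.
loss, acc = model.evaluate(x_test, y_test, verbose=0)
print(f"Test accuracy: {acc:.4f}")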
Using camera to take alphabet sign as input
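A sketch of the live capture loop using OpenCV with the model trained above: each webcam frame is cropped to a fixed region of interest, converted to grayscale, blurred, resized to 28x28 and passed to the model. The ROI coordinates and the letter mapping are illustrative assumptions.

import cv2
import numpy as np

# Static ASL letters (J and Z require motion and are excluded); label index 9 is unused.
letters = "ABCDEFGHIKLMNOPQRSTUVWXY"
index_to_letter = dict(zip([i for i in range(25) if i != 9], letters))

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    roi = frame[100:400, 100:400]                  # region of interest (assumed box)
    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)   # grayscale conversion
    blur = cv2.GaussianBlur(gray, (5, 5), 0)       # Gaussian blur
    img = cv2.resize(blur, (28, 28)).astype("float32") / 255.0
    pred = model.predict(img.reshape(1, 28, 28, 1), verbose=0)
    letter = index_to_letter.get(int(np.argmax(pred)), "?")
    cv2.rectangle(frame, (100, 100), (400, 400), (0, 255, 0), 2)
    cv2.putText(frame, letter, (100, 90), cv2.FONT_HERSHEY_SIMPLEX, 2, (0, 255, 0), 3)
    cv2.imshow("Sign Language Recognition", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):          # press q to quit
        break
cap.release()
cv2.destroyAllWindows()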
INPUT / OUTPUT
CONCLUSION
Many breakthroughs have been made in the fields of artificial intelligence, machine learning and computer vision. They have immensely contributed to how we perceive the things around us and improved the way we apply their techniques in our everyday lives. Much research has been conducted on sign gesture recognition using different techniques such as ANNs, LSTMs and 3D CNNs. However, most of them require extra computing power. Our approach, on the other hand, requires low computing power and gives a remarkable accuracy of above 95%. In our work, we proposed to normalize and rescale our images to 64 pixels in order to extract features (binary pixels) and make the system more robust. We use a CNN to classify more than 20 American Sign Language alphabet gestures and successfully achieve an accuracy of 97%, which is better than the other related work discussed in this paper.
PROBLEMS
Sign languages are very broad and differ from country to country in terms of gestures, body language and facial expressions. The grammar and sentence structure also vary a lot. In our study, learning and capturing the gestures was quite a challenge, since the movement of the hands had to be precise and on point. Some gestures are difficult to reproduce, and it was hard to keep our hands in exactly the same position while creating our dataset.
FUTURE WORK
We look forward to improving the model so that it recognizes more alphabetical features while maintaining high accuracy. Further, we would like to extend this alphabet recognition system into a fully automated conversation recognition system. We would also like to enhance the system by adding speech recognition so that blind people can benefit as well.
GANTT CHART
[Gantt chart spanning November to April, covering the tasks: Problem Identification, Feasibility Study, Literature Survey, SRS and Design, Coding, Testing, Documentation.]
REFERENCES
[1] Amen, S., & Vadera, S. (2017). A convolutional neural network to classify American Sign
Language fingerspelling from depth and colour images (University of Salford Manchester
(2017))
[2] L. Tolentino, R. Juan, August C. , Maria A. Pamahoy, J. Forteza, and Xavier O. Gracia.
Static Sign Language Recognition Using Deep Learning (International Journal of Machine
Learning and Computing, Vol. 9, No. 6, December 2019)
[3] Nikhil Kasukurthi, Brij Rokad, Shiv Bidani, Dr. Aju Dennisan. Sign Language Recognition
Using Deep Learning and Computer Vision (Vellore Institute of Technology University (2019)
[4] R.S. Sabeenian, S. Sai Bharathwaj and , M. Mohamed Aadhil. Sign Language Recognition
Using Deep Learning and Computer Vision ( Journal of Advanced Research in Dynamical and
Control Systems, May, 2020)
[5] Prof. Radha S. Shirbhate, Mr. Vedant D. Shinde, Ms. Sanam A. Metkari, Ms. Pooja U.
Borkar, Ms. Mayuri A. Khandge Using Sign language Recognition Using Machine Learning
Algorithm (International Research Journal of Engineering and Technology (IRJET)) (2020)