Project Report on
SIGN LANGUAGE RECOGNITION
Submitted in partial fulfillment of the requirements for
the award of the degree of
BACHELOR OF TECHNOLOGY
in
CSE & IT
by
JAY KACHHAP (19030445002)
MD MANZAR ANSARI (19030445004)
SYED SAQLAIN AHMAD (19030485005)
Under the supervision of
Prof. Panjeet Kumar Lenka
(Department of IT)
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
BIT SINDRI, DHANBAD, JHARKHAND
INDIA, PIN-828123
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
B.I.T SINDRI, DHANBAD
CERTIFICATE
This is to certify that the dissertation bound herewith is an authentic record of the project
work on “Sign Language Recognition” carried out by the following students under
our guidance and supervision, in partial fulfilment of the requirements for the award
of the B. Tech degree in CSE & IT (session: 2019 - 2023) of B. I. T. Sindri, Dhanbad,
under the affiliation of Jharkhand University of Technology, Jharkhand.
STUDENT NAME            REGISTRATION NO.
JAY KACHHAP             19030445002
MD MANZAR ANSARI        19030445004
SYED SAQLAIN AHMAD      19030485005
(Signature of HOD)
(Signature of Supervisor)
Dept. of CSE & IT
ACKNOWLEDGEMENT
It gives us immense pleasure to express our heartfelt and sincere gratitude and
indebtedness to our supervisor and mentor, Mr. PANJEET LENKA, Assistant
Professor, Department of Information Technology, B.I.T. Sindri, for his cooperation
and tireless efforts in executing this project. His firm determination always boosted
our confidence, and his constant hard work helped us bring this project to a
successful outcome.
We are extremely grateful to, and would like to express our deep sense of gratitude to,
Dr. S. C. Dutta, Head of the Department, Information Technology, B.I.T. Sindri, for
his suggestions, cooperation and constant encouragement, which helped us grow
personally and professionally.
Our heartfelt gratitude and appreciation also go to the administration, the staff and
the lab assistants of the Department of CSE & IT, who were always there to help and
support us.
JAY KACHHAP
…………………………
MD MANZAR ANSARI
…………………………
SYED SAQLAIN AHMAD
…………………………
(Signatures)
ABSTRACT
The project aims at building a machine learning model that can classify the
various hand gestures used in sign language. In this user-independent model,
classification algorithms are trained on a set of image data collected by us, and
testing is done on a completely different set of data. Depth images are used for
the image dataset, which gave better results than some of the previous literature,
owing to the reduced pre-processing time. Various machine learning algorithms
are applied to the dataset, including a Recurrent Neural Network (RNN). An
attempt is made to increase the accuracy of the RNN model by pre-training it on
the ImageNet dataset. However, a small dataset was used for pre-training, which
gave an accuracy of 85% during training.
TABLE OF CONTENTS
CERTIFICATE ........................................... I
ACKNOWLEDGEMENT ....................................... II
ABSTRACT .............................................. III
1. Introduction ....................................... 1
2. Literature Review .................................. 2
3. Problem Statement .................................. 3
4. Methodology Used ................................... 4
   4.1 Working Flow Chart ............................. 5
5. Technologies ....................................... 6
6. Results ............................................ 10
7. Conclusion ......................................... 11
8. References ......................................... 12
CHAPTER 01
1. INTRODUCTION
Communication is very crucial to human beings, as it enables us to
express ourselves. We communicate through speech, gestures, body
language, reading, writing or through visual aids, speech being one of
the most commonly used among them. Unfortunately, for the
speech- and hearing-impaired minority, there is a communication gap.
Visual aids or an interpreter are used for communicating with them.
However, these methods are rather cumbersome and expensive, and
can’t be used in an emergency. Sign Language chiefly uses manual
communication to convey meaning. This involves simultaneously
combining hand shapes, orientations and movement of the hands, arms
or body to express the speaker’s thoughts.
Sign Language consists of fingerspelling, which spells out words
character by character, and word level association which involves hand
gestures that convey the word meaning. Fingerspelling is a vital tool in
sign language, as it enables the communication of names, addresses and
other words that do not carry a meaning in word level association. In
spite of this, fingerspelling is not widely used as it is challenging to
understand and difficult to use. Moreover, there is no universal sign
language and very few people know it, which makes it an inadequate
alternative for communication.
RNN (Recurrent Neural Network)
RNNs are used in deep learning and in the development of models that simulate
neuron activity in the human brain. They are especially powerful in use cases
where context is critical to predicting an outcome, and are also distinct from other
types of artificial neural networks because they use feedback loops to process a
sequence of data that informs the final output. These feedback loops allow
information to persist. This effect often is described as memory.
RNN use cases tend to be connected to language models in which knowing the
next letter in a word or the next word in a sentence is predicated on the data that
comes before it.
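As a concrete illustration of the feedback loop described above, here is a minimal sketch of a single vanilla RNN cell in NumPy. It is not our project's model; the sizes and random weights are placeholders chosen for the example.

import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 8                 # arbitrary illustrative sizes
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))    # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))   # hidden -> hidden (the feedback loop)
b_h = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    # The new hidden state depends on the current input AND the previous state.
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

sequence = rng.normal(size=(10, input_size))   # a toy sequence of 10 feature vectors
h = np.zeros(hidden_size)                      # the "memory" starts empty
for x_t in sequence:
    h = rnn_step(x_t, h)                       # information from earlier steps persists in h
print(h.shape)                                 # (8,) -- a summary of the whole sequence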
LSTM
The key concept of the LSTM is cell state or the “memory state” of the network,
which captures information from previous steps. Information is added to the cell
state with several different gates: forget gate, input gate and output gate. Gates can
be thought of as control units that control which data is added to the cell state.
The first important gate of the LSTM is the forget gate. Forget gate processes the
previous hidden state and the current input by applying the sigmoid function,
which maps the final value to the interval between 0 (forget data) and 1 (pass it
through unchanged).
Next, the previous hidden state and the current input are passed through the sigmoid
of the input gate, and also through the tanh function, and the sigmoid and tanh
outputs are multiplied together.
These values are then used to update the cell state: the old cell state is first multiplied
pointwise by the output of the forget gate, and the result is added pointwise to the
vector from the input gate, giving the new, updated value of the cell state.
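The gate mechanics described above can be sketched in a few lines of NumPy. This is an illustrative single LSTM step, not the implementation used in our model; the weight shapes and sizes are assumptions chosen for the example.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    # W: (4*hidden, hidden+input), b: (4*hidden,) -- one affine map split into four gates.
    z = W @ np.concatenate([h_prev, x_t]) + b
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)   # forget, input and output gates in (0, 1)
    g = np.tanh(g)                                 # candidate values in (-1, 1)
    c = f * c_prev + i * g                         # forget part of the old cell state, add new info
    h = o * np.tanh(c)                             # hidden state exposed to the next step
    return h, c

hidden, inp = 8, 4
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4 * hidden, hidden + inp))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
h, c = lstm_step(rng.normal(size=inp), h, c, W, b)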
ReLU Function (Rectified linear function)
In a neural network, the activation function is responsible for transforming the
summed weighted input from the node into the activation of the node or output for
that input. The rectified linear activation function or ReLU for short is a
piecewise linear function that will output the input directly if it is positive,
otherwise, it will output zero. It has become the default activation function for
many types of neural networks because a model that uses it is easier to train and
often achieves better performance.
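As a quick illustration, ReLU can be written in a single line of NumPy:

import numpy as np

def relu(x):
    # Output the input directly if it is positive, otherwise output zero.
    return np.maximum(0.0, x)

print(relu(np.array([-2.0, 0.0, 3.5])))   # [0.  0.  3.5]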
CHAPTER 02
LITERATURE REVIEW
Table 2.1 Existing research on sign language recognition.
AUTHOR | TITLE | OUTCOMES | PUBLICATION DETAILS
Pratik Bhatia [1] | Automatic Generation of Sign Language | Applied research in computer-based sign language interpretation | researchgate.net
Santosh Shail [2] | Multimedia ISL Dictionary | To build an online multilingual dictionary and create a language-barrier-free learning environment for deaf students | researchgate.net
Ankita Wadhawan [3] | Indian Sign Language recognition system for Dynamic Signs | Recognition of regular and irregular signs (dynamic signs) | researchgate.net
Sang-Ki Ko [4] | Sign language recognition with recurrent neural network using human keypoint detection | To translate the meaning of signs from visual input such as images and videos | researchgate.net
CHAPTER 03
PROBLEM STATEMENT
• Boundary detection of the hand gestures was one of the major
problems we faced while preparing this model, so we shifted to
MediaPipe Holistic.
• OpenCV processes images in BGR format, whereas MediaPipe
Holistic works on RGB, so we converted each frame from BGR to
RGB using an OpenCV function (see the sketch after this list).
• Sometimes, when an image was captured, not all of the data points
required by the model were recognised and the keypoint array came
back empty; to eliminate this problem we replaced the missing array
with a zero array (also shown in the sketch below).
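The following sketch shows how the two fixes above might look in code. The helper names and the exact choice of landmarks (pose plus both hands) are illustrative assumptions rather than a verbatim copy of our implementation.

import cv2
import numpy as np

def mediapipe_detection(frame, model):
    # OpenCV delivers frames in BGR; MediaPipe Holistic expects RGB, so convert first.
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    return model.process(rgb)

def extract_keypoints(results):
    # If the pose or a hand is not detected, fall back to a zero array of the same
    # length, so every frame still yields a fixed-size feature vector.
    pose = (np.array([[lm.x, lm.y, lm.z, lm.visibility]
                      for lm in results.pose_landmarks.landmark]).flatten()
            if results.pose_landmarks else np.zeros(33 * 4))
    lh = (np.array([[lm.x, lm.y, lm.z]
                    for lm in results.left_hand_landmarks.landmark]).flatten()
          if results.left_hand_landmarks else np.zeros(21 * 3))
    rh = (np.array([[lm.x, lm.y, lm.z]
                    for lm in results.right_hand_landmarks.landmark]).flatten()
          if results.right_hand_landmarks else np.zeros(21 * 3))
    return np.concatenate([pose, lh, rh])   # 33*4 + 21*3 + 21*3 = 258 values per frame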
CHAPTER 04
METHODOLOGY USED
Our project aims to capture sign language performed by signers in real time and
interpret it to produce textual and audio output, the latter being useful for those
who cannot read. For this, a camera-based approach is used, owing to the ease of
portability and freedom of movement that it offers over other techniques.
The video of the signer is first captured by a camera-enabled device and then
processed by our application. The video is divided into a number of frames,
converting it into a raw image sequence, as sketched below. This image sequence
is then processed to identify boundaries, which helps separate the body parts
captured by the camera into two major subparts: head and hands.
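A minimal sketch of the capture step, assuming the default webcam (device index 0) and an arbitrary sequence length of 30 frames:

import cv2

cap = cv2.VideoCapture(0)        # default camera; a video file path would also work
frames = []
while len(frames) < 30:          # collect one fixed-length sequence for a sign
    ok, frame = cap.read()       # each read() returns one raw BGR image (frame)
    if not ok:
        break
    frames.append(frame)
cap.release()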
4.1 Flow Chart of Model
[Fig. 4.1 Flowchart of our Model] The flowchart shows the pipeline stages: image acquisition, image pre-processing, feature extraction, rendering the keypoints, storing the keypoints, converting them to a model-based format, creating a model using LSTM, optimizing the model, passing the keypoints through the model to make predictions, and displaying the result.
CHAPTER 05
TECHNOLOGIES
OpenCV
OpenCV is an open-source software library for computer vision and machine
learning. The OpenCV full form is Open Source Computer Vision Library. It was
created to provide a shared infrastructure for computer vision applications and to
speed up the use of machine perception in consumer products. As BSD-licensed
software, OpenCV makes it simple for companies to use and change the code. There
are some predefined packages and libraries that make our life simple, and OpenCV
is one of them.
PYTHON
Python is a user-friendly language and easy to work with, but this advantage comes
at the cost of speed, as Python is slower than languages such as C or C++. So we
extend Python with C/C++, which allows us to write computationally intensive code
in C/C++ and create Python wrappers that can be used as Python modules. This way
the code is fast, as it is actual C/C++ code working in the background, and it is still
easier to code in Python than in C/C++.
OpenCV-Python is a Python wrapper for the original OpenCV C++ implementation.
JUPYTER NOTEBOOK (IDE)
JupyterLab is the latest web-based interactive development environment for
notebooks, code, and data. Its flexible interface allows users to configure
and arrange workflows in data science, scientific computing, computational
journalism, and machine learning. A modular design invites extensions to
expand and enrich functionality.
NumPy
NumPy stands for Numerical Python; it is a Python package for the computation
and processing of single- and multi-dimensional array elements. Travis Oliphant
created the NumPy package in 2005 by combining the features of its ancestor
modules, Numeric and Numarray.
It is an extension module of Python that is mostly written in C. It provides various
functions capable of performing numeric computations at high speed.
NumPy provides powerful data structures implementing multi-dimensional arrays
and matrices, which are used for efficient computations on arrays and matrices.
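A small example of the multi-dimensional arrays described above (the landmark values are made up):

import numpy as np

keypoints = np.array([[0.12, 0.55, -0.03],
                      [0.18, 0.61, -0.02]])   # e.g. two (x, y, z) landmarks
print(keypoints.shape)                        # (2, 3) -- a 2-D array (matrix)
print(keypoints.flatten())                    # 1-D feature vector, handy as model input
print(keypoints.mean(axis=0))                 # fast vectorised computation along an axis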
Matplotlib
Matplotlib is a Python library used to create 2D graphs and plots from Python
scripts. It has a module named pyplot which makes plotting easy by providing
features to control line styles, font properties, axis formatting, etc. It supports a
very wide variety of graphs and plots, namely histograms, bar charts, power
spectra, error charts, etc. It is used along with NumPy to provide an environment
that is an effective open-source alternative to MATLAB. It can also be used with
graphics toolkits like PyQt and wxPython.
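For instance, a training-accuracy curve could be plotted as follows; the numbers are purely illustrative and are not results from our model:

import matplotlib.pyplot as plt

epochs = list(range(1, 11))
accuracy = [0.32, 0.45, 0.53, 0.60, 0.66, 0.70, 0.74, 0.77, 0.79, 0.80]  # made-up values

plt.plot(epochs, accuracy, marker="o", linestyle="--", label="training accuracy")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.title("Example training curve")
plt.legend()
plt.show()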
MediaPipe
MediaPipe is a framework for building machine learning pipelines that process
time-series data like video, audio, etc. This cross-platform framework works on
desktop/server, Android, iOS, and embedded devices like the Raspberry Pi and
Jetson Nano.
MediaPipe's Python solutions are the easiest for beginners because of the simplicity
of the setup process and the popularity of the Python programming language. The
modularity of the MediaPipe framework enables customization, but before plunging
into customization we recommend getting comfortable with the various pre-built
solutions: understand the internal APIs associated with them and then tweak the
outputs to create your own applications.
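A minimal example of running MediaPipe Holistic on a single webcam frame (illustrative only; the confidence thresholds are arbitrary):

import cv2
import mediapipe as mp

mp_holistic = mp.solutions.holistic
cap = cv2.VideoCapture(0)
with mp_holistic.Holistic(min_detection_confidence=0.5,
                          min_tracking_confidence=0.5) as holistic:
    ok, frame = cap.read()
    if ok:
        # Holistic expects RGB input, while OpenCV captures BGR frames.
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        print("pose detected:", results.pose_landmarks is not None)
        print("left hand detected:", results.left_hand_landmarks is not None)
        print("right hand detected:", results.right_hand_landmarks is not None)
cap.release()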
CHAPTER 06
6.1 RESULTS
The model we created using LSTM is trained on our own generated dataset. To
optimize the model we used the Adam optimizer with the categorical cross-entropy
loss function, which is used in multi-class classification problems.
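The sketch below shows an LSTM classifier of this kind built with Keras, compiled with the Adam optimizer and categorical cross-entropy. The layer sizes and the input shape (30 frames of 258 keypoint features each) are assumptions chosen for illustration, not necessarily the exact architecture we trained.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential([
    LSTM(64, return_sequences=True, activation="relu", input_shape=(30, 258)),
    LSTM(128, return_sequences=False, activation="relu"),
    Dense(64, activation="relu"),
    Dense(3, activation="softmax"),            # one probability per sign (3 signs)
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",  # multi-class classification loss
              metrics=["categorical_accuracy"])
model.summary()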
The model currently recognises three signs correctly with an accuracy of about
80%, which can later be improved by using larger datasets and more training
epochs.
With this model, anyone can record their own signs, label them according to their
needs, and train the model on that dataset.
6.2 CONCLUSION
• The main purpose of the sign language detection system is to provide a feasible
way of communication, using hand gestures, between hearing people and people
with speech or hearing impairments. The proposed system can be accessed using a
webcam or any in-built camera that detects the signs and processes them for
recognition.
• From the results of the model, we can conclude that the proposed system gives
accurate results under controlled light and intensity.
• Furthermore, custom gestures can easily be added, and the more images taken at
different angles and frames, the more accurate the model becomes. Thus, the model
can easily be extended on a large scale by increasing the dataset.
REFERENCES
• Kang, Byeongkeun, Subarna Tripathi, and Truong Q. Nguyen. "Real-time sign
language fingerspelling recognition using convolutional neural networks from
depth map." Pattern Recognition (ACPR), 2015 3rd IAPR Asian Conference on.
IEEE, 2015.
• Farnaz D. Notash and Elahe Elhamki. "Comparing loneliness, depression and
stress in students with hearing-impaired and normal students studying in
secondary schools of Tabriz". In: International Journal of Humanities and
Cultural Studies, February 2016 Special Issue (2016). ISSN: 2356-5926.
• "The Cognitive, Psychological and Cultural Impact of Communication Barrier
on Deaf Adults". In: Journal of Communication Disorders, Deaf Studies &
Hearing Aids 4 (2, 2016). DOI: 10.4172/2375-4427.1000164.
• Akash. ASL Alphabet. URL: https://www.kaggle.com/grassknoted/asl-alphabet
(accessed: 24.10.2018).
• Vivek Bheda and Dianna Radpour. "Using Deep Convolutional Networks for
Gesture Recognition in American Sign Language". In: CoRR abs/1710.06836
(2017). arXiv: 1710.06836. URL: http://arxiv.org/abs/1710.06836.
• scikit-learn.org
• deeplearningbook.org: Convolutional Networks
• ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets