Project Report on
SIGN LANGUAGE RECOGNITION
Submitted in partial fulfillment of the requirements for the award of the degree of
BACHELOR OF TECHNOLOGY in CSE & IT
by
JAY KACHHAP (19030445002)
MD MANZAR ANSARI (19030445004)
SYED SAQLAIN AHMAD (19030485005)
Under the supervision of
Prof. Panjeet Kumar Lenka (Department of IT)
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
BIT SINDRI, DHANBAD, JHARKHAND, INDIA, PIN-828123

CERTIFICATE
This is to certify that the dissertation bound herewith is an authentic record of project work on "Sign Language Recognition" carried out by the following students under our guidance and supervision, in partial fulfilment of the requirements for the award of the B.Tech degree in CSE & IT (session: 2019-2023) of B.I.T. Sindri, Dhanbad, under the affiliation of Jharkhand University of Technology, Jharkhand.

STUDENT NAME - REGISTRATION NO.
JAY KACHHAP - 19030445002
MD MANZAR ANSARI - 19030445004
SYED SAQLAIN AHMAD - 19030485005

(Signature of HOD)    (Signature of Supervisor)
Dept. of CSE & IT

ACKNOWLEDGEMENT
It gives us immense pleasure to express our heartfelt and sincere gratitude to our supervisor and mentor, Mr. Panjeet Lenka, Assistant Professor, Department of Information Technology, B.I.T. Sindri, for his cooperation and tireless efforts in executing this project. His firm determination always boosted our confidence, and his constant hard work helped us bring this project to a successful outcome. We are extremely grateful, and would like to add our deep sense of gratitude, to Dr. S. C. Dutta, Head of the Department, Information Technology, B.I.T. Sindri, for his suggestions, cooperation and constant encouragement, which helped us grow personally and professionally. Our heartfelt gratitude and appreciation also go to the administration, the workers and the lab assistants of the Department of CSE & IT, who were always there to help and support us.
JAY KACHHAP …………………………
MD MANZAR ANSARI …………………………
SYED SAQLAIN AHMAD …………………………
(Signatures)

ABSTRACT
This project aims to build a machine learning model that can classify the various hand gestures used in sign language. In this user-independent model, classification algorithms are trained on a set of image data collected by us, and testing is done on a completely different set of data. Depth images are used for the image dataset, which gave better results than some of the previous literature owing to the reduced pre-processing time. Various machine learning algorithms are applied to the dataset, including a Recurrent Neural Network (RNN). An attempt is made to increase the accuracy of the RNN model by pre-training it on the ImageNet dataset. However, a small dataset was used for pre-training, which gave an accuracy of 85% during training.

TABLE OF CONTENTS
CERTIFICATE
ACKNOWLEDGEMENT
ABSTRACT
1. Introduction
2. Literature Review
3. Problem Statement
4. Methodology Used
   4.1 Working Flow Chart
5. Technologies
6. Results
7. Conclusion
8. References

CHAPTER 01
INTRODUCTION
Communication is crucial to human beings, as it enables us to express ourselves. We communicate through speech, gestures, body language, reading, writing or visual aids, speech being the most commonly used among them. Unfortunately, for the speaking- and hearing-impaired minority, there is a communication gap. Visual aids, or an interpreter, are used for communicating with them. However, these methods are rather cumbersome and expensive, and cannot be used in an emergency. Sign language chiefly uses manual communication to convey meaning.
This involves simultaneously combining hand shapes, orientations and movements of the hands, arms or body to express the speaker's thoughts. Sign language consists of fingerspelling, which spells out words character by character, and word-level association, which uses hand gestures that convey a word's meaning. Fingerspelling is a vital tool in sign language, as it enables the communication of names, addresses and other words that carry no meaning in word-level association. Even so, fingerspelling is not widely used, as it is challenging to understand and difficult to use. Moreover, there is no universal sign language and very few people know it, which makes it an inadequate alternative for communication.

RNN (Recurrent Neural Network)
RNNs are used in deep learning and in the development of models that simulate neuron activity in the human brain. They are especially powerful in use cases where context is critical to predicting an outcome, and they are distinct from other types of artificial neural networks because they use feedback loops to process a sequence of data that informs the final output. These feedback loops allow information to persist, an effect often described as memory. RNN use cases tend to be connected to language models, in which predicting the next letter in a word or the next word in a sentence depends on the data that comes before it.

LSTM
The key concept of the LSTM is the cell state, or the "memory state" of the network, which captures information from previous steps. Information is added to the cell state through several gates: the forget gate, the input gate and the output gate. Gates can be thought of as control units that decide which data is added to the cell state. The first important gate of the LSTM is the forget gate. It processes the previous hidden state and the current input by applying the sigmoid function, which maps each value to the interval between 0 (forget the data) and 1 (pass it through unchanged).
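The forget-gate computation above, together with the input and output gates and the cell-state update described next, can be sketched in NumPy. This is a minimal illustration, not our trained model; the dimensions and weight names are hypothetical, chosen only for clarity:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM cell step. Each W[k] maps [h_prev; x] to a gate pre-activation."""
    z = np.concatenate([h_prev, x])
    f = sigmoid(W["f"] @ z + b["f"])   # forget gate: 0 = forget, 1 = keep
    i = sigmoid(W["i"] @ z + b["i"])   # input gate
    g = np.tanh(W["g"] @ z + b["g"])   # candidate values to add
    o = sigmoid(W["o"] @ z + b["o"])   # output gate
    c = f * c_prev + i * g             # pointwise multiply, then pointwise add
    h = o * np.tanh(c)                 # new hidden state
    return h, c

# Hypothetical sizes: input dimension 3, hidden dimension 4
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((4, 7)) for k in "figo"}
b = {k: np.zeros(4) for k in "figo"}
h, c = lstm_step(rng.standard_normal(3), np.zeros(4), np.zeros(4), W, b)
print(h.shape, c.shape)  # (4,) (4,)
```

Because the output gate and tanh both squash their inputs, every entry of the new hidden state lies strictly between -1 and 1.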
Next, we pass the previous hidden state and the current input to the sigmoid of the input gate, and also pass them through the tanh function, then multiply the sigmoid and tanh outputs. These values are used to update the cell state: the cell state is first multiplied pointwise by the vector from the forget gate, and the result is then added pointwise to the vector from the input gate, yielding the new, updated cell state.

ReLU Function (Rectified Linear Unit)
In a neural network, the activation function transforms the summed weighted input of a node into that node's output. The rectified linear activation function, or ReLU for short, is a piecewise linear function that outputs the input directly if it is positive and zero otherwise. It has become the default activation function for many types of neural networks because a model that uses it is easier to train and often achieves better performance.

CHAPTER 02
LITERATURE REVIEW
Table 2.1: Existing research on sign language recognition.

TITLE | AUTHOR | PUBLICATION DETAILS | OUTCOMES
Automatic Generation of Sign Language | Pratik Bhatia [1], Santosh Shail [2] | researchgate.net | Applied research in computer-based multimedia; an ISL dictionary to create a language-barrier-free learning environment for deaf students.
Indian Sign Language recognition system for dynamic signs with recurrent neural network using human key point detection | Ankita Wadhawan [3] | researchgate.net | Recognition of regular and irregular signs (dynamic signs).
Sign language recognition | Sang-Ki Ko [4] | researchgate.net | To build an online multilingual sign language interpreter; to translate the meaning of signs from visual input such as images and videos.

CHAPTER 03
PROBLEM STATEMENT
• Boundary detection of the hand gestures was one of the major problems we faced while preparing this model, so we shifted to MediaPipe Holistic.
• OpenCV processes images in BGR format, but MediaPipe Holistic works on RGB, so we converted from BGR to RGB using an OpenCV function.
• Sometimes, when we captured an image, not all of the data points required by the model were recognized and the keypoint array became null; to eliminate this problem we replaced the missing array with a zero array.

CHAPTER 04
METHODOLOGY USED
Our project aims to capture sign language performed by signers in real time and interpret it to produce textual output, along with audio output for those who cannot read. A camera-based approach is used, owing to the ease of portability and movement it offers over other techniques. The video of the signer is first captured by a camera-enabled device and then processed by our application. The video is divided into a number of frames, converting it into a raw image sequence. This image sequence is then processed to identify boundaries, which is useful for separating the body parts captured by the camera into two major subparts: head and hands.

4.1 Flow Chart of Model
The model's pipeline proceeds through the following stages:
1. Image acquisition
2. Image preprocessing
3. Feature extraction
4. Rendering keypoints
5. Storing the keypoints
6. Converting to model-based format
7. Creating a model using LSTM
8. Optimizing the model
9. Passing keypoints through the model to make predictions
10. Displaying the result

Fig. 4.1: Flowchart of our model.

CHAPTER 05
TECHNOLOGIES
OpenCV
OpenCV is an open-source software library for computer vision and machine learning. Its full form is Open Source Computer Vision Library. It was created to provide a shared infrastructure for computer vision applications and to speed up the use of machine perception in consumer products. As BSD-licensed software, OpenCV makes it simple for companies to use and modify the code.
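The BGR-to-RGB conversion mentioned in the problem statement is, in OpenCV, a single call to cv2.cvtColor with the cv2.COLOR_BGR2RGB code. Since the conversion simply reverses the channel order, it can be sketched with NumPy alone (the 2x2 image below is a hypothetical example):

```python
import numpy as np

# A hypothetical 2x2 image as OpenCV would load it: last axis is B, G, R
bgr = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [10, 20, 30]]], dtype=np.uint8)

# Reversing the channel axis converts BGR to RGB; this is equivalent to
# cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
rgb = bgr[..., ::-1]

print(rgb[0, 0].tolist())  # [0, 0, 255] -- the blue pixel, now in RGB order
```

Reversing the axis again recovers the original BGR image, so the operation is its own inverse.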
There are predefined packages and libraries that make our lives simpler, and OpenCV is one of them.

PYTHON
Python is a user-friendly language that is easy to work with, but this advantage comes at the cost of speed, as Python is slower than languages such as C or C++. So we extend Python with C/C++, which allows us to write computationally intensive code in C/C++ and create Python wrappers that can be used as Python modules. This way the code is fast, since the actual C/C++ code runs in the background, while it remains easier to code in Python than in C/C++. OpenCV-Python is a Python wrapper around the original OpenCV C++ implementation.

JUPYTER NOTEBOOK (IDE)
JupyterLab is the latest web-based interactive development environment for notebooks, code, and data. Its flexible interface allows users to configure and arrange workflows in data science, scientific computing, computational journalism, and machine learning. A modular design invites extensions that expand and enrich its functionality.

NumPy
NumPy stands for Numerical Python, a Python package for the computation and processing of single- and multi-dimensional array elements. Travis Oliphant created the NumPy package in 2005 by injecting the features of the ancestor module Numeric into the module Numarray. It is an extension module of Python, mostly written in C, and it provides functions capable of performing numeric computations at high speed. NumPy offers powerful data structures implementing multi-dimensional arrays and matrices, which are used for optimal computations on arrays and matrices.

Matplotlib
Matplotlib is a Python library used to create 2D graphs and plots from Python scripts. Its pyplot module makes plotting easy by providing features to control line styles, font properties, axis formatting, etc.
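A minimal pyplot sketch using the line-style and axis-formatting controls just described; the data and output filename are illustrative only:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render straight to a file
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)
plt.plot(x, np.sin(x), linestyle="--", label="sin(x)")  # dashed line style
plt.plot(x, np.cos(x), linestyle="-", label="cos(x)")   # solid line style
plt.xlabel("x")          # axis labels via pyplot's formatting controls
plt.ylabel("value")
plt.legend()
plt.savefig("waves.png")  # hypothetical output filename
```

With the Agg backend the figure is written to disk rather than shown in a window, which is convenient for scripts run outside a notebook.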
It supports a wide variety of graphs and plots, namely histograms, bar charts, power spectra, error charts, etc. It is used along with NumPy to provide an environment that is an effective open-source alternative to MATLAB. It can also be used with graphics toolkits like PyQt and wxPython.

MediaPipe
MediaPipe is a framework for building machine learning pipelines that process time-series data such as video and audio. This cross-platform framework works on desktop/server, Android, iOS, and embedded devices like the Raspberry Pi and Jetson Nano. MediaPipe's Python solutions are the easiest for beginners because of the simplicity of the setup process and the popularity of the Python programming language. The modularity of the MediaPipe framework enables customization, but before plunging into customization we recommend getting comfortable with the various prebuilt solutions, understanding the internal APIs associated with them, and then tweaking the outputs to create your own applications.

CHAPTER 06
RESULTS
The model we created using LSTM is trained on our own generated dataset. To optimize the model we used the Adam optimizer with the categorical cross-entropy loss function, which is used in multiclass classification problems. We are now successfully recognizing three signs with an accuracy of about 80%, which can later be improved by using different datasets and more training epochs. With this model, anyone can create their own signs, label them according to their needs, and train the model on that dataset.

CHAPTER 07
CONCLUSION
• The main purpose of the sign language detection system is to provide a feasible means of communication, through hand gestures, between hearing people and people with speech or hearing impairments. The proposed system can be accessed using a webcam or any inbuilt camera that detects the signs and processes them for recognition.
• From the results of the model, we can conclude that the proposed system gives accurate results under controlled light and intensity.
• Furthermore, custom gestures can easily be added, and more images taken at different angles and frames will improve the model's accuracy. Thus, the model can easily be extended on a large scale by increasing the dataset.

REFERENCES
• Kang, Byeongkeun, Subarna Tripathi, and Truong Q. Nguyen. "Real-time sign language fingerspelling recognition using convolutional neural networks from depth map". In: Pattern Recognition (ACPR), 2015 3rd IAPR Asian Conference on. IEEE, 2015.
• Farnaz D. Notash and Elahe Elhamki. "Comparing loneliness, depression and stress in students with hearing-impaired and normal students studying in secondary schools of Tabriz". In: International Journal of Humanities and Cultural Studies, February 2016 Special Issue (2016). ISSN: 2356-5926.
• "The Cognitive, Psychological and Cultural Impact of Communication Barrier on Deaf Adults". In: Journal of Communication Disorders, Deaf Studies & Hearing Aids 4.2 (2016). DOI: 10.4172/2375-4427.1000164.
• Akash. ASL Alphabet. URL: https://www.kaggle.com/grassknoted/aslalphabet (accessed: 24.10.2018).
• Vivek Bheda and Dianna Radpour. "Using Deep Convolutional Networks for Gesture Recognition in American Sign Language". In: CoRR abs/1710.06836 (2017). arXiv: 1710.06836. URL: http://arxiv.org/abs/1710.06836.
• scikit-learn.org
• deeplearningbooks.org: Convolutional Networks
• ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets