Sign Language Recognition System
Mr. Usman Latif, Mr. Mubashar Ali Khan, Mr. Ammar Rafiq, Miss Amna Yasin
Abstract—Hearing-impaired people use signs to communicate with one another. However, people who do not know sign language find it hard to understand. Sign language is an expressive and natural means of communication between hearing and hearing-impaired people, in which information is conveyed mainly through hand movements. An interpreter is therefore needed to understand what signers say and to communicate with them. The aim of this work is to build a system that enables communication between hearing-impaired and hearing people. The Sign Language Recognition (SLR) system translates sign language into text. The proposed system classifies the different hand gestures of sign language. It recognizes alphabets, digits and some keywords, and it can also concatenate alphabets to form words. A set of images is used to train a Convolutional Neural Network (CNN), and testing is performed at runtime. Image processing and feature extraction techniques are applied to simplify the images. Using the CNN model, the system achieved an accuracy of up to 80% during training and validation on the dataset. The SLR system makes it easier for hearing people to understand the sign language of hearing-impaired people.
Index Terms—Convolutional Neural Network (CNN), Concatenation of Alphabets, Sign Language Recognition.
I. INTRODUCTION
People express their feelings, thoughts, opinions and experiences to the people around them through speech. They communicate by listening and talking, but many people are unable to hear or speak. It would not be right to overlook those who are deprived of this valuable gift.
Hearing-impaired (HI) people find it difficult to convey their message to hearing people. Their primary means of communication is sign language, but most hearing people cannot understand it. As a result, HI people who rely on sign language are confined to their own world. Deafness is a disability that impairs hearing and leaves a person unable to hear. Muteness is a disability that impairs speech and leaves a person unable to talk. Both groups are impaired only in hearing or speaking and can still do many other things. What separates them from other people is communication. If there were a way for hearing and deaf people to communicate, deaf people could easily live like anyone else, and the main way for them to communicate is sign language. Sign language is difficult for hearing people to understand, yet it still receives little attention from them. Hearing people tend to overlook the importance of sign language unless they have friends or family members who are deaf. One way to communicate with deaf people is to use the services of a sign language interpreter, but interpreters can be expensive. A low-cost solution is needed so that deaf and hearing people can communicate normally. Researchers therefore need to find a way for deaf and mute people to communicate effectively with hearing people, and one answer is a Sign Language Recognition system.
The proposed work aims to recognize sign language and translate it into the local language as text or speech. However, building such a system is costly and difficult to apply in everyday use. Early research showed that data gloves are effective for Sign Language Recognition. However, the high cost of the gloves and the need to wear them make them difficult to popularize. Researchers therefore turned to purely vision-based Sign Language Recognition systems. These bring their own difficulties, especially in tracking hand movement. The challenges in developing sign language recognition range from image acquisition to classification. Researchers are still looking for the best image acquisition method. Capturing images with a camera makes image preprocessing difficult, while active sensor devices can be expensive. Classification methods also have drawbacks. The wide choice of recognition methods makes it hard for researchers to focus on a single best one. Choosing one method tends to leave untested other methods that may be better suited to Sign Language Recognition. Evaluating many methods, on the other hand, means that hardly any of them is developed to its full potential. This work attempts to remove the barrier between hearing and deaf people by developing a system that translates sign language into text. We propose a convolutional neural network model that recognizes the signs accurately.
II. RELATED WORK
Al-Ahdal and Tahir introduced a novel approach to designing an SLR system based on EMG sensors combined with a data glove.
The method uses electromyography (EMG) signals recorded from the hand muscles to determine the boundaries between words in a stream of signs in continuous SLR. SLR systems are in demand because they can overcome the communication barrier between deaf and hearing people. Even so, a robust SLR system is still not available because of many obstacles. SLR has also emerged as a priority research area in human-computer interaction (HCI). Their paper therefore presents an overview of research on SLR systems and groups existing systems by sign-capturing method and by recognition method. It also examines the critical issues of current systems to highlight the strengths and weaknesses that determine whether a system can be fully functional. It then proposes the design that combines EMG sensors with a data glove. In this design, EMG signals recorded from the hand muscles are used to assign word boundaries in continuous SLR. The proposed system is expected to solve this problem and to contribute to continuous sign language recognition. Iwan Njoto Sandjaja and Nelson Marcos proposed a color-coded glove method that extracts important features from video using a multicolor tracking algorithm. Their sign language number recognition system lays the foundation for hand sign recognition that addresses real and current problems in signing for the deaf community and leads to practical applications. The input to the system is 5,000 video files of Filipino Sign Language numbers, with a frame size of 640 x 480 pixels at 15 frames per second. The color-coded glove uses fewer colors than the color-coded gloves in existing research.
The multicolor tracking algorithm is faster than existing color tracking algorithms because it does not use a recursive procedure. The system then learns and recognizes Filipino Sign Language numbers, using a hidden Markov model (HMM) in both the training and testing stages. The feature extraction could track 92.3% of all objects, and the recognizer identified Filipino Sign Language numbers with an average accuracy of 85.52%. Dibya Jyoti Bora, Anil Kumar Gupta and Fayaz Ahmad Khan argue that choosing the right color space is the most important issue in color image segmentation. L*a*b* and HSV are the most commonly chosen color spaces. Their paper compares the two for color image segmentation, measuring performance with mean squared error (MSE) and peak signal-to-noise ratio (PSNR). They conclude that the HSV color space works better than L*a*b*. In recent computer vision research, a great deal of work has been done on hand gesture recognition. Many new providers offer real-time devices and portable technology, but much of this commercially available technology is expensive. One such work aimed to build real-time gesture recognition with an ordinary webcam. Its experiments show that the proposed system can detect and recognize gestures with 80% accuracy. The authors suggest that the method could be developed further with model-based tracking techniques.
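Several of the segmentation studies reviewed here, including Bora et al., score their results with MSE and PSNR. The minimal pure-Python sketch below implements the two standard definitions. It assumes 8-bit grayscale images stored as equal-sized lists of rows, so the peak value is 255. The function names are illustrative and are not taken from any of the cited papers.

```python
import math


def mse(original, processed):
    """Mean squared error between two equal-sized grayscale images (lists of rows)."""
    if len(original) != len(processed) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(original, processed)
    ):
        raise ValueError("images must have the same size")
    squared_errors = [
        (a - b) ** 2
        for row_a, row_b in zip(original, processed)
        for a, b in zip(row_a, row_b)
    ]
    return sum(squared_errors) / len(squared_errors)


def psnr(original, processed, max_value=255):
    """Peak signal-to-noise ratio in dB; higher means closer to the original."""
    error = mse(original, processed)
    if error == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / error)


reference = [[0, 0], [255, 255]]
segmented = [[0, 10], [255, 245]]
print(mse(reference, segmented))  # 50.0
print(psnr(reference, segmented))
```

A lower MSE and a higher PSNR both indicate that the processed image stays closer to the reference. This is why the studies report the two measures together.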
In another study, the authors examined the color properties of the HSV space, with emphasis on how variations in the hue, saturation and intensity of image pixels are perceived. Pixel features are extracted by choosing either hue or intensity as the dominant property, depending on the saturation value of the pixel. Segmentation with this method identifies objects in the image better than segmentation performed in the RGB color space. Another color segmentation approach first converts the RGB image to HSV. It then applies multilevel Otsu thresholding to the V channel to obtain clean boundaries in the image. The resulting image is clustered with K-means to merge distant regions produced by the multilevel Otsu thresholding. Finally, background removal and morphological correction are applied. The results of this technique are reported to be good in terms of the MSE and PSNR values given in the study. In another work, the authors first converted the original image from RGB to HSV. They then applied mean shift together with FELICM to the hue, saturation and value components, and obtained the final image by merging the results. This technique performs better than previous algorithms. A further study proposed a new quantization technique in the HSV color space. It generates a color histogram and a gray histogram for K-means clustering, operating across different dimensions of the HSV space. In this way, the initial centroids and the number of clusters are estimated automatically. A filtering step is introduced to efficiently discard low-weight regions. The technique achieves high computational speed, with results close to human perception, and it can also extract the salient regions of images effectively.
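Several of these methods convert the image to HSV and threshold the V (value) channel with Otsu's method. The sketch below shows only the single-threshold version of Otsu's method. It does not reproduce the multilevel Otsu and K-means pipeline described above. The RGB-to-HSV conversion uses Python's standard `colorsys` module. The threshold chosen is the one that maximizes the between-class variance. The 256-level quantization, the function names and the toy pixels are assumptions made for illustration.

```python
import colorsys


def value_channel(rgb_pixels):
    """Return the HSV value (V) of each 8-bit (r, g, b) pixel, rescaled to 0..255."""
    levels = []
    for r, g, b in rgb_pixels:
        _, _, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        levels.append(round(v * 255))
    return levels


def otsu_threshold(levels):
    """Return the threshold t (0..255) that maximizes the between-class variance.

    Pixels with level <= t form one class and pixels with level > t the other.
    """
    hist = [0] * 256
    for level in levels:
        hist[level] += 1
    total = len(levels)
    total_sum = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight0 = 0  # number of pixels with level <= t
    sum0 = 0     # sum of the levels of those pixels
    for t in range(256):
        weight0 += hist[t]
        sum0 += t * hist[t]
        weight1 = total - weight0
        if weight0 == 0 or weight1 == 0:
            continue  # one class is empty, so this split is meaningless
        mean0 = sum0 / weight0
        mean1 = (total_sum - sum0) / weight1
        # Between-class variance, up to the constant factor 1 / total**2.
        between_var = weight0 * weight1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


pixels = [(10, 10, 10)] * 3 + [(200, 200, 200)] * 3  # toy dark and bright pixels
v = value_channel(pixels)
t = otsu_threshold(v)
mask = [level > t for level in v]  # True marks the brighter class
```

Otsu's method picks the threshold automatically from the histogram. This makes it a natural first step before the clustering stages that the cited works add on top.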
In another study, the authors converted the image from the RGB color space to the L*a*b* color space. After the three L*a*b* channels are separated, one channel is selected according to the color under consideration, and a genetic algorithm is applied to that channel image. This method is found to be effective in segmenting images with complex backgrounds. In another approach, the image is converted from RGB to L*a*b* space and the K-means algorithm is applied to the result. The pixels are then labeled, and finally images are produced that segment the original image by color. A novel color image segmentation method based on clustering has also been proposed, with L*a*b* chosen as the color space. It combines the K-means algorithm, the Sobel filter and the watershed algorithm for segmentation. The results are found to be acceptable according to MSE and PSNR. In [12], the authors proposed an efficient clustering algorithm in which clustering is performed in the L*a*b* color space. Image segmentation is obtained directly by assigning each pixel to its corresponding cluster. The algorithm was applied to segment test images, and the results clearly show the resulting classification of the pixels. Another study proposed an ant-colony-based clustering method in the CIE Lab color space. It uses the CMC color difference to compute the distance between pixels, since this distance measure gives the best results in the CIE Lab color space. The performance of this method is evaluated with the MSE parameter and is found to be satisfactory. Asanterabi Malima, Erol Ozgur and Mujdat Cetin proposed an approach consisting of three steps: segmenting the hand region, locating the fingers and finally classifying the gesture. The algorithm is invariant to translation, rotation and scale of the hand.
The algorithm can be extended in several ways to recognize a broader set of gestures. Its segmentation stage is too simple and would need to be improved for use in challenging operating conditions. Reliable hand gesture recognition in a general setting requires dealing with occlusions, temporal tracking for recognizing dynamic gestures, and 3D modeling of the hand. These capabilities are still mostly beyond the current state of the art. Mark Batcher discussed the implementation of a robot named GripSee, which is designed to grasp an object, manipulate it and move it from one place to another. It is a multi-skilled robot that can perform various tasks and is used as a service robot. Kevin Gabayan and Steven Lansel proposed a dynamic time warping (DTW) gesture recognition approach that uses single signal channels. A sensor-interaction prototyping software and hardware environment already uses this DTW approach with single signal channels. The authors use a combined five-channel accelerometer and gyroscope board to sample translational and rotational accelerations. A microcontroller performs analog-to-digital conversion and transmits the incoming signals. Template matching via linear time warping and dynamic time warping is performed offline, and learning with an HMM is performed in real time. M. Ebrahim Al-Ahdal and Nooritawati Md Tahir presented an overview of basic research on sign language recognition systems, in which existing systems are categorized by sign-capturing method and by recognition method.
The paper examines the critical issues of existing systems to highlight the strengths and limitations that determine whether a system works well. It then proposes a novel SLR design that combines EMG sensors with a data glove. The design relies on electromyography signals captured from the hands to assign word boundaries in streams of words in continuous recognition. Iwan Njoto Sandjaja and Nelson Marcos proposed a sign language number recognition system that lays the foundation for hand gesture recognition. It addresses real and current issues in signing for the deaf community and leads to practical applications. The input of the system is 5,000 Filipino Sign Language number video files with a frame size of 640 x 480 pixels at 15 frames per second. The color-coded glove uses fewer colors than the other color-coded gloves in existing research. The system extracts important features from the video using a multicolor tracking algorithm. This algorithm is faster than existing color tracking algorithms because it does not use a recursive method. The system then learns and recognizes Filipino Sign Language numbers, using an HMM in the training and testing stages. The feature extraction could track 92.3% of all objects, and the recognizer identified Filipino Sign Language numbers with 85.52% average accuracy. Noor Adnan Ibraheem and Rafiqul Zaman Khan presented a survey of earlier gesture recognition approaches, with particular emphasis on hand gestures. It reviews static hand posture methods and the tools and algorithms applied in gesture recognition systems, including connectionist models, HMMs and fuzzy clustering.
Challenges and future directions are also highlighted. Archana S. Ghotkar, Rucha Khatal, Sanjana Khupase, Surbhi Asati and Mithila Hadap discussed the historical background, need, scope and concerns of Indian Sign Language (ISL). They discuss vision-based hand gesture recognition, since the hand is a crucial mode of communication. Drawing on earlier reported work, they list the techniques available for hand tracking, segmentation, feature extraction and classification. Vision-based systems face more challenges than traditional hardware-based approaches. With efficient use of computer vision and pattern recognition, however, it is possible to build such systems so that they are natural and widely accepted. Paulraj M P, Sazali Yaacob, Mohd Shuhanaz bin Zanar Azalan and Rajkumar Palaniappan proposed a simple sign language recognition system that uses skin color segmentation and an artificial neural network (ANN). Moment invariant features extracted from the right-hand and left-hand gesture images are used to build a network model. The system was implemented, tested for validity and showed a recognition rate of 92.85%. Divya Deora and Nikesh Bajaj note that each sign language recognition system is trained to recognize a specific set of signs and to output them in the required format. These systems are built with powerful image processing techniques. They recognize a particular set of signing gestures and output the corresponding text or audio. Most of these systems involve detection, segmentation, tracking, gesture recognition and classification. Yikai Fang, Kongqiao Wang, Jian Cheng and Hanqing Lu proposed a robust real-time hand gesture detection technique.
In their method, a specific gesture first triggers hand detection, which is followed by tracking. The hand is then segmented using motion and color cues. Finally, scale-space feature detection is integrated into gesture recognition to overcome the aspect-ratio limitation found in most learning-based hand gesture methods. When the method was applied to image browsing, the experimental results showed good performance. J. H. Kim, N. D. Thang and T. S. Kim developed a 3D hand motion tracking and gesture recognition system using a data glove, the KHU-1. The glove consists of three tri-axis accelerometer sensors, one controller and one Bluetooth module. It sends hand motion signals to a computer wirelessly via Bluetooth. The authors also implemented a 3D digital hand model for hand motion tracking and recognition. The model is based on kinematic chain theory and uses ellipsoids and joints. Finally, they used a rule-based algorithm with the 3D digital hand model and the KHU-1 data glove to recognize simple hand gestures, namely scissors, rock and paper. Several preliminary experimental results are presented in the paper. J. Weissmann and R. Salomon explored hand gestures as a means of human-computer interaction for augmented reality applications. For this application, specific hand gestures such as "fist", "pointer" and "victory sign" were defined. Various existing approaches use camera-based recognition systems, which are rather expensive and sensitive to environmental changes.
Anna Deza and Danial Hasan (MIE324 final report, "Sign Language Recognition", December 2, 2018) aimed to build a neural network that classifies which letter of the American Sign Language (ASL) alphabet is being signed, given an image of a signing hand. Their project is a first step toward a sign language translator that could translate signed communication into written and spoken language. Such a translator would greatly lower the barrier that many deaf and mute people face in communicating with others in everyday interactions. The goal is also motivated by the isolation felt within the deaf community. Loneliness and depression occur at higher rates among deaf people, especially when they are immersed in a hearing world. Large barriers that profoundly affect quality of life stem from the communication gap between deaf and hearing people. Examples include lack of access to information, limited social connections and difficulty integrating into society. Most research implementations of this task have used depth maps produced by depth cameras and high-resolution images. That project tested whether neural networks can classify signed ASL letters from simple images of hands taken with a personal device such as a laptop webcam. This matches the motivation of making a future real-time translator from ASL to spoken or written language practical in everyday situations.
III. PROPOSED METHODOLOGY
The proposed methodology is a system that captures, through a webcam, a user making a particular sign and converts the sign into text without any data glove or sensor-based glove.
In this system, a Convolutional Neural Network (CNN) model is implemented using TensorFlow and Keras, and the model is trained with Keras.
A. CNN Model of Proposed Work
Fig. 1. CNN model of proposed work
B. Convolution
Convolution is a way to capture information about the spatial arrangement of pixels. The convolutions used here are 2D discrete convolutions, which act like a weighted sliding sum over an area of pixels. A small matrix called a kernel slides across the image. At each position it multiplies each kernel value by the pixel beneath it and sums the products, and that sum becomes the corresponding value of the output image. The kernel then moves over by one pixel and repeats the process for every position in the image. In this way, information about each pixel's neighbors is incorporated into its own value. Compared with the original image, the output looks as if a simple image filter had been applied. A convolution layer performs several convolutions on its input, each with a different kernel (filter), and produces several feature maps. These feature maps are stacked together as the final output of the layer. As in any other neural network, an activation function is then applied to make the output non-linear. In a CNN, the output of the convolution is passed through this activation function, for example the ReLU activation function.
C. ReLU
The model is built from three convolutional layers with kernel sizes ranging from 2 x 2 to 5 x 5, the ReLU activation function, and standard max pooling and dropout. The result is fed to a fully connected layer that outputs 43 classes: 26 alphabets, 10 digits and 7 specific words. The design assumes that the first layer, with a small kernel, captures small features such as the hand outline, finger edges and shadows.
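The weighted sliding sum described under B. Convolution, followed by the ReLU step applied after each convolution, can be sketched in pure Python as below. Like most deep learning libraries, the sketch computes a "valid" cross-correlation: the kernel is not flipped, the stride is 1 and there is no padding. The image and the edge kernel are made-up toy values, not data or weights from this system.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of rows) and return the weighted sums."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the kernel and the image patch under it.
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += kernel[di][dj] * image[i + di][j + dj]
            row.append(total)
        output.append(row)
    return output


def relu(feature_map):
    """Replace every negative value in the feature map with zero."""
    return [[max(0, x) for x in row] for row in feature_map]


image = [
    [9, 0, 0, 9],
    [9, 0, 0, 9],
    [9, 0, 0, 9],
]
vertical_edge = [[-1, 1], [-1, 1]]  # positive where dark changes to bright, left to right
feature_map = conv2d_valid(image, vertical_edge)
print(feature_map)        # [[-18, 0, 18], [-18, 0, 18]]
print(relu(feature_map))  # [[0, 0, 18], [0, 0, 18]]
```

The negative response at the bright-to-dark edge is removed by ReLU, while the dark-to-bright edge survives. This is the element-wise non-linearity the model applies after every convolution.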
The larger kernels in the later layers are intended to capture combinations of the small features found by the first layer, such as crossed fingers, edges and the hand region. An additional operation, ReLU, is applied after every convolution. ReLU stands for Rectified Linear Unit and is a non-linear operation. It is applied element-wise (per pixel) and replaces every negative value in the feature map with zero. ReLU introduces non-linearity into the CNN, because most of the real-world data the CNN has to learn is non-linear. Convolution, by contrast, is a linear operation made of element-wise multiplications and additions, so non-linearity must come from a non-linear function such as ReLU.
D. Pooling
In a CNN, pooling is used to reduce the spatial size of the convolved features. The two main types are max pooling and average pooling. In max pooling, a window moves over the input matrix and outputs the maximum value in each window position. Average pooling works the same way but outputs the average instead of the maximum. The window moves according to the stride value. With a stride of 2, for example, the window moves two columns to the right after each operation. Pooling therefore reduces the computation needed to process the data.
E. Classification
Fig. 2. Convolutional neural network working
The fully connected layers act as a classifier on top of the extracted features. They assign, for each class, a probability that the object in the image belongs to that class. Adding a fully connected layer is a (usually) cheap way of learning non-linear combinations of the high-level features produced by the convolutional layers. The fully connected layer learns a possibly non-linear function in that feature space.
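The pooling step under D. Pooling and the class probabilities produced by the fully connected layer can be sketched together as below. The steps are a 2 x 2 max (or average) pooling with stride 2, a flatten, one fully connected layer, and a softmax that turns the class scores into probabilities. The pure-Python functions, the feature map and the two-class weights are hypothetical placeholders. The system described in this paper has 43 output classes, and its trained weights are not given.

```python
import math


def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Slide a size x size window by `stride`; keep the max ("max") or mean ("avg")."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            window = [feature_map[i + di][j + dj] for di in range(size) for dj in range(size)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        pooled.append(row)
    return pooled


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum of all inputs plus a bias per class."""
    return [sum(w * x for w, x in zip(w_row, inputs)) + b for w_row, b in zip(weights, biases)]


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the largest score avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


feature_map = [
    [1, 3, 0, 2],
    [4, 2, 1, 0],
    [0, 1, 5, 6],
    [2, 2, 7, 1],
]
pooled = pool2d(feature_map)                # [[4, 2], [2, 7]]
averaged = pool2d(feature_map, mode="avg")  # [[2.5, 0.75], [1.25, 4.75]]
flat = [x for row in pooled for x in row]   # [4, 2, 2, 7]

# Hypothetical weights for two classes; the real model has 43 output classes.
weights = [[0.1, 0.0, 0.0, 0.2], [0.0, 0.1, 0.1, 0.0]]
biases = [0.0, 0.0]
probabilities = softmax(dense(flat, weights, biases))
predicted_class = probabilities.index(max(probabilities))  # 0
```

In a Keras model these steps correspond to pooling, flatten and dense layers with a softmax activation. The sketch only shows the arithmetic behind them.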
Over a number of epochs, the fully connected layers learn this function of the high-level features produced by the convolutional layers. The model thus learns to distinguish dominant and certain low-level features in the images and classifies them with softmax classification.
IV. RESULTS
Using the CNN model, the system achieved an accuracy of up to 80% during training and validation on the dataset. The system covers 43 classes, and accuracy differs between classes. Some signs have a recognition rate of 100%, while others have a recognition rate of 50% to 90%.
A. Recognition and Error Rate
1) Recognition Rate: The accuracy of each class is calculated with the following formula:
Recognition Rate = (Number of correct recognitions / Total images) * 100
Here, Total images is the number of images given as input, and Number of correct recognitions is the number of those images that the system predicted correctly. Multiplying by 100 gives a percentage.
2) Error Rate: No recognition system recognizes signs with 100% accuracy, because accuracy depends on many factors. The error rate of the SLR system is calculated with the following formula:
Error Rate = 100 - Recognition Rate
B. Accuracy Rate of Alphabets
Figure 3 shows the accuracy rate of alphabets. Some alphabets have an accuracy rate of 100%, while others have an accuracy rate of 50% to 90%. The system is reasonably good at predicting the correct alphabets. Among the confused letters, V and W are similar in shape. This explains why the model may confuse them, since a W looks like a repeated V.
Fig. 3. Accuracy rate of alphabets
C. Error Rate of Alphabets
Figure 4 shows the error rate of alphabets, which lies between 0% and 50%. Some alphabets have a 0% error rate because the system always predicts them correctly. The errors for the other alphabets arise because their signs have similar shapes, which confuses the system.
Fig. 4. Error rate of alphabets
D. Accuracy Rate of Digits
Figure 5 shows the accuracy rate of digits. Some digits have an accuracy rate of 100%, while others have an accuracy rate of 50% to 80%. The system is fairly good at predicting the correct digits, as shown in the figure.
Fig. 5. Accuracy rate of digits
E. Error Rate of Digits
Figure 6 shows the error rate of digits, which lies between 0% and 50%. Some digits have a 0% error rate because the system always predicts them correctly.
Fig. 6. Error rate of digits
F. Accuracy Rate of Specific Words
Figure 7 shows the accuracy rate of specific words. Five of the seven specific words have an accuracy rate of 100%, while the other two have accuracy rates of 50% and 60%. The system is good at predicting the correct words, as shown in the figure.
Fig. 7. Accuracy rate of specific words
G. Error Rate of Specific Words
Figure 8 shows the error rate of specific words. Only two specific words have nonzero error rates, of 40% and 50%. The other words have a 0% error rate because the system always predicts them correctly.
Fig. 8. Error rate of words
H. Comparison of Recognition and Error Rate of Alphabets
Figure 9 compares the recognition and error rates of alphabets. In the graph, the blue line shows the accuracy rate of each alphabet, which differs from alphabet to alphabet. Some alphabets have an accuracy rate of 100%, while others have 50% to 90%. The red line shows the error rate of alphabets, which is between 0% and 50%.
Fig. 9. Recognition and error rate of alphabets
I. Comparison of Recognition and Error Rate of Digits
Figure 10 compares the recognition and error rates of digits. In the graph, the blue line shows the accuracy rate of each digit, which differs from digit to digit. Some digits have an accuracy rate of 100%, while others have 50% to 90%. The red line shows the error rate of digits, which is between 0% and 50%.
Fig. 10. Recognition and error rate of digits
J. Comparison of Recognition and Error Rate of Specific Words
Figure 11 compares the recognition and error rates of specific words. The blue line shows that five of the seven specific words have an accuracy rate of 100%. The red line shows that only two specific words have error rates, of 40% and 50%.
Fig. 11. Recognition and error rate of specific words
V. CONCLUSION
The proposed work aims to simplify communication between hearing-impaired and hearing people by placing a computer in the communication path, so that signs can be captured, recognized, translated into text and displayed on screen. The Sign Language Recognition (SLR) system is implemented with OpenCV, TensorFlow, Keras and a CNN model. Deep learning is becoming a very popular subset of machine learning because of its high performance across many types of data. A common way to apply deep learning to image classification is to build a convolutional neural network (CNN). The Keras library in Python makes it easy to build a CNN. Keras is a powerful, easy-to-use, free and open-source Python library for building and evaluating deep learning models. It wraps the efficient numerical computation libraries Theano and TensorFlow and lets you define and train neural network models in a few lines of code. The main goal of the proposed work is to design a system that translates static hand signs into English text in real time using a computer webcam. Using a convolutional neural network, the proposed system recognizes static signs with up to 80% accuracy. The system could be improved further by making the dataset more diverse.
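The per-class recognition rate and error rate defined in Section IV-A can be computed from one true label and one predicted label per test image, as in the minimal sketch below. The function name and the toy V/W labels are illustrative assumptions. They are not the authors' evaluation code or data.

```python
from collections import Counter


def per_class_rates(true_labels, predicted_labels):
    """Map each class to (recognition rate %, error rate %) as defined in Section IV-A."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("need one prediction per test image")
    totals = Counter(true_labels)  # total test images per class
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    rates = {}
    for label, total in totals.items():
        recognition = correct[label] / total * 100
        rates[label] = (recognition, 100 - recognition)
    return rates


# Toy labels: one V is read as W and one W as V (similar hand shapes).
true = ["V", "V", "W", "W", "A", "A"]
pred = ["V", "W", "W", "V", "A", "A"]
print(per_class_rates(true, pred))
# {'V': (50.0, 50.0), 'W': (50.0, 50.0), 'A': (100.0, 0.0)}
```

Because the error rate is defined as 100 minus the recognition rate, the two values for each class always sum to 100%. The paired blue and red lines in the comparison figures reflect this relationship.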