Models for Automated Sign Language Recognition

Sunita Nayak¹, Sudeep Sarkar¹, Barbara Loeding²
¹Department of Computer Science & Engineering, ²Department of Special Education

Problem Statement

Extract sign models from continuous sentences of American Sign Language. In the following two sentences, the target word to be extracted is HOUSE. The frames representing the sign 'HOUSE' are marked in red, and the neighboring signs are marked in magenta. The frames in between reflect co-articulation between signs.

[Figure: frame sequences for the sentences "fs-JOHN CAN BUY HOUSE FUTURE" and "SHE WOMAN HER HOUSE FIRE"]

The Approach

Video → Edge Sequence → Features (Relational Distributions) → Matching (Dynamic Programming)

Most previous work in sign language recognition uses Hidden Markov Models together with color gloves or magnetic trackers. Our work uses plain color video without any wearable aids. It is based on computing a relational distribution for each image in the video and embedding the distributions in a lower-dimensional space called the Space of Relational Distributions (SoRD).

Relational Distribution: given an edge image of a frame in the video, the probability that the horizontal and vertical distances between any two edge pixels are x and y, respectively.

The Space of Relational Distributions (SoRD) is obtained by a principal component analysis of all the relational distribution images. (Sketches of these steps are given below, after the Tests and Results section.)

Motion Representation (Curves in SoRD)

• The points representing the frames of a sentence are linearly interpolated in the SoRD space to form a point series that represents the sentence.
• Frames with less motion map to points that lie closer together along the curve than frames with larger motion; parameterizing the sentence by the curve rather than by time thus achieves speed invariance.
• We consider two linearly interpolated, time-normalized sentences that share a common sign but have different adjacent signs. For example, for the sign HOUSE, the first two training sentences are "SHE WOMAN HER HOUSE FIRE" and "fs-JOHN CAN BUY HOUSE FUTURE".
• Segments of length k points from the first sentence are compared with all segments of length k points from the second sentence, and the best matching segments are found using dynamic programming (see the matching sketch below):

d_{ij}^{k,r_0} = \sum_{r=r_0}^{r_0+k} \left( S_i(r) - S_j(\phi_{ij}(r)) \right)^2

(k_m, r_{0m}, \phi_{ij,m}) = \arg\min_{k,\, r_0,\, \phi_{ij}} d_{ij}^{k,r_0}

where S_i and S_j are the SoRD curves of sentences i and j, and \phi_{ij} is the warping function mapping points of S_i to points of S_j.

Learning Sign Models (Signemes)

• The segment extracted from the first sentence is then used to extract the corresponding segments from all other training sentences.
• The signeme is defined as the mean of all the extracted segments.

Advantages:
• Does not use tracking.
• Captures the motion required to discriminate between signs in video sequences without color gloves or magnetic trackers.
• Incorporates segmentation noise into the sign models themselves, yielding models that are practical with respect to segmentation.
• Takes into account the relative positions of the head and the two hands, which is important for sign language recognition.

Tests and Results:
• Signeme models were used to localize the signs in new test sentences.
• The models were compared with the speed-normalized SoRD test curves to find the position of best match, using Euclidean matching similar to the signeme extraction process (see the localization sketch below).
• We report results in terms of Start Offset and End Offset, i.e., the difference in frames between the ground-truth sign and the retrieved signeme.
• Ground truth is taken from Boston University's SignStream annotations.
• Tested with 18 signs.
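To make the feature step concrete, the following is a minimal Python sketch of computing a relational distribution for one frame. It is illustrative rather than the authors' implementation: the bin count, the edge-pixel subsampling, and the use of absolute (unsigned) distances are our assumptions.

```python
import numpy as np

def relational_distribution(edge_image, n_bins=32, max_pixels=2000, seed=0):
    """Joint distribution over (horizontal, vertical) distances between
    pairs of edge pixels in one frame.  edge_image: 2D binary array from
    any edge detector (e.g. Canny)."""
    ys, xs = np.nonzero(edge_image)
    # Subsample edge pixels so the all-pairs step stays tractable.
    if len(xs) > max_pixels:
        idx = np.random.default_rng(seed).choice(len(xs), max_pixels, replace=False)
        xs, ys = xs[idx], ys[idx]
    # Absolute horizontal/vertical distances between all pairs of edge pixels.
    dx = np.abs(xs[:, None] - xs[None, :]).ravel()
    dy = np.abs(ys[:, None] - ys[None, :]).ravel()
    h, w = edge_image.shape
    hist, _, _ = np.histogram2d(dx, dy, bins=n_bins, range=[[0, w], [0, h]])
    return hist / hist.sum()  # normalize to a probability distribution
```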
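A companion sketch for building the Space of Relational Distributions and the curve representation: PCA over the flattened relational-distribution images gives the SoRD basis, each frame projects to a point, and resampling the linearly interpolated point series at equal arc-length steps is one plausible realization of the speed invariance described in the Motion Representation section. The numbers of components and samples are illustrative.

```python
import numpy as np

def build_sord(rd_stack, n_components=10):
    """rd_stack: (n_frames, n_bins, n_bins) relational distributions of all
    training frames.  Returns the mean and the principal directions."""
    X = rd_stack.reshape(len(rd_stack), -1)
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)  # PCA via SVD
    return mean, Vt[:n_components]

def project(rd_stack, mean, basis):
    """Map each frame's relational distribution to a point in SoRD space."""
    return (rd_stack.reshape(len(rd_stack), -1) - mean) @ basis.T

def time_normalize(points, n_samples=100):
    """Resample a sentence's SoRD point series at equal arc-length steps.
    Low-motion frames cluster along the curve while high-motion frames
    spread out, so equal-arc-length sampling removes signing speed."""
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])   # cumulative arc length
    t = np.linspace(0.0, s[-1], n_samples)
    return np.stack([np.interp(t, s, points[:, d])
                     for d in range(points.shape[1])], axis=1)
```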
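The matching equations above can be read as follows in code. For brevity this sketch fixes the segment length k and assumes an identity warping \phi_{ij}; the full method also searches over k and over the warping function via dynamic programming.

```python
import numpy as np

def best_matching_segments(Si, Sj, k):
    """Compare every length-k window of curve Si with every length-k window
    of curve Sj and keep the pair minimizing d_ij^{k,r0} (identity warping)."""
    best = (np.inf, 0, 0)
    for r0 in range(len(Si) - k + 1):
        for r1 in range(len(Sj) - k + 1):
            d = float(np.sum((Si[r0:r0 + k] - Sj[r1:r1 + k]) ** 2))
            if d < best[0]:
                best = (d, r0, r1)
    return best  # (distance, start index in Si, start index in Sj)

def signeme(segments):
    """Signeme = pointwise mean of the segments extracted from all training
    sentences (each segment resampled to a common length beforehand)."""
    return np.mean(np.stack(segments), axis=0)
```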
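Finally, a hypothetical sketch of test-time localization and the reported offsets. Indices here are positions on the resampled curve; mapping them back to original video frames, as the reported Start/End Offsets require, is an additional bookkeeping step not shown.

```python
import numpy as np

def localize(signeme_curve, test_curve):
    """Slide the signeme along a speed-normalized test curve and return the
    (start, end) indices of the best Euclidean match."""
    k = len(signeme_curve)
    d = [np.sum((test_curve[r:r + k] - signeme_curve) ** 2)
         for r in range(len(test_curve) - k + 1)]
    r0 = int(np.argmin(d))
    return r0, r0 + k - 1

def start_end_offsets(found, ground_truth):
    """Start/End Offset: differences between the retrieved signeme boundaries
    and the ground-truth sign boundaries (e.g. SignStream annotations)."""
    return found[0] - ground_truth[0], found[1] - ground_truth[1]
```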
Some extracted signemes are shown below (the ground-truth sign is marked in RED, while the localized signeme is marked in GREEN):

Sign: 'BUY'; test sentence: 'JOHN BUY WHAT?'
Sign: 'FUTURE'; test sentence: 'FUTURE JOHN BUY HOUSE'

Conclusion & Future Work:
• We presented a novel approach for the automatic extraction of sign models from continuous American Sign Language sentences, using a time-normalized, continuous representation of signs and sentences.
• We proposed the concept of signemes, sign models that are robust to coarticulation effects.
• In the future, we plan to expand the number of signs and work towards signer independence.

Broader Scope: Our research increases the ways in which people can communicate with computers. It would help the Deaf communicate naturally with hearing people who do not understand sign language, by translating their signs into plain English words that others can read. For the hearing community as well, it would broaden the ways of communicating with machines, helping people interact with computers visually using cameras.

Acknowledgment: This work was supported by the US National Science Foundation ITR Grant No. IIS 0312993.