Models for Automated Sign Language Recognition
Sunita Nayak¹, Sudeep Sarkar¹, Barbara Loeding²
¹Department of Computer Science & Engineering, ²Department of Special Education
Problem Statement
Extract sign models from continuous sentences of American Sign
Language.
In the following two sentences, the target sign to be extracted is
HOUSE. The frames representing the sign 'HOUSE' are marked in
red, and neighboring signs are marked in magenta. The frames in
between show the co-articulation between signs.
fs-JOHN CAN BUY HOUSE FUTURE
SHE WOMAN HER HOUSE FIRE
The Approach

Most previous work in sign language recognition uses Hidden Markov
Models and relies on color gloves or magnetic trackers. Our work uses
plain color video without any wearable aids. It is based on computing a
relational distribution for each image in the video and embedding these
distributions in a lower-dimensional space called the Space of Relational
Distributions (SoRD).

[Figure: processing pipeline: Video → Edge Sequence → Features (Relational
Distributions) → Motion Representation (Curves in SoRD) → Matching
(Dynamic Programming)]
Relational Distribution: Given an edge image of a frame in the video, the
probability that the horizontal and vertical distances between any two
edge pixels are x and y, respectively.
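A minimal sketch of computing such a distribution (assuming numpy; the bin
count and the subsampling of pixel pairs are illustrative choices, not the
poster's settings):

import numpy as np

def relational_distribution(edge_img, bins=32, max_pairs=20000, rng=None):
    """2-D histogram of (dx, dy) offsets between pairs of edge pixels,
    normalized so it is a probability distribution."""
    rng = rng or np.random.default_rng(0)
    ys, xs = np.nonzero(edge_img)                 # edge pixel coordinates
    i = rng.integers(0, len(xs), size=max_pairs)  # subsample pixel pairs
    j = rng.integers(0, len(xs), size=max_pairs)
    h, w = edge_img.shape
    hist, _, _ = np.histogram2d(xs[i] - xs[j], ys[i] - ys[j],
                                bins=bins, range=[[-w, w], [-h, h]])
    return hist / hist.sum()                      # P(dx = x, dy = y)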
The Space of Relational Distributions (SoRD) is obtained by performing
principal component analysis on all the relational distribution images.
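A sketch of this construction (sklearn's PCA is one standard way to do it;
the number of components is an assumption):

import numpy as np
from sklearn.decomposition import PCA

def build_sord(rel_dists, n_components=10):
    """Flatten each frame's relational distribution and project all frames
    onto the top principal components; each frame becomes one SoRD point."""
    X = np.stack([rd.ravel() for rd in rel_dists])  # frames x (bins*bins)
    pca = PCA(n_components=n_components)
    coords = pca.fit_transform(X)                   # SoRD coordinates per frame
    return pca, coords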
• The points representing the frames of a sentence are linearly
interpolated in SoRD space to form a point series that represents the
sentence.
• Frames with less motion are represented by points that lie closer
together along the curve, while frames with more motion are represented
by points farther apart; because point spacing depends on motion rather
than on signing speed, resampling the curve uniformly along its length
achieves speed invariance (see the resampling sketch after this list).
• We consider two linearly interpolated, time-normalized sentences that
share a common sign but have different adjacent signs. For example, for
the sign 'HOUSE', the first two training sentences considered are
“SHE WOMAN HER HOUSE FIRE” and
“fs-JOHN CAN BUY HOUSE FUTURE”.
• Segments of length k points from the first sentence are compared with
all segments of length k points from the second sentence, and the best
matching segments are found using dynamic programming.
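The time normalization above can be sketched as uniform arc-length
resampling of the interpolated SoRD curve (a minimal sketch assuming numpy;
the poster does not specify the exact routine, so the helper name and sample
count are illustrative):

import numpy as np

def resample_uniform(points, n_samples=100):
    """Linearly interpolate a SoRD point series (frames x dims) and
    resample it at equal arc-length steps, so the result depends on the
    motion traced out, not on the signing speed."""
    step = np.linalg.norm(np.diff(points, axis=0), axis=1)  # motion per frame step
    s = np.concatenate([[0.0], np.cumsum(step)])            # cumulative arc length
    targets = np.linspace(0.0, s[-1], n_samples)            # uniform arc-length grid
    return np.stack([np.interp(targets, s, points[:, d])
                     for d in range(points.shape[1])], axis=1)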
The best matching segments minimize the summed squared distance between
corresponding SoRD points:

d_{ij}^{k,r_0} = \sum_{r=r_0}^{r_0+k} \left( S_i(r) - S_j(\psi_{ij}(r)) \right)^2

(k^m, r_0^m, \psi_{ij}^m) = \arg\min_{k, r_0, \psi_{ij}} d_{ij}^{k,r_0}

where S_i and S_j are the SoRD point series of the two sentences, r_0 is
the segment start, k is the segment length, and \psi_{ij} is the alignment
mapping frames of sentence i to frames of sentence j.
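A brute-force sketch of this minimization (the poster solves it with
dynamic programming; here the warping \psi_{ij} is simplified to an identity
alignment between equal-length windows, so this illustrates the objective
rather than the actual algorithm):

import numpy as np

def best_common_segment(S_i, S_j, k_range):
    """Slide length-k windows over both sentences' SoRD curves and keep
    the pair of windows with the smallest summed squared distance."""
    best_d, best_args = np.inf, None
    for k in k_range:
        for r0 in range(len(S_i) - k + 1):       # segment start in sentence i
            a = S_i[r0:r0 + k]
            for r1 in range(len(S_j) - k + 1):   # candidate start in sentence j
                d = np.sum((a - S_j[r1:r1 + k]) ** 2)
                if d < best_d:
                    best_d, best_args = d, (k, r0, r1)
    return best_d, best_args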
Learning Sign Models (Signemes)
• The segment extracted from the first sentence is then used to
extract the matching segments from all other training sentences.
• The signeme is defined as the mean of all the extracted segments
(see the sketch after this list).
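A minimal sketch of forming the signeme, reusing the resample_uniform
helper from the earlier sketch to bring all segments to a common length
(the common length is an illustrative choice):

import numpy as np

def make_signeme(segments, length=20):
    """Average the extracted segments, after time normalization,
    into a single model curve for the sign."""
    normalized = [resample_uniform(seg, n_samples=length) for seg in segments]
    return np.mean(np.stack(normalized), axis=0)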
Advantages:
• Does not require tracking.
• Captures the motion needed to discriminate between signs in video
sequences, without color gloves or magnetic trackers.
• Trains segmentation noise into the sign models themselves, resulting in
models that are practical with respect to segmentation.
• Takes into account the relative positions of the head and the two hands,
which is important for sign language recognition.
Tests and Results:
• Signeme models were used to localize the signs in new test sentences.
• The models were compared with the speed-normalized SoRD test curves to
find the position of best match, using Euclidean matching similar to the
signeme extraction process.
• We report results in terms of Start Offset and End Offset, i.e., the
difference in frames between the ground-truth sign and our retrieved
signeme (see the sketch after this list).
• Ground truth was taken from Boston University's SignStream annotations.
• Tested with 18 signs.
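The localization step can be sketched as a sliding-window Euclidean match
of the signeme against the speed-normalized test curve (a minimal sketch;
gt_start and gt_end are hypothetical ground-truth frame indices):

import numpy as np

def localize_signeme(signeme, test_curve):
    """Return the start index of the window of the test curve that
    best matches the signeme under Euclidean distance."""
    k = len(signeme)
    costs = [np.sum((test_curve[r:r + k] - signeme) ** 2)
             for r in range(len(test_curve) - k + 1)]
    return int(np.argmin(costs))

# Offsets against ground truth (indices are hypothetical):
# start_offset = found_start - gt_start
# end_offset = (found_start + k) - gt_end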
Some extracted signemes are shown below (the ground-truth sign is marked
in RED, while the localized signeme is marked in GREEN):
Sign: 'BUY', Test sentence: 'JOHN BUY WHAT?'
Sign: 'FUTURE', Test sentence: 'FUTURE JOHN BUY HOUSE'
Conclusion & Future Work:
• We presented a novel approach for automatically extracting sign
models from continuous American Sign Language sentences using a
time-normalized, continuous representation of signs and sentences.
• We proposed the concept of signemes, sign models that are robust to
coarticulation effects.
• In the future, we plan to expand the number of signs and work towards
signer independence.
Broader Scope:
Our research increases the ways in which people can communicate with
computers. It would help Deaf signers communicate naturally with hearing
people who do not understand sign language, by translating their signs
into plain English words that others can read. For the hearing community,
it would likewise broaden the ways of communicating with machines,
helping people interact with computers visually through cameras.
Acknowledgment: This work was supported by the US National
Science Foundation ITR Grant No. IIS 0312993.