Making Kinections
Chris Weeden
Andrew Stohr
Balu Swarna
Bhargav Shivkumar
Table of Contents
Introduction and History
Current Solutions
Goals
Quality of Life
Functional Requirements
User Profile / Deployment Environment / Future
Introduction and History
Languages have helped mankind to communicate effectively for centuries. Though oral communication constitutes a major portion of communication, body language helps to emphasize and enhance expression, and thus enriches the experience. But what if body language is your only means of communication?
In order for deaf, hard of hearing, or speech impaired people to communicate
via telephone with hearing people, they must use a Video Relay Service (VRS). A VRS is a service that relays, through a sign language interpreter, all communications to and from a deaf person. These services are useful but are also
expensive to taxpayers (in the United States) because a certified sign language
interpreter must be employed for each simultaneous call.
An alternative form of communication between deaf and hearing people is a
TTY, or Teletypewriter. This is a tool that a deaf person can use to type out messages, which an operator reads aloud to the hearing person. The operator then transcribes the hearing person’s spoken response into text. This form of
communication is not ideal either as it requires the deaf person to communicate in
English, which is not the primary language of most deaf individuals.
Imagine a Kinect-based sign language recognition system that could interpret standard sign language into English text, and perhaps then convert that text to speech. This would be a multi-use application for the deaf, extensible to many forms of sign language and gesturing.
There are many types of users for this system. First and foremost would be
individuals who are deaf, hard of hearing, or in some other way speech impaired.
Second, the system also helps anyone who wishes to communicate with a signer but does not understand sign language themselves. For instance, this could be a
teacher trying to teach hearing impaired students, or a doctor needing to treat a
patient.
The system would improve the users’ quality of life by allowing them to
communicate with people who can hear, using sign language. It also allows other
parties to communicate back with the hearing impaired even when they may not
know sign language. Arguably the most valuable outcome this system would produce is the preservation of the deaf individual’s ability to communicate using ASL. Deaf culture cherishes and takes great pride in ASL; when deaf individuals must use English to communicate with hearing individuals, they are forced to abandon their primary language in favor of a spoken language that is foreign and unnatural to them.
Additionally, interactions that would otherwise be cumbersome and would
require a human translator would be much easier. The next time a hearing impaired
person walks into a doctor’s office, the doctor will turn on their own system, and the
patient will be able to communicate freely and smoothly with the doctor.
Current Solutions
There are already some existing solutions to this communication problem, and teams working on new ones. In the past there have been attempts to use electronic gloves to gather data related to hand gestures. Although these gloves may provide accurate information, they are very expensive for a normal consumer or small business. They are also poor at recognizing facial expressions and cannot handle interactions involving multiple people at the same time. It is for this reason that a computer-vision sign language recognition system would be preferable.
At the moment the best solution has come from Microsoft. Microsoft, the maker of the Kinect, has taken the problem into its own hands, investing its own resources to find a solution using its own device. The Kinect provides an inexpensive and easy way to obtain depth and skeletal information, along with hand gestures, posture, and facial expressions. Microsoft Research Asia has teamed up with researchers at the Institute of Computing Technology at the Chinese Academy of Sciences in order to do so.
The resulting effort is called the Kinect Sign Language Translator project. Currently their solution captures the gestures, while machine learning and pattern recognition programming help interpret the meaning. The system is capable of capturing a given conversation from both sides: a deaf person who is signing and a person who is speaking. Visual signs are converted to written and spoken translation rendered in real time, while spoken words are turned into accurate visual signs. The system consists of two modes. The first is translation mode, which translates sign language into text or speech; this includes isolated word recognition and sentence recognition. The raising and lowering of the hands is defined as the “start” and the “end” of each sign language word to be recognized. By tracking the position of the user’s hands, many signs can be detected almost instantly using the research group’s 3D trajectory matching algorithm. The second mode is communication mode, in which a hearing person can communicate with the signer through an avatar: they speak to the system, and the avatar signs what they said. The hearing person can also type their message into the Kinect software.
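The research group has not published its 3D trajectory matching algorithm in detail, but the general idea can be sketched. The Python fragment below is only an illustrative approximation, and every helper name and threshold in it is an assumption: it treats frames where the hand is raised above a reference height as the signing segment ("raise" marks the start, "lower" the end), then matches the resulting 3D hand trajectory against previously recorded reference trajectories with a simple nearest-neighbor comparison.

# A simplified, hypothetical sketch of translation mode: segment a sign while the
# hand is raised, then match its 3D trajectory against recorded reference words.
# The thresholds, helper names, and nearest-neighbor matching are assumptions,
# not the research group's actual algorithm.
import numpy as np

def resample(trajectory, n_points=32):
    """Resample a (frames x 3) hand trajectory to a fixed number of points."""
    trajectory = np.asarray(trajectory, dtype=float)
    old_t = np.linspace(0.0, 1.0, len(trajectory))
    new_t = np.linspace(0.0, 1.0, n_points)
    return np.stack([np.interp(new_t, old_t, trajectory[:, d]) for d in range(3)], axis=1)

def trajectory_distance(a, b):
    """Mean point-wise Euclidean distance between two resampled trajectories."""
    return float(np.mean(np.linalg.norm(resample(a) - resample(b), axis=1)))

def recognize_sign(hand_frames, vocabulary, rest_height=0.0, max_distance=0.25):
    """Recognize one isolated sign from a stream of (x, y, z) hand positions.

    Frames with the hand above rest_height form the signing segment; vocabulary
    maps each word to a reference trajectory recorded earlier.
    """
    signing = [p for p in hand_frames if p[1] > rest_height]
    if len(signing) < 2:
        return None
    word, dist = min(((w, trajectory_distance(signing, ref)) for w, ref in vocabulary.items()),
                     key=lambda item: item[1])
    return word if dist < max_distance else None

In practice, a matcher like this would be backed by the machine learning and pattern recognition components mentioned above rather than a single distance threshold.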
The downsides are that it currently takes five people to establish the recognition patterns for just one word, and only 300 Chinese sign language words have been added out of a total of about 4,000. There is clearly still a long road ahead towards mastering this problem.
The hope is that, with an implementation of a system similar to this one, the 360 million people worldwide with hearing problems will be able to use such a system, improving quality of life both for the hearing impaired who use it and for everyone who communicates with them.
Goals
Some of the goals for this system are:
• To create a UI that is simple and easy to use. The hope for this project is that anyone with a hearing disability will have the chance to combat the issue through the use of the Kinect. This means that age and technological skill should not have an impact on the use of the program; it should be easy for everyone.
• To recognize not only hand gestures but facial expressions as well. A lot of emphasis is put on facial expressions, and without them you lose part of the language. Therefore, it’s important to include these in a conversation in order to fully understand what’s being said.
• To quickly and efficiently translate languages to/from ASL. It is difficult to have a conversation when there is a lag in responses, and it can quickly become frustrating. The purpose of this project is to improve a user’s quality of life. However, if the program lags, it may make the user’s life more difficult, even to the point where they won’t want to use the system.
Quality of Life
Of course the system is being created with hopes of improving the quality of life. But what exactly will it improve? There are many applications for a Kinect-based sign language system. Take a school, for example. It is well documented that kids with a hearing disability have a much harder time learning in school. But what if there were something affordable that teachers could use to give these students more of an equal opportunity? This would not only help hearing impaired kids to learn more effectively, but it would also give them more confidence in and outside the classroom. Another situation the Kinect will help in is at a doctor’s office. If someone with a hearing disability were to get sick, it may be difficult for them to express their symptoms unless the doctor knows sign language. This may cause a misdiagnosis by the doctor, if the patient is even willing to go into the office. With the Kinect, it would be much easier for the patient to express how they are feeling. It will also help them feel more comfortable at the doctor’s office, making them more likely to visit.
Expected features from the system (Functional Requirements)
This system is being built in order to make life easier for the hearing and speech impaired as they go about their daily activities. The design of this system must keep the target users in mind, and the system must be built accordingly so that it is more of a boon than a bane. Some of the features expected of the system are as follows:
• The system must make use of a Microsoft Kinect device, or something similar, to create an interactive application that converts American Sign Language to normal speech and vice versa.
• There must be a provision to convert input text to sign language as well. This text is to be input into the system through a touch pad on the screen or an attached keyboard. Input text will be in plain English, and emphasis is to be laid on easy and fast input, as this must not be a deterrent to using the system.
• The system must be able to detect the start of signing, either when the user begins to sign or when a particular start sign is used. This start sign must be universal so that people new to the system can easily pick it up.
• There will be two vantage points to the system:
a. The point of view of the disabled person.
b. The point of view of the unimpaired person who does not know sign language.
It follows from this that there must be two screens that relay the working of the application to both parties, giving them feedback that the system is working as it should.
• The screen pointed at the disabled person, let us call it SignView, must relay the live camera feed as well as the text being construed from their signing. This feed must be fast enough that it does not suffer unnecessary lag and interrupt the conversation. This visual display is to be used when the hearing/speech impaired person is the “speaker”.
• SignView must have a listen mode, to be used when the opposite party is saying something. In this mode, the screen must relay, through animations, the relevant signs for what the opposite party is saying. It must also display a textual transcript of what is being said for further ease of use.
• SignView must have an indicator flag/animation to signal to the impaired user that the opposite party has something to say, or that the opposite party has interrupted the flow for some reason.
• The screen pointed at the unimpaired user, let us call it SpeechView, will have similar functionality to help that user understand what the impaired person is trying to communicate. There must be a textual script that is automatically typed on the screen as the impaired user signs the relevant conversational portions.
• In addition to the textual display, the SpeechView must also have an audio output that speaks out the converted signs. This audio option on the SpeechView must be mutable if the unimpaired user so desires. Sign-to-audio conversion must be fast enough to ensure a smooth conversation.
• In “Speak” mode, the SpeechView should listen to what the unimpaired person has to say and convert it to text, which is to be displayed in the SpeechView, as well as into signs, which will be displayed in the SignView.
• Toggling between Speak mode and Listen mode in the SpeechView must be based on the voice of the unimpaired user. When the user is silent, the SpeechView must be in Listen mode; when the user speaks up, it must automatically toggle to Speak mode (see the sketch after this list).
• This toggling between Speak and Listen modes in the SpeechView must also be correlated with the SignView: when the SpeechView is in Listen mode, the SignView must be in Speak mode and vice versa.
• The SpeechView must have an interface to control this toggling, as well as text input, in case the unimpaired user prefers to have manual control over the operation.
• The UI design of both these interfaces must bear in mind the simplicity of operating the system. Text and signs must be large enough to see clearly and must be presented in a pleasing manner. Bright and jarring colors must not be used, except for highlighting certain key areas or focus points as and when the situation calls for it.
• The Kinect device being used must be able to focus on the hand signs as well as the facial expressions of the impaired user, as facial expressions are an integral part of any sign language.
• Interactive screen displays will make the impaired user much more comfortable with the system, and effort must be made to make them feel that the system is there to help them. This includes communicating errors on their part in a gentle way.
• There are no strict budgetary constraints for the system, provided that complete user satisfaction and ease of use are achieved.
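As a rough illustration of the mode-toggling requirements above, the following Python fragment keeps the SpeechView and SignView in complementary Speak/Listen modes, driven by a simple voice-activity flag. The class and method names, and the manual-override flag, are hypothetical assumptions; an actual implementation would derive its voice-activity signal from the Kinect microphone array and expose the manual controls through the SpeechView interface described earlier.

# A minimal sketch of the Speak/Listen toggling requirement, assuming a simple
# voice-activity flag. All class and method names here are hypothetical, not
# part of any existing API.
from enum import Enum

class Mode(Enum):
    SPEAK = "speak"
    LISTEN = "listen"

class ConversationController:
    """Keeps SpeechView and SignView modes complementary, as the requirements state."""

    def __init__(self):
        self.speech_view_mode = Mode.LISTEN   # silence means SpeechView listens
        self.manual_override = False          # set when the hearing user takes manual control

    @property
    def sign_view_mode(self):
        # When SpeechView listens, SignView speaks, and vice versa.
        return Mode.SPEAK if self.speech_view_mode is Mode.LISTEN else Mode.LISTEN

    def on_voice_activity(self, hearing_user_is_speaking: bool):
        """Automatically toggle modes from voice activity unless manually overridden."""
        if self.manual_override:
            return
        self.speech_view_mode = Mode.SPEAK if hearing_user_is_speaking else Mode.LISTEN

    def set_manual_mode(self, mode: Mode):
        """Manual toggle exposed in the SpeechView interface."""
        self.manual_override = True
        self.speech_view_mode = mode

# Example: the hearing user starts talking, so SignView switches to its listen mode.
controller = ConversationController()
controller.on_voice_activity(True)
assert controller.sign_view_mode is Mode.LISTEN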
This summarizes the key requirements of the Kinect-based sign language recognition system. The designers are free to contribute suggestions and additions to this system based on their personal interactions with hearing- or speech-impaired people. Valid and useful ideas can be incorporated into the system subject to customer approval.
User Profile
The main end users of the system are people with hearing impairment. The age and technical skills of the users should not be limiting criteria when designing the system, because the two main applications of the system are in the classroom and in the doctor’s office, which can involve users from various age groups. In the classroom environment, this Kinect-based system is used to teach students with hearing impairment; in this case there are users at both ends of the system. From a broader perspective, the system is always used as a bridge between a person who can understand sign language and a person who cannot.
Deployment Environment
As of now the target work area of this project is confined to a room
(classroom/doctor’s office). The system would require plenty of open space in front of the Kinect device for the hearing impaired person to sign.
The lighting of the room also has to be taken into consideration. If the room is
poorly lit, the Kinect will not be able to register movements that the signer is
making, along with their posture and facial expressions. In order to make portability
easier, the system should be designed in such a way that it works on a normal
desktop computer. Taking into consideration the wide variety of desktop computers available today, compatibility should not be an issue.
Future Plans and Expandability
The future plan is to transform this stationary system, currently confined to a room, into a portable device. This opens the door to applying the project in many new scenarios, including ticket counters, billing desks at supermarkets, banks, and so on!