
Overview of Speech and Grammar
Kevin Lin
Richard Fateman
University of Calif. Berkeley
There has been much research into speech and handwriting recognition. However,
recognition algorithms have limitations, and a computer will inherently have trouble with
similar-looking letters and similar-sounding words. For example, it is extremely difficult to
differentiate the spoken letters ‘b’, ‘d’, ‘e’, ‘c’, ‘p’, and ‘v’, and it is extremely difficult to
differentiate the written characters ‘s’ and ‘5’ (especially when the 5 is written with one
stroke). Because the two modalities tend to confuse different pairs (‘s’ and ‘5’ sound
nothing alike, while ‘b’ and ‘v’ look quite different on paper), handwriting recognition and
voice recognition can be combined to increase accuracy dramatically.
Speech Recognition
What is needed to implement speech recognition of mathematics with Microsoft Speech
SDK 5.1? To improve accuracy, the speech can be limited to a subset of the English
language defined by an XML grammar. The grammar should be restrictive enough to rule
out invalid words and phrases, yet versatile enough to cover a variety of mathematical
symbols. The grammar should also not impede the user’s natural dictation. For example,
a user can read “(x + y)/(x – y)” as “quantity x plus y over quantity x minus y”. The
software should be able to determine the proper placement of the parentheses from the
user’s dictation of “quantity” and from other context or mathematical convention. As
another example, if the user says “one over two pi”, that is, “1/2π”, it is unreasonable to
interpret that as ½·π, as would be conventional in programming languages. Had the user
wanted that, he would have said “pi over two” (π/2). He must have meant 1/(2π).
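The placement rules above can be illustrated with a small recursive-descent reader. This is a minimal sketch under stated assumptions, not the behavior of math.xml or speech.js: the class name SpokenMathParser, the tiny vocabulary, and the precedence choices are all hypothetical. It assumes that “quantity” opens a parenthesized sum that ends at the next “over” or “times”, and that juxtaposed words such as “two pi” multiply and bind tighter than “over”, which yields 1/(2π) for “one over two pi”.

```python
# Hypothetical spoken-math reader. Precedence, loosest to tightest:
# plus/minus, over/times, juxtaposition ("two pi").
WORDS = {"zero": "0", "one": "1", "two": "2", "three": "3", "pi": "pi"}
ADD = {"plus": "+", "minus": "-"}
MUL = {"times": "*", "over": "/"}


class SpokenMathParser:
    def __init__(self, text):
        self.tokens = text.lower().split()
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse(self):
        result = self.expr()
        if self.peek() is not None:
            raise ValueError(f"unexpected word: {self.peek()!r}")
        return result

    def expr(self):
        # expr := term (("plus" | "minus") term)*
        out = self.term()
        while self.peek() in ADD:
            op = ADD[self.advance()]
            out = f"{out} {op} {self.term()}"
        return out

    def term(self):
        # term := factors (("over" | "times") factors)*
        out = "*".join(self.factors())
        while self.peek() in MUL:
            op = MUL[self.advance()]
            parts = self.factors()
            rhs = "*".join(parts)
            if op == "/" and len(parts) > 1:
                rhs = f"({rhs})"  # "one over two pi" -> 1/(2*pi), not (1/2)*pi
            out = f"{out}{op}{rhs}"
        return out

    def factors(self):
        # Juxtaposed words multiply and bind tighter than "over".
        parts = [self.primary()]
        while self.peek() is not None and self.peek() not in ADD and self.peek() not in MUL:
            parts.append(self.primary())
        return parts

    def primary(self):
        tok = self.advance()
        if tok is None:
            raise ValueError("expression ended too early")
        if tok == "quantity":
            return self.group()
        if tok in WORDS:
            return WORDS[tok]
        if len(tok) == 1 and tok.isalpha():
            return tok  # single-letter variable such as x or y
        raise ValueError(f"unknown word: {tok!r}")

    def group(self):
        # "quantity" opens a parenthesized sum that ends at the first
        # "over"/"times" or at the end of the phrase.
        out = "*".join(self.factors())
        while self.peek() in ADD:
            op = ADD[self.advance()]
            out = f"{out} {op} {'*'.join(self.factors())}"
        return f"({out})"
```

For example, SpokenMathParser("quantity x plus y over quantity x minus y").parse() gives "(x + y)/(x - y)", and "one over two pi" gives "1/(2*pi)". Ending a quantity at the next “over” is only one heuristic; a real grammar would need further rules for nested or longer quantities.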
The current version of the grammar, math.xml, allows numbers from 0 to 99, capital and
lower-case English letters, capital and lower-case Greek letters, common symbols such as
the exclamation mark, and several mathematical functions. The user may also prefix any
phrase with bold, italics, or underline, or any combination of these. The modifiers
“Upper”, “Uppercase”, “Capital”, and “Big” mark capital Greek and English letters, and
the modifiers “Lower”, “Lower Case”, and “Small” mark lower-case Greek and English
letters. Lower-case modifiers are optional when dictating: saying “aye” outputs “a”,
saying “small a” outputs “a”, and saying “capital a” outputs “A”.
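The modifier rules can be mirrored in a few lines of code. This is a minimal sketch, assuming phrases arrive as plain text: the function interpret and the letter tables (a small hypothetical subset of pronunciations such as “aye” and “bee”) are illustrative and are not taken from math.xml. Style prefixes are collected first, then an optional case modifier, then exactly one letter name.

```python
# Hypothetical tables: only a small subset of letter and Greek-letter names.
UPPER = {"upper", "uppercase", "capital", "big"}
LOWER = {"lower", "lowercase", "small"}
STYLES = {"bold", "italics", "underline"}
LETTERS = {"aye": "a", "a": "a", "bee": "b", "see": "c", "ex": "x", "why": "y"}
GREEK = {"alpha": "α", "beta": "β", "delta": "δ", "theta": "θ", "pi": "π"}


def interpret(phrase):
    """Return (symbol, styles) for phrases such as 'bold capital delta'."""
    # Treat the two-word modifier "Lower Case" as a single word.
    words = phrase.lower().replace("lower case", "lowercase").split()
    styles = []
    while words and words[0] in STYLES:
        styles.append(words.pop(0))  # any combination of style prefixes
    upper = False
    if words and words[0] in UPPER | LOWER:
        upper = words.pop(0) in UPPER  # lower-case modifiers are optional
    if len(words) != 1:
        raise ValueError(f"expected one letter name: {phrase!r}")
    name = words[0]
    symbol = LETTERS.get(name) or GREEK.get(name)
    if symbol is None:
        raise ValueError(f"unknown letter name: {name!r}")
    return (symbol.upper() if upper else symbol), styles
```

With these tables, interpret("bold italics big delta") returns ("Δ", ["bold", "italics"]), while “aye” and “small a” both return ("a", []).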
Numbers are implemented using three lists: Digits, Teens, and Decades. Digits and Teens
are used on their own for the numbers 0 to 19, and Decades followed by Digits cover the
numbers 21 to 99. The diagram below outlines the details of the speech grammar
developed to support dictation of mathematical symbols.
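The three-list composition for numbers can be sketched as a function that maps a spoken phrase to its value. This is a hypothetical sketch, not the grammar itself: the function name spoken_number is illustrative, and accepting a bare decade word such as “twenty” (so that 20, 30, and so on are reachable) is an assumption beyond the text.

```python
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
DECADES = ["twenty", "thirty", "forty", "fifty",
           "sixty", "seventy", "eighty", "ninety"]


def spoken_number(phrase):
    """Map a spoken number from 0 to 99 onto its integer value."""
    words = phrase.lower().split()
    if len(words) == 1:
        w = words[0]
        if w in DIGITS:
            return DIGITS.index(w)               # 0-9
        if w in TEENS:
            return 10 + TEENS.index(w)           # 10-19
        if w in DECADES:
            return 20 + 10 * DECADES.index(w)    # assumption: bare decade accepted
    elif len(words) == 2 and words[0] in DECADES and words[1] in DIGITS[1:]:
        # Decade followed by a nonzero digit: 21-99
        return 20 + 10 * DECADES.index(words[0]) + DIGITS.index(words[1])
    raise ValueError(f"not a number from 0 to 99: {phrase!r}")
```

For example, spoken_number("forty two") returns 42, and “twenty zero” is rejected because only nonzero digits may follow a decade.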
A JavaScript file, speech.js, has been developed to use math.xml. The script also writes to
an output file, C:\temp\testfile.txt, which can be used to interact with SKEME through file
input and output. The user can quit the program by saying “quit”.
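On the consuming side, file-based interaction amounts to polling the output file for newly appended text. This is a minimal sketch under assumptions the text does not confirm: that speech.js writes one recognized phrase per line in ASCII or UTF-8, and that the reader tracks a byte offset between polls. The function read_new_phrases is hypothetical. A partially written last line is left for the next call so that a phrase is never split.

```python
def read_new_phrases(path, offset=0):
    """Return (phrases, new_offset) for complete lines appended after `offset`.

    Assumes one recognized phrase per line in an ASCII/UTF-8 text file.
    """
    try:
        with open(path, "rb") as f:
            f.seek(offset)
            data = f.read()
    except FileNotFoundError:
        return [], offset  # nothing has been written yet
    end = data.rfind(b"\n")
    if end == -1:
        return [], offset  # no complete line yet; wait for the writer
    text = data[: end + 1].decode("utf-8", errors="replace")
    phrases = [line.strip() for line in text.splitlines() if line.strip()]
    return phrases, offset + end + 1
```

A caller would invoke this in a loop with a short time.sleep between calls, passing back the returned offset each time (for example, read_new_phrases(r"C:\temp\testfile.txt", offset)).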
Currently, Microsoft Speech SDK 5.1 does not support alternates with a custom grammar;
we hope that alternates will become available in future releases. Alternates are useful
because they let the user choose a different word when the recognizer misrecognizes the
speech. Alternates would also help with multimodal input of speech and handwriting by
providing additional candidates to match between handwriting and speech.
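If both recognizers could return alternates, matching them might look like the following. This is a hypothetical sketch: the confidence values are made up, and multiplying per-candidate confidences is a common naive late-fusion heuristic swapped in for illustration, not a method the text specifies. A candidate missing from one list receives a small floor score so that one recognizer’s omission does not veto it outright.

```python
def fuse_alternates(speech_nbest, ink_nbest, floor=0.01):
    """Rank candidates from two {candidate: confidence} n-best lists.

    Scores are multiplied; a candidate absent from one list is given
    `floor` for that list instead of zero.
    """
    candidates = set(speech_nbest) | set(ink_nbest)
    scores = {c: speech_nbest.get(c, floor) * ink_nbest.get(c, floor)
              for c in candidates}
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda c: (-scores[c], c))
```

With made-up confidences, a spoken ‘d’ heard as {"b": 0.40, "d": 0.35, "e": 0.25} and written as {"d": 0.60, "a": 0.30, "cl": 0.10} fuses to "d" first; a spoken and written ‘s’ with speech {"s": 0.9, "f": 0.1} and ink {"5": 0.55, "s": 0.45} fuses to "s" first. Each modality resolves the other’s confusion.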
There are two ways to sidestep the lack of alternates. First, empirical data on common
mistakes and similar-sounding words can be compiled, and a Lisp function can then assign
alternates to the speech output manually. This would provide the functionality of
alternates described above; however, such alternates would be less accurate than those
passed directly from the speech recognizer, so recognition accuracy would not improve as
much. Second, since the restriction applies to custom grammars, a custom grammar may
not be necessary at all: an alternative approach is to use the built-in grammar and to alter
or limit its rules dynamically. However, accuracy may be lower with this method because
it does not take advantage of a limited vocabulary.
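The first workaround, assigning alternates from compiled error data, can be sketched as a lookup table ranked by how often each confusion was observed. The text proposes a Lisp function; this sketch uses Python instead, and the names build_alternates and assign_alternates, the table layout, and the counts in the usage note are all hypothetical.

```python
from collections import defaultdict


def build_alternates(confusion_counts):
    """confusion_counts maps (recognized, actually_spoken) -> times observed."""
    table = defaultdict(list)
    # Most frequent confusions first, so they become the top alternates.
    for (heard, meant), n in sorted(confusion_counts.items(), key=lambda kv: -kv[1]):
        if heard != meant:
            table[heard].append(meant)
    return dict(table)


def assign_alternates(recognized, table, limit=3):
    """Return the recognized word followed by up to `limit` likely substitutes."""
    return [recognized] + table.get(recognized, [])[:limit]
```

With made-up counts {("b", "d"): 12, ("b", "p"): 9, ("b", "e"): 5, ("b", "v"): 2}, a recognized “b” receives the alternates ["b", "d", "p", "e"]; a word with no recorded confusions is returned alone.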
Handwriting Recognition
Handwriting recognition can be implemented with the Microsoft Tablet PC Platform SDK
1.0. However, there may be problems because the required recognition software is not
available on PCs that do not run the Tablet PC Edition.
Problems in Implementing Handwriting and Speech Recognition
Examples of typical mathematical equations that users may encounter in everyday use are
given in the attached file. These examples reveal a timing challenge: the speech and
handwriting inputs may not occur at exactly the same time. Also, each speech input may
correspond to more than one handwriting input, and vice versa. For example, the user may
say “x” after writing its two strokes, or may say “sin of theta” while writing “sin θ”.
One solution may be to wait until the user has finished speaking by adding a timeout that
detects when both the handwriting and speech modules have been idle for a set amount of
time (say, 1 second). The program would then process the buffered data, matching
phrases loosely based on when each input occurred. However, this poses a problem: the
user cannot see the result of his or her input for several seconds unless he or she speaks really …
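The timeout approach has two steps: cut the input stream into batches wherever both modules have been idle for the set time, then pair speech and handwriting within a batch by time. This is a minimal sketch assuming each recognized item arrives with start and end times in seconds. The InputEvent type, both function names, and the overlap-with-slack pairing rule are hypothetical choices, not the authors’ design.

```python
from dataclasses import dataclass


@dataclass
class InputEvent:
    source: str   # "speech" or "handwriting"
    text: str
    start: float  # seconds
    end: float


def split_on_idle(events, idle=1.0):
    """Start a new batch whenever no input occurred for at least `idle` seconds."""
    batches, current, last_end = [], [], 0.0
    for ev in sorted(events, key=lambda e: e.start):
        if current and ev.start - last_end >= idle:
            batches.append(current)
            current = []
        last_end = ev.end if not current else max(last_end, ev.end)
        current.append(ev)
    if current:
        batches.append(current)
    return batches


def pair_by_time(batch, slack=1.0):
    """Pair each speech event with handwriting events within `slack` seconds of it."""
    speech = [e for e in batch if e.source == "speech"]
    ink = [e for e in batch if e.source == "handwriting"]
    return [(s, [h for h in ink if h.start <= s.end + slack and h.end >= s.start - slack])
            for s in speech]
```

In this sketch one spoken “x” can pair with both of its strokes, and one spoken “sin of theta” with both written symbols, which covers the one-to-many correspondence described earlier. The slack value trades missed pairings against spurious ones.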
There may also be occasions where the redundancy in input fails and the voice recognition
does not agree with the handwriting recognition. In this case, the program must decide
whether to use the speech input, use the handwriting input, or ignore the input altogether.
This decision may be based on logical rules that determine which input most likely makes
sense. For example, if the voice recognition returns “integral t squared delta” and the
handwriting recognition returns “integral t dt”, the handwriting result should override the
voice result, since only it ends in the differential that an integral requires.
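A rule-based arbiter for this case might look like the following. It is a hypothetical sketch: the single plausibility rule (an integral must end in a differential such as “dt”) and the fallback to recognizer confidences are illustrative assumptions, and a practical system would need many more rules.

```python
import re


def is_plausible(words):
    """Hypothetical sanity rule: an integral must end in a differential like 'dt'."""
    if not words:
        return False
    if words[0] == "integral":
        return len(words) >= 3 and re.fullmatch(r"d[a-z]", words[-1]) is not None
    return True


def arbitrate(speech, handwriting, speech_conf=0.5, ink_conf=0.5):
    """Pick the speech result, the handwriting result, or None (ignore both)."""
    speech_ok = is_plausible(speech.split())
    ink_ok = is_plausible(handwriting.split())
    if speech_ok and not ink_ok:
        return speech
    if ink_ok and not speech_ok:
        return handwriting
    if not speech_ok and not ink_ok:
        return None  # neither reading makes sense: ignore the input
    # Both are plausible: fall back to the (hypothetical) recognizer confidences.
    return speech if speech_conf >= ink_conf else handwriting
```

Applied to the example above, arbitrate("integral t squared delta", "integral t dt") returns "integral t dt", because only the handwriting result passes the differential rule.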
Sources
Microsoft Speech SDK 5.1
Microsoft Tablet PC Platform SDK