Using Speech Input and Output in Logo Programming
Peter Tomcsányi
Department of Informatics Education, Faculty of Mathematics, Physics and Informatics, Comenius
University, Bratislava, Slovak Republic
tomcsanyi@fmph.uniba.sk
Abstract
Speech technology is a quickly developing new technology. In our new Logo implementation called Imagine
(http://www.logo.com/catalogue/titles/imagine/index.html) we wanted to make this technology available
to children. In this article we describe how speech input and output can be used in Logo programs to make
them more interesting for all kinds of users.
Keywords: Speech Output, Speech Input, Speech Engine, Imagine, Logo
1. Introduction
Imagine is a new Logo implementation (see Blaho, Kalas, Tomcsányi (1999) and Blaho, Kalas (2001)).
During its development we were influenced by a number of new computer technologies. Speech engines for
Windows are one such technology. Although speech-related technology has been around for some years, only
recently has it become available to a broader audience of users and also of software developers.
In Windows there is a defined standard for interfacing between an application and a speech engine called
SAPI. Using this software interface any program can use the functionality of a third party speech engine,
which implements speech input and output functionality.
Imagine is able to co-operate with any SAPI 4.0 compliant speech engine. Some of the engines can be found
on the Internet for free, some of them can be bought. In Windows 2000 an output engine is already installed
automatically. Sometimes a game that uses speech output installs a speech engine along with it, even if you
do not know about it. Imagine gives the power of these engines to the Logo programmer.
Most speech engines are English, but there are also engines for other major languages like French, Spanish,
Russian or German. Speech engines for languages spoken by a relatively small number of people are not
available yet, but this may change in the next few years.
2. Speech output
Speech output is also called Text-to-Speech conversion. It turns written text into spoken words. Imagine
implements a Say command to use speech output and SetVoice to select a voice (as several voices in
several languages may be installed at the same time).
Speech output can be used in several ways. When saying constant texts, the same effect could be
achieved by playing a WAV file containing the recorded text. The power of speech output is utilised better when it is
used to say variable texts, because then it brings something new compared to using WAV files.
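For example, the parenthesised form of Say (used in the dice example below) can mix constant words with the value of a variable; the variable score here is purely illustrative:

make "score 7
(say [Your score is] :score)

No pre-recorded WAV file could produce such a sentence, because the text is composed at run time.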
Talking dice
This example is based on the Web dice project, which can be found at:
http://www.logo.com/imagine/project_gallery/dice.HTM
Let's create a turtle and give it a dice shape. The shape is an animated one, so we must set it to manual
animation mode because we do not want the turtle to animate automatically.
Then we define the onClick event of this turtle to change the frame items randomly 3 to 8 times, play a
short sound each time, and finally announce the resulting frame item number:
repeat 3 + random 6 [setframeitem 1 + random 6 wait 20
  play [S0 I115 T120 L4 O2 32F]]
(say [You threw] frameitem)
So now the project is ready - each time the dice is clicked, it is thrown and the speech output
announces the number that was thrown.
3. Speech input
The more exciting thing is speech input. In general there are two levels of speech recognition.
The Command and Control functionality accepts a fixed set of voice commands - in Imagine we name it
the voice menu. When the speech engine recognises a spoken command then it notifies the application and
the application can react accordingly.
In Continuous Dictation the speech engine tries to recognise each spoken word and transform it to written
text.
Continuous dictation demands much more computing power and is much more sensitive to the way
each person speaks. A period of training is usually needed to adapt the engine to each individual speaker.
Therefore in Imagine we implemented only the use of command and control speech recognition. The
implementation enables the user to add recognition of spoken commands to any Logo program.
Speech input in Logo? So let's command the turtle!
The easiest thing to start with is trying to command the turtle by voice, i.e. to create a voice menu for
basic commands.
In Imagine each page can have its own voice menu. The menu is active while the particular page is shown.
Another voice menu (called mainVoiceMenu) is defined for the whole Main Window and it is active all the
time in addition to the voice menu of the active page.
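As a sketch - assuming that, analogously to the setVoiceMenu command for a page, the mainVoiceMenu setting has a corresponding setMainVoiceMenu command (this name is an assumption, not confirmed here) - a phrase available on every page could be set like this:

mainWindow'setMainVoiceMenu [hello [say [Hello, I am listening]]]

With this menu the phrase hello would be recognised regardless of which page is currently shown.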
A voice menu in Imagine is a list consisting of pairs:
[phrase1 action1 phrase2 action2 ...]
Each phrase is a word or list and each action is a list of Logo instructions to be executed when the phrase is
recognised.
To implement the voice menu for basic turtle movements we will define this voice menu:
[left [lt 45] right [rt 45] forward [fd 50] back [bk 50] [pen up]
[pu] [pen down] [pd]]
The menu can be set using a command:
page1'setVoiceMenu [left [lt 45] right [rt 45] forward [fd 50] back
[bk 50] [pen up] [pu] [pen down] [pd]]
Or it can be written into the "change me" dialogue of the current page (without the outer brackets).
Our experience with both kids and adults shows that even such a simple application of voice
commands is very interesting for them: they feel much more in control of the turtle than
when they must type the commands. And it is also more fun for them.
It is usually even more fun when commands to start and stop a continuous movement are added to the menu.
Then the turtle can start moving and the user can modify its path by the left and right commands. The forward
and back commands are not needed anymore. Adding colour and line thickness commands is also
interesting.
We define two helper procedures:
Page1'to startMoving
if done? "move
[(every 30 [fd 1] "move)]
end
Page1'to stopMoving
cancel "move
end
Then we can define a longer voice menu:
[[red pen] [setpc "red] [black pen] [setpc "black]
[big pen] [setpw 10] [small pen] [setpw 1] right [rt 45]
left [lt 45] [pen up] [pu] [pen down] [pd] clean [clean]
start [startMoving] stop [stopMoving]]
Handling command inputs
In the above examples we can also see one of the constraints of speech menus - spoken phrases cannot
include variable parts, i.e. variable inputs to commands. The command and control speech engine interface
cannot recognise parameters given to commands - it can recognise just exact phrases. This disadvantage can
be compensated for in several ways, for example:

• Define more voice commands doing the same thing with different parameters:
[[big forward] [fd 100] forward [fd 50]
[small forward] [fd 10]]

• Define special voice commands for changing the parameter for all subsequent commands of the same
kind:
[[angle 45] [make "angle 45] [angle 90] [make "angle 90]
left [lt :angle] right [rt :angle]]

• Define an alternative (non-speech) way of setting the parameter for all subsequent commands of the same
kind. For example, we can create a slider named angle, with values from 0 to 360, and then the action
for the phrases left and right could be [lt angle] or [rt angle] respectively.
Listen just when I speak to you
Even in the above simple examples we quickly run into the basic problem of real-world speech input usage.
If the person commanding a turtle is not alone in a quiet room, then the computer may also hear the voices of
others, and the speaker tends to speak not just to the computer but also to the other people.
The first problem can be partially solved by using better hardware (some microphones can more or less
successfully eliminate background noise) and partially by organisational means (placing the computers further
apart from each other, not using speech input with big groups of pupils, etc.).
The second problem can be avoided by designing the program in such a way that it will listen only when the
user wants.
In Imagine there is a global switch to switch listening to voice commands on or off. The MainWindow object
has a setting acceptVoiceMenu. Its value can be true or false. By default it is true, which means that if the
content of the voiceMenu setting is not empty then speech commands are recognised and executed. When set to
false then Imagine does not execute voice commands even if there are active voiceMenu and mainVoiceMenu
settings.
The basic way to set the acceptVoiceMenu setting is using the main menu. Its Options/Accept Voice Menu
command directly corresponds to the acceptVoiceMenu setting.
To make switching voice menus on and off easier for the user of a particular Logo program, we can program
a switch button located on the current page, which will switch the acceptVoiceMenu setting on and off
according to the state of that button.
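A minimal sketch of such a button, assuming it is named b1 and that acceptVoiceMenu can be both queried and set on the mainWindow object with acceptVoiceMenu and setAcceptVoiceMenu (these names follow Imagine's usual setting/command pattern but are assumptions here):

b1'setcaption "Listen
b1'setevent "onPush [mainWindow'setAcceptVoiceMenu not mainWindow'acceptVoiceMenu]

Each push of the button then toggles listening on or off.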
Another technique is defining special voice commands to switch listening on and off. The trick here is that
we cannot switch off listening completely, because then no command for switching it on could be recognised.
Instead, we will switch between two menus. One menu (the sleep menu) will contain just one phrase, and its
corresponding action will change the voice menu to the full menu. One command of the full menu can
switch the menu back to the sleep menu.
In the following example we will demonstrate a slightly modified approach: the full menu will revert to the
sleep menu after some period of silence. It means that if for some period of time no command has been
recognised, the computer will stop listening to the whole set of commands and will listen only for the single
command which can wake it up again.
In Page1 we define three new procedures:
page1'to switchOn
setvoicemenu [left [do [lt 90]] right [Do [rt 90]]
forward [Do [fd 30]]]
indicator'setshape [setpc "red setpenwidth 60 dot]
end
page1'to switchOff
setvoicemenu [computer [switchOn]]
indicator'setshape [setpc "black setpenwidth 60 dot]
end
page1'to Do :x
cancel [switchOff]
run :x
after 2000 [switchOff]
end
The switchOn procedure switches listening to the full menu and sets the shape of a turtle called Indicator to a
red filled circle.
The switchOff procedure switches listening to the sleep menu, which contains just one phrase: Computer. Its
corresponding action calls switchOn. SwitchOff also sets the indicator's shape to a black circle.
The Do procedure is used to run a command from the full menu. After each command is executed, it launches
a process which does nothing for 2 seconds and then calls switchOff. But before any next command is
executed, the switchOff process is cancelled. So switchOff takes effect only if the procedure Do has not been
invoked for more than 2 seconds.
Then we define the voiceMenu of page1:
page1'setVoiceMenu computer [switchOn]
Then create a turtle somewhere in the corner and name it Indicator. Now you can try the whole program.
Note that after saying Computer the computer waits for the first command indefinitely, because the mechanism
of switching off after two seconds is controlled by the Do procedure, which is invoked only when a
command is recognised. We think that waiting indefinitely for the first command is a good feature. If you do
not like it, you can modify the switchOn procedure by starting a switchOff process in its last line:
page1'to switchOn
setvoicemenu [left [do [lt 90]] right [do [rt 90]]
forward [do [fd 30]]]
indicator'setshape [setpc "red setpenwidth 60 dot]
after 2000 [switchOff]
end
Commanding multiple turtles
Now let's try to make a program that commands multiple turtles.
Create three turtles and give them names: John, Mary and Annie.
Then define voiceMenu just slightly differently than in our last turtle commanding example:
[[red pen] [setpc "red] [black pen] [setpc "black]
[big pen] [setpw 10] [small pen] [setpw 1] right [rt 45]
left [lt 45] [pen up] [pu] [pen down] [pd] clean [clean]
start [startMoving] stop [stopMoving]
John [setPageWho "John] Mary [setPageWho "Mary]
Annie [setPageWho "Annie] Nobody [setPageWho []]]
The added commands change the active turtle to John, Mary or Annie accordingly. These new commands do
not use the tell command, because it would change the active turtle just in the current process, i.e. the one
invoked by the speech command. But we want to change the active turtle globally within the page, i.e. to
make processes started in the future also use the new active turtle. Therefore setPageWho must be used.
The content of the pageWho setting of the current page becomes who for all new processes started by the
voiceMenu. The command Nobody switches off all turtles.
We must somehow make the active turtle evident to the user. Let's make it blink. For this we define a
procedure startBlink for Page1 to start a process which, every 200 ms, hides the active turtle, then waits
200 ms and then shows it again. Note that during the execution of this procedure the active turtle may change,
and therefore we must store the name of the active turtle in a local variable w.
page1'to startBlink
every 200 [
let "w pagewho
if :w <> [] [ask :w [ht] wait 200 ask :w [st]]
]
end
Procedure startBlink must run all the time. Therefore it has to be invoked from the startup procedure of the
MainWindow object:
to startup
StartBlink
end
Then we must define slightly modified startMoving and stopMoving procedures. They must take into account
that now several move processes can be running, so these processes must be named according to the currently
active turtle.
Page1'to startMoving
if done? word "move who
[(every 30 [fd 1] word "move who)]
end
Page1'to stopMoving
cancel word "move who
end
We tried this activity with my son (aged 11) and his friend (aged 10). They are not native English speakers, so it
took some time to adjust their pronunciation so that the speech engine understood them correctly. But when
they finally succeeded in commanding the turtles, they were quite excited about the fact that the computer
understood them.
Other uses of speech input
In all the above examples we just commanded turtles. This is the evident idea which comes to anybody's mind
when combining speech input with Logo.
There are many more possible uses of speech input, where the user does not speak turtle commands but
provides other input to a program.
Our final example will be a memory game. Its objective is to show a number of things, then hide all of them,
then show all but one of them and ask: "What's missing?". The player must say the name of the thing
which is missing.
At first we will create 20 turtles with different shapes, their names set according to their shapes. We also
want each turtle to say its name when it is clicked. So we first create one turtle, give it the shape of a cat,
change its name to Cat and define its onClick event.
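The event definition itself appeared as a screenshot in the original; a plausible reconstruction (the exact form is an assumption) is:

cat'setevent "onClick [say "Cat]

If Imagine offers a query returning the turtle's own name (an assumption), defining the event once as [say name] would let every later copy say its own name without further editing.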
Then we copy the turtle to the clipboard and paste it 19 times. Then we find a nice picture for each copy and
rename the turtle accordingly. This is an example of using cloning in Logo, as discussed in more detail
in Tomcsányiová (2001).
Then we must program the hiding of one thing and the waiting for the correct answer. In this case we would
need to construct, each time, a voice menu containing 20 names with different actions (19 saying "no" and one
saying "yes"). So we will rather use a different approach now. If an action in the voiceMenu setting is an
empty list, then after recognising the corresponding phrase an onVoiceCommand event is triggered on the
object containing the voice menu (Page1 in our case). That event can get the heard phrase as the content of
the :voicecommand variable and can react accordingly.
Create a button (named automatically b1) and a text box (automatically named text1) on the page. Define the
Caption and the onPush action of b1:
b1'setcaption "Play
b1'setevent "onPush [takeThing]
and define two procedures for the button:
to createVoiceMenu :x
ifelse empty? :x
[op :x]
[op fput first :x fput [] createVoiceMenu bf :x]
end
to takeThing
text1'setvalue "
ask all [ht] wait 300
make "hidden pick all
setvoicemenu createVoiceMenu all
ask butmember :hidden all [st]
say [Tell me, what's missing!]
end
The function createVoiceMenu creates a voice menu containing all words of the input list :x, with each
action being an empty list.
The main procedure of the button is takeThing. It empties the content of text1, then hides all turtles, then
picks one turtle at random and assigns its name to a global variable hidden. Then it creates the voice menu,
shows all turtles except the picked one and announces the task.
When the user says anything, it is either on the voice menu generated from the names of all turtles or it is
something unknown. In either case an onVoiceCommand event of Page1 is triggered, so we need to define a
reaction:
Page1'setevent "onVoiceCommand [evaluateAnswer]
to evaluateAnswer
text1'setvalue :voicecommand
if :voicecommand = []
[say [Sorry?] stop]
ifelse :voicecommand = :hidden
[say [You are right!]]
[(say [No,] :hidden [was missing.])]
ask :hidden [st]
setvoicemenu []
end
The procedure evaluateAnswer sets the heard phrase into the text box text1. Then, if the heard phrase is
an empty list (which means that the speech engine registered that something was said but it was not similar to
any of the phrases on the current menus), the program asks "Sorry?"; otherwise it checks whether the answer
was correct and reacts accordingly. Finally it shows the one hidden thing and erases the voice menu.
4. Conclusion
In this article we wanted to show some basic approaches to utilising speech input and output. Although
speech output was already present in some Logo implementations, speech input is a new phenomenon in the
Logo world (as far as we know). Therefore we had no prior experience with using speech technology in
Logo-like environments.
We wanted to give some introductory examples of how to use this new and exciting technology. We hope that
these examples will be interesting for all kinds of Logo users and will help them to start using speech technology
in their programs and to gather more experience with it in Logo-like microworlds.
During the development of Imagine we made several trials with small groups of children and of adults as well.
The basic result is that the use of speech commands attracts the users. In one of our few practical trials we used
the Commanding multiple turtles program with two boys aged 10 and 11. For them it was a challenge to see
how the turtles react to their commands. As their native language was not English, the other challenge was
trying to pronounce the English commands as well as possible, so as to be understood by the computer.
On the other hand there are still some problems - there are technical difficulties when it is used in a noisy
environment, the speech engines need powerful hardware to run on, and only a small number of languages is
supported so far. In fact we had only English recognition engines, and therefore we were not able to try any
activities with children who cannot pronounce at least a few English words.
5. References
Blaho A., Kalas I. and Tomcsányi P. (1999) OpenLogo - A New Implementation of Logo. In Proceedings of
Eurologo 1999, Sofia, 1999.
Blaho A., Kalas I. (2001) Object Metaphor Helps Create Simple Logo Projects. In Proceedings of Eurologo 2001.
Tomcsányiová M. (2001) Cloning in Logo Programming. In Proceedings of Eurologo 2001.