Programming MS Agents

Matthias Deller
University Of Applied Sciences, Zweibrücken / Germany
e-mail: matthias_deller@web.de
I) Introduction – What are MS Agents?
Most users of Microsoft’s Office XP have already seen, if not used, an
MS Agent, most likely represented by a little cat, dog or paper clip in
a corner of the screen. These agents are an interface meant to humanize
the interaction between an application and the user. Instead of simply
reading text and typing in commands or clicking icons with the mouse,
agents provide a more natural way to communicate with the computer. The
animated representations of the agent, called characters, can emphasize
explanations by gesturing at the described elements, give tips or even
speak to the user, either by using pre-recorded audio files or
text-to-speech engines. They can react to mouse clicks,
application-triggered events or even spoken commands. Because they are
versatile and easy to program, they can serve many functions: as a tutor
for a program, acquainting a new user with the controls; as a messenger,
reading texts or e-mails to the user; or simply as a host to liven up
your website and guide the visitor through its different sections.
II) The characters
Microsoft provides four standard characters for its interface: Merlin (a
wizard), Genie (a lamp-ghost), Robby (a robot) and Peedy (a parrot).
Each of these is equipped with a full set of animations, which can range
from only the standard animations to 70 or even more. However, apart
from these four, a host of other characters is available on the
internet. They range from crude drawings that can barely perform a
handful of animations to beautifully designed characters with a huge set
of animation sequences. Of course, most of these professionally created
characters are only available for a fee, while some quite good ones can
be found as freeware if you look hard enough.

If no character file matches the desired specifications, it is possible
to create your own characters for MS Agents. This should only be
attempted by people with a lot of time on their hands, though, because
they have to provide quite a lot of animations, at the very least the
standard ones. These include different mouth positions so the
character’s lip movements can be synchronized with spoken text. Other
animations that have to be implemented are those for gesturing in
different directions, hearing and listening, hiding and showing,
speaking, moving in all four directions and being idle. To be usable as
the default character, a character needs a standard animation set of no
fewer than 59 animations. Once all this is done, the MS Agent Character
Editor can be used to sequence the images, provide the other necessary
information and finally compile the data into a character file. Most
people will therefore probably settle for characters that already exist.
III) Programming the Agent
For this seminar work, the agent was programmed in Visual Basic, but the
concepts for programming it from other environments, such as C++ or
script in an HTML page, are similar and the differences marginal.

First, you have to load a character’s data using the agent’s Load
command. This data can be stored in two different ways. When the
character’s data file is stored locally, the usual choice is the .acs
file format, in which the whole character information including all its
animations is stored as a single structured file. The other way is often
used to access the character’s data from a web page. In the .acf and
.aca formats, the information is stored in multiple files, so the
character’s animations can be accessed separately to reduce download
times. Of course, this means each animation has to be loaded before it
can be played.

Once the character data is present, the character object provides a set
of simple commands to animate the character. Probably the most important
of these is the Show method, which causes the character to appear on the
screen by playing its defined showing animation. Its counterpart is the
Hide method, which lets the character disappear using its hiding
animation. It is also possible to let a character appear or disappear by
setting its visible state. Another very important method when you’ve
loaded an .acf character is the Get method, used to retrieve the
animation data for a given animation. To have the character perform a
certain animation, you call the Play method, providing the animation’s
name as a string. And finally, the character can be told to go to a
specific location with the MoveTo method. Basically, these are all the
commands needed to navigate a character across the screen.
IV) Character interaction
-with other characters
It is possible to have multiple characters on the screen at the same
time, and they can even communicate with each other (to the extent
implemented by the programmer, of course). Synchronizing the different
characters is not as trivial as it may seem, however. Animation methods
sent to a character are put in a request queue which is processed
independently for each character. So if some methods for one character
are called in the code after the methods for the other character, that
doesn’t mean they are necessarily played later. Instead, both characters
will likely begin playing their animations simultaneously.

The character object provides two methods that make it possible for two
characters to, say, have a sensible dialogue. The first one is the Wait
method, called with the Request object of a previously issued animation
request. It causes the character to wait until this request is finished
before it continues processing its own request queue. For example, this
prevents a character from speaking while another character is still
talking. The other possibility is the Interrupt method, also called with
a Request object. In this case, the character simply interrupts the
other character’s specified request. This is mostly used to interrupt
looping animations, such as the Processing or the Idle animations. The
animation is stopped and the interrupted character continues with the
next animation in its request queue.

Apart from directly influencing other characters, a character can access
another one’s properties. For example, the Left and Top properties
(representing the character’s position on the screen) can be used to
have one character gesture at the other or look at him while talking to
him. This way, it is possible to give the impression that the different
characters are aware of one another.
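
The effect of independent request queues, and of waiting on another
character's request, can be illustrated with a small model. The sketch
below is plain Python, not the Agent API: the classes Request and
Character, the functions run and dialogue, and the lock-step "tick"
scheduler are all assumptions made for illustration, and only the Wait
behaviour is modelled (Interrupt is left out).

```python
from collections import deque


class Request:
    """Toy stand-in for the handle returned when a request is queued."""

    def __init__(self, text=None, blocker=None):
        self.text = text          # what the character says (None for a wait marker)
        self.blocker = blocker    # request that must finish first, if any
        self.done = False


class Character:
    def __init__(self, name):
        self.name = name
        self.queue = deque()      # each character owns an independent queue

    def speak(self, text):
        request = Request(text=text)
        self.queue.append(request)
        return request

    def wait(self, other_request):
        request = Request(blocker=other_request)
        self.queue.append(request)
        return request


def is_blocked(request):
    return request.blocker is not None and not request.blocker.done


def run(characters):
    """Process all queues in lock-step ticks; return (tick, name, text) tuples."""
    log, tick = [], 0
    while any(c.queue for c in characters):
        tick += 1
        runnable = [c for c in characters if c.queue and not is_blocked(c.queue[0])]
        if not runnable:
            raise RuntimeError("every queue is waiting on another one")
        finished = [c.queue.popleft() for c in runnable]
        for c, request in zip(runnable, finished):
            if request.text is not None:
                log.append((tick, c.name, request.text))
        for request in finished:
            request.done = True   # completion becomes visible on the next tick
    return log


def dialogue(synchronized):
    merlin, genie = Character("Merlin"), Character("Genie")
    greeting = merlin.speak("Hello, Genie!")
    if synchronized:
        genie.wait(greeting)      # Genie answers only after the greeting
    reply = genie.speak("Hello, Merlin!")
    if synchronized:
        merlin.wait(reply)        # Merlin continues only after the reply
    merlin.speak("Shall we begin?")
    return run([merlin, genie])


if __name__ == "__main__":
    for mode in (False, True):
        print("synchronized" if mode else "unsynchronized", dialogue(mode))
```

Without the wait calls, both characters speak during the same tick even
though Genie's line comes later in the code; with them, the three lines
are spoken strictly one after another.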
-with the application
The same synchronization problem appears when trying to let characters
interact with your application. A sequence of animation methods for a
character and method calls inside the application are not processed in
the order in which they appear in the code. In most cases, the
application commands will have finished processing before the character
has even completed its first animation sequence. There are two ways to
handle this: the RequestStart event and the RequestComplete event. The
first one is triggered every time a request from a character’s queue
begins, the second one when the request is finished. By testing for a
certain request, you can make sure a particular operation is carried out
at the beginning or end of the right animation.
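
The pattern of testing for a certain request inside a completion handler
can be sketched as follows. This is again a plain Python model under
stated assumptions, not the Agent API: the Agent class, its play and
pump methods and the callback parameters are hypothetical stand-ins for
the control and its request events; only the animation names are taken
from the standard animation set.

```python
class Request:
    """Toy handle identifying one queued animation."""

    def __init__(self, animation):
        self.animation = animation


class Agent:
    """Toy model: queues animations and reports progress through callbacks."""

    def __init__(self, on_start=None, on_complete=None):
        self.queue = []
        self.on_start = on_start
        self.on_complete = on_complete

    def play(self, animation):
        request = Request(animation)
        self.queue.append(request)    # returns at once; nothing has played yet
        return request

    def pump(self):
        """Play queued requests one by one, firing the two callbacks."""
        while self.queue:
            request = self.queue.pop(0)
            if self.on_start:
                self.on_start(request)
            # ... the animation itself would run here ...
            if self.on_complete:
                self.on_complete(request)


def demo():
    events = []
    watched = {}                      # the request the application cares about

    def request_start(request):
        events.append(f"start {request.animation}")

    def request_complete(request):
        # Act only when the request we are interested in has finished.
        if request is watched["point"]:
            events.append("open dialog")

    agent = Agent(on_start=request_start, on_complete=request_complete)
    agent.play("Greet")
    watched["point"] = agent.play("GestureRight")
    agent.play("RestPose")
    events.append("application code finished")
    agent.pump()                      # animations run only now
    return events


if __name__ == "__main__":
    for line in demo():
        print(line)
```

The application code finishes before any animation starts, yet the
dialog is opened at exactly the right moment because the handler
compares the completed request with the one it stored.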
-with the user
This is of course the most important kind of interaction the agent has
to perform. Therefore, a lot of methods are provided to make the
communication between agent and user as intuitive and natural as
possible. They are mainly divided into methods for output to the user
and for input from him.
a) Output services:
The simplest and most basic way to communicate with the user is by
showing him things. Firstly, there are a lot of animations that show
what a character is doing at the moment, such as the Announce animation,
used to show that a character has useful information for the user, or
the Confused animation, hinting that the character doesn’t know what to
do or hasn’t understood a command. He can even try to get the user’s
attention by playing the GetAttention animation to show he has important
information for the user. Further, there are some self-explanatory
animations like Processing, Writing or Reading, and some that make the
character appear more friendly or human, like Greet or Wave.

The most useful of these animations is played by the GestureAt method.
By providing the x and y coordinates of the place you want the character
to point at, you make him gesture in this direction. Since the
characters can only gesture directly to the left, right, up or down, it
is a good idea to use the MoveTo method first to bring the character
into a position near the place you want him to gesture at.

Another way for the character to communicate with the user is by writing
text. This can be done with the Think method. It is called with a
string, which will appear in a thought bubble above the character’s
head. The same thing happens with the Speak method, except that the
specified string appears in a word balloon above the character. The text
passed to Speak can also contain bookmarks; when the output reaches one,
the application is notified and can react, for example by presenting
additional information about the subject just mentioned. Speak can also
be called with a file path to an audio file in .wav or .lwv format so
the character will play the file.

The third possibility for the method is the most natural way to
communicate: speech output. For this feature to work, the correct
text-to-speech engine for the character’s language ID has to be
installed. This way, characters can speak different languages with the
right emphasis. The results still sound a bit flat and emotionless, but
are nonetheless intelligible. In addition, the speech output can be
improved by adding tags to the text. The most important tags are:
\Chr\: specifies whether the character speaks normally, whispers or uses a monotone voice
\Emp\: emphasizes the next word
\Lst\: repeats the last statement
\Pau\: pauses for the specified number of milliseconds
\Pit\: sets the pitch value in hertz
\Spd\: sets the average talking speed in words per minute
\Vol\: sets the speaking volume
With these tags, the speech output can be
modified to sound more natural.
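
As a rough illustration of how such tags are embedded in the text, the
following Python sketch assembles a tagged string of the kind that would
be passed to Speak. The helpers tag and build_line are hypothetical, and
the exact value syntax (a number after an equals sign, string values in
double quotes) reflects my reading of the tag format and should be
checked against the Agent documentation before use.

```python
def tag(name, value=None):
    """Return one speech output tag as text, with or without a value."""
    if value is None:
        return f"\\{name}\\"                 # tag without a value
    if isinstance(value, str):
        return f'\\{name}="{value}"\\'       # string values are quoted (assumption)
    return f"\\{name}={value}\\"             # numeric value


def build_line():
    """Assemble one tagged sentence from plain text and tags."""
    return "".join([
        tag("Spd", 120), "Welcome. ",        # 120 words per minute
        tag("Pau", 500),                     # half a second of silence
        "This is ", tag("Emp"), "important. ",
        tag("Chr", "Whisper"), "But keep it quiet.",
    ])


if __name__ == "__main__":
    print(build_line())
```

Building the string from small pieces keeps the backslashes in one
place, which avoids escaping mistakes when many tags are combined.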
b) Input services
As with the output services, there are several ways for the user to
communicate with the agent. The least spectacular way to do so is by
simply typing commands, but this is not the only way. Characters can
react to multiple events. As mentioned before, text passed to Speak can
contain bookmarks; the Bookmark event fires when the output reaches one,
so the application can act accordingly. The remaining events are
triggered by the user. The Click event can be used to define reactions
when the character is clicked, distinguishing between a normal click, a
click while holding down the SHIFT, CTRL or ALT key, and a click with
different mouse buttons. The same distinctions apply to the DblClick
event, except that this event is triggered by a double-click. Two other
events triggered by the user’s actions are the DragStart and
DragComplete events, raised when the character is dragged across the
screen. As with the Click event, different actions can be defined
depending on which mouse button was clicked and which key was pressed.

But the most remarkable feature is the character’s ability to react to
human speech and (hopefully) recognize spoken commands. This requires an
installed speech recognition engine for the language in use. Once this
requirement is met, the character’s listening mode can be activated by
pressing the listening key or by calling the Listen method from inside
your application. The character will start to listen and try to
recognize a command predefined as a Command object. In this manner a
character can carry out spoken instructions implemented by the
programmer.
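
For the Click, DblClick and drag events described above, the pressed
mouse button and modifier keys arrive as bit fields. The Python sketch
below shows how such values can be decoded. The bit assignments (1, 2
and 4 for the left, right and middle button; 1, 2 and 4 for SHIFT, CTRL
and ALT) follow the usual Visual Basic convention as I understand it and
should be verified against the control's documentation; the names
BUTTONS, KEYS, decode and describe_click are made up for this example.

```python
# Assumed bit values, following the Visual Basic mouse event convention.
BUTTONS = {1: "left", 2: "right", 4: "middle"}
KEYS = {1: "SHIFT", 2: "CTRL", 4: "ALT"}


def decode(bits, names):
    """Return the labels of all flags that are set in the bit field."""
    return [label for mask, label in names.items() if bits & mask]


def describe_click(button, shift):
    """Turn the two bit fields of a click event into a readable description."""
    buttons = decode(button, BUTTONS) or ["no"]
    keys = decode(shift, KEYS)
    text = "+".join(buttons) + " button"
    if keys:
        text += " with " + "+".join(keys)
    return text


if __name__ == "__main__":
    print(describe_click(1, 0))
    print(describe_click(2, 3))
```

An event handler can branch on the decoded values, for example to open a
context menu on a right click and to start listening on a left click
with CTRL held down.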
V) Conclusion
All in all, the MS Agent interface provides a convenient way to create a
more natural interaction between the user and your application. There is
still much room for improvement in making the speech output sound less
artificial, and it is often necessary to repeat spoken commands several
times until they are recognized correctly. Nevertheless, the interface
equips programmers with all the tools needed to make characters a real
help for the user.
The Demo File
The demonstration application is meant to
show some of the possibilities available to
developers programming the Agent
interface by using some of the methods and
properties described above. To ensure the
program will work properly, there are
some preparations required:
1. Extracting the setup files: There are two self-extracting archives,
   Agentdemo.exe and installfiles.exe. The second archive contains setup
   files for the core components, the required speech engines and the
   character files for Merlin and Genie. See fileindex.txt for more
   information.
2. Extracting the application: The archive Agentdemo.exe contains the
   demo application’s executable MSAgent_demo.exe and the file
   sourcecode.txt.
3. Running the program: Once you’ve run the necessary setup files,
   simply start MSAgent_demo.exe.