Programming MS Agents
Matthias Deller
University of Applied Sciences, Zweibrücken / Germany
e-mail: matthias_deller@web.de

I) Introduction – What are MS Agents?

Most users of Microsoft's Office XP have already seen, if not used, an MS Agent, likely represented by a little cat, dog or paper clip in a corner of the screen. These agents are an interface meant to humanize the interaction between an application and the user. Instead of simply reading text and typing in commands or clicking icons with the mouse, agents provide a more natural way to communicate with the computer. The animated representations of the agent, called characters, can emphasize explanations by gesturing at the described elements, give tips or even speak to the user, either by using pre-recorded audio files or text-to-speech engines. They can react to mouse clicks, application-triggered events or even spoken commands. Because they are versatile and easy to program, they can serve many functions: as a tutor for a program, acquainting a new user with the controls; as a messenger, reading texts or e-mails to the user; or simply as a host to liven up a website and guide the visitor through its different sections.

II) The characters

Microsoft provides four standard characters for its interface: Merlin (a wizard), Genie (a lamp ghost), Robby (a robot) and Peedy (a parrot). Each of these is equipped with a full set of animations, which can range from little more than the standard animations to 70 or more sequences. Apart from these four, however, there is a host of other characters available on the Internet. They range from crude drawings that can barely perform a handful of animations to beautifully designed characters with a huge set of animation sequences. Of course, most of these professionally created characters are only available for a fee, while some quite good ones can be found as freeware if you look hard enough.

If no existing character file matches the desired specifications, it is also possible to create your own characters for MS Agent. However, this should only be attempted by people with a lot of time on their hands, because quite a number of animations have to be provided, at the very least the standard ones. These include different mouth positions, so the character's lip movements can be synchronized with spoken text. Other animations that have to be implemented cover gesturing in different directions, hearing and listening, hiding and showing, speaking, moving in all four directions and being idle. To be usable as a default character, a character needs a standard animation set of no less than 59 animations. Once all this is done, the MS Agent Character Editor can be used to sequence the images, provide the other necessary information and finally compile the data into a character file. In practice, most people will probably settle for characters that already exist.

III) Programming the Agent

For this seminar work, the agent was programmed in Visual Basic, but the concepts for programming in other languages such as C++ or script code embedded in HTML pages are similar, and the differences are marginal. First, you have to load a character's data using the Load method. This data can be stored in two different ways. The usual way, when the character's data file is stored locally, is the .acs format, in which the complete character information, including all animations, is stored in a single structured file. The other way is typically used to access the character's data from a web page: in the .acf and .aca format, the information is stored in multiple files, so the character's animations can be downloaded separately to reduce download times. Of course, this means each animation has to be fetched before it can be played.
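As a minimal illustration of how this looks in Visual Basic, the following sketch loads Merlin from his local .acs file and lets him appear with a short greeting (the Show, Play and Speak methods used here are described below). It assumes the Microsoft Agent Control 2.0 has been placed on a form as Agent1 and that merlin.acs is installed in the default character directory; the character and file names are only examples.

    Dim Merlin As IAgentCtlCharacterEx   ' character reference, kept at module level

    Private Sub Form_Load()
        ' Load the character data from its local .acs file
        Agent1.Characters.Load "Merlin", "merlin.acs"
        Set Merlin = Agent1.Characters("Merlin")
        ' Make the character appear and greet the user
        Merlin.Show
        Merlin.Play "Greet"
        Merlin.Speak "Hello, I am Merlin."
    End Sub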
Once the character data is present, the character object provides a set of simple methods to animate the character. Probably the most important of these is the Show method, which causes the character to appear on the screen by playing its defined showing animation. Its counterpart is the Hide method, which lets the character disappear using its hiding animation. It is also possible to let a character appear or disappear by setting its visibility state directly. Another very important method when working with an .acf character is the Get method, used to retrieve the data of an animation before it can be played. To make the character perform a certain animation, you call the Play method, providing the animation's name as a string. Finally, the character can be told to go to a specific location with the MoveTo method. Basically, these are all the commands needed to move a character across the screen (see the sketches further below).

IV) Character interaction

-with other characters

It is possible to have multiple characters on the screen at the same time, and they can even communicate with each other (to the extent implemented by the programmer, of course). Synchronizing the different characters, however, is not as trivial as it may seem. Animation methods sent to a character are put into a request queue that is processed independently for each character. So if the methods for one character are called in the code after the methods for another character, that does not mean they are necessarily played later; instead, both characters will likely begin playing their animations simultaneously. The character object provides two methods that make it possible for two characters to, say, hold a sensible dialogue. The first one is the Wait method, called with a reference to a previously issued animation request. It causes the character to wait until this request is finished before it continues processing its own request queue. For example, this prevents a character from speaking while another character is still talking. The other possibility is the Interrupt method, also called with a Request object. In this case, the character simply interrupts the other character's specified request. This is mostly used to cut off looping animations, such as the Processing or Idle animations: the animation is stopped and the interrupted character continues with the next animation in its request queue. Apart from directly influencing other characters, a character can also access another character's properties. For example, the Left and Top properties (representing the character's position on the screen) can be used to have one character gesture at the other or look at him while talking to him. This way, it is possible to give the impression that the different characters are aware of one another (see the dialogue sketch further below).

-with the application

The same synchronization problem appears when trying to let characters interact with your application. A sequence of animation methods for a character and method calls inside the application are not processed in the same order; in most cases, the application code will have finished before the character has even completed its first animation sequence. There are two possibilities to handle this, either by using the RequestStarted event or the RequestComplete event. The first one is triggered every time a request from a character's queue begins, the second one when the request is finished. By testing for a certain request, you can make sure a specific operation is carried out at the beginning or end of the right animation (a sketch of this follows below as well).
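Picking up the basic animation methods from section III, here is a minimal sketch of moving a character around, assuming Merlin has been loaded and shown as in the earlier sketch. The animation names are taken from the standard animation set, and the coordinates are only placeholders; the Get call is only needed for characters loaded from an .acf file.

    Private Sub MoveMerlinAround()
        ' For .acf/.aca characters the animation data must be fetched first:
        ' Merlin.Get "animation", "Wave"
        Merlin.MoveTo 320, 240      ' walk to the given screen position (pixels)
        Merlin.Play "Wave"          ' play a named animation
        Merlin.Hide                 ' plays the hiding animation
    End Sub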
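The dialogue sketch referenced above: it assumes a second character, Genie, has been loaded and stored in a variable Genie in the same way as Merlin. Speak returns a Request object, and Genie's Wait call makes sure he does not start his line before Merlin has finished.

    Dim MerlinRequest As Object     ' holds the Request object returned by Speak

    Private Sub PlayDialogue()
        ' Merlin speaks first; keep the returned request for synchronization
        Set MerlinRequest = Merlin.Speak("Hello Genie, how are you today?")
        ' Move Genie next to Merlin, using Merlin's position properties
        Genie.MoveTo Merlin.Left + 150, Merlin.Top
        ' Genie must not answer before Merlin's request has finished
        Genie.Wait MerlinRequest
        Genie.Speak "I am fine, thank you for asking."
    End Sub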
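A sketch of the second approach, reacting to the RequestComplete event of the agent control (again assumed to be named Agent1). GreetRequest is a module-level variable assumed to have been assigned when the animation was queued, and matching the incoming request by object identity assumes the control hands back the same Request object it returned at that point; the label lblStatus is purely illustrative.

    Dim GreetRequest As Object      ' assigned when the animation is queued, e.g.:
                                    ' Set GreetRequest = Merlin.Play("Greet")

    Private Sub Agent1_RequestComplete(ByVal Request As Object)
        ' Only react once the request we queued has finished playing
        If Request Is GreetRequest Then
            lblStatus.Caption = "Greeting finished."   ' hypothetical status label
        End If
    End Sub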
-with the user

This is of course the most important kind of interaction the agent has to perform. Therefore, a lot of methods are provided to make the communication between agent and user as intuitive and natural as possible. They can be roughly divided into methods for output to the user and methods for input from the user. Short code sketches for some of these output and input services follow at the end of this section.

a) Output services

The simplest and most basic way to communicate with the user is simply to show him things. First, there are a number of animations that illustrate the operation a character is currently performing, such as the Announce animation, used to show that a character has useful information for the user, or the Confused animation, hinting that the character does not know what to do or has not understood a command. The character can even try to get the user's attention by playing the GetAttention animation to signal that it has important information. Furthermore, there are some self-explanatory animations like Processing, Writing and Reading, and others that make the character appear more friendly or human, like Greet or Wave. The most useful of these capabilities is provided by the GestureAt method: by passing the x and y coordinates of the place you want the character to point at, you make him gesture in that direction. Since the characters can only gesture directly to the left, right, up or down, it is a good idea to call the MoveTo method first to bring the character into a position near the place you want him to gesture at.

Another way for the character to communicate with the user is by writing text. This can be done with the Think method, which is called with a string that appears in a thought bubble above the character's head. The same thing happens with the Speak method, except that the specified string appears in a word balloon above the character. Text in this balloon can also contain bookmarks that can be clicked by the user, for example to obtain additional information about a certain subject. Speak can also be called with a file path to an audio file in .wav or .lwv format, so the character will play that file. The third possibility for this method is the most natural way to communicate: speech output. For this feature to work, the correct text-to-speech engine for the character's language ID must be installed. This way, characters can speak different languages with the right emphasis. The results still sound a bit flat and emotionless, but are nonetheless recognizable. In addition, the speech output can be improved with different tags embedded in the text. The most important tags are:

\Chr\: specifies whether the character speaks normally, whispers or uses a monotone voice
\Emp\: emphasizes the next word
\Lst\: repeats the last spoken statement
\Pau\: pauses for the specified number of milliseconds
\Pit\: sets the pitch value in hertz
\Spd\: sets the average talking speed in words per minute
\Vol\: sets the speaking volume

With these tags, the speech output can be modified to sound more natural.

b) Input services

As with the output services, there are several ways for the user to communicate with the agent. The most unspectacular way is simply typing commands, but this is by no means the only one; characters can react to a number of events triggered by the user. As mentioned before, text displayed in the character's word balloon can contain bookmarks. With the Bookmark event, the character can react to clicked bookmarks and act accordingly. Furthermore, the Click event can be used to define reactions when the character is clicked, distinguishing between a normal click, a click while holding down the SHIFT, CTRL or ALT key, and clicks with the different mouse buttons. The same distinctions apply to the DblClick event, except that this event is triggered by a double-click. Two other events triggered by the user's actions are the DragStart and DragComplete events, raised when the character is dragged across the screen. As with the Click event, different actions can be defined depending on which mouse button was used and which modifier key was pressed.

The most remarkable feature, however, is the character's ability to react to human speech and (hopefully) recognize spoken commands. This requires a speech recognition engine for the language used to be installed. Once these requirements are met, the character's listening mode can be activated by pressing the listening key or by calling the Listen method from inside your application. The character will then start to listen and try to recognize one of the commands predefined as Command objects. In this manner, a character can carry out spoken instructions implemented by the programmer.
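A small gesture sketch, again assuming Merlin is loaded as above. The coordinates are only placeholders for the screen position (in pixels) of the element to be explained; MoveTo is used first because the character can only gesture in the four main directions.

    Private Sub PointAtElement()
        ' Screen coordinates of the element to explain (placeholders)
        Dim x As Long, y As Long
        x = 400
        y = 300
        Merlin.MoveTo x + 120, y          ' bring the character close first
        Merlin.GestureAt x, y             ' then let him gesture towards the element
        Merlin.Speak "This is the button I was talking about."
    End Sub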
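A speech output sketch using the Think and Speak methods and some of the tags listed above. Note that quotation marks inside a Visual Basic string literal have to be doubled, as in the \Chr\ tag below; the spoken texts are of course only examples.

    Private Sub TalkToUser()
        Merlin.Think "Hmm, how should I put this..."
        Merlin.Speak "This is \Emp\really important, \Pau=500\ so please listen closely."
        Merlin.Speak "\Chr=""Whisper""\And this part is just between the two of us."
    End Sub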
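Next, a sketch of handlers for the Click and Bookmark events of the agent control. It assumes a bookmark was inserted into a Speak string with the \Mrk=1\ tag, which raises the Bookmark event with that number; the reactions themselves are only illustrative.

    Private Sub Agent1_Click(ByVal CharacterID As String, ByVal Button As Integer, _
                             ByVal Shift As Integer, ByVal X As Integer, ByVal Y As Integer)
        ' React only to a plain left click on the character
        If Button = vbLeftButton And Shift = 0 Then
            Merlin.Speak "You clicked me at position " & X & ", " & Y & "."
        End If
    End Sub

    Private Sub Agent1_Bookmark(ByVal BookmarkID As Long)
        ' Bookmark 1 is assumed to have been set with the \Mrk=1\ tag
        If BookmarkID = 1 Then
            Merlin.Speak "Here is the additional information."
        End If
    End Sub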
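Finally, a sketch of defining voice commands and reacting to them, assuming a suitable speech recognition engine is installed. The command name, caption and voice grammar are only examples; in the voice text, square brackets mark optional words and the vertical bar separates alternatives.

    Private Sub SetupVoiceCommands()
        Merlin.Commands.Caption = "Demo commands"
        ' Name, caption shown in the commands window, voice grammar, enabled, visible
        Merlin.Commands.Add "ShowHelp", "Show help", "[please] (show | display) [the] help", True, True
        Merlin.Listen True              ' switch listening mode on from code
    End Sub

    Private Sub Agent1_Command(ByVal UserInput As Object)
        ' Check which of the predefined commands was recognized
        If UserInput.Name = "ShowHelp" Then
            Merlin.Speak "Opening the help window now."
        End If
    End Sub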
V) Conclusion

All in all, the MS Agent interface provides a convenient way to create a more natural interaction between the user and your application. There is still much room for improvement in making the speech output sound less artificial, and it is often necessary to repeat spoken commands several times until they are recognized correctly. Nevertheless, the interface equips programmers with all the tools needed to make characters a real help for the user.

The Demo File

The demonstration application is meant to show some of the possibilities available to developers programming against the Agent interface, using some of the methods and properties described above. To ensure the program works properly, a few preparations are required:

1. Extracting the setup files: There are two self-extracting archives, Agentdemo.exe and installfiles.exe. The second archive contains setup files for the core components, the required speech engines and the character files for Merlin and Genie. See fileindex.txt for more information.
2. Extracting the application: The archive Agentdemo.exe contains the demo application's executable MSAgent_demo.exe and the file sourcecode.txt.
3. Running the program: Once you have run the necessary setup files, simply start MSAgent_demo.exe.