Tactical Language Training System
Natalie MacConnell
April 21, 2005

Organization of Talk
- What is the Tactical Language Training System?
- Objectives and Quick Facts
- System Architecture
- Mission Skill Builder
- Mission Practice Environment
- Speech Recognition and Error Modeling
- Demonstration Video
- Summary

What is the Tactical Language Training System (TLTS)?
- Intelligent tutoring system designed to help military personnel rapidly acquire the language and cultural skills needed for peaceful and effective communication in foreign countries
- Focuses on “tactical languages”: subsets of linguistic, gestural, and cultural knowledge and skills necessary to accomplish the task at hand
- Currently developed for Levantine and Iraqi Arabic
- Virtual tutor coaches learners in pronunciation, assesses their mastery, and provides assistance
- Learners then apply their language skills to perform missions in an interactive story environment, where they communicate with autonomous, animated, Arabic-speaking characters

Quick Facts about TLTS
- Center for Advanced Research in Technology for Education (CARTE) at the University of Southern California
- $7.4 million project funded by DARPA
- Dr. Lewis Johnson, director of CARTE, linguist and A.I. expert
- Being developed as part of the Training Superiority Program (DARWARS)
  - “DARWARS seeks to transform military training by providing continuously-available, on-demand mission-level training for all forces at all echelons”
- To be deployed late this year (2005)
- Full program to include about 80 hours of instruction with a vocabulary of around 500 carefully chosen words

Objectives of the TLTS
- Help military and civilian personnel gain an understanding of a foreign language and culture so they can communicate peacefully and effectively with foreigners in their native language
- Eliminate heavy reliance on language experts
- De-emphasize written language; focus on spoken communication skills for immediate application
- Learn the role of nonverbal communication
- Develop a more engaging and motivating learning environment than traditional language instruction
- Provide training in less commonly taught, difficult-to-learn languages
- Yield rapid acquisition of foreign language skills

System Architecture
- Three main components
  - Mission Skill Builder (MSB): interactive exercises that introduce the learner to the vocabulary and pronunciation of the language
  - Mission Practice Environment (MPE): story-based, interactive video game environment where learners advance through game levels by using their newly acquired linguistic and cultural skills to accomplish particular tasks and missions
  - Medina Authoring Tool: used to develop curriculum and game content
- Common set of services and content databases: Curriculum Database, Pedagogical Agent, Learner Model, and Language Model
- Language Model consists of:
  - Speech Recognizer: used by MSB and MPE
  - Natural Language Parser: annotates phrases with structural information and refers to relevant grammatical explanations
  - Error Model: finds and analyzes syntactic and phonological mistakes in the learner’s speech

System Architecture Diagram
[Diagram components: MEDINA Authoring Tool, Mission Skill Builder (MSB), Mission Practice Environment (MPE), Language Model, Pedagogical Agent, NLP Parser, Speech Recognizer, Curriculum Material, Learner Model, Error Model]
Source: Johnson, W. L., S. Marsella, N. Mote, H. Vilhjalmsson, S. Narayanan, and S. Choi, Tactical Language Training System: Supporting the Rapid Acquisition of Foreign Language and Cultural Skills

Mission Skill Builder
- Intensive and “intelligent” version of traditional language lab programs, in which students hear words and phrases pronounced by native speakers and then imitate and practice them
- Important innovations:
  - Speech Recognizer is tailored for learner speech, so it can evaluate the learner’s pronunciation and detect common errors
  - Pedagogical Agent provides the learner with tailored performance feedback
  - Learner Model tracks what the learner has mastered and which areas the learner needs to improve
- Learning process involves the following steps:
  1. Learner hears the Pedagogical Agent pronounce a phrase
  2. Learner records himself speaking the phrase
  3. Speech Recognizer analyzes the recording and passes the results to the Pedagogical Agent, which provides appropriate feedback based on the pronunciation errors and the learner history in the Learner Model
- Also instructs students in nonverbal communication

Mission Practice Environment
- Story-based, interactive video game environment designed to give students an unscripted, unpredictable, and challenging test of their mastery of the skills learned in the MSB
- Learner moves a uniformed figure through a videogame-like Lebanese village
- Learner speaks into a microphone to control the speech of his character and selects from gestures for nonverbal communication
- Can carry on free-form conversation with AI-animated, Arabic-speaking characters, who understand what is said if it is understandable Arabic and then respond
- Learner must be careful to use appropriate phrases and gestures
- Tests the learner’s ability to carry on two-way communication

Mission Practice Environment (continued)
- Initial Game Scenario: “In a scene in a café, Sergeant Smith must try to find out who the village headman is. If he doesn’t act properly, one of the café patrons will jump up and demand to know who he really is. If tensions escalate, the patron will eventually accuse the sergeant of being a CIA agent. Standing in the background is the pedagogical agent, here in the role of aide, who can assist the learner by translating phrases or offering suggestions of what to say.”

Mission Practice Environment (continued)
- The Learner Model maintained by the Pedagogical Agent controls the aide’s behavior in the game
  - Adapts to each individual, noting consistent errors or difficulties, which can be targeted for remedial practice in the MSB
- Based on the graphics capabilities of Unreal Tournament
- Implemented as a Total Conversion Mod to Unreal Tournament 2003
  - Removed all the combat elements
  - Added a speech recognition engine
  - Added intelligent agents that react to the learner’s speech and pronunciation
- UnrealWorld: renders the game on the screen and provides a user interface

Mission Practice Environment (continued)
- MissionEngine
  - Controls what happens in the game, while UnrealWorld renders it on the screen
  - Represents each character in the story as an agent with its own goals, relationships, and private beliefs
  - High-level director agent influences the character agents
    - Controls how the story unfolds
    - Ensures the pedagogical and dramatic goals are met
  - Backend written in Python

MissionEngine System Architecture
- Pedagogical Agent: intelligent agent that provides feedback and encouragement to the learner based on pronunciation correctness and learner history; implemented in Python
- Automatic Speech Recognizer: speech recognition system built on top of the Cambridge Hidden Markov Model Toolkit (HTK); implemented as a C++ library
- PsychSim: decision-making framework of the virtual characters; models the goals, motivations, and world beliefs of the characters; implemented in Python
- SocialPuppets: module that controls physical character behavior in the environment given a description of the character's intent from PsychSim; implemented in Python
- Gamebots: interface that allows Unreal Tournament bots to be controlled
- DataManager: storage module used for all data in the system; implemented in C++ as an XML database

MissionEngine Architecture
[Diagram]
Source: http://www.python.org/pycon/2005/papers/4/MissionEngine.WhitePaper.pdf

Speech Recognition
- Hidden Markov Model Automatic Speech Recognizer (ASR) bootstrapped from English and Modern Standard Arabic speech and enhanced with data from native and learner Lebanese Arabic speech
- Implemented using the Cambridge HTK
- Trained on a Modern Standard Arabic dataset with around 10 hours of native speech, as well as approximately one hour of non-native speech samples
- Learner speech data is being collected to train the ASR
- Non-native pronunciation variations were generated for every utterance in the system and loaded into the Arabic ASR
- Hypothesis Rejection Module compares HMM likelihoods from an Arabic recognizer, an English recognizer, and pronunciation variants to detect whether the user has spoken the right utterance and to provide correct feedback

Speech Recognition (continued)
- Dynamic switching of recognition grammars allows the recognizer to focus on the words and phrases that are likely to occur in a given learning context
  - For the MSB, the recognizer is constrained to recognize only the pronunciation variants of the utterances being taught
  - For the MPE, the recognizer uses a finite-state graph that has all the utterances in the MSB as parallel paths
    - Focuses on recognizing the most likely utterance from among a set of utterances that are appropriate for a given scene
    - Enables the system to simulate dialogue with other characters
    - If the recognizer recognizes a phrase that doesn’t fit the current context, the character indicates that he does not understand
    - If the recognizer fails to recognize an utterance, the aide makes a suggestion to the learner

Error Detection and Modeling
- The ASR detects learner errors and passes them to the Pedagogical Agent, which provides feedback to the learner
- Aims to recognize (1) what the learner intended to say and (2) how the learner’s speech deviated from it
- For each lesson or exercise, a recognition grammar is loaded that detects correct responses for that context as well as likely learner errors
- Speech Recognizer must recognize both true Arabic words and mispronounced Arabic words, since it is dealing with learner speech
- The variability of learner language makes robustness difficult to achieve
- Inaccuracies in the speech analysis algorithms caused utterances that were pronounced correctly but slowly to be rejected; the scoring has been modified to give these utterances higher scores so they are not rejected as errors
- Can reduce recognition vocabulary size because the learner is taught a small subset of the language

Video Clip
http://www.isi.edu/~jmoore/Mankin/TLMankin256.wmv

Summary
- Help people gain an understanding of a foreign language and culture so they can communicate peacefully and effectively with foreigners in their native language
- Focus on “tactical languages” to accomplish specific missions
- Focus on spoken communication skills
- Rapid acquisition of foreign language skills saves time
- Removing the need for interpreters saves money
- Model learner speech and common errors, including English language utterances
- More engaging learning experience (and video games are fun!)

References
- “DARPA Tactical Language Training Project”: http://www.isi.edu/isd/carte/proj_tactlang/tactical_lang_overview.pdf
- “Experts Use AI to Help GIs Learn Arabic”: http://www.usc.edu/uscnews/stories/10321.html
- HTK Speech Recognition Toolkit: http://htk.eng.cam.ac.uk/
- Johnson, W. L., C. Beal, A. Fowles-Winkler, U. Lauper, S. Marsella, S. Narayanan, D. Papachristou, and H. Vilhjalmsson, Tactical Language Training System: An Interim Report
- Johnson, W. L., S. Marsella, N. Mote, H. Vilhjalmsson, S. Narayanan, and S. Choi, Tactical Language Training System: Supporting the Rapid Acquisition of Foreign Language and Cultural Skills
- “Mission to Arabic: It's Not Your Father’s Language Lab”: http://www.isi.edu/stories/print/78.html
- “MissionEngine: Multi-system integration using Python in the Tactical Language Project”: http://www.python.org/pycon/2005/papers/4/MissionEngine.WhitePaper.pdf
- Mote, N., W. L. Johnson, A. Sethy, J. Silva, and S. Narayanan, Tactical Language Detection and Modeling of Learner Speech Errors: The case of Arabic tactical language training for American English speakers
- “The Tactical Language Project at CARTE”: http://www.isi.edu/isd/carte/proj_tactlang/

Additional Resources
- DARPA Training Superiority Program (DARWARS): http://www.darpa.mil/dso/thrust/biosci/training_super.htm
- Mission Rehearsal Exercise Project: http://www.ict.usc.edu/disp.php?bd=proj_mre
- NPR, “A Virtual Course in Iraqi Arabic”: http://www.npr.org/templates/story/story.php?storyId=4503426
- Newsweek, “Arabic: High-Tech Tutor”: http://www.msnbc.msn.com/id/5146254/site/newsweek/
- The Pulse Journal, “Researchers tame violent video game to keep troops safe in Iraq”: http://www.pulsejournal.com/news/content/shared/news/nation/stories/0222_TRAINING_GAME.html
- Wired Magazine, “The War Room”: http://www.wired.com/wired/archive/12.09/warroom.html
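
Supplementary sketch: hypothesis rejection by likelihood comparison

The Speech Recognition slides describe a Hypothesis Rejection Module that compares HMM likelihoods from an Arabic recognizer, an English recognizer, and pronunciation variants. The Python sketch below is only an illustration of that idea, not the TLTS implementation. The `Hypothesis` class, the labels "target", "variant", and "english", the `margin` threshold, the "uncertain" outcome, and all scores are assumptions made for this example; a real system would obtain log-likelihoods from the recognizers themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """One competing explanation of what the learner said (hypothetical type)."""
    label: str             # assumed labels: "target", "variant", or "english"
    text: str              # human-readable description of the hypothesis
    log_likelihood: float  # illustrative score; higher means a better acoustic match


def classify(hypotheses, margin=2.0):
    """Return (decision, best_hypothesis).

    The decision is the label of the best-scoring hypothesis, or "uncertain"
    when the top two scores are closer than `margin` (an assumed threshold).
    """
    if not hypotheses:
        raise ValueError("at least one hypothesis is required")
    ranked = sorted(hypotheses, key=lambda h: h.log_likelihood, reverse=True)
    best = ranked[0]
    if len(ranked) > 1 and best.log_likelihood - ranked[1].log_likelihood < margin:
        return "uncertain", best
    return best.label, best


# Assumed mapping from decision to the kind of feedback a tutor might give.
FEEDBACK = {
    "target": "Accept the utterance as correctly pronounced.",
    "variant": "Point out the specific mispronunciation that was detected.",
    "english": "Remind the learner to answer in Arabic.",
    "uncertain": "Ask the learner to repeat the phrase.",
}

if __name__ == "__main__":
    # Made-up scores for a single learner recording.
    candidates = [
        Hypothesis("target", "expected pronunciation", -120.0),
        Hypothesis("variant", "known learner mispronunciation", -112.5),
        Hypothesis("english", "English utterance", -140.0),
    ]
    decision, best = classify(candidates)
    print(decision, "->", FEEDBACK[decision])
```

Keeping the decision separate from the feedback text mirrors the division of labor described in the deck, where the recognizer detects errors and the Pedagogical Agent decides what to tell the learner.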