Speech Interface to Virtual Reality Applications
Authors: K. Wauchope, S. Everett, D. Tate, T. Maney; M. Cernak, A. Sannier
Reporter: Chun-Feng Liao
References
This report discusses two implementations of speech interfaces to virtual reality applications.
• M. Cernak, A. Sannier, “Command Speech Interface to Virtual Reality Applications,” Technical Report, Virtual Reality Applications Center at Iowa State University of Science and Technology, June 2002.
• K. Wauchope, S. Everett, D. Tate, T. Maney, “Speech-Interactive Virtual Environments for Ship Familiarization,” 2nd International EuroConference on Computer and IT Applications in the Maritime Industries (COMPIT '03), Hamburg, Germany, May 14-17, 2003, pp. 70-83.
Agenda
• Introduction
• Paper I
• Paper II
• Conclusion
• System design
• Discussion
Introduction
• Both papers were published recently (2002 and 2003).
• Both papers address the technical details of speech-VR integration.
• The second paper takes a more modern approach.
• Both use a similar architecture (which is also similar to ours!).
• Example: choosing a VRML + Java Speech API platform, which led to several difficult problems such as Java security constraints and forced the use of a “browser as an application” instead of a “browser as an applet”.
Paper I
M. Cernak, A. Sannier, “Command Speech Interface to Virtual Reality Applications,” Technical Report, Virtual Reality Applications Center at Iowa State University of Science and Technology, June 2002.
Purpose of This Paper
• Describe an approach to controlling VR applications with a multimodal command speech interface (CSI) based on dialog modeling.
• The CSI is used to improve the usability of VRAC's C6.
• VRAC: Virtual Reality Applications Center.
• C6: a virtual reality system developed by VRAC.
Multimodal Interaction
Command addressing is used to trigger the system to start recording the user's voice for recognition.
U: MoleBio
S: Yes
U: (targets atom 512 with the mouse)
U: Go there!
S: OK (goes to atom number 512).
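The command-addressing dialog above can be sketched as a tiny state machine: speech is ignored until the address word “MoleBio” is heard, and the deictic “there” in the following command is resolved to whatever the mouse currently targets. This is a minimal Python sketch under stated assumptions, not the authors' CSLU/Tcl implementation: the names `CommandSession` and `PointerState`, the one-command listening window, and the reply strings are hypothetical, and the recognizer is assumed to hand over already-transcribed text.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed address word, taken from the example dialog.
ADDRESS_WORD = "molebio"


@dataclass
class PointerState:
    """Hypothetical holder for the atom the user last targeted with the mouse."""
    atom_id: Optional[int] = None


class CommandSession:
    """Sketch of command addressing plus deictic resolution of "there"."""

    def __init__(self, pointer: PointerState) -> None:
        self.pointer = pointer
        self.listening = False  # becomes True once the address word is heard

    def hear(self, utterance: str) -> str:
        """Consume one recognized utterance and return the system's reply."""
        text = utterance.lower().strip().rstrip("!.? ")
        if not self.listening:
            if text == ADDRESS_WORD:
                self.listening = True  # start "recording" the next command
                return "Yes"
            return ""  # speech is ignored until the system is addressed
        # The system was addressed: interpret one command, then stop listening.
        self.listening = False
        if text == "go there":
            if self.pointer.atom_id is None:
                return "No target selected"
            # "there" is bound to the object currently under the mouse.
            return f"OK (goto atom number {self.pointer.atom_id})"
        return "Unknown command"


if __name__ == "__main__":
    pointer = PointerState()
    session = CommandSession(pointer)
    print(session.hear("MoleBio"))    # Yes
    pointer.atom_id = 512             # user targets atom 512 with the mouse
    print(session.hear("Go There!"))  # OK (goto atom number 512)
```

Keeping the pointer state outside the session mirrors the multimodal split: the mouse updates the target continuously, and speech only reads it at the moment “there” is interpreted.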
Legend for the Multimodal Interaction dialog: U = User, S = System.
System Architecture
(Diagram: Dialog Management and Speech Facilities <-> VR System)
System Architecture
• VR: VRAC's C6
• TTS: Festival
• SR: CSLU Toolkit
• Platform: Windows on a Pentium II 400 MHz
Three Main Components (1)
Speech synthesis (TTS): Festival.
Three Main Components (2)
CSLU Toolkit: dialog modeling, speech recognition, and natural language processing.
• The CSLU Toolkit was implemented in C and Tcl/Tk and developed by OGI (Oregon Graduate Institute).
• CSLU: Center for Spoken Language Understanding.
Three Main Components (3)
Communication bridge to the VR application, integrating CSLU (speech) and C6 (VR).
How to Integrate CSLU and C6
Initial attempt: CORBA
• C6 supports CORBA.
• The authors tried to use “Combat” as a Tcl extension acting as a CORBA client, but failed.
• They then tried “Tcl Blend”:
  - Tcl -> Java -> CORBA -> C6 (efficiency problems)
• Result: a plain TCP socket was used.
Natural Language Processing
Instead of the standard JSGF (Java Speech Grammar Format), the authors use a custom grammar, very similar to JSGF, and wrote a dedicated parser to evaluate it. We will not discuss the custom grammar in detail here.
SCI Test Environment
A RAD (GUI) tool that helps developers quickly build the dialog flow.
Paper I Conclusion
• The major advantage of this system is quick deployment.
• The problem area is speech recognition accuracy (provided by CSLU), which was poor.
• The US Navy has also developed a speech interface to a VR system; the authors plan to improve their VR interaction along the lines of that method.
Future Work
Switch TTS and SR to IBM ViaVoice:
• Supports JSAPI (Java Speech API).
• Java can communicate with C6 via CORBA more easily.
Paper II
K. Wauchope, S. Everett, D. Tate, T. Maney, “Speech-Interactive Virtual Environments for Ship Familiarization,” 2nd International EuroConference on Computer and IT Applications in the Maritime Industries (COMPIT '03), Hamburg, Germany, May 14-17, 2003, pp. 70-83.
Introduction
This paper introduces two systems that help crew members newly aboard US Navy ships become familiar with their environment quickly.
User: “Tell me where Room 101 is!”
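Answering a query like the one above means mapping the utterance to a location, much as Paper I's JSGF-like custom grammar maps utterances to commands. The Python sketch below is hypothetical: a regular expression stands in for a JSGF-style rule (shown in the comment), and `ROOM_VIEWPOINTS` is an invented table of room names and viewpoint coordinates, not data from the paper. It accepts both “tell me where Room 101 is” and “tell me where is Room 101”.

```python
import re
from typing import Optional, Tuple

# Hypothetical regex stand-in for a JSGF-style rule such as:
#   <where> = [tell me] where (is <room> | <room> is);
#   <room>  = room <number>;
WHERE_RULE = re.compile(
    r"^(?:tell me )?where (?:is (?P<a>room \w+)|(?P<b>room \w+) is)\s*[!?.]*$"
)

# Invented example data: room name -> viewpoint position (x, y, z).
ROOM_VIEWPOINTS = {
    "room 101": (12.0, 3.5, -40.0),
    "room 102": (15.0, 3.5, -40.0),
}

Viewpoint = Tuple[float, float, float]


def parse_where_query(utterance: str) -> Optional[str]:
    """Return the normalized room name, or None if the rule does not match."""
    # Lower-case and collapse runs of whitespace before matching.
    match = WHERE_RULE.match(" ".join(utterance.lower().split()))
    if match is None:
        return None
    return match.group("a") or match.group("b")


def answer(utterance: str) -> Tuple[str, Optional[Viewpoint]]:
    """Return a spoken reply and, if known, the viewpoint to move the camera to."""
    room = parse_where_query(utterance)
    if room is None:
        return ("Sorry, I did not understand.", None)
    viewpoint = ROOM_VIEWPOINTS.get(room)
    if viewpoint is None:
        return (f"I do not know where {room} is.", None)
    return (f"Taking you to {room}.", viewpoint)


if __name__ == "__main__":
    print(answer("Tell me where Room 101 is!"))
```

Separating parsing from lookup keeps the grammar independent of the ship model, so a new CAD-derived model would only need a new room table.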
Motivation
• Architects of US Navy ships make heavy use of CAD tools to design ship models.
• CAD files can be converted to a 3D model format with little effort.
• According to the authors' previous research, this virtual environment did shorten crews' learning time.
Systems Introduced
Two systems:
• MSFT (Multimodal Ship Familiarization Tool)
• ISFS (Interactive Ship Familiarization System)
ISFS is a recent transition of MSFT.
System Architecture: MSFT
Runs as separate processes: the MSFT VE viewer component and the speech interface run as two separate processes.
Speech interface: an all-IBM solution:
• ViaVoice
• IBM's SMAPI
• IBM's SRCL grammar
Platform: Pentium III 500 MHz
ISFS
A recent transition of MSFT.
• Uses VRML as the 3D modeling language.
• Uses JSAPI as the interface to the speech engine.
  - ViaVoice fully supports JSAPI.
  - VRML supports Java as a scripting language.
• The rest of the structure is identical to the MSFT system.
• Platform: Xeon 2.0 GHz -> needs more computing power!
Why Choose a Standalone VRML Browser?
• Security limitations (discussed below).
• VM limitations (discussed below).
• Provides opportunities to customize the VRML browser's interface.
• In my personal experience, the system usually becomes unstable when the speech engine works with a VRML plugin through the EAI's Java interface.
Security Limitations
The JRE imposes security restrictions on Java applets: JSAPI could not establish a connection with the speech engine unless the security settings were explicitly reconfigured.
Limited VM
Most VRML browsers' EAI implementations were built on ActiveX and thus only support Microsoft's old VM, which doesn't support most modern Java features.
• Example: this may force us to use Java AWT instead of Swing, which provides a better GUI.
Providing a GUI as a VUI Fallback
• The GUI provides a fallback in case the speech recognizer has trouble accurately transcribing the user's voice.
• The GUI is adjusted dynamically to keep a one-to-one correspondence with the VUI.
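One way to keep the GUI in one-to-one correspondence with the VUI is to derive both from a single registry of active commands, so that any context change regenerates the grammar phrases and the buttons together. The Python sketch below illustrates that idea under stated assumptions: `Command`, `CommandRegistry`, and the example commands are hypothetical, and the paper does not say that ISFS is structured this way.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Command:
    phrase: str                # what the user can say
    label: str                 # text shown on the matching GUI button
    action: Callable[[], str]  # effect shared by the voice and GUI paths


@dataclass
class CommandRegistry:
    """Hypothetical single source of truth for the VUI grammar and the GUI."""
    active: Dict[str, Command] = field(default_factory=dict)

    def set_context(self, commands: List[Command]) -> None:
        # Replacing the active set changes both views at once.
        self.active = {c.phrase.lower(): c for c in commands}

    def grammar_phrases(self) -> List[str]:
        # Phrases the recognizer would be allowed to listen for.
        return sorted(self.active)

    def gui_buttons(self) -> List[str]:
        # Exactly one button per active spoken command, in the same order.
        return [self.active[p].label for p in self.grammar_phrases()]

    def on_speech(self, phrase: str) -> str:
        command = self.active.get(phrase.strip().lower())
        return command.action() if command else "Not recognized; please use the buttons."

    def on_click(self, label: str) -> str:
        for command in self.active.values():
            if command.label == label:
                return command.action()
        return "No such button."


if __name__ == "__main__":
    registry = CommandRegistry()
    registry.set_context([
        Command("go to the bridge", "Go to Bridge", lambda: "Moving to the bridge"),
        Command("open door", "Open Door", lambda: "Door opened"),
    ])
    print(registry.grammar_phrases())
    print(registry.gui_buttons())
    print(registry.on_click("Open Door"))
```

Because the speech path and the click path dispatch to the same `action`, a phrase the recognizer keeps missing can always be recovered by clicking the matching button.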
Paper II Conclusion
• A speech interface is needed because the GUI and the VE viewer both rely on direct manipulation and keep the user's hands too busy.
• As HCI becomes increasingly multimodal, care must be taken to integrate the modalities in a natural manner.
Future Work
• VRML models are closer to object-oriented, tree-structured data, which is hard to represent in an RDBMS.
• Some way must be found to store model data easily and efficiently.
• Personal thought: use an XML database.
Discussion
Switchable!
Q&A