Speech Interface to Virtual Reality Applications - 廖峻鋒(Chun

Speech Interface to Virtual Reality Applications

Authors: Wauchope, K., S. Everett, D. Tate, T. Maney; M. Cernak, A. Sannier
Reporter: Chun-Feng Liao

References
This report discusses two implementations of a speech interface to virtual reality applications.
• M. Cernak, A. Sannier, Technical Report, "Command Speech Interface to Virtual Reality Applications", Virtual Reality Applications Center at Iowa State University of Science and Technology, June 2002.
• Wauchope, K., S. Everett, D. Tate, T. Maney, "Speech-Interactive Virtual Environments for Ship Familiarization," 2nd International EuroConference on Computer and IT Applications in the Maritime Industries (COMPIT '03), Hamburg, Germany, May 14-17, 2003, pp. 70-83.

Agenda
• Introduction
• Paper I
• Paper II
• Conclusion
• System design
• Discussion

Introduction
• Both papers were newly published at the time of this report (2002 and 2003).
• Both papers address the technical details of speech-VR integration.
• The second paper takes a more modern approach.
• Both use a similar architecture, which is also similar to ours. Example: choosing the VRML + Java Speech API platform, running into several difficult problems such as Java security constraints, and being forced to use a "browser as an application" instead of a "browser as an applet".

Paper I
• M. Cernak, A. Sannier, Technical Report, "Command Speech Interface to Virtual Reality Applications", Virtual Reality Applications Center at Iowa State University of Science and Technology, June 2002.

Purposes of this paper
• Describe an approach to controlling VR applications through a multimodal command speech interface (CSI) based on dialog modeling.
• The CSI is used to improve the usability of VRAC's C6.
VRAC: Virtual Reality Applications Center. C6 is a virtual reality system developed by VRAC.

Multimodal Interaction
• Command addressing is used to trigger the system to start recording the user's voice for recognition.
U: MoleBio
S: Yes
U: (targets atom 512 with the mouse)
U: Go there!
S: OK (goes to atom number 512).
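The exchange above can be sketched as a small state machine: the application name wakes the recognizer (command addressing), the mouse supplies the target, and "Go there" is resolved against the last pointed-at atom. This is only an illustrative sketch in Java, not the authors' implementation, which was built on the CSLU Toolkit. The class name, the "Where?" and "Sorry?" replies, and the rule that each address accepts exactly one command are assumptions made for the example.

```java
import java.util.Locale;
import java.util.OptionalInt;

/** Hypothetical sketch of command addressing plus mouse-based target resolution. */
final class MultimodalCommandSketch {
    private final String wakeWord;                          // application name used for addressing
    private boolean listening = false;                      // true after the user addressed the system
    private OptionalInt pointedAtom = OptionalInt.empty();  // last atom targeted with the mouse

    MultimodalCommandSketch(String wakeWord) {
        this.wakeWord = wakeWord.toLowerCase(Locale.ROOT);
    }

    /** Called when the pointing device targets an atom. */
    void point(int atomNumber) {
        pointedAtom = OptionalInt.of(atomNumber);
    }

    /** Handles one recognized utterance and returns the system's reply ("" means ignored). */
    String hear(String utterance) {
        // Normalize case and drop trailing punctuation such as " !".
        String text = utterance.trim().toLowerCase(Locale.ROOT)
                .replaceAll("\\s*[!.?]+$", "").trim();
        if (!listening) {
            if (text.equals(wakeWord)) {
                listening = true;
                return "Yes";
            }
            return ""; // speech that does not address the system is ignored
        }
        listening = false; // assumption: one command per address
        if (text.equals("go there")) {
            if (pointedAtom.isPresent()) {
                return "OK (go to atom number " + pointedAtom.getAsInt() + ")";
            }
            return "Where?"; // assumption: no target has been pointed at yet
        }
        return "Sorry?"; // assumption: unknown command
    }

    public static void main(String[] args) {
        MultimodalCommandSketch system = new MultimodalCommandSketch("MoleBio");
        System.out.println("S: " + system.hear("MoleBio"));
        system.point(512);
        System.out.println("S: " + system.hear("Go There !"));
    }
}
```

Keeping the pointing state separate from the speech state means the mouse event and the spoken command can arrive in either order, as long as the target is set before "Go there" is heard.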
(In the dialog above, U is the user and S is the system.)

System Architecture
(Diagram: Dialog Management and Speech Facilities; VR System)

System Architecture
• VR: VRAC's C6
• TTS: Festival
• SR: CSLU Toolkit
• Platform: Windows OS on PII 400

Three Main Components (1)
• Speech synthesis (TTS): Festival.

Three Main Components (2)
• CSLU Toolkit: dialog modeling, speech recognition, and natural language processing.
• The CSLU Toolkit is implemented in C and Tcl/Tk and was developed by OGI (Oregon Graduate Institute). CSLU: Center for Spoken Language Understanding.

Three Main Components (3)
• Communication bridge to the VR application.
• It integrates CSLU (speech) with C6 (VR).

How to Integrate CSLU and C6
• Initial attempt: CORBA
  - C6 supports CORBA.
  - Tried "Combat", a Tcl extension, as the CORBA client, but this failed.
  - Tried "Tcl Blend": Tcl -> Java -> CORBA -> C6 (efficiency problems).
  - Result: use a TCP socket.

Natural Language Processing
• Instead of using standard JSGF, the authors use a custom grammar and wrote a dedicated parser to evaluate it.
• The custom grammar is very similar to JSGF.
• We will not discuss the custom grammar in detail here.

CSI Test Environment
• A RAD (GUI) tool that helps developers quickly build the dialog flow.

Paper I Conclusion
• The major advantage of this system is quick deployment.
• The main problem is that the speech recognition accuracy provided by CSLU was poor.
• The US Navy has also developed a speech interface to a VR system; the authors intend to improve interaction with VR along the lines of that method.

Future Work
• Change TTS and SR to IBM ViaVoice.
  - It supports JSAPI (Java Speech API).
  - Java makes it easier to communicate with C6 via CORBA.

Paper II
• Wauchope, K., S. Everett, D. Tate, T. Maney, "Speech-Interactive Virtual Environments for Ship Familiarization," 2nd International EuroConference on Computer and IT Applications in the Maritime Industries (COMPIT '03), Hamburg, Germany, May 14-17, 2003, pp. 70-83.
Introduction
• This paper introduces two systems that help newly boarded crew members of US Navy ships become familiar with their environment quickly.
User: Tell me where Room 101 is!

Motivation
• Architects of US Navy ships rely heavily on CAD tools to design ship models.
• CAD files can be converted to a 3D model format with little effort.
• According to the authors' previous research, this virtual environment did shorten crews' learning time.

Systems Introduced
• Two systems:
  - MSFT (Multimodal Ship Familiarization Tool)
  - ISFS (Interactive Ship Familiarization System)
• ISFS is a recent transition of MSFT.

System Architecture: MSFT
(Diagram note: the components run as different processes.)

MSFT
• The VE viewer component and the speech interface run as two separate processes.
• Speech interface: an all-IBM solution:
  - ViaVoice
  - IBM's SMAPI
  - IBM's SRCL grammar
Platform: PIII 500MHz

ISFS
• A recent transition of MSFT.
• Uses VRML as the 3D modeling language.
• Uses JSAPI as the interface to the speech engine.
  - ViaVoice fully supports JSAPI.
  - VRML supports Java as a scripting language.
• The rest of the structure is identical to the MSFT system.
Platform: Xeon 2.0GHz -> needs more computing power!

Why Choose a Standalone VRML Browser?
• Security limitations (discussed in detail later).
• VM limitations (discussed in detail later).
• It provides opportunities to customize the interface to the VRML browser.
In my personal experience, the system usually becomes unstable when the speech engine works with a VRML plugin via EAI's Java interface.

Security Limitations
• The JRE imposes security limitations on Java applets.
• JSAPI was unable to establish a connection with the speech engine unless the security settings were explicitly reconfigured.

Limited VM
• Most VRML browsers' EAI was implemented using ActiveX and thus only supports Microsoft's old VM, which doesn't support most modern Java features.
  - Example: this may force us to use Java AWT instead of Swing, which provides a better GUI.
Providing GUI as VUI Fallback
• The GUI provides a fallback in case the speech recognizer has trouble accurately transcribing the user's voice.
• The GUI is adjusted dynamically to maintain one-to-one correspondence with the VUI.

Paper II Conclusion
• The speech interface is needed because the GUI and the VE viewer both rely on direct manipulation and keep the user's hands too busy.
• As HCI becomes increasingly multimodal, care must be taken to integrate the modalities in a natural manner.

Future Work
• VRML models are closer to object-oriented, tree-structured data.
• Such data is hard to represent in an RDBMS.
• Some way must be found to store model data easily and efficiently.
Personal thought: use an XML database.

Discussions
Switchable!

Q&A
