Speech Interface to Virtual Reality Applications - 廖峻鋒(Chun

Speech Interface to Virtual Reality Applications
Authors
Wauchope, K., S. Everett, D. Tate, T. Maney
M. Cernak, A. Sannier
Reporter
Chun-Feng Liao
References
This report discusses two implementations of a speech interface to virtual reality applications.
■ M. Cernak, A. Sannier, "Command Speech Interface to Virtual Reality Applications," Technical Report, Virtual Reality Applications Center, Iowa State University of Science and Technology, June 2002.
■ K. Wauchope, S. Everett, D. Tate, T. Maney, "Speech-Interactive Virtual Environments for Ship Familiarization," 2nd International EuroConference on Computer and IT Applications in the Maritime Industries (COMPIT '03), Hamburg, Germany, May 14-17, 2003, pp. 70-83.
Agenda
■ Introduction
■ Paper I
■ Paper II
■ Conclusion
■ System Design Discussion
Introduction
■ Both papers are recent (2002 and 2003).
■ Both address the technical details of speech-VR integration.
■ The second paper takes a more modern approach.
■ Both use a similar architecture (which is also similar to ours).
Ex: Choosing a VRML + Java Speech API platform led to several difficult problems, such as Java security constraints, and the developers were forced to use a "browser as an application" instead of a "browser as an applet".
Paper I
■ M. Cernak, A. Sannier, "Command Speech Interface to Virtual Reality Applications," Technical Report, Virtual Reality Applications Center, Iowa State University of Science and Technology, June 2002.
Purposes of this paper
■ Describes an approach to controlling VR applications through a multimodal command speech interface (CSI) based on dialog modeling.
■ The CSI is used to improve the usability of VRAC's C6.
VRAC: Virtual Reality Applications Center
C6 is a virtual reality system developed by VRAC.
Multimodal Interaction
■ Command addressing is used to trigger the system to start recording the user's voice for recognition.
U: MoleBio
S: Yes
U: (targets atom 512 with the mouse)
U: Go there!
S: OK (goto atom number 512)
U: User, S: System
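The exchange above can be sketched as a small state machine. This is only an illustrative sketch, not the CSLU-based implementation from the paper: the class name, the method names, the "Where?" and "Sorry?" replies, and the choice to stop listening after each handled command are all assumptions.

```python
class CommandSpeechInterface:
    """Toy dialog loop: speech is ignored until the system is addressed by name."""

    def __init__(self, address_word):
        self.address_word = address_word.lower()
        self.listening = False      # True once the user has addressed the system
        self.pointed_target = None  # last object targeted with the mouse

    def on_pointer(self, target):
        """Record what the user is pointing at (the non-speech modality)."""
        self.pointed_target = target

    def on_speech(self, utterance):
        """Return the system's reply, or None if the utterance is ignored."""
        text = utterance.strip().lower().rstrip("!").strip()
        if not self.listening:
            if text == self.address_word:
                self.listening = True
                return "Yes"
            return None  # not addressed: do not treat the speech as a command
        if text == "go there":
            if self.pointed_target is None:
                return "Where?"  # hypothetical reply when nothing is targeted
            self.listening = False  # assumption: wait to be addressed again
            return f"OK (goto {self.pointed_target})"
        return "Sorry?"  # hypothetical reply for an unknown command
```

With `csi = CommandSpeechInterface("MoleBio")`, the calls `csi.on_speech("MoleBio")`, `csi.on_pointer("atom 512")`, and `csi.on_speech("Go There !")` reproduce the dialog on the slide.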
System Architecture
[Figure: architecture diagram; labels: Dialog Management and Speech Facilities, VR System]
System Architecture
■ VR: VRAC's C6
■ TTS: Festival
■ SR: CSLU Toolkit
■ Platform: Windows on a Pentium II, 400 MHz
Three Main Components (1)
■ Speech synthesis (TTS): Festival.
Three Main Components (2)
■ CSLU Toolkit: dialog modeling, speech recognition, and natural language processing.
■ The CSLU Toolkit is implemented in C and Tcl/Tk and was developed at OGI (Oregon Graduate Institute).
CSLU: Center for Spoken Language Understanding
Three Main Components (3)
■ Communication bridge to the VR application.
■ Integrates CSLU (speech) with C6 (VR).
How to Integrate CSLU and C6
■ Initial attempt: CORBA
• C6 supports CORBA.
• The authors tried "Combat", a Tcl extension, as the CORBA client, but this failed.
• They then tried "Tcl Blend":
- Tcl -> Java -> CORBA -> C6 (efficiency problems)
• Result: a TCP socket is used instead.
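A plain TCP bridge of this kind can be sketched as follows. The paper does not give the wire format, so the newline-terminated text commands, the "OK" acknowledgement, and the function names `start_vr_stub` and `send_command` are all assumptions; the stub merely stands in for C6.

```python
import socket
import threading


def start_vr_stub(host="127.0.0.1"):
    """Start a stand-in VR listener on a free port; returns (port, received, thread)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0 lets the OS choose a free port
    server.listen(1)
    port = server.getsockname()[1]
    received = []

    def serve_one():
        # Handle a single connection carrying a single command line.
        conn, _ = server.accept()
        with conn:
            with conn.makefile("r", encoding="utf-8") as reader:
                command = reader.readline().rstrip("\n")
            received.append(command)
            conn.sendall(f"OK {command}\n".encode("utf-8"))
        server.close()

    thread = threading.Thread(target=serve_one, daemon=True)
    thread.start()
    return port, received, thread


def send_command(port, command, host="127.0.0.1"):
    """Speech side: send one newline-terminated command and return the reply line."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall((command + "\n").encode("utf-8"))
        with sock.makefile("r", encoding="utf-8") as reader:
            return reader.readline().rstrip("\n")
```

The appeal of this design is that the speech side only needs a socket library, which Tcl provides natively, so no CORBA binding is required on that side.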
Natural Language Processing
■ Instead of standard JSGF, the authors use a custom grammar and wrote a dedicated parser to evaluate it.
■ The custom grammar is very similar to JSGF.
■ We will not discuss the custom grammar in detail here.
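To give a feel for what a JSGF-style command grammar does, here is a toy matcher. It is neither the authors' custom grammar nor a full JSGF implementation: the rules and the vocabulary are made up, and only alternatives and rule references are supported. In JSGF itself the top rule would read roughly `public <command> = go there | go to atom <digit> | stop;`.

```python
# Hypothetical command grammar: each rule maps to a list of alternatives,
# and each alternative is a sequence of words or <rule> references.
GRAMMAR = {
    "<command>": [["go", "there"], ["go", "to", "atom", "<digit>"], ["stop"]],
    "<digit>": [["one"], ["two"], ["three"]],
}


def match(symbols, words, grammar=GRAMMAR):
    """Return True if the word list can be derived from the symbol sequence."""
    if not symbols:
        return not words  # success only if every word was consumed
    head, rest = symbols[0], symbols[1:]
    if head in grammar:
        # Rule reference: try each alternative in place of the reference.
        return any(match(alt + rest, words, grammar) for alt in grammar[head])
    # Plain word: it must equal the next input word.
    return bool(words) and words[0] == head and match(rest, words[1:], grammar)


def accepts(utterance, grammar=GRAMMAR):
    """Check a whole utterance against the top-level <command> rule."""
    return match(["<command>"], utterance.lower().split(), grammar)
```

For example, `accepts("go to atom two")` succeeds, while `accepts("go to atom")` fails because the `<digit>` reference cannot be satisfied.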
CSI Test Environment
■ A RAD (GUI) tool that helps developers build the dialog flow quickly.
Paper I Conclusion
■ The major advantage of this system is quick deployment.
■ The main problem is that the speech recognition accuracy (provided by CSLU) was poor.
■ The US Navy has also developed a speech interface to a VR system; the authors plan to improve interaction with VR along the lines of that method.
Future Work
■ Change TTS and SR to IBM ViaVoice.
• ViaVoice supports JSAPI (Java Speech API).
• Java can communicate with C6 via CORBA more easily.
Paper II
■ K. Wauchope, S. Everett, D. Tate, T. Maney, "Speech-Interactive Virtual Environments for Ship Familiarization," 2nd International EuroConference on Computer and IT Applications in the Maritime Industries (COMPIT '03), Hamburg, Germany, May 14-17, 2003, pp. 70-83.
Introduction
■ This paper introduces two systems that help crew members newly aboard US Navy ships become familiar with their environment quickly.
User: "Tell me where Room 101 is!"
Motivation
■ Architects of US Navy ships rely heavily on CAD tools to design ship models.
■ CAD files can be converted to a 3D model format with little effort.
■ According to the authors' previous research, such a virtual environment did shorten crews' learning time.
Systems introduced
■ Two systems:
• MSFT (Multimodal Ship Familiarization Tool)
• ISFS (Interactive Ship Familiarization System)
■ ISFS is a recent transition of MSFT.
System Architecture: MSFT
[Figure: MSFT architecture; the components run as separate processes]
MSFT
■ The VE viewer component and the speech interface run as two separate processes.
■ Speech interface: an all-IBM solution:
• ViaVoice
• IBM's SMAPI
• IBM's SRCL grammar
Platform: PIII 500 MHz
ISFS
■ A recent transition of MSFT.
■ Uses VRML as the 3D modeling language.
■ Uses JSAPI as the interface to the speech engine.
• ViaVoice fully supports JSAPI.
• VRML supports Java as a scripting language.
■ The rest of the structure is identical to the MSFT system.
Platform: Xeon 2.0 GHz -> needs more computing power!
Why Choose a Standalone VRML Browser?
■ Security limitations (discussed in detail later).
■ VM limitations (discussed in detail later).
■ It provides opportunities to customize the interface to the VRML browser.
In my personal experience, the system usually becomes unstable when the speech engine works with a VRML plugin via the EAI's Java interface.
Security Limitations
■ The JRE imposes security limitations on Java applets.
■ JSAPI was unable to establish a connection with the speech engine unless the security settings were explicitly reconfigured.
Limited VM
■ Most VRML browsers' EAIs were implemented using ActiveX and therefore support only Microsoft's old VM, which does not support most modern Java features.
• Ex: This may force us to use Java AWT instead of Swing, which provides a better GUI.
Providing GUI as VUI Fallback
■ The GUI provides a fallback in case the speech recognizer has trouble transcribing the user's voice accurately.
■ The GUI is adjusted dynamically to maintain a one-to-one correspondence with the VUI.
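One way to keep such a one-to-one correspondence is to derive both interfaces from a single command table per dialog state. The sketch below is an assumption about how this could be done, not the ISFS implementation; `DialogState` and its methods are hypothetical names.

```python
class DialogState:
    """One dialog state: the commands the user may currently issue."""

    def __init__(self, commands):
        # command phrase -> action callback; the single source of truth
        self.commands = dict(commands)

    def active_phrases(self):
        """Phrases to load into the recognizer's active grammar (the VUI)."""
        return sorted(self.commands)

    def button_labels(self):
        """Labels for the GUI buttons, regenerated whenever the state changes."""
        return sorted(self.commands)

    def invoke(self, phrase):
        """Run a command, whether it was spoken or clicked."""
        return self.commands[phrase]()
```

Because both lists come from the same table, a command that cannot be spoken in a given state cannot be clicked either, and a misrecognized utterance can always be replaced by a button press.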
Paper II Conclusion
■ The speech interface is needed because the GUI and the VE viewer both rely on direct manipulation and keep the user's hands too busy.
■ As HCI becomes increasingly multimodal, care must be taken to integrate the modalities in a natural manner.
Future Work
■ VRML is closer to an object-oriented, tree-structured model.
■ Such models are hard to represent in an RDBMS.
■ A way must be found to store model data easily and efficiently.
Personal thought: use an XML database.
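The XML idea can be illustrated with a minimal sketch: a tree-structured model maps directly onto nested XML elements, with no flattening into tables. The ship model, the dict layout, and the function name `node_to_xml` are made up for illustration; `Group` and `Transform` are borrowed from VRML node names.

```python
import xml.etree.ElementTree as ET


def node_to_xml(node):
    """Convert a nested dict {'type', 'name', 'children'} into an XML element."""
    element = ET.Element(node["type"], {"name": node["name"]})
    for child in node.get("children", []):
        element.append(node_to_xml(child))
    return element


# Hypothetical ship model: a deck containing two rooms.
ship = {
    "type": "Group",
    "name": "Deck1",
    "children": [
        {"type": "Transform", "name": "Room101"},
        {"type": "Transform", "name": "Room102"},
    ],
}

root = node_to_xml(ship)
xml_text = ET.tostring(root, encoding="unicode")  # serialized form for storage

# A path query walks the tree directly; no joins are needed.
room_names = [room.get("name") for room in root.findall("Transform")]
```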
Discussions
Switchable!
Q&A