AudioSense: A Simulation
Progress Report
EECS 578
Allan Spale
Background of Concept
• Taking the train home and listening to
the sounds around me
• How would deaf people be able to perceive
the environment?
• What assistance would be useful in helping
people adapt to the environment?
Project Goals
• Develop a CAVE application that will
simulate aspects of audio perception
• Display the text of “speaking” objects in
space
• Display the description text of “non-speaking” objects in space
• Display visual cues of multiple sound
sources
• Allow the user to selectively listen to
different sound sources
Topics in the Project
• Augmented reality
• Illustrated by objects in a virtual environment
• 3D sound
• Simulated by an object’s interaction property
• Speech recognition
• Simulated by text near the object
• Will remain static during simulation
• Virtual reality / CAVE
• Method for presenting the project
• Not discussed in this presentation
Augmented Reality
• Definition
• “…provides means of intuitive information
presentation for enhancing situational
awareness and perception by exploiting
the natural and familiar human interaction
modalities with the environment.”
-- Behringer et al. 1999
Augmented Reality:
Device Diagnostics
• Architecture components aid in
performing diagnostic tests
• Computer vision used to track the object in
space
• Speech recognition (command-style) used
for user interface
• 3D graphics (wireframe and shaded
objects) to illustrate an object’s internal
structure
• 3D audio emitted from an item allows the
user to find its location within the object
(component loop sketched below)
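A minimal sketch of how one iteration of such a diagnostic loop might combine these components; the paper describes the architecture only at the block-diagram level, so every type and function name here is a hypothetical stand-in.

```cpp
#include <iostream>
#include <string>

// Hypothetical component outputs; these types are assumptions.
struct Pose { float x, y, z; };           // object pose from computer vision
struct Command { std::string verb; };     // parsed speech command

Pose trackObject() { return {0.4f, 1.2f, -0.3f}; }   // stub tracker
Command listen() { return {"show-internals"}; }      // stub recognizer

int main() {
    // One iteration of the loop: track the device, take a spoken
    // command, then drive the 3D graphics and audio accordingly.
    Pose p = trackObject();
    Command c = listen();
    if (c.verb == "show-internals") {
        std::cout << "Render wireframe internals at ("
                  << p.x << ", " << p.y << ", " << p.z << ")\n";
        std::cout << "Emit 3D audio cue from the relevant part's location\n";
    }
    return 0;
}
```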
Augmented Reality
• Device diagnostics (figure slides)
Augmented Reality:
Device Diagnostics
• Summary
• Providing 3D graphics and sound helps the
user better diagnose items
• Might also want text information on the display
• Tracking methodology still needs
improvement
• Speech recognition of commands could be
expanded to include annotation
• Utilize an IP connection to offload
computation from the wearable
computer
Augmented Reality:
Multimedia Presentations in the Real World
• Mobile Augmented Reality System
(MARS)
• Tracking performed by Global Positioning
System (GPS) and an orientation tracker
• Display is see-through and head-mounted
• Interaction based on location and gaze
• Additional interaction provided by hand-held
device
Augmented Reality:
Multimedia Presentations in the Real World
• System overview
• Selection occurs through proximity or gaze
direction, followed by a menu system
(selection test sketched below)
• Information presentation
• Video (on hand-held device) or images
accompanied by narration (on head-mounted
display)
• Virtual reality (for places that cannot be
visited)
• Augmented reality (illustrate where items were)
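A sketch of how selection by proximity or gaze cone could be tested; the 2 m radius and roughly 18-degree cone are assumptions, not values from the paper.

```cpp
#include <cmath>
#include <cstdio>

// Simple 3D vector helpers for the selection test.
struct Vec3 { float x, y, z; };

float dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
float len(Vec3 a) { return std::sqrt(dot(a, a)); }

// An item is selectable if the user stands near it, or if the gaze
// direction points at it within a small cone.
bool selectable(Vec3 user, Vec3 gaze, Vec3 item) {
    Vec3 toItem = {item.x - user.x, item.y - user.y, item.z - user.z};
    float d = len(toItem);
    if (d < 2.0f) return true;                  // proximity: within 2 m
    Vec3 dir = {toItem.x / d, toItem.y / d, toItem.z / d};
    float cosAngle = dot(gaze, dir) / len(gaze);
    return cosAngle > 0.95f;                    // gaze: ~18 degree cone
}

int main() {
    Vec3 user = {0, 0, 0}, gaze = {0, 0, -1}, item = {0.5f, 0, -10};
    std::printf("selectable: %s\n",
                selectable(user, gaze, item) ? "yes" : "no");
    return 0;
}
```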
Augmented Reality
• Multimedia presentations in the real world (figure slides)
Augmented Reality:
Multimedia Presentations in the Real World
• Conclusions
• Current system is too heavy and visually
undesirable
• Might want to make the hand-held display a palmtop computer
• Permit authoring of content
• Create a collaboration between indoor and
outdoor system users
3D Sound:
Audio-only Web Browsing
• Must overcome difficulties with utilizing 3D
sound
• Sounds on the X axis are identifiable;
sounds on the Y and Z axes are not
• Need exists to create structure in audio-rendered
web pages
• Document reading proceeds spatially from left to
right in an adequate amount of time
(azimuth mapping sketched below)
• Utilize earcons and selective listening
• Provide meta-content for quick document
overview
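A sketch of the left-to-right spatial reading idea: document position maps onto an azimuth sweep along the X axis, the one axis the authors found reliably identifiable. The ±60-degree span is an assumption.

```cpp
#include <cstdio>

// Map a word's position in the document onto a left-to-right azimuth
// sweep, so the page is "read" across the listener's horizontal axis.
float azimuthForPosition(int wordIndex, int totalWords) {
    const float left = -60.0f, right = 60.0f;  // degrees; hypothetical span
    float t = (totalWords > 1) ? (float)wordIndex / (totalWords - 1) : 0.5f;
    return left + t * (right - left);
}

int main() {
    const int total = 5;
    for (int i = 0; i < total; ++i)
        std::printf("word %d -> azimuth %.1f deg\n",
                    i, azimuthForPosition(i, total));
    return 0;
}
```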
3D Sound
• Audio-only Web browsing (figure slide)
3D Sound:
Audio-only Web Browsing
• Future work
• Improve link information that extends
beyond web page title and time duration
• Benefits of auditory browsing aids
• Improved comprehension
• Better browsing experience for visually
impaired and sighted users
3D Sound:
Interactive 3D Sound Hyperstories
• Hyperstories
• Story occurring in a hypermedia context
• Forms a “nested context model”
• World objects can be passive, active,
static, or dynamic (sketched below)
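A sketch of how the nested context model and the object distinctions above might be represented; the field and type names are mine, not the authors'.

```cpp
#include <string>
#include <vector>

// Each world object carries the passive/active and static/dynamic
// distinctions named on the slide.
struct WorldObject {
    std::string name;
    bool active;    // active objects respond to the user; passive do not
    bool dynamic;   // dynamic objects change or move; static do not
};

// Contexts contain objects and child contexts; the nesting gives the
// hyperstory its navigable structure.
struct Context {
    std::string name;
    std::vector<WorldObject> objects;
    std::vector<Context> children;
};

int main() {
    Context room{"room", {{"door", true, false}, {"radio", false, true}}, {}};
    Context house{"house", {}, {room}};   // "room" nested inside "house"
    return 0;
}
```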
3D Sound:
Interactive 3D Sound Hyperstories
• AudioDoom
• Like the computer game Doom, but audio-based
• All world objects represented with sound
• Sound represented in a “volume” almost
parallel to the user’s eyes
• User interacts with the world objects using
an ultrasonic joystick with haptic
functionality
• Organized by partitioned spaces
3D Sound
• Interactive 3D sound hyperstories (figure slides)
3D Sound:
Interactive 3D Sound Hyperstories
• Despite elapsed time between sessions,
users remembered the world structure
well
• Authors illustrate the possibility of
“render[ing] a spatial navigable structure
by using only spatialized sound.”
• Opens the possibilities for educational
software for the blind within the
hyperstory context
Speech Recognition:
Media Retrieval and Indexing
• Problems with media retrieval and
indexing
• Lots of media being generated; too costly
and time-consuming to index manually
• Ideal system design
• Speaker independence
• Noisy-recording environment capability
• Open vocabulary
Speech Recognition:
Media Retrieval and Indexing
• Using Hidden Markov Models, the
system achieved the results in Table 1
• To improve results, “using string
matching techniques” can help
overcome errors in the recognition stream
Speech Recognition:
Media Retrieval and Indexing
• String matching strategy
• Develop the search term
• Divide the recognition stream into a set of
sub-strings
• Implement an initial filter process
• “Identify edit operations for remaining substrings in [the] recognition stream”
• Calculate the similarity measure for the
search term and matched strings
(similarity sketched below)
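A sketch of the similarity step, assuming a normalized Levenshtein (edit-distance) measure; the slide names the steps but not the exact scoring, so this formula is an assumption.

```cpp
#include <algorithm>
#include <cstdio>
#include <string>
#include <vector>

// Classic dynamic-programming edit distance between two strings.
int editDistance(const std::string& a, const std::string& b) {
    std::vector<std::vector<int>> d(a.size() + 1,
                                    std::vector<int>(b.size() + 1));
    for (size_t i = 0; i <= a.size(); ++i) d[i][0] = (int)i;
    for (size_t j = 0; j <= b.size(); ++j) d[0][j] = (int)j;
    for (size_t i = 1; i <= a.size(); ++i)
        for (size_t j = 1; j <= b.size(); ++j)
            d[i][j] = std::min({d[i-1][j] + 1,                  // deletion
                                d[i][j-1] + 1,                  // insertion
                                d[i-1][j-1] + (a[i-1] != b[j-1])}); // substitution
    return d[a.size()][b.size()];
}

// Similarity in [0,1]: 1 means an exact match despite recognition errors.
double similarity(const std::string& term, const std::string& candidate) {
    size_t longest = std::max(term.size(), candidate.size());
    if (longest == 0) return 1.0;
    return 1.0 - (double)editDistance(term, candidate) / (double)longest;
}

int main() {
    // Search term vs. a slightly misrecognized stream substring.
    std::printf("%.2f\n", similarity("recognizer", "recognise"));  // 0.80
    return 0;
}
```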
Speech Recognition
• Media retrieval and indexing (figure slide)
Speech Recognition:
Media Retrieval and Indexing
• Results of implementing the string
matching strategy
• Permitting more edit operations improved
recall but degraded precision
(definitions below)
• Despite low performance rates, a system
performing these tasks will be
commercially viable
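For reference, the standard definitions behind this tradeoff (not stated on the slide): permitting more edit operations enlarges the retrieved set, which tends to raise recall and lower precision.

```latex
\[
\text{precision} = \frac{\lvert \text{relevant} \cap \text{retrieved} \rvert}{\lvert \text{retrieved} \rvert},
\qquad
\text{recall} = \frac{\lvert \text{relevant} \cap \text{retrieved} \rvert}{\lvert \text{relevant} \rvert}
\]
```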
Speech Recognition:
Continuous Speech Recognition
• Problems with continuous speech
recognition
• Has unpredictable errors that are unlike
other “predictable” user input errors
• The absence of context aids makes
recognition difficult for the computer
• Speech user interfaces are still in a
developmental stage and will improve over
time
Speech Recognition:
Continuous Speech Recognition
• Two modes
• Keyboard-mouse and speech
• Two tasks
• Composition and transcription
• Results
• Keyboard-mouse tasks were faster and
more efficient than speech tasks
Speech Recognition:
Continuous Speech Recognition
• Correction methods
• Two general correction methods
• Inline correction, separate proofreading
• Speech inline correction methods
• Select text and reenter, delete text and reenter,
use correction box, correct problems during
correction
Speech Recognition
• Continuous speech recognition (figure slides)
Speech Recognition:
Continuous Speech Recognition
• Discussion of errors
• Inline correction is preferred by users
regardless of modality
• Proofreading had increased usage with
speech because of unpredictable system
errors
• Keyboard-mouse involved deleting and
reentering the word
• Despite ability to correct inline with speech,
errors typically occurred during correction
• Dialog boxes used as a last resort
Speech Recognition:
Continuous Speech Recognition
• Discussion of results
• Users still do not feel that they can be
productive using a speech interface for
continuous recognition
• More studies must be conducted to
improve the speech interface for users
Project Implementation
• Write a CAVE application using YG
• 3D objects simulate sound producing
objects
• No speech recognition will occur since
predefined text will be attached to each object
• Objects will move in space
• Objects will not always produce sound
• Objects may not be in the line of sight
(object state sketched below)
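A sketch of the state each simulated object might carry; YG (Ygdrasil) scenes are scripted rather than written as raw C++, so this struct only illustrates the needed fields, and all names here are mine.

```cpp
#include <string>

// One simulated sound-producing object in the CAVE scene.
struct SoundObject {
    float pos[3];          // moves through space over time
    std::string text;      // predefined text standing in for speech recognition
    bool emitting;         // objects will not always produce sound
    bool speaking;         // speaking vs. non-speaking (description) text
};

// Per-frame update: objects drift, and since they may be outside the
// line of sight, text visibility is decided separately from audibility.
void update(SoundObject& o, float dt) {
    o.pos[0] += 0.1f * dt;   // placeholder movement pattern
}

int main() {
    SoundObject train{{0, 0, -5}, "train approaching", true, true};
    update(train, 0.016f);
    return 0;
}
```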
Project Implementation
• Write a CAVE application using YG
• Sound location
• Show directional vectors for each object that
emits a sound
– The longer the vector, the farther away the
object is from the user (cue mapping
sketched below)
– X, Y will use arrowheads, Z will use dot / "X"
symbol
– Dot is for an object behind the user, "X"
symbol is for an object in front of the user
– Only visible if sound can be “heard” by the
user
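A sketch of the cue mapping described above; the scaling constant and the assumption that -Z points in front of the user are mine.

```cpp
#include <cstdio>

// A visual cue: arrow components for X/Y, plus a glyph for Z.
struct Cue { float dx, dy; char zGlyph; };   // zGlyph: '.', 'X', or ' '

Cue cueFor(const float user[3], const float src[3], bool audible) {
    Cue c{0, 0, ' '};
    if (!audible) return c;               // only drawn if sound is "heard"
    c.dx = 0.1f * (src[0] - user[0]);     // longer vector = farther away
    c.dy = 0.1f * (src[1] - user[1]);
    float dz = src[2] - user[2];
    // Assumed CAVE convention: -Z is in front of the user, so an object
    // in front gets 'X' and an object behind gets the dot.
    c.zGlyph = (dz < 0) ? 'X' : '.';
    return c;
}

int main() {
    float user[3] = {0, 0, 0}, src[3] = {3, 1, -4};
    Cue c = cueFor(user, src, true);
    std::printf("arrow (%.2f, %.2f), z glyph '%c'\n", c.dx, c.dy, c.zGlyph);
    return 0;
}
```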
Project Implementation
• Write a CAVE application using YG
• Sound properties
• Represented using a square
• Size represents volume/amplitude (probably
will not consider distance that affects volume)
• Color represents pitch/frequency (mapping sketched below)
• Only visible if sound can be “heard” by the user
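A sketch of the square mapping; the size range and the blue-to-red frequency ramp are assumptions.

```cpp
#include <cstdio>

// The sound-property square: edge length from amplitude, color from pitch.
struct Square { float size; float r, g, b; };

Square squareFor(float amplitude /*0..1*/, float freqHz) {
    Square s;
    s.size = 0.1f + 0.9f * amplitude;     // louder = larger square
    // Map 20 Hz..20 kHz onto a simple blue (low) to red (high) ramp.
    float t = (freqHz - 20.0f) / (20000.0f - 20.0f);
    if (t < 0) t = 0;
    if (t > 1) t = 1;
    s.r = t; s.g = 0.0f; s.b = 1.0f - t;
    return s;
}

int main() {
    Square s = squareFor(0.8f, 440.0f);
    std::printf("size %.2f, color (%.2f, %.2f, %.2f)\n",
                s.size, s.r, s.g, s.b);
    return 0;
}
```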
Project Implementation
• Write a CAVE application using YG
• Simulate “cocktail party effect”
• Allow user to enlarge text from an object that is
far away
• Provide configuration section to ignore certain
sound properties (filter sketched below)
– Volume/amplitude
– Pitch/frequency
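A sketch of the configurable "cocktail party" filter: the user chooses to ignore sources outside selected amplitude or frequency bands, so only attended sources keep their text and cues. The threshold values are hypothetical.

```cpp
#include <cstdio>

// User-configurable filter over the two sound properties above.
struct Filter {
    bool useAmplitude, useFrequency;
    float minAmp, minFreq, maxFreq;
};

// A source is attended only if it passes every enabled property test.
bool attend(const Filter& f, float amp, float freqHz) {
    if (f.useAmplitude && amp < f.minAmp) return false;
    if (f.useFrequency && (freqHz < f.minFreq || freqHz > f.maxFreq))
        return false;
    return true;
}

int main() {
    Filter f{true, true, 0.3f, 100.0f, 2000.0f};  // ignore quiet, shrill sources
    std::printf("train: %d, whistle: %d\n",
                attend(f, 0.8f, 300.0f), attend(f, 0.9f, 5000.0f));
    return 0;
}
```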
Project Tasks Completed
• Basic project design
• Have read some documentation about YG
• Tested functionality of YG in my account
• Established contacts with people who have
programmed CAVE applications using YG
• Will provide 3D models and code that
demonstrate some functionality of YG features
upon request
• Will help with answering questions and
demonstrating and explaining features of YG
Project Timeline
• Week of March 25
• Practice modifying existing YG programs
• Collect needed 3D models for program
• Week of April 1
• Code objects and their accompanying text
• Implement movement patterns for objects
Project Timeline
• Week of April 8
• Attempt to “turn on and off” the sound of objects
• Work with interaction properties of objects that will
determine visualizing sound properties
• Week of April 15
• Continue working on visualizing sound properties
• Work on “enlarging/reducing” text of an object
Project Timeline
• Week of April 22
• Create simple sound filtering menus
• Test program in CAVE
• EXAM WEEK: Week of April 29
• Practice presentation
• Present project
Bibliography
Behringer, R., Chen, S., Sundareswaran, V., Wang, K., and
Vassiliou, M. (1999). A Novel Interface for Device Diagnostics
Using Speech Recognition, Augmented Reality Visualization, and
3D Audio Auralization, in Proceedings of the IEEE International
Conference on Multimedia Computing and Systems, Vol. I,
Institute of Electrical and Electronics Engineers, Inc., 427-432.
Goose, S. and Moller, C. (1999). A 3D Audio Only Interactive Web
Browser: Using Spatialization to Convey Hypermedia Document
Structure, in Proceedings of the seventh ACM international
conference on Multimedia (Orlando FL, October 1999), ACM
Press, 363-371.
Bibliography
Hollerer, T., Feiner, S., and Pavlik, J. (1999). Situated
Documentaries: Embedding Multimedia Presentations in the
Real World, in Proceedings of the 3rd International Symposium
on Wearable Computers (San Francisco CA, October 1999),
Institute of Electrical and Electronics Engineers, Inc., 1-8.
Karat, C.-M., Halverson, C., Horn, D., and Karat, J. (1999). Patterns
of Entry and Correction in Large Vocabulary Continuous Speech
Recognition Systems, in CHI '99, Proceedings of the CHI 99
conference on Human factors in computing systems: the CHI is
the limit (Pittsburgh PA, May 1999), ACM Press, 568-575.
Bibliography
Lumbreras, M. and Sanchez, J. (1999). Interactive 3D Sound
Hyperstories for Blind Children, in CHI '99, Proceedings of the CHI
99 conference on Human factors in computing systems: the CHI
is the limit (Pittsburgh PA, May 1999), ACM Press, 318-325.
Robertson, J., Wong, W. Y., Chung, C., and Kim, D. K. (1998). Automatic
Speech Recognition for Generalised Time Based Media Retrieval
and Indexing, in Proceedings of the sixth ACM international
conference on Multimedia (Bristol UK, September 1998), ACM
Press, 241-246.