James: A Personal Mobile Universal Speech Interface for Electronic Devices

Current Speech Application Concept
[Diagram labels: Phone Client, PDA Client, Computer Client, Speech Application Backend]
Current Electronic Devices
[Diagram labels: ???, ???, ???, Speech Application Backend]
???Questions???
History: Why is there a conceptual gap?
Motivation: Is speech a useful modality for “other” electronic devices?
Hardware: How would one get speech in “other” devices?
Architecture: What should the system look like?
Dialog: What should/will these conversations be like?
History
Why is there a conceptual gap?
Speech is still hard.
That will change.
Motivation
Is speech a useful modality for “other” electronic devices?
It seems probable.
There has been some positive research (see Microsoft)
Ideas?
Hardware
How would one get speech in “other” devices?
No need to, as long as the devices are remote controlled via a known interface.
Refer to system architecture.
Architecture
HAVi adapter
X10 adapter
Mobile Speech Client
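The architecture slide names a mobile speech client plus HAVi and X10 adapters. A minimal sketch of how such an adapter layer might be organized follows. The names `DeviceAdapter`, `FakeAdapter`, `SpeechClient`, `devices`, and `invoke` are hypothetical and do not come from the slides, and the fake adapter only records commands; it does not implement the HAVi or X10 protocols.

```python
from abc import ABC, abstractmethod


class DeviceAdapter(ABC):
    """Hypothetical interface between the speech client and one control protocol."""

    @abstractmethod
    def devices(self):
        """Return the names of the devices reachable through this adapter."""

    @abstractmethod
    def invoke(self, device, function):
        """Ask a device to perform a function and return a confirmation string."""


class FakeAdapter(DeviceAdapter):
    """Stand-in adapter that records commands instead of talking to hardware."""

    def __init__(self, protocol, device_names):
        self.protocol = protocol
        self.device_names = list(device_names)
        self.sent = []  # (device, function) pairs, in the order received

    def devices(self):
        return list(self.device_names)

    def invoke(self, device, function):
        self.sent.append((device, function))
        return f"{device} {function}"


class SpeechClient:
    """Routes a recognized command to whichever adapter owns the device."""

    def __init__(self, adapters):
        self.routes = {name: a for a in adapters for name in a.devices()}

    def command(self, device, function):
        return self.routes[device].invoke(device, function)


# Assumed device assignment, for illustration only.
havi = FakeAdapter("HAVi", ["stereo", "digital camera"])
client = SpeechClient([havi])
reply = client.command("stereo", "on")
```

Under this assumption the speech client never needs to know which protocol a device speaks; it only needs each adapter to list its devices.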
Dialog
USI Model
Artificial subset language
Tree-structured functions
Universal primitives
User-directed
Great for recognition
Entirely declarative (automatic)
[Figure: James function tree. Labels recovered from the figure:
Root: James
Devices: stereo, digital camera, camera, VCR, other devices…
Stereo: (mode) on, off <turns stereo on>; tuner; auxiliary; CD; x-bass; volume, volume up, volume down
Tuner: (radio band) AM, FM; frequency #; station WXXX; seek forward, backward
CD: (status) play, stop, pause; repeat (off, single track, single disc, all discs); disc #; track #; next track; last track; random (on, off)
Camera/VCR: control (play, stop, pause, record, fast fwd, rewind, step forward, backward); info (play mode, device mode, media type: digital video, VHS, unknown, none)]
Keywords
hello-james
options
where-am-i, where-was-i
go-ahead, ok
status
goodbye
what-is, what-is-the
how-do-i
more
Session Management
hello-james/goodbye
User: blah blah blah...
System: ignoring user
User: hello-james
System: stereo, digital camera
User: stereo
System: stereo here
User: goodbye
System: goodbye
User: blah blah blah...
System: ignoring user
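The exchange above can be read as a two-state machine: speech is ignored until `hello-james` opens a session, and `goodbye` closes it again. A minimal sketch under that reading follows. The `Session` class and its `hear` method are hypothetical names, and returning `None` stands in for "ignoring user".

```python
class Session:
    """Hypothetical session gate: ignores speech until 'hello-james' is heard."""

    def __init__(self, devices):
        self.devices = list(devices)
        self.active = False

    def hear(self, utterance):
        words = utterance.strip().lower()
        if not self.active:
            if words == "hello-james":
                self.active = True
                # On opening a session, list the available devices.
                return ", ".join(self.devices)
            return None  # outside a session the user is ignored
        if words == "goodbye":
            self.active = False
            return "goodbye"
        if words in self.devices:
            return f"{words} here"
        return None  # other in-session commands are outside this sketch


session = Session(["stereo", "digital camera"])
transcript = [session.hear(u) for u in (
    "blah blah blah...",
    "hello-james",
    "stereo",
    "goodbye",
    "blah blah blah...",
)]
```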
Query
what-is path/status
User: what-is-the am frequency
System: the am frequency is five hundred thirty
User: what-is random
System: random is off
User: what-is-the stereo
System: the stereo is tuner
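The three queries above follow one pattern: the system echoes the path from the question and appends the current value. A minimal sketch of that pattern follows, assuming a hypothetical `STATUS` table keyed by path and a hypothetical `answer` function. The table holds only the values quoted in the slide.

```python
# Hypothetical status table; values are the ones quoted in the examples above.
STATUS = {
    ("am", "frequency"): "five hundred thirty",
    ("random",): "off",
    ("stereo",): "tuner",
}


def answer(utterance):
    """Answer a 'what-is' or 'what-is-the' query, or return None if not understood."""
    keyword, *path = utterance.split()
    if keyword not in ("what-is", "what-is-the"):
        return None
    value = STATUS.get(tuple(path))
    if value is None:
        return None
    # Mirror the user's choice of keyword in the reply.
    article = "the " if keyword == "what-is-the" else ""
    return f"{article}{' '.join(path)} is {value}"
```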
help/exploration/implicit navigation
how-do-i.../options/path options
User: control alarm clock radio options
System: alarm, clock, radio, sleep...
User: more
System: x10, stereo
User: stereo options
System: while turning stereo on: off, am, fm, auxiliary, cd...
invocation/specification/implicit exploration/navigation
Path
User: stereo auxiliary
System: while turning the stereo on and switching to auxiliary: auxiliary
User: cd
System: while switching to cd mode: cd
User: play
System: while playing a cd: play
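In the exchange above, "cd" is understood right after "stereo auxiliary" even though the user does not repeat "stereo". One way to sketch such implicit navigation is to look for each word among the children of the current node and, failing that, among the children of its ancestors. The `TREE` fragment, `node_at`, and `resolve` below are hypothetical, and the search rule is an assumption inferred from the example, not a description of how James works.

```python
# Small hypothetical fragment of the function tree.
TREE = {
    "stereo": {
        "auxiliary": {},
        "tuner": {},
        "cd": {"play": {}, "stop": {}},
    },
}


def node_at(path):
    """Return the subtree reached by following path from the root."""
    node = TREE
    for name in path:
        node = node[name]
    return node


def resolve(focus, utterance):
    """Return the new focus path after applying each word of the utterance."""
    path = list(focus)
    for word in utterance.split():
        base = list(path)
        while True:
            if word in node_at(base):
                path = base + [word]
                break
            if not base:
                raise KeyError(word)  # not found anywhere up to the root
            base.pop()  # retry one level closer to the root
    return path


focus = resolve([], "stereo auxiliary")
focus_after_cd = resolve(focus, "cd")
focus_after_play = resolve(focus_after_cd, "play")
```

Keeping the focus as a list of node names also makes it easy to report the current position by joining the names with spaces.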
list navigation
More
User: radio band am options
System: bracketed list [frequency, kabc, k001, k002, k003, k004, k005][fm][off, volume][alarm, clock, sleep][x10, stereo] rendered: frequency, kabc, k001, k002...
User: more
System: k003, k004, k005...
User: more
System: fm, off, volume...
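The three responses above are consistent with a simple paging rule: speak at most four items at a time, split a bracket that is longer than four on its own, and pack shorter brackets together as long as they fit. That rule is an assumption inferred from this one example; the slide does not state it. A minimal sketch, with `pages` and `render` as hypothetical names:

```python
def pages(groups, size=4):
    """Split bracketed groups into pages of at most `size` items (assumed rule)."""
    out, current = [], []
    for group in groups:
        if len(group) > size:
            # A long bracket is flushed and split on its own.
            if current:
                out.append(current)
                current = []
            for i in range(0, len(group), size):
                out.append(group[i:i + size])
            continue
        if len(current) + len(group) > size:
            out.append(current)
            current = []
        current = current + group
    if current:
        out.append(current)
    return out


def render(page):
    """Render one page the way the example responses are written."""
    return ", ".join(page) + "..."


groups = [
    ["frequency", "kabc", "k001", "k002", "k003", "k004", "k005"],
    ["fm"],
    ["off", "volume"],
    ["alarm", "clock", "sleep"],
    ["x10", "stereo"],
]
spoken = [render(p) for p in pages(groups)]
```

Each "more" from the user would then advance to the next entry of `spoken`.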
orientation
where-am-i
User: what-is-the disc
System: the disc is three
User: where-am-i
System: stereo cd disc
Research Questions
Is the subset language learnable?
Once learned, is it efficient?
Are user mistakes infrequent enough?
Are system mistakes infrequent enough?
Can one generalize from one device to another?
Is the subset language well retained?