James
A Personal Mobile Universal Speech Interface for Electronic Devices

Current Speech Application Concept
  [Diagram: Phone Client, PDA Client, and Computer Client connected to a Speech Application Backend]

Current Electronic Devices
  [Diagram: unidentified clients (???) connected to a Speech Application Backend]

Questions
  History: Why is there a conceptual gap?
  Motivation: Is speech a useful modality for "other" electronic devices?
  Hardware: How would one get speech in "other" devices?
  Architecture: What should the system look like?
  Dialog: What should/will these conversations be like?

History
  Why is there a conceptual gap?
    Speech is still hard.
    That will change.

Motivation
  Is speech a useful modality for "other" electronic devices?
    It seems probable.
    There has been some positive research (see Microsoft).
    Ideas?

Hardware
  How would one get speech in "other" devices?
    No need to, as long as devices are remote controlled via a known interface.
    Refer to system architecture.

Architecture
  [Diagram: HAVi adapter, X10 adapter, Mobile Speech Client]

Dialog
  USI Model
    Artificial subset language
    Tree-structured functions
    Universal primitives
    User-directed
    Great for recognition
    Entirely declarative (automatic)

  [Figure: function tree rooted at James, with branches for Stereo, Digital camera, and Other devices…. The tree layout did not survive extraction. Node labels legible in the figure, in no particular order: (mode), <turns stereo on>, off, on, tuner, auxiliary, CD, x-bass, volume, volume up, volume down, (radio band), AM, FM, frequency, station, WXXX, seek, forward, backward, (status), play, stop, pause, disc, track, #, next track, last track, random, repeat, single track, single disc, all discs, control, info, Play mode, Device mode, camera, VCR, Media type, Digital video, VHS, unknown, none, record, fast fwd, rewind, step.]

Keywords
  hello-james
  options
  where-am-i, where-was-i
  go-ahead, ok
  status
  goodbye
  what-is, what-is-the
  how-do-i
  more

Session Management
  hello-james/goodbye

  User: blah blah blah...
  System: ignoring user
  User: hello-james
  System: stereo, digital camera
  User: stereo
  System: stereo here
  User: goodbye
  System: goodbye
  User: blah blah blah...
  System: ignoring user

Query
  what-is path/status

  User: what-is-the am frequency
  System: the am frequency is five hundred thirty
  User: what-is random
  System: random is off
  User: what-is-the stereo
  System: the stereo is tuner

Help / exploration / implicit navigation
  how-do-i... / options / path options

  User: control alarm clock radio options
  System: alarm, clock, radio, sleep...
  User: more
  System: x10, stereo
  User: stereo options
  System: while turning stereo on: off, am, fm, auxiliary, cd...

Invocation / specification / implicit exploration / navigation
  Path

  User: stereo auxiliary
  System: while turning the stereo on and switching to auxiliary: auxiliary
  User: cd
  System: while switching to cd mode: cd
  User: play
  System: while playing a cd: play

List navigation
  More

  User: radio band am options
  System: bracketed list
    [frequency, kabc, k001, k002, k003, k004, k005][fm][off, volume][alarm, clock, sleep][x10, stereo]
    rendered: frequency, kabc, k001, k002...
  User: more
  System: k003, k004, k005...
  User: more
  System: fm, off, volume...

Orientation
  where-am-i

  User: what is the disc
  System: the disc is three
  User: where am i
  System: stereo cd disc

Research Questions
  Is the subset language learnable?
  Once learned, is it efficient?
  Are user mistakes infrequent enough?
  Are system mistakes infrequent enough?
  Can one generalize from one device to another?
  Is the subset language well retained?
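
Sketch: paging a bracketed list for "more"

The list-navigation transcript shows a bracketed list being read out a few items at a time, but the slides do not state the paging rule. The sketch below is one guess, not the James implementation. It assumes a page holds at most four items, that a bracket group larger than that is paged on its own, and that smaller groups are packed together whole. The names `paginate`, `render`, and `PAGE_LIMIT` are hypothetical. The guess reproduces both "more" transcripts in the slides, but it does not explain the five-item reply to "stereo options", so the real rule evidently differs in some respect.

```python
from typing import List

PAGE_LIMIT = 4  # assumption: inferred from the transcripts, not stated in the slides


def paginate(groups: List[List[str]], limit: int = PAGE_LIMIT) -> List[List[str]]:
    """Split bracket groups into spoken pages of at most `limit` items.

    Assumed rule: a group longer than `limit` is paged by itself;
    shorter groups are packed together whole, never split.
    """
    pages: List[List[str]] = []
    current: List[str] = []
    for group in groups:
        if len(group) > limit:
            # Flush any pending page, then page the oversized group alone.
            if current:
                pages.append(current)
                current = []
            for start in range(0, len(group), limit):
                pages.append(group[start:start + limit])
        elif len(current) + len(group) > limit:
            # The group does not fit on the pending page: start a new one.
            pages.append(current)
            current = list(group)
        else:
            current.extend(group)
    if current:
        pages.append(current)
    return pages


def render(pages: List[List[str]], index: int) -> str:
    """Join one page; a trailing "..." signals that "more" has something to say."""
    text = ", ".join(pages[index])
    return text + "..." if index < len(pages) - 1 else text


if __name__ == "__main__":
    # The bracketed list from the "radio band am options" transcript.
    groups = [
        ["frequency", "kabc", "k001", "k002", "k003", "k004", "k005"],
        ["fm"],
        ["off", "volume"],
        ["alarm", "clock", "sleep"],
        ["x10", "stereo"],
    ]
    pages = paginate(groups)
    print(render(pages, 0))  # frequency, kabc, k001, k002...
    print(render(pages, 1))  # k003, k004, k005...
    print(render(pages, 2))  # fm, off, volume...
```

Each "more" from the user would advance `index` by one. Keeping the bracket groups intact where possible means related items, such as the remaining modes of one device, tend to be spoken together.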