Designing Systems for
Next-Generation I/O Devices
Mitchell Tsai, Peter Reiher, Jerry Popek
UCLA
May 20, 1999
Problem
• Next-Generation I/O performs poorly with
existing applications and operating systems.
– Examples of next-generation sensors/actuators
• Speech, vision, handwriting, physical location…
– AI meets real General-Purpose Systems.
• Not in the sandbox anymore!
– What should OSs provide for these technologies?
Current Systems
[Diagram: Keyboard & Mouse → GUI Interface → OS & Applications]
• Requires 100% accuracy in critical situations
• One input at a time, from one source
Speech Enabler
[Diagram: Sounds → Speech Recognition Engine (+ Grammar) → Best Phrase
(“Make the text blue”) → Command (TextRange.Font.Color = ppAccent1) →
OS & Applications]
• 80-99% accuracy
Noise & Errors
• Existing Metrics (Accuracy & Speed) are not good enough.
• Dictation: 99% accuracy at 150 wpm
– 10-40 sec/error = 20%+ of total time spent correcting errors!

Type              Time (sec)   Speed (wpm)   % Total Time
Tspeech               38           160            16%
Tdelay                33            85            14%
Tcorrections         131            30            57%
Tproof-reading        29            26            13%
Ttotal               230            26           100%
Ttotal = Tspeech + Tdelay + Tcorrections + Tproof-reading
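As a quick consistency check, the percentages and cumulative speeds in the table follow from the per-phase times once a passage length is fixed. A minimal sketch, assuming a roughly 100-word passage (the slide does not state the length) and reading “Speed (wpm)” as the cumulative effective rate:

```python
# Recompute the dictation table, assuming a ~100-word passage (hypothetical).
times = {"Tspeech": 38, "Tdelay": 33, "Tcorrections": 131, "Tproof-reading": 29}
words = 100
total = sum(times.values())            # ~230 sec

elapsed = 0
for phase, t in times.items():
    elapsed += t
    wpm = words / (elapsed / 60)       # cumulative effective speed
    print(f"{phase:>15}: {t:4d} sec  {t / total:6.1%}  {wpm:6.1f} wpm")
# Corrections alone take ~57% of the time, dragging the effective rate
# from ~160 wpm of raw dictation down to ~26 wpm overall.
```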
Command & Control Errors
1) Most programs have No Undo capability
2) One Keystroke Loss
– Cancel in MS Money
– Paste instead of Copy on PalmPilot
3) Undo requires advanced knowledge
– MS Word accidental shift to outline mode
4) Undo is inconsistent between programs
– One text selection (Outlook Mail) or two (Netscape Mail)
From Dictation to Commands
• Commands are worse than dictation
– Con: Errors can be irreversible and/or dangerous
– Con: Commands cannot delay processing to increase accuracy, as dictation can
– Pro: Smaller grammars produce higher accuracy
• Error handling “ad hoc” & insufficient
– Handled twice by sensor processor & application
– Programmers design custom interfaces (or programs!)
– Users confused by inconsistencies
• How to leverage new inputs?
– Context-sensitive and ambiguous commands
Outline
• Problems of Next-Generation Sensors
• BabySteps: Some Dialogue Management Services
• Related Work
• Design Issues for Post-GUI Environments
Next-Generation Sensors
• Direct
– speech, handwriting, vision (eye gaze, pointing,
gesture)
• Indirect
– vision (head and eye focus), geographic location,
identification badges, emotions (affective
computing).
• Traditional
– network connectivity, computer resources.
4 Main Problems of Next-Generation Sensors
1) Noise
– “Make this b… red”, Sporadic incorrect GPS readings
2) Errors
– Accidental user errors, Sensor processor mistakes
3) Ambiguity
– “Make this box red”: Which box?
4) Fragmentation
– Simultaneous inputs from speech, pointing, & vision
Sequences of Errors
• Series of commands
– “cd thisdir; mv foo ..; rm *”
• Linear Undo Stack problems
– Accidentally undo a few operations (X, Y, Z)
– Type “A”
– Lose all operations on the stack (X, Y, Z)
• Quit without Save, Accidental Command Mode
– Oops!, Confirmed a “Yes/No/Cancel” box.
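The linear undo stack’s failure mode is easy to reproduce. A minimal sketch (class and method names are illustrative, not from any particular toolkit):

```python
# Linear undo: performing a new command clears the redo stack for good.
class LinearUndoHistory:
    def __init__(self):
        self.done = []       # operations currently applied
        self.undone = []     # operations waiting for redo

    def do(self, op):
        self.done.append(op)
        self.undone.clear()  # typing "A" here destroys X, Y, Z

    def undo(self):
        if self.done:
            self.undone.append(self.done.pop())

h = LinearUndoHistory()
for op in ["X", "Y", "Z"]:
    h.do(op)
h.undo(); h.undo(); h.undo()   # accidentally undo X, Y, Z
h.do("A")                      # type "A"
print(h.undone)                # [] -- X, Y, Z are unrecoverable
```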
BabySteps:
Some Dialogue Management Services
• Command Manager
– Command Services
– Command Properties
• Context Manager
– Analyze Behavior Patterns
– Explicit Contexts (Internal, Dialogue, and External)
• Communicating Ambiguous Information
– Probabilistic
– Richer, Task-based, Annotated
[Screenshot: PowerPoint with context-sensitive speech & mouse]
BabySteps
[Architecture diagram: Sounds → Speech Interpreter (using the “Grammar for
context 7”) → Command Processing → OS & Applications. Command Processing
Modules screen dangerous commands and pass safe commands through; Context
Management distributes “Command Properties for context 7” and announces
“We are in context 7 now.”]
Command Management
1) Command Services must be provided by OS
– Recording, editing, filtering,...
2) Command Properties must be communicated to OS
– Ambiguous, context-sensitive events (from sensors)
– Safety, reversibility, usage patterns, cost (from applications)
3) Command Processing Modules
– Safety Filter, Usage Tracker, Cost Evaluator
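A minimal sketch of how these three pieces could fit together; the names (CommandProperties, safety_filter) and the thresholds are assumptions for illustration, not interfaces from the talk:

```python
from dataclasses import dataclass

@dataclass
class CommandProperties:
    name: str
    reversible: float    # 0..1, 1 = fully reversible (from the application)
    cost: float          # 0..1, normalized cost of a mistake
    usage_count: int     # maintained by a Usage Tracker module

def safety_filter(cmd: CommandProperties, confidence: float) -> str:
    """Route a recognized command: execute, confirm with the user, or reject."""
    risk = (1 - cmd.reversible) * cmd.cost
    if confidence > 0.9 and risk < 0.2:
        return "execute"     # safe, confident: run immediately
    if confidence > 0.5:
        return "confirm"     # dangerous or uncertain: ask first
    return "reject"

print(safety_filter(CommandProperties("Quit", 0.0, 0.8, 12), 0.6))  # confirm
```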
How Speech Recognition Works
Candidate words per position: {I’ll, Aisle, I} {of, loathe, love} {view, you, Hugh}
• Acoustic Model Best Match: “I loathe you”
• Language Model Best Match: “I love you”
• Two-Model Best Match: “I love Hugh” (best in a different context)
4 Models in Current Systems: Acoustic, Language, Vocabulary, Topic
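A toy illustration of the two-model combination over the lattice above; every score is invented, and brute-force enumeration stands in for a real decoder:

```python
import itertools, math

candidates = [["I'll", "Aisle", "I"],
              ["of", "loathe", "love"],
              ["view", "you", "Hugh"]]

acoustic = {"I'll": .5, "Aisle": .3, "I": .2,      # match to the sound
            "of": .4, "loathe": .4, "love": .2,
            "view": .3, "you": .5, "Hugh": .2}

def language(words):                               # plausibility of sequence
    return {"I love you": .6, "I love Hugh": .3}.get(" ".join(words), .01)

best = max(itertools.product(*candidates),
           key=lambda w: language(w) * math.prod(acoustic[x] for x in w))
print(" ".join(best))   # "I love you" here; a topic model that boosts
                        # "Hugh" (a person's name) could flip the winner
```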
Methods for Better Accuracy
• Speech Engines can produce scored output
Score (Phrase | Sound) = –100 to 100
• Combine sensor information with application or OS
information using likelihoods(L).
L(Command | Sound, Context) = L(Command | Context)
* L(Command | Phrase, Context)
* L(Phrase | Sound)
where L(A) = F(A) / (Σ_A′ F(A′) – F(A))
and F(A) can be P(A) or some other scoring function
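A minimal sketch of this combination, assuming a hypothetical mapping from engine scores to F (here exp(score/20)) and invented application-supplied context likelihoods:

```python
import math

def likelihoods(scores):
    """Map engine scores in [-100, 100] to L(A) = F(A) / (sum F - F(A))."""
    F = {a: math.exp(s / 20) for a, s in scores.items()}  # assumed F
    total = sum(F.values())
    return {a: F[a] / (total - F[a]) for a in F}

L_phrase_given_sound = likelihoods({"make this box red": 40,
                                    "make this fox red": 10})
L_command_given_context = {"make this box red": 2.0,    # from the app
                           "make this fox red": 0.1}

combined = {c: L_command_given_context[c] * L_phrase_given_sound[c]
            for c in L_phrase_given_sound}
print(max(combined, key=combined.get))   # "make this box red"
```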
Explicit Contexts From
User Behavior Analysis
• Example:
– Context A = a priori probabilities for “editing” commands
– Context B = a priori probabilities for “viewing” commands
• Other Types of Explicit Contexts
– Variations on Least Recently Used (LRU)
– Simple Markov Models (sketched below)
– Hidden Markov Models (HMMs)
– Bayesian Networks
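For example, a first-order (simple) Markov model over commands can supply a context’s a priori probabilities, as referenced in the list above; the transition data here are invented:

```python
from collections import defaultdict

class CommandMarkovModel:
    """First-order Markov model: P(next command | previous command)."""
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, prev_cmd, cmd):
        self.counts[prev_cmd][cmd] += 1

    def predict(self, prev_cmd):
        nxt = self.counts[prev_cmd]
        total = sum(nxt.values()) or 1
        return {c: n / total for c, n in nxt.items()}

m = CommandMarkovModel()
for a, b in [("select", "move"), ("select", "recolor"), ("select", "move")]:
    m.observe(a, b)
print(m.predict("select"))   # {'move': 0.67, 'recolor': 0.33} (approx.)
```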
Probabilistic Context-Sensitive Events
• High-level Events: Select “box 3”, “line 4”, and “box 10”
• Mid-level Events: 90% Region X, 10% Region Y
• Low-level Events: Fuzzy Mouse Movement
Probabilistic Objects in Events
Event “Thicken”:
Type = Speech
PClarification = 0.6
NCommands = 3
Command[1] = “Thicken line 11”, L[1] = 0.61
Command[2] = “Thicken line 13”, L[2] = 0.24
Command[3] = “Quit”, L[3] = 0.15
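A sketch of such an event as a data structure; the field names mirror the slide, but the class itself is illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class ProbabilisticEvent:
    type: str                                     # e.g., "Speech"
    p_clarification: float                        # PClarification
    commands: list = field(default_factory=list)  # (command, likelihood) pairs

event = ProbabilisticEvent(
    type="Speech",
    p_clarification=0.6,
    commands=[("Thicken line 11", 0.61),
              ("Thicken line 13", 0.24),
              ("Quit", 0.15)])

best, L = max(event.commands, key=lambda cl: cl[1])
print(f"Ask user: {best}?" if event.p_clarification >= 0.5 else best)
```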
User Clarification
• Consider PClarification, the probability that we should clarify the
command with the user (transcribed in the sketch below):
PClarification = [1 – L(CommandML, Context)]
* LReversible(CommandML, Context)
* LCost(CommandML, Context)
where CommandML is the most likely command,
LReversible = 0 to 1 (1 means fully reversible),
and LCost = 0 to 1 (a normalized version of cost).
• Reversibility and cost can reduce the seriousness of errors, but they
may increase the total time required to finish a task!
• What is the relative utility of different types of clarification?
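A direct transcription of the formula above into Python, with illustrative input values:

```python
def p_clarification(l_command_ml, l_reversible, l_cost):
    """PClarification = [1 - L(CommandML)] * LReversible * LCost.
    l_command_ml: likelihood of the most likely command (taken as 0..1 here);
    l_reversible: 1 means fully reversible; l_cost: normalized cost 0..1."""
    return (1 - l_command_ml) * l_reversible * l_cost

# Illustrative values only:
print(p_clarification(l_command_ml=0.55, l_reversible=0.8, l_cost=0.9))  # 0.324
```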
BabySteps: Additional Factors
• Performance Evaluation
– Error Hierarchy
– New Commands
– “Ambiguity is a Strength, not a Problem”
• “Transparency is not the best policy.”
– How to get Feedback from the user?
• Passive/Active
– Different Types of “Cancel”
• “Oops”, “Wrong”, “Backtrack”
Application Performance: Error Types
[Table: observed rate for each error type; individual rates range from 0% to 13%.]
• Desired Effect
• Inaction
• Confirmation
• Minor
– Undoable
• Medium
– Fixable (1 command)
– Fixable (Few commands)
– Unrecoverable (Many commands)
• Major
– Exit without Save, Application Crash/Freeze
Extended Benefits for Applications
[Diagram: Sound → Speech Interpreters → Command Processing → Command
Processing Modules → OS & Apps]
• Mouse: Fuzzy Pointing
• Combining speech & mouse commands
– Speech: “Make these arrows red.”
– Mouse: Move around arrows and other objects.
• Ambiguity & Context = Convenience
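One way “fuzzy pointing plus speech” could work: the spoken noun filters by object type while the fuzzy mouse path weights nearby candidates. The data, names, and the 0.3 threshold below are all invented:

```python
# Combine a speech cue ("Make these arrows red") with fuzzy mouse proximity.
objects = [
    {"id": "arrow1", "type": "arrow", "near_mouse": 0.8},
    {"id": "arrow2", "type": "arrow", "near_mouse": 0.6},
    {"id": "box3",   "type": "box",   "near_mouse": 0.7},
]

def selection_likelihood(obj, spoken_type):
    type_match = 1.0 if obj["type"] == spoken_type else 0.05
    return type_match * obj["near_mouse"]

targets = [o["id"] for o in objects if selection_likelihood(o, "arrow") > 0.3]
print(targets)   # ['arrow1', 'arrow2'] -- box3 is ruled out by the speech cue
```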
Ambiguity can be a Strength
• Ambiguity is usually considered a problem.
– If the user makes a precise command, and sensors provide perfect
interpretation, then the application should know exactly what to do.
• Exact precision by the user may be impossible or extremely
time-consuming. Consider PowerPoint:
– Moving the cursor to change modes
• Select Object → Move Object → Resize Object → Copy Object
– Selecting objects (and groups of objects)
• Very close and/or overlapping (esp. with invisible boundaries)
• From layers of different groups
– Making object A identical with object B in size, shape, color, etc...
BabySteps Summary
• New sensors & user inputs present a family of problems
– Noise, Errors, Ambiguity, Fragmentation
• BabySteps: Some Dialogue Management Services
1) Command Management - Command Services & Command Properties
2) Context Management - Analyze Behavior Patterns, Explicit Contexts
3) Communicate Ambiguous Information - Probabilistic, Richer
• Performance Evaluation
– New Metrics: Total Task Time, Error Hierarchy
– New Commands: Will they pass usability threshold?
– Transparency vs. Communication (User Feedback & Control)
– Ambiguity is a Strength
BabySteps approach to 4 Main Problems
1) Noise
– Facilitate closer interaction between sensor processors & applications
– Reduce impact of errors through command & context management
2) Errors
– Use user behavior analysis to detect, fix, and/or override errors.
– Ask user for help based on context and command properties
3) Ambiguity
– Limited context-sensitive speech and mouse
4) Fragmentation
– Probabilistic, temporal multimodal grammars not handled yet
Related Work
• Context-Handling Infrastructures
– Context Toolkit: Georgia Tech
• Provides context widgets for reusable solutions to context
handling [Salber, Dey, Abowd 1998, 1999].
• Multimodal Architectures (Human-Computer Interfaces)
– QuickSet: Oregon Graduate Institute
• First robust approach to a reusable, scalable architecture that
integrates gesture and voice [Cohen, Oviatt, et al. 1992, 1997, 1999].
• Context Advantages for Operating Systems
– File System Actions: UC Santa Cruz
• Uses Prediction by Partial Match (PPM) to track sequences of File
System Events for a predictive cache [Kroeger 1996, 1999].
Related Work
• CHI-99
– “Nomadic Radio: Scaleable and Contextual Notification for
Wearable Audio Messaging”: MIT
• Priority, Usage Level, & Conversations [Sawhney, Schmandt 1999].
– LookOut, “Principles of Mixed-Initiative User Interfaces”: MSFT
• Utility of Action vs. Non-action vs. Dialog. [Horvitz 1999].
– “Patterns of Entry and Correction in Large Vocabulary Continuous
Speech Recognition Systems”: IBM/Univ. of Michigan
• Compares Dragon, IBM, & L&H. Speech 14 cwpm (vs. keyboard 32
cwpm). [Karat, Halverson, Horn, Karat 1999].
– “Model-based and Empirical Evaluation of Multimodal Interactive
Error Correction”: CMU/ Universität Karlsruhe
• Models multimodal error correction attempts using
TAttempt = TOverhead + R*TInput [Suhm, Myers, Waibel 1999].
Related Work
• Multimodal Grammars
– Oregon Graduate Institute [Cohen, Oviatt, et al. 1992, 1997].
– CMU [Vo & Waibel 1995, 1997].
• Command Management
– Universal Undo [Microsoft]
– Task-Based Windows UI [Microsoft]
• Context Management (CONTEXT-97, CONTEXT-99)
– AAAI-99 Context Workshop
• “Operating Systems Services for Managing Context” [Tsai 1999]
– AAAI-99 Mixed-Initiative Intelligence
• “Baby Steps Towards Dialogue Management” [Tsai 1999]
• Probabilistic & Labeled Information in OS
– Eve [Microsoft]
Post-GUI Systems
[Diagram: Artificial Intelligence, User Interfaces, and Operating Systems
meet through next-generation sensors/actuators, serving real people:
computer people, special people, and the general public.]
Design Issues for
Post-GUI Environments
• Performance may be driven by mobility & ubiquity.
– Hard to beat desktop performance, except for specialized tasks
– But why not design good macros? Or use 2+ pointers/mice?
– Even with no video screen or keyboard, use buttons (e.g. PalmPilot)
– Speech and video are good for rapid acquisition of data
• What are new tasks for smart mobile environments?
– Summarize ongoing tasks (e.g. “Car, what was I doing?”)
– Real dialogue is mixed-initiative (All commands are backgrounded!)
– Control of multiple applications (Consider JAWS. Is this needed?)
– Context-sensitive communication (“Where’s the nearest pizza?”)
Possible Changes
• Explicit Contexts for Communication
– For users, or for system services
– What format for communicating events & contexts?
– What command properties should applications support?
• Database-like Rollback/Transactions for Application Commands
– In addition to Elephant File System (HotOS 1999)
– Making the entire computer more bulletproof, with a temporal history
– Support dialogue management rather than linear commands
• Command and Task History
– How to handle? Databases? Trees? Human conversation?
– Real Dialogue Management
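One possible shape for such a history is a tree rather than a stack: undoing and then issuing a new command creates a branch instead of discarding the old future. A minimal sketch (an illustration, not a design from the talk):

```python
class HistoryNode:
    def __init__(self, command, parent=None):
        self.command, self.parent, self.children = command, parent, []

class TreeHistory:
    def __init__(self):
        self.root = self.cursor = HistoryNode(None)   # empty document state

    def do(self, command):
        node = HistoryNode(command, parent=self.cursor)
        self.cursor.children.append(node)   # the old branch survives
        self.cursor = node

    def undo(self):
        if self.cursor.parent:
            self.cursor = self.cursor.parent

h = TreeHistory()
for c in ["X", "Y", "Z"]:
    h.do(c)
h.undo(); h.undo(); h.undo()                   # back to the root
h.do("A")                                      # branches; X-Y-Z still reachable
print([n.command for n in h.root.children])    # ['X', 'A']
```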
Possible Changes II
• “Faster is not better.”
– “Courteous Computing” (Horvitz, Microsoft)
– Pre-executing tasks works best in MS Outlook with 1 sec delay
– Alternative to a “Yes/No” dialog: announce the action & wait 1 sec
(sketched at the end of this slide)
• User I/O must be buffered, filtered, & managed
– Normal dialog is a series of background commands
– Speech-only output may be a queue of application output requests
– Variable environment conditions
• low/high bandwidth connections & Video/PalmPilot
– What if user must switch modalities midstream?
• Separate SAPI and GUI APIs may not work; a multimodal API is needed
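The “announce the action & wait 1 sec” alternative might look like the sketch below; the API is hypothetical, and threading.Timer merely stands in for a real event loop:

```python
import threading

def announce_then_act(description, action, delay=1.0):
    """Announce an action, then run it after `delay` sec unless cancelled."""
    print(f"About to {description} (say 'cancel' to stop)")
    timer = threading.Timer(delay, action)
    timer.start()
    return timer.cancel      # the caller invokes this to abort

cancel = announce_then_act("archive this mail thread",
                           lambda: print("archived"))
# cancel()   # a user interruption within ~1 sec would call this
```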
Possible Changes III
• Applications not designed for multiple commands.
– Currently submenus & dialog box sequences help narrow context.
– Procedures → GUI event loops → Post-GUI dialogue
• Windows event systems aren’t either.
• I/O not designed for rapid interactive haptic/visual systems.
– 1/3 sec (300 ms) latency is good enough for conscious responses
– But not for unconscious actions
• 1 ms visual tracking, 70 ms haptic responses, 150 ms visual responses
• Cost/Delay of sensor processors extremely high
– How to give e-mail system priority responsiveness?
• Unified resource management, Soft Real-Time Systems
– Governed by new Command Properties and Context Knowledge
Possible Changes IV
• Use Probabilistic & Multi-faceted Info throughout OS
– Task-based file identification
– Multiple configuration setups (NT dialup)
• Applications could be designed for ambiguous and context-sensitive commands
• Context-based Adaptive Computing, Active Networks
• Will a more context-aware system provide resiliency?
– Rather than super-slow AI learning?
Possible Changes V
• How do we support transition to real English dialogue?
• “Computerese” may co-exist with
– natural human spoken & gestural languages
– command-line & GUI computer interfaces
• Can other protocols learn from human languages?
– Use ambiguity, synonyms.
– Different Types of ACKs, NACKs
Future Directions
• If the System & Algorithm people can provide X, can the UI
people design good ways to use this information?
• If the UI or Device has characteristic Y, what must the system
and algorithm people provide?
• New sensors & user inputs present a family of problems
– Noise, Errors, Ambiguity, Fragmentation
• User I/O may need a whole family of User Dialogue services,
similar to networking, file management, or process control.