Designing Systems for Next-Generation I/O Devices
Mitchell Tsai, Peter Reiher, Jerry Popek
UCLA
May 20, 1999

Problem
• Next-generation I/O performs poorly with existing applications and operating systems.
– Examples of next-generation sensors/actuators: speech, vision, handwriting, physical location…
– AI meets real general-purpose systems. Not in the sandbox anymore!
– What should OSs provide for these technologies?

Current Systems
[Diagram: Keyboard & Mouse → GUI Interface → OS & Applications]
• Requires 100% accuracy in critical situations
• One input at a time, from one source
[Diagram: Grammar + Sounds → Speech Recognition Engine → Speech Enabler → OS & Applications]
• Best phrase "Make the text blue" becomes the command TextRange.Font.Color = ppAccent1
• 80-99% accuracy; noise & errors

Noise & Errors
• Existing metrics (accuracy & speed) are not good enough.
• Dictation: 99% accuracy at 150 wpm; at 10-40 sec/error, roughly 20% of total time goes to correcting errors!

Type              Time (sec)   Speed (wpm, cumulative)   % Total Time
Tspeech               38                160                   16%
Tdelay                33                 85                   14%
Tcorrections         131                 30                   57%
Tproof-reading        29                 26                   13%
Ttotal               230                 26                  100%

Ttotal = Tspeech + Tdelay + Tcorrections + Tproof-reading

Command & Control Errors
1) Most programs have no Undo capability
2) One keystroke can cause a loss – Cancel in MS Money; Paste instead of Copy on PalmPilot
3) Undo requires advanced knowledge – MS Word's accidental shift to outline mode
4) Undo is inconsistent between programs – one text selection (Outlook Mail) or two (Netscape Mail)

From Dictation to Commands
• Commands are worse than dictation
– Con: Errors can be irreversible and/or dangerous
– Con: Dictation can delay processing to increase accuracy; commands cannot
– Pro: Smaller grammars produce higher accuracy
• Error handling is "ad hoc" & insufficient
– Handled twice, by the sensor processor & the application
– Programmers design custom interfaces (or programs!)
– Users are confused by inconsistencies
• How to leverage new inputs?
– Context-sensitive and ambiguous commands

Outline
• Problems of Next-Generation Sensors
• BabySteps: Some Dialogue Management Services
• Related Work
• Design Issues for Post-GUI Environments

Next-Generation Sensors
• Direct – speech, handwriting, vision (eye gaze, pointing, gesture)
• Indirect – vision (head and eye focus), geographic location, identification badges, emotions (affective computing)
• Traditional – network connectivity, computer resources

4 Main Problems of Next-Generation Sensors
1) Noise – "Make this b… red"; sporadic incorrect GPS readings
2) Errors – accidental user errors; sensor-processor mistakes
3) Ambiguity – "Make this box red": which box?
4) Fragmentation – simultaneous inputs from speech, pointing, & vision

Sequences of Errors
• Series of commands – "cd thisdir; mv foo ..; rm *"
• Linear undo-stack problems
– Accidentally undo a few operations (X, Y, Z)
– Type "A"
– Lose all operations on the stack (X, Y, Z)
• Quit without Save; accidental command mode
– Oops! Confirmed a "Yes/No/Cancel" box.

BabySteps: Some Dialogue Management Services
• Command Manager
– Command Services
– Command Properties
• PowerPoint: context-sensitive speech & mouse
• Context Manager
– Analyze behavior patterns
– Explicit contexts (internal, dialogue, and external)
• Communicating Ambiguous Information
– Probabilistic
– Richer, task-based, annotated

BabySteps
[Diagram: Sounds → Speech Interpreter (using the grammar for context 7) → Command Processing, which passes safe commands and holds dangerous ones → Command Processing Modules & Context Management → OS & Applications. Applications supply command properties for context 7; the context manager announces "We are in context 7 now."]

Command Management
1) Command Services must be provided by the OS – recording, editing, filtering, …
2) Command Properties must be communicated to the OS
– Ambiguous, context-sensitive events (from sensors)
– Safety, reversibility, usage patterns, cost (from applications)
3) Command Processing Modules – Safety Filter, Usage Tracker, Cost Evaluator

How Speech Recognition Works
[Diagram: one sound, several candidate phrases. The acoustic model's best match ("I loathe you", over "I'll of view" and "Aisle loathe you") differs from the language model's best match ("I love you"); combining the two models picks "I love Hugh", and a different phrase may win in a different context.]
• 4 models in current systems: Acoustic, Language, Vocabulary, Topic

Methods for Better Accuracy
• Speech engines can produce scored output: Score(Phrase | Sound) ranges from -100 to 100
• Combine sensor information with application or OS information using likelihoods (L):
L(Command | Sound, Context) = L(Command | Context) × L(Command | Phrase, Context) × L(Phrase | Sound)
where L(A) = F(A) / (Σ_A′ F(A′) − F(A)) and F(A) can be P(A) or some other scoring function
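A minimal Python sketch of this combination. Only the L(A) normalization and the three-factor product come from the slide; the score tables and the command/phrase names are hypothetical stand-ins for values a real speech engine, application, and context manager would supply.

    def L(F, a):
        """L(A) = F(A) / (sum over A' of F(A') - F(A)); F may be P(A) or another score."""
        total = sum(F.values())
        return F[a] / (total - F[a])

    # Hypothetical score tables (not from the slides):
    F_phrase_given_sound    = {"make the text blue": 0.7, "make the text glue": 0.3}
    F_command_given_context = {"SetColor": 0.8, "Quit": 0.2}
    F_command_given_phrase  = {"SetColor": 0.9, "Quit": 0.1}

    # L(Command | Sound, Context) =
    #   L(Command | Context) * L(Command | Phrase, Context) * L(Phrase | Sound)
    l_cmd = (L(F_command_given_context, "SetColor")
             * L(F_command_given_phrase, "SetColor")
             * L(F_phrase_given_sound, "make the text blue"))
    print(l_cmd)  # 4.0 * 9.0 * 2.33... = 84.0

Because L is a ratio of a score to the total score of its competitors, the combined value grows quickly when all three sources agree on the same command.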
Explicit Contexts from User Behavior Analysis
• Example:
– Context A = a priori probabilities for "editing" commands
– Context B = a priori probabilities for "viewing" commands
• Other types of explicit contexts:
– Variations on Least Recently Used (LRU)
– Simple Markov models
– Hidden Markov Models (HMMs)
– Bayesian networks

Probabilistic Context-Sensitive Events
• High-level events: Select "box 3", "line 4", and "box 10"
• Mid-level events: 90% Region X, 10% Region Y
• Low-level events: fuzzy mouse movement

Probabilistic Objects in Events
• Example event for the spoken word "Thicken":
Type = Speech
PClarification = 0.6
NCommands = 3
Command[1] = "Thicken line 11", L[1] = 0.61
Command[2] = "Thicken line 13", L[2] = 0.24
Command[3] = "Quit", L[3] = 0.15

User Clarification
• Consider PClarification, the probability that we should clarify the command with the user:
PClarification = [1 − L(CommandML, Context)] × LReversible(CommandML, Context) × LCost(CommandML, Context)
– CommandML is the most likely command
– LReversible = 0 to 1 (1 means fully reversible)
– LCost = 0 to 1 (a normalized version of cost)
• Reversibility and cost can reduce the seriousness of errors, but they may increase the total time required to finish a task!
• What is the relative utility of different types of clarification?
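A minimal Python sketch of this decision, reusing the "Thicken" event above. The reversibility/cost values and the 0.25 policy threshold are illustrative assumptions, not part of BabySteps.

    # Candidate commands from the "Thicken" event, with likelihoods L[i]:
    commands = [("Thicken line 11", 0.61),
                ("Thicken line 13", 0.24),
                ("Quit", 0.15)]

    def p_clarification(commands, l_reversible, l_cost):
        """PClarification = [1 - L(CommandML)] * LReversible * LCost
        (context conditioning from the slide omitted for brevity)."""
        _, l_ml = max(commands, key=lambda c: c[1])   # most likely command
        return (1.0 - l_ml) * l_reversible * l_cost

    # Hypothetical command properties reported by the application:
    p = p_clarification(commands, l_reversible=0.9, l_cost=0.9)
    if p > 0.25:                                      # assumed policy threshold
        print("Ask the user which command was meant")
    else:
        print("Execute:", max(commands, key=lambda c: c[1])[0])

With L[1] = 0.61 and these property values, PClarification = 0.39 × 0.81 ≈ 0.32, so this sketch would ask the user rather than execute.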
BabySteps: Additional Factors
• Performance evaluation
– Error hierarchy
– New commands
– "Ambiguity is a strength, not a problem"
• "Transparency is not the best policy."
– How to get feedback from the user? Passive/active
– Different types of "Cancel": "Oops", "Wrong", "Backtrack"

Application Performance: Error Types
[Chart: measured frequency of each error type]
• Desired effect
• Inaction
• Confirmation
• Minor
– Undoable
• Medium
– Fixable (1 command)
– Fixable (few commands)
– Unrecoverable (many commands)
• Major
– Exit without Save; application crash/freeze

Extended Benefits for Applications
[Diagram: Sound → Speech Interpreters and Mouse → Fuzzy Pointing both feed Command Processing → Command Processing Modules → OS & Apps]
• Mouse: fuzzy pointing
• Combining speech & mouse commands (see the sketch below)
– Speech: "Make these arrows red."
– Mouse: move around arrows and other objects.
• Ambiguity & context = convenience
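A minimal Python sketch of this kind of speech-plus-fuzzy-mouse fusion. The region probabilities echo the "Probabilistic Context-Sensitive Events" slide; the object inventory and the additive weighting are illustrative assumptions.

    # Fuzzy pointer estimate: probability the mouse means each region.
    regions = {"X": 0.9, "Y": 0.1}
    # Hypothetical objects known to the application, by region:
    objects = {"X": ["arrow 1", "box 2"], "Y": ["arrow 3"]}

    def targets_for(spoken_type):
        """Weight each candidate object by its region probability,
        keeping only objects that match the spoken type."""
        scores = {}
        for region, p in regions.items():
            for obj in objects[region]:
                if spoken_type in obj:
                    scores[obj] = scores.get(obj, 0.0) + p
        return scores

    print(targets_for("arrow"))   # {'arrow 1': 0.9, 'arrow 3': 0.1}

The spoken command "Make these arrows red" filters the candidate set by type, while the fuzzy pointer ranks the survivors, so neither input alone has to be precise.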
Ambiguity can be a Strength
• Ambiguity is usually considered a problem.
– If the user makes a precise command, and sensors provide a perfect interpretation, then the application should know exactly what to do.
• But exact precision by the user may be impossible or extremely time-consuming.
• Consider PowerPoint:
– Moving the cursor to change modes: Select Object ⇒ Move Object ⇒ Resize Object ⇒ Copy Object
– Selecting objects (and groups of objects) that are very close and/or overlapping (esp. with invisible boundaries), or in layers of different groups
– Making object A identical to object B in size, shape, color, etc.

BabySteps Summary
• New sensors & user inputs present a family of problems
– Noise, errors, ambiguity, fragmentation
• BabySteps: some dialogue management services
1) Command management – command services & command properties
2) Context management – analyze behavior patterns, explicit contexts
3) Communicate ambiguous information – probabilistic, richer
• Performance evaluation
– New metrics: total task time, error hierarchy
– New commands: will they pass the usability threshold?
– Transparency vs. communication (user feedback & control)
– Ambiguity is a strength

BabySteps Approach to the 4 Main Problems
1) Noise
– Facilitate closer interaction between sensor processors & applications
– Reduce the impact of errors through command & context management
2) Errors
– Use user-behavior analysis to detect, fix, and/or override errors
– Ask the user for help based on context and command properties
3) Ambiguity
– Limited context-sensitive speech and mouse
4) Fragmentation
– Probabilistic, temporal multimodal grammars not handled yet

Related Work
• Context-handling infrastructures
– Context Toolkit: Georgia Tech. Provides context widgets for reusable solutions to context handling [Salber, Dey, Abowd 1998, 1999].
• Multimodal architectures (human-computer interfaces)
– QuickSet: Oregon Graduate Institute. First robust approach to a reusable, scalable architecture that integrates gesture and voice [Cohen, Oviatt, et al. 1992, 1997, 1999].
• Context advantages for operating systems
– File-system actions: UC Santa Cruz. Uses Prediction by Partial Match (PPM) to track sequences of file-system events for a predictive cache [Kroeger 1996, 1999].

Related Work
• CHI-99
– "Nomadic Radio: Scaleable and Contextual Notification for Wearable Audio Messaging": MIT. Priority, usage level, & conversations [Sawhney, Schmandt 1999].
– LookOut, "Principles of Mixed-Initiative User Interfaces": Microsoft. Utility of action vs. non-action vs. dialog [Horvitz 1999].
– "Patterns of Entry and Correction in Large Vocabulary Continuous Speech Recognition Systems": IBM/Univ. of Michigan. Compares Dragon, IBM, & L&H; speech 14 cwpm (vs. keyboard 32 cwpm) [Karat, Halverson, Horn, Karat 1999].
– "Model-based and Empirical Evaluation of Multimodal Interactive Error Correction": CMU/Universität Karlsruhe. Models multimodal error-correction attempts using TAttempt = TOverhead + R × TInput [Suhm, Myers, Waibel 1999].

Related Work
• Multimodal grammars
– Oregon Graduate Institute [Cohen, Oviatt, et al. 1992, 1997]
– CMU [Vo & Waibel 1995, 1997]
• Command management
– Universal Undo [Microsoft]
– Task-based Windows UI [Microsoft]
• Context management (CONTEXT-97, CONTEXT-99)
– AAAI-99 Context Workshop: "Operating Systems Services for Managing Context" [Tsai 1999]
– AAAI-99 Mixed-Initiative Intelligence: "Baby Steps Towards Dialogue Management" [Tsai 1999]
• Probabilistic & labeled information in the OS
– Eve [Microsoft]

Post-GUI Systems
[Diagram: post-GUI systems at the intersection of Artificial Intelligence, User Interfaces, Operating Systems, and Next-Generation Sensors/Actuators; serving real people and computer people, special people and the general public.]

Design Issues for Post-GUI Environments
• Performance may be driven by mobility & ubiquity.
– Hard to beat desktop performance, except for specialized tasks
– But why not design good macros? Or use 2+ pointers/mice?
– Even with no video screen or keyboard, use buttons (e.g., PalmPilot)
– Speech and video are good for rapid acquisition of data
• What are the new tasks for smart mobile environments?
– Summarize ongoing tasks (e.g., "Car, what was I doing?")
– Real dialogue is mixed-initiative (all commands are backgrounded!)
– Control of multiple applications (consider JAWS; is this needed?)
– Context-sensitive communication ("Where's the nearest pizza?")

Possible Changes
• Explicit contexts for communication
– For users, or for system services
– What format for communicating events & contexts?
– What command properties should applications support?
• Database-like rollback/transactions for application commands
– In addition to the Elephant File System (HotOS 1999)
– Making the entire computer more bulletproof, with temporal history
– Support dialogue management rather than linear commands
• Command and task history
– How to handle it? Databases? Trees? Human conversation?
– Real dialogue management

Possible Changes II
• "Faster is not better."
– "Courteous Computing" (Horvitz, Microsoft)
– Pre-executing tasks works best in MS Outlook with a 1-sec delay
– Alternative to a "Yes/No" dialog: announce the action & wait 1 sec (see the sketch at the end)
• User I/O must be buffered, filtered, & managed
– Normal dialogue is a series of background commands
– Speech-only output may be a queue of application output requests
– Variable environment conditions: low/high-bandwidth connections, video vs. PalmPilot
– What if the user must switch modalities midstream?
• Separate SAPI and GUI may not work: a multimodal API is needed

Possible Changes III
• Applications are not designed for multiple commands.
– Currently, submenus & dialog-box sequences help narrow context.
– Procedures ⇒ GUI event loops ⇒ post-GUI dialogue
• Windows event systems aren't designed for them either.
• I/O is not designed for rapid interactive haptic/visual systems.
– 1/3-sec (300 ms) responses are good for conscious responses
– But not for unconscious actions: 1 ms visual tracking, 70 ms haptic responses, 150 ms visual responses
• Cost/delay of sensor processors is extremely high
– How to give the e-mail system priority responsiveness?
• Unified resource management, soft real-time systems
– Governed by new command properties and context knowledge

Possible Changes IV
• Use probabilistic & multi-faceted information throughout the OS
– Task-based file identification
– Multiple configuration setups (NT dialup)
• Applications could be designed for ambiguous and context-sensitive commands
• Context-based adaptive computing, active networks
• Will a more context-aware system provide resiliency?
– Rather than super-slow AI learning?

Possible Changes V
• How do we support the transition to real English dialogue?
• "Computerese" may co-exist with
– natural human spoken & gestural languages
– command-line & GUI computer interfaces
• Can other protocols learn from human languages?
– Use ambiguity, synonyms
– Different types of ACKs and NACKs

Future Directions
• If the systems & algorithms people can provide X, can the UI people design good ways to use this information?
• If the UI or device has characteristic Y, what must the systems and algorithms people provide?
• New sensors & user inputs present a family of problems
– Noise, errors, ambiguity, fragmentation
• User I/O may need a whole family of user-dialogue services, similar to networking, file management, or process control.
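A minimal Python sketch of the "announce the action & wait 1 sec" pattern from Possible Changes II. The cancel_requested hook is a hypothetical stand-in for a real "Oops"/"Cancel" input channel, and the 1-sec default follows the slide.

    import time

    def announce_and_run(description, action, cancel_requested, delay=1.0):
        """Announce an action, wait a courtesy delay, then run it unless cancelled."""
        print(f"About to {description}; say 'Oops' to cancel")
        time.sleep(delay)              # the 1-sec courtesy window replaces a Yes/No dialog
        if cancel_requested():         # would be wired to an "Oops"/"Cancel" channel
            print("Cancelled:", description)
        else:
            action()

    announce_and_run("delete 3 slides",
                     lambda: print("3 slides deleted"),
                     cancel_requested=lambda: False)

Unlike a modal "Yes/No" box, the announcement costs the user nothing when the system guessed right, and a single cancel word recovers the case when it guessed wrong.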