Out of Context: Computer Systems That Adapt To, and Learn From, Context

Henry Lieberman and Ted Selker
Media Laboratory, Massachusetts Institute of Technology
{lieber, selker}@media.mit.edu

Abstract

There is a growing realization that computer systems will increasingly need to be sensitive to their context. Traditionally, hardware and software were conceptualized as input-output systems: systems that took input explicitly given to them by a human, and acted upon that input alone to produce an explicit output. This view is now seen as too restrictive. Smart computers, intelligent agent software, and digital devices of the future will have to operate on data that is not explicitly given to them, data that they observe or gather for themselves. These operations may be dependent on time, place, weather, user preferences, or the history of interaction. In other words, context. But what exactly is context? We'll look at perspectives from software agents, sensors and embedded devices, and contrast them with traditional mathematical and formal approaches. We'll see how each treats the problem of context, and discuss the implications for the design of context-sensitive hardware and software.

Why is context important?

We are in the midst of many revolutions in computers and communication technologies: ever faster and cheaper computers, software with more and more functionality, and embedded computing in everyday devices. Yet much about the computer revolution is still unsatisfactory. Faster computers do not necessarily mean more productivity. More capable software is not necessarily easier to use. More gadgets sometimes cause more complications. What can we do to make sure that the increased capability of our artifacts actually improves people's lives?

Several sub-fields of computer science propose paths to a solution. The field of Artificial Intelligence tells us that making computers more intelligent will help. The field of Human-Computer Interaction tells us that more careful user-centered design and testing of direct-manipulation interfaces will help. And indeed they will. But in order for these solutions to be realized, we believe that they will have to grapple with a problem that has previously been given short shrift in these and other fields: the problem of context.

We propose that a considerable portion of what we call "intelligence" in Artificial Intelligence or "good design" in Human-Computer Interaction actually amounts to being sensitive to the context in which the artifacts are used. Doing "the right thing" entails that it be right given the user's current context. Many of the frustrations of today's software -- cryptic error messages, tedious procedures, and brittle behavior -- are often due to the program taking actions that may be right given the software's assumptions, but wrong for the user's actual context. The only way out is to have the software know more about, and be more sensitive to, context.

Many aspects of the physical and conceptual environment can be included in the notion of context. Time and place are obvious elements of context. Personal information about the user is part of context: Who is the user? What does he or she like or dislike? What does he or she know or not know? History is part of context: What has the user done in the past? How should that affect what happens in the future? Information about the computer system and connected networks can also be part of context.
We might hope that future computer systems will be self-knowledgeable -- aware of their own context. Notice how little of today's software takes any significant account of context. Most of today's software acts exactly the same regardless of when and where and who you are, whether you are new to it or have used it in the past, whether you are a beginner or an expert, whether you are using it alone or with friends. But what you may want the computer to do could be different under all those circumstances. No wonder our systems are brittle.

What is context? Beyond the "black box"

Why is it so hard for computer systems to take account of context? One reason is that, traditionally, the field of Computer Science has taken a position that is antithetical to the context problem: the search for context-independence. Many of the abstractions that computer science and mathematics rely on -- functions, predicates, subroutines, I/O systems, and networks -- treat the systems of interest as black boxes. Stuff goes in one side, stuff comes out the other side, and the output is completely determined by the input.

Figure 1. The traditional computer science "black box": explicit input flows into the application, explicit output flows out.

We would like to expand that view to take account of context as an implicit input and output to the application. That is, the application can decide what to do based not only upon the explicitly presented input, but also on the context, and its result can affect not only the explicit output, but also the context. Context can be considered to be everything that affects the computation except the explicit input and output. Context is:

• State of the user
• State of the physical environment
• State of the computational environment
• History of user-computer-environment interaction

Figure 2. Context is everything but the explicit input and output.

And, in fact, even this diagram is too simple. To be more accurate, we should actually close the loop, bringing the output back to the input. This acknowledges the fact that the process is actually iterative: state that is both input to and generated by the application persists over time and constitutes a feedback loop.

One consequence of this definition of context is that what you consider context depends on where you draw the boundary around the system you are considering. This affects what you will consider explicit and what you will consider implicit in the system. When talking about human-computer interfaces, the boundary seems relatively clear, because the boundary between human and computer action is sharp. Explicit input given to the system requires explicit user interface actions -- typing and/or menu or icon selection in response to a prompt, or at the time the user expects the system's actions to occur. Anything else counts as context -- history, the system's use of file and network resources, time and place if they matter, and so on. If we're talking about an internal software module, or the software interface between two modules, it gets less clear what counts as context, because that depends on what we consider "external" to that particular module. Indeed, one of the moves that many computer scientists make to deal with troublesome aspects of context is "reification" -- redrawing the boundaries so that what was formerly external to a system becomes internal. The lesson is to always be clear about where the boundaries of a system are.
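To make Figures 1 and 2 concrete, here is a minimal sketch in Python of the two models. All of the names in it (Context, black_box_app, context_aware_app) are our own illustrative inventions, not APIs from any system discussed in this paper.

from dataclasses import dataclass, field

@dataclass
class Context:
    # Everything that affects the computation except the explicit I/O.
    user_state: dict = field(default_factory=dict)         # preferences, expertise
    physical_env: dict = field(default_factory=dict)       # time, place
    computational_env: dict = field(default_factory=dict)  # network, files
    history: list = field(default_factory=list)            # past interactions

def black_box_app(explicit_input):
    # Figure 1: the output is completely determined by the explicit input.
    return explicit_input.upper()

def context_aware_app(explicit_input, context):
    # Figure 2: the output depends on the explicit input AND the context,
    # and the application also updates the context, closing the loop.
    hour = context.physical_env.get("hour", 12)
    greeting = "Good evening" if hour >= 18 else "Hello"
    name = context.user_state.get("name", "user")
    output = greeting + ", " + name + ": " + explicit_input
    context.history.append((explicit_input, output))  # implicit output
    return output

ctx = Context(user_state={"name": "Ana"}, physical_env={"hour": 20})
print(black_box_app("status report"))          # same output in any context
print(context_aware_app("status report", ctx))
print(len(ctx.history))                        # the context has grown

The same explicit input yields different behavior as the context changes, and each call leaves a trace in the history; that persisting state is the feedback loop of the closed diagram.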
Anything outside that boundary is context, and it can never be made to go away completely.

The Context-Abstraction Tradeoff

The temptation to stick to the traditional black box view comes from the desire for abstraction. Mathematical functions derive their power precisely from the fact that they ignore context, so they can be assumed to work correctly in all possible contexts. Context-free grammars, for example, are simpler than context-sensitive grammars and so are preferable if they can be used to describe a language. Side-effects in programming languages are changes to, or dependencies on, context, and are shunned because they thwart repeatability of computation. Thus, there is a tradeoff between the desire for abstraction and the desire for context sensitivity. We believe that the pendulum has now swung too far in the direction of abstraction, and work in the near future should concentrate more on re-introducing context sensitivity where it is appropriate. Since the world is complex, we often adopt a divide-and-conquer strategy at first, assuming the divided pieces are independent of each other. But a time comes when it is necessary to move on to understanding how each piece fits in its context.

The reason to move away from the black box model is that we would like to challenge several of the assumptions that underlie it. First, the assumption of explicit input. In user interfaces, explicit input from the user is expensive; it slows down the interaction, interrupts the user's train of thought, and raises the possibility of mistakes. The user may be uncertain about what input to provide, and may not be able to provide it all at once. Everybody is familiar with the hassle of continually re-filling out forms on the Web. If the system can get the information it needs from context [stored somewhere else, remembered from a past interaction], why ask you for it again? Devices that sense the environment and use speech or visual recognition may act on input that is not explicitly indicated by the user. Therefore, in many user interface situations, the goal is to minimize input explicitly provided by the user.

Similarly, explicit output from a computational process is not always desirable, particularly because it places immediate demands on the user's attention. Hiroshi Ishii [Wisneski et al. 98] and others have worked on "ambient interfaces", where the output is a subtle change in barely noticeable environmental factors such as lights and sounds, the goal being to establish a background awareness rather than force the user's attention to the system's output.

Finally, there is the implicit assumption that the input-output loop is sequential. In practice, in many user interface situations, input and output may be going on simultaneously, or several separate I/O interactions may be overlapped. While traditional command-line interfaces adhered to a strict sequential conversational metaphor between the user and the machine, graphical interfaces and virtual reality interfaces can have many user and system elements active at once. Multiple agent programs, groupware, and multiprocessor machines can all lead to parallel activity that goes well beyond the sequential assumptions of the explicit I/O model.

Putting context in context

So, given the above description of the context problem, how do we make our systems more context-aware? Two parallel trends, in the hardware and software worlds, make this transformation increasingly urgent.
On the hardware side, shrinking computation and communication hardware and cheaper sensors and perceptual technologies have made embedding computing in everyday devices more and more practical. This gives devices the ability to sense the world around them and to act upon that information. But how? Devices can easily get overwhelmed with sensory data, so they must figure out which data is worth acting on and/or reporting to the user. That is the challenge we intend to meet with context-aware computing.

On the software side, we view the movement towards software agents [Bradshaw 97], [Maes 94] as trying to reduce the complexity of direct-manipulation screen-keyboard-and-mouse interfaces by shifting some of the burden of dealing with context from the human user to a software agent. As these agent interfaces move off the desktop, and small hardware devices take on a more decision-making and proactive role, we see the convergence of these two trends. Discussion of context-aware systems as an industrial design stance can be found in the companion paper [Selker and Burleson 2000], which also details some additional projects in augmenting everyday household objects with context-aware computing.

In the next sections of this paper, "Context for User Interface Agents" and "Context for Embedded Computing", we detail several of our projects in these areas for which we believe the context problem to be a motivating force. These projects serve as case studies of how to deal with the context problem at a practical application level, and illustrate the techniques and problems that arise. We then broaden our view to survey, very quickly, some positions that other fields have taken on the context problem, particularly traditional approaches in AI and mathematical logic. Sociology, linguistics, and other fields have also dealt with the context problem in their own ways, and although we cannot treat these fields exhaustively here, an overview of the various perspectives is helpful in situating our work before we conclude.

Context for User Interface Agents

The context problem has special relevance for the new generation of software agents that will soon be both augmenting and replacing today's interaction paradigm of direct-manipulation interfaces. We tend to conceptualize a computer system as being like a box of tools, each tool specialized to do a particular job when it is called upon by the user. Each menu operation, icon, or typed command can be thought of as a tool. Computer systems are now organized around so-called "applications", or collections of these tools that operate on a structured object, such as a spreadsheet, drawing, or collection of e-mail messages.

Each application can be thought of as establishing a context for user action. The application says what actions are available to the user and what they can operate upon. Leaving one application and entering another means changing contexts -- you get a different set of actions and a different set of objects to operate on. Each tool works only in a single context and only when that particular application is active. Any communication of data between one application and another requires a stereotypical set of actions on the part of the user [Copy, Switch Application, Paste]. One problem with this style of organization is that many things the user wishes to accomplish are not implementable completely within a single application.
For example, the user may think of "arrange a trip" as a single task, but it might involve use of an e-mail editor, a Web browser, a spreadsheet, and other applications. Switching between them, transferring relevant data, worrying about things getting out of sync, differences in command sets and capabilities between applications, remembering where you were in the process, and so on, soon get out of hand and make the interface more and more complex. If we insist on maintaining the separation of applications, there is no way out of this dilemma.

How do we deal with this in the real world? We might delegate the task of arranging a trip to a human assistant, such as a secretary or a travel agent. It then becomes the job of the agent to decide what context is appropriate, what tools are necessary to operate in each context, and what elements of the context are relevant at any moment. The travel agent knows that we prefer an aisle seat and how to select it using the airline's reservation system, whether we've been cleared for a wait-listed seat, how to lower the price by changing airline or departure time, and so on. It is this kind of job that we are going to have to delegate more and more to software agents if we want to maintain simplicity of interaction in the face of the desire to apply computers to ever more complex tasks.

Agents and user intent

The primary job of the agent is to understand the intent of the user. There are only two choices: either the agent can ask the user for their intent, or the agent can infer the user's intent from context. Asking the user is fine in many situations, but explicit queries for everything soon get tiring for the user. We rely on our human agents to learn from past experience and to be able to "pick up" information they need from context. We expect human agents to be able to "piece together" partial information that comes from different sources and different times in order to solve a problem. Today's software doesn't learn from past interactions, always asks explicit questions, and can only deal with information explicitly presented to it when it is ready to receive it. This will have to change if we are to make computers ever more useful.

Getting context-sensitivity right is no easy task. It is particularly risky because if you get it wrong, it becomes very noticeable and annoying to the user. As in any first-generation technology, there will be occasional misfeatures. As an example, the feature in Microsoft Word that automatically capitalizes the first word of a sentence can occasionally get it wrong when you type a word following an abbreviation. By itself, that's not so bad. But it gets even more frustrating as the user tries to undo the "correction" and is repeatedly "re-corrected". One can take this as an argument not to do the correction at all. But perhaps the cure is more context-sensitivity rather than less. The system could notice that its suggestion was rejected by the user, and possibly also note the abbreviation so its performance improves in the future.

Instructibility and generalization from context

All we have to start with for learning, in humans as well as in machines, is concrete experience in specific situations. For that knowledge to be of any use, it has to be generalized, and so generalization is the key problem for an agent inferring the intent of the user.
Generalization means remembering what the user did, and removing some of the details of the particular context so that the same or an analogous experience will be applicable in different situations. This involves an essential tradeoff: a conservative approach sticks closely to the concrete experience, and so achieves increased accuracy at the expense of restricting applicability to only those situations that are very similar to the original. A liberal approach tries to do as much abstraction as possible, so that the result will be widely applicable, but at the increased risk of not being faithful to the user's original intentions.

We'll illustrate this relationship between generalization and context by talking about several projects that try to make software agents instructible, using the technique of Programming by Example [Cypher 93], sometimes also called Programming by Demonstration. This technique couples a learning agent to a conventional direct-manipulation graphical interface, such as a text or graphic editor. The agent records the actions performed by the user in the interface, and produces a generalized program that can later be applied to analogous examples. Authors in this field have noted the "data description problem", which amounts to the problem of how to use context to decide how much to generalize the recorded program. The system often has to choose whether to use extensional descriptions [describing the object according to its own properties] or intensional descriptions [describing the object according to its role or relationships with other objects].

Sometimes, knowledge of the application domain provides enough context to disambiguate actions. In the programming by example graphical editor Mondrian [Lieberman 93], the system describes objects selected by the user according to a set of graphical properties and relations. These relations are determined by extensional graphical properties [top, bottom, left, right, color] and by intensional properties of the object's role in the demonstration [an object designated as an example might be described as "the first argument to the function being defined"]. We provide the user with two different interactive ways to establish a context: graphically, via attaching graphical annotations to selected parts of the picture [Lieberman 93], or by speech input commands that advise the software agent "what to pay attention to" [Stoehr and Lieberman 95].

[Figure 3 combines speech input ("Maintain height") and mouse input (click at (27, 52); click at (112, 52) and drag to (149, 217)) with the recorded demonstration to produce the generalized description: "Draw a rectangle from the left top corner of the first argument, to a point 1/3 of the width of the first argument, and whose height is 165 pixels".]

Figure 3. Mondrian generalizes according to graphical context, user advice context, and demonstration context.

Note that neither the user's verbal instruction alone nor the graphical action alone makes sense "out of context". It is only when the system interprets the action in the context of the demonstration and the context of the user's advice that the software agent has enough information to generalize the action. User advice to a system thus forms an important aspect of context. It can be given either during a demonstration, as in Mondrian, or afterward. Future systems will necessarily have to give the user the ability to critique the system's performance after the fact.
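To make the data description problem concrete, here is a minimal sketch in Python. It is our own illustration under assumed names, not code from Mondrian: the same recorded selection can be described conservatively (extensionally, by its own properties) or liberally (intensionally, by its role in the demonstration).

def extensional_description(obj):
    # Conservative: describe the object by its own concrete properties.
    # Accurate, but only matches objects very similar to the original.
    return {"kind": obj["kind"], "left": obj["left"], "top": obj["top"],
            "color": obj["color"]}

def intensional_description(obj, demo_role):
    # Liberal: describe the object by its role in the demonstration.
    # Widely applicable, but risks straying from the user's intent.
    return {"kind": obj["kind"], "role": demo_role}

rect = {"kind": "rectangle", "left": 27, "top": 52, "color": "blue"}
print(extensional_description(rect))
print(intensional_description(rect, "first argument to the function being defined"))

A recorded program built from the first description replays only on near-identical pictures; one built from the second generalizes to any object playing the same role. The same choice arises when interpreting user critiques.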
User critiques will serve to debug the system's performance, and serve as a primary mechanism for controlling the learning behavior of the system. The ability to accept critiques will also increase user satisfaction with systems, since there will be less pressure on "getting it right the first time". Notice that making use of critiques also involves a generalization step. When the user says "don't do that again", it is the responsibility of the system to figure out what "that" refers to, by deciding which aspects of the context are relevant. The ability to modify behavior based on critiques is an essential difference between genuine human dialog and the rigid "dialog boxes" offered by today's computer interfaces.

A different kind of use of user context is illustrated by the software agent Letizia [Lieberman 97]. Letizia implements a kind of observational learning, in that it records and analyzes user actions to heuristically compute a profile of user interests. That user profile is then used as context in a proactive search for information of interest to the user. Letizia tracks a user's selections in a Web browser and does a "reconnaissance" search to find interesting pages in the neighborhood of the currently viewed page. In this agent, it is analysis of the user's history that provides the context for anticipating what the user is likely to want next. Letizia shows that there is a valuable role for a software agent in helping the user identify intersections of past context [browsing history] with current context [the currently viewed Web page and other pages a few links away from it].

[Figure 4 shows the agent learning a profile of user interests: Broker (17.35), Investment (8.5), Merrill (7.22), ...]

Figure 4. Letizia helps the user intersect past context [history of browsing concerning investment brokers] with current context ["Smart Money"] to find "Broker Ratings".

Letizia illustrates a role for software agents in helping the user deal with "context overload". In situations such as browsing the Web, so much information is potentially relevant that the user is overwhelmed by the task of finding out what elements of the context are actually relevant. It is here that the software agent can step in and make the first cut, heuristically trying to guess which of the available resources might be relevant, and putting the resources most likely to be relevant at the user's fingertips.

A simple way to deal with the profusion of possible interpretations of context is for the system to compute a set of plausible interpretations and let the user choose among them. This is therefore a way for the user to give advice about context to the system. In Grammex ["Grammars by Example"] [Lieberman, Nardi and Wright 98], the user can teach the system to recognize text patterns by presenting examples, letting the system try to parse them, and then interacting with the system to explain the meaning of each part. Text patterns such as e-mail addresses, chemical formulas, or stock ticker symbols are often found in free text. Apple's Data Detectors [Nardi et al 98] provides an agent that uses a parser to pick such patterns out of their embedded textual context and apply a set of predetermined actions appropriate to the type of object found. For each text fragment, Grammex heuristically produces a menu containing all the plausible interpretations of that string in its context.
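As a rough sketch of the flavor of this computation -- our own simplification in Python, not Grammex's actual interactive grammar builder -- consider enumerating the plausible generalizations of a token, from most conservative to most liberal:

import re

def plausible_interpretations(token):
    # Candidate generalizations, from most conservative to most liberal.
    candidates = [("exactly this string", re.escape(token))]
    if re.fullmatch(r"[A-Za-z]+(\.[A-Za-z]+)+", token):
        candidates.append(("words separated by periods",
                           r"[A-Za-z]+(\.[A-Za-z]+)+"))
        # The recursive host-name rule is flattened to a regex for brevity.
        candidates.append(("a host name: a word, or a word followed by a host name",
                           r"[A-Za-z]+(\.[A-Za-z]+)*"))
    return candidates

for description, pattern in plausible_interpretations("media.mit.edu"):
    print(description, "->", pattern)
# The user then confirms which interpretation captures their intent.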
For example, the string "media.mit.edu" could mean exactly that string, a string of words separated by periods, or a recursive definition of a host name. Grammex illustrates how a software agent can assist a user by computing a set of plausible contexts, then asking the user to confirm which one is correct.

Figure 5. Picking an e-mail address out of context and applying an action to it [top]. Teaching the system how to recognize an e-mail address using Grammex [bottom].

Emacs Menus [Fry 97] is a programming environment that sits in a text editor and analyzes the surrounding text to infer a context in which a popup menu can suggest plausible completions. If you're in a context where it's plausible to type a variable, the system can read your program and supply you with the names of all the variables that would make sense at that point. This expands the notion of context to mean not just "what's in the neighborhood right now" but, more generally, "what's typically in a neighborhood like this". That significantly improves the system's usefulness and perceived intelligence.

Models of context: User, Task and System Models

All computer systems have some model of context, even if it is only implicit. The computer "knows" what instructions it can process at each stage. It knows what input it expects, and in what order, and it knows what error messages to issue when there is a problem. Historically these have been static descriptions, represented by files, internal data structures, or encoded procedurally, and used only for a single purpose. The computer's models take the form of a description of the system itself, a system model; of the user's state, history and preferences, a user model; and of the goals and actions intended to be performed by the user, a task model. To create context-aware applications, user, task and system models are best made dynamic, with the ability to explain themselves.

Computer users always hold ideas of what the system is, and what they can do with it. Part of the usual system model is that if I start the right programs and type the right things, the computer will do things for me. Some users understand advanced concepts like client-server systems or disk swap space; others don't. The system's model may change [for example, a new version of a browser might integrate e-mail]. The user may or may not be aware of this integration. Historically, computer system models are implicitly held in function calls that expect other parts of the system to be there. Better for contextual computing are systems that represent a system model explicitly, and try to detect and correct differences between the user's system model and what the system can actually do. This is analogous to "naïve physics" in physics education, where we help people understand how what they think they know, wrong as it may be, affects their ability to reason about the system. A dynamic system model could be queried about what the system can do, and could even change its responses as the system is crashing or being upgraded.

Norman [Norman 83] stressed that users frequently have models of what the system can do that are incomplete, sometimes intentionally so. They adopt strategies that are deliberately suboptimal in order to defensively protect themselves against the possibility or consequences of errors. Discrepancies between a system's assumption that the user has perfect knowledge of commands and objects, and the user's actual partial understanding, can lead to problems.
This further argues for dynamic and explanatory models, so that the user and system can come to a shared understanding of the system's capabilities.

A user's task model is always changing. Suppose the user believes they are finishing the easy part of their homework and need only a simple calculation; when the result differs from what they expect, their task model should change. Likewise, the computer has a notion of what we might do with it (e.g., "a user would never turn me off without shutting down…"). Typical graphical help systems, such as wizards, use static task models. These typically assume that the user will always do things in a certain way. The wizards in Microsoft Windows take a user through a linear procedure; any departure from the shown procedure is not explained. So wizards don't help the user learn to generalize from a specific situation. Better approaches can actually use the context to teach the user concepts that improve their understanding and future performance. The COACH system (discussed below) has a taxonomy of the tasks a user might use a particular computer for; this taxonomy allows the system to have a dynamic task model. Sunil Vemuri [Vemuri 99] is building "What Was I Thinking?", which records segments of speech and, without necessarily completely understanding the speech, maps the current segment onto a similar segment that occurred in the recent past. This system expects that users have different tasks. Computer programs that anticipate changing uses are more context aware.

The computer also has a model of what it thinks the user can do, the user model. In most cases this model is that the user knows all of the system's commands and should enter them correctly. Explicit user models have been proposed, and sometimes used, for a long time. The Grundy system [Rich 83] used a stereotype, a list of user attributes like age, sex and nationality, to help choose books in a library. Such a stereotype is an example of the kind of user model that can allow a computer to take user context into account. "Do What I Mean", or DWIM [Lewis and Norman 87], incorporated a system model that changed as a person wrote a program, keeping track of the program variables and functions they had written. DWIM would notice that a person was typing a function name that wasn't defined. It would act as though it believed that the person's intended task was to type a known function name, and would look for a similar defined name. DWIM used a dynamic user model to search through user-defined words for possible spelling analogs that might be intended. Unfortunately, DWIM also embodied some bad interface decisions that led users to largely reject it; these have since been corrected in modern successors, which interactively display suggested corrections and permit easy recovery from misguesses. Early in the 1990s, Charles River Systems' Open Sesame tracked user actions in the Mac operating system and offered predictive completion of operation sequences, such as opening certain windows after opening a particular application.

However, just having a user model does not ensure an improved system. In the 1970s and 80s, Sophie [Brown et al. 75] and other systems attempted to drive an electronic teaching system from an expert user model. It was found that novice users could not be modeled as experts with some knowledge missing. The things that a user needed to know had more to do with the users themselves than with the expert they might become [Sleeman and Brown 82].
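Before turning to COACH, here is a minimal sketch in Python of the two mechanisms just described. It is our own illustration, not code from DWIM or any adaptive help system: a DWIM-style search for spelling analogs among user-defined names, plus a simple dynamic user model that records which constructs the user has used.

from collections import Counter
from difflib import get_close_matches

class DynamicUserModel:
    def __init__(self):
        self.defined_names = set()     # functions/variables the user defined
        self.usage_counts = Counter()  # how often each construct was used

    def define(self, name):
        self.defined_names.add(name)

    def record_use(self, name):
        self.usage_counts[name] += 1

    def dwim_correct(self, typed_name):
        # If typed_name is unknown, suggest a similar user-defined name.
        if typed_name in self.defined_names:
            return typed_name
        matches = get_close_matches(typed_name, self.defined_names, n=1)
        return matches[0] if matches else None

model = DynamicUserModel()
model.define("compute_total")
model.record_use("compute_total")
# The user mistypes a function name; the model suggests the analog.
print(model.dwim_correct("compute_totl"))   # -> "compute_total"

A help system could consult the usage counts to decide whether the user is a novice or an expert with a given construct, and tailor its explanations accordingly.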
The Cognitive Adaptive Computer Help, or COACH, system [Selker 94] uses adaptive task, user and system models to improve the kind of help that can be given to users. The system was shown to be able to use such contextual information to give users help that improved their ability to learn to program. As well as maintaining a dynamic system model, COACH keeps track of user experience and expertise in deciding what help to give. The system records which constructs a user has used and how often. It notes whether the user's command was accepted by the computer, and adds usage examples and error examples from other users' experiences. COACH also adds user-defined constructs to the system model, so that the user explicitly teaches the computer.

"Context dependent help" has traditionally referred to help that is relevant to the commands and knowledge active at the moment help is brought up on the computer. Relative to our notions of context, such help systems are using a system model to make help appropriate to the situation at the time of the query. Not surprisingly, such context dependent, or integrated, help has been shown to improve users' ability to utilize it [Borenstein 85].

COACH is a proactive adaptive help system that explains procedures to the user in terms of the user's own context. In contrast to help systems or wizards that use one example for one problem, COACH explains its procedures using the present context. COACH was implemented first for teaching any programming language that could be typed through an Emacs-like editor or C shell command line. A user study demonstrated that COACH's adaptive contextual help could improve students' ability to learn and write LISP. COACH has models of each task at the novice, intermediate, professional and expert level.

Later versions of COACH were deployed as OS/2 Smart Guides. These used graphical, animated, and audio commentary to teach users many of the important GUI interaction techniques. In the illustration below, COACH demonstrates a drag interaction. We call this image a "slug trail"; it leaves behind the important visual context showing when to press, move, and lift up on the mouse. Another technique developed for COACH is the "mask", a graphical annotation creating a see-through grid that highlights important selectable things in the context. COACH adaptively creates the mask on the working GUI.

Figure 6. COACH explains operations in terms of the user model ["Level 1" at right], system model [masking disabled text at left], and task model [icon acted upon and state of mouse at right].

Context for Embedded Computing

The toilet flushes when we walk away from it. The clock tells us that it's time to get up. These are examples of context awareness without a user needing to type anything into a computer. When computers can sense the physical world, we might dispense with much of what keyboards and mice are used for. Using knowledge of what we do, what we have done, where we are, and how we feel about our actions and environment is becoming a major part of the user interface research agenda.

People say many things… it's what they do that counts

One of the obvious advantages of context-aware computing is that it does not rely on the symbolic. Symbolic communication must be interpreted through language; communication through context focuses on what we actually do and where we are. People often communicate multiple messages simultaneously, which might be hard to separate.
Messages often communicate things about ourselves (age, health, sleepiness, social interest in each other, other things on our minds, the priority of the message, level of organization, level in the organization, social background, etc.) as well as their ostensible content. People's speech acts tend to be full of errors. We communicate what we think should be interpretable, but often underspecify the utterance. Speech is by nature detached from the physical things it talks about (people, places, things); without feedback it is hard to check one's point of view. Our logic, or the correspondence between what we are saying and the thing we are talking about, is often flawed. The modality of verbal communication is often less dense than direct observation of physical acts. Describing what part of something should be observed or manipulated in a particular way can be quite cumbersome compared to actually doing it and having an observer watch. Some things are easier done than said.

Things in your head move to things in your life

With the advent of ever smaller, faster, and cheaper computing and communications, computing devices will become embedded in everyday objects: clothing, walls, furniture, kitchen utensils, cars, and many different kinds of handheld gadgets that we will carry around with us. Efforts such as the Media Lab's Things That Think and Wearable Computing projects, Xerox PARC's Ubiquitous Computing [Weiser 91], Motorola's Digital DNA, and others are aimed towards this future. It is our hope that these devices will enhance our lives and not just add an annoyance factor, but the outcome will depend upon whether the devices can take action that is appropriate to the context in which they are used.

Everybody finds cell phones a convenience until they ring inappropriately at a restaurant. Phone companies bristle at the fact that people habitually keep cell phones turned off to guard against just this sort of intrusion. Of course, the vibrating ring is one simple solution that doesn't require context understanding, and it illustrates how a simple context-free design can sometimes work. But even the vibrating ring puts you in a dilemma about whether to answer or not. Potentially, a smarter cell phone of the future could have a GPS system, know that it is in a restaurant, and take a message instead. Or it could respond differently to each caller, knowing which callers should be put through to its owner immediately and which can be deferred.

Context will be useful in cutting down the interface clutter that might otherwise result from having too many small devices, each with its own interface. Already, the touch-tone interface of a common office phone is so overloaded with features that most users have difficulty. Specializing "information appliance" devices to a particular task, as recommended by Norman [Norman 98], simplifies each device, but leaves the poor user with a proliferation of devices, each with its own set of buttons, display, power supply, user manual and warranty card. As we have noted above, context can be a powerful factor in reducing user input, in embedded computing devices as well as desktop interfaces. Interfaces for physical devices put different constraints on user interaction than screen interfaces do. Display real estate is small, if there is any display at all, and space for buttons or other interaction elements is also restricted, if indeed there are any at all. Users need to keep their attention focused on the real-world task, not on interacting with the device.
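Returning to the smarter cell phone imagined above: here is a minimal sketch in Python of the kind of ring policy such a phone might apply. The location sensing, the caller priority list, and the thresholds are all hypothetical illustrations of our own, not a description of any real phone.

QUIET_PLACES = {"restaurant", "theater", "meeting"}

def ring_policy(location, caller, vip_callers, recent_deferred):
    # Decide how the phone should announce an incoming call.
    if location not in QUIET_PLACES:
        return "ring"
    if caller in vip_callers:
        return "vibrate"                 # important caller: interrupt gently
    if recent_deferred.count(caller) >= 2:
        return "vibrate"                 # repeated attempts may signal urgency
    return "take a message"              # default behavior in quiet places

deferred = ["555-0100", "555-0100"]
print(ring_policy("restaurant", "555-0199", {"555-0123"}, deferred))  # take a message
print(ring_policy("restaurant", "555-0123", {"555-0123"}, deferred))  # vibrate
print(ring_policy("office", "555-0199", {"555-0123"}, deferred))      # ring

Even this toy policy combines three kinds of context -- place, user preferences, and interaction history -- none of which the caller or the owner has to supply explicitly at call time.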
Transcription: Translating context into action

Perhaps the greatest potential of embedded sensory and distributed computing is the possibility of eliminating many of the transcription tasks that are otherwise foisted upon users. Transcription occurs when the user must manually provide as input some data which could otherwise be collected or inferred from the environment. Transcription relies on the user to perform the translation, and this simple act frequently introduces errors. Mitch Stein and Krishna Nathan promoted the ideas that led to the CrossPad product in response to the handwritten-notes-to-typing transcription that people often perform. Various approaches to reducing the transcription overhead, with the CrossPad and other products, were evaluated by [Landay and Davis 99]. Transcription can also take the form of translating a goal that the user holds intuitively into a formal language that the computer understands, the most extreme example being the translation of procedural goals into a programming language. The difficulty of learning new computer interfaces, and of programming, is largely traceable to the cognitive barriers imposed by this kind of transcription.

Smart devices are becoming interfaces to computational elements that sense and remember aspects of the surrounding context to improve their relationship with people. They will be able to listen to and recognize speech input, perform simple visual recognition, and perhaps even sense the emotional state of the user, via sensors being developed in Rosalind Picard's Affective Computing project [Picard 97]. Within a very short time, desktop software, including Internet access software, will record parts of its human interface, and sensors will record people's actions in offices, labs and other environments. The challenge for the software will be to decide what parts of the context are relevant.

At the MIT Media Lab's Context-Aware Computing Lab, we have developed several interfaces that illustrate the power of automatically sensing context to eliminate unnecessary transcription tasks. These divide into two categories. The first uses context-aware devices to augment the static environment, such as intelligent furniture with embedded computing and interaction capabilities. The second augments the users themselves, via wearable, portable or attachable devices.

The simplest project in the area of intelligent furniture is the Talking Couch, developed at IBM Almaden's USER group. When a couch is positioned in a lobby, it often serves the purpose of inviting a visitor to take a break while waiting for something to happen. Usually magazines are set on the tabletop, or a TV might be on in the corner. The digital couch does more -- it orients the visitor, suggesting what they could do during their break. The couch speaks to its occupant when they sit down; it informs them about what is going on, what time it is, and what they might like to do. It says when the conference it was designed for has scheduled its next break, when the cafeteria might be open, and who the next speaker is. If the occupant is carrying a Personal Digital Assistant (PDA) made for it, the couch further points out specific user-model-based information that could be relevant to that user: "It is always good to take a break; you have three things you said you wanted to work on when you had time…" Another message that the system gives its occupant is "In 15 minutes you have to give a talk in the auditorium…"
The first, generic reminder message and the more timely talk reminder illustrate how being reminded can be either useful or irritating. The talking couch creates a user model of preferences and ambitions. Without the net-connected PDA, the couch works with a dynamic system model of the place it is in. It creates a task model which assumes that a person sitting on the couch would like to be oriented to things going on in the area. With the PDA, it adds to this a schedule-based model of the user.

Another project instruments a bed with computing capabilities, mounting a projection screen above the bed. When you get in bed, you expect a calm, relaxing break. An alarm clock that looks like a sunrise on the ceiling might be nice, especially if you could change the time of sunrise to when you get up. Wouldn't it be nice to go to sleep with the stars in the sky? How about a constellation game to put you to sleep? If you play it too long, should it ask if you want to get up later? Reading off the ceiling means you don't have to prop yourself up on your elbows or a pillow. A multimedia bed provides contextually appropriate content and services: slowly waking up, reading, entertainment, gesture recognition, and postural correction and awareness.

Figure 7. The electronic bed.

Context can augment the mobile user rather than the static environment. An electronic oven mitt and trivet we call the Talking Trivet uses context to transform a thermometer into a cooking safety watchdog. The talking trivet uses task models of the temperatures encountered by an oven mitt to decide how to communicate with a cook about the thing it is in contact with. The talking trivet is a digital enhancement of a common oven mitt / hot pad. Sensing and memory in an oven mitt make it a better tool than a simple thermometer reading. The oven mitt uses a computer to take time and temperature into account in telling a user whether food is in need of re-warming (under 90 degrees F), hot and ready to eat, ready to take out (food hotter than boiling water will dry out, and browning starts soon after), or on fire (above 454 degrees F). The model is key to the value of the temperature reading: responding appropriately to the importance of what the mitt is sensing could prevent a kitchen fire.

The uses of a hot pad are simple, and the goals of the user are often obvious from its temperature. The talking trivet could well be in a better position than a person to know when to take a pizza out of the oven: it measures the temperature of the oven when the 72 degree pizza is put into the 550 degree oven, and its model of pizza lets it tell the user when the pizza should be done. If the user touches the talking trivet to a pan that is only 100 degrees after 10 minutes, it should use its contextual model to infer that the contents must be much more massive than a pizza, and express alarm so that the user doesn't burn their roast under the 550 degree broiler. This example is included to underscore the value of a task model in a contextual object such as the talking trivet. The trivet need not be "told" anything explicit. It can act as a fire alarm, cooking coach and egg timer based only on what it experiences and its models of cooking and the kitchen.

Figure 8. Talking Trivet.

The view of context from other fields

Mathematical and formal approaches to AI

Many other fields have treated the problem of context. In what follows, we present a little of our perspective on how some of them have viewed it.
First, several areas of mathematics and formal approaches to Artificial Intelligence have tried to address context in reasoning. When formal axiomatizations of common sense knowledge were first used as tools for reasoning in AI systems, it quickly became clear that they could not be used blindly. Simple inferences like "if Tweety is a bird, then conclude that Tweety can fly" seemed plausible until you considered the possibility that Tweety might be a penguin or an ostrich, a stuffed bird, an injured bird, a dead bird, etc. It would be impossible to enumerate all the contingencies that would make the statement definitive. McCarthy [McCarthy 80] introduced the idea of circumscription as a way to contextualize axiomatic statements. Like many of the formal approaches to the problem, this technique amounts to saying that each logical predicate takes an extra argument representing the context; the notation tries to make this extra argument implicit, so as to avoid complicating proofs that use the technique. The best you can say, then, for commonsense reasoning, is to assert a statement like "if X is a bird, then assume X can fly, unless something in the context explicitly prevents it". This is quite a hedge!

In Artificial Intelligence, researchers have identified the so-called "frame problem". In planning and robotics systems that deal with sequences of actions, each action is typically represented as a function transforming the state of the world before the action to the state after the action. The frame problem is to determine which statements that were true before the action remain true after it -- that is, how the action affects and is affected by its context. Solving the frame problem requires making inferences about relevance and causal chains.

In traditional mathematical logic, statements proven true remain true forever, a property called monotonicity. However, the addition of context changes that, since if we learn more about the context [for example, we learn that Tweety recently died], we might change our mind. Nonmonotonic logic studies this phenomenon. A standard method for dealing with nonmonotonicity in AI systems is the so-called "truth maintenance system", which records dependencies among inferences, and has the capability of retracting assertions if all the assumptions upon which they rest become invalid.

Even traditional modal logics can be seen as a reaction to the context problem. Modal logics introduce quantifiers for "necessary" and "possible" truths, and are typically explained in terms of possible-world semantics. Something is necessary if and only if it is true in every possible world, and possible just in case it is true in at least one world. Each possible world represents a context; thus modal logics enable reasoning about the dependence of statements upon context.

Another continuing issue in AI is the role of background knowledge or "common sense" knowledge as context. A controversial position, probably best exemplified by Doug Lenat's CYC project [Lenat and Guha 90], maintains that intelligence in systems stems primarily from knowing a large number of simple facts, such as "water flows downhill" or "if someone shouts at you, they are probably angry with you". The intuition is that even simple queries depend on understanding a large amount of context, common sense knowledge that remains unstated, but is shared among most people with a common language and culture.
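As a schematic illustration of the circumscription idea above -- our own simplified notation, not McCarthy's exact formulation -- the hedged bird rule can be written with an abnormality predicate and an explicit context argument:

\forall x, c.\; \mathit{Bird}(x, c) \land \lnot \mathit{Abnormal}(x, c) \rightarrow \mathit{Flies}(x, c)

Circumscription then minimizes the extension of Abnormal: penguins, stuffed birds, and dead birds are the abnormal cases, so the default conclusion Flies(x, c) stands only as long as nothing in the context c implies Abnormal(x, c).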
The CYC project has mounted a more than ten-year effort to codify such common sense knowledge, and has achieved the world's largest knowledge base, containing more than a million facts. However, the usability of such a large knowledge base for interactive applications such as Web browsing, retrieval of news stories, or user interface assistants has yet to be proven. You can get knowledge in, but it's not so easy to get it out. The CYC approach could be labeled the "size matters" position. It could also be considered an outgrowth of the expert systems movement of the 1980s, where systems of rules for expert problem-solving behavior were created by interviewing domain experts, and the rule base was matched against new situations to try to determine what to do. Expert systems were brittle primarily because there was no explicit representation of the context in which the expertise was situated; small changes in context would cause previously entered rules to become inapplicable. Many researchers at Stanford, such as McCarthy and Mike Genesereth, worked on axiomatic representations of common-sense knowledge and theorem-proving techniques to make these representations usable.

AI is now turning towards approaches in which large amounts of context are analyzed, partly using knowledge-based methods and partly statistically, to detect patterns or regularities that would enable better understanding of context. Data mining techniques can be viewed in this light. Data mining is a knowledge discovery technique that analyzes large amounts of data and proposes hypothetical abstractions that are then tested against the data. The Web has also encouraged the rise of Information Extraction [Lehnert 94] techniques, where Web pages are analyzed with parsers that stop short of complete natural language understanding. They do approximate inference using such techniques as TF-IDF (term frequency times inverse document frequency) keyword analysis, Latent Semantic Indexing, lexical affinity (inferring semantic relations from the proximity of keywords), part-of-speech tagging and partial parsers. The availability of semantic knowledge bases such as WordNet [Fellbaum 98] also encourages partial understanding of context expressed in natural language text.

Finally, it is worth noting that there is a dissenting current in AI that decries the use of any sort of representation, and therefore denies the need for context-aware computing. This position is best represented in its most extreme form by Rodney Brooks [Brooks 91], who maintained that intelligence could be achieved in a purely reactive mode, without any need for maintaining a declarative representation. Abstraction is built up only by a subsumption architecture, where sets of reactive behaviors are successively subsumed by other reactive behaviors with greater scope.

It should be clear from our previous discussion that our position on context places emphasis on a shared understanding of context between people and machines, growing out of both the computer and the user observing and understanding their mutual interaction. We believe that even though some knowledge may be built in in advance, it is impossible to assume, as CYC and the Stanford axiomatic theoreticians do, that most or all of it can be codified in advance. We also reject the Brooks position that representation, and with it context, is unnecessary for intelligence.
We would like to structure interactions so that users teach the system just a little bit of context with each interaction, and the system feeds back a little bit of its understanding at each step. We believe that in this way, representations of both common-sense and personal context can be built up slowly over time.

Context in the Human-Computer Interface field

Context plays a big role in Information Visualization and in visual design in general. Tufte [Tufte 96] and other authors have noted that the choice of visual appearance of an interface element should depend on its context, since human perception tends to pick up similarities of color, shape or alignment of objects. Visual similarity in a design implies semantic similarity, whether it is intentional or not. A visual language of interface design needs to take account of these relationships, and automatic design tools can be built that map semantic relationships into visual design choices [Cooper 89], [Lieberman 96].

Introductory texts on user interface design stress the importance of interaction with end users. Designers are admonished to ask users what they want, testing preliminary mock-ups or low-fidelity prototypes early in the design process. User-centered and participatory design practices have been especially widespread in Scandinavia. This gives interface designers the best understanding of the users' context, in order to minimize mismatches between the designers' and the users' expectations.

Context in sociology and behavioral studies

Approaches in sociology have stressed the importance of context in observing how people behave and in understanding their cognitive abilities. Lucy Suchman and colleagues have championed what they call the situated action approach [Suchman 87], [Barwise and Perry 83], which stresses the effect of shared social context upon people's behavior. However, the situated critique focuses on getting system designers to adapt designs to context, not on having the system itself dynamically adapt to context. Another relevant field is the activity theory approach [Nardi 96], growing out of a Russian psychology movement. Other related fields such as industrial systems engineering, ecological psychology, ethology and cognitive psychology also study context. They investigate how the contexts of behavior are critical for determining both what constitutes successful behavior and what strategies an agent must employ to generate that behavior. Probably the most striking work in understanding how context affects people's interaction with computers is Clifford Nass and Byron Reeves' Media Equation [Reeves and Nass 96]. Ingeniously designed experiments dramatically demonstrate how people transfer human social context into interaction with machines, voluntarily or not.

Simplifying interfaces without dumbing them down

Historically, using computers has taken too much concentration. People spend significant time learning and dealing with computer interaction rather than with the task they are attempting. They must switch contexts from thinking about what they are interested in to thinking explicitly about what commands will have the effects they intend. The danger is that the presence of computers may distract from direct experience. This is like an eager relative so engrossed in taking pictures at a beach party that we wonder if they are truly experiencing the beach. Context-aware computing gives us a way out of this dilemma.
Tools can get in the way of tasks, and context-aware computing gives us the potential of taking the tool out of the task. By having computers or devices sense automatically, remember history, and adapt to changing situations, the amount of unnecessary explicit interaction can be reduced, and our systems will wind up more responsive as a result.

Everybody says they want simpler interfaces. But if the only way we can get simpler interfaces is to reduce functionality, the result is dumbed-down interfaces. Reduced functionality works well in simple situations, but can be inappropriate or even dangerous when the situation becomes more complex. Context-aware agents and context-sensitive devices can give us the sophisticated behavior we need from our artifacts without burdening users with complex interfaces.

We've seen how software agents that record and generalize user interactions, and sensor-based devices that provide context-appropriate behavior, hold the potential for getting us off the creeping-featurism treadmill. It's now time to integrate what we've learned about context, both from the mathematical and sociological fields in which it has been traditionally studied, and from the new perspective that comes from AI and human-computer interfaces, to work towards the effortless success we always dream of.

References

Barwise, Jon and John Perry, Situations and Attitudes, MIT-Bradford, Cambridge, MA, 1983.

Borenstein, Nathaniel S., Help Texts vs. Help Mechanisms: A New Mandate for Documentation Writers, Proceedings of the Fourth International Conference on Systems Documentation, 1985, pp. 78-83.

Bradshaw, Jeffrey, ed., Software Agents, AAAI/MIT Press, Menlo Park, CA, 1997.

Brooks, R. A., Intelligence Without Representation, Artificial Intelligence Journal (47), 1991, pp. 139-159.

Brown, J., R. Burton, and A. G. Bell, Sophie: A Step toward a Reactive Environment, International Journal of Man-Machine Studies, Vol. 7, 1975.

Cooper, Muriel, Computers and Design, Design Quarterly, 142 (1989), pp. 22-31.

Cypher, Allen, ed., Watch What I Do: Programming by Demonstration, MIT Press, Cambridge, MA, 1993.

Fellbaum, Christiane, WordNet: An Electronic Lexical Database, MIT Press, 1998.

Fry, Christopher, Programming on an Already Full Brain, Communications of the ACM, April 1997.

Landay, J. and R. C. Davis, Making Sharing Pervasive: Ubiquitous Computing for Shared Note Taking, IBM Systems Journal, Vol. 38, No. 4, 1999.

Langley, Pat, Elements of Machine Learning, Morgan Kaufmann, 1996.

Laurel, Brenda, ed., The Art of Human-Computer Interface Design, Addison-Wesley, New York, 1989.

Lehnert, W. G., Cognition, Computers and Car Bombs: How Yale Prepared Me for the 90's, in Beliefs, Reasoning, and Decision Making: Psycho-logic in Honor of Bob Abelson (eds. Schank & Langer), Lawrence Erlbaum Associates, Hillsdale, NJ, 1994, pp. 143-173.

Lenat, D. B. and R. V. Guha, Building Large Knowledge Based Systems, Addison-Wesley, Reading, MA, 1990.

Lewis, Clayton, and Donald A. Norman, Designing for Error, in Baecker, Ronald M., and William A. S. Buxton (eds.), Readings in Human-Computer Interaction: A Multidisciplinary Approach, Morgan Kaufmann, Los Altos, CA, 1987, pp. 621-626.

Lieberman, Henry, Mondrian: A Teachable Graphical Editor, in [Cypher 93].

Lieberman, Henry, and David Maulsby, Software That Just Keeps Getting Better, IBM Systems Journal, Vol. 35, Nos. 3 & 4, 1996.

Lieberman, Henry, Intelligent Graphics: A New Paradigm, Communications of the ACM, August 1996.
Lieberman, Henry, Autonomous Interface Agents, ACM Conference on Human Factors in Computing Systems [CHI-97], Atlanta, March 1997, pp. 67-74.

Lieberman, Henry, Bonnie Nardi and David Wright, Training Agents to Recognize Text by Example, ACM Conference on Autonomous Agents [Agents-99], Seattle, May 1999. Also to appear in the Journal of Autonomous Agents and Multi-Agent Systems, 2000.

Maes, Pattie, Agents that Reduce Work and Information Overload, Communications of the ACM, July 1994.

McCarthy, John, Circumscription -- A Form of Non-Monotonic Reasoning, Artificial Intelligence Journal, Vol. 13, 1980, pp. 27-39.

Nardi, Bonnie, Context and Consciousness: Activity Theory and Human-Computer Interaction, MIT Press, 1996.

Nardi, B., J. Miller and D. Wright, Collaborative, Programmable Intelligent Agents, Communications of the ACM, March 1998, pp. 96-104.

Norman, D. A., Some Observations on Mental Models, in D. Gentner & A. Stevens (eds.), Mental Models, Lawrence Erlbaum, Hillsdale, NJ, 1983, pp. 15-34.

Norman, Donald, The Invisible Computer, MIT Press, 1998.

Picard, Rosalind, Affective Computing, MIT Press, 1997.

Reeves, Byron and Clifford Nass, The Media Equation: How People Treat Computers, Television, and New Media Like Real People and Places, Cambridge University Press, 1996.

Rich, Elaine, Users are Individuals: Individualizing User Models, International Journal of Man-Machine Studies, Vol. 18, 1983, pp. 199-214.

Selker, Ted, COACH: A Teaching Agent That Learns, Communications of the ACM, Vol. 37, No. 7, July 1994, pp. 92-99.

Selker, Ted, and Winslow Burleson, Context-Aware Design and Interaction in Computer Systems, IBM Systems Journal, Vol. 39, Nos. 3 & 4, 2000.

Sleeman, Derek and J. S. Brown, Introduction: Intelligent Tutoring Systems: An Overview, in Intelligent Tutoring Systems, edited by D. H. Sleeman & J. S. Brown, Academic Press, 1982, pp. 1-11.

Stoehr, Elizabeth and Henry Lieberman, Hearing Aid: Adding Verbal Hints to a Learning Interface, ACM Multimedia Conference, San Francisco, October 1995.

Suchman, Lucy, Plans and Situated Actions, Cambridge University Press, Cambridge, England, 1987.

Tufte, Edward, Visual Explanations, Graphics Press, 1996.

Vemuri, Sunil, What Was I Thinking? Personal communication, 1999; publication forthcoming. Class project for the MIT course "Out of Context". http://context99.www.media.mit.edu/courses/context99/

Weiser, Mark, The Computer for the 21st Century, Scientific American, September 1991, pp. 94-104.

Wisneski, C., H. Ishii, A. Dahley, M. Gorbet, S. Brave, B. Ullmer and P. Yarin, Ambient Displays: Turning Architectural Space into an Interface between People and Digital Information, in Proceedings of the International Workshop on Cooperative Buildings (CoBuild '98), Darmstadt, Germany, February 1998, Springer, pp. 22-32.

Yan, Hao and Ted Selker, A Context-Aware Office Assistant, ACM International Conference on Intelligent User Interfaces (IUI-2000), New Orleans, January 2000.