Wearable Computers as Intelligent Agents

Thad Starner
GVU Center, College of Computing
Georgia Institute of Technology
Atlanta, GA 30332-0280 USA
Forty years ago, pioneers such as J.C.R. Licklider and Douglas Engelbart championed the idea of interactive
computing as a way of creating a “man-machine symbiosis” where mankind would be enabled to think in
ways that were previously impossible [3, 5]. Unfortunately, in those times, such a mental coupling was
limited to sitting in front of a terminal and keying in requests. Today, wearable computers, through their
small size, proximity to the body, and usability in almost any situation, may enable a more intimate form of
cognition suggested by these early visions. Specifically, wearable computers may begin to act as intelligent
agents during everyday life, assisting in a variety of tasks depending on the user’s context.
1 Wearable computing

Figure 1: MicroOptical Corporation's head-up display embedded in prescription eyeglasses.

Figure 2: Wearable computing platforms worn at MIT and Georgia Tech.

Wearable computers are generally equated with head-up, wearable displays, one-handed keyboards, and custom computers worn in satchels or belt packs (see Figures 1-2). Often, the public focuses more on the gadgetry involved in wearable computing than on the field's goals. Wearable computing may be described as the pursuit of an interface ideal: that of a continuously-worn intelligent assistant that augments the wearer's memory, intellect, creativity, communication skills, physical senses, and physical abilities. Unlike desktop applications such as spreadsheets or graphics drawing programs, which monopolize the user's attention, wearable computer applications should be designed to take a secondary role.

Wearable computers often perform background tasks such as providing reminders, capturing information or experience, and retrieving time-sensitive information in support of the user. In order to perform such tasks, the interface must take on a form of "symbiosis" between the user and the computational assistant. J.C.R. Licklider, an early director of the United States Advanced Research Projects Agency (ARPA) whose research agenda is often credited with laying the foundations of the Internet and interactive computing, hinted at such a symbiosis in his writings in 1960 [5]:

"Man-computer symbiosis" is a subclass of man-machine systems. There are many man-machine systems. At present, however, there are no man-computer symbioses. ... The hope is that, in not too many years, human brains and computing machines will be coupled together very tightly and that the resulting partnership will think as no human brain has ever thought and process data in a way not approached by the information-handling machines we know today.
To achieve the tight partnership suggested by
Licklider in daily life, the computer must be a constant companion for the user. The computer must
share in the experience of the user’s life, drawing
input from the user’s environment to learn how the
user reasons and communicates about the world. As
it learns, the computer can provide increasingly useful and graceful assistance. Ideally, a wearable computer
1. Persists and provides constant access to information services: Designed for everyday and
continuous use, the wearable can interact with
the user at any given time, interrupting when
necessary and appropriate. Correspondingly,
the wearable can be accessed by the user
quickly and with very little effort.
2. Senses and models context: In order to provide
the best cognitive support for the user, the wearable must try to observe and model the user’s
environment, the user’s physical and mental
state, and the wearable’s own internal state.
3. Adapts interaction modalities: The wearable
should adapt its input and output modalities automatically to those which are most appropriate and socially graceful at the time. In many
instances, the computer interface will be secondary to the user’s primary task and should
take the minimal necessary amount of the user’s
attention. In addition, the interface should guarantee privacy when appropriate, adapt to its user
over time, and encourage personalization.
4. Augments and mediates interactions with the
environment: The wearable should provide universal information support for the user in both
the physical and virtual realms. The wearable
should mediate between automation or computation in the environment and the user to present
an interface consistent with the user's preferences and abilities. In addition, the wearable
should manage potential interruptions, such as
phone calls or e-mail, to best serve its user.
Obviously, this is an ambitious list of desired attributes, requiring many years of research. However,
the effort may provide insights into human intelligence and everyday human-computer interfaces.
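The four desired attributes above can be read, very loosely, as an interface contract for a wearable agent. The following skeleton is purely illustrative; the class and method names are assumptions for this sketch, not a design taken from this work:

```python
# Illustrative skeleton of the four attributes: persistence/appropriate
# interruption (1, 4), context sensing (2), and modality adaptation (3).
# All names and rules here are assumptions, not a published API.

class WearableAgent:
    def __init__(self):
        self.context = {}  # sensed world, user, and device state

    def sense(self, observations):
        """(2) Fold new sensor observations into the context model."""
        self.context.update(observations)

    def choose_modality(self):
        """(3) Pick the least intrusive I/O channel for the moment."""
        if self.context.get("in_conversation"):
            return "head-up display"  # silent and glanceable
        return "audio"

    def should_interrupt(self, message_priority):
        """(1)/(4) Interrupt only when necessary and appropriate."""
        busy = self.context.get("user_busy", False)
        return message_priority == "urgent" or not busy
```

The point of the sketch is that interruption and modality decisions are functions of sensed context, not fixed application settings.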
2 Why use wearable computers?
2.1 Portable consumer electronics
Some people wear too many computers already. A
businessman might be seen carrying a personal digital assistant (PDA) to keep his contacts, a cellular
telephone to make calls, a pager to help screen incoming phone calls and messages, a laptop for business applications, an electronic translator for practicing a new language, and a calculator wristwatch.
These devices contain very similar components: a
microprocessor, memory, screen, keyboard or input
stylus, battery, and, in the case of the first three, a
wireless modem [12].
Many other portable consumer electronic devices
are evolving to have similar components as well:
the compact disc player is becoming the computer-driven MP3 player, the portable dictaphone is being replaced by solid-state audio digitizers, digital
cameras are beginning to supplant the film camera
markets, and digital camcorders have outmoded their
analog counterparts. These devices share many of
the same components; their main distinctions are determined by their sensors, interfaces, and application
software. Why not take advantage of commonality
in components to reduce cost, weight, and redundancy and to improve connectivity and services?
One could imagine a box the size of a deck of
cards enclosing a powerful yet energy-conserving
CPU and a large capacity data storage device. This
pocket-sized wearable computer has one output display – an LED to indicate that it is on and that its
wireless network is functioning. The wireless network connects peripherals to the wearable computer
in a radius of about two to three meters centered at
the body.
The user defines the functionality of the wearable computer by the peripherals chosen. For example, if the user likes to listen to music, wireless earphones allow access to the MP3s stored on the wearable computer's hard drive. Adding a walnut-sized
camera and the appropriate software transforms the
wearable computer into a camcorder. With an Internet modem, the wearable becomes a pager, cellular
phone, web-browser, and e-mail reader. Connecting
medical sensors to the wearable concentrates many
diagnostic and recording devices into one unit.
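The idea that peripherals, not the core unit, define the system's functionality can be sketched in a few lines. The class and capability names below are assumptions chosen to mirror the examples in the text:

```python
# Sketch of a body-centered network: the wearable's services are the
# union of whatever peripherals are currently attached.  Names are
# illustrative, not an actual protocol.

class BodyNetwork:
    def __init__(self):
        self.peripherals = {}  # peripheral name -> capabilities it adds

    def attach(self, name, capabilities):
        """Register a peripheral and the services it enables."""
        self.peripherals[name] = set(capabilities)

    def detach(self, name):
        """Remove a peripheral; its services disappear with it."""
        self.peripherals.pop(name, None)

    def capabilities(self):
        """Services currently offered by the worn system as a whole."""
        caps = set()
        for c in self.peripherals.values():
            caps |= c
        return caps

net = BodyNetwork()
net.attach("wireless earphones", {"music playback"})
net.attach("camera", {"camcorder"})
net.attach("Internet modem", {"pager", "cellular phone", "e-mail"})
```

Swapping a peripheral changes the device the user is "carrying" without touching the shared CPU, storage, or battery.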
Thus, with wearable computing and a wireless
body-centered network, redundancy in components
can be reduced. In addition, functionality can be
increased as derivative services are created leveraging the convergence of media on the body. The
portable consumer electronics market will become
fertile ground for rapid innovation. Companies can
respond to a new need or niche market quickly with
an appropriate peripheral or software without having
to re-design all the subsystems on each iteration. Sophisticated portable electronics will become cheaper
and more powerful for the consumer, and the computer industry will get a new, attractive upgrade path
to pursue.
2.2 Mediating interactions in the world
The introduction of the windows, icon, menu, and
pointer (WIMP) desktop metaphor in graphical user
interfaces provided a means by which a user interface could mediate between applications and the
user. Similarly, wearable computers will help
provide a consistent interface to computationally-augmented objects in the physical world. For example, the Gesture Pendant (Figures 3-4) is a computer vision-based gesture recognizer
worn as a piece of jewelry (the design is similar to
the desk-based Toshiba “Motion Processor” project
[15]). The device acts as a universal remote control to stereos, televisions, VCRs, home thermostats,
and lighting systems [14]. Remote controls may
be difficult for older adults to use due to dexterity,
vision, or cognitive impairments; instead, the Gesture Pendant recognizes broad gestures to perform
the same task. Since many functions are similar for
many devices, one gesture may provide an intuitive
command across the devices. For example, raising
or lowering a finger in front of the pendant might
raise or lower the volume of a stereo, increase or decrease the temperature setting on a thermostat, and
brighten or dim the lights depending on which object
the wearer faces. Thus, the user may define or learn
one interface and then apply it repeatedly to many
different devices depending on the context. One
can imagine a more general purpose wearable computer that, through software, adapts itself to provide
a consistent interface to whatever electronics or computer systems are available in the environment. No
matter where in the world the wearable user travels,
the wearable provides an appropriate interface in the
user’s native language to home automation that may
be available in the user’s environment. Such infrastructure will be a boon to the handicapped. For example, imagine a tactile display for a blind wearable
user that renders simple graphics on the user’s back
[2, 7]. When provided with a wireless interface and
the appropriate software hooks, the wearable computer can translate images from graphical systems,
Figure 3: The Gesture Pendant’s infrared light emitting diodes invisibly illuminate the area in front of
the user’s body so that the camera embedded in the
pendant can recognize the user’s gestures. A filter
that blocks visible light helps the vision system ignore lights in the environment.
such as those used on interactive map kiosks, into
tactile images with which the user can interact. Already wearable computer systems are being investigated for assisting blind users crossing the wide,
busy streets of Atlanta. These systems "watch" for infrared beacons embedded in pedestrian walk signals and translate them into tactile wayfinding cues for the user [10].
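Because the Gesture Pendant's IR-pass filter makes the illuminated hand the brightest region in the frame, gesture tracking can be approximated by a simple threshold-and-centroid loop. The sketch below assumes grayscale frames represented as lists of pixel rows and an arbitrary brightness threshold; it illustrates the sensing principle, not the published system:

```python
# Hedged sketch: with IR illumination and an IR-pass filter, the hand is
# the brightest blob in view.  Threshold the frame, take the centroid of
# bright pixels, and classify vertical travel as a raise/lower gesture.
# Frame format and thresholds are illustrative assumptions.

def hand_centroid(frame, threshold=200):
    """Return (row, col) centroid of bright (IR-lit) pixels, or None."""
    rows = cols = count = 0
    for r, line in enumerate(frame):
        for c, px in enumerate(line):
            if px >= threshold:
                rows += r
                cols += c
                count += 1
    if count == 0:
        return None
    return rows / count, cols / count

def vertical_gesture(centroids, min_travel=2.0):
    """Classify a centroid track as 'raise', 'lower', or 'none'."""
    ys = [c[0] for c in centroids if c is not None]
    if len(ys) < 2:
        return "none"
    travel = ys[-1] - ys[0]  # image rows grow downward
    if travel <= -min_travel:
        return "raise"
    if travel >= min_travel:
        return "lower"
    return "none"
```

Which appliance the gesture controls (stereo volume, thermostat, lights) would then be chosen by context, as described above.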
2.3 Aiding human-to-human communication
The wearable can also assist in human-to-human
communication. For example, Telesign, an ongoing research project at Georgia Tech, uses a hat-mounted, downward-looking camera in an effort to
recognize sign language (Figure 5). A goal of Telesign is to demonstrate a limited working prototype of
a cellular phone for the Deaf as shown in Figure 6.
Such a system could also be used as part of an interpreting system when a deaf person needs to communicate with a hearing person who does not sign. In
carefully controlled laboratory conditions with user
dependent training, the recognition system performs
sentence-level American Sign Language recognition
at 97% accuracy with signs chosen from a forty word
vocabulary. Currently, we are trying to design computer vision and interface techniques that will allow
the creation of a more mobile system.
Similar systems might be used to translate between spoken languages. Carnegie Mellon researchers have already demonstrated an English to
Croatian speech translator for use in the Bosnian
Figure 4: The Gesture Pendant allows users to control appliances in their home through gesture.
peace-keeping mission [13].

Figure 5: The hat-mounted camera, pointed downward at the hands, and the corresponding view.

Figure 6: Telesign: A cellular phone system for the hearing-impaired.

2.4 Providing context-sensitive reminders and intelligent assistants

Context-sensitive interfaces may become very powerful with wearable computers. The wearable might listen to the wearer's conversations and provide reminders as appropriate to the current topic [8, 16]. Such reminders might range from the text of the last e-mail message exchanged between the participants of an ongoing conversation to recommendations of local restaurants that both participants would enjoy.

2.4.1 Perception and symbiosis

In one approach to creating intelligent assistants, the wearable computer may be a first-class participant in the user's world (see Figure 7). The wearable senses the world from the same "first-person" perspective as the human. Both the wearable and the human have the potential to directly influence the sensors of each other. For example, the human, through his head movement, could direct the wearable's video camera to look at a particular object. Correspondingly, the wearable could mediate reality [6] for the user by displaying what it is seeing with its camera on the user's head-up display. Such a system may be valuable for augmenting what the user could normally perceive, similar to providing range information while using a telephoto lens. This mediation could also provide protection against distracting or dangerous visual input, such as directly looking at the sun while scanning the sky.

Figure 7: Wearable/user symbiosis.
In addition, both the wearable and the user can
manipulate the environment. In most cases, the wearable will affect the environment by communicating with other pieces of automation, such as the user's cellular phone, vehicle, or personal robotics, while the user will affect the environment by performing a physical act, such as making a gesture or utterance. Both the wearable and the human have the
ability to observe each other’s manipulations. The
wearable exploits its on-body cameras and microphones to observe the user’s gestures and speech,
and the user may use the wearable interface itself
to monitor the traffic history between the wearable
and other automation. In addition, the user should
be able to query the wearable as to the reasoning for
each action.
One of the strengths of the symbiotic system is the
cross-monitoring of the agents’ (human and wearable) actions. The wearable continuously tries to create and update a user model of the human’s goals
and abilities. The wearable can passively monitor
the human’s reactions to external stimuli and learn
to recognize meaningful events in the human’s life
[1]. One can imagine this behavioristic approach being used as a research direction for artificial intelligence; such an approach would try to emulate not
just human behavior, but a particular human’s behavior in a reasonable manner given novel stimuli. From
cognitive science we know that such a “user-model
complete” approach has fundamental limits due to a
lack of knowledge of the internal state of the user;
however, a wearable intelligent assistant can take a
more active approach. The wearable assistant may
gain some insight into the user’s goals by how the
wearer uses the computer. By maintaining a record
of all its interactions and the corresponding external conditions during those interactions, the assistant might learn to associate a certain set of external
stimuli with user desires. For example, if the user
keys “Joe Snyder” while alone in a shopping mall,
the wearable might learn to bring up a map showing Joe's current position relative to the user.
However, if the user keys “Joe...” while Joe is in the
user’s immediate vicinity, the wearable displays the
last e-mail sent by Joe so that the users may proceed
with an informed conversation.
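The "Joe Snyder" example above amounts to associating a (command, context) pair with a learned action. A minimal sketch of that association, with a plain lookup table standing in for a learned user model (all names are illustrative):

```python
# Illustrative sketch: the same command triggers different actions
# depending on sensed context.  A dictionary stands in for whatever
# learned user model the wearable would actually maintain.

class ContextAssistant:
    def __init__(self):
        self.history = []  # (command, context, action) triples
        self.rules = {}    # (command, context) -> action

    def record(self, command, context, action):
        """Log an interaction and remember the context->action pairing."""
        self.history.append((command, context, action))
        self.rules[(command, context)] = action

    def suggest(self, command, context):
        """Replay the action last associated with this command in context."""
        return self.rules.get((command, context), "prompt user")

assistant = ContextAssistant()
assistant.record("Joe Snyder", "alone", "show map of Joe's position")
assistant.record("Joe Snyder", "Joe nearby", "show last e-mail from Joe")
```

The interesting research problem, of course, is learning the rule table from the interaction history rather than writing it by hand.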
In addition to learning through interaction, the
wearable may form a user model by directly asking
the user for help. The assistant may associate a certain user behavior (for example, beginning a computer application for inventory management) with a
given context (like exiting a building at a geographical location indicating a supply warehouse). The assistant may then alert the user that it believes it can
recognize this behavior, ask whether or not it is a significant event, what the event should be called, and
if it should begin the computer application automatically when it next observes the event.
Of course, such intelligent assistants based on
real-world, first-person perception are very difficult
to create. Most computer gesture and speech recognition systems are designed for use in a laboratory
and observe the user from sensors mounted in the
environment instead of on the body. In most cases,
the variable noise conditions of mobility pose a significant barrier to the adoption of such systems.
2.4.2 Just-in-time Information: The Remembrance Agent (RA)
Computers perform well at storing data and executing repetitive functions quickly. Humans, on the
other hand, make intuitive leaps and recognize patterns and structure, even when passive. Thus, an
interface in which the wearable computer helps the
user remember and access information seems profitable. While word processing may comprise the majority of computer use, it requires only a small fraction of the computer’s processing power. Instead of
wasting the remaining power, an information agent
can use the time to search the user’s personal text
database for information relevant to the current task.
The names and short excerpts of the closest matching files can then be displayed. If the search engine is
fast enough, a continuously changing list of matches
can be maintained, increasing the probability that a
useful piece of information will be recovered. Thus,
the agent can act as a memory aid. Even if the user
mostly ignores the agent, he will still tend to glance
at it whenever there is a short break in his work. In
order to explore such a work environment, the Remembrance Agent [9] was created.
The benefits of the Remembrance Agent (RA) are
many. First, the RA provides timely information. If
the user is writing a paper, the RA might suggest relevant references. If the user is reading email and
scheduling an appointment, the RA may happen to
suggest relevant constraints. If the user is holding
a conversation with a colleague at a conference, the
RA might bring up relevant associations based on the
notes the user is taking. Since the RA “thinks” differently than its user, it often suggests combinations
that the user might never assemble himself. Thus, the
RA can act as a constant “brain-storming” system.
The Remembrance Agent can also help with personal organization. As new information arrives, the
RA, by its nature, suggests files with similar information. Thus, the user gets suggestions on where
to store the new information, avoiding the common
phenomenon of multiple files with similar content
(e.g. archives-linux and linux-archives). The first
trial of the prototype RA revealed many such inconsistencies within the sample database and even suggested a new research project by its groupings.
As a user collects a large database of private
knowledge, his RA becomes an expert on that knowledge through constant re-training. A goal of the RA
is to allow co-workers to access the “public” portions of this database conveniently without interrupting the user. Thus, if a colleague wants to know
about augmented reality, he simply sends a message to the user’s Remembrance Agent (e.g. thadra@cc.gatech.edu). The RA can then return its best
guess at an appropriate file. Thus, the user is not
interrupted by the query, and he never has to format
his knowledge explicitly, as with HTML. Knowledge
transfer may occur in a similar fashion. When an engineer trains his successor, he can also transfer his
RA’s database of knowledge on the subject so that
his replacement may continually receive the benefit
of his experience even after he has left. Finally, if a
large collective of people use Remembrance Agents,
queries can be sent to communities, not just individuals. This allows questions of the form “How do I
reboot a Sun workstation?” to be sent to 1000 coworkers whose systems, in their spare cycles, may
send a response. The questioner’s RA, who knows
how the user “thinks,” can then organize the responses into a top 10 list for convenience.
The Remembrance Agent (RA) is composed of
two parts, the user interface and the search engine.
The user interface continuously watches what the
user types and reads, and it sends this information to
the search engine. The search engine finds old email,
notes files, and on-line documents that are relevant to
the user’s context. This information is then unobtrusively displayed in the user’s field of view.
The Remembrance Agent displays one-line suggestions at the bottom of the emacs display buffer,
along with a numeric rating indicating the relevancy
of the document. These items contain just enough
information to represent the contents of the full document being suggested. For example, the suggestion
line for a piece of email includes the subject line, the
sender, and the date received. The suggestion line
for a notes file contains the file name, owner, date
last modified, and a summary of the file. With a simple
key combination, the user can display the full text
of a suggested document. The current information
retrieval implementation uses a term frequency-inverse document frequency (TF-IDF) algorithm for comparing
documents [11].
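As a concrete illustration, the RA's retrieval step can be approximated in a few lines: rank archived documents by TF-IDF similarity to the text the user is currently typing. This is a heavily simplified sketch in the spirit of [11], with naive whitespace tokenization; it is not the RA's actual code:

```python
import math
from collections import Counter

# Sketch of TF-IDF retrieval: weight each term by how often it appears
# in a document and how rare it is across the corpus, then rank
# documents by cosine similarity to the query text.

def tfidf_vectors(docs):
    """Build a TF-IDF vector (term -> weight dict) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    return [{t: tf[t] * math.log(n / df[t]) for t in tf}
            for tf in (Counter(toks) for toks in tokenized)]

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = (math.sqrt(sum(x * x for x in u.values())) *
            math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

def suggest(query, docs, top=1):
    """Return indices of the documents most relevant to the query."""
    vecs = tfidf_vectors(docs + [query])  # query shares the idf stats
    qvec = vecs[-1]
    scores = sorted(((cosine(qvec, v), i) for i, v in enumerate(vecs[:-1])),
                    reverse=True)
    return [i for _, i in scores[:top]]
```

Run continuously over the user's notes and e-mail, a ranker like this yields the one-line suggestions described above.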
Figure 8 shows the output of the Remembrance
Agent (in the bottom buffer) while editing a document (top buffer). The reference database for the
RA was e-mail archives. Personal email and notes
files are good sources for an RA to reference, because these files are already personalized and automatically change with the user’s interests.
The Remembrance Agent can also be a method
for casual, low-overhead collaboration. Current everyday wearable computer users take copious notes
during their normal day to assist in their work and
recreation. As an informal experiment, three wearable computer users from the same laboratory combined their personal databases used in conjunction
with the RA so that they could gain the benefit of
each other’s past experiences. Interestingly, RA sug-
gestions from a colleague’s database can be quite
disquieting. The user recognizes the significance of
the suggestion and can almost claim the memory as
his own due to the similarity with his own experiences, but he knows that it isn’t his entry. These
“shadow memories” provide an asynchronous form
of collaboration where the collaborators are passive
and their wearable agents share information as a matter of the normal use of the wearable.
While the Remembrance Agent gains most of its
contextual information from typed text, wearable
computers have the potential to provide a wealth of
contextual features [4]. Additional sources of user
context information may include time of day, location, biosignals, face recognition, and object recognition. Currently the RA is being coupled with voice
recognition to provide information support during
conversations. Thus, the RA begins to show the advantages of wearable, context-driven augmented reality. However, with an even more comprehensive
agent, the wearable computer may be able to uncover trends in the user’s everyday life, predict the
user’s needs, and pre-emptively gather resources for
upcoming tasks [1].
Figure 8: The Remembrance Agent in use while
writing a paper. The bottom buffer shows the RA’s
suggestions for files relevant to the user’s current
work.
3 Conclusion
Wearable computing enables significant new research opportunities in interfaces, artificial intelligence, and perception. As research into perception
and user modeling through devices carried on the
body progresses, new intelligent interfaces will result that will reduce work and complexity and lead
to new capabilities.
4 Acknowledgements
This paper reflects the hard work of many members
of the Georgia Tech Contextual Computing Group
and the MIT Media Laboratory Wearable Computing
Project. In particular, Brad Rhodes and Brad Singletary have influenced the discussions herein.
References
[1] B. Clarkson and A. Pentland. Unsupervised clustering of ambulatory audio and video. In ICASSP,
1999.
[2] C. Collins, L. Scadden, and A. Alden. Mobility studies with a tactile imaging device. In Fourth Conf. on Systems and Devices for the Disabled, Seattle, WA, June 1977.
[3] D. Engelbart. Augmenting human intellect: A conceptual framework. Technical Report AFOSR-3233,
Stanford Research Institute, Menlo Park, CA, October 1962.
[4] M. Lamming and M. Flynn. Forget-me-not: Intimate computing in support of human memory.
In FRIEND21: Inter. Symp. on Next Generation
Human Interface, pages 125–128, Meguro Gajoen,
Japan, 1994.
[5] J. Licklider. Man-computer symbiosis. IRE Transactions on Human Factors in Electronics, HFE-1(1):4–
11, March 1960.
[6] S. Mann. On the bandwagon or beyond wearable
computing. Personal Technologies, 1(4), 1997.
[7] J. Nissen. Cupid and tactile exploration of virtual
maps. In IEEE VRAIS98 Workshop on Interfaces for
Wearable Computers, March 1998.
[8] B. Rhodes. Just-In-Time Information Retrieval. PhD
thesis, MIT Media Laboratory, Cambridge, MA,
May 2000.
[9] B. Rhodes and T. Starner. Remembrance agent:
A continuously running automated information retrieval system. In Proc. of Pract. App. of Intelligent Agents and Multi-Agent Tech. (PAAM), London,
April 1996.
[10] D. Ross and B. Blasch. Wearable interfaces for orientation and wayfinding. In ACM conference on Assistive Technologies, pages 193–200, 2000.
[11] G. Salton. The SMART Retrieval System: Experiments in Automatic Document Processing. Prentice-Hall, Inc., Englewood Cliffs, NJ, 1971.
[12] O. Shivers. Bodytalk and the bodynet: A personal
information infrastructure. Technical report, MIT
Laboratory for Computer Science, Personal Information Architecture Note 1, Cambridge, MA, 1993.
[13] A. Smailagic, D. Siewiorek, R. Martin, and
D. Reilly. CMU wearable computers for real-time
speech translation. In IEEE Intl. Symp. on Wearable
Computers, San Francisco, CA, 1999.
[14] T. Starner, J. Auxier, D. Ashbrook, and M. Gandy.
The gesture pendant: A self-illuminating, wearable,
infrared computer vision system for home automation control and medical monitoring. In IEEE Intl.
Symp. on Wearable Computers, Atlanta, GA, 2000.
[15] Toshiba. Toshiba's motion processor recognizes gestures in real time. Available at: http://www.toshiba.com/news/980715.htm, July 1998.
[16] B. Wong and T. Starner. Conversational speech
recognition for creating intelligent agents on wearables. In Human Computer Interface International
(HCII), New Orleans, LA, 2001.