The Narrator: A Daily Activity Summarizer Using Simple Sensors in an Instrumented Environment
Daniel Wilson
Robotics Institute
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15217 USA
dan.wilson@cs.cmu.edu
Christopher Atkeson
Robotics / Human Computer Interaction
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15217 USA
cga@cs.cmu.edu
ABSTRACT
People tracking provides the basis for automatic monitoring. Such monitoring can help people with disabilities or the elderly live independently by providing day-to-day information to physicians and family. The Narrator system uses the output of a tracker to generate concise, scalable summaries of daily movement activity. We demonstrate output from the Narrator as well as the workings of an underlying tracker in an instrumented home environment. We show that in a system made up almost entirely of sensors that do not report identity information, we can still maintain identity information and recover from identification errors.
Keywords
Ubiquitous Computing, People Tracking, Simple Sensors
INTRODUCTION
Knowledge of the identity and position of occupants in an instrumented environment is a basic element of automatic monitoring. Automatically generated summaries of daily activities for people with cognitive disabilities can be used to improve the accuracy of pharmacological interventions, track illness progression, and lower caregiver stress levels [7]. Additionally, VanHaitsma et al. [15] have shown that movement patterns alone are an important indicator of cognitive function, depression, and social involvement among people with Alzheimer's disease.
People tracking has been approached via a variety of sensors, including cameras, laser range finders, wireless networks, RFID (radio frequency identification) badges, and infrared or ultrasound badges [1, 2, 3, 6, 9, 11, 13, 14]. Sensor cost and sensor acceptance are pivotal issues, especially in the home. Many people are uncomfortable living with cameras and microphones. Laser scanning devices are anonymous, but they are costly and have limited range. We find that people at home are often unwilling to wear a badge, beacon, set of markers, or RF tag; they also forget to put one on, change clothes too often, or are not sufficiently clothed to wear one. Elderly individuals are often very sensitive to small changes in their environment [4], and one target population, institutionalized Alzheimer's patients, frequently strip themselves of clothing, including any wearable sensors [5].
We have chosen to explore a set of sensors that are already present in many homes as part of security systems: motion detectors, contact switches, and other simple binary sensors. These sensors are cheap, computationally inexpensive, and do not have to be continuously worn or carried. We aim for room-level tracking, as our sensors do not provide the higher spatial resolution of other types of tracking systems.
In this paper we describe a people tracker and a derivative service, the Narrator. The Narrator is a finite state machine that parses movement information provided by a tracker and generates a concise, readable summary. Our tracker consists of a discrete state Bayes filter and associated models that use information gathered from binary sensors to provide low-cost automatic tracking in a home environment. We demonstrate results from an offline smoothing algorithm, although online filtering techniques are possible. We instrumented a permanently occupied home and conducted a series of experiments to validate our approach.
RELATED WORK
Combining anonymous sensors with sensors that provide identification information for tracking people or objects is an open problem. Our tracking problem is similar to object identification, where the goal is to determine whether a newly observed object is the same as a previously observed object. The solution offered by [12] extends the technique introduced by [8] to accommodate many sensors and has been applied to tracking automobile traffic using cameras. In a recent experiment [13], laser range finders and infrared badges were used to track six people simultaneously in an office environment for 10 minutes. The range finders provided anonymous x,y coordinates while the badge system identified occupants. Our system uses a single RFID sensor together with many anonymous sensors to provide room-level tracking. We collect data over long periods to build an ever-improving model of the unique motion patterns of each occupant. These models can later be used to identify occupants in lieu of additional ID sensors.
NARRATOR
The purpose of the Narrator system is to provide a summary of daily movements, using information generated automatically by a tracker in an instrumented environment. This summary presents important daily events in a compact, readable format, whereas the tracker provides many thousands of second-by-second location predictions.
At the most basic level, the Narrator could produce an English account of the second-by-second location predictions. In our instrumented environment there was an average of 2000 sensor readings per day, so this scheme would produce a large volume of information that is not very useful. Instead, we make a few simplifying assumptions and provide user-scalable levels of abstraction.
We make two assumptions. First, although we track several occupants simultaneously, we create summaries for one occupant at a time. We also report only movement information and do not attempt activity recognition, except for sleeping. For sleeping we use a simple rule: if an occupant spends more than four hours in the bedroom, that time is tagged as sleeping. Second, the Narrator directly uses the maximum likelihood predictions of the tracker. Each of these predictions has an associated posterior probability, which we ignore for now. In future work we plan to incorporate this confidence measure into the Narrator's output.
We identify two areas in which reporting may be abstracted. First, we use the duration of time spent in a location to scale the amount of information reported about that movement. Second, we use sensor granularity to scale reporting from room level up to house level.
Transient Locations
Some locations are less interesting than others because they are traversed constantly and quickly on the way to end locations. Transient locations are usually stairways and hallways. Occupants spend markedly less time in these locations than in others. For example, in our experiments the staircases had a mean duration of 5.5 seconds and the hallways had a mean duration of 10.3 seconds, whereas the living room and study had a mean of 8.2 minutes.
The transience property of a location determines how much detail to report about travel through that location. We use a threshold on the mean duration spent in a room to identify transient spaces. We fit a Gaussian to the amount of time spent in these rooms to obtain an overall measure of transience. The Narrator tags travel through any room as transient if the time spent there falls within the range given by the transient mean and variance. In this way we simplify the summary without restrictive rules that completely ignore certain areas. With this information the user may choose to fully or partially ignore transient locations and focus instead on end locations where the occupant spends the most time. The sentences below were generated by the Narrator and demonstrate the three scales.
• Daniel entered the first floor hallway and stayed for 2 seconds. Daniel entered the kitchen and stayed for 10 minutes.
• Daniel passed through the first floor hallway, entered the kitchen and stayed for 10 minutes.
• Daniel walked to the kitchen and stayed for 10 minutes.
Sensor Granularity
The tracker can predict location at the granularity of individual sensors, although the current implementation reports at room level. The Narrator allows the user to scale the granularity from room level to floor level and to the entire house. The sentences below demonstrate room-level, floor-level, and house-level granularity, respectively.
• Daniel woke at 8am. He walked to the bathroom and stayed for 15 minutes. He walked downstairs to the kitchen and stayed for 10 minutes. He passed through the foyer to the front porch and left the house.
• Daniel woke at 8am. He stayed on the second floor for 15 minutes. He went to the first floor and stayed for 10 minutes. He left the house.
• Daniel woke at 8am. He stayed home for 25 minutes. He left the house.
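The two abstraction mechanisms described above (transience tagging and granularity scaling), together with the four-hour sleeping rule, can be sketched in a few lines of Python. This is a minimal illustration, not the Narrator's implementation. Several details are assumptions. Visits are assumed to be (room, seconds) pairs taken from the tracker's maximum likelihood output. "Within the transient mean and variance" is read as "no longer than mu + k * sigma" with a hypothetical k = 2. The room-to-floor table and the durations are made-up examples. Partial ignoring ("passed through ...") is not shown; transient visits are simply dropped.

```python
from statistics import mean, pstdev

SLEEP_THRESHOLD_S = 4 * 3600  # the paper's rule: more than four hours in the bedroom


def is_sleep(room, seconds, bedroom="bedroom"):
    """Tag a bedroom stay of more than four hours as sleeping."""
    return room == bedroom and seconds > SLEEP_THRESHOLD_S


def fit_transience(durations_by_room, threshold_s):
    """Pool visit durations (seconds) from rooms whose mean visit is shorter
    than threshold_s, then fit a Gaussian (mean, standard deviation)."""
    pooled = [d for ds in durations_by_room.values() if mean(ds) < threshold_s for d in ds]
    return mean(pooled), pstdev(pooled)


def is_transient(seconds, mu, sigma, k=2.0):
    """Assumption: a visit is transient if it lasts at most mu + k * sigma."""
    return seconds <= mu + k * sigma


def drop_transient(visits, mu, sigma):
    """Fully ignore transient visits, keeping only end locations."""
    return [(room, s) for room, s in visits if not is_transient(s, mu, sigma)]


def coarsen(visits, level, floor_of):
    """Relabel visits at 'room', 'floor', or 'house' level and merge
    consecutive visits that end up with the same label."""
    merged = []
    for room, seconds in visits:
        if level == "room":
            label = room
        elif level == "floor":
            label = floor_of[room]
        else:
            label = "home"
        if merged and merged[-1][0] == label:
            merged[-1] = (label, merged[-1][1] + seconds)
        else:
            merged.append((label, seconds))
    return merged


# Hypothetical data, not taken from the experiments.
durations = {"stairs": [5, 6, 5.5], "hallway": [9, 11, 10.9],
             "kitchen": [600, 420], "study": [500, 480]}
mu, sigma = fit_transience(durations, threshold_s=60)
floor_of = {"bathroom": "second floor", "stairs": "stairs", "kitchen": "first floor"}
visits = [("bathroom", 900), ("stairs", 5), ("kitchen", 600)]
print(coarsen(drop_transient(visits, mu, sigma), "floor", floor_of))
# [('second floor', 900), ('first floor', 600)]
print(coarsen(drop_transient(visits, mu, sigma), "house", floor_of))
# [('home', 1500)]
```

Dropping transient visits before coarsening lets the 15-minute and 10-minute stays collapse into a single 25-minute "home" entry, mirroring the house-level example sentence.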
Algorithm
The Narrator algorithm is a conceptually simple finite state machine. It is composed of a set of states, an input alphabet, and a transition function that maps symbols and states to the next state. The states represent English words and phrases, while the input alphabet is composed of sensor readings and times. To add some variety to the language, some states have more than one transition for a given symbol, so, strictly speaking, the machine is not deterministic. A lookup table maps the room and occupant ids reported by the tracker to room and occupant names.
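A toy version of such a state machine helps make the description concrete. The sketch below is hypothetical throughout. The state names, phrase templates, and id-to-name tables are invented for illustration. Input symbols are assumed to be ("enter" or "leave", room id, seconds) events derived from the tracker's room predictions, rather than raw sensor readings. When a state has several transitions for one symbol, the sketch picks one at random to vary the wording.

```python
import random

# Hypothetical lookup tables mapping tracker ids to names (not from the paper).
OCCUPANT_NAMES = {1: "Daniel"}
ROOM_NAMES = {3: "bathroom", 7: "kitchen"}

# Transition function: (state, input symbol kind) -> candidate (next state, phrase).
# More than one candidate for a symbol adds variety to the generated text.
TRANSITIONS = {
    ("start", "enter"): [
        ("in_room", "{who} went to the {room} and stayed for {mins} minutes."),
    ],
    ("in_room", "enter"): [
        ("in_room", "{who} walked to the {room} and stayed for {mins} minutes."),
        ("in_room", "Then {who} went to the {room} for {mins} minutes."),
    ],
    ("in_room", "leave"): [
        ("outside", "{who} left the house."),
    ],
}


def narrate(occupant_id, events, rng):
    """Run the state machine over tracker events and return the summary.

    events: (kind, room_id, seconds) tuples, kind being "enter" or "leave".
    """
    who = OCCUPANT_NAMES[occupant_id]
    state, sentences = "start", []
    for kind, room_id, seconds in events:
        next_state, phrase = rng.choice(TRANSITIONS[(state, kind)])
        room = ROOM_NAMES.get(room_id, "")
        sentences.append(phrase.format(who=who, room=room, mins=round(seconds / 60)))
        state = next_state
    return " ".join(sentences)


events = [("enter", 3, 900), ("enter", 7, 600), ("leave", None, 0)]
print(narrate(1, events, random.Random(0)))
```

Passing the random generator in explicitly keeps the output reproducible for a fixed seed while still allowing varied phrasing across days.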
TRACKER
We wish to estimate the state of a dynamic system from sensor measurements. In our case, the dynamic system is one or more occupants and the instrumented environment. For this paper we track people at the room level, so a person's state, $x$, indicates which of $N$ rooms they are in. Measurements include data from motion detectors, pressure mats, drawer and door switches, and radio frequency identification (RFID) systems. We solve the tracking problem offline with a technique commonly known as smoothing, which uses information from both past and future time steps and so provides higher accuracy for offline purposes such as a daily summary of movement activity.
Technological Infrastructure
We instrumented a house in order to conduct experiments
using real data. The three-story house is home to two
males, one female, a dog, and a cat. Our environment
contains forty-nine sensors and twenty different rooms.
• Radio Frequency Identification (RFID): We use low-frequency RFID to identify occupants entering and leaving the environment. Each occupant and guest is given a unique transponder, or 'tag'. When the credit-card-sized tag nears the RFID antenna, it emits a unique identification number. Upon recognition of a tag, the tracker places a high initial belief that the occupant is at the antenna location. Note that using this tag is no different from using a house key; it is not necessary to carry the tag throughout the environment.
• Motion detectors: We use wireless X10 Hawkeye™ motion detectors. Upon sensing motion, a detector sends a radio signal to a receiver, which transmits a unique signal over the power line. This signal is collected by a CM11A device attached to a computer. The detectors are pet-resistant, require both heat and movement to trigger, and run on battery power for over one year. There are twenty-four motion detectors installed.
• Contact switches: Inexpensive magnetic contact switches indicate a closed or open status. They are installed on every interior and exterior door, selected cabinet drawers, and refrigerator doors. There are twenty-four contact switches.
The sensors are monitored by a single Intel Pentium IV 1.8 GHz desktop computer with 512 MB of RAM. We use an expanded parallel port interface to monitor contact switches, a serial interface to a CM11A device to monitor motion detector activity, and a serial interface to the RFID reader. All activity is logged in real time to a MySQL database.
Tracking Formulation
Our goal is to estimate the probability distribution for each
person's location, conditioned on sensor measurements.
This probability distribution, the tracking system's “belief”
or “information” state, is encoded as a length N vector,
whose elements give the probability of being in the
respective rooms. We use a discrete state Bayes filter to
maintain the belief state Bel. Our belief that a person u is in
room i at time t is:
$$\mathrm{Bel}_t^u[i] = p_t^u(x = i \mid y_1, \ldots, y_t).$$
Here $p(\cdot)$ denotes probability and $y_1, \ldots, y_t$ denotes the data from time 1 up to time $t$. Given a new sensor value $y$, we can update the beliefs for all rooms. For room $i$:
$$\mathrm{Bel}_{t+1}^u[i] = \eta \cdot p_{t+1}^u(y \mid x = i) \cdot \sum_{j=1}^{N} p^u(x_{t+1} = i \mid x_t = j) \cdot \mathrm{Bel}_t^u[j].$$
The variable $\eta$ is a normalizing constant, so that the elements of any Bel vector sum to 1. In using a Bayes filter, we assume that our room-level states are Markov. This is an approximation, and one research question is whether we can accurately track people after making this approximation. We assume that each person $u$ has a different motion model $p^u(x_{t+1} = i \mid x_t = j)$ and sensor model $p^u(y \mid x)$.
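The update equation can be written out directly as a minimal plain-Python sketch. It assumes the belief is a list of N probabilities and that the motion table is indexed motion[i][j] = p^u(x_{t+1} = i | x_t = j), so each column sums to 1. The sensor model is assumed to be supplied as a likelihood vector over rooms for the new reading. The three-room layout and all numbers are hypothetical.

```python
def bayes_update(bel, motion, likelihood):
    """One step of the discrete Bayes filter for a single occupant u.

    bel[j]          -- Bel_t^u[j], current belief over N rooms
    motion[i][j]    -- p^u(x_{t+1} = i | x_t = j); each column sums to 1
    likelihood[i]   -- p_{t+1}^u(y | x = i) for the new reading y
    Returns Bel_{t+1}^u as a list that sums to 1.
    """
    n = len(bel)
    # Prediction: sum_j p(x_{t+1} = i | x_t = j) * Bel_t[j]
    predicted = [sum(motion[i][j] * bel[j] for j in range(n)) for i in range(n)]
    # Correction: weight by the sensor model, then normalize (the eta term).
    unnormalized = [likelihood[i] * predicted[i] for i in range(n)]
    eta = 1.0 / sum(unnormalized)
    return [eta * v for v in unnormalized]


# Three hypothetical rooms: 0 = kitchen, 1 = hallway, 2 = study.
motion = [
    [0.8, 0.1, 0.1],
    [0.2, 0.8, 0.1],
    [0.0, 0.1, 0.8],
]
bel = [1.0, 0.0, 0.0]               # occupant known to be in the kitchen
hallway_motion = [0.1, 0.9, 0.1]    # assumed likelihood of the hallway detector firing
print(bayes_update(bel, motion, hallway_motion))  # about [0.31, 0.69, 0.0]
```

Because the kitchen cannot transition directly to the study in this toy table, the study keeps zero belief even though its sensor likelihood is nonzero.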
Data Association
Each sensor reading must be assigned to at least one occupant or to a noise process. This is the data association step. Our solution is to use an expectation-maximization (EM) process to iteratively 1) estimate the likelihood of each occupant independently generating a given sensor sequence, and then 2) maximize by re-assigning ownership of sensor values [10]. We use the forward-backward algorithm to estimate the posterior beliefs, and then maximize the following quantity:
$$\sum_{x} p_t^u(y \mid x) \cdot \mathrm{Bel}_t^u(x).$$
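The two ingredients of this step, forward-backward smoothing and the per-reading score, can be sketched as follows. This is an illustration under stated assumptions, not the authors' code. It uses the same motion-table convention as the Bayes filter, takes one likelihood vector per reading, and omits the noise process and the outer EM loop. The assignment rule gives a single reading to whichever occupant maximizes the score above. All numbers are hypothetical.

```python
def normalize(v):
    total = sum(v)
    return [x / total for x in v]


def forward_backward(prior, motion, likelihoods):
    """Smoothed beliefs p(x_t | y_1..y_T) for one occupant.

    prior[i]          -- p(x_1 = i)
    motion[i][j]      -- p(x_{t+1} = i | x_t = j); columns sum to 1
    likelihoods[t][i] -- p(y_t | x_t = i) for the reading at step t
    """
    n, steps = len(prior), len(likelihoods)
    # Forward pass: the normalized Bayes filter beliefs.
    alpha = [normalize([likelihoods[0][i] * prior[i] for i in range(n)])]
    for t in range(1, steps):
        pred = [sum(motion[i][j] * alpha[-1][j] for j in range(n)) for i in range(n)]
        alpha.append(normalize([likelihoods[t][i] * pred[i] for i in range(n)]))
    # Backward pass: beta[t][j] is proportional to p(y_{t+1..T} | x_t = j).
    beta = [[1.0] * n for _ in range(steps)]
    for t in range(steps - 2, -1, -1):
        beta[t] = normalize([
            sum(motion[i][j] * likelihoods[t + 1][i] * beta[t + 1][i] for i in range(n))
            for j in range(n)
        ])
    # The smoothed belief combines past (alpha) and future (beta) evidence.
    return [normalize([a * b for a, b in zip(alpha[t], beta[t])]) for t in range(steps)]


def association_score(likelihood_y, belief):
    """sum_x p^u(y | x) * Bel^u(x): how well occupant u explains reading y."""
    return sum(lik * b for lik, b in zip(likelihood_y, belief))


def assign_reading(likelihood_by_occupant, belief_by_occupant):
    """Give one reading to the occupant with the highest score (no noise model)."""
    return max(
        belief_by_occupant,
        key=lambda u: association_score(likelihood_by_occupant[u], belief_by_occupant[u]),
    )


motion = [[0.9, 0.1], [0.1, 0.9]]
smoothed = forward_backward([0.5, 0.5], motion, [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]])
owner = assign_reading(
    {"a": [0.1, 0.9], "b": [0.1, 0.9]},   # same sensor model: detector in room 1
    {"a": [0.9, 0.1], "b": [0.1, 0.9]},   # a is probably in room 0, b in room 1
)
print(owner)  # b
```

In a full EM loop, one would alternate between running `forward_backward` on each occupant's currently owned readings and re-running the assignment for every reading until ownership stops changing.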
Occupant Independence
Currently, we assume that occupants behave independently,
an obvious approximation. In reality occupant movements
are highly correlated. Conditioning on the presence of
several other occupants increases the computational
complexity of the problem, while including guests causes
further growth in the number of required models. For this
paper we were interested in testing the performance of a
simpler model.
Motion model
The equation $p^u(x_t \mid x_{t-1})$ represents the motion model for a specific occupant. This model takes into account where the occupant was at the previous time step and predicts how likely each room is now. Our data is a time series of sensor measurements. All occupants constantly generate streams of data that are combined in the database. For this reason, we learn the motion model for each occupant using the entire database of sensor readings in which that occupant is home alone. We map each sensor to a state that represents a room and count transitions to generate an $N \times N$ table of transition probabilities.
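The counting procedure can be sketched as follows, assuming each home-alone episode has already been extracted as an ordered list of sensor ids. The sensor-to-room table, the sensor names, and the optional `pseudocount` smoothing term are all hypothetical. The paper describes only the counting itself. Set `pseudocount=0.0` to get pure counts; every room must then appear at least once as a "from" state, or its column total would be zero.

```python
def readings_to_rooms(sensor_ids, sensor_room):
    """Map each sensor reading to the room (state index) that sensor is in."""
    return [sensor_room[s] for s in sensor_ids]


def learn_motion_model(room_sequences, n_rooms, pseudocount=1.0):
    """Count room-to-room transitions into an N x N table.

    Returns motion[i][j] = p(x_{t+1} = i | x_t = j), so each column sums to 1.
    room_sequences: one list of room indices per home-alone episode.
    pseudocount: hypothetical smoothing so unseen moves keep nonzero probability.
    """
    counts = [[pseudocount] * n_rooms for _ in range(n_rooms)]
    for seq in room_sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[nxt][prev] += 1
    column_totals = [sum(counts[i][j] for i in range(n_rooms)) for j in range(n_rooms)]
    return [[counts[i][j] / column_totals[j] for j in range(n_rooms)]
            for i in range(n_rooms)]


# Hypothetical sensor layout: 0 = kitchen, 1 = hallway, 2 = study.
sensor_room = {"md_kitchen": 0, "fridge_door": 0, "md_hall": 1, "md_study": 2}
seq = readings_to_rooms(
    ["md_kitchen", "fridge_door", "md_hall", "md_study", "md_hall", "md_kitchen"],
    sensor_room,
)
motion = learn_motion_model([seq], n_rooms=3, pseudocount=0.0)
print(motion[1][0])  # 0.5: from the kitchen, half the observed moves went to the hallway
```

Consecutive readings from sensors in the same room are kept as self-transitions, since the filter is stepped once per reading.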
EXPERIMENTS
We performed an uncontrolled experiment on a single occupant using 1288 sensor readings, collected over a two-day period, from when that occupant was home alone. During this time one person moved through the house, visiting every sensor and moving with varying speed and direction. The occupant conducted several common tasks, such as making a sandwich and using the computer. The system was not running while the occupant slept. The tracker used a motion model trained for the occupant being tracked. Accuracy is measured as the fraction of time that the room location was predicted correctly. We performed 10 trials, each training the motion and sensor models on 90% of the data and testing on a rolling 10%. Using smoothing we found an accuracy of 99.6% ± 0.4%.
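The evaluation can be sketched in a few helper functions. Two points are assumptions, since the paper does not give these details: each prediction is taken to hold from its reading's timestamp until the next reading, and the "rolling 10%" is taken to be a contiguous held-out block that shifts on each trial. The small example also shows how accuracy measured over time can exceed accuracy measured over readings when errors fall on brief visits. That is one possible reason for the gap between the 74.5%-of-readings and 84.3%-of-time figures in the multi-occupant results, though the paper does not state the cause.

```python
def reading_accuracy(predicted, truth):
    """Fraction of sensor readings whose predicted room matches the label."""
    return sum(p == g for p, g in zip(predicted, truth)) / len(truth)


def time_accuracy(timestamps, predicted, truth, end_time):
    """Fraction of time the predicted room matches the label.

    Assumption: each prediction holds from its reading's timestamp until the
    next reading (the last one until end_time).
    """
    correct = total = 0.0
    for t0, t1, p, g in zip(timestamps, list(timestamps[1:]) + [end_time], predicted, truth):
        total += t1 - t0
        if p == g:
            correct += t1 - t0
    return correct / total


def rolling_folds(n_items, k=10):
    """Yield (train, test) index lists; each trial holds out a different
    contiguous block of about 1/k of the data (assumed layout)."""
    size = n_items // k
    for f in range(k):
        test = range(f * size, n_items if f == k - 1 else (f + 1) * size)
        train = [i for i in range(n_items) if i not in test]
        yield train, list(test)


# Hypothetical labeled data: one brief stair visit mispredicted as the hallway.
stamps = [0, 10, 12, 14]
pred = ["kitchen", "hall", "hall", "study"]
truth = ["kitchen", "stairs", "hall", "study"]
print(reading_accuracy(pred, truth), time_accuracy(stamps, pred, truth, end_time=100))
# 0.75 0.98
```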
We also report results from five days of continuous, unplanned, everyday movement of one to three people in the house. The tracker used individual motion models for the three occupants, and there were no guests during this period. To evaluate performance we had to hand-label the data. To make hand labeling feasible, we gathered additional information from eight wireless keypads. The keypads have one button for each of the three occupants and one for guests. During this period, anyone entering a room with a keypad pushed the button corresponding to their name. These button presses acted as road signs that helped the human labeler disambiguate the data stream and correctly label the movements and identity of each occupant.
There were approximately 2000 sensor readings each day, for a total of 10441 readings. When the house was occupied, one occupant was home 13% of the time, two occupants were home 22% of the time, and all three occupants were home 65% of the time. Note that every occupant slept in the house each night. Overall, the tracker correctly classified 74.5% of sensor readings, corresponding to 84.3% of the time. There was no significant difference in accuracy between occupants. By the number of people at home, the tracker was accurate 84.2% of the time when one occupant was home, 81.4% when two occupants were home, and 87.3% when all three were home. Accuracy with three occupants home drops to 74.5% when sleeping periods are removed.
CONCLUSION
We described the Narrator, a service that uses information
from a tracker to provide daily movement summaries. We
described algorithms that exploit information from binary
sensors to track several occupants
simultaneously. We validated our algorithms using
information gathered from an instrumented environment in
a series of experiments and provided example output of the
Narrator.
REFERENCES
1. Abowd, G., Atkeson, C., Bobick, A., Essa, I.,
MacIntyre, B., Mynatt, E., and Starner, T. (2000).
Living Laboratories: The Future Computing
Environments Group at the Georgia Institute of
Technology. In Proceedings of the 2000 Conference on
Human Factors in Computing Systems (CHI 2000), The
Hague, Netherlands, April 1-6, 2000.
2. Addlesee, M., Curwen, R., Hodges, S., Newman, J.,
Steggles, P., Ward, A., and Hopper, A. Implementing a
Sentient Computing System. IEEE Computer Magazine,
Vol. 34, No. 8, August 2001, pp. 50-56.
3. Bennewitz, M., Burgard, W., and Thrun, S. Learning
motion patterns of persons for mobile service robots. In
Proc. of the IEEE Int. Conference on Robotics and
Automation (ICRA), 2002.
4. Burgio, L., Scilley, K., Hardin, M., Janosky, J., Bonino,
P., Slater, S., and Engberg, R. (1994). Studying
Disruptive Vocalization and Contextual Factors in the
Nursing Home Using Computer-Assisted Real-Time
Observation. Journal of Gerontology, Vol. 49, No. 5,
pp. 230-239.
5. Burgio, L., Scilley, K., Hardin, M., Hsu, C. (2001).
Temporal patterns of disruptive vocalization in elderly
nursing home residents. International Journal of
Geriatric Psychiatry, Vol. 16, pp. 378-386.
6. Clarkson, B., Sawhney, N., and Pentland, A. (1998).
Auditory Context Awareness via Wearable Computing.
In Proceedings of the Perceptual User Interfaces
Workshop, San Francisco, CA.
7. Davis, L., Buckwalter, K., Burgio, L. (1997).
Measuring Problem Behaviors in Dementia: Developing
a Methodological Agenda. Advances in Nursing Science, Vol. 20, No. 1, pp. 40-55.
8. Huang, T., and Russell, S. Object identification in a
Bayesian context. In Proceedings of the Fifteenth
International Joint Conference on Artificial Intelligence
(IJCAI-97), Nagoya, Japan, August 1997. Morgan
Kaufmann.
9. Kanade, T., Collins, R., Lipton, R., Burt, P., and
Wixson, L. Advances in cooperative multi-sensor video
surveillance. In Proceedings of the 1998 DARPA Image
Understanding Workshop, volume 1, pages 3-24,
November 1998.
10. McLachlan, G.J., and Krishnan, T. (1997). The EM
Algorithm and Extensions. Wiley Series in Probability
and Statistics.
11. Mozer, M. C. (1998). The neural network house: An
environment that adapts to its inhabitants. In M. Coen
(Ed.), Proceedings of the American Association for
Artificial Intelligence Spring Symposium on Intelligent
Environments (pp. 110-114). Menlo Park, CA: AAAI
Press.
12. Pasula, H., Russell, S., Ostland, M., and Ritov, Y.
Tracking many objects with many sensors. In
Proceedings of the Sixteenth International Joint
Conference on Artificial Intelligence (IJCAI),
Stockholm, Sweden, 1999. IJCAI.
13. Schulz, D., Fox, D., and Hightower, J. People Tracking
with Anonymous and ID-Sensors using Rao-Blackwellised Particle Filters. In Proceedings of the
Eighteenth International Joint Conference on Artificial
Intelligence (IJCAI), 2003.
14. Sidenbladh, H., and Black, M. J. (2001). Learning image
statistics for Bayesian tracking. In Proceedings of the IEEE International
Conference on Computer Vision (ICCV), Vol. 2, pp.
709-716.
15. VanHaitsma, K., Lawton, M.P., Kleban, M., Klapper,
J., and Corn, J. (1997). Methodological Aspects of the
Study of Streams of Behavior in Elders with Dementing
Illness. Alzheimer Disease and Associated Disorders.
Vol. 11, No. 4, pp. 228-238.