The Narrator: A Daily Activity Summarizer Using Simple Sensors in an Instrumented Environment

Daniel Wilson
Robotics Institute
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15217 USA
dan.wilson@cs.cmu.edu

Christopher Atkeson
Robotics / Human Computer Interaction
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15217 USA
cga@cs.cmu.edu

ABSTRACT
People tracking provides the basis for automatic monitoring. This service can help people with disabilities or the elderly live independently by providing day-to-day information to physicians and family. The Narrator system uses information generated by a tracker to generate concise, scalable summaries of daily movement activity. We demonstrate output from the Narrator as well as the workings of an underlying tracker in an instrumented home environment. We show that in a system made up almost entirely of sensors that do not report identity information, we can maintain identity information and recover from identification errors.

Keywords
Ubiquitous Computing, People Tracking, Simple Sensors

INTRODUCTION
Knowledge of the identity and position of occupants in an instrumented environment is a basic element of automatic monitoring. Automatically generated summaries of daily activities for people with cognitive disabilities can be used to improve the accuracy of pharmacological interventions, track illness progression, and lower caregiver stress levels [7]. Additionally, [15] has shown that movement patterns alone are an important indicator of cognitive function, depression, and social involvement among people with Alzheimer's disease.

RELATED WORK
People tracking has been approached via a variety of sensors, including cameras, laser range finders, wireless networks, RFID (radio frequency identification) badges, and infrared or ultrasound badges [1, 2, 3, 6, 9, 11, 13, 14]. Cost of sensors and sensor acceptance are pivotal issues, especially in the home. Many people are uncomfortable living with cameras and microphones. Laser scanning devices are anonymous, but costly and have limited range. We find that, when at home, people are often unwilling to wear a badge, beacon, set of markers, or RF tag, forget to wear it, change clothes too often, or are not sufficiently clothed to wear one. Elderly individuals are often very sensitive to small changes in environment [4], and members of one target population, institutionalized Alzheimer's patients, frequently strip themselves of clothing, including any wearable sensors [5].

We have chosen to explore a set of sensors that are already present in many homes as part of security systems (motion detectors, contact switches, and other simple binary sensors). These sensors are cheap, computationally inexpensive, and do not have to be continuously worn or carried. We aim for room-level tracking, as our sensors do not provide the higher spatial resolution of other types of tracking systems.

In this paper we describe a people tracker and a derivative service, the Narrator. The Narrator is a finite state machine that parses movement information provided by a tracker and generates a concise, readable summary. Our tracker consists of a discrete state Bayes filter and associated models that use information gathered from binary sensors to provide low-cost automatic tracking in a home environment. We demonstrate results from an offline smoothing algorithm, although online filtering techniques are possible. We instrumented a permanently occupied home and conducted a series of experiments to validate our approach.

Combining anonymous sensors and sensors that provide identification information for people or object tracking is an open problem. Our tracking problem is similar to object identification, where the goal is to determine whether a newly observed object is the same as a previously observed object. The solution offered by [12] has been applied to tracking automobile traffic using cameras, extending the technique introduced by [8] to accommodate many sensors. In a recent experiment [13], laser range finders and infrared badges were used to track six people simultaneously in an office environment for 10 minutes. The range finders provided anonymous x,y coordinates while the badge system identified occupants. Our system uses a single RFID sensor with many anonymous sensors to provide room-level tracking. We collect data over long periods to provide an ever-improving model of the unique motion patterns of each occupant. These models can be used later for occupant identification in lieu of additional ID-sensors.

NARRATOR
The purpose of the Narrator system is to provide a summary of daily movements, using information generated automatically by a tracker in an instrumented environment. This summary represents important daily events in a compact, readable format, although the tracker provides many thousands of second-by-second location predictions.

On the most basic level, the Narrator could produce an English account of the second-by-second location predictions. In our instrumented environment there were an average of 2000 readings per day. This scheme would produce volumes of not very useful information. Instead, we make a few simplifying assumptions and provide user-scalable levels of abstraction.

We make two assumptions. First, although we track several occupants simultaneously, we choose to create summaries for one occupant at a time. We also report only movement information and do not attempt activity recognition, except for sleeping. For sleeping we use a simple rule: if an occupant spends more than four hours in the bedroom, that time is tagged as sleeping. Second, the Narrator directly uses the maximum likelihood predictions of the tracker. Each of these predictions has an associated posterior probability, which we ignore for now. In future work we plan to incorporate this confidence measure into the Narrator's output.

We identify two areas in which reporting may be abstracted. First, we use the duration of time spent in a location to scale the amount of information reported on that movement. Second, we use sensor granularity to scale reporting from room level up to house level.

Transient Locations
Some locations are less interesting than others, because they are traversed constantly and quickly in order to reach end locations. Usually, transient locations are stairways and hallways. These locations demonstrate a marked decrease in the average amount of time spent compared to other locations. For example, in our experiments the staircases had mean durations of 5.5 seconds and hallways had mean durations of 10.3 seconds. By contrast, the living room and study had a mean duration of 8.2 minutes.

The transience property of a location determines how much detail to report about travel through that location. We use a threshold on the mean duration spent in a room to identify transient spaces. We fit a Gaussian to the amount of time spent in these rooms to obtain an overall measure of transience. The Narrator tags travel through any room as transient if the amount of time spent there falls within the range given by the transient mean and variance. In this way we simplify the summary without restrictive rules that completely ignore certain areas. With this information the user may choose to fully or partially ignore transient locations, and focus instead upon end locations where the occupant spends the most time. The sentences below were generated by the Narrator and demonstrate the three scales.

• Daniel entered the first floor hallway and stayed for 2 seconds. Daniel entered the kitchen and stayed for 10 minutes.
• Daniel passed through the first floor hallway, entered the kitchen and stayed for 10 minutes.
• Daniel walked to the kitchen and stayed for 10 minutes.

Sensor Granularity
The tracker can predict location at the granularity of individual sensors, although the current implementation reports at room level. The Narrator allows the user to scale the granularity from room level to floor level and to the entire house. The sentences below demonstrate room level, floor level, and house level granularity, respectively.

• Daniel woke at 8am. He walked to the bathroom and stayed for 15 minutes. He walked downstairs to the kitchen and stayed for 10 minutes. He passed through the foyer to the front porch and left the house.
• Daniel woke at 8am. He stayed on the second floor for 15 minutes. He went to the first floor and stayed for 10 minutes. He left the house.
• Daniel woke at 8am. He stayed home for 25 minutes. He left the house.

Algorithm
The Narrator algorithm is a conceptually simple deterministic finite state machine. It is composed of a set of states, an input alphabet, and a transition function that maps symbols and states to the next state. The set of states represents English words and phrases, while the input alphabet is composed of sensor readings and times. To add some variety to the language, some states have more than one transition for a given symbol. A lookup table maps the room and occupant ids reported by the tracker to room and occupant names.

TRACKER
We wish to estimate the state of a dynamic system from sensor measurements. In our case, the dynamic system is one or more occupants and the instrumented environment. For this paper we track people at the room level, so a person's state, x, indicates which of N rooms they are in. Measurements include data from motion detectors, pressure mats, drawer and door switches, and radio frequency identification (RFID) systems. We solve the tracking problem offline with a technique commonly known as smoothing, which uses information from both past and future time steps and so provides higher accuracy for offline purposes such as a daily summary of movement activity.
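As a point of reference, the difference between online filtering and offline smoothing can be written compactly. In the sketch below, y_1, ..., y_t stands for the measurements up to time step t, and T, the last time step in the recorded log, is a symbol introduced here for illustration only.

```latex
\begin{align}
\text{filtering:} \quad & p(x_t = i \mid y_1, \ldots, y_t) \\
\text{smoothing:} \quad & p(x_t = i \mid y_1, \ldots, y_T), \qquad 1 \le t \le T
\end{align}
```

Filtering conditions the estimate at time t only on what has been observed so far, whereas smoothing also conditions on the measurements that arrive afterwards, which is why it suits a summary produced at the end of the day.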
Technological Infrastructure
We instrumented a house in order to conduct experiments using real data. The three-story house is home to two males, one female, a dog, and a cat. Our environment contains forty-nine sensors and twenty different rooms.

• Radio Frequency Identification (RFID): We use low frequency RFID to identify occupants entering and leaving the environment. Each occupant and guest is given a unique transponder, or 'tag'. When the credit-card-sized tag nears the RFID antenna it emits a unique identification number. Upon recognition of a tag the tracker places a high initial belief that the occupant is at the antenna location. Note that using this tag is no different than using a house key; it is not necessary to carry the tag throughout the environment.
• Motion detectors: We use wireless X10 Hawkeye™ motion detectors. Upon sensing motion a radio signal is sent to a receiver, which transmits a unique signal over the power line. This signal is collected by a CM11A device attached to a computer. The detectors are pet-resistant, require both heat and movement to trigger, and run on battery power for over one year. There are twenty-four motion detectors installed.
• Contact switches: Inexpensive magnetic contact switches indicate a closed or open status. They are installed on every interior and exterior door, selected cabinet drawers, and refrigerator doors. There are twenty-four contact switches.

The sensors are monitored by a single Intel Pentium IV 1.8 GHz desktop computer with 512 MB RAM. We use an expanded parallel port interface to monitor contact switches, a serial interface to a CM11A device to monitor motion detector activity, and a serial interface to the RFID reader. All activity is logged in real-time to a MySQL database.

Tracking Formulation
Our goal is to estimate the probability distribution for each person's location, conditioned on sensor measurements.
This probability distribution, the tracking system's “belief” or “information” state, is encoded as a length-N vector whose elements give the probability of being in the respective rooms. We use a discrete state Bayes filter to maintain the belief state Bel. Our belief that a person u is in room i at time t is:

$$\mathrm{Bel}_t^u[i] = p_t^u(x = i \mid y_1, \ldots, y_t).$$

Here p() indicates probability and $y_1, \ldots, y_t$ denotes the data from time 1 up to time t. Given a new sensor value, we can update the beliefs for all rooms. For room i:

$$\mathrm{Bel}_{t+1}^u[i] = \eta \cdot p_{t+1}^u(y \mid x = i) \cdot \sum_{j=1}^{N} p^u(x_{t+1} = i \mid x_t = j) \cdot \mathrm{Bel}_t^u[j].$$

The variable η is a normalizing constant, so that the elements of any Bel vector sum to 1. In using a Bayes filter, we assume that our room-level states are Markov. This is an approximation, and one research question is whether we can accurately track people after making this approximation. We assume that each person u has a different motion model $p^u(x_{t+1} = i \mid x_t = j)$ and sensor model $p^u(y \mid x)$.

Data Association
Each sensor reading must be assigned to at least one occupant or to a noise process. This is the data association step. Our solution is to use an EM process to iteratively 1) estimate the likelihood of each occupant independently generating a given sensor sequence, and then 2) maximize by re-assigning ownership of sensor values [10]. We use the forward-backward algorithm to estimate the posterior beliefs, and then maximize the following quantity:

$$\sum_{x} p_t^u(y \mid x) \cdot \mathrm{Bel}_t^u[x].$$

Occupant Independence
Currently, we assume that occupants behave independently, an obvious approximation. In reality occupant movements are highly correlated. Conditioning on the presence of several other occupants increases the computational complexity of the problem, while including guests causes further growth in the number of required models. For this paper we were interested in testing the performance of a simpler model.
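As a rough illustration of the belief update and forward-backward smoothing described in the Tracking Formulation and Data Association sections, the following Python sketch works on plain lists for a single occupant. It also includes a counting-based estimate of the transition table of the kind described under Motion model. The function names (normalize, filter_step, smooth, learn_motion_model), the list-of-lists representation, the way the first reading is combined with a prior, and the uniform fallback for rooms with no observed transitions are all assumptions made for this example, not details of our implementation.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def normalize(v: Sequence[float]) -> Vector:
    """Scale v so its elements sum to 1 (the role of eta in the update)."""
    total = sum(v)
    if total <= 0.0:
        raise ValueError("vector has zero total probability")
    return [p / total for p in v]


def filter_step(belief: Sequence[float], motion: Matrix,
                likelihood: Sequence[float]) -> Vector:
    """One Bayes filter update for a single occupant.

    belief[j]     : Bel_t[j]
    motion[j][i]  : p(x_{t+1} = i | x_t = j), each row sums to 1
    likelihood[i] : p(y | x = i) for the new sensor value y
    """
    n = len(belief)
    # Prediction: push the current belief through the motion model.
    predicted = [sum(motion[j][i] * belief[j] for j in range(n)) for i in range(n)]
    # Correction: weight by the sensor likelihood, then normalize.
    return normalize([likelihood[i] * predicted[i] for i in range(n)])


def smooth(prior: Sequence[float], motion: Matrix,
           likelihoods: Sequence[Sequence[float]]) -> List[Vector]:
    """Forward-backward smoothing: p(x_t | y_1..y_T) for every time step t."""
    if not likelihoods:
        raise ValueError("need at least one sensor reading")
    n = len(prior)
    # Forward pass. Assumption: the first reading is combined directly with the prior.
    belief = normalize([prior[i] * likelihoods[0][i] for i in range(n)])
    forward = [belief]
    for lik in likelihoods[1:]:
        belief = filter_step(belief, motion, lik)
        forward.append(belief)
    # Backward pass: 'backward' holds the (rescaled) evidence from future readings.
    backward = [1.0] * n
    smoothed: List[Vector] = [[] for _ in likelihoods]
    for t in range(len(likelihoods) - 1, -1, -1):
        smoothed[t] = normalize([forward[t][i] * backward[i] for i in range(n)])
        if t > 0:
            lik = likelihoods[t]
            backward = normalize([
                sum(motion[j][i] * lik[i] * backward[i] for i in range(n))
                for j in range(n)
            ])
    return smoothed


def learn_motion_model(rooms: Sequence[int], n: int) -> Matrix:
    """Estimate p(x_{t+1} = i | x_t = j) by counting transitions in a room sequence."""
    counts = [[0.0] * n for _ in range(n)]
    for prev, cur in zip(rooms, rooms[1:]):
        counts[prev][cur] += 1.0
    # Assumption: rooms with no observed transitions get a uniform row.
    return [normalize(row) if sum(row) > 0 else [1.0 / n] * n for row in counts]
```

A typical use under these assumptions would be to call learn_motion_model on a room sequence recorded while one occupant was home alone, then pass the resulting table to smooth together with one likelihood vector per sensor reading. Rescaling the backward messages at each step does not change the smoothed result, since each smoothed vector is normalized anyway, and it avoids numerical underflow on long logs.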
Motion model
The expression $p^u(x_t \mid x_{t-1})$ represents the motion model for a specific occupant. This model takes into account where the occupant was at the previous time step and gives the probability of each room at the current time step. Our data is a time series of sensor measurements. All occupants are constantly generating streams of data that are combined in the database. For this reason, we learn motion models for each occupant using all sensor readings in the database from periods in which that occupant is home alone. We map each sensor to a state that represents a room and count the transitions between states to generate an N × N table of transition probabilities.

EXPERIMENTS
We performed an uncontrolled experiment on a single occupant using 1288 sensor readings from when that occupant was home alone, collected over a two-day period. During this time one person moved through the house, visiting every sensor and moving with varying speed and direction. The occupant conducted several common tasks, such as making a sandwich and using the computer. The system was not running while the occupant slept. The tracker used a motion model trained for the occupant being tracked. Accuracy is measured as the fraction of time that the room location was predicted correctly. We performed 10 trials, training motion and sensor models on 90% of the data and testing on a rolling 10%. Using smoothing we found an accuracy of 99.6% ± 0.4.

We also report results from five days of continuous, unplanned, everyday movement of one to three people in the house. We measured tracker performance over this continuous five-day period. The tracker used individual motion models for the three occupants. There were no guests during this period. To evaluate performance we had to hand-label the data. To make hand labeling feasible we gathered additional information from eight wireless keypads. The keypads have one button for each of the three occupants and one for guests.
During that week, when anyone entered a room with a keypad, they pushed the button corresponding to their name. This information acted as road signs to help the human labeler disambiguate the data stream and correctly label the movements and identity of each occupant.

There were approximately 2000 sensor readings each day for a total of 10441 readings. While the house was occupied, on average one occupant was home 13% of the time, two occupants were home 22% of the time, and all three occupants were home 65% of the time. Note that each night every occupant slept in the house. On the whole, the tracker correctly classified 74.5% of sensor readings, corresponding to 84.3% of the time. There was no significant difference in accuracy between occupants. The tracker was accurate 84.2% of the time for one occupant, 81.4% for two occupants, and 87.3% for three occupants. Accuracy for three occupants drops to 74.5% when sleeping periods are removed.

CONCLUSION
We described the Narrator, a service that uses information from a tracker to provide daily movement summaries. We described algorithms that exploit information from binary sensors to perform tracking of several occupants simultaneously. We validated our algorithms using information gathered from an instrumented environment in a series of experiments and provided example output of the Narrator.

REFERENCES
1. Abowd, G., Atkeson, C., Bobick, A., Essa, I., MacIntyre, B., Mynatt, E., and Starner, T. (2000). Living Laboratories: The Future Computing Environments Group at the Georgia Institute of Technology. In Proceedings of the 2000 Conference on Human Factors in Computing Systems (CHI 2000), The Hague, Netherlands, April 1-6, 2000.
2. Addlesee, M., Curwen, R., Hodges, S., Newman, J., Steggles, P., Ward, A., and Hopper, A. Implementing a Sentient Computing System. IEEE Computer Magazine, Vol. 34, No. 8, August 2001, pp. 50-56.
3. Bennewitz, M., Burgard, W., and Thrun, S. Learning motion patterns of persons for mobile service robots. In Proc. of the IEEE Int. Conference on Robotics & Automation (ICRA), 2002.
4. Burgio, L., Scilley, K., Hardin, M., Janosky, J., Bonino, P., Slater, S., and Engberg, R. (1994). Studying Disruptive Vocalization and Contextual Factors in the Nursing Home Using Computer-Assisted Real-Time Observation. Journal of Gerontology, Vol. 49, No. 5, pp. 230-239.
5. Burgio, L., Scilley, K., Hardin, M., and Hsu, C. (2001). Temporal patterns of disruptive vocalization in elderly nursing home residents. International Journal of Geriatric Psychiatry, 16, 378-386.
6. Clarkson, B., Sawhney, N., and Pentland, A. (1998). Auditory Context Awareness via Wearable Computing. In Proceedings of the Perceptual User Interfaces Workshop, San Francisco, CA.
7. Davis, L., Buckwalter, K., and Burgio, L. (1997). Measuring Problem Behaviors in Dementia: Developing a Methodological Agenda. Adv. Nurs. Sci., 20(1), 40-55.
8. Huang, T., and Russell, S. Object identification in a Bayesian context. In Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence (IJCAI-97), Nagoya, Japan, August 1997. Morgan Kaufmann.
9. Kanade, T., Collins, R., Lipton, R., Burt, P., and Wixson, L. Advances in cooperative multi-sensor video surveillance. In Proceedings of the 1998 DARPA Image Understanding Workshop, volume 1, pages 3-24, November 1998.
10. McLachlan, G.J., and Krishnan, T. (1997). The EM Algorithm and Extensions. Wiley Series in Probability and Statistics.
11. Mozer, M. C. (1998). The neural network house: An environment that adapts to its inhabitants. In M. Coen (Ed.), Proceedings of the American Association for Artificial Intelligence Spring Symposium on Intelligent Environments (pp. 110-114). Menlo Park, CA: AAAI Press.
12. Pasula, H., Russell, S., Ostland, M., and Ritov, Y. Tracking many objects with many sensors. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI), Stockholm, Sweden, 1999.
13. Schulz, D., Fox, D., and Hightower, J. People Tracking with Anonymous and ID-Sensors using Rao-Blackwellised Particle Filters. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI), 2003.
14. Sidenbladh, H., and Black, M. J. (2001). Learning image statistics for Bayesian tracking. In IEEE International Conference on Computer Vision (ICCV), Vol. 2, pp. 709-716.
15. VanHaitsma, K., Lawton, M.P., Kleban, M., Klapper, J., and Corn, J. (1997). Methodological Aspects of the Study of Streams of Behavior in Elders with Dementing Illness. Alzheimer Disease and Associated Disorders, Vol. 11, No. 4, pp. 228-238.