Context-Aware Activity Recognition using TAN Classifiers
by Neil C. Chungfat

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Master of Engineering in Electrical Engineering and Computer Science at the Massachusetts Institute of Technology, May 2002.

© Neil C. Chungfat, MMII. All rights reserved.

The author hereby grants to MIT permission to reproduce and distribute publicly paper and electronic copies of this thesis and to grant others the right to do so.

Author: Department of Electrical Engineering and Computer Science, May 24, 2002
Certified by: Stephen S. Intille, Thesis Supervisor
Accepted by: Arthur C. Smith, Chairman, Department Committee on Graduate Theses

Context-Aware Activity Recognition using TAN Classifiers
by Neil C. Chungfat

Submitted to the Department of Electrical Engineering and Computer Science on May 24, 2002, in partial fulfillment of the requirements for the degree of Master of Engineering in Electrical Engineering and Computer Science.

Abstract

This thesis reviews the components necessary for designing and implementing a real-time activity recognition system for mobile computing devices. In particular, a system utilizing GPS location data and tree augmented naive Bayes (TAN) classifiers is described and evaluated. The system can successfully recognize activities such as shopping, going to work, returning home, and going to a restaurant. Several different sets of features are tested using both the TAN algorithm and a test bed of other competitive classifiers. Experimental results show that the system can recognize about 85% of activities correctly using a multinet version of the TAN algorithm. Although efforts were made to design a general-purpose system, findings indicate that the nature of the position data and many relevant features are person-specific. The results from this research provide a foundation upon which future activity-aware applications can be built.

Thesis Supervisor: Stephen S. Intille
Title: Research Scientist

Acknowledgments

My heartfelt thanks to the many people who supported me during the last five years at MIT. In particular: my parents, who have always stood behind me; my advisor, Stephen Intille, who offered his guidance and experience throughout this project and refused to let me feel lost during the course of this year; my friends, who have made the last five years of hard work and sleepless nights all worth it. Thanks also to the National Science Foundation for funding this project and my research this past year.

Contents

Chapter 1  Introduction
  1.1 Motivation
  1.2 The Task
  1.3 Outline
Chapter 2  Related Work
  2.1 Mobile Context-Aware Computing
  2.2 Activity Recognition
Chapter 3  Choosing and Qualifying Actions
  3.1 Data Sources
    3.1.1 Location Data
    3.1.2 Personal Information Management (PIM) Data
    3.1.3 Maps and Landmarks
    3.1.4 Lexical and Knowledge Databases
    3.1.5 Other Sensor Data
  3.2 Identifying Activities
  3.3 Features
    3.3.1 Where - Geographic References
    3.3.2 When - Temporal References
    3.3.3 Why - PIM Data
    3.3.4 How - Speed
    3.3.5 Maintaining Generality
Chapter 4  Recognizing Activities
  4.1 Goals
  4.2 Approach
    4.2.1 Bayesian Networks
    4.2.2 Naive Bayesian Classifiers
  4.3 TAN Classifiers
    4.3.1 Structure Determination
    4.3.2 Training
    4.3.3 TAN Multinets
Chapter 5  Results and Analysis
  5.1 Data Acquisition
  5.2 Features
    5.2.1 Single Point Distance
    5.2.2 Changing Distances
    5.2.3 Trajectory
  5.3 Classification Results and Analysis
    5.3.1 Algorithm Results
    5.3.2 Extensions
Chapter 6  Discussion
  6.1 Feasibility of an Activity Recognition System
  6.2 Extensions and Improvements
    6.2.1 Other Activities
    6.2.2 Low-Level Data
    6.2.3 High-Level Knowledge
    6.2.4 Applications
Chapter 7  Conclusion
Appendix A  System Overview
  A.1 Hardware
  A.2 Software
    A.2.1 GPSRecorder
    A.2.2 GPSTracker
    A.2.3 TAN
    A.2.4 Experience Sampling
Appendix B  Activity and Landmark Data
  B.1 Selecting Activities
  B.2 Online Resources
  B.3 Labeled Map Areas
Appendix C  Tests
  C.1 Training and Testing with Different Users
  C.2 Wordnet Trials
    C.2.1 TAN Structure Determination
    C.2.2 Naive Bayes Training

List of Figures

  2-1 The IBM Linux watch (a) and Casio watch equipped with GPS receiver (b)
  3-1 iPAQ device with GPS receiver
  3-2 A map of the area near the MIT campus with different colors representing the various kinds of areas used in this work
  4-1 A naive Bayes network
  4-2 A TAN network for the ΔDist(prev) dataset
  5-1 A graph depicting the predicted classifier accuracy of each classifier for each dataset
  A-1 The maximum spanning trees with associated I_P values for the contacts (left) and nursery (right) datasets
  B-1 The complete map with labels, used for our trials

List of Tables

  5.1 Summary of classifier results per feature set (mean ± variance)
  5.2 Confusion matrix for ΔDist(prev)
  B.1 Complete activity list with incidence count
  C.1 Confusion matrix for data gathered by a second user and evaluated using a system trained with the ΔDist(prev) dataset

Chapter 1
Introduction

Mobile computing devices such as cellular phones and personal digital assistants (PDAs) have become more prevalent in recent years, with handheld device use expected to increase 19% in 2002 to 14.75 million units [37]. In conjunction with their widespread use, these devices continue to become both more powerful and more portable. To evolve beyond simple replacements for paper organizers, applications must intelligently take advantage of a user's location and intentions to provide useful services. Context-aware computing aims to create applications that leverage information about the user's environment to improve his experience by making interfaces more natural and intuitive [16].

This thesis outlines the high-level design goals for an activity recognition system and describes the knowledge sources that are available to build such a system. In particular, a system was designed and implemented that uses a simple training process and allows for the automatic recognition of some activities without complex knowledge engineering. The recognition system deals with noisy sensor data and infers personalized activity recognition models without explicitly providing the system with hard-coded rules. While past systems simply match a user's location to one particular action, this research attempts to go beyond these models by considering more complex situations where a single location may be related to more than one action and for which context is not determined by location alone. The results from this system are discussed and used to speculate on the possibilities for future work.

1.1 Motivation

A variety of context-aware applications could take advantage of the capabilities of an accurate activity recognition system. Applications that are aware of the user's environment become more personally relevant by presenting information that applies to the situation at hand, thereby improving the user experience. A key benefit of the system described here is that it relies on location data that is relatively easy to obtain and does not require deploying and calibrating cameras or other more expensive sensory devices. The availability of location data makes this recognition system useful for a variety of applications.

* Preventative Medicine.
This research grew from a larger effort focused on designing new interfaces and technologies that address preventative health care. Recognizing the activities of an individual both inside and outside of the home provides important data that can be used to infer patterns of behavior. Vision-based systems have been implemented that can recognize motions such as sitting up, falling backwards, squatting, and walking [29] as well as interactions between different people [34]. While it is possible to detect more detailed actions inside a controlled or enclosed environment, such as the home, it is far more difficult to apply the same vision-based systems in the outside world. To obtain equally useful observations outside of the home, an activity recognition system could be used to measure the frequency and types of exercise the user engages in and the frequency of other daily routines. Healthy living could also be influenced by providing advice on nutrition when, for example, the system recognizes the user is going out to lunch. Accumulating observations on behavior, combined with offering advice at the right time and place, can help alert the individual to recognize and remedy potential health risks [38]. It is therefore important that a preventative healthcare system readily distinguish how actions are being executed (e.g. a walk is considered exercise but taking a drive is not) and be sufficiently accurate in its detection process for medical applications.

* Response Adaptive Applications. Integrated with other applications and systems, a location-aware system could perform a variety of useful tasks. Many of these are based on the application's ability to provide the right information at the right time. Knowing what the user is doing or is about to do is therefore essential for such applications to avoid presenting untimely and inappropriate information. Memory aid applications could remind the user to buy milk when they sense the user going to the supermarket or to pick up dry-cleaning on the way home from work. Planning applications that anticipate a user's most frequently taken routes could alert the user to hazards such as construction or blocked roads and suggest alternative routes, or make reservations at a restaurant the user is heading towards. Social applications could notify friends and family when the user is coming by for a visit or allow someone to ask their neighbor to run an errand if he is already en route to some destination. An ideal system will therefore be able to run in real-time, recognizing activities quickly enough to allow relevant information to be presented to the user before the action is completed.

* Games. Both educational and recreational games could make use of this activity recognition system. Electronic Arts recently released the innovative Majestic game [43] that attempted to create a unique and personalized gaming experience by contacting users through phone calls, faxes, e-mails, and Internet messaging. Additional depth could be added to this type of gaming experience by incorporating a user's real world actions into the game's storyline. One can imagine a game that adapts itself to the user's routines, thereby personalizing the experience on a per-player basis. The activity recognition system will therefore have to be trained to the user's personal habits and routines relatively painlessly, without requiring a great deal of effort from the user.

1.2 The Task

To design a system that accommodates the aforementioned applications, a few issues must be addressed.
First, what kind of data is available that will be useful for this task? This, in turn, influences the resolution of activities that can be recognized by the system as well as the types of algorithms that can be used to actually perform the recognition. The research described here attempts to investigate these questions. A system that detects several routine actions, including going to work, going home, and going shopping, is described, along with how it was designed, implemented, and tested. This system utilizes a combination of GPS and zoning-type map data (e.g. parks, residential neighborhoods, businesses, and restaurants) to represent a user's movements and activities. As the location the system perceives is merely a point on the globe, the class of actions it was designed to recognize is not as fine-grained as in other systems that utilize computer vision to obtain data (e.g. [29, 11]). Instead, the focus of this work is on understanding the potential of machine-learning classifiers to recognize activities primarily defined by the type of area where they occur - for example shopping, going to work, and returning home.

To be truly useful, an activity recognition system should be able to adapt itself to the habits of the user. However, since each person has his own habits, some supervision on the part of the user is necessary. Ideally, this training process should be as simple as possible and not require complex knowledge engineering. The algorithm should not use features that are overly user-specific and that require hard-coding information into the system.

1.3 Outline

The remainder of this thesis is structured as follows. Related work on other context-aware and activity-based systems is discussed next in Chapter 2. Chapter 3 covers the types of data that are available for use in activity recognition and how these translate into useful features. A discussion of the approach taken to recognize activities is contained in Chapter 4, followed by a review of the results in Chapter 5. Chapter 6 contains a discussion of lessons learned from this experience as well as possibilities for future work. We then summarize and conclude in Chapter 7.

Chapter 2
Related Work

2.1 Mobile Context-Aware Computing

Within the next few years, portable technology will become more prevalent and more useful for everyday activities. Current PDAs accommodate cameras, GPS receivers, barcode scanners, as well as a multitude of other peripherals. As a result, we anticipate that an integrated "all-in-one" device will be possible within a few years that is both more powerful and more portable than current devices. With this in mind, the activity recognition system described here may one day be able to work using a PDA with a wristwatch form factor. In fact, IBM researchers have already prototyped a "smart-watch" that runs Linux [32] (see Figure 2-1(a)), and watches are on the market with GPS capability (Figure 2-1(b)).

While some context-aware studies have focused on improving user interaction with desktop computers, our research focuses on providing useful information and services to users as they go about their daily lives. Location-aware handheld computers have been used to create intelligent tour guides (see [1, 10, 41, 13]) that present text, audio, and video information to users as they walk around a pre-determined area. Information pertaining to the location, such as a restaurant review or historical background, is provided when the user reaches a particular location that triggers the response.
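The "reach a location, trigger its content" behavior of these guides can be illustrated with a short, hypothetical sketch; the place name, coordinates, radius, and message below are invented and do not come from any of the cited systems.

```python
import math

# Hypothetical tour-guide entries: a trigger location, a radius, and canned content.
GUIDE_ENTRIES = [
    {"name": "Old North Church", "lat": 42.3663, "lon": -71.0544,
     "radius_m": 75, "info": "Historical background and an audio clip."},
]

def nearby_entries(lat, lon):
    """Return the entries whose trigger radius contains the user's current position.

    Uses a flat-earth approximation, which is adequate at city scale."""
    hits = []
    for entry in GUIDE_ENTRIES:
        dx = (lon - entry["lon"]) * 111320 * math.cos(math.radians(lat))  # meters east-west
        dy = (lat - entry["lat"]) * 110540                                # meters north-south
        if math.hypot(dx, dy) <= entry["radius_m"]:
            hits.append(entry)
    return hits
```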
For these applications, a simple "are they near this location?" check is sufficient to serve as the context, since an assumption is made that the user is interested in the information that will be provided. These applications therefore only assume one context for a particular location. In other words, the applications present, at most, one type of information per place and do not take into account the possibility that the user is interested in something else or is engaged in multiple simultaneous activities.

[Figure 2-1: The IBM Linux watch (a) and Casio watch equipped with GPS receiver (b)]

Other similar applications have been developed to help shoppers locate items in a supermarket and provide nutritional or discount information related to each shopper's location and past history [10, 4]. The addition of a shopper's buying habits can provide enough information to personalize the experience by reminding a user that he last bought milk a week ago or informing him that his favorite kind of cereal is currently on sale.

For an action recognition system, more information than the current location of the user is needed to make an accurate prediction. The implications of user preference for particular routes also influence how the system must function, whereas, for example, the tour guide applications need not be concerned with these details. For them, it does not matter how the user arrived at the location, merely that he is there. The requirement that the system be adaptable on a per-user basis requires more sophisticated testing procedures than a system that functions the same way for all users.

2.2 Activity Recognition

The computer vision community has devoted much research to motion recognition. One technique, known as background subtraction, can distinguish moving people and objects within a room. Over time, the camera learns a statistical model of the image representing the static environment (objects that remain still). When a person enters the room, the system compares the new image to its model of the background and highlights the differences, assuming that these "blobs" are people moving about [19]. Beyond tracking people themselves, extensions of this technique can be used to track gestures, including arm or hand motions such as those used in sign language [42], and even automobiles to study traffic patterns [26].

Vision systems have also used probabilistic models to perform action recognition. Pentland et al. have used hidden Markov models (HMMs) and features extracted from video data to recognize patterns of human interaction [34], Tai Chi hand movements [7], and the behavior of people driving automobiles [33].

The data for these vision-based systems was gathered in controlled environments with mounted cameras. The benefit of GPS technology is that it is already omnipresent, and therefore any service that makes use of it can be deployed without requiring the placement and calibration of sensors in the environment. The set of actions that a vision-based system is designed to recognize is dependent on camera placement, lighting conditions, and many other environmental factors. This makes it difficult for these systems to adapt to new places or to be extended to recognize different kinds of actions. Another important difference is that the vision systems are trained to recognize complete actions, such as the gesture of standing up.
The action recognition system that we envision, on the other hand, must attempt to make a prediction of the action that is currently taking place (and may take a long time to complete) to be of the most use. Once the action has taken place, it may be too late to present useful information to the user. As a result, some of the models used for recognition by the vision-based systems do not apply for our system. HMMs can be used to model actions in vision systems because they model a discrete state of the system. The data we are concerned with is not as detailed, which makes it hard to specify the specific states and transitions in our system. An HMM could potentially be layered on top of our system to model user habits, though this requires a robust underlying system that can accurately predict what a user is doing. For this system, we consider the class of machine-learning classification algorithms. This is fitting because our data is well represented by distinct features that we wish to cluster into categories. In particular, we examine the tree augmented naive Bayes (TAN) classifier because of its combination of training simplicity and competitive performance [23].

Chapter 3
Choosing and Qualifying Actions

An important goal in designing this system was to minimize the number of hard-coded rules required by the system. While we could have used rules explicitly connecting places with actions (e.g. supermarkets with grocery shopping) as in prior work [1], our goal was for the system to learn these associations through training data. We wish to do this so that the system can adapt to a user's personal habits. For example, a shopping mall that contains restaurants could be frequented by one user to shop, another to work, and yet another for meals. To make this learning process possible, both sources of high-level knowledge and methods of making this knowledge available to the system are required.

3.1 Data Sources

There are several potential sources of data for an activity-recognition system. In addition to location data, the applications of the PDA device as well as several geographic tools are available as resources.

3.1.1 Location Data

The Global Positioning System (GPS) was developed by the United States Department of Defense and released for civilian use in 1990. The system consists of 24 satellites and several ground stations around the world, which are used to determine the precise position, velocity, and altitude of a properly equipped receiver. A receiver within range of these signals measures the distance to a minimum of three satellites based on signal travel time and uses these distances to triangulate its position. With selective availability (an intentional degradation of the GPS signal) turned off since May 2000, accuracy is possible within 10 m of the actual position [18, 25].

[Figure 3-1: iPAQ device with GPS receiver]

The most severe limitation of GPS technology is that it must operate within line-of-sight of the orbiting satellites, as the weak signals cannot penetrate buildings or dense foliage [15, 18]. As a result, positions cannot be taken if the receiver is inside or close to buildings or other obstructions. Using a receiver in urban areas is further complicated by multipath error, which results from signals being reflected by buildings or other surfaces near the receiver. This can introduce additional error of up to half a meter [15].
Acquiring a good signal lock from a GPS receiver requires on the order of tens of seconds, depending on the availability and accuracy of the last known position. From a cold start (lacking memory of the last recorded position), obtaining a reliable position lock requires about 40 seconds in an unobstructed location. This can improve significantly to under 20 seconds if the receiver contains memory of the last position and that position is relatively close to the receiver's new location.

For this work, a PDA device equipped with a GPS receiver (see Figure 3-1) and custom software was used to acquire GPS data. The device could either be run continuously or scheduled to gather data at regular intervals (e.g. every 15 minutes). In addition, whenever the user powered on the device, the software would run in the background and attempt to get a position fix. Position error was noticeable in the data we gathered, with the position being skewed up to 125 m when the receiver was in an enclosed area. The effects of noise on the results of our system are noted in Section 5.2.

3.1.2 Personal Information Management (PIM) Data

The applications that come pre-installed on PDAs store valuable personal information management (PIM) data entered directly by the user. This data provides a snapshot of some of the activities that the user believes are important and can therefore help the recognition algorithms to infer activity. If the user has some appointment scheduled, it is likely that some action is taking place that coincides with the appointment. The algorithm can then attempt to use this data to make a more accurate prediction. An up-to-date appointment book is a strong indicator for some action - e.g. a lunch appointment or a meeting at work. The PIM data is easy to obtain and can be very precise. However, because it requires user entry, appointment text can be misspelled or abbreviated and the appointments themselves may simply be out of date. As a result, clustering this noisy data is a problem in its own right that must be overcome before incorporating it into a recognition scheme.

[Figure 3-2: A map of the area near the MIT campus with different colors representing the various kinds of areas used in this work.]

3.1.3 Maps and Landmarks

Longitude and latitude positions returned by the GPS receiver are only useful in combination with map data such as the residential and commercial zones of a city or the specific names and locations of restaurants and businesses. To obtain this information, which is not currently publicly available in convenient formats, a graphical user interface (GUI) was developed that allows for quick and detailed labeling of specific areas. This was used to identify specific areas of Cambridge, MA in the locale of the MIT campus. Specifically, we identified business, recreational, residential, shopping, and university areas as well as specific locations including banks, churches, hotels, libraries, museums, post offices, restaurants, schools, and supermarkets (see Figure 3-2 or Appendix B.3 for a complete list of places). In all, 103 areas and places were labeled for our experiments.
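The labeled areas are the basis for the distance features used later (Sections 3.3.1 and 5.2). As an illustration only, the following sketch shows one way a "minimum distance to each area type" measurement could be computed from a single GPS fix. The area names and coordinates are invented, and areas are approximated here by representative points rather than by the labeled regions actually drawn with the GUI tool, whose storage format is not described in the text.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    r = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical labeled areas: each area type maps to one or more representative points.
LABELED_AREAS = {
    "supermarket": [(42.3624, -71.0857)],
    "restaurant":  [(42.3644, -71.1031), (42.3612, -71.0876)],
    "university":  [(42.3592, -71.0935)],
}

def min_distances(lat, lon):
    """Minimum distance (in meters) from a GPS fix to each labeled area type."""
    return {area_type: min(haversine_m(lat, lon, alat, alon) for alat, alon in points)
            for area_type, points in LABELED_AREAS.items()}
```

In the experiments of Chapter 5, one such minimum distance per area type, combined with temporal and speed attributes, forms the feature vector for a GPS point.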
Very specific information, such as street-name data and the locations of a particular user's home or workplace, was avoided to prevent making the system too user-specific. Along the same lines, individual places of the same type are not distinguished. For example, restaurants and businesses do not carry a unique label and are categorized simply as "business" or "restaurant." For our experiments, the closest distance to each of the area types is calculated and used as a feature, thus ignoring any more specific detail about the place other than its position and type.

There are many geographic resources such as Geographic Information System (GIS) databases and mapping references that are available on-line (see Appendix B.2 for more detail). As mentioned above, finding a suitable source of city-level information for parks and monuments could be useful. While reverse-geocoding (translating a coordinate to an address) services could be applicable, they do not currently provide the detail (i.e. the category of the establishment at the address) that was required for this application.

3.1.4 Lexical and Knowledge Databases

Existing databases, such as encyclopedias and lexical references, contain a great deal of high-level knowledge that could be useful for activity recognition algorithms. For example, WordNet is a lexical database based on psycholinguistic principles. From a high level, it can be thought of as a dictionary ordered by synonym sets (synsets) or syntactic categorization [20]. A convenient API makes it possible to easily search the database for synsets as well as definitions and words associated with a particular term.

Lexical FreeNet (lexfn, http://www.lexfn.com) is a system that utilizes the synonym relations of WordNet but also organizes names and places based on relationships between words. Terms such as "Kmart" and "shopping" are linked due to their close proximity in multiple documents found in a corpus of news broadcasts [5]. This feature is less useful for our purposes than the more traditional semantic relationships, which lexfn can use to find connections between words. These relationships include synonymous, generalizes, specializes, comprises, and part of, allowing for the development of a metric to determine the relevance between pairs of words. A system might then be able to use this information as part of its training process to learn what features are more relevant for particular activities. For example, a park might be classified more closely as a recreation area than a grocery store, therefore leading to the conclusion that someone is more likely to go for a walk in a park. Unfortunately, the database itself is not perfect, sometimes returning semantic relationships that do not seem to make sense or returning a shorter series of relations for two words that do not seem to be related. This makes it hard to anticipate whether a system trained on the basis of these links would truly reflect reality. The choice of words used to describe the features also becomes much more important since it determines the quality of the relationships that will be returned by the database.

3.1.5 Other Sensor Data

Data from sensors such as barcode scanners and light meters can also provide contextual information about the user. A barcode scanner in use may indicate the user is grocery shopping, while a light meter can provide information about the weather or determine if a user is outdoors.
Other technologies such as Bluetooth can provide information about available resources and how far the user is from them. Although our system does not currently make use of sensor data other than GPS locations, it can be extended to accommodate data from these other sensors.

3.2 Identifying Activities

One goal of this work was to identify the sorts of activities that the system should and could recognize. "Typical" daily activities were solicited from research affiliates by asking them for a "list of high-level actions" that they normally do in a week. The twenty-six responses included activities such as "grocery shopping," "going to work," "going to class," "laundry," "watching television," "checking e-mail," and "reading" (Appendix B.1 contains a complete list of the responses). A subset of these was chosen for this study by first grouping the activities by hand, selecting those activities that occur outdoors (due to the limitations of GPS), and then picking those that were most appropriate for a university setting and that were frequently listed. The final set included the following ten activities: going to work, going home, going to class, grocery shopping, going for a walk, shopping (e.g. at a mall), going to visit someone, going to eat lunch, going to eat dinner, and running errands. Some popular responses that were omitted included gardening (few gardens in the area), phoning, and cooking (both generally indoor activities).

3.3 Features

The features that were considered fall into the four broad categories of "where" (geographic data), "when" (temporal data), "why" (PIM data), and "how" (travel method), with the final category - "what" - being what the system is supposed to determine.

3.3.1 Where - Geographic References

For our system, the locations served as reference points, which were compared to locations and areas identified within the neighborhood where the user was located. Using this data, several first-order features (those that can be derived directly from low-level data) can be observed. Points can be organized by their distance from each other or from the labeled areas and landmarks. Higher level features can take into account the starting and ending points of the trip as well as landmarks that are passed while the activity takes place. As our system primarily uses location data, several different sets of derived features of varying complexity were used. These are described in greater detail in Section 5.2.

3.3.2 When - Temporal References

First-order features include the time of day and the day of the week when the action occurred. Higher order features take into account past events and patterns of behavior. An example could be recording the last landmark passed by the user or the last time the user performed the action of concern (e.g. the last time a user went to the grocery store). Our system incorporates many first-order temporal features but does not consider some of the more complex second-order features. While these could be useful for our system, our experiments omit these features to measure the base effectiveness of the algorithms using a more general set of attributes. We suggest, however, that future work should investigate using these higher order temporal features. These features encode illustrative patterns of user behavior and would be useful for applications that are built on top of this system.
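A minimal sketch of deriving such first-order attributes from a timestamped observation follows. The time-of-day and speed buckets are the ones reported later in Section 5.2; the function names are ours, not those of the thesis software, and the average speed is assumed to be supplied by the GPS processing.

```python
from datetime import datetime

def time_of_day(t: datetime) -> str:
    """Time-of-day bucket, using the boundaries reported in Section 5.2."""
    h = t.hour
    if 6 <= h < 12:
        return "morning"
    if 12 <= h < 18:
        return "afternoon"
    if 18 <= h < 21:
        return "evening"
    return "night"

def speed_category(mph: float) -> str:
    """Average-speed bucket (Section 5.2): walking, running, biking, or motorized."""
    if mph < 4:
        return "walking"
    if mph < 8:
        return "running"
    if mph < 15:
        return "biking"
    return "motorized"

def first_order_attributes(t: datetime, avg_mph: float) -> dict:
    """First-order 'when' and 'how' attributes for a single observation."""
    return {"day": t.strftime("%A"), "time": time_of_day(t), "speed": speed_category(avg_mph)}

# Example: an observation on a weekday afternoon at walking pace.
print(first_order_attributes(datetime(2002, 5, 24, 14, 30), 3.1))
```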
3.3.3 Why - PIM Data

The personal information management (PIM) data can provide a clue as to why the user is engaging in a certain action. For example, is there a meeting in an hour, hence the trip to work? The "why" can provide valuable information, especially when a user deviates from more common patterns of behavior, such as going to school at night for a special seminar or concert. Due to the complications involved in clustering PDA data, our recognition system does not make use of these features, though it would be a reasonable extension to include in the near future.

3.3.4 How - Speed

How a person performs an activity can be a useful feature. Two different activities may be differentiated if the user drives for one but walks for another. Recognizing exercise habits relies on accurately recognizing these differences. The recognition system described here makes use of the average speed the user travels from some start location. More complex features are difficult to measure due to the inability of GPS to function in enclosed environments such as vehicles.

3.3.5 Maintaining Generality

Choosing a general set of features is a fairly difficult problem. By nature, humans are creatures of habit and routine. There might be several ways to go from A to B, but everyone has their preferred route that they are predisposed to choose. The location data that we record as a user moves from place to place therefore contains an intrinsic personal character. Despite this hurdle, attempts were made to keep the system from becoming too user-specific. Features that are dependent on user-specific data (e.g. home and work locations) were intentionally avoided. In addition, tracking the specific streets traversed on a route was avoided to prevent biasing the system to particular routes. Including these features, however, might make it possible to create a more powerful system that caters to a single user but can recognize some of the more complex actions. This is discussed further in Section 6.1.

Chapter 4
Recognizing Activities

4.1 Goals

The task of classifying an instance as one of a set of pre-defined categories is a classic problem in machine learning. As a result, there are a variety of well-known methods varying both in complexity and effectiveness. Selecting the best algorithmic approach required careful evaluation of several design goals.

* Training the system should be simple, without requiring complex or expert knowledge engineering.
* The data input into the system should be easy to obtain, requiring minimal input from the user.
* The algorithm should be able to take advantage of data mined from the PDA device and be extensible to accommodate data from other sources.
* The activities recognized should be useful for some application and should be predicted with some certainty estimate by the algorithm that can be used to create interfaces that degrade gracefully.
* The algorithm should have a reasonably high level of accuracy (>80%) and be relatively fast, to accommodate a real-time recognition system.

4.2 Approach

The goals stated above constrain the algorithms that could be considered. Rule-based systems, while powerful, often require complex knowledge engineering that is difficult and time consuming. In addition, they do not easily provide a means of relating uncertainty with a given decision [39]. Neural nets, while known for their strong performance, are slow to train, requiring many passes over the training data.
In addition, the network is a black box; the decision model is hidden within the network structure, making it difficult to understand how the system decided upon a result [40, 39]. Decision trees are also well known as competitive classifiers, and our results confirm that they perform well on our datasets. However, determining the structure of a decision tree can be quite complex, requiring heuristics to determine when to stop building the tree and how to prune unnecessary branches. As a result, if the heuristics are not chosen carefully, the system may overfit the training data [39].

The requirement to present an estimate of the certainty for a given decision suggests the use of a probabilistic classifier. Among the most well-known of these is the Bayesian network. In a Bayesian network, attributes that describe the problem being modeled are represented by nodes, while dependencies between these attributes are symbolized by arcs connecting the nodes. A complicated system will therefore be represented by a complex graph [35]. The tree augmented naive Bayesian (TAN) classifier is one variant of this type of classifier that places limits on the complexity of the network, thereby reducing the overhead of automatic structure determination from a training set and helping to prevent overfitting [23, 12]. Training the network requires a dataset that contains enough examples to provide a realistic view of the situation that is being modeled. These examples are then used to calculate probabilities that will later be referred to during the classification of new instances. Because these probabilities are pre-calculated, Bayesian methods, as a whole, classify new instances relatively quickly. The TAN algorithm is therefore relatively simple, yet still performs competitively against other more complex algorithms despite its limited dependency encodings [23]. In general, Bayesian methods provide a powerful and (depending on the network structure) computationally efficient and intuitive method of modeling the uncertainty in complex situations [40].

4.2.1 Bayesian Networks

The set of classification techniques based on Bayes rule are known as Bayesian networks (also called belief networks [27]). Given n attributes, A_1, A_2, ..., A_n, with values a_1, a_2, ..., a_n, Bayes rule states that we predict the probability that these attributes represent some class value, c, in C as follows [44]:

$$P(C = c \mid a_1, a_2, \ldots, a_n) = \frac{P(a_1, a_2, \ldots, a_n \mid C = c)\,P(C = c)}{P(a_1, a_2, \ldots, a_n)} \qquad (4.1)$$

If the network is provided with a dataset that contains a full range of possible examples, the probabilities for each value of c can thus be calculated and the highest selected as the most probable class for the represented instance.

Formally, a Bayesian network can be described as a pair B = (G, Θ), with G representing a directed acyclic graph that encodes a joint probability distribution over a set of random variables, U = {X_1, ..., X_n}. Nodes in G represent the attributes, while arcs represent dependencies between the attributes. Each node can be considered independent of its non-descendants, given its parents in G. The set Θ contains the quantitative parameters that define the probability distribution of the network. A parameter θ_{x_i | Π_{x_i}} = P_B(x_i | Π_{x_i}) is defined for each value x_i of X_i and each configuration Π_{x_i} of Π_{X_i}, the set of parents of X_i. The joint probability distribution over U is thus given by

$$P_B(X_1, \ldots, X_n) = \prod_{i=1}^{n} P_B(X_i \mid \Pi_{X_i}) = \prod_{i=1}^{n} \theta_{X_i \mid \Pi_{X_i}} \qquad (4.2)$$
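As a concrete reading of equation (4.2), the short sketch below computes the joint probability of a complete assignment as the product of each node's parameter given its parents. The two-node network and its probabilities are invented for illustration and are not part of the thesis system.

```python
# A minimal illustration of equation (4.2): the joint probability of a full
# assignment is the product of each node's parameter given its parent values.
# The network (Rain -> WetGrass) and the numbers below are hypothetical.

PARENTS = {"Rain": (), "WetGrass": ("Rain",)}

# THETA[node][(parent values, node value)] = P(node value | parent values)
THETA = {
    "Rain":     {((), "yes"): 0.2, ((), "no"): 0.8},
    "WetGrass": {(("yes",), "yes"): 0.9, (("yes",), "no"): 0.1,
                 (("no",),  "yes"): 0.1, (("no",),  "no"): 0.9},
}

def joint_probability(assignment: dict) -> float:
    """P_B(x_1, ..., x_n) = product over nodes of theta_{x_i | parents(x_i)}."""
    p = 1.0
    for node, parents in PARENTS.items():
        parent_vals = tuple(assignment[parent] for parent in parents)
        p *= THETA[node][(parent_vals, assignment[node])]
    return p

print(joint_probability({"Rain": "yes", "WetGrass": "yes"}))  # 0.2 * 0.9 = 0.18
```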
4.2.2 Naive Bayesian Classifiers

Naive Bayes is perhaps the simplest form of a Bayes network. The model assumes that the attributes used for classification are all conditionally independent given the class - a rarely true and "naive" assumption. As a consequence of this assumption, a naive Bayes network can be represented as a tree with the class node as the root and each of the attributes as a leaf (see Figure 4-1).

[Figure 4-1: A naive Bayes network]

Despite its simplicity and the strong independence assumption, naive Bayes has been shown to be competitive with other more complex classifiers. The success of these models may be due to low variance in the model in combination with high bias resulting from the independence assumption [22]. Effectively, the low variance cancels out the effect of the bias, making accurate classification possible. The independence assumption simplifies the calculation in (4.1) [44]:

$$P(a_1, a_2, \ldots, a_n \mid c) = P(a_1 \mid c)\,P(a_2 \mid c) \cdots P(a_n \mid c) \qquad (4.3)$$

Therefore, a naive Bayesian network is trained by simply computing the probability of each attribute given the class using the instances contained in the training set. Alternatively, the priors can be explicitly specified if no training set is available or if enough detail is known about the system.

4.3 TAN Classifiers

Tree augmented naive Bayes (TAN) classifiers relax the strong independence assumption inherent in naive Bayesian networks by allowing at most one additional arc between attributes. Compared with other, more complex forms of Bayesian networks, determining the structure of these graphs is not an intractable problem since the total number of arcs in the graph is limited, thereby reducing the possible search space. TAN networks are therefore a good compromise between the simplicity of naive Bayesian networks and a more realistic representation of the situation being modeled. In addition, they perform as well as, if not better than, naive Bayesian networks and other more complex classifiers on standard test sets [23].

[Figure 4-2: A TAN network for the ΔDist(prev) dataset]

4.3.1 Structure Determination

The structure of a TAN network is the same as that of a naive Bayesian network save for the possibility of at most one additional incoming arc for each attribute node (see Figure 4-2). The graph reflects some of the dependencies found in the training data. For example, the connection of "Business" with "Restaurant" indicates that the influence of the proximity of a business is closely tied to that of restaurant areas for the actions contained in this set of data. Determining the structure of a TAN graph reduces to determining which attributes influence each other most strongly and then connecting the nodes that represent those attributes with an arc. When using purely discrete attributes, Friedman et al. [23] utilize conditional mutual information as a metric for measuring this influence. This function is as follows:

$$I_P(A_i; A_j \mid C) = \sum_{a_i, a_j, c} P(a_i, a_j, c) \log \frac{P(a_i, a_j \mid c)}{P(a_i \mid c)\,P(a_j \mid c)} \qquad (4.4)$$
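To make this step concrete, the sketch below estimates the pairwise weights of equation (4.4) from empirical frequencies in a discrete training set and then keeps the arcs of a maximum weight spanning tree over those weights (a simple Prim's algorithm), which is the procedure described formally in the paragraph that follows. It is an illustrative reimplementation under our own naming, not the thesis software, and it omits the smoothing of Section 4.3.2.

```python
import math
from collections import Counter
from itertools import combinations

def conditional_mutual_information(rows, i, j, c_idx):
    """Empirical I_P(A_i; A_j | C) for columns i and j of a discrete dataset (eq. 4.4).

    rows: list of records (sequences of discrete values); c_idx: class column."""
    n = len(rows)
    p_ijc = Counter((r[i], r[j], r[c_idx]) for r in rows)
    p_ic = Counter((r[i], r[c_idx]) for r in rows)
    p_jc = Counter((r[j], r[c_idx]) for r in rows)
    p_c = Counter(r[c_idx] for r in rows)
    info = 0.0
    for (ai, aj, c), count in p_ijc.items():
        # P(ai,aj|c) / (P(ai|c) P(aj|c)) rewritten with joint counts.
        info += (count / n) * math.log(
            (count / n) * (p_c[c] / n) / ((p_ic[(ai, c)] / n) * (p_jc[(aj, c)] / n)))
    return info

def tan_tree(rows, attr_indices, c_idx):
    """Arcs of a maximum spanning tree over the pairwise I_P weights.

    In a full TAN the tree would then be rooted and every attribute would
    additionally receive the class node as a parent."""
    weights = {(i, j): conditional_mutual_information(rows, i, j, c_idx)
               for i, j in combinations(attr_indices, 2)}
    in_tree, arcs = {attr_indices[0]}, []
    while len(in_tree) < len(attr_indices):
        best = max(((i, j) for (i, j) in weights
                    if (i in in_tree) != (j in in_tree)),
                   key=lambda e: weights[e])
        arcs.append(best)
        in_tree.update(best)
    return arcs
```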
This amounts to calculating the maximum spanning tree in a complete undirected graph with the weight on the arc connecting nodes i and j equal to I,(Ai, A3 IC). Determining the structure is polynomial, with a time complexity of O(n2N), where N is the number of training instances [23]. 4.3.2 Training Training a TAN network amounts to calculating the prior probabilities of each attribute, given the class based on the occurrences of the attributes in a training set. The only difference between a TAN network and a naive Bayesian network is that in TAN networks, connected attributes must be accounted for in the prior-probability calculations. To provide for missing attribute values in the training set, a smoothing function is applied that effectively assumes that there is at least one of each possible value for each attribute in the training set. Referred to as the LaPlace estimator [40, 30], Cerquides [9] derives the functions appropriate for building a TAN network from a multinomial sampling approach, where CountD(Xi) is the number of instances in the training set that contain where Xi = xi and #Val(X) is the number of values of attribute or class X. CountD(Xi, Oi nx= fxi) + A gtaes(P) CountD(Hx)+ A#Val(Xi)SC(Xi) States(P*) 40 (4.5) acaj ca) + #Val(C)#Va(Aj)#Va(Aj) CountD(c, aj) + #Val(C)#VaI(A,) -ContD CountD(ai, c) + Oaic - #VaI(C)#Val(A) CountD(C) 0C~ltN + ( (4.7) COflD(C) CountD(C) (4.6) +#Val(C) #+ lC #Val(C) (4.8) This smoothing function insures that no instances arise that are classified with an absolute probability of 0 or 1. The last three equations are special cases of (4.5); (4.6) refers to the formula used when an attribute has another attribute as its parent in addition to the class, (4.7) is used when only the class node is the parent, and (4.8) is used for the probability of the occurrence of the class itself. From (4.1) we now have the necessary numbers to compute the probability for a given set of attributes. Missing attribute values are simply omitted from the calculation. This does not pose a problem since the value is omitted for all class values. This makes the algorithm more robust and resilient to noisy datasets. 4.3.3 TAN Multinets Instead of creating a single TAN network structure for all classes, we can generate a different classifier for each class. This can produce better results if the relationships between the attributes and classes vary widely [23]. There is no added complexity since each training instance is only used a single time to build the model for the class it belongs to. As a result, the complexity for building a multinet with a training set of size N remains O(nr2 N). The datasets we generated were run on both a single TAN and a multinet approach since the different activities being recognized should be influenced quite differently by the area-type attributes. Performance should therefore show an improvement using the multinet approach [14]. 41 42 Chapter 5 Results and Analysis 5.1 Data Acquisition Data was gathered using a consumer-grade GPS receiver attached to a PDA device. GPS locations were recorded to a file as a user walked around the vicinity of the MIT campus. After completing an action (i.e. arriving at work), the user would log the current time and the action that was just completed on paper. Although software was developed to allow the device to record data both continuously and at periodic intervals, only the continuous mode was used to gather the maximum amount of data over a short period of time. 
In this fashion, a week's worth of typical activity was gathered by a single user walking around with the outfitted PDA. This training set of 3307 instances was then transferred to a desktop computer and converted into attribute vectors using several methods described below. The trained system was also used to test data collected from other users to determine the effectiveness of the system in dealing with other users' routes.

A long-term goal of this project is to be able to create a stand-alone system that would require an initial training phase. To facilitate this, an experience sampling tool was developed to electronically log the actions of a user throughout the day at forty-five minute intervals, independent of GPS location data. We anticipate that the data gathered by this software could be helpful in training the system to a user's everyday habits when combined with location data. In addition to the current activity, the program asks the user to note his present location to serve as a rough indicator of position. A program to solicit this data has many advantages over our previous paper system since we can control the frequency at which we request data from the user. Appendix A.2.4 contains additional details.

5.2 Features

All feature vectors incorporated the day of the week, the average speed traveled from what was speculated to be the starting position, and the time of day as discretized attributes. Time was categorized as morning (6 am - 12 pm), afternoon (12 pm - 6 pm), evening (6 pm - 9 pm), and night (9 pm - 6 am). When using discrete versions of the data, the average speed was discretized into walking speed (less than 4 mph), running speed (less than 8 mph), biking (less than 15 mph), and motorized (anything greater than 15 mph). As the features are determined from the raw GPS data offline, additional features can be added later and tested on the various classifiers.

5.2.1 Single Point Distance

The first experiment evaluated each valid point recorded by the receiver by calculating the minimum distance to each of the 18 area types. This created a vector with 21 attributes. As each point is examined individually, no knowledge of the starting point or path is considered.

5.2.2 Changing Distances

The next set of experiments attempted to model how the distances change as a user approaches his destination. The training data was first separated into sets, each representing one trip from start to destination point (e.g. going to work begins at home and ends at the workplace). The starting point that is referenced can be chosen as the first location that is recorded after a long period when the GPS is unable to receive a position. This indicates the user was inside or out of range of the positioning satellites. Because we were explicitly training the system, the start position was noted by the user when they recorded the action they were performing.

The first experiment (ΔDist(start)) took the starting position as a reference point and then calculated the minimum distances from this point to each of the 10 area types. For every other point in that trip, another set of minimum distances was taken, and the difference between this set and the reference set was calculated. For the discretized tests, a positive difference indicated that the subject was getting closer to an area, while a negative difference denoted the user was getting farther away.
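The sketch below illustrates this feature construction. It assumes a min_distances() helper like the one sketched in Chapter 3 that returns the minimum distance to each area type for a point; the reference argument selects between the ΔDist(start) and ΔDist(prev) variants, and the treatment of an exactly zero change as "unchanged" is our own addition to the positive/negative coding described above.

```python
def delta_features(trip_points, min_distances, reference="prev"):
    """For each point of a trip, the change in minimum distance to every area type,
    measured against the trip's start point or the previously recorded point."""
    dists = [min_distances(lat, lon) for lat, lon in trip_points]
    vectors = []
    for k in range(1, len(dists)):
        ref = dists[0] if reference == "start" else dists[k - 1]
        # Positive value: the user has moved closer to that area type.
        vectors.append({area: ref[area] - dists[k][area] for area in dists[k]})
    return vectors

def discretize(delta_m):
    """Sign of the change: 'closer' for positive differences, 'farther' for negative."""
    return "closer" if delta_m > 0 else "farther" if delta_m < 0 else "unchanged"
```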
This metric is not entirely correct; for example, if the closest park to the start is not the same park being referenced by the new location, the user may appear to be moving closer to a park area when this is not the case. As an alternate way of approaching this problem, we took the same difference but calculated the change in distance based on the most recently recorded point instead of the start point (ΔDist(prev)). Unfortunately, this method also has its oversights in that, for two points taken very close in time, the change in distance may not vary by much and will be more sensitive to position error.

5.2.3 Trajectory

To take into consideration the path someone would take when traveling from source to destination, we applied a similar difference in distance for each place type but this time only accounted for those locations that lie ahead of the user's position. In other words, we look at the trajectory the user is taking in reference to some point about 30 seconds prior and then prune out any areas that are behind this point (Traj(w/pruning)). To do this, areas that lie completely behind the line through the reference point perpendicular to the ray from the reference point to the current position are elided. The subset of areas considered was continuously pruned with each data point such that any area that ever lay behind the user was not considered. Unfortunately, noise in the data could eliminate areas that would be important for consideration. As a result, a second trial was run that did not consider any past events (Traj(no pruning)) - it considers only those areas in front of the user's trajectory but does not attempt to continuously refine this set.

5.3 Classification Results and Analysis

Experiments were run using the TAN algorithm (single and multinet), a naive Bayes implementation that utilized the multinomial distribution, and several other competitive classifiers found in the Weka Machine Learning Algorithms Toolkit (see [40]). These include a naive Bayes implementation [17, 28], the decision table classifier [31], an instance-based learning (nearest-neighbor) classifier [21, 2], and the C4.5 classifier [36]. Ten 10-fold cross-validations were performed using the Weka Toolkit to ensure that the training set was partitioned equivalently for each algorithm test. Table 5.1 reflects the mean and variance for these ten trials.

                   1-Point        ΔDist(start)   ΔDist(prev)    Traj(w/pruning)  Traj(no pruning)
  Naive Bayes      71.73 ± .026   75.79 ± .036   84.16 ± .032   68.85 ± .034     74.18 ± .024
  TAN              80.27 ± .064   85.27 ± .018   92.60 ± .029   88.64 ± .026     85.76 ± .032
  TAN Multinet     82.63 ± .085   86.79 ± .027   94.43 ± .020   89.88 ± .049     87.32 ± .026
  Naive Bayes*     70.47 ± .020   75.14 ± .050   83.13 ± .026   68.13 ± .007     73.41 ± .017
  IBL*             86.49 ± .037   88.83 ± .038   97.89 ± .041   92.91 ± .034     89.54 ± .032
  C4.5*            85.26 ± .037   87.54 ± .055   96.19 ± .017   91.97 ± .060     89.32 ± .023
  Decision Table*  83.14 ± .208   82.71 ± .021   93.22 ± .085   86.14 ± .099     82.71 ± .102

  Table 5.1: Summary of classifier results per feature set (mean ± variance).

5.3.1 Algorithm Results

The distance-change metric produces the highest classification results for all of the classifiers, with a predicted accuracy of 92% and 94% for the TAN and multinet approaches, respectively. For each of the tests, the TAN algorithm performs between three and six percent behind the top classifier - instance-based learning (IBL) - with the TAN multinet approach consistently performing one to two percent better than the single TAN implementation.
This results from the attributes having varying relationships, depending on the class of concern. The TAN classifier outperforms both naive Bayes approaches by eight to ten percent for all experiments.

The single-point distance metric performs surprisingly well, suggesting that the distances from the different area types alone carry a reasonable amount of relevant information. Although locations are not uniquely labeled, in that each instance represents only a discretized distance from each of the area types, the classifiers are able to identify the action intended by the user. It is important to note that the data reflects the habits of one particular person, which may explain why this simple metric works so well. The tendency of people to follow the same route from source to destination is probably responsible for these results.

The instance-based learning (IBL) classifier relies on an n-dimensional distance metric to classify new instances. The distance is calculated between the unknown instance and each of the training instances, predicting the class of the closest instance as the class of the unknown [40]. The high performance of the IBL classifier over all experiments indicates that the data clusters well over the vector space. The drawback of using the IBL approach in a real-time recognition algorithm arises when the training set becomes larger, as each new instance must be compared to each instance of the training set during the classification process. A Bayesian or decision-tree approach, such as the C4.5 classifier, may be preferable in that these algorithms invest more computation in the training process to allow for faster classification of new instances [40].

The C4.5 classifier also performs well, especially when continuous values are used for the features. This classifier belongs to the decision tree family of classifiers, which creates a tree down which a path is traversed during the classification process. Generating this tree requires calculating the optimum splits to cluster the training instances most effectively [40]. The inherent downsides of the algorithm result from the complexity involved in generating the decision tree. Heuristics are necessary to direct the process to avoid overfitting the data. Dealing with missing values can also be problematic, resulting in some guesswork about the direction to follow down the decision tree [40].

The constraint on the number of dependencies in a TAN network places an explicit limitation on the network structure. This helps to prevent overfitting the data, provided the training examples are well distributed [12]. In addition, missing values are simply omitted from the classification calculations - a simple and deterministic way of dealing with noisy data [40]. Although our version of the TAN classifier utilizes only discrete attributes, a version that handles continuous attributes exists and has been shown to outperform the discrete version [24]. To gauge the effectiveness of using continuous attributes, the Weka classifiers were tested on continuous versions of the datasets.

[Figure 5-1: A graph depicting the predicted classifier accuracy of each classifier for each dataset (accuracy in percent versus feature set; the plotted classifiers include the continuous-attribute variants of the Weka classifiers).]
As seen in Figure 5-1, the C4.5 and decision table classifiers perform very well on these tasks. This is most likely because the partitions created when forming the structure of the decision table or tree are more easily chosen over a range of values than over two or three discrete values. For the same reason, the continuous data is harder for the IBL classifier, because the instances do not cluster as tightly in the continuous case.

As would be expected, misclassified activities are often those that occur in the same general areas and are performed at around the same time of day (see Table 5.2). As an example, an area with businesses commonly contains restaurants, so someone could easily be going there to run errands or to eat lunch, both during the same time frame.

   a    b    c    d    e    f    g    h    i    j    <- classified as
 154    1    4    3    1    8    0    0    1    9  |  a = Errands
   0  134    3    0    5    0    3    0    3    3  |  b = Class
   2   23  529    9    2    9    9    0    6   22  |  c = Work
   0    1    8   96    9    0    0    6    2    1  |  d = Grocery Shopping
   1    5   15    7  429   15    2    5    5    7  |  e = Home
   2    2    0    0    6  414    1    0    3    3  |  f = Shopping
   0    0    5    0    0   15  371    0   23   17  |  g = Walk
   1    0    0    0    8    0    0  145    3    0  |  h = Dinner
   1    5    6    0   17    7   21    3  177    8  |  i = Visit
  20   10   45    9    0    1   27    0    7  313  |  j = Lunch

Table 5.2: Confusion matrix for the ΔDist(prev) feature set.

Classifying such an action correctly can hinge on subtle differences, especially if the training data simply contains more examples of one activity than another. However, the system can distinguish between the two activities if there is some variation. For example, the system can tell the difference between two actions that use identical paths if another attribute, such as the time of day the action occurs, differs. This allows the system to distinguish between going to class and going to work even though these two actions occur in very similar locales. Thus subtle differences can create different contexts that the system is able to recognize and handle adequately.

5.3.2 Extensions

Additional features could help to improve the performance of the classifiers. Social interaction plays a significant role in a person's activities. Many of the replies to our survey listed items such as talking with a friend and hanging out with someone. Determining who someone is with, or whether they are alone, can provide evidence for or against the possibility of social activities.

More temporal information may also provide helpful cues to improve the predicted recognition rate. For example, knowing the last time the user went to the grocery store and the relative frequency of this event can help predict the next time the user will go. If the system kept a record of the frequency of these events, a feature, perhaps the number of days since the action was last witnessed, could be used (a small sketch of such a feature appears at the end of this section). Because of the robustness of the TAN algorithm in dealing with missing values, even if the system does not know this information in some circumstances, it should not pose much of a problem.

Data from the PDA could also be used to create additional features. Assuming that a separate clustering algorithm was run on the text of the appointments, the presence of an appointment of some type scheduled for the near future, relative to when the action was occurring, could be incorporated as another feature.

The strong performance of the continuous classifiers suggests that attention should be given to a continuous version of the TAN algorithm. Although it is more involved and complex than the discrete version, most real systems cannot be adequately modeled as sets of completely nominal attributes. A continuous TAN implementation would therefore be useful in gauging its effectiveness on a more realistic system.
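As a concrete version of the days-since-last-occurrence feature suggested above, the bookkeeping might look like the sketch below. The class and method names are hypothetical, and a real feature extractor would more likely leave the value missing (rather than return a sentinel) when an activity has never been observed, since the classifier already copes with missing values.

    import java.util.HashMap;
    import java.util.Map;

    /**
     * Sketch of the "days since the action was last witnessed" feature
     * proposed in Section 5.3.2. All names here are illustrative only.
     */
    public class ActivityRecencyTracker {

        // Most recent time (milliseconds since the epoch) each activity was observed.
        private final Map<String, Long> lastSeen = new HashMap<>();

        /** Record that an activity (e.g. "Grocery Shopping") was just observed. */
        public void recordObservation(String activity, long timestampMillis) {
            lastSeen.put(activity, timestampMillis);
        }

        /**
         * Days elapsed since the activity was last observed, or -1 as a
         * sentinel if it has never been seen (in practice the attribute
         * would simply be left missing in that case).
         */
        public double daysSince(String activity, long nowMillis) {
            Long last = lastSeen.get(activity);
            if (last == null) {
                return -1.0;
            }
            return (nowMillis - last) / (1000.0 * 60 * 60 * 24);
        }
    }

Like the distance features, this value would be discretized before being handed to the discrete TAN classifier.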
Chapter 6

Discussion

6.1 Feasibility of an Activity Recognition System

As a first pass at designing a reliable activity recognition system, our results are promising. The performance of the TAN classifier indicates that a system with relatively high accuracy is feasible. In addition, classification is fast for the TAN algorithm (on the same order as naive Bayes), making a real-time recognition system possible.

Originally, a general system was envisioned that could be trained on one set of data acquired from multiple people and then used by anyone. Using two sets of data collected by two different people, the system was trained with one and tested with the other. As might be expected, the performance of the system varies, predicting the activities correctly when the same paths are used by both users but categorizing them incorrectly when similar paths correspond to two different actions by the two users. This indicates that, despite explicitly excluding route-specific data, once trained on a particular user's normal routes and habits the system does not perform well when used by other people. For example, a path that I normally take as part of a leisurely walk can also be used by someone else to get home. While there are differences in parts of the path that are taken, the data is similar enough that the system will often classify the points incorrectly (see Appendix C.1). As a result, the system works well, but only if trained and used by people who have similar habits, perhaps students who live in the same residential area and have similar patterns of behavior (e.g. attending classes, going to the same locations, and eating at the same places). As the habits of a user are quite regular, it might be interesting to have the system attempt to detect what it believes to be deviations from normal behavior and then query the user, allowing the system to continuously adapt to changing situations.

In some situations, a general system may be preferable to one that is tailored for a single person or even a few particular people. In this case, training would ideally only have to be done once for the system to work for other people. A more general method of partitioning the data into attributes may be able to accomplish this, or perhaps making use of some outside data source. As suggested before, linguistic databases are a source of very rich, high-level knowledge that could be used to improve this system (see Appendix C.2). A mixing model could be used to combine the results from both a location-based TAN system and another system that taps into the linguistic database. This kind of model would balance the knowledge and experience that are the strong points of the two different systems, thereby creating a more general and robust product. Thus, two directions are possible for later study: investigating a less user-specific system, and investigating how powerful a single-user system can become by incorporating more user-specific features.

6.2 Extensions and Improvements

6.2.1 Other Activities

The actions that are recognized were limited because the main data source was outdoor position data. However, the relative success of recognizing these simpler actions easily leads to more complex ideas. Most of the activities that are recognized are of the form going to <place>. Recognizing when someone is leaving a place, and the transition to going to another place, would be an interesting next step to pursue. In this case, the two events overlap, occurring simultaneously at some point in time. The probabilistic nature of a Bayesian classifier could be put to use by noting which two events are currently the most likely and transitioning from the "leaving" to the "going" action based on their respective probabilities.
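A minimal sketch of that idea follows, assuming the classifier can expose a posterior probability per activity; the class name, the margin, and the update interface are illustrative assumptions rather than part of the implemented system.

    import java.util.Map;

    /**
     * Sketch of handing over from a "leaving <place>" interpretation to a
     * "going to <place>" one using the classifier's posterior probabilities,
     * as suggested in Section 6.2.1. Names and the margin are illustrative.
     */
    public class ActivityTransitionTracker {

        /** How far ahead the new hypothesis must be before we switch (arbitrary). */
        private static final double MARGIN = 0.2;

        private String currentActivity;

        /**
         * @param posteriors activity label mapped to its posterior probability
         *                   for the most recent GPS reading
         * @return the activity the tracker currently believes is happening
         */
        public String update(Map<String, Double> posteriors) {
            String best = null;
            double bestP = Double.NEGATIVE_INFINITY;
            for (Map.Entry<String, Double> e : posteriors.entrySet()) {
                if (e.getValue() > bestP) {
                    best = e.getKey();
                    bestP = e.getValue();
                }
            }
            double currentP = (currentActivity == null)
                    ? 0.0
                    : posteriors.getOrDefault(currentActivity, 0.0);
            // Switch from the old activity (e.g. leaving work) to the new one
            // (e.g. going home) only once the new hypothesis is clearly ahead.
            if (best != null && bestP > currentP + MARGIN) {
                currentActivity = best;
            }
            return currentActivity;
        }
    }

Smoothing the posteriors over several consecutive readings before applying the margin would make the hand-over less sensitive to single noisy GPS points.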
Activities where the passing of time is an indicator, such as waiting or running into a friend, would also be interesting to investigate. In contrast to changing distances, these activities involve no observable movement, which is in itself an indicator. Likewise, if GPS data is not available, it would be interesting to see whether the algorithm could be adjusted to predict activities based on the last known position and whatever other attributes can be observed (e.g. having lunch is more likely around noon during a shopping trip to the mall).

6.2.2 Low-Level Data

A more robust positioning system would be one of the more useful low-level extensions to the recognition system. The widespread use of cellular base stations could potentially provide one. With cellular signals strong enough to function well close to and inside buildings, a unified location system may soon be possible [8]. As previously discussed in Section 3.1.1, GPS is limited by the need to be within line-of-sight of the orbiting satellites. A positioning system that is functional both inside and outside buildings would allow for a more complete description of a user's activities. Situations where GPS is currently non-functional include covered vehicles (unless the receiver is under a windshield), subway trains and stations, and locations inside and close to buildings. The services that people interact with are found at exactly these locations, and since the most immediate benefits one could imagine from an aware device involve automating these services, indoor data is necessary.

6.2.3 High-Level Knowledge

Utilizing other sources of knowledge can increase the range of activities that are recognized. Reverse geocoding would allow the system to take a position and find a street address from the data. If another look-up procedure could then classify the address (e.g. as a residence, business, or restaurant), a more precise system could be modeled that utilizes this information. This would be particularly helpful in ambiguous cases, such as when a user performs a series of activities that makes it hard to distinguish the end of one and the start of the next. If the system could see the type of location where the user pauses, it may be able to notice particular patterns of behavior, making it possible to recognize higher-level activities.

6.2.4 Applications

There are several applications that could be developed on top of the activity recognition system as it exists now. Once implemented to work in real time on a PDA, the system could be used to provide user-specified information at appropriate times. Going beyond location-aware tour guides, users could arrange for their shopping list to be displayed when the system detects they are going to the store, or be reminded to do something on their way home from work. Games that change based on the types of activities being performed could also be designed, and even combined with routines that promote exercise.
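As one concrete possibility, a reminder of this kind could be a thin layer over the recognizer's output. The interface, class names, and confidence threshold below are hypothetical illustrations, not part of the thesis software.

    /**
     * Sketch of an application-level trigger built on the recognizer's
     * output, in the spirit of Section 6.2.4. Interfaces and the threshold
     * are illustrative assumptions.
     */
    public class ShoppingListTrigger {

        /** Whatever UI component would surface the list on the PDA. */
        public interface ReminderDisplay {
            void show(String message);
        }

        private static final double CONFIDENCE_THRESHOLD = 0.8;  // arbitrary choice
        private final ReminderDisplay display;
        private boolean alreadyShown = false;

        public ShoppingListTrigger(ReminderDisplay display) {
            this.display = display;
        }

        /** Called with the recognizer's current best guess and its probability. */
        public void onActivity(String activity, double probability) {
            if (!alreadyShown
                    && "Grocery Shopping".equals(activity)
                    && probability >= CONFIDENCE_THRESHOLD) {
                display.show("You appear to be heading to the store - here is your shopping list.");
                alreadyShown = true;  // avoid nagging on every GPS reading
            }
        }
    }

A similar trigger keyed on the "going home" activity could surface end-of-day reminders instead of a shopping list.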
Chapter 7

Conclusion

Smaller and more powerful devices are becoming more prevalent, creating a framework for developing smart applications. The research described here discusses some of the requirements for activity recognition systems and presents an attempt at building a system that satisfies these conditions. Location data and map data are some of the resources used for this project, though there are many sources of higher-level knowledge available that ought to be used if a viable method of integrating this data is found. The TAN algorithm was chosen for its attractive combination of simplicity, extensibility, probabilistic properties, and competitive classification accuracy. Findings indicate that the TAN multinet classifier performs relatively well, though it is slightly outperformed in accuracy by some more complex but sometimes slower algorithms.

Though efforts were made to prevent user-specific knowledge from being explicitly encoded into the system, findings indicate that the combination of the types of activities we chose to recognize and the choice of location data resulted in a system that contains personal biases. Personal habits and preferences for specific routes from place to place make the system perform well for people with similar habits but do not make for a reliable general-use system. The results show promise for building accurate activity recognition systems that will allow for a new generation of context-aware applications and devices.

Appendix A

System Overview

The main components of the system can be broken down into a component used to gather GPS data and another to process the data into a form suitable for analysis using a classification algorithm.

A.1 Hardware

The hardware component of the system consists of a Compaq iPAQ 3650 PocketPC running Windows CE. The unit is powered by a 206 MHz StrongARM processor with 32 MB of RAM. The device is particularly attractive since it can be outfitted with PCMCIA and CompactFlash devices when used in conjunction with the appropriate expansion sleeve. A dual-slot PC card sleeve allows up to two such devices to be used simultaneously on the iPAQ. For our research, we equipped the iPAQ with a TeleType CompactFlash GPS receiver and external antenna.

A.2 Software

When possible, all applications were written in Java. The major exception occurred when dealing directly with the iPAQ device. For such occasions, Java Native Interface (JNI) wrappers were written to allow access to C library calls from the Java side. The main software components of the system are as follows:

- A library of device-specific functions (package ccalls), such as controlling the volume and scheduling programs to run at specific times. The functions were written as part of a dynamic link library (DLL) in C, with JNI wrappers to provide the functionality on the Java side.

- A Java recorder program (package gpsrecorder) that receives GPS data gathered by a C thread (port.c) interacting with the COM port used by the GPS receiver. The program is scheduled to run at regular intervals (PeriodicPositionRecorder.java) throughout the day using the functions in the PocketPC's notify library.

- A Java tracking program (package gpstracker) to display recorded GPS data and to provide an interface for labeling areas of interest (LabeledArea.java), such as parks, shopping centers, and restaurants.
" The algorithmic part of the software (package tan) that consists of a parsing unit to convert raw GPS data into an attribute vector and a system to train and classify new instances using naive Bayes and TAN classifying schemes. " An experience sampling application (package actionrecorder) that runs on the iPAQ to log actions of the user independent of GPS data. A.2.1 GPSRecorder The GPS receiver continuously creates and writes strings to port COM 4 of the iPAQ. The C thread reads these strings and then uses JNI to call a Java procedure to store the string in a buffer. Strings are read from the buffer every second and then processed. Four types of GPS strings are output by the TeleType GPS receiver, but only two of them, GGA and RMC strings, contain longitude/latitude data. Furthermore, even these strings must be position locked (have enough satellites in view) to contain accurate/useable information. We therefore reject any non-GGA/RMC strings as well as any non-locked strings. Data can be read and written to a file continuously 58 using the RecorderInterface class or at specified intervals using the PeriodicPositionRecorder class. A.2.2 GPSTracker The tracker program was written to visualize the data recorded from the device. Functionality was also provided to label the various areas of interest using a simple GUI. To do this, a LabeledArea class was written to represent each of these areas. Of particular note is that each LabeledArea can calculate the distance from a specified latitude/longitude point to the area. While we could use a straightforward Cartesian distance calculation, the differing scales between longitude and latitude must be accounted for as well as the fact that a degree of longitude is shorter as one approaches the poles. In addition, because of the oblate-spheroid shape of the earth, this calculation is not a reasonable approximation for distances greater than 20 km. Instead, we make a reasonable assumption that the earth is spherical and use Haversine's formula for all our distance calculations: 1 Ion 1 Alon = 1on2 - Alat = lat 2 - lat1 Alat a = sin2 2 d Rearth * C . + cos(lati) * cos(lat 2 ) * sin2 2 c = 2 * arcsin(min(1, Va)) A.2.3 = ___ 2 TAN The TAN classifier described in [23] was implemented in Java for discrete attributes. Both a stand-alone version as well as an equivalent implementation under the Weka [40] framework were implemented. One difference that must be noted is that the smoothing function used is that described by Cerquides [9]. This function was pre'http://www.census.gov/cgi-bin/geo/gpsfaq?Q5.1 59 parents 0.028775264 has nurs age 0.027556624 spec_prescrip 0.0061469073 0.02144219 tear_prod-rate health 0.0034497764 housing 0.002351112 0.005265595 children atigmatisi 0.003346517 social 0.0020270755 finance 0.0018864607 formi Figure A-1: The maximum spanning tre(,s with associated IPx values for the contacts (left) and nursery (right) datasets. ferred over the one described in the original paper because it always classifies an instance, while the original sometimes would assign a probability of zero to an instance for all classes. In addition to the standard TAN algorithm, a multinet variant was also implemented as well as a version of the naive Bayes classifier that made use of the multinomial distribution. These served as useful baselines for determining the effectiveness of the algorithm. Implementation Details Generating a TAN network from training data is not a very complex process. 
A.2.3 TAN

The TAN classifier described in [23] was implemented in Java for discrete attributes. Both a stand-alone version and an equivalent implementation under the Weka [40] framework were written. One difference that must be noted is that the smoothing function used is the one described by Cerquides [9]. This function was preferred over the one described in the original paper because it always classifies an instance, whereas the original would sometimes assign a probability of zero to an instance for all classes. In addition to the standard TAN algorithm, a multinet variant was implemented, as well as a version of the naive Bayes classifier that makes use of the multinomial distribution. These served as useful baselines for determining the effectiveness of the algorithm.

Implementation Details

Generating a TAN network from training data is not a very complex process. However, because counts for all possible pairs of values between attributes must be maintained, it is hard to visualize the procedure. Training data must first be loaded to determine these counts, which are then used to build a maximum spanning tree based on the conditional mutual information measure (as detailed in Section 4.3.1). For reference, Figure A-1 shows possible spanning trees for the contacts dataset provided with Weka and the UCI nursery dataset [6]. Both of these datasets contain purely nominal values.

[Figure A-1: The maximum spanning trees, with the associated I_P values on the edges, for the contacts (left) and nursery (right) datasets.]

The spanning trees are calculated using the maximum-weight version of Prim's greedy algorithm [3]. Once the structure of the graph is known, the prior probabilities are calculated from the multinomial formulas, based on the pairwise value/attribute counts [9]. For the two reference sets, the TAN algorithm classifies 12,147 of the 12,960 nursery instances correctly (when the training set is also used as the test set) and all 24 instances of the contacts dataset. The multinet implementation is almost identical, save that each class must create and store its own network, which is later used to calculate the prior probabilities. With the multinet, 12,359 of the nursery training instances are classified correctly, while again 100% of the instances in the contacts set are classified correctly.

A.2.4 Experience Sampling

An application was written to log the activities of a user every forty-five minutes. The device beeps to alert the user, who then selects an activity or notes what he is doing (if the choice is not available) in addition to his current location. This information is then appended to a file on the device. The user has the option to ignore the alert, although the device will repeat its attempt forty-five minutes later. In the cases where GPS is available, the experience sampling data can be used to train the system according to the user's personal habits. The most pressing issues with gathering information in this fashion are that the data may contain noise (e.g. typos and personal abbreviations) and that we risk annoying the user if we query too frequently. However, if samples are taken too far apart, then the data that is collected may not be sufficient to train the system.

Creating a tool that depends on user input can be quite complex, given that one must motivate the user to interact routinely with the device. Users must have a reason to carry (and recharge) the device and be rewarded with positive feedback when they cooperate with the requests for data. Periods of inactivity should result in the device prompting more frequently and more loudly. The user should be able to temporarily turn off the requests, with the device automatically rescheduling the reminders some time later. Additionally, the application must be robust enough to handle situations when the battery runs low, so as not to lose data. The method of data entry is also important, since we do not want to frustrate the user. Unfortunately, voice recognition is not reliable enough, and using the stylus can be tedious. One must therefore weigh user convenience against the desired level of detail in the collected data when designing an interface that allows the user to write as little as possible.

Appendix B

Activity and Landmark Data

B.1 Selecting Activities

To determine what actions might be the most useful to detect, an email message was sent to several mailing lists requesting this input.
Specifically, the message asked people to "send back a list of high-level actions you do in a week." Table B.1 contains a summary of the 26 responses that were received, roughly categorized, with each activity followed by the number of responses that included it. The constraints of the system immediately made detection of some of these actions impossible, namely those listed under the headings "Indoor" and "Home Tasks." The top tasks were considered but selected with university students in mind, as they would most likely serve as the subjects for the testing process. As a result, activities such as gardening were not considered, while going to class was included. The remaining nine activities chosen were noted as being good representatives of their respective categories (going to lunch, going to dinner, grocery shopping, going to work, going home, visiting someone). Others were chosen because they encompassed some of the more specific activities mentioned (e.g. running errands, shopping). In the case of going for a walk, we felt this was a suitable substitution for exercising, jogging, and biking, since our subjects were more likely to be walking around with the device and because exercising seemed to indicate different things to different people on the survey (e.g. going to the gym or pool versus taking a jog or a short walk).

Indoor: phoning(16), email(14), reading(12), watch tv(11), sleeping(9), reading news(7), organizing schedule/planning(5), cleaning(4), dressing(4), pick up mail(4), computer related(3), discussing(3), feed pet(3), homework(3), showering(3), write paper(3), contemplate life/daydream(2), spouse activities(2), woodshop(2), brush teeth(1), chores(1), computer game(1), doubting(1), drafting(1), family time(1), fly(1), help w/hw(1), hugging/kissing(1), journal(1), making models(1), motivating staff(1), nap(1), nurturing family(1), photography(1), plotting(1), practice music(1), pray(1), research(1), resting(1), sex(1), solid modeling(1), studying(1), take drugs/meds(1), volunteer(1), waking up(1), wedding planning(1), Xeroxing(1)

Place Specific: exercise(12), go to work(7), work(6), go to a movie(5), class(4), get gas(3), watch sports event(2), library(2), church(2), concert(1), zoo(1), hair appointment(1), museum(1), park(1), go to bookstore(1), car check up(1), golf(1), doctor/medical visit(1), take kids to daycare(1), take kids to school(1)

Food Related: eat meal(14), eat dinner(5), eat lunch(4), eat out(3), eat breakfast(2), takeout(1), eat brunch(1), treat (ice cream)(1)

Shopping: shop(3), shop for supplies(2), shop for clothes(1), shop for gifts(1)

Transportation Related: drive(5), go home(4), subway(4), go for a walk(3), jogging(3), walk pet(3), explore city(3), bike(2), carpool(1), get fresh air(1), go out(1), hike(1), scootering(1), crossing street(1), traveling(1)

Social: meeting(5), visit someone(4), hanging out with someone(3), meeting someone(3), play(2), play w/kids(2), drinking(2), check on neighbors(2), meet new people(1), meet someone(1), party(1), play w/pets(1), rent a movie(1), talk with friends(1), walk and talk to a friend(1), bills(1)

Errands: laundry(8), grocery(7), errands(2), drop off dry-cleaning(2), drop off recycling(2), outdoor market(1), post office(1), banking(1), get film developed(1), buy pet food(1), budgeting(1), home improvements(1), mowing(1), water plants(1)

Other: club sports(2), geocache(1)

Home Tasks: cooking(8), gardening(6), dishes/kitchen(4), garbage(2)

Table B.1: Complete activity list with incidence count.

By categorizing the responses as in Table B.1, we can speculate how to recognize some of the actions that we did not consider.
For those activities related to a place (e.g. getting gas or picking up dry cleaning), marking the locations where these occur seems to be the obvious step to take, though it might not necessarily be required. Social activities are somewhat trickier, requiring some kind of indication that another user is involved, perhaps through two devices sharing information that the users are in close proximity to one another for an extended period of time. Transportation-related activities require additional data on how the user moves around (e.g. scootering) or the locations of subway or bus stops. As GPS does not work inside buses or underground, additional temporal knowledge may be required. For example, recognizing a sequence of events (e.g. a user went to a subway station, disappeared for a short while, and then emerged farther away than if he had walked) could be used to infer these more complex actions.

B.2 Online Resources

There are several online resources available for maps, GIS data, and other related data.

- Maps. Of the various online map resources, MapsOnUs.com is one of the few that allows the user to label selected positions with longitude/latitude coordinates. This formed the foundation for tracking location by providing the data required for interpolating positions on the map. The US Census Bureau (tiger.census.gov) also provides some useful mapping resources, most involving census statistics by area.

- Geocoding. Geocoding is the process of converting an address to a latitude/longitude position. There are many commercial services that advertise their geocoding services to the public. Some have free trials, for example Geocodec technologies (www.geocodec.com) and TeleAtlas (www.teleatlas.com).

- Reverse Geocoding. This process converts coordinates to street addresses. Less information is available about these services, though TeleAtlas (www.teleatlas.com) claims to have one available, as do many other places; free demos are hard to come by.

- Geographic Information Systems (GIS). GIS is a method of viewing different layers of data that are tied to a particular geographic area. Population statistics, topography, and streets are all examples of such data. ESRI (www.esri.com) provides many free GIS resources, including viewers, sample code, and links to other free GIS resources. Cambridge, MA also has its own GIS site (gis.ci.cambridge.ma.us) with links to maps and different data sets. To learn more about GIS in general, visit www.gis.com.

B.3 Labeled Map Areas

Figure B-1 shows the complete set of labeled areas that were used for the trials described in Chapter 5.

[Figure B-1: The complete map with labels, used for our trials.]

Appendix C

Tests

C.1 Training and Testing with Different Users

Using the GPS data collection tool, a set of 1007 position instances was gathered by a second user. The data was then evaluated using the system trained with data gathered by the initial user. The system performed quite poorly, correctly predicting only 261 of the instances (25.82%). In comparison, cross-validation on the data from the second user correctly categorized 95.53% of the actions. Table C.1 shows that the better-predicted activities were class, shopping, and walk, which were performed using routes very similar to those taken by the initial user.
The misclassified instances represent activities that were done at different times or using different routes than the initial user. This is indicative of the sensitivity of the system to a particular user's normal habits.

   a    b    c    d    e    f    g    h    i    j    <- classified as
   0    1    4   22    0    8    4    1    0    0  |  a = Errands
   0   46    8    0    0   13    0    0    0    1  |  b = Class
   0    0   15    0    0    0   10    1    0   17  |  c = Work
   0    0    0    0    0    0    0   13    0    0  |  d = Grocery Shopping
   0    4   13   29    0    0   71   25    0    0  |  e = Home
   0    0    0    9    0   37   17    5    0    6  |  f = Shopping
   0    0   60    0    0    0  144   17    0    0  |  g = Walk
   0    0    4   11    0    0    0   15    0    0  |  h = Dinner
   0   24   44   16    0   15  143   22    0   39  |  i = Visit
   0    0   39    0    0   22    8    0    0    4  |  j = Lunch

Table C.1: Confusion matrix for data gathered by a second user and evaluated using a system trained with the ΔDist(prev) dataset.

C.2 Wordnet Trials

A limited amount of work was done to investigate using lexical databases both for structure determination of TAN networks and for calculating the prior probabilities of a naive Bayes implementation. For both of these tasks, the Lexical Freenet database was used as a front end to the WordNet database for calculating the "distance" between words. This is done by entering two words and selecting the desired connecting relationships. Lexfn then returns the shortest paths it finds that connect the two words. This was used as a simple metric for the strength of the relationship between two words.

C.2.1 TAN Structure Determination

Lexfn connections were used to create a complete graph in which the attributes are the nodes and the weight of each arc is the number of links in the shortest path between the two words. A minimum spanning tree was then calculated from this graph, and the resulting network was used in place of the one normally calculated using the conditional mutual information metric. The system was then trained as normal. Tests with this technique predict a performance of 81.6% on the ΔDist(prev) feature set (versus 84.16% with our naive Bayes implementation).

C.2.2 Naive Bayes Training

Using the features from the ΔDist(prev) feature set, Lexfn was used to determine the prior probabilities. For the non-area features (day, time, speed), the path length between each value of the feature and the class value was used to calculate P(C | A_i = v). To invert these probabilities (so that a shorter path length yields a greater probability), normalize them, and allow for multinomial smoothing, the following formula was used, where sum is the total of the path lengths over all values of the attribute:

    P(C | A_i = v) = (1 + sum - pathLength(v, C)) / (sum · (Count(A_i) - 1) + Count(A_i))

For the remaining features (change in distance from the areas), the attribute value (place name) was used to find the distance between it and the class value. This value was subtracted from a constant, set at 10, and then divided by the same constant to arrive at the value for "closer", while one minus this value was used as the value for "farther". Results from this experiment were low, classifying only 26% of the instances correctly. Part of the difficulty in applying this technique to a location-based system is that lexical knowledge might not be the right source for relationships between walking and parks, or visits and residential areas. While these make sense to us as humans, they are not necessarily part of the definition of walking or visiting.

Bibliography

[1] G. Abowd, C. Atkeson, J. Hong, S. Long, R. Kooper, and M. Pinkerton. Cyberguide: A mobile context-aware tour guide. Baltzer/ACM Wireless Networks, 3(5):421-423, 1997.
[2] D. Aha. Tolerating noisy, irrelevant, and novel attributes in instance-based learning algorithms. International Journal of Man-Machine Studies, 36(2):267-287, 1992.
[3] R. Ahuja, T. Magnanti, and J. Orlin. Network Flows: Theory, Algorithms, and Applications. Prentice Hall, 1993.
[4] A. Asthana, M. Cravatts, and P. Krzyzanowski. An indoor wireless system for personalized shopping assistance. In Proceedings of the IEEE Workshop on Mobile Computing Systems and Applications, pages 69-74. IEEE Computer Society Press, December 1994.
[5] D. Beeferman. Lexical discovery with an enriched semantic network. In Proceedings of the ACL/COLING Workshop on Applications of WordNet in Natural Language Processing Systems, pages 358-364, 1998.
[6] C.L. Blake and C.J. Merz. UCI repository of machine learning databases, 1998.
[7] M. Brand, N. Oliver, and A. Pentland. Coupled hidden Markov models for complex action recognition. In Proceedings of IEEE CVPR97, 1996.
[8] J. Caffery and G. Stuber. Overview of radiolocation in CDMA cellular systems. IEEE Communications, 36(4):38-45, April 1998.
[9] J. Cerquides. Applying general Bayesian techniques to improve TAN induction. In Knowledge Discovery and Data Mining, pages 292-296, 1999.
[10] W. Chan. Project Voyager: Building an Internet presence for people, places, and things. Master's thesis, Massachusetts Institute of Technology, May 2001.
[11] G. Chen and D. Kotz. A survey of context-aware mobile computing research. Technical Report TR2000-381, Dept. of Computer Science, Dartmouth College, November 2000.
[12] J. Cheng and R. Greiner. Learning Bayesian belief network classifiers. In Proceedings of the Fourteenth Canadian Conference on Artificial Intelligence, 2001.
[13] K. Cheverst, N. Davies, K. Mitchell, A. Friday, and C. Efstratiou. Developing a context-aware electronic tourist guide: some issues and experiences. In CHI, pages 17-24, 2000.
[14] C.K. Chow and C.N. Liu. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14:462-467, 1968.
[15] P. Dana. Global positioning system overview. http://www.colorado.edu/geography/gcraft/notes/gps/gps.html.
[16] A. Dey, G. Abowd, and D. Salber. A context-based infrastructure for smart environments. In Proceedings of the First International Workshop on Managing Interactions in Smart Environments, pages 114-128, 1999.
[17] R. Duda and P. Hart. Pattern Classification and Scene Analysis. John Wiley, 1973.
[18] P. Enge and P. Misra. Scanning the issue/technology. In Proceedings of the IEEE, volume 87, January 1999.
[19] I.A. Essa. Computers seeing people. AI Magazine, 20(2):69-82, 1999.
[20] C. Fellbaum. WordNet: An Electronic Lexical Database. The MIT Press, 1998.
[21] E. Fix and J. Hodges Jr. Discriminatory analysis; non-parametric discrimination: Consistency properties. Technical Report 21-49-004(4), USAF School of Aviation Medicine, Randolph Field, Texas, November 1951.
[22] J.H. Friedman. On bias, variance, 0/1-loss, and the curse-of-dimensionality. Data Mining and Knowledge Discovery, 1:55-77, 1997.
[23] N. Friedman, D. Geiger, M. Goldszmidt, G. Provan, P. Langley, and P. Smyth. Bayesian network classifiers. Machine Learning, 29:131, 1997.
[24] N. Friedman, M. Goldszmidt, and T. Lee. Bayesian network classification with continuous attributes: getting the best of both discretization and parametric fitting. In Proc. 15th International Conf. on Machine Learning, pages 179-187. Morgan Kaufmann, San Francisco, CA, 1998.
[25] I. Getting. Perspective/Navigation - The Global Positioning System. IEEE Spectrum, 30(12):36-38, 43-47, December 1993.
[26] T. Huang, D. Koller, J. Malik, G. Ogasawara, B. Rao, S. Russell, and J. Weber. Automatic symbolic traffic scene analysis using belief networks. In Proc. Nat. Conf. on Artificial Intelligence, pages 966-972. AAAI Press, 1994.
[27] F.V. Jensen. An Introduction to Bayesian Networks. University College London Press, 1996.
[28] P. Langley and S. Sage. Induction of selective Bayesian classifiers. In Proc. Tenth Conference on Uncertainty in Artificial Intelligence, Seattle, WA, pages 399-406, 1994.
[29] A. Madabhushi and J.K. Aggarwal. A Bayesian approach to human activity recognition. In Proceedings of the Second IEEE Workshop on Visual Surveillance, 1998.
[30] A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In AAAI-98 Workshop on Learning for Text Categorization, 1998.
[31] J. Metzner and B. Barnes. Decision Table Languages and Systems. Academic Press, 1977.
[32] C. Narayanaswami and M.T. Raghunath. Application design for a smart watch with a high resolution display. In Proceedings of the Fourth International Symposium on Wearable Computers (ISWC'00), 2000.
[33] N. Oliver and A. Pentland. Graphical models for driver behavior recognition in a SmartCar. In Proceedings of the IEEE Intl. Conference on Intelligent Vehicles 2000, Detroit, Michigan, October 2000.
[34] N.M. Oliver, B. Rosario, and A. Pentland. A Bayesian computer vision system for modeling human interactions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):831-843, 2000.
[35] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers, San Mateo, CA, 1988.
[36] J. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
[37] N. Schroder. Reality check: Mobile and wireless software in 2002. Technical Report AV-15-3847, Gartner Research, January 2002.
[38] J.S. Stevenson and R. Topp. Effects of moderate and low intensity long-term exercise by older adults. Res Nurs Health, 13:209-218, 1990.
[39] P.H. Winston. Artificial Intelligence, Third Edition. Addison-Wesley Publishing Company, Reading, Massachusetts, 1993.
[40] I. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, October 1999.
[41] A. Woodruff, P. Aoki, A. Hurst, and M. Szymanski. Electronic guidebooks and visitor attention. In ICHIM (1), pages 437-454, 2001.
[42] C. Wren, A. Azarbayejani, T. Darrell, and A. Pentland. Pfinder: Real-time tracking of the human body. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):780-785, 1997.
[43] N. Young, R. Guggenheim, and R. Moore. The making of Majestic. Game Developer Magazine, April 2001.
[44] H. Zhang and C.X. Ling. An improved learning algorithm for augmented naive Bayes. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 581-586, 2001.