
Context-Aware Activity Recognition using TAN
Classifiers
by
Neil C. Chungfat
Submitted to the Department of Electrical Engineering and Computer
Science
in partial fulfillment of the requirements for the degree of
Master of Engineering in Electrical Engineering and Computer Science
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
May 2002
© Neil C. Chungfat, MMII. All rights reserved.
The author hereby grants to MIT permission to reproduce and
distribute publicly paper and electronic copies of this thesis and to
grant others the right to do so.
Author ....
Department of Electrical Engineering and Computer Science
May 24, 2002
Certified by...
Stephen S. Intille
Thesis Supervisor
Accepted by......
Arthur C. Smith
Chairman, Department Committee on Graduate Theses
Context-Aware Activity Recognition using TAN Classifiers
by
Neil C. Chungfat
Submitted to the Department of Electrical Engineering and Computer Science
on May 24, 2002, in partial fulfillment of the
requirements for the degree of
Master of Engineering in Electrical Engineering and Computer Science
Abstract
This thesis reviews the components necessary for designing and implementing a real-time activity recognition system for mobile computing devices. In particular, a system
utilizing GPS location data and tree augmented naive Bayes (TAN) classifiers is
described and evaluated. The system can successfully recognize activities such as
shopping, going to work, returning home, and going to a restaurant. Several different
sets of features are tested using both the TAN algorithm and a test bed of other
competitive classifiers. Experimental results show that the system can recognize
about 85% of activities correctly using a multinet version of the TAN algorithm.
Although efforts were made to design a general-purpose system, findings indicate
that the nature of the position data and many relevant features are person-specific.
The results from this research provide a foundation upon which future activity aware
applications can be built.
Thesis Supervisor: Stephen S. Intille
Title: Research Scientist
Acknowledgments
My heartfelt thanks to the many people who supported me during the last five years
at MIT. In particular: my parents who have always stood behind me; my advisor,
Stephen Intille who offered his guidance and experience throughout this project and
refused to let me feel lost during the course of this year; my friends who have made
the last five years of hard work and sleepless nights all worth it.
Thanks also to the National Science Foundation for funding this project and my
research this past year.
Contents

1 Introduction
  1.1 Motivation
  1.2 The Task
  1.3 Outline

2 Related Work
  2.1 Mobile Context-Aware Computing
  2.2 Activity Recognition

3 Choosing and Qualifying Actions
  3.1 Data Sources
    3.1.1 Location Data
    3.1.2 Personal Information Management (PIM) Data
    3.1.3 Maps and Landmarks
    3.1.4 Lexical and Knowledge Databases
    3.1.5 Other Sensor Data
  3.2 Identifying Activities
  3.3 Features
    3.3.1 Where - Geographic References
    3.3.2 When - Temporal References
    3.3.3 Why - PIM data
    3.3.4 How - Speed
    3.3.5 Maintaining Generality

4 Recognizing Activities
  4.1 Goals
  4.2 Approach
    4.2.1 Bayesian Networks
    4.2.2 Naive Bayesian Classifiers
  4.3 TAN Classifiers
    4.3.1 Structure Determination
    4.3.2 Training
    4.3.3 TAN Multinets

5 Results and Analysis
  5.1 Data Acquisition
  5.2 Features
    5.2.1 Single Point Distance
    5.2.2 Changing Distances
    5.2.3 Trajectory
  5.3 Classification Results and Analysis
    5.3.1 Algorithm Results
    5.3.2 Extensions

6 Discussion
  6.1 Feasibility of an Activity Recognition System
  6.2 Extensions and Improvements
    6.2.1 Other Activities
    6.2.2 Low-Level Data
    6.2.3 High-Level Knowledge
    6.2.4 Applications

7 Conclusion

A System Overview
  A.1 Hardware
  A.2 Software
    A.2.1 GPSRecorder
    A.2.2 GPSTracker
    A.2.3 TAN
    A.2.4 Experience Sampling

B Activity and Landmark Data
  B.1 Selecting Activities
  B.2 Online Resources
  B.3 Labeled Map Areas

C Tests
  C.1 Training and Testing with Different Users
  C.2 Wordnet Trials
    C.2.1 TAN Structure Determination
    C.2.2 Naive Bayes Training
List of Figures

2-1 The IBM Linux watch (a) and Casio watch equipped with GPS receiver (b)
3-1 iPAQ device with GPS receiver
3-2 A map of the area near the MIT campus with different colors representing the various kinds of areas used in this work.
4-1 A naive Bayes network
4-2 A TAN network for the ΔDist(prev) dataset.
5-1 A graph depicting the predicted classifier accuracy of each classifier for each dataset.
A-1 The maximum spanning trees with associated I_P values for the contacts (left) and nursery (right) datasets.
B-1 The complete map with labels, used for our trials.
List of Tables

5.1 Summary of classifier results per feature set (mean ± variance).
5.2 Confusion matrix for ΔDist(prev)
B.1 Complete activity list with incidence count
C.1 Confusion matrix for data gathered by a second user and evaluated using a system trained with the ΔDist(prev) dataset
Chapter 1
Introduction
Mobile computing devices such as cellular phones and personal digital assistants
(PDAs) have become more prevalent in recent years with handheld device use expected to increase 19% in 2002 to 14.75 million units [37].
In conjunction with
their widespread use, these devices continue to become both more powerful and more
portable.
To evolve beyond simple replacements for paper organizers,
applications must intelligently take advantage of a user's location and intentions to
provide useful services.
Context-aware computing aims to create applications that leverage information
about the user's environment to improve his experience by making more natural and
intuitive interfaces [16]. This thesis outlines the high-level design goals for an activity
recognition system and describes the knowledge sources that are available to build
such a system. In particular, a system was designed and implemented that uses a
simple training process and allows for the automatic recognition of some activities
without complex knowledge engineering. The recognition system deals with noisy
sensor data, and infers personalized activity recognition models without explicitly
providing the system with hard-coded rules. While past systems simply match a
user's location to one particular action, this research attempts to go beyond these
models by considering more complex situations where a single location may be related
to more than one action and for which context is not determined by location alone.
The results from this system are discussed and used to speculate on the possibilities
for future work.
1.1
Motivation
A variety of context-aware applications could take advantage of the capabilities of
an accurate activity recognition system. Applications that are aware of the user's
environment become more personally relevant by presenting information that applies
to the situation at hand, thereby improving the user experience. A key benefit of
the system described here is that it relies on location data that is relatively easy
to obtain and does not require deploying and calibrating cameras or other more
expensive sensory devices. The availability of location data makes this recognition
system useful for a variety of applications.
* Preventative Medicine. This research grew from a larger effort focused on
designing new interfaces and technologies that address preventative health care.
Recognizing the activities of an individual both inside and outside of the home
provides important data that can be used to infer patterns of behavior. Vision-based systems have been implemented that can recognize motions such as sitting
up, falling backwards, squatting, and walking [29] as well as interactions between
different people [34]. While it is possible to detect more detailed actions inside
a controlled or enclosed environment, such as the home, it is far more difficult
to apply the same vision-based systems in the outside world. To obtain equally
useful observations outside of the home, an activity recognition system could
be used to measure the frequency and types of exercise the user engages in and
the frequency of other daily routines. Healthy living could also be influenced by
providing advice on nutrition when, for example, the system recognizes the user is going
out to lunch. Accumulating observations on behavior combined with offering
advice at the right time and place can help alert the individual to recognize and
remedy potential health risks [38]. It is therefore important that a preventative
healthcare system readily distinguish how actions are being executed (i.e. a walk
is considered exercise but taking a drive is not) and be sufficiently accurate in
its detection process for medical applications.
* Response Adaptive Applications. Integrated with other applications and
systems, a location-aware system could perform a variety of useful tasks. Many
of these are based on the application's ability to provide the right information
at the right time. Knowing what the user is doing or is about to do is therefore
essential for such applications to avoid presenting untimely and inappropriate
information. Memory aid applications could remind the user to buy milk when
they sense the user going to the supermarket or to pick up dry-cleaning on
the way home from work. Planning applications that anticipate a user's most
frequently taken routes could alert the user to hazards such as construction or
blocked roads and suggest alternative routes or make reservations for a restaurant the user is heading towards. Social applications could notify friends and
family when the user is coming by for a visit or allow someone to ask their
neighbor to run an errand if he is already en route to some destination. An ideal
system will therefore be able to run in real-time, recognizing activities quickly
to allow for relevant information to be presented to the user before the action
is completed.
* Games. Both educational and recreational games could make use of this activity recognition system. Electronic Arts recently released the innovative Majestic game [43] that attempted to create a unique and personalized gaming
experience by contacting users through phone calls, faxes, e-mails, and Internet
messaging. An additional depth to this type of gaming experience could be
achieved by incorporating a user's real world actions into the game's storyline.
One can imagine a game that adapts itself to the user's routines, thereby personalizing the experience on a per-player basis. The activity recognition will
therefore have to be trained to the user's personal habits and routines relatively
painlessly, without requiring a great deal of effort from the user.
1.2
The Task
To design a system that accommodates the aforementioned applications, a few issues
must be addressed. First, what kind of data is available that will be useful for this
task? This in turn influences the resolution of activities that can be recognized by
the system as well as the types of algorithms that can be used to actually perform
the recognition. The research described here attempts to investigate these questions.
A system to detect several routine actions including going to work, going home,
and going shopping is described as well as how it was designed, implemented, and
tested. This system utilizes a combination of GPS and zoning-type map data (i.e.
parks, residential neighborhoods, businesses, and restaurants) to represent a user's
movements and activities. As the location the system perceives is merely a point on
the globe, the class of actions it was designed to recognize were not as fine-grained as
other systems that utilize computer vision to obtain data (i.e. [29, 11]). Instead, the
focus of this work is on understanding the potential of machine-learning classifiers
to recognize activities primarily defined by the type of area where they occur - for
example shopping, going to work, and returning home.
To be truly useful, an activity recognition system should be able to adapt itself
to the habits of the user.
However, since each person has his own habits, some
supervision on the part of the user is necessary. Ideally, this training process should be
as simple as possible and not require complex knowledge engineering. The algorithm
should not use features that are overly user-specific and that require hard-coding
information into the system.
1.3
Outline
The remainder of this paper is structured as follows. Related work on other context-aware and activity-based systems is discussed next in Chapter 2. Chapter 3 covers the
types of data that are available for use in activity recognition and how these translate
into useful features. A discussion of the approach taken to recognize activities is
18
contained in Chapter 4 followed by a review of the results in Chapter 5. Chapter 6
contains a discussion of lessons learned from this experience as well as possibilities
for future work. We then summarize and conclude in Chapter 7.
Chapter 2
Related Work
2.1
Mobile Context-Aware Computing
Within the next few years, portable technology will become more prevalent and more
useful for everyday activities. Current PDAs accommodate cameras, GPS receivers,
barcode scanners, as well as a multitude of other peripherals. As a result, we anticipate that an integrated "all-in-one" device will be possible within a few years that
is both more powerful and more portable than current devices. With this in mind,
the activity recognition system described here may one day be able to work using a
PDA with a wristwatch form factor. In fact, IBM researchers have already prototyped a "smart-watch" that runs Linux [32] (see Figure 2-1(a)), and watches are on
the market with GPS capability (Figure 2-1(b)).
While some context-aware studies have focused on improving user-interaction with
desktop computers, our research focuses on providing useful information and services
to users as they go about their daily lives. Location-aware handheld computers have
been used to create intelligent tour guides (see [1, 10, 41, 13]) that present text, audio,
and video information to users as they walk around a pre-determined area. Information pertaining to the location such as a restaurant review or historical background
is provided when the user reaches a particular location that triggers the response.
For these applications, a simple "are they near this location?" is sufficient to serve as
the context since an assumption is made that the user is interested in the information
that will be provided.

Figure 2-1: The IBM Linux watch (a) and Casio watch equipped with GPS receiver (b)

These applications therefore only assume one context for a
particular location. In other words, the applications present, at most, one type of
information per place and do not take into account the possibility that the user is
interested in something else or is engaged in multiple simultaneous activities. Other
similar applications have been developed to help shoppers locate items in a supermarket and provide nutritional or discount information related to each shopper's location
and past history [10, 4]. The addition of a shopper's buying habits can provide enough
information to personalize the experience by reminding a user that he last bought
milk a week ago or informing him that his favorite kind of cereal is currently on-sale.
For an action recognition system, more information than the current location of
the user is needed to make an accurate prediction. The implications of user preference for particular routes also influences how the system must function, whereas,
for example, the tour guide applications need not be concerned with these details.
For them, it does not matter how the user arrived at the location, merely that he
is there. The requirement that the system be adaptable on a per-user basis requires
more sophisticated testing procedures than a system that functions the same way for
all users.
2.2
Activity Recognition
The computer vision community has devoted much research to motion recognition.
One technique, known as background subtraction, can distinguish moving people
and objects within a room. Over time, the camera learns a statistical model of the
image representing the static environment (objects that remain still). When a person
enters the room, it compares the new image to its version of the background and
highlights the differences, assuming that these "blobs" are people moving about [19].
Beyond tracking people themselves, extensions of this technique can be used to track
gestures, including arm or hand motions such as those used in sign language [42]
and even automobiles to study traffic patterns [26]. Vision systems have also used
probabilistic models to perform action recognition. Pentland et al. have used hidden
Markov models (HMMs) and features extracted from video data to recognize patterns
of human interaction [34], Tai Chi hand movements [7], and the behavior of people
driving automobiles [33].
The data for these vision-based systems was gathered in controlled environments
with mounted cameras. The benefit of GPS technology is that it is already omnipresent and therefore any service that makes use of it can be deployed without
requiring the placement and calibration of sensors in the environment. The set of
actions that a vision-based system is designed to recognize are dependent on camera
placement, lighting conditions, and many other environmental factors. This makes
it difficult for these systems to adapt to new places or to be extended to recognize
different kinds of actions.
Another important difference is that the vision systems are trained to recognize
complete actions such as the gesture of standing up. The action recognition system
that we envision, on the other hand, must attempt to make a prediction of the action
that is currently taking place (and may take a long time to complete) to be of the
most use.
Once the action has taken place, it may be too late to present useful
information to the user. As a result, some of the models used for recognition by
the vision-based systems do not apply for our system. HMMs can be used to model
actions in vision systems because they model a discrete state of the system. The data
we are concerned with is not as detailed, which makes it difficult to specify the states and transitions in our system. An HMM could potentially be layered on top of
our system to model user habits, though this requires a robust underlying system that
can accurately predict what a user is doing. For this system, we consider the class
of machine-learning classification algorithms. This is fitting because our data is well
represented by distinct features that we wish to cluster into categories. In particular,
we examine the tree augmented naive (TAN) version of the Bayes classifier because
of its combination of training simplicity and competitive performance [23].
Chapter 3
Choosing and Qualifying Actions
An important goal in designing this system was to minimize the number of hard-coded
rules required by the system. While we could have used rules explicitly connecting
places with actions (i.e. supermarkets with grocery shopping) as in prior work [1],
our goal was for the system to learn these associations through training data. We
wish to do this so that the system can adapt to a user's personal habits. For example,
a shopping mall that contains restaurants could be frequented by one user to shop,
another to work, and yet another for meals. To make this learning process possible,
both sources of high-level knowledge and methods of making this knowledge available
to the system are required.
3.1
Data Sources
There are several potential sources of data for an activity-recognition system. In addition to location data, the applications of the PDA device as well as several geographic
tools are available as resources.
3.1.1
Location Data
Figure 3-1: iPAQ device with GPS receiver

The Global Positioning System (GPS) was developed by the United States Department of Defense and released for civilian use in 1990. The system consists of 24 satellites and several ground stations around the world, which are used to determine the precise
position, velocity, and altitude of a properly equipped receiver. A receiver within
range of these signals measures the distance based on the travel time between the
receiver and a minimum of three satellites and uses these to triangulate its position.
With selective availability (an intentional degradation of the GPS signal) turned off
since May 2000, accuracy is possible within 10 m of the actual position [18, 25].
The most severe limitation of GPS technology is that it must operate within line-of-sight of the orbiting satellites, as the weak signals cannot penetrate buildings or
dense foliage [15, 18]. As a result, positions cannot be taken if the receiver is inside
or close to buildings or other obstructions. Using a receiver in urban areas is further
complicated by multipath error, which results from signals being reflected by buildings
or other surfaces near the receiver. This can introduce additional error of up to half
a meter [15].
Acquiring a good signal lock from a GPS receiver requires on the order of tens
of seconds, depending on the availability and accuracy of the last known position.
From a cold start (lacking memory of the last recorded position), obtaining a reliable
position lock requires about 40 seconds in an unobstructed location. This can improve
significantly to under 20 seconds if the receiver contains memory of the last position
and that position is relatively close to the receiver's new location.
For this work, a PDA device equipped with a GPS receiver (see Figure 3-1) and
custom software was used to acquire GPS data.
The device could either be run
continuously or scheduled to gather data at regular intervals (i.e. every 15 minutes).
In addition, whenever the user powered on the device, the software would run in the
background and attempt to get a position fix. Position error was noticeable in the
data we gathered with the position being skewed up to 125 m when the receiver was
in an enclosed area.
The effects of noise on the results of our system are noted in
Section 5.2.
3.1.2
Personal Information Management (PIM) Data
The applications that come pre-installed on PDAs store valuable personal information
management (PIM) data entered directly by the user. This data provides a snapshot
of some of the activities that the user believes are important and can therefore help the
recognition algorithms to infer activity. If the user has some appointment scheduled,
it is likely that some action is taking place that coincides with the appointment. The
algorithm can then attempt to use this data to make a more accurate prediction.
An up-to-date appointment book is a strong indicator for some action - i.e. a lunch
appointment or a meeting at work.
The PIM data is easy to obtain and can be very precise. However, because it
requires user entry, appointment text can be misspelled or abbreviated and the appointments themselves may simply be out of date. As a result, clustering this noisy
data is a problem in its own right that must be overcome before incorporating it into
a recognition scheme.
Figure 3-2: A map of the area near the MIT campus with different colors representing the various kinds of areas used in this work.
3.1.3
Maps and Landmarks
Longitude and latitude positions returned by the GPS receiver are only useful in
combination with map data such as the residential and commercial zones of a city or
specific names and locations of restaurants and businesses. To obtain this information, which is not currently publicly available in convenient formats, a graphical user
interface (GUI) was developed that allows for quick and detailed labeling of specific
areas. This was used to identify specific areas of Cambridge, MA in the locale of the
MIT campus. Specifically, we identified business, recreational, residential, shopping,
and university areas as well as specific locations including banks, churches, hotels,
libraries, museums, post offices, restaurants, schools, and supermarkets (see Figure
3-2 or Appendix B.3 for a complete list of places).
In all, 103 areas and places were labeled for our experiments. Very specific information, such as street-name data and the locations of a particular user's home or
workplace were avoided to prevent making the system too user-specific. Along the
same lines, individual places of the same type are not distinguished. For example,
restaurants and businesses do not carry a label and are categorized simply as "business" or "restaurant." For our experiments, the closest distance to each of the area
types is calculated and used as a feature, thus ignoring any more specific detail about
the place other than its position and type.
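To make the use of this closest-distance feature concrete, the sketch below computes the minimum distance from a single GPS fix to each labeled area type. It is only an illustration: the haversine helper, the reduction of each labeled area to a single representative point, and the example coordinates are assumptions, not the labeling tool or map format actually used in this work.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) fixes."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def distance_features(fix, labeled_areas):
    """Return {area_type: distance in meters to the closest area of that type}.

    `labeled_areas` is a list of (area_type, lat, lon) tuples; each area is
    reduced to a single representative point for illustration.
    """
    closest = {}
    for area_type, lat, lon in labeled_areas:
        d = haversine_m(fix[0], fix[1], lat, lon)
        closest[area_type] = min(d, closest.get(area_type, d))
    return closest

# Example: one GPS fix near the MIT campus and two hypothetical labeled areas.
areas = [("supermarket", 42.3621, -71.0979), ("recreational", 42.3549, -71.0916)]
print(distance_features((42.3601, -71.0942), areas))
```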
There are many geographic resources such as Geographic Information System
(GIS) databases and mapping references that are available on-line (see Appendix B.2
for more detail). As mentioned above, finding a suitable source of city-level information for parks and monuments could be useful. While reverse-geocoding (translating
a coordinate to an address) services could be applicable, they do not currently provide
the detail (i.e. the category of the establishment at the address) that was required
for this application.
3.1.4
Lexical and Knowledge Databases
Existing databases, such as encyclopedias and lexical references contain a great deal
of high-level knowledge that could be useful for activity recognition algorithms. For
example, WordNet is a lexical database based on psycholinguistic principles. From
a high level, it can be thought of as a dictionary ordered by synonym sets (synsets)
or syntactic categorization [20]. A convenient API makes it possible to easily search
the database for synsets as well as definitions and words associated with a particular
term.
Lexical FreeNet (lexfn, http://www.lexfn.com) is a system that utilizes the synonym relations of WordNet
but also organizes names and places based on relationships between words. Terms
such as "Kmart" and "shopping" are linked due to their close-proximity in multiple
documents found in a corpus of news broadcasts [5]. This feature is less useful for
our purposes than using more traditional semantic relationships, which lexfn can
use to find connections between words.
These relationships include synonymous,
generalizes, specializes, comprises, and part of, allowing for the development of a metric
to determine the relevance between pairs of words. A system might then be able to
use this information as part of its training process to learn what features are more
relevant for particular activities. For example, a park might be classified more closely
as a recreation area than a grocery store, therefore leading to the conclusion that
someone is more likely to go for a walk in a park. Unfortunately, the database itself
is not perfect, sometimes returning semantic relationships that do not seem to make
sense or returning a shorter series of relations for two words that do not seem to be
related. This makes it hard to anticipate if a system trained on the basis of these links
would truly reflect reality. The choice of words to describe the features also becomes
much more important since it determines the quality of the relationships that will be
returned by the database.
3.1.5
Other Sensor Data
Data from sensors such as barcode scanners and light meters can also provide contextual information about the user. A barcode scanner in use may indicate the user
is grocery shopping, while a light meter can provide information about the weather
or determine if a user is outdoors. Other technologies such as Bluetooth can provide
information about available resources and how far the user is from these locations.
Although our system does not currently make use of sensor data other than GPS
locations, it can be extended to accommodate data from these other sensors.
3.2
Identifying Activities
One goal of this work was to identify the sorts of activities that the system should
and could recognize. "Typical" daily activities were solicited from research affiliates
by asking them for a "list of high-level actions" that they normally do in a week. The twenty-six responses included activities such as "grocery shopping," "going
to work," "going to class," "laundry," "watching television," "checking e-mail," and
"reading" (Appendix B.1 contains a complete list of the responses). A subset of these
were chosen for this study by first grouping the activities by hand, selecting those
activities that occur outdoors (due to the limitations of GPS) and then picking those
that were most appropriate for a university setting and that were frequently listed.
The final set included the following ten activities: going to work, going home, going
to class, grocery shopping, going for a walk, shopping (i.e. at a mall), going to visit
someone, going to eat lunch, going to eat dinner, and running errands. Some popular
responses that were omitted included gardening (few gardens in the area), phoning,
and cooking (both generally indoor activities).
3.3
Features
The features that were considered fall into the four broad categories of "where"
(geographic data), "when" (temporal data), "why" (PIM data), and "how" (travel
method) with the final category - "what" - being what the system is supposed to
determine.
3.3.1
Where - Geographic References
For our system, the locations served as reference points, which were compared to
locations and areas identified within the neighborhood where the user was located.
Using this data, several first-order features (those that can be derived directly from
low-level data) can be observed. Points can be organized by their distance from each
other or from the labeled areas and landmarks. Higher level features can take into
account the starting and ending points of the trip as well as landmarks that are passed
while the activity takes place. As our system primarily uses location data, several
different sets of derived features of varying complexity were used. These are described
in greater detail in Section 5.2.
3.3.2
When - Temporal References
First-order features include the time of day and the day of the week when the action occurred. Higher order features take into account past events and patterns of
behavior. An example could be recording the last landmark passed by the user or
the last time the user performed the action of concern (i.e. the last time a user went
to the grocery store). Our system incorporates many first-order temporal features
but does not consider some of the more complex second-order features. While these
could be useful for our system, our experiments omit these features to measure the
base effectiveness of the algorithms using a more general set of attributes. We, however, suggest that future work should investigate using these higher order temporal
features. These features encode illustrative patterns of user behavior and would be
useful for applications that are built on top of this system.
3.3.3
Why - PIM data
The personal information management (PIM) data can provide a clue as to why the
user is engaging in a certain action. For example, is there a meeting in an hour, hence
the trip to work? The "why" can provide valuable information, especially when a user
deviates from more common patterns of behavior, such as going to school at night for
a special seminar or concert. Due to the complications involved in clustering PDA
data, our recognition system does not make use of these features though it would be
a reasonable extension to include in the near future.
3.3.4
How - Speed
How a person performs the activity can be a useful feature. Two different activities
may be differentiated if the user drives for one but walks for another. Recognizing exercise habits relies on accurately recognizing these differences. The recognition
system described here makes use of the average speed the user travels from some start
location. More complex features are difficult to measure due to the inability of GPS to function in enclosed environments such as vehicles.
3.3.5
Maintaining Generality
Choosing a general set of features is a fairly difficult problem. By nature, humans
are creatures of habit and routine. There might be several ways to go from A to
B, but everyone has their preferred route that they are predisposed to choose. The
location data that we record as a user moves from place to place therefore contains
an intrinsic personal character. Despite this hurdle, attempts were made to keep
the system from becoming too user-specific. Features that are dependent on user-specific data (i.e. home and work) were intentionally avoided. In addition, tracking
the specific streets traversed on a route was avoided to prevent biasing the system to
particular routes. Including these features, however, might make it possible to create
a more powerful system that caters to a single user but can recognize some of the
more complex actions. This is discussed further in Section 6.1.
Chapter 4
Recognizing Activities
4.1
Goals
The task of classifying an instance as one of a set of pre-defined categories is a classic
problem in machine learning. As a result, there are a variety of well-known methods
varying both in complexity and effectiveness. Selecting the best algorithmic approach
required careful evaluation of several design goals.

* Training the system should be simple without requiring complex or expert knowledge engineering.

* The data input into the system should be easy to obtain requiring minimal input from the user.

* The algorithm should be able to take advantage of data mined from the PDA device and be extensible to accommodate data from other sources.

* The activities recognized should be useful for some application and should be predicted with some certainty estimate by the algorithm that can be used to create interfaces that degrade gracefully.

* The algorithm should have a reasonably high level of accuracy (>80%) and be relatively fast to accommodate a real-time recognition system.
4.2
Approach
The goals, stated above, constrain the algorithms that could be considered. Rule-based systems, while powerful, often require complex knowledge engineering that is
difficult and time consuming. In addition, they do not easily provide a means of
relating uncertainty with a given decision [39]. Neural nets, while known for their
strong performance are slow to train, requiring many passes over the training data. In
addition, the network structure is a black box; the decision model is hidden within the
network structure, making it difficult to understand how the system decided upon a
result [40, 39]. Decision trees are also well-known as being competitive classifiers and
our results confirm that they perform well on our datasets. However, determining the
structure of a decision tree can be quite complex, requiring heuristics to determine
when to stop building the tree and how to prune unnecessary branches. As a result,
if the heuristics are not chosen carefully, the system may overfit the training data [39].
The requirement to present an estimate of the certainty for a given decision suggests the use of a probabilistic classifier. Among the most well-known of these is
the Bayesian network. In a Bayesian network, attributes that describe the problem
being modeled are represented by nodes, while dependencies between these attributes
are symbolized by arcs connecting the nodes. A complicated system will therefore
be represented by a complex graph [35]. The tree augmented naive Bayesian (TAN)
classifier is one variant of this type of classifier that places limits on the complexity
of the network, thereby reducing the overhead of automatic structure determination
from a training set and helping to prevent overfitting [23, 12]. Training the network
requires a dataset that contains enough examples to provide a realistic view of the situation that is being modeled. These examples are then used to calculate probabilities
that will later be referred to during the classification of new instances. Because these
probabilities are pre-calculated, Bayesian methods, as a whole, classify new instances
relatively quickly. The TAN algorithm is therefore relatively simple, yet still performs
competitively against other more complex algorithms despite its lack of dependency
encodings [23]. In general, Bayesian methods provide a powerful and (depending on
the network structure) computationally efficient and intuitive method of modeling
the uncertainty in complex situations [40].
4.2.1
Bayesian Networks
The set of classifying techniques based on Bayes rule are known as Bayesian networks (also called belief networks [27]). Given n attributes, A_1, A_2, ..., A_n, with values a_1, a_2, ..., a_n, Bayes rule states that we predict the probability that these attributes represent some class value, c, in C as follows [44]:

$$P(C = c \mid a_1, a_2, \ldots, a_n) = \frac{P(a_1, a_2, \ldots, a_n \mid C = c)\, P(C = c)}{P(a_1, a_2, \ldots, a_n)} \qquad (4.1)$$
If the network is provided with a dataset that contains a full range of possible
examples, the probabilities for each value of c can thus be calculated and the highest
selected as the most probable class for the represented instance.
Formally, a Bayesian network can be described as a pair B = (G, Θ), with G representing a directed acyclic graph that encodes a joint probability distribution over a set of random variables, U = {X_1, ..., X_n}. Nodes in G represent the attributes, while arcs represent dependencies between the attributes. Each node can be considered independent of its children, given its parents in G. The set Θ contains the quantitative parameters that define the probability distribution of the network. A parameter θ_{x_i|Π_{x_i}} = P_B(x_i | Π_{x_i}) is defined for each pair of x_i and Π_{x_i}, where Π_{X_i} represents the set of parents of X_i. The joint probability distribution over U is thus given by

$$P_B(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \Pi_{X_i}) = \prod_{i=1}^{n} \theta_{X_i \mid \Pi_{X_i}} \qquad (4.2)$$

4.2.2
Naive Bayesian Classifiers
Naive Bayes is perhaps the simplest form of a Bayes network. The model assumes
that the attributes used for classification are all conditionally independent - a rarely true and "naive" assumption. As a consequence of this assumption, a naive Bayes network can be represented as a tree with the class node as the root and each of the attributes as a leaf (see Figure 4-1). Despite its simplicity and the strong independence assumption, naive Bayes has been shown to be competitive with other more complex classifiers. The success of these models may be due to low variance in the model in combination with high bias resulting from the independence assumption [22].

Figure 4-1: A naive Bayes network
Effectively, the low variance cancels out the effect of the bias, making accu-
rate classification possible. The independence assumption simplifies the calculation
in (4.1) [44]:
$$P(a_1, a_2, \ldots, a_n \mid c) = P(a_1 \mid c)\, P(a_2 \mid c) \cdots P(a_n \mid c) \qquad (4.3)$$
Therefore, a naive Bayesian network is trained by simply computing the probability
of each attribute given the class using the instances contained in the training set.
Alternatively, the priors can be explicitly specified if no training set is available or if
enough detail is known about the system.
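As a rough illustration of equations (4.1) and (4.3), the following sketch trains and applies a discrete naive Bayes classifier on toy activity data. The simple add-one style smoothing and all names in it are illustrative assumptions and are not the estimator described in Section 4.3.2.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(instances, labels):
    """Estimate class priors and per-attribute value counts from discrete data."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (attribute index, class) -> value counts
    for x, c in zip(instances, labels):
        for i, v in enumerate(x):
            value_counts[(i, c)][v] += 1
    return class_counts, value_counts, len(labels)

def classify(x, class_counts, value_counts, n_total):
    """Pick the class maximizing log P(c) + sum_i log P(a_i | c)."""
    best, best_score = None, float("-inf")
    for c, n_c in class_counts.items():
        score = math.log(n_c / n_total)
        for i, v in enumerate(x):
            counts = value_counts[(i, c)]
            # Add-one smoothing; the number of observed values plus one stands
            # in for #Val(A_i) in this simplified sketch.
            score += math.log((counts[v] + 1.0) / (n_c + len(counts) + 1.0))
        if score > best_score:
            best, best_score = c, score
    return best

# Toy data: (time of day, nearest area type) paired with an activity label.
data = [("morning", "university"), ("evening", "supermarket"), ("morning", "university")]
labels = ["going to work", "grocery shopping", "going to work"]
model = train_naive_bayes(data, labels)
print(classify(("morning", "university"), *model))   # going to work
```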
4.3
TAN Classifiers
Tree augmented naive Bayes (TAN) classifiers relax the strong independence assumption inherent in naive Bayesian networks by allowing at most one additional arc between attributes. Compared with other, more complex forms of Bayesian networks,
determining the structure of these graphs is not an intractable problem since the
total number of arcs in the graph is limited, thereby reducing the possible search
space. TAN networks are therefore a good compromise between the simplicity of
naive Bayesian networks and a more realistic representation of the situation being modeled. In addition, they perform as well as, if not better than, naive Bayesian networks and other more complex classifiers on standard test sets [23].

Figure 4-2: A TAN network for the ΔDist(prev) dataset.
4.3.1
Structure Determination
The structure of a TAN network is the same as that of a naive Bayesian network
save for the possibility of at most one additional incoming arc for each attribute node
(see Figure 4-2). The graph reflects some of the dependencies found in the training
data. For example, the connection of "Business" with "Restaurant" indicates that the
influence of the proximity of a business is closely tied to that of restaurant areas for
the actions contained in this set of data. Determining the structure of a TAN graph
reduces to determining which attributes influence each other most strongly and then
connecting the nodes that represent these two attributes with an arc. When using
purely discrete attributes, Friedman et al. [23] utilize conditional mutual information
as a metric for measuring this influence. This function is as follows:
$$I_P(A_i; A_j \mid C) = \sum_{a_i, a_j, c} P(a_i, a_j, c)\, \log \frac{P(a_i, a_j \mid c)}{P(a_i \mid c)\, P(a_j \mid c)} \qquad (4.4)$$

Thus, over the set of class variables C, we calculate I_P(A_i; A_j | C) for all i ≠ j. We then use these values to determine which arcs should be present in the final graph. This amounts to calculating the maximum spanning tree in a complete undirected graph with the weight on the arc connecting nodes i and j equal to I_P(A_i; A_j | C). Determining the structure is polynomial, with a time complexity of O(n^2 N), where N is the number of training instances [23].
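The sketch below illustrates this structure-determination step: estimating the conditional mutual information of equation (4.4) from counts and then growing a maximum spanning tree with Prim's algorithm over the complete attribute graph. It is a simplified stand-in for the actual implementation; the data layout (class label stored last in each vector) and all names are assumptions.

```python
import math
from collections import Counter

def conditional_mutual_information(data, i, j, class_index=-1):
    """Empirical I_P(A_i; A_j | C) over discrete attribute vectors whose last
    element is the class label (equation 4.4)."""
    n = len(data)
    n_ijc = Counter((x[i], x[j], x[class_index]) for x in data)
    n_ic = Counter((x[i], x[class_index]) for x in data)
    n_jc = Counter((x[j], x[class_index]) for x in data)
    n_c = Counter(x[class_index] for x in data)
    total = 0.0
    for (ai, aj, c), count in n_ijc.items():
        # P(ai,aj|c) / (P(ai|c) P(aj|c)) simplifies to count * n_c / (n_ic * n_jc).
        total += (count / n) * math.log(count * n_c[c] / (n_ic[(ai, c)] * n_jc[(aj, c)]))
    return total

def maximum_spanning_tree(n_attrs, weight):
    """Prim's algorithm over the complete attribute graph; returns chosen edges."""
    in_tree, edges = {0}, []
    while len(in_tree) != n_attrs:
        best = max(((weight(i, j), i, j)
                    for i in in_tree for j in range(n_attrs) if j not in in_tree),
                   key=lambda t: t[0])
        edges.append((best[1], best[2]))
        in_tree.add(best[2])
    return edges

# Example use: `data` is a list of discrete attribute vectors with the class last.
# tree = maximum_spanning_tree(len(data[0]) - 1,
#                              lambda i, j: conditional_mutual_information(data, i, j))
# In a full TAN learner the undirected edges are then directed away from a chosen
# root attribute, and the class node is added as a parent of every attribute.
```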
4.3.2
Training
Training a TAN network amounts to calculating the prior probabilities of each attribute, given the class based on the occurrences of the attributes in a training set.
The only difference between a TAN network and a naive Bayesian network is that in
TAN networks, connected attributes must be accounted for in the prior-probability
calculations. To provide for missing attribute values in the training set, a smoothing
function is applied that effectively assumes that there is at least one of each possible
value for each attribute in the training set. Referred to as the LaPlace estimator
[40, 30], Cerquides [9] derives the functions appropriate for building a TAN network
from a multinomial sampling approach, where CountD(Xi) is the number of instances
in the training set that contain where Xi = xi and #Val(X) is the number of values
of attribute or class X.
CountD(Xi,
Oi nx=
fxi) +
A gtaes(P)
CountD(Hx)+ A#Val(Xi)SC(Xi)
States(P*)
40
(4.5)
acaj
ca) + #Val(C)#Va(Aj)#Va(Aj)
CountD(c, aj) + #Val(C)#VaI(A,)
-ContD
CountD(ai, c) +
Oaic -
#VaI(C)#Val(A)
CountD(C)
0C~ltN
+
(
(4.7)
COflD(C)
CountD(C)
(4.6)
+#Val(C)
#+
lC
#Val(C)
(4.8)
This smoothing function ensures that no instances arise that are classified with
an absolute probability of 0 or 1. The last three equations are special cases of (4.5);
(4.6) refers to the formula used when an attribute has another attribute as its parent
in addition to the class, (4.7) is used when only the class node is the parent, and (4.8)
is used for the probability of the occurrence of the class itself. From (4.1) we now
have the necessary numbers to compute the probability for a given set of attributes.
Missing attribute values are simply omitted from the calculation. This does not pose
a problem since the value is omitted for all class values. This makes the algorithm
more robust and resilient to noisy datasets.
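A minimal sketch of these smoothed estimates is given below, assuming the reconstructed forms of equations (4.6)-(4.8) above; the data layout, the parent_of mapping, and the function names are illustrative rather than the thesis code.

```python
from collections import Counter

def tan_parameters(data, n_vals, parent_of):
    """Smoothed conditional probability estimates for a TAN network.

    `data` is a list of (attribute vector, class) pairs with discrete values,
    `n_vals[i]` is #Val(A_i), `n_vals["class"]` is #Val(C), and `parent_of[i]`
    is the index of A_i's attribute parent (or None if the class is its only
    parent). Follows the smoothed estimators of equations 4.6-4.8.
    """
    n = len(data)
    n_c = n_vals["class"]
    count = Counter()
    for x, c in data:
        count[("c", c)] += 1
        for i, v in enumerate(x):
            count[("ic", i, v, c)] += 1
            p = parent_of[i]
            if p is not None:
                count[("ijc", i, v, x[p], c)] += 1

    def theta_class(c):                                  # equation 4.8
        return (count[("c", c)] + 1.0 / n_c) / (n + 1.0)

    def theta_attr(i, v, c, parent_value=None):          # equations 4.6 and 4.7
        p = parent_of[i]
        if p is None:
            num = count[("ic", i, v, c)] + 1.0 / (n_c * n_vals[i])
            den = count[("c", c)] + 1.0 / n_c
        else:
            num = count[("ijc", i, v, parent_value, c)] + 1.0 / (n_c * n_vals[i] * n_vals[p])
            den = count[("ic", p, parent_value, c)] + 1.0 / (n_c * n_vals[p])
        return num / den

    return theta_class, theta_attr
```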
4.3.3
TAN Multinets
Instead of creating a single TAN network structure for all classes, we can generate a
different classifier for each class. This can produce better results if the relationships
between the attributes and classes vary widely [23]. There is no added complexity
since each training instance is only used a single time to build the model for the class
it belongs to. As a result, the complexity for building a multinet with a training
set of size N remains O(n^2 N). The datasets we generated were run on both a single
TAN and a multinet approach since the different activities being recognized should be
influenced quite differently by the area-type attributes. Performance should therefore
show an improvement using the multinet approach [14].
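A rough sketch of the multinet variant follows. It assumes that a single-network training routine is available and that each trained model exposes a log-likelihood function; both are illustrative interfaces, not the actual implementation.

```python
def train_multinet(data, classes, learn_tan_for):
    """Learn one TAN model per class from that class's instances only.

    `learn_tan_for(instances)` is assumed to return a model object with a
    `log_likelihood(x)` method; it stands in for the single-network training
    procedure described above.
    """
    return {c: learn_tan_for([x for x, label in data if label == c]) for c in classes}

def classify_multinet(x, multinet, log_class_prior):
    """Pick the class whose own network explains the instance best:
    arg max over c of log P(c) + log P(x | model_c)."""
    return max(multinet, key=lambda c: log_class_prior(c) + multinet[c].log_likelihood(x))
```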
Chapter 5
Results and Analysis
5.1
Data Acquisition
Data was gathered using a consumer-grade GPS receiver attached to a PDA device.
GPS locations were recorded to a file as a user walked around the vicinity of the MIT
campus. After completing an action (i.e. arriving at work), the user would log the
current time and the action that was just completed on paper. Although software
was developed to allow the device to record data both continuously and at periodic
intervals, only the continuous mode was used to gather the maximum amount of data
over a short period of time. In this fashion, a week's worth of typical activity was
gathered by a single user walking around with the outfitted PDA. This training set
of 3307 instances was then transferred to a desktop computer and converted into
attribute vectors using several methods described below. The trained system was
also used to test data collected from other users to determine the effectiveness of the
system in dealing with other users' routes.
A long term goal of this project is to be able to create a stand-alone system that
would require an initial training phase. To facilitate this, an experience sampling tool
was developed to electronically log the actions of a user throughout the day at forty-five minute intervals, independent of GPS location data. We anticipate that the data
gathered by this software could be helpful in training the system to a user's everyday
habits when combined with location data. In addition to the current activity, the
program asks the user to note his present location to serve as a rough indicator of
position. A program to solicit this data has many advantages over our previous
paper system since we can control the frequency at which we request data from the
user. Appendix A.2.4 contains additional details.
5.2
Features
All feature vectors incorporated the day of the week, the average speed traveled from
what was speculated to be the starting position, and the time of day as discretized
attributes. Time was categorized as morning (6 am - 12 pm), afternoon (12 pm - 6 pm), evening (6 pm - 9 pm), and night (9 pm - 6 am). When using discrete versions
of the data, the average speed was discretized into walking speed (less than 4 mph),
running speed (less than 8 mph), biking (less than 15 mph), and motorized (anything
greater than 15 mph). As the features are determined from the raw GPS data offline,
additional features can be added later and tested on the various classifiers.
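As a small illustration, the discretization just described can be written as two helper functions; the function names are illustrative.

```python
def discretize_time_of_day(hour):
    """Map an hour of the day (0-23) onto the four bins used in the experiments."""
    if hour in range(6, 12):
        return "morning"
    if hour in range(12, 18):
        return "afternoon"
    if hour in range(18, 21):
        return "evening"
    return "night"

def discretize_speed(mph):
    """Map an average speed in miles per hour onto the travel-mode bins."""
    if mph >= 15:
        return "motorized"
    if mph >= 8:
        return "biking"
    if mph >= 4:
        return "running"
    return "walking"

print(discretize_time_of_day(13), discretize_speed(3.2))   # afternoon walking
```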
5.2.1
Single Point Distance
The first experiment evaluated each valid point recorded by the receiver by calculating
the minimum distance to each of the 18 area types. This created a vector with 21
attributes. As each point is examined individually, no knowledge of the starting point
or path is considered.
5.2.2
Changing Distances
The next set of experiments attempted to model how the distances change as a user
approaches his destination. The training data was first separated into sets, each
representing one trip from start to destination point (i.e. going to work begins at
home and ends at the work place).
The starting point that is referenced can be
chosen as the first location that is recorded after a long period when the GPS is
unable to receive a position. This indicates the user was inside or out of range of
the positioning satellites. Because we were explicitly training the system, the start
position was noted by the user as indicated by when they recorded the action they
were performing.
The first experiment (ΔDist(start)) took the starting position as a reference point
and then calculated the minimum distances from this point to each of the 10 area
types. For every other point in that trip, another set of minimum distances was
taken, and the difference between this set and the reference set was calculated. For
the discretized tests, a positive difference indicated that the subject was getting closer
to an area, while a negative distance denoted the user was getting farther away. This
metric is not entirely correct; for example, if the closest park to the start is now not
the same park being referenced by the new location, the user may appear to be moving
closer to a park area when this is not the case. As an alternate way of approaching
this problem, we took the same difference but calculated the change in distance
based on the most recently recorded point instead of the start point (ΔDist(prev)).
Unfortunately, this method also has its oversights in that for two points taken very
close in time, the change in distance may not vary by much and will be more sensitive
to position error.
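The sketch below shows one way the change-of-distance attributes could be derived from a sequence of per-point minimum-distance maps for a single trip; representing a trip as a list of dictionaries and the three-way closer/farther/same discretization are simplifying assumptions.

```python
def delta_distance_features(distance_maps, relative_to="prev"):
    """Turn a sequence of {area_type: minimum distance} maps for one trip into
    discretized change-of-distance attributes (the ΔDist feature sets).

    With relative_to="start" every point is compared against the first fix of
    the trip (ΔDist(start)); with "prev" it is compared against the previous
    fix (ΔDist(prev)).
    """
    vectors = []
    for k in range(1, len(distance_maps)):
        reference = distance_maps[0] if relative_to == "start" else distance_maps[k - 1]
        vector = {}
        for area_type, distance in distance_maps[k].items():
            change = reference[area_type] - distance
            # A positive change means the user moved closer to that area type.
            if change > 0:
                vector[area_type] = "closer"
            elif change == 0:
                vector[area_type] = "same"
            else:
                vector[area_type] = "farther"
        vectors.append(vector)
    return vectors

# A toy trip of three fixes with per-type minimum distances in meters.
trip = [{"supermarket": 900, "university": 200},
        {"supermarket": 700, "university": 350},
        {"supermarket": 450, "university": 600}]
print(delta_distance_features(trip, relative_to="start"))
```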
5.2.3
Trajectory
To take into consideration the path someone would take when traveling from source to
destination, we applied a similar difference in distance for each place type but this time
only accounted for those locations that lie ahead of the user's position. In other words,
we look at the trajectory the user is taking in reference to some point about 30 seconds
prior and then prune out any areas that are behind this point (Traj(w/pruning)). To
do this, areas that completely lie behind the line perpendicular to a ray connecting
the reference point and the current position that intersects the reference point are
elided. The subset of areas considered was continuously pruned with each data point
such that any area that ever lay behind the user was not considered. Unfortunately,
noise in the data could eliminate areas that would be important for consideration.
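As an illustration of the pruning rule described above, the following sketch tests whether a labeled area still lies ahead of the plane through the reference point that is perpendicular to the direction of travel. Reducing each area to a single point in a local planar projection is a simplification of the "completely lies behind" test used here, and all names and coordinates are illustrative.

```python
def is_ahead(reference, current, area_point):
    """True if `area_point` lies ahead of the plane through `reference` that is
    perpendicular to the direction of travel (reference toward current).

    Points are (x, y) pairs in a local planar projection; a positive dot product
    of (area - reference) with the travel direction means the area is ahead.
    """
    dir_x, dir_y = current[0] - reference[0], current[1] - reference[1]
    off_x, off_y = area_point[0] - reference[0], area_point[1] - reference[1]
    return dir_x * off_x + dir_y * off_y > 0

def prune_areas(reference, current, areas):
    """Keep only the labeled areas that still lie ahead of the user's trajectory."""
    return {name: pt for name, pt in areas.items() if is_ahead(reference, current, pt)}

areas = {"supermarket": (-30.0, -40.0), "library": (120.0, 80.0)}
print(prune_areas((0.0, 0.0), (10.0, 10.0), areas))   # keeps only the library
```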
As a result, a second trial was run that did not consider any past events (Traj(no pruning)) - it considers only those areas in front of the user's trajectory but does not attempt to continuously refine this set.

                   1-Point        ΔDist(start)   ΔDist(prev)    Traj(w/pruning)  Traj(no pruning)
Naive Bayes        71.73 ± .026   75.79 ± .036   84.16 ± .032   68.85 ± .034     74.18 ± .024
TAN                80.27 ± .064   85.27 ± .018   92.60 ± .029   88.64 ± .026     85.76 ± .032
TAN Multinet       82.63 ± .085   86.79 ± .027   94.43 ± .020   89.88 ± .049     87.32 ± .026
Naive Bayes*       70.47 ± .020   75.14 ± .050   83.13 ± .026   68.13 ± .007     73.41 ± .017
IBL*               86.49 ± .037   88.83 ± .038   97.89 ± .041   92.91 ± .034     89.54 ± .032
C4.5*              85.26 ± .037   87.54 ± .055   96.19 ± .017   91.97 ± .060     89.32 ± .023
Decision Table*    83.14 ± .208   82.71 ± .021   93.22 ± .085   86.14 ± .099     82.71 ± .102

Table 5.1: Summary of classifier results per feature set (mean ± variance).
5.3
Classification Results and Analysis
Experiments were run using both the TAN algorithm (single and multinet), a naive
Bayes implementation that utilized the multinomial distribution, and several other
competitive classifiers found in the Weka Machine Learning Algorithms Toolkit (see
[40]). These include a naive Bayes implementation [17, 28], the decision table classifier [31], an instance-based learning (nearest-neighbor) classifier [21, 2], and the C4.5
classifier [36]. Ten 10-fold cross-validations were performed using the Weka Toolkit
to ensure that the training set was partitioned equivalently for each algorithm test.
Table 5.1 reflects the mean and variance for these ten trials.
5.3.1
Algorithm Results
The distance-change metric produces the highest classification results for all of the
classifiers, with a predicted accuracy of 92% and 94% for the TAN and multinet
approaches, respectively. For each of the tests, the TAN algorithm performs between
three and six percent behind the top classifier - instance-based learning (IBL) - with
the TAN multinet approach consistently performing one to two percent better than
the single TAN implementation.
This results from the attributes having varying
relationships, depending on the class of concern. The TAN classifier outperforms
both naive Bayes approaches by eight to ten percent for all experiments.
The single-point distance metric performs surprisingly well, suggesting that merely
the distances from the different area types carry a reasonable amount of relevant
information. Although locations are not uniquely labeled in that each instance represents only a discretized distance from each of the area types, the classifiers are
able to identify the action intended by the user. It is important to note that
the data reflects the habits of one particular person, which may explain why this
simple metric works so well. The tendencies of people to follow the same route from
source to destination are probably responsible for these results.
The instance-based learner (IBL) classifier relies on an n-dimensional distance
metric to classify new instances. The distance is calculated between the unknown
instance and each of the training instances, predicting the class of the closest instance
as the class of the unknown [40]. The high performance of the IBL classifier over all
experiments indicates that the data seems to cluster well over the vector space. The
drawback of using the IBL approach in a real-time recognition algorithm arises when
the training set becomes larger as each new instance must be compared to each
instance of the training set during the classification process. A Bayesian or decisiontree approach, such as the C4.5 classifier, may be preferable in that these algorithms
invest more computation in the training process to allow for faster classification of
new instances [40].
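As an illustration of the IBL idea (and of why classification slows as the training set grows), a minimal nearest-neighbor sketch might look like the following; it is not the Weka IBL implementation, just the basic 1-NN rule over numeric feature vectors.

```java
// Minimal 1-nearest-neighbor sketch: predict the class of the closest training instance.
// Every query is compared against the entire training set, which is the real-time
// drawback discussed above.
public final class NearestNeighbor {
    public static int classify(double[][] train, int[] labels, double[] query) {
        int bestIndex = -1;
        double bestDist = Double.MAX_VALUE;
        for (int i = 0; i < train.length; i++) {
            double dist = 0.0;
            for (int k = 0; k < query.length; k++) {  // squared Euclidean distance
                double diff = train[i][k] - query[k];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                bestIndex = i;
            }
        }
        return labels[bestIndex];
    }
}
```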
The C4.5 classifier also performs well, especially when continuous values are used
for the features. This classifier belongs to the decision tree family of classifiers that
creates a tree down which a path is traversed during the classification process. Generating this tree requires calculating the optimum splits to cluster the training instances
most effectively [40]. The inherent downsides of the algorithm result from the complexity involved in generating the decision tree. Heuristics are necessary to direct the
process to avoid overfitting the data. Dealing with missing values can also be problematic, resulting in some guesswork about which branch to follow down the decision tree
[40]. The constraint on the number of dependencies in a TAN network places an explicit limitation on the network structure. This helps to prevent overfitting the data,
provided the training examples are well distributed [12]. In addition, missing values
are simply omitted from the classification calculations, a simple and deterministic way of dealing with noisy data [40].

[Figure 5-1: A graph depicting the predicted classifier accuracy of each classifier for each dataset. The plot shows accuracy (%) for the TAN, TAN Multinet, naive Bayes (multinomial and Weka), decision table, C4.5, and IBL classifiers, along with continuous-attribute variants of the Weka classifiers, across the single point, dist-source, dist-prev, traj-30sec, and traj-source feature sets.]
Although our version of the TAN classifier utilizes only discrete attributes, a version that handles continuous attributes exists and has been shown to outperform the discrete version [24]. To gauge the effectiveness of using continuous attributes, the Weka classifiers were tested on continuous versions of the datasets. As seen in Figure 5-1, the C4.5 and decision table classifiers perform very well on these tasks. This is most likely because the partitions created when forming the structure of the decision table or tree are more easily chosen over a range of values than over two or three discrete values. For the same reason, the IBL classifier fares worse, because the instances do not cluster as tightly in the continuous case.
As would be expected, misclassified activities are often those that occur in the same general areas and are performed at around the same time of day (see Table 5.2). As an example, an area with businesses commonly contains restaurants, so someone could easily be going there to run errands or to eat lunch, both during the same time frame.
    a    b    c    d    e    f    g    h    i    j    <- classified as
  154    0    2    0    1    2    0    1    1   20    a = Errands
    1  134   23    1    5    2    0    0    5   10    b = Class
    4    3  529    8   15    0    5    0    6   45    c = Work
    3    0    9   96    7    0    0    0    0    9    d = Grocery Shopping
    1    5    2    9  429    6    0    8   17    0    e = Home
    8    0    9    0   15  414   15    0    7    1    f = Shopping
    0    3    9    0    2    1  371    0   21   27    g = Walk
    0    0    0    6    5    0    0  145    3    0    h = Dinner
    1    3    6    2    5    3   23    3  177    7    i = Visit
    9    3   22    1    7    3   17    0    8  313    j = Lunch

Table 5.2: Confusion matrix for the ADist(prev) feature set.
Classifying such actions correctly can hinge on subtle differences, particularly if the training data simply has more examples of one activity than the other. However, the system
can distinguish between the two activities if there is some variation. For example, the
system can tell the difference between two actions that use identical paths if another
attribute, such as the time of day the action occurs, differs. This allows the system to
distinguish between going to class and going to work even though these two actions
occur in very similar locales. Thus subtle differences can create different contexts
that the system is able to recognize and handle adequately.
5.3.2 Extensions
Additional features could help to improve the performance of the classifiers. Social
interaction plays a significant role in a person's activities. Many of the replies to our
survey listed items such as talking with a friend and hanging out with someone. Determining who someone is with or if they're alone can provide evidence for or against
the possibility of social activities. More temporal information may also provide helpful cues to improve the predicted recognition rate. For example, knowing the last
time the user went to the grocery store and the relative frequency of this event can
help predict the next time the user will go. If the system kept a record of the frequency of these events, a feature - perhaps the number of days since the action was
last witnessed - could be used; a minimal sketch of such a feature follows below. Because the TAN algorithm deals robustly with missing values, it should not pose much of a problem even if the system does not know this information in some circumstances.
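A hypothetical sketch of this recency feature is shown here; the activity names, the in-memory log representation, and the use of -1 as a missing-value marker are illustrative assumptions rather than part of the implemented system.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical "days since last occurrence" feature; not part of the implemented system.
public final class RecencyFeature {
    private final Map<String, Long> lastSeenMillis = new HashMap<String, Long>();

    /** Record that an activity was just observed. */
    public void observe(String activity, long nowMillis) {
        lastSeenMillis.put(activity, Long.valueOf(nowMillis));
    }

    /** Days since the activity was last seen, or -1 if it has never been observed
     *  (a missing value the TAN classifier can simply omit). */
    public int daysSince(String activity, long nowMillis) {
        Long last = lastSeenMillis.get(activity);
        if (last == null) {
            return -1;
        }
        return (int) ((nowMillis - last.longValue()) / (24L * 60 * 60 * 1000));
    }
}
```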
Data from the PDA could also be used to create additional useful features. Assuming a separate clustering algorithm were run on the text of the appointments, an appointment of a given type scheduled near the time the action was occurring could be incorporated as another feature.
The strong performance of the continuous classifiers suggests that attention should
be given to a continuous version of the TAN algorithm. Although it is more involved
and complex than the discrete version, most real systems cannot be adequately modeled using purely nominal attributes. A continuous TAN implementation would therefore be useful for gauging its effectiveness on a more realistic
system.
Chapter 6
Discussion
6.1 Feasibility of an Activity Recognition System
As a first pass at designing a reliable activity recognition system, our results are
promising. The performance of the TAN classifier indicates that a system with relatively high accuracy is feasible. In addition, classification is fast for the TAN algorithm
(on the same order as naive Bayes), making a real-time recognition system possible.
Originally, a general system was envisioned that could be trained on one set of
data acquired from multiple people and then used by anyone. Using two sets of data
collected by two different people, the system was trained with one and tested with
the other. As might be expected, the performance of the system varies, predicting
the activities correctly when the same paths are used by both users but incorrectly
categorizing them when similar paths correlate to two different actions by the two
users. This indicates that despite explicitly excluding route-specific data, once trained
to a particular user's normal routes and habits, the system does not perform well when
used by other people. For example, a path that I normally take as part of a leisurely
walk can also be used by someone else to get home. While there are differences in
parts of the path that are taken, the data is similar enough that the system will often
incorrectly classify the points (see Appendix C.1). As a result, the system works well,
but only if trained and used by people who have similar habits - perhaps students who
live in the same residential area and have similar patterns of behavior (i.e. attending
classes, going to the same locations, and eating at the same places). As the habits of
a user are quite regular, it might be interesting to have the system attempt to detect
what it believes to be deviations from normal behavior and then query the user to
allow the system to continuously adapt to changing situations.
In some situations, a general system may be preferable to one that is tailored for a
single person or even a few particular people. In this case, training would ideally only
have to be done once for the system to work for other people. A more general method
of partitioning the data into attributes may be able to accomplish this, as might making use of some outside data source. As suggested before, linguistic databases are
a source of very rich, high-level knowledge that could be used to improve this system
(see Appendix C.2).
A mixing-model could be used to combine the results from
both a location-based TAN system and another system that taps into the linguistic
database. This kind of model would balance both the knowledge and experience that
are the strong points of the two different systems, thereby creating a more general
and robust product. Thus, two directions are possible for later study: making a less user-specific system, and investigating how powerful a single-user system can become by incorporating more user-specific features.
6.2 Extensions and Improvements

6.2.1 Other Activities
The actions recognized were limited since the main data source was outdoor position data. However, the relative success of recognizing these simpler actions easily leads to more complex ideas. Most of the activities that are recognized are of the form going to <place>. Recognizing when someone is leaving a place and transitioning to going to another place would be an interesting next step to pursue. In
this case, the two events overlap, occurring simultaneously at some point in time.
The probabilistic nature of a Bayesian classifier could be put to use by noting what
the top two most likely events are and transitioning from the "leaving" to the "going"
action based on their respective probabilities.
Activities where the passing of time is an indicator, such as waiting or running into a friend, would be interesting to investigate. In contrast to viewing changing distances as an indicator of activity, these activities involve no observable movement, which is in itself also an indicator. Likewise, if GPS data is not available, it would be
interesting to see if the algorithm could be adjusted to predict activities based on the
last known position and whatever other attributes can be observed (i.e. having lunch
is more likely around noon while on a shopping trip to the mall).
6.2.2 Low-Level Data
A more robust positioning system would be one of the more useful low-level extensions to the recognition system. The widespread use of cellular base stations could
potentially result in a useful and more robust positioning system. With the strength
of cellular signals powerful enough to function well close to and inside buildings, a
unified location system may soon be possible [8]. As previously discussed in Section
3.1.1, GPS is limited by the need to be within line-of-sight of the orbiting satellites.
A positioning system that is functional both inside and outside buildings would allow for a more complete description of a user's activities. Situations where GPS is
currently non-functional include covered vehicles (unless the receiver is under a windshield), subway trains and stations, and areas inside and close to buildings. Many of the services that people interact with are located in these places. Therefore, the most immediate benefits one could imagine from an aware device involve automating these services, making indoor data necessary.
6.2.3 High-Level Knowledge
Utilizing other sources of knowledge can increase the range of activities that are
recognized. Reverse-geocoding would allow the system to take a position and find a
street address from the data. If another look-up procedure could then classify the
address (i.e. a residence, business, restaurant, etc.), a more precise system could
be modeled that utilizes this information. This would be particularly helpful in ambiguous cases, such as when a user performs a series of activities that make it hard to distinguish the end of one and the start of another. If the system could see the
type of location where the user pauses, it may be able to notice particular patterns
of behavior and make it possible to recognize higher-level activities.
6.2.4 Applications
There are several applications that could be developed on top of the activity recognition system as it exists now. Once implemented to work in real-time on a PDA,
the system could be used to provide user-specified information at appropriate times.
Going beyond location-aware tour guides, users could arrange for their shopping list to be displayed when the system detects they are going to the store, or to be reminded to do something on their way home from work. Games that change based on the types of
activities being performed could also be designed and even be combined with routines
that promote exercise.
Chapter 7
Conclusion
Smaller and more powerful devices are becoming more prevalent, thereby creating a
framework for developing smart applications. The research described here discusses
some of the requirements for activity recognition systems and presents an attempt
at building a system that satisfies these conditions. Location data and map data are
some of the resources used for this project, though there are many sources of higher-level knowledge that are available and ought to be used if a viable method to integrate
this data is found. The TAN algorithm was chosen for its attractive combination
of simplicity, extensibility, probabilistic properties, and competitive classification accuracy. Findings indicate that the TAN multinet classifier performs relatively well,
though it is slightly outperformed in accuracy by some more complex but sometimes
slower algorithms. Though efforts were made to prevent user-specific knowledge from
explicitly being encoded into the system, findings indicate that a combination of the
types of activities we chose to recognize and the choice of location data resulted in
a system that contains personal biases. Personal habits and preferences for specific
routes from place to place make the system perform well for people with similar
habits but do not make for a reliable general-use system. The results show promise
for building accurate activity recognition systems that will allow for a new generation
of context-aware applications and devices.
Appendix A
System Overview
The main components of the system can be broken down into a component used to
gather GPS data and another to process the data into a form suitable for analysis
using a classification algorithm.
A.1 Hardware
The hardware component of the system consists of a Compaq iPAQ 3650 PocketPC
running Windows CE. The unit is powered by a 206MHz StrongARM processor with
32 MB of RAM. The device is particularly attractive since it can be outfitted with
PCMCIA and CompactFlash devices when used in conjunction with the appropriate
expansion sleeve. A Dual-Slot PC card sleeve allows for up to two such devices to
be used simultaneously on the iPAQ. For our research, we equipped the iPAQ with a
TeleType CompactFlash GPS receiver and external antenna.
A.2 Software
When possible, all applications were written in Java. The major exception to this
occurred when dealing directly with the iPAQ device. For such occasions, Java Native
Interface (JNI) wrappers were written to allow access to C library calls from the Java
side. The main software components of the system are as follows:
"
A library of device specific functions (package ccalls), such as controlling the
volume and scheduling programs to run at specific times. The functions were
written as part of a dynamic link library (dll) in C with JNI wrappers to provide
functionality from the Java side.
" A Java recorder program (package gpsrecorder) receives GPS data gathered
by a C thread (port.c) that interacts with the COM port used by the GPS
receiver. The program is scheduled to run at regular intervals (PeriodicPositionRecorder.java) throughout the day using the functions in the PocketPC's
notify library.
" A Java tracking program (package gpstracker) to display recorded GPS data
and to provide an interface to identify areas (LabeledArea.java) of interest (i.e.
parks, shopping centers, restaurants).
" The algorithmic part of the software (package tan) that consists of a parsing
unit to convert raw GPS data into an attribute vector and a system to train
and classify new instances using naive Bayes and TAN classifying schemes.
" An experience sampling application (package actionrecorder) that runs on the
iPAQ to log actions of the user independent of GPS data.
A.2.1 GPSRecorder
The GPS receiver continuously creates and writes strings to port COM 4 of the iPAQ.
The C thread reads these strings and then uses JNI to call a Java procedure to store
the string in a buffer. Strings are read from the buffer every second and then processed. Four types of GPS strings are output by the TeleType GPS receiver, but only
two of them, GGA and RMC strings, contain longitude/latitude data. Furthermore,
even these strings must be position locked (have enough satellites in view) to contain
accurate/usable information. We therefore reject any non-GGA/RMC strings as well as any non-locked strings. Data can be read and written to a file continuously using the RecorderInterface class or at specified intervals using the PeriodicPositionRecorder class.
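A minimal sketch of this filtering step is shown below; the field positions follow the standard NMEA 0183 sentence layout, and the class and method names are illustrative rather than the actual gpsrecorder code.

```java
// Keep only position-locked GGA/RMC sentences; everything else is rejected.
public final class NmeaFilter {
    public static boolean isUsable(String sentence) {
        if (sentence == null) {
            return false;
        }
        String[] fields = sentence.trim().split(",");
        String type = fields[0];
        if (type.endsWith("GGA")) {
            // GGA field 6 is the fix quality; "0" means no position lock.
            return fields.length > 6 && fields[6].length() > 0 && !fields[6].equals("0");
        }
        if (type.endsWith("RMC")) {
            // RMC field 2 is the status flag; "A" means the fix is valid.
            return fields.length > 2 && fields[2].equals("A");
        }
        return false;
    }
}
```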
A.2.2 GPSTracker
The tracker program was written to visualize the data recorded from the device.
Functionality was also provided to label the various areas of interest using a simple
GUI. To do this, a LabeledArea class was written to represent each of these areas. Of
particular note is that each LabeledArea can calculate the distance from a specified
latitude/longitude point to the area.
While we could use a straightforward Cartesian distance calculation, the differing
scales between longitude and latitude must be accounted for as well as the fact that a
degree of longitude is shorter as one approaches the poles. In addition, because of the
oblate-spheroid shape of the earth, this calculation is not a reasonable approximation
for distances greater than 20 km. Instead, we make a reasonable assumption that the
earth is spherical and use the Haversine formula (see http://www.census.gov/cgi-bin/geo/gpsfaq?Q5.1) for all our distance calculations:

\begin{align*}
\Delta lon &= lon_2 - lon_1 \\
\Delta lat &= lat_2 - lat_1 \\
a &= \sin^2\!\left(\frac{\Delta lat}{2}\right) + \cos(lat_1)\,\cos(lat_2)\,\sin^2\!\left(\frac{\Delta lon}{2}\right) \\
c &= 2\,\arcsin\!\left(\min(1, \sqrt{a})\right) \\
d &= R_{earth} \cdot c
\end{align*}
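A minimal Java sketch of this calculation, as it might be used by LabeledArea to measure the distance from a point to an area, is shown below; the class and method names, and the choice of kilometres for the result, are illustrative assumptions.

```java
// Haversine great-circle distance between two latitude/longitude points given in degrees.
public final class GeoDistance {
    private static final double EARTH_RADIUS_KM = 6371.0; // mean spherical radius

    public static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double rLat1 = Math.toRadians(lat1);
        double rLat2 = Math.toRadians(lat2);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(rLat1) * Math.cos(rLat2) * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        double c = 2 * Math.asin(Math.min(1.0, Math.sqrt(a)));
        return EARTH_RADIUS_KM * c;
    }
}
```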
A.2.3 TAN
The TAN classifier described in [23] was implemented in Java for discrete attributes.
Both a stand-alone version as well as an equivalent implementation under the Weka framework [40] were implemented. One difference that must be noted is that the smoothing function used is that described by Cerquides [9]. This function was preferred over the one described in the original paper because it always classifies an instance, while the original would sometimes assign a probability of zero to an instance for all classes.

[Figure A-1: The maximum spanning trees with associated conditional mutual information values for the contacts (left) and nursery (right) datasets.]

In addition to the standard TAN algorithm, a multinet variant
was also implemented as well as a version of the naive Bayes classifier that made use
of the multinomial distribution. These served as useful baselines for determining the
effectiveness of the algorithm.
Implementation Details
Generating a TAN network from training data is not a very complex process. However,
because counts for all possible pairs of values between attributes must be maintained,
it is hard to visualize the procedure. Training data must first be loaded to determine
these counts, which are then used to build a maximum spanning tree based on the
conditional mutual information measure (as detailed in section 4.3.1). For reference,
Figure A-1 shows possible spanning trees for the Weka-provided contacts dataset and the UCI nursery dataset [6]. Both of these datasets contain purely nominal values. The spanning trees for the graphs are calculated using a maximum-weight version of Prim's greedy algorithm [3]; a sketch of this construction follows below. Once the structure of the graph is known, the prior probabilities are calculated by the multinomial formulas, based on the pairwise value/attribute counts [9].
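A compact sketch of that construction is given below; the array-based representation and the choice of attribute 0 as the root are illustrative assumptions, with the edge weights standing in for the conditional mutual information values computed from the counts.

```java
// Build a maximum-weight spanning tree over the attributes with a Prim-style sweep.
// weight[i][j] holds the (symmetric) conditional mutual information between attributes i and j.
public final class MaxSpanningTree {
    /** Returns parent[i] for each attribute i; the parent of the root is -1. */
    public static int[] build(double[][] weight) {
        int n = weight.length;
        boolean[] inTree = new boolean[n];
        double[] best = new double[n];
        int[] parent = new int[n];
        java.util.Arrays.fill(best, Double.NEGATIVE_INFINITY);
        java.util.Arrays.fill(parent, -1);
        best[0] = 0.0; // start from attribute 0 as the root

        for (int step = 0; step < n; step++) {
            int u = -1;
            for (int v = 0; v < n; v++) { // pick the unvisited node with the heaviest edge into the tree
                if (!inTree[v] && (u == -1 || best[v] > best[u])) {
                    u = v;
                }
            }
            inTree[u] = true;
            for (int v = 0; v < n; v++) { // relax edges out of the newly added node
                if (!inTree[v] && weight[u][v] > best[v]) {
                    best[v] = weight[u][v];
                    parent[v] = u;
                }
            }
        }
        return parent;
    }
}
```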
For the two reference sets, the TAN algorithm classifies 12147 of the 12960 nursery
instances correctly (when using the training set also as the test set) and all 24 of the
contacts dataset. The multinet implementation is almost identical except that each class must create and store its own network, which is later used to calculate the prior probabilities. With the multinet, 12359 of the nursery training instances are classified correctly, while again 100% of the instances in the contacts set are classified correctly.
A.2.4 Experience Sampling
An application was written to log the activities of a user every forty-five minutes. The
device beeps to alert the user, who then selects an activity or notates what he is doing
(if the choice is not available) in addition to his current location. This information
is then appended to a file on the device. The user has the option to ignore the alert,
although the device will continue its attempts forty-five minutes later.
In the cases where GPS is available, the experience sampling data can be used to
train the system according to the user's personal habits. The most pressing issues
with gathering information in this fashion are that the data may contain noise (i.e.
typos, personal abbreviations, etc.) and that we risk annoying the user if we query
too frequently. However, if samples are taken too far apart then the data that is
collected may not be sufficient to train the system.
Creating a tool that depends on user input can be quite complex given that one
must motivate the user to interact routinely with the device.
Users must have a
reason to carry (and recharge) the device and be rewarded with positive feedback
when they cooperate with the requests for data. Periods of inactivity should result in
the device issuing requests more frequently and more loudly. The user should be
able to temporarily turn off the requests with the device automatically rescheduling
the reminders some time later. Additionally, the application must be robust enough
to handle situations when the battery runs low so as not to lose data. The method of
data entry is also important since we do not want to frustrate the user. Unfortunately,
voice recognition is not reliable enough and using the stylus can be tedious. One must
therefore weigh user convenience versus the desired level of detail in the collected data
when designing an interface that allows the user to write as little as possible.
Appendix B
Activity and Landmark Data
B.1 Selecting Activities
To determine what actions might be the most useful to detect, an email message
was sent to several mailing lists requesting this input. Specifically, the message asked
people to "send back a list of high-level actions you do in a week." Table B.1 contains
a summary of the 26 responses that were received, roughly categorized, with each
activity followed by the number of responses that included the action.
The constraints of the system immediately made detection of some of these actions
impossible, namely those listed under the headings "Indoor" and "Home Tasks."
The top tasks were considered but selected with university students in mind as they
would most likely be serving as the subjects for the testing process. As a result,
activities such as gardening were not considered, while going to class was included.
The remaining nine activities chosen were noted as being good representatives of
their respective category (going to lunch, going to dinner, grocery shopping, going to
work, going home, visiting someone). Others were chosen because they encompassed
some of the more specific activities mentioned (i.e. running errands, shopping). In
the case of going for a walk, we felt this was a suitable substitute for exercising, jogging, and biking, since our subjects were more likely to be walking around with the
device and because exercising seemed to indicate different things to different people
on the survey (i.e. going to the gym or pool versus taking a jog or a short walk).
Indoor
phoning(16)
email(14)
reading (12)
watch tv(11)
sleeping (9)
reading news(7)
organizing schedule/planning(5)
cleaning(4)
dressing(4)
pick up mail(4)
computer related(3)
discussing(3)
feed pet (3)
homework(3)
showering(3)
write paper(3)
contemplate life/daydream(2)
spouse activities(2)
woodshop(2)
brush teeth(1)
chores(1)
computer game(1)
doubting(1)
drafting (1)
family time (1)
fly(1)
help w/hw(1)
hugging/kissing(1)
journal(1)
making models(1)
motivating staff(1)
nap(1)
nurturing family(1)
photography(1)
plotting (1)
practice music(1)
pray(1)
research(1)
resting(1)
sex(1)
solid modeling(1)
studying(1)
take drugs/meds(1)
volunteer(1)
waking up(1)
wedding planning(1)
Xeroxing(1)
Place Specific
exercise(12)
go to work(7)
work(6)
go to a movie(5)
class(4)
get gas(3)
watch sports event(2)
library(2)
church(2)
concert(1)
zoo(1)
hair appointment(1)
museum(1)
park(1)
go to bookstore(1)
car check up(1)
golf( 1)
doctor/medical visit(1)
take kids to daycare(1)
take kids to school(1)
Food Related
eat meal(14)
eat dinner(5)
eat lunch(4)
eat out(3)
eat breakfast(2)
takeout(1)
eat brunch(1)
treat (ice cream)(1)
Shopping
shop(3)
shop for supplies(2)
shop for clothes(1)
shop for gifts(1)
Transportation Related
drive(5)
go home(4)
subway(4)
go for a walk(3)
jogging(3)
walk pet(3)
explore city(3)
bike(2)
carpool(1)
get fresh air(1)
go out(1)
hike(1)
scootering(1)
crossing street(1)
traveling(1)
Social
meeting(5)
visit someone(4)
hanging out with someone(3)
meeting someone(3)
play(2)
play w/kids(2)
drinking(2)
check on neighbors (2)
meet new people(1)
meet someone(1)
party(1)
play w/pets(1)
rent a movie(1)
talk with friends (1)
walk and talk to a friend(1)
bills(1)
Errands
laundry(8)
grocery(7)
errands(2)
drop off dry-cleaning(2)
drop off recycling(2)
outdoor market(1)
post office(1)
banking(1)
get film developed(1)
buy pet food(1)
budgeting(1)
home improvements( 1)
mowing(1)
water plants(1)
Other
club sports(2)
geocache(1)
Home Tasks
cooking(8)
gardening(6)
dishes/kitchen(4)
garbage(2)
Table B.1: Complete activity list with incidence count
By categorizing the responses as in Table B.1, we can speculate how to recognize
some of the actions that we did not consider. For those activities related to a place (i.e.
getting gas or picking up dry cleaning), marking the locations where these occur seems
to be the obvious step to take, though it might not necessarily be required. Social activities are somewhat trickier, requiring some kind of indication that another user
is involved - perhaps through two devices sharing information that the users are in
close proximity to one another for an extended period of time. Transportation related
activities require additional data on how the user moves around (i.e. scootering) or the
location of subway or bus stops. As GPS does not work inside buses or underground,
additional temporal knowledge may be required. For example, recognizing a sequence
of events (i.e. a user went to a subway station, disappeared for a short while, and
then emerged farther away than if he had walked) could be used to infer these more
complex actions.
B.2 Online Resources
There are several online resources available for maps, GIS data, and other related
data.
" Maps. Of the various online map resources, MapsOnUs.com is one of the few
that allows the user to label selected positions with longitude/latitude coordinates. This formed the foundation for the tracking location by providing the
data required for interpolating positions on the map. The US Census Bureau
(tiger.census.gov) also provides some useful mapping resources, most involving
census statistics by area.
" Geocoding.
Geocoding is the process of converting an address to a lati-
tude/longitude position. There are many commercial services that advertise
their geocoding services to the public. Some have free trials - Geocodec technologies (www.geocodec.com) and TeleAtlas (www.teleatlas.com)
65
9
Reverse Geocoding. This process converts coordinates to street addresses.
Less information is available about these services though TeleAtlas (www.teleatlas.com)
claims to have a service available as do many other places, though free demos
are hard to come by.
* Geographic Information Systems (GIS). GIS is a method of viewing different layers of data that are tied through a particular geographic area. Population statistics, topography, and streets are all examples of such data. ESRI
(www.esri.com) provides many free GIS resources including viewers, sample
code, and links to other free GIS resources. Cambridge, MA also has its own
GIS site (gis.ci.cambridge.ma.us) with links to maps and different data sets. To
learn more about GIS in general, visit www.gis.com.
B.3 Labeled Map Areas
Figure B-1 shows the complete list of areas that were used for the trials described in
chapter 5.
[Figure B-1: The complete map with labels, used for our trials.]
Appendix C
Tests
C.1 Training and Testing with Different Users
Using the GPS data collection tool, a set of 1007 position instances was gathered
by a second user. The data was then evaluated using the system trained with data
gathered by the initial user. The system performed quite poorly, correctly predicting
only 261 of the instances (25.82%). In comparison, cross-validation on the data from the second user categorized 95.53% of the actions correctly.
    a    b    c    d    e    f    g    h    i    j    <- classified as
    0    0    0    0    0    0    0    0    0    0    a = Errands
    1   46    0    0    4    0    0    0   24    0    b = Class
    4    8   15    0   13    0   60    4   44   39    c = Work
   22    0    0    0   29    9    0   11   16    0    d = Grocery Shopping
    0    0    0    0    0    0    0    0    0    0    e = Home
    8   13    0    0    0   37    0    0   15   22    f = Shopping
    4    0   10    0   71   17  144    0  143    8    g = Walk
    1    0    1   13   25    5   17   15   22    0    h = Dinner
    0    0    0    0    0    0    0    0    0    0    i = Visit
    0    1   17    0    0    6    0    0   39    4    j = Lunch

Table C.1: Confusion matrix for data gathered by a second user and evaluated using a system trained with the ADist(prev) dataset.

Table C.1 shows that the better-predicted activities were class, shopping, and walk, which were performed using routes very similar to those taken by the initial user. The misclassified instances represent activities that were done at different times or using different routes than the initial user. This is indicative of the sensitivity of the system to a particular user's normal habits.
C.2 Wordnet Trials
A limited amount of work was done to investigate using lexical databases both for
structure determination of TAN networks and calculating prior-probabilities for a
naive Bayes implementation. For both of these tasks, the Lexical Freenet database
was used as a front end to the WordNet database for calculating the "distance"
between words. This can be done by entering two words and selecting the connecting
relationships that are desired. Lexfn will then return the shortest paths it finds that
connect the two words. This was used as a simple metric to determine the strength
of the relationship between two words.
C.2.1 TAN Structure Determination
Lexfn connections were used to create a complete graph where the attributes represented the nodes and the weight on the arcs was equal to the number of links in
the shortest path between the two words. A minimum spanning tree was then calculated from this graph and the resulting network used in place of the one normally
calculated using the conditional mutual information metric. The system was then
trained as normal. Tests with this technique predict a performance of 81.6% on the
ADist(prev) feature set (versus 84.16% with our naive Bayes implementation).
C.2.2 Naive Bayes Training
Using the features from the ADist(prev) feature set, Lexfn was used to determine the prior probabilities. For the non-area features (day, time, speed), the path length between the values of the feature and the class value was used to calculate $P(C \mid A_i = v)$. To invert these probabilities (greater probability for a shorter path length), normalize them, and allow for multinomial smoothing, the following formula was used, where $sum$ is the total number of path lengths over all values of the attribute:

\[
\frac{1 + sum - pathLength(v, C)}{sum \cdot (Count(A_i) - 1) + Count(A_i)}
\]
For the remaining features (change in distance from the areas), the attribute value
(place name) was used to find the distance between it and the class value. This value
was subtracted from a constant, set at 10, and then divided by the same constant to
arrive at the value for "closer" while one minus this value was used for the value of
"farther" for the attribute.
Results from this experiment were low, classifying only 26% of the instances correctly.
Part of the difficulty in applying this technique to a location-based
system is that lexical knowledge might not be the right source to find relationships
between walking and parks or visits and residential areas. While these make sense to
us as humans, they are not necessarily part of the definition of walking or visiting.
Bibliography
[1] G. Abowd, C. Atkeson, J. Hong, S. Long, R. Kooper, and M. Pinkerton. Cyberguide: A mobile context-aware tour guide. Baltzer/ACM Wireless Networks, 3(5):421-423, 1997.
[2] D. Aha. Tolerating noisy, irrelevant, and novel attributes in instance-based learning algorithms. International Journal of Man-Machine Studies, 36(2):267-287, 1992.
[3] R. Ahuja, T. Magnanti, and J. Orlin. Network Flows: Theory, Algorithms, and Applications. Prentice Hall, 1993.
[4] A. Asthana, M. Cravatts, and P. Krzyzanowski. An indoor wireless system for
personalized shopping assistance. In Proceedings of IEEE Workshop on Mobile Computing Systems and Applications, pages 69-74. IEEE Computer Society
Press, December 1994.
[5] D. Beeferman. Lexical discovery with an enriched semantic network. In Proceedings of the ACL/COLING Workshop on Applications of WordNet in Natural Language Processing Systems, pages 358-364, 1998.
[6] C.L. Blake and C.J. Merz. UCI repository of machine learning databases, 1998.
[7] M. Brand, N. Oliver, and A. Pentland.
Coupled hidden Markov models for
complex action recognition. In Proceedings of IEEE CVPR97, 1996.
[8] J. Caffery and G. Stuber. Overview of radiolocation in CDMA cellular systems.
IEEE Communications, 36(4):38-45, April 1998.
[9] J. Cerquides. Applying general Bayesian techniques to improve TAN induction.
In Knowledge Discovery and Data Mining, pages 292-296, 1999.
[10] W. Chan. Project voyager: Building an Internet presence for people, places, and
things. Masters Thesis, Massachusetts Institute of Technology, May 2001.
[11] G. Chen and D. Kotz. A survey of context-aware mobile computing research.
Technical Report TR2000-381, Dept. of Computer Science, Dartmouth College,
November 2000.
[12] J. Cheng and R. Greiner. Learning Bayesian belief network classifiers. In Proceedings of the Fourteenth Canadian Conference on Artificial Intelligence, 2001.
[13] K. Cheverst, N. Davies, K. Mitchell, A. Friday, and C. Efstratiou. Developing
a context-aware electronic tourist guide: some issues and experiences. In CHI,
pages 17-24, 2000.
[14] C.K. Chow and C.N. Liu. Approximating discrete probability distributions with
dependence trees. IEEE Trans. on Infomation Theory, 14:462-467, 1968.
[15] P. Dana. Global positioning system overview. http://www.colorado.edu/geography/gcraft/notes/gps/gps.html.
[16] A. Dey, G. Abowd, and D. Salber. A context-based infrastructure for smart
environments. In Proceedings of the First International Workshop on Managing
Interactions in Smart Environments, pages 114-128, 1999.
[17] R. Duda and P. Hart. Pattern classification and scene analysis. John Wiley,
1973.
[18] P. Enge and P. Misra.
Scanning the issue/technology. In Proceedings of the
IEEE, volume 87, January 1999.
[19] I.A. Essa. Computers seeing people. AI Magazine, 20(2):69-82, 1999.
[20] C. Fellbaum. WordNet: An electronic lexical database. The MIT Press, 1998.
[21] E. Fix and J. Hodges Jr. Discriminatory analysis; non-parametric discrimination:
Consistency properties. Technical Report 21-49-004(4), USAF School of Aviation
Medicine, Randolph Field, Texas, November 1951.
[22] J.H. Friedman. On bias, variance, 0/1-loss, and the curse-of-dimensionality. Data
Mining and Knowledge Discovery, 1:55-77, 1997.
[23] N. Friedman, D. Geiger, M. Goldszmidt, G. Provan, P. Langley, and P. Smyth.
Bayesian network classifiers. Machine Learning, 29:131, 1997.
[24] N. Friedman, M. Goldszmidt, and T. Lee. Bayesian network classification with
continuous attributes: getting the best of both discretization and parametric
fitting. In Proc. 15th International Conf. on Machine Learning, pages 179-187.
Morgan Kaufmann, San Francisco, CA, 1998.
[25] I. Getting.
Perspective/Navigation - The Global Positioning System. IEEE
Spectrum, 30(12):36-38, 43-47, December 1993.
[26] T. Huang, D. Koller, J. Malik, G. Ogasawara, B. Rao, S. Russell, and J. Weber.
Automatic symbolic traffic scene analysis using belief networks. In Proc. Nat.
Conf. on Artificial Intelligence, pages 966-972. AAAI Press, 1994.
[27] F.V. Jensen. An Introduction to Bayesian Networks. University College London
Press, 1996.
[28] P. Langley and S. Sage. Induction of selective Bayesian classifiers. In Proc. Tenth
Conference on Uncertainty in Artificial Intelligence, Seattle, WA, pages 399-406,
1994.
[29] A. Madabhushi and J.K. Aggarwal. A Bayesian approach to human activity
recognition. In Proceedings of the Second IEEE Workshop on Visual Surveillance,
1998.
[30] A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In AAAI-98 Workshop on Learning for Text Categorization, 1998.
[31] J. Metzner and B. Barnes. Decision Table Languages and Systems. Academic
Press, 1977.
[32] C. Narayanaswami and M. T. Raghunath. Application design for a smart watch
with a high resolution display. In Proceedings of the Fourth International Symposium on Wearable Computers (ISWC'00), 2000.
[33] N. Oliver and A. Pentland. Graphical models for driver behavior recognition in a
SmartCar. In Proceedings of IEEE Intl. Conference on Intelligent Vehicles 2000,
Detroit Michigan, October 2000.
[34] Nuria M. Oliver, Barbara Rosario, and Alex Pentland. A Bayesian computer
vision system for modeling human interactions. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 22(8):831-843, 2000.
[35] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible
Inference. Morgan Kaufmann Publishers, San Mateo, CA, 1988.
[36] J. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
[37] N. Schroder. Reality check: Mobile and wireless software in 2002. Technical
Report AV-15-3847, Gartner Research, January 2002.
[38] J.S. Stevenson and R. Topp. Effects of moderate and low intensity long-term
exercise by older adults. Res Nurs Health, 13:209-218, 1990.
[39] P.H. Winston. Artificial Intelligence, Third Edition. Addison-Wesley Publishing
Company, Reading, Massachusetts, 1993.
[40] I. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and
Techniques with Java Implementations. Morgan Kaufmann, October 1999.
[41] A. Woodruff, P. Aoki, A. Hurst, and M. Szymanski. Electronic guidebooks and
visitor attention. In ICHIM (1), pages 437-454, 2001.
[42] C. Wren, A. Azarbayejani, T. Darrell, and A. Pentland. Pfinder: Real-time tracking of the human body. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 19(7):780-785, 1997.
[43] N. Young, R. Guggenheim, and R. Moore. The making of Majestic.
Game
Developer Magazine, April 2001.
[44] H. Zhang and C.X. Ling. An improved learning algorithm for augmented naive
Bayes. In Pacific-Asia Conference on Knowledge Discovery and Data Mining,
pages 581-586, 2001.