Real-time Visualization of Abstract Relationships
Between Devices

by

Arjun Ramegowda Narayanswamy

Submitted to the Department of Electrical Engineering and Computer Science
in partial fulfillment of the requirements for the degree of
Master of Engineering in Computer Science and Engineering
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June 2003

© Arjun Ramegowda Narayanswamy, MMIII. All rights reserved.

The author hereby grants to MIT permission to reproduce and
distribute publicly paper and electronic copies of this thesis document
in whole or in part.

Author: Department of Electrical Engineering and Computer Science, June 20, 2003

Certified by: Lawrence Rudolph, Principal Scientist, Thesis Supervisor

Accepted by: Arthur C. Smith, Chairman, Department Committee on Graduate Students
Real-time Visualization of Abstract Relationships Between
Devices
by
Arjun Ramegowda Narayanswamy
Submitted to the Department of Electrical Engineering and Computer Science
on June 20, 2003, in partial fulfillment of the
requirements for the degree of
Master of Engineering in Computer Science and Engineering
Abstract
In the near future, the living space of the user will be populated by many intercommunicating devices. We are motivated by the high-level vision of exposing these
relationships to the user so that they may be inspected and modified in real-time. We
have identified three sub-problems of this vision that we tackle in this thesis. These
are (1) the design of a directional device identification system, (2) the design and
implementation of a rest-detection system, and (3) an evaluation of the tilt-meters
of the Mercury backpack. We have established five criteria for evaluating directional
identification systems. We propose a new tag system based on the principles of
directional query, broadcast response and photo-triggered, powered tags that meets
these criteria. We have also designed and implemented a more robust rest-detection
system by augmenting traditional tilt-meter readings with video data. The system is
implemented as a library and an API in Linux to facilitate easy integration into
applications. A sample application of rest-detection to Cricket beacon estimates is
presented. Finally, we have concluded our empirical analysis of the Mercury
backpack tilt-meters with the discovery of a set of anomalies that can direct further
development and debugging of these devices.
Thesis Supervisor: Lawrence Rudolph
Title: Principal Scientist
Acknowledgments
I would like to thank Larry Rudolph - for support and for freedom. Larry has been
kind enough to stand back this year and give me the opportunity to shape and define
my thesis on my own. I could walk in to his office at any time and argue any point
of any argument with him. If this thesis seems at all to make any sense, it's because
Larry has extensively revised and clarified my thinking. All flaws of argument are
mine, all clarity of thought is his.
On the 6th floor of L.C.S., I'd like to thank Kenneth Steele, Eugene Weinstein
and Jason Waterman. Thanks to Eugene and Jason for helping me get up to speed
on iPaqs, ipackages and Familiar Linux issues.
Ken has been a marvel for his
unflagging patience with flagging hardware. I'm especially grateful to Ken for his
generous support in my tilts at tilt-meters. I've made extensive use of his time and
his patience.
Back in NE43-218, thanks to my lab-buddies for all their support. Todd Amicon, Jonathan 'Mr Numbers' Brunsman, Glenn 'The Weasel' Eguchi, Sonia 'One-eye'
Garg, Hana 'Miss Demeanour' Kim, Kan 'Hammerhead' Liu and Jorge Ortiz. Lab
has never been as fun before, and lab will never be as fun afterwards. I would also
like to thank Shelly Mae O'Gilve and Debbie Wan for their efforts to get me playing
with solder, bread-boards and photoresistors.
To my friends, especially Vikram Aditya Siddharth - thank you for standing by
me at this time. You were there when I worked, and you were there when I didn't
want to work. Thanks to Jennifer Huang, Buddhika Kottahachchi, Judy Y Chen,
Nora Szasz, and my long-suffering room-mate Shastri Sandy. To the BC Crew, the
Bangaloreans, the Indians, the Caribbeans and the Assorted - this has been a great
year of my life and thanks for helping it be that way.
To my family: father, sister and my inspirational mother Aruna. Not a single step
of the long, long road that's led me here could have been taken without your love,
your support and your blessings. A Master's degree from MIT - who would have
believed it? And how many long-distance phone calls has it been? Thank you so
much, once again, for always, always being with me.
Finally, to God, to whom I owe everything including the love and friendship of all
the people above.
Contents

1 Introduction
  1.1 Overview

2 Directional Device Identification
  2.1 Introduction
      2.1.1 Speech and Pointing
  2.2 Chapter Overview
  2.3 Existing Device Identification Technologies
  2.4 Evaluation of Directional Identification Systems
      2.4.1 Aesthetic Impact
      2.4.2 Cost of Tags
      2.4.3 Differentiating between Identification Systems
  2.5 The "Right" Technology
  2.6 A 'Point-and-Shout' System
  2.7 Conclusion and Future Work

3 Video-Enhanced Multi-modal Rest Detection
  3.1 Introduction
  3.2 Chapter Overview
  3.3 Previous Work
  3.4 Challenges
      3.4.1 Complexity and Cost
      3.4.2 Tilt-Meters
  3.5 A Video-Enhanced Approach
      3.5.1 Statistical Measures
      3.5.2 Experiments
      3.5.3 Observations
  3.6 Integration of Video and Tilt
      3.6.1 Case: ΔIMAGE low, ΔTILT high
      3.6.2 Case: ΔIMAGE high, ΔTILT low
      3.6.3 Algorithm
  3.7 Implementation
      3.7.1 Goals
      3.7.2 Software Architecture
  3.8 Results
      3.8.1 Case Study: Crickets and CricketNav
      3.8.2 A Simple Rest-Aware Cricket Distance Estimator
  3.9 Conclusion and Future Work

4 Static Analysis of Mercury Backpack Tilt-meter
  4.1 Motivation
  4.2 Tilt-meter Description
      4.2.1 Composition
      4.2.2 Layout
      4.2.3 Interpretation
  4.3 Static Experiments
  4.4 Results
      4.4.1 Extreme Measurements
      4.4.2 Normality
      4.4.3 Correlation
      4.4.4 Serial autocorrelation
  4.5 Inferences

5 Conclusion and Future Work
Chapter 1
Introduction
The past decade has seen a steady growth in the number of devices owned by a
single user. If the PC revolution of the 80's can be seen as the 'one person, one
computer' revolution, then pervasive computing can be described as the 'one person,
n computers' vision. We envisage a world where the living space of a user is populated
by many devices and are engaged in the development of infrastructure and interface
technologies that enable a user to manage these devices. Intuitive interface technologies
that allow a user to manipulate devices in human terms are an important part of
this vision. In particular, we are interested in techniques that allow a naive user to
visualize relationships between devices communicating wirelessly in a room.
Although our devices communicate wirelessly, wires remain familiar paradigms
for expressing relationships between devices. All ranges of users - from the naive to
the expert - understand that "wiring" two devices together means that their behavior
is somehow connected. Similarly, seeing which devices are connected to which other
devices helps us understand if a particular device is "wired right". Therefore, exposing
abstract relationships between devices as a form of "virtual wire" promises to provide
a simple interface for the visualization and manipulation of device configurations. This is
particularly important now because:
" Computing will be used by users with all levels of technical sophistication
" The relationships between devices will change dynamically
9
ure 1-1: The Stargazer Vision: A possible screen-shot
e There will exist no visible physical link between intercommunicating devices
This thesis has been motivated by the open-ended vision of creating a Stargazer -
an application that can sense and display abstract relationships between devices as
a real-time superposition on a live video feed. The fundamental problem to be faced
is the challenge of looking at the video feed and assigning device IDs to portions of
pixels (in real time). We are interested in exploring the challenges faced during the
design of such a system.
For instance, a directional device identification scheme is useful for this task.
However, no clear scheme exists for directional device identification; the problem
has only recently become of interest to augmented reality and pervasive computing
researchers. Secondly, accurate position and heading information is necessary, but
existing location systems exhibit too much error. Mobile devices do not even have a
reliable way of detecting when they are at rest - hence the problem of rest-detection is
important in its own right. Finally, applications are being constructed today using the
Mercury backpack¹. However, the statistical nature of tilt-meter readings from this
backpack is poorly understood. Therefore this thesis aims not to develop a Stargazer
application, but to address the three problems described above.
1.1 Overview
Chapter 2 talks about directional device identification. A number of systems have
been developed that allow the user to indicate interest in a device by pointing at
it. We evaluate existing approaches using the criteria of directional query, ease of
sensing, infrastructural demands, tag power usage and detection range. Currently,
glyph recognition, IR transceivers and contact tags are promising technologies- but
none fully meet our evaluation criteria. We argue for an improved system based on
the principles of directional query, broadcast response and photo-triggered, powered
tags. We call this system a 'point-and-shout' system.
Chapter 3 addresses itself to the problem of rest-detection. Device rest-detection
is important for gesture recognition, power conservation, position triangulation and
heading drift cancellation. Currently, rest is principally detected using static
tilt-meters. However, tilt-meters are fundamentally insensitive to horizontal translation.
A better rest-detection algorithm may be developed by augmenting tilt-meter readings
with video data. This chapter combines video optical-flow readings with traditional
tilt-meter readings to obtain a better estimator of device rest conditions. The algorithm
is implemented as a library and API in Familiar Linux for easy integration into
applications. Stand-alone rest-detection results are presented. A simple application
of rest-detection to Cricket beacon distance estimates is also presented.
Chapter 4 provides the results of an empirical investigation of the performance of
the Mercury backpack tilt-meters. These were initially performed in the lead-up to
tackling rest-detection, but we found that these results are of interest in their own
right.

¹A prototype hand-held sleeve that gives an off-the-shelf iPaq access to tilt-meters, a video
camera and two PCMCIA Type II slots. See the Project Mercury page[20] for more information.
This thesis can either be read in toto, or each chapter can be read as a stand-alone
component. The conclusions of each individual chapter are presented in full at the
end of that chapter. They are also amalgamated and summarized in Chapter 5.
The key contributions of this thesis are:
" The argument for a 'Point-and-Shout' approach to directional device identification
" The development of a multi-modal rest-detection system
" The identification of anomalies in the tilt-meter readings of the Mercury backpack
12
Chapter 2
Directional Device Identification
2.1 Introduction
The past decade has seen a steady growth in the number of devices owned by a
single user. If the PC revolution of the 80's can be seen as the 'one person, one
computer' revolution, then pervasive computing can be described as the 'one person,
n computers' vision. We envisage a world where the living space of a user is populated
by many devices and are engaged in the development of infrastructure and interface
technologies that enable a user to manage and manipulate these devices. Intuitive
interface technologies that allow a user to manipulate devices in human terms are an
important part of this vision.
2.1.1 Speech and Pointing
Consider a fictional user walking into a room full of devices. The user is engaged in
a simple task. He has spotted a device of interest in the upper-far-right corner of the
room, and he wishes to reach out to the device and indicate an interest in it. How
might he go about this task?
Speech, writing and physical gestures are common ways in which users interact
with their environment.
Of these, speech and gestures are particularly useful as
sources of deictic (pointing) information. To indicate his interest, our user is likely
to say "The brown device in the upper, far, right corner of this room". Or he is likely
to point at the device and say "That device!".
Speech-recognition has evolved to the point where there exist systems that can understand our user's spoken command. However, there are still two challenges for using
speech as pointing information. These are complexity and restricted expressiveness.
First, speech-based deixis is more complex than pointing[17].
We can see this
already in our fictional example - the structure of the gesture is simpler than the
speech-only command. Understanding speech means that we not only have to understand what words people say, but we also have to understand all the different ways in
which they can express the same idea. This is a complicated task; physical gestures
are simpler and less ambiguous.
Second, speech-based deixis works well when the user already has a way of describing the object of interest. This is not always going to be the case, especially if we
have an unfamiliar object or a novice user. No matter how good a speech system is, it
will always have problems understanding what the user means when he (at a loss for
words) says the word "thingamajig". For pointing, physical gestures are expressive
in a way speech is not.
In pervasive computing, it is not likely that we will restrict ourselves to one form
of input. However, recognizing that speech is not a substitute for physical pointing
is important to our study of existing directional device identification technologies.
2.2 Chapter Overview
In Section 2.3 we will survey the literature to identify device identification technologies
that can be used for directional device identification.
In Section 2.4, we develop
five criteria to evaluate these technologies. We use these criteria to identify "best
practices" for directional identification and pick three promising existing technologies
(Section 2.5). However none of these technologies are fully satisfactory. Therefore,
in Section 2.6, we design a new directional device identification system that we call a
'Point-and-Shout' system.
Figure 2-1: Scenario: A user wishes to interact with a particular device in the room.
The user is labeled 'U', the devices in the room are labeled 'D' and the pervasive
computing infrastructure is labeled 'I'.
2.3 Existing Device Identification Technologies
Let us recast our initial example. Our scenario now consists of three elements: a
user, a set of devices, and a pervasive computing infrastructure (Figure 2-1). The
user may or may not carry a pointing device of some sort, the devices in the room
may or may not be tagged with an identifier of some sort. A number of systems have
been developed that either directly address, or can be modified to tackle the problem
of directional device identification.
Video-based natural gesture recognition is an example of a general purpose interface in which neither the user nor the devices are instrumented. Cameras are mounted
in a room to record the physical gestures of the user. The user communicates with the
pervasive environment using natural gestures. To achieve directional device
identification, the user points to a device to indicate his interest in it. The video feed is then
sent to a processor which computes the location and orientation of the user. This
information can be used in conjunction with a device-location database to discover
the identity of the device being pointed at.
Cricket[18] and Cricket Compass[19] are indoor systems that were developed to
provide mobile devices with location and orientation information. In these systems,
the ceiling space of the user is instrumented with a number of carefully positioned
RF/Ultrasound beacons. The user carries a listener which is a combination of an
RF receiver and an array of ultrasound detectors. By measuring the time of flight
of ultrasound pulses from the beacons, the user is able to compute 3-D location and
orientation. Given this information, one can achieve directional device identification
by looking up a comprehensive device-location database to discover the identity of
what we are pointing at.
Like Cricket and CricketCompass, there are a number of similar approaches that
(1) compute a 3-D pointing vector and then (2) lookup a device-location database to
identify the device being pointed at. The pointing vector may be discovered using
only one system(e.g. infra-red[32] or RF/ultrasound[19]) or by combining location
systems (e.g. infra-red[30], RF-only[2] or RF/ultrasound[31, 18]) with orientation
systems (e.g. inertial gyroscopes, digital compasses, tilt-meters).
Systems that do not use natural gestures or pointing vectors to identify devices
resort to labeling the devices in the room with an ID-tag of some sort. The tags may
be standard bar-codes[14, 25], visual glyphs[22] or radio-frequency tags[7, 23].
Rekimoto et.al.[22] describe CyberCode, a comprehensive device identification
scheme based on 2-D visual glyph recognition. CyberCode was developed to directly
address the problem of deictic device identification. Video is processed using a
low-resolution CCD/CMOS camera coupled to a workstation processor. Each tag encodes
24-48 bits, and is constructed so that distance and orientation information may be
extracted by processing the visual image. The tags may either be printed on paper
(and affixed to a device) or they may be digitized (and displayed on a CRT screen).
Rekimoto describes some applications of this technology to augmented reality, direct
manipulation and gyroscope-enhanced mobile indoor navigation.
Bluetooth[7] is a cheap, low-power, short-range wireless technology that operates in
the unlicensed 2.4 GHz ISM band. Bluetooth is designed to replace wired connections
between devices. The Bluetooth Service Discovery Protocol[4] describes a method for
identifying devices and services in the vicinity of a Bluetooth device. Every service
available on a device is described by a 128-bit universally unique identifier (UUID).
Nearby services are discovered by making broadcast inquiries and pages. Bluetooth
is an example of an active radio tag system.
Radio Frequency Identification (RF-id) [23] is another wireless device identification
technology. An RF-id system typically consists of a reader (composed of an antenna
and a transceiver) and a set of tags (transponders). RF-id tags are cheap, and can be
engineered to be very small in size. In addition, RF-id tags may either be passive or
active. Passive RF-id tags are unpowered, and draw energy from the antenna of the
reader. The lifetime of this tag is practically unlimited. They are typically read-only,
carry about 32-128 bits of information, and can be read from short distances. They
constitute a new class of tagging systems which we will call passive radio tag systems.
Active RF-id tags are coupled with a power-source. They can carry up to 1MB of
read-write data and can be accessed from longer distances by a lower-powered reader.
Want et.al.[29] describe a system for tagging real-world devices with passive RF-id tags. Inspired by the DigitalDesk[33], passive RF-id tags are incorporated into
books and documents. The idea of a physical icon is implemented by instrumenting
different sides of a cube with different RF-id tags. The authors found that the tags
could be read 10-15cm away by using an RF-id reader connected to the serial port of
the computer.
The Cooltown initiative[12] tightly couples URLs with devices for detection, configuration and operation. Users can "e-Squirt" URLs at Web-enabled devices using
a short-range wireless link such as infra-red or Bluetooth. The project developed
tiny IR transceivers to transfer URLs using the IRDA protocol. Information about a
museum exhibit can be obtained by directing a browser to the URL a neighbouring
beacon broadcasts. Squirting a URL at a printer or projector directs that device to
retrieve and output content stored at the transferred URL.
An iButton[11] is a chip housed in a stainless steel enclosure. The logical functions
provided by the chip range from a simple serial number to 64 Kbits and beyond of
non-volatile ROM and EPROM. Every iButton has a globally unique registration
number, which is etched on the inside of the case. The information stored in an
iButton can be read by touching it with a special iButton probe. The energy needed
for information exchange may be "stolen" from the data line of the probe. Since
tag and reader are in actual physical contact, the energy cost of data transfer is
small. Information can be read or written to the iButton in a bit-sequential,
half-duplex manner. An iButton is an example of a contact tag.

Figure 2-2: An iButton.
2.4 Evaluation of Directional Identification Systems
The literature on augmented reality and directional device identification discusses desirable attributes of device identification systems. Chief amongst these are: aesthetic
impact, cost of tags, directional query, ease of sensing, infrastructural demands, tag
power usage, security and detection range.
2.4.1 Aesthetic Impact
Device identification researchers often worry about the aesthetic impact of their systems. This is a particular concern for those researchers constructing tagged device
identification schemes, because they would like their tags to be as unobtrusive as
18
possible. However, deictic device identification systems may find it useful to present
the user with a target to point at, particularly if there are multiple devices clustered
close together or if the user intends to interact with the target device from a distance.
Therefore, we do not necessarily want our tagging systems to be invisible.
Additionally, we can expect that the aesthetics of the target will improve during the process of commercialization of the technology. Radio device tags may be
constructed to be more unobtrusive; visual tags may be written in UV reflective ink
to render them as imperceptible to the human eye as required. Deictic device identification technologies have not evolved to the point where aesthetics are a driving
argument behind system design. Therefore we do not believe that existing technologies should be differentiated on aesthetic impact.
2.4.2 Cost of Tags
All the device identification systems presented in Section 2.3 have been engineered so
that the monetary cost of the device tags is minimal. Natural gesture and pointing
vector systems require no tags, and hence incur no tagging costs. Visual glyphs may
be printed or displayed, and are extremely cheap. RF-id tags (passive and active) are
projected to cost under 10 cents a tag. iButtons currently cost approximately $1.00
apiece. By far the most expensive tagging systems we have described are Bluetooth
and infra-red, and both of these are already deployed on millions of devices
world-wide. We do not believe that the cost of tags is a significant impediment to
the widespread deployment of any of these technologies.
2.4.3 Differentiating between Identification Systems
Therefore, the attributes we use for differentiating one identification technology from
another are directional query, ease of sensing, infrastructural demands, tag power
usage and security/detection range.
* Directional Query
Directionality is important for device query so that we may accurately specify
the object of interest to us. Natural gesture and pointing vector systems are
inherently directional. Similarly, glyph recognition and IR transceiver systems
use line-of-sight visible light and IR for communication. Contact tags tightly
bound the mobility of the tag-reader and hence may be considered directional.
In contrast, the radio ID broadcasts used in Bluetooth and RF-ID for device
identification are omni-directional.
The length of antenna required to construct a directional antenna for an electromagnetic emission is directly proportional to the wavelength of the emission.
Radio transmissions have a much larger wavelength than visible or infra-red
light. A directional antenna for a 2.4 GHz radio transmission is typically 9-12
inches in length. The focussed output of a directional radio antenna is a potential
health hazard. Therefore radio is inherently unsuited to making short-range
directional queries.
* Ease of Sensing
Identification technologies differ from each other in the amount of computation
that must be done during the actual sensing of a device ID. Natural gesture
systems require the application of gesture detection and pose estimation algorithms every time a user points at a device.
Pointing vector systems require
that beacon readings be processed by a position triangulation algorithm before
a 3-D pointing vector is determined.
A device-location lookup is also necessary. Visual glyph recognition requires that we process the video feed before we
can identify the ID of the device that we are pointing at. All of these systems
require computationally expensive sensing. In a related concern, passive radio
tag readers require a powerful antenna in order to obtain a response from RF-id
tags. The power consumption of this sensing technique is high.
In contrast, active radio tags, infra-red transceiver and contact tag identification
techniques can be recognized simply and without significant computation or
power consumption. Therefore we can use ease of sensing to differentiate device
identification technologies from one another.
" Infrastructural Demands
Technologies such as natural gesture recognition and CricketCompass require
the environment of the user to be instrumented significantly. Natural gesture
recognition requires that every room contain one or more video cameras networked to an image processor.
CricketCompass requires that a number of
Cricket beacons be carefully positioned throughout the user's building. The
complexity and maintenance cost of these systems is high. In contrast, tagged
systems do not require any infrastructural outlay.
" Tag Power Usage
Unlike tag cost, the different identification techniques vary significantly in the power
consumption of the device tags. Untagged schemes have zero tag power consumption. Passive tags such as visual glyphs and passive RF-id also do not
require any power source. iButtons do not use any energy for transmitting information and hence have long lifetimes. Active tags such as IR transceivers
and Bluetooth actively broadcast their IDs. Therefore these systems have a
lifetime that is significantly shorter than the other identification schemes.
" Security/Detection Range
For wireless communication systems, security and detection range are interrelated concerns, which is why we choose to address them jointly. The broadcast
response of a radio tag means that nearby systems can snoop on the device ID
transmissions without necessarily having to be in the same room as the user.
These identification systems offer greater detection range but lower security.
In contrast, line-of-sight communication systems are less susceptible to sniffing.
The greatest security is provided by contact communication systems - iButtons,
for example, offer high-security but low-mobility communication.
The minimum security level required is ultimately a function of the particular
pervasive application being developed. However, for the purpose of evaluating
these different systems, we will assume that we can tolerate the security risk
of a short-range radio broadcast and will instead focus on the mobility and
detection range of the systems.
2.5 The "Right" Technology
Table 2.1: Comparison of existing directional device identification systems.

Technology           Directional   Ease of   Low Infrastructural   Low Tag       Detection
                     Query         Sensing   Demands               Power Usage   Range
Natural Gestures     Yes           No        No                    Yes           Yes
Pointing Vectors     Yes           No        No                    Yes           Yes
Glyph Recognition    Yes           No        Yes                   Yes           Yes
Active Radio Tags    No            Yes       Yes                   No            Yes
Passive Radio Tags   No            No        Yes                   Yes           No
IR Transceivers      Yes           Yes       Yes                   No            Yes
Contact Tags         Yes           Yes       Yes                   Yes           No
Table 2.1 summarizes the performance of various device identification systems
using the criteria in Section 2.4.3.
From this table, we can gather the following
insights for the design of a deictic device identification technology:
* Radio is not the right medium for directional query
Radio tags, both passive and active, suffer from the problem of not being directional in query. The omni-directionality of RF-id tags is a benefit for a number
of tracking and inventory systems; however this same feature makes it ill-suited
to the problem of directional device identification. In contrast, light or infra-red based systems are line-of-sight. Therefore, light or infra-red is the preferred
physical medium with which to initiate a device query.
* Device IDs can be read cheaply if they are communicated using modulated
electromagnetic signals
The computational complexity of the video processing done by natural gesture
systems indicates that the sensing of device IDs via signal modulation is an
attractive approach. Such a system is also capable of holding more than the
36-128 bits provided by CyberCode without running into tag size constraints.
" A complex infrastructural outlay is unnecessary
Natural gesture recognition was developed to provide a generalized human-computer
interface, not to directly address the problem of directional device identification.
Similarly, pointing vector systems were designed to provide indoor location and
orientation systems. Therefore, the design rationale of these systems includes complex
environmental modification. As a number of successful tagged identification systems
have demonstrated, we do not need extensive infrastructural modification to address
the problem of directional device identification.
" Tags that constantly poll the environment are energy-inefficient
Bluetooth and IR transceiver tags are built around the notion of automatic
device discovery. The designers of these systems incorporated polling mechanisms into their systems because they wanted the user to do no more than
come within range of a reader; device identification was meant to be handled
as seamlessly and invisibly as possible. In contrast, directional device identification inherently requires human mediation. Chances are, the device tags and
readers will always be within the same room and hence always within detection
range of each other. However, we only want to initiate identification when the
user points at a device - we would like our identification system to stay dormant
otherwise.

Therefore, device tags that constantly poll the environment are a detriment - this
absence of user query, we would like our tags to conserve energy and be as
passive as possible.
* Our system should allow for a large detection range
The physical contact requirement of iButtons restricts the kind of pervasive
applications that can be developed using this technology. Security concerns
permitting, there is a powerful argument for being able to sense IDs at a distance.
Therefore, of the seven technologies described in this chapter, three are particularly promising approaches to directional device identification. These are
1. Glyph Recognition
As demonstrated by CyberCode, this approach is directional, requires no infrastructural outlay, uses tags that consume no power, and can be queried from
a distance. The greatest disadvantage of this technique is that the user must
carry around a powerful image processor as a pointing device. The cost and
power consumption of such a processor is significant.
2. IR Transceivers
This technology is directional, requires no infrastructural outlay, and transmits
IDs that can be sensed without requiring expensive processing. The downside
of this technology is that tags lose power by polling their environment and
transmitting information. Therefore the lifetime of this system is bound by the
battery life of the tags.
3. Contact Tags
iButtons implement ID transmission through parasitic power, hence these tags
have a long lifetime. Requiring the user to be physically in contact with the
tag means that this approach is the most secure of the ones we describe. However the requirement that one be in physical contact with the tag also severely
restricts the mobility of the user.
Depending on the particular requirements of the situation, one or the other of
these technologies may be a good fit.
2.6 A 'Point-and-Shout' System
Table 2.2: Comparison of Point-and-Shout with existing directional device identification systems.
Technology          Directional   Ease of   Low Infrastructural   Low Tag       Detection
                    Query         Sensing   Demands               Power Usage   Range
Glyph Recognition   Yes           No        Yes                   Yes           Yes
IR Transceivers     Yes           Yes       Yes                   No            Yes
Contact Tags        Yes           Yes       Yes                   Yes           No
Point-and-Shout     Yes           Yes       Yes                   Yes           Yes
We propose a new system that meets all the criteria raised in Section 2.4.3. Our
system is a tagged system - the tags consist of a battery, a radio transmitter, and
a photo-sensitive switch. The tag reader consists of a radio receiver coupled with a
directional IR transmitter. The key components of this system are (1) directional
query, (2) broadcast reply, and (3) photo-triggered, powered tags.
We envisage our reader sending out bursts of directional IR when the user intends
to query a tag. Upon detecting this burst of radiation via a passive photo-sensitive
switch, the tag switches on and broadcasts its ID via radio for a few seconds. The
broadcast ID is sensed by an RF receiver on the reader. We call this system a
'Point-and-Shout' system.
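To make the query/response cycle concrete, the C sketch below outlines the tag-side
behaviour just described. Since these tags have yet to be constructed (see
Section 2.7), every name in it - photo_switch_triggered, radio_broadcast, sleep_ms,
the ID width and the timing constants - is a hypothetical placeholder, not an
existing interface.

    #include <stdint.h>

    #define TAG_ID      0x0000ABCDu  /* illustrative 32-bit device ID     */
    #define RESPONSE_MS 3000         /* broadcast for a few seconds       */
    #define BURST_MS    100          /* interval between ID re-broadcasts */

    extern int  photo_switch_triggered(void);  /* IR query burst detected? */
    extern void radio_broadcast(uint32_t id);  /* short-range RF transmit  */
    extern void sleep_ms(int ms);

    void tag_main_loop(void)
    {
        for (;;) {
            /* Passive phase: the photo-sensitive switch waits for the
               reader's directional IR burst, drawing minimal power. */
            while (!photo_switch_triggered())
                sleep_ms(10);

            /* Shout phase: broadcast the ID via radio for a bounded time
               window, then return to the passive phase. */
            for (int t = 0; t < RESPONSE_MS; t += BURST_MS) {
                radio_broadcast(TAG_ID);
                sleep_ms(BURST_MS);
            }
        }
    }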
The point-and-shout system has the advantage of using directionality only when
necessary. Consider that the energy cost of a directional transmission is composed
of two parts: the cost of aligning the transmitter with the receiver, and the energy cost of
generating radiation. During 'point-and-shout' query, the user aligns the reader with
the target- thus the reader only pays the energy cost of generating radiation. During
directional response however, it is the responsibility of the tag to align its transmitter
with the reader's receiver. The cost and complexity of doing this automatically is
very high. Therefore, we abandon the idea of directional response and instead use a
broadcast mechanism for response.
The tag response is intended to be short-range and limited in time. The broadcast
medium could be IR or RF. However, we would like to keep query/trigger wavelengths
and response wavelengths distinct and non-overlapping to avoid (1) collision of response with query and (2) triggering of false responses by nearby tags. Therefore we
propose that response be made using radio transmissions.
The point-and-shout system does not require the installation of a complex infrastructure. Responses of the point-and-shout tags are as easy to read as Bluetooth
broadcasts or Cricket beacon broadcasts. The length of the tag response is not subject
to space constraints (like in CyberCode). A reader may be built out of cheap, readily
available components, and it does not require connection to a powerful processor.
Device tags may either be integrated with a device and draw power from it, or they
may be supplemented with a small battery and be stand-alone. Since we anticipate
that device tags will not be queried constantly, a single small battery can power tag
responses for a long time. Additionally, providing a tag with a power source means
that we do not impose a large energy drain on the reader. Therefore, tag responses
can now be read from a distance. This allows for the development of applications not
possible with iButtons.
Table 2.2 compares point-and-shout with glyph recognition, IR transceivers and
contact tags.
2.7 Conclusion and Future Work
In this chapter we have traced the arguments for a point-and-shout system, beginning
with the survey and evaluation of current device identification systems for the task
of directional device identification. We are currently looking to actually design and
construct these tags. The task of developing APIs and applications to manage this
system is also the object of future work in this area.
Chapter 3
Video-Enhanced Multi-modal Rest Detection
3.1 Introduction
The position and orientation of a device in a three-dimensional world can be modeled
by a 5-tuple <x, y, z, θ, φ>, where <x, y, z> gives the location of an object in 3-D
space and <θ, φ> gives the orientation of the device. When the device is in 'motion'
in this three-dimensional world, one or more of x, y, z, θ and φ changes appreciably¹
with time; otherwise the device is at 'rest'.
We explore different techniques of robust rest-detection and construct a system
that exposes rest-information simply and cheaply to applications. Rest-detection can
be made robust by combining traditional tilt-meters with video-based optical flow
techniques. Our system is useful for applications in the domains of power conservation, context detection, position triangulation and drift cancellation.
Consider a power-constrained device running a deictic (i.e. pointing) application
of some sort. In this case, detecting that the device is at rest may imply that the user
intends to query or use the object at which the device is pointing. If the process of
identifying and querying the pointed-at-object is expensive, then it is good to delay
such operations until we are reasonably sure that the device is at rest.

¹The word 'appreciably' has been chosen judiciously. We will see in Section 3.7.1 that the
notion of 'rest' must be application-defined.
To continue this point, power conservation is a particularly important challenge for
small mobile devices; it is often the single hardest constraint in application design.
Also, most query/lookup mechanisms involve an implicit exchange of data with a
networked information source- and the first hop of this exchange is made over a
wireless network. Wireless exchange of data is an extremely power-hungry operation,
especially for small resource-constrained mobile computers. We can use motion cues
to reduce our use of power-hungry operations.
Rest-detection is also helpful in identifying several gestures (user contexts). For
example, placing a device on the table or picking it up may have meaning. Fundamentally, context-detection applications are translating physical motion of a device into
high-level user intentions. Therefore, knowing when a device is at rest is important
for these applications.
Not surprisingly, being able to detect rest has applications in the field of location
and heading detection. A standard approach to location estimation is to assume
that the device is at rest, collect distance-estimates from multiple beacons and then
triangulate the position of the device in 3-D space. Popular systems that implement
such triangulation algorithms are Active Bat[31] and Cricket[18]. However, there is
a nagging epistemological concern with this approach. It can be illustrated by the
following conundrum:
The 'Assumption of Rest' Conundrum
1. To estimate your position you need to know you are at rest.
2. To estimate your rest-condition you need to know your position.
In other words, there currently exists no reliable way of knowing when a device is
actually at rest. In current implementations, the user manually signals to the position
triangulation mechanism that she intends to be stationary for the next few seconds.
She does this in order to give the triangulation algorithm time to collect the distance
readings it needs. The algorithm has no way of ensuring that this contract is met,
and therefore cannot decide if the readings it collects are corrupted by user motion.
Additionally, the distance estimates collected by the device exhibit error. To
mitigate the overall positioning error, positioning algorithms can either resort to
collecting multiple samples from each beacon, or to collecting individual samples
from multiple beacons. The latter approach involves designing our system with a
greater beacon density; since this is technically and economically challenging, we
would like to avoid this approach if possible. However, the cost of using approaches
that collect multiple readings from each beacon is application latency- we have to
wait longer to collect the needed readings. Since latency determines the end usability
of a positioning algorithm, location support systems today are designed to bound
latency and only then minimize cost.
Continuing, the standard approach to bounding latency is to a priori choose a
triangulation window of a fixed, static size so that worst-case latency is bounded.
However, this approach is suboptimal because it ignores the observation that users of
pervasive devices are at rest often, and for varying periods of time. Using a fixed
window size means that we impose high-level behavioral restrictions on the user. We
also lose access to optimizations that we might have been able to perform had we
chosen a larger triangulation window. Both of these are expensive failings.
A different argument applies in the domain of heading detection. Many heading detection systems fundamentally obtain heading information by integrating some
measure of change in heading. A consequence of this integration is that error is introduced at every instance of integration and the overall estimate of heading drifts with
time. Knowing when the device is stationary may enable us to attempt out-of-band
methods of drift cancellation.
3.2 Chapter Overview
In Section 3.3, we will survey the literature to find previous work that is relevant to
the problem of rest-detection. Section 3.4 will describe the concerns with previous
research that lead us to look for an improved solution. In Section 3.5, we argue for
a multi-modal rest-detection algorithm that incorporates video and tilt data. Section 3.6
explores the integration of these inputs in more detail. Section 3.7 describes
the design and implementation of our rest-detector, culminating with stand-alone
rest-detection results presented in Section 3.8. Finally, Section 3.8.1 illustrates the
application of this research to a current problem in the domain of position triangulation.
3.3 Previous Work
Three different research communities have previously addressed the problem of rest-detection. These are the mobile robotics, object tracking and human-computer interface research communities.
Determining a robot's location and heading in real-time has been called the global
positioning problem. A number of researchers have focussed on using video sensor data to enable a robot to detect its location and orientation. The major techniques in this area are landmark-recognition techniques[27, 13], camera-configuration
techniques[28], model-based approaches[6], and feature-based visual-map building
systems[15]. The last approach is remarkable because it does not require any a priori
map of the world. Chapter 9 of Borenstein et.al.[5] provides an excellent overview of
visual positioning systems in mobile robotics.
In the area of object-tracking, a large number of algorithms and systems have
been developed to accurately track a moving object via video data. For our purposes,
we may divide these systems into on-line or off-line systems. Off-line systems are
not useful to our goal of making rest-detection an application utility. An interesting
real-time, video-based position location system is Project HiBall[32]. Project HiBall
instrumented the ceiling space of the user with a dense array of LEDs, and
the position of the user was detected via a complex 'ball' that contains six infrared
sensors.
Human-computer interface researchers have long appreciated the importance of
determining when a device is at rest. The problem of detecting the motion and
rotation of the device in 3-D space came to be called one of context detection.
Rekimoto[21] attached tilt-meters to a Palm device. Harrison et.al.[8], Small &
Ishii[26] and Bartlett[3] explored using tilt-meters to navigate. Hinckley et.al.[9] used
tilt-meters coupled with a touch sensor and infrared proximity detector to identify
when a hand-held device is being used as a voice recorder. The applications require
significant reliability from the tilt-meter readings- in one case requiring a very specific
relationship between detected x and y angles, and in another identifying tilt angles
to within ±3 degrees. Hinckley et.al.[10] also describe the Video-Mouse, a
six-degree-of-freedom pointing device that consists of a video-enhanced mouse and a
mouse-pad printed with a special grid. This device implements video-only rest and motion
detection by processing the image of the grid received by the mouse-camera. The
TEA project[24] extends the idea of multi-modal context detection both theoretically
and practically. They develop and use a customized hardware board fusing data from
8 sensors (photo-diode, two accelerometers, IR proximity detector, temperature, pressure, CO2 gas, and sound). They offer an insightful distinction between physical and
logical sensors and use Kohonen maps to automatically detect contexts as regions of
points in 8-dimensional physical sensor data space. Their device was able to use this
approach to distinguish between rest, walking and placement in a briefcase.
3.4 Challenges
To the best of our knowledge, this is the first attempt to directly address the problem
of rest-detection. Previous work in this area has touched upon rest-detection only as
a related issue to a different problem- be it robot positioning, real-time tracking or
gesture recognition. However, the importance of rest-detection to a number of varied
fields suggests that this is a problem that merits study in its own right. Therefore,
we feel that previous approaches to rest-detection can be improved on the following
grounds: (1) Complexity (2) Cost and (3) Rectifying flaws in Tilt-meter Usage
3.4.1 Complexity and Cost
An approach based on extensive analysis means that the technique cannot be performed in real-time, or locally on a hand-held device. This automatically eliminates a
number of mobile robotics mapping-based algorithms. Additionally, while the problem of object-tracking via video remains an area of extensive research, these systems
are clearly overly complicated for our purposes. Our motivation in rest-detection is
not to track objects but merely to decide if we are at rest or not. Also, systems
such as HiBall that require expensive hardware or extensive instrumentation/tagging
of the user's work environment are not attractive simply because of the amount of
environmental modification they require.
3.4.2 Tilt-Meters
While tilt-meters have been powerful in certain areas of context detection, there exist
two fundamental problems with their use for the problem of rest-detection. Tilt-meter
approaches are (i) not sensitive to certain kinds of motion and (ii) not designed to
be an accurate indicator of rest.
Static tilt-meters work by measuring their attitude relative to the Earth's gravity
field at a given point (gravity vector for short). If the location and orientation of
a device in 3-D space can be described by the 5-tuple <x, y, z, θ, φ>, then static
tilt-meters are most sensitive to change in φ and z. This is because change in these
parameters directly affects the relationship of a tilt-meter to the gravity vector at a
given point. Therefore, there are changes in x, y and θ that can be made that do not
strongly affect readings from a static tilt-meter. The issue will be explored in greater
detail in Section 3.5.2.
Static tilt-meters today are designed to provide pervasive applications with tilt
information. They are constructed so that they can quickly tell an application its
general orientation with respect to the gravity vector. The fundamental question
they attempt to answer is "Is the device closer to horizontal or vertical?" rather than
"Has the device orientation changed in the last few seconds?". Increasing the range
of orientations over which quick and accurate comparisons may be made means that
tilt-meters cannot be excessively sensitive to small changes from a recent state. Such
a sensitivity would distort the ability of the tilt-meter to make macro tilt comparisons.
This is a rather subtle issue with the design of tilt-meters and it will be explored
Figure 3-1: A 50-second trace of the output from sensors running on a Mercury
backpack. Both sensors are attempting to measure horizontal drift, albeit by different
techniques. The optical flow trace corresponds to a simple optical flow algorithm;
the tilt-meter trace corresponds to readings from /dev/backpaq/accel, shown with a
1.5σ noise band. All readings taken at 15Hz.
in greater depth in Chapter 4 where a thorough empirical investigation has been
conducted into the static error of the tilt-meters used in the Mercury backpack.
However, the consequences of this design trade-off can be seen in Figure 3-1, which
shows the results of a 50-second trace of two sensors running on a Mercury backpack
held in the hand of a user. The tilt-meter readings exhibit a lot of variation while at
rest.
3.5 A Video-Enhanced Approach
Combining video data with tilt-meter data is an approach that promises to meet the
constraints raised in Section 3.4. In this section, we will outline the pros and cons
of incorporating video data for rest-detection and then justify the construction of a
combined video and tilt approach.
Mobile-robotics research has already shown that video techniques can be used to
make position and heading estimations. However, much of this work has not been
widely used because it has not been cheap or simple to develop a mobile, sensor-rich,
video-capable platform for pervasive computing. Fortunately, this is no longer the
case- commercial cell-phones produced by Samsung and Nokia now come with built-in
video cameras, and video-capable PDAs are becoming widely available. For example,
the research presented in this thesis was conducted using a Mercury backpack[20] at
the M.I.T. Lab for Computer Science. This device gives a regular off-the-shelf iPaq
access to tilt-meters, video and PCMCIA extension slots. It illustrates how video
data can now be considered an integral part of the sensor universe of a small, mobile
computer.
Video data is high-quality, digital and gravity independent.
It is sensitive to
horizontal translation motions in a way static accelerometers are not. And it offers
the choice of using detection and sensing algorithms that are not handicapped by
prior hardware-based design decisions. All of these are attractive arguments for a
video-based approach.
On the flip side, the use of video images means that a video-based rest detection
algorithm will always be susceptible to background motion. It is not reassuring for
our rest-decisions to be solely dependent on who walks across our camera at any given
time. Additionally, video-based techniques are notoriously computationally intensive,
which makes them hard to implement on small, mobile, resource-constrained devices.
The arguments for and against a pure video-based rest-detection system are
summarized in Table 3.1. Given this discussion, we can see that a more robust
approach is to combine information from both video and tilt sensors to produce a
better rest-detection system.

Table 3.1: Pros and Cons of Video-Based Approaches to Two-Dimensional Rest Detection.

Pros                                       Cons
1. Sensitive to image change               1. Susceptible to background motion
2. Improvable via software                 2. Potentially computationally expensive
3. Independent of Earth's gravity field
3.5.1 Statistical Measures
The video component is based on an Optical Flow algorithm which is simple,
computationally inexpensive, provides a good measure of overall image drift, and is
resilient to variations in image color, brightness and minor movement.
Consider a given frame P_n. Let shift(P_n, h, v) be the same frame except with each
pixel displaced horizontally by h columns and vertically by v rows. Let D be a binary
difference measure that operates across frames. Then, for any two consecutive frames
P_n and P_{n-1}, we define:

    optical-flow = <h, v> where D(P_{n-1}, shift(P_n, h, v)) is minimum.
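To make the definition concrete, the following C sketch performs the minimization
by exhaustive search over a small shift range, using a sum of absolute pixel
differences as one possible choice of the difference measure D. The frame
dimensions, the search range and the choice of D are assumptions made for the
illustration, not values taken from the thesis implementation.

    #include <stdlib.h>
    #include <limits.h>

    #define W 160        /* assumed frame width in pixels  */
    #define H 120        /* assumed frame height in pixels */
    #define MAX_SHIFT 8  /* assumed shift search range     */

    /* One candidate for D: sum of absolute differences between the previous
       frame and the current frame shifted by (h, v). Pixels shifted out of
       the frame are ignored. */
    static long frame_diff(const unsigned char prev[H][W],
                           const unsigned char cur[H][W], int h, int v)
    {
        long d = 0;
        for (int y = 0; y < H; y++) {
            for (int x = 0; x < W; x++) {
                int sx = x - h, sy = y - v;  /* source pixel of shift(cur, h, v) */
                if (sx < 0 || sx >= W || sy < 0 || sy >= H)
                    continue;
                d += labs((long)prev[y][x] - (long)cur[sy][sx]);
            }
        }
        return d;
    }

    /* Exhaustive search for the <h, v> minimizing D(P_{n-1}, shift(P_n, h, v)). */
    void optical_flow(const unsigned char prev[H][W],
                      const unsigned char cur[H][W], int *best_h, int *best_v)
    {
        long best = LONG_MAX;
        *best_h = 0;
        *best_v = 0;
        for (int h = -MAX_SHIFT; h <= MAX_SHIFT; h++) {
            for (int v = -MAX_SHIFT; v <= MAX_SHIFT; v++) {
                long d = frame_diff(prev, cur, h, v);
                if (d < best) {
                    best = d;
                    *best_h = h;
                    *best_v = v;
                }
            }
        }
    }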
For the purposes of rest-detection, the vertical-flow component v is more of a
liability than a benefit. The information it provides is easily corrupted by scan lines
seen in standard video feeds. Additionally, the information it provides is redundant
with pitch and tilt information obtainable from a tilt-meter. Therefore we choose to
ignore the vertical flow component and instead define ΔIMAGE, where:

    ΔIMAGE = AVERAGE(h_1, h_2, ..., h_k), where (h_1, h_2, ..., h_k) is a window
    of horizontal flow readings from an optical-flow algorithm.
For the tilt-meter component, we use a window of tilt-readings from a tilt-meter.
Unlike the optical-flow measure, the mean of the window is not informative in this
case because all it tells us is the current tilt of the device. We are not interested in
the current tilt, but in the change in tilt. Therefore we define ΔTILT, where:

    ΔTILT = STD_DEV(r_1, r_2, ..., r_k), where (r_1, r_2, ..., r_k) is a window of
    consecutive tilt readings from a tilt-meter.
We hope to build a system where we receive new video and tilt-readings at
approximately 10 Hz. Therefore we arbitrarily choose a window size of 1 second so that our
ΔIMAGE and ΔTILT measures are obtained from at least 10 readings. Readings
from the two measures are never compared to each other numerically, hence we have
no need to normalize the two types of readings.
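Both measures reduce to a little arithmetic over a sliding window. The C sketch
below shows one direct way to compute them; the function names are illustrative,
not the names exported by the thesis library. With the 1-second window just
described, k is simply the number of readings received in the last second.

    #include <math.h>

    /* Delta-IMAGE: mean of a window of k horizontal optical-flow readings. */
    double delta_image(const double *h, int k)
    {
        double sum = 0.0;
        for (int i = 0; i < k; i++)
            sum += h[i];
        return sum / k;
    }

    /* Delta-TILT: standard deviation of a window of k consecutive tilt readings. */
    double delta_tilt(const double *r, int k)
    {
        double mean = 0.0, var = 0.0;
        for (int i = 0; i < k; i++)
            mean += r[i];
        mean /= k;
        for (int i = 0; i < k; i++)
            var += (r[i] - mean) * (r[i] - mean);
        return sqrt(var / k);   /* population standard deviation */
    }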
To guide the design of a video-based rest detector, we performed three experiments
with the hand-held device (1) at rest in the palm of a user, (2) translating horizontally
across a room and (3) staying in place but pointing to different objects in the room.
3.5.2 Experiments
1. Hand-held Rest
The device is at rest in the hands of the user. The user holds the device steady,
as if attempting to use it as a pointing device. Over the course of this experiment,
the user made tapping motions on the iPaq, as if attempting to select portions of
the screen or input data. This models the device being used as a pointing device.
2. Horizontal Translation
In this experiment the user carries the device in the palm of his hand and walks
back and forth across a room. Over the course of a 50-second trial, the user walked
a distance of 30 meters. This experiment models the common case when a user is
using the device, but is not stationary.
3. Pitching and Rolling
In this experiment, the device was kept in one place but randomly pointed at
different objects in the room. This models the case when the device exhibits a lot
of motion.
[Figure omitted: ΔIMAGE vs. time (seconds); legend: Pitching and Rolling, Horizontal Translation, Hand-held Rest.]

Figure 3-2: A 50-second trace of ΔIMAGE with a sliding window size of 1 second
and a sampling rate of 15Hz. A clear difference can be seen between readings at rest
and in motion.
[Figure omitted: ΔTILT vs. time (seconds); legend: Pitching and Rolling, Horizontal Translation, Hand-held Rest.]

Figure 3-3: A 50-second trace of the output from ΔTILT with a sliding window
size of 1 second and a sampling rate of 15Hz. It is not easy to differentiate
between rest and horizontal translation.
Table 3.2: Performance of ΔIMAGE and ΔTILT in different rest conditions. The
sliding window size is 1 second, and the sampling frequency is 15Hz.

ΔIMAGE                   Average Reading   Standard Deviation
hand-held rest           0.19              0.18
horizontal translation   9.87              8.30
pitching and rolling     30.06             6.86

ΔTILT                    Average Reading   Standard Deviation
hand-held rest           7.57              1.38
horizontal translation   7.74              1.70
pitching and rolling     52.68             9.01

3.5.3 Observations
Figure 3-2 and Figure 3-3 compare the performance of the vision and tilt techniques in
each of these situations. From the graphs, we can make the following observations:

I. The ΔIMAGE technique makes a sharp distinction between cases at rest and in motion.

The readings for ΔIMAGE when the device is in motion are almost always
higher than the maximum values attained for ΔIMAGE when the device is at
hand-held rest. For horizontal translation, ΔIMAGE exhibits momentary low
readings that are caused by natural moments of rest in the gait of the user.
This suggests that a short initial wait-period will be required to filter out
conditions of momentary rest. Table 3.2 provides additional statistics on these
experiments.
II. ΔTILT cannot detect horizontal translation.

From Figure 3-3 and Table 3.2, we see that it is difficult to differentiate between
horizontal motion and rest based on tilt-meter readings alone. This means that
tilt-meters are not useful in identifying rest when the user is using the device
and walking around. However, ΔTILT is effective in robustly detecting gross
changes in device orientation.
3.6 Integration of Video and Tilt
Given our preliminary experiments, it is possible for us to define a two-sided range
LOW for readings from both ΔIMAGE and ΔTILT that corresponds to readings
when the device is at hand-held rest. Given our definition of LOW, we can define a
one-sided range called HIGH (more accurately, NOT-LOW) that contains all values
greater than the LOW range. Therefore, at all times our algorithm is given video
and tilt readings drawn from {low, high} × {low, high} and asked to decide if it is
at instantaneous rest or in instantaneous motion. The algorithm believes itself to be at
overall rest after a prolonged period of instantaneous rest.

The cases where the video and tilt data corroborate each other are easy to label.
If both ΔIMAGE and ΔTILT readings are in the low range, then it is likely that
we are in a period of instantaneous rest; if both readings are in the high range, then
we are in a period of instantaneous motion. But what happens when the video and
tilt readings do not agree with each other?
3.6.1 Case: ΔIMAGE low, ΔTILT high
Consider the case where ΔIMAGE is low, but ΔTILT is high. Initially, we imagined
that such a situation could be produced by 'hoodwinking' the video by presenting
it with a dark or featureless surface. However, we found that the design of the
optical-flow algorithm is such that the output of ΔIMAGE in this case is not zero but
random. Intuitively, this means that the algorithm cannot reliably pick one value of
flow over another, and hence becomes sensitive to very minute differences in image
quality. Therefore, switching off the lights or blocking the camera lens does not
produce low values for ΔIMAGE. Similarly, we found that even apparently featureless
surfaces such as blank white-boards contained enough variation in pixel values for the
algorithm to detect motion.
In fact, the only way we were able to fool ΔIMAGE into making a false positive
error for rest detection was to affix a small, highly-detailed image (in our case, a circuit
board) a short distance in front of the camera lens. The effort required to fool the
camera in this way suggests that this situation is unlikely to occur often. However,
if we did face the case ΔIMAGE low, ΔTILT high, are we to trust the video data
and assume rest, or are we to trust the tilt data and assume motion?
We observe that, unlike a camera, it is hard to artificially force the tilt-meter to
generate high readings without actually moving the device. While a low reading
from ΔTILT may be ambiguous, a high reading from ΔTILT generally means that
the device is undergoing acceleration (and hence motion). Hence the correct decision
in this situation is to trust the tilt-meters and assume that we are in instantaneous
motion.
3.6.2 Case: ΔIMAGE high, ΔTILT low
What happens when ΔIMAGE readings are high, but ΔTILT readings are low?
Are we at rest, or are we in motion?

Horizontal translation is an example of an instantaneous-motion situation where
ΔTILT readings are low but ΔIMAGE readings are high. Since horizontal motion
models the case where the user is walking and using the device, we cannot assume
that this is an unlikely situation. Therefore this set of readings may mean that we
are in motion.

However, ΔIMAGE readings may also go high due to momentary occlusions by
parts of the user's body, or by large moving portions of the background. One may
attempt to filter out background motion by using more complicated techniques than
optical-flow, but at the most fundamental level this is a decision that is undecidable
visually. Video is also sensitive to background noise. Therefore this set of readings
may also mean that we are at rest.
So unlike the ΔIMAGE low, ΔTILT high case, we cannot argue for trusting one
device over another. How do we resolve this situation? Essentially, our system can
either decide that the device is at rest (and run the risk of making a false positive
error) or decide that the device is in motion (and thereby risk a false negative error).
Looking at the examples given in Section 3.1, it can be seen that the cost associated
with a false positive error is greater. Erroneously deciding that the device is at rest
could lead to unnecessary usage of expensive resources, wildly inaccurate position
estimations, and failed calibrations. In contrast, assuming that we are in motion
means that higher-level applications are forced to make looser assumptions about
their context. Hence it is preferable to err on the side of detecting too much motion
rather than detecting too much rest, and the system in this situation should decide
that it is in instantaneous motion.(2)

(2) Our system could also potentially ignore this conflict between video and tilt and do nothing at
all. However, the argument for conservative rest detection means that we are safer believing ourselves
to be in motion.

The complete integration of the video (ΔIMAGE) and tilt-meter (ΔTILT) readings is
therefore as given in Table 3.3. Each entry tells us whether our algorithm interprets
the situation as instantaneous rest or instantaneous motion.

Table 3.3: Integration of Video and Tilt Readings. Given tilt (ΔTILT) and video
(ΔIMAGE) readings, this table tells us whether our algorithm treats the situation
as instantaneous rest or instantaneous motion. The algorithm transitions into overall
rest if and only if it is in instantaneous rest for an application-specified period of time.

                ΔIMAGE low                             ΔIMAGE high
ΔTILT low       REST                                   MOTION (conservative rest detection)
ΔTILT high      MOTION (video is being 'hoodwinked')   MOTION
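A sketch of this decision table in C; the LOW-range bounds are assumptions loosely informed by the rest traces in Table 3.2, not thresholds given in the thesis.

    #include <math.h>

    typedef enum { INST_REST, INST_MOTION } inst_state_t;

    /* Upper bounds of the LOW range for each measure; placeholder values. */
    #define IMAGE_LOW_MAX 1.0
    #define TILT_LOW_MAX  10.0

    /* Implements Table 3.3: instantaneous rest only when both measures are
     * in their LOW ranges; every other combination is treated as motion. */
    inst_state_t classify(double d_image, double d_tilt)
    {
        int image_low = fabs(d_image) <= IMAGE_LOW_MAX;  /* two-sided range */
        int tilt_low  = d_tilt <= TILT_LOW_MAX;          /* std dev, >= 0   */
        return (image_low && tilt_low) ? INST_REST : INST_MOTION;
    }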
3.6.3 Algorithm

Let <m, t> be instantaneous readings of the ΔIMAGE and ΔTILT metrics. Given
these, we can at every instant of time determine whether our instantaneous-state is
instantaneous-rest or instantaneous-motion. Our rest-detection algorithm is a simple
state machine that exists in one of three states: MOTION, WAIT and REST. We
assume that the algorithm is initially in the MOTION state, but transitions to the
REST state after being in instantaneous rest for an application-specified period of
time. The algorithm exits the REST state when it encounters an instance of instantaneous motion. We maintain a state variable called rest-counter that counts the
number of at-rest intervals the device has experienced since rebooting. Applications
that poll this system can use the current state of the rest-algorithm and the value of
the rest-counter to decide if they have been at rest since the last poll.

The algorithm is outlined in Figure 3-4.
Figure 3-4: Algorithm

rest-counter = 0
FOR EACH new instantaneous-state:
    SWITCH instantaneous-state
        CASE instantaneous-motion:
            state = MOTION
        CASE instantaneous-rest:
            SWITCH state
                CASE MOTION:
                    state = WAIT
                CASE WAIT:
                    IF waited < initial wait period
                        state = WAIT
                    ELSE
                        state = REST
                        rest-counter++
                    ENDIF
                CASE REST:
                    state = REST
            END
    END
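A minimal C sketch of this state machine, assuming the inst_state_t classifier sketched after Table 3.3 and a caller that invokes rest_machine_step() once per sensor sample; the names and the tick-based wait period are illustrative, not the thesis's API.

    typedef enum { ST_MOTION, ST_WAIT, ST_REST } rest_state_t;

    typedef struct {
        rest_state_t state;
        int wait_ticks;     /* ticks spent in WAIT so far              */
        int wait_needed;    /* application-specified wait period       */
        int rest_counter;   /* at-rest intervals since initialization  */
    } rest_machine_t;

    void rest_machine_init(rest_machine_t *rm, int wait_needed)
    {
        rm->state = ST_MOTION;   /* assume motion initially */
        rm->wait_ticks = 0;
        rm->wait_needed = wait_needed;
        rm->rest_counter = 0;
    }

    /* Advance the machine by one sampling tick. */
    void rest_machine_step(rest_machine_t *rm, inst_state_t inst)
    {
        if (inst == INST_MOTION) {
            rm->state = ST_MOTION;
            rm->wait_ticks = 0;
            return;
        }
        switch (rm->state) {
        case ST_MOTION:
            rm->state = ST_WAIT;
            rm->wait_ticks = 0;
            break;
        case ST_WAIT:
            if (++rm->wait_ticks >= rm->wait_needed) {
                rm->state = ST_REST;
                rm->rest_counter++;
            }
            break;
        case ST_REST:
            break;   /* stay at rest until instantaneous motion */
        }
    }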
3.7 Implementation
We intend for the implementation of our rest-detection system to meet the following
high-level goals:
3.7.1 Goals

* Application Programmability

The notion of 'rest' varies from application to application. An energy-conservation application might consider itself at rest when unused for a few seconds.
A screensaver application may consider itself at rest only if unused for a few
minutes. Hence it is important that applications be able to define what
'rest' means to them.
* Ease of Development

Our aim is that the notions of rest and rest-detection be simply and clearly
defined, and that applications be written easily against a rest-detection mechanism. Hence it is important that the rest-detection mechanism be simple and easy
to integrate into applications.
* Local Code

On the hand-held computers of today, power, CPU and memory are scarce
resources. Many visual algorithms are resource-hungry. Extensive computation
also may not be shipped to a remote server, because wireless network access is
a power-hungry operation. Hence it is important that our visual rest-detection
system be small, local and computationally inexpensive.
* Push/Pull Mechanism

In certain applications, it may suffice to query a system variable to determine if
the device is at rest. Other applications may require notification when a device
comes to rest. Therefore our rest-detection mechanism should support both polling
and call-back mechanisms.
The rest-detection system is written in C, with the video devices being accessed
via the Video4Linux API. The test hardware platform is an iPaq 3650 running Familiar
Linux on an ARM processor. The system is built as a single object file with a simple
header file, rest.h, that defines the API. The API consists of a struct called a RestDetector and simple functions that allow an application to control sampling rates, define
rest-detection parameters and utilise the callbacks and thread-safe polling mechanisms provided by the system. The principal component of the system is a daemon
that reads video and tilt data from the underlying Linux file-system and executes the
algorithms discussed earlier in this section. We have been able to run this daemon at
speeds of up to 15Hz on our hardware platform.

A simple rest-aware application may therefore be written as in Table 3.4.
3.7.2 Software Architecture
The software components of the system are:
Device Layer
These comprises video and tilt-meters. The tilt-device is built from an Analog Devices
ADXL202E two-axis static accelerometer. The video device is composed of a CCD
device controlled by an FPGA. These devices are part of the Mercury backpack and
are presented in Linux as files in the /dev/ filesystem. They provide device-level file
I/O capabilities.
Data Presentation Layer

For the purposes of developing and experimenting with different rest-detection mechanisms, we found it useful to abstract over the low-level Linux file descriptors with
a more meaningful API. The API exposes and maintains useful sensor information,
and gives higher-level layers the ability to change the frequency with which
the low-level devices are sampled. As seen in Chapter 4, this changes the fundamental
statistical nature of the readings from the devices.
For video, the API defines callbacks which can be used as hooks for higher-level
layers. A higher-layer application may thus be notified every time a new frame is
received. To control resource usage the higher layer can also control the frequency
with which the device is polled, and the maximum frequency at which callbacks are
made.
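A sketch of what such a frame-callback hook might look like; the structure and function names are hypothetical, not the backpack's actual API.

    #include <stdint.h>

    /* Hypothetical frame-callback hook for the data presentation layer. */
    typedef void (*frame_cb_t)(const uint8_t *frame, int width, int height,
                               void *user);

    struct video_source {
        frame_cb_t cb;
        void      *user;
        int        sample_hz;     /* how often the /dev file is polled   */
        int        max_cb_hz;     /* upper bound on callback invocations */
    };

    void video_set_callback(struct video_source *vs, frame_cb_t cb, void *user)
    {
        vs->cb = cb;
        vs->user = user;
    }

    void video_set_rates(struct video_source *vs, int sample_hz, int max_cb_hz)
    {
        vs->sample_hz = sample_hz;
        vs->max_cb_hz = max_cb_hz;
    }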
Table 3.4: A Sample Rest-Aware Application

#include <stdio.h>
#include "rest.h"

static struct RestDetector rdstorage;
static struct RestDetector *rd = &rdstorage;

/* Assumed callback signature, for illustration only;
 * rest.h defines the actual prototype. */
static void callbackfn(struct RestDetector *r)
{
    printf("device is at rest\n");
}

int main(int argc, char **argv)
{
    /* The application decides on the sampling frequency
       and initial wait period for the rest-detector. */
    restinit(rd, 15, 1.0);           /* 15Hz sampling, 1.0 second wait */

    /* The application specifies the function that will be
       called at changes in state. */
    restsetcallback(rd, callbackfn); /* callbackfn will be called at rest */

    reststart(rd);                   /* start the system */

    /* The rest detector may also be polled. */
    if (rd->RESTSTATE == REST) {
        /* ... */
    }

    reststop(rd);                    /* stop the system */
    return 0;
}
[Figure omitted.]

Figure 3-5: Software Architecture of the Rest-Detection System
Statistics Package
In the absence of a small, light statistics library that was compiled for the ARM
processor, we wrote a pared-down statistics package that implements a sliding-window
data structure (akin to a ring-buffer). The package provides utility functions that may
be used to determine the mean, median and standard deviation of the sliding window.
The size of the sliding window may be dynamically adjusted. The statistics package
has proved useful for both video and tilt data.
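A minimal sketch of such a sliding-window structure, assuming a fixed maximum capacity; the names, the WIN_MAX bound and the population-variance convention are our assumptions, and the median helper is omitted for brevity.

    #include <math.h>

    #define WIN_MAX 128   /* assumed maximum window capacity */

    typedef struct {
        double buf[WIN_MAX];
        int size;      /* current window size (dynamically adjustable) */
        int count;     /* number of valid samples, <= size             */
        int head;      /* index of the next slot to overwrite          */
    } sliding_window_t;

    void win_init(sliding_window_t *w, int size)
    {
        w->size = (size <= WIN_MAX) ? size : WIN_MAX;
        w->count = 0;
        w->head = 0;
    }

    /* Push a sample, overwriting the oldest once the window is full. */
    void win_push(sliding_window_t *w, double x)
    {
        w->buf[w->head] = x;
        w->head = (w->head + 1) % w->size;
        if (w->count < w->size)
            w->count++;
    }

    double win_mean(const sliding_window_t *w)
    {
        double s = 0.0;
        for (int i = 0; i < w->count; i++)
            s += w->buf[i];
        return w->count ? s / w->count : 0.0;
    }

    double win_stddev(const sliding_window_t *w)
    {
        double m = win_mean(w), v = 0.0;
        for (int i = 0; i < w->count; i++)
            v += (w->buf[i] - m) * (w->buf[i] - m);
        return w->count ? sqrt(v / w->count) : 0.0;
    }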
Rest-Detector
This is the layer that implements the algorithms discussed earlier in this section. At
its lower boundary it is a consumer of video and tilt information from the data presentation layer. At its higher boundary it presents a simple interface to applications
that allows for control of sampling rates, definition of rest-detection parameters and
provision of callbacks as well as thread-safe polling mechanisms for rest-detection.
3.8 Results
Figure 3-6 and Figure 3-7 show the performance of our rest-detection algorithm on
the sensor traces introduced in Section 3.5.2. The lower graph in each figure plots the
state of the rest-detection algorithm, which may be 0 (MOTION), 1 (WAIT) or 2 (REST).

The performance of the rest-detection algorithm when the device is at rest in the
palm of the user is stellar (Figure 3-6). Both ΔIMAGE and ΔTILT readings are
always in the low range. Hence the algorithm waits for the initial period of 1 second,
and then transitions into the REST state. The algorithm is resilient to jitter caused by
the user's hands and the tapping motion of the stylus.
Looking at the performance of the algorithm in the case of horizontal translation (Figure 3-7), it becomes clear that the size of the initial wait-period is important.
There are several instances over the course of the user's walk when he is momentarily
at rest; these show up in our algorithm state trace as periods when the algorithm is
in the WAIT state. The application controls which of these periods of instantaneous
[Figure omitted: ΔTILT, ΔIMAGE and algorithm-state traces vs. time (seconds).]

Figure 3-6: A 50-second trace of the performance of the multi-modal algorithm when
the device is held in the palm of a user. The initial wait period is set at 1 second.
The sliding window size and the sampling frequency for ΔIMAGE and ΔTILT are
set at 1 second and 15Hz respectively.
rest count as periods of overall algorithm REST. In this experiment we see that a
one-second wait period means that the algorithm never settles into REST state.
The performance of the multi-modal algorithm when the device is being pointed
at different objects in the room (Figure 3-8) is also stellar. Both ΔIMAGE and
ΔTILT readings are high. Consequently the algorithm does not experience even a
single moment of instantaneous rest, and the overall state of the algorithm remains
firmly pegged in the MOTION state.
[Figure omitted: ΔTILT, ΔIMAGE and algorithm-state traces vs. time (seconds).]

Figure 3-7: A 50-second trace of the performance of the multi-modal algorithm when
the device is held in the palm of a user walking about a room. The user walked a
distance of 30m over the course of this trace. The initial wait period is set at 1 second.
The sliding window size and the sampling frequency for ΔIMAGE and ΔTILT are
set at 1 second and 15Hz respectively.
3.8.1 Case Study: Crickets and CricketNav
In Section 3.1, we said that a rest-detection system allows a location-support system to (1) validate that all distance measurements are taken at rest and (2) dynamically change the triangulation window to exploit optimizations enabled by unusually
long periods of rest. We demonstrate both of these advantages using the Cricket
Location Support System as a case study. Our hardware platform is a sensor-rich
Mercury backpack connected to a Cricket listener.
Cricket[18] is an indoor location system for pervasive computing environments that
uses a combination of RF and ultrasound technologies to provide a location-support
service to users and applications. The system comprises wall- and ceiling-mounted
beacons and mobile passive listeners. The beacons publish information on an RF
signal. With each RF advertisement, the beacon transmits a concurrent ultrasonic
pulse. The listeners attached to devices listen for RF signals and, upon receipt of
the first few bits, listen for the corresponding ultrasonic pulse. When this pulse
arrives, they obtain a time-of-flight estimate for that ultrasound pulse. Multiplying
the time-of-flight estimate by the speed of sound gives us a distance estimate for the
broadcasting beacon. The actual <x, y, z> position of the listener can then be estimated
by triangulating distance estimates from multiple beacons.

[Figure omitted: ΔTILT, ΔIMAGE and algorithm-state traces vs. time (seconds).]

Figure 3-8: A 50-second trace of the performance of the multi-modal algorithm when
the device is being pointed around the room (extreme motion). The initial wait period
is set at 1 second. The sliding window size and the sampling frequency for ΔIMAGE
and ΔTILT are set at 1 second and 15Hz respectively. The algorithm never deviates
from the MOTION state.
In "Design and Implementation of an Indoor Mobile Navigation System" [16], Miu
conducted a comprehensive investigation into Cricket beacon distance estimates
and the accuracy of 2-D position triangulation using Crickets. He defined the sample
frequency (k) of a position estimate to be the number of distance estimates collected
from each beacon. He also defined the beacon multiplicity (m) of a position estimate
to be the number of distinct beacons from which distance estimates are collected.
Over the course of his investigation, he found that:

1. A MODE distance estimate produces more accurate readings than a MEAN estimate.

2. MODE does not begin to take effect until k > 5.

3. A least-squares method of position triangulation effectively reduces error, especially when k is large.

4. For k < 5 and m > 5 it is better to assume that the speed of sound is unknown.
Otherwise it is better to assume that the speed of sound is known.

5. For k = 1, Cricket is accurate to within 30cm, 95% of the time.

After conducting this investigation into the positioning accuracy of Cricket, Miu
used the Cricket positioning information to construct an indoor mobile
navigation system called CricketNav. While designing this system, he ran up
against a tough application constraint: latency. He found that in order for the
navigation application to respond promptly to the user, the triangulation window
had to be bounded to less than 5 seconds. Since increasing k increases the latency
of the application, this meant that the CricketNav application could only use position
estimates with k = 1 and m = 3, 4, 5, etc. An entire body of optimizations (items
1-4 above) was rendered inaccessible to the CricketNav application.
3.8.2 A Simple Rest-Aware Cricket Distance Estimator
Using the RestDetector, we constructed a simple application that tracks the distance
to a beacon. The application is rest-aware in the sense that the sample frequency k
increases when the application discovers itself to be at rest. As in CricketNav, the
initial value of k is 1.
Upon receipt of a new reading from the beacon, we check our rest-detector to
ensure that we have been at rest since the receipt of the last reading. If so, we add
the reading to our buffer and increment k by 1. If not, we flush the buffer, add the
new reading and set k = 1.
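A minimal sketch of this buffering policy, assuming the application polls the rest-detection library for its state and rest-counter as described in Section 3.6.3; all names and the BUF_MAX cap are illustrative.

    #define BUF_MAX 64   /* assumed cap on buffered distance readings */

    struct beacon_tracker {
        double buf[BUF_MAX];
        int k;                  /* current sample frequency         */
        int last_rest_counter;  /* rest-counter at the last reading */
    };

    void tracker_init(struct beacon_tracker *t)
    {
        t->k = 0;
        t->last_rest_counter = -1;
    }

    /* Called on each new distance reading; 'at_rest' and 'rest_counter'
     * would come from polling the rest-detection library. */
    void track_reading(struct beacon_tracker *t, double dist,
                       int at_rest, int rest_counter)
    {
        /* Still in the same at-rest interval as the previous reading? */
        int same_rest = at_rest && rest_counter == t->last_rest_counter;

        if (!same_rest || t->k >= BUF_MAX)
            t->k = 0;             /* motion (or overflow): flush the buffer */

        t->buf[t->k++] = dist;    /* add the reading; k becomes 1 on flush  */
        t->last_rest_counter = rest_counter;
    }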
[Figure omitted: distance vs. time (s); legend: Cricket RAW Rest-blind, Rest State (HI = rest), Cricket MEAN Rest-aware.]

Figure 3-9: A 60-second trace of the distance readings from one Cricket beacon as
the user moved from one desktop to another. The spike in the distance readings as
the user moves is likely due to a reflected ultrasound pulse.
Figure 3-9 plots the distance estimates that we receive for the simple case of the
user walking from one desktop to another. The trace labeled 'Cricket RAW, Rest-blind'
shows the raw distance estimates that CricketNav would use from the beacon.
The trace labeled 'Rest State' gives the state of the rest-detector: high implies
we are at rest, low implies we are in motion.

The trace labeled 'Cricket MEAN, Rest-aware' shows the smoothed distance estimates obtained by dynamically incrementing k at rest. Only the rest-portions of this
trace are shown; when the device is in motion, the readings from this technique are
identical to 'Cricket RAW, Rest-blind'. From Figure 3-9, we can make the
following observations:

* We do not have to impose end-user mobility restrictions

We do not require that applications enter into a 'no-motion-for-n-seconds' contract with the end user. Alternatively, if such a contract is necessary, we can
now check whether the contract was met. This allows our applications to adapt
to user behaviour, and not vice versa.

* Our distance readings may be guaranteed to belong to the same physical location

Having an out-of-band rest-detection mechanism gives us fundamental guarantees
about the quality of our distance readings. We know when it makes sense to use
statistical estimation mechanisms (when we are at rest), and when it does not
make sense to do so (when we are in motion).

* The latency of the application is not degraded

When the distance-estimation algorithm detects that it has moved, it quickly
drops the size of the readings buffer to the default setting. Therefore, the
worst-case latency of the rest-aware averaging mechanism is no worse than that
obtained at k = 1. Additionally, the theoretical latency of the algorithm when
we are at rest is zero: knowing that we are at rest means that we do not need
to wait for additional readings; we can simply re-use the last computed estimate.
* Extra-long sequences of readings enable the deployment of more sophisticated optimizations

As outlined in Section 3.8.1, there are a number of optimizations that we can
perform given a large number of readings from a single beacon. The MODE
distance metric may now be costlessly deployed (a sketch follows this list). An
exponentially-weighted moving average may be developed to reduce the jitter of
the distance readings during motion. Additional techniques may also be developed
and implemented.
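As an illustration of the first of these optimizations, here is a sketch of a MODE estimate over the buffered readings; the 1cm quantization used for voting and the assumption of non-negative distances are ours, since the binning used by Miu is not specified here.

    /* Mode of k distance readings (in cm, assumed non-negative).
     * Readings are quantized to integer centimeters before voting. */
    double mode_estimate(const double *buf, int k)
    {
        double best_val = buf[0];
        int best_votes = 0;
        for (int i = 0; i < k; i++) {
            long bin_i = (long)(buf[i] + 0.5);
            int votes = 0;
            for (int j = 0; j < k; j++)
                if ((long)(buf[j] + 0.5) == bin_i)
                    votes++;
            if (votes > best_votes) {
                best_votes = votes;
                best_val = buf[i];
            }
        }
        return best_val;   /* a representative of the most popular bin */
    }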
3.9 Conclusion and Future Work
There is a need for a system that makes rest-information simply and cheaply available
to higher-level pervasive applications. In this chapter we have constructed a rest-detection system that combines video data with traditional tilt-meter data for greater
robustness. We have found that this algorithm is an improvement over traditional
tilt-meter based approaches, and we have outlined a potential use for this technology
in the domain of position triangulation.

Future work for this project would involve the development of more sophisticated
algorithms that we can call upon in cases where the video and tilt data do not agree with
each other. However, such algorithms must stay computationally inexpensive to fulfill
the goal of being an application utility. Secondly, the window size for our ΔIMAGE
and ΔTILT measures is arbitrarily set at 1 second. This may be too short a window
at low sampling rates. The 'high' and 'low' marks for each measure may also be
configured dynamically, rather than hardcoded as they are now. We can imagine a short
user-configuration routine that sets the values of these parameters dynamically.

Additionally, there is a need for better access to the Linux device file-descriptors.
While the Video4Linux API is well-defined, the API for accessing the tilt-meter is
obscure and platform-specific. The rest library also does not gracefully share the file
descriptors; current applications that use the video or the tilt-meter must be rewritten
to access that data through the library. An implementation that snoops on the device
files without locking them would be better.
Chapter 4

Static Analysis of Mercury Backpack Tilt-meters
4.1 Motivation
The Mercury backpack[20], developed by Compaq Cambridge Research Laboratories,
is one of the popular hardware platforms used for pervasive computing research at
the M.I.T. Laboratory for Computer Science. The backpack is designed for use as a
sleeve for the Compaq iPaq handheld computer. It contains a built-in digital camera,
two built-in simple tilt-meters, one built-in compound tilt-meter, and two PCMCIA
Type II slots. This device provides a rich platform of functionality with which to
construct and demonstrate pervasive computing applications.

Our primary interest in the Mercury backpack has been to use its video and
tilt-meter devices for the development of the multi-modal rest-detection algorithm
presented in Chapter 3. Over the course of that project, we developed familiarity
with the performance of the Mercury tilt-meters. Since the backpack itself is gaining
popularity as a research platform, the backpack tilt-meters are of interest in themselves.
Therefore, this chapter (1) describes the composition, layout and interpretation of
the Mercury backpack tilt-meters and (2) presents an empirical analysis of the performance of the Mercury backpack tilt-meters at rest. We expect that this information
will be of use for further debugging and development of the backpack tilt-meters.
4.2 Tilt-meter Description
Our first task in this empirical analysis is to distinguish between the terms accelerometer and tilt-meter. We will use the word accelerometer to refer to the
hardware chip that forms the core of the tilt-meter. The term tilt-meter, on the other
hand, will refer to the amalgamation of hardware accelerometer chips, power circuits,
drivers, and APIs that expose tilt data to pervasive applications on the backpack.

Secondly, we will distinguish between simple and compound tilt-meters. A simple
tilt-meter contains exactly one accelerometer chip. A compound tilt-meter is
composed of more than one of these hardware chips amalgamated together, usually
in different physical orientations.
4.2.1 Composition
The tilt-meters in the Mercury backpack all use as their basic hardware component
the Analog Devices ADXL202E Dual Axis Accelerometer[1].
The ADXL202E chip
is typical of a generation of low-cost accelerometers that are making their way into
mobile devices. A Google search for "ADXL202E" shows that the chip is being used
in a wide variety of contexts from college research to commercial products.
The ADXL202E is a low-cost, small, flat, rectangular chip that measures the force
of acceleration from -2g to +2g along the two axes in the plane of the chip. This
means that the accelerometer is able to detect both static acceleration due to gravity
and dynamic acceleration due to motion. The ADXL202E chip is designed to be
sampled at speeds of 1000Hz or below. For each axis, the output readings may be
offset and scaled by calibration parameters. Calibration factors may be stored in the
EEPROM of the chip or determined at boot-time and saved in dynamic memory.
Further information about the ADXL202E chip is available from the Analog Devices
web-site[1].
4.2.2 Layout
[Figure omitted; annotations: 'Three accelerometers: one in the camera plane, one perpendicular to it, one on the backpack PCB'.]

Figure 4-1: A cut-away of the rear of the Mercury backpack that shows the location
of the tilt-meters.
The Mercury backpack contains three tilt-meters. The first of these is a simple
tilt-meter located on the backpack PCB. The second is another simple tilt-meter,
mounted in the backpack camera housing. The third tilt-meter is a compound tilt-meter,
also present in the camera housing.

The physical distribution of accelerometers and tilt-meters in the Mercury backpack (visible in Figure 4-1) is worth noting. The two simple tilt-meters are both
mounted flat in the plane of their PCBs. The camera PCB swivels; therefore these
two tilt-meters are co-planar only when the backpack camera is pointing straight
ahead. Another significant difference between these two tilt-meters is that they are
mounted in different electrical circuits. The circuit on the camera PCB contains its
own voltage regulator; this means that readings from these two tilt-meters are subject
to different levels of power noise.
The compound tilt-meter consists of two ADXL202E chips mounted perpendicular to each other. One of these chips lies flat in the plane of the camera housing
and the other lies perpendicular to the plane of the camera housing. Both chips are
sensitive to pitch and roll movement; neither chip is sensitive to yaw. The advantage
of mounting two chips in this fashion is that it is possible to distinguish between the
backpack being placed face up and face down.
4.2.3 Interpretation
With an iPaq running Familiar Linux v0.6 with Mercury backpack support, the
tilt-meter readings are accessible as files in the /dev filesystem. The file descriptors
provide a useful means of referring to the tilt-meters; we will refer to them
as /dev/backpaq/accel, /dev/backpaq/cam_accel and /dev/backpaq/cam_accel_xyz.
Table 4.1 lists the tilt-meters and the /dev file that each corresponds to.

Table 4.1: Linux file descriptors of the tilt-meters present on the Mercury backpack.

* Simple tilt-meter (on backpack PCB): /dev/backpaq/accel
* Simple tilt-meter (in camera housing): /dev/backpaq/cam_accel
* Compound tilt-meter (in camera housing): /dev/backpaq/cam_accel_xyz
It is important to reiterate here that /dev/backpaq/cam_accel_xyz is not a "three-axis"
tilt-meter. In fact, none of the accelerometers in the Mercury backpack are
able to detect rotation around a vertical axis ('yaw' movement). This is because the
chips that constitute the tilt-meter are only sensitive to their attitude relative to the
earth's gravity vector. Rotation in the yaw plane does not change the relative attitude of
gravity to the accelerometer chips; hence the readings obtained by the tilt-meter do
not change.
Tilt in the pitch and roll planes may be obtained from the tilt-meter X and Y
readings. The X readings obtained from each of the three tilt-meters correspond
to motion about the roll axis of the backpack. In other words, if one were to hold
the backpack flat, facing upwards, the X readings from the tilt-meters change with
rotation about the longitudinal axis of the backpack. Similarly, the Y readings alter
when the pitch of the backpack changes, i.e. when the backpack's nose is tilted up or
down.
The ADXL202E specification states that the output of the accelerometer chips
(and hence the tilt-meters) gives us the component of gravity along each of the axes
of measurement of the chip. However, as explained in Section 4.2.1, these readings
may be scaled or offset by initial calibration factors.

After undoing the effects of offset and scaling, the accelerometer X and Y readings
may be interpreted in units of acceleration due to gravity (g). The tilt-meter readings
may then be converted to actual degrees of pitch and roll via these functions [1]:

Pitch = arcsin(X_reading / 1.0g)
Roll  = arcsin(Y_reading / 1.0g)
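A sketch of this conversion, assuming per-axis offset and scale constants that undo the calibration described in Section 4.2.1; the constant values are placeholders, and the clamp guards asin() against noisy readings slightly outside +/-1g.

    #include <math.h>

    /* Hypothetical calibration constants mapping raw readings into g. */
    #define X_OFFSET 0.0
    #define X_SCALE  1.0
    #define Y_OFFSET 0.0
    #define Y_SCALE  1.0

    /* Convert raw tilt-meter X/Y readings to pitch and roll in degrees. */
    void tilt_to_angles(double raw_x, double raw_y,
                        double *pitch_deg, double *roll_deg)
    {
        double xg = (raw_x - X_OFFSET) * X_SCALE;   /* in units of g */
        double yg = (raw_y - Y_OFFSET) * Y_SCALE;

        /* Clamp to [-1, 1] so sensor noise cannot push asin out of range. */
        if (xg > 1.0) xg = 1.0; else if (xg < -1.0) xg = -1.0;
        if (yg > 1.0) yg = 1.0; else if (yg < -1.0) yg = -1.0;

        *pitch_deg = asin(xg) * 180.0 / M_PI;
        *roll_deg  = asin(yg) * 180.0 / M_PI;
    }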
4.3 Static Experiments
The Mercury backpack tilt-meters are constructed out of Analog Devices ADXL202E[1]
dual-axis accelerometers. We wish to measure the performance of these accelerometers in situ, i.e. exposed to all the software and hardware error that would be experienced by a pervasive application running on the Mercury backpack. Therefore we
conducted the following experiment:

An iPaq in a Mercury backpack was placed flat on a stationary table and accessed
via remote login. Timed sequences of readings were taken from each accelerometer at
a series of sampling frequencies between 10Hz and 1000Hz. Each sequence consists of
1000 readings. The data traces were then transferred to a workstation and analyzed
using a set of Matlab scripts.(1) For each tilt-meter, we sought answers to the following
questions: the presence of extreme measurements, normality of readings, correlation
between X and Y readings, and auto-correlation between consecutive readings. We
also studied how these parameters change as a function of the sampling rate of the
tilt-meter.

(1) The complete set of data and analysis scripts is available online at http://org.lcs.mit.edu/
4.4 Results
4.4.1 Extreme Measurements
Figures 4-2, 4-3 and 4-4 plot statistics of the measurement data, taken while the device
is at rest. Each point represents statistics drawn from a set of 1000 readings taken
at a given sampling frequency. Sampling frequencies vary between 10Hz and 1000Hz.
Each graph is normalized to the average mean and average standard deviation across
all sampling frequencies. From these figures we can see that:

* For /dev/backpaq/accel there are not many extreme measurements. The largest
reading is about 3 sigma away from the mean.

* For the remaining devices, the Y readings exhibit patterns similar to
/dev/backpaq/accel. The X readings alternate between one of two values.
4.4.2 Normality

From Figures 4-5, 4-6 and 4-7, it appears that:

* X and Y readings from the simple backpack tilt-meter (/dev/backpaq/accel)
are approximately normally distributed for a sample size of 1000 readings.

* Y readings from the simple and compound camera tilt-meters (/dev/backpaq/cam_accel
and /dev/backpaq/cam_accel_xyz) are normally distributed. X readings oscillate
between one of two values.
[Figure omitted: normalized X and Y measurement statistics (max, min, mean ± 3σ, scaled mean) vs. sampling frequency in Hz.]

Figure 4-2: A normalized graph of measurement data from the simple backpack tilt-meter (/dev/backpaq/accel), taken while the device is at rest. The mean, standard
deviation and extreme measurements all appear to be independent of sampling frequency. For a set of 1000 readings, the extreme readings lie approximately 3σ away
from the sample mean.
[Figure omitted: normalized X and Y measurement statistics (max, min, mean ± 3σ, scaled mean) vs. sampling frequency in Hz.]

Figure 4-3: A normalized graph of the measurement data from the simple camera tilt-meter (/dev/backpaq/cam_accel), taken while the device is at rest. The sample mean,
standard deviation and extreme readings do not vary significantly with sampling
frequency. For Y readings, the extrema lie approximately 3σ away from the sample
mean. For X readings, the extrema are distributed much closer to the sample mean.
[Figure omitted: normalized X and Y measurement statistics (max, min, mean ± 3σ, scaled mean) vs. sampling frequency in Hz.]

Figure 4-4: A normalized graph of the measurement data from the compound camera tilt-meter (/dev/backpaq/cam_accel_xyz), taken while the device is at rest. The
sample mean, standard deviation and extreme readings do not vary significantly with
sampling frequency. For Y readings, the extrema lie approximately 3σ away from
the sample mean. For X readings, the extrema are distributed much closer to the
sample mean.
[Figure omitted: histograms of X and Y readings at 10Hz and 1000Hz.]

Figure 4-5: A histogram of the raw measurement data from the simple backpack tilt-meter (/dev/backpaq/accel), taken at 10Hz and 1000Hz sampling frequencies. The
distributions are approximately normal.
[Figure omitted: histograms of X and Y readings at 10Hz and 1000Hz.]

Figure 4-6: A histogram of the raw measurement data from the simple camera tilt-meter (/dev/backpaq/cam_accel), taken at 10Hz and 1000Hz sampling frequencies. The
distribution of the Y readings is approximately normal. The X readings oscillate
between one of exactly two values.
[Figure omitted: histograms of X and Y readings at 10Hz and 1000Hz.]

Figure 4-7: A histogram of the raw measurement data from the compound camera tilt-meter (/dev/backpaq/cam_accel_xyz), taken at 10Hz and 1000Hz sampling
frequencies. The distribution of the Y readings is approximately normal. The X
readings oscillate between one of exactly two values.
4.4.3 Correlation
Table 4.2: Variation of X-Y correlation with sampling frequency for each of the
three tilt-meters. The 95% upper and lower bounds are given as well. All of the X-Y
correlations are significantly different from zero.

/dev/backpaq/accel
                          10Hz    20Hz    50Hz    100Hz   200Hz   300Hz   500Hz   1000Hz
95% upper bound           -0.44   -0.44   -0.43   -0.41   -0.48   -0.45   -0.51   -0.48
Measured X-Y correlation  -0.49   -0.49   -0.48   -0.46   -0.53   -0.50   -0.55   -0.53
95% lower bound           -0.54   -0.54   -0.52   -0.51   -0.57   -0.55   -0.59   -0.57

/dev/backpaq/cam_accel
                          10Hz    20Hz    50Hz    100Hz   200Hz   300Hz   500Hz   1000Hz
95% upper bound           0.35    0.38    0.32    0.35    0.33    0.36    0.29    0.38
Measured X-Y correlation  0.29    0.33    0.26    0.29    0.27    0.31    0.23    0.33
95% lower bound           0.24    0.27    0.20    0.23    0.21    0.25    0.17    0.27

/dev/backpaq/cam_accel_xyz
                          10Hz    20Hz    50Hz    100Hz   200Hz   300Hz   500Hz   1000Hz
95% upper bound           0.34    0.32    0.35    0.35    0.35    0.30    0.28    0.39
Measured X-Y correlation  0.28    0.27    0.30    0.29    0.30    0.24    0.22    0.34
95% lower bound           0.22    0.21    0.24    0.24    0.24    0.18    0.16    0.28
Table 4.2 displays the variation in X-Y correlation for all three tilt-meters at a
variety of sampling frequencies. From this table, it can be seen that:

* All tilt-meters exhibit correlations that are significantly different from 0, even
at the 0.05 level.

* The correlations do not vary significantly with sampling frequency.
4.4.4 Serial Autocorrelation
Figure 4-8 is a graph of sample autocorrelations of readings taken from the simple
backpack tilt-meter (/dev/backpaq/accel). The autocorrelation weights were fitted
using a linear model of order 10. From this figure we see that the simple backpack
tilt-meter exhibits oscillatory serial autocorrelation that grows larger as the sampling
frequency increases. In fact:
[Figure omitted: sample autocorrelations vs. lag index (0-10) at several sampling frequencies.]

Figure 4-8: A graph of sample autocorrelations from the simple backpack tilt-meter
(/dev/backpaq/accel), made at different sampling frequencies. The autocorrelations
were calculated using a linear model of order 10. At high sampling frequencies,
measurements are highly autocorrelated.
* All three tilt-meters exhibit an oscillatory serial autocorrelation similar to that
in Figure 4-8. Each reading is negatively correlated with the preceding reading.

* The serial autocorrelation gets dramatically worse as the sampling frequency
increases. Figures 4-9, 4-10 and 4-11 plot the correlation between any two
consecutive readings from a given tilt-meter. From the graphs, it can be seen
that there is a threshold sampling frequency (around 500-600Hz) beyond which
the sample autocorrelations increase dramatically in magnitude.
[Figure omitted: correlation between consecutive X readings and consecutive Y readings vs. sampling frequency in Hz.]

Figure 4-9: A plot of the correlations between (X_n, X_{n-1}) and (Y_n, Y_{n-1}) as a function
of the sampling frequency. As the sampling frequency increases towards 1000Hz, the correlation between consecutive measurements grows in magnitude. At lower sampling
frequencies, the correlation between consecutive measurements is between 0 and -0.2.
[Figure omitted: correlation between consecutive X readings and consecutive Y readings vs. sampling frequency in Hz.]

Figure 4-10: A plot of the correlation between consecutive readings, made at different
sampling frequencies. At most sampling frequencies, the correlation between consecutive
measurements is slightly negative (about -0.1). At sampling frequencies close to 1000Hz,
these negative correlations fall dramatically to around -0.8.
[Figure omitted: correlation between consecutive X readings and consecutive Y readings vs. sampling frequency in Hz.]

Figure 4-11: A plot of the correlation between consecutive readings, made at different
sampling frequencies. As the sampling frequency approaches 1000Hz, the correlation between consecutive measurements grows in magnitude. At lower sampling frequencies,
the correlation between consecutive measurements stays between 0 and -0.2.
4.5 Inferences
Table 4.3: Summary of results from the static analysis of the Mercury backpack tilt-meters.

                              accel           cam_accel       cam_accel_xyz
Extreme measurements          Not seen        Not seen        Not seen
Normality (X)                 yes             unclear         unclear
Normality (Y)                 yes             yes             yes
X-Y Correlation               significant,    significant,    significant,
                              approx. -0.5    approx. 0.3     approx. 0.3
Serial auto-correlation (X)   oscillatory     oscillatory     oscillatory
Serial auto-correlation (Y)   oscillatory     oscillatory     oscillatory
Table 4.3 summarizes the results from Section 4.4.

We found that readings rarely exceeded the 3σ range, and there are no outliers in any
of the data traces. We take this as evidence that any statistical estimation algorithm
we develop can safely ignore the possibility of extreme readings. We also discovered
three unexpected things about static readings from the backpack tilt-meters:
I. X and Y readings are correlated

We expected that the errors in the X and Y readings would be uncorrelated:
the spec-sheet of the ADXL202E chip indicates that the error between X and Y
readings is uncorrelated, and the readings measure tilt in orthogonal
directions, so there is no reason to assume that they would vary together. However,
the X and Y readings from each backpack tilt-meter are not independent, and
the exhibited correlation is statistically significant.
II. Oscillatory auto-correlations

As the sampling frequency of our analysis increases to the point where it is close
to the ADXL202E chip cycle, we expect to see significant auto-correlations in
our readings. This is borne out by our observation that readings taken at speeds
above 500Hz are very highly correlated with their predecessors.

However, what is of interest is the nature of the auto-correlation. As the sampling frequency gets closer to the accelerometer chip cycle, we expect to see
greater and greater positive auto-correlation between readings. Since each
reading depends less on readings further in its past, we expect the auto-correlation
coefficients to form a series of monotonically decreasing positive numbers. This
is not what we observe: the auto-correlation coefficients alternate in sign, and
the entire coefficient series is indicative of an oscillatory process.
III. Anomalous X readings from /dev/backpaq/cam_accel and /dev/backpaq/cam_accel_xyz

We expected that the readings from each device would be normally distributed. In
general we have found this to be true. However, we can cast some doubt on the
accelerometer X readings from /dev/backpaq/cam_accel and /dev/backpaq/cam_accel_xyz.
These readings do not exhibit a Gaussian distribution but instead oscillate
between two values.
The statistically significant X-Y correlations point to incomplete isolation of the
accelerometers from PCB power noise. A rise or fall in the PCB voltage is a likely
cause, driving both the X and Y accelerometer readings up and down at the same time.
We suspect that the oscillatory serial auto-correlations are indicative of a flaw in the
design of the tilt-meter drivers; we hypothesize that the tilt-meter driver contains state
that is not flushed from one reading to the next. The reason behind the anomalous X
readings is as yet unknown to us. Currently /dev/backpaq/accel is the
only tilt-meter that appears to give normally distributed X and Y readings. Therefore
this is the device we have preferentially used for our research.
Chapter 5
Conclusion and Future Work
In this thesis, we have made the following contributions:

* Directional device identification

In Chapter 2 we traced the arguments for a point-and-shout system, beginning with a survey and evaluation of current device identification systems for
the task of directional device identification. We are currently looking to
design and construct these tags. The task of developing APIs and applications
to manage this system is also the object of future work in this area.

* Video-enhanced multi-modal rest detection

In Chapter 3 we constructed a rest-detection system that combines video
data with traditional tilt-meter data for greater robustness. We have found that
this algorithm is an improvement over traditional tilt-meter based approaches,
and we have outlined a potential use for this technology in the domain of position triangulation. Future work for rest-detection would involve the development of more sophisticated algorithms that we can call upon in cases where the
video and tilt data do not agree with each other, the development of automatic
configuration routines, and an improvement in the file-handling behavior of the
implementation.
* Analysis of Mercury backpack tilt-meters

Chapter 4 identifies some anomalous characteristics of the tilt-meter readings
from the Mercury backpack. We believe that further exploration of these results
will lead to significant improvements in the reliability of the Mercury backpack
tilt-meters.
The Stargazer vision itself remains a source of further research challenges. Even
after we accurately identify the device that we are pointing at, tracking
the resulting labeled pixels is hard to do in real-time on a resource-constrained
computer. The semantics and appearance of a "virtual-wire" interface is an open
research area as well. What kinds of wire overlays work best? What does connecting
two devices with a virtual wire "mean"? All of these questions are open for
further investigation.
Bibliography
[1] Analog Devices, Norwood, MA. Analog Devices ADXL202E Dual Axis Accelerometer Datasheet.
[2] Paramvir Bahl and Venkata N. Padmanabhan. RADAR: An in-building RF-based user location and tracking system. In Proceedings of IEEE Infocom,
pages 775-784. IEEE, March 2000.
[3] Joel F. Bartlett. Rock 'n' scroll is here to stay. IEEE Computer Graphics and
Applications, 20(3):40-45, May/June 2000.
[4] Bluetooth Specification Part E. Service Discovery Protocol, November 1999.
[5] J. Borenstein, H.R. Everett, and L. Feng. Navigating Mobile Robots: Systems and
Techniques. A.K. Peters, Wellesley, Mass., 1996. This is also available as a 'Where
am I' report from the University of Michigan.
[6] C. Fennema, A. Hanson, E. Riseman, J. R. Beveridge, and R. Kumar. Model-directed mobile robot navigation. IEEE Transactions on Systems, Man, and
Cybernetics, 20(6):1352-1369, November-December 1990.
[7] Jaap Haartsen, Mahmoud Naghshineh, Jon Inouye, Olaf J. Joeressen, and Warren Allen. Bluetooth: Vision, Goals, and Architecture. ACM Mobile Computing
and Communications Review, 2(4):38-45, October 1998.
[8] Beverly L. Harrison, Kenneth P. Fishkin, Anuj Gujar, Carlos Mochon, and Roy
Want. Squeeze me, hold me, tilt me! An exploration of manipulative user
interfaces. In Proceedings of the Conference on Human Factors in Computing
Systems (CHI '98): Making the Impossible Possible, pages 17-24. ACM Press,
April 18-23 1998.
[9] Ken Hinckley, Jeffrey S. Pierce, Mike Sinclair, and Eric Horvitz. Sensing techniques for mobile interaction. In Proceedings of the ACM Symposium on User
Interface Software and Technology (UIST '00), pages 91-100. ACM Press, 2000.
[10] Ken Hinckley, Mike Sinclair, Erik Hanson, Richard Szeliski, and Matt Conway.
The videomouse: A camera-based multi-degree-of-freedom input device. In Proceedings of the ACM Symposium on User Interface Software and Technology
(UIST '99), Novel Input, pages 103-112, 1999.
[11] iButton Home Page. http://www.ibutton.com.
[12] Tim Kindberg, John Barton, Jeff Morgan, Gene Becker, Debbie Caswell, Philippe
Debaty, Gita Gopal, Marcos Frid, Venky Krishnan, Howard Morris, John Schettino, and Bill Serra. People, places, things: Web presence for the real world. Technical Report HPL-2000-16, Hewlett Packard Laboratories, February 13 2000.
[13] Eric Krotkov. Mobile robot localization using a single image. In Proceedings of
the 1989 IEEE International Conference on Robotics and Automation, pages
978-983. IEEE, 1989.
[14] Beth M. Lange, Mark A. Jones, and James L. Meyers. Insight lab: an immersive team environment linking paper, displays, and data. In Proceedings of the
Conference on Human Factors in Computing Systems (CHI '98), pages 550-557.
ACM Press/Addison-Wesley Publishing Co., 1998.
[15] Larry Matthies and Steven A. Shafer. Error modeling in stereo navigation. IEEE
Journal of Robotics and Automation, RA-3(3):239-248, June 1987.
[16] Allen Miu. Design and Implementation of an Indoor Mobile Navigation System.
M.S. thesis, Massachusetts Institute of Technology, Electrical Engineering and
Computer Science Department, Jan 2002.
[17] Sharon L. Oviatt. Ten myths of multimodal interaction. Communications of the
ACM, 42(11):74-81, February 1999.
[18] Nissanka B. Priyantha, Anit Chakraborty, and Hari Balakrishnan. The Cricket
Location-Support System. In Proceedings of the 6th Annual ACM International
Conference on Mobile Computing and Networking (MobiCom '00), pages 32-43.
ACM Press, August 6-11 2000.
[19] Nissanka B. Priyantha, Allen K. L. Miu, Hari Balakrishnan, and Seth J. Teller.
The cricket compass for context-aware mobile applications. In Proceedings of the
7th Annual ACM International Conference on Mobile Computing and Networking (MobiCom '01), pages 1-14. ACM Press, July 16-21 2001.
[20] Project Mercury. http://crl.research.compaq.com/projects/mercury/.
[21] Jun Rekimoto. Tilting operations for small screen interfaces. In Proceedings of
the ACM Symposium on User Interface Software and Technology (UIST '96),
Papers: Interaction Techniques (TechNote), pages 167-168, 1996.
[22] Jun Rekimoto and Yuji Ayatsuka.
Cybercode: Designing augmented reality
environments with visual tags. In Proceedings of Designing Augmented Reality
Environments 2000 (DARE '00), pages 1-10. ACM Press, April 2000.
[23] RF-id Home Page. http://www.aimglobal.org/technologies/rfid.
[24] A. Schmidt, K. A. Aidoo, A. Takaluoma, U. Tuomela, K. Van Laerhoven, and
W. Van de Velde. Advanced interaction in context. Lecture Notes in Computer
Science, 1707:89-??, 1999.
[25] Itiro Siio, Toshiyuki Masui, and Kentaro Fukuchi. Real-world interaction using
the fieldmouse.
In Proceedings of the 12th annual ACM symposium on User
Interface Software and Technology (UIST '99), pages 113-119. ACM Press, 1999.
[26] David Small and Hiroshi Ishii. Design of spatially aware graspable displays. In
Proceedings of ACM Conference on Human Factors in Computing Systems (CHI
'97), volume 2 of SHORT TALKS: Devices, pages 367-368, 1997.
[27] K. Sugihara. Some location properties for robot navigation using a single camera.
Computer Vision, Graphics and Image Processing, 42:112-129, 1988.
[28] R. Y. Tsai. An efficient and accurate camera calibration technique for 3D machine vision. In Proceedings of IEEE Computer Society Conference on Computer
Vision and Pattern Recognition, (CVPR '86), IEEE Publ.86CH2290-5, pages
364-374. IEEE, June 22-26 1986.
[29] Roy Want, Kenneth P. Fishkin, Anuj Gujar, and Beverly L. Harrison. Bridging
physical and virtual worlds with electronic tags. In Proceedings of the Conference
on Human Factors in Computing Systems (CHI '99), pages 370-377. ACM Press,
May 15-20 1999.
[30] Roy Want, Andy Hopper, Veronica Falcao, and Jonathan Gibbons. The active
badge location system. ACM Transactions on Information Systems, 10(1):91-102, January 1992.
[31] Andy Ward, Alan Jones, and Andy Hopper. A new location technique for the
active office. IEEE Personal Communications, 4(5):42-47, October 1997.
[32] Greg Welch, Gary Bishop, Leandra Vicci, Stephen Brumback, Kurtis Keller, and
D'nardo Colucci. The HiBall tracker: High-performance wide-area tracking for
virtual and augmented environments. In Proceedings of the ACM Symposium on
Virtual Reality Software and Technology (VRST '99), pages 1-10. ACM Press,
December 20-22 2000.
[33] Pierre Wellner. Interacting with Paper on the DigitalDesk. Communications of
the ACM, 36(7):86-96, July 1993.