Presenter Ivan Chiou
• All authors are from Electrical and Computer
Engineering, Carnegie Mellon University
– Zheng Sun, PhD student in CyLab
Mobility Research Center
– Aveek Purohit, Ph.D. candidate
– Raja Bose, Microsoft Silicon Valley,
KarMode LLC
– Pei Zhang, Assistant Research Professor
• Spartacus
– a mobile system that enables spatially-aware neighboring
device interactions with zero prior configuration.
– Uses built-in microphones and speakers
– Uses the Doppler effect to enable an interaction through a pointing
gesture
– Audio-based low-power listening mechanism to trigger
the gesture detection service.
• Experiment
– 90% device selection accuracy within 3m
– lower energy consumption
• Recent research still requires an initial channel of
communication, such as Wi-Fi or Bluetooth
• Spartacus’ key contributions:
– a novel acoustic technique based on the Doppler effect
– a novel undersampling audio signal processing technique
– low-power listening (reduces energy consumption)
without any manual actions from users
– Experimental validation
• How it works:
– A user initiates an interaction by quickly
pointing her mobile phone
towards the target device.
– Neighboring devices perform low-power listening using their
built-in microphones.
• an audio beacon with a short
duration as an initiator
• does not require any extra
– implemented on the Android
mobile platform without extra
• High Resolution Doppler-Shift Detection
– pointing gestures of average users are usually
transient (shorter than 0.5s)
– increases the frequency-domain resolution by 5X compared with
traditional FFT-based approaches
• High-Accuracy Device Selection
– Accurately estimate the peak frequency shifts
– Implement a bandpass audio signal processing
pipeline to filter out high-frequency acoustic noise
• Energy-Efficient Interaction Trigger
– a low-power audio listening protocol to trigger
incoming interaction
• How does Spartacus detect the maximum peak
frequency shift among the candidate
target devices?
– Since the user makes the gesture directly
towards the target device, the target device
observes the maximum Doppler shift
and is therefore selected.
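The selection rule above can be illustrated with the classical Doppler formula for a moving source and a stationary receiver. This is a minimal sketch, not the authors' implementation; the speed of sound (343 m/s), the 20 kHz tone, the 3.4 m/s hand speed, the device names, and the function name `doppler_shift_hz` are illustrative assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def doppler_shift_hz(f0_hz, speed_mps, angle_deg):
    """Shift seen by a stationary receiver when the source moves at
    speed_mps, with angle_deg between the motion and the receiver."""
    radial = speed_mps * math.cos(math.radians(angle_deg))
    observed = f0_hz * SPEED_OF_SOUND / (SPEED_OF_SOUND - radial)
    return observed - f0_hz


# Hypothetical candidates: angle between the pointing direction and each device.
candidates = {"D_A": 0.0, "D_B": 30.0, "D_C": 60.0}
shifts = {dev: doppler_shift_hz(20_000.0, 3.4, ang) for dev, ang in candidates.items()}

# The device the gesture points at sees the largest radial velocity.
target = max(shifts, key=shifts.get)
print(target, shifts)
```

The shift scales with the cosine of the off-axis angle, so devices far from the pointing direction report clearly smaller shifts than the pointed-at device.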
• Deriving Angular Resolution
– Symbols: fA is the observed tone frequency at
device DA, f0 the frequency of the original tone, Fs
the sampling rate, and NFFT the number of FFT
points; the calculated frequency shift is
expressed in terms of FFT points.
• Assume the target device is stationary
during the course of the gesture
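A rough way to connect these symbols to an angular resolution is sketched below. It assumes each FFT point is Fs/NFFT wide, uses the small-velocity approximation shift = f0 * v * cos(angle) / c, and defines resolution as the off-axis angle at which the shift drops by one FFT point. The 20 kHz tone, 3.4 m/s peak speed, and both function names are assumptions for illustration, not the paper's derivation.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def shift_in_fft_points(f_obs_hz, f0_hz, fs_hz, n_fft):
    """Frequency shift expressed as a number of FFT bins of width fs/n_fft."""
    return (f_obs_hz - f0_hz) * n_fft / fs_hz


def angular_resolution_deg(f0_hz, fs_hz, n_fft, peak_speed_mps):
    """Smallest off-axis angle whose shift is one FFT bin below the on-axis shift."""
    f_on_axis = f0_hz * (1 + peak_speed_mps / SPEED_OF_SOUND)
    on_axis_bins = shift_in_fft_points(f_on_axis, f0_hz, fs_hz, n_fft)
    if on_axis_bins <= 1:
        return 90.0  # the whole shift fits in one bin: no useful discrimination
    return math.degrees(math.acos(1 - 1 / on_axis_bins))


full_rate = angular_resolution_deg(20_000.0, 44_100.0, 2048, 3.4)
undersampled = angular_resolution_deg(20_000.0, 44_100.0 / 7, 2048, 3.4)
print(full_rate, undersampled)
```

Under these assumptions the two values land close to the 26.7 and 10 degree figures quoted in these slides, which suggests the model captures the main effect: a lower sampling rate with the same NFFT gives narrower bins and a finer angular resolution.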
• Improving Resolution using
– increasing the original tone
frequency f0
• stronger energy degradation
– increasing the number of FFT
points NFFT
• higher computational burden
– decreasing the sampling rate
• Spartacus at a very high
• Undersampling technique can
significantly reduce it
• Determining Undersampling Parameters
– A higher n
– a higher fL
• Avoided using fL higher than 19KHz since it will cause greater energy degradation
– Commodity Device limits audio sampling rates
• include 8KHz, 16KHz, 32KHz, 44.1KHz, and 48KHz
• only when n=5, 6, or 7 given Fs = 44.1KHz, or when n = 4 given Fs =
• Angular resolution improved from 26.7 degrees to 10 degrees.
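The parameter search above can be sketched as a small enumeration. It assumes fL is the lower edge of the top Nyquist zone that remains after keeping every n-th sample, takes the 19KHz ceiling from the slide, and adds a 17KHz floor as a hypothetical lower bound (chosen here only to keep the band near-inaudible). The function names are illustrative.

```python
def lowest_band_edge_hz(fs_hz, n):
    """Lower edge f_L of the top Nyquist zone after undersampling by n.

    Keeping every n-th sample gives a rate of fs/n, whose zones are fs/(2n)
    wide; the top zone below fs/2 spans [(n-1)/n * fs/2, fs/2].
    """
    return (n - 1) / n * fs_hz / 2


def usable_factors(fs_hz, f_low_min_hz=17_000.0, f_low_max_hz=19_000.0, n_max=16):
    """Undersampling factors whose f_L falls inside the allowed range.

    f_low_min_hz is an assumed floor; f_low_max_hz is the slide's 19KHz limit.
    """
    return [
        n
        for n in range(2, n_max + 1)
        if f_low_min_hz <= lowest_band_edge_hz(fs_hz, n) <= f_low_max_hz
    ]


# Commodity sampling rates listed on the slide.
for fs in (8_000, 16_000, 32_000, 44_100, 48_000):
    print(fs, usable_factors(fs))
```

With these bounds the enumeration yields n = 5, 6, 7 for Fs = 44.1KHz, and the lower sampling rates admit no factor at all because their Nyquist limit is already below the assumed floor.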
• Bandpass Signal Processing Pipeline
– since the new sampling rate is much lower
than the Nyquist rate, aliasing arises in the
original sampled audio signals.
• We found that M = 1.5 led to
robust performance in
various indoor environments.
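Why a bandpass step is needed can be seen from how frequencies fold when every 7th sample of a 44.1KHz recording is kept. The sketch below only computes where a real tone lands after undersampling; it is not the pipeline itself, and the 20 kHz tone and 200 Hz shift are hypothetical values.

```python
def aliased_frequency_hz(f_hz, fs_hz):
    """Frequency at which a real tone at f_hz appears after sampling at fs_hz."""
    folded = f_hz % fs_hz
    return min(folded, fs_hz - folded)


FS_FULL = 44_100.0
N = 7
fs_low = FS_FULL / N  # 6.3 kHz after undersampling

tone = 20_000.0
shifted = tone + 200.0          # hypothetical Doppler-shifted tone
audible_noise = 1_100.0         # ordinary low-frequency sound

alias_tone = aliased_frequency_hz(tone, fs_low)
alias_shifted = aliased_frequency_hz(shifted, fs_low)
alias_noise = aliased_frequency_hz(audible_noise, fs_low)
print(alias_tone, alias_shifted, alias_noise)
```

The Doppler shift survives the folding unchanged, but the 1.1 kHz noise lands on exactly the same frequency as the folded tone. Removing everything outside the tone's band before discarding samples keeps such out-of-band sound from contaminating the result.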
• After each device detects the
Doppler frequency shifts, all
the devices report their
frequency shift to the sender
device, along with the
device’s ID information.
• The sender device then
compares all the received
Doppler shifts and
determines the target device.
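The comparison step on the sender can be sketched as follows. The `ShiftReport` type, its field names, and `select_target` are hypothetical names introduced for illustration; the slides do not describe the actual message format.

```python
from typing import Iterable, NamedTuple, Optional


class ShiftReport(NamedTuple):
    device_id: str    # ID sent back by a candidate device
    shift_hz: float   # peak Doppler shift that device measured


def select_target(reports: Iterable[ShiftReport]) -> Optional[str]:
    """Return the ID of the device reporting the largest shift, or None."""
    best = max(reports, key=lambda r: r.shift_hz, default=None)
    return best.device_id if best is not None else None


reports = [
    ShiftReport("tablet", 120.0),
    ShiftReport("tv", 195.5),
    ShiftReport("laptop", 60.0),
]
print(select_target(reports))
```

Returning `None` for an empty report list lets the caller distinguish "no device responded" from a real selection.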
• Angular Gain through Pointing
– When the number of FFT points is
2048, the finest angular
resolution is 10 degrees, reached when
the undersampling factor n is
equal to 7.
– when candidate devices are
close to the user (i.e. within
3m), the device selection
accuracy is better than the
• This angular change is significant
when the candidate devices DA
and DB are close to D0.
Assuming the user’s arm is 60cm,
the effective angular difference
is increased to 55 degrees, which makes
the two devices much easier to
differentiate.
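The geometric idea behind this angular gain can be sketched in two dimensions. The layout below (two devices 30 degrees apart as seen from the user's body, phone ending 0.6 m ahead of the body along the line to DA) is a hypothetical configuration, so it does not reproduce the 55 degree figure from the slide; it only shows the trend.

```python
import math


def effective_angle_deg(distance_m, separation_deg, arm_m=0.6):
    """Angle between D_A and D_B as seen from the phone at full arm stretch.

    The body is at the origin, D_A lies straight ahead at distance_m, D_B is
    at the same distance but separation_deg off-axis, and the phone ends at
    (arm_m, 0). All of this is an assumed layout.
    """
    rad = math.radians(separation_deg)
    d_b = (distance_m * math.cos(rad), distance_m * math.sin(rad))
    # D_A is on the x-axis ahead of the phone, so its bearing from the phone is 0.
    bearing_b = math.atan2(d_b[1], d_b[0] - arm_m)
    return abs(math.degrees(bearing_b))


near = effective_angle_deg(1.0, 30.0)
far = effective_angle_deg(3.0, 30.0)
print(near, far)
```

In this layout the apparent separation grows well beyond 30 degrees for devices 1 m away and much less for devices 3 m away, consistent with the observation that the gain matters most when candidate devices are close to the user.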
• How is Spartacus designed to save energy?
• Low-Power Audio Listening
– Advantages
• Ubiquitous Hardware Support
– No extra hardware; only microphones and speakers are needed
• Limited Range
– Easy to detect neighboring devices within the same space
• Energy Efficient
– designed for continuous discovery.
– Protocol: two major modes
» Periodic Listening
• wake up (every Trx)
• Record sound for duration (drx).
» Beaconing
• After receiving the beacon, switch to continuous listening mode to record the
• a short beacon duration consumes more energy
• Tradeoff between energy consumption vs. duty cycles
• Encodes the device ID using Reed-Solomon coding
– Using a 16 Frequency Shift-Keying (FSK) scheme with a central frequency at 19KHz.
» Keys are using a 50Hz
» the transmission of the device ID is at least 200Hz lower than the gesture tone, so there are no ambiguities
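The 16-FSK mapping can be sketched as below. It assumes the 50Hz figure is the spacing between adjacent keys and that the 16 keys are placed symmetrically around the 19KHz central frequency; both are assumptions, as are the function names. Reed-Solomon coding is left out, so the input here would be the already-encoded symbols.

```python
CENTER_HZ = 19_000.0   # central frequency from the slide
SPACING_HZ = 50.0      # assumed to be the spacing between adjacent keys
NUM_KEYS = 16          # 16-FSK: one tone per 4-bit symbol


def key_frequency_hz(symbol):
    """Tone for a 4-bit symbol, with the 16 keys centred on CENTER_HZ."""
    if not 0 <= symbol < NUM_KEYS:
        raise ValueError("symbol must be in 0..15")
    return CENTER_HZ + (symbol - (NUM_KEYS - 1) / 2) * SPACING_HZ


def id_to_tones(device_id, num_symbols=4):
    """Split an integer ID into 4-bit symbols (most significant first)."""
    symbols = [(device_id >> (4 * i)) & 0xF for i in reversed(range(num_symbols))]
    return [key_frequency_hz(s) for s in symbols]


print(id_to_tones(0x0F3A))
```

With this layout the highest key sits at 19375 Hz, so a gesture tone at least 200Hz above it would have to be at 19575 Hz or higher.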
• Dealing with Wakeup Jitter
– Jitter is the delay between
when an API call starts recording
sound and when the system
actually begins recording.
– average jitter: 70ms,
standard deviation: 15ms
– empirical measurements to
solve this problem
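One plausible way to turn the measured jitter into a timing margin is sketched below. The jitter mean and standard deviation come from the slide; the three-sigma rule, the example Trx and drx values, and the idea of extending the beacon by the margin are assumptions made for illustration, not the authors' stated design.

```python
JITTER_MEAN_S = 0.070   # average wakeup jitter from the slide
JITTER_STD_S = 0.015    # standard deviation from the slide


def guard_band_s(k_sigma=3.0):
    """Timing margin covering the mean jitter plus k standard deviations."""
    return JITTER_MEAN_S + k_sigma * JITTER_STD_S


def min_beacon_duration_s(t_rx_s, k_sigma=3.0):
    """Beacon length spanning one full wake-up period plus the jitter margin."""
    return t_rx_s + guard_band_s(k_sigma)


def duty_cycle(t_rx_s, d_rx_s):
    """Fraction of time the microphone records in periodic listening."""
    return d_rx_s / t_rx_s


# Hypothetical schedule: wake every 2 s and record for 100 ms.
print(guard_band_s(), min_beacon_duration_s(2.0), duty_cycle(2.0, 0.1))
```

A longer wake-up period lowers the duty cycle on listeners but forces a longer beacon on the sender, which is the tradeoff between energy consumption and duty cycle noted earlier.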
• Dealing with Wakeup Jitter
– due to the existence of the wakeup jitter
an additional guard band
is used in the
• Hardware
– Android platform on Galaxy Tab, Nexus 7, Galaxy
Nexus, and HTC One S.
• Software implementation
– 4 components
• GestureSensing
– GestureSensing.makeGesture();
– GestureSensing.analyzeGesture();
• LowPowerListening
– LPL.start();
• AudioModem
• GUI.
• In Spartacus, we use tone frequencies higher than
20KHz, which are inaudible
– Quantify the energy degradation of sound
• Devices:
– Sennheiser MKE 2P microphone
– Yamaha NX-U10 speaker
• Energy degradation is stronger above 15KHz
– Mobile phone audio hardware is usually designed for human conversation and
music, which lie below 15KHz
– For every 1KHz increase in frequency, the degradation of sound energy increases by
5dB on speakers
– average 3.2dB/m energy decrease of sound from 1m to 6m
• These results indicate that, to reduce
energy degradation and increase
interaction range, audio tones with lower
frequencies should be leveraged.
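The two measured trends can be combined into a back-of-the-envelope loss estimate. This sketch treats both figures as linear in dB (5dB per 1KHz above 15KHz on the speaker, 3.2dB per metre beyond 1m), which is a simplifying assumption; the function names are illustrative.

```python
def extra_loss_db(freq_khz, knee_khz=15.0, db_per_khz=5.0):
    """Additional speaker roll-off above the knee frequency."""
    return max(0.0, freq_khz - knee_khz) * db_per_khz


def distance_loss_db(distance_m, db_per_m=3.2, ref_m=1.0):
    """Loss relative to the level at ref_m, using the average per-metre figure."""
    return max(0.0, distance_m - ref_m) * db_per_m


def total_loss_db(freq_khz, distance_m):
    """Sum of the frequency-dependent and distance-dependent losses."""
    return extra_loss_db(freq_khz) + distance_loss_db(distance_m)


print(total_loss_db(19.0, 3.0), total_loss_db(21.0, 3.0))
```

Under this linear model, moving the tone from 19KHz to 21KHz costs as much as roughly three extra metres of distance, which illustrates why lower tone frequencies extend the interaction range.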
• Challenging questions:
– How diversely do users point their phones, and how fast can a
user point?
– If the user points fast enough, how often does the target device
observe the highest frequency shift, thus the highest velocity, of
the gesture?
– If we want to estimate the frequency shifts, how much
frequency- and time-domain resolution do we need to
successfully capture the peak frequency shift inside of a gesture?
• Participants
– 12 participants (6 females)
– briefed the participants on the idea of Spartacus before the
– 10 gestures towards a target device 2m away from them, using
a Galaxy Nexus phone.
– detected hand trajectories of the participants using image
processing techniques
• Finding 1
– Three types of gesture
– most of the participants fully
stretched out their arms
– Focusing on evaluating this
vertically downward
gesture trajectory in the
current design of Spartacus.
• Finding 2
– Facing towards the target device, with an
average angular bias of ±7.5 degrees.
– precisely point the phones towards the target
– selecting the target device using the
maximum velocity
• Finding 3
– The peak velocity of the gestures of all
participants was 3.4m/s on average
– Most of the gestures lasted less than one
second, and the peak velocities appeared and
diminished within 25ms.
– Spartacus needs a high time-domain
resolution to position the peak frequency
• Pointed a Galaxy Nexus phone 25 times towards the
target device
• A peak velocity of about 3m/s.
• Selected 20 of the 25 gestures for analysis.
• Audio captured at the two candidate devices at
44.1KHz, undersampled by a factor of 7 to 6.3KHz
• Performance with Distances and Angles
– As the distances between devices increase, the
device selection accuracy drops gradually
• Since tones and other frequency bands decreases as
the distances increase
– as decreases, the accuracy of device selection
• Evaluation with music and metal sounds
– Played a piece of rock music (“Burn It Down”
by Linkin Park)
– Metal clangs can hardly reach frequencies above
18KHz, so they have limited effect on Spartacus.
• Limited space in these scenarios
– Only tested up to 1.5m at 30 degrees.
• As distance increases, the
performance decreases slightly due
to the stronger multi-path effects
in the cubicles and hallway.
• All three cases achieved higher
than 85% accuracy.
• Spartacus: 2048-point FFT processing
– takes 1.5s to process a 1s gesture
• traditional FFT: 8192-point FFT processing
– takes 8.7s
• compare the performance
under different duty cycles
– fixed each listening session to
• Hardware
– Galaxy Nexus mobile phones
• Each test time
– running low-power listening task
for 5min
• Result
– 4X lower energy consumption
than WiFi Direct
– 5.5X lower than the latest
Bluetooth 4.0 protocols
• Audio Processing in Mobile Sensing
– Microphones on Mobile sensing
• Miluzzo
– human conversation snippets for analyzing social activities
• SurroundSense
– combined with other sensing modalities
» accelerometers, cameras, and magnetometers to detect locations of users
for social context inferences
• Lu
– unknown social events can be automatically identified and easily labeled
– Microphones on Energy-efficient
• JigSaw and Darwin Phones
– enabling energy-efficient continuous sensing and collaborative learning
• MoVi
– multiple participants to create integrated social event records
• SwordFight
– Provide distance ranging technique using time difference of sound arrivals
• Spatially-Aware Device Interactions
– Point & Connect (P&C) proposed an interaction technique based
on time difference of sound arrivals.
• Enabling P&C may prevent the users from using their default WiFi
• Requires launching the related service and continuously waiting for interaction
– consume significant energy.
– SoundWave
• Single-device interactions
– Because the laptop is both the transmitter and the receiver of the tone, the
generated Doppler frequency shift is doubled.
• No extra infrastructure and no extra effort from users to initiate
• only supports devices in stationary placements
– Polaris
• Support spatially-aware indoor device interactions
• dealt with only absolute directional relationships of devices
• Energy-Efficient Interaction Triggers
– Be enabled on demand when the energy constraint is not a major
– Triggered by other traditional communication schemes, such as
Bluetooth or WiFi Direct.
• This would remove the need for the user to wait a couple of seconds for a “warmup
beacon” before doing the gesture in Spartacus
• Security Issues
– A malicious device nearby could pretend to have detected
higher Doppler shifts than other devices, deceiving the sender
into thinking it was the receiver.
– Only trusted and authenticated devices could be allowed to report their
Doppler shifts.
• After the user’s device determines the potential receiver who has reported the
maximal Doppler shifts, the name and identity of receiver’s owner would be
shown on the user’s device.
• Contentions Among Interaction Sessions
– Used in a crowded scenario (e.g., an airport)
• contentions could be an issue for device pairing techniques
– Need a contention coordination mechanism
• Spartacus, a spatially-aware interaction system
– High accuracy
– Low latency
– Low energy consumption
– No extra hardware
– Zero prior configuration
– Usable in various noisy conditions
– Experimental evaluations of Spartacus
• The paper only documents the initiating
gesture in its experiments. What about
detecting other gestures, so that the receiver can
recognize different meanings from senders?
• If many children and adults of
different heights stand close together in a
crowded scenario, how could the system
separate the tallest and the shortest among all
selection targets?

Spartacus_Spatially_Aware Interaction for Mobile Devices