Presenter: Ivan Chiou
• All authors are from Electrical and Computer Engineering, Carnegie Mellon University
  – Zheng Sun, PhD student in CyLab Mobility Research Center
  – Aveek Purohit, Ph.D. candidate
  – Raja Bose, Microsoft Silicon Valley, KarMode LLC
  – Pei Zhang, Assistant Research Professor
• Spartacus: a mobile system that enables spatially-aware interactions between neighboring devices with zero prior configuration
  – Uses built-in microphones and speakers
  – Uses the Doppler effect to enable interaction through a pointing gesture
  – Uses an audio-based low-power listening mechanism to trigger the gesture detection service
• Experiments
  – 90% device selection accuracy within 3m
  – Lower energy consumption
• Recent research still requires an initial communication channel such as Wi-Fi or Bluetooth
• Spartacus' key contributions:
  – A novel acoustic technique based on the Doppler effect
  – A novel undersampling audio signal processing pipeline
  – Low-power listening that reduces energy consumption and requires no manual action from users
  – Experimental validation
• How it works:
  – A Spartacus user initiates an interaction by quickly pointing her mobile phone towards the target device.
  – Devices perform low-power listening using their built-in microphones.
    » A short audio beacon acts as the initiator
    » No extra hardware is required
  – Implemented on the Android mobile platform without extra hardware
• High-Resolution Doppler-Shift Detection
  – Pointing gestures of average users are usually transient (shorter than 0.5s)
  – Increases frequency-domain resolution 5X over traditional FFT-based approaches
• High-Accuracy Device Selection
  – Accurately estimates the peak frequency shifts
  – Implements a bandpass audio signal processing pipeline to suppress high-frequency acoustic noise
• Energy-Efficient Interaction Trigger
  – A low-power audio listening protocol triggers incoming interactions
• How does Spartacus detect the maximum peak frequency shift among the candidate target devices?
  – Since the user gestures towards the target device, the target device observes the maximum Doppler shift and is selected.
• Deriving Angular Resolution
  – Symbols: fA is the observed tone frequency at device DA, f0 the frequency of the original tone, Fs the sampling rate, and NFFT the number of FFT points; the calculated frequency shift is expressed in terms of FFT points.
  – Assumes the target device is stationary during the gesture
• Improving Resolution using Undersampling: three options
  – Increase the original tone frequency f0
    » Drawback: stronger energy degradation
  – Increase the number of FFT points NFFT
    » Drawback: higher computational burden
  – Decrease the sampling rate Fs
    » Spartacus uses a very high tone frequency (18KHz), which normally requires a high sampling rate
    » The undersampling technique can significantly reduce the sampling rate
• Determining Undersampling Parameters
  – A higher undersampling factor n goes with a higher fL
    » fL above 19KHz is avoided because it causes greater energy degradation
  – Commodity devices limit the audio sampling rates to 8KHz, 16KHz, 32KHz, 44.1KHz, and 48KHz
    » Valid choices exist only when n = 5, 6, or 7 given Fs = 44.1KHz, or when n = 4 given Fs = 48KHz
  – Angular resolution improves from 26.7 degrees to 10 degrees
• Bandpass Signal Processing Pipeline
  – Since the new sampling rate is much lower than the Nyquist rate, aliasing arises when the original audio signal is undersampled
  – We found that M = 1.5 led to robust performance in various indoor environments.
• After each device detects its Doppler frequency shift, all devices report their shifts to the sender device along with their device IDs.
• The sender device then compares all received Doppler shifts and determines the target device.
• Angular Gain through Pointing Gestures
  – With 2048 FFT points, the smallest angular resolution is 10 degrees when the undersampling factor n is 7.
  – When candidate devices are close to the user (i.e. within 3m), device selection accuracy is better than the analysis predicts.
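The resolution argument and the bandpass undersampling pipeline above can be illustrated with a short Python/NumPy sketch. It is an illustrative reconstruction under stated assumptions, not the authors' implementation: the speed of sound (343 m/s), the standard moving-source Doppler formula, the brick-wall FFT-mask bandpass filter, the Hann window, the 17.5 to 18.8KHz passband, and every function name here are assumptions. The shift in FFT points is taken as (fA − f0)·NFFT/Fs. With these assumed inputs the computed angular resolutions will not exactly reproduce the quoted 26.7 and 10 degrees, which depend on the paper's own parameters.

```python
import numpy as np

C_SOUND = 343.0  # speed of sound in air (m/s); assumed room temperature

def doppler_shift(f0, v, theta_deg=0.0, c=C_SOUND):
    """Shift (Hz) seen by a stationary receiver when the source moves at v (m/s)
    at angle theta_deg away from the line joining source and receiver."""
    v_r = v * np.cos(np.radians(theta_deg))  # radial velocity component
    return f0 * c / (c - v_r) - f0

def bin_width(fs, n_fft):
    """Frequency resolution (Hz) of an n_fft-point FFT at sampling rate fs."""
    return fs / n_fft

def shift_in_bins(f_observed, f0, fs, n_fft):
    """Frequency shift expressed in FFT points."""
    return (f_observed - f0) * n_fft / fs

def angular_resolution_deg(f0, v, fs, n_fft, c=C_SOUND):
    """Smallest off-axis angle whose shift is one FFT bin below the on-axis
    shift (first-order Doppler approximation: shift ~ f0 * v * cos(theta) / c)."""
    max_shift = f0 * v / c
    return float(np.degrees(np.arccos(1.0 - bin_width(fs, n_fft) / max_shift)))

def bandpass(x, fs, f_lo, f_hi):
    """Brick-wall bandpass by zeroing FFT bins (a simple stand-in for a real filter)."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    spec[(freqs < f_lo) | (freqs > f_hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))

def alias_to_true(f_alias, fs_new, zone):
    """Map a peak in [0, fs_new/2] back to Nyquist zone `zone`; odd zones are mirrored."""
    half = fs_new / 2.0
    return zone * half + f_alias if zone % 2 == 0 else (zone + 1) * half - f_alias

def undersample_peak(x, fs, n, f_lo, f_hi, n_fft=2048):
    """Bandpass around the tone, keep every n-th sample, return the true peak frequency."""
    fs_new = fs / n
    zone = int(f_lo // (fs_new / 2.0))
    if zone != int(f_hi // (fs_new / 2.0)):
        raise ValueError("passband must lie inside a single Nyquist zone")
    y = bandpass(x, fs, f_lo, f_hi)[::n]  # no anti-aliasing lowpass: folding is intended
    seg = y[:n_fft] * np.hanning(min(n_fft, len(y)))
    mag = np.abs(np.fft.rfft(seg, n=n_fft))
    return alias_to_true(np.argmax(mag) * fs_new / n_fft, fs_new, zone)

if __name__ == "__main__":
    fs = 44100
    t = np.arange(int(0.5 * fs)) / fs
    x = np.cos(2 * np.pi * 18100 * t)  # 18KHz tone with a +100Hz Doppler shift
    print(undersample_peak(x, fs, n=7, f_lo=17500, f_hi=18800) - 18000)
    for rate in (44100, 6300):
        print(rate, angular_resolution_deg(18000, 3.0, rate, 2048))
```

Keeping every 7th sample of a 44.1KHz stream gives 6.3KHz, so the same 2048-point FFT has bins 7 times narrower (about 3.1Hz instead of 21.5Hz). The bandpass step matters because, without it, noise from every other Nyquist zone would fold onto the tone.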
• The angular change gained through the pointing gesture is significant when the candidate devices DA and DB are close to D0. Assuming the user's arm is 60cm long, the effective angular difference increases to 55 degrees, which makes the two devices much easier to differentiate.
• How is Spartacus designed to save energy?
• Low-Power Audio Listening
  – Advantages
    » Ubiquitous hardware support: no extra hardware, only microphones and speakers
    » Limited range: makes it easy to detect neighboring devices within the same space
    » Energy efficient: designed for continuous discovery
  – Protocol: two major modes
    » Periodic listening: wake up every Trx and record sound for a duration drx
    » Beaconing: after receiving the beacon, a device switches to continuous listening mode to record the gesture
  – A shorter beacon duration requires listeners to wake up more often, which consumes more energy
  – Tradeoff between energy consumption and duty cycle
• Encodes the device ID using Reed-Solomon coding
  – Uses a 16-ary Frequency Shift Keying (16-FSK) scheme with a central frequency of 19KHz
    » Keys are spaced 50Hz apart
    » The device ID is transmitted at least 200Hz below the gesture tone, so there is no ambiguity
• Dealing with Wakeup Jitter
  – Jitter is observed between when an API call starts recording sound and when the system actually begins recording
  – Average jitter: 70ms; standard deviation: 15ms
  – Empirical measurements are used to address this problem
  – Because of the wakeup jitter, an additional guard band is added to the beacons
• Hardware
  – Android platform on Galaxy Tab, Nexus 7, Galaxy Nexus, and HTC One S
• Software implementation: 4 components
  – GestureSensing
    » GestureSensing.makeGesture();
    » GestureSensing.analyzeGesture();
  – LowPowerListening
    » LPL.start();
  – AudioModem
  – GUI
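The device-ID modem can be illustrated with a minimal 16-FSK sketch in Python/NumPy. It assumes the 16 tones sit symmetrically around the 19KHz center at 50Hz spacing (18625 to 19375Hz), one 4-bit nibble per symbol, and 20ms symbols (one period of the 50Hz spacing, which makes the tones orthogonal over a symbol at 44.1KHz). Reed-Solomon coding, symbol synchronization, and the jitter guard band are left out. `modulate` and `demodulate` are hypothetical names, not the API of Spartacus' AudioModem component.

```python
import numpy as np

FS = 44100               # sampling rate (Hz)
CENTER_HZ = 19000.0      # FSK central frequency from the slides
SPACING_HZ = 50.0        # tone spacing from the slides
N_TONES = 16             # 16-FSK: each symbol carries one 4-bit nibble
SYMBOL_LEN = int(round(FS / SPACING_HZ))  # 882 samples = 20 ms, one period of the spacing

# Assumed layout: tones placed symmetrically around the central frequency.
TONES = CENTER_HZ + (np.arange(N_TONES) - (N_TONES - 1) / 2.0) * SPACING_HZ
T = np.arange(SYMBOL_LEN) / FS  # time axis of one symbol

def modulate(payload: bytes) -> np.ndarray:
    """Send each byte as two symbols (high nibble first), one tone per symbol."""
    nibbles = [n for b in payload for n in (b >> 4, b & 0x0F)]
    if not nibbles:
        return np.zeros(0)
    return np.concatenate([np.sin(2 * np.pi * TONES[k] * T) for k in nibbles])

def demodulate(signal: np.ndarray) -> bytes:
    """Correlate every symbol slot with all 16 tones and keep the strongest."""
    refs = np.exp(-2j * np.pi * np.outer(TONES, T))  # shape (16, SYMBOL_LEN)
    nibbles = [int(np.argmax(np.abs(refs @ signal[i:i + SYMBOL_LEN])))
               for i in range(0, len(signal) - SYMBOL_LEN + 1, SYMBOL_LEN)]
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))

if __name__ == "__main__":
    device_id = b"\x2a\x07"
    noise = 0.3 * np.random.default_rng(0).standard_normal(4 * SYMBOL_LEN)
    print(demodulate(modulate(device_id) + noise) == device_id)
```

Choosing the symbol length as the reciprocal of the tone spacing is what lets a plain correlation pick the right tone: every other candidate correlates to (numerically) zero over one symbol.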
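The duty-cycle tradeoff in the low-power listening protocol can be made concrete with a simple timing model. The model is an assumption for illustration, not the paper's analysis: a listener wakes every Trx, records for drx, and must hear at least `overlap` seconds of the beacon to detect it. Under that model a beacon of length Trx − drx + 2·overlap is always heard, whatever its start time, so a shorter wake-up period (higher duty cycle, more energy) permits a shorter beacon. The 200ms listening session matches the evaluation setup; the 50ms overlap and the jitter guard of mean plus three standard deviations (70 + 3×15 = 115ms) are assumed choices, and all function names are hypothetical.

```python
import math

def duty_cycle(t_rx: float, d_rx: float) -> float:
    """Fraction of time the microphone is recording."""
    return d_rx / t_rx

def min_beacon_s(t_rx: float, d_rx: float, overlap: float, guard: float = 0.0) -> float:
    """Shortest beacon (s) that overlaps some window [k*t_rx, k*t_rx + d_rx] by at
    least `overlap` seconds for any start time; `guard` pads for wakeup jitter."""
    assert 0 < overlap <= d_rx <= t_rx
    return t_rx - d_rx + 2 * overlap + guard

def heard_overlap(start: float, length: float, t_rx: float, d_rx: float) -> float:
    """Longest overlap between a beacon [start, start + length) and any listening window."""
    best = 0.0
    k = math.floor(start / t_rx)  # earlier windows ended before the beacon started
    while k * t_rx < start + length:
        lo = max(start, k * t_rx)
        hi = min(start + length, k * t_rx + d_rx)
        best = max(best, hi - lo)
        k += 1
    return best

if __name__ == "__main__":
    d_rx, overlap = 0.200, 0.050      # 200 ms sessions; 50 ms overlap is assumed
    guard = 0.070 + 3 * 0.015         # mean jitter + 3 sigma (an assumed choice)
    for t_rx in (0.5, 1.0, 2.0, 4.0):
        print(f"T_rx={t_rx:.1f}s duty={duty_cycle(t_rx, d_rx):.0%} "
              f"beacon>={min_beacon_s(t_rx, d_rx, overlap, guard):.3f}s")
```

The worst case is a beacon that starts just too late to give the current window enough overlap; it must then stretch across the sleep gap (Trx − drx) and still cover `overlap` seconds of the next window.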
• In Spartacus, we use tone frequencies higher than 20KHz so that they are inaudible
  – Quantify the energy degradation of sound
• Devices:
  – Sennheiser MKE 2P microphone
  – Yamaha NX-U10 speaker
• Energy degradation above 15KHz
  – Mobile phones are usually designed for human conversation and music, which lie below 15KHz
  – For every 1KHz increase in frequency, the degradation of sound energy on speakers increases by 5dB
  – On average, sound energy decreases by 3.2dB/m from 1m to 6m
• These results indicate that, to reduce energy degradation and increase interaction range, audio tones with lower frequencies should be used.
• Challenging questions:
  – How diversely do users point their phones, and how fast can a user point?
  – If the user points fast enough, how often does the target device observe the highest frequency shift, and thus the highest velocity, of the gesture?
  – To estimate the frequency shifts, how much frequency- and time-domain resolution is needed to capture the peak frequency shift inside a gesture?
• Participants
  – 12 participants (6 females)
  – The participants were briefed on the idea of Spartacus before the experiment
  – Participants made 10 gestures towards a target device 2m away from them, using a Galaxy Nexus phone
  – Hand trajectories of the participants were tracked using image processing techniques
• Finding 1
  – Three types of gestures were observed
  – Most participants fully stretched out their arms
  – The current design of Spartacus focuses on the vertically downward gesture trajectory
• Finding 2
  – Participants faced the target device, with an average angular bias of ±7.5 degrees
  – Users can point their phones precisely towards the target device
  – So the target device can be selected using the maximum velocity
• Finding 3
  – The peak velocity of the gestures across all participants was 3.4m/s on average
  – Most gestures lasted less than one second, and the peak velocities appeared and diminished within 25ms
  – Spartacus therefore needs high time-domain resolution to locate the peak frequency shifts
• A Galaxy Nexus phone was pointed 25 times towards the target device
  – Peak velocity of about 3m/s
  – 20 of the 25 gestures were selected for analysis
  – Audio was captured at the two candidate devices at 44.1KHz and undersampled 7 times to 6.3KHz
• Performance with Distances and Angles
  – As the distance between devices increases, device selection accuracy drops gradually
    » because the energy difference between the tones and other frequency bands decreases as the distance increases
  – As the angle between the candidate devices decreases, device selection accuracy drops
• Evaluation with metal sounds and music
  – Played a piece of rock music ("Burn It Down" by Linkin Park)
  – Metal clangs rarely reach frequencies above 18KHz, so they have limited effect on Spartacus
• Space was limited in these scenarios
  – Tests only went up to 1.5m at 30 degrees
• As distance increases, performance decreases slightly due to stronger multipath effects in the cubicles and hallway.
• In all three cases, Spartacus achieved higher than 85% accuracy.
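To see why accuracy falls with distance and why lower tone frequencies are preferred, the two average trends reported earlier (about 5dB of extra speaker degradation per KHz above 15KHz, and about 3.2dB/m of loss between 1m and 6m) can be combined into a back-of-the-envelope loss budget. Adding the two losses linearly in dB is a simplification assumed here for illustration, not a model from the paper, and the function name is hypothetical.

```python
DB_PER_KHZ = 5.0      # extra speaker degradation per KHz above the knee (reported average)
KNEE_HZ = 15000.0     # phones are tuned for content below ~15KHz
DB_PER_M = 3.2        # average decrease per metre, measured from 1m to 6m
MIN_M, MAX_M = 1.0, 6.0

def tone_loss_db(freq_hz: float, dist_m: float) -> float:
    """Assumed linear loss (dB) relative to a sub-15KHz tone heard at 1m."""
    if not MIN_M <= dist_m <= MAX_M:
        raise ValueError("the 3.2dB/m trend was only measured between 1m and 6m")
    freq_loss = DB_PER_KHZ * max(0.0, (freq_hz - KNEE_HZ) / 1000.0)
    dist_loss = DB_PER_M * (dist_m - MIN_M)
    return freq_loss + dist_loss

if __name__ == "__main__":
    for f in (18000.0, 19000.0, 20000.0):
        row = ", ".join(f"{d:.0f}m: {tone_loss_db(f, d):.1f}dB" for d in (1.0, 3.0, 6.0))
        print(f"{f / 1000:.0f}KHz -> {row}")
```

Under this rough budget, moving a tone from 18KHz to 20KHz costs about as much (10dB) as roughly three extra metres of distance, which is consistent with the preference for lower tone frequencies.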
• Spartacus: 2048-point FFT processing
  – Takes 1.5s to process a 1s gesture
• Traditional approach: 8192-point FFT processing
  – Takes 8.7s
• Comparing performance under different duty cycles
  – Each listening session is fixed at 200ms
• Hardware: Galaxy Nexus mobile phones
• Each test ran the low-power listening task for 5min
• Results
  – 4X lower energy consumption than WiFi Direct
  – 5.5X lower than the latest Bluetooth 4.0 protocols
• Audio Processing in Mobile Sensing
  – Microphones for mobile sensing
    » Miluzzo: human conversation snippets for analyzing social activities
    » SurroundSense: combines audio with other sensing modalities (accelerometers, cameras, and magnetometers) to detect user locations for social context inference
    » Lu: unknown social events can be automatically identified and easily labeled
  – Microphones for energy-efficient sensing
    » JigSaw and Darwin Phones: enable energy-efficient continuous sensing and collaborative learning techniques
    » MoVi: uses multiple participants to create integrated social event records
    » SwordFight: provides a distance ranging technique using the time difference of sound arrivals
• Spatially-Aware Device Interactions
  – Point & Connect (P&C) proposed an interaction technique based on the time difference of sound arrivals
    » Enabling P&C may prevent users from using their default WiFi networks
    » The related service must be launched and continuously wait for interaction requests, which consumes significant energy
  – SoundWave
    » Single-device interactions
    » The laptop is both the transmitter and the receiver of the Doppler effect, so the generated frequency shift is doubled
  – PANDAA
    » No extra infrastructure and no extra effort from users to initiate interactions
    » Only supports devices in stationary placements
  – Polaris
    » Supports spatially-aware indoor device interactions
    » Deals only with absolute directional relationships between devices
• Energy-Efficient Interaction Triggers
  – Can be enabled on demand when energy is not a major concern
  – Can be triggered by other traditional communication schemes, such as Bluetooth or WiFi Direct
• To solve this, in Spartacus the user has to wait a couple of seconds for a "warmup beacon" before making the gesture
• Security Issues
  – A malicious device standing close by could pretend to have detected a higher Doppler shift than the other devices, deceiving the sender into treating it as the intended receiver.
  – Only trusted and authenticated devices should be allowed to report their Doppler shifts.
  – After the user's device determines the potential receiver that reported the maximal Doppler shift, the name and identity of the receiver's owner are shown on the user's device.
• Contention Among Interaction Sessions
  – In crowded scenarios (e.g. an airport), contention could be an issue for device pairing techniques
  – A contention coordination mechanism is needed
• Spartacus, a spatially-aware interaction system
  – High accuracy
  – Low latency
  – Low energy consumption
  – No extra hardware
  – Zero prior configuration
  – Usable in various noisy conditions
  – Experimental evaluations of Spartacus' performance
• The paper only documents the initial pointing gesture in its experiments. What about detecting other gestures, so that the receiver can recognize different meanings from senders?
• If many children and adults of different heights stand close together in a crowded scenario, how could the system distinguish the tallest and the shortest among all selection targets?
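The Security Issues countermeasure (only trusted, authenticated devices may report Doppler shifts), combined with the sender's max-shift selection rule, can be sketched with Python's standard `hmac` module. Assumptions: each trusted device shares a secret key with the sender (key provisioning is not described in the slides), reports are authenticated with HMAC-SHA256, and replay protection such as a per-session nonce is omitted. `ShiftReport`, `sign_report`, and `select_target` are hypothetical names. Authentication only filters out outsiders: a trusted device holding a valid key could still over-report its shift.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

@dataclass(frozen=True)
class ShiftReport:
    device_id: str
    shift_hz: float  # peak Doppler shift measured by the reporting device
    tag: bytes       # HMAC-SHA256 over (device_id, shift_hz)

def _message(device_id: str, shift_hz: float) -> bytes:
    # Fixed text encoding so both sides authenticate exactly the same bytes.
    return f"{device_id}|{shift_hz:.3f}".encode()

def sign_report(device_id: str, shift_hz: float, key: bytes) -> ShiftReport:
    """Run on the reporting device: attach a MAC computed with its shared key."""
    tag = hmac.new(key, _message(device_id, shift_hz), hashlib.sha256).digest()
    return ShiftReport(device_id, shift_hz, tag)

def select_target(reports: Iterable[ShiftReport], keys: Dict[str, bytes]) -> Optional[str]:
    """Run on the sender: ID of the trusted device reporting the largest shift, else None."""
    best: Optional[ShiftReport] = None
    for r in reports:
        key = keys.get(r.device_id)
        if key is None:
            continue  # unknown device: its report is ignored
        expected = hmac.new(key, _message(r.device_id, r.shift_hz), hashlib.sha256).digest()
        if not hmac.compare_digest(expected, r.tag):
            continue  # forged or tampered report
        if best is None or r.shift_hz > best.shift_hz:
            best = r
    return best.device_id if best is not None else None

if __name__ == "__main__":
    keys = {"tablet": b"key-1", "laptop": b"key-2"}
    reports = [sign_report("tablet", 120.0, keys["tablet"]),
               sign_report("laptop", 80.0, keys["laptop"]),
               ShiftReport("intruder", 500.0, b"\x00" * 32)]  # unauthenticated claim
    print(select_target(reports, keys))
```

`hmac.compare_digest` is used instead of `==` so the tag comparison does not leak timing information.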