[Lee 2013]= Viewgraphs

advertisement
TMSG
SocioPhone: Everyday Face-to-Face
Interaction Monitoring Platform Using
Multi-Phone Sensor Fusion
實驗室: 先進網路技術與服務實驗室
報告者: 黃福銘 (Angus F.M. Huang)
2013.09.04
Publication
• MobiSys 2013
• Session 8: Behavior and Activity Recognition
• NuActiv: Recognizing Unseen New Activities Using Semantic AttributeBased Learning
• SocioPhone: Everyday Face-To-Face Interaction Monitoring Platform
Using Multi-Phone Sensor Fusion
• MoodScope: Building a Mood Sensor from Smartphone Usage Patterns
• Auditeur: A Mobile-Cloud Service Platform for Acoustic Event Detection
on Smartphones
Angus F.M. Huang
2
Authors
• Youngki Lee, Chulhong Min, Chanyou Hwang, Jaeung Lee, Inseok
Hwang,Younghyun Ju, Chungkuk Yoo, Miri Moon, Uichin Lee, Junehwa
Song
•
•
•
•
•
School of Information Systems, Singapore Management University
Computer Science Department, KAIST,
Web Science and Technology Division, KAIST
Center for Mobile Software Platform, KAIST
Knowledge Service Engineering Department, KAIST
– http://www.kaist.edu/edu.html
– (韓國科學技術院)
Angus F.M. Huang
3
• Face-to-face interaction monitoring
– Interaction-aware applications
– Group conversations
– Useful meta-linguistic contexts of conversation
– Online turn monitoring
– SocioTherapist, SocioDigest, Tug-of-War
Abstract
Angus F.M. Huang
4
Outline
•
•
•
•
•
•
INTRODUCTION
SOCIOPHONE API And APPLICATIONS
IN-SITU TURN MONITORING
PLATFORM IMPLEMENTATION
EXPERIMENTS
CONCLUSION
Angus F.M. Huang
5
Introduction
• Face-to-face social interaction
• Interaction-aware applications
– Helps a user remember the name of the person
– Developers do not know which contexts to leverage during daily
conversations
• Communicative cues
– Verbal cues
• Spoken words and sentences
– Aural cues
• Tones, pitch
– Visual cues
• Gesture, eye contact
Angus F.M. Huang
6
Introduction
• Conversational turns
– The basic unit of conversation
– A turn is a continuous speech segment where a
person starts and ends her speech
• Basic aspects of a conversation
– Length, speed, participant,…
• High-level social inference
– Role, problem,…
• It will improve collaborative decision
psychological care, content analysis,…
Angus F.M. Huang
making,
7
Challenges
• Mobile environment challenges
– Unconstrained acoustic situations
– Real-time monitoring
– Battery limitations
• Short-lasting turns (1-2 seconds)
• Long speech segments (3-8 seconds)
• Dynamic ambient noises
• Significant power consuming for high-rate sound
sensing and heavy computation
Angus F.M. Huang
8
SocioPhone API and Applications
• Meta-Linguistic Interaction Monitoring
• SocioPhone API
• Example Applications on SocioPhone
• Online conversation monitoring platform
– Meta-linguistic conversational contexts
Angus F.M. Huang
9
Online turn segmentation and metalinguistic conversation monitoring
Angus F.M. Huang
10
Meta-Linguistic Interaction Monitoring
• Online turn segmentation
– Execute online turn segmentation using smartphones
– Turn : (speaking person, start time, end time)
• Meta-linguistic conversation monitoring
– Track non-verbal elements during conversations
– Turn features
• Individual participant
– Speaking length, number of turns, duration statistics
• Relations among participants
– Turn taking orders, pair-wise turn-taking frequencies
• Whole interaction session
– Duration of speaking and non-speaking turns
– Prosodic features
• Pitch, energy, loudness, rhythm, formant, bandwidth, spectrum intensity
Angus F.M. Huang
11
SocioPhone API:
Monitoring sessions and turns
• registerSessionStartListener()
– start/end of a conversation
– join/leave of a participant
– Session table
• registerTurnChangeListener()
– provides the Turn information continuously upon each
turn-taking event
– Turn table
Angus F.M. Huang
12
SocioPhone API:
Monitoring meta-linguistic interactions
• enableProsodicFeatures()
– retrieve rich prosodic features associated with each turn
• getSparsity()
– returns how far the speaking turns are separated by nonspeaking turns
• registerDominanceListener()
– encapsulates complex social inference to find someone
with dominance over the conversation
Angus F.M. Huang
13
SocioPhone API:
Querying interaction history
• getOnGoingSessionHistory()
– query the ongoing session
• getPastInteractionHistory()
– query completed sessions
• Example queries
– “How many turns has John taken within last 10 minutes”
– “Which three friends has John spoken to the most this week?”
– a conventional SQL interface
Angus F.M. Huang
14
Angus F.M. Huang
15
Angus F.M. Huang
16
SocioPhone Application:
SocioTherapist
• Speech therapy sessions for autistic children
– Therapist often employs a stimulus
– Upon a successful response
– Reinforced with small rewards
• SocioTherapist App
– Mimic stimuli and reinforcements
– A callback is triggered for every turn-taking event
– Looks for initiations, long-lasting turns, and rapid responses
• Well-known robotic characters for children gradually
upgraded upon desirable turn-takings
Angus F.M. Huang
17
SocioPhone Application:
SocioDigest
•
•
•
(a) Cumulative conversation time within the user’s social circle
(b) Relative per-person talking times in a session
(c) Relative number of turns exchanged in a session
Angus F.M. Huang
18
SocioPhone Application:
Tug-of-War
• In group meetings or brainstorming
– the level of participation of each individual may vary greatly
• Balance participations from all individuals
– yield better outcomes in brainstorming
• Tug-of-War app
– monitor turn-takings of participants
– provides in-situ graphical feedback of how long each has talked so far
• The lines of kernel code of our prototype is only 75
– the effectiveness of SocioPhone to facilitate the development of
interaction-aware applications
Angus F.M. Huang
19
Challenges and Limitations
• Challenges in Daily Conversation Monitoring
– Interaction patterns
– Environmental characteristics
• Limitations of Existing Techniques
–
–
–
–
Slow, inaccurate speaking turn detection
Vulnerability to real-life acoustic environments
High energy consumption
Limitation of existing collaborative sensing approaches
Angus F.M. Huang
20
In-situ Turn Monitoring
• Speaker recognition
– each phone measures a speaker’s voice signal strength
• Volume-peak-based algorithm
– to select a phone that has the strongest signal strength
• Limitations
– location and placement of phones are not controllable
– some of the phones may not be available
– peak detection is susceptible to background noise
• Volume topography-based method
– each speaker have a unique volume signature over multiple phones
– limit complex signal processing only in the learning phase
– enable to work even when some phones may not be available
Angus F.M. Huang
21
Illustration of online turn monitoring
Angus F.M. Huang
22
Volume Topography-based Algorithm
•
•
•
•
•
•
Training data collection
Feature vector transformation
Topography generation
Classifier training and classification
Turn recognition
Mapping audio signatures (cluster-IDs) to group
members (member-IDs)
Angus F.M. Huang
23
Training data collection
• each phone samples the incoming sound at the rate
of 8 kHz
• audio stream is segmented into 300 ms-frames
• a given time t, each phone i calculates p(t,i)
– the power of the frame from phone i at time t
• the average of the square of the audio signals
• feature vector, P(t) = (p(t,1), p(t,2),…,p(t,np))
Angus F.M. Huang
24
Feature vector transformation
• define the feature vector so that it has discrimination power
• P(t)
– (a), performs poorly in discriminating non-speech turns (or silent turns)
• P'(t) = P(t) / E(t)
– E(t) is an average of a vector P(t)
– (b), discrimination is weak when the number of phones is less than the
group size
– (c), shows P'(t) with one fewer phone
• P''(t) = {D(t,1) × p(t,1) / E(t), …, D(t,np) × p(t,np) / E(t)}
– pref is the standard reference sound pressure level, 20μPa
– discriminate better even with fewer phones
– (d) and (e), show P''(t) using three and two phones
Angus F.M. Huang
25
Distribution of feature vectors
Angus F.M. Huang
26
Learning and Recognition
• Topography generation
– From the training dataset, we build a set of audio-signal signatures
– volume topographies
• Classifier training and classification
– multi-class SVM classifier
– After training has completed, SocioPhone segments turns online by
simply mapping incoming frames into cluster-IDs using this classifier
• Turn recognition
– A turn is detected if two consecutive frames belong to different
clusters
Angus F.M. Huang
27
Platform Implementation
• Monitoring Planner
• Source Selector first figures out how many phones participate
– checks if the phone has sufficient battery power
– sound signals are clear enough for discriminative volume topography
• Execution Planner
– performs turn monitoring with the volume-topography-based method
Angus F.M. Huang
28
Platform Implementation
• Meta-linguistic Information Processor
– Turn Detector computes turns
– Feature Extractor processes prosodic features
– Pattern Analyzer infers a number of meaningful social contexts by
combining turn information and prosodic features
• dominance and leadership in a conversation group, conversation
asymmetry, interactivity, and sparseness
– heuristic metrics
• Level of interactivity: # of speaking turns per minute
• Level of sparseness: # of non-speaking turns over three seconds
per minute
• Level of skewness: standard deviation of # of speaking turns for all
participants
Angus F.M. Huang
29
Platform Implementation
• Interaction History Manager
– supports SQL queries from applications
– SocioPhone holds the turn information for the on-going session in the
memory
– use SQLite database in Android
• Network Interface
– use Bluetooth for peer discovery and communication
• 138 mW (exchange messages every second)
– if Wi-Fi Direct
• 413 mW
Angus F.M. Huang
30
SocioPhone
System
Architecture
Angus F.M. Huang
31
Experiments
• Experimental Setup
– Scenarios and parameters
– Each conversation is 15 minutes of unscripted, free talking
– Galaxy Nexus phones
• Comparisons
– SinglePipe
– CombinePipe
• DarwinPhones, final inference
by combining GMM likelihoods
– SharePipe
• CoMon, among multiple
phones, only a single phone
runs SinglePipe and shares
the results with other phones
to save energy
Angus F.M. Huang
32
Experiments
• Evaluation metrics
– Turn-monitoring accuracy
• Accuracy = {D(TP) + D(TN)} / total time
• Precision = D(TP) / {D(TP) + D(FP)}
• Recall = D(TP) / {D(TP) + D(FN)}
– Resource efficiency in terms of energy and CPU
• Ground-truth annotation
– throat microphone records
Angus F.M. Huang
33
Turn-Monitoring Accuracy
•
•
•
•
•
Angus F.M. Huang
SocioPhone captures the overall turntaking pattern well
CombinePipe also recognizes
speakers well in long-speaking turns,
but often misses short, interactive
turns.
45% of turns are less than four
seconds
More than80% of speeches are less
than 10 seconds
the topic or type of conversation could
change the distribution, but the
general trend would remain stable
34
Effect of Number of Phones
•
•
SocioPhone outperforms the others
regardless of the group size by 1219%
Even with 5 interactants, it shows the
accuracy of 83%
– while the accuracies of other
techniques are below 70%
•
SocioPhone outperforms
CombinePipe except when only two
phones are available
– When the number of available phones
is much smaller than the group size,
our method performs worse than
CombinePipe
•
Angus F.M. Huang
The volume topography-based
method works well even if a small
portion of phones is unavailable
35
Effect of Phone Placement
• SocioPhone shows around 75%
of accuracy even with three
phones placed in a pocket and
two phones on a table
– similar to the accuracy of
CombinePipe with all five phones on
the table
• An F1 score is the harmonic
mean of precision and recall
– the F1 score of each interactants
depends on which phone is placed in
a pocket
• An imbalance of recoding volume
makes inference more difficult for
users with relatively close
positions such as A and C when
C’s phone is in C’s pocket.
Angus F.M. Huang
36
Effect of Places
Precision and recall in a natural situation
Angus F.M. Huang
37
Cost of Turn Monitoring
• A head takes charge of the coordination and final inference
• A member transmits the required information to the head
• A 1750 mAh battery would last about 23 hours with SocioPhone and 1214 hours with others
• Interestingly, CombinePipe’s member consumes 35.3 mW more than the
head, since Bluetooth consumes more power for transmission than for
reception
Angus F.M. Huang
38
Cost Breakdown of Turn Monitoring
Angus F.M. Huang
39
Conclusion
• SocioPhone
– a mobile interaction-monitoring platform
• A set of APIs
– to monitor turn and turn-derived meta-linguistic contexts
• Highlyefficient online turn-monitoring techniques
– volume topography
• SocioPhone applications
– SocioTherapist, SocioDigest, and Tug-of-War
Angus F.M. Huang
40
Angus Comments
• Extend PLASH’s daily recording to personal life
review
Angus F.M. Huang
41
Angus Comments
• From school-life trajectory distribution to School-dynamics
profiling
– School-life dynamics monitoring
– Interest-based friends exploration
– Learning-partners matchmaking
Angus F.M. Huang
42
Angus F.M. Huang
43
Download