TMSG SocioPhone: Everyday Face-to-Face Interaction Monitoring Platform Using Multi-Phone Sensor Fusion 實驗室: 先進網路技術與服務實驗室 報告者: 黃福銘 (Angus F.M. Huang) 2013.09.04 Publication • MobiSys 2013 • Session 8: Behavior and Activity Recognition • NuActiv: Recognizing Unseen New Activities Using Semantic AttributeBased Learning • SocioPhone: Everyday Face-To-Face Interaction Monitoring Platform Using Multi-Phone Sensor Fusion • MoodScope: Building a Mood Sensor from Smartphone Usage Patterns • Auditeur: A Mobile-Cloud Service Platform for Acoustic Event Detection on Smartphones Angus F.M. Huang 2 Authors • Youngki Lee, Chulhong Min, Chanyou Hwang, Jaeung Lee, Inseok Hwang,Younghyun Ju, Chungkuk Yoo, Miri Moon, Uichin Lee, Junehwa Song • • • • • School of Information Systems, Singapore Management University Computer Science Department, KAIST, Web Science and Technology Division, KAIST Center for Mobile Software Platform, KAIST Knowledge Service Engineering Department, KAIST – http://www.kaist.edu/edu.html – (韓國科學技術院) Angus F.M. Huang 3 • Face-to-face interaction monitoring – Interaction-aware applications – Group conversations – Useful meta-linguistic contexts of conversation – Online turn monitoring – SocioTherapist, SocioDigest, Tug-of-War Abstract Angus F.M. Huang 4 Outline • • • • • • INTRODUCTION SOCIOPHONE API And APPLICATIONS IN-SITU TURN MONITORING PLATFORM IMPLEMENTATION EXPERIMENTS CONCLUSION Angus F.M. Huang 5 Introduction • Face-to-face social interaction • Interaction-aware applications – Helps a user remember the name of the person – Developers do not know which contexts to leverage during daily conversations • Communicative cues – Verbal cues • Spoken words and sentences – Aural cues • Tones, pitch – Visual cues • Gesture, eye contact Angus F.M. Huang 6 Introduction • Conversational turns – The basic unit of conversation – A turn is a continuous speech segment where a person starts and ends her speech • Basic aspects of a conversation – Length, speed, participant,… • High-level social inference – Role, problem,… • It will improve collaborative decision psychological care, content analysis,… Angus F.M. Huang making, 7 Challenges • Mobile environment challenges – Unconstrained acoustic situations – Real-time monitoring – Battery limitations • Short-lasting turns (1-2 seconds) • Long speech segments (3-8 seconds) • Dynamic ambient noises • Significant power consuming for high-rate sound sensing and heavy computation Angus F.M. Huang 8 SocioPhone API and Applications • Meta-Linguistic Interaction Monitoring • SocioPhone API • Example Applications on SocioPhone • Online conversation monitoring platform – Meta-linguistic conversational contexts Angus F.M. Huang 9 Online turn segmentation and metalinguistic conversation monitoring Angus F.M. Huang 10 Meta-Linguistic Interaction Monitoring • Online turn segmentation – Execute online turn segmentation using smartphones – Turn : (speaking person, start time, end time) • Meta-linguistic conversation monitoring – Track non-verbal elements during conversations – Turn features • Individual participant – Speaking length, number of turns, duration statistics • Relations among participants – Turn taking orders, pair-wise turn-taking frequencies • Whole interaction session – Duration of speaking and non-speaking turns – Prosodic features • Pitch, energy, loudness, rhythm, formant, bandwidth, spectrum intensity Angus F.M. Huang 11 SocioPhone API: Monitoring sessions and turns • registerSessionStartListener() – start/end of a conversation – join/leave of a participant – Session table • registerTurnChangeListener() – provides the Turn information continuously upon each turn-taking event – Turn table Angus F.M. Huang 12 SocioPhone API: Monitoring meta-linguistic interactions • enableProsodicFeatures() – retrieve rich prosodic features associated with each turn • getSparsity() – returns how far the speaking turns are separated by nonspeaking turns • registerDominanceListener() – encapsulates complex social inference to find someone with dominance over the conversation Angus F.M. Huang 13 SocioPhone API: Querying interaction history • getOnGoingSessionHistory() – query the ongoing session • getPastInteractionHistory() – query completed sessions • Example queries – “How many turns has John taken within last 10 minutes” – “Which three friends has John spoken to the most this week?” – a conventional SQL interface Angus F.M. Huang 14 Angus F.M. Huang 15 Angus F.M. Huang 16 SocioPhone Application: SocioTherapist • Speech therapy sessions for autistic children – Therapist often employs a stimulus – Upon a successful response – Reinforced with small rewards • SocioTherapist App – Mimic stimuli and reinforcements – A callback is triggered for every turn-taking event – Looks for initiations, long-lasting turns, and rapid responses • Well-known robotic characters for children gradually upgraded upon desirable turn-takings Angus F.M. Huang 17 SocioPhone Application: SocioDigest • • • (a) Cumulative conversation time within the user’s social circle (b) Relative per-person talking times in a session (c) Relative number of turns exchanged in a session Angus F.M. Huang 18 SocioPhone Application: Tug-of-War • In group meetings or brainstorming – the level of participation of each individual may vary greatly • Balance participations from all individuals – yield better outcomes in brainstorming • Tug-of-War app – monitor turn-takings of participants – provides in-situ graphical feedback of how long each has talked so far • The lines of kernel code of our prototype is only 75 – the effectiveness of SocioPhone to facilitate the development of interaction-aware applications Angus F.M. Huang 19 Challenges and Limitations • Challenges in Daily Conversation Monitoring – Interaction patterns – Environmental characteristics • Limitations of Existing Techniques – – – – Slow, inaccurate speaking turn detection Vulnerability to real-life acoustic environments High energy consumption Limitation of existing collaborative sensing approaches Angus F.M. Huang 20 In-situ Turn Monitoring • Speaker recognition – each phone measures a speaker’s voice signal strength • Volume-peak-based algorithm – to select a phone that has the strongest signal strength • Limitations – location and placement of phones are not controllable – some of the phones may not be available – peak detection is susceptible to background noise • Volume topography-based method – each speaker have a unique volume signature over multiple phones – limit complex signal processing only in the learning phase – enable to work even when some phones may not be available Angus F.M. Huang 21 Illustration of online turn monitoring Angus F.M. Huang 22 Volume Topography-based Algorithm • • • • • • Training data collection Feature vector transformation Topography generation Classifier training and classification Turn recognition Mapping audio signatures (cluster-IDs) to group members (member-IDs) Angus F.M. Huang 23 Training data collection • each phone samples the incoming sound at the rate of 8 kHz • audio stream is segmented into 300 ms-frames • a given time t, each phone i calculates p(t,i) – the power of the frame from phone i at time t • the average of the square of the audio signals • feature vector, P(t) = (p(t,1), p(t,2),…,p(t,np)) Angus F.M. Huang 24 Feature vector transformation • define the feature vector so that it has discrimination power • P(t) – (a), performs poorly in discriminating non-speech turns (or silent turns) • P'(t) = P(t) / E(t) – E(t) is an average of a vector P(t) – (b), discrimination is weak when the number of phones is less than the group size – (c), shows P'(t) with one fewer phone • P''(t) = {D(t,1) × p(t,1) / E(t), …, D(t,np) × p(t,np) / E(t)} – pref is the standard reference sound pressure level, 20μPa – discriminate better even with fewer phones – (d) and (e), show P''(t) using three and two phones Angus F.M. Huang 25 Distribution of feature vectors Angus F.M. Huang 26 Learning and Recognition • Topography generation – From the training dataset, we build a set of audio-signal signatures – volume topographies • Classifier training and classification – multi-class SVM classifier – After training has completed, SocioPhone segments turns online by simply mapping incoming frames into cluster-IDs using this classifier • Turn recognition – A turn is detected if two consecutive frames belong to different clusters Angus F.M. Huang 27 Platform Implementation • Monitoring Planner • Source Selector first figures out how many phones participate – checks if the phone has sufficient battery power – sound signals are clear enough for discriminative volume topography • Execution Planner – performs turn monitoring with the volume-topography-based method Angus F.M. Huang 28 Platform Implementation • Meta-linguistic Information Processor – Turn Detector computes turns – Feature Extractor processes prosodic features – Pattern Analyzer infers a number of meaningful social contexts by combining turn information and prosodic features • dominance and leadership in a conversation group, conversation asymmetry, interactivity, and sparseness – heuristic metrics • Level of interactivity: # of speaking turns per minute • Level of sparseness: # of non-speaking turns over three seconds per minute • Level of skewness: standard deviation of # of speaking turns for all participants Angus F.M. Huang 29 Platform Implementation • Interaction History Manager – supports SQL queries from applications – SocioPhone holds the turn information for the on-going session in the memory – use SQLite database in Android • Network Interface – use Bluetooth for peer discovery and communication • 138 mW (exchange messages every second) – if Wi-Fi Direct • 413 mW Angus F.M. Huang 30 SocioPhone System Architecture Angus F.M. Huang 31 Experiments • Experimental Setup – Scenarios and parameters – Each conversation is 15 minutes of unscripted, free talking – Galaxy Nexus phones • Comparisons – SinglePipe – CombinePipe • DarwinPhones, final inference by combining GMM likelihoods – SharePipe • CoMon, among multiple phones, only a single phone runs SinglePipe and shares the results with other phones to save energy Angus F.M. Huang 32 Experiments • Evaluation metrics – Turn-monitoring accuracy • Accuracy = {D(TP) + D(TN)} / total time • Precision = D(TP) / {D(TP) + D(FP)} • Recall = D(TP) / {D(TP) + D(FN)} – Resource efficiency in terms of energy and CPU • Ground-truth annotation – throat microphone records Angus F.M. Huang 33 Turn-Monitoring Accuracy • • • • • Angus F.M. Huang SocioPhone captures the overall turntaking pattern well CombinePipe also recognizes speakers well in long-speaking turns, but often misses short, interactive turns. 45% of turns are less than four seconds More than80% of speeches are less than 10 seconds the topic or type of conversation could change the distribution, but the general trend would remain stable 34 Effect of Number of Phones • • SocioPhone outperforms the others regardless of the group size by 1219% Even with 5 interactants, it shows the accuracy of 83% – while the accuracies of other techniques are below 70% • SocioPhone outperforms CombinePipe except when only two phones are available – When the number of available phones is much smaller than the group size, our method performs worse than CombinePipe • Angus F.M. Huang The volume topography-based method works well even if a small portion of phones is unavailable 35 Effect of Phone Placement • SocioPhone shows around 75% of accuracy even with three phones placed in a pocket and two phones on a table – similar to the accuracy of CombinePipe with all five phones on the table • An F1 score is the harmonic mean of precision and recall – the F1 score of each interactants depends on which phone is placed in a pocket • An imbalance of recoding volume makes inference more difficult for users with relatively close positions such as A and C when C’s phone is in C’s pocket. Angus F.M. Huang 36 Effect of Places Precision and recall in a natural situation Angus F.M. Huang 37 Cost of Turn Monitoring • A head takes charge of the coordination and final inference • A member transmits the required information to the head • A 1750 mAh battery would last about 23 hours with SocioPhone and 1214 hours with others • Interestingly, CombinePipe’s member consumes 35.3 mW more than the head, since Bluetooth consumes more power for transmission than for reception Angus F.M. Huang 38 Cost Breakdown of Turn Monitoring Angus F.M. Huang 39 Conclusion • SocioPhone – a mobile interaction-monitoring platform • A set of APIs – to monitor turn and turn-derived meta-linguistic contexts • Highlyefficient online turn-monitoring techniques – volume topography • SocioPhone applications – SocioTherapist, SocioDigest, and Tug-of-War Angus F.M. Huang 40 Angus Comments • Extend PLASH’s daily recording to personal life review Angus F.M. Huang 41 Angus Comments • From school-life trajectory distribution to School-dynamics profiling – School-life dynamics monitoring – Interest-based friends exploration – Learning-partners matchmaking Angus F.M. Huang 42 Angus F.M. Huang 43