“You made me do it”: Classification of Blame in Married Couples’ Interactions by Fusing Automatically Derived Speech and Language Information Matthew P. Black, Panayiotis G. Georgiou, Athanasios Katsamanis, Brian R. Baucom, and Shrikanth S. Narayanan http://sail.usc.edu Examples of Blame Husband blaming Wife Wife blaming Husband I You wasn’t know, expecting you’re a this stranger when to we me were now. I- Ito don’tget married. Wife: I’m Oh, living Bob suggested at home that with this a roommate. might help. You’re Why? Why did one you who even suggested want to it. get therapy? Husband: IIthe just don’t think there is anygoing point in talking Husband: I wasn’t Are you expecting donewho dumping thisthis on me when we yet? were going get married. Wife: know, you’re athat stranger to me now. I- Ito don’tOh, You’re Bobthe suggested one suggested might it. help. about You this. * Note that the above clips are acted. * The real couples analyzed in this study cannot be shown for privacy reasons. Aug. 28, 2011 “You made me do it”: Classification of Blame in Married Couples’ Interactions 2 / 18 Overview • Blame conveyed through various communicative channels – Language (e.g., “you made me do it”) – Speech (e.g., prosody) – Gestures (e.g., pointing) • Goal: classify extreme cases of blame behavior using audio – “Low” vs. “High” • Methodology: combine 2 automatically-derived info sources – Acoustic: models how the spouses spoke – Language: models what the spouses said Aug. 28, 2011 “You made me do it”: Classification of Blame in Married Couples’ Interactions 3 / 18 Why Detect Blame? • Blame is oftentimes targeted in couple therapy – Can lead to escalation of negative affect and resentment [Dimidjian et al. 2008] • Blame is one important behavior researched in psychology – Rely on established manual coding methods – Challenges: coders must be trained, time-consuming • How can technology help? – Automatic detection of blame is scalable alternative to manual coding • Behavioral Signal Processing (BSP) – Quantify and recognize abstract human behaviors in natural interaction settings relevant to psychology research S. Dimidjian, C. R. Martell, and A. Christensen, Clinical Handbook of Couple Therapy, 4th ed. The Guilford Press, 2008, ch. Integrative behavioral couple therapy, pp. 73–106. 4 / 18 General Technical Challenges • Blame is a complex human behavior – High-level, heterogeneous – Need to extract generalizable features • Real data in real-life scenarios – Challenging for robust feature extraction and automatic speech recognition • Information across multiple modalities/cues – Distributed information across various signals: acoustics, language, gestures – How to merge/combine/fuse them Aug. 28, 2011 “You made me do it”: Classification of Blame in Married Couples’ Interactions 5 / 18 Previous Work [Black et al. 2010] • This paper is extension of our earlier work – Classified extreme instances (low/high) for 6 behavioral codes – Only used acoustic cues • “Blame” was one of the more challenging codes to predict – Ignoring important lexical cues regarding blame – Coding manual: “explicit blaming statements (e.g., ‘you made me do it’) warrant a high blame score” • This paper addresses this weakness M. P. Black, A. Katsamanis, C.-C. Lee, A. C. Lammert, B. R. Baucom, A. Christensen, P. G. Georgiou, and S. S. Narayanan, “Automatic classification of married couples’ behavior using audio features,” in Proc. Interspeech, 2010. 6 / 18 Corpus • Real couples in 10-minute problem-solving dyadic interactions – Longitudinal study at UCLA and U. of Washington [Christensen et al. 2004] – 117 distressed couples received couples therapy for 1 year – 569 sessions (96 hours) • Audio – Single channel – Far-field – Variable noise conditions • Transcription (word-level) – Chronological – Speaker labels (wife/husband) – No timing information Wife: then why did you ask Husband: to get us out of debt Wife: mm hmmm ... A. Christensen, D.C. Atkins, S. Berns, J. Wheeler, D. H. Baucom, and L.E. Simpson. “Traditional versus integrative behavioral couple therapy for significantly and chronically distressed married couples.” J. of Consulting and Clinical Psychology, 72:176-191, 2004. 7 / 18 Data Pre-processing • Signal-to-noise ratio (SNR) estimation – Voice activity detection (VAD) [Ghosh et al. 2011] – SNR ranged from -1 dB to 26 dB • Speaker diarization – Used transcriptions and SailAlign’s speech-text alignment (available on web) – Each session’s audio split into wife/husband/unknown turns • Session selection – (SNR > 5 dB) && (Speaker alignment > 55%) – 372 of 569 sessions (65%) met both thresholds (62.8 hours) P. K. Ghosh, A. Tsiartas, and S. S. Narayanan, “Robust voice activity detection using long-term signal variability,” IEEE Trans. Audio, Speech, and Language Processing, vol. 19, no. 3, pp 600–613, 2011. 8 / 18 Classification Set-up Wife • Each spouse’s overall level of blame manually coded – Standardized coding manual [Heavey et al. 2002] – 9-point scale (1 = no blame, 9 = repeated blame) – Multiple trained evaluators (mean1 pairwise = 0.788) 2 3 4 Pearson’s 5 6 7 8correlation 9 Husband Wife High Blame Low Blame Low Blame 1 2 3 4 5 6 7 8 High Blame 9 1 2 3 4 5 6 7 8 9 Husband • Binary classification set-up (top/bottom 20%) – Low blame (70 wife + 70 husband), High blame (70 wife + 70 husband) – Leave-one-couple-out cross-validation – Gender-independent models of blame 1 2 3 4 5 6 7 8 9 C. Heavey, D. Gill, and A. Christensen. Couples interaction rating system 2 (CIRS2). University of California, Los Angeles, 2002. 9 / 18 Methodology Overview • Trained 3 classifiers – Acoustic – Lexical – Fusion Aug. 28, 2011 “You made me do it”: Classification of Blame in Married Couples’ Interactions 10 / 18 Acoustic Classifier • 53,000+ features extracted 1) Extracted frame-level low-level descriptors (LLDs) – Prosodic: f0, intensity [Praat – Boersma 2001] – Spectral: 15 MFCCs, 8 MFBs [openSMILE – Eyben et al. 2010] – Voice quality: jitter, shimmer [openSMILE – Eyben et al. 2010] 2) Separate features for each spouse (wife, husband) 3) 6 temporal granularities – Global: entire session [Black et al. 2010] – Hierarchical: 0.1s, 0.5s, 1s, 5s, 10s disjoint windows [Schuller et al. 2008] 4) 14 static functionals (e.g., mean, std. dev.) • Apply binary classifier – Classifier: Support Vector Machine (SVM) with linear kernel – Confidence score: class probability estimates [LIBSVM – Chang et al. 2001] B. Schuller, M. Wimmer, L. Mösenlechner, C. Kern, D. Arsic, and G. Rigoll, “Brute-forcing hierarchical functionals for paralinguistics: A waste of feature space?” in Proc. ICASSP, 2008. 11 / 18 Lexical Classifier (1/2) • Derived from automatic speech recognition (ASR) Low/High blame Most likely blame class and most likely word sequence Acoustic observations of rated spouse’s speech “turn of rated spouse “Blame class”-specific “Blame class”-specific acoustic model language model Generic Unigram acoustic model language model • Lexical classifier based on “competitive” language models – Low (High) blame language models trained on text of low (high) blame spouses – Classifier: choose the blame class that is most likely – Confidence score: absolute difference in probabilities of low/high blame case Aug. 28, 2011 “You made me do it”: Classification of Blame in Married Couples’ Interactions 12 / 18 Lexical Classifier (2/2) • Single most likely path through word lattice may not be robust – Incorporated probabilities of 100 most likely (“N-best”) paths – Assumed N-best hypotheses independent for this paper nth-best path (n = 1, … , 100) • Oracle lexical classifier – Upper bound on performance of proposed ASR-derived lexical classifier – Assume perfect word recognition rate Manual transcription of rated spouse Aug. 28, 2011 “You made me do it”: Classification of Blame in Married Couples’ Interactions 13 / 18 Fusion Classifier • Complementary info from acoustic and lexical classifiers – Score-level fusion of classifiers using confidence scores • Fusion classifier: another binary SVM – Inputs: “confidence score”-weighted class hypotheses Aug. 28, 2011 “You made me do it”: Classification of Blame in Married Couples’ Interactions 14 / 18 Classification Results System Baseline Unimodal Fusion Classifier Accuracy Chance 140/280 = 50.0% Acoustic 223/280 = 79.6% Lexical/ASR 211/280 = 75.4% Lexical/Oracle 255/280 = 91.1% Acoustic + Lexical/ASR 230/280 = 82.1% Acoustic + Lexical/Oracle 257/280 = 91.8% WER: 40%-90% • • Findings Significant differences – –High All classifiers performance significantly of oracle higher lexical classifier accuracy means than chance lexical(all cues p <are 0.01) important – –Lower Oracle performance classifiers significantly of ASR lexical higher classifier accuracy due to than ASRnon-oracle being challenging (all p < 0.01) – –Fusion (Acoustic classifiers + Lexical/ASR) able to advantageously significantly higher combine thanpartially Lexical/ASR orthogonal only (p < 0.05) language and acoustic information sources Aug. 28, 2011 “You made me do it”: Classification of Blame in Married Couples’ Interactions 15 / 18 Conclusions & Future Work • Modeled high-level blame behaviors from real couples by fusing automatically-derived speech and language information – Proposed acoustic classifier more robust than lexical classifier – Even with noisy ASR, lexical classifier attained 75% accuracy – Successfully separated 82% of extreme instances with fusion classifier • Blaming behaviors are important cue to detect because of their significance in the context of couple therapy – Detection of blame could facilitate clinician-guided drill-down therapy • Future Work – Improve individual classifiers (acoustic/lexical) and fusion classifier – Extend fusion experiments to other behavioral codes – Apply behavioral signal processing methodologies to other domains Aug. 28, 2011 “You made me do it”: Classification of Blame in Married Couples’ Interactions 16 / 18 References & Software REFERENCES M. P. Black, A. Katsamanis, C.-C. Lee, A. C. Lammert, B. R. Baucom, A. Christensen, P. G. Georgiou, and S. S. Narayanan, “Automatic classification of married couples’ behavior using audio features,” in Proc. Interspeech, 2010. A. Christensen, D.C. Atkins, S. Berns, J. Wheeler, D. H. Baucom, and L.E. Simpson. “Traditional versus integrative behavioral couple therapy for significantly and chronically distressed married couples.” J. of Consulting and Clinical Psychology, vol. 72, pp. 176-191, 2004. S. Dimidjian, C. R. Martell, and A. Christensen, Clinical Handbook of Couple Therapy, 4th ed. The Guilford Press, 2008, ch. Integrative behavioral couple therapy, pp. 73–106. C. Heavey, D. Gill, and A. Christensen. Couples interaction rating system 2 (CIRS2). University of California, Los Angeles, 2002. B. Schuller, M. Wimmer, L. Mösenlechner, C. Kern, D. Arsic, and G. Rigoll, “Brute-forcing hierarchical functionals for paralinguistics: A waste of feature space?” in Proc. ICASSP, 2008. SOFTWARE P. Boersma, “Praat, a system for doing phonetics by computer,” Glot International, vol. 5, no. 9/10, pp. 341–345, 2001. C. C. Chang and C. J. Lin, LIBSVM: A library for support vector machines, 2001, software available at http://www.csie.ntu.edu.tw/cjlin/libsvm. F. Eyben, M. Wöllmer, and B. Schuller, “OpenSMILE - The Munich versatile and fast open-source audio feature extractor,” in ACM Multimedia, 2010, pp. 1459–1462. P. K. Ghosh, A. Tsiartas, and S. S. Narayanan, “Robust voice activity detection using long-term signal variability,” IEEE Trans. Audio, Speech, and Language Processing, vol. 19, no. 3, pp 600-613, 2011. A. Katsamanis, M. P. Black, P. G. Georgious, L. Goldstein, and S. S. Narayanan, “SailAlign: Robust long speech-text alignment,” in Proc. of Workshop on New Tools and Methods for Very-Large Scale Phonetics Research, 2011. Aug. 28, 2011 “You made me do it”: Classification of Blame in Married Couples’ Interactions 17 / 18 Related Work at Interspeech 2011 • Papers at Interspeech 2011 on same Couple Therapy corpus: • Saliency detection – Mon at 10:00 – Ses1-P1 – James Gibson, Athanasios Katsamanis, Matthew Black, and Shrikanth Narayanan, Automatic identification of salient acoustic instances in couples' behavioral interactions using Diverse Density Support Vector Machines • Interaction modeling – Tues at 14:30 – Ses2-S1-P – Chi-Chun Lee, Athanasios Katsamanis, Matthew Black, Brian Baucom, Panayiotis Georgiou, and Shrikanth Narayanan, An analysis of PCA-based vocal entrainment measures in married couples' affective spoken interactions Aug. 28, 2011 “You made me do it”: Classification of Blame in Married Couples’ Interactions 18 / 18 Thank you! Questions?