Towards an empirically-based grammar of speech and gestures

Patrizia Paggio
patrizia.paggio@um.edu.mt
University of Malta, Institute of Linguistics
University of Copenhagen, Centre for Language Technology
Linguistic Circle, Malta, 1 February 2012
Viadrina, October 2011

General research questions
- How can non-verbal behaviour, in particular head movements and facial expressions, be represented in a multimodal grammar?
- How would such a grammar account for naturally occurring multimodal data?
- Can machine learning methods be used to make sense of the multimodal data?
- Data: video-recorded conversations in Danish.

In what sense grammar?
- Not only syntax, but all aspects of language structure.
- A system of constraints operating at various levels (phonology, morphology, syntax, semantics), as in HPSG (Pollard and Sag, 1994).
- Extended to include constraints on the interaction between speech and gestures.
- Gestures = non-verbal behaviour.

Why worry about gestures?
- Human communication is situated in the human body.
- Synchrony and semantic parallelism between speech and gesture suggest a common cognitive base.
- Speech may have evolved on top of gesture (Arbib 2005).
- Several attempts have been made to explain speech and gesture production as an integrated whole:
  - growth point (McNeill 2005)
  - visible action as utterance (Kendon 2004)

Why is it a challenge?
Gestures are:
- largely non-conventionalised
- essentially indexical and iconic rather than symbolic
- ambiguous in content
- unpredictable at the expression level

However
- Gesture families have been described (Kendon 2004): the grip, the ring, the open palm.
- Vocabularies of hand gestures (not only emblems) and head movements have been proposed (Kipp 2004, Allwood et al. 2007).

However
Attempts have been made to include gestures in unification-based representations:
- to cope with gesture analysis in multimodal interfaces (Johnston et al., 1997; Paggio and Jongejan, 2005)
- with a theoretical purpose (Alahverdzhieva and Lascarides, 2010)

Outline
Part one
- The corpus
- Types of non-verbal behaviour: shape, semiotic types, functions
- Gesture-speech interaction: lexical and phrasal multimodal signs
Part two
- Automatic classification of multimodal feedback

The corpus
- NOMCO first acquaintance corpus
- Funded by NOS-HS NORDCORP
- Partners: Gothenburg, Helsinki, Copenhagen
- Six male and six female participants, aged 21-36
- Subjects have not met before
- Each person speaks with one female and one male partner
- They speak freely
- 12 videos, 1 hour in total
- Each interaction filmed by three cameras

The corpus: acknowledgements
NOMCO partners
- Costanza Navarretta
- Jens Allwood
- Elisabeth Ahlsén
- Kristiina Jokinen
CST annotators
- Philip Diderichsen
- Sara Andersen
- Josephine Bødker Arrild
- Anette Studsgård
- Bjørn Nicola Wessel-Tolvig

The corpus: annotation
- Speech transcribed with word time stamps
- Topic and focus tagged in each "sentence"
- Gestures annotated in ANVIL based on the MUMIN scheme (Allwood et al. 2007):
  - shape attributes
  - semiotic attributes
  - functional attributes
- Each gesture is linked to associated speech segments (gesturer or interlocutor).

The corpus: annotation procedure
- Three coders
- Cohen's kappa 0.5-0.6 for face and 0.6-0.8 for head
- 10% improvement after coding 5 videos

The corpus: some counts
Ten videos annotated so far: 3,280 seconds, 12,032 words (incl. filled pauses).

Gesture   #      g/w    g/s
All       3511   0.29   1.07
Head      2335   0.19   0.71
Face      1176   0.09   0.35

(g/w = gestures per word, g/s = gestures per second)

Behaviour   Average   SD
Face (#)    61.89     24.41
Head (#)    129.72    34.68

Types of non-verbal behaviour
Shape attributes in the coding scheme:

Attribute      Value
HeadMovement   Nod, Jerk, HeadForward, HeadBackward, Tilt, SideTurn, Shake, Waggle, HeadOther
General face   Smile, Laugh, Scowl, FaceOther
Eyebrows       Frown, Raise, BrowsOther

A gesture type has several values, so it can be represented as a feature structure.

Feature structure representations
[Figure: feature structure representations]

Gestures and semiotic types
- Semiotic-oriented classifications of gestures are very common (McNeill, Kendon, Kipp).
- We use Peirce's three semiotic categories: indexical, iconic and symbolic (Peirce 1931).
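
As a rough illustration of how the shape attributes and semiotic categories above could sit together in one feature structure, here is a minimal Python sketch. It is not the representation used in the talk: feature structures are modelled as plain nested dictionaries, `unify` is a simplified untyped unification without a type hierarchy or re-entrancy, and the attribute names `Shape` and `SemioticType` are assumptions introduced only for this example. The values come from the coding scheme table and from Peirce's categories.

```python
def unify(a, b):
    """Unify two feature structures given as nested dicts.

    Returns the merged structure, or None if two atomic values clash.
    Simplified: no types, no re-entrancy.
    """
    result = dict(a)
    for key, b_val in b.items():
        if key not in result:
            result[key] = b_val
            continue
        a_val = result[key]
        if isinstance(a_val, dict) and isinstance(b_val, dict):
            sub = unify(a_val, b_val)
            if sub is None:
                return None
            result[key] = sub
        elif a_val != b_val:
            return None  # incompatible atomic values
    return result


# Shape attributes with values from the coding scheme table.
# "Shape" and "SemioticType" are illustrative attribute names only.
nod = {"Shape": {"HeadMovement": "Nod"}}
smile = {"Shape": {"GeneralFace": "Smile"}}
indexical = {"SemioticType": "indexical"}

nod_with_smile = unify(nod, smile)
indexical_nod = unify(nod_with_smile, indexical)
clash = unify(nod, {"Shape": {"HeadMovement": "Shake"}})

print(indexical_nod)
print(clash)
```

Compatible descriptions merge into one structure, while two different `HeadMovement` values fail to unify. That failure is the behaviour a constraint-based grammar would rely on to rule out inconsistent gesture descriptions.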

Gestures and semiotic types
The semiotic class depends on the relation between gesture and denotation:
- Symbolic: arbitrary conventional relation
- Iconic: similarity relation (concrete or abstract)
- Indexical: real and immediate connection (deictic, display, interactive, beat)

A hierarchy of gestures
- Specific gestures are placed in the intersection between shape and semiotic types.
- How do we deal with ambiguity? More "complex" types are preferred.

Communicative functions
Gestures can express:
- Feedback
- Emphasis/focus
- Turn-management
- Discourse structuring
- Own communication management
- ...
We have attributes for many of these functions in the MUMIN coding scheme.

Communicative functions: feedback
A: HEJ jeg hedder HANNE
B: jeg hedder JESPER
A: OKAY (nod)
(hi my name is Hanne / my name is Jesper / okay)

Communicative functions: feedback

Attribute           Value
FeedbackBasic       CPU, SelfFB, FBOther
FeedbackDirection   FBGive, FBElicit, FBGiveElicit, FBunderSpecified
Attitude            happy, sad, ..., interested, bored, ...

Communicative functions: applied to the example

symbol-nod
  HEAD-MOVEMENT   nod
  COMM-FUNCT      feedbackGive
                    FEEDBACK       cpu
                    F-DIRECTION    feedbackGive
                    ATTITUDE       interested
                    FEEDBACK-ARG   handle

Communicative functions: focusing
Gestures can reinforce the focus of a sentence, similar to sentence accent.
Example with a repeated nod: (that COULD one could one indeed WELL feel)

Communicative functions: a hierarchy of functions
Functions can be formalised as typed feature structures and ordered in a hierarchy.

Summing up so far
- A gesture type is represented as a feature structure.
- The type is constrained with respect to:
  - shape attributes
  - semiotic type
  - communicative function
- But how do we represent the combined speech-gesture contribution?

Speech-gesture interaction
Some claims:
- Gesture strokes tend to occur before their lexical affiliates (Schegloff, 1984).
- Gestures attach to prosodically prominent words (Alahverdzhieva and Lascarides, 2010).
In our data:
- Repeated head movements and facial expressions overlap with longer linguistic contributions, which do not always correspond to syntactic phrases.

Speech-gesture interaction: gesture and words
- In the corpus, gestures are linked to semantically related speech segments.
- The grammar must allow gestures to combine with speech at both the lexical and the phrasal level.

Multimodal signs: multimodal lexemes
[Figure: multimodal lexemes]

Multimodal signs: multimodal clauses
[Figure: multimodal clauses]

PART TWO
Automatic classification of multimodal feedback

An example
[Figure: an example]

Machine learning on nods and face
- A variety of previous studies have looked at how nods and gaze cluster in different behaviours, or how they can be used to classify dialogue acts.
- Few have looked at multimodal contributions (head movements, face, words).
- Our data allow us to do classification experiments, especially concerning feedback (Paggio and Navarretta, 2011).

Classifying feedback behaviour
- Task 1: can we distinguish feedback based on the shape of head and face in combination with co-occurring speech?
- Task 2: given knowledge of which head movements express feedback, can we classify the direction?

Modality   FB (%)   Other (%)
Head       67       68
Face       23       21
Eyebrows   10       11
Total      100      100

Focus on head movements.

Methodology
The dataset
- Co-occurring head and face were extracted based on temporal overlap.
- Related words were added based on the annotated links.
The tool
- Weka, different algorithms

The dataset
Head movements were extracted first, and face expressions were then added to them.

Behaviour     #
Head + face   1205
Head alone    1520
Total         2725

The dataset: head movements and feedback

Head movement function   #
FeedbackGive             995
FeedbackElicit           221
FeedbackGiveElicit       30
FeedbackUnderspecified   2
Total Feedback           1248
Feedback None            1477
Total head movement      2725

The dataset: the words
- Most frequent: + (pause)
- Followed by: yes, breath, okay, laugh, mm
Three ways to add words:
1. Words by the gesturer
2. Words by the interlocutor
3. Words by both gesturer and interlocutor

The results, task 1
- Best results are obtained with all words.
- Baseline: ZeroR. Other classifiers: SMO.

The results, task 2
- Best results are obtained with the gesturer's words.
- Baseline: ZeroR. Other classifiers: Naive Bayes.

Conclusion of part two
- Gestural feedback can be classified with good accuracy when face as well as words are used.
- This confirms that we need a multimodal approach.
- In future we want to add body posture features, prosody and preceding context.

General conclusion
- I hope to have shown that the study of multimodal behaviour is relevant for linguistics and can be approached with formal as well as empirical methods.
- We have plans to develop a similar corpus at UoM (MAMCO project).

Grazzi! (Thank you!)
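
Supplementary note on the ZeroR baseline named in the results for tasks 1 and 2: ZeroR always predicts the most frequent class, so its accuracy can be estimated directly from the class counts in the dataset tables. The Python sketch below does only that arithmetic. It assumes that task 1 separates the 1248 feedback head movements from the 1477 others, and that task 2 covers the four direction labels over the 1248 feedback head movements. It does not reproduce the Weka evaluation behind the talk's figures, which may have used a different data split.

```python
from collections import Counter


def zero_r_accuracy(class_counts):
    """Return the majority label and the accuracy of always predicting it."""
    total = sum(class_counts.values())
    majority_label, majority_count = Counter(class_counts).most_common(1)[0]
    return majority_label, majority_count / total


# Class counts taken from the dataset tables; the task definitions are assumptions.
task1 = {"Feedback": 1248, "FeedbackNone": 1477}
task2 = {
    "FeedbackGive": 995,
    "FeedbackElicit": 221,
    "FeedbackGiveElicit": 30,
    "FeedbackUnderspecified": 2,
}

for name, counts in (("task 1", task1), ("task 2", task2)):
    label, accuracy = zero_r_accuracy(counts)
    print(f"{name}: always predict {label}, accuracy {accuracy:.3f}")
```

On these counts a majority-class guess is right slightly more than half the time for task 1 and roughly four times out of five for task 2, so any gain from adding face and word features has to be read against a much higher floor in the second task.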