Towards an empirically-based grammar of speech and gestures Patrizia Paggio

Linguistic Circle
Towards an empirically-based
grammar of speech and gestures
Malta, 1 February 2012
Patrizia Paggio
patrizia.paggio@um.edu.dmt
University of Malta
Institute of Linguistics
University of Copenhagen
Centre for Language Technology
General research questions
How can non-verbal behaviour, in particular head
movements and facial expressions, be
represented in a multimodal grammar?
How would such a grammar account for naturally
occurring multimodal data?
Can machine learning methods be used to make
sense of the multimodal data?
Data: video-recorded conversations in
Danish.
In what sense grammar?
• Not only syntax, but all aspects of language structure;
• a system of constraints operating at various levels – phonology, morphology, syntax, semantics – as in HPSG (Pollard and Sag, 1994);
• extended to include constraints on the interaction between speech and gestures.
Gestures = non-verbal behaviour
Why worry about gestures?
• Human communication is situated in the human body.
• Synchrony and semantic parallelism between speech and gesture suggest a common cognitive base.
• Speech may have evolved on top of gesture (Arbib 2005).
• Several attempts have been made to explain speech and gesture production as an integrated whole:
  - growth point (McNeill 2005)
  - visible action as utterance (Kendon 2004)
Why is it a challenge?
Gestures are:
• largely non-conventionalised
• essentially indexical and iconic rather than symbolic
• ambiguous in content
• unpredictable at the expression level
However
Gesture families have been described (Kendon
2004): the grip, the ring, the open palm.
Vocabularies of hand gestures (not only emblems)
and head movements have been proposed (Kipp
2004, Allwood et al. 2007)
However
Attempts have been made to include gestures in unification-based representations:
• to cope with gesture analysis in multimodal interfaces (Johnston et al., 1997; Paggio and Jongejan, 2005)
• with a theoretical purpose (Alahverdzhieva and Lascarides, 2010)
Outline
Part one
• The corpus
• Types of non-verbal behaviour: shape, semiotic types, functions
• Gesture-speech interaction: lexical and phrasal multimodal signs
Part two
• Automatic classification of multimodal feedback
The corpus
NOMCO first acquaintance corpus
Funded by NOS-HS NORDCORP
Partners: Gothenburg, Helsinki, Copenhagen
Six male and six female subjects, aged 21-36
Subjects have not met before
Each person speaks with one female and one male partner
They speak freely
12 videos, 1 hour in total
Each interaction filmed by three cameras
The corpus: acknowledgements
NOMCO partners: Costanza Navarretta, Jens Allwood, Elisabeth Ahlsén, Kristiina Jokinen
CST annotators: Philip Diderichsen, Sara Andersen, Josephine Bødker Arrild, Anette Studsgård, Bjørn Nicola Wesseltolvig
The corpus: annotation
• Speech transcribed with word time stamps
• Topic and focus tagged in each “sentence”
• Gestures annotated in ANVIL based on the MUMIN scheme (Allwood et al. 2007):
  - Shape attributes
  - Semiotic attributes
  - Functional attributes
• Each gesture is linked to associated speech segments (gesturer or interlocutor).
The corpus: annotation
Procedure
Three coders
Cohen’s kappa 0.5-0.6 for face and 0.6-0.8 for
head.
10% improvement after coding 5 videos.
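The agreement figures above are Cohen's kappa values. As a minimal sketch of how such a score is computed for one pair of coders, the function below takes two parallel label lists. The labels are invented toy data, not corpus annotations, and how the three coders were paired is not stated on the slide.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both coders.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each coder's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (p_o - p_e) / (1 - p_e)

# Invented toy labels for six head movements (assumption, not corpus data).
coder_1 = ["Nod", "Nod", "Shake", "Tilt", "Nod", "Shake"]
coder_2 = ["Nod", "Tilt", "Shake", "Tilt", "Nod", "Nod"]
kappa = cohen_kappa(coder_1, coder_2)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is lower than the plain proportion of matching labels.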
The corpus: some counts
Ten videos annotated so far.
3,280 seconds, 12,032 words (incl. filled pauses)
Gesture   #      g/w    g/s
All       3511   0.29   1.07
Head      2335   0.19   0.71
Face      1176   0.09   0.35
(g/w = gestures per word, g/s = gestures per second)

Behaviour   Face (#)   Head (#)
Average     61.89      129.72
SD          24.41      34.68
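The g/w and g/s columns follow from the counts by simple division. The sketch below recomputes them from the totals given on the slide (3,280 seconds, 12,032 words). The function and variable names are illustrative only.

```python
# Totals and gesture counts as given on the slide.
SECONDS = 3280
WORDS = 12032
counts = {"All": 3511, "Head": 2335, "Face": 1176}

def rates(n_gestures, n_words=WORDS, n_seconds=SECONDS):
    """Return (gestures per word, gestures per second)."""
    return n_gestures / n_words, n_gestures / n_seconds

for name, n in counts.items():
    per_word, per_second = rates(n)
    print(f"{name:5s} {n:5d} {per_word:.4f} {per_second:.4f}")
```

The last digit of the Face row depends on whether the ratios are rounded or truncated to two decimals.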
Types of non-verbal behaviour
Attribute      Value
HeadMovement   Nod, Jerk, HeadForward, HeadBackward, Tilt, SideTurn, Shake, Waggle, HeadOther
General face   Smile, Laugh, Scowl, FaceOther
Eyebrows       Frown, Raise, BrowsOther
Shape attributes in the coding scheme
Since a gesture type has several values, it can be represented as a feature structure.
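One way to picture a shape description as a feature structure is an attribute-value mapping checked against the coding scheme. The following is a minimal sketch: the attribute names and values are those listed above, while the dictionary encoding and the helper `make_shape` are assumptions made for illustration.

```python
# Shape attributes and their permitted values, copied from the coding scheme.
SHAPE_VALUES = {
    "HeadMovement": {"Nod", "Jerk", "HeadForward", "HeadBackward", "Tilt",
                     "SideTurn", "Shake", "Waggle", "HeadOther"},
    "General face": {"Smile", "Laugh", "Scowl", "FaceOther"},
    "Eyebrows": {"Frown", "Raise", "BrowsOther"},
}

def make_shape(features):
    """Return a validated copy of a shape description (attribute -> value)."""
    for attribute, value in features.items():
        allowed = SHAPE_VALUES.get(attribute)
        if allowed is None:
            raise KeyError(f"unknown shape attribute: {attribute}")
        if value not in allowed:
            raise ValueError(f"{value!r} is not a value of {attribute}")
    return dict(features)

# A nod accompanied by a smile; unspecified attributes are simply left out.
smiling_nod = make_shape({"HeadMovement": "Nod", "General face": "Smile"})
print(smiling_nod)
```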
Feature structure representations
Gestures and semiotic types
Semiotic-oriented classifications of
gestures are very common (McNeill,
Kendon, Kipp).
We use Peirce’s three semiotic categories
indexical, iconic and symbolic (Peirce
1931).
Gestures and semiotic types
The semiotic class depends on the relation between gesture and denotation:
• Symbolic: arbitrary conventional relation
• Iconic: similarity relation (concrete or abstract)
• Indexical: real and immediate connection (deictic, display, interactive, beat)
A hierarchy of gestures
Specific gestures are placed at the intersection of shape and semiotic types.
How do we deal with ambiguity? More ‘complex’ types are preferred.
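The idea of a specific gesture inheriting from both a shape type and a semiotic type can be sketched with multiple inheritance. This is only an illustration: the class layout is an assumption, and reading "more complex" as "having more supertypes" is one possible interpretation of the preference stated above, not necessarily the one used in the grammar.

```python
class Gesture:
    """Root of the (illustrative) gesture type hierarchy."""

# Shape dimension
class HeadMovement(Gesture): pass
class Nod(HeadMovement): pass

# Semiotic dimension
class Symbolic(Gesture): pass
class Iconic(Gesture): pass
class Indexical(Gesture): pass

# A specific gesture sits at the intersection of both dimensions.
class SymbolNod(Nod, Symbolic): pass

def most_specific(candidates):
    """Prefer the candidate type with the most supertypes (assumed reading)."""
    return max(candidates, key=lambda gesture_type: len(gesture_type.__mro__))

print(most_specific([Nod, Symbolic, SymbolNod]).__name__)
```

Here `SymbolNod` is compatible with any constraint stated on `Nod` or on `Symbolic`, which is the effect a type intersection is meant to have.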
Communicative functions
Gestures can express:
• Feedback
• Emphasis/focus
• Turn-management
• Discourse structuring
• Own communication management
• …
We have attributes for many of these
functions in the MUMIN coding scheme.
Communicative functions
Feedback
A: HEJ jeg hedder HANNE (hi, my name is Hanne)
B: jeg hedder JESPER (my name is Jesper)
A: OKAY (okay)
   nod
Communicative functions
Feedback
Attribute           Value
FeedbackBasic       CPU, SelfFB, FBOther
FeedbackDirection   FBGive, FBElicit, FBGiveElicit, FBunderSpecified
Attitude            happy, sad, ..., interested, bored, ...
Communicative functions
Applied to the example
symbol-nod
[ HEAD-MOVEMENT  nod
  COMM-FUNCT     feedbackGive
                 [ FEEDBACK      cpu
                   F-DIRECTION   feedbackGive
                   ATTITUDE      interested
                   FEEDBACK-ARG  handle ] ]
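To make the unification-based view concrete, the sketch below encodes the nod from the example as a nested dictionary and unifies it with a partial description. It assumes that COMM-FUNCT groups the four feedback features, and it is a simplification: types and structure sharing, which a full HPSG-style formalism provides, are left out.

```python
def unify(fs1, fs2):
    """Unify two feature structures given as nested dicts; return None on a clash."""
    result = dict(fs1)
    for feature, value in fs2.items():
        if feature not in result:
            result[feature] = value
        elif isinstance(result[feature], dict) and isinstance(value, dict):
            sub = unify(result[feature], value)
            if sub is None:
                return None
            result[feature] = sub
        elif result[feature] != value:
            return None  # incompatible atomic values
    return result

# The feedback nod from the example (nesting under COMM-FUNCT is assumed).
symbol_nod = {
    "HEAD-MOVEMENT": "nod",
    "COMM-FUNCT": {
        "FEEDBACK": "cpu",
        "F-DIRECTION": "feedbackGive",
        "ATTITUDE": "interested",
        "FEEDBACK-ARG": "handle",
    },
}

# A partial description: any nod that gives feedback.
feedback_nod = {"HEAD-MOVEMENT": "nod", "COMM-FUNCT": {"F-DIRECTION": "feedbackGive"}}

print(unify(feedback_nod, symbol_nod) == symbol_nod)
print(unify({"HEAD-MOVEMENT": "shake"}, symbol_nod))
```

Unification succeeds when the two descriptions are compatible and fails as soon as they assign different values to the same feature.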
Communicative functions
Focusing
Gestures can reinforce the focus of a sentence,
similar to sentence accent.
repeated nod
(that COULD one could one indeed WELL feel)
Communicative functions
A hierarchy of functions
Functions can be formalised as typed feature
structures and ordered in a hierarchy.
Summing up so far
A gesture type is represented as a feature
structure.
The type is constrained with respect to:
• shape attributes
• semiotic type
• communicative function
But how should the combined speech-gesture contribution be represented?
Speech-gesture interaction
Some claims:
• Gesture strokes tend to occur before their lexical affiliates (Schegloff, 1984);
• Gestures attach to prosodically prominent words (Alahverdzhieva and Lascarides, 2010).
In our data:
• Repeated head movements and facial expressions overlap with longer linguistic contributions, not always corresponding to syntactic phrases.
Speech-gesture interaction
Gesture and words
In the corpus, gestures are linked to
semantically related speech segments.
The grammar must allow for gestures to combine
with speech at lexical and phrasal levels.
Multimodal signs
Multimodal lexemes
Multimodal signs
Multimodal clauses
PART TWO
Automatic classification of multimodal
feedback
An example
Machine learning on nods and face
• A variety of previous studies have examined how nods and gaze cluster in different behaviours, or can be used to classify dialogue acts
• Few have looked at multimodal contributions (head movements, face, words)
• Our data allow us to do classification experiments, especially concerning feedback (Paggio and Navarretta, 2011)
Classifying feedback behaviour
• Can we distinguish feedback based on the shape of head and face in combination with co-occurring speech (task 1)?
• Given knowledge of which head movements express feedback, can we classify the direction (task 2)?
Modality   FB (%)   Other (%)
Head       67       68
Face       23       21
Eyebrows   10       11
Total      100      100
Focus on head movements
Methodology
The dataset
Co-occurring head and face were
extracted based on temporal overlap.
Related words were added based on the
annotated links.
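Extraction by temporal overlap can be sketched as an interval test between head and face annotations. The `Annotation` class, its field names and the time stamps below are hypothetical; the actual export format of the annotation tool is not shown in the slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    label: str
    start: float  # seconds from the start of the video
    end: float

def overlaps(a, b):
    """True if the two annotations share some stretch of time."""
    return a.start < b.end and b.start < a.end

def attach_faces(heads, faces):
    """Pair each head movement with the face expressions that overlap it."""
    return [(head, [face for face in faces if overlaps(head, face)])
            for head in heads]

# Invented time stamps (assumption, not corpus data).
heads = [Annotation("Nod", 1.0, 1.8), Annotation("Shake", 4.0, 4.5)]
faces = [Annotation("Smile", 1.5, 3.0)]

for head, co_occurring in attach_faces(heads, faces):
    print(head.label, [face.label for face in co_occurring])
```

Head movements with an empty list correspond to the "head alone" cases, the others to "head + face".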
The tool
Weka, different algorithms
The dataset
Head movements were extracted first, and
face expressions were then added to
them.
Behaviour     #
Head + face   1205
Head alone    1520
Total         2725
The dataset
Head movements and feedback
Head movement function   #
FeedbackGive             995
FeedbackElicit           221
FeedbackGiveElicit       30
FeedbackUnderspecified   2
Total Feedback           1248
Feedback None            1477
Total head movement      2725
The dataset: the words
Most frequent: + (pause)
Followed by: yes, breath, okay, laugh, mm
Three ways to add words:
1. Words by the gesturer
2. Words by the interlocutor
3. Words by both gesturer and interlocutor
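The three options can be pictured as three ways of building a bag-of-words feature for each head movement. The slides do not say how words were encoded for the classifier, so the function below is a hypothetical sketch using simple word counts.

```python
from collections import Counter

def word_features(gesturer_words, interlocutor_words, mode):
    """Build a bag of words for one head movement.

    mode is "gesturer", "interlocutor" or "both" (names chosen for this sketch).
    """
    if mode == "gesturer":
        words = list(gesturer_words)
    elif mode == "interlocutor":
        words = list(interlocutor_words)
    elif mode == "both":
        words = list(gesturer_words) + list(interlocutor_words)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return Counter(words)

# Invented example: the gesturer says "yes" while the interlocutor says "okay mm".
print(word_features(["yes"], ["okay", "mm"], "gesturer"))
print(word_features(["yes"], ["okay", "mm"], "both"))
```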
The results, task 1
Best results obtained with all words.
Baseline: ZeroR, Other classifiers: SMO
The results, task 2
Best results are obtained with the
gesturer words.
Baseline: ZeroR, Other classifiers: Naive Bayes
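ZeroR always predicts the most frequent class, so its accuracy on the full dataset can be read off the class counts given in the dataset tables. The sketch below does this for both tasks. It assumes that task 2 uses all four feedback direction classes, and the figures reported from Weka may differ slightly if they were obtained with cross-validation.

```python
def zero_r(class_counts):
    """Return the majority class and the accuracy of always predicting it."""
    label, count = max(class_counts.items(), key=lambda item: item[1])
    return label, count / sum(class_counts.values())

# Class counts from the head movement table.
task_1 = {"Feedback": 1248, "Feedback None": 1477}
task_2 = {
    "FeedbackGive": 995,
    "FeedbackElicit": 221,
    "FeedbackGiveElicit": 30,
    "FeedbackUnderspecified": 2,
}

for name, counts in (("task 1", task_1), ("task 2", task_2)):
    label, accuracy = zero_r(counts)
    print(f"{name}: always predict {label}, accuracy {accuracy:.3f}")
```

On these counts the majority-class accuracy is about 0.542 for task 1 and about 0.797 for task 2, which is the level any classifier has to beat.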
Conclusion of part two
Gestural feedback can be classified with
good accuracy when face as well as
words are used.
This confirms the need for a
multimodal approach.
In future we want to add body posture
features, prosody and preceding
context.
General conclusion
I hope to have shown that the study of
multimodal behaviour is relevant for
linguistics and can be approached with
formal as well as empirical methods.
We have plans to develop a similar corpus
at UoM (MAMCO project).
Grazzi!