Information Status

Varieties of Information Status
– Contrast: John wanted a poodle but Becky preferred a corgi.
– Topic/comment: The corgi they bought turned out to have fleas.
– Theme/rheme: The corgi they bought turned out to have fleas.
– Focus/presupposition: It was Becky who took him to the vet.
– Given/new: Some wildcats bite, but this wildcat turned out to be a sweetheart.
Today: Given/New
• Why do we care about Given/New?
• Defining Given/New: why is this hard?
– Hearer-based and Discourse-based models
• Uses of Given/New information in NLP
• Identifying Given/New information automatically
– Rule-based
– Corpus-based
– The Boston Directions Corpus
– Laboratory studies suggest new directions
Why do we care about the given/new distinction?
• Building a model of the discourse
– What do S and H believe to be true?
– What is in their consciousness now?
– What is ‘grounded’?
• Speech technologies
– TTS: Given information is often deaccented
while new information is usually accented
– ASR?
Defining Given/New
• Halliday ‘67:
– Given: Recoverable from some form of context
– New: Not recoverable
• Chafe ’74 ’76:
– Given: what S believes is in H’s consciousness
– New: what S believes is not…
– “Chafe-givenness”
Yesterday I had my class disrupted by a bulldog. I’m beginning to dislike dogs.
• But not vice versa (a dog, then bulldogs).
Prince ’81: A Given/New Taxonomy
• Text as set of instructions from S to H on how to
construct a discourse model
– Model includes discourse entities, attributes,
and links between entities
– Discourse entities: individuals, classes,
exemplars, substances, concepts (NPs)
– Entities as ‘hooks’ on which to hang attributes
(Webber ’78)
• Entities when first introduced are new
– Brand-new (H must create a new entity)
I saw a dinosaur today.
– Unused (H already knows of this entity)
I saw your mother today.
• Evoked entities are old -- already in the discourse
– Textually evoked
The dinosaur was scaly and gray.
– Situationally evoked
The light was red when you went through it.
• Inferrables
– Containing
I bought a carton of eggs. One of them was broken.
– Non-containing
A bus pulled up beside me. The driver was a monkey.
Given/New and Definiteness/Indefiniteness
– Definiteness: subject NPs tend to be
syntactically definite and old
– Indefiniteness: object NPs tend to be indefinite
and new
I saw a black cat yesterday. The cat looked hungry.
• Definite articles, demonstratives, possessives,
personal pronouns, proper nouns, and quantifiers like ‘all’
and ‘every’ signal definiteness… but:
There were the usual suspects at the bar.
• Indefinite articles and quantifiers like ‘some’, ‘any’, ‘one’
signal indefiniteness… but:
This guy came into the room.
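A minimal Python sketch of the naive cue-based heuristic these bullets suggest, assuming each NP arrives as a pre-chunked string. The cue lists, the crude capitalization test for proper nouns, and the name `naive_status` are hypothetical. Run on the slide's two counterexamples, it labels both as given, which is exactly the failure the "but" points to.

```python
# Hypothetical cue lists taken from the slide's categories; not an exhaustive grammar.
DEFINITE_CUES = {"the", "this", "that", "these", "those",            # article, demonstratives
                 "my", "your", "his", "her", "its", "our", "their",  # possessives
                 "all", "every"}                                     # quantifiers
INDEFINITE_CUES = {"a", "an", "some", "any", "one"}
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they", "me", "him", "us", "them"}


def naive_status(np: str) -> str:
    """Guess 'given' or 'new' for a noun phrase from its first word only."""
    first = np.split()[0]
    lower = first.lower()
    if lower in INDEFINITE_CUES:
        return "new"
    if lower in DEFINITE_CUES or lower in PRONOUNS:
        return "given"
    if lower.endswith(("'s", "’s")):   # possessive NP such as "Becky's corgi"
        return "given"
    if first[0].isupper():             # crude proper-noun test
        return "given"
    return "new"                       # bare nouns such as "dogs" default to new


for np in ["a black cat", "The cat", "the usual suspects", "This guy"]:
    print(np, "->", naive_status(np))
```

The last two calls are the slide's counterexamples: both come out "given" even though the speaker is introducing new entities.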
What’s wrong with a simple Hearer-centric
model of given/new?
• Hearer-centric information status:
– Given: what S believes H has in his/her
consciousness
– New: what S believes H does not have in
his/her consciousness
• But discourse entities may also be given and new
with respect to the current discourse
– Discourse-old: already evoked in the discourse
– Discourse-new: not evoked
(1) A: I’ve decided to make an appointment with Lee Bollinger.
(2) B: Why do you want to see Bollinger?
• Hearer status of discourse entities in 1? 2?
– If B is your roommate? your mother? a guy on
the subway?
• Discourse status of discourse entities in 1? 2?
• What would be the hearer/discourse status of
discourse entities in this version?
(1) A: I’ve decided to make an appointment with Lee Bollinger.
(2a) B: Why do you want to see the president?
(2b) B: Have you talked to his secretary?
What does this new Hearer/Discourse given/new
distinction provide?
• A way to separate what is explicit in the discourse
model from what is believed to be in
speaker/hearer cognitive model
• A way to explain given/new in more complex
terms
– To identify coreference relations
– To explain deaccenting in ASR and TTS
Gross Oversimplification: Given Items Tend to be
Deaccented
• Accenting and deaccenting: making items
intonationally prominent or not
• Critical to get this distinction ‘right’ in TTS
– Accenting everything makes it hard for people
to understand anything, e.g.
I like my cat and my cat adores me.
One potato, two potato, three potato,…
If a discourse entity is given for one speaker then
it may or may not be given for another speaker.
How can we determine automatically whether a
discourse entity is given or new?
• A rule-based approach:
– Stem the content words in the discourse
– Choose a window size; an incoming item whose stem
matches that of a previous item within the window
is labeled ‘given’
• Other items are ‘new’
• Is this hearer-based? Discourse-based?
• How well does it work?
– 65-75% accurate (precision), depending on
genre and domain
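The rule-based approach can be sketched as follows. This is an illustration under stated assumptions, not the system behind the precision figures: `crude_stem` is a hypothetical suffix stripper standing in for a real stemmer, the stopword list is illustrative, and the window is counted in tokens (stopwords included) back to the most recent mention of the same stem.

```python
import re

# Illustrative stopword list; anything not listed is treated as a content word.
STOPWORDS = {"a", "an", "the", "and", "or", "but", "to", "of", "in", "on", "at",
             "from", "then", "i", "you", "he", "she", "it", "we", "they",
             "me", "my", "your", "is", "was"}


def crude_stem(word: str) -> str:
    """Very rough suffix stripping; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def label_given_new(text: str, window: int = 50) -> list[tuple[str, str]]:
    """Label each content word 'given' if a word with the same stem occurred
    at most `window` tokens earlier, else 'new'."""
    tokens = re.findall(r"[a-z']+", text.lower())
    last_seen: dict[str, int] = {}  # stem -> index of its most recent occurrence
    labels = []
    for i, tok in enumerate(tokens):
        if tok in STOPWORDS:
            continue
        stem = crude_stem(tok)
        prev = last_seen.get(stem)
        status = "given" if prev is not None and i - prev <= window else "new"
        labels.append((tok, status))
        last_seen[stem] = i
    return labels


print(label_given_new("I like my cat and my cat adores me."))
# [('like', 'new'), ('cat', 'new'), ('cat', 'given'), ('adores', 'new')]
```

Because it only consults the preceding text, this sketch is discourse-based; nothing in it models what the hearer already knows.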
Boston Directions Corpus (Hirschberg &
Nakatani ’96)
• Experimental Design
• 12 speakers: 4 used
• Spontaneous and read versions of 9 direction-giving
tasks
• Corpus: 50 min read; 67 min spontaneous
• Labeling
– Prosodic: ToBI intonational labeling
– Discourse: Grosz & Sidner
– Given/new (Prince ’92), grammatical function,
p.o.s.,…
Boston Directions Corpus: Describe how to get
to MIT from Harvard
d1: dsp1: step 1: enter and get token
first
enter the Harvard Square T stop
and buy a token
d2: dsp2: inbound on red line
then
proceed to get on the
inbound
um
Red Line
uh subway
dp3 dsp3: take subway from hs, to cs to ks
and
take the subway
from Harvard Square
to Central Square
and then to Kendall Square
dp4: dsp4: get off T.
then get off the T
Hearer and Discourse Given/New Labeling
first
enter <HG/DN the Harvard Square T stop>
and buy <HI/DN a token>
then
proceed to get on <HI/DN the
inbound
um
Red Line
uh subway>
and
take <HG/DG the subway>
from <HG/DG Harvard Square>
to <HG/DN Central Square>
and then to <HG/DN Kendall Square>
then get off <HG/DG the T>
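The bracketed labels above are regular enough to extract mechanically. A minimal sketch, assuming the markup looks exactly as shown: a hearer status (HG, HI, or HN), a slash, a discourse status (DG or DN), then the NP text, which may span several lines and include fillers. The names `parse_labels` and `status_counts` are hypothetical.

```python
import re
from collections import Counter

# Matches spans like "<HG/DN the Harvard Square T stop>"; [^>]* also matches newlines.
SPAN = re.compile(r"<(H[GIN])/(D[GN])\s+([^>]*)>")


def parse_labels(transcript: str) -> list[tuple[str, str, str]]:
    """Return (hearer_status, discourse_status, np_text) for each labeled NP."""
    spans = []
    for m in SPAN.finditer(transcript):
        text = " ".join(m.group(3).split())  # collapse line breaks inside the NP
        spans.append((m.group(1), m.group(2), text))
    return spans


def status_counts(spans: list[tuple[str, str, str]]) -> Counter:
    """Count NPs per (hearer_status, discourse_status) pair."""
    return Counter((h, d) for h, d, _ in spans)


sample = """enter <HG/DN the Harvard Square T stop>
and buy <HI/DN a token>
proceed to get on <HI/DN the
inbound
um
Red Line
uh subway>"""
for span in parse_labels(sample):
    print(span)
```

Counts like these, joined with ToBI accent labels for the same NPs, are the raw material for the deaccenting figures that follow.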
What could we do with this labeled data?
• Can we predict given/new?
• Can we predict what will be accented and what
will be deaccented?
Does Given/New Status Predict Deaccenting?
NP status   Deaccented   Total NPs
HG          37.1%        1009
HI          53.9%        406
HN          26.2%        130
DG          43.3%        596
DN          38.8%        950
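The percentages can be turned back into approximate counts, which lets rows be pooled. A small sketch; the counts are implied by the rounded percentages rather than reported in the study, and the names `TABLE`, `implied_deaccented`, and `pooled_rate` are hypothetical.

```python
# Rows of the table: status -> (proportion deaccented, total NPs).
TABLE = {
    "HG": (0.371, 1009), "HI": (0.539, 406), "HN": (0.262, 130),
    "DG": (0.433, 596), "DN": (0.388, 950),
}


def implied_deaccented(status: str) -> int:
    """Approximate number of deaccented NPs, recovered from the rounded percentage."""
    share, total = TABLE[status]
    return round(share * total)


def pooled_rate(statuses: list[str]) -> float:
    """Deaccenting rate pooled over several rows, using the implied counts."""
    deaccented = sum(implied_deaccented(s) for s in statuses)
    total = sum(TABLE[s][1] for s in statuses)
    return deaccented / total


print(f"hearer rows:    {pooled_rate(['HG', 'HI', 'HN']):.1%}")
print(f"discourse rows: {pooled_rate(['DG', 'DN']):.1%}")
```

Under this approximation both the hearer rows and the discourse rows pool to about 41% deaccented, and hearer-given NPs (37.1%) are deaccented less often than HI NPs (53.9%), which suggests given/new status alone is a weak predictor here.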
What else might be at work?
• Given/new and grammatical function
• Hypothesis: how discourse entities are evoked in a
discourse influences how ‘given’ they are
• E.g., How might grammatical function and surface
position interact with the accentuation of ‘given’ items?
• Cases:
– X has not been mentioned in the prior context
– X has been mentioned, with the same grammatical
function/surface position
– X has been mentioned but with a different grammatical
function/surface position
Experimental Design
• Major problem:
– How to elicit ‘spontaneous’ productions while
varying desired phenomena systematically?
– Key: simple variations and actions can
capitalize on the natural tendency to associate
grammatical functions with particular thematic
roles for a given set of verbs
[Figure: stimulus scenes built from five shapes (rectangle, triangle, cylinder, octagon, diamond), shown as Context 1, Context 2, Context 3, Target (A), and Target (B)]
Experimental Conditions
• 10 native speakers of standard American English
• Subject and experimenter in soundproof booth
• Subject told to describe scenes to a confederate
outside the booth, who was visible but provided no
feedback
• 10 practice scenarios
• ~20 minutes per subject
Prosodic Analysis
• Target turns excised and analyzed independently by two
judges for the location of pitch accents on each referring
expression: accented (2), unsure (1), deaccented (0);
summing the two judges’ codes gives an accentedness
score from 0 to 4 (81% agreement for 0 and 2 scores)
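The scoring scheme can be made concrete as follows, assuming (as the 0-4 range implies) that the two judges' codes are summed per referring expression and that each cell of the results table is an average over items. The function names and the toy ratings below are hypothetical, not data from the study.

```python
from statistics import mean

VALID_CODES = (0, 1, 2)  # 0 = deaccented, 1 = unsure, 2 = accented


def accentedness(judge1: int, judge2: int) -> int:
    """Combine two judges' codes into a 0-4 accentedness score (assumed: a sum)."""
    for code in (judge1, judge2):
        if code not in VALID_CODES:
            raise ValueError(f"invalid code: {code}")
    return judge1 + judge2


def cell_means(ratings):
    """Average scores per (target_role, context_role) cell.

    `ratings` is an iterable of (target_role, context_role, judge1, judge2).
    """
    scores: dict[tuple[str, str], list[int]] = {}
    for target, context, j1, j2 in ratings:
        scores.setdefault((target, context), []).append(accentedness(j1, j2))
    return {cell: mean(vals) for cell, vals in scores.items()}


# Toy ratings for illustration only.
toy = [("Subj", "Subj", 2, 1), ("Subj", "Subj", 1, 1), ("D-obj", "D-obj", 0, 0)]
print(cell_means(toy))  # {('Subj', 'Subj'): 2.5, ('D-obj', 'D-obj'): 0}
```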
Grammatical Role/Surface Position Accenting (accentedness score, 0-4)
Target \ Context   Given: Subj   Given: D-obj   Given: Pp-obj   New
Subj               2.1           3.6            3.2             3.7
D-obj              3.3           0.6            1.6             3.8
Pp-obj             3.0           1.4            0.7             --
Findings
• In general
– Items that differ from context to target in
grammatical function or surface position tend to
be accented
– Items that share grammatical function and
surface position tend to be deaccented
• But
– Subjects tend to be accented more often than
objects, even if previously mentioned in the
same role
– Direct objects and pp-objects tend to be more
distinguished from subjects than from one
another
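The first two findings can be checked directly against the given-context cells of the table. A short sketch; reading rows as target role and columns as context role is an assumption about the flattened original, but the same-role and different-role averages computed here do not depend on it, since transposing the 3x3 block leaves the diagonal and the off-diagonal cells unchanged as sets.

```python
ROLES = ("Subj", "D-obj", "Pp-obj")

# Given-context cells: (target role, context role) -> accentedness score (0-4).
SCORES = {
    ("Subj", "Subj"): 2.1, ("Subj", "D-obj"): 3.6, ("Subj", "Pp-obj"): 3.2,
    ("D-obj", "Subj"): 3.3, ("D-obj", "D-obj"): 0.6, ("D-obj", "Pp-obj"): 1.6,
    ("Pp-obj", "Subj"): 3.0, ("Pp-obj", "D-obj"): 1.4, ("Pp-obj", "Pp-obj"): 0.7,
}

same = [SCORES[(r, r)] for r in ROLES]                          # same role in context and target
different = [v for (t, c), v in SCORES.items() if t != c]       # role changed
same_mean = sum(same) / len(same)
diff_mean = sum(different) / len(different)

print(f"same role: {same_mean:.2f}, different role: {diff_mean:.2f}")
# same role: 1.13, different role: 2.68
```

The diagonal also shows the subject exception: a subject repeated as subject still scores 2.1, against 0.6 and 0.7 for repeated objects.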
How can we explain these observations?
• Consider our examples, e.g., subj → D.O.:
The TRIANGLE touches the CYLINDER.
The triangle touches the DIAMOND.
The triangle touches the OCTAGON.
The RECTANGLE touches the TRIANGLE.
• An entity may be ‘given’ or ‘new’ with respect to the role it
plays in the discourse
Given/New Sensitive to the Role the Discourse
Entity Plays
• E.g., a discourse entity may retain a given or take on a new
thematic role
– By the time the target is uttered, ‘triangle’ is established
both as a ‘given’ discourse entity and as the discourse
topic (or backward-looking center, Cb, in centering theory)
– But this status has been established for ‘triangle’ as
agent
– What is new, and, perhaps, focused in the target is
‘triangle’s’ new thematic role as patient – the players
are the same but the roles are different
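One way to make given/new identification sensitive to role is to record, for each entity, the roles it has already been mentioned in. A minimal sketch, using grammatical role as a stand-in for thematic role; `RoleTracker` and its three labels are hypothetical, not a published scheme.

```python
from dataclasses import dataclass, field


@dataclass
class RoleTracker:
    """Track the grammatical roles each entity has been mentioned in."""
    roles_seen: dict[str, set[str]] = field(default_factory=dict)

    def label(self, entity: str, role: str) -> str:
        """Return 'new', 'given-same-role', or 'given-new-role', then record the mention."""
        seen = self.roles_seen.setdefault(entity, set())
        if not seen:
            status = "new"
        elif role in seen:
            status = "given-same-role"
        else:
            status = "given-new-role"
        seen.add(role)
        return status


# The triangle example as (entity, role) mentions, sentence by sentence.
mentions = [("triangle", "Subj"), ("cylinder", "D-obj"),
            ("triangle", "Subj"), ("diamond", "D-obj"),
            ("triangle", "Subj"), ("octagon", "D-obj"),
            ("rectangle", "Subj"), ("triangle", "D-obj")]
tracker = RoleTracker()
for entity, role in mentions:
    print(entity, role, tracker.label(entity, role))
```

Reading 'new' and 'given-new-role' as predicting an accent reproduces the capitalization in the example: only the subject ‘triangle’ in the second and third sentences comes out deaccented.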
Consequences for NLP
– Identification of given/new status must be
sensitive to more complex model of
context (grammatical function/thematic
role)
– Will this help us predict deaccenting
more accurately?
– Stay tuned…
Next Class