Speech acts for dialogue agents

Ashish Vaswani
Speech acts for Dialogue agents,
Coding schemes and dialogue act
taxonomies
Speech acts for dialogue agents
(Traum)
• Talks about the role of speech acts in allowing an agent
to participate in dialogue with another agent
• A dialogue agent is one that can interact and
communicate with other agents in a coherent manner,
not just with one-shot messages but with a sequence of
related messages all on the same topic or in the service
of an overall goal.
• In studying speech acts, the focus is on pragmatics
rather than semantics, i.e., how language is used by
agents, and not what the sentences mean.
Foundational Philosophical speech
act work
• Began with philosophers of language interested in issues in Natural
language pragmatics
• Austin:
– Utterances are used to do things
– Under favorable conditions, utterances can change the mental and interactional state of the
participants.
– Speaking is acting
– Three main divisions of speech acts.
• Locutionary act: The act of saying something.
• Illocutionary act: The act performed in saying something (e.g., informing, warning).
– Composed of illocutionary force and propositional content.
– Indirect speech acts (Could you please pass the salt?)
• Perlocutionary act: The effect of the utterance on the hearer (e.g., persuasion, surprise).
– Classified illocutionary acts into several categories based on illocutionary force (verdictives,
exercitives, commissives, expositives and behabitives)
Speech act work continued
• Searle:
– Extended and refined Austin’s work on illocutionary acts.
– No necessary correspondence between illocutionary acts and
illocutionary verbs that a language chooses to describe these
acts.
– Searle pointed out 12 different dimensions along which speech
acts could vary, suggesting an alternate taxonomy based on purpose
(his first dimension)
– Searle’s Taxonomy
• Representatives
• Directives
• Commissives
• Expressives
• Declarations
AI models of speech acts
• Problem with early speech act work was that
there did not exist formal accounts of actions
and mental states that could be used to design
more precise definitions of speech acts.
• Bruce: First to give an account of speech act
theory in terms of actions and plans (AI)
– Natural language generation is Social Action. (beliefs,
desires and wants)
– Inform and request could be used in achieving
intentions to change states of belief.
AI models of speech acts
• Cohen and Perrault
– Defined speech acts as plan operators that change the beliefs of
the speaker and hearer
– Enumerated goals for an account of speech acts
– A plan based theory of speech acts should specify a planning
system and a definition of speech acts as operators in the
system
– Mental state consists of beliefs and wants.
– They used a modified version of the STRIPS planning system
– cando preconditions and want preconditions for operators
– They modeled REQUEST and INFORM within their system
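The plan-operator view above can be sketched in a few lines of code. This is only an illustration under stated assumptions, not Cohen and Perrault's actual formalism: the Operator class, the encoding of propositions as plain strings, and the particular preconditions and effects given for INFORM and REQUEST are simplifications chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    """A STRIPS-style operator with separate cando and want preconditions."""
    name: str
    cando: frozenset    # must hold for the act to be possible
    want: frozenset     # the speaker must want to perform the act
    effects: frozenset  # propositions added to the state

    def applicable(self, state):
        return self.cando <= state and self.want <= state

    def apply(self, state):
        if not self.applicable(state):
            raise ValueError(f"{self.name} is not applicable")
        return frozenset(state) | self.effects


def inform(speaker, hearer, prop):
    # Assumed encoding: the speaker believes prop and wants to inform;
    # afterwards the hearer believes that the speaker believes prop.
    return Operator(
        name=f"INFORM({speaker},{hearer},{prop})",
        cando=frozenset({f"believes({speaker},{prop})"}),
        want=frozenset({f"wants({speaker},inform({hearer},{prop}))"}),
        effects=frozenset({f"believes({hearer},believes({speaker},{prop}))"}),
    )


def request(speaker, hearer, action):
    # Assumed encoding: afterwards the hearer believes the speaker wants action.
    return Operator(
        name=f"REQUEST({speaker},{hearer},{action})",
        cando=frozenset({f"believes({speaker},cando({hearer},{action}))"}),
        want=frozenset({f"wants({speaker},request({hearer},{action}))"}),
        effects=frozenset({f"believes({hearer},wants({speaker},{action}))"}),
    )


state = frozenset({"believes(S,p)", "wants(S,inform(H,p))"})
new_state = inform("S", "H", "p").apply(state)
```

Keeping cando and want as separate sets mirrors the two kinds of precondition named on the slide: a planner can check them independently before chaining operators.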
AI models of speech acts
• Allen and Perrault
– Used the same formalism as Cohen and
Perrault
– Recognizing other agents' plans is important for
interpreting utterances.
• Hinkelman uses linguistic cues to build
partial speech act templates, and plan
inference to evaluate utterance hypotheses
AI models
• Perrault: (non monotonic theory of speech acts)
– Utterance itself is insufficient to determine the effects
of a speech act (prior context, mental state of agent,
actual utterance)
– Stated the effects in terms of default logic.
• Dynamic logic approaches:
– Cohen and Levesque showed how the effects of
illocutionary acts can be derived from general
principles of rational cooperative interaction (sincerity
and helpfulness)
– Recognizing illocutionary force of an utterance is not
necessary, only cooperation.
– Sadek uses a similar logic of rational action.
Extending speech acts to Dialogue
(Dialogue function as action)
• Litman and Allen:
– Extend Allen and Perrault's work to include dialogues
and a hierarchy of plans
– Domain plans, discourse plans, meta-plans
• Carberry and Lambert add problem solving
plans to domain plans and discourse plans.
• Cohen and Levesque extend their work into a
theory of joint intention and multi agent action
– Why confirmations appear in dialogue. (belief of
object of intention)
Multiple levels of interaction
• Attempts to model different kinds of dialogue
phenomena at different strata. (from sentence level
and upwards)
• One early classification
– (transactions(exchanges(moves(acts))))
– Moves: speech acts towards a particular purpose
– The exchange structure was also called a dialogue game
• In Traum and Hinkelman, there were levels of acts
rather than ranks
Speech act based communicative
languages
• Language based on Speech acts would
itself be a good agent communication
language
• KQML (knowledge query and manipulation
language)
– Each message has an identifier (kind of
action) and other parameters specifying
content. Based on Austin’s performatives.
• Problems with hidden speech acts.
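As a rough illustration of the message shape described above, a KQML message is written as a parenthesized expression whose first element names the kind of action (the performative), followed by keyword parameters. The sketch below only formats such a string; the helper name kqml_message, the agent names, and the content expression are hypothetical, while ask-one and the :sender, :receiver, :reply-with, and :content keywords follow KQML conventions.

```python
def kqml_message(performative, **params):
    """Render a KQML-style message as an s-expression string."""
    parts = [performative]
    for key, value in params.items():
        # Python identifiers cannot contain '-', so reply_with becomes :reply-with.
        parts.append(f":{key.replace('_', '-')} {value}")
    return "(" + " ".join(parts) + ")"


# Hypothetical agents and content, for illustration only.
msg = kqml_message(
    "ask-one",
    sender="agent-a",
    receiver="agent-b",
    reply_with="q1",
    content='"price(widget, ?p)"',
)
print(msg)
```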
Speech Acts in multi agent action
theory
• The main effects of speech acts are on the mental and interactional states
of the participants. (BDI attitudes)
Social attitudes
• We must also consider social attitudes (question :Are social attitudes basic
?)
• Mutual belief (Harman): A group of people have mutual knowledge of p if
each knows p and we know this, where "this" refers to the whole fact known.
• Mutual belief is achieved through the process of grounding. (Clark and
Schaefer)
• Obligations are necessary for modeling social situations (viz. a hearer is
obligated to answer a question if posed one). What an agent should do.
• Problem: How do you decide social norms?
• Obligations might conflict with the agent's goals, and the agent might choose to
violate them (e.g., interrogation)
• Another social attitude is joint intention or shared plan. Coordinated team
activity depends on more than only individual intentions and beliefs. (how do
shared intentions guide individual action ?)
Speech acts in multi agent action
theory
Defining speech acts
– How can one give precise definitions of
speech acts using mental state and action?
– How can one recognize whether such an act
has been performed? (because of
involvement of mental states, an observer
might not be able to tell)
– How can agents plan to use speech acts to
accomplish their goals?
– Traum : Plan recipe for communication
continued
• Planning speech acts
– Acts can be planned as games, or single moves.
– How far ahead should an agent plan?
– The future actions of other agents cannot be predicted accurately.
– Negotiations, arguments (more planning), casual conversation (no planning)
• Recognizing speech acts
– Combination of input utterance with aspects of current context to decide what
acts have been performed (for example, current context says that an INFORM
act might be impending)
– Should the agent recognize just the acts, or the intentions as well? (This might be
necessary for interpreting indirect speech acts.)
– How much of the plan should be inferred? Deep intention recognition might not
be necessary; considering all possible actions and their immediate effects
may be sufficient when combined with a facility to repair erroneous conclusions (default
logic?) (McRoy)
– Grounding relaxes the need for intention recognition: because the speaker is easily
accessible, motivations can be clarified within the dialogue itself.
The reliability of a Dialogue structure coding
scheme (Carletta et al)
• The paper introduces a scheme of dialogue coding distinctions for the
Map Task corpus and describes its reliability
• In the Map Task, two participants have slightly different
versions of a simple map with approximately fifteen
landmarks on it. One participant's map has a route
printed on it; the task is for the other participant to
duplicate the route.
• The moves introduced are independent of the task.
• They also attempt to classify dialogue structure at higher levels
(transactions and games)
• The dialogue structure can be used with codings of
many other dialogue phenomena.
The dialogue structure coding
• Transactions:
– Highest level
– Subdialogues that accomplish one major step in the participants
plan for achieving the task.
– Size and shape depend on the task
• Conversational games (dialogue games)
– A conversational game is a set of utterances starting with an
initiation and encompassing all utterances up until the purpose of
the game has been either fulfilled (e.g., the requested
information has been transferred) or abandoned.
– Games can nest within each other
– Games are made up of Conversational moves which are
different kinds of initiations and responses
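The three levels described above (transactions containing games, games containing moves or embedded games) map naturally onto a small recursive data structure. The following is a minimal sketch under assumptions: the class and field names are invented for illustration, and although the two utterances are taken from the Map Task examples in these slides, their arrangement into a single nested game is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Union


@dataclass
class Move:
    speaker: str  # "G" (route giver) or "F" (route follower)
    kind: str     # move label, e.g. "QUERY-YN"
    text: str


@dataclass
class Game:
    purpose: str  # label of the initiating move
    parts: List[Union[Move, "Game"]] = field(default_factory=list)

    def moves(self) -> Iterator[Move]:
        """Yield every move in order, descending into embedded games."""
        for part in self.parts:
            if isinstance(part, Game):
                yield from part.moves()
            else:
                yield part


@dataclass
class Transaction:
    kind: str  # e.g. "NORMAL"
    games: List[Game] = field(default_factory=list)


# Hypothetical arrangement: a yes-no query game embedded in an instruct game.
query = Game("QUERY-YN", [
    Move("G", "QUERY-YN", "Do you have the west lake, down to your left?"),
    Move("F", "REPLY-N", "No."),
])
instruct = Game("INSTRUCT", [
    Move("G", "INSTRUCT", "Go right round, ehm, until you get to just above them."),
    query,
])
segment = Transaction("NORMAL", [instruct])
kinds = [m.kind for g in segment.games for m in g.moves()]
print(kinds)
```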
The move coding scheme (moves)
• Instruct move:
– move commands the partner to carry out an action.
– Expected response could be performance of action if the
participant knows the action.
– G: Go right round, ehm, until you get to just above them.
• Explain move:
– States information that has not been directly elicited by the
partner.
– Facts about the domain, state of plan or task, including facts that
help establish what is mutually known
– G: Where the dead tree is on the other side of the stream there's
farmed land.
Move coding scheme
• Check move:
– Requests the partner to confirm information that the
speaker has some reason to believe, but is not
entirely sure about.
Move coding scheme
• Align move:
– checks the partner's attention, agreement, or
readiness for the next move.
– The most common type of ALIGN move is one where the
transferer checks that the information has been
successfully transferred, so that they can close that
part of the dialogue and move on.
Move coding scheme
• Query-YN move:
– asks the partner any question that takes a yes
or no answer and does not count as a
CHECK or an ALIGN
– These questions are most often about what
the partner has on the map
• F: I've got Dutch Elm.
• G: Dutch Elm. Is it written underneath the tree?
Move coding scheme
• The Query-W move:
– is any query not covered by the other categories
– most moves classified as QUERY-W are wh-questions
Move coding scheme (Response
moves)
• Used within games after an initiation and try to
fulfill expectations in the game
• Acknowledge move:
– verbal response that minimally shows that the
speaker has heard the move to which it responds,
and often also demonstrates that the move was
understood and accepted.
– Of Clark and Schaefer's kinds of evidence of
understanding, only the last three count as
ACKNOWLEDGE moves in this coding scheme
• G: Ehm, if you ... you're heading southwards.
• F: Mmhmm.
Move coding scheme
• Reply-Y move:
– any reply to any query with a yes-no surface form that
means "yes", however that is expressed
– normally only appear after QUERY-YN, ALIGN, and
CHECK moves.
• G: See the third seagull along?
• F: Yeah.
• Reply-N move:
– reply to a query with a yes-no surface form that
means "no"
• G: Do you have the west lake, down to your left?
• F: No.
Move coding scheme
• Reply-W move:
– any reply to any type of query that doesn't simply mean "yes"
or "no."
• G: And then below that, what've you got?
• F: A forest stream.
• Clarify move:
– reply to some kind of question in which the speaker tells the
partner something over and above what was strictly asked.
– Route givers tend to make CLARIFY moves when the route
follower seems unsure of what to do, but there isn't a specific
problem on the agenda
Move coding scheme
• Other possible responses:
– Utterances where the responder refuses to share the
same goal as the initiator (No, let's talk about..)
– ACKNOWLEDGE moves with a negative slant
– Sufficiently rare in the corpora.
• READY move:
– moves that occur after the close of a dialogue game
and prepare the conversation for a new game to be
initiated.
• G: Okay. Now go straight down.
– Confusion: That could have been an acknowledge
move too
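The move labels introduced on the preceding slides fall into initiations, responses, and READY. The sketch below groups them that way and adds a simple sanity check for the remark that REPLY-Y normally appears only after QUERY-YN, ALIGN, and CHECK. The grouping is read off the slides; the function names, and the simplification that "after" means "immediately after", are assumptions made for illustration.

```python
INITIATIONS = {"INSTRUCT", "EXPLAIN", "CHECK", "ALIGN", "QUERY-YN", "QUERY-W"}
RESPONSES = {"ACKNOWLEDGE", "REPLY-Y", "REPLY-N", "REPLY-W", "CLARIFY"}
# Initiations after which a REPLY-Y normally appears, per the slides.
YES_NO_INITIATIONS = {"QUERY-YN", "ALIGN", "CHECK"}


def move_class(label):
    """Map a move label to its coarse class."""
    if label in INITIATIONS:
        return "initiation"
    if label in RESPONSES:
        return "response"
    if label == "READY":
        return "ready"
    raise ValueError(f"unknown move label: {label}")


def unexpected_reply_y(labels):
    """Return indices of REPLY-Y moves not directly preceded by a yes-no initiation.

    Assumption: only the immediately preceding move is inspected.
    """
    return [
        i for i, label in enumerate(labels)
        if label == "REPLY-Y"
        and (i == 0 or labels[i - 1] not in YES_NO_INITIATIONS)
    ]


flagged = unexpected_reply_y(["QUERY-YN", "REPLY-Y", "EXPLAIN", "REPLY-Y"])
```

A check like this does not make a coding wrong; it only points a coder at places worth a second look.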
Coding continued
• Game coding scheme:
– Beginnings of new games are coded by purpose
– Places where games end or are abandoned are marked
– Games are marked as either occurring at top level or being embedded in the game structure
• Transaction coding scheme:
– Four transaction types:
• NORMAL: Transaction serving a subtask, e.g., a route segment on the map.
• REVIEW: Transactions created when participants return to parts of the route that have
already been completed
• OVERVIEW: Overviewing an upcoming segment in order to provide a context for the
partner.
• IRRELEVANT: Subdialogues not relevant to the route (maybe about the
experimental setup)
– Coding involves marking in the dialogue where the transaction starts except for
IRRELEVANT transactions.
– Ends of transactions are not coded.
Reliability of coding scheme
• Tests of reliability
– Krippendorff’s tests of reliability
• Stability
• Reproducibility
• Accuracy
– Agreement by coders on segmentation
– Used kappa coefficient for reliability of
classification.
Reliability of coding
• Reliability of move coding
– Four coders
– Each coder had access to the speech as well as transcripts
– All coders interacted verbally with the developers
• Reliability of move segmentation
– Kappa = .92 using word boundaries as units
– Pairwise percent agreement on locations where any coder had
marked a boundary was 89%.
– Number of units = 4079. Number of boundaries = 796
– Most disagreements concerned whether to mark READY separately or include it
in the move that followed, and whether to mark a reply as one move or split it into
a reply plus an EXPLAIN, CLARIFY, etc.
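One way to read "pairwise percent agreement on locations where any coder had marked a boundary" is sketched below: collect every position that at least one coder marked, then, for each such position and each pair of coders, count the pair as agreeing when both marked it or both left it unmarked. This reading, the function name, and the toy boundary positions are assumptions for illustration; they are not taken from the paper.

```python
from itertools import combinations


def boundary_agreement(codings):
    """Pairwise agreement over positions where any coder marked a boundary.

    codings: list of sets of boundary positions, one set per coder.
    """
    candidates = set().union(*codings)
    pairs = list(combinations(codings, 2))
    if not candidates or not pairs:
        raise ValueError("need at least two coders and one marked boundary")
    agreeing = sum(
        (pos in a) == (pos in b)
        for pos in candidates
        for a, b in pairs
    )
    return agreeing / (len(candidates) * len(pairs))


# Made-up boundary positions (word indices) for three coders.
toy = [{3, 7, 12}, {3, 7}, {3, 8, 12}]
print(boundary_agreement(toy))
```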
Reliability of coding
• Reliability of move classification
– Since the reliability of segmentation was good, it gave
a good foundation for move classification
– Move classification was evaluated only over move
segments where the boundaries were agreed
– Kappa for move coding = 0.83
– Largest confusions between
• CHECK and QUERY-YN
• INSTRUCT and CLARIFY
• ACKNOWLEDGE, READY and REPLY-Y
– K = 0.89 when coding only whether an initiation is a command, a
statement, or a question
Reliability of coding
• Reliability of move classification from Written
instructions:
– K = 0.69
• Reliability of move coding in Another domain
– Transcribed conversation between a hi-fi sales assistant and a
married couple intending to purchase an amplifier
• K = 0.95 for move segmentation
• K = 0.81 for move classification
• Reliability of game coding:
– Pairwise agreement on game beginnings = 70%
• Reliability of Transaction coding:
– Done from written instructions
– K = 0.59
Coding Dialogues with the DAMSL Annotation
scheme (Mark Core and James F Allen)
• DAMSL (Dialogue Act Markup In Several
Layers)
• Automatic analysis of Dialogue needed for
– Computer acting as participant with users
– Computer as observer interpreting human speech
• DAMSL allows multiple labels in multiple layers
to be applied to an utterance
• Communicative actions described here are high
level.
DAMSL annotation scheme
• Forward communicative functions
– Speech acts that affect the future of dialogue
– These categories are independent
– Divided into
• Representatives (statements): Making claims about the world
– Speaker trying to affect the beliefs of the hearer: Assert
– Repeating information for emphasis or acknowledgement: Reassert
• Influencing-Addressee-Future-Action
– All utterances that discuss potential actions of the addressee
» Directives:
1. Info Request: Questions and Requests (tell me the time)
2. Action Directive: Requests for action (Please take out the trash)
» Open-Option: Speaker gives a potential course of action but does not show preference towards it
• Commissives (Committing-Speaker-Future-Action)
– Offers
– Commitments
• Performative category
– Utterances that make a fact true in virtue of their content (You are fired)
• Other forward functions
DAMSL annotation scheme
• Backward communicative function:
– The speech act categories related to responses
– The classes are independent
– Agreement
• Accept, accept-part, Maybe, Reject-part, reject, hold
– Understanding
• Did the listener understand the speaker?
• The listener may
– Signal-Non-Understanding
– Signal-Understanding (Acknowledgement, Repeat-Rephrase,
Completion)
– Correct-Misspeaking
– Answer
• Supplying information explicitly requested by a previous Info-Request
act
– Information relations
• Describe how the information in the current utterance relates to
previous utterances
• Utterance features:
– Information Level
• Task (utterance about the task)
• Task Management (utterance about the planning and monitoring of
task)
• Communication management (Physical requirements of dialogue)
• Other
– Communicative Status
• Abandoned
• Uninterpretable
– Syntactic Features
• Conventional form (hello, how may I help you)
• Exclamatory form (wow)
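Because DAMSL lets several labels from several layers attach to one utterance, an annotation is better modelled as a record with one slot per layer than as a single tag. The sketch below is only an illustration: the class name, field names, and label spellings (for example "Commit") are assumptions, and the reply utterance is invented; the first utterance is the Action Directive example from the slide.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class DamslAnnotation:
    text: str
    forward: Set[str] = field(default_factory=set)   # forward communicative functions
    backward: Set[str] = field(default_factory=set)  # backward communicative functions
    info_level: Optional[str] = None                 # Task, Task Management, ...
    status: Optional[str] = None                     # Abandoned, Uninterpretable

    def labels(self):
        """All labels across layers as sorted (layer, label) pairs."""
        pairs = [("forward", label) for label in self.forward]
        pairs += [("backward", label) for label in self.backward]
        if self.info_level:
            pairs.append(("info_level", self.info_level))
        if self.status:
            pairs.append(("status", self.status))
        return sorted(pairs)


request = DamslAnnotation(
    "Please take out the trash",
    forward={"Action-Directive"},
    info_level="Task",
)
# Invented reply: it both accepts the request and commits the speaker.
reply = DamslAnnotation(
    "Okay, I will",
    forward={"Commit"},
    backward={"Accept"},
    info_level="Task",
)
print(reply.labels())
```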
Experiments
• Used test dialogues from the TRAINS 91-93 dialogues.
• A person was given a problem to solve viz.
shipping box cars to a city and another
person was instructed to act as a problem
solving system.
Results
• Three statistics were used to measure
interannotator reliability.
• PA – percent pairwise agreement
• PE- Expected pairwise agreement
• Kappa = (PA - PE) / (1 - PE)
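The three statistics can be computed directly from two annotators' label sequences. The sketch below assumes the two-annotator case and estimates expected agreement PE from each annotator's own label distribution (Cohen's variant); other variants pool the distributions, and the slides do not say which was used. The function name and the toy labels are illustrative.

```python
from collections import Counter


def kappa(labels_a, labels_b):
    """Return (PA, PE, kappa) for two equally long label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)
    # PA: fraction of items on which the two annotators agree.
    pa = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # PE: agreement expected by chance given each annotator's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    pe = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if pe == 1:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return pa, pe, (pa - pe) / (1 - pe)


# Made-up labels for four utterances.
a = ["Assert", "Assert", "Info-Request", "Assert"]
b = ["Assert", "Info-Request", "Info-Request", "Assert"]
pa, pe, k = kappa(a, b)
print(pa, pe, k)
```

Here the annotators agree on 3 of 4 items (PA = 0.75) and chance agreement is 0.5, so kappa comes out at 0.5: half of the achievable agreement beyond chance.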
An empirical investigation of proposals in
Collaborative Dialogues (Di Eugenio et al.)
• They use a slight modification of the DRI (Discourse Resource Initiative)
scheme.
• Task (will be read out)
The DRI coding scheme
• Similar to, and simpler than, the DAMSL scheme discussed before.
– Forward looking functions
• This dimension characterizes the potential effect that an utterance Ui has on the
subsequent dialogue.
• Statement: Make claims about the world.
– Assert (Speaker trying to change the hearer's beliefs)
– Reassert (if the claim has already been made before)
• Influence on hearer (I-on-H)
– Influences H’s future action
» Open option
» Info Request
» Action directives
• Influence on Speaker (I-on-S)
– Commits S to some future course of action
» Offer
» commit
DRI coding scheme
• Backward looking functions:
– Characterize how Ui responds to earlier utterances
• Answer
• Agreement :
– Accept/reject
– Holds
• Certain refinements were made to the core
features by adding heuristics for tagging
Statements, I-on-H and I-on-S.
Coding results
• Their results on forward functions were better than Core and Allen’s (97)
• Very low Kappa value for agreement
Twenty questions for Dialogue act
taxonomies (Traum)
Defining dialogue acts:
Question 1.
• Which is more important: fit to intuitions or
formal rigor?
– Difficult to precisely formulate complex intuitions using
available formal techniques
– Sacrifice intuition for formal rigor or vice versa?
– Answer will depend on the purpose of the concept.
(experimentation or verification)
Questions 2 & 3
• Is the definition of a dialogue act an issue of lexical semantics or
ontology of action?
– Is the point of a definition to give an account of when someone might be justified in
describing an event with a sentence headed by a particular verb (inform, request),
or to provide a technical vocabulary to compactly describe various types
of occurrences? (the speech acts in the third paper)
• Under what conditions may an action be said to have occurred?
– Allwood uses 4 criteria
• Intention of performer
• Form of behavior (e.g., linguistic form; question 2?)
• Achieved result
• Context in which the behavior occurs.
– Avoid defining DAs according to, say, a certain set of results holding and
then identifying instances of these acts using one of the other criteria, say
linguistic form. This would lead to coding difficulties.
Questions 4 & 5
• What is the role of speaker intention?
– Some would define dialogue acts on the basis
of intention behind them
– Some would define it with the recognition of
this intention (illocutionary acts)
• What is the role of addressee uptake?
– Many dialogue act definitions require some
changes to the addressee based on
understanding of the utterance in a particular
way
Question 6
• What view should be taken regarding the
performance of acts?
– Speaker's and listener's views
– View of the speaker-addressee team, or a
normative, conventional point of view.
– Is one allowed to consider subsequent
utterances before deciding on performance?
– This has implications for coding.
Dialogue act
components (questions 7 and 8)
• How are actions used in a logic?
• What is context?
– What aspects of the situation are relevant as potential
conditions for defining types of dialogue act
performance and what aspects are (directly) affected.
– Special sorts of information used for conditions and
effects of dialogue acts
• Dialogue state (pre: dialogue be in a particular state, effect:
transition to a new dialogue state)
• Mental states (effect: newly adopted beliefs)
• Social obligations and commitments
Questions 9 & 10
• What kind of preconditions are appropriate?
– Most convenient dialogue acts have few, if
any, actual preconditions
• How should an unsuccessful act be
distinguished from a failed attempt to
perform an act?
– Difference between the success and
satisfaction of a speech act
Relationships and complex
acts (questions 11 and 12)
• What is the relationship between dialogue acts and other
(e.g., physical) acts?
– Different theories would maintain a crisp or more blurred
distinction between dialogue acts and non-communicative acts.
• What is the relationship between dialogue acts and
dialogue structure
– Wholly dependent on dialogue structure (grammar based
approaches)
– Dialogue structure is primarily constructed from the activity that
the participants are engaged in
– Dialogue structure is also used as context for performance of
dialogue act (question 8)
Questions 13 & 14
• Are there multi-agent dialogue acts?
– Some researchers view the performance of most
illocutionary acts as a collective performance of
multiple agents, in virtue of the grounding process
– Games, exchanges and collaborative completions.
– Problems with tagging.
• Can dialogue acts be “composed” of more
primitive acts?
– Could a multiple strata dialogue act taxonomy have
levels or ranks?
Question 15
• Can multiple dialogue acts occur at the
same time (performed through the same
utterance) ?
– Since utterances have multiple functions, yes.
– It is a problem if the logical theory does not
support simultaneous action
– It has complications in Tagging
Taxonomic considerations (question 16)
• Can the same taxonomy be used for different
kinds of activities?
– People have been designing taxonomies for different
dialogue activities.
– A general theory might better allow one to use act
distributions to identify activities or genres of activities
as well as episodes within an activity.
Percentage distributions of
dialogue acts in Corpus Coding
Questions 17 and 18
• Can the same taxonomy be used for different kinds of
agents?
– Could the same taxonomy cover communicative activities
between
• Human with human
• Human with machine
• Humans with animals etc.
– Modality of communication also matters
• How detailed should a dialogue act taxonomy be?
– How many distinctions in speech act verbs should be captured
within a dialogue act taxonomy (e.g. state, assert, inform)
– Trade off between proposing many acts for subtle differences
and reliability of coding
Questions 19 and 20
• How should complexity be realized in a coding
taxonomy?
– How to capture multiplicity of functions in a Taxonomy?
• Multiple labels for each utterance, one for each function (DRI, Allen
and Core)
• Bundle dialogue functions into one label (Verbmobil, Jekat et al.)
• Intermediate approach (DAMSL)
• Can a taxonomy used for tagging dialogue corpora
be given a formal semantics and/or be used in a
dialogue system?
– Hope is “yes”