Towards Learning Dialogue Structures from Speech
Data and Domain Knowledge: Challenges to
Conceptual Clustering using Multiple and
Complex Knowledge Sources
Jens-Uwe Moller
Natural Language Systems Division,
Dept. of Computer Science, Univ. of Hamburg
Dialog modeling is based on a set of units called dialog acts
Dialog acts derived from theory do not fit a specific domain
Labeling dialogs is time-consuming and subjective
Learn application-specific dialog acts from speech data using conceptual clustering
Learning dialog acts from turns
Unsupervised classification (no prior definition of dialog acts is given)
Hierarchical classification with inspectable classification rules
Domain knowledge: structure of task, task knowledge represented by goals and plans
Word recognizer: word hypotheses
Prosodic data: pauses and stress mark important units
Lexical semantics
Syntax (less important in spoken dialog)
Semantics (larger units of lexical semantics)
Symbolic machine learning algorithm
Build a classification tree
Distinctions between subnodes are made by a function over all attributes
Support probabilistic data
Support multiple overlapping hierarchies (for ambiguous cases)
Can handle multiple entries of one attribute (e.g. a stream of words)
Learning from simultaneous events
Learn from structured data: conceptual graphs
Learn case descriptions from terminological descriptions
Subsumption = correlation criterion over structured data, e.g. subsumption of individuals under classes
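The idea that distinctions between subnodes come from a function over all attributes can be illustrated with category utility, the splitting function used in COBWEB-style conceptual clustering. Whether this is the function intended here is an assumption, and the attribute names and toy turns below are hypothetical.

```python
from collections import Counter

def category_utility(partition):
    """Category utility of a partition (a list of clusters).

    Each cluster is a list of instances; each instance is a dict
    mapping an attribute name to a nominal value.
    """
    instances = [x for cluster in partition for x in cluster]
    n = len(instances)

    def sum_sq_probs(items):
        # Sum over all attributes and values of P(attr = value)^2 within items.
        counts = Counter((a, v) for x in items for a, v in x.items())
        return sum((c / len(items)) ** 2 for c in counts.values())

    baseline = sum_sq_probs(instances)
    gain = sum(len(c) / n * (sum_sq_probs(c) - baseline) for c in partition)
    return gain / len(partition)

# Hypothetical turns described by two nominal attributes.
turns = [
    {"first_word": "what", "final_pause": "long"},
    {"first_word": "what", "final_pause": "long"},
    {"first_word": "okay", "final_pause": "short"},
    {"first_word": "okay", "final_pause": "short"},
]
coherent = [turns[:2], turns[2:]]
mixed = [[turns[0], turns[2]], [turns[1], turns[3]]]

cu_coherent = category_utility(coherent)
cu_mixed = category_utility(mixed)
```

A partition that groups turns with matching attribute values scores higher than one that mixes them, which is what lets a tree builder choose between candidate subnodes without labeled dialog acts.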
Independence of
Semantic Classes
Andrew Pargellis, Eric Fosler-Lussier,
Alexandros Potamianos, Chin-Hui Lee
Dialogue Systems Research Dept., Bell
Labs, Lucent Technologies Murray Hill, NJ,
USA
Employ semantic classes (concepts) from another domain
Need to identify domain-independent concepts based on comparison across domains
Domain-independent concepts should occur in similar syntactic (lexical) contexts across domains
Comparing concepts across domains
Concept-comparison method
Concept-projection method
Find the similarity between all pairs of concepts across the two domains
Two concepts are similar if their respective bigram contexts are similar
Use left and right context bigram language models
Compare how san francisco and newark are used in the Travel domain with how comedies and westerns are used in the Movie domain
Distance between two concepts
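The concept-comparison steps above can be sketched as follows. This is a minimal illustration, assuming a symmetric Kullback-Leibler distance with add-alpha smoothing over the left and right neighbor distributions; the smoothing scheme, the function names, and the toy sentences are assumptions, not details taken from the paper.

```python
import math
from collections import Counter

def context_counts(sentences, concept_words):
    """Count the words immediately left and right of any concept member."""
    left, right = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            if tokens[i] in concept_words:
                left[tokens[i - 1]] += 1
                right[tokens[i + 1]] += 1
    return left, right

def kl(p_counts, q_counts, vocab, alpha=0.5):
    """KL(p || q) over vocab, with add-alpha smoothing (an assumption)."""
    p_total = sum(p_counts.values()) + alpha * len(vocab)
    q_total = sum(q_counts.values()) + alpha * len(vocab)
    total = 0.0
    for w in vocab:
        p = (p_counts[w] + alpha) / p_total
        q = (q_counts[w] + alpha) / q_total
        total += p * math.log(p / q)
    return total

def concept_distance(sents_a, concept_a, sents_b, concept_b):
    """Symmetric KL distance, summed over left and right contexts."""
    left_a, right_a = context_counts(sents_a, concept_a)
    left_b, right_b = context_counts(sents_b, concept_b)
    dist = 0.0
    for ca, cb in ((left_a, left_b), (right_a, right_b)):
        vocab = set(ca) | set(cb)
        dist += kl(ca, cb, vocab) + kl(cb, ca, vocab)
    return dist

# Hypothetical toy corpora for the two domains.
travel = ["i want to fly to newark tomorrow", "a flight to boston please"]
movie = ["i want to see comedies tonight", "are there westerns playing"]
cities = {"newark", "boston"}
genres = {"comedies", "westerns"}

d_same = concept_distance(travel, cities, travel, cities)
d_cross = concept_distance(travel, cities, movie, genres)
d_cross_rev = concept_distance(movie, genres, travel, cities)
```

Computing this distance for every pair of concepts across the two domains gives a matrix in which small entries suggest concepts that are used in similar lexical contexts.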
How well a single concept from one domain is represented in another domain.
How the words comedies and westerns are used in both domains
Useful for identifying the degree of domain independence of a particular concept.
Semi-Automatic Acquisition of
Domain-Specific Semantic Structures
Siu K.C., Meng H.M.
Human-Computer Communications Laboratory
Department of Systems Engineering and Engineering Management
The Chinese University of Hong Kong
Use unannotated corpora
Portable across domains and languages
Output grammar has reasonable coverage of within-domain data and rejects out-of-domain data
Amenable to interactive refinement by a human
Support optional injection of prior knowledge
Use Kullback-Leibler distance over left and right contexts.
Consider words w1, w2 (later c1, c2) pairwise, restricted to words with at least a preset minimum number of occurrences (set to 5).
Use Mutual Information (MI).
The N pairs with the highest MI are clustered (N = 5 in the experiment)
Do spatial clustering and temporal clustering iteratively
Post-processing by a human
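The MI-based selection of word pairs can be sketched as below. This is an illustration only: it scores adjacent word pairs by pointwise mutual information, which is an assumption about the exact MI formulation, and the function name and toy corpus are hypothetical. The defaults follow the values stated above (minimum occurrence 5, N = 5).

```python
import math
from collections import Counter

def top_mi_pairs(sentences, n_best=5, min_count=5):
    """Return the n_best adjacent word pairs with the highest pointwise MI.

    Pairs are only scored when both words occur at least min_count times.
    """
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = sentence.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    total_uni = sum(unigrams.values())
    total_bi = sum(bigrams.values())

    scored = []
    for (w1, w2), c12 in bigrams.items():
        if unigrams[w1] < min_count or unigrams[w2] < min_count:
            continue
        p12 = c12 / total_bi
        p1 = unigrams[w1] / total_uni
        p2 = unigrams[w2] / total_uni
        scored.append((math.log(p12 / (p1 * p2)), (w1, w2)))
    scored.sort(reverse=True)
    return [pair for _, pair in scored[:n_best]]

# Hypothetical toy corpus, repeated so that every word reaches the threshold.
corpus = ["go to new york"] * 5 + ["go to to go"] * 5

best = top_mi_pairs(corpus, n_best=1)
best_strict = top_mi_pairs(corpus, n_best=1, min_count=6)
```

In an iterative setup, each selected pair would be replaced by a single merged token before the next round, so that longer phrases can form over successive passes.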
Automatic Concept Identification
in Goal-Oriented Conversations
Ananlada Chotimongkol and
Alexander I. Rudnicky
Language Technologies Institute
Carnegie Mellon University
First step towards the goal of automatically inferring domain ontologies
Goal-oriented human-human conversation has a clear structure
This structure can be used to automatically identify domain topics, e.g. dialog classification
Hierarchical clustering
Mutual information based
Criterion=minimize the loss of average mutual information
Kullback-Leibler based
Criterion=word pair with minimum distance
Reference concepts are taken from a class-based n-gram model
Cluster concept=majority concept
Precision
Recall
Singularity score (SS)
Quality score (QS)
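Precision and recall against the reference concepts can be sketched as follows, using the majority-concept rule to label a cluster. The per-cluster definitions used here are a common convention and an assumption about how the metrics are computed; the singularity and quality scores are not sketched because their definitions are not given above. The function name and example words are hypothetical.

```python
from collections import Counter

def score_cluster(cluster_words, reference):
    """Return (majority_concept, precision, recall) for one cluster.

    reference maps each word to its reference concept label.
    """
    labels = Counter(reference[w] for w in cluster_words if w in reference)
    if not labels:
        # No word in the cluster has a reference concept.
        return None, 0.0, 0.0
    concept, hits = labels.most_common(1)[0]
    concept_size = sum(1 for c in reference.values() if c == concept)
    precision = hits / len(cluster_words)  # share of the cluster that matches
    recall = hits / concept_size           # share of the concept recovered
    return concept, precision, recall

# Hypothetical reference concepts and one induced cluster.
reference = {
    "boston": "city", "pittsburgh": "city", "denver": "city", "seattle": "city",
    "monday": "day", "friday": "day",
}
cluster = ["boston", "pittsburgh", "monday"]

concept, precision, recall = score_cluster(cluster, reference)
```

Here the cluster is labeled with the concept that most of its words belong to, so one stray word lowers precision while the missing concept members lower recall.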