Towards Learning Dialogue Structures from Speech Data and Domain Knowledge: Challenges to Conceptual Clustering using Multiple and Complex Knowledge Sources

Jens-Uwe Moller

Natural Language Systems Division,

Dept. of Computer Science, Univ. of Hamburg


Dialog modeling is based on a set of units called dialog acts

Dialog acts taken from theory do not fit a specific domain well

Labeling dialog is time-consuming and subjective

Goal: learn application-specific dialog acts from speech data using conceptual clustering

The learning task

Learning dialog acts from turns

Unsupervised classification (no prior definition of dialog acts is given)

Hierarchical classification with inspectable classification rules


Domain knowledge: structure of task, task knowledge represented by goals and plans

Word recognizer: word hypotheses

Prosodic data: pauses and stress mark important units

Lexical semantics

Syntax (less important in spoken dialog)

Semantics (larger units of lexical semantics)


Symbolic machine learning algorithm

Build a classification tree

Distinctions between subnodes are made by a function over all attributes

Support probabilistic data

Support multiple overlapping hierarchies (for ambiguous cases)

Can handle multiple entries of one attribute (e.g. a stream of words)
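
As an illustration of "a function over all attributes", here is a minimal sketch of category utility, the split criterion used by COBWEB-style conceptual clustering. The slides do not name the function actually used, so this is a swapped-in example; the attribute names (`first_word`, `pause`) are hypothetical, and the partition is assumed to be non-empty.

```python
from collections import Counter

def category_utility(partition):
    """Score a partition of instances (a list of clusters).

    Each cluster is a list of dicts mapping attribute -> nominal value.
    Higher scores mean attribute values are more predictable inside
    the clusters than in the data as a whole.
    """
    everything = [x for cluster in partition for x in cluster]
    n = len(everything)

    def sum_sq(instances):
        # Sum over all (attribute, value) pairs of P(attribute=value)^2.
        counts = Counter((a, v) for x in instances for a, v in x.items())
        return sum((c / len(instances)) ** 2 for c in counts.values())

    base = sum_sq(everything)
    gain = sum(len(c) / n * (sum_sq(c) - base) for c in partition if c)
    return gain / len(partition)

# Hypothetical turn descriptions with two nominal attributes.
q = {"first_word": "what", "pause": "long"}
a = {"first_word": "yes", "pause": "short"}
separated = [[q, q], [a, a]]   # questions and answers kept apart
mixed = [[q, a], [q, a]]       # questions and answers mixed
```

Under this criterion the `separated` partition scores higher than the `mixed` one, which is the kind of distinction a node split in the classification tree would prefer.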


Learning from simultaneous events

Learn from structured data: Conceptual


Learn case descriptions from terminological descriptions

Subsumption = correlation criterion over structured data, e.g. subsumption of individuals to classes
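
A minimal sketch of subsumption of an individual under a class, assuming a deliberately simple representation: a class is a set of attribute constraints (allowed values) and an individual is a flat attribute-value record. The representation and the attribute names are hypothetical; the terminological descriptions referred to in the slides are richer than this.

```python
def subsumes(concept, individual):
    """Return True if the individual satisfies every constraint of the concept.

    concept:    dict mapping attribute -> set of allowed values
    individual: dict mapping attribute -> value
    """
    return all(
        attr in individual and individual[attr] in allowed
        for attr, allowed in concept.items()
    )

# Hypothetical class and dialog turns.
booking_request = {"speaker": {"customer"}, "goal": {"book_flight", "book_hotel"}}
turn_1 = {"speaker": "customer", "goal": "book_flight", "stress": "high"}
turn_2 = {"speaker": "agent", "goal": "book_flight"}
```

Here `turn_1` is subsumed by `booking_request` and `turn_2` is not; extra attributes on the individual (such as `stress`) do not block subsumption.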

Metrics for Measuring Domain Independence of Semantic Classes

Andrew Pargellis, Eric Fosler-Lussier,

Alexandros Potamianos, Chin-Hui Lee

Dialogue Systems Research Dept., Bell Labs, Lucent Technologies, Murray Hill, NJ



Employ semantic classes (concepts) from another domain

Need to identify domain-independent concepts based on comparison across domains

Domain-independent concepts should occur in similar syntactic (lexical) contexts across domains

Comparing concepts across domains

Concept-comparison method

Concept-projection method

Concept-comparison method

Find the similarity between all pairs of concepts across the two domains

Two concepts are similar if their respective bigram contexts are similar

Use left and right context bigram language models

Kullback-Leibler (KL) distance

Compare how "san francisco" and "newark" are used in the Travel domain with how "comedies" and "westerns" are used in the Movie domain

Distance between two concepts
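
The formula from this slide is not preserved in the text. As a minimal sketch, assume the distance is the sum of the KL divergences in both directions over the left-context and right-context distributions of the two concepts. The add-one smoothing, the whitespace tokenization and the toy sentences are assumptions made for illustration only.

```python
import math
from collections import Counter

def context_counts(sentences, members):
    """Count the words immediately left and right of any concept member."""
    left, right = Counter(), Counter()
    for sent in sentences:
        toks = ["<s>"] + sent.split() + ["</s>"]
        for i in range(1, len(toks) - 1):
            if toks[i] in members:
                left[toks[i - 1]] += 1
                right[toks[i + 1]] += 1
    return left, right

def kl(p_counts, q_counts, vocab, alpha=1.0):
    """KL(p || q) with add-alpha smoothing over a shared vocabulary."""
    p_total = sum(p_counts.values()) + alpha * len(vocab)
    q_total = sum(q_counts.values()) + alpha * len(vocab)
    d = 0.0
    for w in vocab:
        p = (p_counts[w] + alpha) / p_total
        q = (q_counts[w] + alpha) / q_total
        d += p * math.log(p / q)
    return d

def concept_distance(ctx_a, ctx_b):
    """Symmetric KL over left contexts plus symmetric KL over right contexts."""
    (la, ra), (lb, rb) = ctx_a, ctx_b
    lv = set(la) | set(lb)
    rv = set(ra) | set(rb)
    return kl(la, lb, lv) + kl(lb, la, lv) + kl(ra, rb, rv) + kl(rb, ra, rv)

# Toy corpora standing in for the Travel and Movie domains.
travel = ["fly to boston tomorrow", "fly to denver today"]
movie = ["show me comedies tonight", "show me westerns tonight"]
city_ctx = context_counts(travel, {"boston", "denver"})
genre_ctx = context_counts(movie, {"comedies", "westerns"})
d = concept_distance(city_ctx, genre_ctx)
```

A small distance would suggest that the two concepts appear in similar lexical contexts and are candidates for domain independence; a concept compared with itself gives a distance of zero.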

Concept-projection method

How well a single concept from one domain is represented in another domain.

How the words "comedies" and "westerns" are used in both domains

Useful for identifying the degree of domain independence for a particular concept.

Result: Concept-comparison

Result: Concept-projection

Concept Example

Semi-Automatic Acquisition of Domain-Specific Semantic Structures

Siu K.C., Meng H.M.

Human-Computer Communications Laboratory

Department of Systems Engineering and Engineering Management

The Chinese University of Hong Kong

Grammar induction

Use unannotated corpora

Portable across domains and languages

Output grammar has reasonable coverage of within-domain data and rejects out-of-domain data

Amenable to interactive refinement by a human

Support optional injection of prior knowledge

Spatial clustering

Use the Kullback-Leibler distance over left and right contexts.

Consider words w1, w2 (later classes c1, c2) pairwise, restricted to words with at least a preset minimum number of occurrences (set to 5).

Temporal clustering

Use Mutual Information (MI).

The N pairs with the highest MI are clustered (N = 5 in the experiment)

Do spatial clustering and temporal clustering iteratively

Post-processing by a human
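
The temporal clustering step can be sketched as follows. The slides do not give the exact MI formula, so this sketch assumes pointwise mutual information over adjacent word pairs; the function name `top_mi_pairs`, the `min_count` filter and the toy corpus are hypothetical.

```python
import math
from collections import Counter

def top_mi_pairs(sentences, n=5, min_count=1):
    """Return the n adjacent word pairs with the highest pointwise MI."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        toks = sent.split()
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scored = []
    for (w1, w2), c in bigrams.items():
        if c < min_count:
            continue  # skip pairs that are too rare to score reliably
        p_xy = c / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scored.append((math.log(p_xy / (p_x * p_y)), (w1, w2)))
    scored.sort(reverse=True)
    return [pair for _, pair in scored[:n]]

corpus = [
    "i want to fly to new york",
    "flights to new york please",
    "i want a ticket",
]
top = top_mi_pairs(corpus, n=2, min_count=2)
```

In an iterative setup, each selected pair would be replaced by a single phrase token before the next round of spatial clustering, so that multi-word units such as "new york" can then be grouped by their contexts.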

Automatic Concept Identification in Goal-Oriented Conversations

Ananlada Chotimongkol and

Alexander I. Rudnicky

Language Technologies Institute

Carnegie Mellon University

Concept identification

First step towards the goal of automatically inferring domain ontologies

Goal-oriented human-human conversation has a clear structure

This structure can be used to automatically identify domain topics, e.g. dialog classification

Clustering algorithm

Hierarchical clustering

Mutual information based

Criterion=minimize the loss of average mutual information

Kullback-Leibler based

Criterion=word pair with minimum distance
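
The minimum-distance criterion can be sketched as a greedy bottom-up merge loop. This is a generic agglomerative skeleton rather than the authors' implementation: `distance` is a placeholder for a KL-based distance between clusters, and the numeric toy data with a single-link distance is used only to keep the example self-contained. The mutual-information criterion is not shown.

```python
def agglomerate(items, distance, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[x] for x in items]
    while len(clusters) > max(n_clusters, 1):
        # Find the pair of clusters with the minimum distance.
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: distance(clusters[ab[0]], clusters[ab[1]]),
        )
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)
    return clusters

def single_link(c1, c2):
    """Toy distance: smallest absolute difference between members."""
    return min(abs(x - y) for x in c1 for y in c2)

result = agglomerate([1, 2, 10, 11], single_link, n_clusters=2)
```

Recording the order of the merges instead of stopping at a fixed `n_clusters` would give the full hierarchy.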

Evaluation metrics

Reference concepts from a class-based n-gram model

Cluster concept=majority concept



Singularity score (SS)

Quality score (QS)
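
The "cluster concept = majority concept" rule can be sketched as below. The `reference` mapping from words to reference concepts is hypothetical, and the returned fraction is only an illustrative helper value; it is not the Singularity or Quality score, whose definitions are not given in these notes.

```python
from collections import Counter

def majority_concept(cluster, reference):
    """Label a cluster with the reference concept most of its words belong to.

    Returns (concept, fraction of cluster words carrying that concept),
    or (None, 0.0) if no word in the cluster has a reference concept.
    """
    labels = Counter(reference[w] for w in cluster if w in reference)
    if not labels:
        return None, 0.0
    concept, count = labels.most_common(1)[0]
    return concept, count / len(cluster)

# Hypothetical reference concepts.
reference = {"boston": "city", "denver": "city", "monday": "day"}
label, fraction = majority_concept(["boston", "denver", "monday"], reference)
```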
