Towards Learning Dialogue Structures from Speech Data and Domain Knowledge: Challenges to Conceptual Clustering using Multiple and Complex Knowledge Sources

Jens-Uwe Moller

Natural Language Systems Division,

Dept. of Computer Science, Univ. of Hamburg

Overview









Dialog modeling is based on a set of units called dialog acts

Dialog acts taken from theory don't fit a specific domain

Labeling dialogs is time-consuming and subjective

Proposal: learn application-specific dialog acts from speech data using conceptual clustering

The learning task







Learning dialog acts from turns

Unsupervised classification (no prior definition of dialog acts is given)

Hierarchical classification with inspectable classification rules

Features













Domain knowledge: structure of task, task knowledge represented by goals and plans

Word recognizer: word hypotheses

Prosodic data: pauses and stress mark important units

Lexical semantics

Syntax (less important in spoken dialog)

Semantics (larger units of lexical semantics)

COBWEB













Symbolic machine learning algorithm

Build a classification tree

Distinctions between subnodes are made by a function over all attributes

Supports probabilistic data

Supports multiple overlapping hierarchies (for ambiguous cases)

Can handle multiple entries of one attribute (e.g. a stream of words)
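
The slide does not name the "function over all attributes". The classic COBWEB algorithm uses category utility for this purpose, so the sketch below assumes that measure and nominal attributes only; the attribute names (`first_word`, `pause`) are made up for illustration and the probabilistic and multi-valued extensions listed above are not covered.

```python
from collections import Counter


def category_utility(clusters):
    """Category utility of a partition.

    `clusters` is a list of clusters; each cluster is a list of instances,
    and each instance is a dict mapping attribute name -> nominal value.
    """
    all_instances = [inst for cluster in clusters for inst in cluster]
    n_total = len(all_instances)

    def expected_correct_guesses(instances):
        # Sum over attribute/value pairs of P(attribute = value) squared.
        counts = Counter(
            (attr, val) for inst in instances for attr, val in inst.items()
        )
        return sum((c / len(instances)) ** 2 for c in counts.values())

    baseline = expected_correct_guesses(all_instances)
    gain = sum(
        (len(cluster) / n_total) * (expected_correct_guesses(cluster) - baseline)
        for cluster in clusters
    )
    return gain / len(clusters)


# Hypothetical dialog turns described by two nominal attributes.
turns = [
    {"first_word": "yes", "pause": "short"},
    {"first_word": "yes", "pause": "short"},
    {"first_word": "what", "pause": "long"},
    {"first_word": "what", "pause": "long"},
]
good_split = [turns[:2], turns[2:]]
bad_split = [[turns[0], turns[2]], [turns[1], turns[3]]]
```

A partition that groups similar turns scores higher than one that mixes them, which is the signal a tree builder of this kind would use when deciding where to place a new turn.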

COBWEB (2)









Learning from simultaneous events

Learn from structured data: conceptual graphs

Learn case descriptions from terminological descriptions

Subsumption = correlation criterion over structured data, e.g. subsumption of individuals to classes

Metrics for Measuring Domain Independence of Semantic Classes

Andrew Pargellis, Eric Fosler-Lussier,

Alexandros Potamianos, Chin-Hui Lee

Dialogue Systems Research Dept., Bell Labs, Lucent Technologies, Murray Hill, NJ, USA

Introduction







Employ semantic classes (concepts) from another domain

Need to identify domain-independent concepts based on comparison across domains

Domain-independent concepts should occur in similar syntactic (lexical) contexts across domains

Comparing concepts across domains



Concept-comparison method



Concept-projection method

Concept-comparison method







Find the similarity between all pairs of concepts across the two domains

Two concepts are similar if their respective bigram contexts are similar

Use left and right context bigram language models

Kullback-Leibler (KL) distance



Compare how "san francisco" and "newark" are used in the Travel domain with how "comedies" and "westerns" are used in the Movie domain



Distance between two concepts
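
The distance formula itself did not survive on this slide. As a rough sketch of the idea, the code below builds smoothed left-context and right-context word distributions for a concept in each domain and sums the symmetric KL divergence over both sides. The add-one smoothing, the shared vocabulary, the symmetrization, and all function names are assumptions of this sketch, not details taken from the paper.

```python
import math
from collections import Counter


def context_distribution(sentences, concept_words, side, vocab, alpha=1.0):
    """Smoothed distribution of words directly left or right of a concept."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok in concept_words:
                j = i - 1 if side == "left" else i + 1
                if 0 <= j < len(tokens):
                    counts[tokens[j]] += 1
    # Add-alpha smoothing keeps every probability non-zero so KL is finite.
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl(p, q):
    """KL divergence D(p || q) for two distributions over the same keys."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def concept_distance(sents_a, concept_a, sents_b, concept_b, vocab):
    """Symmetric KL distance, summed over left and right contexts."""
    total = 0.0
    for side in ("left", "right"):
        p = context_distribution(sents_a, concept_a, side, vocab)
        q = context_distribution(sents_b, concept_b, side, vocab)
        total += kl(p, q) + kl(q, p)
    return total


# Toy corpora; the sentences are invented for illustration.
travel = [["i", "want", "to", "fly", "to", "newark", "tomorrow"],
          ["a", "flight", "to", "boston", "tomorrow"]]
movie = [["i", "want", "to", "see", "comedies", "tonight"],
         ["show", "me", "westerns", "tonight"]]
city = {"newark", "boston"}
genre = {"comedies", "westerns"}
vocab = sorted({tok for s in travel + movie for tok in s})
d_city_genre = concept_distance(travel, city, movie, genre, vocab)
```

A small distance suggests the two concepts are used in similar lexical contexts, which is the property the slides associate with domain independence.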

Concept-projection method





How well a single concept from one domain is represented in another domain.

How the words "comedies" and "westerns" are used in both domains



Useful for identifying the degree of domain independence for a particular concept.

Result: Concept-comparison

Result: Concept-projection

Concept Example

Semi-Automatic Acquisition of Domain-Specific Semantic Structures

Siu K.C., Meng H.M.

Human-Computer Communications Laboratory

Department of Systems Engineering and Engineering Management

The Chinese University of Hong Kong

Grammar induction











Use unannotated corpora

Portable across domains and languages

Output grammar has reasonable coverage of within-domain data and rejects out-of-domain data

Amenable to interactive refinement by a human

Supports optional injection of prior knowledge

Spatial clustering









Use Kullback-Leibler distance over left and right contexts.

Consider words w1, w2 (later concepts c1, c2) pair-wise, restricted to words that have at least a pre-set minimum number of occurrences (set to 5).

Temporal clustering





Use Mutual Information (MI).

N-highest MI pairs are clustered (N=5 in experiment)





Do spatial clustering and temporal clustering iteratively

Post-processing by a human
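
The temporal-clustering step above can be sketched as ranking adjacent word pairs by mutual information and keeping the N best. The slide does not give the exact MI formula, so this sketch assumes pointwise mutual information weighted by the bigram probability; the function name `top_mi_pairs` and the toy corpus are illustrative only.

```python
import math
from collections import Counter


def top_mi_pairs(sentences, n=5):
    """Return the n adjacent word pairs with the highest weighted MI."""
    unigrams = Counter(tok for s in sentences for tok in s)
    bigrams = Counter(pair for s in sentences for pair in zip(s, s[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    def weighted_mi(pair):
        # P(x, y) * log(P(x, y) / (P(x) * P(y))), estimated from counts.
        p_xy = bigrams[pair] / n_bi
        p_x = unigrams[pair[0]] / n_uni
        p_y = unigrams[pair[1]] / n_uni
        return p_xy * math.log(p_xy / (p_x * p_y))

    return sorted(bigrams, key=weighted_mi, reverse=True)[:n]


corpus = [
    ["fly", "to", "new", "york"],
    ["train", "to", "new", "york"],
    ["fly", "to", "boston"],
]
best = top_mi_pairs(corpus, n=1)
```

In an iterative scheme, each selected pair would be replaced by a single token before the next round of spatial clustering, so that phrases such as "new york" behave like one word.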

Automatic Concept Identification in Goal-Oriented Conversations

Ananlada Chotimongkol and

Alexander I. Rudnicky

Language Technologies Institute

Carnegie Mellon University

Concept identification







First step towards the goal of automatically inferring domain ontologies

Goal-oriented human-human conversation has a clear structure

This structure can be used to automatically identify domain topics, e.g. dialog classification

Clustering algorithm







Hierarchical clustering

Mutual information based



Criterion=minimize the loss of average mutual information

Kullback-Leibler based



Criterion=word pair with minimum distance
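
As a hedged illustration of the minimum-distance criterion, the sketch below runs a generic bottom-up clustering loop that repeatedly merges the two closest clusters. The average-link rule for comparing multi-item clusters, the stopping condition on a target cluster count, and the name `agglomerative_merge` are assumptions of this sketch; any distance function (for example a KL-based one) can be passed in.

```python
def agglomerative_merge(items, distance, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[item] for item in items]

    def cluster_distance(a, b):
        # Average-link distance between clusters (an assumed linkage).
        return sum(distance(x, y) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > max(n_clusters, 1):
        # Find the pair of cluster indices with the minimum distance.
        i, j = min(
            ((i, j)
             for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy usage with numbers standing in for words and absolute
# difference standing in for a context-based distance.
result = agglomerative_merge([1, 2, 10, 12, 30], lambda x, y: abs(x - y), 3)
```

Recording the order of merges instead of stopping at a fixed count would yield the full hierarchy.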

Evaluation metrics













Reference concepts come from a class-based n-gram model

Cluster concept = the majority concept in the cluster

Precision

Recall

Singularity score (SS)

Quality score (QS)
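
A minimal sketch of the majority-concept labeling with per-cluster precision and recall follows. It assumes precision is the share of cluster words that belong to the majority concept and recall is the share of that concept's reference words captured by the cluster; the slides do not spell these definitions out, and the singularity and quality scores are left out because their formulas are not given here. The reference labels are invented for illustration.

```python
from collections import Counter


def score_cluster(cluster, reference):
    """Return (majority concept, precision, recall) for one word cluster.

    `reference` maps each word to its reference concept label.
    """
    labels = Counter(reference[w] for w in cluster if w in reference)
    if not labels:
        # No word in the cluster has a reference label.
        return None, 0.0, 0.0
    concept, hits = labels.most_common(1)[0]
    concept_size = sum(1 for label in reference.values() if label == concept)
    return concept, hits / len(cluster), hits / concept_size


reference = {
    "boston": "city", "newark": "city", "denver": "city",
    "monday": "day", "friday": "day",
}
concept, precision, recall = score_cluster(
    ["boston", "newark", "monday"], reference
)
```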
