Creating Concept Hierarchies in a Customer Self-Help System
Bob Wall
CS 535
04/29/05

Outline

- Introduction / motivation
- Background
- Algorithm
  - Feature selection / feature vector generation
  - Hierarchical agglomerative clustering (HAC)
  - Tree partitioning
- Results / conclusions

Introduction

- Application – customer self-help (FAQ) system
  - RightNow Technologies’ Customer Service module
- Need ways to organize Knowledge Base (KB)
  - System already organizes documents (answers) using clustering
  - Desirable to also organize user queries

Goals

- Create concept hierarchy from user queries
  - Domain-specific
  - Self-guided (no human intervention / guidance required)
- Present hierarchy to help guide users in navigating KB
  - Demonstrate the types of queries that can be answered by the system
  - Automatically augment searches with related terms

Background

- Problem – cluster short text segments
  - Queries contain inadequate information to provide context for clustering
  - Need some source of context
- Possible solution – use the Web as a source of information
  - Cilibrasi and Vitanyi proposed a mechanism to extract the meaning of words using Google searches
  - Chuang and Chien presented a more detailed algorithm for clustering short segments using the text snippets returned by a search engine

Algorithm

- Use each text segment as input query to search engine
- Process resulting text snippets using stemming, stop word lists to extract related terms (keywords)
- Select set of keywords, build feature vectors
- Cluster using Hierarchical Agglomerative Clustering (HAC)
- Compact tree using min-max partitioning

KB-Specific Version – HAC-KB

- Choose set of user queries, corresponding answers
- Find list of keywords corresponding to those answers
- Trim down list to reasonable length
- Generate feature vectors
- HAC clustering
- Min-max partitioning

Available Data

- Answers
  - Documents forming the KB – actually question and answer, plus keywords and other information like product and category associations
- Ans_phrases
  - Extracted from answers, using stop word lists and stemming
  - One-, two-, and three-word phrases
  - Counts of occurrences in different parts of answer
- Keyword_searches
  - List of user queries – also filtered by stop word lists and stemmed
  - List of answers matching query

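To make the data concrete, here is an illustrative sketch of the three sources as Python dataclasses. The field names are assumptions inferred from this slide, not the actual RightNow schema.

# Illustrative dataclasses for the three KB data sources above.
# All field names are assumptions, not the real schema.
from dataclasses import dataclass, field

@dataclass
class Answer:
    answer_id: int
    question: str
    answer: str
    keywords: list = field(default_factory=list)    # editor-assigned keywords
    products: list = field(default_factory=list)    # product associations
    categories: list = field(default_factory=list)  # category associations

@dataclass
class AnsPhrase:
    answer_id: int
    phrase: str                                 # stemmed 1-, 2-, or 3-word phrase
    counts: dict = field(default_factory=dict)  # occurrences per part of the answer

@dataclass
class KeywordSearch:
    query: str                                  # stop-word-filtered, stemmed user query
    answer_ids: list = field(default_factory=list)  # answers matching the query
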
Feature Selection

- Select N most frequent user queries
- Select set of all answers matching those queries
- Select set of all keywords found in those answers
- Reduce to list of K keywords
  - Avoid removing all keywords associated with a query (would generate empty feature vector)
  - Try to eliminate keywords that provide little discrimination (ones associated with many queries)
  - Also eliminate keywords that only map to a single query

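A minimal sketch of this pruning step under the constraints above; the query_keywords map and the “too common” threshold are illustrative assumptions, and the final reduction to exactly K keywords is left out.

# Sketch of keyword pruning. query_keywords maps each query to the set
# of keywords found in its matching answers; thresholds are invented.

def prune_keywords(query_keywords, max_fraction=0.5):
    # Invert the map: keyword -> set of queries it is associated with.
    kw_queries = {}
    for q, kws in query_keywords.items():
        for kw in kws:
            kw_queries.setdefault(kw, set()).add(q)

    kept = set(kw_queries)
    n_queries = len(query_keywords)
    for kw, qs in kw_queries.items():
        too_common = len(qs) > max_fraction * n_queries  # little discrimination
        too_rare = len(qs) == 1                          # maps to a single query
        if too_common or too_rare:
            # Drop the keyword only if every affected query would still
            # keep at least one keyword (no empty feature vectors).
            if all((query_keywords[q] & kept) - {kw} for q in qs):
                kept.discard(kw)
    return kept
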
Feature Vector Generation

- Generate map from queries to keywords, and inverse map from keywords to queries
- Use the TF-IDF (term frequency / inverse document frequency) metric for weighting:

    v_{i,j} = (1 + \log_2 tf_{i,j}) \cdot \log_2(N / n_j)

  - v_{i,j} is weight of jth keyword for ith query
  - tf_{i,j} is the number of times that keyword j occurred in list of answers associated with query i
  - n_j is number of queries associated with keyword j
- Now have an N x K feature matrix

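A sketch of this weighting in code, assuming the query-to-keyword maps from the previous step have already been built; numpy holds the matrix.

# Build the N x K feature matrix using
#   v[i,j] = (1 + log2(tf[i,j])) * log2(N / n[j])
import numpy as np

def build_feature_matrix(queries, keywords, tf, n_with_kw):
    # queries:      list of N query strings
    # keywords:     list of K keyword strings
    # tf[(q, k)]:   occurrences of keyword k in answers matching query q
    # n_with_kw[k]: n_j, the number of queries associated with keyword k
    N, K = len(queries), len(keywords)
    V = np.zeros((N, K))
    for i, q in enumerate(queries):
        for j, k in enumerate(keywords):
            f = tf.get((q, k), 0)
            if f > 0:
                V[i, j] = (1 + np.log2(f)) * np.log2(N / n_with_kw[k])
    return V
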
Standard HAC Algorithm

- Initialize clusters – one cluster per query
- Initialize similarity matrix
  - Using the average linkage similarity metric and cosine distance measure:

    sim_{AL}(C_i, C_j) = \frac{1}{|C_i| |C_j|} \sum_{v_a \in C_i} \sum_{v_b \in C_j} sim(v_a, v_b)

    sim(v_a, v_b) = \frac{\sum_{t_j \in T} v_{a,j} v_{b,j}}{\sqrt{\sum_{t_j \in T} v_{a,j}^2} \sqrt{\sum_{t_j \in T} v_{b,j}^2}}

  - Matrix is upper-triangular

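In code, the two measures might look like this, with a cluster represented as a list of row indices into the feature matrix V; since sim_AL is symmetric, only the upper triangle of the matrix needs to be stored.

# Cosine similarity between feature vectors, and average-linkage
# similarity between clusters (lists of row indices into V).
import numpy as np

def cosine_sim(va, vb):
    denom = np.linalg.norm(va) * np.linalg.norm(vb)
    return float(va @ vb) / denom if denom else 0.0

def sim_al(V, ci, cj):
    # Average of all pairwise cosine similarities across the two clusters.
    total = sum(cosine_sim(V[a], V[b]) for a in ci for b in cj)
    return total / (len(ci) * len(cj))
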
HAC (cont.)

- For N – 1 iterations:
  - Pick two root-node clusters with largest similarity
  - Combine into new root-node cluster
  - Add new cluster to similarity matrix – compute similarity with all other root-level clusters
- Generates tall binary tree of clusters
  - 2N – 1 nodes
  - Not particularly usable by humans

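A deliberately naive sketch of the merge loop: it recomputes pair similarities on every iteration instead of maintaining the upper-triangular matrix, so it is O(N^3) but easy to follow. sim_al is the function from the previous sketch.

# Naive HAC: repeatedly merge the two most-similar root clusters,
# recording the merges so the binary tree can be reconstructed.

def hac(V, sim_al):
    clusters = {i: [i] for i in range(len(V))}  # one leaf cluster per query
    merges = []                                 # (left_id, right_id, new_id)
    next_id = len(V)
    while len(clusters) > 1:
        ids = sorted(clusters)
        # Find the most-similar pair of current root clusters.
        a, b = max(((x, y) for i, x in enumerate(ids) for y in ids[i + 1:]),
                   key=lambda p: sim_al(V, clusters[p[0]], clusters[p[1]]))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, next_id))
        next_id += 1
    return merges  # N - 1 merges, 2N - 1 tree nodes in total
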
Min-Max Partitioning

- Need to combine nodes in cluster tree, produce a shallow, bushy multi-way tree
- Recursive partitioning algorithm – MinMaxPartition(cluster sub-tree):
  - For each possible cut level in tree, compute quality of cut
  - Choose best-quality cut level
  - For each subtree cut off, recursively process
  - Stop at max depth or max cluster size

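The recursion might be sketched as below. Node is a minimal dendrogram node, candidate cuts are taken level by level under the current root, and cut_quality stands in for the Q(C)/N(C) measure defined two slides down; the depth and size limits are invented defaults.

# Sketch of recursive min-max partitioning over a binary dendrogram.
from dataclasses import dataclass, field

@dataclass
class Node:
    members: list                # query indices in this cluster
    left: object = None          # binary-tree children (None for leaves)
    right: object = None
    children: list = field(default_factory=list)  # multi-way children

def cut_at_level(node, level):
    # Roots of the subtrees obtained by cutting `level` edges below `node`.
    if level == 0 or node.left is None:
        return [node]
    return cut_at_level(node.left, level - 1) + cut_at_level(node.right, level - 1)

def height(node):
    return 0 if node.left is None else 1 + max(height(node.left), height(node.right))

def min_max_partition(node, cut_quality, max_depth=4, max_size=5, depth=0):
    if depth >= max_depth or len(node.members) <= max_size or node.left is None:
        return node
    # Evaluate every candidate cut level; smaller Q(C)/N(C) is better.
    cuts = [cut_at_level(node, l) for l in range(1, height(node) + 1)]
    best = min(cuts, key=cut_quality)
    node.children = [min_max_partition(c, cut_quality, max_depth, max_size, depth + 1)
                     for c in best]
    return node
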
Cut Levels in Tree

[Figure: candidate cut levels in the binary cluster tree]

Choosing Best Cut

- Goal is to maximize intra-cluster similarity, minimize inter-cluster similarity
- Quality = Q(C) / N(C)
- Cluster set quality (smaller is better):

    Q(C) = \frac{1}{|C|} \sum_{C_i \in C} \frac{sim_{AL}(C_i, \bar{C}_i)}{sim_{AL}(C_i, C_i)}, \quad \bar{C}_i = \bigcup_{k \neq i} C_k

- Cluster size preference (gamma distribution):

    N(C) = f(|C|), \quad f(x) = \frac{1}{(\alpha - 1)! \, \theta^{\alpha}} x^{\alpha - 1} e^{-x/\theta}

Issues / Further Work

- Resolve issues with data / implementation
- Outstanding problem – generating meaningful labels for clusters in hierarchy
- Means of measuring performance
- Incorporate other KB data, like relevance scores of search results, products/categories
- Better feature selection
- Fuzzy clustering – query can belong to multiple clusters (Frigui & Nasraoui)

References

- S.-L. Chuang and L.-F. Chien, “Towards Automatic Generation of Query Taxonomy: A Hierarchical Query Clustering Approach,” Proceedings of ICDM’02, Maebashi City, Japan, Dec. 9–12, 2002, pp. 75–82.
- S.-L. Chuang and L.-F. Chien, “A Practical Web-based Approach to Generating Topic Hierarchy for Text Segments,” Proceedings of CIKM’04, Washington, DC, Nov. 2004, pp. 127–136.
- R. Cilibrasi and P. Vitanyi, “Automatic Meaning Discovery Using Google,” published on the Web, available at http://arxiv.org/abs/cs/0412098.
- H. Frigui and O. Nasraoui, “Simultaneous Clustering and Dynamic Keyword Weighting for Text Documents,” in Survey of Text Mining: Clustering, Classification, and Retrieval, Michael W. Berry, ed., Springer-Verlag, New York, 2004, pp. 45–72.