Clustering
Bamshad Mobasher
DePaul University
What is Clustering in Data Mining?
Clustering is a process of partitioning a set of data (or objects) into a set of meaningful sub-classes, called clusters
Helps users understand the natural grouping or structure in a data set
 Cluster:
 a collection of data objects that are “similar” to one another and thus can be treated collectively as one group
 but as a collection, they are sufficiently different from other groups
 Clustering:
 unsupervised classification
 no predefined classes
Applications of Cluster Analysis
 Data reduction
Summarization: Preprocessing for regression, PCA,
classification, and association analysis
Compression: Image processing: vector quantization
 Hypothesis generation and testing
 Prediction based on groups
Cluster & find characteristics/patterns for each group
 Finding K-nearest Neighbors
Localizing search to one or a small number of clusters
 Outlier detection: Outliers are often viewed as those “far
away” from any cluster
Basic Steps to Develop a Clustering Task
 Feature selection / Preprocessing
 Select information relevant to the task of interest
 Minimal information redundancy
 May need to do normalization/standardization
 Distance/Similarity measure
 Similarity of two feature vectors
 Clustering criterion
 Expressed via a cost function or some rules
 Clustering algorithms
 Choice of algorithms
 Validation of the results
 Interpretation of the results with applications
Quality: What Is Good Clustering?
 A good clustering method will produce high quality clusters
high intra-class similarity: cohesive within clusters
low inter-class similarity: distinctive between clusters
 The quality of a clustering method depends on
the similarity measure used by the method
its implementation, and
its ability to discover some or all of the hidden patterns
Measure the Quality of Clustering
 Distance/Similarity metric
 Similarity is expressed in terms of a distance function, typically a metric: d(i, j)
 The definitions of distance functions are usually rather different for interval-scaled, Boolean, categorical, ordinal, ratio, and vector variables
 Weights should be associated with different variables based on
applications and data semantics
 Quality of clustering:
 There is usually a separate “quality” function that measures the
“goodness” of a cluster.
 It is hard to define “similar enough” or “good enough”

The answer is typically highly subjective
Distance or Similarity Measures
 Common Distance Measures:
 Manhattan distance: $dist(X, Y) = \sum_i |x_i - y_i|$
 Euclidean distance: $dist(X, Y) = \sqrt{\sum_i (x_i - y_i)^2}$
 Cosine similarity: $sim(X, Y) = \frac{\sum_i (x_i \cdot y_i)}{\sqrt{\sum_i x_i^2} \cdot \sqrt{\sum_i y_i^2}}$, with the corresponding distance $dist(X, Y) = 1 - sim(X, Y)$
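As an illustration (not from the original slides), here is a minimal NumPy sketch of the three measures above; the sample vectors are arbitrary term-frequency vectors.

```python
# A minimal sketch (assuming NumPy) of the three measures above.
import numpy as np

def manhattan(x, y):
    # sum of absolute coordinate differences
    return np.sum(np.abs(x - y))

def euclidean(x, y):
    # square root of the sum of squared coordinate differences
    return np.sqrt(np.sum((x - y) ** 2))

def cosine_sim(x, y):
    # dot product normalized by the two vector lengths
    return np.dot(x, y) / (np.sqrt(np.sum(x ** 2)) * np.sqrt(np.sum(y ** 2)))

x = np.array([0, 3, 3, 0, 2])
y = np.array([4, 1, 0, 1, 2])
print(manhattan(x, y), euclidean(x, y), 1 - cosine_sim(x, y))  # cosine distance = 1 - similarity
```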
More Similarity Measures
 In vector-space model many similarity measures can be used
in clustering
Simple Matching
Cosine Coefficient
Dice’s Coefficient
Jaccard’s Coefficient
Distance (Similarity) Matrix
 Similarity (Distance) Matrix
 based on the distance or similarity measure we can construct a symmetric
matrix of distance (or similarity values)
 (i, j) entry in the matrix is the distance (similarity) between items i and j
Note that d_ij = d_ji (i.e., the matrix is symmetric), so we only need the lower triangle of the matrix.
The diagonal entries are all 1’s (similarity) or all 0’s (distance).
$d_{ij}$ = similarity (or distance) of $D_i$ to $D_j$
Example: Term Similarities in Documents
 Suppose we want to cluster terms that appear in a collection of
documents with different frequencies
Each term can be viewed as a vector of term frequencies (weights):

        Doc1  Doc2  Doc3  Doc4  Doc5
  T1      0     3     3     0     2
  T2      4     1     0     1     2
  T3      0     4     0     0     2
  T4      0     3     0     3     3
  T5      0     1     3     0     1
  T6      2     2     0     0     4
  T7      1     0     3     2     0
  T8      3     1     0     0     2

 We need to compute a term-term similarity matrix
 For simplicity we use the dot product as the similarity measure (note that this is the non-normalized version of cosine similarity):

  $sim(T_i, T_j) = \sum_{k=1}^{N} (w_{ik} \times w_{jk})$

  where N = total number of dimensions (in this case, documents) and $w_{ik}$ = weight of term i in document k.

 Example:
  sim(T1, T2) = <0,3,3,0,2> · <4,1,0,1,2> = 0×4 + 3×1 + 3×0 + 0×1 + 2×2 = 7
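This computation is easy to reproduce; the following is a small NumPy sketch (an illustration, not part of the original slides) in which W holds the term-frequency table above, so the full term-term similarity matrix is simply W·Wᵀ.

```python
# A sketch (assuming NumPy) of the dot-product term-term similarity computation.
import numpy as np

# Rows are terms T1..T8, columns are documents Doc1..Doc5 (the table above).
W = np.array([
    [0, 3, 3, 0, 2],   # T1
    [4, 1, 0, 1, 2],   # T2
    [0, 4, 0, 0, 2],   # T3
    [0, 3, 0, 3, 3],   # T4
    [0, 1, 3, 0, 1],   # T5
    [2, 2, 0, 0, 4],   # T6
    [1, 0, 3, 2, 0],   # T7
    [3, 1, 0, 0, 2],   # T8
])

sim = W @ W.T          # sim[i, j] = sum_k W[i, k] * W[j, k]
print(sim[0, 1])       # similarity of T1 and T2 -> 7
```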
Example: Term Similarities in Documents
Applying the dot-product similarity $sim(T_i, T_j) = \sum_{k=1}^{N} (w_{ik} \times w_{jk})$ to the term-frequency table on the previous slide gives the following term-term similarity matrix:

        T2   T3   T4   T5   T6   T7   T8
  T1     7   16   15   14   14    9    7
  T2          8   12    3   18    6   17
  T3              18    6   16    0    8
  T4                    6   18    6    9
  T5                         6    9    3
  T6                              2   16
  T7                                   3
Similarity (Distance) Thresholds
 A similarity (distance) threshold may be used to mark pairs that are
“sufficiently” similar
Using a threshold value of 10 on the term-term similarity matrix of the previous example, we obtain the following binary matrix (1 = similarity ≥ 10):

        T2   T3   T4   T5   T6   T7   T8
  T1     0    1    1    1    1    0    0
  T2          0    1    0    1    0    1
  T3               1    0    1    0    0
  T4                    0    1    0    0
  T5                         0    0    0
  T6                              0    1
  T7                                   0
Graph Representation
 The similarity matrix can be visualized as an undirected graph
 each item is represented by a node, and edges represent the fact that two items
are similar (a one in the similarity threshold matrix)
Threshold (adjacency) matrix, lower triangle:

        T1   T2   T3   T4   T5   T6   T7
  T2     0
  T3     1    0
  T4     1    1    1
  T5     1    0    0    0
  T6     1    1    1    1    0
  T7     0    0    0    0    0    0
  T8     0    1    0    0    0    1    0

[Figure: the corresponding undirected graph on nodes T1–T8; T7 is an isolated node]

If no threshold is used, then the matrix can be represented as a weighted graph.
Connectivity-Based Clustering Algorithms
 If we are interested only in threshold (and not the degree of similarity
or distance), we can use the graph directly for clustering
 Clique Method (complete link)
 all items within a cluster must be within the similarity threshold of all other
items in that cluster
 clusters may overlap
 generally produces small but very tight clusters
 Single Link Method
 any item in a cluster must be within the similarity threshold of at least one
other item in that cluster
 produces larger but weaker clusters
 Other methods
 star method - start with an item and place all related items in that cluster
 string method - start with an item; place one related item in that cluster; then place another item related to the last item entered, and so on
Simple Clustering Algorithms
 Clique Method
 a clique is a completely connected subgraph of a graph
 in the clique method, each maximal clique in the graph becomes a cluster
Maximal cliques (and therefore the clusters) in the previous example are:
  {T1, T3, T4, T6}
  {T2, T4, T6}
  {T2, T6, T8}
  {T1, T5}
  {T7}
Note that, for example, {T1, T3, T4} is also a clique, but is not maximal.

[Figure: the threshold graph on T1–T8 with the maximal cliques outlined]
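As a rough sketch of how the clique method could be automated, the following uses the networkx library (an assumption; the slides do not prescribe a tool) to enumerate the maximal cliques of the threshold graph above.

```python
# A sketch of the clique method using networkx (an assumption; any graph library works).
import networkx as nx

# Edges = term pairs whose similarity is >= the threshold of 10 (from the graph above).
edges = [("T1", "T3"), ("T1", "T4"), ("T1", "T5"), ("T1", "T6"),
         ("T2", "T4"), ("T2", "T6"), ("T2", "T8"),
         ("T3", "T4"), ("T3", "T6"), ("T4", "T6"), ("T6", "T8")]

G = nx.Graph(edges)
G.add_node("T7")                      # T7 has no edges above the threshold

clusters = list(nx.find_cliques(G))   # enumerates the maximal cliques
print(clusters)                       # the cliques listed above, in some order
```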
Simple Clustering Algorithms
 Single Link Method
 1. select an item not yet in a cluster and place it in a new cluster
 2. place all other items similar to it in that cluster
 3. repeat step 2 for each item in the cluster until nothing more can be added
 4. repeat steps 1-3 for each item that remains unclustered
In this case the single link method produces only two clusters:
  {T1, T3, T4, T5, T6, T2, T8}
  {T7}
Note that the single link method does not allow overlapping clusters, thus partitioning the set of items.

[Figure: the threshold graph on T1–T8; each connected component forms one cluster]
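For the single link method on the threshold graph, the clusters are exactly the connected components of the graph; a minimal sketch using networkx (again an assumption) is shown below.

```python
# A sketch of the single link (threshold) method: clusters are the connected
# components of the threshold graph. Uses networkx as above (an assumption).
import networkx as nx

edges = [("T1", "T3"), ("T1", "T4"), ("T1", "T5"), ("T1", "T6"),
         ("T2", "T4"), ("T2", "T6"), ("T2", "T8"),
         ("T3", "T4"), ("T3", "T6"), ("T4", "T6"), ("T6", "T8")]

G = nx.Graph(edges)
G.add_node("T7")

clusters = [set(c) for c in nx.connected_components(G)]
print(clusters)   # one large component and the singleton {T7}
```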
Major Clustering Approaches
 Partitioning approach:
 Construct various partitions and then evaluate them by some criterion, e.g.,
minimizing the sum of square errors
 Typical methods: k-means, k-medoids, CLARANS
 Hierarchical approach:
 Create a hierarchical decomposition of the set of data (or objects) using
some criterion
 Typical methods: Diana, Agnes, BIRCH, CAMELEON
 Density-based approach:
 Based on connectivity and density functions
 Typical methods: DBSCAN, OPTICS, DenClue
 Grid-based approach:
 based on a multiple-level granularity structure
 Typical methods: STING, WaveCluster, CLIQUE
Major Clustering Approaches (Cont.)
 Model-based:
 A model is hypothesized for each of the clusters, and the goal is to find the best fit of the data to each model
 Typical methods: EM, SOM, COBWEB
 Frequent pattern-based:
 Based on the analysis of frequent patterns
 Typical methods: p-Cluster
 User-guided or constraint-based:
 Clustering by considering user-specified or application-specific constraints
 Typical methods: COD (obstacles), constrained clustering
 Link-based clustering:
 Objects are often linked together in various ways
 Massive links can be used to cluster objects: SimRank, LinkClus
Partitioning Approaches
 The notion of comparing item similarities can be extended to
clusters themselves, by focusing on a representative vector
for each cluster
 cluster representatives can be actual items in the cluster or other “virtual”
representatives such as the centroid
 this methodology reduces the number of similarity computations in
clustering
 clusters are revised successively until a stopping condition is satisfied, or
until no more changes to clusters can be made
 Partitioning Methods
 reallocation method - start with an initial assignment of items to clusters
and then move items from cluster to cluster to obtain an improved
partitioning
 Single pass method - simple and efficient, but produces large clusters, and
depends on order in which items are processed
The K-Means Clustering Method
 Given k, the k-means algorithm is implemented in four steps:
 1. Partition objects into k nonempty subsets
 2. Compute seed points as the centroids of the clusters of the current partitioning (the centroid is the center, i.e., mean point, of the cluster)
 3. Assign each object to the cluster with the nearest seed point
 4. Go back to Step 2; stop when the assignment does not change
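A minimal NumPy sketch of these four steps is given below (an illustration only; it uses Euclidean distance, a random initial assignment, and assumes no cluster becomes empty).

```python
# A minimal k-means sketch following the four steps above (NumPy, Euclidean distance).
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: random initial assignment of objects to k clusters
    labels = rng.integers(0, k, size=len(X))
    for _ in range(n_iter):
        # Step 2: centroids of the current partition (assumes no cluster is empty)
        centroids = np.array([X[labels == c].mean(axis=0) for c in range(k)])
        # Step 3: assign each object to the nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 4: stop when the assignment no longer changes
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, centroids

labels, centers = kmeans(np.random.rand(20, 2), k=3)   # toy usage example
```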
K-Means Algorithm
 The basic algorithm (based on reallocation method):
1. Select K initial clusters by (possibly) random assignment of some items to clusters and compute each
of the cluster centroids.
2. Compute the similarity of each item xi to each cluster centroid and (re-)assign each item to the cluster
whose centroid is most similar to xi.
3. Re-compute the cluster centroids based on the new assignments.
4. Repeat steps 2 and 3 until there is no change in clusters from one iteration to the next.
Example: Clustering Documents
Initial (arbitrary) assignment: C1 = {D1,D2}, C2 = {D3,D4}, C3 = {D5,D6}

Document-term matrix and cluster centroids:

        D1   D2   D3   D4   D5   D6   D7   D8   |   C1    C2    C3
  T1     0    4    0    0    0    2    1    3   |  4/2   0/2   2/2
  T2     3    1    4    3    1    2    0    1   |  4/2   7/2   3/2
  T3     3    0    0    0    3    0    3    0   |  3/2   0/2   3/2
  T4     0    1    0    3    0    0    2    0   |  1/2   3/2   0/2
  T5     2    2    2    3    1    4    0    2   |  4/2   5/2   5/2
Example: K-Means
Now compute the similarity (or distance) of each document to each cluster, resulting in a cluster-document similarity matrix (here we use the dot product as the similarity measure).

        C1     C2     C3
  D1   29/2   31/2   28/2
  D2   29/2   20/2   21/2
  D3   24/2   38/2   22/2
  D4   27/2   45/2   24/2
  D5   17/2   12/2   17/2
  D6   32/2   34/2   30/2
  D7   15/2    6/2   11/2
  D8   24/2   17/2   19/2

For each document, reallocate the document to the cluster to which it has the highest similarity (the largest value in its row of the above table). After the reallocation we have the following new clusters. Note that the previously unassigned D7 and D8 have now been assigned, and that D1 and D6 have been reallocated from their original assignment.

C1 = {D2,D7,D8}, C2 = {D1,D3,D4,D6}, C3 = {D5}

This is the end of the first iteration (i.e., the first reallocation). Next, we repeat the process for another reallocation…
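The reallocation step above can be reproduced with a short NumPy sketch (an illustration, not part of the original slides); D is the document-term table from the previous slide, and sim is the cluster-document similarity matrix.

```python
# A sketch (NumPy) reproducing the centroid and similarity computations above.
import numpy as np

# Rows are terms T1..T5, columns are documents D1..D8 (table on the previous slide).
D = np.array([
    [0, 4, 0, 0, 0, 2, 1, 3],   # T1
    [3, 1, 4, 3, 1, 2, 0, 1],   # T2
    [3, 0, 0, 0, 3, 0, 3, 0],   # T3
    [0, 1, 0, 3, 0, 0, 2, 0],   # T4
    [2, 2, 2, 3, 1, 4, 0, 2],   # T5
])

clusters = {0: [0, 1], 1: [2, 3], 2: [4, 5]}          # C1={D1,D2}, C2={D3,D4}, C3={D5,D6}
centroids = np.array([D[:, idx].mean(axis=1) for idx in clusters.values()])

sim = D.T @ centroids.T      # sim[d, c] = dot product of document d with centroid c
labels = sim.argmax(axis=1)  # reallocate each document to its most similar cluster
# note: ties (e.g. D5, which is equally similar to C1 and C3) are broken by argmax
# in favor of the lower cluster index, so they may differ from the slide's choice
print(np.round(sim, 2), labels)
```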
Example: K-Means
Now compute new cluster centroids using the original document-term matrix:

C1 = {D2,D7,D8}, C2 = {D1,D3,D4,D6}, C3 = {D5}

        D1   D2   D3   D4   D5   D6   D7   D8   |   C1     C2     C3
  T1     0    4    0    0    0    2    1    3   |  8/3    2/4    0/1
  T2     3    1    4    3    1    2    0    1   |  2/3   12/4    1/1
  T3     3    0    0    0    3    0    3    0   |  3/3    3/4    3/1
  T4     0    1    0    3    0    0    2    0   |  3/3    3/4    0/1
  T5     2    2    2    3    1    4    0    2   |  4/3   11/4    1/1

This leads to a new cluster-document similarity matrix similar to the previous slide. Again, the documents are reallocated to the clusters with which they have the highest similarity.

        C1      C2      C3
  D1    7.67   16.75   14.00
  D2   15.01   11.25    3.00
  D3    5.34   17.50    6.00
  D4    9.00   19.50    6.00
  D5    5.00    8.00   11.00
  D6   12.00    6.68    9.34
  D7    7.67    4.25    9.00
  D8   11.34   10.00    3.00

New assignment: C1 = {D2,D6,D8}, C2 = {D1,D3,D4}, C3 = {D5,D7}

Note: This process is now repeated with the new clusters. However, the next iteration in this example will show no change to the clusters, thus terminating the algorithm.
K-Means Algorithm
 Strength of the k-means:
 Relatively efficient: O(tkn), where n is # of objects, k is # of
clusters, and t is # of iterations. Normally, k, t << n
 Often terminates at a local optimum
 Weakness of the k-means:
 Applicable only when mean is defined; what about categorical
data?
 Need to specify k, the number of clusters, in advance
 Unable to handle noisy data and outliers
 Variations of K-Means usually differ in:
 Selection of the initial k means
 Dissimilarity calculations
 Strategies to calculate cluster means
Single Pass Method
 The basic algorithm:
1. Assign the first item T1 as the representative for cluster C1
2. For each subsequent item Ti, calculate its similarity S with the centroid of each existing cluster
3. If the maximum similarity Smax is greater than a threshold value, add the item to the corresponding cluster and recalculate the centroid; otherwise use the item to initiate a new cluster
4. If any item remains unclustered, go to step 2
See: Example of Single Pass Clustering Technique
 This algorithm is simple and efficient, but has some problems
 generally does not produce optimum clusters
 order dependent - using a different order of processing items will result in a
different clustering
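A minimal sketch of the procedure above (NumPy, dot-product similarity; the data layout and threshold handling are assumptions, not part of the slides):

```python
# A sketch of the single pass (leader) algorithm described above.
import numpy as np

def single_pass(items, threshold):
    # items: 2D NumPy array, one row per item; threshold: minimum similarity to join a cluster
    clusters, centroids = [], []
    for i, x in enumerate(items):
        sims = [np.dot(x, c) for c in centroids]
        if sims and max(sims) > threshold:
            best = int(np.argmax(sims))
            clusters[best].append(i)                              # join the most similar cluster
            centroids[best] = items[clusters[best]].mean(axis=0)  # recalculate that centroid
        else:
            clusters.append([i])                                  # start a new cluster
            centroids.append(x.astype(float))
    return clusters
```

Because the loop visits each item once and greedily commits it, a different ordering of the input rows can produce a different clustering, which is exactly the order dependence noted above.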
Hierarchical Clustering Algorithms
• Two main types of hierarchical clustering
– Agglomerative:
• Start with the points as individual clusters
• At each step, merge the closest pair of clusters until only one cluster (or k
clusters) left
– Divisive:
• Start with one, all-inclusive cluster
• At each step, split a cluster until each cluster contains a point (or there are k
clusters)
• Traditional hierarchical algorithms use a similarity or
distance matrix
– Merge or split one cluster at a time
Hierarchical Clustering Algorithms
 Use dist / sim matrix as clustering criteria
 does not require the no. of clusters as input, but needs a termination condition
[Figure: agglomerative clustering merges a and b into {a,b}, c and d into {c,d}, then {c,d} and e into {c,d,e}, and finally {a,b} and {c,d,e} into {a,b,c,d,e} (steps 0–4); divisive clustering proceeds in the reverse direction (steps 4–0)]
Hierarchical Agglomerative Clustering
 Hierarchical Agglomerative Methods
 starts with individual items as clusters
 then successively combine smaller clusters to form larger ones
 combining clusters requires a method to determine similarity or distance
between two existing clusters
 Some commonly used HACM methods for combining clusters
 Single Link: at each step join most similar pairs of objects that are not yet in the
same cluster
 Complete Link: use least similar pair between each cluster pair to determine
inter-cluster similarity - all items within one cluster are linked to each other
within a similarity threshold
 Group Average (Mean): use average value of pairwise links within a cluster to
determine inter-cluster similarity (i.e., all objects contribute to inter-cluster
similarity)
 Ward’s method: at each step join cluster pair whose merger minimizes the
increase in total within-group error sum of squares (based on distance between
centroids) - also called the minimum variance method
Hierarchical Agglomerative Clustering
 Basic procedure
 1. Place each of N items into a cluster of its own.
 2. Compute all pairwise item-item similarity coefficients
Total of N(N-1)/2 coefficients
 3. Form a new cluster by combining the most similar pair of
current clusters i and j
(use one of the methods described in the previous slide, e.g., complete
link, group average, Ward’s, etc.);
update similarity matrix by deleting the rows and columns
corresponding to i and j;
calculate the entries in the row corresponding to the new cluster i+j.
 4. Repeat step 3 while the number of clusters left is greater than 1.
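In practice this procedure is usually delegated to a library; the following sketch uses SciPy's hierarchical clustering (an assumption, not part of the slides), where the method argument ('single', 'complete', 'average', 'ward') selects among the merge criteria listed on the previous slide.

```python
# A sketch using SciPy's agglomerative clustering on toy data.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.rand(10, 2)                         # 10 sample points in 2D
Z = linkage(X, method="ward")                     # the full merge history (N-1 merges)
labels = fcluster(Z, t=3, criterion="maxclust")   # cut the tree into 3 clusters
print(labels)
```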
Hierarchical Agglomerative Clustering
:: Example
[Figure: six points (1–6) grouped into nested clusters, with the corresponding dendrogram whose merge heights lie between roughly 0.05 and 0.4]

Input / Initial setting
 Start with clusters of individual points and a distance/similarity matrix
[Figure: points p1, p2, p3, …, each in its own cluster, with the pairwise distance/similarity matrix indexed by p1, p2, p3, p4, p5, …]
p12
Intermediate State
 After some merging steps, we have some clusters
[Figure: five current clusters C1–C5 and their pairwise distance/similarity matrix]
Intermediate State
 Merge the two closest clusters (C2 and C5) and update the
distance matrix
[Figure: the distance/similarity matrix over C1–C5, with the two closest clusters C2 and C5 highlighted for merging]
After Merging
 “How do we update the distance matrix?”
[Figure: the updated matrix over C1, C2+C5, C3, C4; the entries in the row and column of the merged cluster C2+C5 are marked “?”]
Distance between two clusters
 Single-link distance between clusters Ci and Cj is the minimum distance between any object in Ci and any object in Cj
 The distance is defined by the two most similar objects:

  $D_{sl}(C_i, C_j) = \min_{x, y} \{\, d(x, y) \mid x \in C_i, y \in C_j \,\}$

Example similarity matrix:

        I1     I2     I3     I4     I5
  I1   1.00   0.90   0.10   0.65   0.20
  I2   0.90   1.00   0.70   0.60   0.50
  I3   0.10   0.70   1.00   0.40   0.30
  I4   0.65   0.60   0.40   1.00   0.80
  I5   0.20   0.50   0.30   0.80   1.00

[Dendrogram: single-link clustering of items 1–5]
Distance between two clusters
 Complete-link distance between clusters Ci and Cj is the maximum distance between any object in Ci and any object in Cj
 The distance is defined by the two least similar objects:

  $D_{cl}(C_i, C_j) = \max_{x, y} \{\, d(x, y) \mid x \in C_i, y \in C_j \,\}$

(using the same example similarity matrix as above)

[Dendrogram: complete-link clustering of items 1–5]
Distance between two clusters
 Group average distance between clusters Ci and Cj is the average distance between objects in Ci and objects in Cj
 The distance is defined by the average of all pairwise distances:

  $D_{avg}(C_i, C_j) = \frac{1}{|C_i| \cdot |C_j|} \sum_{x \in C_i,\, y \in C_j} d(x, y)$

(using the same example similarity matrix as above)

[Dendrogram: group-average clustering of items 1–5]
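The three inter-cluster distances just defined can be computed directly from the pairwise point distances; a small NumPy sketch (assuming Euclidean point-to-point distance and arbitrary sample clusters) follows.

```python
# A sketch computing the three inter-cluster distances defined above.
import numpy as np

def pairwise(Ci, Cj):
    # matrix of Euclidean distances between every point in Ci and every point in Cj
    return np.linalg.norm(Ci[:, None, :] - Cj[None, :, :], axis=2)

def single_link(Ci, Cj):   return pairwise(Ci, Cj).min()    # two closest objects
def complete_link(Ci, Cj): return pairwise(Ci, Cj).max()    # two farthest objects
def group_average(Ci, Cj): return pairwise(Ci, Cj).mean()   # average over all pairs

Ci = np.array([[0.0, 0.0], [1.0, 0.0]])   # illustrative sample clusters
Cj = np.array([[0.0, 3.0], [4.0, 3.0]])
print(single_link(Ci, Cj), complete_link(Ci, Cj), group_average(Ci, Cj))
```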
Strengths of single-link clustering
[Figure: original points and the two clusters found by single link]
• Can handle non-elliptical shapes
Limitations of single-link clustering
[Figure: original points and the two clusters found by single link]
• Sensitive to noise and outliers
• It produces long, elongated clusters
Strengths of complete-link clustering
[Figure: original points and the two clusters found by complete link]
• More balanced clusters (with equal diameter)
• Less susceptible to noise
Limitations of complete-link clustering
[Figure: original points and the two clusters found by complete link]
• Tends to break large clusters
• All clusters tend to have the same diameter
• Small clusters are merged with larger ones
Average-link clustering
 Compromise between Single and Complete Link
 Strengths
 Less susceptible to noise and outliers
 Limitations
 Biased towards globular clusters
Clustering Application:
Collaborative Filtering
 Discovering Aggregate Profiles
 Goal: to capture “user segments” based on their common behavior or interests
 Method: Cluster user transactions to obtain user segments automatically, then
represent each cluster by its centroid
 Aggregate profiles are obtained from each centroid after sorting by weight and filtering
out low-weight items in each centroid
 Profiles are represented as weighted collections of items (pages, products, etc.)
 weights represent the significance of item within each cluster
 profiles are overlapping, so they capture common interests among different
groups/types of users (e.g., customer segments)
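A rough sketch of this profile-generation step (NumPy; the function name, 0.5 filtering threshold, and data layout are illustrative assumptions, not from the slides):

```python
# A sketch of aggregate profile generation from cluster centroids.
import numpy as np

def aggregate_profiles(sessions, labels, items, threshold=0.5):
    # sessions: 2D NumPy array (users/sessions x items), labels: cluster label per row,
    # items: list of item names, threshold: minimum weight kept in a profile
    profiles = []
    for c in np.unique(labels):
        centroid = sessions[labels == c].mean(axis=0)   # mean weight of each item in cluster c
        kept = [(items[i], round(float(w), 2))
                for i, w in enumerate(centroid) if w >= threshold]   # filter low-weight items
        profiles.append(sorted(kept, key=lambda kv: -kv[1]))         # sort by weight, highest first
    return profiles
```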
Aggregate Profiles - An Example
Original session/user data:

           A   B   C   D   E   F
  user0    1   1   0   0   0   1
  user1    0   0   1   1   0   0
  user2    1   0   0   1   1   0
  user3    1   1   0   0   0   1
  user4    0   0   1   1   0   0
  user5    1   0   0   1   1   0
  user6    1   1   0   0   0   1
  user7    0   0   1   1   0   0
  user8    1   0   1   1   1   0
  user9    0   1   1   0   0   1

Result of clustering (users reordered by cluster):

                     A   B   C   D   E   F
  Cluster 0  user1   0   0   1   1   0   0
             user4   0   0   1   1   0   0
             user7   0   0   1   1   0   0
  Cluster 1  user0   1   1   0   0   0   1
             user3   1   1   0   0   0   1
             user6   1   1   0   0   0   1
             user9   0   1   1   0   0   1
  Cluster 2  user2   1   0   0   1   1   0
             user5   1   0   0   1   1   0
             user8   1   0   1   1   1   0

Aggregate profiles obtained from the cluster centroids (sorted by weight, low-weight items filtered out):

PROFILE 0 (Cluster Size = 3)
  1.00  C
  1.00  D

PROFILE 1 (Cluster Size = 4)
  1.00  B
  1.00  F
  0.75  A
  0.25  C

PROFILE 2 (Cluster Size = 3)
  1.00  A
  1.00  D
  1.00  E
  0.33  C

Given an active session A → B, the best matching profile is Profile 1. This may result in a recommendation for item F since it appears with high weight in that profile.
Web Usage Mining: clustering example
 Transaction Clusters:
 Clustering similar user transactions and using centroid of each cluster as an
aggregate usage profile (representative for a user segment)
Sample cluster centroid from a dept. Web site (cluster size = 330):

  Support   URL                                                         Pageview Description
  1.00      /courses/syllabus.asp?course=45096-303&q=3&y=2002&id=290   SE 450 Object-Oriented Development class syllabus
  0.97      /people/facultyinfo.asp?id=290                             Web page of the lecturer who taught the above course
  0.88      /programs/                                                 Current Degree Descriptions 2002
  0.85      /programs/courses.asp?depcode=96&deptmne=se&courseid=450   SE 450 course description in the SE program
  0.82      /programs/2002/gradds2002.asp                              M.S. in Distributed Systems program description
Clustering Application:
Discovery of Content Profiles
 Content Profiles
 Goal: automatically group together documents which partially deal with similar
concepts
 Method:
 identify concepts by clustering features (keywords) based on their common
occurrences among documents (can also be done using association discovery or
correlation analysis)
 cluster centroids represent docs in which features in the cluster appear frequently
 Content profiles are derived from centroids after filtering out low-weight docs
in each centroid
 Note that each content profile is represented as a collection of item-weight pairs (similar to usage profiles)
 however, the weight of an item in a profile represents the degree to which features in the corresponding cluster appear in that item.
Content Profiles – An Example
Filtering threshold = 0.5

PROFILE 0 (Cluster Size = 3)
  1.00  C.html  (web, data, mining)
  1.00  D.html  (web, data, mining)
  0.67  B.html  (data, mining)

PROFILE 1 (Cluster Size = 4)
  1.00  B.html  (business, intelligence, marketing, ecommerce)
  1.00  F.html  (business, intelligence, marketing, ecommerce)
  0.75  A.html  (business, intelligence, marketing)
  0.50  C.html  (marketing, ecommerce)
  0.50  E.html  (intelligence, marketing)

PROFILE 2 (Cluster Size = 3)
  1.00  A.html  (search, information, retrieval)
  1.00  E.html  (search, information, retrieval)
  0.67  C.html  (information, retrieval)
  0.67  D.html  (information, retrieval)
User Segments Based on Content
 Essentially combines usage and content profiling techniques
discussed earlier
 Basic Idea:
 for each user/session, extract important features of the selected
documents/items
 based on the global dictionary create a user-feature matrix
 each row is a feature vector representing significant terms associated with
documents/items selected by the user in a given session
 weight can be determined as before (e.g., using tf.idf measure)
 next, cluster users/sessions using features as dimensions
 Profile generation:
 from the user clusters we can now generate overlapping collections of features
based on cluster centroids
the weights associated with the features in each profile represent the significance of that feature for the corresponding group of users.
User transaction matrix UT:

           A.html  B.html  C.html  D.html  E.html
  user1       1       0       1       0       1
  user2       1       1       0       0       1
  user3       0       1       1       1       0
  user4       1       0       1       1       1
  user5       1       1       0       0       1
  user6       1       0       1       1       1

Feature-Document matrix FP:

                 A.html  B.html  C.html  D.html  E.html
  web               0       0       1       1       1
  data              0       1       1       1       0
  mining            0       1       1       1       0
  business          1       1       0       0       0
  intelligence      1       1       0       0       1
  marketing         1       1       0       0       1
  ecommerce         0       1       1       0       0
  search            1       0       1       0       0
  information       1       0       1       1       1
  retrieval         1       0       1       1       1
Content Enhanced Transactions
User-Feature Matrix UF. Note that UF = UT × FP^T

           web  data  mining  business  intelligence  marketing  ecommerce  search  information  retrieval
  user1     2    1      1        1           2            2          1         2         3           3
  user2     1    1      1        2           3            3          1         1         2           2
  user3     2    3      3        1           1            1          2         1         2           2
  user4     3    2      2        1           2            2          1         2         4           4
  user5     1    1      1        2           3            3          1         1         2           2
  user6     3    2      2        1           2            2          1         2         4           4
Example: users 4 and 6 are more interested in concepts related to Web
information retrieval, while user 3 is more interested in data mining.
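Since UF = UT × FP^T is an ordinary matrix product, the table above can be reproduced with a few lines of NumPy (an illustration; the two matrices are the ones shown on the previous slide).

```python
# A sketch of the content-enhanced transaction computation UF = UT x FP^T.
import numpy as np

UT = np.array([[1, 0, 1, 0, 1],    # user1 over A.html .. E.html
               [1, 1, 0, 0, 1],    # user2
               [0, 1, 1, 1, 0],    # user3
               [1, 0, 1, 1, 1],    # user4
               [1, 1, 0, 0, 1],    # user5
               [1, 0, 1, 1, 1]])   # user6

FP = np.array([[0, 0, 1, 1, 1],    # web
               [0, 1, 1, 1, 0],    # data
               [0, 1, 1, 1, 0],    # mining
               [1, 1, 0, 0, 0],    # business
               [1, 1, 0, 0, 1],    # intelligence
               [1, 1, 0, 0, 1],    # marketing
               [0, 1, 1, 0, 0],    # ecommerce
               [1, 0, 1, 0, 0],    # search
               [1, 0, 1, 1, 1],    # information
               [1, 0, 1, 1, 1]])   # retrieval

UF = UT @ FP.T     # rows: users, columns: features
print(UF)
```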
Clustering and Collaborative Filtering
:: clustering based on ratings: movielens
Clustering and Collaborative Filtering
:: tag clustering example
Hierarchical Clustering
:: example – clustered search results
Can drill down within clusters to view sub-topics or to view the relevant subset of results