
5 Techniques to Identify Clusters In Your Data – MeasuringU

Jeff Sauro, PhD
May 31, 2017
Understanding who your users are and
what they think about an experience is an essential step for measuring and
improving the user experience.
Part of understanding your users is understanding how they are similar and different
with respect to demographics, psychographics, and behaviors. These groupings are
often called clusters or segments to refer to the shared characteristics within each
group.
Clusters play an important role in both marketing and product decisions and they
don’t only apply to people. You can use them to organize content on websites,
features in software, or items in a questionnaire.
As with many areas of data science and statistics, there are different approaches
for uncovering clusters. The process involves examining observed and latent
(hidden) variables to identify the similarities and number of distinct groups. Here are
five ways to identify segments.
1. Cross-Tab
Cross-tabbing is the process of examining more than one variable in the same table
or chart (“crossing” them). It allows you to see to what extent groups differ on
variables. For example, the graph below shows which activities participants reported
doing online, crossed by device type (laptop/smartphone). We can see some
activities that cluster on smartphones (e.g. taking photos/videos with the camera,
downloading apps, and listening to music) versus laptops.
You can also cross-tab using more than two variables and then create a visualization
to better see the clusters. For example, the graph below shows three variables that
describe clusters with smartphone usage: task importance, task frequency, and the
stage in which the activities were performed.
Cross-tabbing is most commonly done using observed variables (like in the examples
above) to illustrate similarities and differences between groups. You can also
cross-tab using created and latent variables, but you first need to create a way to
represent hidden variables using one of the subsequent methods.
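As a rough illustration of cross-tabbing, here is a minimal Python sketch that counts how often each activity is reported on each device. The responses, device names, and activity labels are made up for the example and are not the data behind the graphs.

```python
from collections import Counter

# Hypothetical survey responses: one (device, activity) pair per reported activity.
responses = [
    ("smartphone", "take photos"),
    ("smartphone", "take photos"),
    ("smartphone", "listen to music"),
    ("smartphone", "read news"),
    ("laptop", "read news"),
    ("laptop", "read news"),
    ("laptop", "listen to music"),
    ("laptop", "online banking"),
]


def cross_tab(pairs):
    """Count how often each (row value, column value) combination occurs."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Counter returns 0 for combinations that never occur.
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}


table = cross_tab(responses)
for device, row in table.items():
    print(device, row)
```

Cells with large differences between rows (here, "take photos") are the ones that suggest a cluster. With a real dataset you would usually convert the counts to percentages within each device before comparing them.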
2. Cluster Analysis
Cluster analysis groups related items together using different algorithms to identify
the “clusters.” These clusters are latent variables, meaning they aren’t directly
measured but instead are inferred from the relationship items have with each other.
Cluster analysis is the approach used in card sorting when you want to know how
closely products, content, or functions relate from the users’ perspective.
For example, the graph below—a dendrogram—shows a visualization of the
similarities (from a similarity matrix) in ratings participants provided with respect to
smartphone usage. It revealed six clusters and how participants grouped items
together (e.g. looking at user reviews and detailed specifications on consumer
electronics products).
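To make the idea concrete, here is a small sketch of one common algorithm, average-linkage hierarchical clustering, run on a similarity matrix. It is not necessarily the algorithm behind the dendrogram described above. Only the first two item names come from the example; the other items and all similarity values are hypothetical.

```python
from itertools import combinations

items = ["user reviews", "detailed specs", "price comparison",
         "store hours", "store location"]

# Hypothetical similarity matrix: share of participants who grouped two items together.
similarity = [
    [1.0, 0.9, 0.6, 0.1, 0.1],
    [0.9, 1.0, 0.5, 0.1, 0.2],
    [0.6, 0.5, 1.0, 0.2, 0.1],
    [0.1, 0.1, 0.2, 1.0, 0.8],
    [0.1, 0.2, 0.1, 0.8, 1.0],
]


def average_distance(a, b):
    """Average-linkage distance between clusters a and b (lists of item indexes)."""
    return sum(1.0 - similarity[i][j] for i in a for j in b) / (len(a) * len(b))


def agglomerate(n_items, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(n_items)]
    merges = []  # (cluster_a, cluster_b, distance): the steps a dendrogram draws
    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_distance(*pair))
        merges.append((a, b, average_distance(a, b)))
        clusters = [c for c in clusters if c is not a and c is not b]
        clusters.append(sorted(a + b))
    return clusters, merges


clusters, merges = agglomerate(len(items), n_clusters=2)
for cluster in clusters:
    print([items[i] for i in cluster])
```

The recorded merge distances are what a dendrogram plots on its height axis. Choosing where to cut the tree (the `n_clusters` argument here) is the judgment call about how many clusters to report.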

3. Factor Analysis
Factor analysis is a staple of quantitative research and has a history dating back to
some of the earliest research into measuring intelligence. In an exploratory factor
analysis (EFA), a researcher looks to identify underlying latent groups of variables,
called factors, by using software to examine the intercorrelations between many
variables. The researcher then increases or decreases the number of factors and
variables in an iterative fashion to identify both the number of factors and which
variables “load” on each factor.
Factor analysis is often used in questionnaire development to identify underlying
constructs from many items participants respond to. When we created the SUPR-Q
we used a factor analysis that identified four factors. We later confirmed the four
factors in a new dataset using confirmatory factor analysis (CFA).
Factor analysis works best with continuous or ordinal data. The table below shows
the typical output for an exploratory factor analysis, which shows items that group or
“load” together.
For example, the items “It is easy to navigate within the website” and
“The website is easy to use” both have high loadings on the first factor. Based on
which items load together, the researcher names the factor accordingly. For these
two items we named it the Usability factor, because both items addressed the
underlying concept of website usability (ease and findability).
Item                                                                    Usability   Trust   Loyalty   Appearance
It is easy to navigate within the website.                                   0.88    0.01      0.00         0.00
The website is easy to use.                                                  0.87    0.01      0.02        -0.01
I am able to find what I need quickly on the website.                        0.58    0.09      0.12         0.11
I feel comfortable purchasing from the website.                              0.02    0.86     -0.07         0.01
I feel confident conducting business on the website.                         0.06    0.84      0.05        -0.04
I can count on the information I get on the website.                        -0.01    0.35      0.31         0.20
I will likely return to the website in the future.                           0.05   -0.01      0.78        -0.03
How likely are you to recommend the website to a friend or colleague?        0.04   -0.02      0.77         0.04
The website keeps the promises it makes to me.                              -0.01    0.35      0.39         0.16
I find the website to be attractive.                                        -0.01    0.01      0.03         0.76
The website has a clean and simple presentation.                             0.34   -0.01     -0.03         0.56

Extraction Sums of Squared Loadings                                          5.90    0.91      0.44         0.20
% of Variance                                                               53.67    8.25      3.97         1.80
Cumulative %                                                                53.67   61.92     65.88        67.68
Rotation Sums of Squared Loadings                                            4.82    3.89      4.54         4.62
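The table can be read programmatically in the same way a researcher reads it by eye. The sketch below assigns each item to the factor on which it loads most strongly and recomputes the variance rows. It makes three assumptions that the article does not state: the 0.40 cutoff for flagging weak loadings, that each item's text belongs to the row of loadings in the order shown, and that the analysis used the correlation matrix of the 11 items, so that total variance equals the number of items.

```python
factors = ["Usability", "Trust", "Loyalty", "Appearance"]

# Loadings as reported in the table above, one row per questionnaire item.
loadings = {
    "It is easy to navigate within the website.": [0.88, 0.01, 0.00, 0.00],
    "The website is easy to use.": [0.87, 0.01, 0.02, -0.01],
    "I am able to find what I need quickly on the website.": [0.58, 0.09, 0.12, 0.11],
    "I feel comfortable purchasing from the website.": [0.02, 0.86, -0.07, 0.01],
    "I feel confident conducting business on the website.": [0.06, 0.84, 0.05, -0.04],
    "I can count on the information I get on the website.": [-0.01, 0.35, 0.31, 0.20],
    "I will likely return to the website in the future.": [0.05, -0.01, 0.78, -0.03],
    "How likely are you to recommend the website to a friend or colleague?": [0.04, -0.02, 0.77, 0.04],
    "The website keeps the promises it makes to me.": [-0.01, 0.35, 0.39, 0.16],
    "I find the website to be attractive.": [-0.01, 0.01, 0.03, 0.76],
    "The website has a clean and simple presentation.": [0.34, -0.01, -0.03, 0.56],
}

CUTOFF = 0.40  # assumed rule of thumb for a salient loading, not from the article


def strongest_factor(row):
    """Return (factor name, loading) for the largest absolute loading in a row."""
    k = max(range(len(row)), key=lambda i: abs(row[i]))
    return factors[k], row[k]


assignment = {item: strongest_factor(row) for item, row in loadings.items()}
weak_items = [item for item, (_, value) in assignment.items() if abs(value) < CUTOFF]

# Percent of variance per factor = sum of squared loadings / number of items * 100
# (valid under the correlation-matrix assumption stated above).
extraction_ss = [5.90, 0.91, 0.44, 0.20]
n_items = len(loadings)
pct_variance = [100 * ss / n_items for ss in extraction_ss]
cumulative = [sum(pct_variance[: i + 1]) for i in range(len(pct_variance))]

for item, (factor, value) in assignment.items():
    flag = " (weak)" if item in weak_items else ""
    print(f"{factor:10s} {value:5.2f}  {item}{flag}")
print("% of variance:", [round(p, 2) for p in pct_variance])
print("cumulative %: ", [round(c, 2) for c in cumulative])
```

Because the sums of squared loadings in the table are rounded to two decimals, the recomputed percentages agree with the reported ones only to within about a tenth of a point. Under the assumed cutoff, two items load weakly on two factors at once, which is the kind of pattern that prompts a researcher to revisit the item or the number of factors.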
4. Latent Class Analysis (LCA)
Latent class analysis is another method that identifies latent variables to segment
customers, content, and ideas. We use it as part of our process for creating a
customer segmentation analysis and the process of making personas more scientific.
An LCA can handle both nominal and ordinal data well. The process is iterative, as a
researcher has software identify the correlations between responses to uncover
segments. These segments are called classes and are analogous to the factors in a
factor analysis. The researcher often starts with four or five classes and adjusts the
number of classes and variables, retaining the variables that differentiate groups.
The number of classes the researcher settles on balances two things: the best
statistical fit and a solution that matches the theories of what differentiates the
segments. For example, in an LCA we conducted as part of a segmentation on home
buying and selling, we suspected variables like whether participants had kids or were
budget conscious would differentiate the respondents. After the iterative process of
adding and removing variables we identified four classes using twelve variables as
shown in the graph below.
We then used the four classes as new variables in a cross-tab, as shown in the graph
below, which crosses class membership and gender (showing disproportionally more
women in Classes C and D).
Classes, like factors, can be named based on the dominance of certain variables.
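For readers who want to see what the software is doing, here is a minimal sketch of a two-class latent class model for yes/no items, fitted with the EM algorithm. It is an illustration under simplifying assumptions, not the procedure used in the home buying study: the ten respondents are invented, the items are binary, the third variable name is hypothetical, and the starting values are arbitrary.

```python
import math

# Hypothetical yes/no answers; columns: has_kids, budget_conscious, first_time_buyer.
data = [
    (1, 1, 1), (1, 1, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1),
    (0, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0),
]


def fit_lca(data, class_sizes, item_probs, n_iter=50):
    """Fit a latent class model with binary items using a fixed number of EM steps.

    class_sizes[c]   = P(respondent belongs to class c)
    item_probs[c][j] = P(item j is answered "yes" | class c)
    """
    n_classes, n_items = len(class_sizes), len(data[0])
    log_likelihoods = []
    for _ in range(n_iter):
        # E-step: posterior class probabilities for every respondent.
        posteriors, log_lik = [], 0.0
        for row in data:
            joint = []
            for size, probs in zip(class_sizes, item_probs):
                p = size
                for answer, theta in zip(row, probs):
                    p *= theta if answer == 1 else 1.0 - theta
                joint.append(p)
            total = sum(joint)
            log_lik += math.log(total)
            posteriors.append([p / total for p in joint])
        log_likelihoods.append(log_lik)
        # M-step: re-estimate class sizes and item probabilities from the posteriors.
        weights = [sum(post[c] for post in posteriors) for c in range(n_classes)]
        class_sizes = [w / len(data) for w in weights]
        item_probs = [
            [sum(post[c] * row[j] for post, row in zip(posteriors, data)) / weights[c]
             for j in range(n_items)]
            for c in range(n_classes)
        ]
    return class_sizes, item_probs, posteriors, log_likelihoods


class_sizes, item_probs, posteriors, log_liks = fit_lca(
    data,
    class_sizes=[0.5, 0.5],                          # arbitrary starting values
    item_probs=[[0.6, 0.6, 0.6], [0.4, 0.4, 0.4]],   # arbitrary starting values
)
print("class sizes:", [round(s, 2) for s in class_sizes])
for c, probs in enumerate(item_probs):
    print(f"class {c}:", [round(p, 2) for p in probs])
```

The item probabilities within each class are what you inspect to name the classes, and each respondent's posterior probabilities give the class membership you can carry into a cross-tab. To choose the number of classes, you would refit the model with different class counts and compare fit statistics such as BIC alongside how interpretable each solution is.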
5. Multidimensional Scaling (MDS)
Multidimensional scaling is another technique related to cluster analysis and latent
class analysis that groups items or responses into latent variables. It’s often used to
transform participants’ judgments or preferences for products or experiences into
distances in multidimensional space. Participants typically rate the same products or
websites and an MDS provides a visualization of the similarities in two or three
dimensions.

We recently did an analysis on how participants differentiate retail websites on a
number of dimensions, including price, variety, quality, and website ease. The graph
below shows how participants in the study perceive Amazon and Best Buy as similar
on price for consumer electronics compared to Walmart, HSN, and hhgregg.
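The sketch below shows one way to turn a dissimilarity matrix into two-dimensional coordinates using classical (Torgerson) MDS, with a simple power iteration in place of a linear algebra library. The website names come from the example above, but the dissimilarity values are hypothetical and are not the study's data. Statistical packages offer other MDS variants (for example, non-metric MDS) that may be more appropriate for rating data.

```python
import math

sites = ["Amazon", "Best Buy", "Walmart", "HSN"]

# Hypothetical dissimilarities between sites (0 = perceived as identical).
dissimilarity = [
    [0.00, 1.00, 4.00, 4.12],
    [1.00, 0.00, 4.12, 4.00],
    [4.00, 4.12, 0.00, 1.00],
    [4.12, 4.00, 1.00, 0.00],
]


def classical_mds(dist, n_dims=2, n_iter=200):
    """Classical MDS: find coordinates whose distances approximate dist."""
    n = len(dist)
    # Double-center the squared distances to get an inner-product matrix B.
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * n_dims for _ in range(n)]
    for k in range(n_dims):
        # Power iteration for the dominant eigenvector of B.
        # The start vector must not be constant, because B maps constants to zero.
        start = [float(i + 1) for i in range(n)]
        length = math.sqrt(sum(x * x for x in start))
        v = [x / length for x in start]
        for _ in range(n_iter):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract
                break
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigenvalue = sum(x * y for x, y in zip(v, bv))
        scale = math.sqrt(max(eigenvalue, 0.0))  # negative eigenvalues are dropped
        for i in range(n):
            coords[i][k] = scale * v[i]
            # Deflate B so the next pass finds the next eigenvector.
            for j in range(n):
                b[i][j] -= eigenvalue * v[i] * v[j]
    return coords


coords = classical_mds(dissimilarity)
for site, (x, y) in zip(sites, coords):
    print(f"{site:9s} x={x:6.2f} y={y:6.2f}")
```

Only the distances between points are meaningful. The axes can be flipped or rotated without changing the solution, so naming the dimensions is an interpretive step, much like naming factors or classes.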
Summary and Recommendations
Identifying clusters plays an important role in both marketing and product decisions.
Here are some things to consider when identifying clusters.
1. Clustering techniques generally require larger sample sizes. Statistical
techniques like factor analysis and LCA generally need a minimum of 100
responses (and ideally a lot more) for the algorithms to provide stable clusters.
You can use cross-tabbing on any sample size but the limits of small sample
sizes still apply.
2. You need specialized software. Most of the advanced clustering techniques require
software like R, SPSS, or Minitab.
3. You need someone with good statistical knowledge. In addition to software,
you’ll need access to someone with advanced statistical knowledge who can
interpret the statistical output and make recommendations on the
right number of clusters (this is when you contact MeasuringU).
4. Data types matter. The type of data you collect (e.g. categorical data versus
rating-scale data) will help you determine the best clustering technique to use.
So don’t wait until you’ve collected your data to consult the stats person; you’ll
want their input on how to provide response options before you collect data.
5. You can use combinations of methods. You can combine clustering techniques
(especially cross-tabbing) with other techniques to help answer your research
questions.