Questionnaires
Judy Kay
Main references:
Sauro, J., & Lewis, J. R. (2012). Quantifying the user experience: Practical statistics
for user research. Elsevier. Ch 8
http://www.measuringu.com/blog/standardized-usability.php
http://www.usability.gov/ - Improving the User Experience
Brooke, J. (2013). SUS: a retrospective. Journal of Usability Studies, 8(2), 29-40.
Landscape of methods to understand users
Ask people what they believe: interviews, surveys, focus groups
Many benefits:
• User involvement, commitment, buy-in
• Access to what users know, think, perspectives, contexts that matter
Many limitations:
• Human memory
• Politeness
• Self-awareness
Observations of what people actually do
Many benefits:
• Insights into actual behaviour
• And contexts
• And stakeholder perspectives
Many limitations:
• Can you be there at the right time?
• Observer effect
• Intrusiveness
Questionnaires
• Types we consider
– Post study, Post task, Web sites
– Usability, hedonic
• Can be easy to administer
– Closed
– Open
• Difficult to design them well
• Standard Questionnaires
– SUS
– SEQ
– SUPR-Q
– AttrakDiff
https://webdesignviews.com/measuring-usability-with-system-usability-scale-sus/
SUS (on task completion)
SUS: A Quick and Dirty Usability Scale by John Brooke
• 3,000 Scholar citations
• Determining What Individual SUS Scores Mean: Adding an Adjective Rating Scale by Bangor, Kortum, and Miller (2009)
Goals of SUS
from Brooke, J. (2013). SUS: a retrospective. Journal of Usability Studies, 8(2), 29-40.
• “Effectiveness? … a measure of the user’s subjective
view of the usability of the system…[not] diagnostic
information… it is possible to get reliable results with a
sample of 8-12 users. … applicable over a wide range of
systems and types of technology … body of normative
data … produces similar results to other, more
extensive attitude [scales] … good ability to discriminate and
identify systems with good and poor usability”
• Efficiency? … “quick.” … all SUS required you to do was
to check one box per question for ten questions.
• Satisfying?
How was SUS created?
And why does that matter?
• Assembled a pool of 50 potential questionnaire statements.
• Two systems at opposite extremes:
– {a linguistic tool for end users, a system for systems programmers}
– {“really easy to use”, “almost impossible to use, even for highly technically
skilled users”}
• Twenty people, in roles from secretary to systems programmer
– Rated 50 potential questionnaire statements
– 5-point scale .. “strongly agree” to “strongly disagree.”
• Final items selected:
– Strong inter-correlations between all of the items selected. (Why does that
matter?)
– The total number of items had to be limited
• Chose 10
– intercorrelations between all 10 were r = ±0.7 to ±0.9
– five of each polarity: i.e. for a good system, five items attract strong
agreement and five strong disagreement.
• … “historically, when SUS was first constructed, it was a generally good
practice to alternate positive and negative items; if I was starting out
again, I might do things differently.”
The questions
Please check the box that reflects your immediate response to each
statement. Don’t think too long about each statement. Make sure you
respond to every statement. If you don’t know how to respond, simply
check box “3.” *
1. I think that I would like to use this system frequently.
2. I found the system unnecessarily complex.
3. I thought the system was easy to use.
4. I think that I would need the support of a technical person to be
able to use this system.
5. I found the various functions in this system were well integrated.
6. I thought there was too much inconsistency in this system.
7. I would imagine that most people would learn to use this system
very quickly.
8. I found the system very cumbersome (*awkward) to use.
9. I felt very confident using the system.
10. I needed to learn a lot of things before I could get going with this
system.
* “System” may be replaced by “product” or the product’s name.
Likert scales, semantic differential
• From SUS:
• Please check the box that reflects your immediate
response to each statement. Don’t think too long
about each statement. Make sure you respond to
every statement. If you don’t know how to
respond, simply check box “3.”
http://www.measuringu.com/sus.php
Bangor, A., Kortum, P., & Miller, J. (2009). Determining what individual SUS scores mean:
Adding an adjective rating scale. Journal of usability studies, 4(3), 114-123.
Class activity
Very vague and unscientific, but to get a feel for the nature of the beast
Suppose you had just been involved in a study using CUSP for a set of
10 tasks, such as:
Determine the deadlines for COMP5047 assignments
Find out whether COMP5427 has a final exam or not
…. Basic stuff that is actually in CUSP and is its core functionality
Use SUS for your experience to-date of CUSP
Scoring
Getting the SUS score (/100)
1. Convert each item’s response into a score contribution from 0 to 4:
1. For items 1, 3, 5, 7 and 9, the score contribution is the scale
position minus 1.
2. For items 2, 4, 6, 8 and 10, the contribution is 5 minus the
scale position.
2. Multiply the sum of the score contributions by 2.5 (see the sketch below).
Empirically – overall 68 is “average”
SUS: A Quick and Dirty Usability Scale by John Brooke
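A minimal sketch of this scoring procedure in Python (the function name and example data are mine; the arithmetic follows the steps above):

```python
def sus_score(responses):
    """SUS score (0-100) from ten scale positions (1-5), in item order."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten responses, each from 1 to 5")
    total = 0
    for i, r in enumerate(responses, start=1):
        if i % 2 == 1:
            total += r - 1      # items 1,3,5,7,9: scale position minus 1
        else:
            total += 5 - r      # items 2,4,6,8,10: 5 minus scale position
    return total * 2.5

# e.g. "agree" on all positive items, "disagree" on all negative ones:
print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))  # 75.0
```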
More subtlety in scoring
• … 3,500 surveys within 273 studies.
• ... to evaluate a wide range of interfaces that include Web sites, cell
phones, IVR, GUI, hardware, and TV user interfaces.
• … participants performed a representative sample of tasks for the
product (usually in formative usability tests) and then, before any
discussion with the moderator, completed the survey.
Determining What Individual SUS Scores Mean: Adding an Adjective Rating Scale by Bangor, Kortum, and Miller (2009)
Mean ~ 70 over long term. Median score of 70.5.
18% of surveys < 50
http://www.measuringu.com/sus.php
From usability.gov
• The scoring system is somewhat complex
• There is a temptation, since the scores are on a
scale of 0-100, to interpret them as percentages;
they are not percentages
• The best way to interpret your results involves
“normalizing” the scores to produce a percentile
ranking, as in the sketch below
• SUS is not diagnostic – its use is in classifying the
ease of use of the site, application or
environment being tested
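A hedged sketch of that normalization: assuming the roughly normal distribution of SUS scores from Sauro’s published norms (mean ≈ 68, SD ≈ 12.5 – treat both as assumptions), a raw score maps to a percentile via the normal CDF:

```python
from statistics import NormalDist

def sus_percentile(score, mean=68.0, sd=12.5):
    """Approximate percentile rank of a raw SUS score against normative data.
    mean and sd are assumed values from published norms, not exact."""
    return NormalDist(mu=mean, sigma=sd).cdf(score) * 100

print(round(sus_percentile(80)))  # ~83: better than ~83% of systems
```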
More detail in SUS pragmatics
http://www.measuringu.com/products/SUSpack
• Score SUS properly, accounting for errors,
omissions, inconsistencies tested by internal
reliability test (Cronbach’s alpha; sketched below)
• Percentile Ranks and Grades A+ .. F: Compare
against 500 studies by application type
• Compares SUS Scores Statistically
• Computes Sample Sizes, based on desired
margin of error
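Two of these calculations can be sketched directly. Both functions below are illustrative assumptions, not the SUSpack’s actual code: Cronbach’s alpha for internal reliability, and a rough sample-size estimate for a desired margin of error (normal approximation, with an assumed SD of 12.5):

```python
import math
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_respondents x n_items) matrix of item scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1).sum()
    total_variance = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

def sample_size_for_margin(sd=12.5, margin=5.0, z=1.96):
    """Rough n for a desired margin of error on a mean SUS score (normal approx.)."""
    return math.ceil((z * sd / margin) ** 2)

print(sample_size_for_margin(margin=5.0))  # ~25 participants for +/- 5 points
```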
Post-Study Usability
Questionnaire (PSSUQ)
Another free standard questionnaire
16-item survey of satisfaction … sub-scales:
• System Quality (average of Q 1-6)
• Information Quality (average of Q 7-12)
• Interface Quality (average of Q 13-16)
The PSSUQ is highly reliable (.94) and is entirely free.
From: http://chaione.com/ux-research-standardizing-usability-questionnaires/
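A minimal sketch of the sub-scale arithmetic described above (function name and encoding are my assumptions; PSSUQ items use a 7-point scale where lower is better):

```python
def pssuq_subscales(responses):
    """PSSUQ sub-scales from 16 responses on a 7-point scale (lower = better)."""
    assert len(responses) == 16
    mean = lambda xs: sum(xs) / len(xs)
    return {
        "overall": mean(responses),
        "system_quality": mean(responses[0:6]),        # Q1-6
        "information_quality": mean(responses[6:12]),  # Q7-12
        "interface_quality": mean(responses[12:16]),   # Q13-16
    }
```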
Carefully review these questions and
discuss how they compare with SUS
What are the SUPR-Q Questions?
http://www.suprq.com/
Usability
This website is easy to use.
I am able to find what I need quickly on this website.
I enjoy using the website.
It is easy to navigate within the website.
Credibility (Trust, Value & Comfort)
I feel comfortable purchasing from this website.
This website keeps the promises it makes to me.
I can count on the information I get on this website.
I feel confident conducting business with this website.
The information on this website is valuable.
Loyalty
How likely are you to recommend this website to a friend or colleague?
I will likely visit this website in the future.
Appearance
I found the website to be attractive.
The website has a clean and simple presentation.
http://www.suprq.com/
• 13 questions to give a normalized score.
• scoring system is based on the relative ranking from a
database of over 100 websites and thousands of users.
• … percentile ranks for each factor
• … developed through a process of psychometric validation.
• Creation:
– Over 75 candidate questions were tested
– with thousands of users
– across hundreds of websites
– to arrive at the 13 questions:
• reliable, sensitive and valid picture of the attitudes which shape the
success of a website.
How do you score the responses?
– SUPR-Q score = sum of responses for the first 12 questions + 1/2 the score for the Likelihood
to recommend question (giving scores from 12 to 65; see the sketch after this list).
– Compare to the industry benchmarks.
How is the SUPR score generated?
– score is a percentile (hence percentile rank), i.e. a score of 75% means better than 75% of all
websites tested in the database.
– Ranking done for all of the four factors + by each of the 13 questions.
– This can also be done within a specific industry. It is common to have a range of rankings for
each of the four subscales.
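A one-function sketch of the raw score formula above (the name is mine; the percentile step needs the commercial benchmark database, so only the raw 12..65 score is shown):

```python
def suprq_raw_score(scale_items, likelihood_to_recommend):
    """Raw SUPR-Q score: twelve 5-point items plus half of the
    0-10 likelihood-to-recommend item, giving a 12..65 range."""
    assert len(scale_items) == 12 and all(1 <= s <= 5 for s in scale_items)
    assert 0 <= likelihood_to_recommend <= 10
    return sum(scale_items) + likelihood_to_recommend / 2
```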
How do SUPR and SUS compare?
• SUS has 10 questions, highly reliable, generic (ie not just web)
• Assesses usability, not trust, credibility and loyalty.
• The SUPR-Q usability factor has a strong correlation with the SUS score, r = .96, p <
.001, i.e. just four questions account for about 92 percent of the variation in SUS (.96
squared).
• Both are post-study
• SUS is free and comparative scoring is freely available; SUPR-Q costs
Likert Scales
About closed rating scales
• Given a statement, indicate agreement or other closed response
• Semantic differential
– Set of discrete choices, e.g.
• 1 .. 5
• 1 10 20 30 40 50 60 70 80 90 100
• -2 .. +2
What is the benefit of each?
– Anchors, e.g. “Strongly disagree”
• Set of options for responses
– E.g. Rate your knowledge of Heuristic Evaluation
• Never heard of it
• Have heard of it but do not understand it
• Think I may know what it means
• Fairly confident I do know what it means
• Know this really well and could explain it to a friend
How many points?
• Finstad, K. (2010). Response interpolation and scale
sensitivity: Evidence against 5-point scales. Journal of
Usability Studies, 5(3), 104-110.
• Two versions of the SUS were used in usability testing
of a workforce management application.
– Usual 5-point, 7-point version
• Each participant did a series of tasks involving the
respective enterprise application being tested.
• Participants read each survey item out loud and verbally
indicated their response.
• Results:
– 11 people interpolated for 5-point, nil for 7-point
– 77 people did not interpolate for 5-point, 84 for 7-point
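Those counts invite a quick significance check. A back-of-envelope sketch (my own, not Finstad’s analysis): Fisher’s exact test on the 2x2 table of counts above:

```python
from scipy.stats import fisher_exact

# rows: 5-point, 7-point; columns: interpolated, did not interpolate
table = [[11, 77],
         [0, 84]]
odds_ratio, p_value = fisher_exact(table)
print(p_value)  # well below 0.05, so the difference is unlikely to be chance
```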
The conclusion?
Design decisions
• The questions to ask
– Challenges of biasing the user
– Asking all questions in terms of a positive view
• eg I found the LearnWell site easy to use
– Or all in terms of a negative view
• eg I found the LearnWell site hard to use
• User choices:
– Note we previously used a different approach: words to describe
each case (as in the learning scale for the concept inventory:
“never heard of it” … “could explain it to a friend”)
– Just give end points
• Even or odd?
• What makes sense to ask users?
Why standard questionnaires
• Reliability: how consistent responses are to the questions. How would
you assess that?
• Validity: correlates with other evidence, e.g. completion rates.
• Sensitivity: discriminates between good interfaces and bad
interfaces. The ability to detect differences at even small sample sizes
(<10) was one of the major reasons why the System Usability Scale (SUS)
and the Single Ease Question (SEQ) are recommended.
• Objectivity: gives an independent instrument.
• Quantification: fine grain of reporting and statistical analysis.
• Economy: cheap, since the design work has already been done!
• Comparability + scientific generalisation
Adapting questionnaires
• You can modify an existing questionnaire
– Choosing a subset of questions
– Changing the wording in some questions
– Adding questions to address specific areas of
concern
– Using different scale values
• Warning: Modifying a questionnaire can
damage its validity
SEQ
Single Ease Question
Sauro, J., & Dumas, J. S. (2009, April). Comparison of
three one-question, post-task usability questionnaires.
In Proceedings of the SIGCHI Conference on Human
Factors in Computing Systems (pp. 1599-1608). ACM.
SEQ
http://www.measuringu.com/blog/single-question.php
AttrakDiff
A very different sort of more recent
questionnaire
Hedonic quality: AttrakDiff
Marc Hassenzahl
http://attrakdiff.de/sience-en.html
• “The model separates the four essential
aspects:
– The product quality intended by the designer.
– The subjective perception of quality and
subjective evaluation of quality [by the user].
– The independent pragmatic and hedonic qualities.
– Behavioural and emotional consequences.”
http://translate.google.com.au/translate?hl=en&sl=de&u=http://www.qu.tu-berlin.de/menue/forschung/laufende_projekte/joyofuse/joy_of_use/joy_of_use/measurement_methods/attrakdiff/&prev=/search%3Fq%3D
Summary
• Questionnaires – what is their role in the
whole process?
• When is it appropriate to use a
questionnaire?
• When is it not?
• Why standard questionnaires?
• What do you need to know about altering them?
• How you should use them for Assignment 2.
Now to the project
The inspiration
SAL
Simple Ambient Loggers
(small enough to do a modest
prototype and solid usability testing)
Example of a logger to compare and
contrast
• https://www.dropbox.com/s/luqq3dfyqy6gvm0/Ubicomp2015-demo.mp4?dl=0
Overview of some loggers
Role for your project
Critique
Inspiration
Grading