Questionnaires
Judy Kay

Main references:
Sauro, J., & Lewis, J. R. (2012). Quantifying the user experience: Practical statistics for user research. Elsevier. Ch 8.
http://www.measuringu.com/blog/standardized-usability.php
http://www.usability.gov/ - Improving the User Experience
Brooke, J. (2013). SUS: a retrospective. Journal of Usability Studies, 8(2), 29-40.

Landscape of methods to understand users

Ask people what they believe: interviews, surveys, focus groups
Many benefits:
• User involvement, commitment, buy-in
• Access to what users know and think, their perspectives, and the contexts that matter
Many limitations:
• Human memory
• Politeness
• Self-awareness

Observe what people actually do
Many benefits:
• Insights into actual behaviour
• And contexts
• And stakeholder perspectives
Many limitations:
• Can you be there at the right time?
• Observer effect
• Intrusiveness

Questionnaires
• Types we consider
– Post-study, post-task, web sites
– Usability, hedonic
• Can be easy to administer
– Closed
– Open
• Difficult to design well
• Standard questionnaires
– SUS
– SEQ
– SUPR-Q
– AttrakDiff
https://webdesignviews.com/measuring-usability-with-system-usability-scale-sus/

SUS (administered on task completion)
SUS: A Quick and Dirty Usability Scale, by John Brooke (3,000 Google Scholar citations)
Determining What Individual SUS Scores Mean: Adding an Adjective Rating Scale, by Bangor, Kortum and Miller

Goals of SUS
From Brooke, J. (2013). SUS: a retrospective. Journal of Usability Studies, 8(2), 29-40.
• Effectiveness? "… a measure of the user's subjective view of the usability of the system… [not] diagnostic information… it is possible to get reliable results with a sample of 8-12 users… applicable over a wide range of systems and types of technology… body of normative data… produces similar results to other, more extensive attitude [scales]… good ability to discriminate and identify systems with good and poor usability"
• Efficiency? "quick": all SUS requires you to do is check one box per question, for ten questions.
• Satisfying?

How was SUS created? And why does that matter?
• Assembled a pool of 50 potential questionnaire statements.
• Used two systems at opposite extremes:
– a linguistic tool for end users, and a tool for systems programmers
– rated "really easy to use" and "almost impossible to use, even for highly technically skilled users"
• Twenty people, from secretary to systems programmer:
– rated the 50 potential questionnaire statements
– on a 5-point scale from "strongly agree" to "strongly disagree"
• Final items were selected so that:
– there were strong inter-correlations between all of the items selected (Why does that matter?)
– the total number of items was limited
• Chose 10:
– inter-correlations between all 10 were r = ±0.7 to ±0.9
– five of each polarity, i.e. good usability produces strong agreement with half the items and strong disagreement with the other half
• "Historically, when SUS was first constructed, it was a generally good practice to alternate positive and negative items; if I was starting out again, I might do things differently."

The questions
Please check the box that reflects your immediate response to each statement. Don't think too long about each statement. Make sure you respond to every statement. If you don't know how to respond, simply check box "3."
1. I think that I would like to use this system frequently.
2. I found the system unnecessarily complex.
3. I thought the system was easy to use.
4. I think that I would need the support of a technical person to be able to use this system.
5. I found the various functions in this system were well integrated.
6. I thought there was too much inconsistency in this system.
7. I would imagine that most people would learn to use this system very quickly.
8. I found the system very cumbersome (awkward) to use.
9. I felt very confident using the system.
10. I needed to learn a lot of things before I could get going with this system.
(* "system" may be replaced by the product name.)

Likert scales, semantic differential
• From SUS: "Please check the box that reflects your immediate response to each statement. Don't think too long about each statement. Make sure you respond to every statement. If you don't know how to respond, simply check box '3.'"
http://www.measuringu.com/sus.php
Bangor, A., Kortum, P., & Miller, J. (2009). Determining what individual SUS scores mean: Adding an adjective rating scale. Journal of Usability Studies, 4(3), 114-123.

Class activity
Very vague and unscientific, but to get a feel for the nature of the beast:
Suppose you had just been involved in a study using CUSP for a set of 10 tasks, such as:
• Determine the deadlines for COMP5047 assignments
• Find out whether COMP5427 has a final exam or not
• …
Basic stuff that is actually in CUSP and is its core functionality.
Use SUS to rate your experience to date of CUSP.

Scoring
Getting the SUS score (out of 100); a code sketch follows below:
1. Convert each item's response into a number from 0 to 4:
– for items 1, 3, 5, 7 and 9, the score contribution is the scale position minus 1
– for items 2, 4, 6, 8 and 10, the contribution is 5 minus the scale position
2. Multiply the sum of the contributions by 2.5.
Empirically, the overall average is 68.
SUS: A Quick and Dirty Usability Scale, by John Brooke
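The scoring rules above are easy to get wrong by hand, so here is a minimal Python sketch. The function name and the input format (raw 1-5 scale positions in question order) are my own choices, not part of SUS itself:

```python
def sus_score(responses):
    """Compute a SUS score (0-100) from ten raw 1-5 scale positions,
    given in question order (item 1 first)."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten responses, each between 1 and 5")
    total = 0
    for item, position in enumerate(responses, start=1):
        # Odd items are positively worded: contribution = position - 1.
        # Even items are negatively worded: contribution = 5 - position.
        total += (position - 1) if item % 2 == 1 else (5 - position)
    return total * 2.5

# A fairly positive respondent:
print(sus_score([4, 2, 4, 1, 4, 2, 5, 2, 4, 2]))  # 80.0
```

Note that the result is a score out of 100, not a percentage: an 80 here does not mean "better than 80% of systems" (see the percentile discussion that follows).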
More subtlety in scoring
• Based on 3,500 surveys within 273 studies.
• Used to evaluate a wide range of interfaces, including web sites, cell phones, IVR, GUI, hardware, and TV user interfaces.
• Participants performed a representative sample of tasks for the product (usually in formative usability tests) and then, before any discussion with the moderator, completed the survey.
Determining What Individual SUS Scores Mean: Adding an Adjective Rating Scale, by Bangor, Kortum and Miller
Mean ~70 over the long term. Median score of 70.5. 18% of surveys scored below 50.
http://www.measuringu.com/sus.php

From usability.gov
• The scoring system is somewhat complex.
• There is a temptation, since the scores are on a scale of 0-100, to interpret them as percentages; they are not.
• The best way to interpret your results involves "normalizing" the scores to produce a percentile ranking (see the sketch below).
• SUS is not diagnostic: its use is in classifying the ease of use of the site, application or environment being tested.

More detail in SUS pragmatics
http://www.measuringu.com/products/SUSpack
• Scores SUS properly, accounting for errors, omissions and inconsistencies, tested by an internal reliability test (Cronbach's alpha; a sketch follows below).
• Percentile ranks and grades A+ .. F: compare against 500 studies, by application type.
• Compares SUS scores statistically.
• Computes sample sizes, based on the desired margin of error.

Post-Study Usability Questionnaire (PSSUQ)
Another free standard questionnaire: a 16-item survey of satisfaction, with sub-scales:
• System Quality (average of questions 1-6)
• Information Quality (average of questions 7-12)
• Interface Quality (average of questions 13-16)
The PSSUQ is highly reliable (.94) and is entirely free.
From: http://chaione.com/ux-research-standardizing-usability-questionnaires/
Carefully review these questions and discuss how they compare with SUS.
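Reliability figures such as the PSSUQ's .94 above, and the SUSpack's internal reliability test, are typically Cronbach's alpha. Here is a minimal sketch of the standard formula; the NumPy implementation and the made-up data are mine, not the SUSpack's or the PSSUQ's own code:

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents x items matrix:
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of row totals
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Five made-up respondents answering four strongly correlated items:
data = [
    [5, 5, 4, 5],
    [4, 4, 4, 3],
    [3, 2, 3, 3],
    [2, 3, 2, 1],
    [1, 1, 2, 2],
]
print(round(cronbach_alpha(data), 2))  # 0.94 for this data
```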
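To make usability.gov's "normalize to a percentile ranking" concrete: one common approach, described by Sauro and Lewis, treats the normative database of SUS scores as roughly normal. The mean of about 68 comes from the slides above; the standard deviation of about 12.5 is a published MeasuringU estimate, not a figure from this deck, so treat this as an illustrative sketch only:

```python
from statistics import NormalDist

def sus_percentile(score, mean=68.0, sd=12.5):
    """Approximate percentile rank of a SUS score, assuming normative
    scores are roughly normal (sd ~12.5 is an outside estimate)."""
    return 100 * NormalDist(mu=mean, sigma=sd).cdf(score)

print(round(sus_percentile(80.0)))  # ~83: above about 83% of systems
```

So a good-looking raw score of 80 out of 100 corresponds to roughly the 83rd percentile, which is why raw scores should not be read as percentages.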
What are the SUPR-Q questions?
http://www.suprq.com/
Usability
• This website is easy to use.
• I am able to find what I need quickly on this website.
• I enjoy using the website.
• It is easy to navigate within the website.
Credibility (Trust, Value & Comfort)
• I feel comfortable purchasing from this website.
• This website keeps the promises it makes to me.
• I can count on the information I get on this website.
• I feel confident conducting business with this website.
• The information on this website is valuable.
Loyalty
• How likely are you to recommend this website to a friend or colleague?
• I will likely visit this website in the future.
Appearance
• I found the website to be attractive.
• The website has a clean and simple presentation.

SUPR-Q
http://www.suprq.com/
• 13 questions give a normalized score.
• The scoring system is based on the relative ranking against a database of over 100 websites and thousands of users.
• Percentile ranks for each factor.
• Developed through a process of psychometric validation.
• Creation:
– over 75 candidate questions were tested
– with thousands of users across hundreds of websites
– to arrive at the 13 questions: a reliable, sensitive and valid picture of the attitudes that shape the success of a website.
• How do you score the responses? (a sketch follows after the comparison below)
– SUPR-Q raw score = the sum of the responses to the first 12 questions + half the score for the likelihood-to-recommend question (giving a score of 12 .. 65).
– Compare to the industry benchmarks.
• How is the SUPR-Q score generated?
– The score is a percentile (hence "percentile rank"): a score of 75% means better than 75% of all websites tested in the database.
– Ranking is done for all four factors and for each of the 13 questions.
– This can also be done within a specific industry. It is common to have a range of rankings across the four subscales.

How do SUPR-Q and SUS compare?
• SUS has 10 questions, is highly reliable, and is generic (i.e. not just for the web).
• It assesses usability, not trust, credibility and loyalty.
• The SUPR-Q usability factor has a strong correlation with the SUS score: r = .96, p < .001, i.e. just four questions account for about 92 percent of the variation in SUS (.96 squared).
• Both are post-study.
• SUS is free and comparative scoring is available freely; SUPR-Q costs.
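A minimal sketch of the SUPR-Q raw-score rule described above. The function name and input format are my own; a 0-10 range is assumed for the likelihood-to-recommend question, consistent with the stated 12 .. 65 range, and the percentile step needs the commercial benchmark database, so it is not shown:

```python
def suprq_raw_score(five_point_items, likelihood_to_recommend):
    """Raw SUPR-Q score: sum of the 12 five-point items (1-5 each)
    plus half of the 0-10 likelihood-to-recommend rating.
    Range: 12 (all minimums) to 65 (all maximums)."""
    if len(five_point_items) != 12:
        raise ValueError("SUPR-Q has 12 five-point items")
    if not 0 <= likelihood_to_recommend <= 10:
        raise ValueError("likelihood to recommend is rated 0-10")
    return sum(five_point_items) + likelihood_to_recommend / 2

print(suprq_raw_score([4] * 12, 8))  # 52.0
```

Turning this raw score into the reported percentile requires comparison against the SUPR-Q benchmark database, which is the part you pay for.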
Likert scales
About closed rating scales:
• Given a statement, indicate agreement or another closed response.
• Semantic differential.
• Set of discrete choices, e.g.
– 1 .. 5
– 1, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100
– -2 .. +2
What is the benefit of each?
• Anchors, e.g. "Strongly disagree".
• Set of options for responses, e.g. rate your knowledge of Heuristic Evaluation:
– Never heard of it
– Have heard of it but do not understand it
– Think I may know what it means
– Fairly confident I do know what it means
– Know this really well and could explain it to a friend

How many points?
• Finstad, K. (2010). Response interpolation and scale sensitivity: Evidence against 5-point scales. Journal of Usability Studies, 5(3), 104-110.
• Two versions of the SUS were used in usability testing of a workforce management application: the usual 5-point version and a 7-point version.
• Each participant did a series of tasks involving the respective enterprise application being tested.
• Participants read each survey item out loud and verbally indicated their response.
• Results:
– 11 people interpolated (gave an answer between two scale points) on the 5-point version; none did on the 7-point version.
– 77 people did not interpolate on the 5-point version; 84 on the 7-point version.
The conclusion?

Design decisions
• The questions to ask:
– the challenge of biasing the user
– asking all questions in terms of a positive view, e.g. "I found the LearnWell site easy to use"
– or all in terms of a negative view, e.g. "I found the LearnWell site hard to use"
• User choices:
– Note that we previously used a different approach: words to describe each case (as in the learning scale for the concept inventory: "never heard of it" … "could teach a friend").
– Or just give the end points.
• Even or odd number of points?
• What makes sense to ask users?

Why standard questionnaires?
• Reliability: how consistent responses to the questions are. How would you assess that?
• Validity: correlates with other evidence, e.g. completion rates.
• Sensitivity: discriminates between good and bad interfaces. The ability to detect differences at even small sample sizes (<10) was one of the major reasons the System Usability Scale (SUS) and the Single Ease Question (SEQ) are recommended.
• Objectivity: gives an independent instrument.
• Quantification: fine grain of reporting and statistical analysis.
• Economy: cheap to design!
• Comparability and scientific generalisation.

Adapting questionnaires
• You can modify an existing questionnaire by:
– choosing a subset of questions
– changing the wording of some questions
– adding questions to address specific areas of concern
– using different scale values
• Warning: modifying a questionnaire can damage its validity.
Copyright MKP. All rights reserved.

SEQ: Single Ease Question
Sauro, J., & Dumas, J. S. (2009, April). Comparison of three one-question, post-task usability questionnaires. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 1599-1608). ACM.
http://www.measuringu.com/blog/single-question.php

AttrakDiff
A very different sort of more recent questionnaire.
Hedonic quality: AttrakDiff, by Marc Hassenzahl
http://attrakdiff.de/sience-en.html
• "The model separates the four essential aspects:
– The product quality intended by the designer.
– The subjective perception of quality and subjective evaluation of quality [by the user].
– The independent pragmatic and hedonic qualities.
– Behavioural and emotional consequences."
http://translate.google.com.au/translate?hl=en&sl=de&u=http://www.qu.tu-berlin.de/menue/forschung/laufende_projekte/joyofuse/joy_of_use/joy_of_use/measurement_methods/attrakdiff/&prev=/search%3Fq%3D

Summary
• Questionnaires: what is their role in the whole process?
• When is it appropriate to use a questionnaire?
• When is it not?
• Why standard questionnaires?
• What do you need to know about altering them?
• How should you use them for Assignment 2?

Now to the project
The inspiration: SAL, Simple Ambient Loggers (small enough to do a modest prototype and solid usability testing).
Example of a logger to compare and contrast:
https://www.dropbox.com/s/luqq3dfyqy6gvm0/Ubicomp2015-demo.mp4?dl=0
Overview of some loggers.
Role for your project:
• Critique
• Inspiration
• Grading