CSE 490L
Online Usability Testing
Introduction to Problem
Standing in line and waiting for an order at a local coffee shop or quick serve restaurant adds to the
many inconveniences of daily life. A significant amount of time and money is being wasted with these
inefficient processes. With the advent of Swickr, a mobile website, we hope to streamline the order
process so that consumers can order and pay for their order on their way to the store. That way,
customers’ orders will be ready by the time they get there. The goal of this experiment is to determine
the usability of our UI and ultimately reduce the transaction costs and automate the process of paying
for and receiving food.
The following section covers the overall method of our online usability study: the types of
participants we used, the environment the study was conducted in, the tasks we asked our
participants to complete, the procedure we used, and our test measures of success.
For the online participants, we e-mailed friends who we knew had an iPhone, or at least a
Mac on which they could install the iPhone simulator, so they would be able to view the Swickr
application correctly.
To find participants for the lab experiment portion, we went down to the CSE labs and asked whoever
was available to participate. The lab experiment participants did not have to own an iPhone or a Mac
since we used Rylan’s iPhone as a testing platform.
Overall, the online survey participant pool was relatively tech savvy, predominantly male, and mostly
The online participants performed the usability test remotely from wherever they were.
The lab experiment participants performed the tasks in one of the basement CSE labs using
Rylan’s iPhone.
For the online study participants, we wrote an e-mail pointing them to a webpage with the test
instructions. The webpage contained a brief overview of what Swickr was and the given set of tasks.
There we asked them to select a username that we could use to uniquely identify them
when going through the results.
They were then asked to go through the tasks we prepared for them. Afterwards, they filled out
a survey asking pertinent questions about their test experience.
We set up our Swickr application so that every link click or input of information from the user was
tracked using their username. This way we could see how long a user stayed on a page, what they were
clicking on, and how long it took for them to accomplish the given tasks.
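The per-user tracking described above can be turned into time-on-page and task-duration figures roughly as follows. This is only a minimal sketch: the actual Swickr log format is not shown in this report, so the record layout (username, seconds since start, page name), the sample data, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical log records: (username, seconds since session start, page visited).
# The real Swickr log format is not shown in the report; these fields are assumptions.
events = [
    ("user_a", 0, "home"),
    ("user_a", 12, "map"),
    ("user_a", 95, "menu"),
    ("user_a", 130, "confirmation"),
    ("user_b", 0, "home"),
    ("user_b", 8, "favorites"),
    ("user_b", 30, "confirmation"),
]


def time_per_page(events):
    """Return {username: {page: seconds spent}} from logged click events."""
    by_user = defaultdict(list)
    for user, timestamp, page in events:
        by_user[user].append((timestamp, page))

    durations = {}
    for user, visits in by_user.items():
        visits.sort()  # order each user's events by timestamp
        per_page = defaultdict(int)
        # Time on a page is the gap until the same user's next logged event.
        for (start, page), (end, _next_page) in zip(visits, visits[1:]):
            per_page[page] += end - start
        durations[user] = dict(per_page)
    return durations


def task_duration(events, user):
    """Seconds between a user's first and last logged event."""
    stamps = [timestamp for name, timestamp, _page in events if name == user]
    return max(stamps) - min(stamps)
```

The last page a user reaches gets no duration here, since there is no later event to measure against.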
Test Measures
For the online survey portion we measured the amount of time it took to complete each task and the task
completion rate, and took into account the user feedback that was entered into our Catalyst survey.
For the lab experiment portion we measured the number of errors the user made and also
how long it took for the user to complete a given task.
Difficult Task: The user needs to search for the closest Starbucks, order a Grande Vanilla Frappuccino, and
add it to his/her favorites. The user starts by searching for and choosing the closest Starbucks. After the
user selects the products to order, he/she needs to sign up for an account. The first part of
the sign-up process is to create a username and password; the user is then prompted to go to another page
to enter billing and credit card information. The task requires the user to add the item to his/her
favorites, which is done by selecting the “Star” checkbox on the order page. The user finally places the
order, and receives a confirmation page.
Easy Task: The user is on the homepage and his/her goal is to order the same drink from the favorites
list. So, he/she accesses the favorites link from the homepage, chooses the Grande Vanilla Frappuccino,
and places the order just like the previous task. The user does not need to sign in since they are already
logged on from the previous task.
Moderate Task: The user is on the homepage and he/she needs to order a favorite drink and an extra
drink for a friend. The user gets onto the favorites list from the homepage, adds the drink to the cart,
but clicks the “order more” button before placing the order from the shopping cart. He/she then checks
out like the previous tasks.
This section will cover the overall results from both the Online Usability study and the Lab Experiment,
as well as an overview of individual comments/suggestions from users. The data from the Online Usability study
will include raw data from the user log kept on the web, including completion time, completion
percentage, and errors per task. The data from the Lab Experiment will be more subjective given the
lack of statistical significance.
The overwhelming result from both studies was that the Map page is a major cause of errors and
slowdowns. Almost all of the participants made the error of trying to return to the homepage after
reaching the map, and/or had significant trouble clicking the buttons on the map. Based off this fact,
and suggestions from the users, we should make two changes to the Map page.
1. Make the buttons larger and easier to click
2. Add a descriptive title (for example, “Choose a Starbucks Location Near You”)
These changes are discussed in further detail in section VI. Recommendation for Design Changes.
We were able to get exactly ten participants for our online usability study. While ten is not sufficient for
proof of usability, or lack thereof, it is enough participants to gather useful metrics for performance,
specifically task completion percentage, time to complete tasks, and errors per task. The data from this
section is all gathered from the site log (Appendix A) and the online survey (Appendix B).
In terms of overall task completion our usability test was satisfactory. We encountered 1 participant
who was not able to complete Task 1, and as a result was not able to complete Tasks 2 or 3 either. In
addition, we had 1 participant who misread directions and made no attempt at Task 2, and another who
did the same for Task 3. In our data we marked tasks that were not attempted as such, and decided to exclude
them, given that they reflect more on our inability to explain the test than on a major usability flaw in
our design. Figure 4.a (below) shows the overall task completion percentage for each task.
[Chart: Task Completion % for Tasks 1, 2, and 3]
Figure 4.a – Task completion percentage for each task
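The completion percentage can be computed along the following lines. This is a minimal sketch: the outcome labels and the function name are assumptions, and the only data taken from the study is the Task 1 case of ten participants with one who could not complete the task.

```python
def completion_percentage(outcomes):
    """Percentage of valid tries that were completed.

    `outcomes` holds one label per participant: "completed", "incomplete",
    or "not_attempted". Tasks that were not attempted are thrown out before
    the percentage is taken. These labels are illustrative assumptions.
    """
    valid = [outcome for outcome in outcomes if outcome != "not_attempted"]
    if not valid:
        return None  # no valid tries, so no percentage to report
    completed = sum(1 for outcome in valid if outcome == "completed")
    return 100.0 * completed / len(valid)


# Task 1 as described in the text: ten participants, one could not complete it.
task_1 = ["completed"] * 9 + ["incomplete"]
print(completion_percentage(task_1))  # prints 90.0
```

A task that a participant never attempted lowers the number of valid tries rather than the completion percentage.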
As can be seen from the graph, only Task 1 suffered an incompletion. However, we do not see this as a
huge problem given that incompletion was a direct result of the previously discussed Map page issues.
Tasks 2 and 3 resulted in zero incompletes.
In terms of overall task completion time, our product did very well on Tasks 2 and 3; however, it
performed very poorly on Task 1. Figure 4.b (below) shows the average task completion time per
task. Again, tasks that were not attempted were thrown out, and incomplete tasks were marked as taking 15
minutes, an unacceptable length for any of our tasks.
[Chart: Avg. Time to Complete Tasks, in min:sec, for Tasks 1, 2, and 3]
Figure 4.b – Average time to complete each task
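The averaging rule behind this figure (tasks that were not attempted are dropped, incomplete tasks count as 15 minutes) can be sketched as below. The input encoding, the function names, and the sample times are illustrative assumptions, not values from our log.

```python
INCOMPLETE_PENALTY_SECONDS = 15 * 60  # incomplete tasks are scored as 15 minutes


def average_completion_seconds(results):
    """Average task time in seconds.

    Each entry in `results` is a time in seconds, the string "incomplete",
    or None for a task that was not attempted (an assumed encoding).
    """
    scored = []
    for result in results:
        if result is None:
            continue  # not attempted: thrown out
        if result == "incomplete":
            scored.append(INCOMPLETE_PENALTY_SECONDS)
        else:
            scored.append(result)
    if not scored:
        return None
    return sum(scored) / len(scored)


def format_min_sec(seconds):
    """Format a number of seconds as mm:ss."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


# Made-up example: two completions, one incomplete, one not attempted.
task_results = [60, 120, "incomplete", None]
print(format_min_sec(average_completion_seconds(task_results)))  # prints 06:00
```

Because a single incomplete task adds 15 minutes to the total, one such participant can dominate the average when the pool is small.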
The respective times of 30 seconds and 1 minute and 20 seconds for Tasks 2 and 3 are very promising.
Task 2 involves ordering an item off of the participant’s favorites menu, far and away the most common
task according to our task analysis survey. Task 3 involves making that same order from the favorites
menu as well as ordering a second item for a friend. Although this was a relatively less common task, it is
still important that it be quick and easy.
The glaring issue with the average completion time was the 7 minutes and 10 seconds it took to
complete Task 1. This number is somewhat skewed in that two participants took over
12 minutes each (including the 15 minutes assigned to the participant who could not complete Task 1);
however, the number is not acceptable even after adjusting for those two outliers. Despite the long time
to complete, our group remains optimistic because the bulk of the slowdown
occurred on the Map page and should be easily fixable with the previously discussed changes to the Map page.
Possibly even more important than actual task completion time is the perceived task completion time of
the participant. If a first time customer feels they are taking a long time to complete a given task they
are likely to leave before they finish. However, if a user feels a task went quickly they might be
encouraged to return. Figure 2c (below) shows the average task completion time (from above) against
the participants’ average perceived task completion time (taken from the survey).
[Chart: Avg. Task Completion Time, Actual vs. Perceived, in min:sec, for Tasks 1, 2, and 3]
Figure 2c – Average task completion time – actual vs. perceived
While the perceived times for Tasks 2 and 3 are relatively close to their actual times (Task 2 is skewed
because the lowest perceived time allowed was 1 minute), the perceived time was less than half of the
actual time for Task 1. It is possible this is simply because people have a harder time remembering
actual time on longer tasks, but it is also possible that the participants lost track of time due to the
immersive and/or fun qualities of our application. If the latter is indeed the case, it is a sign that
participants might not be quite as likely to steer away from Swickr as expected due to the long task
completion time.
Average errors per completed task is the last metric we will cover in this section. We decided not to look
at average errors per task, because the number of errors on an incomplete task became inflated as the
participant began clicking seemingly random links. Figure 2d (below) shows the average errors per
completed task for each task.
[Chart: Avg. Errors Per Completed Task for Tasks 1, 2, and 3]
Figure 2d – Average errors per completed task
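The metric in this figure counts errors only on attempts that were completed, which can be sketched as follows. The record layout, the function name, and the sample numbers are assumptions for illustration and are not taken from our log.

```python
def average_errors_per_completed(attempts):
    """Average error count over completed attempts only.

    `attempts` is a list of (completed, errors) pairs, one per participant
    who attempted the task. This layout is an illustrative assumption.
    """
    error_counts = [errors for completed, errors in attempts if completed]
    if not error_counts:
        return None  # nobody completed the task
    return sum(error_counts) / len(error_counts)


# Made-up example: the incomplete attempt with 8 errors is left out,
# so random clicking on a failed task does not inflate the average.
attempts = [(True, 2), (True, 0), (False, 8), (True, 1)]
print(average_errors_per_completed(attempts))  # prints 1.0
```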
As expected, Task 1 yielded poor results with almost 2 errors per completed task. The vast majority of
those errors were backtracking from the map to the homepage as the participant thought they had
made a mistake when they got there. A simple title would have alleviated most of those mistakes. In
addition, it was difficult to click the map buttons, so even participants who did not think they had made
a mistake were forced to backtrack in order to try another solution.
Task 3, despite having a respectable task completion time and 100% task completion percentage, looks
to generate about 1 mistake for every two participants. However, this is misleading due to the fact that
four out of the five total mistakes made on Task 3 were caused by one participant. In addition, the
mistake was not a mistake with our usability but instead a failure to read directions by the participant.
The participant continually attempted to delete their item, a function not allowed by our prototype,
even though the directions told them to make the exact order they were seeing.
Task 2 had zero errors.
Comparison of Techniques
We found that the two different testing techniques yielded different types of information. With the
online survey we were able to gather a lot more quantitative information since we had a larger
participant pool and user logging. Online survey participants were also able to voice their feedback in
the survey and give their general feel for the interface. The results we gathered in the survey might have
been masked in the lab experiment. For example, even though users were able to complete certain
tasks they might not have liked the interface at all. In the lab experiment, we only observed the users
performing the tasks and did not ask them for their opinion on the interface.
With the lab experiment there were fewer participants so the feedback was less quantitative and more
qualitative. Instead of users telling us what went wrong, we were able to observe the issues as they
occurred. We had much more control over the experiment results because we could see exactly where
the users were clicking on the screen and where they were looking.
In general, with the online survey technique we were able to get a feel for how users liked our interface. With
the lab experiment, we could see exactly where users made errors.
Recommendation for Design Changes
Our results show that users were spending too much time on the Map page (figure below) since it was
not very descriptive. After a search for a store, small pointer icons were shown. We thought it would be
intuitive that the small pointers are the stores, and that clicking on a
pointer would pop up a cloud with the store to order from, but apparently that was not the case. So we
decided to add a title to the page saying “Click on your nearest store” to make clear that the main purpose
of the page is to lead the user to the closest store’s menu.
Participants complained that they had trouble clicking on the pointers (see figure above) because of
their small size, which shows that we should apply Fitts’s law by making the icons larger so
that users can click on them faster and more easily on the iPhone platform.
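For reference, Fitts’s law in its commonly used Shannon formulation relates the time to hit a target to the target’s distance and size. The constants a and b are device-dependent values that would have to be fitted empirically; we did not measure them in this study, so the formula is given only to show why larger icons should help.

```latex
% MT: predicted movement time to reach the target
% D:  distance from the starting point to the center of the target
% W:  width of the target along the axis of motion
% a, b: empirically fitted constants (not measured in this study)
MT = a + b \log_2\!\left(\frac{D}{W} + 1\right)
```

For a fixed distance D, increasing the icon width W lowers the logarithmic term and therefore the predicted time to tap the icon.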
Users also had the problem of clicking on other iPhone functions when clicking on certain parts of the
screen, so we decided to provide more space between functions and buttons so that the user can
navigate more easily.
Some of the pages took a long time to load when navigating from one page to another, so we decided that
fewer images and smaller file sizes would be better suited to the EDGE wireless network the iPhone currently
runs on, making navigation faster. Also, using the native Google Maps application could
provide more reliable and faster-loading maps when looking for a store, which matters given that the
purpose of the application is to give the customer a faster and more efficient way to
order products.
VII. Summary
Since our main goal is to make Swickr as efficient to use as possible, the survey gave us a better
understanding of how well the design, as far as we have worked on it so far, serves its customers.
Remotely surveying friends who owned iPhones and testing random participants at the computer
lab produced some great feedback on our design. We had nearly a 100% success rate in completing tasks
within a reasonable time, and got substantial feedback regarding negatives of the design, which in turn
will be used to make improvements. The tool our group designed to track performance remotely
gave us accurate data on users’ errors and times, which came in handy for
comparison against the users’ feedback in our analysis.
VIII. Appendix
A. User data from log.txt
Time to Complete (min)
Task 1
Task 2
Task 3
07:28 NA
15:00 NA
00:18 NA
Std Dev:
Completion Percentage
Num Valid Tries
Num Completed
Task 1
Task 2 Task 3
90.0% 100.0% 100.0%
Total Errors on Completed Tasks
Participant Task 1
Task 2 Task 3
5 NA
8 Not Completed NA
0 NA
B. Survey Questions
Thanks for taking the time to take the Swickr Interactive usability test. During the test you will
be asked to complete three tasks. Here they are for your reference.
1. Order a grande vanilla frappuccino from Starbucks. Add that order to your favorites.
2. (You are now logged in) Make that same order from your favorites
3. Make that same order from your favorites and add a tall caramel frappuccino for your friend.
First, follow this link on your iPhone.
After you are done please complete the following survey.
Thanks for your help. - The Swickr Team
What name did you use for the usability test?
How old are you?
Your sex:
How experienced are you with using an iPhone?
How often do you use an online application to order products?
How many minutes would you say it took you to complete each task?
What part of task 1 (ordering a frapp and signing up) did you find most difficult/confusing? Why?
What part of task 2 (ordering a frapp from your favs) did you find most difficult/confusing? Why?
What part of task 3 (ordering two drinks) did you find most difficult/confusing? Why?
What did you like most about our design?
What can we change about our design to provide a faster or more fun experience?
Is there any missing feature you would like to see? Please explain.
How did each task compare to doing them in person at a Starbucks location?
Do you have any additional comments?
C. Log.txt
A log was kept of each participant’s every move on our site. The following is the raw data collected from
the log, highlighted for easier reading.
The start of a task is highlighted in yellow
The end of a task is highlighted in green
An error is highlighted in red
Raw Data:
