Practicar Test Report

Group 7
Camilla Andersson | Cecilia Kornehed | Jennifer Bodén
Johanna Jansson | Julia Gilander | Mathilde Fiant
2014-01-18
Contents

Introduction
Executive Summary
Methodology
    Sessions
    Participants
    Evaluation Tasks/Scenarios
Results
    Task Completion Success Rate
    Task Ratings
    Clicks per Task
    Errors
    Summary of Data
    Overall Ratings
    Likes, Dislikes, Participant Recommendations
Recommendations
Conclusion
References
Attachments
    A - Formative usability test questionnaires
    B - Screenshots from the prototype
    C - Participation report
Introduction
Actia is a company that specialises in designing and manufacturing embedded systems for small and medium-sized vehicles. They have developed a way of connecting a car to an application on a phone or a tablet that lets you control the car through the application. There have been previous applications, but those have been rather complicated and difficult to understand for anyone not used to complex diagrams and details. It has therefore become necessary to design a new application that is as user friendly as possible, so that more people can benefit from this invention.

The test was conducted by all six members of the project group, who each tested two participants. For the tests we used a digital T-prototype of the application, which we presented on smartphones to make it look and feel as real as possible. We each conducted the tests on our own: we introduced the test, gave the participants tasks to complete, and then observed and took notes on their comments and on how well they completed the tasks. The participants were asked to "think aloud" to give us as much information about their impressions as possible.
Executive Summary
The tests took place between 25 December 2013 and 15 January 2014 at many different locations, most of them in Sweden. The tests were conducted to evaluate the usability of the application's design as well as the selection of functions included.

In total, 12 tests took place, each about 20 minutes long, with 12 different participants. All tests used the very same questions.

Overall, all participants found the application easy to understand, and 70% of the participants found it easy to use. Even if something was not clear at first glance, it was easy to figure out and learn how it worked. 75% of the test persons would use the finished application.

However, some minor problems were detected, including the following:
- Categories were not as logical as they could be
- Some icons proved hard to understand
- The settings button was hard to locate

This document contains the participant feedback, satisfaction ratings, task completion rates, ease or difficulty of completion ratings, clicks per task, errors, and recommendations for improvements. A copy of the questionnaire is included in the Attachments section.
Methodology
Sessions
Preparations
Each member of the group recruited their own participants for the user test. Because the product is made for a wide audience, it was important for the result of the test that we had a diverse group of participants. The observer of each test informed their participants about the test and how it would be conducted, and a date and time were set. Each individual test lasted about 20 minutes. Prior to the user test, the participants were informed of the test logistics. They were specifically asked to think out loud so that the observer could follow their thoughts. The formative usability test can be divided into three parts: background, tasks, and analysis. A high-fidelity prototype (Saffer, 2010) was used for the testing.
Part One
During the first part of the test, participants were asked about their background. The
questions covered topics like age, car knowledge and smartphone habits.
Part Two
This part consisted of eight tasks that the participants were to perform. No guidance from the observer was allowed. After each task, the observer rated the success of the task on a scale of 0-2, measuring how well the participant succeeded: 0 meaning not at all, 1 completed with difficulty, and 2 completed.
Part Three
When the second part of the test was done, the participant rated some aspects of the product yes or no. The aspects rated included:
● Easy to understand
● Redundant icons
● Easy to find settings
● Useful product
● Logical categories
In addition, the participants rated some parts of the product on a scale of 1-5, from strongly disagree to strongly agree. We used this interval data (Tullis & Albert, 2013) later in the process to calculate statistics. The parts they rated included:
● Menu bar
● Overall impression
● The name of the product
See Attachment A for the formative usability test questionnaires.
Participants
As the course literature suggests, prototype testing is most efficiently done when several of the developers are present, so that some take notes, some conduct the interviews, and so forth (Saffer, 2010). But since our group would be separated during the testing, every group member did their testing alone. We had a diverse group of participants for our testing. Since Actia wants the application to be understood by "everyone", our prototype had to be tested on as wide a group as possible. We still narrowed the selection slightly, following the guidelines from our target group.

A common theory is that five participants are enough to test a prototype, based on the probability of an issue being detected (Tullis & Albert, 2013). However, this theory has been criticised and in some ways proven wrong. Our formative usability test therefore included a total of 12 participants, to detect as many issues as possible.

Our test participants were between 19 and 55 years old. Only three participants did not have a car of their own at the moment, but all of them had a licence. The majority of the participants who own a car use it every day; two participants use it more rarely, on weekends and holidays. The most popular phone among the participants was the iPhone, and the participants generally said they have little knowledge of cars.

The fact that we did our tests alone did not complicate the process much for any of us. We could still observe and write down how the test participants acted when doing the tasks, and conduct the interviews.
Evaluation Tasks/Scenarios
When deciding on the tasks for the test, the group made sure to include tasks that tested every part of the application, so that no page would be left out. In the test scenarios, the participants had the following tasks to solve:
- Lock the car
- See the location of the car/find the car
- Change the PIN code for the application
- Check the oil level
- Check the tire pressure
- Change the temperature in the car
- Check if the windows are closed
- Turn on/off notifications for lights
Results
Task Completion Success Rate
Every group member recorded their participants' ability to complete each task on a scale of 0-2: 0 meaning not completed, 1 completed with some difficulty and 2 completed. With this system the results could show both the success rate and how easy each task was, instead of each task only being completed or not completed as in a binary success rating (Tullis & Albert, 2013, p. 66). As Measuring the User Experience puts it, the binary approach means only blindly looking for success or failure, and our project required more information than that for the test to be useful.
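
As a minimal sketch, the completion rates reported below can be derived from the recorded 0-2 scores as follows; the scores in this example are illustrative, shaped after the Task 5 results (five completed, five completed with difficulty, two not completed):

    # Per-participant scores for one task:
    # 0 = not completed, 1 = completed with difficulty, 2 = completed.
    scores = [2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 0, 0]  # illustrative, shaped after Task 5

    completed = sum(1 for s in scores if s > 0)  # completed, with or without difficulty
    easy = sum(1 for s in scores if s == 2)      # completed without difficulty

    print(f"completion rate: {completed / len(scores):.1%}")  # 83.3%
    print(f"easy completion rate: {easy / len(scores):.1%}")  # 41.7%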
All participants successfully completed Task 1 (Lock the car), Task 2 (Locate your car), Task 4 (Check the oil level), Task 6 (Change the temperature) and Task 7 (Check if the windows are closed). 91.7% of the participants were able to complete Task 3 (Change the PIN code) and Task 8 (Turn on/off notifications for lights). The lowest percentage of success was for Task 5 (Check the tire pressure), where 83.3% of the participants completed the task.
Task Completion Rates

Participant   Task 1  Task 2  Task 3  Task 4  Task 5  Task 6  Task 7  Task 8
1             √       √       √       √       √       √       √       √
2             √       √       √       √       √       √       √       √
3             √       √       √       √       √       √       √       √
4             √       √       √       √       √       √       √       √
5             √       √       √       √       √       √       √       √
6             √       √       √       √       -       √       √       √
7             √       √       √       √       √       √       √       √
8             √       √       √       √       √       √       √       √
9             √       √       √       √       √       √       √       √
10            √       √       √       √       -       √       √       √
11            √       √       -       √       √       √       √       -
12            √       √       √       √       √       √       √       √
Success       12      12      11      12      10      12      12      11
Completion    100%    100%    91.7%   100%    83.3%   100%    100%    91.7%
Task Ratings
All tasks were divided into three groups depending on how easy it was for the participant to complete the task: if it was completed it was viewed as easy, if it was completed with some difficulty it was viewed as not so easy, and if it was not completed the task was viewed as hard for the participant. This grouping was also combined with each participant's comments on the tasks during the test.
Easy to use

All the participants could easily find their way to and complete Task 6 (Change the temperature) and Task 7 (Check if the windows are closed). 83% could easily complete Task 1 (Lock the car) and 75% could complete Task 2 (Locate your car) without any trouble. The rest of the tasks did not have as satisfying results as hoped for. The aim was a success rate of 70% or higher on each task, and even though the completion rates are above that percentage, the easy-to-complete rates are not. Approximately 67% easily completed Task 8 (Turn on/off notifications for lights), and 8% did not complete the task at all. Task 3 (Change PIN code) was easily completed by 50% of the participants, 42% had difficulties completing it and 8% did not complete it. The lowest counts of easy completions were for Task 4 (Check oil level) and Task 5 (Check tire pressure), where only 42% easily completed the tasks. The others in Task 4 completed it with some difficulty, but only 42% did the same on Task 5, where 17% failed to complete the task at all. The numbers were then combined to show the overall usability of the application, shown in the graph below.

[Graph: Usability of the whole application]
Clicks per Task
During the test, all of us took notes on how many clicks it took each participant to complete a task. The fewest clicks a task could be completed in depended on the task and on which path the participant chose. Our group determined the fastest paths by testing the prototype ourselves. The reason for counting the number of clicks instead of the time it takes to complete the task was that the number of clicks felt more relevant to our project. Time on task is described as a good way to see if a task is too complicated to be solved within a reasonable time; in our case the tasks are so simple that time seemed unnecessary to record, but we assembled our data in the same way as described for time on task (Tullis & Albert, 2013, pp. 74-79). The results can be seen further down in a table, and in a bar chart with 90% confidence intervals, calculated using the information from Measuring the User Experience (Tullis & Albert, 2013, p. 25).
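
As a rough sketch of how such a 90% confidence interval for the mean number of clicks can be computed (following the t-distribution approach described by Tullis & Albert), consider the Task 3 click counts from the table below, with the non-completing participant excluded; scipy is assumed to be available for the t-distribution:

    import math
    from statistics import mean, stdev
    from scipy import stats  # t-distribution quantiles

    # Clicks per participant for Task 3 (participant 11 excluded: task not completed)
    clicks = [10, 4, 6, 4, 4, 4, 5, 5, 7, 4, 5]

    n = len(clicks)
    m = mean(clicks)
    sem = stdev(clicks) / math.sqrt(n)    # standard error of the mean
    t_crit = stats.t.ppf(0.95, df=n - 1)  # a two-sided 90% CI uses the 0.95 quantile
    margin = t_crit * sem

    print(f"mean = {m:.2f}, 90% CI = {m - margin:.2f} to {m + margin:.2f}")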
Task 1, Lock the car, where you use the application to remotely lock your car, takes one click to complete the quickest way. This is when you understand that you can use the icons on the home page; otherwise you need to add one click. None of the participants needed more than two clicks to complete this task.

Task 2, Locate your car, where you locate your car using the application, also requires only one click if done as intended. In this case the number of clicks varied between one and four, where two clicks are needed when using the menu bar instead of the home page.

Task 3, Change the PIN code, requires the most clicks (four) to complete. This task turned out to be a bit tricky, and the number varied between four and ten among those who completed the task.
Task 4, Check the oil level, only requires one click to take you to the right page, where you then just need to locate the right bar. Surprisingly, however, the number of clicks varied between one and six among the participants.

Task 5, Check the tire pressure, requires, as above, only one click to be completed the intended way. Surprisingly, the number of clicks varied between one and five, and two participants did not complete the task.

Task 6, Change temperature, where you use the application to change the temperature in your car, requires a minimum of two clicks, and all the participants completed the task in exactly two clicks.

Task 7, Check if the windows are closed: just as in the previous task, all the participants completed this task with the lowest number of clicks. The task was to check that the windows were closed, and completing it the intended way required only one click.

Task 8, Turn on/off notifications for lights, refers to the function in the application that notifies you when your lights, inside or outside the car, have been left on. It takes a minimum of three clicks to complete this task. The results showed that the number of clicks varied from three to five.

The average number of clicks needed to complete each task ended up quite close to the minimum. Only Task 3 and Task 4 had an average that differed by a whole click or more from the minimum.
Number of clicks per participant to complete each task

         P1   P2   P3   P4   P5   P6   P7   P8   P9   P10  P11  P12  Average  Minimum
Task 1   1    1    2    1    1    1    2    1    1    1    1    1    1.16     1
Task 2   1    1    2    1    1    1    4    1    2    2    2    1    1.58     1
Task 3   10   4    6    4    4    4    5    5    7    4    -    5    5.27     4
Task 4   6    5    3    2    1    4    2    3    4    2    3    1    3        1
Task 5   1    1    1    1    1    -    5    1    1    5    1    1    1.73     1
Task 6   2    2    2    2    2    2    2    2    2    2    2    2    2        2
Task 7   1    1    1    1    1    1    1    1    1    1    1    1    1        1
Task 8   3    3    3    3    3    3    5    3    3    3    -    4    3.27     3

P = participant, - = not completed
[Bar chart: average number of clicks per task, with 90% confidence intervals]

* The lines in the bars show the confidence intervals, or margins of error, for each task. Task 6 and Task 7 lack these because all participants had the same results and there was therefore no variance in the data.
Errors
Naturally, the tasks our test participants were asked to complete were completed with a number of errors. For us, the most important things to measure were how many errors the participants made and why these errors were made, since our goal is to make the application as easy to understand as possible.

The tasks with the lowest number of errors were Task 6 and Task 7, where all of the participants managed to complete the tasks without any errors at all. The table below shows how the errors in all the tasks were distributed among the participants. The two tasks with the highest number of errors were Task 3 and Task 4.
Task 3 was to change the PIN code and had a total of 14 errors made by all the participants together. Five of the participants managed to complete the task without any errors, one participant did not complete the task, and the other six completed it with some errors. Part of the problem with this task was caused by the prototype itself: on an iPhone that has not been updated to iOS 7, there is a top menu with a black background. This menu partially blocked the settings icon, confusing some participants who could not find the icon they were looking for right away. Also, a few of the participants did not understand the settings icon itself. We based our icon on how the "usual" settings icon looks on phones, though mostly on the iPhone design. If a participant is not familiar with this symbol, it is hard for them to understand its meaning.
In Task 4 the participants were asked to check the oil level in their car; this task had a total of 24 errors. Only two of the participants managed to complete it without any errors. Most of the errors were due to the icon used for this category, "Dashboard". The icon was a speedometer, but the test participants expected these measurements to be under another category, since the icon was believed to have something to do with speed. Our original thought was that the speedometer would remind the user of the dashboard in the car, but it did not work out as we had expected.
In the interviews following the tasks, the participants named two main difficulties with the application. The first was understanding the speedometer icon for the Dashboard, which was supposed to create associations to the physical dashboard in the car. When testing the prototype it turned out that our test participants did not share our associations: they thought that, since it was a speedometer, it had something to do with speed. As a result we decided to redesign the Dashboard icon as an oil can.
The other observed difficulty concerned the settings. Some of our participants did not understand the icon for settings, and another problem was that participants thought that each subpage, reached through the menu bar, had different settings. However, this is not a problem we wish to address, since we believe it is an uncommon error that few people will make. Also, many of our test participants said that the application was easy to learn, and that once you have pressed an icon, you will most likely remember which functions lie under it.

These two difficulties agree well with the results of our tests, where, as stated above, the tasks with the most errors were finding the settings and understanding the Dashboard.
The two tasks with no errors at all were Task 6, where the participants needed to change the temperature, and Task 7, where the participants were to check whether the windows were closed or open. The success of these tasks may be because the functions were placed under categories that were well associated with them. Also, for the window task, the test participants had been asked to enter that page to check their tires in an earlier task; if they noticed the window function there, it would help them in this later task.
Number of errors made by each participant per task, and total number of errors

         P1   P2   P3   P4   P5   P6   P7   P8   P9   P10  P11  P12  Total
Task 1   0    0    1    0    0    0    1    0    0    0    0    0    2
Task 2   0    0    1    0    0    0    3    0    1    1    1    0    7
Task 3   6    0    2    0    0    0    1    1    3    0    -    1    14
Task 4   5    4    2    1    0    3    1    2    3    1    2    0    24
Task 5   0    0    0    0    0    -    4    0    0    4    0    0    8
Task 6   0    0    0    0    0    0    0    0    0    0    0    0    0
Task 7   0    0    0    0    0    0    0    0    0    0    0    0    0
Task 8   0    0    0    0    0    0    2    0    0    0    -    1    3

P = participant, - = task not completed/unknown number of errors
Summary of Data
The table below displays a summary of the test data. Low completion rates and usability ratings, and high error and click counts per task, are highlighted in red.
Summary of Completion, Errors, Clicks per Task, Mean Satisfaction

Task   Task Completion   Errors   Clicks per Task   Usability*
1      12                2        14                4.6
2      12                7        19                4.5
3      11                14       58                3.8
4      12                24       36                3.8
5      10                8        19                3.5
6      12                0        24                5
7      12                0        12                5
8      11                3        36                4.1

* Usability = mean rating (on a scale of 1-5) of the level of completion across all participants for one task.
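
The report does not spell out exactly how the 0-2 completion scores translate into this 1-5 usability figure. A mapping of 2 to 5, 1 to 3 and 0 to 1, averaged over all participants, reproduces every usability value in the table above, so a sketch of that assumed computation could look like this:

    # Assumed mapping from completion score (0-2) to a 1-5 usability value;
    # 2 -> 5, 1 -> 3, 0 -> 1 reproduces every Usability figure in the table above.
    SCORE_TO_USABILITY = {2: 5, 1: 3, 0: 1}

    def usability(scores):
        """Mean 1-5 usability value for one task, given per-participant 0-2 scores."""
        values = [SCORE_TO_USABILITY[s] for s in scores]
        return sum(values) / len(values)

    # Task 5: five completed, five with difficulty, two not completed
    print(round(usability([2] * 5 + [1] * 5 + [0] * 2), 1))  # 3.5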
Overall Metrics
Overall Ratings
For a more synoptic view of the participants' thoughts about the application, the participants were asked to rate three questions on a scale of 1-5 at the end of the session:

- What do you think about the menu bar?
- What is your overall impression of the application?
- What do you think about the name of the application?

The 5-point rating scale ranged from 1 (strongly disagree) to 5 (strongly agree). An average agreement rating above 4.0 is taken to mean that the users thought it was good. See the table below.
Menu bar

This included having the menu bar visible on every page and being able to navigate between all pages, using the menu bar, at any time. The question was asked because the group wanted to know whether the participants understood what the menu bar was supposed to be used for, and whether they could understand it when it consisted only of icons. 83% of the participants thought that the menu bar was easy to use and understand (average rating = 4.3).

Overall impression of the application

Here we gave the participants an opportunity to rate the whole application, instead of every little part, to see how they felt about the structure. Apart from the rating, the participants also commented on parts that they would want to change to make the application better. 50% of the participants rated the overall impression as good. Most of the other 50% rated it a 3; the lowest rating was a 2 (average rating = 3.7).

Application name

This question was asked to see if the name was understandable and if the participants thought it was a suitable name for the application. 92% agreed that the name suited the application well. The lowest rating was a 3 (average rating = 4.5).

Bonus

The participants were finally asked, without the 1-5 scale, whether they would use this application. 75% said that they would use it, 8% of them with the comment that this assumes the application is free of charge.
Post-Task Overall Questionnaire

Question                                               SD   D   N   A   SA   Mean rating   Percent Agree*
What do you think about the menu bar?                  -    1   1   3   7    4.3           83%
What is your overall impression of the application?    -    1   5   3   3    3.7           50%
What do you think about the name of the application?   -    -   1   3   6    4.5           75%
Would they use this application?                       yes: 9, no: 3                       75%

SD = Strongly Disagree, D = Disagree, N = Neutral, A = Agree, SA = Strongly Agree
* Percent Agree (%) = Agree and Strongly Agree responses combined
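
As a minimal sketch of how the mean rating and Percent Agree figures are derived from the raw 1-5 responses, using the menu bar question as an example (counts taken from the table above):

    # Menu bar question: one 1-5 rating per participant, reconstructed from the
    # response counts above (1 Disagree, 1 Neutral, 3 Agree, 7 Strongly Agree).
    ratings = [2] * 1 + [3] * 1 + [4] * 3 + [5] * 7

    mean_rating = sum(ratings) / len(ratings)
    # Percent Agree = share of Agree (4) and Strongly Agree (5) responses
    percent_agree = sum(1 for r in ratings if r >= 4) / len(ratings)

    print(f"mean rating = {mean_rating:.1f}")      # 4.3
    print(f"percent agree = {percent_agree:.0%}")  # 83%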
Likes, Dislikes, Participant Recommendations

Liked Most

The function where the user is able to see the location of their car was appreciated among our test participants. They found it useful and easy to use.

Liked Least

The speedometer as an icon for the functions categorized under it, because the participants associated that icon with information about speed. Also, you cannot see which tire the tire pressure meter refers to.

Recommendations for Improvement

Fluid levels could go under the car icon rather than under the speedometer. Most test participants did not make the connection to the dashboard when seeing the speedometer, and therefore did not understand why the fluid levels should be there. It would also be a good idea to visualize the tire pressure in another way than with a single meter: the users need to be able to know which tire is in need of attention.
Recommendations
These are some recommendations for how we, after analyzing the test results, would like to change our prototype. These changes will improve the overall impression and experience of the application and make it easier and more logical to use. We have rated the changes according to how severe our tests have shown the underlying problems to be.

Change:
● Change the name of the category "Dashboard" to "Status".
● Change the symbol from a speed indicator to an oil can.
Justification: 58% of our participants had difficulties finding where to check the oil status, and 42% of the participants expressed that they did not look under the category "Dashboard" because the icon made it look like it only had to do with speed.
Severity: High

Change:
● Add a short descriptive text to the check-tires function.
Justification: The task to check the tires had the lowest usability rating. The problem was not that the users did not find the function, but that they did not really understand it. They tried to get more information by pressing the bar, and asked whether it meant that their tires were all right.
Severity: Medium
Conclusion
The majority of the participants found the application Practicar to be very clean and easy to use. Based on the test results, some smaller things within the application will be changed to improve the usability even more. By keeping the application uncluttered and giving it a menu bar for easy navigation, we consider Actia's requirement to make it as user friendly as possible fulfilled, this being the aim of the whole project.
References
Saffer, D. (2010). Designing for Interaction. Berkeley, CA: New Riders.

Tullis, T. & Albert, B. (2013). Measuring the User Experience: Collecting, Analyzing, and Presenting Usability Metrics. Waltham, MA: Elsevier.
Attachment A: Formative usability test questionnaires
Tasks and questions for the formative usability test
Pre questions
Car
Do you have a license?
Do you own a car?
What model/year?
How often do you use it?
Phone
Do you have a smartphone?
What kind?
How often do you use it?
Person
How old are you?
Do you have a good knowledge of cars?
What is your job?
Do you live in the city or in the countryside?
Tasks

Rate success: 2 = completed, 1 = completed with difficulty, 0 = not at all.

1) Try to lock your car
2) Try to see the location of your car using this app
3) Change your PIN code
4) Check oil
5) Check your tire pressure
6) Check temperature
7) Check if windows are closed
8) Turn off/on notifications for lights
Analyzing Questions
Do you find the application easy to understand?
Do you find the icons easy to understand?
Did you think it was easy to find the settings?
Would you use it?
Are the categories logical?
First ask each question for comments, then ask for a rating on a scale of 1-5:
What Is your overall impression of the application?
What do you think about the menu bar?
What do you think about the name of the application?
Attachment B: Screenshots from the prototype

In the prototype we put the different functions into four main categories: "Status", "General", "Temperature" and "Locations". To make the application as easy to use as possible, we made it very simple to navigate between all the different categories by placing a static menu bar at the bottom of the page.

The start screen is also available in the menu bar. The point of this page is to give a quick overview, and it is the only page you are able to see before you log in. That way you can easily check whether the car is locked, how far away it is, what the temperature in the car is, or whether you have enough fuel, while at the same time keeping the car safe: before logging in you are not able to change or control anything in the car. If you try to actually do anything, e.g. lock the car, a login box appears where you have to enter a four-digit PIN code. The compass gives a shortcut to the temperature page, and by pressing the padlock you are able to lock and unlock the car. Under "Temperature" you simply change the temperature inside the car by sliding a marker, and elsewhere you can check whether anything is open in the car, lock and unlock the car, open or close the windows and see the tire pressure. The settings of the application are reached through the icon located in the top corner; there you can, among other things, connect the app to a second car.

[Screenshot captions:]
- The green car shows that everything is locked and closed. If e.g. a window is open, that part of the car will turn red.
- To change the temperature you drag the orange marker up or down.
- With the "locate your car" function you get information about distance and direction, and can also see the car's location on a map.
- A quick overview of the fluid levels in the car.
Attachment C: Participation Report
We have followed our activity plan fairly well, although the plan was made at the beginning of the project and there have been a few changes since then. Since Mathilde has been working at another location than the rest of us, we divided the work so that she made the posters for the presentations while the rest of us worked on the group report, so that we could sit together and discuss while we were writing. We also performed the tests a bit differently than we had planned at the beginning, when we had decided to do them together in two shifts. Instead we did the tests by ourselves during the holiday, since it would be easier to find participants from our target group that way. After the holidays we have had meetings where we discussed the test results, the changes we wish to make to the prototype, the presentation, and how to divide the rest of the work. Mathilde has been able to attend these meetings via Skype.