Practicar Test Report

Practicar Group 7
Camilla Andersson | Cecilia Kornehed | Jennifer Bodén | Johanna Jansson | Julia Gilander | Mathilde Fiant
2014-01-18

Contents

Introduction
Executive Summary
Methodology
    Sessions
    Participants
    Evaluation Tasks
Results
    Task Completion Success Rate
    Task Ratings
    Clicks per Task
    Errors
    Summary of Data
    Overall Ratings
    Likes, Dislikes, Participant Recommendations
Recommendations
Conclusion
References
Attachments
    A - Formative usability test questionnaires
    B - Screenshots from the prototype
    C - Participation report

Introduction

Actia is a company that specialises in designing and manufacturing embedded systems for small to medium vehicles. The company has developed a way of connecting a car to an application on a phone or tablet, enabling the user to control the car through the application. Previous applications have existed, but they were complicated and difficult to understand for anyone not used to complex diagrams and details. It has therefore become necessary to design a new application that is as user friendly as possible, so that more people can benefit from this technology.

The test was conducted by all six members of the project group, each of whom tested two people. For the tests we used a digital T-prototype of the application, presented on smartphones to make it look and feel as real as possible. Each of us ran the tests individually: we introduced the test, gave the participants tasks to complete, and then observed and took notes on their comments and on how well they completed the tasks. The participants were asked to "think aloud" to give us as much information about their impressions as possible.

Executive Summary

The tests took place between 25 December 2013 and 15 January 2014 at many different locations, most of them in Sweden. They were conducted to evaluate the usability of the application's design as well as the selection of functions it includes. In total, 12 tests took place, each about 20 minutes long, with 12 different participants. All tests used the same questions.

Overall, all participants found the application easy to understand. 70% of the participants found the application easy to use. Even when something was not clear at first glance, it was easy to figure out and learn how it worked. 75% of the test participants would use the finished application. However, some minor problems were detected, including the following:

- Categories were not as logical as they could be
- Some icons proved hard to understand
- The settings button was hard to locate

This document contains the participant feedback, satisfaction ratings, task completion rates, ease-of-completion ratings, clicks per task, errors, and recommendations for improvements. A copy of the questionnaire is included in the Attachments section.

Methodology

Sessions

Preparations

Each member of the group recruited their own participants for the user test. Because the product is made for a wide audience, it was important for the result of the test that we had a diverse group of participants. The observer of each test informed the participants concerned about the test and how it would be conducted, and a date and time were set. Each individual test lasted about 20 minutes. Prior to the user test, the participants were informed of the test logistics. They were specifically asked to think out loud so that the observer could follow their thoughts.
The formative usability test can be divided into three parts: background, tasks, and analysis. A high-fidelity prototype (Saffer, 2010) was used for the prototype testing.

Part One

During the first part of the test, participants were asked about their background. The questions covered topics such as age, car knowledge, and smartphone habits.

Part Two

This part consisted of 8 tasks that the participants were to perform. No guidance from the observer was allowed. After each task, the observer rated the success of the task on a scale of 0-2. The scale measures how well the participant succeeded with the task: 0 being not at all, 1 completed with difficulty, and 2 completed.

Part Three

When the second part of the test was done, the participant rated some aspects of the product yes or no. The aspects rated included:

- Easy to understand
- Redundant icons
- Easy to find settings
- Useful product
- Logical categories

In addition, the participants rated some parts of the product on a scale of 1-5, from strongly disagree to strongly agree. We used this interval data (Tullis & Albert, 2013) later in the process to calculate statistics. The parts they rated included:

- Menu bar
- Overall impression
- The name of the product

See Attachment A for the formative usability test questionnaires.

Participants

As the course literature suggests, prototype testing is most efficiently done when several of the developers are present, so that some take notes, some conduct the interviews, and so forth (Saffer, 2010). But since our group would be separated when doing the testing, every group member did their testing alone. This did not complicate the process much: we could still observe and write down how the test participants acted when doing the tasks, and conduct the interviews.

We had a diverse group of participants for our testing. Since ACTIA wishes the application to be understood by "everyone", our prototype had to be tested on as wide a group as possible. We still narrowed our participants slightly, following the guidelines from our target group.

A common theory is that five participants are enough to test a prototype, because that number shows the probability of an issue being detected (Tullis & Albert, 2013). However, this theory has been criticised and in some respects proven wrong. Our formative usability test therefore included a total of 12 participants, to detect as many issues as possible.

Our test participants were between the ages of 19 and 55. Only three participants did not have their own car at the moment, but all of them have a licence. The majority of the participants who own a car use it every day; two participants use it more rarely, on weekends and holidays. The most popular phone among the participants was the iPhone, and the participants generally said they have a low knowledge of cars.

Evaluation Tasks/Scenarios

When deciding the tasks for the test, the group made sure to include tasks that tested every part of the application, so that no page would be left out. In the testing scenarios, the test participants had the following tasks to solve:

- Lock the car
- See the location of the car/find the car
- Change the PIN code for the application
- Check the oil level
- Check the tire pressure
- Change the temperature in the car
- Check if the windows are closed
- Turn on/off notifications for lights
Results

Task Completion Success Rate

Every group member recorded their participants' ability to complete each task on a scale of 0-2: 0 meaning not completed, 1 completed with some difficulty, and 2 completed. With this system the results could show both the success rate and how easy each task was, instead of a task only being marked completed or not completed as in a binary success rating (Tullis & Albert, 2013, p. 66). As Measuring the User Experience puts it, the binary approach means only looking blindly for success or failure, and our project required more information than that for the test to be useful.

All participants successfully completed Task 1 (Lock the car), Task 2 (Locate your car), Task 4 (Check the oil level), Task 6 (Change the temperature), and Task 7 (Check if the windows are closed). 91.7% of the participants were able to complete Task 3 (Change the PIN code) and Task 8 (Turn on/off notifications for lights). The lowest percentage of success was for Task 5 (Check the tire pressure), where 83.3% of the participants completed the task.

Task Completion Rates

Participant        Task 1  Task 2  Task 3  Task 4  Task 5  Task 6  Task 7  Task 8
1                  √       √       √       √       √       √       √       √
2                  √       √       √       √       √       √       √       √
3                  √       √       √       √       √       √       √       √
4                  √       √       √       √       √       √       √       √
5                  √       √       √       √       √       √       √       √
6                  √       √       √       √       -       √       √       √
7                  √       √       √       √       √       √       √       √
8                  √       √       √       √       √       √       √       √
9                  √       √       √       √       √       √       √       √
10                 √       √       √       √       -       √       √       √
11                 √       √       -       √       √       √       √       -
12                 √       √       √       √       √       √       √       √
Success            12      12      11      12      10      12      12      11
Completion Rate    100%    100%    91.7%   100%    83.3%   100%    100%    91.7%

Task Ratings

All tasks were divided into three groups depending on how easy they were for the participant to complete. If a task was completed it was viewed as easy, if it was completed with some difficulty it was viewed as not so easy, and if it was not completed it was viewed as hard for the participant. This grouping was also combined with each participant's comments on the tasks during the test.

Easy to use

All the participants could easily find their way to and complete Task 6 (Change the temperature) and Task 7 (Check if the windows are closed). 83% could easily complete Task 1 (Lock the car) and 75% could complete Task 2 (Locate your car) without any trouble.

The rest of the tasks did not have results as satisfying as we had hoped for. The aim was a success rate of 70% or higher on the tasks, and even where the completion rate is above that percentage, the easy-to-complete rate is not. Approximately 67% easily completed Task 8 (Turn off notifications for lights), while 8% did not complete the task at all. Task 3 (Change the PIN code) was easily completed by 50% of the participants, 42% had difficulties completing it, and 8% did not complete it. The lowest counts of easy completions were for Task 4 (Check the oil level) and Task 5 (Check the tire pressure), where only 42% easily completed the tasks. The remaining participants on Task 4 completed with some difficulty, but only 42% did the same on Task 5, where 16% failed to complete the task at all. The numbers were then combined to show the overall usability of the application, shown in the graph below.

[Graph: Usability of the whole application]
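To make the scoring concrete, the sketch below shows how the 0-2 success scores translate into the completion rate and the easy / with-difficulty / failed percentages reported above. The per-participant scores for Task 5 are reconstructed from the reported percentages, so the exact assignment of scores to participants is illustrative, and the variable names are ours.

```python
# Minimal sketch: turning the 0-2 success scores into the statistics above.
# 2 = completed (easy), 1 = completed with difficulty, 0 = not completed.
from collections import Counter

# Task 5 scores, reconstructed from the reported percentages (illustrative)
task5_scores = [2, 2, 1, 1, 2, 0, 1, 2, 1, 0, 1, 2]

def summarize(scores):
    counts = Counter(scores)
    n = len(scores)
    return {
        "completion_rate": (counts[1] + counts[2]) / n,  # any completion
        "easy": counts[2] / n,
        "with_difficulty": counts[1] / n,
        "failed": counts[0] / n,
    }

result = summarize(task5_scores)
print({k: f"{v:.1%}" for k, v in result.items()})
# {'completion_rate': '83.3%', 'easy': '41.7%',
#  'with_difficulty': '41.7%', 'failed': '16.7%'}
```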
Clicks per Task

During the test, we all took notes on how many clicks it took each participant to complete a task. The fewest clicks a task could be completed in depended on the task and on which path the participant chose; our group determined the fastest paths by testing the prototype ourselves. The reason for counting the number of clicks, instead of the time taken to complete the task, was that the number of clicks felt more relevant to our project. Time on task is described as a good way to see if a task is too complicated to be solved within a reasonable time; in our case the tasks are so simple that time seemed unnecessary to record, but we assembled our data in the same way as described for time on task (Tullis & Albert, 2013, pp. 74-79). The results can be seen below in a table, and in a bar chart with confidence intervals of 90%, calculated using the information from Measuring the User Experience (Tullis & Albert, 2013, p. 25).

Task 1, Lock the car, where you use the application to remotely lock your car, takes one click to complete the quickest way. This requires understanding that you can use the icons on the home page; otherwise one extra click is needed. None of the participants needed more than two clicks to complete this task.

Task 2, Locate your car, where you locate your car using the application, also requires only one click if done as intended. In this case the number of clicks varied between one and four, where two clicks are needed when using the menu bar instead of the home page.

Task 3, Change the PIN code, requires the most clicks (four) to complete. This task turned out to be a bit tricky, and the number of clicks varied between four and ten among those who completed the task.

Task 4, Check the oil level, requires only one click to reach the right page, after which you just need to locate the right bar. However, surprisingly, the number of clicks varied between one and six among the participants.

Task 5, Check the tire pressure, likewise requires only one click to complete in the intended way. Surprisingly, the number of clicks to complete the task varied between one and five, and two participants did not complete the task at all.

Task 6, Change the temperature, where you use the application to change the temperature in your car. The fewest clicks required to complete this task is two, and all the participants completed the task in exactly two clicks.

Task 7, Check if the windows are closed. Just as in the previous task, all the participants completed this task with the lowest possible number of clicks: only one click was needed to complete it the intended way.

Task 8, Turn on/off notifications for lights, refers to the function in the application that notifies you when the lights, inside or outside your car, have been left on. It takes a minimum of three clicks to complete this task. The results showed that the number of clicks varied between three and five.

The average number of clicks it took to complete each task ended up quite close to the minimum. Only Task 3 and Task 4 had an average that differed by a whole click or more from the minimum.

Number of clicks per participant to complete each task

        P1  P2  P3  P4  P5  P6  P7  P8  P9  P10  P11  P12  Average  Minimum
Task 1  1   1   2   1   1   1   2   1   1   1    1    1    1.17     1
Task 2  1   1   2   1   1   1   4   1   2   2    2    1    1.58     1
Task 3  10  4   6   4   4   4   5   5   7   4    -    5    5.27     4
Task 4  6   5   3   2   1   4   2   3   4   2    3    1    3.00     1
Task 5  1   1   1   1   1   -   5   1   1   5    1    1    1.73     1
Task 6  2   2   2   2   2   2   2   2   2   2    2    2    2.00     2
Task 7  1   1   1   1   1   1   1   1   1   1    1    1    1.00     1
Task 8  3   3   3   3   3   3   5   3   3   3    -    4    3.27     3

P = participant, - = not completed

[Bar chart: mean clicks per task with 90% confidence intervals]

* The lines in the bars show the confidence intervals, or margins of error, for each task. Task 6 and Task 7 lack these because all participants had the same results, so there was no variance in the data.
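As a worked example of how those 90% confidence intervals can be computed, here is a minimal sketch using a t-distribution on the click counts from the table above, which is the small-sample approach Tullis & Albert (2013) describe. The use of scipy and the variable names are our choices, not part of the report.

```python
# Sketch: 90% confidence interval on the mean number of clicks per task,
# using a t-distribution (Tullis & Albert, 2013).
import numpy as np
from scipy import stats

# Click counts from the table above; participants who did not complete a
# task are excluded. Task 6 and Task 7 are omitted here: every participant
# had the same click count, so the standard error is zero and no interval
# can be drawn (see the note under the chart).
clicks = {
    "Task 1": [1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 1],
    "Task 3": [10, 4, 6, 4, 4, 4, 5, 5, 7, 4, 5],  # P11 excluded
}

for task, data in clicks.items():
    x = np.asarray(data, dtype=float)
    mean = x.mean()
    sem = stats.sem(x)  # standard error of the mean
    lo, hi = stats.t.interval(0.90, df=len(x) - 1, loc=mean, scale=sem)
    print(f"{task}: mean {mean:.2f} clicks, 90% CI [{lo:.2f}, {hi:.2f}]")
```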
Errors

Naturally, the tasks our test participants were asked to complete were completed with a number of errors. For us, the most important things to measure were how many errors the participants made and why these errors were made, since our goal is to make the application as easy to understand as possible.

The tasks with the lowest number of errors were tasks six and seven, where all of the participants managed to complete the task without any errors at all. The table below shows how the errors in all the tasks were distributed among the participants. The two tasks with the highest number of errors were tasks three and four.

Task three was to change the PIN code and had a total of 14 errors made by all the participants together. Five of the participants managed to complete the task without any errors, one participant did not complete the task, and the other six participants completed the task with some errors. Part of the problem with this task was caused by the prototype itself: on an iPhone that is not updated to iOS 7, there is a top menu with a black background. This menu partially blocked the settings icon, confusing some participants because they could not find the icon they were looking for right away. Also, a few of the participants did not understand the settings icon itself. We based our icon on how the "usual" settings icon looks on phones, mostly on the iPhone design; a participant who is not familiar with this symbol will find it hard to understand its meaning.

In task four the participants were asked to check the oil level in their car; this task had a total of 24 errors. Only two of the participants managed to complete the task without any errors. Most of these errors were due to the icon used for the category "Dashboard". The icon we created was a speedometer, but the test participants expected these measurements to be under another category, since the icon was believed to have something to do with speed. Our original thought was that the speedometer would remind the user of the dashboard in the car, but it did not work out as we had expected.

In the interviews following the tasks, the participants named two main difficulties with the application. First, they had difficulty understanding the speedometer icon for the Dashboard. It was supposed to create associations with the physical dashboard in the car, but when testing the prototype it turned out that our test participants did not share our associations: since it was a speedometer, they thought it had something to do with speed. As a result we decided to redesign the Dashboard icon into an oil can.

The other difficulty observed was the understanding of the settings. Some of our participants did not understand the settings icon, and some thought that each subpage, reached through the menu bar, had its own settings. However, this is not a problem we wish to address, since we believe it is an uncommon error that few people will make. Also, many of our test participants said that the application was easy to learn, and that once you have pressed the icons, you will most likely remember what functions lie under them.

These two difficulties match the results of our tests well: as stated above, the tasks with the most errors were finding the settings and understanding the Dashboard. The two tasks with no errors at all were task six, where the participants needed to change the temperature, and task seven, where the participants were to check whether the windows were closed or open.
The success in these tasks may be because the functions were placed under categories that were well associated with them. Also, for the windows task, the test participants had been asked to enter that same page to check their tires in an earlier task; if a participant noticed the windows function there, it would help them in this later task.

Number of errors made by each participant per task, and total number of errors

        P1  P2  P3  P4  P5  P6  P7  P8  P9  P10  P11  P12  Total
Task 1  0   0   1   0   0   0   1   0   0   0    0    0    2
Task 2  0   0   1   0   0   0   3   0   1   1    1    0    7
Task 3  6   0   2   0   0   0   1   1   3   0    -    1    14
Task 4  5   4   2   1   0   3   1   2   3   1    2    0    24
Task 5  0   0   0   0   0   -   4   0   0   4    0    0    8
Task 6  0   0   0   0   0   0   0   0   0   0    0    0    0
Task 7  0   0   0   0   0   0   0   0   0   0    0    0    0
Task 8  0   0   0   0   0   0   2   0   0   0    -    1    3

P = participant, - = task not completed/unknown number of errors

Summary of Data

The table below displays a summary of the test data. Low completion rates and usability ratings, and high error and click counts per task, are highlighted in red.

Summary of Completion, Errors, Clicks per Task, and Usability

Task   Task Completion  Errors  Clicks per Task  Usability*
1      12               2       14               4.6
2      12               7       19               4.5
3      11               14      58               3.8
4      12               24      36               3.8
5      10               8       19               3.5
6      12               0       24               5.0
7      12               0       12               5.0
8      11               3       36               4.1

* Usability = mean combined score (on a scale of 1-5) for the level of completion for every participant on one task.
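A brief sketch of how the red-flagging of the summary table can be reproduced is shown below. The thresholds are our assumptions for illustration; the report does not state the exact cut-offs it used.

```python
# Illustrative sketch: flagging the problem cells in the summary table.
# The thresholds below are assumptions for this example, not values
# taken from the report.
N_PARTICIPANTS = 12

# task: (completions, errors, total clicks, usability rating)
summary = {
    1: (12, 2, 14, 4.6), 2: (12, 7, 19, 4.5), 3: (11, 14, 58, 3.8),
    4: (12, 24, 36, 3.8), 5: (10, 8, 19, 3.5), 6: (12, 0, 24, 5.0),
    7: (12, 0, 12, 5.0), 8: (11, 3, 36, 4.1),
}

for task, (done, errors, clicks, usability) in summary.items():
    flags = []
    if done < N_PARTICIPANTS:
        flags.append("incomplete runs")
    if errors >= 10:       # assumed "high errors" threshold
        flags.append("high errors")
    if usability < 4.0:    # assumed "low usability" threshold
        flags.append("low usability")
    print(f"Task {task}: {', '.join(flags) if flags else 'ok'}")
```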
Overall Metrics

Overall Ratings

For a more synoptic view of the participants' thoughts about the application, the participants were asked four closing questions at the end of the session, three of them rated on a scale of 1-5:

- What do you think about the menu bar?
- What is your overall impression of the application?
- What do you think about the name of the application?

The 5-point rating scale ranged from 1 (Strongly disagree) to 5 (Strongly agree). An average rating of > 4.0 is taken to mean the users thought it was good. See the table below.

Menu bar

This included having the menu bar visible on every page and being able to navigate between all pages, using the menu bar, at any time. The question was asked because the group wanted to know whether the participants understood what the menu bar was supposed to be used for, and whether they could understand it when it consisted only of icons. 83% of the participants thought that the menu bar was easy to use and understand (average rating = 4.3).

Overall impression of the application

Here we gave the participants an opportunity to rate the whole application, instead of every little part, to see how they felt about the structure. Apart from the rating, the participants also commented on parts they would want to change to make the application better. 50% of the participants rated the overall impression as good. Most of the other 50% rated it a 3; the lowest rating was a two (average rating = 3.7).

Application name

This question was asked to see whether the name was understandable and whether the participants thought it was a suitable name for the application. 92% agreed that the name suited the application well. The lowest rating was a three (average rating = 4.5).

Bonus

The participants were finally asked, without the 1-5 scale, whether they would use this application. 75% said that they would use it, 8% of them with the comment that this assumes the application is free of charge.

Post-Task Overall Questionnaire

Question                                               SD       D   N   A   SA        Mean  Percent Agree*
What do you think about the menu bar?                  -        1   1   3   7         4.3   83%
What is your overall impression of the application?    -        1   5   3   3         3.7   50%
What do you think about the name of the application?   -        -   1   3   6         4.5   75%
Would you use this application? (yes/no)               3 (no)   -   -   -   9 (yes)   4.0   75%

SD = Strongly Disagree, D = Disagree, N = Neutral, A = Agree, SA = Strongly Agree
* Percent Agree (%) = Agree and Strongly Agree responses combined
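The summary columns of the table follow directly from the response counts. Below is a minimal sketch of the calculation, using the menu bar row from the table above; the function and variable names are ours.

```python
# Minimal sketch: mean rating and "percent agree" (top-two-box) from the
# 1-5 response counts in the questionnaire table above.
def rating_stats(counts):
    """counts[i] = number of responses with rating i + 1 (scale 1-5)."""
    n = sum(counts)
    mean = sum((i + 1) * c for i, c in enumerate(counts)) / n
    pct_agree = (counts[3] + counts[4]) / n  # ratings 4 (Agree) and 5 (Strongly Agree)
    return mean, pct_agree

# Menu bar question: 0 x Strongly Disagree, 1 x Disagree, 1 x Neutral,
# 3 x Agree, 7 x Strongly Agree
mean, agree = rating_stats([0, 1, 1, 3, 7])
print(f"mean rating {mean:.1f}, percent agree {agree:.0%}")  # 4.3, 83%
```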
Likes, Dislikes, Participant Recommendations

Liked Most

The function where the user can see the location of their car was appreciated among our test participants. They found it useful and easy to use.

Liked Least

The speedometer as an icon for the functions categorised under it, because participants associated the icon with information about speed. In addition, you cannot see which tire the tire pressure meter refers to.

Recommendations for Improvement

Fluid levels could go under the car icon rather than under the speedometer. Most test participants did not make the connection to the dashboard when seeing the speedometer, and therefore did not understand why the fluid levels should be there. It would also be a good idea to visualise the tire pressure in another way than with a single meter; users need to be able to tell which tire is in need of attention.

Recommendations

These are our recommendations for how, after analysing the test results, we would like to change our prototype. These changes will improve the overall impression and experience of the application and make it easier and more logical to use. We have rated the changes according to how severe our tests have proven the problems to be.

Change: Change the name of the category "Dashboard" to "Status", and change the symbol from a speed indicator to an oil can.
Justification: 58% of our participants had difficulties finding where to check the oil status, and 42% of the participants said that they did not look under the category "Dashboard" because the icon made it look as if it only had to do with speed.
Severity: High

Change: Add a short descriptive text to the check-tires function.
Justification: The task to check the tires had the lowest usability rating. The problem was not that users could not find the function, but that they did not really understand it: they tried to get more information by pressing the bar, and asked whether the reading meant the tires were all right.
Severity: Medium

Conclusion

The majority of the participants found the application Practicar to be very clean and easy to use. Based on the test results, some smaller things within the application will be changed to improve the usability even more. By keeping the application uncluttered and providing a menu bar for easy navigation, we consider Actia's requirement to make it as user friendly as possible to be fulfilled, this being the aim of the whole project.

References

Saffer, D. (2010). Designing for Interaction. Berkeley, CA: New Riders.

Tullis, T. & Albert, B. (2013). Measuring the User Experience: Collecting, Analyzing, and Presenting Usability Metrics. Waltham, MA: Elsevier Inc.

Attachment A: Formative Usability Test Questionnaires

Tasks and questions for the formative usability test.

Pre-questions

Car
- Do you have a license?
- Do you own a car? What model/year? How often do you use it?

Phone
- Do you have a smartphone? What kind? How often do you use it?

Person
- How old are you?
- Do you have a good knowledge of cars?
- What is your job?
- Do you live in the city or in the countryside?

Tasks

Rate success: 2) completed, 1) with difficulty, or 0) not at all.

1) Try to lock your car
2) Try to see the location of your car using this app
3) Change your PIN code
4) Check the oil
5) Check your tire pressure
6) Check the temperature
7) Check if the windows are closed
8) Turn off/on notifications for lights

Analyzing questions

- Do you find the application easy to understand?
- Do you find the icons easy to understand?
- Did you think it was easy to find the settings?
- Would you use it?
- Are the categories logical?

For the following, first ask the question for comments, then ask for a rating on a scale of 1-5:

- What is your overall impression of the application?
- What do you think about the menu bar?
- What do you think about the name of the application?

Attachment B: Screenshots from the Prototype

In the prototype we put the different functions into four main categories: "Status", "General", "Temperature" and "Locations". To make the application as easy to use as possible, we made it very simple to navigate between the categories by placing a static menu bar at the bottom of the page.

[Screenshots with captions; the captions below are only partially recoverable from the original two-column layout, with reconstructed fragments in brackets:]

- The [home] screen, which is also available in the menu bar. The point of this page is to give a quick overview, and it is the only page you are able to see before you log in. In that way you can easily check if the car is locked, how far away it is, what the temperature in the car is, or if you have enough fuel, while still keeping the car safe. If you try to actually do anything, e.g. lock the car, a login box appears where you have to enter a four-digit PIN code.
- The compass gives a shortcut to the temperature page.
- By pressing the padlock you are able to lock and unlock the car, but not to change or control anything else in the car.
- Under "Temperature" you simply change the temperature inside the car by sliding: you drag the orange marker up or down.
- [Under "General" you can check if any]thing is open in the car, lock and unlock the car, open or close windows, and see the tire pressure. The green car shows that everything is locked and closed. If e.g. a window is open, that part of the car will turn red.
- The settings of the application are reached through the icon located in the top corner, next to the [...]. [From the settings you can] connect the app to a second car.
- [Under the "Locate] car" function you get information about distance and direction, and also see the car's location on a map.
- A quick overview of [...] levels in the car.

Attachment C: Participation Report

We have followed our activity plan fairly well, although the plan was made at the beginning of the project and there have been a few changes since then. Since Mathilde has been working at another location than the rest of us, we divided the work so that she made the posters for the presentations while the rest of us worked on the group report, so that we could sit together and discuss while writing. We also performed the tests a bit differently than originally planned, when we had decided to do them together in two shifts. Instead we did the tests by ourselves during the holiday, since it would be easier to find participants from our target group that way. After the holidays we have had meetings where we discussed the test results, the changes we wish to make to the prototype, the presentation, and how to divide the rest of the work. Mathilde has been able to attend these meetings via Skype.