AMERICAN METEOROLOGICAL SOCIETY
Bulletin of the American Meteorological Society

EARLY ONLINE RELEASE

This is a preliminary PDF of the author-produced manuscript that has been peer-reviewed and accepted for publication. Since it is being posted so soon after acceptance, it has not yet been copyedited, formatted, or processed by AMS Publications. This preliminary version of the manuscript may be downloaded, distributed, and cited, but please be aware that there will be visual differences and possibly some content differences between this version and the final published version.

The DOI for this manuscript is doi: 10.1175/BAMS-D-14-00121.1

The final published version of this manuscript will replace the preliminary version at the above DOI once it is available. If you would like to cite this EOR in a separate work, please use the following full citation:

Oakley, N., and B. Daudert, 2015: Establishing best practices to improve usefulness and usability of web interfaces providing atmospheric data. Bull. Amer. Meteor. Soc., doi:10.1175/BAMS-D-14-00121.1, in press.

© 2015 American Meteorological Society

Establishing best practices to improve usefulness and usability of web interfaces providing atmospheric data

Nina S. Oakley (1)
Britta Daudert

(1) Western Regional Climate Center, Desert Research Institute, 2215 Raggio Parkway, Reno, NV 89512; nina.oakley@dri.edu

Submitted to: Bulletin of the American Meteorological Society, 5 November 2014

Tagline/Capsule: Addressing usability when developing a web portal for data access is relatively inexpensive, increases a site's use and user satisfaction, and reflects positively on the organization hosting the site.

Abstract

Accessing scientific data and information through an online portal can be a frustrating task, often because the portal was not built with the user's needs in mind. The concept of making web interfaces easy to use, known as "usability," has been thoroughly researched in the field of e-commerce but has not been explicitly addressed in the atmospheric and most other sciences. As more observation stations are installed, satellites flown, models run, and field campaigns performed, data are continuously produced. Portals on the Internet have become the favored mechanism for sharing this information and are ever increasing in number. Portals are often created without being explicitly tested for usability with the target audience, though the expense of testing is low and the returns high. To remain competitive and relevant in the provision of atmospheric information, it is imperative that developers understand the design elements of a successful portal so that their product stands out among others. This work informs the audience of the benefits and basic principles of usability that can be applied to web pages presenting atmospheric information. We also share best practices and recommendations formulated from the results of usability testing performed on a data provision site designed for researchers in the Southwest Climate Science Center and hosted by the Western Regional Climate Center.

Introduction

Atmospheric data and information (hereafter referred to as "information") are becoming increasingly important to a wide range of users outside of the atmospheric science discipline.
These include other scientists (hydrologists, social scientists, ecologists), resource managers, public health officials, farmers, and others (National Research Council 2010; Overpeck et al. 2011). As a result, providers of atmospheric information have a growing obligation not only to provide information (access is usually taxpayer funded), but also to make the information easily digestible by the various members of a broadening audience (Brugger and Crimmins 2011; Overpeck et al. 2011; Rood and Edwards 2014). A site developed without the user in mind may prove frustrating or challenging to use (Krug 2005). Assessment of a site's usability [the extent to which the site can be used to achieve goals with effectiveness, efficiency, and satisfaction (ISO 1998)] is a cost-effective way to ensure users can fluidly accomplish intended tasks on a site. To summarize usability as it applies to web design today, Dumas and Redish (1999) offer four points: (1) usability means focusing on users [as opposed to developer/designer needs]; (2) people use products to be productive; (3) users are busy people trying to accomplish tasks; and (4) users decide whether a product is easy to use. Building and testing a usable site requires employing these principles as well as following general guidelines that have become standard in the fields of usability and Human Computer Interaction (HCI). Much of the literature assessed in this paper is focused on usability in the practical sense; we leave the theory of HCI to others (e.g., Preece et al. 1994; Dix 2009). This work also does not approach the topics of accessibility and responsive design.

Here we take a "small shop" approach to web development, in which a research scientist, data analyst, or programmer with no formal training in web design must develop a website to provide atmospheric information. This person generally has some support from his or her research group, but does not have a web development team to work with and must try to apply principles of usability with limited resources. Though this situation is not representative of all groups in the atmospheric sciences, it is the group that is likely the most challenged to build a usable site. We present the results of usability testing performed on a website providing climate information hosted by the Western Regional Climate Center (WRCC). Additionally, we outline general usability guidelines that are applicable to pages providing atmospheric information and explain how our test participants perceive and search for climate data. Though this test focuses on station-based and gridded climate data for the elements temperature and precipitation, the results of the testing and the general guidelines provided are easily applicable to other types of atmospheric data.

Why is usability important when providing atmospheric data?

People are very goal-driven when they access a website. A usable site design will "get out of the way" and allow people to successfully accomplish their goals in a reasonable amount of time (Nielsen 2000a). Krug (2005) defines a "reservoir of goodwill" that users have when entering a site. Each problem or challenge in using the site lowers the reservoir until it is exhausted and the user leaves the site altogether.
It is important to note that each user's "goodwill reservoir" is unique and situational; some users are by nature more patient than others and have a larger reserve. Some may have a predetermined opinion about an organization that influences the experience they will have on a site. Nielsen (2011) observes that people are likely to leave a site within the first 10-20 seconds if they do not see what they are looking for or become confused. If a site can convince a user that the material presented is valuable and persuade the user to stay beyond the 20-second threshold, they are likely to remain on the page for a longer period of time. If principles of usability are not addressed, page visitors are likely to find, or at least search for, another site that makes the information they want easier to access (Nielsen 2000a). Additionally, a successful experience on a website makes people likely to return. In economic terms, loyal users tend to spend considerably more money on a site than first-time users (Nielsen 1997). In atmospheric science, demonstration of a loyal website following can indicate to supporting agencies that the site provides information that is useful to stakeholders, which may help to secure future resources. Furthermore, having a usable site can make your organization stand out among others.

Compared with other parts of scientific research and data production, usability testing is relatively inexpensive and very effective. Nielsen (2000b) suggests that performing usability testing with five users per round of testing will uncover approximately 80% of the problems on a website. The usability tests themselves are typically an hour in length, and the equipment can often be located within a research institution, keeping technology costs to a minimum.

Another benefit of usability testing is the opportunity to learn about the culture of your intended data users. By watching the target audience for your site perform usability tests, you will observe some of their rules, habits, behaviors, values, beliefs, and attitudes (Spillers 2009). This information can then be applied to future products generated by your research group or organization.

Usability Testing

The usability of a site is typically evaluated through a formal process called usability testing (Nielsen 2000b; Krug 2005; Usability.gov 2014). During a usability test, participants chosen based on some criteria (e.g., users of climate data) are asked to perform specified tasks with little guidance under controlled conditions while one or more facilitators observe. Tests are often recorded for later viewing and analysis. To obtain the skills necessary to perform usability testing, the authors attended a workshop hosted by the usability consultants Nielsen Norman Group (http://www.nngroup.com). The workshop instructed on the basics of creating a usable website, developing a space for usability testing, facilitating the testing to achieve meaningful results, interpreting test results, and incorporating results into site design. Besides attending a workshop, there are many texts and online resources that can provide support on how to conduct usability testing (e.g., Krug 2005; Krug 2009; Usability.gov 2014).
Site Tested: SCENIC

To investigate how users interact with weather and climate data, we tested a website under development entitled SCENIC (Southwest Climate and ENvironmental Information Collaborative; http://wrcc.dri.edu/csc/scenic/; Figure 1). SCENIC is designed to serve scientists working for the Department of the Interior Southwest Climate Science Center (SW-CSC) and other such Climate Science Centers. These scientists typically work in the fields of ecology, hydrology, forestry, or resource management. SCENIC acts as an interface to the Applied Climate Information System (ACIS) database, which contains daily climate data for the United States from many networks. Resources available in SCENIC are focused on the Southwest US, though data are available for locations throughout the nation. SCENIC has a wide variety of data acquisition and analysis tools for both gridded and station-based data.

Creating a Usability Lab

A formal usability test should take place in a usability lab. These labs may be extremely complex, such that the test takes place in an isolated room while the design team watches remotely via closed-circuit television. Eye or mouse tracking and screen recording software may be utilized as well. We opted for a simple lab and used a small conference room for a quiet space, a computer and full-size screen, a mouse, and a keyboard. We used Camtasia screen recording software (by TechSmith; http://www.techsmith.com/camtasia.html) and a microphone to record the screen movements and verbalizations of the study participants. For comfort and ease of use, test subjects were able to work on either a Mac or Windows operating system with the web browser of their choice. Though it is unnatural for a person to be working on a computer while being observed, the goal is to make participants as comfortable as possible during the test so they will act as they normally would when using a website and provide realistic feedback on the site's usability.

During usability testing, a facilitator helps guide the participant through each of the tasks. The facilitator does not answer questions about the site or guide the participant in any way; they serve to prompt the participant to verbalize their thought process as they work through each task. The facilitator takes detailed notes as subjects complete the tasks in each section. The notes as well as the video recordings are later reviewed to assess functions of the site that exhibit or lack usability. Clips created from the video taken during this testing can be viewed online (http://www.dri.edu/scenic-usability-research).

Selecting and recruiting test participants

In general, the usability literature suggests that five users will uncover most of the usability issues in a site (Virzi 1992; Nielsen 2000b; Krug 2005; Usability.gov 2014). However, Faulkner (2003) points out that five users do not uncover a majority of the issues in all cases. Faulkner's work states that ten users will uncover 82% of usability problems at a minimum and 95% of problems on average. We chose to test five users in each of two rounds to ensure that we would uncover a large majority of usability issues by all approaches in the aforementioned literature.
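The sample-size figures cited above are often explained with a simple problem-discovery model in which each additional test user is assumed to find a fixed fraction of the problems that exist; Nielsen's five-user estimate is commonly quoted with a per-user discovery rate of roughly 31%. The short Python sketch below illustrates that arithmetic only; the 31% rate and the function name are illustrative assumptions, not figures measured in our testing.

    # Rough illustration of the problem-discovery arithmetic behind the
    # "five users find most problems" guidance. Assumes each test user
    # independently uncovers a fixed fraction of the existing problems.
    def proportion_found(n_users, per_user_rate=0.31):
        return 1.0 - (1.0 - per_user_rate) ** n_users

    for n in (1, 5, 10):
        print(f"{n:2d} users -> ~{proportion_found(n):.0%} of problems")
    # With the assumed 31% rate, five users yield roughly 84% and ten users
    # roughly 98%, broadly consistent with the ranges cited above.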
Usability literature does not recommend any specific number of iterations of testing, though a new round of testing is suggested after major updates to a site (Nielsen 2000b; Krug 2009). We first performed an experimental round of testing utilizing five graduate students in natural resources, hydrology, and geography from the University of Nevada, Reno (UNR). This allowed us to develop our facilitation methods, work out any recording software issues, and refine our general questions about climate data before utilizing professionals. The results of this preliminary testing are not included here except for reporting on the card sorting activity, for which 15 participants are recommended to achieve meaningful results (Nielsen 2004; Tullis and Wood 2004; Usability.gov 2014).

Usability testing yields the best results when the subjects chosen represent the target user group (Nielsen 2003; Krug 2005; Usability.gov 2014). As SCENIC is intended to serve SW-CSC scientists, we sought out people working in resource management and ecology who utilize climate data in their work to participate in the study. Our group of participants came from private, state, and federal agencies, including independent consulting firms, the Bureau of Land Management, the Nevada Department of Wildlife, the Great Basin Landscape Conservation Cooperative, the SW-CSC, UNR, and the Desert Research Institute.

All participants were informed that they were being recorded and gave their verbal consent to participate in the study. Where regulations allowed, users were compensated for their time with a gift card, as suggested in the literature to improve the quality of the participant's involvement in the testing (Nielsen 2003; Krug 2005; Usability.gov 2014). Providing an incentive for participation helps to ensure users are motivated to perform the tasks.

Designing test questions

We used both qualitative and quantitative techniques to assess participants' ability to use SCENIC in a fluid manner. Each test comprised three portions: a set of three tasks to complete on the website, a standardized usability test, and a set of questions relating to the general use of climate data that were not specific to the site tested (see Table 1).

We devised three tasks we expected target users to be able to perform on SCENIC, as Nielsen (2000a) recommends designing a site around the top three reasons a user would visit it. The tasks were based on the common types of questions asked of WRCC's service climatologists by members of the target audience. The tasks are given in Table 1. Each task utilizes different capabilities of the site and is achieved through a different set of steps to provide breadth in covering the site's usability issues. The assessment of this portion of the test was qualitative and involved the fluidity with which the user was able to complete the task and their commentary about the site as they used it.

The System Usability Scale (SUS) is a widely used and reliable tool for measuring the ease of use of a product. It produces valid results on small sample sizes, making it an applicable quantitative tool for this usability evaluation (Brooke 1986; Bangor et al. 2009). The SUS test should be administered immediately after the web-based tasks and before any post-test discussion takes place.
SUS is a 10-item questionnaire with five response options presented on a Likert-type scale from strongly agree to strongly disagree. The SUS test is summarized in Table 1. Half of the questions (1, 3, 5, 7, 9) are phrased such that they describe the site being evaluated in a positive way, while the other half (2, 4, 6, 8, 10) portray the site negatively. This design prevents biased responses caused by testers choosing an answer without having to consider each statement (Brooke 2013). An SUS questionnaire is scored as follows (a brief scripted example of this arithmetic is given below):

- Each question is valued between one and five points.
- For odd-numbered items, subtract 1 from the user response.
- For even-numbered items, subtract the user response from 5. This scales all values from 0 to 4, with four being the most positive response.
- Add up these adjusted responses for each user and multiply that total by 2.5. This converts the range of possible values to 0 to 100 instead of 0 to 40.

Although the scores range from 0 to 100, they should not be interpreted as percentages. Instead, SUS scores should be thought of as a percentile ranking that is based on scores from a large number of studies. The SUS score curve and percentile rankings (Figure 3) are based on SUS testing performed on over 5,000 websites. An SUS score above 68 is considered above average (Sauro 2011).

The climate data questions (summarized in Table 1, Part 3) asked in this study stemmed from challenges we had internally in naming various items on SCENIC and other sites, as well as general curiosity about how people perceive and search for climate data. Questions 1-4 address naming conventions for various products generated from climate data. Question 5, the card sorting activity, assesses how our participants search for climate data by having them order cards listing various aspects of climate data from least to most important (Table 1, Part 3, Question 6). The last two questions allow users to evaluate SCENIC and provide detailed feedback. In covering the last two questions, we also explained to participants our intended method of answering any of the questions the users struggled with in the first portion of the test. With the exception of the card activity, answers were taken qualitatively and used as anecdotal information rather than concrete research findings, as the sample size of participants (n=10) was not large enough to produce statistically significant results.

Conducting usability tests

Only one to three testers were assessed each day. The time between tests was used to remove any bugs, in this case referring to errors in code that cause the site to break or perform in a way not anticipated by the developer. This helped keep the focus of the tests on design rather than having subjects repeatedly encounter the same bug. One example of this while testing SCENIC was a Chrome browser issue that inserted a dropdown menu arrow into any form element that had an auto-fill option. The first participant to test on Chrome thought they had to choose from a dropdown menu rather than utilize the auto-fill option and could not move forward on the task. We viewed this as a Chrome browser issue rather than part of SCENIC's design, so we removed it between testers within the first round of testing. After removing the arrow, subsequent participants easily utilized the auto-fill option.
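To make the SUS scoring procedure described above concrete, the following is a minimal Python sketch of the arithmetic: odd-numbered items are scored as the response minus one, even-numbered items as five minus the response, and the sum is multiplied by 2.5. The example responses are hypothetical and are not taken from our study.

    # Minimal sketch of SUS scoring; the example responses are invented.
    def sus_score(responses):
        """responses: ten integers (1-5) in questionnaire order."""
        if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
            raise ValueError("SUS expects ten responses on a 1-5 scale")
        adjusted = [(r - 1) if i % 2 == 1 else (5 - r)
                    for i, r in enumerate(responses, start=1)]
        return sum(adjusted) * 2.5  # rescales the 0-40 total to 0-100

    example = [4, 2, 4, 1, 3, 2, 4, 2, 4, 2]   # hypothetical participant
    print(sus_score(example))                  # 75.0; above 68 is above average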
Major changes to the site design were made after the first round of testing such that new issues might be uncovered in the second round (Krug 2005).

Lessons from usability testing

General usability guidelines from e-commerce and HCI

In researching usability, we found a variety of general recommendations for usable web design that we sought to incorporate in SCENIC. The general purpose of following these recommendations is to reduce the cognitive load on the user (Krug 2005). These guidelines do not necessarily relate to the provision of atmospheric data in particular, though we feel they are valuable enough to be listed here. Where applicable, examples from our usability testing are given, and video clips are available in the online appendix.

Adhere to web conventions. This includes a navigation menu along the top of the page, a search bar near the top, and links presented in a recognizable style (Nielsen 2000a; Krug 2005). In an early version of SCENIC, participants clicked on what appeared to be a link, which simply displayed text on the same page. This confused participants, so the text was changed to a different style and color to avoid confusion with links.

Be consistent within a set of pages. Maintain the same layout from page to page with similar text styling and form layout. This will enable the user to quickly learn how to navigate and use a site (Usability.gov 2014).

Anticipate that text will not be read. Though it is tempting to provide detailed information and directions to the user, Krug (2005) suggests that people tend to "muddle through" a site rather than reading instructive texts. Brief titles and concise labels will be read, but anything more than a few words will likely be overlooked.

Provide help texts. We found that after participants had muddled through and failed at accomplishing a task, they were receptive to reading help texts. Make the help text source easy to see (use a question mark symbol or an information "i") and ensure the information provided is clear and concise. Our testing revealed that participants who read help texts found the text answered their questions and were likely to utilize help texts in later tasks as well.

Reduce options when possible. When presenting the user with a form element, hide options until the user indicates they are needed. Otherwise, the user has to scan and consider all options, increasing cognitive load (Krug 2005). An example of this in SCENIC is that the options for output file name and delimiter are hidden until the user indicates they would like to output data to a file rather than to the screen.

Make labels clear and meaningful. Label buttons and navigation menus with meaningful terms (Nielsen 2000a; Krug 2005). SCENIC's station finder tool displays stations on a map that meet criteria dictated by the user. The button to show stations meeting those criteria was labeled "submit." Two participants in the first round of testing were confused when, after making their selections and hitting "submit," they did not receive data. Changing the "submit" action button to "show stations" eliminated this issue in later testing. Two participants in the second round repeated aloud, "show stations," suggesting they were processing the outcome of clicking the button.
Several other buttons were also changed to increase clarity about what clicking the button provides, such as "get data" for a data listing option rather than "submit."

How did users rate the site?

The average SUS score in the first round of testing was 63, placing the site at a percentile rank of approximately 35% (Figure 3). Several changes were implemented after the first round of testing to fix bugs as well as usability issues. Scores from the second round of testing increased to 67.5, which falls just below the average of 68, with a percentile rank of approximately 50% (Figure 3). This indicates the usability of the site increased from the first to the second round of testing, but there is still much room for improvement. To ascribe an adjective to SCENIC, our participants collectively found the site to be in the "OK" to "Good" range (Figure 4).

Figure 5 shows the overall SUS test scores from each participant for the two rounds of testing. Scores in each round were comparable, with the scores in the second round being slightly higher overall. There are no notable outliers in the data set that would significantly affect the overall score for each test.

Figure 6 shows the adjusted scores from each question on the SUS assessment. Questions 6, 9, and 10 stood out as showing considerable improvement. Question 6 focuses on consistency between pages. We removed quite a bit of clutter from the pages between rounds one and two, performed some layout adjustments, and improved the instructiveness of the text labels. We hypothesize these changes led to the increase in this score. Questions 9 and 10 relate to the participant's confidence in using the site. This suggests the changes in labeling and the improvements in help texts helped increase participants' confidence in using these pages.

A different group of participants tested the site in each round, and the site was modified from the first to the second round. It is possible that removal of some usability issues in the first round allowed users to encounter other challenges in the second. This, and the characteristics of the individual users in each group, may help to explain why some scores increased and some decreased on each question between rounds.

How people search for data

The results of the card ordering activity (summarized in Table 2) revealed that people search for climate information in different ways, though with some consistency. Sixty percent of participants rated "where," the location of the data, as the first and most important thing they search for when acquiring climate data. In our participant group, 73% ranked the source of the data as least important when accessing climate data. This raises some concern, as many in the climate services community feel, from their deep familiarity with data systems, that data source is very important. One participant commented, "I generally trust that the data I am getting is of quality. I may run [quality control] on it myself anyway, so I am not really concerned about the source, just getting the data." Between these two extremes, responses were fairly spread across when, what, and type, in that order, with only one or two votes determining rank. Several participants indicated that their responses might vary depending on the project, so we asked them to focus on a current or recent project.
Many climate data provision agencies, such as WRCC, provide data organized by network. Our results indicate that source (network) ranks as least important and location as most important to our study participants. It follows that it would be most useful to our target audience to allow data to be selected by region and then by time period. These results are incorporated in SCENIC by offering the spatial option first and foremost with the "station finder" map tools. In Part 1, Task 2, participants were asked to find the record high March temperature at the Winnemucca airport. The number of Winnemucca Airport entries in the station finder table, one entry for each of several network memberships, puzzled the first user. We updated the site such that each unique station name had a single entry and its networks were grouped together. Subsequent users who utilized the station finder table were able to quickly locate the Winnemucca Airport station.

Challenges in labeling

Comments from our test participants, as well as prior experience providing climate services via the Web at WRCC, show that the labeling of links and items on pages providing weather and climate data is one of the greatest challenges to usability. Questions asked to explore labeling and terminology are given in Table 1, Part 3. The terms we explore include modeled data, gridded data, tool, product, time series, anomaly map, raw data, and data analysis.

We struggled with the decision of how to title links to gridded data products (data generated by a model and placed on a grid) provided through SCENIC. Possible terms included "gridded data," "modeled data," and "gridded/modeled data." Several participants indicated the term "modeled data" was confusing to them and they were not sure what to expect if they were to click on it. One of the most useful responses was, "modeled data is not a very informative term; gridded gives me more useful information about the data." In light of these responses, we selected the term "gridded data" after the first round of testing rather than "gridded/modeled." All ten participants were able to complete the gridded data question without confusion as to how to access the data, showing the term "gridded data" is a useful indicator.

"Climate anomaly map" and "time series graph" are two commonly used terms to describe graphics that depict climate. All participants readily agreed that "time series graph" is an adequate term and, with some hesitation, were unanimous in their agreement on the use of "climate anomaly maps" as well. These terms were incorporated into SCENIC where appropriate.

"Tool" and "product" are terms commonly used on sites providing weather and climate information (at the time of writing, the Regional Climate Centers (http://www.ncdc.noaa.gov/customer-support/partnerships/regional-climate-centers), the National Integrated Drought Information System (http://www.drought.gov/drought/), and the National Climatic Data Center (http://www.ncdc.noaa.gov/), to name a few). All participants were in agreement that a "tool" allows the user to perform some sort of action or analysis using data, while a "product" is static. In essence, a "tool" creates a "product," though only a few participants drew this conclusion. In spite of the general agreement on these terms, using "data tools" on SCENIC did not yield the desired results, leading us to look for a better phrase to guide people to tools that can be used to analyze and summarize the data.

The question on "raw data" yielded a variety of answers.
Several participants viewed "raw data" as a list of data that they could download from a site to use in analyses. They assumed it had had quality control (QC) applied; "raw" implied it was not an average or summary of any sort. Other participants viewed "raw data" as what came directly from the sensor (as is the standard terminology in climate services), which may have many errors and other issues and would require cleanup. We opted to use the term "data lister" because of the difference in user responses and to be consistent with phrasing on other WRCC pages.

The greatest challenge test participants experienced in the three tasks we posed to them was efficiently completing web-based Task 2: finding the highest temperature ever recorded in March at the airport in Winnemucca, Nevada. All ten participants first went to the "historic station data lister" and listed maximum temperature for the station's period of record. After listing daily data for the station's period of record, they realized that was not the right way to answer the question and began to search for other options, eventually arriving at the data analysis tools. In the first round of testing, we intended participants to go to the navigation tab "Station Data Tools," where several tools were available that would allow them to answer the question. As this labeling did not prompt participants to click on it, we changed the navigation tab to "Data Analysis" for the second round of testing. Unfortunately, participants were still not motivated to click on this link to complete the task that required data analysis. We remain challenged to find the best way to prompt people to utilize the variety of analyses we have provided. Interestingly, half of the participants said that when they got to the point of listing the period-of-record maximum temperature data, they would not have continued to look for analysis tools. They would have pulled the data into analysis software (such as MATLAB or Excel) to obtain the maximum March temperature. These participants said they preferred to do things in this manner, as they may need the data later for other applications. They stated that the analysis tools were "neat" and "good for quick answers." The result of this piece of the test raises two questions: Does our target audience want analysis tools? If so, how do we advertise the tools and let it be known that they are available?

Conclusions

Watching target users complete tasks on SCENIC provided us with valuable information on how people in the target audience, researchers with the SW-CSC, use the site and allowed us to fix a number of roadblocks to usability as well as programming bugs. SUS scores rose from 63 in the first round of testing to 67.5 in the second round, indicating some level of improvement to the site. These scores fall in the "average" range for a website (Figure 4), indicating there is still considerable progress to be made. We found that while usability testing uncovers usability issues on a site, it is not always clear how to modify the site to remove these problems.
We were not able to rectify all usability challenges in the two rounds of testing on SCENIC, though the testing made us aware that they exist and allows us to continue working to improve the site.

Performing this testing allowed us to interact with our target audience and ask questions that helped us decide on the naming of certain elements of the site. Though the sample size for these questions (n=10) is not large enough to be statistically significant, it still provided us with useful insights into how our target audience perceives various terms used frequently in climate data. The card sorting activity revealed that our participants consistently rank the location of data as the most important factor in searching for data and the source as the least important. This challenges climatologists with how to provide data in a streamlined manner while still making sure the user is aware of any caveats to the data (which the user may or may not be concerned with).

The challenges and lessons learned presented here are not unique to the climate data presented on SCENIC. Any atmospheric or related data (satellite, air quality, streamflow, etc.) that can be offered over the Internet can benefit from usability testing, though the labeling and terminology challenges will likely be dataset-specific. In summation, we suggest the following as best practices for creating web pages that provide climate data as well as other atmospheric data:

- Usability testing is extremely useful in building this type of site; test early in the site development process and repeat often
- Work with representatives from the target audience rather than office mates or research team members; this yields more meaningful results
- Consider the way in which your target audience looks for the atmospheric data provided and design the site to meet these needs
- Labels and naming of site elements can be extremely challenging and can have a significant effect on the usability of sites providing atmospheric data; test terms early, borrow from other agencies' sites for consistency, and respect historical terminology where acceptable
- Adhere to general usability guidelines, as described in the "General usability guidelines from e-commerce and HCI" section of this paper

Discussion

We are by no means usability professionals and speak to the readers as fellow atmospheric scientists and programmers attempting to deliver atmospheric data and information to a target audience. From our experience performing usability testing, we highly recommend the process to any group providing atmospheric data online. The knowledge and experience gained from this research will propagate into future work and allow us to build better sites with the end user in mind. There is still much work to be done in the field of creating usable websites for the provision of atmospheric data.
Some 467 directions include: 21 468 469 470 Conduct a larger survey on how people in various audiences look for atmospheric information and how they expect data and tools to be organized 471 Work to achieve greater consistency in the terminology used on websites providing atmospheric information across agencies 472 Research how to develop effective “help” videos for atmospheric data websites 473 Share results of usability testing in the atmospheric sciences community 474 As more data are available online to an increasingly diverse audience, usability 475 becomes more and more essential in the success of a web site. We hope that the work 476 presented here will encourage others in the field of atmospheric science to consider the 477 fundamentals of usability when developing sites for accessing and exploring atmospheric 478 data. 479 As described by the National Research Council (2010), Overpeck, et al. (2011), and 480 Rood and Edwards (2014), the future of informatics will be to provide data users with the 481 information necessary to correctly interpret the data applicable to their particular 482 question. Before we can reach this step, it is essential that we can first provide basic data 483 and information via the Internet in a way that is easily utilized by the target audience. 484 Achieving this step will help us move forward to supporting user interpretation of data. 485 486 Acknowledgements 487 We would like to thank Kelly Redmond, Mark Pitchford, David Herring, and two 488 anonymous reviewers for their helpful feedback and comments. We would also like to 489 thank all of our test participants for making this study possible. This project was 490 supported by competitive award funds furnished by the Desert Research Institute 22 491 Division of Atmospheric Sciences under its Effective Designs to Generate Enhanced 492 Support (EDGES) program. 493 494 References 495 Bangor, A., P. Kortum, J. Miller, 2009: Determining what individual SUS scores mean: 496 Adding an adjective rating scale. Journal of usability studies, 4(3), 114-123. 497 498 499 Brooke, J., 1986. SUS-A quick and dirty usability scale. Usability evaluation in industry, 189, 194. 500 501 Brooke, J., 2013. SUS: a retrospective. Journal of Usability Studies, 8(2), 29-40. 502 503 Brugger, J., M. Crimmins, 2011. Weather, Climate, and Rural Arizona: Insights and 504 Assessment Strategies. A Technical Input to the U.S. National Climate 505 Assessment. U.S. Global Climate Research Program, Washington, D.C. 80 pp. 506 507 Dix, A., 2009. Human-computer interaction (pp. 1327-1331). Springer US. 508 509 510 Dumas, J. S., J. Redish, 1999. A practical guide to usability testing. Intellect Books. 404 pp. 511 512 Faulkner, L. 2003. Beyond the five-user assumption: Benefits of increased sample sizes 23 513 in usability testing. Behavioral Research Methods, Instruments, and Computers, 514 35(3), 379-383. 515 516 517 Krug, S., 2005. Don’t Make Me Think: A Practical Guide to Web Usability. New Riders Publishing, 195 pp. 518 519 520 Krug, S., 2009. Rocket surgery made easy: The do-it-yourself guide to finding and fixing usability problems. New Riders Publishing, 168 pp. 521 522 523 National Research Council, 2010. Informing an Effective Response to Climate Change. Washington, DC: The National Academies Press. 524 525 Nielsen, J., 1993. Usability Engineering. Boston: AP Professional. 526 527 528 Nielsen, J., 1997. Loyalty on the Web. Alertbox Newsletter. 
[Available online at http://www.nngroup.com/articles/loyalty-on-the-web/]

Nielsen, J., 2000a: Designing Web Usability. New Riders Publishing, 419 pp.

Nielsen, J., 2000b: Why you only need to test with 5 users. Alertbox Newsletter. [Available online at http://www.nngroup.com/articles/why-you-only-need-to-test-with-5-users/]

Nielsen, J., 2003: Recruiting test participants for usability studies. Alertbox Newsletter. [Available online at http://www.nngroup.com/articles/recruiting-test-participants-for-usability-studies/]

Nielsen, J., 2004: Card sorting: How many users to test. Alertbox Newsletter. [Available online at http://www.nngroup.com/articles/card-sorting-how-many-users-to-test/]

Nielsen, J., 2011: How long do users stay on web pages? Alertbox Newsletter. [Available online at http://www.nngroup.com/articles/how-long-do-users-stay-on-web-pages/]

Overpeck, J. T., G. A. Meehl, S. Bony, and D. R. Easterling, 2011: Climate data challenges in the 21st century. Science, 331, 700-702.

Preece, J., Y. Rogers, H. Sharp, D. Benyon, S. Holland, and T. Carey, 1994: Human-Computer Interaction. Addison-Wesley Longman Ltd.

Rood, R., and P. Edwards, 2014: Climate informatics: Human experts and the end-to-end system. Earthzine. [Available online at http://www.earthzine.org/2014/05/22/climate-informatics-human-experts-and-the-end-to-end-system/]

Sauro, J., 2011: A Practical Guide to the System Usability Scale: Background, Benchmarks & Best Practices. Measuring Usability LLC, 162 pp.

Spillers, F., 2009: Usability testing tips. Usability Testing Central. [Available online at http://www.usabilitytestingcentral.com/usability_testing_tips/]

Tullis, T., and L. Wood, 2004: How many users are enough for a card-sorting study? Proceedings of the Usability Professionals' Association, Vol. 2004.

Usability.gov, accessed 2014: What and why of usability; user research basics. U.S. Department of Health and Human Services. [Available online at http://www.usability.gov/]

Virzi, R. A., 1992: Refining the test phase of usability evaluation: How many subjects is enough? Human Factors, 34, 457-468.

Tables

Part 1: Web-based tasks performed on SCENIC
1. List data for all stations in Shasta County, California, that recorded snowfall and precipitation for all dates from December 15 to December 31, 2013.
2. Find the highest temperature ever recorded in March at Winnemucca Airport, Nevada.*
3. Find the lowest minimum temperature among grid points approximately covering the area of Pyramid Lake in December 2013. Use the NRCC interpolated dataset.

Part 2: Standardized Usability Test (SUS), rated on a scale of 1 (disagree strongly) to 5 (agree strongly)
1. I think I would use SCENIC frequently.
2. I found these web pages unnecessarily complex.
3. I thought SCENIC was easy to use.
4. I think I would need the support of a technical person to use SCENIC.
5. I found the various functions of the product well integrated.
6. I thought there was too much inconsistency in the pages.
7. I imagine that most people would learn to use these pages very quickly.
8. I found SCENIC very cumbersome to use.
9. I felt very confident using these pages.
10. I would need to learn a lot of things before I could get going with this product.

Part 3: Selections from general questions on using climate data**
1. What are gridded data?
Is there a difference between gridded and modeled data as it applies to climate data?
2. What is the difference between a "tool" and a "product" when it comes to weather and climate data?
3. We call these images climate anomaly maps (see Figure 2a). Would you expect to find these on a link labeled "climate anomaly maps"?
4. We call these graphs time series graphs (see Figure 2b). Would you expect to find these if you clicked on a link labeled "time series graphs"?
5. What are "raw data"?
6. Sort the five cards (labeled as follows) in order from least important to most important to you when searching for climate data. Think of your most recent project involving climate data if necessary.
   Where: location of data (particular county, watershed, state, climate division)
   When: date range available (days, weeks, months, years, record)
   What: climate element (temperature, precipitation, snowfall)
   Type: station data or gridded data
   Source: originator of data (NWS, NRCS, NIFC, etc.)
7. Anything you find confusing or would change on these pages? Likes/dislikes?

* In the second round of testing, a bug in the auto-fill function led us to use September and Elko Regional Airport, Nevada.
** Responses were collected from 10 participants, except in Part 3, Question 6, where participants from practice testing were incorporated to reach the sample size of 15 recommended for card sorting activities.

Table 1: Questions used in the three portions of the SCENIC usability test. In Part 3, only questions whose results are discussed in this paper are shown for brevity.

Card label              | 1 (most important) | 2       | 3       | 4       | 5 (least important)
WHERE - location        | 60% (9)            | 25% (4) | 13% (2) | 7% (1)  | 0% (0)
WHEN - dates available  | 0% (0)             | 31% (5) | 40% (6) | 20% (3) | 7% (1)
WHAT - variables        | 20% (3)            | 25% (4) | 33% (5) | 13% (2) | 7% (1)
TYPE - gridded, station | 20% (3)            | 13% (2) | 13% (2) | 40% (6) | 13% (2)
SOURCE - originator     | 0% (0)             | 6% (1)  | 0% (0)  | 20% (3) | 73% (11)
Results                 | WHERE              | WHEN    | WHAT    | TYPE    | SOURCE

Table 2: Results of the card sorting activity. Fifteen participants were asked to perform the activity, with one abstaining, for a total n=14. Two participants assigned two cards equal weight; thus a single card may be counted in two categories for an individual participant. Values given are the percent of total cards and the number of cards in each rank and category. The Results row gives the final ranking of each card from most important on the left, "where," to least important on the right, "source." The highest-ranking value in column 3, "when," was already the highest ranking in column 2; thus the second-highest value in that column, "what," is given as the highest ranking for column 3.

Figure Captions

Figure 1: Home page of SCENIC, the website assessed in this study.

Figure 2: Examples of figures shown to participants in Part 3, Questions 3 and 4 of the SCENIC usability test.

Figure 3: Percentile ranks associated with SUS scores and "letter grades" for different areas along the scale, following the standard United States A-F grading scale. Scores from each round of testing are displayed as vertical lines on the graph. Figure adapted from Sauro (2011).

Figure 4: Adjectives describing a site associated with various SUS scores. Mean SUS score ratings and error bars of +/- one standard error of the mean. Source: Bangor et al. (2009).

Figure 5: Normalized SUS scores by participant for the first and second rounds of testing on SCENIC.
Note that unique participants are used in each round.

Figure 6: Scores from rounds 1 and 2 of testing by question. Scores for each question are out of a total of 20 points. Higher scores imply more favorable responses about the site. Questions refer to those in Table 1, Part 2.