Establishing best practices to improve usefulness and usability of web interfaces providing atmospheric data

Nina S. Oakley¹ and Britta Daudert

¹Western Regional Climate Center, Desert Research Institute, 2215 Raggio Parkway, Reno, NV 89512
nina.oakley@dri.edu

Submitted to: Bulletin of the American Meteorological Society, 5 November 2014
Tagline/Capsule: Addressing usability when developing a web portal for data access is relatively inexpensive, increases a site's use and user satisfaction, and reflects positively on the organization hosting the site.
Abstract

Accessing scientific data and information through an online portal can be a frustrating task, often because the portal was not built with the user's needs in mind. The concept of making web interfaces easy to use, known as "usability," has been thoroughly researched in the field of e-commerce but has not been explicitly addressed in the atmospheric and most other sciences. As more observation stations are installed, satellites flown, models run, and field campaigns performed, data are continuously produced. Portals on the Internet have become the favored mechanisms to share this information and are ever increasing in number. Portals are often created without being explicitly tested for usability with the target audience, though the expense of testing is low and the returns high. To remain competitive and relevant in the provision of atmospheric information, it is imperative that developers understand the design elements of a successful portal so their product stands out among others. This work informs the audience of the benefits and basic principles of usability that can be applied to web pages presenting atmospheric information. We also share best practices and recommendations formulated from the results of usability testing performed on a data provision site designed for researchers in the Southwest Climate Science Center and hosted by the Western Regional Climate Center.
Introduction

Atmospheric data and information (hereafter referred to as "information") are becoming increasingly important to a wide range of users outside of the atmospheric science discipline. These include other scientists (hydrologists, social scientists, ecologists), resource managers, public health officials, farmers, and others (National Research Council 2010; Overpeck et al. 2011). As a result, providers of atmospheric information have a growing obligation to not only provide information (access is usually taxpayer funded), but also make the information easily digestible by the various members of a broadening audience (Brugger and Crimmins 2011; Overpeck et al. 2011; Rood and Edwards 2014). A site developed without the user in mind may prove frustrating or challenging to use (Krug 2005). Assessment of a site's usability [the extent to which the site can be used to achieve goals with effectiveness, efficiency, and satisfaction (ISO 1998)] is a cost-effective way to ensure users can fluidly accomplish intended tasks on a site. To summarize usability as it applies to web design today, Dumas and Redish (1999) offer four points: (1) usability means focusing on users [as opposed to developer/designer needs]; (2) people use products to be productive; (3) users are busy people trying to accomplish tasks; and (4) users decide whether a product is easy to use. Building and testing a usable site requires employing these principles as well as following general guidelines that have become standard in the field of usability and Human-Computer Interaction (HCI). Much of the literature assessed in this paper is focused on usability in the practical sense; we leave the theory of HCI to others (e.g., Preece et al. 1994; Dix 2009). This work also does not approach the topics of accessibility and responsive design.
Here we take a "small shop" approach to web development. In this case, a research scientist, data analyst, or programmer with no formal training in web design must develop a web site to provide atmospheric information. This person generally has some support from his or her research group, but does not have a web development team to work with and must try to apply principles of usability with limited resources. Though this situation is not representative of all groups in the atmospheric sciences, it is the group that is likely the most challenged to build a usable site. We present the results of usability testing performed on a website providing climate information hosted by the Western Regional Climate Center (WRCC). Additionally, we outline general usability guidelines that are applicable to pages providing atmospheric information and explain how our test participants perceive and search for climate data. Though this test focuses on station-based and gridded climate data for the elements temperature and precipitation, the results of the testing and the general guidelines provided are easily applicable to other types of atmospheric data.
Why is usability important when providing atmospheric data?

People are very goal-driven when they access a website. A usable site design will "get out of the way" and allow people to successfully accomplish their goals in a reasonable amount of time (Nielsen 2000a). Krug (2005) defines a "reservoir of goodwill" that users have when entering a site. Each problem or challenge in using the site lowers the reservoir until it is exhausted and the user leaves the site altogether. It is important to note that each user's "goodwill reservoir" is unique and situational; some users are by nature more patient than others and have a larger reserve. Some may have a predetermined opinion about an organization that influences the experience they will have on a site. Nielsen (2011) observes that people are likely to leave a site within the first 10-20 seconds if they do not see what they are looking for or become confused. If a site can convince a user the material presented is valuable and persuade the user to stay beyond the 20-second threshold, they are likely to remain on the page for a longer period of time. If principles of usability are not addressed, page visitors are likely to find, or at least search for, another site that makes the information they want easier to access (Nielsen 2000a). Additionally, a successful experience on a web site makes people likely to return. In economic terms, loyal users tend to spend considerably more money on a site than first-time users (Nielsen 1997). In atmospheric science, demonstration of a loyal website following can indicate to supporting agencies that the site provides information that is useful to stakeholders, which may help to secure future resources. Furthermore, having a usable site can make your organization stand out among others.
Compared with other parts of scientific research and data production, usability testing is relatively cheap and very effective. Nielsen (2000b) suggests that performing usability testing with five users per round of testing will uncover approximately 80% of the problems on a website. The usability tests themselves are typically an hour in length, and the necessary equipment can often be found within a research institution, keeping technology costs to a minimum.
Another benefit of usability testing is the opportunity to learn about the culture of your intended data users. By watching the target audience for your site perform usability tests, you will observe some of their rules, habits, behaviors, values, beliefs, and attitudes (Spillers 2009). This information can then be applied to future products generated by your research group or organization.
Usability Testing
The usability of a site is typically evaluated through a formal process called usability testing (Nielsen 2000b; Krug 2005; Usability.gov 2014). During a usability test, participants who have been chosen based on some criteria (e.g., users of climate data) are asked to perform specified tasks with little guidance under controlled conditions while one or more facilitators observe. Tests are often recorded for later viewing and analysis. To obtain the skills necessary to perform usability testing, the authors attended a workshop hosted by the usability consultants Nielsen Norman Group (http://www.nngroup.com). The workshop instructed on the basics of creating a usable website, developing a space for usability testing, facilitating the testing to achieve meaningful results, interpreting test results, and incorporating results into site design. Besides attending a workshop, there are many texts and online resources that can provide support on how to conduct usability testing (e.g., Krug 2005; Krug 2009; Usability.gov 2014).
Site Tested: SCENIC
To investigate how users interact with weather and climate data, we tested a website under development entitled SCENIC (Southwest Climate and ENvironmental Information Collaborative; http://wrcc.dri.edu/csc/scenic/; Figure 1). SCENIC is designed to serve scientists working for the Department of Interior Southwest Climate Science Center (SW-CSC) and other such Climate Science Centers. These scientists typically work in the fields of ecology, hydrology, forestry, or resource management. SCENIC acts as an interface to the Applied Climate Information System (ACIS) database, which contains daily climate data for the United States from many networks. Resources available in SCENIC are focused on the Southwest U.S., though data are available for locations throughout the nation. SCENIC has a wide variety of data acquisition and analysis tools for both gridded and station-based data.
data.
141
142
Creating a Usability Lab
143
A formal usability test should take place in a usability lab. These labs may be
144
extremely complex, such that the test takes place in an isolated room while the design
145
team watches remotely via closed-circuit television. Eye or mouse tracking and screen
146
recording software may be utilized as well. We opted for a simple lab and used a small
147
conference room for a quiet space, a computer and full-size screen, mouse, and keyboard.
148
We used Camtasia3 screen recording software and a microphone to record the screen
149
movements and verbalizations of the study participants. For comfort and ease of use, test
150
subjects were able to work on either a Mac or Windows operating system with the web
151
browser of their choice. Though it is unnatural for a person to be working on a computer
152
while being observed, the goal is to make them as comfortable as possible during the test
153
so they will act as they normally would when using a website and provide realistic
154
feedback on the site’s usability.
3
By TechSmith, http://www.techsmith.com/camtasia.html
During usability testing, a facilitator helps guide the participant through each of the tasks. The facilitator does not answer questions about the site or guide the participant in any way; they serve to prompt the participant to verbalize their thought process as they work through each task. The facilitator takes detailed notes as subjects complete the tasks in each section. The notes as well as video recordings are later reviewed to assess functions of the site that exhibit or lack usability. Clips created from the video taken during this testing can be viewed online (http://www.dri.edu/scenic-usability-research).
Selecting and recruiting test participants
In general, the usability literature suggests that five users will uncover most of the usability issues in a site (Virzi 1992; Nielsen 2000b; Krug 2005; Usability.gov 2014). However, Faulkner (2003) points out that five users will not uncover a majority of the issues in all cases. Faulkner's work states that ten users will uncover 82% of usability problems at a minimum and 95% of problems on average. We chose to test five users in each of two rounds to ensure that we would uncover a large majority of usability issues under all of the approaches in the aforementioned literature. The usability literature does not recommend any specific number of iterations of testing, though a new round of testing is suggested after major updates to a site (Nielsen 2000b; Krug 2009). We first performed an experimental round of testing utilizing five graduate students in natural resources, hydrology, and geography from the University of Nevada, Reno (UNR). This allowed us to develop our facilitation methods, work out any recording software issues, and refine our general questions about climate data before utilizing professionals. The results of this preliminary testing are not included here except for reporting on the card sorting activity, for which 15 participants are recommended to achieve meaningful results (Nielsen 2004; Tullis and Wood 2004; Usability.gov 2014).
Usability testing yields the best results when the subjects chosen represent the target user group (Nielsen 2003; Krug 2005; Usability.gov 2014). As SCENIC is intended to serve SW-CSC scientists, we sought out people working in resource management and ecology who utilize climate data in their work to participate in the study. Our group of participants came from private, state, and federal organizations, including independent consulting firms, the Bureau of Land Management, the Nevada Department of Wildlife, the Great Basin Landscape Conservation Cooperative, the SW-CSC, UNR, and the Desert Research Institute.

All participants were informed that they were being recorded and gave their verbal consent to participate in the study. Where regulations allowed, users were compensated for their time with a gift card, as suggested in the literature to improve the quality of the participant's involvement in the testing (Nielsen 2003; Krug 2005; Usability.gov 2014). Providing an incentive for participation helps to ensure users are motivated to perform the tasks.
Designing test questions
We used both qualitative and quantitative techniques to assess participants' ability to use SCENIC in a fluid manner. Each test comprised three portions: a set of three tasks to complete on the website, a standardized usability test, and a set of questions relating to the general use of climate data that were not specific to the site tested (see Table 1).
We devised three tasks we expected target users to be able to perform on SCENIC, as Nielsen (2000a) recommends designing a site around the top three reasons a user would visit it. The tasks were based on the common types of questions asked of WRCC's service climatologists by members of the target audience. The tasks are given in Table 1. Each task utilizes different capabilities on the site and is achieved through a different set of steps to provide breadth in covering the site's usability issues. The assessment of this portion of the test was qualitative and involved the fluidity with which the user was able to complete the task and their commentary about the site as they used it.
The System Usability Scale (SUS) is a widely used and reliable tool for measuring the ease of use of a product. It produces valid results on small sample sizes, making it an applicable quantitative tool for this usability evaluation (Brooke 1986; Bangor et al. 2009). The SUS test should be administered immediately after the web-based tasks and before any post-test discussion takes place. SUS is a 10-item questionnaire with five response options presented in a Likert-type scale from strongly agree to strongly disagree. The SUS test is summarized in Table 1. Half of the questions (1, 3, 5, 7, 9) are phrased such that they describe the site being evaluated in a positive way, while the other half (2, 4, 6, 8, 10) portray the site negatively. This design prevents biased responses caused by testers choosing an answer without having to consider each statement (Brooke 2013). An SUS questionnaire is scored as follows:

• Each question is valued between one and five points.
• For odd-numbered items, subtract 1 from the user response.
• For even-numbered items, subtract the user response from 5. This scales all values from 0 to 4, with 4 being the most positive response.
• Add up these adjusted responses for each user and multiply the total by 2.5. This converts the range of possible scores from 0-40 to 0-100.
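The scoring steps above translate directly into a few lines of code. The function below is our own illustration of the published arithmetic, not software used in the study.

```python
def sus_score(responses):
    """Compute a System Usability Scale score from ten responses (1-5).

    Responses are ordered by question number; odd-numbered questions are
    positively worded and even-numbered questions are negatively worded.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    total = 0
    for i, r in enumerate(responses, start=1):
        if i % 2 == 1:          # odd items: subtract 1 from the response
            total += r - 1
        else:                   # even items: subtract the response from 5
            total += 5 - r
    return total * 2.5          # scale 0-40 up to 0-100

# Example: a fairly positive participant scores 72.5
print(sus_score([4, 2, 4, 2, 4, 2, 4, 3, 4, 2]))
```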
Although the scores range from 0 to 100, they should not be interpreted as a percentage. Instead, SUS scores should be thought of as a percentile ranking that is based on scores from a large number of studies. The SUS score curve and percentile rankings (Figure 3) are based on SUS testing performed on over 5,000 web sites. An SUS score above 68 is considered above average (Sauro 2011).

The climate data questions asked in this study (summarized in Table 1, Part 3) stemmed from challenges we had internally naming various items on SCENIC and other sites, as well as general curiosity about how people perceive and search for climate data. Questions 1-4 address naming conventions for various products generated from climate data. Question 5, the card sorting activity, assesses how our participants search for climate data by having them order cards listing various aspects of climate data from least to most important (Table 1, Part 3, Question 6). The last two questions allow users to evaluate SCENIC and provide detailed feedback. In covering the last two questions, we also explained to participants our intended method of answering any of the questions the users struggled with in the first portion of the test. With the exception of the card activity, answers were taken qualitatively and used as anecdotal information rather than concrete research findings, as the sample size (n=10) of participants was not large enough to produce statistically significant results.
Conducting usability tests
Only one to three testers were assessed each day. The time between tests was used to remove any bugs, in this case referring to errors in code that cause the site to break or perform in a way not anticipated by the developer. This helped to keep the focus of the tests on design rather than having subjects repeatedly encounter the same bug. One example of this while testing SCENIC was a Chrome browser issue that inserted a dropdown menu arrow into any form element that had an auto-fill option. The first participant to test on Chrome thought that they had to choose from a dropdown menu rather than utilize the auto-fill option and could not move forward on the task. We viewed this as a Chrome browser issue rather than part of SCENIC's design, so we removed it between testers within the first round of testing. After removing the arrow, subsequent participants easily utilized the auto-fill option. Major changes to the site design were made after the first round of testing such that new issues might be uncovered in the second round (Krug 2005).
Lessons from usability testing

General usability guidelines from e-commerce, HCI

In researching usability, we found a variety of general recommendations for usable web design that we sought to incorporate in SCENIC. The general purpose of following these recommendations is to reduce the cognitive load on the user (Krug 2005). These guidelines do not necessarily relate to the provision of atmospheric data in particular, though we feel they are valuable enough to be listed here. Where applicable, examples from our usability testing are given, and video clips are available in the online appendix.

• Adhere to web conventions. This includes a navigation menu along the top of the page, a search bar option near the top, and links presented in a recognizable style (Nielsen 2000a; Krug 2005). In an early version of SCENIC, participants clicked on what appeared to be a link that displayed text on the same page. This confused participants, so the text was changed to a different style and color to avoid confusion with links.

• Be consistent within a set of pages. Maintain the same layout from page to page with similar text styling and form layout. This will enable the user to quickly learn how to navigate and use a site (Usability.gov 2014).

• Anticipate that text will not be read. Though it is tempting to provide detailed information and directions to the user, Krug (2005) suggests that people tend to "muddle through" a site rather than reading instructive texts. Brief titles and concise labels will be read, but anything more than a few words will likely be overlooked.

• Provide help texts. We found that after participants had muddled through and failed at accomplishing a task, they were receptive to reading help texts. Make the help text source easy to see, use a question mark symbol or information "i," and ensure the information provided is clear and concise. Our testing revealed that participants who read help texts found the text answered their questions and were likely to utilize help texts in later tasks as well.

• Reduce options when possible. When presenting the user with a form element, hide options until the user indicates they are needed. Otherwise, the user has to scan and consider all options, increasing cognitive load (Krug 2005). An example of this in SCENIC is that the options for output file name and delimiter are hidden until the user indicates they would like to output data to a file rather than to the screen.

• Make labels clear and meaningful. Label buttons and navigation menus with meaningful terms (Nielsen 2000a; Krug 2005). SCENIC's station finder tool displays stations on a map that meet criteria dictated by the user. The button to show stations meeting the criteria was labeled "submit." Two participants in the first round of testing were confused when, after making their selections and hitting "submit," they did not receive data. Changing the "submit" action button to "show stations" eliminated this issue in later testing. Two participants in the second round repeated aloud, "show stations," suggesting they were processing the outcome of clicking the button. Several other buttons were also changed to increase clarity on what clicking the button provides, such as "get data" for a data listing option rather than "submit."
How did users rate the site?
The average SUS score in the first round of testing was 63, placing the site at a percentile rank of approximately 35% (Figure 3). Several changes were implemented after the first round of testing to fix bugs as well as usability issues. Scores from the second round of testing increased to 67.5, which falls just below the average of 68 and a percentile rank of 50% (Figure 3). This indicates the usability of the site increased from the first to the second round of testing, but there is still much room for improvement. To ascribe an adjective to SCENIC, our participants collectively found the site to be in the "OK" to "Good" range (Figure 4).

Figure 5 shows the overall SUS test scores from each participant for the two rounds of testing. Scores in each round were comparable, with the scores in the second round being slightly higher overall. There are no notable outliers in the data set that would significantly affect the overall score for each test.

Figure 6 shows the adjusted scores from each question on the SUS assessment. Questions 6, 9, and 10 stood out as showing considerable improvement. Question 6 focuses on consistency between pages. We removed quite a bit of clutter from the pages between rounds one and two, performed some layout adjustments, and improved the instructiveness of the text labels. We hypothesize these changes led to the increase in this score. Questions 9 and 10 relate to the participant's confidence in using the site. This suggests the changes in labeling and improvements in help texts helped increase participants' confidence in using these pages.

A different group of participants tested the site in each round, and the site was modified from the first to the second round. It is possible that removal of some usability issues in the first round allowed users to reach other challenges in the second. This, and the characteristics of the individual users in each group, may help to explain why some scores increased and some decreased on each question between rounds.
How people search for data
The results of the card ordering activity (summarized in Table 2) revealed that people search for climate information in different ways, though with some consistency. Sixty percent of participants rated "where," the location of the data, as the first and most important thing they search for when acquiring climate data. In our participant group, 73% ranked the source of the data as least important when accessing climate data. This raises some concern, as many in the climate services community feel, from their deep familiarity with data systems, that data source is very important. One participant commented, "I generally trust that the data I am getting is of quality. I may run [quality control] on it myself anyway, so I am not really concerned about the source, just getting the data." Between these two extremes, responses were fairly spread across when, what, and type, in that order, with only 1-2 votes determining rank. Several participants indicated that their responses might vary depending on the project, and we asked them to focus on a current or recent project.

Many climate data provision agencies, such as WRCC, provide data organized by network. Our results indicate that source (network) ranks as least important and location as most important to our study participants. It follows that it would be most useful to our target audience to allow data to be selected by region and then by time period. These results are incorporated in SCENIC by offering the spatial option first and foremost with "station finder" map tools. In Part 1, Task 2, participants were asked to find the record high March temperature at Winnemucca Airport. The number of Winnemucca Airport entries in the station finder table (one entry for each of several network memberships) puzzled the first user. We updated the site such that each unique station name had a single entry and its networks were grouped together. Subsequent users who utilized the station finder table were able to quickly locate the Winnemucca Airport station.
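The percentages in Table 2 come from a straightforward tally of how often each card was placed at each rank. A minimal sketch of that bookkeeping is shown below; the responses in it are made up for illustration and are not the study data.

```python
from collections import Counter

# Made-up card-sort responses for illustration only: each participant orders
# the five cards from most important (rank 1) to least important (rank 5).
responses = [
    ["where", "when", "what", "type", "source"],
    ["where", "what", "when", "type", "source"],
    ["what", "where", "when", "source", "type"],
]

# Count, for each rank position, how many participants placed each card there.
tallies = {rank: Counter() for rank in range(1, 6)}
for ordering in responses:
    for rank, card in enumerate(ordering, start=1):
        tallies[rank][card] += 1

# Report each rank as percentages of the cards placed at that rank.
for rank in range(1, 6):
    n = sum(tallies[rank].values())
    summary = ", ".join(f"{card}: {count / n:.0%}"
                        for card, count in tallies[rank].most_common())
    print(f"rank {rank}: {summary}")
```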
Challenges in labeling
Comments from our test participants as well as prior experience providing climate services via the Web at WRCC show that the labeling of links and items on pages providing weather and climate data is one of the greatest challenges to usability. Questions asked to explore labeling and terminology are given in Table 1, Part 3. The terms we explore include: modeled data, gridded data, tool, product, time series, anomaly map, raw data, and data analysis.

We struggled with the decision of how to title links to gridded data products (data generated by a model and put on a grid) provided through SCENIC. Possible terms included "gridded data," "modeled data," or "gridded/modeled data." Several participants indicated the term "modeled data" was confusing to them and they weren't sure what to expect if they were to click on it. One of the most useful responses was, "modeled data is not a very informative term; gridded gives me more useful information about the data." In light of these responses, we selected the term "gridded data" after the first round of testing rather than "gridded/modeled." All ten participants were able to complete the gridded data question without confusion as to how to access the data, showing the term "gridded data" is a useful indicator.

"Climate anomaly map" and "time series graph" are two commonly used terms to describe graphics that depict climate. All participants readily agreed to "time series graph" as an adequate term and, with some hesitation, were unanimous in agreement on the use of "climate anomaly maps" as well. These terms were incorporated into SCENIC where appropriate.
"Tool" and "product" are terms commonly used on sites providing weather and climate information (at the time of writing, the Regional Climate Centers (http://www.ncdc.noaa.gov/customer-support/partnerships/regional-climate-centers), the National Integrated Drought Information System (http://www.drought.gov/drought/), and the National Climatic Data Center (http://www.ncdc.noaa.gov/), to name a few). All participants were in agreement that a "tool" allows the user to perform some sort of action or analysis using data, while a "product" is static. In essence, a "tool" creates a "product," though only a few participants drew this conclusion. In spite of the general agreement on these terms, using "data tools" on SCENIC did not yield the desired results, leading us to look for a better phrase to guide people to tools that can be used to analyze and summarize the data.

The question on "raw data" yielded a variety of answers. Several participants viewed "raw data" as a list of data that they could download from a site to use in analyses. They assumed it had had quality control (QC) applied; "raw" implied it was not an average or summary of any sort. Other participants viewed "raw data" as what came directly from the sensor (as is the standard terminology in climate services), which may have many errors and other issues and would require cleanup. We opted to use the term "data lister" due to the difference in user responses and to be consistent with phrasing on other WRCC pages.
The greatest challenge test participants experienced in the three tasks we posed to them was efficiently completing web-based Task 2, finding the highest temperature ever recorded in March at the airport in Winnemucca, Nevada. All ten participants first went to the "historic station data lister" and listed maximum temperature for the station's period of record. After listing daily data for the station's period of record, they realized that was not the right way to answer the question and began to search for other options, eventually arriving at the data analysis tools. In the first round of testing, we intended participants to go to the navigation tab "Station Data Tools," where several tools were available that would allow them to answer the question. As this labeling did not prompt participants to click on it, we changed the navigation tab to "Data Analysis" for the second round of testing. Unfortunately, participants were still not motivated to click on this link to complete the task that required data analysis. We remain challenged to find the best way to prompt people to utilize the variety of analyses we have provided. Interestingly, half of the participants said that when they got to the point of listing the period-of-record maximum temperature data, they would not have continued to look for analysis tools. They would have pulled the data into analysis software (such as MATLAB or Excel) to obtain the maximum March temperature. These participants said they preferred to do things in this manner, as they may need the data later for other applications. They stated that the analysis tools were "neat" and "good for quick answers." The result of this piece of the test raises two questions: Does our target audience want analysis tools? If so, how do we advertise the tools and let it be known that they are available?
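For readers who, like half of our participants, would rather pull a period-of-record listing into their own analysis software, the sketch below shows that Excel/MATLAB-style calculation in Python: given a daily maximum-temperature series, it extracts the record March value. The file name and column names are hypothetical; this is an illustration of the workflow participants described, not a SCENIC tool.

```python
import pandas as pd

# Hypothetical CSV exported from a data lister: one row per day with columns
# "date" and "maxt" (daily maximum temperature, degrees F).
daily = pd.read_csv("winnemucca_maxt.csv", parse_dates=["date"])

# Keep only March days, then take the largest value over the period of record.
march = daily[daily["date"].dt.month == 3]
record_march_max = march["maxt"].max()
record_day = march.loc[march["maxt"].idxmax(), "date"]

print(f"Record March maximum: {record_march_max} F on {record_day:%Y-%m-%d}")
```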
Conclusions

Watching target users complete tasks on SCENIC provided us with valuable information on how people in the target audience, researchers with the SW-CSC, use the site and allowed us to fix a number of roadblocks to usability as well as programming bugs. SUS scores rose from 63 in the first round of testing to 67.5 in the second round, indicating some level of improvement to the site. These scores fall in the "average" range for a website (Figure 4), indicating there is still considerable progress to be made. We found that while usability testing uncovers usability issues on a site, it is not always clear how to modify the site to remove these problems. We were not able to rectify all usability challenges in the two rounds of testing on SCENIC, though the testing made us aware that they exist and allows us to continue working to improve the site.
Performing this testing allowed us to interact with our target audience and ask questions that helped us decide on the naming of certain elements of the site. Though the sample size for these questions (n=10) is not large enough to be statistically significant, it still provided us with useful insights into how our target audience perceives various terms used frequently in climate data. The card sorting activity revealed that our participants consistently rank location of data as the most important factor in searching for data and source as the least important. This challenges climatologists to provide data in a streamlined manner while still making sure the user is aware of any caveats to the data (which the user may or may not be concerned with).
The challenges and lessons learned presented here are not unique to the climate data presented on SCENIC. Any atmospheric or related data (satellite, air quality, stream flow, etc.) that can be offered over the Internet can benefit from usability testing, though the labeling and terminology challenges will likely be dataset-specific. In summation, we suggest the following as best practices for creating web pages that provide climate data as well as other atmospheric data:

• Usability testing is extremely useful in building this type of site; test early in the site development process and repeat often.
• Work with representatives from the target audience rather than office mates or research team members; this yields more meaningful results.
• Consider the way in which your target audience looks for the atmospheric data provided and design the site to meet these needs.
• Labels and naming of site elements can be extremely challenging and have a significant effect on the usability of sites providing atmospheric data; test terms early, borrow from other agencies' sites for consistency, and respect historical terminology where acceptable.
• Adhere to general usability guidelines as described in the "General usability guidelines from e-commerce, HCI" section of this paper.
Discussion

We are by no means usability professionals and speak to the readers as fellow atmospheric scientists and programmers attempting to deliver atmospheric data and information to a target audience. From our experience performing usability testing, we highly recommend the process to any group providing atmospheric data online. The knowledge and experience gained from this research will propagate into future work and allow us to build better sites with the end user in mind. There is still much work to be done in the field of creating usable websites for the provision of atmospheric data. Some directions include:

• Conduct a larger survey on how people in various audiences look for atmospheric information and how they expect data and tools to be organized.
• Work to achieve greater consistency in the terminology used on websites providing atmospheric information across agencies.
• Research how to develop effective "help" videos for atmospheric data websites.
• Share results of usability testing in the atmospheric sciences community.
As more data become available online to an increasingly diverse audience, usability becomes more and more essential to the success of a web site. We hope that the work presented here will encourage others in the field of atmospheric science to consider the fundamentals of usability when developing sites for accessing and exploring atmospheric data.

As described by the National Research Council (2010), Overpeck et al. (2011), and Rood and Edwards (2014), the future of informatics will be to provide data users with the information necessary to correctly interpret the data applicable to their particular question. Before we can reach this step, it is essential that we first provide basic data and information via the Internet in a way that is easily utilized by the target audience. Achieving this step will help us move forward to supporting user interpretation of data.
Acknowledgements
We would like to thank Kelly Redmond, Mark Pitchford, David Herring, and two anonymous reviewers for their helpful feedback and comments. We would also like to thank all of our test participants for making this study possible. This project was supported by competitive award funds furnished by the Desert Research Institute Division of Atmospheric Sciences under its Effective Designs to Generate Enhanced Support (EDGES) program.
References
Bangor, A., P. Kortum, and J. Miller, 2009: Determining what individual SUS scores mean: Adding an adjective rating scale. Journal of Usability Studies, 4(3), 114-123.

Brooke, J., 1986: SUS: A quick and dirty usability scale. Usability Evaluation in Industry, 189-194.

Brooke, J., 2013: SUS: A retrospective. Journal of Usability Studies, 8(2), 29-40.

Brugger, J., and M. Crimmins, 2011: Weather, Climate, and Rural Arizona: Insights and Assessment Strategies. A Technical Input to the U.S. National Climate Assessment. U.S. Global Climate Research Program, Washington, D.C., 80 pp.

Dix, A., 2009: Human-computer interaction. Springer US, 1327-1331.

Dumas, J. S., and J. Redish, 1999: A Practical Guide to Usability Testing. Intellect Books, 404 pp.

Faulkner, L., 2003: Beyond the five-user assumption: Benefits of increased sample sizes in usability testing. Behavioral Research Methods, Instruments, and Computers, 35(3), 379-383.

Krug, S., 2005: Don't Make Me Think: A Practical Guide to Web Usability. New Riders Publishing, 195 pp.

Krug, S., 2009: Rocket Surgery Made Easy: The Do-It-Yourself Guide to Finding and Fixing Usability Problems. New Riders Publishing, 168 pp.

National Research Council, 2010: Informing an Effective Response to Climate Change. The National Academies Press, Washington, DC.

Nielsen, J., 1993: Usability Engineering. AP Professional, Boston.

Nielsen, J., 1997: Loyalty on the Web. Alertbox Newsletter. [Available online at http://www.nngroup.com/articles/loyalty-on-the-web/]

Nielsen, J., 2000a: Designing Web Usability. New Riders Publishing, 419 pp.

Nielsen, J., 2000b: Why you only need to test with 5 users. Alertbox Newsletter. [Available online at http://www.nngroup.com/articles/why-you-only-need-to-test-with-5-users/]

Nielsen, J., 2003: Recruiting test participants for usability studies. Alertbox Newsletter. [Available online at http://www.nngroup.com/articles/recruiting-test-participants-for-usability-studies/]

Nielsen, J., 2004: Card sorting: How many users to test. Alertbox Newsletter. [Available online at http://www.nngroup.com/articles/card-sorting-how-many-users-to-test/]

Nielsen, J., 2011: How long do users stay on web pages? Alertbox Newsletter. [Available online at http://www.nngroup.com/articles/how-long-do-users-stay-on-webpages/]

Overpeck, J. T., G. A. Meehl, S. Bony, and D. R. Easterling, 2011: Climate data challenges in the 21st century. Science, 331, 700-702.

Preece, J., Y. Rogers, H. Sharp, D. Benyon, S. Holland, and T. Carey, 1994: Human-Computer Interaction. Addison-Wesley Longman Ltd.

Rood, R., and P. Edwards, 2014: Climate informatics: Human experts and the end-to-end system. Earthzine. [Available online at http://www.earthzine.org/2014/05/22/climate-informatics-human-experts-and-the-end-to-end-system/]

Sauro, J., 2011: A Practical Guide to the System Usability Scale: Background, Benchmarks & Best Practices. Measuring Usability LLC, 162 pp.

Spillers, F., 2009: Usability testing tips. Usability Testing Central. [Available online at http://www.usabilitytestingcentral.com/usability_testing_tips/]

Tullis, T., and L. Wood, 2004: How many users are enough for a card-sorting study? Proceedings, Usability Professionals' Association, Vol. 2004.

Usability.gov, accessed 2014: What and why of usability; user research basics. U.S. Department of Health and Human Services. [Available online at http://www.usability.gov/]

Virzi, R. A., 1992: Refining the test phase of usability evaluation: How many subjects is enough? Human Factors, 34, 457-468.
Tables
Part 1: Web-based tasks performed on SCENIC
1. List data for all stations in Shasta County, California, that recorded snowfall and precipitation for all dates from December 15 to December 31, 2013.
2. Find the highest temperature ever recorded in March at Winnemucca Airport, Nevada.*
3. Find the lowest minimum temperature among grid points approximately covering the area of Pyramid Lake in December 2013. Use the NRCC interpolated dataset.

Part 2: System Usability Scale (SUS) questionnaire
Rated on a scale of 1 (disagree strongly) to 5 (agree strongly)
1. I think I would use SCENIC frequently.
2. I found these web pages unnecessarily complex.
3. I thought SCENIC was easy to use.
4. I think I would need the support of a technical person to use SCENIC.
5. I found the various functions of the product well integrated.
6. I thought there was too much inconsistency in the pages.
7. I imagine that most people would learn to use these pages very quickly.
8. I found SCENIC very cumbersome to use.
9. I felt very confident using these pages.
10. I would need to learn a lot of things before I could get going with this product.

Part 3: Selections from general questions on using climate data**
1. What are gridded data? Is there a difference between gridded and modeled data as it applies to climate data?
2. What is the difference between a "tool" and a "product" when it comes to weather and climate data?
3. We call these images climate anomaly maps (see Figure 2a). Would you expect to find these on a link labeled "climate anomaly maps"?
4. We call these graphs time series graphs (see Figure 2b). Would you expect to find these if you clicked on a link labeled "time series graphs"?
5. What are "raw data"?
6. Sort the five cards (labeled as follows) in order from least important to most important to you when searching for climate data. Think of your most recent project involving climate data if necessary.
   Where: location of data (particular county, watershed, state, climate division)
   When: date range available (days, weeks, months, years, record)
   What: climate element (temperature, precipitation, snowfall)
   Type: station data or gridded data
   Source: originator of data (NWS, NRCS, NIFC, etc.)
7. Anything you find confusing or would change on these pages? Likes/dislikes?

* In the second round of testing, a bug in the auto-fill function led us to use September and Elko Regional Airport, Nevada.
** Responses were collected from 10 participants, except in Part 3, Question 6, where participants from practice testing were incorporated to reach the necessary sample size of 15 for usability card sorting activities.

Table 1: Questions used in the three portions of the SCENIC usability test. In Part 3, only questions whose results are discussed in this paper are shown for brevity.
Card label                Most important                                      Least important
                          1          2          3          4          5
WHERE- location           60% (9)    25% (4)    13% (2)    7% (1)     0% (0)
WHEN- dates available     0% (0)     31% (5)    40% (6)    20% (3)    7% (1)
WHAT- variables           20% (3)    25% (4)    33% (5)    13% (2)    7% (1)
TYPE- gridded, station    20% (3)    13% (2)    13% (2)    40% (6)    13% (2)
SOURCE- originator        0% (0)     6% (1)     0% (0)     20% (3)    73% (11)
Results                   WHERE      WHEN       WHAT       TYPE       SOURCE

Table 2: Results of the card sorting activity. Fifteen participants were asked to perform the activity, with one abstaining, for a total n=14. Two participants assigned two cards equal weight; thus a single card may be counted in two categories for an individual participant. Values given are the percent of total cards and the number of cards in each rank and category. The Results row gives the final ranking of each card from most important on the left, "where," to least important on the right, "source." The highest-ranking value in column 3, "when," was already the highest ranking in column 2; thus the second-highest value in the column, "what," is given as the highest ranking for column 3.
Figure Captions

Figure 1: Home page of SCENIC, the website assessed in this study.

Figure 2: Examples of figures shown to participants in Part 3, Questions 3 and 4 of the SCENIC usability test.

Figure 3: Percentile ranks associated with SUS scores and "letter grades" for different areas along the scale, following the standard United States A-F grading scale. Scores from each round of testing are displayed as vertical lines on the graph. Figure adapted from Sauro (2011).

Figure 4: Adjectives describing a site associated with various SUS scores. Mean SUS score ratings and error bars of +/- one standard error of the mean. Source: Bangor et al. (2009).

Figure 5: Normalized SUS scores by participant for the first and second rounds of testing on SCENIC. Note that unique participants are used in each round.

Figure 6: Scores from rounds 1 and 2 of testing by question. Scores for each question are out of a total of 20 points. Higher scores imply more favorable responses about the site. Questions refer to those in Table 1, Part 2.
616
617
618
619
620
621
622
623
29
624
Figures and Captions
625
626
Figure 1: Home page of SCENIC, the website assessed in this study.
627
628
Figure 2: Examples of figures shown to participants in Part 3, Questions 3 and 4 in
629
SCENIC usability test.
630
30
631
632
Figure 3: Percentile ranks associated with SUS scores and “letter grades” for different
633
areas along the scale following the standard United States A-F grading scale. Scores from
634
each round of testing are displayed as vertical lines on the graph. Figure adapted from
635
Sauro (2011).
636
31
637
638
Figure 4: Adjectives describing a site associated with various SUS scores. Mean SUS
639
score ratings and error bars +/- one standard error of the mean. Source: Bangor et al.
640
2009.
32
641
642
Figure 5: Normalized SUS scores by participant for the first and second rounds of testing
643
on SCENIC. Note that unique participants are used in each round.
33
644
645
Figure 6: Scores from round 1 and 2 of testing by question. Scores for each question are
646
out of a total of 20 points. Higher scores imply more favorable responses about the site.
647
Questions refer to those in Table 1, Section 2.
648
649
650
34
Download