DAISY – Universally Designed? Prototyping an Approach to Measuring Universal Design

Miriam Eileen Nes¹, Kirsten Ribu², and Morten Tollefsen¹
¹ MediaLT, Jerikoveien 22, 1067 Oslo, Norway
{miriam,morten}@medialt.no
² Oslo University College, St. Olavs plass 4, 0130 Oslo, Norway
Kirsten.ribu@iu.hio.no
Abstract. The DAISY system is currently used as the alternative reading format for print-disabled students in Norway. DAISY is denoted by many as universally designed. This is an important claim, as it implies suitable learning opportunities for all students. Being able to determine whether this is the case is therefore important – for DAISY as for many other information systems. However, methods for evaluating whether a software product is universally designed are lacking. This text builds on previous work investigating the use of DAISY in Norwegian primary and secondary education, now looking into strategies to evaluate whether DAISY is universally designed. We argue that the term universally designed needs to be more strictly defined in order to become applicable to systems development. Further, we propose two related methods that measure to what degree DAISY is universally designed, using feature analysis methodology.
Keywords: DAISY, universal design, evaluation, feature analysis.
1 Introduction
Universal design is being promoted in information technology, and has recently been
defined as a criterion in Norway when choosing public information systems. The
purpose is to design products for the broadest possible range of users [1] – designing
“for all”. By doing this, one aims to include the often excluded user groups of information systems – thus bridging digital gaps. But without concrete and measurable
interpretations of what it means for information systems to be universally designed,
one cannot be sure that these intentions will lead to a more inclusive design.
The focus of this paper is on methodology for evaluating whether a software or information system is universally designed. Although some early attempts have been
made to set up test scales for determining whether artefacts are universally designed
[2], there does not yet exist an established strategy for determining this for software or
information systems. This paper suggests definitions and methodology to make the
fuzzy term “universal design” measurable.
1.1 DAISY: Digital Accessible Information SYstem
The DAISY system is established in Norwegian schools as the alternative reading
format to print. The DAISY system consists of three interacting system components: a DAISY standard, a DAISY CD book and a playback system [3]. When “the DAISY system” is referred to in this study, we are in fact referring to the combined functionality of these components as experienced by the user.
DAISY is being strongly promoted by several communities as a universally designed alternative to print [4] [5]. However, no published studies evaluating the system back up these claims of universal suitability.
2 Defining “Universal Design”
To be able to measure and determine whether DAISY is universally designed, a suggested definition of the term “universally designed” is first outlined. From a software
engineering perspective, there is a need to arrive at a clear understanding of what
universal design of software and information systems means in practice [6] [7], similar to WCAG (1.0) in relation to accessibility and Web interfaces. We propose linking
the general NCSU definition
“Universal design is the design of products and environments to be usable by all people, to the greatest extent possible, without the need for adaptation or specialized
design” (Ron Mace)
to two main areas, features and users, creating a tailored definition specifying that:
1) It is not always beneficial, or possible, to adapt all functionality to all possible
users. Focus could be on making the core features of a software/system usable.
2) Two limitations can be added to the usability definition “for all”. First, there is a
need to categorize users into defined user groups, where the common needs of the different groups are identified. Second, not all users need to use all systems – and developers often target certain markets and users. This could still be allowed.
2.1 Limitation 1: Focus on Core Features
Designing solutions to fit the needs of all user groups is not trivial. Software/systems
are usually designed for the stereotypical user, whereas universal design principles
focus on diverse user groups. It is not always beneficial, or possible, to adapt all functionality to all possible users. This paper proposes that universal design within software/system development should be defined to refer to the core functionality, and not
necessarily to all extensions. This means a universally designed website may have
functionality designed for all users along with elements targeted to specific
groups. Such a definition of universal design is loose enough to increase developers' willingness to design their systems to be “usable for all”. It also fits well with known legal regulations on non-discriminating design, putting pressure on the developer to attempt a universal design. The consequence is that, for example, an e-mail client could be denoted universally designed if the features for reading and sending mail were universally usable, even if not all users could use the address book.
2.2 Limitation 2a: Categorize Users and Determine User Group Needs
The universal design strategy is to develop technological products from a distinct
perspective, namely one of respecting and valuing the diversity in human capabilities,
technological environments and contexts of use [1]. In most cases, however, one cannot possibly design systems to fit every individual. There has to be some kind of categorization of users, where the common needs of a user group are identified – establishing the group requirements to be considered in the design. Examples of categorized user groups are ‘blind users’ and ‘elderly users’.
With this approach, possibilities for system adaptability would be considered beneficial, complementing a generalized solution, even though this contradicts the strict “without-need-for-adaptation” clause of the original definition. The consequences would be:
1. Opting for multi-modality and device-independence, where add-ons and adaptations may complement the design
2. Opting for dialogue-independence and flexible user interfaces, where for example
interaction style and layout can be altered to suit individual needs.
Options for personalization are often considered positive by end-users. Note, however, that if adaptations are vital for use, they should be considered core functionality.
2.3 Limitation 2b: Allow Targeting of a Specific Audience
Not all users need to use all systems – and developers often target a certain market and user type. This should be allowed. One could therefore also link universal design to a target audience, and specify which user groups the software is designed for. For example, graphic design software could be targeted at visual user groups only, and still be universally designed (fitting all visual user groups).
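As an illustration only, the following minimal Python sketch expresses the three limitations as a single check: a product is considered universally designed when every core feature is usable by every user group in its declared target audience. All class, attribute and group names are hypothetical and are not taken from the referenced literature.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    """Illustrative model of the tailored definition (all names hypothetical)."""
    name: str
    target_groups: set          # limitation 2b: the declared target audience
    core_features: set          # limitation 1: only core functionality is considered
    # usable[(feature, group)] is True if the group can use the feature satisfactorily
    usable: dict = field(default_factory=dict)

    def is_universally_designed(self) -> bool:
        # Limitation 2a: usability is judged per categorized user group,
        # and only for core features and targeted groups.
        return all(
            self.usable.get((feature, group), False)
            for feature in self.core_features
            for group in self.target_groups
        )


# The e-mail client example above: still universally designed if the address book
# (a non-core extension) is unusable for some group, as long as the core
# features work for all targeted groups.
client = Product(
    name="e-mail client",
    target_groups={"blind users", "elderly users"},
    core_features={"read mail", "send mail"},
    usable={
        ("read mail", "blind users"): True,
        ("read mail", "elderly users"): True,
        ("send mail", "blind users"): True,
        ("send mail", "elderly users"): True,
    },
)
print(client.is_universally_designed())  # True
```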
3 Measuring “Universal Design” in Information Technology
The outlined definition argues for the need to specify user group requirements and
focus on core functionality (what should be usable, for whom and to what extent). To operationalize this further so it can be measured, we suggest applying feature analysis methodology, an approach that fits well with the outlined definition: “Feature analysis is intended to help you decide whether or not a specific method/tool meets the requirements of its potential users” [9]. Measuring universal design is thus based on the notion that if the system/software does not “fit” a target user group, the system/software is not universally designed. From the proposed definition, being “usable” translates to being able to use the core features of the system/software in a satisfying manner. In feature analysis, a system’s “fit” or usability is evaluated by examining how well a derived set of features is implemented with regard to perceived user needs. In short, what should “usable” mean for the defined target user
groups? Feature analysis thus encourages not only considering accessibility of functionality when measuring an information system, but also the actual usefulness –
which is often overlooked. Universal design should aim for more than universal accessibility – it should also include universal usability.
In feature analysis the “fit” is measured by deriving feature lists, identifying any
groups likely to have different requirements, refining the lists and judging the
importance of features in relation to each user group, determining how the features
should be assessed and identifying an acceptable threshold/score for each feature.
A feature list specifies user requirements and core features of the system for one
categorized user group. This forces the developer to clarify which universally designed
features will be offered – and what is considered base functionality. A feature list may
be hierarchical, decomposing a feature such as “Robustness” into more specific and
measurable requirements. The level of detail is up to the evaluator. User involvement
is of particular help when identifying user needs – and feature analysis methodology
urges it – thus promoting involvement from disabled user groups.
In addition to deciding the attributes that should be present, one must also consider
the degree to which they should be present. The feature evaluation is done by examining each feature against its assessment scale. A feature may be simple – present or not
present – or compound – its quality judged on an ordinal scale. In the assessment
scale, each level of support is both textually described and has a related score. Each
feature receives a score according to the textually specified level of support the
evaluator feels is most fitting. Thus, the evaluation is quantified.
Features at the same level may be grouped into feature sets. For each feature, each feature set and the total, one decides on a threshold for acceptance: what score/level of support is acceptable. Acceptance criteria may be refined to take into consideration judged feature importance. A four-level importance scale is common, ranging from Mandatory
through Highly Desirable and Desirable to Nice-to-have [9]. A system may for example be defined as “usable” if all Mandatory features reach a certain level of support.
Acceptance criteria and assessment scales are specified prior to analysis.
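As a sketch only, such an evaluation form could be represented along the following lines; the class and field names are our own illustration and are not prescribed by the feature analysis literature [9].

```python
from dataclasses import dataclass

IMPORTANCE_RANKS = ("Mandatory", "Highly Desirable", "Desirable", "Nice-to-have")


@dataclass
class Feature:
    name: str
    importance: str      # one of IMPORTANCE_RANKS
    scale: dict          # assessment scale: score -> textual description of support
    threshold: int       # acceptance threshold, fixed before the evaluation
    score: int = 0       # filled in during the evaluation

    def is_acceptable(self) -> bool:
        return self.score >= self.threshold


# A compound feature inside a hypothetical "Robustness" feature set,
# judged on an ordinal scale with textually described levels of support.
resume_position = Feature(
    name="Resumes reading position after restart",
    importance="Mandatory",
    scale={1: "not supported", 2: "partial, manual steps needed",
           3: "supported with minor flaws", 4: "fully supported"},
    threshold=3,
)
resume_position.score = 3               # level of support judged most fitting
print(resume_position.is_acceptable())  # True
```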
The authors recognize that despite the quantification and operationalization of
“universal design”, the measurement of universal usability outlined in this paper still
depends on many subjective decisions. However, the approach provides a transparent
evaluation. One has to explicitly specify what is meant by a product being “universally designed”, and is able to do so using the proposed definition. The prototyped strategies outlined below both drew on and compared results to previous research and existing knowledge, to ensure some form of control of the subjective elements of the evaluations. The authors suggest using similar strategies and triangulation for evaluations with high degrees of subjective “best-guessing”.
4 Evaluating DAISY
The suggested definition of universal design set the focus and scope of the investigation of DAISY. Two different strategies for measuring universal design using feature
analysis were attempted: feature analysis survey and feature analysis expert evaluation. The first attempted a measurement of the fit of “the DAISY system”, and used
end-users to evaluate usefulness – incorporating practice and human aspects. The
second looked into DAISY software playback systems in particular, thus leaning
more towards measuring software than an information system in its widest sense.
4.1 Feature Analysis Survey
The feature analysis survey was applied in order to assess the overall usefulness of
DAISY. Kitchenham [9] describes using feature analysis surveys for comparing
different systems that have been in use in an organization for a while [10] [11]. The DAISY system had been used as a tool for reading by the student sample for at least nine months [12]. Measuring the usability of a single system had not previously been described. That approach is prototyped here, within one specific user group – students with dyslexia or general reading and writing difficulties – but may be replicated for each relevant user category, each with its tailored feature list and assessments. Using this approach, one is able to say whether the system is universally designed (it fits all relevant user groups) or whether some users are marginalized.
Two questionnaires were distributed to almost 600 schools – one targeted towards
students, and another towards their teachers [12] [13]. About 10% replied: 130 students and 67 teachers. A hierarchical feature list was derived based on interviews and
expert input [14]. Student users were asked which of the features they used, and about the perceived usefulness and ease of use of each. Based on the frequencies of use, an importance level and a corresponding score were assigned to each feature¹. The assumption was made that features frequently used within a user group should be considered core features within this group. Thus, only limited pre-knowledge of core features for the users was necessary. No features were identified as being negative. Respondents scored each feature in relation to ease/usefulness on ordinal scales. Negative user assessments gave negative scores and positive assessments corresponding positive scores; no answer gave the score 0 and thus did not influence the evaluation.
Points were added for each feature and feature set. The importance score and the user assessment score were multiplied for each feature and feature set, creating total scores. Thus, the assessment scale, specifying the points a feature must achieve to be considered acceptable in terms of usability, took into consideration the importance of the feature/feature
set. The thresholds were calculated based on the percentage of possible maximum and
minimum score for each importance level. Finally, importance-weighted scores were
added and whether DAISY was considered fitting for the user group depended on the
percentage of total points received in relation to the maximum possible. In the end,
DAISY received a total score corresponding to 73% of the total range while the
threshold was 75%, i.e. not universally designed.
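To make the aggregation concrete, the following Python sketch illustrates one way such importance weighting could be computed. The mapping from frequency of use to importance score follows footnote 1, while the function and variable names and the toy data are our own and do not reproduce the actual calculation in [12].

```python
def importance_score(used_fraction):
    """Map frequency of use within the user group to an importance score (footnote 1)."""
    if used_fraction > 0.75:
        return 10   # Mandatory
    if used_fraction > 0.50:
        return 6    # Highly desirable
    if used_fraction > 0.25:
        return 4    # Desirable
    return 1        # Nice to have


def weighted_percentage(features):
    """Importance-weighted user assessment as a percentage of the possible score range.

    Each feature dict holds the fraction of respondents using it and the summed
    user assessments on a symmetric ordinal scale (negative = negative assessment,
    0 = no answer), plus the maximum sum that could have been reached.
    """
    achieved = max_total = 0.0
    for f in features:
        weight = importance_score(f["used_fraction"])
        achieved += weight * f["assessment_sum"]
        max_total += weight * f["max_assessment"]
    # position of the achieved score within the possible [-max, +max] range
    return 100 * (achieved + max_total) / (2 * max_total)


# Toy data for two hypothetical features; the survey reported 73% for DAISY
# as a whole against an acceptance threshold of 75%.
toy_features = [
    {"used_fraction": 0.80, "assessment_sum": 40, "max_assessment": 100},
    {"used_fraction": 0.30, "assessment_sum": -5, "max_assessment": 100},
]
print("%.0f%% (threshold: 75%%)" % weighted_percentage(toy_features))
```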
The method proved successful, showing that a feature analysis survey may be used
to measure a system’s usability and usefulness for a specific user group. Results on
“fit” coincided with apprehensions and suspicions in the Norwegian DAISY environment and indications from previous qualitative studies. Separate items in the
questionnaires measured emotional satisfaction aspects and checked for feature completeness. Being able to include such additional items (e.g. checking for survey validity) is a major advantage, giving the opportunity to verify the evaluation against previous
research. Taking these into consideration, the overall assessment of DAISY was that
it is positive for the user group, and should be viewed as beneficial.
Since acceptance thresholds scale with how many respondents use a feature (its importance), pre-knowledge of the spread in feature use (whether the sample used all features or only a few, i.e. the frequency of use for each feature) is not necessary, since it does not influence final feature/system acceptance. Using a feature analysis survey as prototyped here, the functionality found to be the most important contributes the most to whether the system is evaluated as universally designed or not.
¹ Used by: <25% = score 1 ‘Nice to have’, 25-50% = score 4 ‘Desirable’, 50-75% = score 6 ‘Highly desirable’ and >75% = score 10, defined as ‘Mandatory’.
The assessments of the features with the highest importance – those that are frequently used – carry a higher weight and thus influence the total DAISY usefulness percentage (relative to the possible maximum score) more than those with a low importance weight.
In addition to overall results, one can also extract information about which features
or feature sets contributed most to the system being, or not being, suited for a certain
user group. The quantitative data that emerge make it easy to compare features and
feature sets both internally within a user group and between user categories, as well as
the total scores. When looking at these low-level data, one may fairly easily deduce why a specific feature received a low score. Depending on the items in the questionnaire, and the level of detail of the evaluation form, one may be able to pinpoint exactly why
a feature, feature set or system did or did not reach its anticipated threshold for acceptance. The authors found the quantitative low-level feature information to be very
valuable for improving the system in question, providing information on usefulness
and suitability for a specific user group.
4.2 Feature Analysis Expert Evaluation
The survey strongly indicated that desired improvements of DAISY are linked to the
improvement of the playback software interfaces and their usability. The three most
common playback software applications according to the survey and the Norwegian DAISY environment, both freeware and proprietary, were chosen for feature analysis expert evaluations. Merging feature analysis into a traditional expert evaluation has not been described in the feature analysis literature [10]. Since feature analysis strategies aim at
supporting the planning and execution of unbiased, dependable evaluations [8], the
expert evaluation is strengthened by applying feature analysis methodology.
In contrast to the survey where limited previous knowledge was needed, this
method requires deeper knowledge of the system or software to be evaluated – since
no users are asked, but instead the expert evaluates on their behalf. In particular,
knowledge of the user groups and their functional and interface requirements is needed, in
addition to which features should be regarded as core features for each user group.
Again, replicating the evaluation for all relevant user categories would provide the
answer to whether the software is universally designed; however, this prototype focuses
on students with reading and writing difficulties and dyslexia. Knowledge gathered
from the survey provided the basis for this expert evaluation. The goal was to uncover
strengths and weaknesses in the software, by measuring how well the software fit the
target user group. The feature list was extended compared to the survey, as features
not suited to survey questions were included. Three categories of features were defined, and within each of the categories, general feature sets were formulated, such as
‘User Interface’ within ‘Usability’ [3] [16] [17] [18] [19]. Feature sets consisted of
more specific and testable level 3 features. Level 3 features were assessed as either
‘Support’ or ‘Important’. Feature sets were assigned importance on a four-level scale
from ‘Mandatory’ to ‘Nice to have’. Importance points were not used. Four levels of
support were defined for ordinal level features, and corresponding scores from 1 to 4,
4 being the top score, were given. For nominal features, a 4 was given for present, and
1 for not-present. The acceptance of a feature set was defined by the acceptance of its
level 3 features. These were explicitly formulated prior to evaluation [12], to ensure
conformance. Acceptance conformity between the two high importance ranks, as well
as the two lower, made the subjectivity involved in deciding on the precise importance of a requirement less influential. Some form of external control of the evaluation was attempted, as for the survey, by separately conducting usability tests and
comparing results to previous experience in the DAISY community [12] [15].
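The acceptance logic described above can be sketched in Python as follows. This is an illustrative reading only, assuming that a feature set is acceptable when all of its level 3 features reach their thresholds; the names, scores and thresholds are hypothetical and not taken from the actual evaluation forms.

```python
from dataclasses import dataclass, field


@dataclass
class Level3Feature:
    name: str
    score: int           # 1-4 for ordinal features; 4 (present) or 1 (absent) for nominal
    threshold: int = 3   # hypothetical acceptance criterion, formulated before evaluation

    @property
    def accepted(self):
        return self.score >= self.threshold


@dataclass
class FeatureSet:
    name: str
    importance: str                  # 'Mandatory' ... 'Nice to have'
    features: list = field(default_factory=list)

    @property
    def accepted(self):
        # assumed reading: a feature set is acceptable when all level 3 features are
        return all(f.accepted for f in self.features)


def tool_accepted(feature_sets):
    """Overall criterion: every Mandatory feature set is acceptably implemented."""
    return all(fs.accepted for fs in feature_sets if fs.importance == "Mandatory")


ui = FeatureSet("User Interface", "Mandatory",
                [Level3Feature("Keyboard-only navigation", score=4),
                 Level3Feature("Adjustable text size", score=2)])
print(tool_accepted([ui]))  # False: one Mandatory level 3 feature falls short
```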
Total scores were calculated by adding the feature points given. The overall criterion for acceptance was that all Mandatory feature sets were acceptably implemented. One tool met this criterion; however, it was not the one with the highest total score, and it was very unstable. Quantitative low-level information was utilized for software comparison and improvement suggestions. Improving the software is viewed as crucial for making DAISY truly fit this user group, and for achieving universal design.
5 Conclusion
There is a need for a practical definition of universal design in software and information systems. This paper aims to inspire a more careful use of the term, and provides input to a suitable definition and measurement. A definition of the term is proposed, linking the general NCSU definition to two main areas, features and users. The main elements of this definition are:
1) It is not always beneficial, or possible, to adapt all software functionality to all
possible users. Focus could be on the core features of a system.
2) Two limitations can be added to the usability definition “for all”. First, there is a
need to categorize users into defined user groups, where the common needs of the different groups are identified. Second, not all users need to use all systems – and often
developers target certain markets and users. This could still be allowed.
Implications, gains and consequences of using the proposed limitations and definition
of “universal design” were discussed, before specifying how this definition may be
utilized in evaluating DAISY. We show how the proposed definition may be converted into measurable terms, by proposing strategies for defining main user groups,
core functionality and acceptable implementation of attributes through applying feature analysis methodology. Two different methods for evaluating DAISY, utilizing
these measurable terms, are outlined. Both measure usability by looking at the implementation of functionality, reflecting that important features should be of high
quality. The first is a feature analysis survey. It successfully demonstrates the possibility of measuring the usefulness of a single system within a user group, instead of using a feature analysis survey as a means to compare systems. How this strategy may be
used to conduct evaluations of universal design in a single system or software, with
limited previous knowledge of the system/software, is explained. Next, feature analysis methodology is successfully applied to a software expert evaluation, showing how
feature analysis can be extended to this area to contribute to more detailed usefulness
evaluations of software. The information gathered from the evaluations gave new
insight into the “fit” of the software for the user group in question, and is currently included in the software advice given to DAISY users, as well as communicated to the
software developers.
The analyses explicitly formulate criteria for the evaluations, clearly state specifications of acceptance and pinpoint criteria that are not fulfilled. They demonstrate
how to evaluate appropriateness for a specific user group using the proposed
definition of universal design, and may easily be extended to all relevant user categories, as the definition specifies, for a full universal design evaluation.
References
1. Stephanidis, C., Akoumianakis, D.: Universal design: Towards universal access in the Information Society. In: CHI 2001 Workshop, pp. 499–500 (2001)
2. Beecher, V., Paquet, V.: Survey instrument for the universal design of consumer products.
Applied Ergonomics 36(3), 363–372 (2006)
3. DAISY Consortium, http://www.daisy.org
4. Kawamura, H.: DAISY: a better way to read, a better way to publish – a contribution of libraries serving persons with print disabilities. World Library and Information Congress:
IFLA, Seoul (August 20-24, 2006)
5. Kerscher, G.: DAISY is. DAISY Consortium (2003)
6. Vanderheiden, G.: Fundamental Principles and Priority Setting for Universal Usability. In:
CUU 2000, Arlington, pp. 24–32. ACM, New York (2000)
7. Masuwa-Morgan, K.R., Burrell, P.: Justification of the need for an ontology for accessibility requirements (Theoretic framework). Interacting with Computers 16, 523–555 (2004)
8. Mace, R.: The Center for Universal Design. North Carolina State University,
http://www.design.ncsu.edu/
9. Kitchenham, B.A.: Evaluating software engineering methods and tools, part 6: Identifying and scoring features. Software Engineering Notes 22(2), 16–18 (1997)
10. Kitchenham, B.A.: Evaluating software engineering methods and tools, part 1: The evaluation context and evaluation methods. Software Engineering Notes 21(1), 11–15 (1996)
11. Kitchenham, B.A.: Evaluating software engineering methods and tools, part 3: Selecting an appropriate evaluation method – practical issues. Software Engineering Notes 21(4), 9–12 (1996)
12. Nes, M.: Appraising and Evaluating the Use of DAISY – For Print Disabled Students in
Primary and Secondary Education. Master thesis, UiO, Oslo (2007)
13. Nes, M., Ribu, K.: Appraising and Evaluating the Use of DAISY: A Study of a Reading
Aid System. NOKOBIT, Tapir, Trondheim, pp. 263–278 (2007)
14. Tollefsen, M., Nes, M.: En sekretær ville løst alle problemer! MediaLT (2006)
15. Huseby Resource Centre, http://www.skolelydbok.no/Avspilling.html
16. NISO: Specifications for the digital talking book. NISO Press (2002)
17. NISO: Specifications for the digital talking book. NISO Press (2005)
18. NISO Working Papers: Digital talking book standards committee – document navigation
features list. NISO Press (2007)
19. Sourceforge.net, http://amis.sourceforge.net/