DAISY – Universally Designed? Prototyping an Approach to Measuring Universal Design

Miriam Eileen Nes¹, Kirsten Ribu² and Morten Tollefsen¹

¹ MediaLT, Jerikoveien 22, 1067 Oslo, Norway
{miriam,morten}@medialt.no
² Oslo University College, St. Olavs plass 4, 0130 Oslo, Norway
Kirsten.ribu@iu.hio.no

Abstract. The DAISY system is currently used as the alternative reading format for print-disabled students in Norway. DAISY is described by many as universally designed. This is an important claim, as it promises suitable learning opportunities for all students. Determining whether the claim holds is therefore important for DAISY, as it is for many information systems. However, methods for evaluating whether a software product is universally designed are lacking. This paper builds on previous work investigating the use of DAISY in Norwegian primary and secondary education, and looks into strategies for evaluating whether DAISY is universally designed. We argue that the term "universally designed" needs to be more strictly defined in order to become applicable to systems development. Further, we propose two related methods that measure to what degree DAISY is universally designed, using feature analysis methodology.

Keywords: DAISY, universal design, evaluation, feature analysis.

1 Introduction

Universal design is being promoted in information technology, and has recently been defined as a criterion in Norway when choosing public information systems. The purpose is to design products for the broadest possible range of users [1] – designing "for all". By doing this, one aims to include the user groups that are often excluded from information systems, thus bridging digital gaps. But without concrete and measurable interpretations of what it means for information systems to be universally designed, one cannot be sure that these intentions will lead to more inclusive design.

The focus of this paper is on methodology for evaluating whether a software or information system is universally designed. Although some early attempts have been made to set up test scales for determining whether artefacts are universally designed [2], there is not yet an established strategy for determining this for software or information systems. This paper suggests definitions and methodology to make the fuzzy term "universal design" measurable.

1.1 DAISY: Digital Accessible Information SYstem

The DAISY system is established in Norwegian schools as the alternative reading format to print. It consists of three interacting "components": the DAISY standard, a DAISY CD book and a playback system [3]. When "the DAISY system" is referred to in this study, we are in fact referring to the functionality of these components as experienced collectively by the user.

DAISY is strongly promoted by several communities as a universally designed alternative to print [4] [5]. However, no published studies evaluating the system back up the claims of universal suitability.

2 Defining "Universal Design"

To be able to measure and determine whether DAISY is universally designed, a suggested definition of the term "universally designed" is first outlined.
From a software engineering perspective, there is a need to arrive at a clear understanding of what universal design of software and information systems means in practice [6] [7], similar to the role WCAG (1.0) plays for accessibility and Web interfaces. We propose linking the general NCSU definition – "Universal design is design of products and environments to be usable by all people, to the greatest extent possible, without the need for adaptation or specialized design" (Ron Mace) – to two main areas, features and users, creating a tailored definition specifying that:

1) It is not always beneficial, or possible, to adapt all functionality to all possible users. Focus could be on making the core features of a software/system usable.

2) Two limitations can be added to the usability definition "for all". First, there is a need to categorize users into defined user groups, where the common needs of the different groups are identified. Second, not all users need to use all systems – and developers often target certain markets and users. This could still be allowed.

2.1 Limitation 1: Focus on Core Features

Designing solutions to fit the needs of all user groups is not trivial. Software/systems are usually designed for the stereotypical user, whereas universal design principles focus on diverse user groups. It is not always beneficial, or possible, to adapt all functionality to all possible users. This paper proposes that universal design within software/system development should be defined to refer to the core functionality, and not necessarily to all extensions. This means a universally designed website may have functionality designed for all users along with elements targeted to specific groups.

Such a definition of universal design is loose enough to increase developers' willingness to design their systems to be "usable for all". It also fits well with known legal regulations on non-discriminating design, putting pressure on developers to attempt a universal design. The consequence is that, for example, an e-mail client could be denoted universally designed if the features for reading and sending mail were universal, even if not all users could use the address book.

2.2 Limitation 2a: Categorize Users and Determine User Group Needs

The universal design strategy is to develop technological products from a distinct perspective, namely one of respecting and valuing the diversity in human capabilities, technological environments and contexts of use [1]. In most cases, however, one cannot possibly design a system to fit every individual. There has to be some kind of categorization of users, where the common needs of a user group are identified – establishing the group requirements to be considered in the design. Examples of categorized user groups are 'blind users' and 'elderly users'.

With this approach, possibilities for system adaptability would be considered beneficial, complementing a generalized solution and contradicting the strict "without-the-need-for-adaptation" clause of the original definition. The consequences would be:

1. Opting for multi-modality and device independence, where add-ons and adaptations may complement the design.
2. Opting for dialogue independence and flexible user interfaces, where for example interaction style and layout can be altered to suit individual needs.

Options for personalization are often considered positive by end-users. Note that if adaptations are vital for use, they should be considered core functionality.
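To make the categorization of users and group requirements discussed above more concrete, the following minimal sketch shows one way such requirements could be recorded during design. The group names and requirements are hypothetical illustrations and are not taken from the DAISY evaluation.

```python
# Minimal sketch: recording categorized user groups and their group requirements.
# All group names and requirements below are hypothetical examples.
from dataclasses import dataclass, field
from typing import List

@dataclass
class UserGroup:
    name: str                                                        # e.g. 'blind users', 'elderly users'
    core_requirements: List[str] = field(default_factory=list)       # must be usable for the group
    beneficial_adaptations: List[str] = field(default_factory=list)  # optional add-ons/personalization

groups = [
    UserGroup(
        name="blind users",
        core_requirements=["keyboard-only operation", "screen-reader compatible navigation"],
        beneficial_adaptations=["adjustable speech rate"],
    ),
    UserGroup(
        name="elderly users",
        core_requirements=["large, high-contrast text option"],
        beneficial_adaptations=["simplified layout mode"],
    ),
]

# If an adaptation turns out to be vital for a group's use of the system,
# it should be moved into that group's core requirements.
for group in groups:
    print(group.name, "->", group.core_requirements)
```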
2.3 Limitation 2b: Allow Targeting of a Specific Audience

Not all users need to use all systems – and developers often target a certain market and user type. This should be allowed. One could therefore also link universal design to a target audience, and specify which user groups the software is designed for. For example, graphical design software could be targeted at visual user groups only, and still be universally designed (fitting all visual user groups).

3 Measuring "Universal Design" in Information Technology

The outlined definition argues for the need to specify user group requirements and to focus on core functionality (what should be usable, for whom and to what extent). To operationalize the definition further so that it can be measured, we suggest applying feature analysis methodology, an approach that fits well with the outlined definition: "Feature analysis is intended to help you decide whether or not a specific method/tool meets the requirements of its potential users" [9].

Measuring universal design is thus based on the notion that if the system/software does not "fit" a target user group, the system/software is not universally designed. From the proposed definition, being "usable" translates to being able to use the core features of the system/software in a satisfying manner. In feature analysis, a system's "fit" or usability is evaluated by looking at how well implemented, with regard to perceived user needs, a derived set of features is. In short, what should "usable" mean for the defined target user groups? Feature analysis thus encourages considering not only the accessibility of functionality when measuring an information system, but also its actual usefulness – which is often overlooked. Universal design should aim for more than universal accessibility – it should also include universal usability.

In feature analysis the "fit" is measured by deriving feature lists, identifying any groups likely to have different requirements, refining the lists, judging the importance of features in relation to each user group, determining how the features should be assessed and identifying an acceptable threshold/score for each feature.

A feature list specifies user requirements and core features of the system for one categorized user group. This forces the developer to clarify which universally designed features will be offered – and what is considered base functionality. A feature list may be hierarchical, decomposing a feature such as "Robustness" into more specific and measurable requirements. The level of detail is up to the evaluator. User involvement is of particular help when identifying user needs – and feature analysis methodology urges it – thus promoting involvement from disabled user groups.

In addition to deciding which attributes should be present, one must also consider the degree to which they should be present. The feature evaluation is done by examining each feature against its assessment scale. A feature may be simple – present or not present – or compound – its quality judged on an ordinal scale. In the assessment scale, each level of support is both textually described and has a related score. Each feature receives the score of the textually specified level of support the evaluator finds most fitting. Thus, the evaluation is quantified. Features at the same level may be grouped into feature sets.
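As an illustration of the feature analysis concepts above – hierarchical feature lists, simple and compound features, and textual assessment scales with related scores – the following sketch is offered. The features, level descriptions and scores are invented for illustration and are not the ones used in the DAISY evaluation.

```python
# Illustrative sketch of a hierarchical feature list with assessment scales.
# Feature names, level descriptions and scores are hypothetical examples.

# Compound feature: quality judged on an ordinal scale, each level textually
# described and tied to a score.
heading_navigation_scale = [
    ("no navigation between headings possible", 1),
    ("navigation between top-level headings only", 2),
    ("navigation between all heading levels", 3),
    ("navigation between headings, pages and phrases", 4),
]

# Simple feature: present or not present.
bookmarking_scale = [
    ("not present", 1),
    ("present", 4),
]

# Feature list for one categorized user group: feature sets decomposed into
# more specific, measurable features (the level of detail is up to the evaluator).
feature_list = {
    "Navigation": {
        "heading navigation": heading_navigation_scale,
        "bookmarking": bookmarking_scale,
    },
    "Robustness": {
        "stability during playback": [
            ("frequently crashes", 1),
            ("occasionally crashes", 2),
            ("did not crash during testing", 4),
        ],
    },
}

def score_feature(scale, chosen_level):
    """Return the score of the textually described level the evaluator finds most fitting."""
    return dict(scale)[chosen_level]

print(score_feature(heading_navigation_scale, "navigation between all heading levels"))  # -> 3
```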
For each feature, for each feature set and for the total, one decides on a threshold for acceptance: what score or level of support is acceptable. Acceptance criteria may be refined to take judged feature importance into consideration. A four-level importance scale is common, ranging from Mandatory through Highly Desirable and Desirable to Nice-to-have [9]. A system may, for example, be defined as "usable" if all Mandatory features reach a certain level of support. Acceptance criteria and assessment scales are specified prior to analysis.

The authors recognize that despite the quantification and operationalization of "universal design", the measurement of universal usability outlined in this paper still depends on many subjective decisions. However, the approach provides a transparent evaluation. One has to specify explicitly what is meant by a product being "universally designed", and is able to do so using the proposed definition. The prototyped strategies outlined below both drew on and compared results to previous research and existing knowledge, to ensure some form of control of the subjective elements of the evaluations. The authors suggest using similar strategies and triangulation for evaluations with high degrees of subjective "best-guessing".

4 Evaluating DAISY

The suggested definition of universal design defined the focus and scope of the investigation of DAISY. Two different strategies for measuring universal design using feature analysis were attempted: a feature analysis survey and a feature analysis expert evaluation. The first attempted to measure the fit of "the DAISY system" and used end-users to evaluate usefulness, incorporating practice and human aspects. The second looked into DAISY software playback systems in particular, thus leaning more towards measuring software than an information system in its widest sense.

4.1 Feature Analysis Survey

The feature analysis survey was applied in order to assess the overall usefulness of DAISY. Kitchenham [9] describes using feature analysis surveys for comparing different systems that have been used in an organization for a while [10] [11]. The DAISY system had been used as a reading tool by the student sample for at least nine months [12]. Measuring the usability of a single system had, however, not yet been described. That approach is prototyped here within one specific user group – students with dyslexia or general reading and writing difficulties – but may be replicated for each relevant user category, each with its own tailored feature list and assessments. Using the approach, one is able to say whether the system is universally designed (the system fits all relevant user groups) or whether some users are marginalized.

Two questionnaires were distributed to almost 600 schools – one targeted at students, and another at their teachers [12] [13]. About 10% replied: 130 students and 67 teachers. A hierarchical feature list was derived based on interviews and expert input [14]. Student users were asked which of the features they used, and to rate perceived usefulness and ease of use for each. Based on the frequencies of use, an importance level and corresponding score were assigned to each feature.¹ The assumption was made that features frequently used within a user group should be considered core features within this group. Thus, only limited pre-knowledge of core features for the users was necessary. No features were identified as being negative. Respondents scored each feature in relation to ease/usefulness on ordinal scales.
Negative user assessments gave negative scores and positive assessments corresponding positive scores; no answer gave the score 0 and thus did not influence the evaluation. Points were added up for each feature and feature set. Importance scores and user assessment scores were multiplied for each feature and feature set, creating total scores. Thus the assessment scale, specifying the points to be achieved in order to be defined as acceptable in terms of usability, took the importance of the feature/feature set into consideration. The thresholds were calculated as percentages of the possible maximum and minimum scores for each importance level. Finally, the importance-weighted scores were added, and whether DAISY was considered fitting for the user group depended on the percentage of total points received in relation to the maximum possible. In the end, DAISY received a total score corresponding to 73% of the total range while the threshold was 75%, i.e. it was not evaluated as universally designed.

¹ Used by <25% of respondents = score 1 (Nice to have), 25–50% = score 4 (Desirable), 50–75% = score 6 (Highly desirable) and >75% = score 10 (Mandatory).

The method proved successful, showing that a feature analysis survey may be used to measure a system's usability and usefulness for a specific user group. The results on "fit" coincided with apprehensions and suspicions in the Norwegian DAISY environment and with indications from previous qualitative studies. Separate items in the questionnaires measured emotional satisfaction and checked for feature completeness. Being able to include additional items (e.g. checking for survey validity) is a major advantage, giving the opportunity to verify the evaluation against previous research. Taking these into consideration, the overall assessment of DAISY was that it is positive for the user group and should be viewed as beneficial.

Since acceptance thresholds scale with how many respondents use a feature (its importance), pre-knowledge of the spread in feature use (whether the sample used all features or only a few, i.e. the frequency of use for each feature) is not necessary, since it does not influence the final feature/system acceptance. Using a feature analysis survey as prototyped here, the functionality found to be most important contributes the most to whether the system is evaluated as universally designed or not. The assessments of the features with the highest importance – those that are frequently used – carry a higher weight, and thus influence the total DAISY usefulness percentage more than those with a low importance weight.

In addition to the overall results, one can also extract information about which features or feature sets contributed most to the system being, or not being, suited for a certain user group. The quantitative data that emerge make it easy to compare features and feature sets, both internally within a user group and between user categories, as well as the total scores. When looking at these low-level data, one may fairly easily deduce why a specific feature received a low score. Depending on the items in the questionnaire and the detail of the evaluation form, one may be able to pinpoint exactly why a feature, feature set or system did or did not reach its anticipated threshold for acceptance.
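To summarize the survey scoring just described, the sketch below implements one simplified reading of it: importance derived from frequency of use (footnote 1), respondent assessments summed per feature, importance-weighted totals, and acceptance judged by the position of the total within the possible score range against a 75% threshold. The symmetric -2..+2 assessment scale and the example data are assumptions made for illustration; the exact scales, features and thresholds in the study differ.

```python
# Simplified sketch of the importance-weighted survey scoring described above.
# Assumes a symmetric respondent assessment scale of -2..+2 (0 = no answer);
# features and numbers are illustrative, not the study's actual data.

def importance_from_use(fraction_using):
    """Footnote rule: importance score derived from how many respondents use the feature."""
    if fraction_using > 0.75:
        return 10   # Mandatory
    if fraction_using > 0.50:
        return 6    # Highly desirable
    if fraction_using > 0.25:
        return 4    # Desirable
    return 1        # Nice to have

def evaluate_user_group(features, threshold=0.75, max_assessment=2):
    """Place the importance-weighted total within the possible score range and
    compare the resulting percentage with the acceptance threshold."""
    total = 0
    maximum = 0
    for assessments, fraction_using in features:
        importance = importance_from_use(fraction_using)
        total += importance * sum(assessments)             # weighted feature score
        maximum += importance * max_assessment * len(assessments)
    minimum = -maximum                                      # symmetric scale assumed
    percentage = (total - minimum) / (maximum - minimum)
    return percentage, percentage >= threshold

# Two hypothetical features: (per-respondent assessments, fraction of respondents using it).
features = [
    ([2, 1, 2, 0, -1], 0.80),   # frequently used -> Mandatory weight
    ([1, 0, 1, 1, 1], 0.30),    # less used       -> Desirable weight
]
print(evaluate_user_group(features))   # -> (0.7, False): below the 75% threshold
```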
The authors found the quantitative low-level feature information to be very valuable for improving the system in question, providing information on usefulness and suitability for a specific user group.

4.2 Feature Analysis Expert Evaluation

The survey strongly indicated that the desired improvements of DAISY are linked to improving the playback software interfaces and their usability. The three playback programs most commonly used according to the survey and the Norwegian DAISY environment, both freeware and proprietary, were chosen for feature analysis expert evaluations. Merging feature analysis into a traditional expert evaluation of a system had not been described in the feature analysis literature [10]. Since feature analysis strategies aim at supporting the planning and execution of unbiased, dependable evaluations [8], the expert evaluation is strengthened by applying feature analysis methodology.

In contrast to the survey, where limited previous knowledge was needed, this method requires deeper knowledge of the system or software to be evaluated, since no users are asked; instead the expert evaluates on their behalf. In particular, knowledge of the user groups and their functional and interface requirements is needed, in addition to knowledge of which features should be regarded as core features for each user group. Again, replicating the evaluation for all relevant user categories would provide the answer to whether the software is universally designed; this prototype, however, focuses on students with reading and writing difficulties and dyslexia. Knowledge gathered from the survey provided the basis for this expert evaluation. The goal was to uncover strengths and weaknesses in the software by measuring how well the software fits the target user group.

The feature list was extended compared to the survey, as features not suited to survey questions were included. Three categories of features were defined, and within each category, general feature sets were formulated, such as 'User Interface' within 'Usability' [3] [16] [17] [18] [19]. Feature sets consisted of more specific and testable level 3 features. Level 3 features were assigned an importance of either 'Support' or 'Important', while feature sets were assigned importance on a four-level scale from 'Mandatory' to 'Nice to have'. Importance points were not used. Four levels of support were defined for ordinal features, with corresponding scores from 1 to 4, 4 being the top score. For nominal features, a 4 was given for present and 1 for not present.

The acceptance of a feature set was defined by the acceptance of its level 3 features. These acceptance criteria were explicitly formulated prior to the evaluation [12], to ensure conformance. Using the same acceptance criteria for the two higher importance ranks, as well as for the two lower, made the subjectivity involved in deciding on the precise importance of a requirement less influential. Some form of external control of the evaluation was attempted, as for the survey, by separately conducting usability tests and comparing results to previous experience in the DAISY community [12] [15].

Total scores were calculated by adding up the feature points given. The overall criterion for acceptance was that all Mandatory feature sets were acceptably implemented. One tool met this criterion; however, it was not the one with the highest total score, and it was very unstable. The quantitative low-level information was used for software comparison and for improvement suggestions.
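The acceptance logic of the expert evaluation can be summarized as in the sketch below: level 3 features scored 1–4 (ordinal) or 4/1 (nominal, present/not present), feature sets accepted when their level 3 features meet their pre-formulated acceptance criteria, and a tool accepted overall when all Mandatory feature sets are acceptably implemented. The feature sets, scores and per-feature acceptance criteria shown are hypothetical examples, not the ones used in the evaluation.

```python
# Simplified sketch of the expert-evaluation acceptance logic described above.
# Feature sets, scores and per-feature acceptance criteria are hypothetical examples.

def ordinal_score(level_of_support):
    """Ordinal level 3 feature: four levels of support, scored 1 (lowest) to 4 (highest)."""
    assert 1 <= level_of_support <= 4
    return level_of_support

def nominal_score(present):
    """Nominal level 3 feature: 4 if present, 1 if not present."""
    return 4 if present else 1

# Each feature set: (importance rank, [(level 3 feature score, acceptance criterion), ...]).
feature_sets = {
    "User Interface":    ("Mandatory",        [(ordinal_score(3), 3), (nominal_score(True), 4)]),
    "Text highlighting": ("Highly desirable", [(ordinal_score(2), 3)]),
    "Skins":             ("Nice to have",     [(nominal_score(False), 4)]),
}

def feature_set_accepted(features):
    """A feature set is accepted when all its level 3 features meet their criteria."""
    return all(score >= criterion for score, criterion in features)

def tool_accepted(sets):
    """Overall criterion: every Mandatory feature set must be acceptably implemented."""
    return all(feature_set_accepted(features)
               for importance, features in sets.values()
               if importance == "Mandatory")

total_score = sum(score for _, features in feature_sets.values() for score, _ in features)
print(tool_accepted(feature_sets), total_score)   # -> True 10 for this example
```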
Improving the software is viewed as crucial for making DAISY truly fit this user group, and for achieving universal design.

5 Conclusion

There is a need for a practical definition of universal design in software and information systems. This paper aims to inspire a more careful use of the term, and provides input towards a suitable definition and means of measurement. A definition of the term is proposed, linking the general NCSU definition to two main areas, features and users. The main elements of this definition are:

1) It is not always beneficial, or possible, to adapt all software functionality to all possible users. Focus could be on the core features of a system.

2) Two limitations can be added to the usability definition "for all". First, there is a need to categorize users into defined user groups, where the common needs of the different groups are identified. Second, not all users need to use all systems – and developers often target certain markets and users. This could still be allowed.

The implications, gains and consequences of the proposed limitations and definition of "universal design" were discussed, before specifying how the definition may be used in evaluating DAISY. We show how the proposed definition may be converted into measurable terms by proposing strategies for defining main user groups, core functionality and acceptable implementation of attributes through feature analysis methodology.

Two different methods for evaluating DAISY, using these measurable terms, are outlined. Both measure usability by looking at the implementation of functionality, reflecting that important features should be of high quality. The first is a feature analysis survey. It demonstrates that it is possible to measure the usefulness of a single system within a user group, instead of using a feature analysis survey as a means to compare systems. How this strategy may be used to conduct evaluations of universal design of a single system or software product, with limited previous knowledge of the system/software, is explained. Next, feature analysis methodology is successfully applied to a software expert evaluation, showing how feature analysis can be extended to this area to contribute to more detailed usefulness evaluations of software.

The information gathered from the evaluations gave new insight into the "fit" of the software for the user group in question, and is currently included in the software advice given to DAISY users, as well as communicated to the software developers. The analyses explicitly formulate criteria for the evaluations, clearly state specifications of acceptance and pinpoint criteria that are not fulfilled. They demonstrate how to evaluate appropriateness for a specific user group using the proposed definition of universal design, and may easily be extended to all relevant user categories, as the definition specifies, for a full universal design evaluation.

References

1. Stephanidis, C., Akoumianakis, D.: Universal design: Towards universal access in the Information Society. In: CHI 2001 Workshop, pp. 499–500 (2001)
2. Beecher, V., Paquet, V.: Survey instrument for the universal design of consumer products. Applied Ergonomics 36(3), 363–372 (2006)
3. DAISY Consortium, http://www.daisy.org
4. Kawamura, H.: DAISY: a better way to read, a better way to publish – a contribution of libraries serving persons with print disabilities. In: World Library and Information Congress: IFLA, Seoul (August 20–24, 2006)
5. Kerscher, G.: DAISY is. DAISY Consortium (2003)
6. Vanderheiden, G.: Fundamental Principles and Priority Setting for Universal Usability. In: CUU 2000, Arlington, pp. 24–32. ACM, New York (2000)
7. Masuwa-Morgan, K.R., Burrell, P.: Justification of the need for an ontology for accessibility requirements (Theoretic framework). Interacting with Computers 16, 523–555 (2004)
8. Mace, R.: The Center for Universal Design. North Carolina State University, http://www.design.ncsu.edu/
9. Kitchenham, B.A.: Evaluating software engineering methods and tools, part 6: Identifying and scoring features. Software Engineering Notes 22(2), 16–18 (1997)
10. Kitchenham, B.A.: Evaluating software engineering methods and tools, part 1: The evaluation context and evaluation methods. Software Engineering Notes 21(1), 11–15 (1996)
11. Kitchenham, B.A.: Evaluating software engineering methods and tools, part 3: Selecting an appropriate evaluation method – practical issues. Software Engineering Notes 21(4), 9–12 (1996)
12. Nes, M.: Appraising and Evaluating the Use of DAISY – For Print Disabled Students in Primary and Secondary Education. Master thesis, University of Oslo, Oslo (2007)
13. Nes, M., Ribu, K.: Appraising and Evaluating the Use of DAISY: A Study of a Reading Aid System. In: NOKOBIT, pp. 263–278. Tapir, Trondheim (2007)
14. Tollefsen, M., Nes, M.: En sekretær ville løst alle problemer! [A secretary would have solved all problems!]. MediaLT (2006)
15. Huseby Resource Centre, http://www.skolelydbok.no/Avspilling.html
16. NISO: Specifications for the digital talking book. NISO Press (2002)
17. NISO: Specifications for the digital talking book. NISO Press (2005)
18. NISO Working Papers: Digital talking book standards committee – document navigation features list. NISO Press (2007)
19. SourceForge.net, http://amis.sourceforge.net/