Deliverables Report IST-2001-33310 VICTEC

Evaluation Methodology

AUTHORS: Carsten Zoll, Rui Falcao, Nisa Silva, Lynne Hall, Polly Sobreperez, Sandy Louchart, Sarah Woods, Sybille Enz, Harald Schaub
STATUS: Final
CHECKERS: Nuno Otero, Sandy Louchart
Deliverable 2.2.2/final

PROJECT MANAGER
Name: Ruth Aylett
Address: CVE, Business House, University of Salford, University Road, Salford, M5 4WT
Phone Number: +44 161 295 2922
Fax Number: +44 161 295 2925
E-mail: r.s.aylett@salford.ac.uk

TABLE OF CONTENTS
0. Purpose of Document
1. Executive Overview
2. Introduction
3. Evaluation of Toolkit
4. Evaluation of Demonstrator
4.1 Technological Evaluation
4.2 Usability Evaluation
4.3 Pedagogical Evaluation
4.4 Psychological Evaluation
5. Conclusion
6. References

0. Purpose of Document

This document gives an overview of the intended evaluation of the software generated by the VICTEC project team. Two pieces of software will be created. The first is the generic Toolkit, a program with which it will be possible to construct virtual 3D environments containing autonomous and empathic agents. The document describes the goals and methods of the Toolkit evaluation, focusing on usability aspects. The second program is the Demonstrator, which uses autonomous and empathic agents in a virtual environment for PSE (Personal and Social Education) purposes. The aim is to use the Demonstrator in schools with pupils aged 8 to 12 to teach them about bullying and, specifically, to support anti-bullying education. The evaluation of the Demonstrator focuses on usability, technological, psychological and pedagogical aspects. For each of the two applications – Toolkit and Demonstrator – the document establishes evaluation methods and proposes assessment instruments.
As far as the final evaluation methodology is concerned (it will be described in detail in D7.1.1), the proposed instruments still have to be further tested and investigated.

1. Executive Overview

In this document we describe the ideas for the evaluation of the software programs generated within the VICTEC project. The evaluation will take place both during and at the end of the project. Two peer-reviewed documents reporting the results of the evaluation will be published: D7.2.1, Evaluation of Demonstrator in schools (to be delivered in month 28), and D7.3.1, Evaluation of Toolkit (to be delivered in month 30). Since this is the first document concerning evaluation, some of the evaluation methods and instruments proposed here may change before the final evaluation methodology is reported in D7.1.1, Operational evaluation methodology (to be delivered in month 17). This document focuses on the goals and strategy of the evaluation; a more detailed description of the diagnostic methods and instruments will follow in D7.1.1. Thus, this document describes the content of our evaluation and how we intend to achieve our goals. The document is divided into two parts: the evaluation of the Toolkit, a generic software program that enables users to create empathic and autonomous agents in virtual environments, and the evaluation of the Demonstrator, a software program for children aged eight to twelve which is situated in the PSE context and aims to teach children about problems concerning bullying. The evaluation of the Toolkit is based on the ISO 9241-11 standard, which suggests that measures of usability should cover effectiveness, efficiency and user satisfaction. To meet this goal four different methods are proposed: usability inspections, quality assurance testing, usability testing and user satisfaction ratings.
Evaluation of Toolkit
Goals: Evaluation of different criteria of usability based on ISO 9241-11: effectiveness, efficiency, satisfaction
Methods: Usability inspections; quality assurance testing; usability testing; user satisfaction ratings

The evaluation of the Demonstrator demands a greater effort, since it focuses on four different aspects (technological aspects, usability, psychological effects and pedagogical effects) and parts of it have to be carried out in all countries participating in the VICTEC project in order to detect cross-cultural differences among pupils and to ensure the cross-cultural applicability of the Demonstrator.

Evaluation of Demonstrator: technological evaluation, usability evaluation, pedagogical evaluation, psychological evaluation.

The technological evaluation focuses on the stability and compatibility of the Demonstrator, and on the minimum technical requirements (hardware and software) necessary to ensure optimal performance of the product.

Technological Evaluation of Demonstrator
Goals: Stability; compatibility; minimum hardware requirements
Methods: Questionnaire for hardware and software; bug grid for the detection and removal of bugs

Like the evaluation of the Toolkit, the usability evaluation of the Demonstrator is based on the ISO 9241-11 standard and covers the quality of the interaction system, the ease of use of interface objects, and the satisfaction and engagement of the user. Furthermore, it considers the physical embodiment of the environment and the agents. The usability evaluation is carried out in two stages. The first stage takes place in parallel with the development of the Demonstrator, so that the results of the evaluation can influence the development. The second stage evaluates the final version of the Demonstrator.
Usability Evaluation of Demonstrator
Goals: Measurement of the ISO 9241-11 criteria effectiveness, efficiency and satisfaction; physical embodiment of the virtual environment and agents
Methods:
1. Iterative usability evaluation during the development of the Demonstrator: evaluation of agent appearance, movement and behaviour; heuristic walkthroughs; usability testing
2. Large-scale testing with the completed Demonstrator: logging software; user satisfaction questionnaire; focus groups

The pedagogical evaluation of the Demonstrator focuses on two questions: 1. Do children who have worked with the Demonstrator understand the mechanisms that lead to bullying better than children who have not? 2. Do children who have worked with the Demonstrator show less aggressive behaviour in school than children who have not? A pre-/post-test design with four diagnostic instruments – a teacher rating of bullying behaviour, a bullying questionnaire for children, a pupil test and an empathy questionnaire – aims to answer these questions. Furthermore, the pedagogical evaluation aims to involve the teachers in order to assess and create acceptance for the use of the Demonstrator in schools.

Pedagogical Evaluation of Demonstrator
Goals: Effect of user interaction with the Demonstrator on cognitive and behavioural aspects of bullying; involvement of teachers
Methods: Teacher rating of bullying; bullying questionnaire; empathy questionnaire; pupil test

The psychological evaluation concentrates on specific questions that are important for psychological theory development. One question is whether there are interactions between the work with the Demonstrator and the bullying type. Other research topics focus on the psychological processes underlying the selection of certain coping responses when dealing with the Demonstrator, and on the role of "theory of mind".
Psychological Evaluation of Demonstrator
Goals: Interactions between the work with the Demonstrator and the bullying type; psychological processes underlying the selection of coping responses; role of "theory of mind"
Methods: Bullying questionnaire; Theory of Mind questions; justification questions

2. Introduction

The evaluation of two complex software programs such as the VICTEC Toolkit and the VICTEC Demonstrator is a considerable challenge and has to be planned carefully. For the Demonstrator in particular, a variety of different aspects have to be evaluated, above all technological, usability, psychological and pedagogical aspects. Each of these aspects can be subdivided further: the pedagogical aspects, for example, can be divided into cognitive and behavioural aspects, and the behavioural aspects in turn into bullying and coping behaviour. Because of this multitude of evaluation aspects, a variety of different evaluation methods is necessary. These methods are described in the following chapters, beginning with the evaluation of the Toolkit, where the usability aspects effectiveness, efficiency and satisfaction are in the foreground. The plan for the evaluation of the Demonstrator follows, covering the aspects mentioned above (technological, usability, psychological, pedagogical). In some respects the overview of the evaluation design has to remain abstract, because the evaluation has to be planned while the software development is still in progress. The final, concrete appearance of the evaluation objects is therefore not entirely clear yet, and the evaluation plans have to remain at a level that allows them to be adapted to the concrete final version of the software. For example, the interaction of the user with the Demonstrator is not yet fully specified.
Thus, it is possible to set criteria for the appropriateness of the interaction style, but it is impossible to develop a method to evaluate, say, the mouse control while it is still uncertain whether the interaction will be carried out via the mouse at all. Another reason for some abstractness is that many of the planned evaluation instruments are self-developed (e.g. the user satisfaction questionnaire, the bullying questionnaire and the empathy questionnaire). Work on these instruments is still in progress; they have to be tested and, if necessary, modified on the basis of the test results. The final versions of these instruments are therefore not included in this document; they will be included in D7.1.1 (Operational evaluation methodology). Apart from that, the document provides a complete overview of the aspects of the software programs that will be evaluated and discusses the advantages and disadvantages of the proposed evaluation procedures in detail.

3. Evaluation of Toolkit

The definition of usability used in the VICTEC project is provided by ISO 9241-11 (International Standards Organisation, 1998): "The effectiveness, efficiency and satisfaction with which specified users can achieve specified goals in particular environments". ISO 9241-11 suggests that measures of usability should cover:
- Effectiveness: the ability of users to complete tasks using the system, and the quality of the output of those tasks; the ease with which the user can achieve the goals that the system was intended to support.
- Efficiency: the level of resources consumed in performing tasks; whether the goals can be achieved with acceptable levels of resources, be it mental energy, physical effort or time.
- Satisfaction: users' subjective reactions to using the system, the aim being for users to have a positive, enjoyable, satisfying experience in accomplishing their goals.
Effectiveness, efficiency and satisfaction are closely linked.
An effective application enables the user to perform their activities in an efficient manner. Similarly, users are more likely to be satisfied with an application that lets them perform their tasks effectively and efficiently. Our aim in VICTEC is to evaluate each of these measures of usability. To meet this aim a range of different methods (usability inspections, quality assurance, usability testing and user satisfaction rating) will be used to evaluate the Toolkit, seeking to obtain different perspectives on its usability. The overall framework for the early evaluation is provided by Nielsen's Discount Usability Engineering (Nielsen, 1994b), a low-cost, rapid method for evaluating interfaces. It supports the evaluation of interfaces at different levels of fidelity (from paper-based prototypes to stable products) using a range of techniques that enable the usability of the interface to be explored (Rudd, Stern, & Isensee, 1996; Snyder, 1996). The method also permits usability evaluation with only a few (3-5) experts, where the expertise is in usability. Nielsen has provided considerable evidence (Nielsen, 1992; Nielsen, 1993; Nielsen, 1994b) that a few experts will pick up the majority of problems, particularly where domain experts and usability experts collaborate in the evaluation. Discount Usability Engineering has a number of characteristics that make it appropriate for the Toolkit evaluation in VICTEC:
- fast
- needs only a small number of evaluators/users
- low cost
- useful for the iterative, rapid prototyping approach
- flexible, permitting the use of a mixture of appropriate methods, techniques and tools

Proposed Methods and Techniques

The proposed methods and techniques are summarised in Table 1; further detail relating to their selection and application is provided in the following sections.
Method | Involves | Identifies | Occurs
Usability Inspections | HCI experts | Gross usability problems and solutions; potential usability problems (typically related to interaction mechanisms) | Before user testing
Quality Assurance | HCI expert, developer | Potential usability issues and problems (typically related to functionality) | Before user testing
User Testing | Users, facilitator | Refined/specific usability issues and problems | At the same session as the user satisfaction rating
User Satisfaction Rating | Users | General usability problems | After user testing

Table 1: Summary of Methods and Techniques for the Toolkit Evaluation

It is envisaged that the usability evaluation will occur as an iterative process closely linked to the development of the Toolkit; Figure 1 outlines this process.

Figure 1: The Usability Evaluation Process (the current version of the Toolkit passes through usability inspection, quality assurance, user testing and user satisfaction rating; the resulting prioritised usability recommendations feed back into Toolkit development)

The usability evaluation will occur throughout Toolkit development, with considerable input during the prototyping phase. Table 2 identifies the probable dates and locations of the evaluation activities for the prototyping phase and the first version of the Toolkit. The usability activities required for subsequent versions of the Toolkit will be identified in response to this first evaluation. However, it is likely that a further two evaluations (following the same pattern as that for Version 1) will occur in November 2003 and April 2004.

Usability Activities during Prototyping
Date | Location | Activity | Participants
31/01/03 | Sunderland | Usability inspection | 2-4 experts
February | Salford | Quality assurance | 2-4 experts/evaluators
February | Salford | User testing | 2-4 user testers
Similar activities will occur until Version 1 is stable.
Usability Evaluation of Toolkit Version 1
Date | Location | Activity | Participants
07/05/03 | Sunderland | Usability inspection | 2-4 experts
09/05/03 | Salford / INESC | Quality assurance | 2-4 experts/evaluators
14/05/03 | Sunderland | User testing | 20 user testers

Table 2: Evaluation Activities

3.1.1 Usability Inspections

Usability inspection methods are used by usability experts to evaluate, from an expert standpoint, whether the application meets the usability criteria that have been set for it. Usability experts have considerable knowledge of human-computer interaction and user interface design, coupled with extensive practitioner experience, enabling them to identify potential user interface problems and to suggest possible solutions or alternatives. The usability inspection within VICTEC involves heuristic evaluation: "A method of usability evaluation where an analyst finds usability problems by checking the user interface against a set of supplied heuristics or principles." (Lavery, Cockton, & Atkinson, 1997) Heuristic evaluation is an informal usability inspection technique involving hands-on experience with the application. The heuristics used are derived from an analysis of usability problems (Nielsen, 1994a, 2002) and are based on principles, guidelines and rules of thumb. An example is Nielsen's heuristic for visibility of system status, which states: "The system should always keep users informed about what is going on, through appropriate feedback within reasonable time." Within the heuristic evaluation the expert determines how appropriate the visibility of system status is – effectively, whether the level of system visibility has a negative, neutral or positive impact on application use. Whilst the heuristics used for the Toolkit will be based mainly on existing heuristics, some modification and refinement will be needed to tailor them to the usability requirements of the Toolkit.
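As an illustration of how the results of such a heuristic evaluation could be recorded and aggregated across experts, the sketch below uses Nielsen's ten well-known heuristics together with the negative/neutral/positive impact scale mentioned above. The data structure and function names are illustrative assumptions, not part of the Toolkit or of the VICTEC evaluation instruments.

```python
# Recording heuristic-evaluation ratings from several usability experts.
# Each expert rates the impact of the interface on each heuristic as
# "negative", "neutral" or "positive"; ratings are averaged per heuristic.

NIELSEN_HEURISTICS = [
    "Visibility of system status",
    "Match between system and the real world",
    "User control and freedom",
    "Consistency and standards",
    "Error prevention",
    "Recognition rather than recall",
    "Flexibility and efficiency of use",
    "Aesthetic and minimalist design",
    "Help users recognize, diagnose, and recover from errors",
    "Help and documentation",
]

IMPACT = {"negative": -1, "neutral": 0, "positive": +1}

def summarise(ratings):
    """Aggregate per-expert impact ratings into a mean score per heuristic.

    ratings: a list with one dict per expert, mapping heuristic name to
    "negative", "neutral" or "positive". Heuristics no expert rated
    are reported as None.
    """
    summary = {}
    for heuristic in NIELSEN_HEURISTICS:
        scores = [IMPACT[r[heuristic]] for r in ratings if heuristic in r]
        summary[heuristic] = sum(scores) / len(scores) if scores else None
    return summary
```

A summary produced this way can then be prioritised in the group debriefing session described below.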
The heuristic evaluation will be performed by a group of usability experts (in this case based at the University of Sunderland). It involves an individual evaluation (largely unstructured and informal, allowing the expert to walk through any area of the Toolkit) that typically takes 1-2 hours, followed by a group debriefing session, where usability problems are further considered and prioritised. Like other methods within discount usability, heuristic evaluation is quick, cheap, useful and effective. The output from heuristic evaluation is provided in a form that enables the selection of further implementation activities; where the output is prioritised and costed it is relatively easy to determine the time required to repair usability issues. A possible negative effect of heuristic evaluation is the identification of non-existent problems that may actually be irrelevant to the usability of the product. The tendency for this to happen can be reduced by using several experienced evaluators, as will be the case in VICTEC. This should avoid the tendency to evaluate the wrong issues or to focus on problems that are easily understood and solved. Additionally, false alarms and spurious problems will be rejected during the quality assurance session.

3.1.2 Quality Assurance Testing

Quality assurance (QA) testing checks the functionality of the product, ensuring that it supports users in their tasks. This is achieved through the usability expert exploring the product following prepared test scripts that focus on the main tasks of the Toolkit. The quality assurance is performed by a usability expert and a member of the development team. The QA session should take place after the usability inspection and before the usability testing. QA sessions can help to identify, modify and reject usability problems that have been identified in the usability inspection.
The rejection of problems is of particular importance, as this stops a concentration on non-problems (Cockton & Woolrych, 2001) and allows future development to be prioritised for the most severe usability problems. QA also aids in structuring the usability testing, allowing a focus on tasks that have been identified as problematic. Although expert walkthroughs coupled with heuristic evaluation can identify many usability problems, they can miss severe ones, particularly those related to unsupported areas of work. Such problems can be identified through usability testing.

3.1.3 Usability Testing

Usability testing of the Toolkit involves watching its intended users use it, in order to discover the ways in which the product aids or hinders them in reaching their goals. This approach is based on the work of Nielsen (Nielsen, 1993) and Rubin (Rubin, 1994). Usability tests are conducted in a controlled setting, and the user is asked to attempt a set of predefined tasks. The test group of users should reflect the intended user population and should not be part of the development team, nor should they be usability experts. Usability testing involves users being observed performing specific tasks with the Toolkit within a specific context. The users are watched by usability experts, with the focus of the evaluation being the effectiveness of the product; a product cannot be considered usable unless users can perform tasks efficiently and effectively. Whilst opinions of the product are important and any comments will be noted, in the usability testing we are looking for the obstacles that hinder user progress in their work. The situation in which users will be placed will be that of a novice user, working individually (i.e. with no other users).
The training and support given to users will be that which we intend to provide with the Toolkit, the intention being to replicate the real-world situation of Toolkit introduction. Focusing the usability testing on the introduction of the Toolkit to the user is felt to be relevant, as the decision to use software such as the Toolkit is often based on initial exposure to the product. Following Nielsen's Discount Usability Engineering approach, it is possible to test the usability of the Toolkit with only a small user group (Nielsen & Landauer, 1993); Nielsen argues that testing is possible with only 5 users (Nielsen, 2000). This approach requires that all of the users are from the same user group and that only key areas are tested. This restriction allows a focus on the areas identified as having potential usability problems through the usability inspection and quality assurance. It also reflects the significant time constraint on the usability evaluation of the Toolkit, and it avoids wasting resources on evaluating functionality that has been 'signed off' through the other evaluation methods. For example, many of the functions in the Toolkit will be similar to those seen in other applications; QA testing identifies whether such functionality is met, and further usability testing is not required to check cross-application functionality (e.g. file management and basic editing functions). Test users will be selected according to a user profile. This profile will be composed of essential characteristics (such as being experienced developers with knowledge of a range of development environments) and desirable characteristics (such as not being known to the evaluators). Users will be recruited mainly from the educational sector. The demographics of the participants will also be captured through a pre-test questionnaire, although it is unlikely that these will have any impact on performance.
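The small-sample argument above rests on the problem-discovery model of Nielsen and Landauer (1993): the proportion of usability problems found by n test users is 1 - (1 - L)^n, where L is the probability that a single user exposes a given problem (about 0.31 averaged over the projects they studied). A brief worked calculation (the function name is illustrative):

```python
# Expected proportion of usability problems found by n test users,
# following the problem-discovery model of Nielsen & Landauer (1993).
# L is the average probability that one user exposes a given problem.

def proportion_found(n_users, l=0.31):
    return 1 - (1 - l) ** n_users

if __name__ == "__main__":
    for n in (1, 3, 5, 15):
        print(f"{n:2d} users -> {proportion_found(n):.0%} of problems")
```

With L = 0.31, five users are already expected to expose roughly five-sixths of the problems, which is the basis of Nielsen's "5 users" recommendation cited above.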
The tasks used for the usability evaluation must accurately represent the intended actual use of the application and occur within a realistic scenario. These tasks are developed out of the QA session. Task performance is evaluated using SMART (Specific, Measurable, Achievable, Realistic, Time-based) usability criteria. Users will perform the various tasks whilst being observed by a facilitator. They will be encouraged to use a think-aloud protocol (Ericsson & Simon, 1985) to explain what they are doing, to ask questions and to give information. The facilitator will use an interactive style, asking users to expand upon comments and activities. User interaction with the Toolkit will be monitored and logged, providing data on error rates, navigational paths, appropriateness of task structures, etc.

3.1.4 User Satisfaction Rating

Usability is considered to be composed of a number of dimensions (Nielsen, 1994b): learnability, memorability, efficiency, error rates and satisfaction. The satisfaction that a user experiences with an application is considered one of the most important aspects of usability. The level of satisfaction can be determined in a number of ways, including rating scales. Dissatisfied users will tend to stop using the system, so achieving a positive result for this metric is vital. The level of acceptability for satisfaction tends to be set higher than for the other dimensions, as it is such an important factor for system success. User satisfaction will be determined through a post-test questionnaire. This questionnaire will be constructed by merging a number of standard user satisfaction questionnaires, such as Brooke's System Usability Scale (Brooke, 1996) and Lin's Usability Index (Lin, Choong, & Salvendy, 1997), using the approach provided by Perlman (Perlman, 1999).
The aim of this merger is to create a questionnaire that is relevant to the context provided by the application and the intended user group, with questions assuming that the intended users are trained developers rather than the general public.

3.2 Proposed Assessment Instruments

Heuristics: A set of heuristics will be developed to enable the experts to assess the software from a usability perspective. These will be a subset of those identified by authors such as Nielsen (Nielsen, 2002) and will draw on a competitive analysis of similar products (e.g. kar2ouche (Immersive Education, 2001)).

Quality Scripts / Predefined User Tasks and Usability Criteria: Usability criteria will be generated from users; these will provide a scoring mechanism for the evaluation of the Toolkit. A series of scripts and tasks will be provided to enable the evaluators and users to explore various aspects of the Toolkit. The interactions will be linked to the criteria, thus allowing the usability of the Toolkit to be assessed.

User Questionnaire: A questionnaire will be developed to identify the users' perception of the effectiveness and efficiency of the Toolkit and their level of satisfaction with the interaction.

3.3 Work to Date on the Toolkit Usability Evaluation
1. Initial identification of heuristics
2. Initial version of the user questionnaire
3. Current work on a paper prototype to enable evaluation with users; this also results in the generation of usability criteria, quality scripts and tasks.

3.4 Future Work on the Toolkit Usability Evaluation

The main activities for the first version of the Toolkit are listed here. The usability activities required after this period will focus on the usability issues, problems and benefits identified through the evaluation of Version 1 of the Toolkit.
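As background to the satisfaction rating described in section 3.1.4, Brooke's System Usability Scale has a standard scoring procedure: ten items are answered on a 1-5 scale, odd-numbered (positively worded) items contribute their score minus 1, even-numbered (negatively worded) items contribute 5 minus their score, and the sum is multiplied by 2.5 to yield a 0-100 score. A sketch (the function name is illustrative; the VICTEC questionnaire merging SUS with other instruments will have its own scoring):

```python
# Scoring Brooke's System Usability Scale (SUS).
# responses: ten integers in 1..5, in questionnaire order.

def sus_score(responses):
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten responses in the range 1-5")
    total = sum(
        r - 1 if i % 2 == 0 else 5 - r  # items 1,3,5,7,9 sit at even indices
        for i, r in enumerate(responses)
    )
    return total * 2.5  # scale the 0-40 raw range to 0-100
```

For example, a respondent answering 3 ("neutral") to every item scores 50, the midpoint of the scale.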
Activity | Deliverable
31/01/03: Sunderland – low-fidelity usability inspection | Usability recommendations for the Toolkit; quality scripts
Early February: Salford – quality assurance | Refined heuristics for inspections; identification of user-testing tasks/issues; refined recommendations
Early February: Salford – participatory user testing | Qualitative refinement of usability issues; refined recommendations
17/02/03 | Usability Recommendations Report

Similar activities will continue to take place on a monthly or more frequent basis (depending on need) with the current version of the prototype. The User Satisfaction Rating Questionnaire will also be piloted during the user testing activities.

07/05/03: Sunderland – usability inspection of Version 1 | 15/06/03: Toolkit Usability Report
09/05/03: Salford / INESC – quality assurance of Version 1 | 30/06/03: Usability recommendations for the Toolkit
14/05/03: Sunderland – user testing and user satisfaction rating of Version 1 | 15/07/03: Plan for future activities

Table 3: Usability Evaluation Activities for Prototypes and Version 1

3.5 Dissemination of the Usability Evaluation
- Case study
- Feedback to the development team

4. Evaluation of Demonstrator

The evaluation of the Demonstrator is divided into four parts, relating to technological, usability, pedagogical and psychological issues. The main challenge for the present deliverable is that the evaluation methodology is being developed in parallel with the software program that it has to evaluate. This means that the appearance and some of the functionality of the Demonstrator are not yet clear in all details, so a certain level of abstraction remains on a few issues. These issues will be put in concrete terms in deliverable D7.1.1 (Operational Evaluation Methodology).
4.1 Technological Evaluation

4.1.1 Goals of the technological evaluation

The technological evaluation aims to define how the program behaves on different systems: on which machines and on which types of hardware, firmware and software it runs, which plug-ins are needed, what the best screen resolution is, and how to ensure optimal performance. This is a fundamental pre-launch step, since it guarantees consistency with existing solutions and tests compatibility and technological robustness. It is also important for defining the minimum requirements of the product and for deciding on the best way of explaining the installation process and how the product works.

4.1.2 How many tests, and which sample characteristics?

It is important to test the product with different sub-samples of participants:
- People external to the project, who have never seen the product and have no knowledge about it. The rationale for employing this group of users is that programmers and software designers sometimes tend to use the system in stereotypical ways, for example by following certain navigation paths that reflect their own knowledge of the system. This makes them unaware of alternatives. Novices will view the product from a different perspective: things that are obvious to programmers may seem strange or unintuitive to other people. This is important for identifying bugs and for implementing modifications (buttons, explanations or animations) where necessary.
- Participants with different levels of technological experience. Two categories will be considered: 1) users who deal with computers regularly and 2) users who deal with computers rarely. These two groups approach applications in different ways, so it is essential that the Demonstrator is easy to use for both.
- All potential product users, which for the VICTEC project include teachers, children and parents.

4.1.3 What kinds of technological tests are essential?

It is important to carry out tests that cover different operating systems with different hardware and software, different types of browsers and different users. On the one hand, tests should cover as many usage alternatives as possible in order to anticipate bugs. On the other hand, it is important to define the parameters relevant to product performance and to avoid superfluous testing. All of the following parameters influence product performance:
- Operating systems
- Hardware
- Screen resolution
- Browsers
- Plug-ins

4.1.4 The phases of the technological evaluation

The technological evaluation starts at the end of the first phase of development, when the development team regards the product as finished and delivers it for testing. At this point, copies of the product, debugging grids and questionnaires are distributed to beta testers. Over a period of between two weeks and one month, depending on the product's size, the beta testers and the development team find and correct bugs until they are confident that the product is really finished and ready to use.

4.1.5 Example of a specific questionnaire related to software and hardware

1. Which of the following operating systems are you currently using?
Windows 95 or 95 SR2 / Windows 98 or 98 SE / Windows ME / Windows NT Workstation / Windows NT Server 4.0 / Windows 2000 Professional / Windows 2000 Server / Windows XP / Mac OS 9 / Mac OS X / Linux / Other operating system (please specify) ________________________

2. Which of the following hardware components are available on your computer?
Intel Pentium, Celeron, II, III, IV or higher / AMD Duron, Athlon / Macintosh / Video and audio hardware / Web cam / CD-ROM drive / DVD drive / Microphone / Other hardware (please specify) ______________________

3. What display resolution are you currently using?
640 x 480 px
800 x 600 px
1024 x 768 px
1280 x 1024 px

4. Which browser are you using?
Internet Explorer 4.x
Internet Explorer 5.x
Internet Explorer 6.x
Netscape 4.x
Netscape 6.x
Other browsers (please specify) __________________

5. Which of the following plug-ins are available on your computer?
Macromedia Shockwave Director
Macromedia Shockwave Flash
Apple QuickTime Player
RealPlayer
Adobe Acrobat

6. Which type of Internet access do you have?
56K modem
ISDN 64K/128K
ADSL/Cable
Specialised connections
Other (please specify) __________________________

4.1.6 Grids of tests

Grids can be prepared to help beta testers with their task. These grids (see below) guarantee that all bugs occurring during the beta test phase are corrected. The first grid provides information about the system on which the bug occurred. The second grid contains a detailed description of the bug, information on its importance, and a target date for correcting it. Each tester must complete the two grids. The project coordinator must collect and organise the information from all beta testers. This can also be done with a shared document, allowing people to do the evaluation work simultaneously.
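Purely for illustration (the project does not prescribe any particular implementation), the grid contents just described could be mirrored in simple shared records that a coordinator can aggregate; all field names below are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical record mirroring the first grid: the test system's configuration.
@dataclass
class TestSystem:
    computer: str                 # e.g. "office PC 3"
    operating_system: str         # 95 / 98 / ME / NT / 2000
    resolution: str               # 640 / 800 / 1024 / 1280
    browser: str                  # IE5 / IE6 / NET4
    plugins: list = field(default_factory=list)  # Flash / Director / QT

# Hypothetical record mirroring the second grid: one bug report from a beta tester.
@dataclass
class BugReport:
    section: str                  # part of the product where the bug occurred
    what_happened: str            # detailed description of the bug
    frequency: str                # "always" / "once" / "random"
    kind: str                     # "crash" / "graphic display" / "sound"
    priority: int                 # importance: 1 (highest) to 3
    error_message: str = ""
    found_by: str = ""            # who found it, when, on what computer
    assigned_to: str = ""         # who will correct it
    correct_by: Optional[str] = None  # target date for the fix

def sort_by_priority(reports):
    """The coordinator can order all testers' reports by importance."""
    return sorted(reports, key=lambda r: r.priority)
```

A shared document built around records like these would let several testers contribute simultaneously, as suggested above.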
GRID 1 – Description of the hardware and software used in the test
Computer | Operating System (95 / 98 / ME / NT / 2000) | Resolution (640 / 800 / 1024 / 1280) | Browsers (IE5 / IE6 / NET4) | Plug-ins (Flash / Director / QT)

GRID 2 – Description of a bug
Section | What happened (description) | How many times (always / once / random) | Kind of problem (crash / graphic display / sound) | Priority (1 / 2 / 3) | Error message | Who will correct it | Correct on (date) | Who found it, when, and on what computer

4.2 Evaluation of the Demonstrator - Usability Evaluation

4.2.1 Aims of the Usability Evaluation

The definition of usability used in the VICTEC project is provided in section 3. Although usability is a key quality of any interactive system, it is typically only noticed in a negative context, for example when system features and behaviour obstruct usage, learning or satisfaction (Cockton, Lavery, & Woolrych, 2003). This obstruction can be subtle and may be related to a number of factors, including domain and task issues, semiotic and interaction mechanism expectations, human activity structures and resources, etc. The usability evaluation of the VLE aims to consider a range of different areas, using a variety of techniques to gain relevant data. Ensuring the usability of the VICTEC Demonstrator is not a single, static activity; rather, it is an ongoing process with a number of iterations. The participatory design approach (Gould, Boies, & Clayton, 1991) that guides the VICTEC development process requires early, regular and continuous input from users throughout the design and implementation phases, thus allowing users to make a significant contribution to the final product. This will be achieved by using the most appropriate approaches, methods and techniques from usability and incorporating the results and recommendations from the usability evaluations into subsequent iterations of the VLE.
This approach permits the usability of the VLE to become an intrinsic and almost invisible aspect of the application rather than a bolted-on interface added at the final moment. The need to include users in the development process as well as in the final evaluation requires the use of a range of complementary usability techniques at a number of different points within the VLE development lifecycle. The major aim of the usability evaluation is to determine whether the VLE is usable by the intended primary user group, children. Although there has been an increase in the awareness of and need to focus on children as a special user group (Brouwer-Janse et al., 1997), the majority of currently available user-centred research has focused on the development of applications for the adult population. However, an increasing number of studies, such as NIMIS (Brna, Martins, & Cooper, 1999; Cooper & Brna, 2000; Lieberman, 1999), support the view that evaluation with children can be achieved by slightly modifying traditional usability techniques to focus on this particular user group. A wide range of usability techniques are available (Nielsen, 1993; Nielsen, 2000) and these can be modified for use depending on the situation and the aspects being evaluated. Usability testing techniques (Rubin, 1994) such as monitoring of behaviour, observation of user activity and user feedback are appropriate for children, using relatively standard techniques with some change in focus to facilitate elicitation (Hanna, Risden, & Alexander, 1997). Usability is only one part of the VLE evaluation process, with psychological and pedagogical evaluation activities also occurring. Within the usability evaluation, the focus is on ease of use and user satisfaction, that is, the effectiveness and efficiency of the user interface components and interaction mechanisms in enabling the user to have satisfying and positive interaction experiences.
Within the usability evaluation the questions we are seeking to answer are:

1. Do the input and output mechanisms of the Demonstrator enable the user to interact effectively, efficiently and enjoyably?
2. Is the interaction with the Demonstrator a satisfying and positive user experience?

4.2.2 Issues impacting on method and technique choice

The usability evaluation for the Demonstrator has two distinct sections: firstly, the evaluation that occurs as part of the development process (iterative usability evaluation) and secondly the evaluation of the completed Demonstrator (large-scale user testing). The selection of approaches, techniques and measurement instruments was based on the issues that should be considered when modifying techniques for children (de Vries, 1997; Hanna et al., 1997), see table 1.

Issue: Lack of awareness of software application potential
Tools / Techniques: Scenarios are to be provided that should allow exploration of all key functional areas of the Demonstrator. Logging software will help to identify areas of missed software potential.

Issue: Communications difficulties (including low literacy levels)
Tools / Techniques: Logging software does not require communications with the user. Storyboards and other non-text based stimuli will be used within focus groups and questionnaires.

Issue: High willingness to agree with the analyst
Tools / Techniques: Logging software is impartial. Questionnaires do not require agreement. In focus groups it is often possible for users to state opinions they would not state in an interview situation.

Issue: Potentially limited social and interaction abilities
Tools / Techniques: Logging software will identify interaction ability problems and permits the logging of behaviour without requiring any social interaction or engagement with the analyst. Questionnaires require limited interaction.

Issue: Comfortable, safe, secure, known context
Tools / Techniques: All evaluation activities are to occur within the classroom.
Table 1: Issues to be considered for tool selection

4.2.3 Proposed methodology for the Usability Evaluation

4.2.3.1 Iterative Usability Evaluation

The approach to be taken during early design and implementation is based on the use of low-fidelity techniques (Nielsen, 1993; Snyder, 1996) such as scenarios, storyboards, paper prototypes and screenshots. These techniques are associated with Discount Usability Engineering (Nielsen, 1994), which was discussed in section 3. The use of these techniques and their related tools will result in the design of appropriate interaction mechanisms and interface components created quickly and at low cost. The suitability and appropriateness of these mechanisms and components will be assessed by transforming them into a high-fidelity prototype, using software such as Kar2ouche (Immersive Education, 2001), with the agent interaction simulated using a Wizard of Oz technique (Maulsby, Greenberg, & Mander, 1993; NECTAR, 2001). Wizard of Oz is a technique used to present advanced interaction concepts to users. Basically, the user interacts with what appears to be a computer system but is in fact a simulation provided by either a human (referred to as the wizard) or the combination of a human and a computer. The wizard processes input from the user and emulates system output. The aim is to demonstrate computer capabilities which do not yet exist, whether for technical reasons or for lack of resources. The technique derives its name from the character in the film The Wizard of Oz, who everyone thought was a tall, imposing 'statue' when in fact he was a small man who controlled the 'statue' from behind a curtain. This technique is simple and flexible and can be used to explore a range of usability issues throughout the development lifecycle.
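As a minimal sketch of the Wizard of Oz set-up just described (purely illustrative; the class name and scripted replies are invented for this example), the child appears to converse with an autonomous agent while the replies are in fact prepared by a hidden human wizard:

```python
class WizardOfOzAgent:
    """Simulated 'agent' whose replies are scripted by a hidden human
    wizard, so the interface can be tested before the real agent exists."""

    def __init__(self, wizard_script):
        self._script = list(wizard_script)  # replies prepared by the wizard (invented examples)
        self.log = []                       # interaction log for later usability analysis

    def respond(self, user_input):
        # The wizard 'processes' the input and emulates system output.
        reply = self._script.pop(0) if self._script else "(wizard improvises)"
        self.log.append((user_input, reply))
        return reply

# Example session: the child believes they are talking to the agent.
agent = WizardOfOzAgent([
    "Hello! What happened in the playground today?",
    "That sounds unfair. What could John do next?",
])
print(agent.respond("Hi"))
print(agent.respond("Luke pushed John over"))
```

The interaction log kept by such a simulation is the kind of data the usability analysis can later draw on, without the application itself needing to be stable or complete.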
As this technique does not require the application under development to be stable or complete, it is highly suitable for a prototyping approach. Due to the possibility of simulating advanced interactions, the Wizard of Oz technique is highly applicable to interfaces for intelligent systems which feature agents, advisors and/or natural language processing (NECTAR, 2001). Within the usability evaluation the agent is evaluated in terms of user satisfaction, and high satisfaction is likely to occur if the user empathises with the agent and enjoys their experience with the VLE. Three levels of evaluation will occur, focused on agent appearance, movement and behaviour. The agents are to be evaluated using an evaluation grid that is under development. This grid provides a set of themes that will be used to evaluate user experience with the agent. Each of the themes, for example believability, will be represented by a set of heuristics (see section 3.1) that will be used to score the agents on a range of usability criteria. The grid is the communications vehicle for this evaluation. The criteria will initially be based on an extensive literature review of embodied agents and usability evaluation. They will then be tailored to the VICTEC project as we gain greater awareness of issues such as empathy. Various iterations of the evaluation of the interface elements of the Demonstrator will occur. Each iteration will commence with a quality assessment of the Demonstrator, performed through heuristic walkthroughs (Sears, 1997), ascertaining that the prototypical VLE is sufficiently robust to permit testing with end users. 4-6 children will then evaluate the prototypical Demonstrator, attempting specific tasks that will allow various usability aspects of the Demonstrator to be explored, evaluated and improved.
After several iterations this will result in the emergence of the final Demonstrator that will be used for the large-scale user testing described below.

Location
In evaluating products with children, procedures need to be employed so that children feel readily able to participate in the process and, additionally, feel comfortable with the evaluation procedures themselves. A number of authors (Brouwer-Janse et al., 1997; Hanna et al., 1997) suggest that a suitable context for children is that of the classroom, and the iterative usability evaluations within VICTEC will be set within this context.

Date
The iterative approach to evaluation will involve a number of different evaluation dates that will emerge in response to changes and modifications of the Demonstrator.

Sample Size
Only small numbers of evaluators are required, with one expert evaluator (for the heuristic walkthrough) and 4-6 children in each country (Portugal, U.K., Germany).

4.2.3.1.1 Proposed Assessment Instruments

Heuristic Walkthrough
Specific, critical tasks will be walked through as part of the heuristic evaluation, focusing the attention of the HCI expert on specific aspects of the interface.

Usability Testing
Children will evaluate the interaction mechanisms by performing a series of typical tasks with the VLE. The debriefing from this evaluation will be achieved using focus groups. Focus groups (Gorman & Clayton, 1997) were selected for eliciting views, expectations and needs from the user group as it was felt they would result in the maximum amount of quality data. In addition, this technique has previously been used successfully with children (de Vries, 1997). Focus groups allow a range of perspectives to be gathered in a short time period in an encouraging and enjoyable way. The satisfaction of the participants is felt to be of considerable importance, and focus groups tend to be enjoyable, stimulating experiences.
The principal alternative to the focus group for this study was interviewing; however, focus groups were selected as they are non-threatening, social, and can result in considerable input into the design process. Focus groups enable children to become integrated into the design process in a way that is both efficient and effective, and they enable a determination of how children interact with their environment. A further benefit of focus groups is that they permit a significant degree of flexibility in their application. Different groups of children respond in different ways and may have preferred styles of interaction with the evaluator: whilst some children may choose to discuss the application, others may wish to use paper and pencils.

4.2.3.2 Large-Scale User Testing with the Completed Demonstrator

The large-scale user testing will occur at the same time as the psychological evaluation:

Location: University of Hertfordshire
Date: June 2004
Sample size: 400 children, aged 8-12
Table 2: Data for large-scale user testing

4.2.3.2.1 Proposed Assessment Instruments

Logging Software
Each interaction will be logged using logging software. This will be used to gather data relating to error rates, navigational paths, use of help, time spent performing tasks, functionalities used, etc. This will help to identify aspects of the Demonstrator that users found difficult to use, used in an unexpected way, or failed to discover. This information will be used to provide recommendations for the Demonstrator and to feed into the evaluation grid.

User Satisfaction Questionnaire
User satisfaction questionnaires will be given to each user. The usability questionnaire is to be answered after the child's interaction with the VLE. A number of questionnaire frameworks have been developed for measuring user attitudes and satisfaction levels during software testing.
The questionnaire design for this project incorporates tried and tested techniques from a number of sources (Brooke, 1996; Lin, Choong, & Salvendy, 1997; Perlman, 1999). All of the questionnaires reviewed have been assessed for reliability in a wide range of environments. Work is still being carried out on the types of satisfaction questions to be asked.

Agent Evaluation Grid
The agent evaluation attempts to evaluate the agent itself and focuses on whether the agent approach is a useful and valid way to expose children to learning about how to develop coping strategies for bullying. The grid is used to analyse relevant results obtained from the other evaluation exercises. The user is not directly exposed to the grid; rather, it is used to aggregate a wide range of data from various sources, including the logging of the interaction.

4.2.4 Work to date on the Usability Evaluation
- Initial version of evaluation criteria for interaction mechanisms created
- Initial identification of heuristics for walkthrough
- Initial version of the agent evaluation grid focusing on agent appearance
The next step is to make firm recommendations for evaluating agent appearance and for the appearance of the agent in the VLE.

4.2.5 Dissemination of the Usability Evaluation
- Presentations at HCI-based conferences
- Publications in peer-reviewed HCI journals
- Feedback in teacher workshops
- Feedback to the development team

4.3 Pedagogical Evaluation

4.3.1 Objectives of the Pedagogical Evaluation

The pedagogical evaluation will investigate the effects of interaction with the Demonstrator on the child users. According to Bortz & Döring (2002), evaluation is a branch of empirical research that deals with the measurement of the effects of actions and interventions.
Most evaluative research deals with the question of whether a certain intervention has an effect on a predefined population and, if so, what type of effect. The intervention used in VICTEC is the interaction of child users with the Demonstrator. The project aims to evaluate both cognitive and behavioural effects. The largest part of the investigation involves the measurement of interaction effects on the bullying behaviour of the child user. A further crucial objective is the measurement of empathy.

Cognitive Effects
The cognitive evaluation aims to determine whether the interaction with the Demonstrator helps the child user to learn something about bullying. Does the child have an idea of how bullying operates after interacting with the Demonstrator? Does the child know which psychological processes lead to bullying situations? Is the child aware of strategies that help to avoid bullying situations and to deal with them? The evaluation of cognitive effects is the most important part of the evaluation. Since the interaction process will be limited in time (see 4.3.3), it is probable that we will find non-significant behavioural effects of the interaction; significant cognitive effects, however, are expected.

Behavioural Effects
The evaluation of behavioural effects deals with the question of whether the interaction leads to differences in the bullying behaviour of the child user. For example, do bullies bully less, and do victims develop and perform strategies that lead to less victimisation?

Emotional Effects
Emotional effects are not the main focus of the evaluation of the Demonstrator, as it cannot be assumed that the interaction with the Demonstrator will strongly change the emotional functioning of the child users. However, there are two main areas in which the emotional impact of the interaction on the child user will be measured: 1) the measurement of the child user's satisfaction with the Demonstrator within the usability evaluation.
If the user is satisfied with the Demonstrator, they should have had positive feelings during or after the interaction. 2) Changes in the empathic reaction of the child user, with emphasis on both cognitive and affective empathy components (see 4.3.4).

Further goals
The pedagogical evaluation's further goals concern the teachers. Liaising closely with teachers will allow the team to determine (see 4.3.2 and 4.3.4) whether they think that the Demonstrator is a useful tool and easy to integrate into the school context, and eventually the school curriculum.

4.3.2 Pre-/Posttest Design

The investigation of the cognitive and behavioural effects will be carried out as a pre-/posttest design consisting of three levels:

Level: 1) Pretest
Methodology: Initial diagnosis of the child users' bullying role (the cognitive and behavioural aspects described above) with four diagnostic instruments (see 4.3.4).

Level: 2) Application of the Demonstrator
Methodology: The child users interact individually with the Demonstrator.

Level: 3) Posttest
Methodology: Final diagnosis of the child users with the same diagnostic instruments as used in the pretest.

Table 1: Methodology for the pedagogical evaluation

According to Perrez & Patry (1982), the investigation takes place in the "field", which they define as "an ensemble of conditions that is not solely composed of controlled and systematically varied variables, but of a vast number of constraints that are difficult to survey and whose impacts on certain dependent variables are unknown in the first instance" (translation by the author). We agree with this definition of "field", although not all researchers do. For example, Cook & Campbell (1967) state: "by field we understand any setting which respondents do not perceive to have been set up for the primary purpose of conducting research".
Following this definition, the pedagogical evaluation would not take place in the field, because our subjects (the child users) know that they are taking part. Gachowetz (1993) notes that for some authors the place where the study is carried out is the decisive point: according to them, all research taking place outside the laboratory is field research. As the application of the diagnostic instruments and the Demonstrator is an intrusion into the normal processes taking place in the field (Gachowetz, 1993), we can speak of our investigation as a field experiment. Field experiments have the advantage of high ecological validity. On the other hand, as explained above, compared to laboratory experiments the researcher has less control over variables that might have an impact. The sample used for this field experiment consists of subjects from the target group of the Demonstrator. Since bullying is defined as repeated aggression in the context of schools, our sample consists of school children. The Demonstrator is created for children from eight to twelve years of age, so our sample consists of children from this age group. Furthermore, the sample consists of approximately equal parts of pupils from all three participating countries (Portugal, England and Germany). This is important in order to see whether there are cultural particularities that diminish or promote the intended effects of the Demonstrator. The investigation will be carried out with entire school classes. This makes sense for four reasons. Firstly, it would be a problem to take single pupils out of the class to do a test, as this would imply the danger of stigmatising the pupil. Secondly, from an organisational point of view it is easier to work with entire classes. Thirdly, if entire classes experience the interaction with the Demonstrator, this gives the class teachers the chance to work with it, e.g. to initiate a discussion on bullying.
And finally, by examining entire classes it is most probable that we capture the whole range of bullying types, because every class has its bullies, victims and so on. Our plan is to investigate four school classes in each country. Given an average class size of twenty to thirty pupils, this would result in an overall number of approximately 300 subjects. Of course, we will take care that the classes are from schools attended by children with different social and economic backgrounds.

Sample characteristics
Age of children: 8-12 years
Countries assessed: U.K., Portugal, Germany
Sample size: N = 300 (approximately); 4 school classes from each country
Socio-economic status of schools: cross-section of lower, middle and upper class regions
School location: urban and rural schools
Table 2: Sample characteristics for the pedagogical evaluation

4.3.3 Why the Pedagogical Evaluation Has To Remain Exploratory

According to Annex 1 – "Description of Work" of the contract for the VICTEC project (p. 18), the pedagogical evaluation of the Demonstrator has to remain exploratory. This is due to the following constraints of the investigation: the field-experiment setting, pedagogical constraints, organisational constraints and effect size. Due to the nature of field experiments, it will be impossible to control for all the conditions that may have an impact on the dependent variables to be measured. For example, it is not possible to preclude communication among the children between the different parts of the investigation, although this could have an impact on their perception of the Demonstrator and their resulting bullying behaviour. Other variables that might have an impact (e.g. age, gender and bullying type) can be controlled. Alternatively, the impact of certain variables can be minimised by sampling a wide spectrum of subjects, for example by investigating pupils from different social and economic backgrounds.
Further issues concern pedagogical and ethical constraints. Extreme caution should be taken when evaluating the impact of a new software program which deals with a sensitive subject such as bullying. Since it is not known how the interaction with the Demonstrator will affect the bullying behaviour of children, the possibility of unintended effects must be taken into account. For example, children could learn how bullying works, as intended (a cognitive effect), but this could result in an increase in proficient bullying behaviour, negating the positive effects expected from the interaction. The most important constraint is of an organisational nature. Since the teachers must follow the curriculum, the number of potential classes that can participate in the investigation is limited. Furthermore, the time available for the investigation is limited. This has the following consequences: 1) Due to time constraints for the pre- and posttests, diagnostic instruments need to be carefully selected. For example, interviews with all pupils would take too long. 2) The time allowed for the application of the Demonstrator is limited. It is currently estimated that the interaction of the child user with the Demonstrator will last approximately 30 minutes. This is an extremely short period of time that will probably not lead to substantial cognitive and behavioural effects; therefore, effect sizes are expected to be rather small. 3) Given the expected small effect sizes and the limited number of pupils available for the investigation, the decision has been made to carry out a research design without control groups. If classes were divided into two experimental and two control classes in each country, the sample sizes would be so small that there would be a risk of finding no effects at all. Since this investigation deals with a rather unexplored area, it is difficult to develop precise hypotheses about the effect size.
However, a hypothesis on the direction of the effect for bullying behaviour can be delineated: bullying behaviour should decrease and the understanding of bullying mechanisms should increase, but, as pointed out, the concrete size of the effects is impossible to estimate.

4.3.4 Diagnostic Instruments of the Pedagogical Evaluation

Four diagnostic instruments are to be employed in the pre- and posttest:
1) A teacher rating of children's bullying behaviour.
2) A pupil test.
3) A bullying questionnaire for children.
4) An empathy questionnaire for children.

The diagnostic instruments have been chosen because they meet certain demands considered important for the investigation. Most importantly, the combination of instruments should adequately measure the different effects of the interaction with the Demonstrator, especially the cognitive and behavioural effects on bullying. Furthermore, the instruments should be easy to apply, allowing them to be used in schools and not solely in the laboratory. Finally, their application should not be too time-consuming (see 4.3.3). As a consequence, instruments that can be applied in groups have been selected. Explorative instruments such as the Rorschach test (Rorschach, 1999; first published in 1921) have been omitted, as they require individual application.

a) Teacher Rating (Behavioural Effects)

Procedure
The teachers will be asked to rate the bullying behaviour among pupils (as perpetrator and as victim) on a five-point scale (from "very often" to "never"). There are separate scales for the different types of bullying behaviour (physical, verbal and relational) and a scale for an overall score.

Dependent Variables
The dependent variables obtained from this instrument are the ratings of bullying behaviour for all pupils across all bullying types.
Critique
The teacher rating is a diagnostic instrument which is easy to apply. Since teachers observe their students on a daily basis, there is no additional effort involved. Teacher ratings of bullying behaviour should elucidate bullying behaviour from an adult perspective. However, research has shown that attitudes towards and observations of bullying behaviour can differ considerably between teachers and pupils (Schaefer, 1996). There are a number of reasons for this. Firstly, children who bully others try to conceal their behaviour from adults to avoid punishment. Secondly, teachers may judge bullying behaviour in a biased manner due to individual differences in their relationships with different pupils. Nevertheless, the teacher rating is a core instrument for the evaluation: if the Demonstrator is to be accepted by schools, it is essential that teachers are aware of the Demonstrator's potential role in reducing bullying behaviour.

b) Pupil Test (Cognitive Effects)

Procedure
The children will receive a picture story containing ambiguous content. The task of the pupils will be to re-narrate the story.

Dependent Variables
The re-narration will be analysed in two ways. 1) The language of the re-narration will be analysed with regard to the number of aggression-associated words used. 2) The content of the re-narration will be analysed. This analysis will, for example, deal with the questions: Do the children interpret the picture story in an aggressive or a non-aggressive manner? Are there protagonists that can be associated with certain bullying types?

Critique
An advantage of this instrument is that it can be applied during class and can thus be useful to the teachers too. Social desirability effects will be minimal, as the purpose of the re-narration will remain ambiguous to the children participating.
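By way of illustration, the first analysis step described above (counting aggression-associated words in a re-narration) could be sketched as follows; the word list is a hypothetical stand-in for whatever coding scheme the analysts adopt, and a real analysis would need per-language word lists and inflected forms:

```python
import re

# Hypothetical set of aggression-associated words; the actual coding
# scheme would be defined by the evaluators for each language.
AGGRESSION_WORDS = {"hit", "push", "fight", "kick", "hurt", "shout"}

def aggression_word_count(renarration: str) -> int:
    """Count occurrences of aggression-associated words in a child's
    re-narration of the picture story (naive exact-token matching)."""
    tokens = re.findall(r"[a-z]+", renarration.lower())
    return sum(1 for token in tokens if token in AGGRESSION_WORDS)

# Example with an invented re-narration:
score = aggression_word_count("He hit him and they had a fight.")
```

Comparing such counts between pretest and posttest re-narrations would give one simple quantitative indicator alongside the qualitative content analysis.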
c) Questionnaire on Bullying (Behavioural Effects)

Procedure
The diagnostic procedure for the bullying questionnaire is described in the chapter on the psychological evaluation (4.4).

Dependent Variables
The questionnaire on bullying includes a section on personal data, comprising information about the child's family background (especially siblings), friends and hobbies. This data will be useful in investigating whether the Demonstrator is effective for specific subgroups of students. The main body of the questionnaire focuses on bullying behaviour, from which data concerning physical, verbal and relational bullying can be derived. Pupils are asked whether they bully others and/or are victims of bullying (physical, verbal or relational). This data is important in two respects. Firstly, it can be investigated whether the application of the Demonstrator has any effect on the extent of bullying in classes. Secondly, the data enables the students to be assigned different bullying roles, making it possible to investigate whether the effects of the interaction with the Demonstrator depend on the user's bullying type. The size of the sample collected within the pedagogical evaluation is too small to obtain statistically significant results for this question (in particular, the number of bullies will probably be too small). This question is dealt with within the psychological evaluation (see 4.4), but because the pedagogical evaluation takes place six months before the psychological evaluation, the results found here can be of use in planning the psychological evaluation.

Critique
Pupils are the experts on bullying behaviour, with the richest knowledge base, as they have first-hand experience of bullying others, being bullied, or being an observer. However, it may be difficult for the children to disclose the truth to the researcher.
All data will be treated confidentially and the children will be assured that no one, not even their teacher, will see the data collected. Children may still withhold information, either because they do not trust the researchers or because they try to answer in a socially desirable way. However, previous studies into the nature of bullying behaviour using individual private interviews with primary school children (aged 6-9) have highlighted that most children trust researchers and are happy to cooperate in an honest manner (Wolke et al., 2001).

d) Questionnaire on Empathy (Cognitive and Emotional Effects)

What is Empathy?

The term “empathy” stems from Titchener (1909), who derived it from the Greek “empatheia”, which means “passion”, “passionate affection” or “to be much affected” (Levy, 1997). Titchener used “empathy” as a translation of the German term “Einfühlung”, which means “feeling into” somebody. “Einfühlung” is described by Lipps (1903), who worked on aesthetics and originally used the term for the description of a process between a person and an art object. “Lipps […] believed that empathy was a form of inner imitation. An observer is stimulated by the sight of an object and responds by imitating the object, loses consciousness of himself, and experiences the object as if his own identity had disappeared and he had become the object himself” (Katz, 1963). Of course, it is quite difficult to imagine what this inner imitation of an object looks like, but the important point is that the notion of “Einfühlung” was transferred to processes between two persons. Thus, a working definition of empathy could be “a person feeling into another person”. In the psychological literature the first person is referred to as the “observer”, the second as the “target”. When speaking of empathy today, two distinct perspectives of empathy have to be distinguished.
They describe the two possible results of an empathic process between an observer and a target: a cognitive perspective and an affective perspective (Holz-Ebeling & Steinmetz, 1995). Researchers who use the term “cognitive empathy” refer to the cognitive perspective, which means the observer tries to understand how the target should feel in a given situation. The cues available to the observer are the behaviour of the target (including its bodily, and especially facial, expression of emotion) and the situation with which the target is dealing. The result of this process of understanding is a cognition, e.g. “I think the target is feeling sad, because he lost his wallet.” “Affective empathy” refers to processes with an affective outcome. When such a process takes place, the observer feels something due to the perception of a target. That much is clear, but researchers disagree on what the relationship between the emotion of the observer and the emotion of the target must be. Some researchers (e.g. Stotland, 1969) merely postulate that the observer’s emotion has to be caused by the perception of the target in order to be labelled empathic. Others (Kobayashi, 1995; Eisenberg & Strayer, 1987) state that the observer’s emotion should at least be adequate to the target's inner state. A third group (e.g. Stroebe et al., 1996) demands a parallel affective outcome of the observer and the target. We follow Stotland’s opinion because, on the one hand, we think it is not possible to decide which emotion of the observer can be labelled “adequate” and which cannot. On the other hand, since the observer cannot distinguish between the emotional reaction to the process of feeling into the target and the emotion resulting from the feeling-in process itself, it does not make sense to try to develop questionnaire scales that do so.

Why Measure Empathy?
VICTEC stands for “Virtual Information and Communication Technology with Empathic Characters”. Thus, it is clear that empathy is a key concept within the project. Two reasons can be highlighted to illustrate the importance of measuring the empathy of the user and not solely concentrating on the empathy of the characters. Firstly, the concept of the Demonstrator is to help the child user to gain insight into the social and psychological processes that lead to bullying. This means that after interacting with the Demonstrator the users should have a fuller understanding of why other children behave the way they do in bullying situations. If this is the case, the interaction with the Demonstrator should improve the empathic abilities of the users. The empathy questionnaire will be useful in determining whether this is the case. It is crucial that the questionnaire is sensitive to changes in empathic abilities and is not just a measure of trait empathy. A literature review on empathy shows that there has been one successful attempt at measuring empathy as a state among adults (Nezlek et al., 2001). Furthermore, research shows that even trait measures are sensitive to changes (Badke-Schaub & Schaub, 1986). Secondly, it is reasonable to assume that empathy serves as a confounding variable in the pedagogical evaluation. It is possible that empathic children show greater learning effects than less empathic children because they understand more fully the nature of the social bullying situations presented on the screen and have a greater ability to transfer the content presented there into real life. On the other hand, there are indications that high empathy and less aggressive behaviour are associated (Miller & Eisenberg, 1988). Therefore, an effect of the interaction with the Demonstrator on the bullying behaviour of users with high levels of empathy may not be found (ceiling effect).
Additionally, if empathy is measured within the pedagogical evaluation, there is an area of general psychological interest. Together with the data collected from the bullying questionnaire, it can be investigated whether there are any differences in the empathic abilities of different bullying types. For example, it may be possible to substantiate previous research questions (Sutton & Smith, 1999) as to whether pure bullies are less empathic than bully/victims.

Procedure

Structure of the Empathy Questionnaire

The questionnaire consists of five scales referring to the empathic outcomes (cognitive, affective) mentioned above and the types of mediation. Three types of mediation of targets’ emotions can be distinguished: 1) via situational cues (situation mediated), 2) via bodily expression cues (expression mediated) and 3) via ideomotoric cues (ideomotoric). Situation mediated means that the observer perceives the situation in which the target is acting and thereupon reacts empathically. Expression mediated means that the observer reacts empathically to the perceived bodily expressions of the target's emotion. The target's emotions can show in facial expression, gesture, posture, paraverbal parameters (e.g. pitch of voice, speech rate) and physiological parameters (e.g. blushing). Ideomotoric mediation refers to results of the research group of Prinz (2002). The basis of their research is the so-called “ideomotoric principle” of William James (1890), which states that “each imagination of a body movement implies the tendency to perform this movement”. Prinz and his colleagues extended this principle: according to them, not only the imagination of a movement but also the perception of another person's movement is able to trigger this tendency. This means that the observer must have the same, or at least similar, motoric schemes as the target at his disposal.
A minimum similarity of the motoric schemes of observer and target can thus be postulated for an empathic reaction. Prinz and colleagues showed that subjects can predict the trajectories of their own handwriting better than those of others. This could be due to the fact that they are, of course, familiar with their own motoric schemes. With regard to empathy, this could mean that empathic persons have wider and/or more flexible motoric schemes. The concept need not be limited to motoric schemes; it can comprise actions as well. This was demonstrated by Bach et al. (2001) with the game “paper, scissors, stone”, where subjects had to predict the decisions of other subjects on the basis of their movements. The different scales of the questionnaire result from the combination of the different outcomes (cognitive, affective) and types of mediation (situation, expression). Since the mechanism of ideomotoric empathy is not entirely clear to us yet, this concept remains as a whole for the time being. Five aspects (scales) of empathy can be distinguished for the final questionnaire:

- cognitive, situation mediated empathy (11 items)
- cognitive, expression mediated empathy (11 items)
- affective, situation mediated empathy (11 items)
- affective, expression mediated empathy (7 items)
- ideomotoric empathy (11 items)

The initial version of the questionnaire consists of 51 items. The scale “affective, expression mediated empathy” consists of only seven items because of difficulties in creating appropriate ones. The other scales consist of eleven items each. It was necessary to create a new empathy questionnaire due to a lack of appropriate existing measures. The only existing empathy questionnaire for children stems from Bryant (1982; “index of empathy for children and adolescents”) and does not distinguish between the five scales, which is necessary for the VICTEC project.
Furthermore, the validation process of Bryant's questionnaire is disputable, posing a risk if it were used for the current pedagogical evaluation. However, the scale was of use, because its items have been incorporated into the new questionnaire. A few items were taken from the empathy questionnaire devised by Leibetseder and colleagues (2001) and adapted for children. The remaining items were developed by the VICTEC team. As the questionnaire is a new instrument, its quality in terms of validity and reliability needs to be considered.

Item Analysis, Reliability and Validity

The questionnaire must be available in three language versions: Portuguese, English and German. Therefore, the items have been translated and quality analyses will be carried out in each country. The investigation aims to include two hundred subjects in each country: one hundred aged eight and one hundred aged twelve. Pupils will be selected from different schools, allowing the questionnaire to be evaluated for all educational backgrounds. The investigation will provide data concerning item complexity and item selectivity. Because the questionnaire is designed as a state measure, it is not possible to calculate retest reliability. Thus, split-half reliability will be calculated. The assessment of validity faces some crucial difficulties. The usual validation procedure involves correlating the questionnaire with another one that has previously been validated to measure the same construct. However, this is not possible, since there are no empathy questionnaires for children aside from the Bryant questionnaire. The Bryant questionnaire cannot be used because it only has one dimension (empathy in general) and its items have been integrated into the new questionnaire to be validated. Furthermore, the method of validation used by Bryant is itself questionable. There are three possibilities to validate the questionnaire: 1) Perform a factor analysis.
The results of this factor analysis should reflect the three aspects of empathy (cognitive, affective and ideomotoric) and the two different types of mediation. If this is the case, it would provide strong support for the validity of the questionnaire. 2) The scales on cognitive and affective empathy will be validated against self-assessments of the cognitive and affective abilities of the children. 3) Experts (students of psychology and researchers) will assess whether the items are appropriate for measuring the specific scales they were developed for.

Dependent Variables

From the questionnaire, scores for all five scales and an overall score on empathy will be generated.

Why will Psycho-physiological Measures not be employed?

Psycho-physiological measures were considered as a possibility to assess the reactions and the involvement of the child users when interacting with the Demonstrator. Psycho-physiological measures could also have been a criterion for the validation of the affective empathy scales of the questionnaire. However, after deliberation, it was decided not to use psycho-physiological measures. The most important reason for refraining from using these measures was the opposition of the most important groups involved in the pedagogical evaluation: the teachers, the children and the parents. Parents in particular are very concerned about research projects that occur within schools and would not allow any procedures that might be in any respect harmful to their children. Furthermore, such measures may jeopardise the quality of the data itself. Although it is easy to apply measures such as heart rate or skin conductance (and only those can be taken into account), they provide little information about the quality of the emotion felt by the subject. For example, if heart rate is measured, the information provided is merely whether the subject is aroused or not.
Another issue is that psycho-physiological measures are only really appropriate for one-point measurements and not for the measurement of processes. If physiological measures are recorded over a long time period (like the interaction with the Demonstrator), the data is likely to be biased by disruptive factors such as motor activity and would lead to interpretation difficulties. Furthermore, physiological data is extremely noisy, resulting in the researcher having to collect data several times in a row to obtain one accurate measure. Finally, the application of such measures is time consuming (installing the devices, obtaining base rates, etc.), which would pose a limitation for the current evaluation.

4.3.5 Measurement of Educator Focused Learning Criteria

Teachers will be asked to complete a questionnaire to determine whether they found the Demonstrator useful and thought that the interaction sequences provided to the children were optimal from their pedagogical point of view. Furthermore, the possibility of integrating the Demonstrator into the school curriculum needs to be evaluated from the teachers' perspective. Communication with the child users will be another crucial aspect of the evaluation. The effects of the interaction with the Demonstrator will be measured with the diagnostic instruments mentioned above. Informal communication and focus groups with the child users will help us to evaluate whether children liked playing with the Demonstrator and how useful they found it.
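To make the split-half reliability computation planned in the item analysis above concrete, the following is a minimal sketch using an odd/even item split and the Spearman-Brown correction. The response data are invented for illustration and do not come from the project:

```python
# Minimal sketch of a split-half reliability computation (odd/even halves,
# Spearman-Brown step-up). The item data below are invented for illustration.
from statistics import mean, stdev

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

def split_half_reliability(item_scores):
    """item_scores: one list of item responses per subject."""
    odd = [sum(subj[0::2]) for subj in item_scores]   # half-test scores
    even = [sum(subj[1::2]) for subj in item_scores]
    r = pearson(odd, even)
    return 2 * r / (1 + r)   # Spearman-Brown correction to full test length

# Invented responses: 4 subjects x 4 items on a 1-5 scale.
data = [[4, 5, 4, 4], [2, 2, 3, 2], [5, 4, 5, 5], [1, 2, 1, 2]]
print(round(split_half_reliability(data), 2))  # → 0.95
```

In the actual analysis, each of the five empathy scales would be treated separately, with one half-test score per pupil and per half.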
4.3.6 Timetable for the Pedagogical Evaluation

Date            Action
June 2003       Create, pilot and validate instruments (empathy questionnaire, pupil test and bullying questionnaire)
September 2003  Design for the statistical analysis of the data
January 2004    Pretest
February 2004   Application of Demonstrator in schools
March 2004      Posttest

Table 3: Timetable for pedagogical evaluation

4.4 Psychological Evaluation

4.4.1 Aims of the Psychological Evaluation

Psychological research concerning bullying and aggression in schools has increased over the past decade. There is now substantial evidence within the literature that bullying and victimisation are problems worldwide, with prevalence rates of victimisation ranging from 8% to 46% and of bullying ranging from as little as 3% to 23% (Wolke et al., 2001). The consequences of bullying behaviour have also been widely researched and reveal both short-term and long-term harms for children, including mental health and behaviour problems (Wolke et al., 2000), physical health problems (Wolke et al., 2001a), truancy (Sharp, 1995), criminal behaviour (Farrington, 1993) and even suicide (Olweus, 1993). Research studies are now considering the individual differences of children involved in bullying behaviour in terms of those children categorised as ‘pure’ bullies, ‘pure’ victims, bully/victims and neutral children, who can be either bystanders in bullying incidents or defenders of the victim. However, there is still little known about the individual differences in cognitive styles of children involved in bullying behaviour and their coping and planning skills in combating victimisation. For example, there is still uncertainty regarding whether ‘pure’ bullies are socially intelligent individuals who are extremely competent at gauging the intentions of others, and whether ‘pure’ victims have deficits in encoding and interpreting the behaviour of others.
These skills are captured by the term ‘Theory of Mind’, which is used to understand social scenarios and the thinking of others. Theory of Mind skills allow children and adults, firstly, to form representations of mental states such as pretending and knowing and, secondly, to understand the relationships between such states and actions. Theory of Mind (ToM) tasks can be used to assess whether children can attribute mental states to themselves and others in order to explain and predict behaviour. The ability to recognise that others can have false as well as true beliefs is central to the development of Theory of Mind skills (e.g. the Sally-Anne task; Premack & Woodruff, 1978; Baron-Cohen et al., 1985). The ‘Sally-Anne Test’ is acted out by two puppets. Sally is seen putting a marble in a specific place. Later, while Sally is away, Anne puts the marble somewhere else. Sally returns to the room and the child is asked ‘where will Sally look for the marble?’ One consideration concerns the distinction between Theory of Mind and empathy. Theory of Mind is a broad cognitive concept in which the understanding of emotions and empathy play a role. For example, Theory of Mind can be used to explain and predict a great deal about human talk and action. The theory encapsulates the mental states of thinking, knowing, guessing, remembering, hoping, fearing, perceptions, intentions and emotions. These are all important elements for the development, understanding and display of empathy. Research studies are also interested in the types of coping mechanisms that children use to deal with bullying incidents, such as telling somebody or ignoring the bully (Talamelli & Cowie, 2001). What is not known is why children select specific types of coping mechanisms, and whether there are differences in the types of coping mechanisms chosen by bullies, bully/victims, victims and neutral children.
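The fourfold classification used throughout this section (‘pure’ bully, ‘pure’ victim, bully/victim, neutral) can be illustrated with a small sketch. The cut-off used here (any self-reported involvement) is an assumption for illustration, not the criterion of the actual bullying questionnaire:

```python
# Illustrative only: a hypothetical rule assigning the four bullying roles
# from self-report scores; the real criteria come from the validated
# bullying questionnaire.

def bullying_role(bully_score, victim_score, cutoff=1):
    """Classify a child as 'pure bully', 'pure victim', 'bully/victim' or 'neutral'."""
    is_bully = bully_score >= cutoff
    is_victim = victim_score >= cutoff
    if is_bully and is_victim:
        return "bully/victim"
    if is_bully:
        return "pure bully"
    if is_victim:
        return "pure victim"
    return "neutral"

print(bullying_role(2, 0))  # → pure bully
```

In the evaluation itself, such an assignment would be made separately for direct and for relational bullying, giving each child a role in each dimension.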
The major aim of the psychological evaluation is to determine whether user characteristics (roles in bullying behaviour) are reflected in user choices regarding the bullying scenarios in terms of mental representations (e.g. Theory of Mind), action choices, empathy and differences in coping strategies. Two major questions will be investigated:

1) Are there any differences in Theory of Mind responses between children classified as ‘pure’ bullies, ‘pure’ victims, bully/victims or neutral children for both direct and relational bullying behaviour?

2) a) Are there distinctions between the types of coping mechanisms selected by ‘pure’ bullies, ‘pure’ victims, bully/victims and neutral children when interacting with the Virtual Learning Environment (VLE) bullying scenarios? b) Are there differences in the justifications that children give for selecting coping mechanisms according to bullying status? c) Are there any differences in how the children empathise with the characters in the scenarios?

4.4.2 Proposed Methodology for the Psychological Evaluation

Location: The psychological evaluation is to be held at the University of Hertfordshire, where 65 computers with the specifications required to run the Demonstrator can be used for a consecutive period of two weeks. The teams in Portugal and Germany do not have access to such a large number of computers at one time. Children will interact with the VLE on an individual basis.

Date of the evaluation: It is proposed that the psychological evaluation will take place during June 2004. There are two important reasons for this: the school summer vacation runs from mid-July until the beginning of September, and we need to coordinate the use of the university computers for the period when the university students are on summer vacation.
The results from the pedagogical evaluation will provide useful insights for the psychological evaluation, and these will be available by June 2004.

Sample Size: It is proposed that the psychological evaluation will involve approximately 400 children aged 8-12 years from schools in Hertfordshire and the surrounding area in the United Kingdom. A sample of at least this size is required in order to obtain a large enough sub-set of children who are characterised as ‘pure’ bullies for either direct or relational bullying behaviour.

Proposed Assessment Instruments:

1) Bullying Questionnaire: A bullying questionnaire will be given to every child participating in the psychological evaluation to complete before they commence interaction with the bullying scenarios in the VLE. This questionnaire assesses the following areas:
- Friendship, including liking and disliking
- Physical (direct) bullying
- Relational bullying
- Sibling bullying
- Verbal bullying
- Hobbies (including computer games)
- Child’s perception of strength

Upon completion, each child’s individual bullying status can be computed to determine whether they are categorised as a ‘pure’ bully, ‘pure’ victim, bully/victim or neutral child for both direct and relational bullying.

2) Theory of Mind Questions (ToM): ToM questions are to be devised for each child participating in the evaluation to complete. Examples of such questions are ‘How do you think Billy was feeling when he was hit by John?’, ‘How would you feel if you were Billy?’ and ‘What do you think Billy was feeling about John after he was hit?’ The ToM questions are to be included at the end of the child’s interaction with the VLE. If the questions were integrated during the VLE interaction, it was felt that this would greatly reduce the believability of the experience for the child.
A series of storyboards will be provided for the child at the end of the VLE interaction, depicting the major events that happened during each scenario episode. The relevant ToM questions can then be integrated with these storyboards, which should provide a reminder of the events. Work is still being carried out on the types of ToM questions to be asked.

3) Justification Questions: During the child’s interaction with the bullying scenarios, the child will have the choice to select a series of different coping strategies to deal with the incident that they have witnessed. The coping mechanisms have been categorised into 5 different groups (e.g. passive response style, future plan by yourself). Once the child has selected a coping mechanism, they will be asked a series of questions through the use of the other characters in the VLE, so as not to reduce the believability of the interaction. Example questions are ‘Why did you select the coping response, go and tell the teacher?’ and ‘What do you think the teacher will do to deal with this bullying situation?’ The justification questions to be used still need to be finalised.

4.4.3 Timetable for the Psychological Evaluation

Date: Event
January 2003: Translation of the bullying questionnaire into German and Portuguese.
February 2003 – March 2003: Ethics approval forms submitted to the UH board for the psychological evaluation.
March 2003 – April 2003: Pilot the bullying questionnaire in schools in the UK. Once this has been done, pilot studies will take place in Portugal and Germany. Carry out necessary modifications.
April 2003 – May 2003: School visits to take place to evaluate the content of the bullying scenarios for the psychological evaluation. Carry out necessary modifications.
May 2003 – June 2003: Generation of Theory of Mind questions and justification questions to be used in the evaluation. Theory of Mind and justification questions to be piloted to ensure that the children understand them. Carry out necessary modifications.
June 2003 onwards: Ideas to be proposed for the activities that the children will take part in during the day at UH in addition to taking part in the evaluation. Travel arrangements and possible sponsors to be considered.
January 2004 onwards: Schools contacted in the Hertfordshire region to take part in the psychological evaluation. School visits carried out where necessary to encourage schools to take part. Pedagogical evaluation takes place; information from this used to aid the psychological evaluation.
April 2004 – June 2004: Technical equipment checked etc. for the running of the evaluation. Volunteers recruited to help run the evaluation.
June 2004: Psychological evaluation takes place at UH.
June 2004 onwards: Data collated and analysed. Publications prepared. Project feedback provided to schools.

4.4.4 Work to Date on the Psychological Evaluation

1. Initial version of the Bullying Questionnaire completed.
2. Initial ToM questions formulated.
3. Initial Justification questions formulated.
4. Submission for ethics permission regarding research with children.
5. Next step is to pilot the bullying questionnaire in all 3 countries (UK, Germany, Portugal).
6. Work with the schools to ensure that they understand the ToM questions chosen and the justification questions to be integrated into the VLE.

4.4.5 Educational Impact of the Psychological Evaluation

The psychological evaluation will have a high educational impact in terms of shedding light on the different characteristics of ToM capabilities and deficits for the various bullying roles, the types of coping mechanisms that children believe work and do not work and the reasons why, and whether there are particular styles that victims use repeatedly that could result in the circular notion of victimisation.
4.4.6 Dissemination

1. Presentations at psychology-based conferences.
2. Publications in peer-reviewed psychological journals.
3. Feedback in teacher workshops.
4. Likely to attract media attention in the written press and on radio/TV.

5. Conclusion

The present document has provided an overview of the design of the evaluation of the Toolkit and the Demonstrator, the two software programs to be developed by the VICTEC project. The main focus of this document is on the completeness of the evaluation aspects. Given the great variety of evaluation aspects covered, this goal has been reached. The proposed methods are easy to apply and sufficient in light of the constraints the evaluators have to deal with (e.g. the time limitation on the pedagogical evaluation). Furthermore, attention is paid to collecting the data from different sources to improve the validity of the data. This is evident, for example, in the participation of both usability evaluation experts and users in the usability evaluation, and in the inclusion of teachers and children in the pedagogical evaluation. Thus, the validity of the evaluation results will be high. The participation of teachers and users serves another goal of the VICTEC project: the dissemination of the developed software. Feedback from these groups indicates whether the Demonstrator in its present version will be accepted by those who are responsible for its use. Additionally, the methods provide data that can be used to answer research questions that go beyond the primary evaluation purpose, for example the relationship between bullying type and empathy (see the chapter on the pedagogical evaluation). The following deliverable, D7.1.1 (operational evaluation methodology), will focus on the concrete instruments that will be used for the evaluation. The self-developed instruments in particular will be introduced there in detail (e.g. questionnaire items, time needed for application).
Furthermore, D7.1.1 will focus on the statistical or qualitative analysis of the data collected within the evaluation process. The integration of data from this variety of different evaluation methods is a complex and difficult problem that has to be solved now that the methods are clear.

6. References

Bach, P., Knoblich, G., Friederici, A. D., & Prinz, W. (2001). Comprehension of action sequences: The case of Paper, Scissors, Rock. In J. Moore & K. Stenning (Eds.), Proceedings of the Twenty-Third Annual Conference of the Cognitive Science Society (pp. 39-44). Mahwah, NJ: Lawrence Erlbaum Associates.

Badke-Schaub, P. & Schaub, H. (1986). Persönlichkeit und Problemlösen. Bamberg: Diplomarbeit, Lst Psychologie II, Universität Bamberg.

Baron-Cohen, S., Leslie, A. M., & Frith, U. (1985). Does the autistic child have a 'theory of mind'? Cognition, 21, 37-46.

Bortz, J. & Döring, N. (2002). Forschungsmethoden und Evaluation für Human- und Sozialwissenschaftler. Heidelberg: Springer.

Brna, P., Martins, A., & Cooper, B. (1999). My first story: Support for learning to write stories. In G. Cumming, T. Okamoto & L. Gomez (Eds.), Advanced Research in Computers and Communications in Education (pp. 335-341). Amsterdam: IOS.

Brooke, J. (1996). SUS: A 'quick and dirty' usability scale. In P. W. Jordan, B. Thomas, B. A. Weerdmeester, & I. L. McClelland (Eds.), Usability evaluation in industry. London, UK: Taylor & Francis.

Brouwer-Janse, M. D., Suri, J. F., Yawitz, M. A., de Vries, G., Fozard, J. L., & Coleman, R. (1997). User interfaces for young and old. Interactions, 4(2), 34-46.

Bryant, B. K. (1982). An index of empathy for children and adolescents. Child Development, 53(1), 413-425.

Cockton, G., & Woolrych, A. (2001). Understanding inspection methods: Lessons from an assessment of heuristic evaluation. In A. Blandford, J. Vanderdonckt, & P. Gray (Eds.), People and Computers XV (pp. 171-192). Springer-Verlag.
Cockton, G., Lavery, D., & Woolrych, A. (2003). Inspection based evaluations. In J. A. Jacko & A. Sears (Eds.), The Human-Computer Interaction Handbook (pp. 1118-1138). Lawrence Erlbaum Associates.

Cooper, B., & Brna, P. (2000). Influencing the intangible: Towards a positive ambience for learning through sensitive systems and software design in the classroom of the future. Paper presented at the British Education Research Association, Cardiff.

de Vries, G. (1997). Involvement of school-aged children in the design process. Interactions, 4(2), 41-42.

Eisenberg, N. & Strayer, J. (Eds.) (1987). Empathy and its development. Cambridge: Cambridge University Press.

Ericsson, K. A., & Simon, H. A. (1985). Protocol Analysis: Verbal Reports as Data. Cambridge, MA: MIT Press.

Farrington, D. P. (1993). Understanding and preventing bullying. In M. Tonry (Ed.), Crime and Justice (Vol. 17, pp. 381-458). Chicago: University of Chicago Press.

Gachowetz, H. (1993). Feldforschung. In E. Roth (Ed.), Sozialwissenschaftliche Methoden (3rd edition, pp. 245-266). München: Oldenbourg.

Gorman, G. E. & Clayton, P. (1997). Qualitative Research for the Information Professional: A practical handbook. London: Library Association Publishing.

Gould, J. D., Boies, S. J., & Clayton, L. (1991). Making usable, useful productivity-enhancing computer applications. Communications of the ACM, 34(1), 75-85.

Hanna, L., Risden, K., & Alexander, K. (1997). Guidelines for usability testing with children. Interactions, 4(2), 9-14.

Holz-Ebeling, F. & Steinmetz, M. (1995). Wie brauchbar sind die vorliegenden Fragebogen zur Messung von Empathie? Kritische Analysen unter Berücksichtigung der Iteminhalte. Zeitschrift für Differentielle und Diagnostische Psychologie, 16, 11-32.

Immersive Education (2001). Kar2ouche. Oxford.

International Standards Organisation (1998). Ergonomic requirements for office work with visual display terminals (VDTs) -- Part 11: Guidance on usability (ISO 9241-11).
Katz, R. L. (1963). Empathy: Its nature and uses. New York: Free Press.
Kobayashi, M. (1995). Selbstkonzept und Empathie im Kulturvergleich. Konstanz: Universitätsverlag Konstanz.
Lavery, D., Cockton, G., & Atkinson, M. (1997). Comparison of evaluation methods using structured usability problem reports. Behaviour and Information Technology, 16(4/5), 246-266.
Leibetseder, M., Laireiter, A.-R., Riepler, A. & Köller, T. (2001). E-Skala: Fragebogen zur Erfassung von Empathie – Beschreibung und psychometrische Eigenschaften. Zeitschrift für Differentielle und Diagnostische Psychologie, 22(1), 70-85.
Lieberman, D. A. (1999). The researcher's role in the design of children's media and technology. In A. Druin (Ed.), The Design of Children's Technology (pp. 73-97). San Francisco: Morgan Kaufmann Publishers.
Lin, H. X., Choong, Y.-Y., & Salvendy, G. (1997). A proposed index of usability: A method for comparing the relative usability of different software systems. Behaviour and Information Technology, 16(4/5), 267-278.
Lipps, T. (1903). Einfühlung, innere Nachahmung und Organempfindungen. Archiv für die gesamte Psychologie, 2, 185-204.
Maulsby, D., Greenberg, S., & Mander, R. (1993). Prototyping an intelligent agent through Wizard of Oz. Paper presented at INTERCHI '93 (pp. 277-284).
Miller, P. A. & Eisenberg, N. (1988). The relation of empathy to aggressive and externalizing/antisocial behavior. Psychological Bulletin, 103(3), 324-344.
NECTAR (2001). User Centred Requirements Handbook. Available at: http://www.ejeisa.com/nectar/inuse/6.2/contents.htm
Nezlek, J., Feist, G. J., Wilson, F. C. & Plesko, R. M. (2001). Day-to-day variability in empathy as a function of daily events and mood. Journal of Research in Personality, 35, 401-423.
Nielsen, J. (1992). Finding usability problems through heuristic evaluation. Paper presented at CHI '92, Monterey, CA (pp. 373-380).
Nielsen, J. (1993). Usability Engineering. London: Academic Press.
Nielsen, J. (1994a). Enhancing the explanatory power of usability heuristics. Paper presented at the CHI '94 Conference on Human Factors in Computing Systems, ACM Press (pp. 152-158).
Nielsen, J. (1994b). Guerrilla HCI: Using discount usability engineering to penetrate the intimidation barrier. In R. G. Bias & D. J. Mayhew (Eds.), Cost-Justifying Usability (pp. 245-272). London: Academic Press.
Nielsen, J. (2000). Why you only need to test with 5 users. Jakob Nielsen's Alertbox. Available at: http://www.useit.com/alertbox/20000319.html
Nielsen, J. (2000a). Designing Web Usability: The Practice of Simplicity. New York: New Riders Publishing.
Nielsen, J. (2002). Heuristic evaluation. Available at: http://www.useit.com/papers/heuristic/
Nielsen, J., & Landauer, T. K. (1993). A mathematical model of the finding of usability problems. Paper presented at INTERCHI '93, Amsterdam, ACM Press (pp. 206-213).
Olweus, D. (1993). Bullying at school: What we know and what we can do. Oxford: Blackwell Publishers.
Perlman, G. (1999). Web-based user interface evaluation with questionnaires. Available at: http://www.acm.org/~perlman/question.html
Perrez, M. & Patry, J. L. (1982). Nomologisches Wissen, technologisches Wissen, Tatsachenwissen – drei Ziele sozialwissenschaftlicher Forschung. In J. L. Patry (Ed.), Feldforschung. Methoden und Probleme sozialwissenschaftlicher Forschung unter natürlichen Bedingungen (pp. 45-66). Bern: Huber.
Premack, D., & Woodruff, G. (1978). Does the chimpanzee have a theory of mind? Behavioural and Brain Sciences, 1, 515-526.
Prinz, W. (2002). Ideomotorische Handlungstheorie. Presentation at the 43rd conference of the German Society of Psychology, 22-26 September 2002, Berlin, Germany.
Rubin, J. (1994). Handbook of Usability Testing: How to plan, design and conduct effective tests. John Wiley and Sons.
Rudd, J., Stern, K., & Isensee, S. (1996). Low vs. high-fidelity prototyping debate. Interactions, 3(1), 76-85.
Schaefer, M. (1996). Different perspectives of bullying. Poster presented at the XIV Meetings of ISSBD, August, Quebec, Canada.
Sears, A. (1997). Heuristic walkthroughs: Finding the problems without the noise. International Journal of Human-Computer Interaction, 9(3), 213-234.
Sharp, S. (1995). How much does bullying hurt? The effects of bullying on the personal well-being and educational progress of secondary aged students. Educational and Child Psychology, 12(2), 81-88.
Snyder, C. (1996). Using paper prototypes to manage risk. Software Design and Publishing Magazine.
Stotland, E. (1969). Exploratory investigations of empathy. In L. Berkowitz (Ed.), Advances in experimental social psychology (pp. 271-314). New York: Academic Press.
Stroebe, W., Hewstone, M. & Stephenson, G. M. (Eds.) (1996). Sozialpsychologie – Eine Einführung. Berlin: Springer.
Sutton, J. & Smith, P. K. (1999). Bullying as a group process: An adaptation of the participant role approach. Aggressive Behaviour, 25, 97-111.
Talamelli, L., & Cowie, H. (2001). How pupils cope with bullying: Successful and unsuccessful strategies. London: University of Surrey, Roehampton in association with HSBC.
Titchener, E. (1909). Experimental psychology of the thought processes. New York: Macmillan.
Wolke, D., Woods, S., Bloomfield, L., & Karstadt, L. (2000). The association between direct and relational bullying and behaviour problems among primary school children. Journal of Child Psychology and Psychiatry, 41(8), 989-1002.
Wolke, D., Woods, S., Bloomfield, L., & Karstadt, L. (2001). Bullying involvement in primary school and common health problems. Archives of Disease in Childhood, 85, 197-201.
Wolke, D., Woods, S., Schulz, H., & Stanford, K. (2001). Bullying and victimisation of primary school children in South England and South Germany: Prevalence and school factors. British Journal of Psychology, 92, 673-696.