Deliverable D2.2.2
Deliverables Report
IST-2001-33310 VICTEC
Evaluation Methodology
AUTHORS:
Carsten Zoll, Rui Falcao, Nisa Silva, Lynne Hall, Polly Sobreperez, Sandy
Louchart, Sarah Woods, Sybille Enz, Harald Schaub
STATUS:
Final
CHECKERS:
Nuno Otero, Sandy Louchart
PROJECT MANAGER
Name: Ruth Aylett
Address: CVE, Business House, University of Salford, University Road, Salford, M5 4WT
Phone Number: +44 161 295 2922  Fax Number: +44 161 295 2925
E-mail: r.s.aylett@salford.ac.uk
TABLE OF CONTENTS
0. Purpose of Document
1. Executive Overview
2. Introduction
3. Evaluation of Toolkit
4. Evaluation of Demonstrator
   4.1 Technological Evaluation
   4.2 Usability Evaluation
   4.3 Pedagogical Evaluation
   4.4 Psychological Evaluation
5. Conclusion
6. References
0. Purpose of Document
This document gives an overview of the intended evaluation of the software generated by the VICTEC
project team members.
Two pieces of software will be created.
Firstly, the generic Toolkit. With the help of this program it will be possible to construct virtual 3D
environments containing autonomous and empathic agents. The document describes the goals and
methods of the Toolkit evaluation, focusing on usability aspects.
The second software program is the Demonstrator. The Demonstrator uses autonomous and empathic
agents in a virtual environment for PSE (Personal and Social Education) purposes. The aim is to use
the Demonstrator in schools with pupils aged 8 to 12, to teach them about bullying and specifically to
support anti-bullying education.
The evaluation of the Demonstrator focuses on usability, technological, psychological and pedagogical
aspects.
For each of the two applications – Toolkit and Demonstrator – the document establishes evaluation
methods and proposes assessment instruments. Before the final evaluation methodology (which will be
described in detail in D7.1.1) is fixed, the proposed instruments have to be further tested and investigated.
1. Executive Overview
In this document we will describe the ideas for the evaluation of the software programs that are
generated within the VICTEC project. The evaluation will take place both during and at the end of the
project. Two peer-reviewed documents reporting the results of the evaluation will be published
(D7.2.1: Evaluation of Demonstrator in schools (to be delivered in month 28) and D7.3.1: Evaluation
of Toolkit (to be delivered in month 30)).
Since this is the first document concerning evaluation, some details regarding the evaluation methods
and instruments proposed here may change before the final evaluation methodology is reported
(D7.1.1: Operational evaluation methodology; to be delivered in month 17). This document
focuses on the goals and the strategy of evaluation. In D7.1.1 a more detailed description of the
diagnostic methods and instruments used will follow.
Thus, this document will describe the content of our evaluation and how we intend to achieve our
goals. The document is divided into two parts: The evaluation of the Toolkit, a generic software
program that enables users to create empathic and autonomous agents in virtual environments, and the
evaluation of the Demonstrator, a software program for children between eight and twelve years of
age, which is situated in the PSE context and aims to teach children about problems concerning
bullying.
The evaluation of the Toolkit is based on the ISO 9241-11 standard, which suggests that measures of
usability should cover effectiveness, efficiency and user satisfaction. To meet this goal four different
methods are proposed: usability inspections, quality assurance testing, usability testing and user
satisfaction ratings.
Evaluation of Toolkit

Goals:
- Evaluation of different criteria of usability based on ISO 9241-11: effectiveness, efficiency, satisfaction

Methods:
- Usability inspections
- Quality assurance testing
- Usability testing
- User satisfaction ratings
The evaluation of the Demonstrator demands greater effort, since it focuses on four different aspects
(technological aspects, usability, psychological effects and pedagogical effects) and parts of it have to
be carried out in all countries participating in the VICTEC project in order to detect cross-cultural
differences among pupils and ensure the cross-cultural applicability of the Demonstrator.
Evaluation of Demonstrator:
- Technological Evaluation
- Usability Evaluation
- Pedagogical Evaluation
- Psychological Evaluation
The technological evaluation focuses on the question of stability and compatibility of the
Demonstrator. Furthermore, it focuses on the minimum technical requirements (hardware and
software) that are necessary to ensure optimal performance of the product.
Technological Evaluation of Demonstrator

Goals:
- Stability
- Compatibility
- Minimum hardware requirements

Methods:
- Questionnaire for hardware and software
- Bug grid for the detection and deletion of bugs
Like the evaluation of the Toolkit, the usability evaluation of the Demonstrator is based on the ISO
9241-11 standard and covers the quality of the interaction system, the ease of use of interface objects,
and the satisfaction and engagement of the user. Furthermore, it focuses on the physical embodiment
of the environment and the agents. The usability evaluation is carried out in two stages. The first stage
takes place in parallel with the development of the Demonstrator, so that the results of the evaluation
can influence the development. The second stage evaluates the final version of the Demonstrator.
Usability Evaluation of Demonstrator

Goals:
- Measurement of ISO 9241-11 criteria: effectiveness, efficiency, and satisfaction
- Physical embodiment of the virtual environment and agents

Methods:
1. Iterative usability evaluation during the development of the Demonstrator
   - Evaluation of agent appearance, movement and behaviour
   - Heuristic walkthroughs
   - Usability testing
2. Large scale testing with the completed Demonstrator
   - Logging software
   - User satisfaction questionnaire
   - Focus groups
The pedagogical evaluation of the Demonstrator focuses on two aspects:
1. Do children who have worked with the Demonstrator understand the mechanisms that lead to
bullying better than children who did not?
2. Do children who have worked with the Demonstrator show less aggressive behaviour in
schools than children who did not?
A pre-/posttest design with four diagnostic instruments – a teacher rating of bullying behaviour, a
bullying questionnaire for children, a pupil test and an empathy questionnaire – aims to answer these
questions. Furthermore, the pedagogical evaluation aims to involve the teachers, in order to assess
and create acceptance for the application of the Demonstrator in schools.
Pedagogical Evaluation of Demonstrator

Goals:
- Effect of user interaction with the Demonstrator on cognitive and behavioural bullying aspects
- Involvement of teachers

Methods:
- Teacher rating of bullying
- Bullying questionnaire
- Empathy questionnaire
- Pupil test
The psychological evaluation concentrates on specific questions important for the development of
psychological theory. One question is whether the effects of working with the Demonstrator interact
with the child's bullying type. Other research topics focus on the psychological processes underlying
the selection of certain coping responses when dealing with the Demonstrator, and on the role of
"theory of mind".
Psychological Evaluation of Demonstrator

Goals:
- Interaction between work with the Demonstrator and the child's bullying type
- Psychological processes underlying the selection of coping responses
- Role of "theory of mind"

Methods:
- Bullying questionnaire
- Theory of Mind questions
- Justification questions
2. Introduction
The evaluation of two complex software programs like the VICTEC Toolkit and the VICTEC
Demonstrator is a great challenge and has to be planned carefully. Especially for the Demonstrator a
variety of different aspects has to be evaluated, above all technological, usability, psychological and
pedagogical aspects. Each of these aspects can be subdivided further: the pedagogical aspects, for
example, can be divided into cognitive and behavioural aspects, and behavioural aspects in turn into
bullying and coping behaviour.
Because of the multitude of different evaluation aspects, a variety of different methods for evaluation
are necessary. These methods will be described in the following chapters, beginning with the
evaluation of the Toolkit. Here the usability aspects effectiveness, efficiency and satisfaction are in the
foreground. The plan for the evaluation of the Demonstrator follows referring to the above mentioned
aspects (technological, usability, psychological, pedagogical).
In some respects the overview of the evaluation design has to remain abstract, because the evaluation
has to be planned while the software development is still in progress. Thus, the final and concrete
appearance of the evaluation objects is not entirely clear yet and the evaluation plans have to remain on
a level that allows adapting them to the concrete final version of the software. For example, the
interaction of the user with the Demonstrator is not entirely clear yet. Thus, it is possible to set criteria
for the appropriateness of the interaction style, but it is impossible to develop a method to evaluate the
mouse control when it is uncertain whether the interaction will be carried out via mouse control.
Another reason for some abstractness is that many of the planned evaluation methods are self-developed
(e.g. the user satisfaction questionnaire, bullying questionnaire and empathy questionnaire). The
work on these methods is still in progress; they have to be tested and, if necessary, modified based on
the results of those tests. Thus, the final versions of these methods are not included in this document.
They will be included in D7.1.1 (Operational Evaluation Methodology).
Apart from that the document provides a complete overview of the aspects of the software programs
that will be evaluated and discusses the advantages and disadvantages of the proposed evaluation
procedures in detail.
3. Evaluation of Toolkit
The definition of usability used in the VICTEC project is provided by ISO 9241-11 (International
Standards Organisation, 1998):
“The effectiveness, efficiency and satisfaction with which specified users can achieve specified goals
in particular environments”
ISO 9241-11 suggests that measures of usability should cover:

Effectiveness: the ability of users to complete tasks using the system, and the quality of the output of
those tasks; the ease with which the user can achieve the goals that the system was intended to support.

Efficiency: the level of resource consumed in performing tasks; whether the goals can be achieved
with acceptable levels of resource, be it mental energy, physical effort or time.

Satisfaction: users’ subjective reactions to using the system, with the aim being for users to have a
positive, enjoyable, satisfying experience in accomplishing their goals.
Effectiveness, efficiency and satisfaction are closely linked. An effective application enables the user to
perform their activities in an efficient manner. Similarly, users are more likely to be satisfied with an
application that lets them perform their tasks in an effective and efficient manner. Our aim in VICTEC
is to evaluate each of these measures of usability.
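To make these definitions concrete, here is a minimal sketch (in Python, with an invented task-result record and an invented target time; none of this is prescribed by the project) of how raw observations from a test session could be turned into the three ISO 9241-11 measures:

```python
from dataclasses import dataclass

@dataclass
class TaskResult:
    """One user's attempt at one predefined task (hypothetical record)."""
    completed: bool      # did the user reach the task goal?
    time_taken_s: float  # resource consumed: time in seconds
    satisfaction: int    # post-task rating on a 1-5 scale

def usability_measures(results: list[TaskResult], target_time_s: float) -> dict:
    """Aggregate ISO 9241-11 style measures from raw task results."""
    n = len(results)
    # Effectiveness: proportion of attempts in which the goal was achieved.
    effectiveness = sum(r.completed for r in results) / n
    # Efficiency: proportion of attempts completed within the time budget.
    efficiency = sum(r.completed and r.time_taken_s <= target_time_s for r in results) / n
    # Satisfaction: mean subjective rating.
    satisfaction = sum(r.satisfaction for r in results) / n
    return {"effectiveness": effectiveness, "efficiency": efficiency, "satisfaction": satisfaction}

# Invented example: three attempts at a task with a 120-second target time.
print(usability_measures(
    [TaskResult(True, 95.0, 4), TaskResult(True, 140.0, 3), TaskResult(False, 180.0, 2)],
    target_time_s=120.0))
```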
To meet this aim a range of different methods (usability inspections, quality assurance, usability
testing and user satisfaction ratings) will be used to evaluate the toolkit, seeking to obtain different
perspectives on its usability. The overall framework for the early evaluation is provided
by Nielsen’s Discount Usability Engineering (Nielsen, 1994b).
Discount Usability Engineering is a low-cost, rapid method for evaluating interfaces. It supports the
evaluation of interfaces with different levels of fidelity (e.g. from paper based prototypes to stable
products) using a range of techniques that enables the usability of the interface to be explored (Rudd,
Stern, & Isensee, 1996; Snyder, 1996). This method also permits usability evaluation to occur using
only a few (3-5) experts (where the expertise is in usability). Nielsen has provided considerable
evidence (Nielsen, 1992; Nielsen, 1993; Nielsen, 1994b) that a few experts will pick up the majority of
problems, particularly where domain experts and usability experts collaborate in the evaluation.
Discount Usability Engineering has a number of characteristics that make it appropriate for the toolkit
evaluation in VICTEC:
- fast
- only needs a small number of evaluators / users
- low cost
- useful for the iterative, rapid prototyping approach
- flexible, permitting the use of a mixture of appropriate methods, techniques and tools
Proposed Methods and Techniques
The proposed methods and techniques are summarised in table 1 and further detail relating to their
selection and application is provided in the following sections.
Method: Usability Inspections
Involves: HCI experts
Identifies: Gross usability problems and solutions; potential usability problems (typically related to
interaction mechanisms)
Occurs: Before user testing

Method: Quality Assurance
Involves: HCI expert, developer
Identifies: Potential usability issues and problems (typically related to functionality)
Occurs: Before user testing

Method: User Testing
Involves: Users, facilitator
Identifies: Refined / specific usability issues and problems
Occurs: In the same session as the user satisfaction rating

Method: User Satisfaction Rating
Involves: Users
Identifies: General usability problems
Occurs: After user testing

Table 1: Summary of Methods and Techniques for the Toolkit Evaluation
It is envisaged that the usability evaluation will occur as an iterative process closely linked to the
development of the Toolkit; figure 1 outlines this process.
[Figure 1: The Usability Evaluation Process – the current version of the Toolkit passes through
usability inspection, quality assurance, user testing and user satisfaction rating; the resulting
prioritised usability recommendations feed back into Toolkit development.]
The usability evaluation will occur throughout toolkit development, with considerable input during the
prototyping phase. Table 2 identifies the probable dates and locations of evaluation activities for the
prototyping phase and the first version of the Toolkit. The usability activities required for subsequent
versions of the Toolkit will be identified in response to this first evaluation. However, it is likely that a
further two evaluations (following the same style as that for version 1) will occur in November 2003
and April 2004.
Usability Activities during Prototyping

Date | Location | Activity | Participants
31/01/03 | Sunderland | Usability Inspection | 2-4 experts
February | Salford | Quality Assurance | 2-4 experts / evaluators
February | Salford | User Testing | 2-4 user testers

Similar activities will occur until Version 1 is stable.

Usability Evaluation of Toolkit Version 1

Date | Location | Activity | Participants
07/05/03 | Sunderland | Usability Inspection | 2-4 experts
09/05/03 | Salford / INESC | Quality Assurance | 2-4 experts / evaluators
14/05/03 | Sunderland | User Testing | 20 user testers

Table 2: Evaluation Activities
3.1.1 Usability Inspections
Usability Inspection Methods are used by usability experts to evaluate, from an expert standpoint,
whether the application meets the usability criteria that have been defined for it. Usability
experts have considerable knowledge of Human-Computer-Interaction and user interface design
coupled with extensive practitioner experience, thus enabling them to identify potential user interface
problems and to suggest possible solutions or alternatives. The usability inspection that will occur
within VICTEC involves heuristic evaluation:
“A method of usability evaluation where an analyst finds usability problems by checking the user
interface against a set of supplied heuristics or principles.”
(Lavery, Cockton, & Atkinson, 1997)
Heuristic evaluation is an informal usability inspection technique, involving hands-on experience with
the application. The heuristics used are those derived from an analysis of usability problems (Nielsen,
1994a, 2002) and are based on principles, guidelines and rules of thumb. An example of one of
Nielsen’s heuristics is that for visibility of system status, which states “The system should always keep
users informed about what is going on, through appropriate feedback within reasonable time.” Within
the heuristic evaluation the expert determines how appropriate the visibility of system status is,
effectively, does the level of system visibility have a negative, neutral or positive impact on
application use. Whilst the heuristics used for the Toolkit will be based mainly on existing heuristics,
some modification and refinement of these will be needed to tailor the heuristics to meet the usability
requirements of the Toolkit.
- 12 Deliverable2.2.2/final
Deliverables Report
IST-2001-33310 VICTEC
The heuristic evaluation will be performed by a group of usability experts (in this case based at the
University of Sunderland). This involves an individual evaluation (largely unstructured and informal
allowing the expert to walk through any area of the toolkit) that typically takes 1-2 hours, followed by
a group debriefing session, where usability problems are further considered and prioritised.
Like other methods within discount usability, heuristic evaluation is quick, cheap, useful and effective.
The output from heuristic evaluation is provided in a form that enables selection of further
implementation activities. Where the output is prioritised and costed it is relatively easy to determine
the time required to repair usability issues.
A possible negative effect of heuristic evaluation is the identification of non-existent problems that
may actually be irrelevant to the usability of the product. The tendency for this to happen can be
reduced by using several experienced evaluators, as will be the case in VICTEC. This should also
avoid the tendency to evaluate the wrong issues or to focus on problems that are easily understood
and solved. Additionally, false alarms and spurious problems will be rejected during the Quality
Assurance session.
3.1.2 Quality Assurance Testing
This testing checks the functionality of the product, ensuring that it supports users in their tasks. This is
achieved through the usability expert exploring the product following prepared test scripts that focus
on the main tasks of the toolkit. The quality assurance is performed by a usability expert and a member
of the development team. The QA session should take place after the usability inspection and before
the usability testing. The QA sessions can help to identify, modify and reject usability problems that
have been identified in the usability inspection. The rejection of problems is of particular importance
as this stops a concentration on non-problems (Cockton & Woolrych, 2001) and allows future
development to be prioritised for the most severe usability problems. QA also aids in the structuring of
the usability testing, allowing a focus on tasks that have been identified as problematic.
Although expert walkthroughs coupled with heuristic evaluation can identify many usability problems,
they can miss severe usability problems, particularly related to unsupported areas of work. Such
problems can be identified through usability testing.
3.1.3 Usability Testing
Usability testing of the toolkit involves watching the intended users of the toolkit work with it to
discover the ways in which the product aids or hinders users in reaching their goals. This approach is
based on the work of Nielsen (Nielsen, 1993) and Rubin (Rubin, 1994). Usability tests are conducted
in a controlled setting, and the user is asked to attempt a set of predefined tasks. The test group of users
should reflect the intended user population and should not be part of the development team nor should
they be usability experts.
Usability testing involves users being observed performing specific tasks with the toolkit within a
specific context. The users are watched by usability experts with the focus of the evaluation being the
effectiveness of the product. A product cannot be considered to be usable unless users can perform
tasks efficiently and effectively. Whilst opinion of the product is important and any comments will be
noted, in the usability testing we are looking for the obstacles that hinder user progress in their work.
The situation in which users will be placed will be that of a novice user, working individually (i.e.
with no other users). Users will be given the training and support that we intend to provide with the
toolkit, in order to replicate the real-world situation of toolkit introduction. Focusing the usability
testing on toolkit introduction is felt to be relevant, as the decision to use software such as the toolkit
is often based on initial exposure to the product.
Following Nielsen’s Discount Usability Engineering approach, it is possible to test the usability of the
toolkit with only a small user group (Nielsen & Landauer, 1993), Nielsen identifies that testing is
possible with only 5 users (Nielsen, 2000). This approach requires that all of the users are from the
same user group and that only key areas are tested. This restriction of testing allows a focus on the
areas that have been identified as having potential usability problems through the usability inspection
and quality assurance. It also reflects the significant time constraint on the usability evaluation of the
toolkit, and avoids wasting resources on evaluating functionality that has been 'signed off' through the
other evaluation methods; for example, many of the functionalities in the toolkit will be similar to
those seen in other applications. QA testing identifies whether the functionality is met, and further
usability testing is not required to check cross-application functionality (e.g. file management and
basic editing functions).
Test users will be selected through their adherence to a user profile. This profile will be composed of
essential characteristics (such as users should be experienced developers with knowledge of a range of
development environments) and desirable characteristics (such as users who are not known by the
evaluators). Recruitment of users will be mainly in the educational sector. The demographics of the
participants will also be captured through a pre-test questionnaire, although it is unlikely that these will
have any impact on performance.
The tasks that will be used for the usability evaluation must accurately represent the intended actual
use of the application and occur within a realistic scenario. These tasks are developed out of the QA
session. Task performance is evaluated using SMART (Specific, Measurable, Achievable, Realistic,
Time-based) usability criteria.
Users will perform the various tasks whilst being observed by a facilitator. Users will be encouraged to
use a Think-Aloud protocol (Ericsson & Simon, 1985) to explain what they are doing, to ask questions
and to give information. The facilitator will use an interactive style, asking users to expand upon
comments and activities. User interaction with the toolkit will be monitored and logged, providing data
on error rates, navigational paths, appropriateness of task structures, etc.
3.1.4 User Satisfaction Rating
Usability is considered to be composed of a number of dimensions (Nielsen, 1994): learnability,
memorability, efficiency, error rates and satisfaction. The satisfaction that a user experiences with an
application is considered to be one of the most important aspects of usability. Determining the level of
satisfaction can be achieved in a number of ways, including rating scales. Dissatisfied users will tend to
stop using the system, so achieving a positive result for this metric is vital. The level of acceptability
for satisfaction tends to be higher than for the other dimensions, as it is such an important factor for
system success.
User satisfaction will be determined through a post-test questionnaire. This questionnaire will be
constructed by merging a number of standard user satisfaction questionnaires, such as Brooke’s
System Usability Scale (Brooke, 1996) and Lin’s Usability Index (Lin, Choong, & Salvendy, 1997),
using the approach provided by Perlman (Perlman, 1999). The aim of this merger is to create a
questionnaire which is relevant to the context provided by the application and the intended user group,
with questions assuming that the intended users are trained developers rather than the general public.
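As an illustration of how such a satisfaction questionnaire can be scored, the sketch below implements the published scoring rule of one of the instruments named above, Brooke's System Usability Scale: ten statements answered on a 1-5 scale, odd-numbered items worded positively and even-numbered items negatively, rescaled to 0-100. The example responses are invented:

```python
def sus_score(responses: list[int]) -> float:
    """Score Brooke's System Usability Scale (SUS).

    `responses` holds the ten item answers in order, each from 1
    (strongly disagree) to 5 (strongly agree). Odd items are positively
    worded (contribution = answer - 1); even items are negatively worded
    (contribution = 5 - answer). The summed contributions (0-40) are
    rescaled to a 0-100 score.
    """
    assert len(responses) == 10 and all(1 <= r <= 5 for r in responses)
    total = sum((r - 1) if i % 2 == 1 else (5 - r)
                for i, r in enumerate(responses, start=1))
    return total * 2.5

# Invented example: a fairly satisfied user.
print(sus_score([4, 2, 4, 1, 5, 2, 4, 2, 4, 2]))  # -> 80.0
```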
3.2 Proposed Assessment Instruments
Heuristics
A set of heuristics will be developed to enable the experts to assess the software from a usability
perspective. These will be a subset of those identified by authors such as Nielsen (Nielsen, 2002),
complemented by a competitive analysis of similar products (e.g. Kar2ouche (Immersive Education,
2001)).
Quality Scripts / Predefined User Tasks & Usability Criteria
Usability criteria will be generated from users; these will provide a scoring mechanism for the
evaluation of the toolkit. A series of scripts and tasks will be provided to enable the evaluators and
users to explore various aspects of the toolkit. The interactions will be linked to the criteria, thus
allowing the usability of the toolkit to be assessed.
User Questionnaire
A questionnaire will be developed to identify the user’s perception of the effectiveness and efficiency
of the toolkit and their level of satisfaction with the interaction.
3.3 Work to date on the Toolkit Usability Evaluation
1. Initial identification of heuristics
2. Initial version of user questionnaire
3. Currently attempting to create a paper prototype to enable evaluation with users. This also results in
the generation of usability criteria, quality scripts and tasks.
3.4 Future work on the Toolkit Usability Evaluation
The main activities for the first version of the Toolkit are provided here. The usability activities
required after this period will be focused on usability issues, problems and benefits that are identified
through the evaluation of Version 1 of the Toolkit.
Activity: 31/01/03, Sunderland – Low Fidelity Usability Inspection
Deliverables: Usability Recommendations for Toolkit; Quality Scripts

Activity: Early February, Salford – Quality Assurance
Deliverables: Refined heuristics for inspections; identification of user testing tasks / issues; refined
recommendations

Activity: Early February, Salford – Qualitative, participatory user testing
Deliverables: Refinement of usability issues; refined recommendations

Deliverable due 17/02/03: Usability Recommendations Report

Similar activities will continue to take place on a monthly or more frequent basis (depending on need)
with the current version of the prototype. The User Satisfaction Rating Questionnaire will also be
piloted during the user testing activities.

Activity: 07/05/03, Sunderland – Usability Inspection of Version 1
Activity: 09/05/03, Salford / INESC – Quality Assurance of Version 1
Activity: 14/05/03, Sunderland – User Testing and User Satisfaction Rating of Version 1
Deliverables: 15/06/03: Toolkit Usability Report; 30/06/03: Usability Recommendations for Toolkit;
15/07/03: Plan for future activities

Table 3: Usability Evaluation Activities for Prototypes and Version 1
3.5 Dissemination of the Usability Evaluation
- Case Study
- Feedback to development team
4. Evaluation of Demonstrator
The evaluation of the Demonstrator is divided into four parts, relating to technological, usability,
pedagogical and psychological issues.
The main challenge for the present deliverable is that the development of the evaluation methodology
happens in parallel with the development of the software program that has to be evaluated. This means
that the appearance and some of the functionality of the Demonstrator are not yet clear in all details.
Thus, a certain level of abstractness remains for a few issues. These issues will be put in concrete
terms in deliverable 7.1.1 (Operational Evaluation Methodology).
4.1 Technological Evaluation
4.1.1 Goals of the technological evaluation
The technological evaluation aims to define how the program works: on which systems and machines,
with which types of hardware, firmware and software, which plug-ins are needed, what the best screen
resolution is, and how to ensure optimal performance.
This is a fundamental step before product launch, since it guarantees consistency with existing
solutions and tests compatibility and technological robustness. It is also important to define the
minimum requirements of the product and to decide on the best way of explaining the installation
process and how the product works.
4.1.2 How many tests and sample characteristics?
It is important to test the product with different sub-samples of participants:
-
People external to the project, who have never seen the product and do not have any
knowledge about it. The rationale for employing this group of users is due to the fact that
sometimes programmers and software designers tend to utilise the system in stereotypical
ways. For example, following certain navigation paths that reflect their own knowledge
about the system. This phenomenon makes them unaware of alternatives. Novices to the
product will view things from a different perspective. Things that are obvious to
programmers may seem strange or not intuitive for other people. This methodology is
important in order to identify bugs and to implement modifications (buttons, explanations
or animations) if necessary.
-
Participants dealing with the software have different levels of technological experience.
Two categories will be considered: 1) Users that deal with computers regularly and 2)
Users that deal with computers rarely. These two groups deal with applications in
different ways. Therefore, it is essential that the demonstrator is easy to use for both
groups.
All potential product users, which include teachers, children and parents for the VICTEC
project.
-
4.1.3 What kinds of technological tests are essential?
It is important to carry out tests that cover different operating systems, different hardware and
software, different types of browsers and different users. On the one hand, tests should cover as many
usage alternatives as possible in order to detect bugs. On the other hand, it is important to define the
parameters relevant to product performance and to avoid superfluous testing. All of the following
parameters influence product performance (a sketch for collecting some of them automatically is
given after the list):

- Operating systems
- Hardware
- Screen resolution
- Browsers
- Plug-ins
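Some of these parameters can be captured automatically instead of being asked for. The following minimal sketch (assuming Python with Tkinter is available on the test machine, which is an assumption and not a project requirement) collects the operating system, processor and screen resolution:

```python
import platform
import tkinter  # used only to read the screen metrics

def system_report() -> dict:
    """Collect basic system parameters relevant to the technological evaluation."""
    root = tkinter.Tk()
    root.withdraw()  # no window is shown; we only query the display
    resolution = f"{root.winfo_screenwidth()} x {root.winfo_screenheight()} px"
    root.destroy()
    return {
        "operating_system": f"{platform.system()} {platform.release()}",
        "processor": platform.processor() or platform.machine(),
        "screen_resolution": resolution,
    }

if __name__ == "__main__":
    for key, value in system_report().items():
        print(f"{key}: {value}")
```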
4.1.4 The phases of the technological evaluation
Technological evaluation starts at the end of the first phase of development, when the development
team considers the product finished and delivers it for testing. At this point, copies of the product,
debugging grids and questionnaires are distributed to beta testers. Over a period of two weeks to one
month, depending on the product’s size, beta testers and the development team find and correct bugs
until they are confident that the product is really finished and ready to use.
4.1.5 Example of a specific questionnaire related to software and hardware
1. Which of the following operating systems are you currently using?
Windows 95 or 95 SR2
Windows 98 or 98 SE
Windows ME
Windows NT Workstation
Windows NT server 4.0
Windows 2000 Professional
Windows 2000 Server
Windows XP
Mac OS 9
Mac OS X
Linux
Other Operating system (Please specify) ________________________
2. Which of the following hardware equipment is available on your computer?
Intel Pentium Celeron, II, III, IV or higher
AMD Duron, Athlon
Macintosh
Video and Audio hardware
Web cams
CD-ROM drive
DVD drive
Microphone
Other Hardware equipment (Please specify)______________________
3. What is the display resolution you are currently using?
640 x 480 px
800 x 600 px
1024 x 768 px
1280 x 1024 px
4. Which browser are you using?
Internet Explorer 4.x
Internet Explorer 5.x
Internet Explorer 6.x
Netscape 4.x
Netscape 6.x
Other Browsers (Please specify) __________________
5. Which of the following plug-ins are available on your computer?
Macromedia Shockwave Director
Macromedia Shockwave Flash
Apple QuickTime Player
RealPlayer
Adobe Acrobat
6. Which type of Internet access do you have?
56K modem,
ISDN 64k/128K,
ADSL/Cable,
Specialized connections,
Other (Please specify) __________________________
- 20 Deliverable2.2.2/final
Deliverables Report
IST-2001-33310 VICTEC
4.1.6 Test grids
Grids can be prepared to help beta testers with their task. These grids (see below) ensure that all
bugs occurring during the beta test phase are corrected. The first grid provides information about the
system on which the bug occurred. The second grid contains a detailed description of the bug,
information on the importance of the bug and a date by which to correct it. Each tester must
complete the two grids. The project coordinator must collect and organise the information from all
beta testers. This can also be done with a shared document, allowing people to do the evaluation
work simultaneously.
GRID 1 – Description of the hardware and software used in the test

Computer: ____
Operating System: 95 / 98 / ME / NT / 2000
Resolution: 640 / 800 / 1024 / 1280
Browsers: IE5 / IE6 / NET4
Plug-ins: Flash / Director / QT

GRID 2 – Description of a bug

Section: ____
What happened (description, error message): ____
How many times: always / once / random
Kind of problem: crash / graphic / display / sound
Priority: 1 / 2 / 3
Who will correct: ____
Correct on (date): ____
Who found it, when, and on what computer: ____
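A shared, machine-readable version of these grids could look like the following minimal sketch; the field names simply mirror the grid columns and are illustrative, not prescribed by the project:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class TestSystem:
    """Grid 1: the system on which the bug occurred."""
    computer: str
    operating_system: str            # e.g. "Windows 98"
    resolution: str                  # e.g. "1024 x 768"
    browser: str                     # e.g. "IE6"
    plug_ins: list[str] = field(default_factory=list)

@dataclass
class BugReport:
    """Grid 2: the bug itself."""
    section: str                     # part of the product where the bug occurred
    description: str                 # what happened, including any error message
    frequency: str                   # "always", "once" or "random"
    kind: str                        # "crash", "graphic", "display" or "sound"
    priority: int                    # 1 (highest) to 3 (lowest)
    reported_by: str
    reported_on: date
    system: TestSystem
    assigned_to: str = ""            # who will correct the bug
    correct_by: date | None = None   # date by which to correct it

# Invented example entry that a beta tester might file.
bug = BugReport(
    section="character editor",
    description="application crashes when a second agent is added",
    frequency="always", kind="crash", priority=1,
    reported_by="tester A", reported_on=date(2003, 2, 10),
    system=TestSystem("PC lab 3", "Windows 98", "1024 x 768", "IE6", ["Flash"]),
)
print(bug.kind, bug.priority)
```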
4.2 Usability Evaluation
4.2.1 Aims of the Usability Evaluation
The definition of usability used in the VICTEC project is provided in section 3. Although usability is a
key quality of any interactive system, it is typically only noticed in a negative context, for example
when system features and behaviour obstruct usage, learning or satisfaction (Cockton, Lavery, &
Woolrych, 2003). This obstruction can be subtle and may be related to a number of factors, including
domain and task issues, semiotic and interaction mechanism expectations, human activity structures
and resources, etc. The usability evaluation of the VLE (Virtual Learning Environment) aims to
consider a range of different areas, using a variety of techniques to gain relevant data.
Ensuring the usability of the VICTEC demonstrator is not a single, static activity; rather, it is an
ongoing process which will have a number of iterations. The participatory design (Gould, Boies, &
Clayton, 1991) approach that guides the VICTEC development process requires early, regular and
continuous input from users throughout the design and implementation phases, thus allowing users to
make a significant contribution to the final product.
This will be achieved through using the most appropriate approaches, methods and techniques from
usability and incorporating the results and recommendations from the usability evaluations into
subsequent iterations of the VLE. This approach permits the usability of the VLE to become an
intrinsic and almost invisible aspect of the application rather than a bolted-on interface added at the
final moment. The need to include users in the development process as well as in the final evaluation
requires the use of a range of complementary usability techniques at a number of different points
within the VLE development lifecycle.
The major aim of the usability evaluation is to determine whether the VLE is usable by the intended
primary user group, children. Although there has been an increase in the awareness and need to focus
on children as a special user group (Brouwer-Janse et al., 1997), the majority of currently available
user-centred research has focused on the development of applications for the adult population.
However, an increasing number of studies, e.g. NIMIS (Brna, Martins, & Cooper, 1999; Cooper &
Brna, 2000; Lieberman, 1999), support the view that evaluation with children can be achieved by
slightly modifying traditional usability techniques to focus on this particular user group.
A wide range of usability techniques are available (Nielsen, 1993; Nielsen, 2000) and these can be
modified for use depending on the situation and the aspects being evaluated. Usability testing
techniques (Rubin, 1994) such as monitoring of behaviour, observation of user activity and user
feedback are appropriate for children, using relatively standard techniques, with some change in focus
to facilitate elicitation (Hanna, Risden, & Alexander, 1997).
Usability is only one part of the VLE evaluation process, with psychological and pedagogical
evaluation activities also occurring. Within the usability evaluation, the focus of the evaluation is on
ease of use and user satisfaction, that is the effectiveness and efficiency of the user interface
components and interaction mechanisms to enable the user to have satisfying and positive interaction
experiences. Within the usability evaluation the questions we are seeking to answer are:
1. Do the input and output mechanisms of the demonstrator enable the user to interact effectively,
efficiently and enjoyably?
2. Is the interaction with the demonstrator a satisfying and positive user experience?
4.2.2 Issues impacting on method and technique choice
The usability evaluation for the demonstrator has two distinct parts: firstly, the evaluation that
occurs as part of the development process (iterative usability evaluation), and secondly the evaluation
of the completed demonstrator (large-scale user testing). The selection of approaches, techniques and
measurement instruments was based on the issues that should be considered when modifying
techniques for children (de Vries, 1997; Hanna et al., 1997); see table 1.
Issue: Lack of awareness of software application potential
Tools / Techniques: Scenarios are to be provided that should allow exploration of all key functional
areas of the demonstrator. Logging software will help to identify areas of missed software potential.

Issue: Communication difficulties (including low literacy levels)
Tools / Techniques: Logging software does not require communication with the user. Storyboards and
other non-text-based stimuli will be used within focus groups and questionnaires.

Issue: High willingness to agree with the analyst
Tools / Techniques: Logging software is impartial. Questionnaires do not require agreement. In focus
groups it is often possible for users to state opinions they would not state in an interview situation.

Issue: Potentially limited social and interaction abilities
Tools / Techniques: Logging software will identify interaction ability problems and permits the logging
of behaviour without requiring any social interaction or engagement with the analyst. Questionnaires
require limited interaction.

Issue: Comfortable, safe, secure, known context
Tools / Techniques: All evaluation activities are to occur within the classroom.

Table 1: Issues to be considered for Tool Selection
4.2.3 Proposed methodology for the Usability Evaluation
4.2.3.1 Iterative Usability Evaluation
The approach to be taken during early design and implementation is based on the use of low fidelity
techniques (Nielsen, 1993; Snyder, 1996) such as scenarios, storyboards, paper prototypes and
screenshots. These techniques are associated with Discount Usability Engineering (Nielsen, 1994),
which was discussed in section 3. The use of these techniques and their related tools will result in the
design of appropriate interaction mechanisms and interface components created quickly at low cost.
The suitability and appropriateness of these mechanisms and components will be assessed by
transforming them into a high-fidelity prototype, using software such as Kar2ouche (Immersive
Education, 2001), with the agent interaction simulated using a Wizard of Oz technique (Maulsby,
Greenberg, & Mander, 1993; NECTAR, 2001).
Wizard of Oz is a technique used to present advanced concepts of interaction to users. Basically, the
user interacts with what appears to be a computer system, but is in fact a simulation provided by
either a human (referred to as the wizard) or the combination of a human and a computer. The wizard
processes input from the user and emulates system output. The aim is to demonstrate computer
capabilities which do not yet exist, whether for technical reasons or for lack of resources. The
technique derives its name from the character in the film The Wizard of Oz, who everyone thought
was a tall imposing 'statue' when in fact he was a small man controlling the 'statue' from behind a
curtain.
This technique is simple and flexible and can be used to explore a range of usability issues throughout
the development lifecycle. As this technique does not require the application under development to be
stable nor complete, it is highly suitable for a prototyping approach. Due to the possibility of
simulating advanced interactions, the Wizard of Oz technique is highly applicable to interfaces for
intelligent systems which feature agents, advisors and/or natural language processing (NECTAR,
2001).
Within the usability evaluation the agent is evaluated in terms of user satisfaction, and high satisfaction
is likely to occur if the user empathises with the agent and enjoys their experience with the VLE. Three
levels of evaluation will occur, focused on agent appearance, movement and behaviour.
The agents are to be evaluated using an evaluation grid that is under development. This grid provides
a set of themes that will be used to evaluate user experience with the agent. Each of the themes, for
example believability, will be represented through a set of heuristics (see section 3.1.1) that will be
used to score the agents on a range of usability criteria. The grid is the communication vehicle for
this evaluation. The criteria will initially be based on an extensive literature review of embodied
agents and usability evaluation. They will then be tailored to the VICTEC project as we gain greater
awareness of issues such as empathy.
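Since the grid is still under development, the following is only a sketch of how theme-level scores could be aggregated from individual heuristic ratings; the theme names, heuristics and 1-5 scale are assumptions for illustration:

```python
# Hypothetical heuristic ratings for one agent, grouped by theme.
# Each heuristic is rated from 1 (poor) to 5 (excellent) by an evaluator.
ratings = {
    "believability": {"consistent behaviour": 4, "plausible reactions": 3},
    "appearance":    {"readable expressions": 5, "age-appropriate design": 4},
    "movement":      {"smooth locomotion": 2, "natural gestures": 3},
}

def theme_scores(ratings: dict[str, dict[str, int]]) -> dict[str, float]:
    """Collapse per-heuristic ratings into a mean score per theme."""
    return {theme: sum(scores.values()) / len(scores)
            for theme, scores in ratings.items()}

for theme, score in theme_scores(ratings).items():
    print(f"{theme}: {score:.1f} / 5")
```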
Various iterations of the evaluation of the interface elements of the demonstrator will occur. Each
iteration will commence with a quality assessment of the demonstrator that will be performed through
the use of heuristic walkthroughs (Sears, 1997) ascertaining that the prototypical VLE is sufficiently
robust to permit testing with end users to occur. 4 – 6 children will then evaluate the prototypical
demonstrator attempting specific tasks that will allow various usability aspects of the demonstrator to
be explored, evaluated and improved. After several iterations this will result in the emergence of the
final demonstrator that will be used for the large scale user testing as described below.
Location
In evaluating products with children, procedures need to be employed so that children feel readily able
to participate in the process and additionally that they feel comfortable with the evaluation procedures
themselves. A number of authors (Brouwer-Janse et al., 1997; Hanna et al., 1997) suggest that a
suitable context for children, is that of the classroom and the iterative usability evaluations with
VICTEC will be set within this context.
Date
The iterative approach to evaluation proposed will involve a number of different evaluation dates that
will emerge as a response to changes and modifications of the demonstrator.
Sample Size
Only small numbers of evaluators are required: one expert evaluator (for the heuristic walkthrough)
and 4-6 children in each country (Portugal, U.K., Germany).
4.2.3.1.1 Proposed Assessment Instruments
Heuristic Walkthrough
Specific, critical tasks will be walked through as part of the heuristic evaluation, focusing the attention
of the HCI expert on specific aspects of the interface.
Usability Testing
Children will evaluate the interaction mechanisms through performing a series of typical tasks with the
VLE. The debriefing from this evaluation will be achieved using focus groups. Focus groups (Gorman
& Clayton, 1997) were selected for eliciting views, expectations and needs from the user group as it
was felt they would result in the maximum amount of quality data. In addition, this technique has
previously been used successfully with children (de Vries, 1997). Focus groups allow a range of
perspectives to be gathered in a short time period in an encouraging and enjoyable way. The
satisfaction of the participants is felt to be of considerable importance and focus groups tend to be
enjoyable, stimulating experiences. The principal alternative to the focus group for this study was
interviewing; however, focus groups were selected as they are non-threatening, social, and can result in
considerable input into the design process. Focus groups enable children to become integrated into the
design process in a way that is both efficient and effective and enables a determination of how children
interact with their environment. A further benefit of the use of focus groups is that they permit a
significant degree of flexibility in their application. Different groups of children respond in different
ways and may have preferred styles of interaction with the evaluator. Whilst some children may choose
to discuss the application, others may wish to use paper and pencils.
4.2.3.2 Large-Scale User Testing with Completed Demonstrator
The large-scale user testing will occur at the same time as the psychological evaluation:

Location: University of Hertfordshire
Date: June 2004
Sample size: 400 children, aged 8-12

Table 2: Data for large-scale user testing
4.2.3.2.1 Proposed Assessment Instruments
Logging Software
Each interaction will be logged using logging software. This will be used to gather data relating to
error rates, navigational paths, use of help, time performing tasks, functionalities used, etc. This will
help to identify aspects of the demonstrator that users found difficult to use, used in a way that was
unexpected or that users failed to discover. This information will be used to provide recommendations
for the demonstrator and to feed into the evaluation grid.
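As an illustration of the kind of data involved (the event names and fields below are invented; the actual logging format will depend on the final Demonstrator), interaction logging could write time-stamped records from which error rates, navigation paths and time on task are later derived:

```python
import json
import time

class InteractionLogger:
    """Minimal event logger: appends one JSON record per user action."""

    def __init__(self, path: str, user_id: str):
        self.file = open(path, "a", encoding="utf-8")
        self.user_id = user_id

    def log(self, event: str, **details):
        record = {"t": time.time(), "user": self.user_id, "event": event, **details}
        self.file.write(json.dumps(record) + "\n")

    def close(self):
        self.file.close()

# Invented example session producing the raw material for the analysis.
logger = InteractionLogger("session.log", user_id="child-042")
logger.log("screen_entered", screen="playground")
logger.log("help_opened")
logger.log("task_completed", task="choose_coping_response", seconds=41.5)
logger.close()
```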
User Satisfaction Questionnaire
User Satisfaction Questionnaires will be given to each user; the usability questionnaire is to be
answered after the child’s interaction with the VLE. A number of questionnaire frameworks have been
developed for use in measuring user attitudes and satisfaction levels during software testing. The
questionnaire design for this project incorporates tried and tested techniques from a number of sources
(Brooke, 1996; Lin, Choong, & Salvendy, 1997; Perlman, 1999). All of the questionnaires reviewed
in this section have been assessed for reliability in a wide range of environments. Work is still being
carried out into the types of satisfaction questions to be asked.
Agent Evaluation Grid
The agent evaluation attempts to evaluate the agent itself and focuses on whether the agent approach is
a useful and valid way to expose children to learning about how to develop coping strategies for
bullying. The grid is used to analyse relevant results obtained from the other evaluation exercises. The
user is not directly exposed to the grid, rather it is used to aggregate a wide range of data from various
sources, including the logging of the interaction.
4.2.4 Work to date on the Usability Evaluation
- Initial version of evaluation criteria for interaction mechanisms created
- Initial identification of heuristics for walkthrough
- Initial version of the agent evaluation grid focusing on agent appearance
- Next step: to make firm recommendations for evaluating agent appearance and for the appearance
of the agent in the VLE
4.2.5 Dissemination of the Usability Evaluation
- Presentations at HCI-based conferences
- Publications in peer-reviewed HCI journals
- Feedback in teacher workshops
- Feedback to development team
4.3 Pedagogical Evaluation
4.3.1 Objectives of the Pedagogical Evaluation
The pedagogical evaluation will investigate the effects of interaction with the demonstrator on the
child users.
According to Bortz & Döring (2002), evaluation is a branch of empirical research that deals with the
measurement of the effects of actions and interventions. Most evaluative research deals with the
question of whether a certain intervention has an effect on a predefined population and, if so, what
type of effect. The intervention used in VICTEC is the interaction of child users with the Demonstrator.
The project aims to evaluate both cognitive and behavioural effects. The largest aspect of the
investigation involves the measurement of interaction effects on the bullying behaviour of the child
user. A further crucial objective for this investigation is the measurement of empathy.
Cognitive Effects
The cognitive evaluation aims to determine whether the interaction with the Demonstrator helps the
child user to learn something about bullying. Does the child have an idea about how bullying operates
after interacting with the demonstrator? Does the child know which psychological processes lead to
bullying situations? Is the child aware of strategies that help to avoid bullying situations and to deal
with bullying situations?
The evaluation of cognitive effects is the most important part of the evaluation. Since the interaction
process will be limited in time (see 4.3.3), it is probable that we will find non-significant behavioural
effects of the interaction; significant cognitive effects, however, are expected.
Behavioural Effects
The evaluation of behavioural effects deals with the question of whether the interaction leads to
differences in the bullying behaviour of the child user. For example, do bullies bully less and do
victims develop and perform strategies that cause less victimization?
Emotional Effects
Emotional effects are not the main focus of the evaluation of the Demonstrator, as it cannot be
assumed that the interaction with the Demonstrator will strongly change the emotional functioning of
the child users. However, the emotional impact of the interaction on the child user will be measured in
two main areas.
1) The measurement of satisfaction of the child user with the Demonstrator within the usability
evaluation. If the user is satisfied with the Demonstrator they should have had positive feelings during
or after interaction.
2) Changes in the empathic reaction of the child user with emphasis on both cognitive and affective
empathy components (see 4.3.4).
Further goals
The pedagogical evaluation’s further goals concern the teachers. Liaising closely with teachers will
allow the team to determine (see 4.3.2 and 4.3.4) whether they think that the Demonstrator is a useful
tool and easy to integrate into the school context, and eventually into the school curriculum.
4.3.2 Pre-/Posttest Design
The investigation of the cognitive and behavioural effects will be carried out as a pre-/posttest design
consisting of three levels:

Level: 1) Pretest
Methodology: Initial diagnosis of the child users’ bullying role (the cognitive and behavioural aspects
described above) with four diagnostic instruments (see 4.3.4).

Level: 2) Application of Demonstrator
Methodology: The child users interact individually with the Demonstrator.

Level: 3) Posttest
Methodology: Final diagnosis of the child users with the same diagnostic instruments as used in the
pretest.

Table 1: Methodology for the pedagogical evaluation
According to Perrez & Patry (1982) the investigation takes place in the “field” which they define as
“an ensemble of conditions that is not solely composed of controlled and systematically varied
variables, but of a vast number of constraints that are difficult to survey and whose impacts on certain
dependent variables are unknown in the first instance” (translation by the author). We agree with this
definition of "field", although not all researchers do.
For example, Cook & Campbell (1967) state “by field we understand any setting which respondents do
not perceive to have been set up for the primary purpose of conducting research”. Following this
definition the pedagogical evaluation would not take place in the field because our subjects (the child
users) know that they take part.
Gachowetz (1993) notes that for some authors the place where the study is carried out is the decisive
point: according to them, all research taking place outside the laboratory is field research.
As the application of the diagnostic instruments and the Demonstrator is an intervention in the normal
processes taking place in the field (Gachowetz, 1993), we can speak of our investigation as a field
experiment.
Field experiments have the advantage of high ecological validity. On the other hand, as explained
above, compared to experiments in the lab the researcher has less control over variables that might
have an impact.
The sample used for this field experiment consists of subjects of the target group of the Demonstrator.
Since bullying is defined as repeated aggression in the context of schools, our sample consists of
school children. The Demonstrator is created for children from eight to twelve years of age, so our
sample consists of children from this age group.
Furthermore, the sample consists of approximately equal parts of pupils from all three participating
countries (Portugal, England and Germany). This is important to see if there are cultural particularities
that diminish or promote the intended effects of the Demonstrator.
The investigation will be carried out with entire school classes. This makes sense for four reasons.
Firstly, taking single pupils out of the class to do a test would carry the danger of stigmatising those
pupils. Secondly, from an organisational point of view it is easier to work with entire classes. Thirdly,
if entire classes experience the interaction with the Demonstrator, this gives the class teachers the
chance to work with it, e.g. to initiate a discussion on bullying. And finally, by examining entire
classes it is most probable that we capture the whole range of bullying types, since every class has
its bullies, victims and so on.
Our plan is to investigate four school classes in each country. Given an average class size of twenty to
thirty pupils, this would result in an overall number of approximately 300 subjects. Of course, we will
take care that the classes are from schools attended by children with different social and economic
backgrounds.
Sample characteristics

Age of children: 8-12 years
Countries assessed: U.K., Portugal, Germany
Sample size: approximately 300; 4 school classes from each country
Socio-economic status of schools: cross-section of lower-, middle- and upper-class regions
School location: urban and rural schools

Table 2: Sample characteristics for the pedagogical evaluation
4.3.3 Why the Pedagogical Evaluation Has To Remain Exploratory
According to Annex 1 – "Description of work" of the contract for the VICTEC project (p. 18), the
pedagogical evaluation of the Demonstrator has to remain exploratory. This is due to the following
constraints of the investigation: the field-experiment setting, pedagogical constraints, organisational
constraints and effect size.
Due to the nature of field experiments, it will be impossible to control all the conditions that may
have an impact on the dependent variables to be measured. For example, it is not possible to preclude
communication among the children between the different parts of the investigation, although this
could have an impact on their perception of the Demonstrator and resulting bullying behaviour. Other
variables that might have an impact (e.g. age, gender and bullying type) can be controlled.
Alternatively, the impact of certain variables can be minimised by sampling a wide spectrum of
subjects, for example by investigating pupils from different social and economic backgrounds.
Further issues concern pedagogical and ethical constraints. Extreme caution should be taken when
evaluating the impact of a new software program that deals with a sensitive subject such as bullying.
Since it is not known how the interaction with the Demonstrator will affect the bullying behaviour of
children, the possibility of unintended effects must be taken into account. For example, children could
learn how bullying works, as intended (a cognitive effect), but this could result in an increase in
proficient bullying behaviour, negating the positive effects expected from the interaction.
The most important constraint is of an organisational nature. Since the teachers must follow the
curriculum, the number of classes that can participate in the investigation is limited. Furthermore, the
time available for the investigation is limited. This has the following consequences:
1) Due to time constraints for the pre- and posttests, diagnostic instruments need to be carefully
selected. For example, interviews with all pupils would take too long.
2) The time allowed for the application of the Demonstrator is limited. It is currently estimated that the
interaction of the child user with the Demonstrator will last approximately 30 minutes. This is an
extremely short period of time that will probably not lead to substantial cognitive and behavioural
effects. Therefore, effect sizes are expected to be rather small.
3) Given the expected small effect sizes and the limited number of pupils that are available for the
investigation, the decision to carry out a research design without control groups has been made. If
classes were divided into two experimental and two control classes in each country, the sample sizes
would be so small that there would be the risk of finding no effects at all.
Since this investigation deals with a rather unexplored area it is difficult to develop precise hypotheses
for the effect size. However, a hypothesis on the direction of the effect for bullying behaviour can be
delineated: bullying behaviour should decrease and the understanding of bullying mechanisms should
increase, but, as pointed out, the concrete size of the effects is impossible to estimate.
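To give a feeling for the numbers involved, the following sketch (in Python, using the statsmodels
library) computes statistical power for a simple pre/post comparison modelled as a paired t-test. The
effect size of d = 0.2 is a hypothetical placeholder for a "small" effect, not a project estimate.

    # Illustrative power calculation for a pre/post design without control
    # groups, modelled as a paired t-test. The effect size d = 0.2 is a
    # hypothetical "small effect" placeholder, not a project estimate.
    from statsmodels.stats.power import TTestPower

    analysis = TTestPower()

    # Power achieved with ~300 children at a small effect (Cohen's d = 0.2)
    power = analysis.solve_power(effect_size=0.2, nobs=300, alpha=0.05)
    print(f"Power with N=300, d=0.2: {power:.2f}")

    # Sample size needed to detect the same effect with 80% power
    n = analysis.solve_power(effect_size=0.2, power=0.80, alpha=0.05)
    print(f"N needed for 80% power at d=0.2: {n:.0f}")

Under these placeholder assumptions a single-group pre/post design with roughly 300 pupils retains
reasonable power, whereas splitting the sample into experimental and control groups would require far
more subjects per group than are available, which is in line with the design decision stated above.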
4.3.4 Diagnostic Instruments of the Pedagogical Evaluation
Four diagnostic instruments are to be employed in both pre- and posttest:
1) A teacher rating of children’s bullying behaviour.
2) A pupil test.
3) A bullying questionnaire for children.
4) An empathy questionnaire for children.
The diagnostic instruments have been chosen because they meet certain demands that were considered
to be important for the investigation. Most importantly, the combination of the instruments should
adequately measure the different aspects of effects of the interaction with the Demonstrator, especially
the cognitive and behavioural effects on bullying. Furthermore, the instruments should be easy to
apply, allowing them to be utilised in schools and not just solely in the laboratory. Finally, the
application should not be too time consuming (see 4.3.3). As a consequence instruments that can be
applied in groups have been selected. Explorative instruments such as the Rorschach-test (Rorschach,
1999; first in 1921) have been omitted as they require individual application.
a) Teacher Rating (Behavioural Effects)
Procedure
The teachers will be asked to rate the bullying behaviour of their pupils (as perpetrators and as
victims) on a five-point scale (from "very often" to "never"). Separate scales cover the different types
of bullying behaviour (physical, verbal and relational), plus a scale for an overall score.
Dependent Variables
The dependent variables to be obtained from this instrument are the ratings of the bullying behaviour
for all pupils for all bullying types.
Critique
The Teacher Rating is a diagnostic instrument that is easy to apply. Since teachers observe their
students on a daily basis, there is no additional effort involved. Teacher ratings of bullying behaviour
should elucidate bullying behaviour from an adult perspective. However, research has shown that
teachers' and pupils' attitudes towards and observations of bullying behaviour can differ considerably
(Schaefer, 1996). There are a number of reasons for this. Firstly, children who bully others try to
conceal their behaviour from adults to avoid punishment. Secondly, teachers may judge bullying
behaviour in a biased manner due to differences in their relationships with individual pupils.
However, the teacher rating is a core instrument for the evaluation. If the Demonstrator is to be
accepted by schools, it is essential that teachers are aware of its potential role in reducing bullying
behaviour.
b) Pupil Test (Cognitive Effects)
Procedure
The children will receive a picture-story containing ambiguous content. The task of the pupils will be
to re-narrate the story.
Dependent Variables
The re-narration will be analysed in two ways.
1) The language of the re-narration will be analysed with regard to the number of aggression-associated
words used (a sketch of this analysis follows this list).
2) The content of the re-narration will be analysed. This analysis will, for example, deal with the
questions: Do the children interpret the picture-story in an aggressive or a non-aggressive manner? Are
there protagonists that can be associated with certain bullying types?
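To make analysis 1) concrete, here is a minimal sketch in Python. The lexicon of aggression-associated
word stems is a hypothetical placeholder, since the actual word list is still to be defined by the team.

    # Sketch of the word-count analysis for re-narrations. The lexicon is a
    # hypothetical placeholder; the real list of aggression-associated word
    # stems would come from the project team.
    import re

    AGGRESSION_STEMS = {"hit", "push", "shout", "kick", "threaten"}  # placeholder

    def aggression_score(re_narration: str) -> float:
        """Return the share of aggression-associated words in a re-narration."""
        tokens = re.findall(r"[a-z]+", re_narration.lower())
        if not tokens:
            return 0.0
        hits = sum(1 for t in tokens
                   if any(t.startswith(stem) for stem in AGGRESSION_STEMS))
        return hits / len(tokens)

    print(aggression_score("He pushed him and shouted at him"))  # 2/7, about 0.29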
Critique
An advantage of this instrument is that it can be applied during class and thus be useful to the teachers
too. Social desirability effects will be minimal as the purpose of the re-narration will remain
ambiguous to the children participating.
c) Questionnaire on Bullying (Behavioural Effects)
Procedure
The diagnostic procedure of the bullying questionnaire is described in the chapter on the psychological
evaluation (4.4).
Dependent Variables
The questionnaire on bullying includes a section on personal data. It comprises information about the
child's family background (especially siblings), friends and hobbies. These data will be useful for
investigating whether the Demonstrator is effective for specific subgroups of students.
The main body of the questionnaire focuses on bullying behaviour, from which data concerning
physical, verbal and relational bullying can be derived. Pupils are asked if they bully others and/or if
they are victims of bullying (physical, verbal or relational). These data are important in two respects.
Firstly, it can be investigated whether the application of the Demonstrator has any effect on the extent
of bullying in classes. Secondly, the data enable the students to be assigned different bullying roles.
This makes it possible to investigate whether the effects of the interaction with the Demonstrator
depend on the user's bullying type. The sample collected within the pedagogical evaluation is too small
to yield statistically significant results for this question (in particular, the number of bullies will
probably be too small); it is therefore dealt with within the psychological evaluation (see 4.4).
However, because the pedagogical evaluation takes place six months earlier, its results can be of use in
planning the psychological evaluation.
Critique
Pupils are the experts in bullying behaviour, with the richest knowledge base, as they have first-hand
experience of bullying others, being bullied or observing bullying.
However, it may be difficult for the children to disclose the truth to the researcher. All data will be
treated confidentially and the children will be assured that no one, not even their teacher, will see the
data collected. Children may still withhold information, either because they do not trust the
researchers or because they try to answer in a socially desirable way. However, previous studies into
the nature of bullying behaviour using individual private interviews with primary school children
(aged 6-9) have highlighted that most children trust researchers and are happy to cooperate in an
honest manner (Wolke et al., 2001).
d) Questionnaire on Empathy (Cognitive and Emotional Effects)
What is Empathy?
The term “empathy” stems from Titchener (1909), who derived it from the Greek “empatheia” which
means “passion”, “passionate affection” or “to be much affected” (Levy, 1997). Titchener used
“empathy” as a translation of the German term “Einfühlung” which means “feeling into” somebody.
“Einfühlung” is described by Lipps (1903), who worked on aesthetics and originally used the term for
the description of a process between a person and an art object. “Lipps […] believed that empathy was
a form of inner imitation. An observer is stimulated by the sight of an object and responds by imitating
the object, loses consciousness of himself, and experiences the object as if his own identity had
disappeared and he had become the object himself" (Katz, 1963). Of course, it is quite difficult to
imagine what this inner imitation of an object looks like, but the important point is that the notion of
"Einfühlung" was transferred to processes between two persons. Thus, a working definition of
empathy could be "a person feeling into another person". In the psychological literature the first
person is referred to as the "observer", the second as the "target".
When speaking of empathy today, two distinct perspectives have to be distinguished. They describe
the two possible results of an empathic process between an observer and a target: a cognitive
perspective and an affective perspective (Holz-Ebeling & Steinmetz, 1995).
Researchers who use the term "cognitive empathy" refer to the cognitive perspective, which means
that the observer tries to understand how the target should feel in a given situation. The cues available
to the observer are the behaviour of the target (including bodily and especially facial expressions of
emotion) and the situation with which the target is dealing. The result of this process of understanding
is a cognition, e.g. "I think the target is feeling sad, because he lost his wallet."
"Affective empathy" refers to processes with an affective outcome. When such a process takes place,
the observer feels something due to the perception of a target. That much is clear, but researchers
disagree about the required quality of the relationship between the observer's emotion and the target's
emotion. Some researchers (e.g. Stotland, 1969) merely postulate that the observer's emotion has to be
caused by the perception of the target in order to be labelled empathic. Others (Kobayashi, 1995;
Eisenberg & Strayer, 1987) state that the observer's emotion should at least be adequate to the target's
inner state. A third party (e.g. Stroebe et al., 1996) demands a parallel affective outcome in observer
and target. We follow Stotland's position because, on the one hand, we think it is not possible to decide
which emotion of the observer can be labelled "adequate". On the other hand, since the observer
cannot distinguish between an emotional reaction to the process of feeling into the target and the
emotion resulting from that process itself, it does not make sense to develop questionnaire scales that
attempt to do so.
Why Measure Empathy?
VICTEC stands for “Virtual Information and Communication Technology with Empathic Characters”.
Thus, it is clear that empathy is a key concept within the project. Two reasons can be highlighted to
illustrate the importance of measuring the empathy of the user and not solely concentrating on the
empathy of the characters.
Firstly, the concept of the Demonstrator is to help the child user to gain insight into the social and
psychological processes that lead to bullying. This means that after interacting with the Demonstrator
the users should have a fuller understanding concerning why other children behave the way they do in
bullying situations. If this is the case, the interaction with the Demonstrator should improve the
empathic abilities of the users. The empathy questionnaire will help to determine whether this is the
case. It is crucial that the questionnaire is sensitive to changes in empathic abilities and is not just a
measure of trait empathy. A literature review shows that there has been one successful attempt at
measuring empathy as a state among adults (Nezlek et al., 2001). Furthermore, research shows that
even trait measures are sensitive to changes (Badke-Schaub & Schaub, 1986).
It is reasonable to assume that empathy acts as a confounding variable in the pedagogical evaluation.
It is possible that empathic children show greater learning effects than less empathic children, because
they understand more fully the social bullying situations presented on the screen and have a greater
ability to transfer the content presented there into real life. On the other hand, there are indications
that high empathy is associated with less aggressive behaviour (Miller & Eisenberg, 1988). Therefore,
an effect of the interaction with the Demonstrator on the bullying behaviour of users with high levels
of empathy may not be found (ceiling effect).
Additionally, measuring empathy within the pedagogical evaluation addresses a question of general
psychological interest. Together with the data collected from the bullying questionnaire, it can be
investigated whether there are differences in the empathic abilities of different bullying types. For
example, it may be possible to substantiate previous research questions (Sutton & Smith, 1999) as to
whether pure bullies are less empathic than bully/victims.
Procedure
Structure of the Empathy Questionnaire
The questionnaire consists of five scales referring to the empathic outcomes (cognitive, affective)
mentioned above and the types of mediation.
Three types of mediation of targets’ emotions can be distinguished: 1) via situational cues (situation
mediated), 2) via bodily expression cues (expression mediated) and 3) via ideomotoric cues
(ideomotoric).
Situation mediated means that the observer perceives the situation in which the target is acting and
thereupon reacts empathically.
Expression mediated means that the observer reacts empathically to the perceived bodily expressions
of the target's emotion. The target's emotions can show in facial expression, gesture, posture,
paraverbal parameters (e.g. pitch of voice, speech rate) and physiological parameters (e.g. blushing).
Ideomotoric mediation refers to the results of the research group of Prinz (2002). The basis of their
research is the so-called "ideomotoric principle" of William James (1890), which states that "each
imagination of a body movement implies the tendency to perform this movement".
Prinz and his colleagues extended this principle. According to them, not only the imagination of a
movement but also the perception of another person's movement is able to trigger this tendency. This
presupposes that the observer has the same, or at least similar, motoric schemes at his disposal as the
target; a minimum similarity between the motoric schemes of observer and target can thus be
postulated for an empathic reaction. Prinz and colleagues showed that subjects can predict the
trajectories of their own handwriting better than those of others, presumably because they are familiar
with their own motoric schemes. With regard to empathy, this could mean that empathic persons have
wider and/or more flexible motoric schemes. The concept need not be limited to motoric schemes; it
can comprise actions as well, as demonstrated by Bach et al. (2001) with the game "paper, scissors,
stone", in which subjects had to predict the decisions of other subjects on the basis of their
movements.
The different scales of the questionnaire result from the combination of the different outcomes
(cognitive, affective) and types of mediation (situation, expression). Since the mechanism of
ideomotoric empathy is not entirely clear to us yet, this concept is kept as a single scale for the time
being. Five aspects (scales) of empathy can thus be distinguished for the final questionnaire:
- cognitive, situation mediated empathy (11 items)
- cognitive, expression mediated empathy (11 items)
- affective, situation mediated empathy (11 items)
- affective, expression mediated empathy (7 items)
- ideomotoric empathy (11 items)
The initial version of the questionnaire consists of 51 items. The scale “affective, expression mediated
empathy” consists of only seven items, because of difficulties in creating appropriate ones. The other
scales consist of eleven items each.
It was necessary to create a new empathy questionnaire due to a lack of appropriate existing measures.
The only existing empathy questionnaire for children stems from Bryant (1982; "index of empathy for
children and adolescents") and does not distinguish between the five scales, which is necessary for the
VICTEC project. Furthermore, the validation process of Bryant's questionnaire is disputable, posing a
risk if it were used for the current pedagogical evaluation. However, the scale was still of use: its items
have been incorporated into the new questionnaire.
A few items were taken from the empathy questionnaire devised by Leibetseder and colleagues (2001)
and adapted for children. The remaining items were developed by the VICTEC team.
As the questionnaire is a new instrument, the quality in terms of validity and reliability needs to be
considered.
Item Analysis, Reliability and Validity
The questionnaire must be available in three language versions: Portuguese, English and German.
Therefore, the items have been translated and quality analyses will be carried out in each country.
The investigation aims to include two hundred subjects in each country: one hundred aged eight and
one hundred aged twelve. Pupils will be selected from different schools, allowing the questionnaire to
be evaluated across all educational backgrounds.
The investigation will provide data concerning complexity issues and item selectivity. Because the
questionnaire is designed as a state measure, it is not possible to calculate retest reliability. Thus,
split-half reliability will be calculated.
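As an illustration of this step, the following sketch computes an odd-even split-half reliability with the
Spearman-Brown correction; the response matrix is simulated placeholder data, not project data.

    # Sketch: split-half reliability with Spearman-Brown correction.
    # `items` is a hypothetical (subjects x items) array of scored responses
    # for one scale of the empathy questionnaire.
    import numpy as np

    def split_half_reliability(items: np.ndarray) -> float:
        """Odd-even split-half reliability, Spearman-Brown corrected."""
        odd = items[:, 0::2].sum(axis=1)   # score from odd-numbered items
        even = items[:, 1::2].sum(axis=1)  # score from even-numbered items
        r = np.corrcoef(odd, even)[0, 1]   # correlation between half scores
        return 2 * r / (1 + r)             # Spearman-Brown step-up formula

    rng = np.random.default_rng(0)
    demo = rng.integers(1, 6, size=(200, 11))  # 200 pupils, 11 items, answers 1-5
    print(split_half_reliability(demo))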
The assessment of validity faces some crucial difficulties. The usual validation procedure involves
correlating the questionnaire with another, previously validated one that measures the same construct.
However, this is not possible, since there are no empathy questionnaires for children apart from the
Bryant questionnaire. The Bryant questionnaire cannot be used because it has only one dimension
(empathy in general) and its items have been integrated into the very questionnaire to be validated.
Furthermore, the method of validation used by Bryant is itself questionable.
There are three possibilities to validate the questionnaire:
1) Perform a factor analysis (see the sketch after this list). The results of this factor analysis should
reflect the three aspects of empathy (cognitive, affective and ideomotoric) and the two different types
of mediation. If this is the case, it would provide strong support for the validity of the questionnaire.
2) The scales on cognitive and affective empathy will be validated against self-assessments of the
cognitive and affective abilities of the children.
3) Experts (students of psychology and researchers) will assess whether the items are appropriate for
measuring the specific scales they were developed for.
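As a minimal sketch of possibility 1), an exploratory factor analysis could be run as follows (here with
scikit-learn and simulated placeholder responses); whether five factors matching the intended scales
emerge from the real data is exactly what the validation would check.

    # Sketch of the factor-analysis validation (possibility 1). The response
    # matrix is simulated placeholder data; a five-factor solution in which
    # each item loads highest on its intended scale would support validity.
    import numpy as np
    from sklearn.decomposition import FactorAnalysis

    rng = np.random.default_rng(0)
    responses = rng.integers(1, 6, size=(600, 51)).astype(float)  # placeholder

    fa = FactorAnalysis(n_components=5, rotation="varimax", random_state=0)
    fa.fit(responses)

    # Loadings have shape (5 factors x 51 items); the dominant factor per
    # item should match the item-to-scale assignment of the questionnaire.
    dominant = np.abs(fa.components_).argmax(axis=0)
    print(dominant)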
Dependent Variables
From the questionnaire, scores for all five scales and an overall score on empathy will be generated.
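A small sketch of this scoring step, assuming a hypothetical item-to-scale assignment (the real
assignment follows from the questionnaire design, with 11/11/11/7/11 items per scale):

    # Sketch: deriving the five scale scores and an overall empathy score.
    # The column grouping below is a hypothetical item-to-scale assignment.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    df = pd.DataFrame(rng.integers(1, 6, size=(200, 51)),
                      columns=[f"item{i+1}" for i in range(51)])

    scales = {
        "cognitive_situation":  df.columns[0:11],
        "cognitive_expression": df.columns[11:22],
        "affective_situation":  df.columns[22:33],
        "affective_expression": df.columns[33:40],  # only 7 items on this scale
        "ideomotoric":          df.columns[40:51],
    }

    # Mean rather than sum, so scales with different item counts stay comparable.
    scores = pd.DataFrame({name: df[cols].mean(axis=1)
                           for name, cols in scales.items()})
    scores["overall"] = scores.mean(axis=1)
    print(scores.head())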
Why Will Psycho-physiological Measures Not Be Employed?
Psycho-physiological measures were considered as a possibility to assess the reactions and the
involvement of the child users when interacting with the Demonstrator. Psycho-physiological
measures could also have been a criterion for the validation of the affective empathy scales of the
questionnaire. However, after deliberation, it was decided not to use psycho-physiological measures.
The most important reason for refraining from these measures is the expected opposition of the main
groups involved in the pedagogical evaluation: the teachers, the children and the parents. Parents in
particular are very concerned about research projects that take place within schools and would not
allow any procedures that might in any respect be deleterious to their children.
Furthermore, such measures may jeopardise the quality of the data itself. Although measures such as
heart rate or skin conductance are easy to apply (and only those could be taken into account here),
they provide little information about the quality of the emotion felt by the subject. For example, heart
rate only indicates whether the subject is aroused, not which emotion is being felt.
Another issue is that psycho-physiological measures are only really appropriate for one-point
measurements and not for the measurement of processes. If physiological measures are recorded over
a long period (like the interaction with the Demonstrator), the data are likely to be biased by disruptive
factors such as motoric action, leading to interpretation difficulties. Furthermore, physiological data
are extremely noisy, so the researcher would have to collect data several times in a row to obtain one
accurate measure.
Finally, the application of such measures is time consuming (installing the devices, obtaining base
rates, etc.), which would pose a limitation for the current evaluation.
4.3.5 Measurement of Educator Focused Learning Criteria
Teachers will be asked to complete a questionnaire to determine whether they found the Demonstrator
useful and considered the interaction sequences provided to the children optimal from their
pedagogical point of view. Furthermore, the possibility of integrating the Demonstrator into the school
curriculum needs to be evaluated from the teachers' perspective.
Communication with the child users will be another crucial aspect of the evaluation. The effects of the
interaction with the Demonstrator will be measured with the diagnostic instruments mentioned above.
Informal communication and focus groups with the child users will help us to evaluate whether the
children liked playing with the Demonstrator and how useful they found it.
4.3.6 Timetable for the Pedagogical Evaluation
June 2003: Create, pilot and validate instruments (empathy questionnaire, pupil test and bullying
questionnaire)
September 2003: Design for the statistical analysis of the data
January 2004: Pretest
February 2004: Application of Demonstrator in schools
March 2004: Posttest
Table 3: Timetable for pedagogical evaluation
4.4 Psychological Evaluation
4.4.1 Aims of the Psychological Evaluation
Psychological research concerning bullying and aggression in schools has increased over the past
decade. There is now substantial evidence in the literature that bullying and victimisation are
problems worldwide, with prevalence rates ranging from 8% to 46% for victimisation and from as
little as 3% to 23% for bullying (Wolke et al., 2001). The consequences of bullying behaviour have
also been widely researched and reveal both short-term and long-term decrements for children,
including mental health and behaviour problems (Wolke et al., 2000), physical health problems
(Wolke et al., 2001a), truancy (Sharp, 1995), criminal behaviour (Farrington, 1993) and even suicide
(Olweus, 1993).
Research studies are now considering the individual differences of children involved in bullying
behaviour in terms of those children categorised as 'pure' bullies, 'pure' victims, bully/victims and
neutral children, who can be either bystanders in bullying incidents or defenders of the victim.
However, little is still known about the individual differences in cognitive styles of children involved
in bullying behaviour, or about their coping and planning skills in combating victimisation. For
example, there is still uncertainty regarding whether 'pure' bullies are socially intelligent individuals
who are extremely competent in gauging the intentions of others, and whether 'pure' victims have
deficits in encoding and interpreting the behaviour of others. These skills are captured by the term
'Theory of Mind', which denotes the ability to understand social scenarios and the thinking of others.
Theory of Mind skills allow children and adults firstly to form representations of mental states such as
pretending and knowing, and secondly to understand the relationships between such states and
actions. Theory of Mind (ToM) tasks can be used to assess whether children can attribute mental
states to themselves and others in order to explain and predict behaviour. The ability to recognise that
others can have false as well as true beliefs is a central tenet of the development of Theory of Mind
skills (e.g. the Sally-Anne task; Premack & Woodruff, 1978; Baron-Cohen et al., 1985). The
'Sally-Anne test' is acted out by two puppets. Sally is seen putting a marble in a specific place. Later,
while Sally is away, Anne puts the marble somewhere else. Sally returns to the room and the child is
asked 'where will Sally look for the marble?' One consideration concerns the distinction between
Theory of Mind and empathy. Theory of Mind is a broad cognitive concept in which the understanding
of emotions and empathy play a role. For example, Theory of Mind can be used to explain and predict
a great deal about human talk and action. The theory encapsulates the mental states of thinking,
knowing, guessing, remembering, hoping and fearing, as well as perceptions, intentions and emotions.
These are all important elements for the development, understanding and display of empathy.
Research studies are also interested in the types of coping mechanisms that children use to deal with
bullying incidents, such as telling somebody or ignoring the bully (Talamelli & Cowie, 2001). What is
not known is why children select specific types of coping mechanisms, how they justify these choices,
and whether there are differences in the types of coping mechanisms chosen by bullies, bully/victims,
victims and neutral children.
The major aim of the psychological evaluation is to determine whether user characteristics (roles in
bullying behaviour) are reflected in user choices regarding the bullying scenarios in terms of mental
representations (e.g. theory of mind), action choices, empathy and differences in coping strategies.
Two major questions will be investigated:
1) Are there any differences in Theory of Mind responses for children classified as ‘pure’ bullies,
‘pure’ victims, bully/victims or neutral children for both direct and relational bullying
behaviour?
2) a) Are there distinctions between the types of coping mechanisms selected by 'pure'
bullies, ‘pure’ victims, bully/victims and neutral children when interacting with the
Virtual Learning Environment (VLE) bullying scenarios?
b) Are there differences in terms of the justifications that children state for selecting
coping mechanisms according to bullying status?
c) Are there any differences in how the children empathise with the characters in the
scenarios (empathy)?
4.4.2 Proposed Methodology for the Psychological Evaluation
Location: The psychological evaluation is to be held at the University of Hertfordshire, where 65
computers with the required specifications to run the Demonstrator can be used for a consecutive
period of two weeks. The teams in Portugal and Germany do not have access to such a large number
of computers at one time. Children will interact with the VLE on an individual basis.
Date of the evaluation: It is proposed that the psychological evaluation will take place during June
2004. There are two important reasons for this:
- School summer vacation runs from mid-July until the beginning of September, and the use of the
University computers has to be coordinated for the period when the university students are on
summer vacation.
- The results from the pedagogical evaluation will provide useful insights for the psychological
evaluation, and these will be available by June 2004.
Sample Size: It is proposed that the psychological evaluation will involve approximately 400 children
aged 8-12 years from schools in Hertfordshire and the surrounding area in the United Kingdom. A
sample of at least this size is required in order to obtain a large enough subset of children who are
characterised as 'pure' bullies for either direct or relational bullying behaviour.
Proposed Assessment Instruments:
1) Bullying Questionnaire: A bullying questionnaire will be given to every child participating in
the psychological evaluation to complete before they commence interaction with the bullying
scenarios in the VLE. This questionnaire assesses the following areas:
- Friendship, including liking and disliking
- Physical (direct) bullying
- Relational bullying
- Sibling bullying
- Verbal bullying
- Hobbies (incl. computer games)
- Child's perception of strength
Upon completion, each child's individual bullying status can be computed to determine whether
they are categorised as a 'pure' bully, 'pure' victim, bully/victim or neutral child for both direct
and relational bullying (a classification sketch follows this list).
2) Theory of Mind Questions (ToM): ToM questions are to be devised for each child
participating in the evaluation. Examples of such questions are 'How do you think
Billy was feeling when he was hit by John?’ ‘How would you feel if you were Billy?’ ‘What do
you think Billy was feeling about John after he was hit?’ The ToM questions are to be included
at the end of the child’s interaction with the VLE. If the questions were integrated during the
VLE interaction, it was felt that this would greatly reduce the believability of the experience for
the child. A series of storyboards will be provided for the child at the end of the VLE
interaction depicting the major events that happened during each scenario episode. The relevant
ToM questions can then be integrated with these storyboards which should provide a reminder
of the events. Work is still being carried out into the types of ToM questions to be asked.
3) Justification Questions: During the child’s interaction with the bullying scenarios, the child
will have the choice to select a series of different coping strategies to deal with the incident that
they have witnessed. The coping mechanisms have been categorised into five different groups
(e.g. passive response style, future plan by yourself). Once the child has selected a coping
mechanism, they will be asked a series of questions through the use of the other characters in
the VLE so as not to reduce the believability of the interaction. Example questions are, ‘Why
did you select the coping response, go to tell the teacher?’ ‘What do you think the teacher will
do to deal with this bullying situation?’ The justification questions to be used still need to be
finalised.
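As noted under instrument 1, here is a minimal classification sketch. The boolean inputs stand in for
the questionnaire's actual scoring cut-offs, which are defined by its scoring rules and not reproduced
here.

    # Sketch: assigning bullying status (instrument 1). The boolean inputs
    # stand in for the bullying questionnaire's actual scoring cut-offs.
    def bullying_status(bullies_others: bool, is_victimised: bool) -> str:
        """Classify a child as 'pure' bully, 'pure' victim, bully/victim or neutral."""
        if bullies_others and is_victimised:
            return "bully/victim"
        if bullies_others:
            return "pure bully"
        if is_victimised:
            return "pure victim"
        return "neutral"

    # Status is computed separately for direct and for relational bullying.
    print(bullying_status(bullies_others=True, is_victimised=False))  # pure bully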
4.4.3 Timetable for the Psychological Evaluation
January 2003: Translation of the bullying questionnaire into German and Portuguese. Ethics approval
forms submitted to the UH board for the psychological evaluation.
February 2003 – March 2003: Pilot the bullying questionnaire in schools in the UK; once this has been
done, pilot studies will take place in Portugal and Germany.
March 2003 – April 2003: Carry out necessary modifications.
April 2003 – May 2003: School visits to evaluate the content of the bullying scenarios for the
psychological evaluation.
May 2003 – June 2003: Carry out necessary modifications.
June 2003 onwards: Generation of Theory of Mind questions and justification questions to be used in
the evaluation; these questions to be piloted to ensure that the children understand them, followed by
necessary modifications. Ideas to be proposed for the activities that the children will take part in
during the day at UH in addition to taking part in the evaluation; travel arrangements and possible
sponsors to be considered.
January 2004 onwards: Schools in the Hertfordshire region contacted to take part in the psychological
evaluation; school visits carried out where necessary to encourage schools to take part. Pedagogical
evaluation takes place; information from this used to aid the psychological evaluation.
April 2004 – June 2004: Technical equipment checked for the running of the evaluation; volunteers
recruited to help run the evaluation.
June 2004: Psychological evaluation takes place at UH.
June 2004 onwards: Data collated and analysed; publications prepared; project feedback provided to
schools.
4.4.4 Work to date on the Psychological Evaluation
1. Initial version of the Bullying Questionnaire completed.
2. Initial ToM questions formulated.
3. Initial Justification questions formulated.
4. Submission for ethics permission regarding research with children.
5. Next step: pilot the bullying questionnaire in all three countries (U.K., Germany, Portugal).
6. Work with the schools to ensure that they understand the chosen ToM questions and the
justification questions to be integrated into the VLE.
4.4.5 Educational impact of the Psychological Evaluation
The psychological evaluation will have a high educational impact by shedding light on the
characteristics of ToM capabilities and deficits across bullying roles, on the types of coping
mechanisms that children believe do and do not work and the reasons why, and on whether there are
particular styles that victims use repeatedly that could result in a cycle of victimisation.
4.4.6 Dissemination
1. Presentations at psychology-oriented conferences.
2. Publications in peer-reviewed psychological journals.
3. Feedback in teacher workshops.
4. Likely to attract attention from print, radio and TV media.
5. Conclusion
The present document has provided an overview of the design of the evaluation of the Toolkit and the
Demonstrator, the two software programs to be developed by the VICTEC project.
The main focus of this document is the completeness of the evaluation; given the great variety of
evaluation aspects covered, this goal has been reached. The proposed methods are easy to apply and
sufficient in light of the constraints the evaluators have to deal with (e.g. the time limitation in the
pedagogical evaluation). Furthermore, attention is paid to collecting data from different sources to
improve their validity. This is evident, for example, in the participation of both usability experts and
users in the usability evaluation, and in the inclusion of teachers and children in the pedagogical
evaluation. Thus, the validity of the evaluation results will be high.
The participation of teachers and users serves another goal of the VICTEC project: the dissemination
of the developed software. Feedback from these groups indicates whether the Demonstrator in its
present version will be accepted by those who are responsible for its use. Additionally, the methods
provide data that can be used to answer research questions that go beyond the primary evaluation
purpose, for example the relationship between bullying type and empathy (see the chapter on
pedagogical evaluation).
The following deliverable D7.1.1 (operational evaluation methodology) will focus on the concrete
instruments to be used for the evaluation. The self-developed instruments in particular will be
introduced there in detail (e.g. questionnaire items, time needed for application). Furthermore, D7.1.1
will address the statistical and qualitative analysis of the data collected within the evaluation process.
The integration of data from this variety of evaluation methods is a complex and difficult problem
that has to be solved now that the methods are clear.
6. References
Bach, P., Knoblich, G., Friederici, A. D., & Prinz, W. (2001). Comprehension of action sequences: The case of Paper, Scissors, Rock. In J. Moore & K. Stenning (Eds.), Proceedings of the Twenty-Third Annual Conference of the Cognitive Science Society (pp. 39-44). Mahwah, NJ: Lawrence Erlbaum Associates.
Badke-Schaub, P. & Schaub, H. (1986). Persönlichkeit und Problemlösen. Bamberg: Diplomarbeit, Lst Psychologie II, Universität Bamberg.
Baron-Cohen, S., Leslie, A. M., & Frith, U. (1985). Does the autistic child have a 'theory of mind'? Cognition, 21, 37-46.
Bortz, J. & Döring, N. (2002). Forschungsmethoden und Evaluation für Human- und Sozialwissenschaftler. Heidelberg: Springer.
Brna, P., Martins, A., & Cooper, B. (1999). My First Story: Support for learning to write stories. In G. Cumming, T. Okamoto & L. Gomez (Eds.), Advanced Research in Computers and Communications in Education. Amsterdam: IOS, 335-341.
Brooke, J. (1996). SUS: A 'quick and dirty' usability scale. In P. W. Jordan, B. Thomas, B. A. Weerdmeester, & I. L. McClelland (Eds.), Usability Evaluation in Industry. London: Taylor & Francis.
Brouwer-Janse, M. D., Suri, J. F., Yawitz, M. A., de Vries, G., Fozard, J. L., & Coleman, R. (1997). User interfaces for young and old. Interactions, 4(2), 34-46.
Bryant, B. K. (1982). An index of empathy for children and adolescents. Child Development, 53(1), 413-425.
Cockton, G., & Woolrych, A. (2001). Understanding inspection methods: Lessons from an assessment of heuristic evaluation. In A. Blandford, J. Vanderdonckt, & P. Gray (Eds.), People and Computers XV. Springer-Verlag, 171-192.
Cockton, G., Lavery, D., & Woolrych, A. (2003). Inspection-based evaluations. In J. A. Jacko & A. Sears (Eds.), The Human-Computer Interaction Handbook. Lawrence Erlbaum Associates, 1118-1138.
Cooper, B., & Brna, P. (2000). Influencing the intangible: Towards a positive ambience for learning through sensitive systems and software design in the classroom of the future. Paper presented at the British Education Research Association, Cardiff.
de Vries, G. (1997). Involvement of school-aged children in the design process. Interactions, 4(2), 41-42.
Eisenberg, N. & Strayer, J. (Eds.) (1987). Empathy and its Development. Cambridge: Cambridge University Press.
Ericsson, K. A., & Simon, H. A. (1985). Protocol Analysis: Verbal Reports as Data. Cambridge, MA: MIT Press.
Farrington, D. P. (1993). Understanding and preventing bullying. In M. Tonry (Ed.), Crime and Justice (Vol. 17, pp. 381-458). Chicago: University of Chicago Press.
Gachowetz, H. (1993). Feldforschung. In E. Roth (Ed.), Sozialwissenschaftliche Methoden (3rd edition). München: Oldenbourg, 245-266.
Gorman, G. E. & Clayton, P. (1997). Qualitative Research for the Information Professional: A Practical Handbook. London: Library Association Publishing.
Gould, J. D., Boies, S. J., & Lewis, C. (1991). Making usable, useful, productivity-enhancing computer applications. Communications of the ACM, 34(1), 75-85.
Hanna, L., Risden, K., & Alexander, K. (1997). Guidelines for usability testing with children. Interactions, 4(2), 9-14.
Holz-Ebeling, F. & Steinmetz, M. (1995). Wie brauchbar sind die vorliegenden Fragebogen zur Messung von Empathie? Kritische Analysen unter Berücksichtigung der Iteminhalte. Zeitschrift für Differentielle und Diagnostische Psychologie, 16, 11-32.
Immersive Education (2001). Kar2ouche. Oxford.
International Standards Organisation (1998). Ergonomic requirements for office work with visual display terminals (VDTs) – Part 11: Guidance on usability.
Katz, R. L. (1963). Empathy: Its Nature and Uses. New York: Free Press.
Kobayashi, M. (1995). Selbstkonzept und Empathie im Kulturvergleich. Konstanz: Universitätsverlag Konstanz.
Lavery, D., Cockton, G., & Atkinson, M. (1997). Comparison of evaluation methods using structured usability problem reports. Behaviour and Information Technology, 16(4/5), 246-266.
Leibetseder, M., Laireiter, A.-R., Riepler, A. & Köller, T. (2001). E-Skala: Fragebogen zur Erfassung von Empathie – Beschreibung und psychometrische Eigenschaften. Zeitschrift für Differentielle und Diagnostische Psychologie, 22(1), 70-85.
Lieberman, D. A. (1999). The researcher's role in the design of children's media and technology. In A. Druin (Ed.), The Design of Children's Technology. San Francisco: Morgan Kaufmann, 73-97.
Lin, H. X., Choong, Y.-Y., & Salvendy, G. (1997). A proposed index of usability: A method for comparing the relative usability of different software systems. Behaviour and Information Technology, 16(4/5), 267-278.
Lipps, T. (1903). Einfühlung, innere Nachahmung und Organempfindungen. Archiv für die gesamte Psychologie, 2, 185-204.
Maulsby, D., Greenberg, S., & Mander, R. (1993). Prototyping an intelligent agent through Wizard of Oz. Paper presented at INTERCHI '93, 277-284.
NECTAR (2001). User Centred Requirements Handbook. Available at http://www.ejeisa.com/nectar/inuse/6.2/contents.htm.
Nezlek, J., Feist, G. J., Wilson, F. C. & Plesko, R. M. (2001). Day-to-day variability in empathy as a function of daily events and mood. Journal of Research in Personality, 35, 401-423.
Nielsen, J. (1992). Finding usability problems through heuristic evaluation. Paper presented at CHI '92, Monterey, CA, 373-380.
Nielsen, J. (1993). Usability Engineering. London: Academic Press.
Nielsen, J. (1994a). Enhancing the explanatory power of usability heuristics. Paper presented at the CHI '94 Conference on Human Factors in Computing Systems, ACM Press, 152-158.
Nielsen, J. (1994b). Guerrilla HCI: Using discount usability engineering to penetrate the intimidation barrier. In R. G. Bias & D. J. Mayhew (Eds.), Cost-Justifying Usability (pp. 245-272). London: Academic Press.
Nielsen, J. (2000). Why you only need to test with 5 users. Jakob Nielsen's Alertbox. Available: http://www.useit.com/alertbox/20000319.html.
Nielsen, J. (2000a). Designing Web Usability: The Practice of Simplicity. New York: New Riders Publishing.
Nielsen, J. (2002). Heuristic evaluation. Available: http://www.useit.com/papers/heuristic/.
Nielsen, J., & Landauer, T. K. (1993). A mathematical model of the finding of usability problems. Paper presented at ACM INTERCHI '93, Amsterdam, ACM Press, 206-213.
Olweus, D. (1993). Bullying at School: What We Know and What We Can Do. Oxford: Blackwell Publishers.
Perlman, G. (1999). Web-based user interface evaluation with questionnaires. Available: http://www.acm.org/~perlman/question.html.
Perrez, M. & Patry, J. L. (1982). Nomologisches Wissen, technologisches Wissen, Tatsachenwissen – drei Ziele sozialwissenschaftlicher Forschung. In J. L. Patry (Ed.), Feldforschung. Methoden und Probleme sozialwissenschaftlicher Forschung unter natürlichen Bedingungen. Bern: Huber, 45-66.
Premack, D., & Woodruff, G. (1978). Does the chimpanzee have a theory of mind? Behavioral and Brain Sciences, 1, 515-526.
Prinz, W. (2002). Ideomotorische Handlungstheorie. Presentation at the 43rd Conference of the German Society of Psychology, 22-26 September 2002, Berlin, Germany.
Rubin, J. (1994). Handbook of Usability Testing: How to Plan, Design and Conduct Effective Tests. New York: John Wiley and Sons.
Rudd, J., Stern, K., & Isensee, S. (1996). Low vs. high-fidelity prototyping debate. Interactions, 3(1), 76-85.
Schaefer, M. (1996). Different perspectives of bullying. Poster presented at the XIV Meetings of ISSBD, August, Quebec, Canada.
Sears, A. (1997). Heuristic walkthroughs: Finding the problems without the noise. International Journal of Human-Computer Interaction, 9(3), 213-234.
Sharp, S. (1995). How much does bullying hurt? The effects of bullying on the personal wellbeing and educational progress of secondary aged students. Educational and Child Psychology, 12(2), 81-88.
Snyder, C. (1996). Using paper prototypes to manage risk. Software Design and Publishing Magazine.
Stotland, E. (1969). Exploratory investigations of empathy. In L. Berkowitz (Ed.), Advances in Experimental Social Psychology. New York: Academic Press, 271-314.
Stroebe, W., Hewstone, M. & Stephenson, G. M. (Eds.) (1996). Sozialpsychologie – Eine Einführung. Berlin: Springer.
Sutton, J. & Smith, P. K. (1999). Bullying as a group process: An adaptation of the participant role approach. Aggressive Behavior, 25, 97-111.
Talamelli, L., & Cowie, H. (2001). How Pupils Cope with Bullying: Successful and Unsuccessful Strategies. London: University of Surrey Roehampton in association with HSBC.
Titchener, E. (1909). Experimental Psychology of the Thought Processes. New York: Macmillan.
Wolke, D., Woods, S., Bloomfield, L., & Karstadt, L. (2000). The association between direct and relational bullying and behaviour problems among primary school children. Journal of Child Psychology and Psychiatry, 41(8), 989-1002.
Wolke, D., Woods, S., Bloomfield, L., & Karstadt, L. (2001). Bullying involvement in primary school and common health problems. Archives of Disease in Childhood, 85, 197-201.
Wolke, D., Woods, S., Schulz, H., & Stanford, K. (2001). Bullying and victimisation of primary school children in South England and South Germany: Prevalence and school factors. British Journal of Psychology, 92, 673-696.