Chapter 9
Evaluation in CALL
Tools, interactions, outcomes
Catherine Caws and Trude Heift
As an ever-growing area of research, computer-assisted language learning (CALL) abounds with
tools, systems or learning environments that are constantly changing to reflect the latest technology trends. Faced with the reality that quality control in CALL is not a simple task, researchers
have been looking for new and more specific measures to assess the pedagogical and educational
values brought about by these new technologies. In this chapter, we reflect on the changing
nature of evaluation in CALL by focusing on three key issues that emerged from our research:
the evaluation of tools, interactions and outcomes. In doing so, our reflections are based on data
collections from two case studies: FrancoToile, an online oral corpus of French speakers, and
E-Tutor, an online written, form-focused ICALL environment. We address the various stages of
evaluation and focus on its iterative nature by emphasising the relationship between the tools,
the interactions and the outcomes in relation to the task. As such, and while using these two
specific yet distinct case studies, our analysis applies to any context where a digital platform
becomes the focus of a language learning environment.
Evaluation in the context of language learning and technology
Evaluation is a multifaceted concept. Traditionally, evaluation in education included a strong
human dimension within a fairly stable learning system, anchored in a set institution. Assessment of learning generally followed a linear pattern of activities regulated by a behaviourist
approach. However, with the advent of different types of technologies as well as the ubiquitous
nature of computers, the core concepts of evaluation have been radically transformed. Learning
takes place anywhere, anytime, in a nonlinear fashion, and in conditions that seemed impossible
as recently as a few decades ago. Due to this ever-changing nature of learning, evaluation needs
to address these changes, and the ways in which learning systems and contexts, learners and/or
concepts are evaluated need to be diversified. Accordingly, evaluation in computer-assisted language learning (CALL) ideally includes several factors: (1) the learner, (2) the tool that is being
utilised, (3) their interactions in relation to a task and (4) the learning outcomes.
This chapter mainly focuses on three of these aspects of CALL evaluation, namely the tool,
the interactions and the outcomes, in addition to discussing some of the issues relating to the
learner. Generally, our reflections will be in part based on qualitative and quantitative data that we
collected over several years and that emerged from a series of case studies using two specific CALL
tools: FrancoToile and E-Tutor. While these two systems constitute only two examples amongst
the multitude of CALL tools, they illustrate and emphasise the fact that CALL evaluation needs to
adapt to the diversification of CALL contexts by using a multitude of assessment measures.
A need for evaluation
In today’s language learning context, CALL evaluation often depends on experiments that include different actors, tools and artefacts. Adopting a sociocultural perspective, we define the
actors as anyone directly or indirectly involved in the learning process: learners, instructors,
native speakers with whom learners may interact as well as developers involved in the initial
design of an instrument, among others. The tools are as varied and as indefinable as the artefacts they can produce. For instance, with the advent of Web 2.0 and the central role of social networking, tools for language learning range from a piece of software or digital repository,
developed in-house to meet specific pedagogical goals and needs, to applications or systems
created by a public or private enterprise in order to attract web traffic. As a consequence of this
symbiosis of education and popular tools, the range of artefacts that language learners can produce nowadays has also grown exponentially: oral and written digital productions (sometimes
published online), blog entries, vlogs, wikis, full sites and/or simple answers to language-related
episodes, as well as more structured exercises. This variety of learner productions constitutes
rich corpora and merits its own evaluation, and in fact has already nurtured multiple empirical
studies. Moreover, the complexity of CALL has led to an ever-growing need for evaluation.
Levy and Stockwell (2006: 41) cite Johnson (1992), who differentiates evaluation from research by
asserting ‘the purpose of an evaluation study is to assess the quality, effectiveness and general value
of a program or other entity. The purpose of research is to contribute to the body of scholarly
knowledge about a topic or to contribute to theory’. Levy and Stockwell (see also Levy, Chapter 7 this volume), however, admit that the distinction between the two is not fully clear and,
in fact, one might argue that CALL research does need to include proper evaluative methods.
If evaluation is to be more limited than pure empirical research, then the need for evaluation
resides in its potential for assessing learning tasks and outcomes by, at the same time, establishing
the factors that need to be taken into account when designing CALL tools and/or tasks.
The need for evaluation is also summarised by Chapelle (2001: 53), who suggests ‘an evaluation has to result in an argument indicating in what ways a particular CALL task is appropriate for
particular learners at a given time'. Moreover, an additional argument in favour of including evaluation methods within research methods is the need to link evaluation to the development of a CALL tool and its implementation in authentic learning settings. Hubbard (1996)
framed this need for an integrated approach by, for instance, devising a CALL methodological
framework that includes an evaluation module designed to assess the learner fit or the teacher fit
of a system. In a similar manner, Levy and Stockwell (2006: 42) associate evaluation with design
by claiming an obvious overlap between the two concepts, and recall important features of evaluation such as the fact that evaluation studies ‘have a practical outcome’ and ‘draw value from the
process as well as from the product of the evaluation’. Finally, the diversity of CALL contexts also
motivates the ever-growing need for evaluation. As expressed by Stockwell, CALL is:
a field that by nature is divergent and dynamic, and for this reason, we might argue that
diversity in CALL is something that is not only inevitable, but also something that is necessary to provide the best options for the myriad of contexts in which it is used.
(Stockwell 2012: 5)
Types of evaluation
The diversity of CALL contexts, however, also nurtures diversity of CALL evaluation tools,
methods and studies. The goals, objects and actors motivating the evaluation also affect the
types of evaluation that will be required or designed. In addition, the current culture of CALL,
and, more specifically, the growing role of digital media in the daily life of learners, cannot be
ignored. With a goal to empower students to ‘control technology’, Selber (2004: 93) argues that
for proper use of technology in education to happen, ‘contexts of use deserve at least as much
attention as contexts of design’. Indeed, by giving power to users to reflect and evaluate their
own use of instruments, they are developing the type of meta-knowledge that they can use to
properly manipulate, analyse and eventually resist some aspects of the digital world in which
they live. As a result of this ever-changing nature of CALL, and while evaluation is typically understood as the assessment of instruments, we need to add a human factor to CALL evaluation
to reflect the dynamic and diverse aspects of language learning.
Moreover, and as a result of the dynamic nature of CALL, evaluation requires as much,
and probably more, scrutiny than ever (see e.g. Hubbard 1987; Dunkel 1990; Chapelle 2001;
Felix 2005; Leaky 2011). Hubbard (1987), for instance, recommends that effectiveness of CALL
software be checked in its relation to the language approach it reflects and thus promotes. For
instance, a system promoting an acquisition approach of language learning will ‘provide comprehensible input at a level just beyond that currently acquired by the learner’ (Hubbard 1987:
236), while a system promoting a behaviourist approach will ‘require the learner to input the
correct answer before proceeding’ (p. 231). Hubbard further argues that software evaluation
must also address other aspects of the educational context in which the system is used: learner
strategies and the institution-specific syllabus. Likewise, in her synthesis of effectiveness research
in computer-assisted instruction (CAI) and CALL, Dunkel (1990: 20) calls for a need to produce
effective ways to assess the impact of CALL using ‘nontechnocentric’, experimental and/or
ethnographic research studies that highlight the ‘importance of the central components of the
education situation – the people and the classroom culture, and the contents of the educational
software’. Accordingly, the current trend seems to shy away from comparative studies aiming
to show that a group of learners or a certain tool performs better than another or none at all,
mainly because their focus is limited to one aspect of the educational context, that is, the tool.
Dunkel (1990: 19) notes that, for this reason, research is veering more towards descriptive and
evaluative research that can address questions of validity and effectiveness of instruments for specific learners and language skills or define users’ attitudes and perceptions towards CALL. This
type of systematic evaluation of CALL in all its aspects is also what Chapelle (2001: 52) recommends. She considers that in order to improve CALL evaluation three conditions must be met:
First, evaluation criteria should incorporate findings of a theory-based speculation about
ideal conditions for SLA [. . .]. Second, criteria should be accompanied by guidance as to
how they should be used; in other words, a theory of evaluation must be articulated. Third,
both criteria and theory need to apply not only to software, but also to the task that the
teacher plans and that the learner carries out.
(Chapelle 2001: 52)
Along the same lines, Felix (2005: 16) also contests the value of numerous effectiveness research
findings for their lack of focus because ‘the ever pursued question of the impact of ICT on
learning remains unanswerable in a clear cause and effect sense’. She argues for a more systematic approach to evaluative research based on limited variables and outcomes and with a
potential for improving learning processes. Moreover, a recent exhaustive study by Leaky (2011) proposes a new framework for evaluating CALL research using a system that relies on the inherent synergy that occurs when what he calls the 'three Ps' (platforms, programs and pedagogies)
intersect. His model is unique in the sense that, following previous recommendations for systematicity, it combines CALL enhancement criteria with qualitative and quantitative measures
to enhance the evaluation of platforms, programs or pedagogies. For instance, the evaluation
flowchart includes twelve criteria: 'language learning potential, learner fit, meaning
focus, authenticity, positive impact, practicality, language skills, learner control, error correction
and feedback, collaborative CALL, teacher factor and tuition delivery modes’ (Leaky 2011: 249).
Regarding instruments, evaluative studies use several approaches to test the effectiveness
and validity of new materials or artefacts, be it a web application or software; a combination of
qualitative and quantitative methods will potentially lead to the best results. Checklists, however,
appear to be a very common instrument to evaluate educational software. Depending on the
goal of the evaluation (be it to test the functionality of a tool, assess users’ attitudes towards a
tool or obtain specific feedback from students and instructors), the items included in the evaluation checklist will vary (see e.g. Hubbard 1987; Levy and Stockwell 2006; Leaky 2011). As
per Chapelle’s (2001: 53) recommendations, CALL software evaluation constitutes only the first
level of CALL evaluation, followed by an evaluation of CALL activities planned by the teacher
(level 2) and, most importantly, an evaluation of learners’ performances during CALL activities
(level 3). This paradigm is useful in helping us expand on the notion of evaluation; yet, it remains
very focused on the idea that CALL implies ‘opportunities for interactional modifications to
negotiate meaning’ (ibid.). In sum, such evaluation analysis falls within an interactional theory
framework.
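Purely by way of illustration, the following minimal sketch shows how such an evaluation checklist might be operationalised for a single tool, pairing a quantitative rating with a qualitative note for each of the twelve criteria quoted above from Leaky (2011). The 1 to 5 rating scale, the comment field and the helper names are illustrative assumptions and are not part of any of the cited frameworks.

```python
# A minimal, hypothetical checklist structure for recording one evaluator's
# judgements against Leaky's (2011) twelve criteria.

from dataclasses import dataclass, field

CRITERIA = [
    "language learning potential", "learner fit", "meaning focus", "authenticity",
    "positive impact", "practicality", "language skills", "learner control",
    "error correction and feedback", "collaborative CALL", "teacher factor",
    "tuition delivery modes",
]

@dataclass
class ChecklistEntry:
    criterion: str
    rating: int          # quantitative judgement, e.g. 1 (poor) to 5 (excellent)
    comment: str = ""    # qualitative note justifying the rating

@dataclass
class ToolEvaluation:
    tool: str
    evaluator: str
    entries: list = field(default_factory=list)

    def add(self, criterion: str, rating: int, comment: str = "") -> None:
        if criterion not in CRITERIA:
            raise ValueError(f"Unknown criterion: {criterion}")
        self.entries.append(ChecklistEntry(criterion, rating, comment))

    def summary(self) -> float:
        """Average rating across the criteria evaluated so far."""
        return sum(e.rating for e in self.entries) / len(self.entries)

# Example use:
review = ToolEvaluation(tool="FrancoToile", evaluator="instructor A")
review.add("authenticity", 5, "unedited native-speaker video")
review.add("learner control", 3, "search works, but no playback speed options")
print(round(review.summary(), 2))
```

Recording a comment alongside each rating reflects the point made above that checklist-based evaluation is most useful when quantitative and qualitative evidence are kept together.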
While evaluation of instruments remains the most common type of assessment in CALL,
and too often the goal of the evaluative study, we argue here that a focus on other aspects of the
CALL context can lead to fruitful results in enhancing the overall experience with technology.

Figure 9.1 Life cycle of the tool development and implementation (stages: technical planning, FrancoToile prototype, testing and evaluations, system development, modifications)

Our argument falls within theoretical frameworks originally rooted in the sciences, in particular, activity theory, theory of affordances or complexity theory, because they share a common
holistic, ecological view on learning (see Part I of this volume). Notions of interaction of various
elements, continuous evolution of dynamic forces, adaptation and iteration of occurrences characterise the nature of CALL today. Such contexts of CALL learning and evaluation are intrinsically nonlinear. Moreover, pedagogy is one instance of the ecology of learning that is often
neglected. Nokelainen (2006: 183), in an attempt to reflect the need to evaluate the pedagogical
usability of digital learning material, uses criteria that combine learners’ experience (control and
activity) with such aspects as ‘intuitive efficiency’, ‘motivation’ or ‘added value’. By pedagogical
usability, Nokelainen (2006: 181) makes reference to the fact that learning is an unquantifiable concept and, when technical elements (i.e. computers or digital learning materials) are
introduced to the learning context, they should constitute an added value that can be clearly
identified. Combined with this concept of 'pedagogical usability', however, other approaches
to evaluation need to be mentioned. Hémard (1997, 2003), for instance, proposes evaluation
guidelines to address the design issues that many authors face when trying to conceive a CALL
tool. Matching user and task, determining user-task feasibility and offering flexibility of use,
among others, in addition to including instruments such as checklists and walk-throughs to
assess usability and learnability of a system, will lead to a more accurate and thorough evaluation
of both tools and learning processes (see also e.g. Smith and Mosier 1986; Brown 1988).
The changing nature of evaluation
With an increasing number of CALL applications incorporating ever greater system
complexity as well as opportunities for interactions, autonomous learning and creativity, evaluating the effectiveness of such systems is also changing drastically. In addition, the way in which
the assessment of learners and learning occurs within and outside of such systems must be considered. Ultimately, the introduction of CALL in education has meant a significant impact on
the learner, the context of learning and instruction; to this end, we see three aspects that qualify
for further scrutiny: making a distinction between feedback and evaluation, taking advantage
of peer and self-evaluations, and reexamining the role of the instructor in relation to these new
learning models.
Feedback versus evaluation
While feedback and evaluation are closely related in that both aim at some form of assessment,
there are also important differences. Feedback generally refers to a more formative, interim
assessment of learner performance that is aimed to coach the learner and, more generally, steer
learning. In contrast, evaluation aims at a more summative assessment that usually reflects a goal,
a standard and notions of validity and reliability. Furthermore, and as stated earlier, it not only
takes into account the learner but, ideally, also the tool, their interactions as well as the outcomes.
However, along with the changes that new technologies have brought to CALL evaluation, the concept of feedback has undergone changes as well. Instead of
feedback being restricted to automated system feedback, learners also provide feedback to each
other when working collaboratively in various social media learning environments (see e.g.
Ware and O’Dowd 2008).
Chapelle (1998) proposes an interdependence of design and evaluation of CALL learning
activities. For instance, she suggests that when CALL materials are designed, they ideally ‘include
features that prompt learners to notice important aspects of the language’ (p. 23). The noticing
of particular features can be prompted and achieved, among others, when learners’ performance
is followed by computer-generated feedback during learner-system interactions. Such feedback
can come from instructional materials containing explicit exercises aimed at providing learners
with practice on particular grammatical forms and meanings (see e.g. Heift 2010a). Such materials, which can focus on specific areas of grammar or vocabulary, reading or listening, are aimed
at providing learners with immediate feedback about the correctness of their responses to questions in a manner that engages learners in focused interactions that illuminate gaps in their
knowledge. For instance, this can be achieved with natural language processing (NLP), commonly implemented in Intelligent CALL (ICALL) applications (see Tschichold and Schulze,
Chapter 37 this volume). Here, the automatic feedback is enhanced by a better match between
the user and the task, leading to improved learnability and more flexibility in terms of data display and/or data entry. Matching the previous language experience of the user is the ultimate goal in improving the evaluation of the learner's ability. This type of information is commonly found
in a learner model, which provides information not only on the learners’ performance but also
their learning preferences. Hémard (1997: 15) underlines the importance and significance of
user models by highlighting that ‘the more information on the potential users, the greater the
designer can match the demands placed on the users with their cognitive characteristics’, adding
that a better ‘understanding of tasks to be performed must inevitably lead to improved learnability and increased performance’.
Peer evaluation/self-evaluation and learning
As rightly expressed by Ellis and Goodyear (2011: 21), ‘the increasing availability of ICT has
widened the range of places in which students can learn, and they now expect greater flexibility in educational provision’. In addition, the affordances allowed by systems such as blogging,
microblogging, vlogging, multiuser platforms or networked learning sites (such as Duolingo, a site where language learners practise the target language by translating a text, which can be
submitted by any user of the free language learning site) have greatly developed the concept of
peer- and self-evaluation. Here, users are (often subconsciously) compelled to provide feedback
and comments to their peers and assess their own contributions by adding to them or editing
them. Ultimately learning (i.e. e-learning) and evaluating may become one process that engages
users in self-awareness, develops meta-cognitive skills and self-regulation and elevates intrinsic
motivation, while also leading to more learner autonomy. As Ellis and Goodyear (2011: 26) explain,
‘learning can be understood as induction into a community of practice, in which appropriation
of cultural tools and participation in cultural practices go hand in hand with increasing recognition and status in a community’.
In such a sustainable CALL environment, offering new possibilities for evaluation of users,
tasks and instruments, the role of the various actors engaged in the CALL context may also be
changing and/or may be reevaluated. Besides the CALL learner/user, however, another key
actor is the instructor. To what extent and in what ways is that role evolving?
The role of the instructor
When evaluation becomes increasingly diverse and allows for much flexibility and accountability on the part of the learner, what happens to the instructor? At a time when massive
open online courses (MOOCs) are creating a buzz in and out of higher education milieus, the
technophobes who years ago feared for the survival of the teacher now fear for the survival of
the institution per se. We cannot deny the fact that, as learners become progressively digitally
literate and dependent, the role of the instructor is also radically changing while, at the same
time, becoming more critical. As we are reminded by many studies, instructors are still highly
visible agents, actively communicating with students, mentoring, guiding and slowly transforming them into independent thinkers (Levy and Stockwell 2006; Warschauer 2012) – and yet,
instructors need to disengage from the ‘lecturing’ mode to address changing learning environments. A recent study by Bates and Sangrà (2011) on managing technology in higher education,
for instance, notes that while flexible access to learning has increased in recent years, the quality
of instructing with technology has not increased in a similar manner due to a lack of investment
in training. What does this mean for CALL evaluation?
Within a constructivist learning context where learners and instructors are interconnected
via multiple digital channels, evaluation of computer-mediated tasks is complex. CALL evaluation by instructors is directly related to tracking interactions and new learning opportunities
afforded by the instrument. In a full online context, access to all modes of interactions (i.e. student to student, student to instructor, student to system) helps the instructor to guide the learner,
set up appropriate tasks, and evaluate the quality of the interactions. In a blended learning environment, the face-to-face interactions often help to clarify difficulties that may have occurred
during the online interactions by also ensuring proper communication among all actors involved
in the CALL scenario. In addition to the mentoring role played by instructors, several systems
have provided valuable interaction data to inform instructional design within CALL (Fischer
2012). One outcome of this interaction data analysis is the urgent need for further training of
learners in CALL contexts (Hubbard 2004; Levy and Stockwell 2006; Hubbard and Romeo
2012). Training is needed to fill the gap between what users of technology do as independent or
collaborative social agents and what they should do as independent or collaborative e-learners.
Evaluating e-learning in such a way is meant to focus on the learner(s) and the learning activity.
In their description of teaching-as-design, Ellis and Goodyear (2011: 119) state that ‘regular
evaluation, reflection and review are needed to close the loop between students’ experiences of
learning and the (re)design and on-going enhancement of all aspects of educational provision’.
In sum, within a Vygotskian approach and sociocultural view of learning (see Part I of this
volume), instructors need to constantly assess the situation in which e-learning occurs to properly evaluate, amongst other agents, such feedback mechanisms as ‘extrinsic feedback’, which
is provided by others through several channels and tools, and ‘intrinsic feedback’ as a result of
self-monitoring (see Ellis and Goodyear 2011: 124).
Evaluation of CALL takes several forms and involves several agents and instruments. The
following will examine some of these factors in more detail by focusing on two case studies
involving two specific CALL tools.
Core issues in evaluation: Analysis of two case studies
The two case studies that serve to illustrate our perspective on evaluation are linked to two tools,
which were specifically designed within an L2 pedagogy context. These tools and the empirical data that they have allowed us to generate also illustrate that, in CALL, a balance of qualitative and
quantitative data is needed in order to evaluate the values brought about by computer-mediated
instruments and/or activities (see e.g. Felix 2005; Colpaert 2006). Leaky further explains:
There is a general agreement on the need in a field such as CALL, anchored as it is
between the humanities and the world of technology, to balance qualitative with quantitative data. It is not that the humanities can only be subject to qualitative study and the world
of technology only subject to quantitative analysis, but rather that human interaction, or
‘inter-subjectivity’, is so complex as not to be easily quantifiable and that technology so
utterly dependent on empiricism and logic as to miss the affective, the ‘human’, the persona,
and the synergistic.
(Leaky 2011: 5–6)
This statement emphasises the need for a well-informed approach to evaluation by assessing
all aspects of the CALL context. In the following, we will focus on aspects of the tool (i.e. its
effectiveness), the tasks and interactions that occur within a set activity (i.e. the process) and the
outcomes of this activity (i.e. the product). Our basic precept is centred on the need for a cyclical approach to learning design with a view to recycling the outcomes of our evaluation into new
learning processes (Caws and Hamel 2013).
Evaluating the tools: FrancoToile and E-Tutor
When evaluating CALL tools or contexts of learning, the question of effectiveness arises. As
pointed out by Colpaert (2006), measuring the effectiveness of a learning system is a difficult
task because it involves many variables that are often overlooked or not taken into consideration
when first designing a system. Effectiveness was a key factor in the development of both FrancoToile and E-Tutor.
FrancoToile (http://francotoile.uvic.ca) is a digital library of short videos featuring
French-speaking individuals from around the world. Built as a web-based bilingual interface, the
tool allows users to view videos, read transcripts and annotations, and search through the video
database using various keyword combinations (see Kohn 2012). It was originally designed to fill
a gap in the availability of authentic video documents featuring native speakers engaged in spontaneous discourse; the intent was also to mirror real-life interactions in ‘normal’ conditions (e.g.
conversations with background street noise have not been edited) so that learners could prepare
themselves for the authentic language that they will encounter when visiting or living overseas.
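For illustration, a keyword search over transcripts and annotations of the kind described above could be sketched as follows. The record structure and the matching rule are assumptions made for this example and do not reflect FrancoToile's actual implementation.

```python
# A hypothetical keyword search over a small repository of annotated videos.

from dataclasses import dataclass
from typing import List

@dataclass
class VideoRecord:
    title: str
    speaker_region: str      # e.g. "Québec", "Sénégal"
    transcript: str          # full transcript text
    annotations: List[str]   # editorial keywords/annotations

def search(records: List[VideoRecord], keywords: List[str]) -> List[VideoRecord]:
    """Return videos whose transcript or annotations contain every keyword."""
    hits = []
    for rec in records:
        haystack = (rec.transcript + " " + " ".join(rec.annotations)).lower()
        if all(kw.lower() in haystack for kw in keywords):
            hits.append(rec)
    return hits

# Example use with a toy record:
corpus = [VideoRecord("Marché de quartier", "Québec",
                      "On achète des légumes au marché ...", ["food", "shopping"])]
print([v.title for v in search(corpus, ["marché", "food"])])
```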
Following recommendations to adopt a conceptual and methodological approach to
CALL design in order to create applications that are ‘usable’ as opposed to simply ‘available’
(see e.g. Hémard 2003, 2004; Colpaert 2006), we adopted the Analysis-Design-Development-Implementation-Evaluation (ADDIE) methodology (see e.g. Colpaert 2006; Strickland 2006).
ADDIE is an instructional systems design (ISD) model that is particularly well suited to guiding developers in the creation and evaluation of language software or other language-related computer
systems. One advantage of the ADDIE model is that ‘each stage delivers output which serves
as input for the next stage’ (Colpaert 2006: 115). We applied this pedagogy and design-based
research approach to create the system (see Figure 9.1) and also to design the annotation system.
CALL ergonomics also guided our study because it constitutes a methodological and theoretical framework that seeks to describe interactions between users and instruments in an
attempt to ameliorate these interactions so that learning can be maximised. CALL ergonomics
research – in particular, interaction-based research – adopts a user-centred approach which is
grounded in mediated activity theories (see e.g. Raby 2005) or instrumented activity theory
(Verillon and Rabardel 1995; see also Blin, Chapter 3 this volume). The basic precept of these
theories is that human beings adapt, change and learn through their interactions with machines,
tools or other human beings. In other words, these interactions are socially and culturally constructed (see e.g. Leont’ev 1981; Rabardel 1995).
Similarly, E-Tutor was also developed with an iterative research and development process in
mind. E-Tutor is an ICALL system for beginner and intermediate learners of L2 German, which
covers learning content distributed over a total of fifteen chapters. Each chapter begins with an
introductory text (e.g. a story or dialogue) that highlights the focus of the chapter. Each chapter offers different learning activities that allow students to practise chapter-related vocabulary
and grammar. In addition, there are learning activities for pronunciation, listening and reading
comprehension, culture and writing. There are currently ten activity types implemented in the
system (e.g. sentence building, reading comprehension, essay) in addition to an introductory unit
on pronunciation.
The design underlying E-Tutor was strongly motivated by pedagogical considerations. We
aimed at a CALL system that emulates a learner-teacher interaction by focusing on individualised interaction between the learner and the CALL system. For this, two main design criteria
have to be met: first, the system needs a sophisticated answer-processing mechanism to be able to
provide students with individualised, error-specific feedback; second, the system needs to collect
and maintain information about its users and their behaviour while they are working with the
CALL program. Accordingly, E-Tutor was designed as an ICALL system with a natural language
processing (NLP) component that performs a linguistic analysis of learner input. It checks for
correct syntax, morphology, and to a lesser extent, semantics, to provide error-specific feedback
through an automatic evaluation of the learner’s input. Moreover, the system offers a dynamic
assessment of each learner by considering past and current learner performance and behaviour
in relationship to a particular learning activity. As a result, the system’s interaction with each
student is individualised as to the kinds of errors in the student input as well as the ways they
are communicated to the learner. To achieve this, however, an ongoing, iterative evaluation of
learner progress is needed.
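A minimal sketch of how such dynamic assessment might modulate feedback is given below. The error label, the threshold and the messages are hypothetical stand-ins for illustration only; they are not E-Tutor's actual NLP output or feedback texts.

```python
# Hypothetical illustration: (1) an upstream analysis labels the error,
# (2) stored learner history shapes how the error is reported.

from typing import Dict

# (1) assume an NLP analysis step has already produced an error label, e.g.:
analysis = {"error_type": "subject-verb agreement", "token": "gehst"}

# (2) learner history: how often this error type has occurred before
history: Dict[str, int] = {"subject-verb agreement": 4}

def feedback(error_type: str, seen_before: int) -> str:
    """More explicit feedback for recurring errors, a terser hint otherwise."""
    if seen_before >= 3:
        return (f"Recurring issue: {error_type}. "
                "Review the relevant grammar section before continuing.")
    if seen_before >= 1:
        return f"Check the {error_type} in your sentence."
    return "There is a small grammar problem here. Can you find it?"

message = feedback(analysis["error_type"],
                   history.get(analysis["error_type"], 0))
print(message)
```

The point of the sketch is simply that the same detected error can be communicated differently depending on what the system has recorded about the learner, which is the sense in which the feedback is individualised.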
Colpaert (2006), in describing different approaches to software development, also advocates
a pedagogically driven approach but, at the same time, alludes to a problem of system design,
namely bridging the gap between language pedagogy and technology. A CALL program might
include the latest technological fads but lack language pedagogy, or it might reflect sound language teaching pedagogy but not effectively exploit the technology. On the other hand, even
the best team of CALL software designers cannot always anticipate the ways in which learners
will use a CALL system. Accordingly, many CALL studies have shown a discrepancy between
the intention behind certain software features and their actual use. For this reason, E-Tutor also
followed an iterative, cyclical process of development, implementation and evaluation (see Colpaert 2006) to enhance the effectiveness and scope of the tool.
In both systems, FrancoToile and E-Tutor, evaluation of the tools is a condition for their
dynamic development. In the case of FrancoToile, a series of case studies were implemented
whereby the tool became the focus of specific learning activities in the form of guided and
free explorations, within set pedagogical conditions and parameters, hence leading to interactions between learners and the system. Data were collected from multiple sources: pre- and
postactivity online questionnaires, activity sheets (used by participants during interventions),
computer screen video captures using Camtasia Studio and recorded focus group interviews
(Caws 2013). While users were involved in authentic learning tasks that were measured and/or
observed, the analysis of their interactions (as recorded with the screen video capture software
and through online questionnaires and interviews) was used to define the degree to which
the interface was conducive (or not) to performing a task. Results of this analysis were recycled
into improvement of the tool itself. In the case of E-Tutor, evaluation of the tool resulted from
an ongoing analysis of system performance and learner input with the goal of improving overall
system functionality and system features as well as to create a learner corpus from recycled data
on learner-system interactions that were collected over five years. Overall, cross-sectional as
well as longitudinal data were collected from automated server logs, learner-system interactions
and user questionnaires as well as retrospective user interviews. Initially, data analyses served to
improve system performance with regards to error identification, system responses and interface
design (Heift and Nicholson 2001), while later analyses focused on pedagogical issues such as
improving learner feedback or enhancing learner-computer interactions by adding additional
system features and learning tools (e.g. help options) (Heift 2010b).
Evaluating learner-system interactions: The process
Knowing what exactly users do when they interact with our systems has been the focus of our
iterations of evaluation. This kind of focus allowed us to concentrate on the process of learning
rather than simply directing our analysis towards the outcomes of a particular task or focusing
exclusively on the system itself. In her argument for more systematic CALL research focusing
on processes, Felix (2005: 16) explains that investigating ‘how technologies might be impacting
learning processes and as a consequence might improve learning outcomes’ is critical. Likewise, Leaky (2011) proposes a model of evaluating CALL that inherently requires ‘stable’ environments and takes into consideration processes of learning, processes of manipulating digital
materials and processes of teaching with technology. Fischer (2012: 14) also adds that the study
of tracking data ‘gives a clear and discrete view of students’ actions’ and thus helps us to better
understand how students use software, as opposed to how we think that they are using it (see
also Fischer 2007: 40).
In the case of both FrancoToile and E-Tutor, the methods used to evaluate learning processes
involved the analysis of a ‘work situation’, as per CALL ergonomics principles, namely the study
of ‘the association of subject and a task in set conditions’ (Raby 2005: 181). Data collection
instruments included both quantitative user logs that recorded learner-tool interactions as well
as qualitative analyses of user responses from questionnaires, activity sheets and retrospective
interviews. In addition, in the case of FrancoToile, video screen recordings provided further
insight on learner-tool interactions. Data analyses focused on what learners did while they were
immersed in the task. For instance, are they following directives given to them? Are they using
the tools available to them most effectively? Are they using other digital tools to complete the
tasks? How do they respond to system interventions? Do they seem to exhibit signs of cognitive
overload? What are their own perceptions of the task? In sum, such a close analysis of user-task-tool interactions places the user at the centre of the analysis with a goal to improve the process
(Rabardel 1995; Raby 2005; Bertin and Gravé 2010). As for both FrancoToile and E-Tutor,
monitoring students’ behaviours during our CALL tasks and/or activities provided essential
data to improve the instruments we use, and thus improve the efficiency and effectiveness of a
task process.
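As a simple illustration of how such tracking data can be turned into process measures, the following sketch derives time on task, help use and submissions from an event log. The log format and event names are assumptions made for this example and do not correspond to either system's actual logging schema.

```python
# Hypothetical interaction log analysis: from raw events to simple process metrics.

from datetime import datetime
from typing import Dict, List

log: List[Dict[str, str]] = [
    {"time": "2014-03-02T10:00:05", "user": "s01", "event": "task_start"},
    {"time": "2014-03-02T10:01:40", "user": "s01", "event": "help_opened"},
    {"time": "2014-03-02T10:03:12", "user": "s01", "event": "answer_submitted"},
    {"time": "2014-03-02T10:04:50", "user": "s01", "event": "task_end"},
]

def parse(ts: str) -> datetime:
    return datetime.fromisoformat(ts)

def process_metrics(events: List[Dict[str, str]]) -> Dict[str, float]:
    start = next(parse(e["time"]) for e in events if e["event"] == "task_start")
    end = next(parse(e["time"]) for e in events if e["event"] == "task_end")
    return {
        "time_on_task_seconds": (end - start).total_seconds(),
        "help_uses": sum(1 for e in events if e["event"] == "help_opened"),
        "answers_submitted": sum(1 for e in events if e["event"] == "answer_submitted"),
    }

print(process_metrics(log))
```

Measures of this kind answer, in a very reduced way, questions such as whether learners use the help options available to them, and they can be triangulated with the qualitative data mentioned above.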
Evaluating learning outcomes: The product
While it is essential to analyse the interactions to evaluate the processes of learning, we cannot
discount the need to measure the degree of accuracy with which users complete their tasks,
hence evaluating the overall learning outcomes of interactions with CALL tools. Assessments
of outcomes occur either with or through technology, and while properly evaluating learning
outcomes can be challenging (see e.g. Leaky 2011), ICALL systems such as E-Tutor do offer
that possibility in that they aim at automated understanding and generation of natural human
languages. This generation of CALL applications makes it possible to develop input-based instruction, as the NLP capabilities allow the computer to provide an analysis of learners'
language and relevant feedback. Moreover, the CALL activities can meet the individual needs
of learners by modelling what the student knows based on the evidence found in his or her
writing. Such models can then be used for making suggestions about useful areas of instruction
(Heift and Schulze 2007).
Evaluating the outcome of a task or an activity is an essential part of the overall assessment for
language learners who seek to position themselves in the learning context. As stated earlier, this type of evaluation does not need to be the sole responsibility of the instructor. New instruments
designed for language learning or other learning environments can help to transfer or share this
role with other actors, especially learners themselves. While learning to self- and peer-evaluate,
students will develop meta-cognitive skills along with other critical knowledge that they will be
able to use in any environment where evaluation plays a crucial role.
Future directions
This chapter aimed to illustrate that CALL evaluation involves activities, tasks and instruments
that coexist within a holistic frame. When we create CALL activities, we need to take into account
the users’ present experiences and the fact that those are influenced by former experiences with
similar systems. Moreover, these experiences have repercussions on their interactions with other
instruments in future learning situations. As such, ‘getting to know our learners in depth through
pretask surveys and/or observations will help us better train participants to use systems, as well
as adjust our systems to better match participants’ functional skills’ (Caws and Hamel 2013: 32).
Accordingly, one may ask whether a stronger and more informed focus on CALL evaluation has led to a decline in the tenacious scepticism towards the use of technology in language learning that marked its early days, along with the belief that its impact had yet to be demonstrated. We think so, for two main reasons. First, the instrument in question has become a cultural artefact
comparable to any other artefact used for learning (e.g. a notebook, a book or a film). Second,
the questions that centre around issues of effectiveness have developed in such a way that they,
by now, include a number of nontechnological components of the learning context such as
learning strategies, learning tasks, the role of the instructor or the physical design of the learning
space, among others.
Yet, evaluating processes with technology has never been as promising due to the diversity of
the digital tools that have become available. These tools have not only changed the role of the
instructor – who by now has become much more of a guide than the former dominant knowledge provider – but they also allow for self-evaluation, peer-evaluation and critical assessment of
systems in shared practice and space. Thus they permit learners to freely reflect upon their own
experiences as well as those of others. As Selber (2004: 141) explains, the development of critical
literacy deserves much attention, and the development of a meta-knowledge about the role of
technology in learning gives learners the power to be producers as much as users of technology.
Accordingly, what does this imply for the development and use of CALL tools?
Given that digital tools, especially social networking tools, occupy most of our students'
learning and social spaces, whether consciously or not, we must ensure that they interact with
CALL tools as opposed to react (see e.g. Hémard 2003). As rightly expressed by Lantolf and
Pavlenko (2001: 148), engagement matters because ‘it is the activity and significance that shape
the individuals’ orientation to learn or not’. As a result of the omnipresence of these tools, evaluation of and for CALL includes such instruments. This, amongst other benefits, will inevitably
transform the relationship that language learners develop with technology.
While CALL evaluations through analysis of interactions (i.e. data on behaviours, or data
on outcomes) may reveal important insights into different types of learner engagement with
the instrument, distinct learning strategies (or lack thereof) for task completion or achievement of a
learning outcome and/or technical issues of system benefits and deficiencies, an effective evaluation must focus more closely on design: namely the relationship between tool, interaction and
learning outcome in relation to the task. Design is a multifaceted, complex concept and its role
in enhancing learning is far from new (see e.g. Levy 2002); however, it is important to study the
wider context that affects the success or failure of CALL activities instead of solely focusing on
the tool. After all, a tool is not used in a vacuum but instead, learners as well as learning contexts
are involved in learning processes, and thus none of them can be regarded and/or assessed in
isolation.
Further readings
Brown, C.M. (1988) Human-Computer Interface Design Guidelines, New York, NY: Ablex.
This book offers a practical introduction to software design for the development of interfaces oriented
towards the user. This guide is based on research on human performance and interactions, and on
practical experience.
Hamel, M.-J. (2012) ‘Testing the usability of an online learner dictionary prototype: Process and product
oriented analysis’, Computer Assisted Language Learning, 25(4): 339–365.
The article describes a usability study on the quality of the learner-task-dictionary interaction in the
context of the design and development of an online dictionary for L2 French. Study findings provide
insight into the learners’ dictionary search and look-up strategies and prompt suggestions for interface
design and testing methodology.
Hubbard, P. (1988) ‘An integrated framework for CALL courseware evaluation’, CALICO Journal, 6(2):
51–72.
The article provides a methodological framework for CALL software evaluation from which teachers
can develop their own evaluation procedures. The interaction of the three main parts of the framework
(i.e. operational description, teacher fit, learner fit) and their interactions are described and ideas for
use are provided.
Rabardel, P. (1995) Les hommes et les technologies: approche cognitive des instruments contemporains, Paris: A. Colin.
This book offers a good insight into interactions between human beings and machines within various
disciplines. It explains the concept of instrumented tasks and principles of ergonomics within a cognitive
approach.
Ware, P.D. and O’Dowd, R. (2008) ‘Peer feedback on language form in telecollaboration’, Language Learning & Technology, 12(1): 43–63.
The article describes a longitudinal study on the impact of peer feedback on language development
with L2 learners who engaged in weekly asynchronous discussions. Findings indicate that while all
students preferred feedback on language form as part of their exchanges with peers, feedback only
occurred when students were explicitly required to do so. Pedagogical suggestions are provided.
References
Bates, A.W. and Sangrà, A. (2011) Managing Technology in Higher Education: Strategies for Transforming Teaching
and Learning, San Francisco, CA: Jossey-Bass.
Bertin, J.C. and Gravé, P. (2010) ‘In favor of a model of didactic ergonomics’, in J.C. Bertin, P. Gravé and J.P. Narcy-Combes (eds), Second Language Distance Learning and Teaching: Theoretical Perspectives and Didactic Ergonomics, Hershey, PA: Information Science Reference (IGI Global): 1–36.
Brown, C.M. (1988) Human-Computer Interface Design Guidelines, New York, NY: Ablex.
Caws, C. (2013) ‘Evaluating a web-based video corpus through an analysis of user interactions’, ReCALL,
25(1): 85–104.
Caws, C. and Hamel, M.-J. (2013) ‘From analysis to training: Recycling interaction data into learning processes’, Cahiers de l’ILOB, 5: 25–36.
Chapelle, C. (1998) ‘Multimedia CALL: Lessons to be learned from research on instructed SLA’, LLT
Journal, 2(1): 22–34.
Chapelle, C. (2001) Computer Applications in Second Language Acquisition: Foundations for Teaching, Testing and
Research, Cambridge: Cambridge University Press.