Chapter 9
Evaluation in CALL
Tools, interactions, outcomes
Catherine Caws and Trude Heift
As an ever-growing area of research, computer-assisted language learning (CALL) abounds with
tools, systems or learning environments that are constantly changing to reflect the latest technology trends. Faced with the reality that quality control in CALL is not a simple task, researchers
have been looking for new and more specific measures to assess the pedagogical and educational
values brought about by these new technologies. In this chapter, we reflect on the changing
nature of evaluation in CALL by focusing on three key issues that emerged from our research:
the evaluation of tools, interactions and outcomes. In doing so, our reflections are based on data
collections from two case studies: FrancoToile, an online oral corpus of French speakers, and
E-Tutor, an online written, form-focused ICALL environment. We address the various stages of
evaluation and focus on its iterative nature by emphasising the relationship between the tools,
the interactions and the outcomes in relation to the task. As such, and while using these two
specific yet distinct case studies, our analysis applies to any context where a digital platform
becomes the focus of a language learning environment.
Evaluation in the context of language learning and technology
Evaluation is a multifaceted concept. Traditionally, evaluation in education included a strong
human dimension within a fairly stable learning system, anchored in a set institution. Assessment of learning generally followed a linear pattern of activities regulated by a behaviourist
approach. However, with the advent of different types of technologies as well as the ubiquitous
nature of computers, the core concepts of evaluation have been radically transformed. Learning
takes place anywhere, anytime, in a nonlinear fashion, and in conditions that seemed impossible
as recently as a few decades ago. Due to this ever-changing nature of learning, evaluation needs
to address these changes, and the ways in which learning systems and contexts, learners and/or
concepts are evaluated need to be diversified. Accordingly, evaluation in computer-assisted language learning (CALL) ideally includes several factors: (1) the learner, (2) the tool that is being
utilised, (3) their interactions in relation to a task and (4) the learning outcomes.
This chapter mainly focuses on three of these aspects of CALL evaluation, namely the tool,
the interactions and the outcomes, in addition to discussing some of the issues relating to the
learner. Generally, our reflections will be in part based on qualitative and quantitative data that we
collected over several years and that emerged from a series of case studies using two specific CALL
tools: FrancoToile and E-Tutor. While these two systems constitute only two examples amongst
the multitude of CALL tools, they illustrate and emphasise the fact that CALL evaluation needs to
adapt to the diversification of CALL contexts by using a multitude of assessment measures.
A need for evaluation
In today’s language learning context, CALL evaluation often depends on experiments that include different actors, tools and artefacts. Adopting a sociocultural perspective, we define the
actors as anyone directly or indirectly involved in the learning process: learners, instructors,
native speakers with whom learners may interact as well as developers involved in the initial
design of an instrument, among others. The tools are as varied and as indefinable as the artefacts they can produce. For instance, with the advent of Web 2.0 and the central role of social networking, tools for language learning range from a piece of software or digital repository,
developed in-house to meet specific pedagogical goals and needs, to applications or systems
created by a public or private enterprise in order to attract web traffic. As a consequence of this
symbiosis of education and popular tools, the range of artefacts that language learners can produce nowadays has also grown exponentially: oral and written digital productions (sometimes
published online), blog entries, vlogs, wikis, full sites and/or simple answers to language-related
episodes, as well as more structured exercises. This variety of learner productions constitutes
rich corpora and merits its own evaluation, and in fact has already nurtured multiple empirical
studies. Moreover, the complexity of CALL has led to an ever-growing need for evaluation.
Levy and Stockwell (2006: 41) cite Johnson (1992), who differentiates evaluation from research by
asserting ‘the purpose of an evaluation study is to assess the quality, effectiveness and general value
of a program or other entity. The purpose of research is to contribute to the body of scholarly
knowledge about a topic or to contribute to theory’. Levy and Stockwell (see also Levy, Chapter 7 this volume), however, admit that the distinction between the two is not fully clear and,
in fact, one might argue that CALL research does need to include proper evaluative methods.
If evaluation is to be more limited than pure empirical research, then the need for evaluation
resides in its potential for assessing learning tasks and outcomes by, at the same time, establishing
the factors that need to be taken into account when designing CALL tools and/or tasks.
The need for evaluation is also summarised by Chapelle (2001: 53), who suggests ‘an evaluation has to result in an argument indicating in what ways a particular CALL task is appropriate for
particular learners at a given time'. Moreover, an additional argument in favour of including evaluation methods within research methods is the need to link evaluation to the development of a CALL tool and its implementation in authentic learning settings. Hubbard (1996)
framed this need for an integrated approach by, for instance, devising a CALL methodological
framework that includes an evaluation module designed to assess the learner fit or the teacher fit
of a system. In a similar manner, Levy and Stockwell (2006: 42) associate evaluation with design
by claiming an obvious overlap between the two concepts, and recall important features of evaluation such as the fact that evaluation studies ‘have a practical outcome’ and ‘draw value from the
process as well as from the product of the evaluation’. Finally, the diversity of CALL contexts also
motivates the ever-growing need for evaluation. As expressed by Stockwell, CALL is:
a field that by nature is divergent and dynamic, and for this reason, we might argue that
diversity in CALL is something that is not only inevitable, but also something that is necessary to provide the best options for the myriad of contexts in which it is used.
(Stockwell 2012: 5)
Types of evaluation
The diversity of CALL contexts, however, also nurtures diversity of CALL evaluation tools,
methods and studies. The goals, objects and actors motivating the evaluation also affect the
types of evaluation that will be required or designed. In addition, the current culture of CALL,
and, more specifically, the growing role of digital media in the daily life of learners, cannot be
ignored. With a goal to empower students to ‘control technology’, Selber (2004: 93) argues that
for proper use of technology in education to happen, ‘contexts of use deserve at least as much
attention as contexts of design’. Indeed, by giving power to users to reflect and evaluate their
own use of instruments, they are developing the type of meta-knowledge that they can use to
properly manipulate, analyse and eventually resist some aspects of the digital world in which
they live. As a result of this ever-changing nature of CALL, and while evaluation is typically understood as the assessment of instruments, we need to add a human factor to CALL evaluation
to reflect the dynamic and diverse aspects of language learning.
Moreover, and as a result of the dynamic nature of CALL, evaluation requires as much,
and probably more, scrutiny than ever (see e.g. Hubbard 1987; Dunkel 1990; Chapelle 2001;
Felix 2005; Leaky 2011). Hubbard (1987), for instance, recommends that effectiveness of CALL
software be checked in its relation to the language approach it reflects and thus promotes. For
instance, a system promoting an acquisition approach of language learning will ‘provide comprehensible input at a level just beyond that currently acquired by the learner’ (Hubbard 1987:
236), while a system promoting a behaviourist approach will ‘require the learner to input the
correct answer before proceeding’ (p. 231). Hubbard further argues that software evaluation
must also address other aspects of the educational context in which the system is used: learner
strategies and the institution-specific syllabus. Likewise, in her synthesis of effectiveness research
in computer-assisted instruction (CAI) and CALL, Dunkel (1990: 20) calls for a need to produce
effective ways to assess the impact of CALL using ‘nontechnocentric’, experimental and/or
ethnographic research studies that highlight the ‘importance of the central components of the
education situation – the people and the classroom culture, and the contents of the educational
software’. Accordingly, the current trend seems to shy away from comparative studies aiming
to show that a group of learners or a certain tool performs better than another or none at all,
mainly because their focus is limited to one aspect of the educational context, that is, the tool.
Dunkel (1990: 19) notes that, for this reason, research is veering more towards descriptive and
evaluative research that can address questions of validity and effectiveness of instruments for specific learners and language skills or define users’ attitudes and perceptions towards CALL. This
type of systematic evaluation of CALL in all its aspects is also what Chapelle (2001: 52) recommends. She considers that in order to improve CALL evaluation three conditions must be met:
First, evaluation criteria should incorporate findings of a theory-based speculation about
ideal conditions for SLA [. . .]. Second, criteria should be accompanied by guidance as to
how they should be used; in other words, a theory of evaluation must be articulated. Third,
both criteria and theory need to apply not only to software, but also to the task that the
teacher plans and that the learner carries out.
(Chapelle 2001: 52)
Along the same lines, Felix (2005: 16) also contests the value of numerous effectiveness research
findings for their lack of focus because ‘the ever pursued question of the impact of ICT on
learning remains unanswerable in a clear cause and effect sense’. She argues for a more systematic approach to evaluative research based on limited variables and outcomes and with a
potential for improving learning processes. Moreover, a recent exhaustive study by Leaky (2011) proposes a new framework for evaluating CALL research using a system that relies on the inherent synergy that occurs when what he calls the 'three Ps' (platforms, programs and pedagogies)
intersect. His model is unique in the sense that, following previous recommendations for systematicity, it combines CALL enhancement criteria with qualitative and quantitative measures
to enhance the evaluation of platforms, programs or pedagogies. For instance, the evaluation
flowchart includes twelve criteria: 'language learning potential, learner fit, meaning
focus, authenticity, positive impact, practicality, language skills, learner control, error correction
and feedback, collaborative CALL, teacher factor and tuition delivery modes’ (Leaky 2011: 249).
Regarding instruments, evaluative studies use several approaches to test the effectiveness
and validity of new materials or artefacts, be it a web application or software; a combination of
qualitative and quantitative methods will potentially lead to the best results. Checklists, however,
appear to be a very common instrument to evaluate educational software. Depending on the
goal of the evaluation (be it to test the functionality of a tool, assess users’ attitudes towards a
tool or obtain specific feedback from students and instructors), the items included in the evaluation checklist will vary (see e.g. Hubbard 1987; Levy and Stockwell 2006; Leaky 2011). As
per Chapelle’s (2001: 53) recommendations, CALL software evaluation constitutes only the first
level of CALL evaluation, followed by an evaluation of CALL activities planned by the teacher
(level 2) and, most importantly, an evaluation of learners’ performances during CALL activities
(level 3). This paradigm is useful in helping us expand on the notion of evaluation; yet, it remains
very focused on the idea that CALL implies ‘opportunities for interactional modifications to
negotiate meaning’ (ibid.). In sum, such evaluation analysis falls within an interactional theory
framework.
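Purely by way of illustration, the following minimal sketch shows how such an evaluation checklist might be operationalised for a single tool, pairing a quantitative rating with a qualitative note for each of the twelve criteria quoted above from Leaky (2011). The 1 to 5 rating scale, the comment field and the helper names are illustrative assumptions and are not part of any of the cited frameworks.

```python
# A minimal, hypothetical checklist structure for recording one evaluator's
# judgements against Leaky's (2011) twelve criteria.

from dataclasses import dataclass, field

CRITERIA = [
    "language learning potential", "learner fit", "meaning focus", "authenticity",
    "positive impact", "practicality", "language skills", "learner control",
    "error correction and feedback", "collaborative CALL", "teacher factor",
    "tuition delivery modes",
]

@dataclass
class ChecklistEntry:
    criterion: str
    rating: int          # quantitative judgement, e.g. 1 (poor) to 5 (excellent)
    comment: str = ""    # qualitative note justifying the rating

@dataclass
class ToolEvaluation:
    tool: str
    evaluator: str
    entries: list = field(default_factory=list)

    def add(self, criterion: str, rating: int, comment: str = "") -> None:
        if criterion not in CRITERIA:
            raise ValueError(f"Unknown criterion: {criterion}")
        self.entries.append(ChecklistEntry(criterion, rating, comment))

    def summary(self) -> float:
        """Average rating across the criteria evaluated so far."""
        return sum(e.rating for e in self.entries) / len(self.entries)

# Example use:
review = ToolEvaluation(tool="FrancoToile", evaluator="instructor A")
review.add("authenticity", 5, "unedited native-speaker video")
review.add("learner control", 3, "search works, but no playback speed options")
print(round(review.summary(), 2))
```

Recording a comment alongside each rating reflects the point made above that checklist-based evaluation is most useful when quantitative and qualitative evidence are kept together.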
While evaluation of instruments remains the most common type of assessment in CALL,
and too often the goal of the evaluative study, we argue here that a focus on other aspects of the
CALL context can lead to fruitful results in enhancing the overall experience with technology.

Figure 9.1 Life cycle of the tool development and implementation (stages: technical planning, FrancoToile prototype, testing and evaluations, system development, modifications)

Our argument falls within theoretical frameworks originally rooted in the sciences, in particular, activity theory, theory of affordances or complexity theory, because they share a common
holistic, ecological view on learning (see Part I of this volume). Notions of interaction of various
elements, continuous evolution of dynamic forces, adaptation and iteration of occurrences characterise the nature of CALL today. Such contexts of CALL learning and evaluation are intrinsically nonlinear. Moreover, pedagogy is one instance of the ecology of learning that is often
neglected. Nokelainen (2006: 183), in an attempt to reflect the need to evaluate the pedagogical
usability of digital learning material, uses criteria that combine learners’ experience (control and
activity) with such aspects as ‘intuitive efficiency’, ‘motivation’ or ‘added value’. By pedagogical
usability, Nokelainen (2006: 181) makes reference to the fact that learning is an unquantifiable concept and, when technical elements (i.e. computers or digital learning materials) are
introduced to the learning context, they should constitute an added value that can be clearly
identified. Combined with this concept of 'pedagogical usability', however, other approaches
to evaluation need to be mentioned. Hémard (1997, 2003), for instance, proposes evaluation
guidelines to address the design issues that many authors face when trying to conceive a CALL
tool. Matching user and task, determining user-task feasibility and offering flexibility of use,
among others, in addition to including instruments such as checklists and walk-throughs to
assess usability and learnability of a system, will lead to a more accurate and thorough evaluation
of both tools and learning processes (see also e.g. Smith and Mosier 1986; Brown 1988).
The changing nature of evaluation
With an increasing number of CALL applications incorporating ever greater system
complexity as well as opportunities for interactions, autonomous learning and creativity, evaluating the effectiveness of such systems is also changing drastically. In addition, the way in which
the assessment of learners and learning occurs within and outside of such systems must be considered. Ultimately, the introduction of CALL in education has meant a significant impact on
the learner, the context of learning and instruction; to this end, we see three aspects that qualify
for further scrutiny: making a distinction between feedback and evaluation, taking advantage
of peer and self-evaluations, and reexamining the role of the instructor in relation to these new
learning models.
Feedback versus evaluation
While feedback and evaluation are closely related in that both aim at some form of assessment,
there are also important differences. Feedback generally refers to a more formative, interim
assessment of learner performance that is aimed to coach the learner and, more generally, steer
learning. In contrast, evaluation aims at a more summative assessment that usually reflects a goal,
a standard and notions of validity and reliability. Furthermore, and as stated earlier, it not only
takes into account the learner but, ideally, also the tool, their interactions as well as the outcomes.
However, along with the changes that new technologies have brought to CALL evaluation, the concept of feedback has undergone changes as well. Instead of
feedback being restricted to automated system feedback, learners also provide feedback to each
other when working collaboratively in various social media learning environments (see e.g.
Ware and O’Dowd 2008).
Chapelle (1998) proposes an interdependence of design and evaluation of CALL learning
activities. For instance, she suggests that when CALL materials are designed, they ideally ‘include
features that prompt learners to notice important aspects of the language’ (p. 23). The noticing
of particular features can be prompted and achieved, among others, when learners’ performance
is followed by computer-generated feedback during learner-system interactions. Such feedback
can come from instructional materials containing explicit exercises aimed at providing learners
with practice on particular grammatical forms and meanings (see e.g. Heift 2010a). Such materials, which can focus on specific areas of grammar or vocabulary, reading or listening, are aimed
at providing learners with immediate feedback about the correctness of their responses to questions in a manner that engages learners in focused interactions that illuminate gaps in their
knowledge. For instance, this can be achieved with natural language processing (NLP), commonly implemented in Intelligent CALL (ICALL) applications (see Tschichold and Schulze,
Chapter 37 this volume). Here, the automatic feedback is enhanced by a better match between
the user and the task, leading to improved learnability and more flexibility in terms of data display and/or data entry. Matching the previous language experience of the user is the ultimate goal in improving the evaluation of the learner's ability. This type of information is commonly found
in a learner model, which provides information not only on the learners’ performance but also
their learning preferences. Hémard (1997: 15) underlines the importance and significance of
user models by highlighting that ‘the more information on the potential users, the greater the
designer can match the demands placed on the users with their cognitive characteristics’, adding
that a better ‘understanding of tasks to be performed must inevitably lead to improved learnability and increased performance’.
Peer evaluation/self-evaluation and learning
As rightly expressed by Ellis and Goodyear (2011: 21), ‘the increasing availability of ICT has
widened the range of places in which students can learn, and they now expect greater flexibility in educational provision’. In addition, the affordances allowed by systems such as blogging,
microblogging, vlogging, multiuser platforms or networked learning sites (such as Duolingo, a site where language learners practise the target language by translating a text, which can be
submitted by any user of the free language learning site) have greatly developed the concept of
peer- and self-evaluation. Here, users are (often subconsciously) compelled to provide feedback
and comments to their peers and assess their own contributions by adding to them or editing
them. Ultimately learning (i.e. e-learning) and evaluating may become one process that engages
users in self-awareness, develops meta-cognitive skills and self-regulation and elevates intrinsic
motivation, while also leading to more learner autonomy. As Ellis and Goodyear (2011: 26) explain,
‘learning can be understood as induction into a community of practice, in which appropriation
of cultural tools and participation in cultural practices go hand in hand with increasing recognition and status in a community’.
In such a sustainable CALL environment, offering new possibilities for evaluation of users,
tasks and instruments, the role of the various actors engaged in the CALL context may also be
changing and/or may be reevaluated. Besides the CALL learner/user, however, another key
actor is the instructor. To what extent and in what ways is that role evolving?
The role of the instructor
When evaluation becomes increasingly diverse and allows for much flexibility and accountability on the part of the learner, what happens to the instructor? At a time when massive
open online courses (MOOCs) are creating a buzz in and out of higher education milieus, the
technophobes who years ago feared for the survival of the teacher now fear for the survival of
the institution per se. We cannot deny the fact that, as learners become progressively digitally
literate and dependent, the role of the instructor is also radically changing while, at the same
time, becoming more critical. As we are reminded by many studies, instructors are still highly
visible agents, actively communicating with students, mentoring, guiding and slowly transforming them into independent thinkers (Levy and Stockwell 2006; Warschauer 2012) – and yet,
instructors need to disengage from the ‘lecturing’ mode to address changing learning environments. A recent study by Bates and Sangrà (2011) on managing technology in higher education,
for instance, notes that while flexible access to learning has increased in recent years, the quality
of instructing with technology has not increased in a similar manner due to a lack of investment
in training. What does this mean for CALL evaluation?
Within a constructivist learning context where learners and instructors are interconnected
via multiple digital channels, evaluation of computer-mediated tasks is complex. CALL evaluation by instructors is directly related to tracking interactions and new learning opportunities
afforded by the instrument. In a full online context, access to all modes of interactions (i.e. student to student, student to instructor, student to system) helps the instructor to guide the learner,
set up appropriate tasks, and evaluate the quality of the interactions. In a blended learning environment, the face-to-face interactions often help to clarify difficulties that may have occurred
during the online interactions by also ensuring proper communication among all actors involved
in the CALL scenario. In addition to the mentoring role played by instructors, several systems
have provided valuable interaction data to inform instructional design within CALL (Fischer
2012). One outcome of this interaction data analysis is the urgent need for further training of
learners in CALL contexts (Hubbard 2004; Levy and Stockwell 2006; Hubbard and Romeo
2012). Training is needed to fill the gap between what users of technology do as independent or
collaborative social agents and what they should do as independent or collaborative e-learners.
Evaluating e-learning in such a way is meant to focus on the learner(s) and the learning activity.
In their description of teaching-as-design, Ellis and Goodyear (2011: 119) state that ‘regular
evaluation, reflection and review are needed to close the loop between students’ experiences of
learning and the (re)design and on-going enhancement of all aspects of educational provision’.
In sum, within a Vygotskian approach and sociocultural view of learning (see Part I of this
volume), instructors need to constantly assess the situation in which e-learning occurs to properly evaluate, amongst other agents, such feedback mechanisms as ‘extrinsic feedback’, which
is provided by others through several channels and tools, and ‘intrinsic feedback’ as a result of
self-monitoring (see Ellis and Goodyear 2011: 124).
Evaluation of CALL takes several forms and involves several agents and instruments. The
following will examine some of these factors in more detail by focusing on two case studies
involving two specific CALL tools.
Core issues in evaluation: Analysis of two case studies
The two case studies that serve to illustrate our perspective on evaluation are linked to two tools,
which were specifically designed within an L2 pedagogy context. These tools and the empirical data that they have allowed us to generate also illustrate that, in CALL, a balance of qualitative and
quantitative data is needed in order to evaluate the values brought about by computer-mediated
instruments and/or activities (see e.g. Felix 2005; Colpaert 2006). Leaky further explains:
There is a general agreement on the need in a field such as CALL, anchored as it is
between the humanities and the world of technology, to balance qualitative with quantitative data. It is not that the humanities can only be subject to qualitative study and the world
of technology only subject to quantitative analysis, but rather that human interaction, or
‘inter-subjectivity’, is so complex as not to be easily quantifiable and that technology so
utterly dependent on empiricism and logic as to miss the affective, the ‘human’, the persona,
and the synergistic.
(Leaky 2011: 5–6)
This statement emphasises the need for a well-informed approach to evaluation by assessing
all aspects of the CALL context. In the following, we will focus on aspects of the tool (i.e. its
effectiveness), the tasks and interactions that occur within a set activity (i.e. the process) and the
outcomes of this activity (i.e. the product). Our basic precept is centred on the need for a cyclical approach to learning design with a view to recycling the outcomes of our evaluation into new
learning processes (Caws and Hamel 2013).
Evaluating the tools: FrancoToile and E-Tutor
When evaluating CALL tools or contexts of learning, the question of effectiveness arises. As
pointed out by Colpaert (2006), measuring the effectiveness of a learning system is a difficult
task because it involves many variables that are often overlooked or not taken into consideration
when first designing a system. Effectiveness was a key factor in the development of both FrancoToile and E-Tutor.
FrancoToile (http://francotoile.uvic.ca) is a digital library of short videos featuring
French-speaking individuals from around the world. Built as a web-based bilingual interface, the
tool allows users to view videos, read transcripts and annotations, and search through the video
database using various keyword combinations (see Kohn 2012). It was originally designed to fill
a gap in the availability of authentic video documents featuring native speakers engaged in spontaneous discourse; the intent was also to mirror real-life interactions in ‘normal’ conditions (e.g.
conversations with background street noise have not been edited) so that learners could prepare
themselves for the authentic language that they will encounter when visiting or living overseas.
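For illustration, a keyword search over transcripts and annotations of the kind described above could be sketched as follows. The record structure and the matching rule are assumptions made for this example and do not reflect FrancoToile's actual implementation.

```python
# A hypothetical keyword search over a small repository of annotated videos.

from dataclasses import dataclass
from typing import List

@dataclass
class VideoRecord:
    title: str
    speaker_region: str      # e.g. "Québec", "Sénégal"
    transcript: str          # full transcript text
    annotations: List[str]   # editorial keywords/annotations

def search(records: List[VideoRecord], keywords: List[str]) -> List[VideoRecord]:
    """Return videos whose transcript or annotations contain every keyword."""
    hits = []
    for rec in records:
        haystack = (rec.transcript + " " + " ".join(rec.annotations)).lower()
        if all(kw.lower() in haystack for kw in keywords):
            hits.append(rec)
    return hits

# Example use with a toy record:
corpus = [VideoRecord("Marché de quartier", "Québec",
                      "On achète des légumes au marché ...", ["food", "shopping"])]
print([v.title for v in search(corpus, ["marché", "food"])])
```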
Following recommendations to adopt a conceptual and methodological approach to
CALL design in order to create applications that are ‘usable’ as opposed to simply ‘available’
(see e.g. Hémard 2003, 2004; Colpaert 2006), we adopted the Analysis-Design-Development-Implementation-Evaluation (ADDIE) methodology (see e.g. Colpaert 2006; Strickland 2006).
ADDIE is an instructional systems design (ISD) model that is particularly well suited to guiding developers in the creation and evaluation of language software or other language-related computer
systems. One advantage of the ADDIE model is that ‘each stage delivers output which serves
as input for the next stage’ (Colpaert 2006: 115). We applied this pedagogy and design-based
research approach to create the system (see Figure 9.1) and also to design the annotation system.
CALL ergonomics also guided our study because it constitutes a methodological and theoretical framework that seeks to describe interactions between users and instruments in an
attempt to ameliorate these interactions so that learning can be maximised. CALL ergonomics
research – in particular, interaction-based research – adopts a user-centred approach which is
grounded in mediated activity theories (see e.g. Raby 2005) or instrumented activity theory
(Verillon and Rabardel 1995; see also Blin, Chapter 3 this volume). The basic precept of these
theories is that human beings adapt, change and learn through their interactions with machines,
tools or other human beings. In other words, these interactions are socially and culturally constructed (see e.g. Leont’ev 1981; Rabardel 1995).
Similarly, E-Tutor was also developed with an iterative research and development process in
mind. E-Tutor is an ICALL system for beginner and intermediate learners of L2 German, which
covers learning content distributed over a total of fifteen chapters. Each chapter begins with an
introductory text (e.g. a story or dialogue) that highlights the focus of the chapter. Each chapter offers different learning activities that allow students to practise chapter-related vocabulary
and grammar. In addition, there are learning activities for pronunciation, listening and reading
comprehension, culture and writing. There are currently ten activity types implemented in the
system (e.g. sentence building, reading comprehension, essay) in addition to an introductory unit
on pronunciation.
The design underlying E-Tutor was strongly motivated by pedagogical considerations. We
aimed at a CALL system that emulates a learner-teacher interaction by focusing on individualised interaction between the learner and the CALL system. For this, two main design criteria
have to be met: first, the system needs a sophisticated answer-processing mechanism to be able to
provide students with individualised, error-specific feedback; second, the system needs to collect
and maintain information about its users and their behaviour while they are working with the
CALL program. Accordingly, E-Tutor was designed as an ICALL system with a natural language
processing (NLP) component that performs a linguistic analysis of learner input. It checks for
correct syntax, morphology, and to a lesser extent, semantics, to provide error-specific feedback
through an automatic evaluation of the learner’s input. Moreover, the system offers a dynamic
assessment of each learner by considering past and current learner performance and behaviour
in relationship to a particular learning activity. As a result, the system’s interaction with each
student is individualised as to the kinds of errors in the student input as well as the ways they
are communicated to the learner. To achieve this, however, an ongoing, iterative evaluation of
learner progress is needed.
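A minimal sketch of how such dynamic assessment might modulate feedback is given below. The error label, the threshold and the messages are hypothetical stand-ins for illustration only; they are not E-Tutor's actual NLP output or feedback texts.

```python
# Hypothetical illustration: (1) an upstream analysis labels the error,
# (2) stored learner history shapes how the error is reported.

from typing import Dict

# (1) assume an NLP analysis step has already produced an error label, e.g.:
analysis = {"error_type": "subject-verb agreement", "token": "gehst"}

# (2) learner history: how often this error type has occurred before
history: Dict[str, int] = {"subject-verb agreement": 4}

def feedback(error_type: str, seen_before: int) -> str:
    """More explicit feedback for recurring errors, a terser hint otherwise."""
    if seen_before >= 3:
        return (f"Recurring issue: {error_type}. "
                "Review the relevant grammar section before continuing.")
    if seen_before >= 1:
        return f"Check the {error_type} in your sentence."
    return "There is a small grammar problem here. Can you find it?"

message = feedback(analysis["error_type"],
                   history.get(analysis["error_type"], 0))
print(message)
```

The point of the sketch is simply that the same detected error can be communicated differently depending on what the system has recorded about the learner, which is the sense in which the feedback is individualised.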
Colpaert (2006), in describing different approaches to software development, also advocates
a pedagogically driven approach but, at the same time, alludes to a problem of system design,
namely bridging the gap between language pedagogy and technology. A CALL program might
include the latest technological fads but lack language pedagogy, or it might reflect sound language teaching pedagogy but not effectively exploit the technology. On the other hand, even
the best team of CALL software designers cannot always anticipate the ways in which learners
will use a CALL system. Accordingly, many CALL studies have shown a discrepancy between
the intention behind certain software features and their actual use. For this reason, E-Tutor also
followed an iterative, cyclical process of development, implementation and evaluation (see Colpaert 2006) to enhance the effectiveness and scope of the tool.
In both systems, FrancoToile and E-Tutor, evaluation of the tools is a condition for their
dynamic development. In the case of FrancoToile, a series of case studies were implemented
whereby the tool became the focus of specific learning activities in the form of guided and
free explorations, within set pedagogical conditions and parameters, hence leading to interactions between learners and the system. Data were collected from multiple sources: pre- and
postactivity online questionnaires, activity sheets (used by participants during interventions),
computer screen video captures using Camtasia Studio and recorded focus group interviews
(Caws 2013). While users were involved in authentic learning tasks that were measured and/or
observed, the analysis of their interactions (as recorded with the screen video capture software
and through online questionnaires and interviews) was used to define the degree to which
the interface was conducive (or not) to performing a task. Results of this analysis were recycled
into improvement of the tool itself. In the case of E-Tutor, evaluation of the tool resulted from
an ongoing analysis of system performance and learner input with the goal of improving overall
system functionality and system features as well as to create a learner corpus from recycled data
on learner-system interactions that were collected over five years. Overall, cross-sectional as
well as longitudinal data were collected from automated server logs, learner-system interactions
and user questionnaires as well as retrospective user interviews. Initially, data analyses served to
improve system performance with regards to error identification, system responses and interface
design (Heift and Nicholson 2001), while later analyses focused on pedagogical issues such as
improving learner feedback or enhancing learner-computer interactions by adding additional
system features and learning tools (e.g. help options) (Heift 2010b).
Evaluating learner-system interactions: The process
Knowing what exactly users do when they interact with our systems has been the focus of our
iterations of evaluation. This kind of focus allowed us to concentrate on the process of learning
rather than simply directing our analysis towards the outcomes of a particular task or focusing
exclusively on the system itself. In her argument for more systematic CALL research focusing
on processes, Felix (2005: 16) explains that investigating ‘how technologies might be impacting
learning processes and as a consequence might improve learning outcomes’ is critical. Likewise, Leaky (2011) proposes a model of evaluating CALL that inherently requires ‘stable’ environments and takes into consideration processes of learning, processes of manipulating digital
materials and processes of teaching with technology. Fischer (2012: 14) also adds that the study
of tracking data ‘gives a clear and discrete view of students’ actions’ and thus helps us to better
understand how students use software, as opposed to how we think that they are using it (see
also Fischer 2007: 40).
In the case of both FrancoToile and E-Tutor, the methods used to evaluate learning processes
involved the analysis of a ‘work situation’, as per CALL ergonomics principles, namely the study
of ‘the association of subject and a task in set conditions’ (Raby 2005: 181). Data collection
instruments included both quantitative user logs that recorded learner-tool interactions as well
as qualitative analyses of user responses from questionnaires, activity sheets and retrospective
interviews. In addition, in the case of FrancoToile, video screen recordings provided further
insight on learner-tool interactions. Data analyses focused on what learners did while they were
immersed in the task. For instance, are they following directives given to them? Are they using
the tools available to them most effectively? Are they using other digital tools to complete the
tasks? How do they respond to system interventions? Do they seem to exhibit signs of cognitive
overload? What are their own perceptions of the task? In sum, such a close analysis of user-task-tool interactions places the user at the centre of the analysis with a goal to improve the process
(Rabardel 1995; Raby 2005; Bertin and Gravé 2010). As for both FrancoToile and E-Tutor,
monitoring students’ behaviours during our CALL tasks and/or activities provided essential
data to improve the instruments we use, and thus improve the efficiency and effectiveness of a
task process.
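As a simple illustration of how such tracking data can be turned into process measures, the following sketch derives time on task, help use and submissions from an event log. The log format and event names are assumptions made for this example and do not correspond to either system's actual logging schema.

```python
# Hypothetical interaction log analysis: from raw events to simple process metrics.

from datetime import datetime
from typing import Dict, List

log: List[Dict[str, str]] = [
    {"time": "2014-03-02T10:00:05", "user": "s01", "event": "task_start"},
    {"time": "2014-03-02T10:01:40", "user": "s01", "event": "help_opened"},
    {"time": "2014-03-02T10:03:12", "user": "s01", "event": "answer_submitted"},
    {"time": "2014-03-02T10:04:50", "user": "s01", "event": "task_end"},
]

def parse(ts: str) -> datetime:
    return datetime.fromisoformat(ts)

def process_metrics(events: List[Dict[str, str]]) -> Dict[str, float]:
    start = next(parse(e["time"]) for e in events if e["event"] == "task_start")
    end = next(parse(e["time"]) for e in events if e["event"] == "task_end")
    return {
        "time_on_task_seconds": (end - start).total_seconds(),
        "help_uses": sum(1 for e in events if e["event"] == "help_opened"),
        "answers_submitted": sum(1 for e in events if e["event"] == "answer_submitted"),
    }

print(process_metrics(log))
```

Measures of this kind answer, in a very reduced way, questions such as whether learners use the help options available to them, and they can be triangulated with the qualitative data mentioned above.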
Evaluating learning outcomes: The product
While it is essential to analyse the interactions to evaluate the processes of learning, we cannot
discount the need to measure the degree of accuracy with which users complete their tasks,
hence evaluating the overall learning outcomes of interactions with CALL tools. Assessments
of outcomes occur either with or through technology, and while properly evaluating learning
outcomes can be challenging (see e.g. Leaky 2011), ICALL systems such as E-Tutor do offer
that possibility in that they aim at automated understanding and generation of natural human
languages. This generation of CALL applications makes it possible to develop input-based instruction, as the NLP capabilities allow the computer to provide an analysis of learners'
language and relevant feedback. Moreover, the CALL activities can meet the individual needs
of learners by modelling what the student knows based on the evidence found in his or her
writing. Such models can then be used for making suggestions about useful areas of instruction
(Heift and Schulze 2007).
Evaluating the outcome of a task or an activity is an essential part of the overall assessment for
language learners who seek to position themselves in the learning context. As stated earlier, this type of evaluation does not need to be the sole responsibility of the instructor. New instruments
designed for language learning or other learning environments can help to transfer or share this
role with other actors, especially learners themselves. While learning to self- and peer-evaluate,
students will develop meta-cognitive skills along with other critical knowledge that they will be
able to use in any environment where evaluation plays a crucial role.
Future directions
This chapter aimed to illustrate that CALL evaluation involves activities, tasks and instruments
that coexist within a holistic frame. When we create CALL activities, we need to take into account
the users’ present experiences and the fact that those are influenced by former experiences with
similar systems. Moreover, these experiences have repercussions on their interactions with other
instruments in future learning situations. As such, ‘getting to know our learners in depth through
pretask surveys and/or observations will help us better train participants to use systems, as well
as adjust our systems to better match participants’ functional skills’ (Caws and Hamel 2013: 32).
Accordingly, one may ask whether a stronger and more informed focus on CALL evaluation has led to a decline in the tenacious scepticism towards the use of technology in language learning that marked its early days, along with the belief that its impact had yet to be demonstrated. We think so, for two main reasons. First, the instrument in question has become a cultural artefact
comparable to any other artefact used for learning (e.g. a notebook, a book or a film). Second,
the questions that centre around issues of effectiveness have developed in such a way that they,
by now, include a number of nontechnological components of the learning context such as
learning strategies, learning tasks, the role of the instructor or the physical design of the learning
space, among others.
Yet, evaluating processes with technology has never been as promising due to the diversity of
the digital tools that have become available. These tools have not only changed the role of the
instructor – who by now has become much more of a guide than the former dominant knowledge provider – but they also allow for self-evaluation, peer-evaluation and critical assessment of
systems in shared practice and space. Thus they permit learners to freely reflect upon their own
experiences as well as those of others. As Selber (2004: 141) explains, the development of critical
literacy deserves much attention, and the development of a meta-knowledge about the role of
technology in learning gives learners the power to be producers as much as users of technology.
Accordingly, what does this imply for the development and use of CALL tools?
Given that digital tools, especially social networking tools, occupy most of our students'
learning and social spaces, whether consciously or not, we must ensure that they interact with
CALL tools as opposed to react (see e.g. Hémard 2003). As rightly expressed by Lantolf and
Pavlenko (2001: 148), engagement matters because ‘it is the activity and significance that shape
the individuals’ orientation to learn or not’. As a result of the omnipresence of these tools, evaluation of and for CALL includes such instruments. This, amongst other benefits, will inevitably
transform the relationship that language learners develop with technology.
While CALL evaluations through analysis of interactions (i.e. data on behaviours, or data
on outcomes) may reveal important insights into different types of learner engagement with
the instrument, distinct learning strategies (or lack thereof) for task completion or achievement of a
learning outcome and/or technical issues of system benefits and deficiencies, an effective evaluation must focus more closely on design: namely the relationship between tool, interaction and
learning outcome in relation to the task. Design is a multifaceted, complex concept and its role
in enhancing learning is far from new (see e.g. Levy 2002); however, it is important to study the
wider context that affects the success or failure of CALL activities instead of solely focusing on
the tool. After all, a tool is not used in a vacuum but instead, learners as well as learning contexts
are involved in learning processes, and thus none of them can be regarded and/or assessed in
isolation.
Further readings
Brown, C.M. (1988) Human-Computer Interface Design Guidelines, New York, NY: Ablex.
This book offers a practical introduction to software design for the development of interfaces oriented
towards the user. This guide is based on research on human performance and interactions, and on
practical experience.
Hamel, M.-J. (2012) ‘Testing the usability of an online learner dictionary prototype: Process and product
oriented analysis’, Computer Assisted Language Learning, 25(4): 339–365.
The article describes a usability study on the quality of the learner-task-dictionary interaction in the
context of the design and development of an online dictionary for L2 French. Study findings provide
insight into the learners’ dictionary search and look-up strategies and prompt suggestions for interface
design and testing methodology.
Hubbard, P. (1988) ‘An integrated framework for CALL courseware evaluation’, CALICO Journal, 6(2):
51–72.
The article provides a methodological framework for CALL software evaluation from which teachers
can develop their own evaluation procedures. The interaction of the three main parts of the framework
(i.e. operational description, teacher fit, learner fit) and their interactions are described and ideas for
use are provided.
Rabardel, P. (1995) Les hommes et les technologies: approche cognitive des instruments contemporains, Paris: A. Colin.
This book offers a good insight into interactions between human beings and machines within various
disciplines. It explains the concept of instrumented tasks and principles of ergonomics within a cognitive
approach.
Ware, P.D. and O’Dowd, R. (2008) ‘Peer feedback on language form in telecollaboration’, Language Learning & Technology, 12(1): 43–63.
The article describes a longitudinal study on the impact of peer feedback on language development
with L2 learners who engaged in weekly asynchronous discussions. Findings indicate that while all
students preferred feedback on language form as part of their exchanges with peers, feedback only
occurred when students were explicitly required to do so. Pedagogical suggestions are provided.
References
Bates, A.W. and Sangrà, A. (2011) Managing Technology in Higher Education: Strategies for Transforming Teaching
and Learning, San Francisco, CA: Jossey-Bass.
Bertin, J.C. and Gravé, P. (2010) ‘In favor of a model of didactic ergonomics’, in J.C. Bertin, P. Gravé and J.P. Narcy-Combes (eds), Second Language Distance Learning and Teaching: Theoretical Perspectives and Didactic Ergonomics, Hershey, PA: Information Science Reference (IGI Global): 1–36.
Brown, C.M. (1988) Human-Computer Interface Design Guidelines, New York, NY: Ablex.
Caws, C. (2013) ‘Evaluating a web-based video corpus through an analysis of user interactions’, ReCALL,
25(1): 85–104.
Caws, C. and Hamel, M.-J. (2013) ‘From analysis to training: Recycling interaction data into learning processes’, Cahiers de l’ILOB, 5: 25–36.
Chapelle, C. (1998) ‘Multimedia CALL: Lessons to be learned from research on instructed SLA’, LLT
Journal, 2(1): 22–34.
Chapelle, C. (2001) Computer Applications in Second Language Acquisition: Foundations for Teaching, Testing and
Research, Cambridge: Cambridge University Press.