COMPUTER ASSISTED LANGUAGE LEARNING, 2018, VOL. 31, NOS. 1–2, 1–26
https://doi.org/10.1080/09588221.2017.1375960

Vocabulary learning through viewing video: the effect of two enhancement techniques

Maribel Montero Perez (a,b,c), Elke Peters (a) and Piet Desmet (a,c)
(a) Department of Linguistics, KU Leuven, Belgium; (b) Research Foundation Flanders (FWO); (c) imec - ITEC KU Leuven, Kortrijk, Belgium

ABSTRACT
While most studies on L2 vocabulary learning through input have addressed learners’ vocabulary uptake from written text, this study focuses on audio-visual input. In particular, we investigate the effects of enhancing video by (1) adding different types of L2 subtitling (i.e. no captioning, full captioning, keyword captioning, and glossed keyword captioning, which provides access to meaning) and (2) informing vs. not informing students that viewing would be followed immediately by a test of vocabulary from the video (Test Announcement). The study adopted a 2 (+/- Test Announcement) × 4 (Type of Captioning) between-subjects design, resulting in eight experimental groups. A total of 227 Dutch-speaking university students watched three French (L2) videos in one of the eight conditions. Results revealed that students in the glossed keyword captions group (with access to meaning) scored best on the form recognition and meaning recall tests. Analyses of the look-up behaviour of students in the glossed keyword captioning group revealed that looking up a given word was positively related to the learning of that word. Test Announcement did not affect word learning or look-up behaviour. Participants’ vocabulary size was directly related to their learning gains as well as to their look-up behaviour in the glossed keyword condition.

KEYWORDS
Vocabulary acquisition; incidental learning; intentional learning; gloss; captioning; video

1. Introduction
One of the most studied topics in the field of second language (L2) vocabulary acquisition has been how exposure to L2 input affects the ‘picking up’ of new words.
The basic premise underlying that question is that second language learners need to allocate sufficient attentional resources to a given word in order to have a chance of acquiring that word (Ellis, 1994; Hulstijn, 2001). Next, initial form-meaning links, that is, the ‘assignment of meaning to the orthographical representation of the word’ (Rott, 2007, p. 166), need to be installed. In order to do so, learners can try to infer word meaning from context or decide to consult other resources such as (electronic) dictionaries. Subsequently, repeated encounters with the word are required (Hulstijn, 2001) in order to install a durable memory trace. Yet, it has been shown that learners do not necessarily notice novel words in the input and that, when they try to derive a word’s meaning, inferencing processes do not necessarily lead to correct form-meaning links (Laufer, 2003, 2005). That is why vocabulary learning from input (only) has been found to be an unpredictable, slow and fallible process (Horst, Cobb, & Meara, 1998) which tends to result in limited vocabulary learning gains (Laufer, 2003). Unsurprisingly, researchers have investigated how to boost vocabulary acquisition from input by including options for input enhancement (Sharwood Smith, 1991). These options aim at directing learners’ attention to novel words to facilitate the noticing of those words (Schmidt, 2001) and the making of form-meaning connections, both of which are considered initial and essential steps in the acquisition of a novel lexical item (Hulstijn, 2001; Schmitt, 2008).

CONTACT Maribel Montero Perez maribel.monteroperez@kuleuven.be
Supplemental data for this article can be accessed at https://doi.org/10.1080/17597269.2017.1368059.
© 2017 Informa UK Limited, trading as Taylor & Francis Group
In order to stimulate learners’ noticing and initial form-meaning connections, researchers have looked at the impact of features such as highlighting, glosses (see Abraham, 2008, for a meta-analysis on this topic), multimedia annotations, and electronic dictionaries (e.g. Chun & Plass, 1996; De Ridder, 2002; Hulstijn, 1993; Peters, Hulstijn, Sercu, & Lutjeharms, 2009) on vocabulary uptake from written input. Studies using audio-visual input have mainly investigated the effect of L1 subtitling or L2 subtitling (i.e. captioning) on learning gains (e.g. Peters, Heynen, & Puimege, 2016; Sydorenko, 2010; Winke, Gass, & Sydorenko, 2010). However, these techniques differ from those explored in reading studies in that they neither provide access to meaning nor include options to make words more salient. This study focuses on vocabulary learning through viewing audio-visual input and investigates the effect of two enhancement techniques which aim at stimulating learners’ noticing and form-meaning mappings of novel words. The first technique consists of adding different types of subtitling in the target language, namely full captioning, keyword captioning and glossed keyword captioning, i.e. keyword captioning with access to meaning. The second technique, Test Announcement, refers to the instructions that learners receive before viewing the video: participants were either informed or not informed about the vocabulary tests that would follow the viewing. This study investigates (1) the effect of the two enhancement techniques on the initial learning of unknown words from audio-visual input, (2) the effect of Test Announcement on students’ use of glossed captions, and (3) whether the use of the glossed captions for a given word correlates with a participant’s learning of that word.

2. Literature review
2.1. Captioning and L2 vocabulary learning
Previous studies on vocabulary learning from audio-visual input have mainly focused on the benefits of adding verbatim L2 subtitling, i.e.
full captioning, for vocabulary learning. These studies have shown that audio-visual input enhanced with full captions may have a facilitative effect on students’ acquisition of lexical items from video input (e.g. Baltova, 1999; Chai & Erlam, 2008; Markham, 1999; Sydorenko, 2010; Winke et al., 2010): students watching captioned clips obtained significantly higher scores than students watching video only (see Montero Perez, Van Den Noortgate, & Desmet, 2013, for a meta-analysis). As pointed out by a systematic review of the literature on vocabulary acquisition from spoken input (Peters & Montero Perez, 2015), most studies have assessed the effect of captions on learners’ initial knowledge of the form-meaning link, both receptively and productively (cf. Nation, 2001, on the nine aspects of word knowledge). Captioning groups were found to outperform video-only groups on form recognition tests, i.e. recognizing the correct word form (e.g. Montero Perez, Peters, Clarebout, & Desmet, 2014; Neuman & Koskinen, 1992; Sydorenko, 2010), and on form production tests, i.e. providing the correct L2 word form (Baltova, 1999; Danan, 1992). Results on the acquisition of word meaning, as opposed to word form, from fully captioned video seem to be less clear-cut. Only studies involving beginning to low-intermediate learners (Baltova, 1999; Huang & Eskey, 1999; Sydorenko, 2010; Winke et al., 2010) reported significantly higher scores for full captioning groups than for video-only groups on word meaning tests. In a study with (high-)intermediate learners (Montero Perez et al., 2014), no differences in meaning recall (i.e. providing a translation of a set of words) were found between the full captioning group and the video-only group, probably due to a floor effect in the test. The authors concluded that learning the meaning of the target words may have been too challenging, especially since the words had low imageability.
Together, results from previous research reveal two potential shortcomings of full captioning as an enhancement technique. First, full captioning does not by itself contain features that overtly draw learners’ attention to novel items in order to promote noticing. This may be problematic since we cannot assume that learners will necessarily notice novel items in the input (Laufer, 2003, 2010) or identify items as being unfamiliar (Laufer & Yano, 2001). Second, full captioning does not provide explicit access to word meaning. As van Zeeland and Schmitt (2013) pointed out, word meaning can be considered one of the most difficult aspects of word knowledge to acquire incidentally. Providing access to word meaning while watching audio-visual input thus seems important if we want to prompt successful and more substantial acquisition of form-meaning links. In order to address the first shortcoming, Montero Perez et al. (2014) investigated the impact of salience in the captions by comparing the effectiveness of three types of captions, namely full captions, keyword captions (only the words that were essential to the meaning of a sentence, including the novel words, were presented in the captioning line) and full captions with highlighted keywords (the same keywords as in the keyword captions were highlighted in the full caption), with a control group. Results revealed that the keyword and highlighted keyword groups performed as well as the full captioning group on the form recognition test and outscored the control group. It was also found that the keyword groups outperformed the control group on the multiple-choice meaning recognition test (i.e. choosing the correct translation of the target word). These findings suggest that keyword captions or highlighted keywords may not only encourage learners’ noticing but also have a beneficial effect on their initial acquisition of word meaning.
Yet, results on the meaning recall tests were very poor and did not yield significant differences between the conditions. In another study, Montero Perez, Peters, and Desmet (2015) did find that participants in the keyword group significantly outperformed participants in the full captioning group on a form recognition test. Yet, no differences were found on the tests that tapped into knowledge of word meaning. A second shortcoming of full captioning is that it does not provide explicit information about the meaning of the unknown words in the captions. Students thus have to rely on contextual clues to derive the meaning of unknown words, a very challenging process that is made particularly difficult by the real-time nature of video input. In order to provide learners with more opportunities to make initial form-meaning links while viewing, it has been proposed to provide explicit access to the meaning of novel words in the form of pre-teaching and pre-learning activities or glossaries (Chai & Erlam, 2008; Webb, 2010a, 2010b). Sydorenko (2010, p. 65) proposed to enhance videos with ‘dynamic glosses’ but provided no further specifications. However, we are not aware of experimental research that has empirically investigated the effects of access to meaning before or while viewing audio-visual input. The effectiveness of providing access to meaning has, by contrast, been extensively investigated in studies on vocabulary learning from written input. These studies investigated the effects of texts enhanced with glosses (Hulstijn, Hollander, & Greidanus, 1996), electronic dictionaries (Laufer & Hill, 2000; Peters, 2007a), and multimedia annotations (Chun & Plass, 1996). The majority of those studies found that glosses stimulate vocabulary learning (e.g. Chun & Plass, 1996; Watanabe, 1997) because they may draw learners’ attention to unknown words (Nation, 2001), contribute to ‘small increments in vocabulary learning’ (Laufer & Hill, 2000, p.
59) and ‘ensure a correct word encoding in the mental lexicon’ (Rott, 2007, p. 170). Yet, while studies generally agree upon the benefits of glosses for word learning (both at the level of form and meaning), the relationship between gloss consultation for a given word and the learning of that word is less clear-cut (Chun & Plass, 1996; Laufer & Hill, 2000). It was found, for instance, that the frequency of look-ups of a word (i.e. the total number of look-ups of a given word) does not necessarily predict the learning of that word, or only correlates moderately with learning (Laufer & Hill, 2000).

2.2. Test announcement
Some studies on vocabulary learning from written input have investigated the use of prereading instructions, that is, telling or not telling L2 learners that they will receive a vocabulary test immediately after the reading task, as a technique to direct learners’ attention to the vocabulary used in the text. Peters (2007a) and Peters et al. (2009) reported that students who were informed about the vocabulary tests before the reading task used the electronic dictionary more intensively. Peters et al. (2009) also reported significantly higher form recognition scores in the groups that were informed about the vocabulary tests before reading the text. Hulstijn (2001) argued that the use of such task instructions is a methodological feature which makes it possible to create conditions for incidental (no vocabulary test announcement) or intentional vocabulary learning (test announcement). We are aware of only one study on vocabulary learning from audio-visual input that has investigated the effects of previewing instructions, or test announcement, on vocabulary learning. In a small-scale eye-tracking study, Montero Perez et al.
(2015) investigated the impact of vocabulary test announcement on learners’ vocabulary scores, measured in terms of form recognition and meaning recall, after watching three short videos. Results revealed that students who had received a test announcement outperformed the other students, but only on the meaning recall test. The study also investigated the impact of test announcement on learners’ use of two types of captioning, full and keyword captioning, as measured by means of eye-movement data. Generally speaking, results indicated that students who were informed about the vocabulary tests before viewing the videos spent significantly more time on the target words (henceforth TWs), in terms of total fixation duration, than students who were not informed about the vocabulary tests. This finding suggests that Test Announcement can indeed be considered an enhancement technique, but a large-scale study is required to identify its impact on vocabulary learning.

2.3. Vocabulary size and vocabulary learning from video
Vocabulary size refers to the number of words for which learners demonstrate some knowledge of word meaning (Anderson & Freebody, 1981). Research into incidental vocabulary learning from written input has shown that learners’ vocabulary size is positively related to word learning through reading (Horst et al., 1998). Few studies on vocabulary learning through viewing have taken learners’ vocabulary size into account, and findings with regard to the role of vocabulary size in incidental vocabulary learning from viewing are inconclusive. While Rodgers did not find a significant relation, Montero Perez et al. (2014, 2015) as well as Peters et al. (2016) reported positive correlations between learners’ vocabulary knowledge and their uptake of new words from viewing video. Therefore, this study will further explore the impact of learners’ vocabulary knowledge on their learning of new words from viewing.

3. Aim and research questions
The literature review revealed that few studies have investigated the effectiveness of different types of captioning: (1) the impact of salient TWs in the captioning line on word learning requires further research; (2) we are not aware of research that has investigated the effect of captions with access to meaning on vocabulary knowledge. Furthermore, the previous section showed that few studies have investigated Test Announcement. In order to address these gaps, this study explores two enhancement techniques: (1) four types of captioning (no captioning, full captioning, keyword captioning, and glossed keyword captioning, that is, with access to meaning) and (2) Test Announcement (informing vs. not informing students about vocabulary posttests before viewing the videos). The following research questions guided this study:
1. Does Type of Captioning affect initial vocabulary learning through viewing video?
2. Does Test Announcement affect initial vocabulary learning through viewing video?
3. Is there an interaction effect between Type of Captioning and Test Announcement on initial vocabulary learning through viewing video?
4. Does Test Announcement affect the look-up of the 18 target (i.e. previously unknown) words in the glossed keyword conditions, as measured in terms of (1) the number of distinct target words looked up and (2) the total number of target-word look-ups?
5. Is a learner’s look-up behaviour for a given target word related to his or her learning of that word?
Additionally, this study looks at whether learners’ vocabulary size is correlated with the dependent variables under investigation, i.e. vocabulary learning and look-up behaviour.

4. Method
4.1. Participants
The participants were 227 undergraduate students (122 males, 105 females) at a Flemish university. They studied law or economics and had an obligatory course in either legal or economic French.
All participants were native speakers of Dutch and ranged in age from 17 to 21 years. Students were considered intermediate to high-intermediate learners of French, the L2 of this study, according to their scores on a vocabulary size test (see the Instruments and Results sections).

4.2. Design
The study adopted a 2 × 4 between-subjects design. The first independent variable refers to the presence or absence of a vocabulary test announcement and allowed us to distinguish between an incidental vocabulary learning group and an intentional vocabulary learning group (Hulstijn, 2001). Learners in the incidental group (INCID) were not informed about the vocabulary posttests that would follow the viewing. The intentional group (INTEN), on the other hand, was explicitly informed about the vocabulary posttests. Both groups knew that a comprehension task would be administered. Type of Captioning was the second independent variable and referred to the type of on-screen text that was added to the clips: no captions (NC), full captions (FC), keyword captions (KC), or glossed keyword captions (GC). The combination of the two independent variables resulted in eight experimental groups:
– Full captioning and incidental (FCINCID)
– Keyword captioning and incidental (KCINCID)
– Glossed keyword captioning and incidental (GCINCID)
– No captioning and incidental (NCINCID)
– Full captioning and intentional (FCINTEN)
– Keyword captioning and intentional (KCINTEN)
– Glossed keyword captioning and intentional (GCINTEN)
– No captioning and intentional (NCINTEN)
Eight intact classes were recruited for this study and randomly assigned to one of the eight conditions.

4.3. Materials
4.3.1. Video selection
Three authentic clips, which were originally created for a native French-speaking audience, were selected from the websites of a Belgian and a Swiss current affairs TV program.
Because our study was embedded within a formal language learning setting, we opted for relatively short clips. The first clip (2’29’’) presented the production and export strategies of a brewery in the North of France. The second (4’24’’) and third (3’32’’) clips presented, respectively, the marketing strategies and the (economic) history and perspectives of the Lego© factory. The three clips included voice-over passages and interview excerpts. Five experienced lecturers of French judged the difficulty level, topic, and image functionality of the clips to be appropriate for learners at an upper-intermediate level. They rated the complementarity of the images and the dialogue as average. Full captions and keyword captions were added manually to the clips.

4.3.2. Video with keyword captions
Whereas full captions consist of a verbatim transcription of the dialogue, keyword captions represent only one word (e.g. malt) or a maximum of four consecutive words (e.g. les campagnes de pub). Keywords are defined as those words of a sentence or paragraph that are essential to learners’ meaning construction. In order to select the keywords, five experienced lecturers read the transcripts of the clips and highlighted the keywords. The final set of keywords represented 17.11% of the total number of words (i.e. 295 out of 1,724 words). The keywords appeared in isolation, centred on the captioning line. The mean presentation duration of the keywords (in both keyword conditions) was set at 1.6 s¹ but was also contingent upon the length and type of the keyword (i.e. single-word item or multiword unit).

4.3.3. Video with glossed keyword captions
Glossed keyword captions are defined as keyword captions with access to meaning: each keyword is linked to its corresponding L1 context-bound translation (see Supplementary material). In order to access the translation of a keyword caption, students needed to tap the space bar.
This action paused the video and simultaneously displayed the translation of the keyword in a box centred on the screen (see Figure 1). The video resumed and the L1 translation disappeared when the students tapped the space bar again. The provision of L1 translations in the glosses ties in with Nation (2001, p. 175), who pointed out that ‘the first requirement of a gloss is that it should be understood’ and that the difference between L1 and L2 glosses is therefore less crucial than often hypothesized (Nation, 2001; Yoshii, 2006). Thus, by including the L1 translation rather than an L2 definition or a combination of both, we sought to (1) minimize the interruption of the video and listening flow and (2) avoid interference with students’ L2 reading skills.

Figure 1. Example of GC clip with activated gloss for the French target word ‘mûrir’ (to ripen).

4.4. Target words
The target words are the words that were unknown to the learners prior to the experiment and that were included in the vocabulary posttests. First, an initial set of twenty TWs was selected on the basis of a pilot study (N = 40). Second, in order to verify the familiarity of the students in the current study with the TWs, we administered a pretest (see Figure 2), which consisted of the potential target words and 18 distracters (high-frequency words). Distracters were included in order not to alert students² to the target words (Read, 2000). The format of the pretest was based on Wesche and Paribakht’s (1996) Vocabulary Knowledge Scale for measuring learners’ depth of knowledge. Students took the pretest four weeks before the actual learning session, as recommended by Read (2000). Results of the pretest revealed that the majority of the students knew the meaning of two TWs (figurine and cuve), which were therefore excluded from our selection. The test did not reveal knowledge of the 18 remaining TWs.

Figure 2. Vocabulary pretest.
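The exclusion rule applied to the pretest (drop a candidate TW when the majority of students already knew its meaning) amounts to a simple filter over per-word counts. The sketch below is a minimal illustration under stated assumptions: the function name `select_target_words`, the strict-majority threshold, and the counts and word labels in the example are hypothetical; they are not the study’s pretest data or its actual scoring procedure.

```python
from typing import Dict, List


def select_target_words(known_counts: Dict[str, int], n_students: int) -> List[str]:
    """Return candidate words that were NOT known by a majority of students.

    known_counts maps each candidate word to the number of pretested students
    whose pretest response showed knowledge of its meaning (hypothetical input).
    A word is excluded when more than half of the students knew it
    (an assumed strict-majority rule).
    """
    if n_students <= 0:
        raise ValueError("n_students must be positive")
    kept = []
    for word, known in known_counts.items():
        if known * 2 > n_students:  # a strict majority knew the word -> exclude it
            continue
        kept.append(word)
    return kept


# Invented counts for illustration only (not the study's pretest results).
example_counts = {"candidate_a": 150, "candidate_b": 12, "candidate_c": 0}
print(select_target_words(example_counts, n_students=227))  # ['candidate_b', 'candidate_c']
```

A stricter criterion, for instance excluding any word that at least one student reported knowing, would only change the condition inside the loop.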
The 18 selected TWs included single-word items (5 verbs, 6 nouns and 1 adjective) as well as 6 multiword units (see Table 1). The use of authentic clips made it impossible to keep the frequency of occurrence of the TWs constant. While the majority of the TWs appeared only once, three nouns and one verb occurred more often. The TWs appeared either in the full captions or in isolation in both keyword conditions (KC and GC). In order to infer word meaning, students in the FC, KC, and NC groups had to rely on contextual clues, which were available for each TW. For instance, the context in the following sentence could help learners to understand the meaning of the TW ‘être à fond’ (to be very enthusiastic about something): « Et pour les petits, ce sera essentiellement Lego parce que je trouve que c’est une invitation à la réalisation, à la création et puis ça les sort de devant tous ces écrans. Donc Lego, je suis à fond » [‘And for the children, I will essentially buy Lego because I think it invites children to realize and create things, and it also keeps them away from the screen. So I am really into Lego’]. Students in the GC groups could access the TWs’ meaning by tapping the space bar (cf. supra).

Table 1. Overview of 18 target words per clip and per type.
Clip | 5 verbs | 6 multiword units | 6 nouns | 1 adjective
Clip 1 | brasser (2x); mûrir | – | malt (3x); houblon (3x); levure (4x); amertume; fermentation | –
Clip 2 | – | frôler le naufrage; être à fond; ça cartonne | – | –
Clip 3 | se disperser; larguer; solidifier | faire un tabac; gravir les échelons; doper les ventes | berceau | encastrable

4.5. Instruments
4.5.1. Vocabulary size
Previous studies have found that learners’ vocabulary size significantly correlates with their listening success (Staehr, 2009; van Zeeland & Schmitt, 2012) and vocabulary learning (Horst et al., 1998). Vocabulary size also provides a rough estimate of learners’ overall L2 proficiency.
In order to estimate learners’ vocabulary knowledge, we designed a 51-item receptive written multiple-choice test that encompassed three frequency bands: 2,001–4,000 (19 items), 4,001–5,000 (16 items), and 5,001–7,000 (16 items). We determined word frequency by means of the Routledge (Lonsdale & Le Bras, 2009), Verlinde (Selva, Verlinde, & Binon, 2002), and DPC (Paulussen, Macken, Vandeweghe, & Desmet, 2013) frequency lists. For each test item, participants chose the correct answer from four Dutch translation options. Students’ scores on the vocabulary size test served as a covariate in the statistical analyses in order to control for learner differences.

Example of a multiple-choice test item:
Concertation (f).
– Optreden (performance)
– Toegeving (concession)
– Overleg (deliberation)
– Opvatting (belief)

4.5.2. Comprehension tasks
For each of the three clips, we developed a comprehension task which focused on global and detailed understanding of the video content. The questions did not, however, target passages that included TWs, in order to avoid an effect of those questions on vocabulary learning. Because the comprehension tasks were only included to make sure that participants would watch the clips attentively, their analysis is confined to descriptive statistics (see Supplementary material).

Sample comprehension questions:
– Explain why the French Craftworks association was established. (clip 1)
– Explain why Lego is a classic marketing example. (clip 2)
– According to the interviewee (+ picture), what caused the crisis at Lego? (clip 3)

Table 2. The order and format of the four vocabulary posttests.
Test 1: Form recognition | Test 2: Clip association | Test 3: Meaning recall
Amertume: ○ yes ○ no | If yes, in which clip? ○ Brewery ○ Lego | Translate into Dutch: Amertume = …
Test 4: Meaning recognition
Amertume: 1. evenwicht (balance) 2. ontgoocheling (disappointment) 3. bitterheid (bitterness) 4. deskundigheid (expertise)

4.5.3. Vocabulary posttests
In order to assess the impact of the videos on the learning of the 18 TWs, four vocabulary posttests measuring receptive form and meaning knowledge were administered. The use of multiple vocabulary tests should make it possible to track small increases in word knowledge (i.e. partial knowledge) and to measure vocabulary knowledge more accurately (Nation & Webb, 2011). Posttests 1, 2, and 3 were presented together and assessed form recognition, clip association, and meaning recall of the 18 TWs and 15 distracters (see Table 2). First, learners completed the form recognition column (see Table 2, posttest 1) and ticked ‘yes’ if the word had appeared in one of the clips and ‘no’ if it had not. If learners had ticked ‘yes’, they also had to complete the clip association column and indicate in which clip the word had appeared (see Table 2, posttest 2). Posttest 3 (see Table 2, posttest 3) was the meaning recall test, in which students had to translate the L2 TWs and 15 distracters into Dutch. The fourth vocabulary posttest was a multiple-choice meaning recognition test: learners were asked to tick the correct translation of the target word from four Dutch translation options (see Table 2, test 4). The four posttests were carefully sequenced in order to avoid test effects: participants submitted tests 1, 2, and 3 before receiving test 4.

4.5.4. Questionnaire
Students completed a short online questionnaire designed to gain a clearer understanding of their primary focus of attention (meaning, unknown lexical items, vocabulary in general) while viewing the videos. They also had to provide some background information: L1, self-evaluation of their listening and reading skills, and age.

4.5.5. Log files
Students in the GCINTEN and GCINCID conditions logged on with their unique student ID and password in order to access the clips.
This made it possible to log³ their individual look-up strategies (Laufer & Hill, 2000). However, they were not informed that their use of the glossed keywords would be registered.

4.6. Procedure

One month before the learning session, all participants took the pretest and the vocabulary size test during regular classroom hours. The lecturers did not inform students about the experiment and presented both tests as intake measures that are administered at the beginning of the academic year. Two weeks before the learning session, students were told that they had to participate in a ‘multimedia learning session’, which was presented as a regular class⁴.

The learning session took 90 min. Participants worked individually, and each had a PC and headset at their disposal. We started by explaining that the session was part of a large-scale study investigating the use of video in the L2 classroom. At this stage, students were not informed about the actual research goals. Next, during a 5-minute introduction, all participants were familiarized with the video player. Participants in the GCINCID and GCINTEN groups were also familiarized with the functionalities of the glossed keywords. Students in the INCID groups were informed about the upcoming comprehension tasks. Students in the INTEN groups were informed about both the comprehension tasks and the vocabulary posttests. All learners watched each clip twice⁵. After the second viewing of the first clip, we handed out the corresponding comprehension task. Having completed the first comprehension task, participants followed the same procedure for the second and third clips⁶. Next, participants took the four vocabulary posttests (see Table 2). All tests were paper-and-pencil tests. Finally, students completed the questionnaire online. Two weeks after the learning session, students were debriefed about the research objectives.

4.7. Scoring and analyses

4.7.1. Vocabulary size and vocabulary posttests

All tests were scored dichotomously: a correct answer received 1 mark and an incorrect one received 0. To investigate research questions 1–3, we performed two-way multivariate analyses of covariance (MANCOVA). Type of Captioning and Test Announcement were the independent variables, and learners’ score on the vocabulary size test was the covariate. Including the covariate reduces within-group error variance, so the impact of the independent variables can be assessed more accurately (Field, 2009). The b-values of the covariate indicate the strength and direction of the relationship between the covariate and the dependent variable, and they are also interpreted. Positive b-values indicate that an increase in the covariate is associated with an increase in the dependent variable; negative b-values reveal the opposite pattern. Multivariate analyses were used because the dependent variable (vocabulary learning) was assessed by means of multiple scores (Tacq, 1997).

4.7.2. Look-up behaviour of glossed keywords

To answer research question 4, we extracted a log file for each student from the database connected to the video player. First, we counted the distinct TW look-ups to determine how many of the 18 TWs each participant had looked up. This measure thus codes whether or not a TW was looked up. Second, we counted the total number of TW look-ups, that is, all clicks on the TWs during the two viewings. Univariate analyses of covariance were computed to analyze the effect of Test Announcement on the look-up counts. Vocabulary size was added as covariate for the reasons outlined in Section 4.7.1.

4.7.3. Relationship between TW look-up behaviour and TW learning

Research question 5 asked whether there is a relationship between the look-up of a given TW and the learning of that word. To answer it, we used Generalized Estimating Equations (GEE) to perform a repeated-measures logistic regression (Hardin & Hilbe, 2003). GEE analyses with a logit link are appropriate here for two reasons: the dependent variable is scored dichotomously, and the observations are repeated within participants. This type of analysis also allows treatment, participant, and item variables to be included in one model. In the analyses, the 18 TWs were included as the repeated measures within participants. The predictors in our model were one treatment variable (Test Announcement), one participant variable (vocabulary size), and one item variable⁷ (the look-up behaviour for a given TW). All parameters were entered in the initial model. Parameters that did not contribute significantly were then removed, and the model was refitted with the significant parameters only.

5. Results

5.1. Vocabulary size test

The average score on the vocabulary size test was 33.64 (SD = 6.18) out of 51. An ANOVA showed that the conditions did not differ significantly on the vocabulary size test, F(7, 219) = 1.46, p = .182, ηp² = .05. The reliability of the test was acceptable (N = 227, Cronbach’s alpha = .78).

5.2. Vocabulary learning

Research questions 1–3 focused on the effect of Type of Captioning, Test Announcement, and the interaction of both factors on vocabulary learning. Table 3 summarizes the descriptive statistics of the four vocabulary posttests. As can be seen, students in the captioning groups obtained the highest form recognition scores and recognized, on average, 10 out of 18 TWs. A similar pattern was found for the clip association test, though scores were generally lower than those on the form recognition test.
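The reliability index reported in Section 5.1 is Cronbach’s alpha computed over dichotomously scored items; for 0/1 items this coincides with KR-20. The sketch below shows the computation in Python. It is not the procedure the authors ran. The function name `cronbach_alpha` and the toy response matrix are hypothetical and are not data from the study.

```python
from statistics import pvariance


def cronbach_alpha(responses: list[list[int]]) -> float:
    """Cronbach's alpha for a participants x items matrix of 0/1 scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores)).
    Population variances are used throughout; the n / (n - 1) correction
    would cancel in the ratio anyway.
    """
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("need a rectangular matrix with at least two items")
    # Variance of each item (column) across participants.
    item_vars = [pvariance([row[i] for row in responses]) for i in range(k)]
    # Variance of the participants' total scores.
    total_var = pvariance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical toy data: 4 participants x 3 items (not data from the study).
toy = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]
print(round(cronbach_alpha(toy), 2))  # prints 0.75
```

With the study’s data, each row would hold one participant’s 51 item scores on the vocabulary size test.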
The descriptives of the meaning recognition test indicate that learners correctly recognized the meaning of approximately 11 out of 18 TWs. However, learners’ meaning recognition of the TWs was not pretested, in order to avoid a testing effect: completing a meaning recognition pretest could itself have led learners to establish form-meaning links. For this reason, these results will not be interpreted any further. The meaning recall results indicate that the GC groups were most successful in establishing initial form-meaning mappings. Finally, the descriptive statistics suggest that the incidental and intentional groups achieved similar scores on the three vocabulary posttests.

Table 3. Descriptive statistics of vocabulary tests for 18 target words.

| Test Announcement | Type of Caption | N | Form recognition M (SD) | Clip association M (SD) | Meaning recognition M (SD) | Meaning recall M (SD) |
|---|---|---|---|---|---|---|
| INCID | NC | 26 | 8.04 (3.50) | 6.27 (2.86) | 10.69 (2.75) | 3.15 (2.59) |
| INCID | FC | 26 | 9.00 (3.74) | 7.08 (3.65) | 10.29 (2.88) | 2.88 (2.57) |
| INCID | KC | 29 | 11.14 (3.08) | 9.10 (2.78) | 11.56 (2.58) | 3.07 (2.39) |
| INCID | GC | 28 | 11.43 (3.39) | 9.32 (3.29) | 12.73 (2.46) | 6.89 (3.52) |
| INCID | All | 109 | 9.96 (3.67) | 8.00 (3.38) | 11.34 (2.79) | 4.03 (3.25) |
| INTEN | NC | 26 | 7.85 (2.91) | 6.23 (2.30) | 9.65 (2.65) | 2.04 (1.69) |
| INTEN | FC | 30 | 10.47 (3.56) | 8.50 (3.28) | 10.72 (3.67) | 2.50 (2.21) |
| INTEN | KC | 31 | 10.94 (3.52) | 8.29 (3.39) | 11.00 (2.93) | 3.45 (3.13) |
| INTEN | GC | 31 | 12.16 (2.45) | 10.00 (3.07) | 13.83 (2.27) | 7.42 (3.02) |
| INTEN | All | 118 | 10.46 (3.47) | 8.34 (3.30) | 11.34 (3.28) | 3.94 (3.36) |
| All students | | 227 | 10.22 (3.57) | 8.18 (3.33) | 11.34 (3.05) | 3.98 (3.30) |

Note: Nine participants (2 FCINCID, 2 KCINCID, 2 GCINCID, 1 FCINTEN, and 2 GCINTEN) forgot to complete the reverse side of the meaning recognition test form. Scores on the meaning recognition test are therefore based on a smaller number of participants (N = 218).

To analyze statistically the effect of the independent variables on the dependent variables (form recognition, clip association, and meaning recall), we first checked the correlations between those tests.
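Note 8 reports Pearson correlations between the posttests, and these correlations motivated the multivariate analysis. The sketch below computes Pearson’s r between two lists of per-learner posttest totals. It is a minimal illustration: significance testing is omitted, and the function name and score lists are hypothetical rather than taken from the study.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two equally long score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products and sums of squares around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical totals (out of 18) for five learners, for illustration only.
form_recognition = [6, 8, 10, 12, 14]
clip_association = [4, 2, 8, 6, 10]
print(round(pearson_r(form_recognition, clip_association), 3))  # prints 0.8
```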
Because the three tests correlated significantly⁸, a multivariate analysis was conducted to take into account the correlation between the multiple vocabulary scores (Tacq, 1997). The two-way MANCOVA yielded a significant main effect of Type of Captioning on the three vocabulary tests, Wilks’s lambda, F(3, 218) = 18.50, p < .001, ηp² = .20. The analysis revealed neither a significant main effect of Test Announcement nor a significant interaction between Type of Captioning and Test Announcement. The effect of Type of Captioning was significant for each of the three tests included in the analyses (see Table 4). The effect size statistics revealed a large effect of Type of Captioning on all three outcomes, with the largest effect on the meaning recall test (ηp² = .39).

Table 4. Results of MANCOVA on vocabulary learning.

| Test | Source | df | F | p | ηp² | b |
|---|---|---|---|---|---|---|
| Form recognition | Type of Captioning | 3 | 17.60 | < .001* | .20 | |
| | Test Announcement | 1 | 0.93 | .337 | .004 | |
| | Type of Captioning × Test Announcement | 3 | 0.84 | .473 | .01 | |
| | Vocabulary size (covariate) | 1 | 42.40 | < .001* | .16 | 0.22 |
| Clip association | Type of Captioning | 3 | 16.23 | < .001* | .18 | |
| | Test Announcement | 1 | 0.43 | .512 | .002 | |
| | Type of Captioning × Test Announcement | 3 | 1.16 | .324 | .02 | |
| | Vocabulary size (covariate) | 1 | 70.70 | < .001* | .25 | 0.25 |
| Meaning recall | Type of Captioning | 3 | 46.98 | < .001* | .39 | |
| | Test Announcement | 1 | 0.46 | .498 | .002 | |
| | Type of Captioning × Test Announcement | 3 | 1.86 | .138 | .03 | |
| | Vocabulary size (covariate) | 1 | 70.08 | < .001* | .24 | 0.22 |
| Error | | 218 | | | | |

*p < .001.

The Bonferroni post-hoc comparisons (see Figure 3) revealed that the mean scores of the captioning groups (GC, FC, and KC) were significantly higher than the mean score of the NC group on the form recognition and clip association tests. In addition, the GC group significantly outperformed the FC group on these two tests. No significant difference was found between KC and GC.
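For the univariate tests in Table 4, the partial eta-squared values can be recovered from each F ratio and its degrees of freedom. Because F = (SS_effect / df_effect) / (SS_error / df_error), the ratio SS_effect / (SS_effect + SS_error) can be rewritten as F·df_effect / (F·df_effect + df_error). The sketch below applies this standard identity. The function name is hypothetical. The identity does not apply to the multivariate Wilks’s lambda statistics reported in the text.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from a univariate F ratio.

    Since F = (SS_effect / df_effect) / (SS_error / df_error),
    SS_effect / (SS_effect + SS_error) = F*df_effect / (F*df_effect + df_error).
    """
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# Values taken from Table 4 (error df = 218): (F, effect df, reported eta_p^2).
reported = {
    "Meaning recall, Type of Captioning": (46.98, 3, 0.39),
    "Form recognition, vocabulary size": (42.40, 1, 0.16),
}
for label, (f_value, df_effect, eta_reported) in reported.items():
    eta = partial_eta_squared(f_value, df_effect, 218)
    print(f"{label}: {eta:.3f} (reported {eta_reported})")
```

For these two rows, the recomputed values agree with the reported ones to within rounding. Applied to Table 6 (error df = 55), the identity likewise gives about .08 for the vocabulary size covariate on distinct look-ups.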
For the meaning recall test, the Bonferroni post-hoc comparisons revealed a different pattern. The mean scores of the GC groups were significantly higher than those of the FC, KC, and NC groups, whereas no differences were found between FC, KC, and NC.

Figure 3. Bar graph of form recognition, clip association, and meaning recall scores per Type of Captioning; arrows indicate significant differences between conditions (Bonferroni post-hoc tests).

Results show that the covariate vocabulary size was significantly related to vocabulary learning, Wilks’s lambda, F(1, 218) = 30.84, p < .001, ηp² = .30. The positive b-values (Table 4) indicate a positive relationship between learners’ vocabulary size and their vocabulary scores: the larger learners’ vocabulary size, the better they scored on the respective vocabulary tests. This finding holds for all three vocabulary posttests, as shown in Table 4.

The questionnaire results revealed that 95 out of 109 INCID participants reported focusing mainly on the content, although 47 students also mentioned that they paid attention to specific words. The INTEN participants adopted a similar approach: 102 out of 118 participants reported that video content was their primary focus of attention, while 58 students reported having also focused on specific words.

Table 5. Mean N of distinct TW look-ups and mean N of total TW look-ups in the incidental and intentional glossed captions groups.

| Group | N | Distinct TW look-ups M (SD) | Total number of TW look-ups M (SD) |
|---|---|---|---|
| GCINCID | 28 | 14.89 (4.02) | 23.36 (6.76) |
| GCINTEN | 30 | 14.33 (3.06) | 20.23 (6.57) |
| All students | 58 | 14.60 (3.53) | 21.74 (6.79) |

Note: The logging engine failed to register the look-ups of one student adequately, which is why the GCINTEN analyses of look-up behaviour include only 30 participants (instead of 31).

5.3. Look-up behaviour

The fourth research question focused on the effect of Test Announcement on participants’ look-up behaviour. It is therefore based exclusively on the data of the GCINCID and GCINTEN groups. As can be seen from Table 5, learners looked up on average 14.89 (INCID) and 14.33 (INTEN) out of 18 TWs. The total number of TW look-ups (i.e. across both viewings) averaged 23.36 in the INCID group, compared with 20.23 in the INTEN group. As shown by the ANCOVAs in Table 6, Test Announcement did not significantly affect learners’ look-ups (distinct or total number of TW look-ups). Results also reveal that students’ vocabulary size (covariate) was significantly related to their look-up behaviour (see Table 6). The b-values were negative for both the distinct and the total number of look-ups, which indicates that learners’ look-ups were negatively related to their vocabulary size: the larger their vocabulary size, the fewer their look-ups.

To refine our understanding of learners’ use of the glossed captions, we also examined the number of look-ups per viewing per clip. For each clip, we calculated a percentage by dividing the number of distinct TW look-ups for that clip by the number of target words in that clip. These percentages show that during the first viewing, students looked up 70.06% of the target words: 66.5% (clip 1), 62.5% (clip 2), and 72.75% (clip 3) of the TWs per clip. During the second viewing, they looked up only 12.72% of the TWs (see Table 7).

Table 6. Results of three one-way ANCOVA analyses for look-up behaviour.

| Look-up measure | Source | df | F | p | ηp² | b |
|---|---|---|---|---|---|---|
| Mean distinct N of target word look-ups | Test Announcement | 1 | 0.02 | .880 | .00 | |
| | Vocabulary size (covariate) | 1 | 5.00 | .029* | .08 | −0.184 |
| Mean total N of target word look-ups | Test Announcement | 1 | 0.95 | .335 | .02 | |
| | Vocabulary size (covariate) | 1 | 5.56 | .022* | .09 | −0.36 |
| Error | | 55 | | | | |

*p < .05.

Table 7. Mean N of distinct TW look-ups and mean N of total TW look-ups by viewing and Test Announcement.

| Group | N | Distinct TW look-ups, first viewing M (SD) | Distinct TW look-ups, second viewing M (SD) | Total number of TW look-ups, first viewing M (SD) | Total number of TW look-ups, second viewing M (SD) |
|---|---|---|---|---|---|
| GCINCID | 28 | 12.61 (4.50) | 2.29 (2.36) | 15.32 (4.30) | 8.04 (4.37) |
| GCINTEN | 30 | 11.70 (3.73) | 2.63 (1.94) | 12.97 (4.33) | 7.27 (4.22) |
| All students | 58 | 12.14 (4.11) | 2.47 (2.14) | 14.10 (4.44) | 7.64 (4.27) |

5.4. Is look-up behaviour related to learning?

Finally, we investigated which variables (Test Announcement, vocabulary size, and distinct or total N of look-ups) predict vocabulary learning (form recognition and meaning recall) through viewing glossed captioned video. Since Test Announcement did not contribute significantly to our model, it was excluded from further analyses. For form recognition, the results of the GEE analyses reveal that distinct TW look-up and vocabulary size predicted the probability of correct form recognition of a TW. Similarly, Table 8 shows that the total number of look-ups and correct form recognition are positively associated.

Table 8. Results of GEE analysis of form recognition, with distinct TW look-ups and total N of TW look-ups.

| Parameter | Distinct TW look-up: Wald χ² | df | p | b | Total N of TW look-ups: Wald χ² | df | p | b |
|---|---|---|---|---|---|---|---|---|
| Intercept | 10.019 | 1 | .002 | −2.185 | 8.909 | 1 | .003 | −2.133 |
| Test Announcement = INCID | 0.005 | 1 | .941 | .012 | 0.003 | 1 | .953 | −.010 |
| Test Announcement = INTEN | – | – | – | 0ᵃ | – | – | – | 0ᵃ |
| Look-up | 11.517 | 1 | .001 | .576 | 16.452 | 1 | < .001 | .348 |
| Vocabulary size | 12.277 | 1 | < .001 | .071 | 11.752 | 1 | .001 | .071 |

ᵃSet to zero because the parameter is redundant.

For meaning recall, the analyses in Table 9 (with distinct look-up and with the total number of look-ups per target word) yielded results similar to those for the form recognition test. Both look-up behaviour and vocabulary size were significant predictors of meaning recall success.
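Because the GEE models use a logit link, the b-values in Tables 8 and 9 are on the log-odds scale. The sketch below converts them into predicted probabilities and odds ratios, taking the coefficients from the analyses with distinct TW look-up. It rests on the following assumptions:

- Vocabulary size entered the model as the raw score on the 51-item test, and a value of 34 (close to the sample mean of 33.64) is used.
- INTEN is the reference category (b = 0).
- The coefficients are used as tabled, including the non-significant Test Announcement term.

The dictionary layout and function names are hypothetical. The resulting probabilities are illustrative, population-averaged values implied by the coefficients; they are not additional results.

```python
import math

# Log-odds coefficients copied from Tables 8 and 9 (analyses with distinct
# TW look-up). INTEN is the reference category, so its coefficient is 0.
COEFS = {
    "form_recognition": {"intercept": -2.185, "incid": 0.012, "lookup": 0.576, "vocab": 0.071},
    "meaning_recall": {"intercept": -4.361, "incid": 0.133, "lookup": 0.787, "vocab": 0.095},
}


def predicted_probability(outcome: str, looked_up: bool, vocab_size: float,
                          incid: bool = False) -> float:
    """Inverse logit of the linear predictor for one learner-word pair."""
    b = COEFS[outcome]
    eta = (b["intercept"]
           + b["incid"] * incid          # 1 for INCID, 0 for INTEN (reference)
           + b["lookup"] * looked_up     # 1 if the TW was looked up at least once
           + b["vocab"] * vocab_size)    # assumed raw vocabulary size score
    return 1.0 / (1.0 + math.exp(-eta))


def odds_ratio(outcome: str, predictor: str) -> float:
    """Multiplicative change in the odds of a correct answer per unit of the predictor."""
    return math.exp(COEFS[outcome][predictor])


for outcome in COEFS:
    p_without = predicted_probability(outcome, looked_up=False, vocab_size=34)
    p_with = predicted_probability(outcome, looked_up=True, vocab_size=34)
    print(f"{outcome}: {p_without:.2f} -> {p_with:.2f}, "
          f"OR = {odds_ratio(outcome, 'lookup'):.2f}")
```

Under these assumptions, looking up a TW raises the predicted probability of correct form recognition from about .56 to about .69. It raises the predicted probability of correct meaning recall from about .24 to about .41.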
In sum, vocabulary size and the two types of look-up behaviour (distinct and total N of look-ups) of a given word are significant predictors of correct form recognition and meaning recall for that word.

Table 9. Results of GEE analysis of meaning recall, with distinct TW look-ups and total N of TW look-ups.

| Parameter | Distinct TW look-up: Wald χ² | df | p | b | Total N of TW look-ups: Wald χ² | df | p | b |
|---|---|---|---|---|---|---|---|---|
| Intercept | 52.134 | 1 | < .001 | −4.361 | 45.965 | 1 | < .001 | −4.172 |
| Test Announcement = INCID | 0.605 | 1 | .437 | .133 | 0.394 | 1 | .530 | .109 |
| Test Announcement = INTEN | – | – | – | 0ᵃ | – | – | – | 0ᵃ |
| Look-up | 13.801 | 1 | < .001 | .787 | 19.513 | 1 | < .001 | .400 |
| Vocabulary size | 38.054 | 1 | < .001 | .095 | 34.773 | 1 | < .001 | .095 |

ᵃSet to zero because the parameter is redundant.

6. Discussion

The present study set out to investigate three questions:

(1) the impact of Type of Captioning and Test Announcement on initial vocabulary learning;
(2) the influence of Test Announcement on the use of one particular type of captioning, namely glossed keyword captioning; and
(3) the relationship between students’ use of glossed keyword captioning and their learning gains.

Additionally, it investigated the relationship between learners’ vocabulary size and both their learning gains and their look-up behaviour.

6.1. Vocabulary learning

6.1.1. Type of captioning

In answer to the first research question, the results revealed that providing captions contributes to vocabulary learning through audio-visual input. For form recognition and clip association, students in the three captioning groups (FC, KC, and GC) significantly outperformed the control group. This finding corroborates previous research on the benefits of full captions (Baltova, 1999; Danan, 1992; Sydorenko, 2010; Winke et al., 2010) and keyword captions (Montero Perez et al., 2014). Results also indicated that most learning gains are at the level of form recognition.
On average, participants in the captioned groups recognized between 9 and 12 out of 18 TWs. Learners were able to associate the majority of the correctly recognized words with the corresponding clip. They were therefore arguably also successful in retrieving relevant information from episodic memory (Craik & Tulving, 1975; Pulido, 2007), which is activated when picking up a concept from the video passage. The post-hoc analyses of the form recognition and clip association tests did not reveal any differences between the KC and FC groups. Generally speaking, on-screen text encourages L2 learners to notice target words, but making the TWs visually salient did not lead to greater form recognition success. This indicates that learners in the FC group paid attention to the unknown words even though these were not visually salient, a pattern also found in Montero Perez et al. (2014). Eye-tracking research (Godfroid, Boers, & Housen, 2013) suggests that this may be due to readers’ unfamiliarity with the TWs: unknown words may by themselves draw attention and stimulate noticing. The GC groups, however, significantly outperformed the FC and NC groups on the form recognition and clip association tests. This finding indicates that keyword captions with access to meaning may enhance the quality of students’ attention, resulting in higher form recognition scores than in the FC and NC groups. There were, however, no differences between the GC and KC groups on the form recognition test. This finding is not surprising, because the visualisation of the target word forms was identical in both conditions and the test only tapped into word form recognition. The GC group was most successful on the meaning recall test.
More particularly, the GC groups recalled the meaning of approximately 7 TWs. By comparison, the FC, KC, and NC groups learned the meaning of 2–3 out of 18 TWs. There were no differences between the FC, KC, and NC groups on the meaning recall test. The low learning gains in those groups are in line with previous research on the learning of form-meaning connections from written input (e.g. Horst et al., 1998) and aural input (e.g. van Zeeland & Schmitt, 2013). The present study clearly shows that providing access to meaning through glossed keyword captions results in a significantly higher uptake of initial form-meaning connections from video. Glossed captions can therefore also be considered a valuable technique to implement in audio-visual material in order to increase learners’ lexical coverage of those materials, as suggested by Webb (2010a, 2010b). In addition, the video pauses briefly whenever students access a gloss, which may also facilitate the encoding of form-meaning links. Together, the findings of this study thus warrant a role for providing access to meaning through glosses when watching audio-visual input.

6.1.2. Test announcement

In answer to the second research question, we found that Test Announcement did not influence learning gains. The questionnaire results suggest that the majority of the students prioritized meaning but also reserved some attentional resources for processing unknown words while watching for meaning. They did so irrespective of test announcement or type of captioning, but students may have been influenced by the comprehension task. More particularly, it is plausible that learners paid attention to unknown words because they considered them important for the upcoming comprehension tasks (Hulstijn, 2001). Peters (2007a) and Peters et al. (2009, p. 243) argued that ‘students allocate their attentional resources in function of the specificity of the task they have to perform.’ The comprehension tasks were administered after each clip had been viewed twice, and the vocabulary tests only after the comprehension tasks. Learners may therefore have focused more on content comprehension, because that task was more directly relevant while viewing the clips. Since both the INCID and INTEN groups were informed about the comprehension task, this may also account for the lack of differences between the groups. Our findings thus suggest that test announcement did not necessarily lead to an exclusively incidental or intentional approach to vocabulary learning while viewing. If we want to stimulate the intentional learning of new words, form-focused activities before or after viewing videos could be a more effective approach.

6.1.3. Vocabulary size

Including learners’ vocabulary size as a covariate in the statistical analyses served two purposes. It allowed us to control for individual differences and to assess the impact of the independent variables more accurately. It also provided information about the relationship between vocabulary size and word learning from viewing video. Results revealed a positive relationship between vocabulary size and all aspects of word knowledge measured: the higher learners’ score on the vocabulary size test, the higher their scores on the posttests. Effect sizes were large on all tests, which indicates that vocabulary size plays an important role in vocabulary learning from viewing. This finding is congruent with previous research on reading (Horst et al., 1998) and viewing (e.g. Peters et al., 2016).

6.2. Look-up behaviour

The fourth research question deals with the impact of Test Announcement on the look-up of target words in the GCINCID and GCINTEN groups.
Unlike previous research (Peters, 2007a, 2009), the current study found that Test Announcement did not significantly affect learners’ look-up behaviour while watching the clips. Learners in the INCID and INTEN groups looked up, on average, 14.60 of the 18 TWs and can be characterized as ‘maximal strategy’ users (Hulstijn, 1993, p. 145). Learners’ task approach offers a possible explanation for the similar look-up patterns in the INCID and INTEN conditions: the majority of the learners focused on meaning, irrespective of test announcement. The analyses of covariance identified learners’ vocabulary size as an important factor in explaining differences in look-up behaviour. Vocabulary size was negatively related to the number of look-ups of all glossed words as well as to the look-ups of the TWs, which is in line with Peters (2007b). Learners with a larger vocabulary size thus looked up fewer words. Presumably, they had greater lexical coverage (Webb & Rodgers, 2009) and could more easily guess the meaning of unknown words where necessary.

6.3. Relationship between look-up behaviour and word learning

The last research question focused on the relationship between students’ look-up of a given word and their form recognition and meaning recall of that word. Whether a TW was looked up or not (i.e. distinct TW look-up) was significantly related to form recognition and meaning recall. Thus, when students had looked up the meaning of a given TW, the likelihood of correctly recognizing that word and recalling its meaning increased significantly. This positive relationship might be explained by the fact that gloss consultation is comparable to a ‘dictionary search’, which ‘requires the learner to attend to the words’ (Laufer, 2010, p. 18). It also corroborates the importance of attention for word learning (Hulstijn, 2001).
This finding corroborates previous research on the relationship between TW look-up and acquisition in the context of written input (e.g. Chun & Plass, 1996; Hulstijn, 1993; Peters, 2007a). It also extends existing studies by showing the value of glosses for audio-visual materials. Similarly, the total number of look-ups of a given target word is a significant predictor of successful form recognition and meaning recall. This finding indicates that looking up a target word several times tends to increase the chances of correct form recognition and meaning recall. The total number of look-ups is less strongly related to the vocabulary posttest scores than students’ distinct look-ups. Even so, the results suggest that more look-ups contribute to the consolidation of the form-meaning connection and therefore increase the chance of correctly recognizing a word and recalling its meaning. For the other predictors (Test Announcement and vocabulary size), the results of the GEE analyses confirmed the findings of the vocabulary tests: (1) Test Announcement is not significantly related to form recognition and meaning recall, and (2) learners’ vocabulary size is positively related to their scores on the form recognition and meaning recall tests. In sum, the analyses show that engaging with TWs by accessing their meaning increases the likelihood that learners correctly recognize a word and recall its meaning. This direct positive relationship between the use of glossed captions and acquisition makes a strong case for providing learners with access to meaning while watching video. It also provides empirical evidence in support of Webb’s (2010b) study on the potential of glossaries.

7. Limitations

The research reported in this article is inevitably characterized by a number of limitations. First, this study focused on vocabulary learning through relatively short clips in which the majority of the TWs were used only once.
This study shows that short clips offer numerous opportunities to stimulate initial word learning. However, more durable learning would require longer clips that offer repeated encounters with the TWs. This study thus focused on initial vocabulary gains, which is why only immediate posttests were used. Immediate posttests may be considered appropriate in the context of the present study, that is, ‘a learning session in which words are presented for the first time’ (Hulstijn, 2003, p. 372). Nevertheless, it would have been informative to assess the durability of word retention in the glossed conditions. Second, for reasons of feasibility, only form recognition and meaning recall were pretested. We did not pretest learners’ meaning recognition of the target words, because this test could have caused a learning effect (i.e. form-meaning connections made while completing the test). Such an effect could have influenced posttest scores on the form recognition and meaning recall tests. However, this also means that the results of the meaning recognition posttest should be interpreted with care, as indicated in the Results section. Third, gathering learners’ look-up times (duration of gloss activation per TW) might have provided more information on how exactly learners used the glosses. More particularly, it could have shed light on the difference between the first look-up and subsequent look-ups in terms of time spent on the gloss. Unfortunately, the time stamps in the logging data were not accurate enough to include this in the analyses.

8. Implications and concluding remarks

The results of this study have a series of pedagogic implications. First, the findings of the present study highlight the potential of audio-visual input as a source of initial vocabulary acquisition. They also suggest that this type of input is comparable to written input in terms of learning gains (Webb, 2015).
In addition, video is so easily accessible in everyday life (YouTube, TV, DVD, etc.) that one of its most important benefits probably lies in its potential to increase exposure to the foreign language outside the classroom (see, for instance, Lin & Siyanova-Chanturia, 2015, for a discussion of internet television and vocabulary learning outside the classroom). This also ties in with the objectives of extensive viewing programs (Webb, 2015, p. 159), that is, to encourage ‘regular silent uninterrupted viewing of L2 television’ in out-of-class contexts after ‘initial classroom-based viewing.’ Large amounts of input are expected to strengthen previously learned words and stimulate noticing of novel words. They should also offer enough repeated encounters with words to strengthen the memory encoding of form-meaning links. Second, the findings reveal that test announcement did not lead to more learning or to more attention paid to the target words (as measured by look-up behaviour). This seems to be because learners tended to prioritize meaning. If the goal of an activity is to draw learners’ attention to new words in a viewing context, test announcement may not be the best tool. Word-focused instruction or preteaching activities that present the new words and their meanings could be more appropriate and effective than announcing tests on the vocabulary used in the clips. Third, this study clearly shows that providing access to meaning while watching audio-visual material should be strongly encouraged, because it promotes students’ form recognition and also facilitates the construction of form-meaning connections. The creation of glossed captions requires technological knowledge, but other options are available. Yabla.com, for instance, offers videos in which learners can click on words in the full captions to access dictionaries.
To the best of our knowledge, this study was the first to investigate the use and effects of glossed keyword captions. We therefore hope that the encouraging results on the potential of glossed captions will inspire further research on vocabulary acquisition through audio-visual input.

Notes

1. Merleau (1981, as referenced in Guillory, 1998) recommended a presentation duration of 1 second for words between 5 and 8 characters. However, because students in the GC groups needed time to activate the gloss, we slightly prolonged the keyword presentation duration.
2. In addition, students took not only the pretest during that session but also a 51-item vocabulary size test. This test can be considered an additional distracter: its 51 additional test items make it even more difficult to remember the TWs that were included in the pretest.
3. The video player was connected to a PHP page that sent the log data to a MySQL database. This enabled us to query the database and retrieve log data per participant.
4. Afterwards, students were informed that their participation in the study counted as 2 of the 14 required practice sessions in the online learning environment that accompanies the syllabus.
5. Even though the video of learners in the GC conditions paused each time they looked up a word, the total viewing time was similar to that in the other conditions. Participants (all conditions) could not fast-forward or rewind the clips.
6. Although it might have been preferable to counterbalance the order of the clips, the use of paper-and-pencil tests made it difficult to change the order of the clips within one group (see Limitations section).
7. Three nouns and one verb had a higher frequency of occurrence. Frequency of occurrence correlated with distinct look-up (r = .085, p = .006) and with the total number of look-ups (r = .255, p < .001). Frequency was therefore not included in the GEE analyses, in order to avoid multicollinearity effects.
8. The form recognition test correlated significantly with the clip association test (r = .895, p < .001) and the meaning recall test (r = .515, p < .001). The meaning recall test correlated significantly with the clip association test (r = .635, p < .001).

Disclosure statement

No potential conflict of interest was reported by the authors.

Notes on contributors

Maribel Montero Perez is a postdoctoral researcher (FWO - Research Foundation Flanders) within the imec-ITEC-KU Leuven research group. Her research focuses on different aspects of second language vocabulary acquisition and instruction. Specifically, she investigates the role of viewing (TV, video, …) and types of subtitling for second language vocabulary acquisition.

Elke Peters is an associate professor of English at KU Leuven. She coordinates the research group “Language, education and society”. Her research interests involve incidental and deliberate learning of single words and multiword units in a foreign language. She is interested in how reading, listening and TV viewing can contribute to vocabulary learning.

Piet Desmet is a full professor of French and applied linguistics and computer-assisted language learning at KU Leuven, campus Kulak Kortrijk, and imec, Belgium. He coordinates the research team imec-ITEC-KU Leuven, which focuses on educational technology with a main interest in language learning and technology. He leads a range of research projects in this field devoted to the integration of human language technologies into CALL and to the effectiveness of adaptive and personalized learning environments.

ORCID

Maribel Montero Perez http://orcid.org/0000-0002-0868-588X

References

Abraham, L. (2008). Computer-mediated glosses in second language reading comprehension and vocabulary learning: A meta-analysis. Computer Assisted Language Learning, 21(3), 199–226.
Anderson, R.C., & Freebody, P. (1981). Vocabulary knowledge. In J. Guthrie (Ed.), Comprehension and teaching: Research reviews (pp. 77–117). Newark: IRA.
Baltova, I. (1999). Multisensory language teaching in a multidimensional curriculum: The use of authentic bimodal video in core French. The Canadian Modern Language Review, 56, 31–48.
Chai, J., & Erlam, R. (2008). The effect and the influence of the use of video and captions on second language learning. New Zealand Studies in Applied Linguistics, 14, 25–44.
Chun, D.M., & Plass, J.L. (1996). Facilitating reading comprehension with multimedia. System, 24, 503–519.
Craik, F.I.M., & Tulving, E. (1975). Depth of processing and the retention of words in episodic memory. Journal of Experimental Psychology: General, 104(3), 268–294.
Danan, M. (1992). Reversed subtitling and dual coding theory: New directions for foreign language instruction. Language Learning, 42, 497–527.
De Ridder, I. (2002). Visible or invisible links: Does the highlighting of hyperlinks affect incidental vocabulary learning, text comprehension, and the reading process? Language Learning & Technology, 6, 123–146.
Ellis, N.C. (1994). Vocabulary acquisition: The implicit ins and outs of explicit cognitive mediation. In N.C. Ellis (Ed.), Implicit and explicit learning of languages (pp. 211–282). London: Academic Press.
Field, A. (2009). Discovering statistics using SPSS. London: Sage.
Godfroid, A., Boers, F., & Housen, A. (2013). An eye for words: Gauging the role of attention in incidental L2 vocabulary acquisition by means of eye tracking. Studies in Second Language Acquisition, 35, 483–517.
Guillory, H.G. (1998). The effects of keyword captions to authentic French video on learner comprehension. CALICO Journal, 15, 89–108.
Hardin, J.W., & Hilbe, J.M. (2003). Generalized estimating equations. London: Chapman & Hall/CRC.
Horst, M., Cobb, T., & Meara, P. (1998). Beyond a clockwork orange: Acquiring second language vocabulary through reading. Reading in a Foreign Language, 11, 207–223.
Huang, H.-C., & Eskey, D.E. (1999). The effects of closed-captioned television on the listening comprehension of intermediate English as a second language (ESL) students. Journal of Educational Technology Systems, 28(1), 75–96.
Hulstijn, J.H. (1993). When do foreign-language readers look up the meaning of unfamiliar words? The influence of task and learner variables. The Modern Language Journal, 77, 139–147.
Hulstijn, J.H. (2001). Intentional and incidental second language vocabulary learning: A reappraisal of elaboration, rehearsal and automaticity. In P. Robinson (Ed.), Cognition and second language instruction (pp. 258–286). Cambridge: Cambridge University Press.
Hulstijn, J.H., Hollander, M., & Greidanus, T. (1996). Incidental vocabulary learning by advanced foreign language students: The influence of marginal glosses, dictionary use, and reoccurrence of unknown words. The Modern Language Journal, 80, 327–339.
Laufer, B. (2003). Vocabulary acquisition in a second language: Do learners really acquire most vocabulary by reading? Some empirical evidence. The Canadian Modern Language Review, 59, 567–587.
Laufer, B. (2005). Focus on form in second language vocabulary learning. In S.H. Foster-Cohen, M. Garcia-Mayo, & J. Cenoz (Eds.), EUROSLA Yearbook, Volume 5 (pp. 223–250). Amsterdam: John Benjamins Publishing Company.
Laufer, B. (2010). Form focused instruction in second language vocabulary learning. In R. Chacon-Beltran, C. Abello-Contessa, M.M. Torreblanca-Lopez, & M.D. Lopez-Jimenez (Eds.), Further insights into non-native vocabulary teaching and learning (pp. 15–27). Clevedon, UK: Multilingual Matters.
Laufer, B., & Hill, M. (2000). What lexical information do L2 learners select in a CALL dictionary and how does it affect word retention? Language Learning & Technology, 3, 58–76.
Laufer, B., & Yano, Y. (2001). Understanding unfamiliar words in a text: Do L2 learners understand how much they don’t understand? Reading in a Foreign Language, 13, 549–566.
Lonsdale, D., & Le Bras, Y. (2009). A frequency dictionary of French: Core vocabulary for learners. London, New York: Routledge.
Markham, P. (1999). Captioned videotapes and second-language listening word recognition. Foreign Language Annals, 32, 321–328.
Montero Perez, M., Peters, E., Clarebout, G., & Desmet, P. (2014). Effects of captioning on video comprehension and incidental vocabulary learning. Language Learning & Technology, 18, 118–141.
Montero Perez, M., Peters, E., & Desmet, P. (2015). Enhancing vocabulary learning through captioned video: An eye-tracking study. The Modern Language Journal, 99, 308–328.
Montero Perez, M., Van Den Noortgate, W., & Desmet, P. (2013). Captioned video for L2 listening and vocabulary learning: A meta-analysis. System, 41, 720–739.
Nation, I.S.P. (2001). Learning vocabulary in another language. Cambridge: Cambridge University Press.
Nation, I.S.P., & Webb, S. (2011). Researching and analyzing vocabulary. Boston: Heinle.
Neuman, S.B., & Koskinen, P. (1992). Captioned television as comprehensible input: Effects of incidental word learning from context for language minority students. Reading Research Quarterly, 27, 95–106.
Paulussen, H., Macken, L., Vandeweghe, W., & Desmet, P. (2013). Dutch parallel corpus: A balanced parallel corpus for Dutch-English and Dutch-French. In P. Spyns & J. Odijk (Eds.), Essential speech and language technology for Dutch (pp. 185–199). Heidelberg: Springer.
Peters, E. (2007a). Manipulating L2 learners’ online dictionary use and its effect on L2 word retention. Language Learning & Technology, 11, 36–58.
Peters, E. (2007b).
The relationship between L2 learners’ online dictionary use and word retention. Language Forum, 33, 45–64. Peters, E., Heynen, E., & Puimege, E. (2016). Learning vocabulary through audiovisual input: The differential effect of L1 subtitles and captions. System, 63, 134–148. Peters, E., Hulstijn, J.H., Sercu, L., & Lutjeharms, M. (2009). Learning L2 German vocabulary through reading: The effect of three enhancement techniques compared. Language Learning, 59, 113–151. Peters, E., & Montero Perez, M. (2015). Vocabulary acquisition through audio(-visual) input: how to measure learning gains ? Paper presented at the workshop on reliability and validity in SLA research, Barcelona. 26 M. MONTERO PEREZ ET AL. Pulido, D. (2007). The relationship between text comprehension and second language incidental vocabulary acquisition: A matter of topic familiarity ? Language Learning, 57, 155– 199. Read, J. (2000). Assessing vocabulary. Cambridge: Cambridge University Press. Rott, S. (2007). The effect of frequency of input-enhancements on word learning and text comprehension. Language Learning, 57, 165–199. Schmidt, R. (2001). Attention. In P. Robinson (Ed.), Cognition and second language instruction (pp. 3–32). Cambridge: Cambridge University Press. Schmitt, N. (2008). Review article. instructed second language vocabulary learning. Language Teaching Research, 12, 329–363. Selva, T., Verlinde, S., & Binon, J. (2002). Le Dafles, un nouveau dictionnaire electronique pour apprenants du français. In Proceedings of the Tenth EURALEX International Congress (pp. 199–208). Copenhagen: CST. Sharwood Smith, M. (1991). Speaking to many minds: on the relevance of different types of language information for the L2 learner. Second Language Research, 7, 118–132. Staehr, L.S. (2009). Vocabulary knowledge and advanced listening comprehension in english as a foreign language. Studies in Second Language Acquisition, 31, 577–607. Sydorenko, T. (2010). Modality of input and vocabulary acquisition. 
Language Learning & Technology, 14, 50–73. Tacq, J. (1997). Multivariate analysis techniques in social science research: From problem to analysis. London: Sage. van Zeeland, H., & Schmitt, N. (2012). Lexical coverage in L1 and L2 listening comprehension: The same or different from reading comprehension ? Applied Linguistics, 34, 457– 479. van Zeeland, H., & Schmitt, N. (2013). Incidental vocabulary acquisition through L2 listening: A dimensions approach. System, 41, 609–624. Watanabe, Y. (1997). Input, intake, and retention. effects of increased processing on incidental learning of foreign language vocabulary. Studies in Second Language Acquisition, 19, 287–307. Webb, S. (2010a). Pre-learning low-frequency vocabulary in second language television programs. Language Teaching Research, 14, 501–515. Webb, S. (2010b). Using glossaries to increase the lexical coverage of television programs. Reading in a Foreign Language, 22, 201–221. Webb, S. (2015). Extensive viewing. language learning through watching television. In D. Nunan & J.C. Richards (Eds.), Language learning beyond the classroom (pp. 159–168). New York: Routledge. Webb, S., & Rodgers, M. P. H. (2009). Vocabulary demands of television programs. Language Learning, 59, 335–366. Wesche, M., & Paribakht, T.S. 1996. Assessing second language vocabulary knowledge: Depth vs. breadth. The Canadian Modern Language Review, 53, 13–40. Winke, P., Gass, S., & Sydorenko, T. (2010). The effects of captioning videos used for foreign language listening activities. Language Learning & Technology, 14, 65–86. Yoshii, M. (2006). L1 and L2 glosses: Their effects on incidental vocabulary learning. Language Learning & Technology, 10, 85–101. Hulstijn, J. H. (2003). Incidental and intentional learning. In C. Doughty & M. H. Long (Eds.), Handbook of second language acquisition (pp. 349–381). Malden, MA: Blackwell. Lin, P. M. S. (2015). Internet television for L2 learning. In D. Nunan & J. C. 
Richards (Eds.), Language learning beyond the classroom (pp. 149–158). London: Routledge. Copyright of Computer Assisted Language Learning is the property of Routledge and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.