Vocabulary Growth of the Advanced EFL Learner

Meral Ozturk*
mozturk@uludag.edu.tr

*Uludag University, Education Faculty, ELT Department, Turkey

Abstract

This article reports the results of three studies conducted between the years 2005 and 2010 on the vocabulary growth of advanced EFL university students in an English-medium degree programme. Growth in learners’ written receptive as well as productive vocabularies was investigated in one longitudinal and two cross-sectional studies over three years. While the first two studies used the receptive and semi-productive versions of the Vocabulary Levels Test, study 3 used the more recent Vocabulary Size Test (Nation and Beglar, 2007). The overall results of the three studies suggested that learners’ vocabularies did expand both receptively and productively; however, the growth was rather modest. Learners’ receptive vocabulary sizes were 5-6,000 words and expanded by about 500 words a year. There was also evidence for severe attrition in the final year. Productive vocabulary expanded by 10% in the longitudinal study. Receptive knowledge of academic vocabulary did not improve significantly due to a ceiling effect, but productive growth was significant. Frequency seems to have a stable overall effect on vocabulary development. However, an implicational scale between the levels could be established for only one of the three tests used (the Vocabulary Levels Test).

Keywords: vocabulary growth, vocabulary size, receptive vocabulary, productive vocabulary, frequency, advanced learner, EFL

Introduction

A large vocabulary size is important in using English. Research has shown that for written receptive tasks like reading newspapers, novels, or academic texts, 8-10,000 words are necessary (Nation, 2006; Hazenberg and Hulstijn, 1996), and for spoken receptive tasks like watching English TV programmes or movies, 7-8,000 words are needed (Webb and Rodgers, 2009a; Webb and Rodgers, 2009b). For most EFL learners, these targets are quite challenging if not impossible to attain. Part of the reason is that English language courses do not usually target vocabulary beyond a few thousand words (Cobb, 1995), on the assumption that, having mastered the core vocabulary of English (i.e. the most frequent 2,000 or so words), learners will maintain progress on their own. While in the earlier stages of language learning vocabulary learning is guided by the teacher and the coursebook, the advanced learner is left to their own devices to learn a large vocabulary mainly through language use. The question is whether extended language use promotes such learning and whether learners continue to expand their vocabularies fast enough to achieve the desired sizes. Another issue concerns patterns of lexical development. Do L2 learners’ vocabularies grow in predictable ways, or do they grow idiosyncratically depending on individual learners’ personal interests and needs? The present study will investigate the potential of word frequency to predict the path of development. Word frequency has long been a major guiding principle in setting lexical targets for L2 learners, and it is assumed that learners should and will proceed according to frequency. However, few studies have so far tested this assumption empirically. The present study will investigate these questions in relation to EFL learners who use English for academic purposes in English-medium degree programmes.
As noted by Meijer (2006), English-medium programmes are spreading in non-English-speaking countries ‘especially but not exclusively in Europe’, and the kind of learner concerned does not represent a marginal subset of English language learners.

L2 Vocabulary Growth

The literature on L2 vocabulary growth is rather small, and only three studies (Cobb & Horst, 2000; Schmitt & Meara, 1997; Milton & Meara, 1995) specifically deal with vocabulary growth through academic study at the tertiary level, while others involve learning through direct language study in language courses rather than learning through language use (Milton, 2009, pp. 79-85; Laufer, 1998; Read, 1988). Unfortunately, the results of these studies are conflicting regarding the evidence for progress. While Milton & Meara (1995) report significant gains in the learners’ receptive vocabularies, the other two fail to provide evidence for any substantial expansion of vocabulary size (Cobb & Horst, 2000; Schmitt & Meara, 1997). Growth rates reported in these studies also vary considerably. Milton and Meara (1995) investigated the receptive vocabulary growth of European exchange students in a British university and estimated the annual growth rate to be 2,650 words on average. On the other hand, Cobb and Horst (2000) found that the second-year students’ receptive vocabularies in a university in Hong Kong differed from the first years’ by only 200 words, and the first years did not make any significant gains after six months. Schmitt and Meara’s (1997) EFL learners in Japan gained only 330 words receptively in a year. Obviously, more data from similar contexts are needed in order to gain insight into the nature of vocabulary growth of these learners.

The present study will improve on previous research in several ways. All three of the aforementioned studies were limited in duration, not exceeding one year, too short a period for sizeable gains to surface. The present studies, on the other hand, will cover a much longer time span, i.e. three years. While previous studies looked at receptive size only, the studies in this paper will look at both receptive and productive size. The only other study which investigated productive growth is Laufer (1998), who found an increase of 850 words in the vocabularies of her high-school learners in Israel after one year of language study. In the present research, both cross-sectional (studies 1 and 3) and longitudinal (study 2) designs will be employed. While the first two studies will use the receptive and (semi-)productive Vocabulary Levels Test (Nation, 2001) to measure vocabulary size, study 3 will use the recently developed Vocabulary Size Test by Nation and Beglar (2007). Growth in academic vocabulary, i.e. sub-technical vocabulary that occurs frequently across a variety of academic disciplines but is not so common in non-academic texts (Nation, 2001, p. 187), will also be investigated. Although growth in this area is to be expected given the opportunities for exposure to academic vocabulary, Cobb and Horst (2000) did not find evidence for progress in the first-year students’ knowledge of academic words over six months or from the first to the second year. Conducted under similar conditions, the present study will investigate whether significant gains can be obtained over several years of academic study.
It is expected that the context of the English-medium degree programme where the three studies reported here were conducted will provide enough immersion in the target language to induce vocabulary development. However, the generally held conviction among the students in the programme that their English proficiency in general and vocabulary knowledge in particular deteriorated in the course of their studies runs counter to this expectation. The present study will shed light on this as well.

The Frequency Effect

An important factor affecting vocabulary development in a second language is the frequency of words in the language. Frequency exerts its influence through input. In the L1, learners are exposed to words of varying frequency in receptive language use, and words that appear more often in the input stand a better chance of being learnt, as repeated encounters raise the salience of the word, provide richer clues to meaning, and strengthen memory traces. Vermeer (2001), in a study with native English-speaking children in primary education, found significant correlations between the frequency of a word in the input and the probability of knowing it. In L2 vocabulary acquisition, frequency is likely to have a stronger effect. Input to L2 learners is usually graded in vocabulary difficulty, which is largely decided on the basis of word frequency, to the effect that high frequency vocabulary becomes even more frequent and the effect of frequency more pronounced.

Several studies have provided evidence for a frequency effect. Even though frequency is a continuous variable, these studies used test instruments where test words were drawn from lists of words divided, for the sake of convenience (Meara, 2010, p. 3), into one-thousand-word bands of frequency. These studies have shown significant differences in scores between frequency levels and a decrease in knowledge as the frequency level decreased. Laufer et al. (2004) measured the vocabulary knowledge of adult ESL learners with intermediate to advanced proficiency in English in four one-thousand-word bands of frequency (2K, 3K, 5K and 10K) using the Vocabulary Levels Test, and found that higher frequency words at the 2K and 3K levels were easier for these learners than the lower frequency words at the 5K level, which, in turn, were easier than the 10K level words. Milton (2009) reports a similar pattern in Greek learners, whereby learners’ knowledge of words was highest at the 1K level and steadily decreased over the following four adjacent levels of lower frequency. Laufer & Paribakht (1998) also found increasingly higher scores across the frequency levels. Milton (2007) formulates this as ‘the frequency model of lexical learning’, profiling learners’ knowledge over frequency levels on a graph borrowed from Meara (1992). The following graph is the vocabulary profile of a typical learner.

Figure 1. Vocabulary profile of a typical learner (Meara, 2010, p.6). [Graph: words known (0-100%) plotted against the 1K-5K frequency levels.]

The model claims that ‘a typical learner’s knowledge is high in the frequent columns and lower in the less frequent columns giving a distinctive downwards slope from left to right. As learner knowledge increases, this profile moves upwards until it hits a ceiling at 100% when the profile ought to flatten at the most frequent levels and the downwards slope, left to right, shifts to the right into less frequent vocabulary bands.’ (Milton, 2007, p.49).
Milton’s own research (2007, 2009) generally supported the model, yielding normal frequency profiles for 60% of the learners. On the other hand, a substantial proportion of learners deviated from a normal profile, and even the most able learners were not able to hit the 100% ceiling in the highest frequency levels but plateaued at around 85-90%. This suggests that while frequency has a strong effect on vocabulary learning, other factors might be at play.

Some researchers went further and looked for the presence of an implicational scale among the levels. Read (1988) has shown that frequency levels in the VLT form an implicational scale whereby a learner ‘…who achieved the criterion score at a lower frequency level - say, the 5,000-word level - could normally be assumed to have mastered the vocabulary of higher frequency levels - 2,000 and 3,000 words - as well’ (Read, 1988, p.18). This finding has been replicated by Schmitt, Schmitt & Clapham (2001). While frequency is clearly an important factor in vocabulary learning, the case for frequency could be made from previous studies only for the earlier stages of vocabulary learning, since the test instruments used either did not measure knowledge in lower frequency levels beyond 5K (X-Lex in Milton 2007, 2009) or measured only the 10K level, skipping the levels in between (the Vocabulary Levels Test in other studies). Frequency might not have the same strong effect on vocabulary learning at advanced levels as at earlier levels. Since words that need to be learned at an advanced level will generally be of low frequency, other factors like personal interest might become more decisive in determining which words are learnt. The present study will test for an implicational scale covering a greater range of frequency levels in receptive vocabulary knowledge in study 3. The presence of an implicational scale will also be tested for productive vocabulary knowledge in studies 1 and 2. Vocabulary Levels Test (VLT from now on) scores in studies 1 and 2 will also be investigated for an implicational scale for the sake of comparison.

The three studies here will seek answers to the following research questions:

1. Do the written receptive and written productive vocabulary sizes of advanced EFL learners in English-medium degree programmes continue to grow over time, and at what rate do they grow?
2. Does the knowledge of academic vocabulary in a foreign language develop receptively and productively through academic study?
3. Do word frequency levels form an implicational hierarchy in developing a written receptive and a written productive vocabulary in a foreign language through academic study?

Study 1

Fifty-five first-year and forty-five fourth-year students in four intact classes in the ELT programme in a university in Turkey participated in the study. All spoke Turkish as their L1. They were highly advanced in English, as they had to pass a very competitive national English test to be admitted to the programme. In the department, they were immersed in an English language environment, which should be conducive to further development of vocabulary. Beginning from the first year, all intradepartmental courses are offered in English and take up 73% of all the courses in the four-year curriculum and 74% of the credit hours that have to be taken to graduate from the programme. In these courses, the course material, lectures, class discussions, oral presentations, written projects and exams are mediated through English.
English language skills courses are offered in the first year, and the rest of the courses are related to the learners’ subject area, which includes linguistics, English language teaching, language acquisition, and English literature.

Study 1 employed a cross-sectional design comparing the first- and fourth-years in terms of English vocabulary size. Any difference in vocabulary size between the two groups is assumed to be the result of the extra years of exposure to English through academic study by the fourth-year group, as both groups studied under similar conditions in the programme. Both groups had to take the same courses throughout, with the exception of a few interdepartmental elective courses which are taught in the learners’ L1; both groups were taught by the same teachers, as the staff is fairly stable; and the course material is unlikely to have grossly differed in the three intervening years. The initial English proficiency and vocabulary size of the two groups is likely to be very similar, since the national English test admits students to the department from a very narrow range of scores each year. However, the method of calculation of the national test scores was changed between the years when the fourth-years and the first-years sat the English test (2002 and 2005 respectively), and therefore a direct comparison of the learners’ initial proficiency scores was not possible. On the other hand, the content of the test did not change from 2002 to 2005. On both occasions, 60% of the items measured reading comprehension, 25% tested grammar and vocabulary, and 15% were translation items. Thus, both groups studied for the same kind of exam, and any washback effect from the test has probably led to the development of the same kind of linguistic skills in English.

Learners’ receptive and productive vocabularies at different frequency levels were measured. For this purpose, the receptive and (semi-)productive versions of the Vocabulary Levels Test (Nation, 2001) were used. As a measure of receptive vocabulary size, Version B of the VLT (Nation, 2001) was used. The test measures knowledge of 156 words in total from four frequency levels (the 2K, 3K, 5K, and 10K levels) as well as academic vocabulary from the Academic Word List (AWL; Coxhead, 2000). Thirty words from each frequency band and thirty-six words from the AWL were tested. The test uses a matching format as in the following example, where three words are being tested (horse, pencil, wall):

1 business
2 clock
3 horse
4 pencil
5 shoe
6 wall

_____________ part of a house
_____________ animal with four legs
_____________ something used for writing

In scoring the test, one point was given for each correct answer, and section scores were computed by counting the number of correct answers in a given section. The test had good overall and group reliabilities (KR-21=0.89 overall, 0.93 first years, 0.75 fourth years).

Productive vocabulary size was measured by Version C of the Productive Levels Test (PVLT) (Nation, 2001). This test was chosen because of its structural similarity to the receptive version, although its validity as a test of productive vocabulary knowledge has been questioned (Read, 2000, pp. 124-6; Schmitt, 2010, pp. 203-5). The test format simulates the mental processing of words in production, where users go from a word’s meaning to its form. It does not measure vocabulary that learners can use in production.
It measures vocabulary which is ‘available for productive use’ (Laufer & Nation, 1999, p.41). As far as validity goes, the PVLT is no different from the receptive tests (the VLT, the Vocabulary Size Test and the Yes-No tests (Meara, 2010)). These receptive tests are practically decontextualised, and they do not measure ‘use’ of vocabulary either, in that in answering these tests learners are not using the words to understand written texts. To borrow a common dichotomy from SLA, the receptive tests and the PVLT are tests of lexical competence as opposed to lexical performance. In the present study, the test was reliable overall and for each group of learners (KR-21=0.84 overall, 0.87 first years, 0.73 fourth years).

The test measures knowledge of 90 words in total. It parallels the receptive version and consists of the same sections (the four frequency levels and an academic word level drawn from the slightly larger University Word List (Xue and Nation, 1984)) with 18 items each. Although the words tested in the receptive and productive versions are not entirely identical (in fact, two-thirds of the items in the productive version were different), the two tests are not necessarily incomparable. The test words are random selections from a frequency list, and the test scores represent knowledge of all words in a frequency level rather than knowledge of the exact words tested. One word is no better than another drawn from the same level. Therefore, this should not be a major drawback, although ideally one would want all the words to be the same for full comparability.

Each item in the productive test consists of a single sentence with a blank for one of the words. The learner is asked to provide the missing word. The test, however, is productive in a ‘controlled’ way (Laufer & Nation, 1999), as the beginning of each missing word is given to limit the number of possible answers. Thus, only one word is possible for a given blank. Here is an example for the word bicycle:

He was riding a bic___________.

The test was scored with one point given for each correct answer. There was no penalty for misspelled or wrongly inflected words unless the spelling mistake distorted the pronunciation or the orthographic form of the word or the inflection error involved an irregular form. Thus, willy for wily, council for counsel, and homojen (L1 spelling) for homogeneous were counted incorrect due to spelling errors. Also, the inflected forms stretchen for stretched and thrusted for thrust were counted incorrect. On the other hand, omission of letters which did not alter the spoken or written form of the word to an important degree (e.g. orchides for orchids) and omission of plurals or tense markers required by the sentence context were not considered mistakes.

Both tests were given in one session during normal class hours in the second half of the 2005-2006 academic year. Each group of learners was tested separately. Half of the learners in a group were given the receptive test first, while the other half were given the productive test first, in order to prevent fatigue from having an effect on test performance. No two learners sitting next to each other received the same version first, so that any possibility of cheating was eliminated. There were some overlapping items between the receptive and productive tests, which might have provided an advantage to those who answered the receptive test first. It was not possible to check for this possibility in the present data.
However, study 2, using the same first-year data as in study 1, found no statistically significant effect of test order on test scores. Instructions in English appeared on the first page of each test. The learners were advanced enough in English to understand the instructions, and no problems with the instructions were reported. Each student did both of the tests. On completing a given version, the learner was immediately given the other version. The tests were completed in about an hour.

The results for the receptive test are reported in means and mean percentages in Table 1. One outlier from the fourth-year group and two outliers from the first-year group were removed from the analysis of the receptive data. The one from the fourth-year group was removed for being too high (98% overall) and those of the first-years were too low (25% and 36% respectively). The results were examined using a 2x4 analysis of variance with the year of study as the between-subjects factor with 2 levels (Year 1 and Year 4) and the frequency level as the within-subjects factor with 4 levels (the 2K, 3K, 5K, and 10K levels). The academic word section was analysed separately, as it did not represent a level of frequency.

Table 1. Receptive test results of learners in study 1

Level     Year (N)   Mean     %    SD
2000      1 (53)     28.53    95%  1.70
          4 (44)     28.13    94%  1.71
3000      1 (53)     25.89    86%  3.87
          4 (44)     25.90    86%  2.52
5000      1 (53)     21.09    70%  6.06
          4 (44)     20.77    69%  3.64
10000     1 (53)      9.51    32%  5.87
          4 (44)      9.18    31%  3.75
Academic  1 (53)     30.70    85%  5.03
          4 (44)     31.79    88%  3.18
Total     1 (53)    115.71    74%  19.84
          4 (44)    115.79    74%  10.91

ANOVA revealed the main effect for frequency significant (F=829.47, p=.000), while the main effect for the year of study (F=.152, p=.697) and the interaction between the two (F=.102, p=.959) were not statistically significant at the .05 level. Assuming equal starting vocabularies for the two groups, it seems that the fourth years’ receptive vocabularies did not grow further in the three years. The similar performance of the two groups across the frequency levels suggested that they were not qualitatively different, either. Overall, the differences between frequency levels were all significant according to Bonferroni tests. The vocabulary profile of the whole group in Figure 2 shows that learners’ scores across frequency levels decreased linearly with decreasing frequency. Thus, learners’ knowledge was greatest at the highest frequency level and systematically decreased over the following levels as frequency decreased. Mean scores were very close to the 100% ceiling at the 2K level (around 95%) and relatively high at the 3K level (86%), but only 28 learners (29%) hit 100% at the 2K level and 12 learners (12%) at the 3K level.

Figure 2. Vocabulary profile of the whole group in the receptive VLT in study 1 (N=97). [Graph: mean section scores (out of 30) at the 2K, 3K, 5K and 10K levels.]

Following Read (1988) and Schmitt et al. (2001), a Guttman scalability analysis (Hatch and Lazaraton, 1991) was used to test for the presence of an implicational scale between the frequency levels. Both studies found high degrees of scalability, whereby a learner who attained the criterion for mastery at a given level can safely be assumed to have also attained mastery at higher levels of frequency. As in the previous studies, the criterion for mastery in the present study was set at 90%. Scores higher than or equal to the criterion were assigned a 1, while scores below the criterion were assigned a 0.
Hatch & Lazaraton (1991) recommend at least .90 for the coefficient of reproducibility and .60 for the coefficient of scalability in order to establish an implicational scale. The results of the present study revealed coefficients higher than the minimum values recommended (Crep=.995, MMrep=.822, Cscal=.971). They were also similar to, or even higher than, those obtained in the previous two studies (Crep=.93 and .92 for the two separate administrations in Read (1988), and .993 and .995 for the two different versions of the test in Schmitt et al. (2001); Cscal=.90 and .84 in Read (1988) and .971 and .978 in Schmitt et al. (2001)). These results suggested that the frequency levels in the VLT formed an implicational scale.

Both groups of learners displayed knowledge of many of the words in the academic word section (85% and 88%); however, the difference between the two groups did not reach statistical significance at the .05 level (t=1.304, p=.196).

The results of the productive test are given below in Table 2. One outlier from each group was omitted from the analysis. The omitted fourth-year student scored too high in the test overall (83%) and the first-year student scored too low (5%).

Table 2. Productive test results of learners in study 1

Level     Year (N)   Mean    %    SD
2000      1 (54)     13.52   75%  3.12
          4 (44)     14.84   82%  2.11
3000      1 (54)      7.69   43%  3.67
          4 (44)      8.52   47%  3.19
5000      1 (54)      5.46   30%  2.28
          4 (44)      6.25   35%  1.79
10000     1 (54)      2.12   12%  1.98
          4 (44)      2.14   12%  1.41
Academic  1 (54)      7.20   40%  3.22
          4 (44)      7.68   43%  2.70
Total     1 (54)     35.99   40%  12.63
          4 (44)     39.43   44%  8.98

The ANOVA yielded only one significant effect. The main effect for frequency (F=877.30, p=.000) was significant, while the main effect for year of study (F=2.91, p=.091) and the interaction between the two (F=2.51, p=.059) were not. These results suggested that learners’ productive vocabularies did not grow significantly in three years, but frequency was again effective in determining the overall course of development. Bonferroni post-hoc tests showed all differences between frequency levels significant. The learners’ profile in Figure 3 shows a linear decrease with decreasing frequency. Differently from the receptive scores, there is a sharp difference between the 2K scores and the rest. Still, 2K scores were relatively low (75% vs 82%), and only 8 learners (8%) hit 100%.

Figure 3. Vocabulary profile of the whole group in the productive VLT in study 1 (N=98). [Graph: mean section scores (out of 18) at the 2K, 3K, 5K and 10K levels.]

For the Guttman analysis the 90% criterion for mastery was not applicable, as there were too few scores which passed the criterion (i.e. 34 scores at 2K and none at the 3K, 5K and 10K levels). Therefore, a different criterion was used to assign the values of 1 or 0 to level scores. A score was assigned a 1 if it was higher than the learner’s score at the next level of lower frequency. Thus, if a learner scored 10 at the 2K level and 8 at the 3K level, his/her 2K score was assigned a 1. If, on the other hand, he/she scored 8 at the 2K but 10 at the 3K level, his/her 2K score received a 0. The results revealed a relatively high coefficient of scalability, although lower than that of the receptive test, whereas the coefficient of reproducibility was lower than the minimum .90 recommended by Hatch & Lazaraton (1991), which makes it difficult to make a case for an implicational scale with sufficient confidence (Crep=.817, MMrep=.901, Cscal=.848).
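For readers who wish to replicate the scaling procedure, the sketch below illustrates the dichotomisation rules and the three coefficients described above. It is a minimal illustration rather than the analysis script used in these studies; the Goodenough-Edwards style error count, the treatment of the lowest-frequency level under the alternative criterion, and the example scores are assumptions made for the sake of the example.

```python
# Minimal sketch of the Guttman scalogram procedure described above.
# Assumptions: a learners-by-levels matrix of section scores with columns
# ordered from highest to lowest frequency, a Goodenough-Edwards style error
# count, and invented score data.
import numpy as np

def dichotomise_by_criterion(scores, max_score, criterion=0.90):
    """Assign 1 to a section score at or above the mastery criterion, else 0."""
    return (scores / max_score >= criterion).astype(int)

def dichotomise_by_next_level(scores):
    """Alternative criterion used for the productive data: a 1 if the score is
    higher than the learner's score at the next lower-frequency level.
    (How the last, lowest-frequency column is handled is an assumption.)"""
    coded = np.zeros_like(scores)
    coded[:, :-1] = (scores[:, :-1] > scores[:, 1:]).astype(int)
    return coded

def guttman_coefficients(matrix):
    """Crep, MMrep and Cscal for a learners x levels 0/1 matrix."""
    n_learners, n_levels = matrix.shape
    errors = 0
    for row in matrix:
        total = int(row.sum())
        # Perfect scale pattern implied by the learner's total:
        # 1s on the easiest (highest-frequency) levels only.
        ideal = np.array([1] * total + [0] * (n_levels - total))
        errors += int(np.sum(row != ideal))
    crep = 1 - errors / (n_learners * n_levels)
    p = matrix.mean(axis=0)              # proportion of 1s per level
    mmrep = np.maximum(p, 1 - p).mean()  # minimum marginal reproducibility
    cscal = (crep - mmrep) / (1 - mmrep) # coefficient of scalability
    return crep, mmrep, cscal

# Hypothetical receptive VLT section scores (2K, 3K, 5K, 10K; maximum 30 each).
raw = np.array([[29, 28, 25, 12],
                [30, 27, 20, 9],
                [28, 26, 28, 15]])
print(guttman_coefficients(dichotomise_by_criterion(raw, max_score=30)))
```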
The difference between the two groups in the academic word section was not statistically significant (t=.799, p=.426). The gap between receptive and productive knowledge of academic vocabulary within the same individuals, however, seems rather large. While learners’ receptive knowledge of academic words is quite substantial, their productive knowledge is about half the size of their receptive knowledge (85% vs 40% for the first years and 88% vs 43% for the fourth years).

Nevertheless, the results of this study need to be treated with caution, as the study employed a cross-sectional design and, although the first- and fourth-years were argued to be similar in initial proficiency, there is still the possibility that they were different. Therefore, Study 2 will employ a long-term longitudinal design in order to verify the results of Study 1.

Study 2

Eighteen students from among the 55 first-year students in study 1 participated in Study 2. Of the 55 students, only the results of those for whom fourth-year data could be obtained were used in this study. These learners were tested twice on their vocabulary knowledge. The first time was in 2005, when they were in their first year of study. They were tested again in their final year in the programme in 2008. The same materials were used on both occasions. The three-year lapse between the two testing events was considered long enough for any kind of learning not to carry over from the first to the second testing.

It was not possible to collect the fourth-year data in class. As the learners were free to register for a course in any of the six groups that were available, the two intact groups used in the first-year data collection no longer existed. Therefore, the tests had to be distributed by hand to the 55 learners who participated in the first-year data collection, to be completed in their own time. Eighteen learners answered and returned the tests, a return rate of 32%. In order to keep the testing conditions as similar to the first administration as possible, the order of the two tests was counterbalanced across the participants. Half of the 55 learners were given written instructions to answer the receptive test first, and the other half were asked to answer the productive test first. Fortunately, the data from the 18 learners preserved the balance in the order of the tests, as it turned out that 9 of the learners had answered the receptive test first and the other 9 learners had completed the productive test first. The learners were instructed to answer the tests on their own and not to consult external sources like a dictionary or another person. After the scoring of the tests was complete, the results were shared with the students individually.

The fact that some of the target words appeared in both tests had a potentially contaminating effect on the data. There were 27 such words, which amounted to about one third of the items on the productive test. It was possible that the overlapping items provided an advantage to those students who answered the receptive test first. Having seen an item in the receptive test first might have aided the subsequent recall of the item in the productive test. The learners did notice these overlaps, as they pointed this out to the researcher after the testing.
In order to check for a possible advantage of the overlapping items on the productive test results, the productive scores of students who did the receptive test first in the fourth-year data (Mean=56.89, SD=11.85) were compared to those of students who answered the productive test first (Mean=50.33, SD=10.39) using an independent samples t-test. The order effect could not be tested in the first-year data, as information on the order of the tests for individual learners was not available. Although those who answered the receptive test first seem to have gained an advantage of about 6 items on the whole test, this advantage was not statistically significant at the .05 level (t=-1.288, p=.216). Thus, it appears that test order is unlikely to have influenced the results to a significant degree. However, future studies using the Levels tests are advised not to counterbalance the order of the tests, but to give the productive test before the receptive one.

The results of the receptive test for the first year and for the fourth year are given in Table 3 below. One subject was removed from the final analysis as she reported that she had obtained the original tests from the internet after the first administration and had specifically studied the tests between the two administrations.

Table 3. Receptive test results of learners in study 2 (N=17)

Level     Year   Mean     %    SD
2000      1      28.94    96%  1.95
          4      29.24    97%  1.25
3000      1      27.41    91%  3.85
          4      27.88    93%  2.54
5000      1      23.71    79%  4.10
          4      24.88    83%  2.34
10000     1      11.47    38%  4.87
          4      13.00    43%  5.13
Academic  1      33.35    93%  2.26
          4      34.29    95%  2.11
Total     1     124.88    83%  14.17
          4     129.29    86%  9.32

N.B. The maximum score for the academic vocabulary section is 36, for the other frequency levels 30, and for the total 156.

The ANOVA results were similar to those of study 1. While the main effect for frequency was statistically significant (F=265.92, p=.000), neither the main effect for year of study (F=.944, p=.339) nor the interaction between the two (F=.386, p=.764) reached significance. The non-significant effect of the year of study suggested that these learners did not improve their vocabularies to a significant degree. This finding is surprising because the learners who participated in this study did so voluntarily, without any return for their efforts other than the feedback on their lexical competence. They are likely to have had highly positive attitudes towards, and a high degree of motivation for, vocabulary learning. A comparison of this group of learners with the larger group in Study 1 showed that they scored significantly better overall on the receptive test (t=-2.698, p=.009) as well as on the productive test (t=-3.363, p=.001). Therefore, they were likely to have had a larger starting vocabulary. Even these better learners, however, do not seem to make any progress receptively. Bonferroni post-hoc tests on frequency revealed all differences between frequency levels significant. The Guttman scalogram analysis using a 90% criterion for mastery suggested the presence of an implicational scale between the levels (Crep=.986, MMrep=.867, Cscal=.894). The vocabulary profile of the learners in Figure 4 is somewhat flatter in the higher frequency levels than that in study 1, which resulted from the closer performance of the study 2 learners at the 2K and 3K levels. Mean scores were very close to 100%, and about half (53%) of the learners hit 100% at the 2K and 3K levels each.

Figure 4. Vocabulary profile of the whole group in the receptive VLT in study 2 (N=17). [Graph: mean section scores (out of 30) at the 2K, 3K, 5K and 10K levels.]
The difference in academic word scores between the first year and the fourth year of study (i.e. 1 word on average), although statistically significant (t=-2.885, p=.011), was very small in terms of the number of words learnt during the intervening three years, amounting to an increase of about 16 words out of the 570 words of the AWL. Of course, this might be due to a ceiling effect. Learners’ scores in the academic vocabulary section were already very high in the first year, only one to three items short of the maximum score, and there was little room for improvement.

The results of the productive test are given in Table 4. The ANOVA revealed two significant effects: the main effect for frequency (F=255.018, p=.000) and the main effect for year of study (F=8.530, p=.006). The interaction was not significant (F=1.577, p=.200).

Table 4. Productive test results of learners in study 2 (N=17)

Level     Year   Mean    %    SD
2000      1      15.29   85%  1.99
          4      16.18   90%  1.29
3000      1       8.94   50%  2.82
          4      11.65   65%  2.94
5000      1       6.88   38%  1.69
          4       8.24   46%  2.56
10000     1       3.00   17%  1.46
          4       4.88   27%  3.04
Academic  1       9.00   50%  2.72
          4      11.29   63%  2.62
Total     1      43.12   48%  8.43
          4      52.24   58%  9.62

N.B. The maximum score for each section is 18, and for the total 90.

Overall, learners increased their scores by about 10% from year 1 to year 4. The increase occurred uniformly in all sections of the test, by 1-2 words on average. The Bonferroni test showed all the differences between frequency levels significant, and the vocabulary profile in Figure 5 indicates a linear decreasing effect. Learners did rather well at the 2K level (85% vs 90%), but only 3 learners (18%) hit 100%. The Guttman analysis did not suggest an implicational scale between frequency levels (Crep=.844, MMrep=.892, Cscal=.444), with both the reproducibility and the scalability coefficients being lower than the minima (.90 and .60 respectively) suggested by Hatch & Lazaraton (1991).

Figure 5. Vocabulary profile of the whole group in the productive VLT in study 2 (N=17). [Graph: mean section scores (out of 18) at the 2K, 3K, 5K and 10K levels.]

For the academic vocabulary, the overall gain in three years was around 13%, which was also statistically significant (t=-2.885, p=.011). In this study, as in study 1, there was a large gap between the receptive and productive knowledge of academic vocabulary, with the gap getting smaller in the fourth year (a receptive-productive gap of 43 percentage points in the first year (93% vs 50%) and 32 points in the fourth year (95% vs 63%)).

Study 3

Study 3 was conducted against the possibility that the receptive gains are under-represented in the first two studies due to the limitations of the test instrument used. The Vocabulary Levels Test is not sensitive to gains at lower frequency levels, especially after the 5K level, as the frequency bands selected are not spaced evenly: more bands are measured from the first 5,000 words, while the second 5,000 (up to 10K) is measured with only one band. Any gains made within this broad band will not be detected by the Levels Test. This is certainly a possibility for 20% (N=11) of the learners in study 1 and 30% (N=6) of the learners in study 2, who had already attained mastery at the 5K level and were likely to have moved on to lower frequency levels beyond it. Study 3 will use a measure which is more sensitive to gains in lower frequency vocabulary.

The participants in study 3 were drawn from the same context as in studies 1 and 2. There were 174 participants altogether, who were at different stages of their studies.
Forty-eight of these were first-year students, 60 were second-years, 34 were third-years, and 32 were fourth-years. These groups were assumed to represent different levels in terms of general English proficiency as well as vocabulary knowledge because of the differences in the number of years of study.

In study 3, the size of learners’ receptive vocabularies was measured with the Vocabulary Size Test (Nation and Beglar, 2007; also available at: http://www.victoria.ac.nz/lals/staff/paul-nation/nation.aspx). Learners’ productive knowledge could not be measured due to the absence of an equivalent productive test. The Vocabulary Size Test is based on word frequency lists from the British National Corpus, arranged into 14 one-thousand-word bands of decreasing frequency. The test contains 10 target words from each frequency band, with a sampling rate of 1 in 100. The target words are presented in short sentences with non-defining contexts. The test uses a multiple-choice format in which the choices are single-word or phrase-length definitions. An example item from the 2K band is given below:

nil: His mark for that question was nil.
a. very bad
b. nothing (key)
c. very good
d. in the middle

In the present study, learners were tested on only nine of the fourteen frequency levels. The 1K level and the four levels from 11K to 14K were not tested. The 1K level was judged to be too easy and the levels beyond the 10K too difficult for the learners, judging from the performance of their peers on the Levels Test in the first two studies. The exclusion of these levels resulted in a shorter and more feasible test.

In scoring the test, one point was given for each correct answer. The overall test score was converted to a size score out of the 10,000 words targeted in this reduced version of the test. In the calculation of overall size, the learner’s test score was multiplied by 100, as each word in the test represented 100 word families. Given the English proficiency levels of the learners and their near perfect performance at the 2K level in the Vocabulary Levels Test, the learners were credited with knowledge of the whole of the 1K level words, and accordingly the size score was increased by 1,000 for each learner.

The test was administered to students in normal class hours. The learners were told that they were being asked to answer a test which measured how many words they knew in English. It was believed that the learners would be more motivated to do the test if they also benefited from it, and therefore they were promised their test results. Considering the concern some learners might feel about having their results announced publicly on a notice board, they were given the choice of learning their test scores individually in private. No time limits were set for the test, but it was completed in about 40 minutes by most students. The slowest learner took 50 minutes and the fastest as few as 20 minutes.

The results of the test for the four learner groups are given in Table 5. The KR-21 reliabilities were mostly acceptable, with the exception of the third years. Overall, learners answered about half of the items on the test correctly (47 out of 90 items). This converts to a receptive size of 5,686 words on average. The scores spread over a wide range: the highest score was 69 and the lowest was 22. Expressed in terms of vocabulary size, these figures correspond to 7,900 and 3,200 word families respectively, a difference of 4,700 word families.
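The conversion just described can be expressed as a one-line rule. The sketch below merely restates that scoring procedure for illustration; the function name is invented.

```python
# Sketch of the size conversion described above: each correct item stands for
# 100 word families, and 1,000 words are credited for the untested 1K level.
def vst_size(correct_items, families_per_item=100, credited_1k=1000):
    return credited_1k + families_per_item * correct_items

print(vst_size(69), vst_size(22))  # 7900 and 3200, the reported maximum and minimum
```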
None of the learners seem to have reached the 8-10,000-word written receptive target, and they are at varying distances from it.

Table 5. Results of the Vocabulary Size Test in study 3

Level   First (N=48)    Second (N=60)   Third (N=34)    Fourth (N=32)   Total (N=174)    F      p
        Mean    SD      Mean    SD      Mean    SD      Mean    SD      Mean    SD
2K      6.54    1.37    7.58    1.43    8.03    1.17    7.53    .95     7.37    1.39     10.32  .000*
3K      5.90    1.57    6.40    1.56    6.88    1.39    6.16    1.39    6.31    1.53      3.03  .031*
4K      5.67    1.66    6.18    1.47    7.15    1.65    6.72    1.37    6.33    1.62      6.99  .000*
5K      4.83    1.45    6.07    2.07    6.74    1.29    5.97    1.49    5.84    1.79      9.46  .000*
6K      4.23    1.36    4.33    1.59    4.65    1.41    3.72    1.69    4.25    1.53      2.15  .095
7K      3.67    1.31    3.98    1.42    4.74    1.29    4.03    1.31    4.05    1.38      4.29  .006*
8K      5.00    1.56    6.28    2.03    7.03    1.57    5.91    1.94    6.01    1.93      9.08  .000*
9K      3.17    1.36    3.68    1.47    4.00    1.26    3.34    .97     3.54    1.34      3.14  .027*
10K     3.44    1.70    3.13    1.83    4.03    1.51    2.91    1.45    3.35    1.70      3.02  .031*
Test    42.44   7.84    47.65   9.61    53.24   6.94    46.28   8.24    47.05   9.12     11.12  .000*
Size    5243.75 783.86  5765.00 961.08  6323.53 694.16  5628.13 823.54  5705.17 912.11
KR-21   0.64            0.77            0.56            0.68            0.74

* significant at the .05 level

ANOVA revealed all the effects significant (main effect of frequency: F=212.485, p=.000; main effect for the year of study: F=11.128, p=.000; interaction: F=2.735, p=.000). Learners’ overall scores steadily increased with the year of study, except for a drop in the last year. The Bonferroni post-hoc tests revealed all the differences between the first three groups significant, while the fourth years’ scores were not significantly different from the first years’ and the second years’, and were significantly lower than the third years’. The mean increase over the whole test between two successive groups (first years vs second years and second years vs third years) was about 5 words, which corresponds to an annual increase of about five hundred words (521 and 559 words respectively; a short worked calculation is given after this section’s profile discussion). While these figures are not as impressive as the annual growth rate (2,650 words a year) reported in Milton and Meara (1995), they are a little better than the 330 word families reported in Schmitt and Meara (1997), the 200 words in Cobb and Horst (2000), or the insignificant gains in studies 1 and 2 in this paper. The significant increase in receptive vocabulary in study 3 in comparison to studies 1 and 2 could be explained by the greater sensitivity of the Vocabulary Size Test to knowledge at lower frequency levels.

The vocabulary profile of the whole group across the levels of the test (cf. Figure 6) somewhat deviates from a normal profile. The most noticeable deviation is the unexpected peak in the 8K scores, which were not significantly different from the 3K, 4K and 5K scores according to Bonferroni tests, and more learners (18 vs 14, 16, and 9 respectively) passed the 90% cut-point for mastery. An examination of the test words at this level suggested that the presence of three cognates (palette, authentic, cabaret) was likely to have boosted performance at this level. Disregarding the 8K level, the learners’ profile shows a general decreasing trend with frequency, so that the scores were highest at the 2K level and lowest at the 10K level. However, this decrease seems to take place slowly and becomes obvious only over several frequency bands. There are three such clusters in the data: 3K, 4K and 5K form a cluster, 6K and 7K form another cluster, and finally 9K and 10K also cluster together. Bonferroni tests revealed the differences between the levels within a cluster non-significant (except the difference between 3K and 5K).
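As forward-referenced above, the annual growth estimate follows directly from the scoring of the test: the mean score difference between successive year groups, times 100 word families per item. A minimal check, using the group means from Table 5 (an illustration only):

```python
# Mean VST scores of the first three year groups (Table 5); each item on the
# test represents 100 word families.
first, second, third = 42.44, 47.65, 53.24
print(round((second - first) * 100), round((third - second) * 100))  # 521 and 559 words per year
```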
Learners’ performance at the 2K level was significantly better than at all other levels. However, the scores were not as high as would be expected given the level of the learners and their performance in the receptive version of the VLT. Learners scored 7.37 on average out of 10, and only 22% of the learners (i.e. 39) demonstrated mastery at this level on the 90% criterion, while only 6 learners (3%) hit 100%. In the light of this finding, the validity of our earlier assumption that the learners would have full mastery of the 1K level was checked with further data obtained from the second-year group. Seventeen learners answered the 1K level section of the test, in which one of the distractors for the word reason was replaced, as recommended by Beglar (2009), for failing to behave properly as a distractor. The average score for this level was 9.52 out of 10 items, which suggested that learners’ vocabulary sizes in Table 5 have been overestimated by only about 50 words in each group ((10 - 9.52) × 100 = 48 words). As this overestimation was uniform across proficiency groups, it does not invalidate the foregoing conclusions drawn from the data concerning group differences.

Figure 6. Vocabulary profile of the whole group in the Vocabulary Size Test in study 3 (N=174). [Graph: mean scores (out of 10) at the 2K-10K levels.]

The interaction effect seems to be the result of the fourth years’ performance across the frequency levels (cf. Figure 7). While the other three groups display a remarkably similar pattern of scores over the levels (cf. Figure 8, without the fourth years), the fourth-year group deviates from this pattern.

Figure 7. Vocabulary profiles of the four year groups in the Vocabulary Size Test in study 3 (N=174). [Graph: mean scores at the 2K-10K levels for the first-, second-, third- and fourth-year groups.]

Figure 8. Vocabulary profiles of the first three groups in the Vocabulary Size Test in study 3 (N=142). [Graph: mean scores at the 2K-10K levels for the first-, second- and third-year groups.]

The fourth-years scored unexpectedly low overall and in individual test sections. Their scores were always lower than those of the third years. They scored lower than the second years in seven of the nine sections. They were almost always better than the first years, generally falling somewhere between the first and second years. To explain this unexpected performance, the possibility of an initial discrepancy with the other groups was investigated. Learners’ scores in the university admissions test were expected to provide some clue to their initial proficiency and, indirectly, to the size of their vocabularies, as these two are closely related (Alderson, 2005). While it is a composite measure, the university admissions score is largely based on scores in an English proficiency test and therefore would be highly indicative of the English proficiency levels of the learners prior to the start of their studies. Scores were not available for the individual learners who participated in this study. However, the descriptive statistics provided by the National University Testing Centre for the whole student population in the department (cf. Table 6) do not indicate any important gaps between the fourth-years and the other groups. So the fourth years were not initially disadvantaged. A sudden attrition in the final year is not altogether unlikely, though.
In an interview about their test results, the learners suggested that they had limited extracurricular involvement with English, as they devoted most of their time to preparation for a written (non-English) exam for prospective teachers in order to secure positions as English language teachers in state schools after graduation. The performance of the fourth years in this study might also offer an explanation for the non-significant receptive results in the first two studies, which sampled from the first and fourth years, skipping the intermediate years. The decrease in receptive size in the fourth year might have disguised the growth in the intervening years. It would be alarming, however, from the point of view of L2 vocabulary learning, if all this attrition took place in the final year and most of what was gained in three years was lost in one year. Other studies also report attrition in vocabulary knowledge. Schmitt and Meara (1997) note that 28% of their subjects decreased in vocabulary size, and in Milton and Meara (1995) 5 of the 53 subjects regressed to a lower size. This suggests that vocabulary learning is not only a matter of learning new words or new aspects of known words, but also of preserving what is known.

Table 6. Learners’ University Admissions Test scores in study 3

Current Level   Year of Admission   N     Mean   SD     Min.   Max.
First Years     2009                164   342    5.35   317    368
Second Years    2008                164   358    2.67   355    375
Third Years     2007                154   352    3.33   346    366
Fourth Years    2006                154   351    3.07   345    371

A number of Guttman scalogram analyses were performed on the data (cf. Table 7). The 90% criterion was not usable, as there were often either too few or no scores that met the criterion. Therefore, as in the first two studies, a given score was coded with respect to the score at the next frequency level. This analysis was first applied to the whole data set, but it did not indicate the presence of an implicational scale between the levels. In the hope of obtaining evidence for an implicational scale, the analyses were repeated without the 8K level and without the fourth-year data, where the results had turned out to be rather different from what was predicted. Another analysis included only the levels corresponding to those in the VLT, on the basis of the possibility that the VLT revealed an implicational scale because it used larger frequency bands. None of these analyses, however, suggested the presence of an implicational scale.

Table 7. Guttman scalogram results in study 3

         All     Without Level 8   Without Fourth Years   VLT Levels
Crep     0.539   0.554             0.536                  0.518
MMrep    0.54    0.573             0.655                  0.641
Cscal    0.39    0.044             0.344                  0.342

General Discussion

Receptive Growth

While the first two studies did not provide evidence for significant receptive growth, study 3 suggested that this might be due to a backslide in the final year sweeping away the gains made in three years. When the final year is excluded, study 3 has shown that learners’ receptive vocabularies do increase, but slowly, by about 500 words a year. Nation (1990, p.11) estimates the receptive growth rate for native speakers to be between 1,000 and 2,000 new words per year. The learners in the present study did not seem to expand their receptive vocabularies at that rate. These results are not dissimilar to those obtained in some of the studies mentioned earlier, either (Cobb & Horst, 2000; Schmitt & Meara, 1997). Although the first two studies in the present research did not provide a size estimate, study 3 has shown that the size of learners’ vocabularies was about 5-6,000 words.
This is obviously lower than the 10,000-word target set for academic study (Hazenberg & Hulstijn, 1996), suggesting that learners do not attain the vocabulary target for academic reading incidentally through reading.

The learners’ main input for written receptive development came from the reading of academic texts in their disciplines. Academic reading involves incidental learning of vocabulary, and research suggests that such learning is minimal in L2 learners (see Horst, Cobb and Meara, 1998 and Horst, 2005 for a review of this research). Of course, the slow vocabulary growth in the present study might be the result of the specific learning conditions in the institution where the research was conducted. Nation (2007) argues that substantial vocabulary learning from reading will occur if learners are exposed to large amounts of text, and there is some research evidence for this (Horst, 2005). The amount of reading these learners undertook might have been insufficient, either because the learners were assigned small amounts of reading in the courses they took or because completion of the reading assignments was not sufficiently enforced. However, slow growth seems typical of this kind of learning context, as other studies conducted in EFL academic contexts (Schmitt & Meara, 1997; Cobb & Horst, 2000) found similarly slow growth rates.

There were several disadvantages of the learning context which made it unfavourable to vocabulary learning from reading. First, the learners in the present study were relatively advanced learners with most of the high frequency vocabulary already in stock. The words that these learners still needed to learn were often low frequency words, which are less likely to be learned incidentally because learners encounter them less frequently in their reading and fewer opportunities arise to learn them. Also, the gaps in these learners’ present vocabularies might not be causing them too much trouble in their reading, as most of them have a substantial number of words at their disposal to scaffold them in their reading. An unknown word now and then might not seem too serious, as it can be compensated for by guessing in ‘pregnant contexts’ (Mondria & Wit-de Boer, 1991) or by looking it up in a dictionary, or it can simply be ignored. Therefore, learners may not feel the need to make an effort to learn new words.

Second, the kind of vocabulary these learners were exposed to in their reading is likely to be different from the kind of vocabulary measured by a size test based on general written English. The learners were exposed to a specific type of English in their disciplines, which may not be lexically as diverse as general English or even general academic English. There is research evidence that indicates the use of smaller vocabularies in specific disciplines. Sutarsyah et al. (1994) compared the vocabulary of an economics textbook with a general corpus of academic English. The former contained less than half the number of different words found in the general corpus (5,438 and 12,744 respectively). Ward (1999) examined the vocabulary of engineering and concluded that 2,000 words would be sufficient for reading engineering texts. This suggests that, when reading in their subject area, learners will encounter only a subset of the 8-10,000-word vocabulary of general written academic English, and perhaps an even smaller subset of general written English, and there will be fewer opportunities for lexical learning from reading these texts.
The possibility of smaller vocabularies in specific disciplines might also explain the negative correlation between study time and vocabulary gain in Milton & Meara (1995), whereby learners who spent more time on academic study gained fewer words on a general vocabulary size test.

Third, in academic reading ‘technical vocabulary’, i.e. vocabulary that relates to the learners’ area of study, might seem more important to acquire, causing learners to pay less attention to the learning of general vocabulary, which is reflected in insignificant gains on the size tests. Lessard-Clouston (2006) reports significant gains in the technical vocabularies of native and non-native theology students in Canada in both size and depth over one term, providing evidence for enhanced attention to technical vocabulary. In future research, these two aspects of learners’ vocabulary development need to be investigated together to identify the interrelationships between general and technical vocabulary development. There might be differences across disciplines in this respect, as specific disciplines are likely to make different demands on the learner with respect to technical vocabulary. Chung & Nation (2003) compared the technical vocabulary of an applied linguistics text and an anatomy text, and found that one in every five words in the applied linguistics text was technical while a technical word occurred once in every three words in the anatomy text, suggesting that applied linguistics might have a smaller technical vocabulary than anatomy. Studies on vocabulary development in a variety of disciplines are, therefore, needed.

Productive Growth

The growth in written productive vocabulary (i.e. writing vocabulary) over three years was statistically significant in the longitudinal data, but non-significant in the cross-sectional data involving a greater number of subjects. The increase might be characteristic of better learners, as the learners in the longitudinal study had larger initial vocabularies than the larger group in the cross-sectional study. Although the increase is only 10%, it is likely to have dropped to this figure from a higher percentage, as the attrition observed in the receptive scores in the final year is likely to have occurred in the productive scores as well. In the cross-sectional data, on the other hand, the increase in the middle years might have been masked by this attrition.

On the whole, the learning environment can be argued to be conducive to productive growth. In the programme, learners are given opportunities for written production in the form of term papers and sit-in exams. However, a greater increase would have been expected given the length of study and the starting receptive vocabularies of the learners. Considering that many of the words in the productive test were already known to these learners receptively, given their performance on the receptive test, the transformation from receptive to productive knowledge seemed rather slow. For faster development, more frequent and regular production is to be recommended.

The unsatisfactory growth in productive vocabulary can also be related to the idea of ‘comprehensible output’ (Swain, 1985), which involves the stretching of one’s linguistic resources in production. Certain learner behaviours in academic writing will reduce the pressure to stretch these resources. One of these is avoidance.
Learners are known to avoid difficult vocabulary in production (Blum & Levenston, 1978), and since words known only receptively are difficult to produce, they might have been avoided in writing, with the result that opportunities for learning them productively were lost. Another student behaviour that might have a negative effect on comprehensible output involves paraphrasing others' work with minimal modification in written assignments, which hardly stretches one's linguistic resources. This kind of writing is not likely to contribute much to learners' productive vocabularies.

Growth in Academic Vocabulary

The high scores of the first-years on receptive academic vocabulary in studies 1 and 2 suggest that these learners already had receptive knowledge of a great proportion of academic vocabulary (85%-95%), probably acquired prior to their studies. The improvement in receptive academic vocabulary over three years was therefore quite modest. The larger starting academic vocabularies of these learners stand in contrast to Cobb and Horst's (2000) subjects in Hong Kong, who knew around 70% of the academic vocabulary at the start of their studies, and to Read's (1988) learners, who knew 64%. This advanced receptive knowledge of academic vocabulary is likely to have developed in the course of the long and painstaking preparation these learners undertake to pass the English Test of the University Entrance Exam, which includes non-fiction texts where academic vocabulary is likely to come up frequently. It is also interesting to note that learners' scores on the academic section were remarkably higher than those on the 5K section, although the words in the two sections are similar in frequency (Laufer, 1998). From these data, academic vocabulary emerges as a psychologically distinct category, although concerns have recently been raised as to the validity of the AWL (Martinez et al., 2009; Wang et al., 2008) or even the existence of a so-called academic vocabulary (Hyland & Tse, 2007).

While the development of receptive academic vocabulary was held back by a ceiling effect, there was plenty of room for productive development, as these learners' productive knowledge was about half the size of their receptive vocabulary in studies 1 and 2. The expansion in productive knowledge of academic vocabulary was relatively large in the longitudinal data (13%). However, greater improvement would have been expected given the kind of academic work these learners had to undertake. The same explanations as those for general productive vocabulary are likely to hold for the somewhat unsatisfactory development in academic productive vocabulary as well.

Frequency

The effect of frequency was invariably significant in all five tests used in the three studies. Learners' vocabulary scores tended to decrease as the frequency of words decreased. This study thus provides further empirical support for the frequency model of lexical learning, in that learners' knowledge of higher-frequency words tends to develop faster than their knowledge of lower-frequency words. At lower frequency levels, the effect of frequency becomes apparent over wider bands, e.g. two-thousand-word frequency bands. However, the presence of an implicational scale could only be established for the receptive Vocabulary Levels Test. The two studies in the literature (Read, 1988; Schmitt et al., 2001) that found an implicational scale also used the Vocabulary Levels Test.
When other tests are used, no implicational scale seems to be present. This might have to do with the degree of knowledge required by the different tests. Of the three tests used in the present study, the receptive VLT was easier not only than the Productive VLT but also than the VST. Nation and Beglar (2007, p.11) claim that the two receptive tests are not of equal difficulty and that the Vocabulary Size Test is slightly more demanding than the VLT. Learners' performance in the present study supported this claim: they answered correctly a greater proportion of items in the higher frequency levels of the Levels Test, and a greater number of learners displayed mastery. The two tests are therefore likely to be tapping different degrees of receptive knowledge. These results suggest that the frequency effect is relatively strong overall, but that it is stronger when the measurement requires a lower degree of knowledge. Frequency seems to have predictive power for the initial learning of words, which requires only limited knowledge. For deeper learning, frequency may not be the sole factor determining L2 vocabulary development, and other factors might be at play.

Learners' performance at the 2K level in the VST was lower than expected. While learners in studies 1 and 2 scored 95% or more at the 2K level in the VLT, the learners in study 3 scored around 75% at this level. It is interesting that learners with relatively large vocabularies of 5-6,000 words should still have gaps at the 2K level when the test requires more precision. Few learners reached the 100% ceiling, but the plateau seems to change with the test.

Conclusion

This study has investigated the vocabulary growth of advanced EFL learners in an academic, mainly incidental, learning context. The conclusion suggested by the three studies in this paper about the nature of L2 vocabulary development is that expansion of vocabulary size at advanced levels through academic study seems to be rather slow, even though there is some significant progress. Learners do not seem to add many new words to their vocabularies, nor do they seem to transfer many words from receptive to productive knowledge. Knowledge of academic vocabulary followed a similar pattern. The data further suggest the possibility of regression in receptive size when the level of involvement with the target language decreases. Thus, the data provided empirical support for the present learners' sense of deterioration in their lexical knowledge in the final year. For the middle years, it looks more like a misperception of slow growth as deterioration.

Frequency seems to have a stable overall effect in vocabulary development, whereby learners' knowledge changes linearly with frequency. However, for two of the three tests used, an implicational scale between the levels could not be established.

The present research attempted to provide a bigger and more accurate picture of vocabulary growth at the advanced level by using both cross-sectional and longitudinal data covering a longer time span than in previous studies, as well as by using multiple test instruments. It should be noted, however, that the tests measure vocabulary knowledge, not vocabulary use. Receptive and productive use of vocabulary involve knowledge and skills beyond vocabulary knowledge alone, such as guessing unknown words or identifying the referents of known words in receptive use, or using a word in a grammatically correct and pragmatically appropriate way in productive use.
It is possible for a learner to have a greater or smaller number of words that they can use receptively or productively than suggested here. Also, the results are valid only as far as written vocabulary is concerned and do not generalize to spoken vocabulary. The test instruments employ a rather restricted definition of vocabulary knowledge, limited to knowledge of the form and basic conceptual meaning of a word. Therefore, items answered correctly on the tests cannot be assumed to be known to the learners in any further depth.

The plateau in vocabulary expansion suggested in this paper might be characteristic of advanced vocabulary learning in academic learning situations, which is largely incidental. More research into the vocabulary development of L2 learners in different learning contexts (e.g. incidental learning contexts vs language courses; ESL vs EFL) and at different proficiency levels is needed. Growth in productive vocabulary also requires further research. However, we do not yet have a productive vocabulary test instrument, comparable to the Vocabulary Size Test or the EVST, that samples evenly from frequency levels in a modern frequency list. Vocabulary size targets also need to be identified for receptive and productive vocabulary through further research. The receptive target recommended for academic study in English is based on a study of the Dutch language (Hazenberg and Hulstijn, 1996), and there are no studies on productive targets. Reliable targets need to be established and announced to language learners, teachers and materials writers if vocabulary growth is to be maintained to the required levels.

The slow growth of the learners in the present study suggests that advanced learners in English-medium degree programmes do not learn a great number of words through academic study, and this does not seem to be confined to the context of the present study. To ensure greater growth in such contexts, larger reading and writing requirements as well as stronger enforcement of those requirements are advised. Also, extra support and guidance may be given to the learners in the form of an advanced vocabulary course. It would be wrong to expect vocabulary to take care of itself.

Notes

1. "Word" refers to "word family" throughout the text.

Acknowledgements

I am grateful to the students in the ELT Department of the Faculty of Education at Uludag University, Turkey, for sparing their time to provide the data for this study.

References

Alderson, J.C. (2005). Diagnosing Foreign Language Proficiency: The Interface Between Learning and Assessment. London: Continuum.
Beglar, D. (2009). A Rasch-based validation of the Vocabulary Size Test. Language Testing, 26(4), 1-22.
Blum, S., & Levenston, E. A. (1978). Universals of lexical simplification. Language Learning, 28, 399-415.
Cameron, L. (2002). Measuring vocabulary size in English as an additional language. Language Teaching Research, 6(2), 145-173.
Chung, T.M., & Nation, P. (2003). Technical vocabulary in specialized texts. Reading in a Foreign Language, 15(2), 103-116.
Cobb, T. http://www.lextutor.ca/tests/levels
Cobb, T. (1995). Imported tests: Analysing the task. Paper presented at TESOL (Arabia), Al-Ain, United Arab Emirates.
Cobb, T., & Horst, M. (2000). Vocabulary sizes of some City University students. City University (HK) Journal of Language Studies, 1, 59-68. Also retrievable at: http://www.lextutor.ca/cv/index.html#publications
Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34(2), 213-238.
Hazenberg, S., & Hulstijn, J.H. (1996). Defining a minimal receptive second-language vocabulary for non-native university students: An empirical investigation. Applied Linguistics, 17(2), 145-163.
Horst, M. (2005). Learning vocabulary through extensive reading: A measurement study. The Canadian Modern Language Review, 61(3), 355-382.
Horst, M., Cobb, T., & Meara, P. (1998). Beyond a clockwork orange: Acquiring second language vocabulary through reading. Reading in a Foreign Language, 11, 207-223.
Hyland, K., & Tse, P. (2007). Is there an "Academic Vocabulary"? TESOL Quarterly, 41(2), 235-253.
Laufer, B. (1989). What percentage of lexis is essential for comprehension? In C. Lauren & M. Nordman (Eds.), From Humans Thinking to Thinking Machines (pp. 316-323). Clevedon, UK: Multilingual Matters.
Laufer, B. (1991). How much lexis is necessary for reading comprehension? In P.J.L. Arnaud & H. Bejoint (Eds.), Vocabulary in Applied Linguistics (pp. 126-132). Basingstoke: Macmillan.
Laufer, B. (1992). Reading in a foreign language: How does L2 lexical knowledge interact with the reader's general academic ability? Journal of Research in Reading, 15, 95-103.
Laufer, B. (1998). The development of passive and active vocabulary in a second language: Same or different? Applied Linguistics, 19(2), 255-271.
Laufer, B., & Paribakht, T.S. (1998). The relationship between passive and active vocabularies: Effects of language learning context. Language Learning, 48(3), 365-391.
Laufer, B., & Nation, P. (1999). A vocabulary size test of controlled productive ability. Language Testing, 16(1), 33-51.
Laufer, B., Elder, C., Hill, K., & Congdon, P. (2004). Size and strength: Do we need both to measure vocabulary knowledge? Language Testing, 21(2), 202-226.
Lessard-Clouston, M. (2006). Breadth and depth: Specialized vocabulary learning in theology among native and non-native English speakers. The Canadian Modern Language Review, 63(2), 175-198.
Martinez, I.A., Beck, S.C., & Panza, C.B. (2009). Academic vocabulary in agriculture research articles: A corpus-based study. English for Specific Purposes, 28, 183-198.
Meara, P. (1992). EFL Vocabulary Tests. Swansea: _lognostics.
Meara, P. (2010). EFL Vocabulary Tests (2nd ed.). Swansea: _lognostics.
Meijer, A. (2006). Second International Conference on Integrating Content and Language in Higher Education. Journal of English for Academic Purposes, 5, 333-334.
Milton, J. (2007). Lexical profiles, learning styles and the construct validity of lexical size tests. In H. Daller, J. Milton & J. Treffers-Daller (Eds.), Modelling and Assessing Vocabulary Knowledge (pp. 47-58). Cambridge: Cambridge University Press.
Milton, J. (2009). Measuring Second Language Vocabulary Acquisition. Bristol: Multilingual Matters.
Milton, J., & Meara, P. (1995). How periods abroad affect vocabulary growth in a foreign language? ITL Review of Applied Linguistics, 107-108, 17-34.
Mondria, J.A., & Wit-de Boer, M. (1991). The effects of contextual richness on the guessability and retention of words in a foreign language. Applied Linguistics, 12(3), 249-267.
Nation, I.S.P. (1990). Teaching and Learning Vocabulary. Boston: Heinle & Heinle.
Nation, I.S.P. (2001). Learning Vocabulary in Another Language. Cambridge: Cambridge University Press.
Nation, I.S.P. (2006). How large a vocabulary is needed for reading and listening? The Canadian Modern Language Review, 63(1), 59-82.
Nation, I.S.P. http://www.victoria.ac.nz/lals/staff/paul-nation/nation.aspx
Nation, I.S.P., & Beglar, D. (2007). A vocabulary size test. The Language Teacher, 31(7), 9-12.
Nurweni, A., & Read, J. (1999). The English vocabulary knowledge of Indonesian university students. English for Specific Purposes, 18(2), 161-175.
Read, J. (1988). Measuring the vocabulary knowledge of second language learners. RELC Journal, 19(2), 12-25.
Read, J. (2000). Assessing Vocabulary. Cambridge, UK: Cambridge University Press.
Schmitt, N. (2010). Researching Vocabulary: A Vocabulary Research Manual. Basingstoke, UK: Palgrave Macmillan.
Schmitt, N., & Meara, P. (1997). Researching vocabulary through a word knowledge framework: Word associations and verbal suffixes. Studies in Second Language Acquisition, 20, 17-36.
Schmitt, N., Schmitt, D., & Clapham, C. (2001). Developing and exploring the behaviour of two new versions of the Vocabulary Levels Test. Language Testing, 18(1), 55-88.
Sutarsyah, C., Nation, P., & Kennedy, G. (1994). How useful is EAP vocabulary for ESP? A corpus based case study. RELC Journal, 25(2), 34-50.
Swain, M. (1985). Communicative competence: Some roles of comprehensible input and comprehensible output in its development. In S. Gass & C. Madden (Eds.), Input in Second Language Acquisition (pp. 235-256). New York: Newbury House.
Wang, J., Liang, S.I., & Ge, G.C. (2008). Establishment of a medical academic word list. English for Specific Purposes, 27, 442-458.
Ward, J. (1999). How large a vocabulary do EAP engineering students need? Reading in a Foreign Language, 12(2), 309-323.
Webb, S., & Rodgers, M.P.H. (2009a). Vocabulary demands of television programs. Language Learning, 59(2), 335-366.
Webb, S., & Rodgers, M.P.H. (2009b). The lexical coverage of movies. Applied Linguistics, 30(3), 407-427.
Xue, G., & Nation, I.S.P. (1984). A university word list. Language Learning and Communication, 3, 215-229.