Vocabulary Knowledge: Size and Strength Test

The 2008 Asia TEFL International Conference "Globalizing Asia: The Role of ELT". Sanur Paradise Plaza Hotel, Bali, Indonesia, August 1-3, 2008.

Hananto
hananto@uph.edu
English Department, Universitas Pelita Harapan

Abstract

This paper describes the trial of a bilingual paper-based vocabulary test of size and strength. The test was generated by a computer-based vocabulary-size item bank/test named Vocabulary Item Bank of English (VIBE), based on Coxhead’s (2000) Academic Word List. The VIBE test has a double task measuring two different strengths of word knowledge: active recall and passive recognition. The active recall task requires the test takers to supply one missing letter of the English target word, and the passive recognition task asks them to select the meaning of the target word in Indonesian. Because of the double task, three different scoring systems are possible, based on: (1) the active recall task, (2) the passive recognition task, and (3) both tasks. This paper focuses on the difference between (1) and (2) by comparing their central tendency (i.e. the means). Scores (1) and (2) had a moderate correlation (0.77), and the t-test results show a significant difference between the two scoring systems. The active recall score was significantly higher than the passive recognition score, meaning that the active recall task was easier than the passive recognition task. This suggests that there may be a different strength hierarchy of word knowledge than the one proposed by Laufer and Goldstein (2004).

Word knowledge and its measurement

Word knowledge has been defined in several different ways. Some suggest that it should be seen as a taxonomy of components. An influential statement along these lines was produced by Richards (1976) and elaborated by Nation (1990).
Nation proposed a list of multi-component word knowledge covering spelling, pronunciation, grammatical form, relative frequency, collocation and restrictions on the use of the word, as well as the distinction between receptive and productive knowledge. Nation’s componential framework has been used in many vocabulary assessments.

From an assessment perspective, it appears impossible to cover all components in one test because of time constraints and the unavailability of adequate measures (Schmitt et al. 2001). Most vocabulary tests based on Nation’s components of vocabulary knowledge measure just one of the sub-knowledges. When just one (or two) sub-knowledges are tested, it is possible to test a large number of lexical items, so the test can claim to represent the learner’s total vocabulary. Such tests are called vocabulary “breadth” or “size” tests. This type of vocabulary test usually focuses only on the relationship between word form and word meaning.

Other tests attempt to measure several sub-knowledges (Read 1998). Vocabulary tests that measure each lexical item for several areas of knowledge are known as vocabulary “depth” tests. The limitation of these tests is that it is not feasible to cover a large number of items; consequently, the items do not constitute a representative range of the target items.

Both vocabulary-size and vocabulary-depth tests treat vocabulary knowledge as the knowledge of discrete word items, independent of the contexts in which they appear. Item types such as multiple-choice, word-definition matching (Beglar and Hunt 1999, Sutarsyah 2003, Nation 1983), word completion (Laufer and Nation 1999) and the checklist (Meara 1992) fit within what Chapelle (1998) refers to as the trait view.

Degrees of word knowledge

Because knowing a word involves degrees of knowledge, there are different degrees of strength of word knowledge.
The most widely recognized division on a scale of degrees of vocabulary knowledge is the receptive (passive) and productive (active) distinction. Melka (1997) points out that there has been no consistency in the way that the two types of vocabulary knowledge have been measured. Another common distinction is between recognition and recall. Recognition here means that test-takers are presented with some choices and are asked to select the target word, whereas in recall they are provided with some stimulus and are asked to supply the target word from memory.

More recently, Laufer and Goldstein (2004) and Laufer et al. (2004) tried to overcome the confusion between receptive (passive) and productive (active) vocabulary knowledge by distinguishing four degrees of word knowledge (Table 1).

Table 1: Degrees of Word Knowledge

              Active (retrieval of form)   Passive (retrieval of meaning)
Recall        Supply the L2 word           Supply the L1 word
Recognition   Select the L2 word           Select the L1 word

Laufer and Goldstein (2004) and Laufer et al. (2004) hypothesize that the four degrees of strength constitute a hierarchy of difficulty as follows (from easiest to hardest):

1. Recognizing a word meaning (passive recognition)
2. Recognizing a word form (active recognition)
3. Recalling a word meaning (passive recall)
4. Recalling a word form (active recall)

They believe that the ability to recognize words, whether passively or actively, will generally precede the ability to recall them, and that recall of meaning will precede recall of form.

The study

The purpose of this study was twofold: (1) to investigate the hierarchy of word knowledge strength above and (2) to examine the relationship between the strengths. This study, however, investigated only passive recognition and active recall. The research questions were as follows:

1. Is the knowledge of recalling a word form stronger than that of recognizing a word meaning?
2. How strong is the relationship between the two?

The answers to these questions will help teachers and learners decide whether to focus their effort on knowledge of word form or of word meaning.

Methodology

The instrument used was the Vocabulary Item-Bank of English (VIBE) (Hananto 2007). VIBE is a computer-based lexical tool designed both for vocabulary-size testing and for word-list learning, based on West’s (1953) General Service List and Coxhead’s (2000) Academic Word List (downloadable at http\www\l-pis.com). In this study, VIBE was used to generate 40 items randomly from the Academic Word List (see appendix). The test was administered in a pencil-and-paper matching format in which the test takers had to do two tasks: (1) filling in the missing letter (a very sensitive active recall task) and (2) writing a number that indicated the meaning of the target word (passive recognition).

The test was administered to 156 Universitas Pelita Harapan (UPH) students. They ranged in ability from intermediate to advanced and were enrolled in the Academic Reading course (short semester, 2007-2008 academic year). They were divided into five classes.

Based on the two test tasks, the test-takers were given three different scores as follows:

1. Word-Form score (WF Score), which was based on the active recall task;
2. Word-Meaning score (WM Score), which was based on the passive recognition task;
3. Word-Form and Word-Meaning score (WF&WM Score), based on both tasks.

The WF score and the WM score were then compared using a t-test to determine whether the difference between the two scores was statistically significant. Additionally, the two scores were correlated using the Pearson product-moment correlation to estimate the degree of their relationship.

Results

The descriptive statistics and the internal consistency (Cronbach's alpha) of the test are shown in Table 2. It should be noted that the internal consistency of the VIBE depended on the sample drawn from the item bank.
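As a minimal sketch, the scoring and analysis steps named in the Methodology could be implemented along the following lines. The function names and the toy response data are hypothetical. The rule that an item counts toward the WF&WM score only when both tasks are answered correctly is an assumption; the paper states only that the third score is based on both tasks. The paired form of the t-test is likewise an assumption, chosen because every test taker contributes both a WF and a WM score.

```python
import math
import statistics


def score_test(responses):
    """Return (WF, WM, WF&WM) percentage scores for one test taker.

    `responses` holds one (form_correct, meaning_correct) pair per item.
    Assumption: an item counts toward WF&WM only if both tasks are correct.
    """
    n = len(responses)
    wf = sum(f for f, _ in responses)
    wm = sum(m for _, m in responses)
    both = sum(f and m for f, m in responses)
    return tuple(100.0 * x / n for x in (wf, wm, both))


def cronbach_alpha(item_matrix):
    """Cronbach's alpha for a takers-by-items matrix of 0/1 item scores."""
    k = len(item_matrix[0])
    item_vars = [statistics.variance(col) for col in zip(*item_matrix)]
    total_var = statistics.variance([sum(row) for row in item_matrix])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def paired_t(xs, ys):
    """Paired-samples t statistic and its degrees of freedom (n - 1)."""
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two score lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data: 4 test takers, 4 items each (not the study data).
toy = [
    [(True, True), (True, False), (True, True), (False, False)],
    [(True, True), (True, True), (True, True), (True, False)],
    [(True, True), (False, True), (True, False), (False, False)],
    [(True, False), (True, True), (True, False), (False, False)],
]
scores = [score_test(r) for r in toy]
wf = [s[0] for s in scores]
wm = [s[1] for s in scores]
wf_matrix = [[int(f) for f, _ in r] for r in toy]

t_value, df = paired_t(wf, wm)
print("WF scores:", wf)
print("WM scores:", wm)
print("alpha (WF items):", cronbach_alpha(wf_matrix))
print("paired t:", t_value, "df:", df)
print("Pearson r:", pearson_r(wf, wm))
```

With real data, each row of `toy` would be replaced by one test taker's 40 item-level results, and the degrees of freedom returned by `paired_t` would equal the number of test takers minus one.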
Table 2: Descriptive statistics and internal consistency

                                     Min   Max   Mean   SD     r
WF Score (%) (Active Recall)         48    100   85.5   11.3   0.83
WM Score (%) (Passive Recognition)   30    100   74.6   16.3   0.88
WF&WM Score (%)                      25    100   69.5   17.0   0.88

The first research question asked whether the knowledge of recalling a word form was stronger than that of recognizing a word meaning, as proposed by Laufer and Goldstein (2004). The results in Table 2 show that the answer was negative. The central tendencies (i.e. the means) of the WF score (85.5) and the WM score (74.6) were clearly different. The t-test result shows that the means of the two scoring systems were significantly different (t = 13.142, df = 155, p < .001).

The second research question concerned the relationship between the WF score and the WM score. Although the two scores were significantly and positively correlated, the correlation was only moderate (0.77). This correlation coefficient was lower than the one Laufer and Goldstein (2004) found in their study.

Discussion

During various tryouts of the VIBE test, both computer-based and paper-based, results similar to those of the present study were found: the WF scores were always higher than the WM scores. The differences between the WF scores and the WM scores reveal an interesting phenomenon, which may have a profound implication for the strength hierarchy of word knowledge hypothesis.

The result did not support the strength hierarchy of word knowledge proposed by Laufer and Goldstein (2004). According to their hypothesis, the word-meaning scores should have been higher than the word-form scores. In the present study, however, the opposite was true: the WF score was significantly higher than the WM score; in other words, active recall was easier than passive recognition.

The conflicting findings of this study and Laufer and Goldstein’s study might be attributed to two variables: (1) the sensitivity of the tests used and (2) the proficiency levels of the participants.
First, the test used in Laufer and Goldstein’s study was less sensitive (i.e. more difficult) than the VIBE test used in this study. Second, the participants in Laufer and Goldstein’s study were ESL learners residing in English-medium countries, with sufficient opportunities to use English actively in both speech and writing, while in this study the subjects were mostly intermediate-level EFL students who had very limited opportunities to use English actively outside the classroom. The lower proficiency level of the participants in this study was evidenced by their relatively low mean scores.

Lower-proficiency students and more advanced students might have different strength hierarchies of word knowledge. In advanced students, knowledge of word meanings may be stronger than knowledge of word forms, as Laufer and Goldstein found in their study. For lower-proficiency students, however, knowledge of word forms seemed to be stronger than knowledge of word meanings. They may know the English word forms without necessarily knowing their meanings.

The higher word-form scores than word-meaning scores in the present study have led the author to propose a different strength hierarchy of word knowledge for lower-proficiency learners, as follows (from the strongest to the weakest):

1. Selecting the L2 target word
2. Selecting the equivalent L1 word
3. Supplying the L2 target word
4. Supplying the L1 equivalent

This alternative strength hierarchy, however, is only a speculation, especially the second and third positions (i.e. passive recognition and active recall respectively). The findings in this study did not shed light on the second and third positions, which might be reversed or reordered. There is clearly a need for further research in this area, since this study was not designed to investigate all four strengths in the hierarchy of word knowledge.
EFL teachers and learners need to pay attention not only to word forms but also to word meanings. EFL learners tend to get a lot of exposure to word forms without necessarily being exposed to their meanings, for example, through guessing the meaning from the context. Some word games (such as Scrabble and Hangman) also focus on word forms without any reference to word meanings. Additionally, some language competitions (e.g. Spelling Bee) motivate the study of word forms while ignoring word meanings.

Works Cited

Beglar, D., & Hunt, A. (1999). Revising and Validating the 2000 Level and University Word Level Vocabulary Test. Language Testing, 16(2), 131-162.
Chapelle, C. A. (1998). Construct Definition and Validity Inquiry in SLA Research. In L. F. Bachman & A. D. Cohen (Eds.), Interfaces Between Second Language Acquisition and Language Testing Research (pp. 32-70). Cambridge: Cambridge University Press.
Coxhead, A. (2000). A New Academic Word List. TESOL Quarterly, 34, 213-238.
Hananto. (2007). Developing and Validating a Computer-Based Vocabulary-Size Test. Unpublished dissertation. Unika Atma Jaya, Jakarta.
Laufer, B., & Goldstein, Z. (2004). Testing Vocabulary Knowledge: Size, Strength, and Computer Adaptiveness. Language Learning, 54(3), 399-436.
Laufer, B., Elder, C., Hill, K., & Congdon, P. (2004). Size and Strength: Do We Need Both to Measure Vocabulary Knowledge? Language Testing, 21(2), 202-226.
Meara, P. (1992). EFL Vocabulary Tests. Swansea: Centre for Applied Language Studies, University of Wales.
Melka, F. (1997). Receptive vs. Productive Aspects of Vocabulary. In N. Schmitt & M. McCarthy (Eds.), Vocabulary: Description, Acquisition and Pedagogy (pp. 84-102). Cambridge: Cambridge University Press.
Nation, I. S. P. (1983). Testing and Teaching Vocabulary. Guidelines (RELC supplement), 5, 12-25.
Nation, I. S. P. (1990). Teaching and Learning Vocabulary. New York: Heinle and Heinle.
Read, J. (1998). Validating a Test to Measure Depth of Vocabulary Knowledge. In A. J. Kunnan (Ed.), Validation in Language Assessment (pp. 41-59). Mahwah, NJ: Lawrence Erlbaum.
Richards, J. (1976). The Role of Vocabulary Teaching. TESOL Quarterly, 10(1), 77-89.
Schmitt, N., Schmitt, D., & Clapham, C. (2001). Developing and Exploring the Behaviour of Two New Versions of the Vocabulary Levels Test. Language Testing, 18(1), 55-88.
Sutarsyah, C. (2003). Word-Definition Matching Format (A Vocabulary Level Test for EFL Learners). Paper presented at the 51st TEFLIN International Conference 2003, Bandung.
West, M. (1953). A General Service List of English Words. London: Longman.

Appendix: The paper-based VIBE test used.

NAMA [Name]: ………………………………..                VIBE_PBT_Matching_1

Isi SATU HURUF yang hilang dan ANGKA yang menunjukkan artinya.
[Fill in the ONE missing LETTER and the NUMBER that indicates the meaning.]

                 Contoh Soal [Example item]:   Contoh Jawaban [Example answer]:
1 buku           ho _ se (…)                   ho u se (3)
2 kucing         b _ ok (…)                    b o ok (1)
3 rumah          c _ t (…)                     c a t (2)

Academic-Word Level

 1 sebelumnya                     COMP _ TIBLE       (…)
 2 cocok, sesuai                  DE _ OTE           (…)
 3 mencari                        ET _ IC            (…)
 4 prinsip-prinsip moral          GUAR _ NTEE        (…)
 5 melukai, merugikan             IN _ URE           (…)
 6 menetapkan, menentukan         OVE _ ALL          (…)
 7 merupakan, menunjukkan         PHIL _ SOPHY       (…)
 8 menyeluruh, keseluruhan        PR _ OR            (…)
 9 filsafat                       S _ EK             (…)
10 jaminan, menjamin              SPE _ IFY          (…)

 1 tahunan, setiap tahun          AN _ UAL           (…)
 2 ciri-ciri, keistimewaan        COMP _ NENT        (…)
 3 memudahkan                     DIFFER _ NTIATE    (…)
 4 menurut hukum, sah             DIV _ RSE          (…)
 5 berhubungan dgn. pengobatan    EV _ LVE           (…)
 6 mengeluarkan, meniadakan       EXC _ UDE          (…)
 7 bagian, barang pelengkap       FACI _ ITATE       (…)
 8 membedakan                     FEA _ URE          (…)
 9 bermacam-macam                 LE _ AL            (…)
10 berubah, berkembang            MED _ CAL          (…)

 1 cari-ciri khusus               ANA _ OGY          (…)
 2 tubrukan; pengaruh yang kuat   COL _ APSE         (…)
 3 bidang, kawasan                CON _ ACT          (…)
 4 alasan, sebab                  IM _ ACT           (…)
 5 pengganti kerugian             MO _ IVE           (…)
 6 utama                          OF _ SET           (…)
 7 hubungan; berhubungan dengan   PARA _ ETER        (…)
 8 persamaan, perbandingan        PER _ EIVE         (…)
 9 merasa                         PRI _ ARY          (…)
10 runtuh, hancur, gagal, jatuh   SE _ TOR           (…)

 1 disebut                        HIER _ RCHY        (…)
 2 jarak; bergerak, bergeser      IN _ URE           (…)
 3 menirukan                      PRACT _ TIONER     (…)
 4 meneruskan                     PRO _ ECT          (…)
 5 melarang, menghalangi          PRO _ EED          (…)
 6 sistem tingkatan status        PRO _ IBIT         (…)
 7 rancangan, rencana             RA _ GE            (…)
 8 jenis kelamin, perkelaminan    S _ X              (…)
 9 melukai, merugikan             SIM _ LATE         (…)
10 pelaksana, pelaku              SO-C _ LLED        (…)
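
The item format shown in this appendix can be illustrated with a short sketch. It is not VIBE's actual generation procedure, which the paper does not describe beyond random selection from the word list. The function names, the random choice of which letter to gap, the shuffling of the glosses, and the alphabetical ordering of the target words are all assumptions. The word-gloss pairs are the three example items from the test instructions.

```python
import random

# Example items taken from the test instructions (English word, Indonesian gloss).
EXAMPLE_ENTRIES = [("house", "rumah"), ("book", "buku"), ("cat", "kucing")]


def make_gapped_item(word, rng):
    """Replace one randomly chosen letter of `word` with ' _ '.

    Returns the gapped string and the letter that was removed.
    """
    positions = [i for i, ch in enumerate(word) if ch.isalpha()]
    i = rng.choice(positions)
    return word[:i] + " _ " + word[i + 1:], word[i]


def build_block(entries, rng):
    """Build one matching block from (word, gloss) pairs.

    Assumes the glosses within a block are unique. Returns
    (numbered_glosses, items); each item is
    (gapped_word, missing_letter, number_of_correct_gloss).
    """
    glosses = [g for _, g in entries]
    rng.shuffle(glosses)  # hypothetical: present the glosses in random order
    numbered = list(enumerate(glosses, start=1))
    number_of = {g: n for n, g in numbered}
    items = []
    for word, gloss in sorted(entries):  # target words in alphabetical order
        gapped, letter = make_gapped_item(word.upper(), rng)
        items.append((gapped, letter, number_of[gloss]))
    return numbered, items


numbered, items = build_block(EXAMPLE_ENTRIES, random.Random(0))
for n, gloss in numbered:
    print(n, gloss)
for gapped, letter, n in items:
    print(f"{gapped} (…)   key: letter {letter}, meaning {n}")
```

Each printed item corresponds to one row of the paper form: the test taker supplies the missing letter (active recall) and writes the number of the matching gloss (passive recognition).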