Vocabulary Knowledge: Size and Strength Test
The 2008 Asia TEFL International Conference
"Globalizing Asia: The Role of ELT".
Sanur Paradise Plaza Hotel, Bali, Indonesia
August 1-3, 2008.
Hananto
hananto@uph.edu
English Department
Universitas Pelita Harapan
Abstract
This paper describes the trial of a bilingual paper-based vocabulary test of
size and strength. The test was generated by a computer-based vocabulary-size
item bank/test named Vocabulary Item Bank of English (VIBE), based
on Coxhead’s (2000) Academic Word List. The VIBE test has a double
task measuring two different strengths of word knowledge: active recall
and passive recognition. The active recall task requires the test takers to
supply one missing letter of the English target word, and the passive
recognition task asks them to select the meaning of the target word in Indonesian.
Because of the double task, three different scoring systems are possible,
based on: (1) the active recall task, (2) the passive recognition task, and
(3) both tasks combined. This paper focuses on the difference between (1)
and (2) by comparing their central tendencies (i.e. the means). Scores (1) and (2)
have a moderate correlation (0.77), and a t-test shows a
significant difference between the two scoring systems. The active recall
score was significantly higher than the passive recognition score,
meaning that the active recall task was easier than the passive recognition task.
This suggests that there may be a different strength hierarchy of word
knowledge than the one proposed by Laufer and Goldstein (2004).
Word knowledge and its measurement
Word knowledge has been defined in several different ways. Some suggest that it
should be seen as a taxonomy of components. An influential statement along these lines
was produced by Richards (1976) and elaborated by Nation (1990). Nation proposed a
list of multi-component word knowledge covering spelling, pronunciation, grammatical
form, relative frequency, collocation and restrictions on the use of the word, as well as
the distinction between receptive and productive knowledge.
Nation’s componential framework has been used in many vocabulary
assessments. From an assessment perspective, it appears impossible to cover all
components in one test because of time constraints and the unavailability of adequate
measures (Schmitt, Schmitt, and Clapham 2001). Most vocabulary tests based on Nation’s components of
vocabulary knowledge measure just one of the sub-knowledges. When just one (or
two) sub-knowledge is tested, it is possible to test a large number of lexical items.
Therefore, the test can claim to represent the learner’s total vocabulary. Such tests are
called vocabulary “breadth” or “size” tests. This type of vocabulary test usually focuses
only on the relationship between word form and word meaning.
Other tests attempt to measure several sub-knowledges (Read 1998). Vocabulary
tests that measure each lexical item for several areas of knowledge are known as
vocabulary “depth” tests. The limitation of these tests is that it is not feasible to cover a
large number of items; consequently, the items do not constitute a representative sample
of the target vocabulary.
Both vocabulary-size and vocabulary-depth tests treat vocabulary knowledge
as knowledge of discrete word items, independent of the contexts in which they appear. Item
types such as multiple-choice, word-definition matching (Beglar and Hunt 1999,
Sutarsyah 2003, Nation 1983), word completion (Laufer and Nation 1999) and the
checklist (Meara 1992) fit nicely within what Chapelle (1998) refers to as the trait
view.
Degrees of word knowledge
Because knowing a word is not an all-or-nothing matter, word knowledge can be held at
different degrees of strength. The most widely recognized division on a scale
of degrees of vocabulary knowledge is the distinction between receptive (passive) and
productive (active) knowledge. Melka (1997) points out that there has been no consistency in the way that the
two types of vocabulary knowledge have been measured.
Another common distinction is that between recognition
and recall. Recognition means that test takers are presented with some choices and
are asked to select the target word, whereas in recall they are provided with
some stimulus and are asked to supply the target word from memory.
More recently, Laufer and Goldstein (2004) and Laufer et al. (2004) attempt to
overcome the confusion between receptive (passive) and productive (active) vocabulary
knowledge by distinguishing four degrees of word knowledge (Table 1).
Table 1: Degrees of Word Knowledge

                 Active (retrieval of form)    Passive (retrieval of meaning)
  Recall         Supply the L2 word            Supply the L1 word
  Recognition    Select the L2 word            Select the L1 word
Laufer and Goldstein (2004) and Laufer et al. (2004) hypothesize that the four degrees of
strength constitute a hierarchy of difficulty as follows (from easiest to hardest):
1. Recognizing a word meaning (passive recognition)
2. Recognizing a word form (active recognition)
3. Recalling a word meaning (passive recall)
4. Recalling a word form (active recall)
They believe that the ability to recognize words, whether passively or actively,
will generally precede the ability to recall them, and that recall of meaning will precede
the recall of form.
The study
The purpose of this study was twofold: (1) to investigate the hierarchy of word
knowledge strength described above and (2) to examine the relationship between the
degrees of strength. This study, however, investigated only passive recognition and active recall.
The research questions were as follows:
1. Is the knowledge of recalling a word form stronger than that of recognizing a
word meaning?
2. To what extent are the two related?
The answers to these questions will help teachers and learners decide whether to focus
their effort on knowledge of word forms or of word meanings.
Methodology
The instrument used was the Vocabulary Item-Bank of English (VIBE) (Hananto
2007). VIBE is a computer-based lexical tool designed both for vocabulary-size testing and
for word-list learning, based on West’s (1953) General Service List and Coxhead’s (2000)
Academic Word List (downloadable at http://www.l-pis.com). In this study, VIBE was
used to generate 40 items randomly from the Academic Word List (see appendix). The
test was administered in a pencil-and-paper matching format in which the test takers had to
do two tasks: (1) filling in the missing letter (a very sensitive form of active recall) and (2)
writing the number that indicated the meaning of the target word (passive recognition).
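A minimal sketch of how such a double-task matching block could be assembled is given below. It is not VIBE's actual generation algorithm; the gloss dictionary, block size, and letter-blanking rule are illustrative assumptions only.

```python
import random

# Hypothetical excerpt of the Academic Word List with Indonesian glosses;
# the real VIBE item bank is larger and stored differently.
AWL_GLOSSES = {
    "compatible": "cocok, sesuai",
    "denote": "merupakan, menunjukkan",
    "ethic": "prinsip-prinsip moral",
    "guarantee": "jaminan, menjamin",
    "injure": "melukai, merugikan",
    "overall": "menyeluruh, keseluruhan",
    "philosophy": "filsafat",
    "prior": "sebelumnya",
    "seek": "mencari",
    "specify": "menetapkan, menentukan",
}

def make_block(glosses, block_size=10, seed=None):
    """Build one matching block: a numbered, shuffled list of L1 meanings,
    plus L2 stems with one internal letter blanked (active recall) and a
    (...) slot for the meaning number (passive recognition)."""
    rng = random.Random(seed)
    words = rng.sample(sorted(glosses), k=min(block_size, len(glosses)))
    shuffled = words[:]
    rng.shuffle(shuffled)                    # scramble the meaning order
    number_of = {w: i + 1 for i, w in enumerate(shuffled)}

    meaning_lines = [f"{i + 1} {glosses[w]}" for i, w in enumerate(shuffled)]
    item_lines = []
    for w in sorted(words):                  # stems listed alphabetically
        gap = rng.randrange(1, len(w) - 1)   # blank one internal letter
        stem = (w[:gap] + " _ " + w[gap + 1:]).upper()
        item_lines.append(f"{stem}  (...)")
    answer_key = {w: number_of[w] for w in sorted(words)}
    return meaning_lines, item_lines, answer_key

meanings, items, key = make_block(AWL_GLOSSES, seed=1)
print("\n".join(meanings))
print("\n".join(items))
```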
The test was administered to 156 Universitas Pelita Harapan (UPH) students.
They ranged in ability from intermediate to advanced and were enrolled in an Academic
Reading course (short semester, 2007-2008 academic year). They were divided into five
classes.
Based on the two test tasks, the test-takers were given three different scores as follows:
1. Word-Form score (WF Score), which was based on the active recall task;
2. Word-Meaning score (WM Score), which was based on the passive recognition
task;
3. Word-Form and Word-Meaning score (WF&WM Score), based on both tasks.
The WF score and the WM score were then compared using a paired t-test to determine
whether the difference between the two scores was statistically significant. Additionally, the two
scores were correlated using the Pearson product-moment correlation to estimate
the degree of their relationship.
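The analysis step can be sketched as follows. The item-level response matrices here are invented placeholders (the real data are not reproduced in this paper), scipy is assumed for the paired t-test and Pearson correlation, and the "both tasks correct" rule for the WF&WM score is an assumption, since the paper does not spell out the combination rule.

```python
import numpy as np
from scipy import stats

# Placeholder item-level responses (invented for illustration only):
# rows = 156 test takers, columns = 40 items.
# wf[i, j] = 1 if taker i supplied the missing letter correctly (active recall);
# wm[i, j] = 1 if taker i selected the correct Indonesian meaning (passive recognition).
rng = np.random.default_rng(0)
n_takers, n_items = 156, 40
wf = (rng.random((n_takers, n_items)) < 0.85).astype(int)
wm = (rng.random((n_takers, n_items)) < 0.75).astype(int)

# The three scoring systems, expressed as percentages per test taker.
wf_score = wf.mean(axis=1) * 100            # WF Score (active recall task)
wm_score = wm.mean(axis=1) * 100            # WM Score (passive recognition task)
wfwm_score = (wf & wm).mean(axis=1) * 100   # WF&WM Score: assumes credit only
                                            # when both tasks are correct

# Paired t-test on the same 156 takers (df = n - 1 = 155).
t_stat, p_val = stats.ttest_rel(wf_score, wm_score)

# Pearson product-moment correlation between the two scoring systems.
r, r_p = stats.pearsonr(wf_score, wm_score)

print(f"WF mean = {wf_score.mean():.1f}, WM mean = {wm_score.mean():.1f}")
print(f"paired t = {t_stat:.3f}, p = {p_val:.3g}; Pearson r = {r:.2f}")
```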
Results
The descriptive statistics and the internal consistency (Cronbach's alpha) of the test
are shown in Table 2. It should be noted that the internal consistency of the VIBE test
depended on the particular sample of items drawn from the item bank.
Table 2: Descriptive statistics and internal consistency (Cronbach's alpha)

                                        Min    Max    Mean    SD      α
  WF Score (%) (Active Recall)           48    100    85.5    11.3    0.83
  WM Score (%) (Passive Recognition)     30    100    74.6    16.3    0.88
  WF&WM Score (%)                        25    100    69.5    17.0    0.88
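The alpha values in Table 2 can be computed directly from the item-level 0/1 score matrices using the standard formula. The sketch below is a generic implementation, not VIBE's own code; the commented usage refers to the placeholder matrices from the previous sketch.

```python
import numpy as np

def cronbach_alpha(item_matrix):
    """Cronbach's alpha for a takers-by-items score matrix:
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))."""
    x = np.asarray(item_matrix, dtype=float)
    k = x.shape[1]
    item_variances = x.var(axis=0, ddof=1).sum()
    total_variance = x.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# With the real response data this would yield the alphas reported in Table 2;
# here it can be tried on the placeholder matrices from the previous sketch:
# print(round(cronbach_alpha(wf), 2))   # alpha for the WF (active recall) items
# print(round(cronbach_alpha(wm), 2))   # alpha for the WM (passive recognition) items
```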
The first research question asked whether the knowledge of recalling a word form
was stronger than that of recognizing a word meaning, as proposed by Laufer and
Goldstein (2004). The results in Table 2 show that the answer was negative. The central
tendencies (i.e. the means) of the WF score (85.5) and the WM score (74.6) were clearly
different, and the t-test shows that the means of the two scoring systems were
significantly different (t = 13.142, df = 155, p < .001).
The second research question asked about the relationship between the WF score and the
WM score. Although the two scores were significantly and positively correlated, the
correlation was only moderate (0.77). This correlation coefficient was lower than what
Laufer and Goldstein (2004) found in their study.
Discussion
During various tryouts of the VIBE test, both computer-based and paper-based,
results similar to those of the present study were found: the WF scores were always higher than
the WM scores. The difference between the WF and WM scores reveals an
interesting phenomenon and may have profound implications for the strength hierarchy of
word knowledge hypothesis.
The results did not support the strength hierarchy of word knowledge proposed by
Laufer and Goldstein (2004). According to their strength hierarchy hypothesis, the word-meaning
scores should have been higher than the word-form scores. In the present study,
however, the opposite was true: the WF score was significantly higher than the WM
score; in other words, active recall was easier than passive recognition.
The conflicting findings of this study and Laufer and Goldstein's study might be
attributed to two variables: (1) the sensitivity of the tests used and (2) the proficiency
levels of the participants. First, the test used in Laufer and Goldstein's study was less
sensitive (i.e. more difficult) than the VIBE test used in this study.
Second, the participants in Laufer and Goldstein's study were ESL learners
residing in English-medium countries, with ample opportunities to use English actively
in both speech and writing, whereas the subjects in this study were mostly intermediate-level
EFL students who had very limited opportunities to use English actively outside the
classroom. The lower proficiency level of the participants in this study was evidenced
by their relatively low mean scores.
Lower-proficiency students and more advanced students might have different
strength hierarchies of word knowledge. For advanced students, knowledge of word meanings
may be stronger than knowledge of word forms, as Laufer and Goldstein found
in their study. For lower-proficiency students, however, knowledge of word forms
seemed to be stronger than knowledge of word meanings: they may know the English
word forms without necessarily knowing their meanings.
The fact that the word-form scores were higher than the word-meaning scores in the present study
has led the author to propose a different strength hierarchy of word knowledge for
lower-proficiency learners, as follows (from the strongest to the weakest):
1. Selecting the L2 target word
2. Selecting the equivalent L1 word
3. Supplying the L2 target word
4. Supplying the meaning in the L1 equivalent
This alternative strength hierarchy of word knowledge, however, is only speculative,
especially the second and third ranks (i.e. passive recognition and active
recall, respectively). The findings of this study did not shed light on the second and third
ranks, which might be reversed or reordered. There is clearly a need for further research in
this area, since this study was not specifically designed to investigate all four degrees of the
strength hierarchy of word knowledge.
EFL teachers and learners need to pay more attention not only to word forms but
also to word meanings. EFL learners tend to get a great deal of exposure to word forms without
necessarily being exposed to their meanings, which would come, for example, from
guessing the meaning from context. Some word games (such as Scrabble and Hangman) also focus on word forms
without any reference to word meanings. Additionally, some language competitions (e.g.
the Spelling Bee) reward the study of word forms while ignoring word meanings.
Works Cited
Beglar, D., & Hunt, A. (1999). Revising and Validating the 2000 Level and University
Word Level Vocabulary Test. Language Testing, 16(2), 131-162.
Chapelle, C. A. (1998). Construct Definition and Validity Inquiry in SLA Research. In L.
F. Bachman & A. D. Cohen (Eds.), Interfaces Between Second Language
Acquisition and Language Testing Research. (pp. 32-70). Cambridge: Cambridge
University Press.
Coxhead, A. (2000). A New Academic Word List. TESOL Quarterly, 34, 213-238.
Hananto. (2007). Developing and Validating a Computer-Based Vocabulary-Size Test.
Unpublished dissertation, Unika Atma Jaya, Jakarta.
Laufer, B., & Goldstein, Z. (2004). Testing Vocabulary Knowledge: Size, Strength, and
Computer Adaptiveness. Language Learning, 54(3), 399-436.
Laufer, B., Elder, C., Hill, K., & Congdon, P. (2004). Size and Strength: Do We Need
Both to Measure Vocabulary Knowledge? Language Testing, 21(2), 202-226.
Laufer, B., & Nation, P. (1999). A Vocabulary-Size Test of Controlled Productive
Ability. Language Testing, 16(1), 33-51.
Meara, P. (1992). EFL Vocabulary Tests. Swansea: Centre for Applied Language Studies,
University of Wales.
Melka, F. (1997). Receptive vs. Productive Aspects of Vocabulary. In N. Schmitt & M.
McCarthy (Eds.), Vocabulary: Description, Acquisition and Pedagogy (pp. 84-102).
Cambridge: Cambridge University Press.
Nation, I. S. P. (1983). Testing and Teaching Vocabulary. Guidelines (RELC
supplement), 5, 12-25.
Nation, I. S. P. (1990). Teaching and Learning Vocabulary. New York: Heinle and
Heinle.
Read, J. (1998). Validating a Test to Measure Depth of Vocabulary Knowledge. In A. J.
Kunnan (Ed.), Validation in Language Assessment (pp. 41-59). Mahwah, NJ:
Lawrence Erlbaum.
Richards, J. (1976). The Role of Vocabulary Teaching. TESOL Quarterly, 10 (1), 77-89.
Schmitt, N., Schmitt, D., & Clapham, C. (2001). Developing and Exploring the
Behaviour of Two New Versions of the Vocabulary Levels Test. Language
Testing, 18(1), 55-88.
Sutarsyah, C. (2003). Word-Definition Matching Format (A Vocabulary Level Test for
EFL Learners). Paper presented at the 51st TEFLIN International Conference
2003, Bandung.
West, M. (1953). A General Service List of English Words. London: Longman.
Appendix: The paper-based VIBE test used.
NAMA (Name): ………………………………..
VIBE_PBT_Matching_1
Isi SATU HURUF yang hilang dan ANGKA yang menunjukkan artinya.
[Fill in the ONE missing letter and the NUMBER that indicates the meaning.]
Contoh Soal (Example question):            Contoh Jawaban (Example answer):
1 buku      = ho _ se   (…)                = ho u se   (3)
2 kucing    = b _ ok    (…)                = b o ok    (1)
3 rumah     = c _ t     (…)                = c a t     (2)
Academic-Word Level

Block 1
Meanings:
1 sebelumnya
2 cocok, sesuai
3 mencari
4 prinsip-prinsip moral
5 melukai, merugikan
6 menetapkan, menentukan
7 merupakan, menunjukkan
8 menyeluruh, keseluruhan
9 filsafat
10 jaminan, menjamin
Items:
COMP _ TIBLE (…)
DE _ OTE (…)
ET _ IC (…)
GUAR _ NTEE (…)
IN _ URE (…)
OVE _ ALL (…)
PHIL _ SOPHY (…)
PR _ OR (…)
S _ EK (…)
SPE _ IFY (…)

Block 2
Meanings:
1 tahunan, setiap tahun
2 ciri-ciri, keistimewaan
3 memudahkan
4 menurut hukum, sah
5 berhubungan dgn. pengobatan
6 mengeluarkan, meniadakan
7 bagian, barang pelengkap
8 membedakan
9 bermacam-macam
10 berubah, berkembang
Items:
AN _ UAL (…)
COMP _ NENT (…)
DIFFER _ NTIATE (…)
DIV _ RSE (…)
EV _ LVE (…)
EXC _ UDE (…)
FACI _ ITATE (…)
FEA _ URE (…)
LE _ AL (…)
MED _ CAL (…)

Block 3
Meanings:
1 ciri-ciri khusus
2 tubrukan; pengaruh yang kuat
3 bidang, kawasan
4 alasan, sebab
5 pengganti kerugian
6 utama
7 hubungan; berhubungan dengan
8 persamaan, perbandingan
9 merasa
10 runtuh, hancur, gagal, jatuh
Items:
ANA _ OGY (…)
COL _ APSE (…)
CON _ ACT (…)
IM _ ACT (…)
MO _ IVE (…)
OF _ SET (…)
PARA _ ETER (…)
PER _ EIVE (…)
PRI _ ARY (…)
SE _ TOR (…)

Block 4
Meanings:
1 disebut
2 jarak; bergerak, bergeser
3 menirukan
4 meneruskan
5 melarang, menghalangi
6 sistem tingkatan status
7 rancangan, rencana
8 jenis kelamin, perkelaminan
9 melukai, merugikan
10 pelaksana, pelaku
Items:
HIER _ RCHY (…)
IN _ URE (…)
PRACT _ TIONER (…)
PRO _ ECT (…)
PRO _ EED (…)
PRO _ IBIT (…)
RA _ GE (…)
S _ X (…)
SIM _ LATE (…)
SO-C _ LLED (…)