VIETNAM NATIONAL UNIVERSITY, HANOI
COLLEGE OF FOREIGN LANGUAGES
DEPARTMENT OF POSTGRADUATE STUDIES
------------------------------
GIÁP THỊ AN
EVALUATING A FINAL ENGLISH READING TEST
FOR THE STUDENTS AT HANOI TECHNICAL AND
PROFESSIONAL SKILLS TRAINING SCHOOL –
HANOI CONSTRUCTION CORPORATION
(ĐÁNH GIÁ BÀI KIỂM TRA HẾT MÔN TIẾNG ANH CHO HỌC
SINH TRƯỜNG TRUNG HỌC KỸ THUẬT VÀ NGHIỆP VỤ HÀ NỘI
– TỔNG CÔNG TY XÂY DỰNG HÀ NỘI)
M.A. Minor Thesis
Field: Methodology
Code: 60.14.10
HANOI – 2008
VIETNAM NATIONAL UNIVERSITY, HANOI
COLLEGE OF FOREIGN LANGUAGES
DEPARTMENT OF POSTGRADUATE STUDIES
--------------------------------
GIÁP THỊ AN
EVALUATING A FINAL ENGLISH READING TEST
FOR THE STUDENTS AT HANOI TECHNICAL AND
PROFESSIONAL SKILLS TRAINING SCHOOL –
HANOI CONSTRUCTION CORPORATION
(ĐÁNH GIÁ BÀI KIỂM TRA HẾT MÔN TIẾNG ANH CHO HỌC
SINH TRƯỜNG TRUNG HỌC KỸ THUẬT VÀ NGHIỆP VỤ HÀ
NỘI – TỔNG CÔNG TY XÂY DỰNG HÀ NỘI)
M.A. Minor Thesis
Field: Methodology
Code: 60.14.10
Supervisor: Phùng Hà Thanh, M.A.
HANOI – 2008
I certify that this study is my own work. The thesis, in whole or in part, has not been
submitted for any higher degree.
ACKNOWLEDGEMENTS
I would like to express my deepest thanks to my supervisor Ms. Phùng Hà Thanh,
M.A. for the invaluable support, guidance, and timely encouragement she gave me
while I was doing this research. I am truly grateful to her for her advice and
suggestions right from the beginning when this study was only in its formative stage.
I would like to send my sincere thanks to the teachers at the English Department,
HATECHS, who have taken part in the discussion as well as given insightful comments
and suggestions for this paper.
My special thanks also go to the students in groups KT1, KT2, KT3, KT4 - K06 for their
participation in the study as its subjects. Without them, this project could not have
been so successful.
I owe a great debt of gratitude to my parents, my sisters, my husband and especially my
son, who have constantly inspired and encouraged me to complete this research.
ABSTRACT
Test evaluation is a complicated process which has received much attention from a
number of researchers ever since the importance of language tests in assessing students’
achievement was recognized. When evaluating a test, the evaluator should concentrate
on the criteria of a good test, such as the mean, the difficulty level, discrimination,
reliability and validity.
In this study, the researcher chose the final reading test for students at HATECHS
to evaluate, with the aim of estimating its reliability and checking its validity. This is
a new test that followed the PET format and was used in the 2006 – 2007 school year as a
procedure for assessing the achievement of students at HATECHS. From the
interpretation of the score data, the researcher has found that the final reading test is
reliable in terms of internal consistency. Its face and construct validity have been
checked as well, and the test is concluded to be valid on the basis of the calculated
validity coefficients. However, the study retains limitations, which lead to the
researcher’s directions for future studies.
TABLE OF CONTENTS
ACKNOWLEDGEMENTS
ABSTRACT
TABLE OF CONTENTS
LIST OF ABBREVIATIONS
LIST OF TABLES
GLOSSARY OF TERMS
PART ONE: INTRODUCTION
1. Rationale
2. Objectives of the study
3. Scope of the study
4. Methodology of the study
5. The organization of the study
PART TWO: DEVELOPMENT
CHAPTER 1: LITERATURE REVIEW
1.1. Language testing
1.1.1. Approaches to language testing
1.1.1.1. The essay translation approach
1.1.1.2. The structuralist approach
1.1.1.3. The integrative approach
1.1.1.4. The communicative approach
1.1.2. Classifications of language tests
1.2. Testing reading
1.3. Criteria in evaluating a test
1.3.1. The mean
1.3.2. The difficulty level
1.3.3. Discrimination
1.3.4. Reliability
1.3.5. Validity
CHAPTER 2: METHODOLOGY AND RESULTS
2.1. Research questions
2.2. The participants
2.3. Instrumentation and data collection
2.3.1. Course objectives, syllabus and materials for teaching
2.3.1.1. Course objectives
2.3.1.2. Syllabus
2.3.1.3. Assessment instruments
2.3.2. Data collection
2.3.3. Data analysis and results
2.3.3.1. Test score analysis
2.3.3.2. The reliability of the test
2.3.3.3. The test validity
2.3.3.4. Summary of the results of the study
PART THREE: CONCLUSION
1. Conclusion
2. Limitations
3. Future directions
REFERENCES
APPENDIX 1: THE FINAL READING TEST
APPENDIX 2: THE PET
APPENDIX 3: QUESTIONS FOR DISCUSSION
LIST OF ABBREVIATIONS
HATECHS: Hanoi Technical and Professional Skills Training School
M: Mean
N: Number of students
P: Part
PET: Preliminary English Test
r: Validity coefficient
Rxx: Reliability coefficient
SD: Standard deviation
Ss: Students
LIST OF TABLES
Table 1: Types of language tests
Table 2: Types of tests
Table 3: Types of reliability
Table 4: The syllabus for teaching English – Semester 2
Table 5: Components of the PET reading test
Table 6: Components of the final reading test
Table 7: The raw scores of the final reading test and the PET
Table 8: The reliability coefficients
Table 9: The validity coefficients
GLOSSARY OF TERMS
Discrimination is the spread of scores produced by a test, or the extent to which a test
separates students from one another on a range of scores from high to low. Also used
to describe the extent to which an individual multi-choice item separates the students
who do well on the test as a whole from those who do badly.
Difficulty is the extent to which a test or test item is within the ability range of a
particular candidate or group of candidates.
Mean is a descriptive statistic, measuring central tendency. The mean is calculated by
dividing the sum of a set of scores by the number of scores.
Median is a descriptive statistic, measuring central tendency: the middle score or value in a set.
Marker (also scorer) is the judge or observer who operates a rating scale in the
measurement of oral and written proficiency. The reliability of markers depends in
part on the quality of their training, the purpose of which is to ensure a high degree of
comparability, both inter- and intra-rater.
Mode is a descriptive statistic, measuring central tendency: the most frequently
occurring score or score interval in a distribution.
Raw scores are test data in their original format, not yet transformed statistically in any
way (e.g. by conversion into percentages, or by adjusting for the level of difficulty of the
task or any other contextual factors).
Reading comprehension test is a measure of understanding of text.
Reliability is consistency: the extent to which the scores resulting from a test are
similar wherever and whenever it is taken, and whoever marks it.
Score is the numerical index indicating a candidate’s overall performance on some
measure. The measure may be based on ratings, judgements, grades, or the number of
test items correct.
Standard deviation is a property of the normal curve. Mathematically, it is the
square root of the variance of a test.
Test analysis is the process in which data from test trials are analyzed during test
development to evaluate individual items as well as the reliability and validity of the test
as a whole. Test analysis is also carried out following test administration in order to
allow the reporting of results. It may also be conducted for research purposes.
Test item is the part of an objective test which sets the problem to be answered by the
student: usually either in multi-choice form, as a statement followed by several choices
of which one is the right answer and the rest are not, or as a true/false statement which
the student must judge to be either right or wrong.
Test taker is a term used to refer to any person undertaking a test or examination.
Other terms commonly used in language testing are candidate, examinee, testee.
Test-retest is the simplest method of computing test reliability; it involves
administering the same test to the same group of subjects on two occasions. The time
between administrations is normally limited to no more than two weeks in order to
minimize the effect of learning upon true scores.
Validity is the extent to which a test measures what it is intended to measure. The test
validity consists of content, face and construct validity.
PART ONE: INTRODUCTION
1. Rationale
Testing is necessary in the process of language teaching and learning; therefore, it has
gained much attention from teachers and learners. Through testing, teachers can evaluate
learners’ achievements in a certain learning period, assess their own teaching methods
and provide input into the process of language teaching (Bachman, 1990, p. 3).
Thanks to testing, learners can also assess their own English ability to examine whether
their level of English meets the demands of employment or of studying abroad. The
important role of tests makes test evaluation necessary. By evaluating tests, test
designers can arrive at the best test papers for assessing their students.
Despite the importance of testing, in many schools tests are designed without
following any rigorous principles or procedures; thus, their validity and reliability
are open to doubt. At HATECHS, the final English course tests had been designed by
teachers at the English Department at the end of the course, and some of the tests were
used repeatedly with no adjustment. In the 2006 – 2007 school year, there was a change
in test design: final tests were designed according to the PET (Preliminary English Test)
procedure. The PET comes from the Cambridge testing system for English for Speakers
of Other Languages. Based on the PET, a new final reading test was developed and
used as an instrument to assess students’ achievement in reading skill. The test was
delivered to students at the end of the 2006 - 2007 school year, but no evaluation of it
was carried out. To decide whether the test is reliable and valid, a serious study is needed.
This context at HATECHS has inspired the author, a teacher of English, to take the
opportunity to undertake the study entitled “Evaluating a Final English Reading Test
for the Students at Hanoi Technical and Professional Skills Training School”, with the
aim of checking the validity and reliability of the test. The author was also eager to find
suggestions for test designers to produce better and more effective tests for their students.
2. Objectives of the study
The study is aimed at evaluating the final reading test for the students at Hanoi Technical
and Professional Skills Training School. The test takers are non-majors. The results of the
test will be analyzed, evaluated and interpreted with the following aims:
- to calculate the internal consistency reliability of the test
- to check the face and construct validity of the test
3. Scope of the study
Test evaluation is a wide concept, and there are many criteria for evaluating a test.
Normally, there are four major criteria that any test evaluator considers: item difficulty,
discrimination, reliability and validity. However, item difficulty and discrimination are
said to be difficult to evaluate and interpret; therefore, within this study the researcher
focuses on the reliability and the validity of the test as a whole.
At HATECHS, at the end of Semester 1 there is a reading achievement test, and at the end
of the first year, after 120 periods of studying English, there is a final reading test.
The researcher chose the final test to evaluate its internal consistency reliability and its
face and construct validity.
4. Methodology of the study
In this study, the author evaluated the test by adopting both qualitative and
quantitative methods. The research is quantitative in the sense that the data were
collected through the analysis of the scores of 30 random papers from students at the
Faculty of Finance and Accounting. To calculate the internal consistency reliability,
the researcher used the Kuder-Richardson Formula 21, and the Pearson Correlation
Coefficient formula was adopted to calculate the validity coefficient. The research is
qualitative in its use of a semi-structured interview with open questions,
which were delivered to teachers at HATECHS at the annual meeting on the teaching
syllabus and methodology. The conclusions of the discussion were used as the
qualitative data of the research.
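As a rough illustration of how a validity coefficient of this kind can be computed, the Python sketch below applies the Pearson product-moment formula to two sets of scores. The figures are invented for demonstration only; they are not the 30 papers analyzed in this study.

```python
def pearson(x, y):
    """Pearson product-moment correlation between two lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations.
    sx = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sy = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical pairs of scores on the final reading test and on the PET.
final_test = [12, 15, 9, 18, 14, 11]
pet_scores = [14, 16, 10, 19, 15, 12]
print(round(pearson(final_test, pet_scores), 3))  # 0.992
```

A coefficient close to 1 would indicate that the two tests rank the candidates in almost the same order.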
5. The organization of the study
The study is divided into three parts:
Part one: Introduction – presents basic information such as the rationale, the scope,
the objectives, the methods and the organization of the study.
Part two: Development – consists of two chapters:
Chapter 1: Literature Review – reviews the literature related to language testing and
test evaluation.
Chapter 2: Methodology and Results – is concerned with the methods of the study,
the selection of participants, the materials, the methods of data collection and
analysis, and the results of the data analysis.
Part three: Conclusion – summarizes the study and its limitations, and gives
recommendations for further studies.
These are followed by the References and Appendices.
PART TWO: DEVELOPMENT
CHAPTER 1
LITERATURE REVIEW
This chapter attempts to establish the theoretical background for the study.
Approaches to language testing and to testing reading, as well as the literature on
test evaluation, will be reviewed.
1.1. Language testing
1.1.1. Approaches to language testing
1.1.1.1. The essay translation approach
According to Heaton (1998), this approach is commonly referred to as the pre-scientific
stage of language testing. In this approach, no special skill or expertise in
testing is required. Tests usually consist of essay writing, translation and grammatical
analysis. The tests, for Heaton, also have a heavy literary and cultural bias. He also
notes that public examinations (i.e. secondary school leaving examinations)
resulting from the essay translation approach sometimes have an aural/oral
component at the upper intermediate and advanced levels, though this has sometimes
been regarded in the past as something additional and in no way an integral part of
the syllabus or examination (p. 15).
1.1.1.2. The structuralist approach
“This approach is characterized by the view that language learning is chiefly concerned
with the systematic acquisition of a set of habits. It draws on the work of structural
linguistics, in particular the importance of contrastive analysis and the need to identify and
measure the learner’s mastery of the separate elements of the target language: phonology,
vocabulary and grammar. Such mastery is tested using words and sentences completely
divorced from any context on the grounds that language forms can be covered in the test
in a comparatively short time. The skills of listening, speaking, reading and writing are
also separated from one another as much as possible because it is considered essential to test
one thing at a time” (Heaton, 1998, p. 15).
According to him, this approach is now still valid for certain types of test and for
certain purposes such as the desire to concentrate on the testees’ ability to write by
attempting to separate a composition test from reading. The psychometric approach to
measurement with its emphasis on reliability and objectivity forms an integral part of
structuralist testing. Psychometrists were able to show early on that such traditional
examinations as essay writing are highly subjective and unreliable. As a result, the
need for statistical measures of reliability and validity is considered to be of the utmost
importance in testing: hence the popularity of the multi-choice item – a type of item
which lends itself admirably to statistical analysis.
1.1.1.3. The integrative approach
Heaton (1998, p. 16) considers this approach to be the testing of language in context; it is
thus concerned primarily with meaning and the total communicative effect of
discourse. As a result, integrative tests do not seek to separate language skills into
neat divisions in order to improve test reliability: instead, they are often designed to
assess the learner’s ability to use two or more skills simultaneously. Thus, integrative
tests are concerned with a global view of proficiency – an underlying language
competence or ‘grammar of expectancy’, which it is argued every learner possesses
regardless of the purpose for which the language is being learnt.
Integrative testing, according to Heaton (1998), is best characterized by the use
of cloze testing and dictation. Besides these, oral interviews, translation and essay writing
are also included in many integrative tests – a point frequently overlooked by those
who take too narrow a view of integrative testing.
Heaton (1998) points out that the cloze procedure as a measure of reading difficulty and
reading comprehension will be treated briefly in the relevant section of the chapter on
testing reading comprehension. Dictation, another major type of integrative test, was
previously regarded solely as a means of measuring students’ skills of listening
comprehension. Thus, the complex elements involved in tests of dictation were
largely overlooked until fairly recently. The integrated skills involved in tests of dictation
include auditory discrimination, auditory memory span, spelling, the recognition
of sound segments, familiarity with the grammatical and lexical patterning of the
language, and overall textual comprehension.
1.1.1.4. The communicative approach
According to Heaton (1998, p.19), “the communicative approach to language testing
is sometimes linked to the integrative approaches. However, although both
approaches emphasize the importance of the meaning of utterances rather than their
form and structure, there are nevertheless fundamental differences between the two
approaches”. The communicative approach is said to be very humanistic. It is
humanistic in the sense that each student’s performance is evaluated according to his
or her degree of success in performing the language tasks rather than solely relation
to the performance of other students. (Heaton, 1998, p.21).
However, the communicative approach to language testing reveals two drawbacks.
First, teachers will find it difficult to assess students’ ability without comparing
achievement results among students. Second, the communicative approach is
claimed to be somewhat unreliable because of the variety of real-life situations
(Hoang, 2005, p. 8). Nevertheless, Heaton (1988) proposes a solution to this problem.
In his view, to avoid a lack of reliability, very carefully drawn-up and well-established
criteria must be designed, but he does not set out any criteria in detail.
In a nutshell, each approach to language testing has its strong points as well as its
weak points. Therefore, a good test should incorporate features of all four approaches
(Heaton, 1988, p. 15).
1.1.2. Classifications of Language Tests
Language tests may be of various types but different scholars hold different views on
the types of language tests.
Henning (1987), for instance, establishes seven kinds of language tests, which can be
demonstrated as follows:
1. Objective vs subjective tests
- Objective tests have a clear marking scale and do not need much consideration by markers.
- Subjective tests are scored based on the raters’ judgements or opinions. They are claimed to be unreliable and rater-dependent.
2. Direct vs indirect tests
- Direct tests are in the form of spoken tests (in real-life situations).
- Indirect tests are in the form of written tests.
3. Discrete vs integrative tests
- Discrete tests are used to test knowledge in restricted areas.
- Integrative tests are used to evaluate general language knowledge.
4. Aptitude, achievement and proficiency tests
- Aptitude tests (intelligence tests) are used to select students for a special programme.
- Achievement tests are designed to assess students’ knowledge of already-learnt areas.
- Proficiency tests (placement tests) are used to select students in a desired field.
5. Criterion-referenced vs norm-referenced tests
- Criterion-referenced tests: the instructions are designed after the tests are devised; the tests obey the teaching objectives perfectly.
- Norm-referenced tests: there are a large number of people from the target population; standards of achievement such as the mean or average score are established after the course.
6. Speed tests and power tests
- Speed tests consist of relatively easy items, but the time allowed is insufficient.
- Power tests contain difficult items, but time is sufficient.
7. Others
Table 1: Types of language tests
(Source: Henning, 1987, pp. 4-9)
However, Hughes (1989) mentions two categories: kinds of tests and kinds of
language testing. Basically, kinds of language testing consist of direct vs indirect
testing, norm-referenced testing vs criterion-referenced testing, discrete vs integrative
testing, objective vs subjective testing (Hughes, 1989, pp 14-19). Apart from this, he
develops one more type of test called communicative language testing which is
described as the assessment of the ability to take part in acts of communication
(Hughes, 1989, p.19). Hughes also discusses kinds of tests which can be illustrated in
the following table:
1. Proficiency
- Sufficient command of the language for a particular purpose.
2. Achievement
- Final achievement tests are organized at the end of the course.
- Progress achievement tests measure the students’ progress.
3. Diagnostic
- Finds students’ strengths and weaknesses, and what further teaching is necessary.
4. Placement
- Classifies students into classes at different levels.
Table 2: Types of tests
(Source: Hughes, 1990, pp. 9-14, as cited in Hoang, 2005, p. 13)
Language tests are divided into two types by McNamara (2000) based on test
methods and test purposes. Regarding test methods, he believes that there exist two basic
types: traditional paper-and-pencil language tests, which are used to assess
either separate components or receptive understanding, and performance tests.
Regarding test purpose, he divides language tests into two types: achievement tests and
proficiency tests.
1.2. Testing reading
Reading can be defined as the interaction between the reader and the text (Aebersold
& Field, 1997). This dynamic relationship portrays the reader as creating meaning of
the text in relation to his or her prior knowledge (Anderson, 1999). Reading is one of
the four main skills, and it plays a decisive role in the process of acquiring a language.
Therefore, testing reading comprehension is also important. Traditionally, the value of
testing reading has not been doubted, because of the social importance of literacy and
because reading tests are considered more reliable than speaking tests.
Alderson (1996) proposes that reading teachers feel uncomfortable in testing reading.
To him, although most teachers use a variety of techniques in their reading classes,
they do not tend to use the same variety of techniques when they administer reading
tests. Despites the variety of testing techniques, none of them is subscribed to as the
best one. Alderson (1996, 2000) considers that no single method satisfies reading
teachers since each teacher has different purposes in testing. He listed a number of
test techniques or formats often used in reading assessments, such as cloze tests,
multiple-choice
techniques,
alternative
objective
techniques
(e.g.,
matching
techniques, ordering tasks, dichotomous items), editing tests, alternative integrated
approaches (e.g., the C-test, the cloze elide test), short-answer tests (e.g., the freerecall test, the summary test, the gapped summary), and information-transfer
techniques. Among the many approaches to testing reading comprehension, the three
principal methods have been the cloze procedure, multiple-choice questions, and
short answer questions (Weir, 1997).
The cloze test is now a well-known and widely-used integrative language test. Wilson
Taylor (1953) first introduced the cloze procedure as a device for estimating the
readability of a text. However, what brought the cloze procedure widespread
popularity were the investigations of the cloze test as a measure of ESL proficiency
(Jonz, 1976, 1990; Bachman, 1982, 1985; Brown, 1983, 1993). The results of the
substantial volume of research on cloze tests have been extremely varied. Furthermore,
major technical defects have been found with the procedure. Alderson (1979), for
instance, showed that changes in the starting point or deletion rate affect reliability
and validity coefficients. Other researchers like Carroll (1980), Klein-Braley (1983,
1985) and Brown (1993) have questioned the reliability and different aspects of
validity of cloze tests.
According to Heaton (1998) “cloze test was originally intended to measure the
reading difficulty level of the text. Used in this way, it is a reliable means of
determining whether or not certain texts are at an appropriate level for particular
groups of students” (p.131). However, for Heaton the most common purpose of the
cloze test is to measure reading comprehension. It has long been argued that cloze
measures text involving the interdependence of phrases, sentences and paragraphs
within the text. However, a true cloze is said generally to measure global reading
comprehension although insights can undoubtedly be gained into particular reading
difficulty. In contrast, Cohen (1998) concludes that cloze tests do not assess global
reading ability but that they do assess local-level reading. Each researcher tends to
present evidence to support his or her arguments; however, most of them agree that
the cloze procedure is really effective in testing reading comprehension.
Another technique that Alderson (1996, 2000), Cohen (1998), and Hughes (2003)
discuss is ‘multiple-choice’, a common device for testing text comprehension. Ur (1996,
p.38) defines multiple-choice questions as consisting “... of a stem and a number of
options (usually four), from which the testee has to select the right one”. Alderson
(2000: 211) states that multiple-choice test items are so popular because they provide
testers with the means to control test-takers’ thought processes when responding; they
“… allow testers to control the range of possible answers …”
Weir (1993) points out that short-answer tests are extremely useful for testing reading
comprehension. According to Alderson (1996, 2000), ‘short-answer tests’ are seen as
‘a semi-objective alternative to multiple choice’. Cohen (1998) argues that open-ended
questions allow test-takers to copy the answer from the text, but one first
needs to understand the text to write the right answer. Test-takers are supposed to
answer a question briefly by drawing conclusions from the text, not just responding
‘yes’ or ‘no’. The test-takers are supposed to infer meaning from the text before
answering the question. Such tests are not easy to construct since the tester needs to
see all possible answers. Hughes (2003: 144) points out that “the best short-answer
questions are those with a unique correct response”. However, scoring the responses
depends on thorough preparation of the answer-key. Hughes (2003) proposes that this
technique works well when the aim is testing the ability to identify referents.
The techniques above are those usually used in testing reading; however, it is difficult
to say which is the most effective, because that depends on the teachers’ purposes in
assessing their students.
1.3. Criteria in evaluating a test
Test evaluation is a complicated process; it requires the analysis of a number of
criteria. However, there are five main criteria against which most researchers evaluate
their tests: the mean, the difficulty level, discrimination, reliability and validity.
1.3.1. The mean
According to the dictionary of language testing by Milanovic and other authors, the
mean, also called the arithmetical average, is a descriptive statistic measuring central
tendency. The mean is calculated by dividing the sum of a set of scores by the number of
scores. Like other measures of central tendency, the mean gives an indication of the
trend, or the score which is typical of the whole group. In normal distributions the
mean is closely aligned with the median and the mode. This measure is by far the most
commonly used, and it is the basis of a number of statistical tests of comparison
between groups commonly used in language testing. (Milanovic et al, 1999, p.118)
In language test evaluation, the mean is also a criterion that needs evaluating, because
the mean score will tell you how difficult or easy the test was for the given group.
This is useful for evaluators in making reasonable adjustments to the test as a whole.
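As a concrete illustration of this computation, the short Python sketch below finds the mean (and, for comparison, the standard deviation defined in the glossary) of a set of scores. The numbers are invented, not taken from the thesis data.

```python
# Hypothetical test scores for a small group of candidates.
scores = [12, 15, 9, 18, 14, 11, 16, 13]

# Mean: the sum of the scores divided by the number of scores.
mean = sum(scores) / len(scores)

# Standard deviation: the square root of the variance, i.e. of the
# average squared deviation from the mean.
variance = sum((s - mean) ** 2 for s in scores) / len(scores)
sd = variance ** 0.5

print(round(mean, 2), round(sd, 2))  # 13.5 2.69
```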
1.3.2. The difficulty level
The difficulty level of a test tells you how difficult or easy each item of the test is.
Difficulty also shows the ability range of a particular candidate or group of
candidates. “In language testing, most tests are designed in such a way that the
majority of items are not too difficult or too easy for the relevant sample of test
candidates.” (Milanovic et al, 1999, p.44)
Item difficulty requirements vary according to test purpose. In a selection test, for
example, there may be no need for finely graded assessment within the ‘pass’ or ‘fail’
groups, so the most efficient test design will have a majority of items clustering
near the critical cut-score. Information about item difficulty is also useful in
determining the order of items on a test. Tests tend to begin with easy items in order
to boost confidence and to ensure that weaker candidates do not waste valuable time
on items which are too difficult for them.
For test evaluators, the difficulty level of a test should be analyzed because of its
importance in deciding the sequence of items on a test. It is also one of the factors
that affect the test scores of test-takers.
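One common way of expressing the difficulty of a single item is the facility value: the proportion of candidates who answer the item correctly. The sketch below uses invented responses (1 = correct, 0 = incorrect) purely for illustration.

```python
# Hypothetical responses of ten candidates to one item.
responses = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]

# Facility value: the proportion answering correctly. Values near 1.0
# indicate an easy item; values near 0.0 indicate a hard one.
facility = sum(responses) / len(responses)
print(facility)  # 0.7
```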
1.3.3. Discrimination
According to Heaton, “the discrimination index of an item indicates the extent to
which the item discriminates between the testees, separating the more able testees
from the less able” (Heaton, 1998, p. 179). For him, the index of discrimination tells
us whether those students who performed well on the whole test tended to do well or
badly on each item in the test.
As well, in Milanovic’s definition, discrimination is understood as a fundamental
property of language tests in their attempt to capture the range of individual abilities;
on that basis, discrimination is an important indicator of a test’s reliability
(Milanovic et al, 1999, p.48).
By looking at the test scores, evaluators can check discrimination. Because of its
decisive role in separating test-takers into stronger and weaker groups, the
discrimination of a test needs analyzing in the process of evaluating a test.
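A common way of computing the discrimination index Heaton describes is D = (U - L) / n, where U and L are the numbers of correct responses to an item in the upper and lower scoring groups and n is the size of one group. The figures below are invented for illustration.

```python
# Hypothetical counts for one item, using top- and bottom-third groups.
upper_correct = 8   # correct answers among the ten highest scorers
lower_correct = 3   # correct answers among the ten lowest scorers
group_size = 10     # number of candidates in each group

# D ranges from -1 to +1; higher values mean the item separates
# strong candidates from weak ones more sharply.
d_index = (upper_correct - lower_correct) / group_size
print(d_index)  # 0.5
```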
1.3.4. Reliability
Reliability is another quality of a test that should be estimated by the test evaluator.
“Reliability is often defined as consistency of measurement” (Bachman & Palmer,
1996, p.19). A reliable test score will be consistent across different characteristics of
the testing situation. Thus, reliability can be considered to be a function of the
consistency of scores from one set of test tasks to another. Reliability also means
“the consistency with which a test measures the same thing all the time” (Harrison,
1987, p.24).
For test evaluators, reliability can be estimated by methods such as “parallel
form, split half, rational equivalence, test-retest and inter-rater reliability checks”
(Milanovic et al, 1999, p.168). According to Shohamy (1985), the types and
descriptions of reliability, as well as the ways to calculate it, are summarized in the
following table:
1. Test-retest
Description: The extent to which the test scores are stable from one administration to another, assuming no learning has occurred between the two occasions.
How to calculate: Correlations between scores on the same test given on two occasions.
2. Parallel form
Description: The extent to which two tests taken from the same domain measure the same things.
How to calculate: Correlations between two forms of the same test, on different occasions or on one occasion.
3. Internal consistency
Description: The extent to which the test questions are related to one another and measure the same trait.
How to calculate: Kuder-Richardson Formula 21.
4. Intra-rater
Description: The extent to which the same rater is consistent in his rating from one occasion to another, or on one occasion but with different test-takers.
How to calculate: Correlations between scores from the same rater on different occasions, or on one occasion.
5. Inter-rater
Description: The extent to which different raters agree about the assigned score or rating.
How to calculate: Correlations among ratings provided by different raters.
Table 3: Types of reliability
(Source: Shohamy, 1985, p. 71, as cited in Hoang, 2005, p. 31)
However, reliability is said to be a necessary but not a sufficient quality of a test,
and the reliability of a test should be closely interlocked with its validity. While
reliability focuses on the empirical aspects of the measurement process, validity
focuses on theoretical aspects and seeks to interweave these concepts with the
empirical ones. For this reason it is easier to assess reliability than validity.
Test reliability can be analyzed by looking at the test scores. If the test scores remain
unchanged across the different times the test is taken, the test is said to be reliable,
and vice versa. However, this depends on conditions and situations such as the
circumstances in which the test is taken, the way in which it is marked and the
uniformity of the assessment it makes. It is therefore necessary for evaluators to take
these conditions into account when they try to estimate the reliability of a test.
1.3.5. Validity
Validity is the most important consideration in test evaluation. The concept refers to
the appropriateness, meaningfulness and usefulness of the specific inferences made
from test scores. Test evaluation is the process of accumulating evidence to support
such inferences. Validity, however, is a unitary concept. Although evidence may be
accumulated in many ways, validity refers to the degree to which that evidence
supports the inferences that are made from scores. The inferences regarding specific
uses of a test are validated, not the test itself.
Traditionally, validity evidence has been gathered in three distinct categories:
content-related, criterion-related and construct-related evidence of validity. More
recent writing on validity theory stresses the importance of viewing validity as a
'unitary concept' (Messick, 1989). Thus, while the validity evidence is presented in
separate categories, this categorization is principally an organizational technique for
the purpose of the presentation of research in this manual.
According to Milanovic et al. (1999), content and construct validity are conceptual,
whereas concurrent and predictive (criterion-related) validity are statistical. In other
words, scores obtained on the test may be used to investigate criterion-related
validity, for example by relating them to other test scores or measures such as
teachers' assessments or future predictions (pp. 220-221).
Another type of test validity, for Milanovic et al., is face validity, which refers to the
degree to which a test appears to measure the knowledge or abilities it claims to
measure, as judged by untrained observers such as the candidates taking the test or
the institution which plans to administer it (Milanovic et al., 1999, p. 221).
In a book by Alderson et al. (1995), the authors divided validity into other
categories: internal, external and construct validity. Internal validity, according to
them, consists of three sub-types: face, content and response validity. External
validity has two sub-types: concurrent and predictive validity. And construct validity
relates to five forms: comparison with theory, internal correlations, comparison with
biodata and psychological characteristics, multitrait-multimethod analysis and
convergent-divergent validation, and factor analysis (Alderson et al., 1995, pp.
171-186).
Since the validity of a test has received much attention from researchers, test
evaluators should take time to check the validity of the test against the categories
proposed by these authors and researchers. Through the test scores, evaluators can
check whether the test is valid or not, so that they can make appropriate adjustments
to the test they evaluate.
Summary: In this chapter, we have attempted to establish the theoretical framework
for the thesis. Language testing is one of the most important procedures for language
teachers in assessing students. There are a number of approaches to language testing
and to testing reading; these have been discussed in the first part of the chapter. The
second matter explored in the chapter is the theory of test evaluation, which relates
to the criteria of a test that need to be analyzed by test evaluators.
CHAPTER 2
METHODOLOGY AND RESULTS
This chapter includes the research questions, the selection of the participants who
took part in the study, and the testing materials. The methods of data collection and
data analysis, as well as the results, are presented afterwards.
2.1. Research questions
On the basis of the literature review, this chapter aims at answering two research
questions:
1) Is the final reading test for the students at HATECHS reliable?
2) To what extent is the final reading test valid in terms of face and construct?
2.2. The participants
The students at HATECHS come from different provinces, cities and towns in the North
of Vietnam. They are generally aged between 18 and 21. Thirty participants were
chosen randomly from the students of the Faculty of Finance and Accounting of school
year 2006-2007. All of them are first-year students. In addition, seven teachers of the
English Department were chosen for the interview. These teachers are all female, and
most have more than five years' experience of teaching English. They all took part in
teaching the students in the school year 2006-2007.
At the school, the students take an English course in the first year. The course is a
compulsory subject and is divided into two components, each lasting 60 periods. After
finishing the course, students are required to reach pre-intermediate level. However,
students often have varying English levels prior to the course: some have learnt
English for 7 years at high school, while others have learnt it for only 3 years,
depending on the part of the country they come from. Some have never learnt English
at all, because at the lower levels of school they learned other foreign languages. It is
therefore important for teachers to apply appropriate methods in teaching them to
help them become more proficient. It is also critical that teachers give them suitable
tests which meet their needs and the requirements of the subject.
2.3. Instrumentation and Data collection
2.3.1 Course objectives; Syllabus and Materials used for the students at HATECHS
In this section, we will discuss the syllabus, the course book for teaching reading and
the standard of evaluation for students at HATECHS.
2.3.1.1. Course objectives
The teaching objectives for reading are to help students, after finishing the English
course at HATECHS, be able to:
- be aware of reading skill techniques;
- enrich their vocabulary on various topics;
- reach pre-intermediate proficiency level.
2.3.1.2. Syllabus
During the course, the book that the students use is Let's Study by Do Tuan Minh,
National University Publishing House, 2005. The book consists of 20 units; in the first
semester the students are expected to cover the first 10 units, and in the second
semester the last 10 units. The final reading test is based on the contents of the last
10 units. The total time for the whole semester is 60 classes in 10 weeks (each class
lasts 45 minutes), and in each class one fourth of the time is for reading. The syllabus
is described in the following table:
Unit | Title | Time (classes) | Pages
11 | My hometown | 5 | 81
12 | What's the weather like today? | 5 | 87
13 | Traveling | 5 | 92
14 | Holidays and festivals | 5 | 99
15 | Future jobs | 5 | 106
   | Stop and check and test 1 | 2 |
16 | A British Wedding | 5 | 117
17 | At school | 5 | 125
18 | City life and country life | 6 | 133
19 | Part-time jobs | 6 | 140
20 | Social evils | 6 | 147
   | Stop and check, test 2 | 2 |
   | General revision | 2 |
   | Final test | 1 |
   | Total | 60 |

Table 4: The syllabus for teaching English – Semester 2
2.3.1.3. Assessment Instruments
* Standard for the Final Reading English Test
Based on what the students have been taught, the teachers of the English Department
design the reading test to measure students' achievement according to the course
objectives. The Preliminary English Test (PET) is used as a model to construct the
final reading test for students at HATECHS; only the PET's reading component is
used. Accordingly, the PET's standard for the reading test is presented below.
According to the PET's standard, a reading test consists of five parts, which are
presented as follows. The PET (see Appendix 2) was chosen as the criterion measure
to evaluate the final reading test.
Part | Texts | Items | Questions
1 | Five signs | 01-05 | 5 multiple-choice questions
2 | Eight related texts | 06-10 | 5 matching questions (people descriptions and related texts)
3 | Text for getting information | 11-20 | 10 True/False questions
4 | Text with viewpoints or ideas | 21-25 | 5 multiple-choice questions
5 | Text with gap filling | 26-35 | 10 multiple-choice questions

Table 5: Components of the PET reading test
* The Final Reading Test for students at HATECHS
Based on the PET’s standard, components of the final reading test is summarized in
the next table: (See Appendix for the details of the test and the key)
Parts
Texts
Items
Weight
Marks
1
Five signs
5
20 %
5
2
Eight related texts
5
20 %
5
3
Text for getting information
10
20 %
10
4
Text with viewpoints or ideas
5
20 %
5
5
Text with gap filling
10
20 %
10
Total
35
100%
35
Table 6: Components of the final reading test
The test was designed in PET form, therefore all the instructions are clear.
2.3.2. Data collection
• Step 1: The students took the final reading test as the final examination for the course.
Then 30 test papers of students at the Accounting and Finance Department were randomly
chosen for the study.
• Step 2: The students chosen as participants were asked to take the PET. This test took
place two weeks after the final exam, and the participants were not told in advance when
it would take place.
• Step 3: The papers of the final reading test and the PET were collected and marked by
two teachers, who were also randomly chosen and did not know the participants. The tests
were marked according to the keys provided by the test designers.
• Step 4: The researcher conducted a semi-structured interview with seven teachers at the
English Department, giving them questions related to the final reading test for discussion.
The researcher noted down their opinions and points of view and used them for the study.
2.3.3. Data analysis and Results
The reliability of the test was calculated following the Kuder-Richardson Formula 21,
based on the test scores. The formula helped the researcher find the coefficient that
shows the reliability of the test.
To find out the face validity, the researcher conducted a semi-structured interview with
the teachers at their annual meeting at the beginning of the school year. In this meeting,
the teachers mainly discuss teaching methods and syllabus improvement. There are seven
teachers at the English Department. At the meeting of school year 2007-2008, at the
researcher's suggestion, the teachers' main discussion concerned the new final test for
students at HATECHS.
In order to find evidence for the construct validity of the test, the researcher also asked
her participants to do a reading test taken from the sample PET for the 2007 exams by
Cambridge ESOL. The test papers were then collected and marked by two teachers at the
Department.
After that, the researcher collected the test scores and interpreted them using statistical
instruments. In this study, the researcher used the Pearson Correlation Coefficient formula
to check the validity of the test.
2.3.3.1. Test score analysis
First of all, the raw scores of the final reading test and the PET are presented in the table
below. These raw scores include the whole-test scores and the detailed scores for each part
of each test.
          FINAL TEST                       PET
Ss | Whole  P1 P2 P3 P4 P5  | Whole  P1 P2 P3 P4 P5
 1 |  32     5  5  8  5  9  |  28     5  4  8  3  8
 2 |  32     5  5  9  4  9  |  25     5  3  7  3  7
 3 |  32     5  5  7  5 10  |  28     5  3  7  3 10
 4 |  28     4  4  8  4  8  |  28     5  4  8  2  9
 5 |  28     5  4  7  4  8  |  25     5  3  7  3  7
 6 |  28     5  3  8  5  7  |  24     5  3  6  3  7
 7 |  26     5  3  6  3  9  |  24     5  4  7  2  6
 8 |  26     5  4  7  2  8  |  23     4  2  6  3  8
 9 |  26     5  3  8  5  5  |  22     4  2  7  2  7
10 |  26     4  4  7  4  7  |  23     4  3  5  2  9
11 |  25     5  4  6  4  6  |  23     5  4  6  2  6
12 |  25     5  3  7  5  5  |  22     4  2  5  3  8
13 |  25     4  4  7  4  6  |  23     4  3  5  2  9
14 |  25     4  3  8  4  6  |  22     4  2  5  3  8
15 |  23     4  2  7  4  8  |  23     5  2  5  3  8
16 |  23     4  4  6  5  4  |  20     3  3  4  2  6
17 |  23     4  3  6  2  8  |  20     4  2  4  2  6
18 |  23     4  4  7  4  4  |  21     4  2  4  2  9
19 |  21     5  3  6  2  5  |  20     3  3  5  3  6
20 |  21     4  4  5  3  5  |  21     5  2  5  2  7
21 |  18     3  2  5  5  3  |  21     4  3  6  2  6
22 |  18     3  1  5  2  7  |  19     4  2  5  1  7
23 |  18     4  3  1  3  7  |  18     3  1  4  1  9
24 |  14     3  2  4  2  3  |  19     4  2  4  2  7
25 |  14     3  2  4  2  3  |  18     3  2  5  2  6
26 |  11     2  1  3  2  3  |  14     2  2  4  2  4
27 |   7     2  1  2  1  1  |  14     4  1  4  1  4
28 |   7     2  1  1  2  1  |   4     2  1  0  0  1
29 |   4     1  0  0  2  1  |  14     3  1  2  1  7
30 |   4     1  0  1  0  2  |   4     2  0  0  1  1

Table 7: The raw scores of the final reading test and the PET
*Note: The total score of each test is 35
The Mean and Standard Deviation are calculated by the following formulas:

M = Σfx / N

where:
M: mean
Σ: sum of
N: number of students
x: raw score
f: frequency

And:

SD = √( Σ(x − x̄)² / N )

where:
x: raw score
N: number of students
x̄: the mean
And the results are presented in the following table:

          FINAL TEST                              PET
    Whole   P1    P2    P3    P4    P5  |  Whole   P1    P2    P3    P4    P5
M   21      3.8   2.9   5.5   3.3   5.6 |  20      4.0   2.5   4.9   2.1   6.8
SD  7.92    1.20  1.40  2.40  1.37  2.60|  5.67    0.96  1.10  1.9   0.77  2.08

Table 8: Means and Standard Deviations
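As a quick check on Table 8, the whole-test mean and standard deviation can be recomputed from the final-test scores in Table 7. This is a minimal sketch of the two formulas above; the small differences from the reported M = 21 and SD = 7.92 come from rounding:

```python
from math import sqrt

# Whole-test final reading scores of the 30 participants (Table 7)
scores = [32, 32, 32, 28, 28, 28, 26, 26, 26, 26,
          25, 25, 25, 25, 23, 23, 23, 23, 21, 21,
          18, 18, 18, 14, 14, 11, 7, 7, 4, 4]

n = len(scores)
mean = sum(scores) / n                                # M = Σfx / N
sd = sqrt(sum((x - mean) ** 2 for x in scores) / n)   # SD = √(Σ(x − x̄)² / N)

print(round(mean, 1), round(sd, 2))  # prints: 21.1 7.93
```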
2.3.3.2. The reliability of the test
Estimating reliability is a difficult task, and it is normally impossible to achieve a
perfectly reliable test, but test constructors must make their tests as reliable as
possible. They do this by reducing the causes of unsystematic variation to a minimum.
They should ensure, for example, that the test instructions are clear and that there
are no ambiguous items.
The final reading test is an objective test, so the types of reliability related to
correlations between raters need not be calculated: the raters mark the test using
the keys to the tests, so the test scores would be the same with different raters on
any occasion. In other words, the parallel-form, intra-rater and inter-rater reliability
need not be calculated. In language testing, test-retest reliability is also not an
appropriate strategy, because psychologically the students always want to get better
results the second time they take a test.
Internal Consistency Reliability
There are several techniques to calculate internal consistency reliability, such as
split-half, Kuder-Richardson Formula 20 and Kuder-Richardson Formula 21. In
practice, however, Formula 21 is said to be the easiest to compute: when the
variances of the individual items cannot be calculated, or are very difficult to
calculate, it can still be used to estimate the consistency reliability. The formula is
illustrated as follows:
rtt = [n / (n − 1)] × [1 − x̄(n − x̄) / (n · s²t)]

where:
rtt: the Kuder-Richardson reliability
n: number of items in the test
x̄: the mean score on the test
s²t: the variance of the test scores
Shohamy (1985), on the other hand, also presents Kuder-Richardson Formula 21 in
another form:

Rxx = 1 − [x̄(K − x̄) / (K · SD²)]

where:
x̄: mean
SD: standard deviation
K: number of items on a test
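Shohamy's form of the formula can be sketched in a few lines of code, using the reported figures for the final test (K = 35 items, mean 21, SD 7.92 from Table 8); the function name is illustrative:

```python
def kr21(mean: float, sd: float, k: int) -> float:
    """Kuder-Richardson Formula 21 in Shohamy's form:
    Rxx = 1 - (mean * (k - mean)) / (k * sd**2)"""
    return 1 - (mean * (k - mean)) / (k * sd ** 2)

# Final reading test: K = 35 items, mean = 21, SD = 7.92 (Table 8)
rxx = kr21(mean=21, sd=7.92, k=35)
print(round(rxx, 2))  # prints: 0.87, matching the reported coefficient
```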
According to Kuder-Richardson Formula 21 in Shohamy's form, the researcher
calculated Rxx as shown in the following table:

Test | x̄ (mean) | SD (standard deviation) | Rxx
Final test | 21 | 7.92 | 0.87
PET | 20 | 5.67 | 0.80

Table 8: The reliability coefficients
In practice, the ideal reliability coefficient is 1. A test with a reliability coefficient of
1 is one which would give precisely the same results for a particular set of candidates
regardless of when it is administered. A test with a reliability coefficient of 0 would
give sets of results quite unconnected with each other, and the test would fail to be
reliable. For a reading test, according to Lado (1961), a highly reliable reading test is
usually in the 0.90 to 0.99 range of reliability coefficients, while oral tests or essay
types may be in the 0.70 to 0.79 range (cited in Hughes, 1989, p. 32).
Table 8 shows the reliability coefficients calculated from the test scores. The
coefficient Rxx of the final test is 0.87. From this result, the final reading test is
quite good relative to the ranges of reliability coefficients given by Lado (1961). In
other words, the final reading test for students at HATECHS is reliable in terms of
internal consistency.
In testing, a valid test must be reliable, but a reliable test may not be valid at all. The
final reading test for students at HATECHS is reliable, as shown above. To find out
whether it is also valid, evidence for test validity will be explored in the next part.
2.3.3.3. The test validity
The primary concern for any test is that the interpretations and uses we make of the
test scores are valid. The evidence that we collect in support of the validity of a
particular test can be of three general types: content relevance, criterion relatedness
and meaningfulness of construct (Rouhani, 2006, as cited in Bachman, 1990). And for
a teacher setting his own test, face validity is also vital (Harrison, 1983, p. 11).
Within this thesis, the author wished only to examine face and construct validity,
which are said to be important for test designers.
Face validity
Face validity, for Harrison (1983), is concerned with what teachers and students think
of the test. The question is whether the test appears to teachers and students to be a
reasonable way to assess the students, or whether it seems trivial, too difficult, or
unrealistic. Face validity can therefore only be found out by asking the teachers and
students concerned for their opinions, either formally by means of a questionnaire or
informally through discussion in the staff room.
At the annual meeting of the teachers of the English Department at HATECHS, there
were a number of arguments about the new final reading test. On the question of
whether it is a reasonable instrument to assess students, six of the seven teachers
agreed that the final reading test was an effective instrument to evaluate the
achievement of students in the reading skill. This seemed consistent with the result
the researcher obtained when calculating the mean of the test scores, which was 21.
However, the one teacher who disagreed with the final test suggested assessing
students' reading ability using the cloze procedure, which is said to be one of the
effective procedures for assessing students' achievement. Since the mean score was
21 and the number of students scoring under 18 was not large, most teachers agreed
that the test was reasonably difficult; in other words, the test was neither too
difficult nor too easy. On the question of the realism of the test, most teachers agreed
that it is highly realistic: when designing the test, the test designers used a number
of texts from the textbook, which helped students be familiar with what they had been
taught. The test designers' intention was to make the test realistic.
To sum up, from the teachers' discussion the researcher found that the final reading
test for students at HATECHS has face validity.
Construct validity
According to Hughes (1995), we should randomly choose participants to take two
tests (a treatment test and a criterion test) and then compare the results of the two
tests. If the comparison between the two sets of scores reveals a high level of
agreement, then the treatment test may be considered valid (Hughes, 1995, pp. 23-34).
In this study, the two tests are the final test and the PET; the final test is the
treatment test and the PET is the criterion. The agreement between the two tests is
called the 'validity coefficient', a mathematical measure of similarity. To find the
validity coefficient, the researcher used the formula called the Pearson Correlation
Coefficient. This coefficient is symbolized as r. The formula is shown as follows:
r = Σxy / (N · Sx · Sy)

where:
r: validity coefficient
x = X − X̄ (X: scores on the treatment test; X̄: mean of the treatment test)
y = Y − Ȳ (Y: scores on the criterion test; Ȳ: mean of the criterion test)
N: number of students
Sx: the standard deviation of the treatment test
Sy: the standard deviation of the criterion test
According to the formula, it is necessary to find the total of x·y. This is based on the
scores of the two tests, and the result is shown below:
Ss |  X |  Y | X−X̄ | Y−Ȳ | (X−X̄)(Y−Ȳ)
 1 | 32 | 28 |  11 |   8 |   88
 2 | 32 | 25 |  11 |   5 |   55
 3 | 32 | 28 |  11 |   8 |   88
 4 | 28 | 28 |   7 |   8 |   56
 5 | 28 | 25 |   7 |   5 |   35
 6 | 28 | 24 |   7 |   4 |   28
 7 | 26 | 24 |   5 |   4 |   20
 8 | 26 | 23 |   5 |   3 |   15
 9 | 26 | 22 |   5 |   2 |   10
10 | 26 | 23 |   5 |   3 |   15
11 | 25 | 23 |   4 |   3 |   12
12 | 25 | 22 |   4 |   2 |    8
13 | 25 | 23 |   4 |   3 |   12
14 | 25 | 22 |   4 |   2 |    8
15 | 23 | 23 |   2 |   3 |    6
16 | 23 | 20 |   2 |   0 |    0
17 | 23 | 20 |   2 |   0 |    0
18 | 23 | 21 |   2 |   1 |    2
19 | 21 | 20 |   0 |   0 |    0
20 | 21 | 21 |   0 |   1 |    0
21 | 18 | 21 |  -3 |   1 |   -3
22 | 18 | 19 |  -3 |  -1 |    3
23 | 18 | 18 |  -3 |  -2 |    6
24 | 14 | 19 |  -7 |  -1 |    7
25 | 14 | 18 |  -7 |  -2 |   14
26 | 11 | 14 | -10 |  -6 |   60
27 |  7 | 14 | -14 |  -6 |   84
28 |  7 |  4 | -14 | -16 |  224
29 |  4 | 14 | -17 |  -6 |  102
30 |  4 |  4 | -17 | -16 |  272

Σ(X−X̄)(Y−Ȳ) = 1227
If we continue to calculate in this way, we obtain the totals for the five parts of the
two tests as follows:

Σ(X−X̄)(Y−Ȳ): Part 1 = 23.8; Part 2 = 29.1; Part 3 = 109.1; Part 4 = 18.1; Part 5 = 126

After calculating the totals of (X−X̄)(Y−Ȳ), the coefficients are computed following
Pearson's formula, r = Σxy / (N · Sx · Sy). The final results are presented in the
following table:
    Whole |  P1  |  P2  |  P3  |  P4  |  P5
r   0.91  | 0.69 | 0.63 | 0.80 | 0.57 | 0.78

Table 9: The validity coefficients
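As a check on Table 9, the whole-test validity coefficient can be recomputed directly from the raw scores in Table 7. This sketch uses population standard deviations, matching the formula above:

```python
from math import sqrt

# Whole-test scores: X = final reading test, Y = PET (Table 7)
X = [32, 32, 32, 28, 28, 28, 26, 26, 26, 26, 25, 25, 25, 25, 23,
     23, 23, 23, 21, 21, 18, 18, 18, 14, 14, 11, 7, 7, 4, 4]
Y = [28, 25, 28, 28, 25, 24, 24, 23, 22, 23, 23, 22, 23, 22, 23,
     20, 20, 21, 20, 21, 21, 19, 18, 19, 18, 14, 14, 4, 14, 4]

n = len(X)
mx, my = sum(X) / n, sum(Y) / n
sx = sqrt(sum((x - mx) ** 2 for x in X) / n)  # population SD of X
sy = sqrt(sum((y - my) ** 2 for y in Y) / n)  # population SD of Y

# r = Σxy / (N·Sx·Sy), with x = X − X̄ and y = Y − Ȳ
r = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / (n * sx * sy)
print(round(r, 2))  # prints: 0.91, matching Table 9
```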
Table 9 shows the correlation coefficients between the two tests as a whole and for
the five parts in detail. Hughes (1995) points out that perfect agreement between two
sets of scores will result in a validity coefficient of 1, while total lack of agreement
will give a coefficient of zero (Hughes, 1995, p. 24). Looking at Table 9, we can see
clearly that the validity coefficient for the whole test is 0.91, meaning that the
agreement between the two tests is at a high level. This is reinforced by the
coefficients of each part of the tests: the validity coefficients of the five parts are
0.69, 0.63, 0.80, 0.57 and 0.78 respectively, all showing a comparatively high level
of agreement between the two tests. From these calculations and Hughes's criterion
for the validity correlation coefficient, we can conclude that the final reading test for
students at HATECHS has construct validity.
2.3.3.4. Summary of the results of the study
Chapter 2 has provided the practical context of the study. In this chapter, we
attempted to examine the reliability of the final reading test for students at HATECHS
as well as to find evidence for the validity of the test. This was gained from test
score analysis. The results are summarized as follows (the total score on the test is
35; each correct answer got 1 point, each wrong one 0 points):
1. Mean of the final reading test scores: M = 21
2. Standard deviation of the test scores: SD = 7.92
3. Reliability coefficient: Rxx = 0.87
4. Validity coefficient: r = 0.91
5. The test has face validity.
Based on the results of the data analysis, we conclude that the test is valid. It is
reliable, though not highly reliable: a highly reliable test is said to have a reliability
coefficient ranging from 0.90 to 0.99. However, in the conditions of the school where
the research was carried out, the coefficient of 0.87 is acceptable for the reading test.
PART THREE: CONCLUSION
1. Conclusion
As the research questions raised at the beginning of the study have now been
answered, it is time to bring all the issues together.
In Part One, the primary concern of the thesis has been stated. The issue relates to the
practice of evaluating the reliability and validity of the final reading test for students
at HATECHS. However, the practical concern would necessarily raise theoretical
questions.
Therefore, in Chapter 1, Part Two of the thesis, we reviewed the theories relating to
approaches to language testing and reading testing. In the section on test evaluation,
the criteria for evaluating a test were discussed. This helped to establish the
theoretical background for the actual study in Chapter 2.
In Chapter 2, the main part of the thesis, the researcher analyzed the test scores and
found evidence to support the reliability and the validity of the test. In this chapter,
the researcher concluded that the final reading test for students at HATECHS is valid
and reliable.
2. Limitations
Due to the researcher’s knowledge and time, the study cannot avoid the limitations.
Firstly, the study is limited to the evaluating the reliability and the validity of the test,
in evaluating the validity of the test, the researcher has just found out the evidence for
face and criterion-related validity. Secondly, the number of participants is 30, which
is rather small for the number of students of the school; as well the participants only
from Department of Finance and Accounting. Finally, the condition and
circumstances where the test taking place are not discussed, this leads to the
unreliability of the test in the aspect of test-retest reliability. For the limitations, the
researcher would like to bring about the future directions in the next section.
3. Future Directions
Following from the results and limitations of the study, we wish in the future to
evaluate the test in more detail. In other words, the study would continue with test
item analysis. By doing so, we could explore the discrimination as well as the item
difficulty of the test. Additionally, we wish to interpret the test scores of a larger
number of participants. Finally, in future studies, we also wish to give suggestions to
test designers for a better final reading test for students at HATECHS.
REFERENCES
Aebersold, J., & Field, M. (1997). From reader to reading teacher. Cambridge:
Cambridge University Press.
Alderson, J. C. (1996). The testing of reading. In C. Nuttall (Ed.), Teaching reading skills
in a foreign language (pp. 212-228). Oxford: Heinemann.
Alderson, J. C. (2000). Assessing Reading. Cambridge: Cambridge University Press.
Alderson, J. C., Clapham, C., and Wall, D. (1995). Language Test Construction and
Evaluation. Cambridge: Cambridge University Press.
Alderson, J.C. (1979). The effect on the cloze test of changes in deletion frequency.
Journal of Research in Reading, 2, 108-118.
Anderson, N. (1999). Exploring second language reading: issues and strategies.
Boston: Heinle.
Bachman, L. F. & Palmer, A. S. (1996). Language testing in
practice. Oxford: Oxford University Press.
Bachman L. F. (1990). Fundamental Considerations in Language
Testing. Oxford: Oxford University Press.
Bachman, L. F. (1982). The trait structure of cloze test scores. TESOL Quarterly, 16, 61-70.
Bachman, L.F. (1985). Performance on cloze test with fixed-ratio and rational deletions.
TESOL Quarterly, 19, 535-556.
Brown, J.D. (1993). What are the characteristics of natural cloze tests? Language Testing,
10, 93 -116.
Carroll, B. J. (1980). Testing communicative performance: An interim study. London:
Pergamon Institute of English.
Cohen, A. D. (1998). Strategies and processes in test taking and SLA. In L. F. Bachman
and A. D. Cohen (Eds.) Interfaces between second language acquisition and
language testing research, 90-111. Cambridge: Cambridge University Press.
Harrison, A. (1983). A Language Testing Handbook. London: Macmillan Press.
Heaton, J. B. (1988). Writing English Language Tests. London: Longman.
Henning, G. (1987). A Guide to Language Testing. Cambridge: Cambridge University Press.
Hoang Van Trang. (2005). Evaluating the reliability of the achievement writing test for the
first-year students in the English Department, CFL-VNU and some suggestions for
changes. Unpublished MA thesis, College of Foreign Languages, Vietnam National
University, Hanoi, Vietnam.
Hughes, A. (2003). Testing for Language Teachers. Cambridge: Cambridge University Press.
Jonz, J. (1976). Improving the basic egg: The multi-choice cloze. Language Learning, 26, 255-265.
Jonz, J. (1990). Another turn in the conversation: What does cloze mean? TESOL
Quarterly, 24, 61-83.
Klein-Braley, C. (1983). A cloze is a question. In J.W. Oller, Jr., (Ed.), Issues in language
Testing research (pp. 218 – 228). Rowley, MA: Newbury House.
Klein-Braley, C. (1985). A cloze-up on the C-test: A study in the construct validation of
authentic tests. Language Testing, 14, 47-84.
Lado, R. (1961). Language Testing. London: Longman.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational Measurement (3rd ed.).
New York: American Council on Education/Macmillan.
Milanovic, M. (Ed.) (1999). Dictionary of Language Testing. Cambridge: University of
Cambridge Local Examinations Syndicate/Cambridge University Press.
Raatz, U. (1985). Better theory for better tests? Language Testing, 2, 60-75.
Shohamy, E. (1985). A Practical Handbook in Language Testing for the Second Language
Teacher. Tel-Aviv: Tel-Aviv University Press.
Ur, P. (1996). A Course in Language Teaching. Cambridge: Cambridge University Press.
Weir, C. J. (1990). Communicative Language Testing. New York: Prentice Hall.
Weir, C. J. (1993). Understanding and Developing Language Tests. New York: Prentice
Hall International.
APPENDIX 1: THE FINAL READING TEST
TRƯỜNG TRUNG HỌC KỸ THUẬT VÀ NGHIỆP VỤ HÀ NỘI
------------------------------------
FINAL ENGLISH TEST
Skill: Reading
Time allowed: 60 minutes

Your name: ……………………………          Mark: ……………………
Group: ………………………………            Marker's signature: 1. …………………  2. …………………
Date of birth: ……………………………
Date: ……………………………………
------------------------------------------------------------------
Part 1
Questions 1-5
• Look at the sign in each question.
• Someone asks you what it means.
• Mark the letter next to the correct explanation (A, B, C or D) on your answer sheet.

Example:
0. Silence please. Examination in progress.
   A. Please be quiet while people are taking their examination.
   B. Do not talk to the examiner.
   C. Do not speak during the examination.
   D. The examiner will tell you when you can talk.
1. Stand in a queue here for the tickets
   A. It is difficult to buy the tickets.
   B. You can buy the ticket anywhere you like.
   C. Tickets are available in a queue.
   D. To buy tickets, you must queue here.
2. We are closed for staff training until 9.30
   A. We can train you to work here.
   B. We are not open today because of staff training.
   C. The shop is run by trained staff.
   D. The shop will open at 9.30 today.

3. Please leave the shop before 9 p.m.
   A. The shop opens at 9 p.m.
   B. The shop will close at 9 p.m.
   C. People can stay at the shop until 9 p.m.
   D. People can enter the shop after 9 p.m.

4. Please hand in your key at the desk
   A. Don't lock the room.
   B. Keep your key safe.
   C. Lock your door before leaving.
   D. Leave your key at reception.

5. Sorry – All tables fully booked this evening
   A. You can only have a meal if you have booked.
   B. You do not need to reserve a table.
   C. The restaurant is not open this evening.
   D. If you wait you will be given a table.
Part 2
Questions 6-10
• The people below all want to learn a new sport.
• On the next page there are descriptions of eight sports centres.
• Decide which sports centre would be the most suitable for each of the following people.
• For questions 6-10, mark the correct letter (A-H) on your answer sheet.
Example: 0

6. Dionysis works in the city centre and wants to take up a sport that he can do regularly in his lunch hour. He enjoys activities which are fast and a bit dangerous.

7. John and Betty already play golf at weekends. Now they have retired, they want to learn a new activity they can do together in the mornings in the countryside.

8. In six weeks’ time, Juan is having a holiday on a Caribbean island, where he plans to explore the ocean depths. He has a 9-to-5 job and wants to prepare for this holiday after work.

9. Tomoko and Natalie are 16. They want to do an activity one evening a week and get a certificate at the end. They would also like to make new friends.

10. Alice has a well-paid but stressful job. She would like to take up a sport which she can do outside the city each weekend. She also wants to get to know some new people.
Sporting Opportunities

A – Suzanne’s Riding School
You can start horse-riding at any age. Choose private or group lessons any weekday between 9 a.m. and 8.30 p.m. (3.30 p.m. on Saturdays). There are 10 kilometres of tracks and paths for leisurely rides across farmland and open country. You will need a riding hat.

B – Lackford Sailing Club
Our Young Sailor’s Course leads to the Stage 1 Sailing qualification. You’ll learn how to sail safely and the course also covers sailing theory and first aid. Have fun with other members afterwards in the clubroom. There are 10 weekly two-hour lessons (Tuesdays 6 p.m. – 8 p.m.).

C – Adonis Dive Centre
Our experienced instructors offer one-month courses in deep-sea diving for beginners. There are two evening lessons a week, in which you learn to breathe underwater and use the equipment safely. You only need a swimming costume and towel. Reduced rates for couples.

D – Windmill Tennis Academy
Learn to play tennis in the heart of the city and have fun at our tennis weekends. Arrive on Friday evening, learn the basic strokes on Saturday and play in a competition on Sunday. There’s also a disco and swimming pool. White tennis clothes and a racket are required.

E – Hilton Ski Centre
If you take our 20-hour course a week or two before your skiing holiday, you’ll enjoy your holiday more. Learn how to use a ski-lift, how to slow down and, most importantly, how to stop! The centre is open from noon to 10 p.m. Skis and boots can be hired.

F – Avon Watersports Club
We use a two-kilometre length of river for speedboat racing and water-skiing. A beginners’ course consists of ten 20-minute lessons. You will learn to handle boats safely and confidently, but must be able to swim. The club is in a convenient central position and is open daily from 9 a.m. to 4 p.m., with lessons all through the day.

G – Glenmorie Golf Club
After a three-hour introduction with a professional golfer, you can join this golf club. The course stretches across beautiful rolling hills and is open from dawn until dusk daily. There are regular social evenings on Saturdays in the club bar. You will need your own golf equipment.

H – Hadlow Aero Club
Enjoy a different view of the countryside from one of our two-seater light aeroplanes. After a 50-hour course with our qualified instructor, you could get your own pilot’s licence. Beginners’ lessons for over-18s are arranged on weekdays after 4 p.m.
Part 3
Questions 11 – 20
• Look at the statements below about wedding traditions.
• Read the text to decide if each statement is correct or incorrect.
• If it is correct, mark A on your answer sheet.
• If it is not correct, mark B on your answer sheet.

Example: 0

11. The wedding cake was made of sugar and honey.
12. Even today the wedding cake is a symbol of good luck and fertility.
13. A single woman can place a piece of wedding cake under her pillow.
14. The smell of flowers attracts evil spirits.
15. To spread her good fortune and luck, the bride throws the bouquet of flowers.
16. In early times, a woman wore her wedding dress at her wedding ceremony.
17. The wedding dress is normally white because this is the colour of virginity.
18. The bride should make her own dress for her wedding.
19. In early times, the golden ring was the symbol of love and marriage.
20. The vein in the third finger was believed to run directly to the heart.
Wedding Traditions

Wedding Cake
The first wedding cake dated back to the Middle Ages. It was made of sugar icing and decorated with meaningful symbols like doves, horseshoes, etc. Even today the wedding cake is a symbol of good luck and fertility. The bride and the groom cut the wedding cake together, and from that moment they share their new life together. All the guests should eat some to ensure good luck. A single woman can place a piece of wedding cake under her pillow and will dream of the man she is going to marry.

Wedding Dress
In early times, a woman wore her best dress at her wedding. The tradition of wearing a white dress only started in 1949. White is a sign of virginity and joy. People also believe that the colour can drive away evil spirits.

It was believed that the bride should never make her own dress or try it on before the wedding. She shouldn’t let her groom see her in her wedding dress before the wedding, either. These were to make sure that the marriage took place.

Bridal Bouquet
Flowers played a very important part in olden times – the smell of the flowers was believed to ward off evil spirits and bring good fortune. The throwing of the bouquet is a way of spreading the bride’s good fortune and luck. Whoever catches it will be blessed with good luck and will be the next to marry.

Wedding Ring
In the past, a golden ring was given to the bride’s family in payment for the bride. Now it is simply the symbol of love and marriage. The unbroken circle is also an age-old symbol of ‘Eternity’. It is a tradition to place the wedding ring on the third finger of the left hand. Perhaps it’s because the ancient Romans believed that the vein in the third finger ran directly to the heart, so the wearing of rings on that finger joined the couple’s hearts and destinies.
Part 4
Questions 21 – 25
• Read the text and questions below.
• For each question, mark the letter next to the correct answer – A, B, C or D – on your answer sheet.

Example: 0

My name is Mandi. Three months ago, I went to a disco where I met a boy called Tom. I guessed he was older than me, but I liked him and thought it didn’t matter. We danced a couple of times, then he asked how old I was. I told him I was 16. I thought that if I told him my real age, he wouldn’t want to know me, as I’m only 13.

After the disco we arranged to meet the following weekend. The next Saturday we went for a burger and had a real laugh. Afterwards he walked me to my street and kissed me goodnight. Things went really well. We see each other a couple of times a week, but I’ve had to lie to my parents about where I’m going and who with. I’ve always got on with them, but I know that if they found out how old Tom was they’d stop me seeing him.

Now I really don’t know what to do. I can’t go on lying to my parents every time we go out, and Tom keeps asking why he can’t come round to my house. I’m really worried and I need some advice.

21. Why has Mandi written this?
A. to describe her boyfriend
B. to prove how clever she is
C. to explain a problem
D. to defend her actions

22. Who is she writing to?
A. her boyfriend
B. her parents
C. a teenage magazine
D. a schoolfriend

23. Why is Mandi worried?
A. Tom has been behaving strangely.
B. She’s been telling lies.
C. She’s not allowed to go to discos.
D. Her parents are angry with her.

24. Why can’t Tom come to Mandi’s house?
A. She doesn’t want her parents to meet him.
B. Her parents don’t like him.
C. He’s nervous of meeting her parents.
D. She doesn’t want him to see where she lives.

25. Which of these answers did Mandi receive?
A. Tell me what you really feel.
B. You must start by being honest with everyone.
C. Everyone’s been unfair to you.
D. Don’t worry, I’m sure Tom will change his mind.
Part 5
Questions 26 – 35
• Read the text below and choose the correct word for each space.
• For each question, mark the letter next to the correct word – A, B, C or D – on your answer sheet.

Example: 0
For many young people sport is (0)……… popular part of school life, and (26)……… in one of the school teams and playing in matches is very important. (27)……… someone is in a team it means a lot of extra practice and often spending a Saturday or Sunday away (28)……… home, as many matches are played then.

It (29)……… also involve travelling to other towns to play against other school teams and then (30)……… on after the match for a meal or a drink. Sometimes parents, friends or other students will travel with the team to support (31)……… own side.

When a school team wins a match it is the whole school which feels proud, (32)……… only the players. It can also mean that a school (33)……… well-known for being good at certain sports, and pupils from that school may end up playing in (34)……… national and international teams so that the school has some really (35)……… names associated with it!
(0)  A. a        B. an       C. the      D. and
26.  A. having   B. being    C. taking   D. putting
27.  A. If       B. As       C. Then     D. So
28.  A. at       B. on       C. for      D. from
29.  A. ought    B. is       C. can      D. has
30.  A. being    B. staying  C. leaving  D. spending
31.  A. their    B. its      C. our      D. whose
32.  A. but      B. however  C. and      D. not
33.  A. turns    B. makes    C. comes    D. becomes
34.  A. up       B. to       C. for      D. beside
35.  A. old      B. new      C. common   D. famous
Appendix 2: The PET (criterion measure)
-------------------------------------------------------------------
Reading
(Time allowed: 60 minutes)

Part 1
Questions 1-5
• Look at the sign in each question.
• Someone asks you what it means.
• Mark the letter next to the correct explanation – A, B, C or D – on your answer sheet.
Example:
0. Sign: “Silence please. Examination in progress.”
A. Please be quiet while people are taking their examination.
B. Do not talk to the examiner.
C. Do not speak during the examination.
D. The examiner will tell you when you can talk.
(Example answer: 0 – A)

1. Sign: “Please keep this entrance clear.”
A. Only use this entrance in an emergency.
B. Do not park in front of this entrance.
C. Always keep this door open.
D. Permission is needed to park here.

2. Sign: “Supersaver tickets cannot be used on Fridays.”
A. You need a special ticket to travel on a Friday.
B. You can save money by travelling on a Friday.
C. Supersaver tickets can be used every day except Fridays.
D. Supersaver tickets cannot be bought before the weekend.
3. Sign: “Please show the librarian all books when you leave the library.”
A. Return your books before you leave the library.
B. The librarian needs to see your books before you go.
C. Make sure you take all your books with you.
D. The librarian will show you where to put your books.

4. Sign: “Machine out of order. Drinks available at bar.”
A. This machine is not working at the moment.
B. There is a drinks machine in the bar.
C. Drinks cannot be ordered at the bar.
D. Use this machine when the bar is closed.

5. Sign: “Keep this door locked when room not in use.”
A. This room cannot be used at present.
B. This door must always be kept locked.
C. Keep the key to this door in the room.
D. Lock the door when it is not being used.
Part 2
Questions 6-10
• The people below are looking at the contents pages of magazines.
• On the next page are parts of the contents pages of eight magazines.
• Decide which magazine (letter A-H) would be the most suitable for each person (numbers 6-10).
• For each of these numbers, mark the correct letter on your answer sheet.

Example: 0

6. Sarah is a keen walker. She lives in an area which is very flat, and when she goes on holiday she likes to walk in the hills. She is looking for new places to go.

7. Jane is keen on music. She likes reading about the personal life of famous people to find out what they are really like.

8. Peter is going to France next week on business and has a free weekend which he plans to spend in Paris. He would like to find out what there is to do there.

9. Paul likes visiting other countries. He is also interested in history and likes reading about famous explorers from the past.
10. Mary likes clothes but hasn’t got much money, so she is looking for ways of dressing smartly without spending too much.

A
MARIA MARIA
She conquered the world of opera with the most extraordinary voice of the century – and died miserable and alone. Michael Tonner looks at Callas, the woman behind the opera singer.
BUSINESS IN PARIS
John Felbrick goes to Paris to see what facilities it offers for business people planning meetings.

B
• Read about Neil Ashdown’s recent walk along one of Britain’s oldest paths. It passes through some of the most beautiful hill country.
• Enter our competition and win a week for two in Thailand.

C
Here and there
Our guide to what is happening in London, and this month we’ll also tell you what’s on in each of the capital cities of Europe.
Explore Africa
Last year Jane Merton joined a trip across Africa, exploring the most cut-off parts of the continent. Read what she has to say.

D
• Don’t go into the hills unprepared. If you’re a hill walker, we have advice for you on what to take and what to do if something goes wrong.
• We show pictures of Linda Evangelista, the supermodel from Toronto, wearing next season’s clothes for the woman with unlimited pocket money.

E
Festivals
This is the season for street festivals. We’ve travelled to three of the big ones in South America and bring you pictures and information.
How I got there
Georgina Fay tells us how she became a famous clothes designer overnight.

F
In the Freezer
We talk to the two men who have just completed a walk across the Antarctic.
Tighten That Belt
Well-known fashion designer Virginia McBride, who now lives in Paris, tells us how to make our old clothes look fashionable.

G
Wake up children
Penelope Fine’s well-known children’s stories are going to be on Sunday morning Children’s TV. We talk to this famous author and find out how she feels about seeing her stories on screen.
Flatlands
It may not look like promising walking country – it hardly rises above sea level – but we can show you some amazing walks.

H
My audience with Pavarotti
David Beech talks to the famous singer about his future tour of the Far East.
New light
Julian Smith talks to the granddaughter of one of the men who reached the North Pole for the first time in 1909. She tells us about his interesting life.
Part 3
Questions 11 – 20
• Look at the statements below about a student hostel.
• Read the text to decide if each statement is correct or incorrect.
• If it is correct, mark A on your answer sheet.
• If it is not correct, mark B on your answer sheet.

Example: 0
11. Every student has a key to the main door.
12. You can borrow your friend’s main door card.
13. Insurance companies will pay if someone steals your card and takes things from your room.
14. Spare rooms are least likely to be available in summer.
15. Your brother can stay free of charge if he uses the other bed in your room.
16. Guests must report to Stan when they arrive.
17. The cleaners take away food that they find in bedrooms.
18. If you cook late at night, you should leave the washing-up until the morning.
19. Students who play loud music may have to leave the hostel.
20. You should ask Stan to call a doctor if you are ill.
Hostel Rules

To make life in this student hostel as comfortable and safe as possible for everyone, please remember these rules.

Security – You have a special card which operates the electronic lock on your room door and a key for the main door of the hostel. These are your responsibility and should never be lent to anyone, including your fellow students. If you lose them you will be charged 20 pounds for a replacement. Do not leave your room unlocked even for short periods (for example, when making yourself a coffee). Unfortunately, theft from student hostels is very common, and insurance companies will not pay for stolen goods unless you prove that your room was broken into by force.

Visitors – There are rarely any rooms available for visitors, except at the end of the summer term. Stan Jenkins, the hostel manager, will be able to tell you and can handle the booking. A small charge is made. Stan also keeps a list of local guesthouses, with some information about what they’re like, prices, etc. You are also allowed to use empty beds for up to three nights, with the owner’s permission (for example, if the person who shares your room is away for the weekend), but you must inform Stan before your guest arrives, so that he has an exact record of who’s in the building if a fire breaks out. Students are not allowed to charge each other for this.

Kitchen – There is a kitchen on each floor where light meals, drinks, etc. may be prepared. Each has a large fridge and a food cupboard. All food should be stored, clearly marked with the owner’s name, in one of these two places. Bedrooms are too warm for food to be kept in, and the cleaners have instructions to remove any food found in them. After using the kitchen, please be sure you do all the washing-up immediately and leave it tidy. If you use it late in the evening, please also take care that you do so quietly in order to avoid disturbing people in nearby bedrooms.

Music – If you like your music loud, please use a Walkman! Remember that your neighbours may not share your tastes. Breaking this rule can result in being asked to leave the hostel. Musicians can use the practice rooms in the basement. Book through Stan.

Health – Any serious problems should be taken to the local doctor. The number to ring for an appointment is on the ‘Help’ list beside the phone on each floor. For first aid, contact Stan or one of the students whose names you will find on that list, who also have some first aid training.
Part 4
Questions 21 – 25
• Read the text and questions below.
• For each question, mark the letter next to the correct answer – A, B, C or D – on your answer sheet.

Example: 0

Dear Mr Lander,

I run ‘Snip’, a hairdressing shop above Mr Shah’s chemist’s shop at 24 High Street. I started the business 20 years ago and it is now very successful. My customers have to walk through the chemist’s to the stairs at the back which lead to the hairdresser’s. This has never been a problem.

Mr Shah plans to retire later this year, and I have heard from a business acquaintance that you intend to rent the shop space to a hamburger bar. I have thought about trying to rent it myself and make my shop bigger, but I cannot persuade anyone to lend me that much money. I don’t know what to do. My customers come to the hairdresser’s to relax, and the noise and smells of a burger bar will surely drive them away. Also, they won’t like having to walk through a hot, smelly bar to reach the stairs.

I have always paid my rent on time. You have told me in the past that you wish me to continue with my business for as long as possible. I believe you own another empty shop in the High Street. Could the burger bar not go there, where it would not affect other people’s businesses?
21. What is the writer’s main aim in the letter?
A. to show why her business is successful
B. to explain why her customers are feeling unhappy
C. to avoid problems for her business
D. to complain about the chemist downstairs

22. Who was the letter sent to?
A. the writer’s landlord
B. the writer’s bank manager
C. the owner of the burger bar
D. the local newspaper

23. What does the writer think about the burger bar?
A. It will make her lose money.
B. It will not be successful.
C. The High Street is not the place for it.
D. Other shopkeepers will complain about it too.

24. Why is the writer worried about her customers?
A. They do not like eating burgers.
B. They may not be allowed to use the stairs.
C. The smells will not be pleasant.
D. The hairdresser’s will get too crowded.

25. Which of these is part of a reply to the letter?
A. Thank you for your letter. I am sorry your shop has had to close down because of lack of business.
B. Thank you for your letter. I understand your problem. I will ask them to look at the other shop, but I can make no promises at the moment.
C. Thank you for your letter asking me to rent the ground floor shop to you. I will think about it and let you know.
D. Thank you for your letter. I am sorry that I am not able to lend you the money you ask for.
Part 5
Questions 26 – 35
• Read the text below and choose the correct word for each space.
• For each question, mark the letter next to the correct word – A, B, C or D – on your answer sheet.

Example: 0
Sally

After two weeks of worry, a farmer (0)……… the north of England was very happy yesterday. James Tuke, a farmer who (26)……… sheep, lost his dog, Sally, when they were out (27)……… together a fortnight ago.

‘Sally was running (28)……… of me,’ he said, ‘and disappeared over the top of the hill. I whistled and called (29)……… she didn’t come. She’s young, so I thought perhaps she’d gone back to the farmhouse (30)……… her own. But she wasn’t there. Over the next few days I (31)……… as much time as I could looking for her, and I was afraid I would never see her (32)……… . Then I heard an animal crying while I was out walking near the (33)……… of a cliff. I rushed out and found Sally on a shelf of rock halfway down. She was thin and (34)……… but she had no (35)……… injuries. She was really lucky!’
(0)  A. in       B. of       C. at       D. to
26.  A. goes     B. grows    C. keeps    D. holds
27.  A. working  B. worked   C. work     D. works
28.  A. behind   B. beside   C. ahead    D. around
29.  A. but      B. so       C. and      D. even
30.  A. by       B. on       C. with     D. of
31.  A. used     B. spent    C. gave     D. passed
32.  A. more     B. again    C. further  D. after
33.  A. edge     B. side     C. border   D. height
34.  A. poor     B. dull     C. weak     D. broken
35.  A. strong   B. hard     C. rough    D. serious
APPENDIX 3: QUESTIONS FOR DISCUSSION
(For teachers)
The questions below are designed to collect data for my research. Your assistance in taking part in the discussion is highly appreciated.
Thank you very much for your cooperation!
------------------------------------

Questions for discussion
1. Does the test appear to you a reasonable way of assessing the students?
2. Is the test too difficult or too easy?
3. Is it realistic?