Test Critique

I. Test Purpose and Use
The Cambridge Young Learners of English (CYLE) tests are a new English proficiency
instrument produced by the University of Cambridge Local Examination Syndicate (UCLES).
They are intended to serve as a bridge to the UCLES Main Suite examinations (i.e., KET: Key
English Test and PET: Preliminary English Test). The tests were first administered in 1997 in
Cambridge and are currently taken by over 310,000 children in about 55 countries around the
world. CYLE came to China in 1996.
The purpose of Cambridge Young Learners English Tests is “to offer a comprehensive
approach to testing the English of primary learners between the ages of 7 and 12” (UCLES,
2003a:2). The YLE tests span three ability levels: Starters, Movers and Flyers. Starters is designed for children aged 7, Movers for candidates aged 8 to 11, and Flyers for candidates aged 9 to 12. The YLE tests are designed to assess the listening, reading,
writing, and speaking skills of children learning English as a foreign language (EFL). Each level
has three components: Listening, Reading and Writing, and Speaking.
While one purpose of the test is to measure overall English proficiency, the CYLE tests
also aim to give a positive impression of English testing to young learners in order to encourage
and motivate them to continue their learning. To this end, the test items use large colorful
illustrations and emphasize communicative discourse and vocabulary.
The tests aim to: “1) sample relevant and meaningful language use; 2) measure ability accurately and fairly; 3) present a positive impression of international tests; 4) promote and encourage effective learning and teaching” (Cambridge Young Learners English Test, 2003).
II. Content Validity
Content validity is the relevance of the test content to the goal of the test. A test is said to have content validity if its content constitutes a representative sample of the language skills, structures, etc., with which it is meant to be concerned (Hughes, 2004).
This language test critique will focus on the speaking test of Movers, the second level of
the Cambridge Young Learners English Test. The purpose of this speaking test is to identify EFL
young learners’ interactive listening ability, pronunciation, and ability to produce words and
phrases. Their speaking ability in Movers is measured by their ability to complete four tasks.
The main skill focus of each section, with its input and expected response, is as follows:
1. Identifying and describing 4 differences between 2 pictures (input: 2 similar pictures; expected response: words/short phrases)
2. Narrating a story (input: 4-picture story; expected response: extended narrative)
3. Identifying the ‘odd one out’ and giving reasons (input: picture sets)
4. Answering personal questions (input: open-ended questions; expected response: personalized answers in words/short phrases)
Figure 1: 4 Sections of the Movers Speaking Test
One argument for the validity of the speaking test is that it is a direct test (testing
speaking by speaking). Hughes (2004) suggests that direct testing improves the validity of the
test since it promotes authentic tasks. Another point is that the four skills (comparing pictures,
telling a story, categorizing and exploring, talking about oneself) constructed for the Movers
speaking test are clearly geared toward testing the speaking ability of EFL young learners aged between 8 and 11. These four skills are also a representative sample of necessary
speaking skills for this age level. In addition, the speaking test of young learners is age
appropriate because it is not a pencil and paper test. Many sets of colorful pictures in each task
are used to elicit describing, story-telling, explaining and communicative responses from the
young learners. Items in the test primarily comprise everyday vocabulary for children’s toys, activities, general interests, and concepts such as weather, animals, and days of the week.
On the other hand, some vocabulary items in the speaking test suggest that the test does not assess the skills or knowledge it intends to assess. For example, some pictures of American foods such as “sandwich” and “salad” are inappropriate because many Chinese young learners, especially children from rural areas, have never tried these foods and may well be unable to answer. Rather than testing knowledge of the vocabulary items, the test is testing knowledge of another culture. Moreover, if a student is not used to seeing the representations of the words, even if he or she knows the words, the validity of the test is called into question. For example, a telephone or cinema might be represented differently in another country, so the test would be biased against students from countries where a telephone or cinema is depicted differently. Students from low-income backgrounds who have never seen telephones or cinemas would also be at a disadvantage.
III. Reliability
Reliability is the consistency of measurement. According to Hughes (2004), there are two
components of test reliability: the performance of candidates from occasion to occasion, and the
reliability of the scoring. I will analyze the reliability of the Movers’ speaking test from the
perspective of the performance of candidates and oral examiners.
The Movers speaking test is conducted by one oral examiner and one candidate and takes
7 minutes for each candidate. Oral examiners are required to be trained on how to carry out the
test, how to give positive feedback to the candidate, how to strictly follow “interlocutor frames”,
and how to score. The interlocutor frames and the training requirement of oral examiners do help
establish the reliability of the test.
However, some factors threaten the reliability of the test. First, it is not easy to guarantee the quality, objectivity, and consistency of oral examiners, although perfect consistency is not to be expected in interview performance. In conducting the speaking test in China, one finds that unavoidable differences among oral examiners are a serious threat to the reliability of the test. The oral examiners’ performance has a strong impact on candidates’ performance in the test: some oral examiners have pronunciation that is clear to young learners, but some do not; some follow the interlocutor frame strictly, but some cannot help being flexible although they have been trained and monitored before the test; some give positive feedback, but some never use encouraging language and even correct candidates’ errors repeatedly during the test; some give long wait times to candidates, but some give no wait time at all; and some provide clear and explicit instructions for the tasks, but some do not. Second, in some areas in China, it is not
possible to ensure that all candidates have the opportunity to familiarize themselves with the
format and the speaking testing techniques in order to learn what will be required of them and
improve their performance on the exam. As an oral examiner, I consistently found that candidates who were not familiar with test procedures did a poorer job than those who knew what was expected of them. Not all candidates have access to sample tests or practice materials, so if any aspect of the speaking test is unfamiliar to candidates, they are unlikely to perform as well as they otherwise would. Third, administration of the test is also one
threat to the reliability. The greater the differences between one administration of a test and
another, the greater the differences one can expect between a candidate’s performance on the two
occasions. It is hard to ensure uniformity in conducting the speaking test in China. It is almost
impossible for each testing center to strictly adhere to the timing and quiet setting with no
distracting sounds or movements. Fourth, the scoring of the speaking test is mostly subjective
and some oral examiners whose scoring deviates markedly from the norm during the training
session are still used due to the lack of English teachers in some areas in China. Fifth, in order to reduce young learners’ anxiety and deliver the test to children in an enjoyable and non-threatening way, only one examiner is used in the interview. However, it is difficult for one oral
examiner to conduct the test and keep track of the candidate’s performance at the same time. In
this subjective testing, it is a threat to reliability that the candidate’s performance is scored by
only one scorer. Finally, some candidates have to wait a longer time than others to be
interviewed. Fatigue can become an issue for the last individual student and threaten the
reliability of the assessment.
IV. Scoring Method and Score Reporting
The Movers speaking test is criterion-referenced,
“which compares learner’s performance, not to other learners, but to a set of criteria of
expected performance or learning targets. Criterion-referenced assessment can match the child’s
performance against an expected response on an item, or it may make use of a set of descriptors
along a scale on which a learner is placed” (Cameron, 2001).
In the Movers speaking test, examples of an expected response and a descriptive scale are
given in respect to the candidates’ speaking skills. The criterion used to assess the child’s
speaking skills is the production of answers in single words or short phrases, and is rated on three
aspects: 1) interactive listening ability; 2) pronunciation; 3) production of words and phrases.
Each criterion carries a maximum mark of 3. (UCLES 2003a) (see Appendix)
A strong motivational characteristic of this test is that there is no pass or fail. It is
designed to test what the candidates know instead of what they do not know. Speaking is scored
locally by the oral examiner. All candidates are given a certificate from Cambridge University to
reward their efforts and abilities. Movers candidates receive certificates displaying between one and five shields: the best candidates receive five shields for their speaking proficiency, and the lowest-scoring examinees receive one shield.
Although the Handbook and Research notes have understandably de-emphasized
numerical scores at this early stage of a young learner’s career, MacGregor (2001:5) points out
“one obvious weakness in this score reporting system is that no indication of what these shield
scores mean, and therefore the scores cannot be translated into descriptions of what the examinee
is and is not able to do”. It is therefore difficult for parents, teachers, researchers, and oral examiners to understand how raw scores are translated into shield bands each year. This is a
serious problem for candidates who are almost ready to move to the next Flyers level and even
worse for students at the Flyers level who are ready to move to the main suite of examinations.
According to Jones (2002), UCLES has now recognized the need to investigate the
relationships between YLE levels and between the YLE and the main suite because of increasing
demand by users to try to interpret results within such wider frameworks. One suggestion I offer
is that a score report of the individual student’s strengths and weaknesses across the different task types in the speaking test would greatly help students, parents, and teachers evaluate the English teaching and learning process and its results. The speaking test would become a powerful instructional tool in EFL contexts if such a report of individual student performance were provided. The score report would also serve to achieve one of the purposes
of the test: to promote and encourage effective learning and teaching.
V. Impact of the Test
The stated purposes of Cambridge Young Learners’ English Tests are thought to be far
from “high–stakes” for any individual child’s scholastic career (Bailey, 2005). Although the
Movers speaking test can give young learners, their teachers, and parents a measure of how well the children are acquiring spoken English, the result is not intended to inform high-stakes decisions about them.
The washback that young learners receive from taking the Movers Speaking Test promotes their learning and use of English as a foreign language. It is a test of real language skills. Language use is tested in meaningful, realistic, and interesting ways, using materials especially suited to children aged between 8 and 11. The test has a positive impact on children’s spoken English as they experience real communication in a foreign language. The tests are fun, and children enjoy taking them. In this sense, the test does promote effective oral English learning.
The test has a positive instructional washback. For example, in China, UCLES has been
working closely with Sino-British Center of China National Education Examination Authority
(NEEA) to promote and support the Cambridge Young Learners English Tests. The impact of
the test on instruction is evidenced not only by teacher training, the network of English training
and testing, the national conferences and seminars on educational measurement for Cambridge
Young Learners managed by UCLES and the Sino-British Center, but also by the large number
of instructional materials linked to test content which “span the instructional spectrum from
multi-unit formal classroom programs to cheerful puzzle books for independent study, and these
materials appear to focus on language instruction rather than rehearsal of test taking
skills.”(Bailey, 2005)
One positive consequence of using the test is that it does promote English teaching and
learning in an EFL context. Both EFL teachers and students can greatly benefit from the
principles of Cambridge Young Learners English: student-centered, activity-centered, listening
and speaking first and writing and reading second, interest first, communicative teaching and
learning, and motivational testing. These are usually what formal EFL classrooms lack. In this case, the purpose of the test is consistent with the instructional goals of EFL teachers and curricula.
There are also potentially negative consequences for society and the educational system
from using this test. For example, with the good reputation of the Cambridge Test and support
from national education department in China, the test has brought into being an integrated
system of after-school English classrooms (“second English classrooms”). That means Chinese
young learners have to take Cambridge English classes and tests either during the weekends or
during their holidays, which is placing some pressure on both parents and students. In addition,
the cost of Cambridge Young Learners English training and testing is high for some Chinese families. Parents have to sacrifice a great deal to pay for it. However, some children may not be able to afford this opportunity, and society should look for ways to give these children opportunities in this area. Moreover, another unfairness appears at the “English starting point” when young learners move to middle school. Some children
with many years’ Cambridge English training and testing easily outscore those who have no
English learning experiences or limited formal English learning in elementary schools. In this
sense, Cambridge English may affect some children’s confidence and even their success in their
school life. Finally, while the new test encourages communicative language teaching, not all teachers are given adequate guidance and training; thus it is a serious concern that not all students involved in Cambridge English receive excellent teaching and fair testing.
In conclusion, the Cambridge Young Learners English (YLE) Tests “offer one of few
options for comprehensively assessing EFL abilities in young students” (Bailey, 2005: 251). Indeed, the Movers Speaking Test has filled a gap in assessing young learners’ oral English
proficiency in some EFL countries such as China. It has significant strengths in encouraging
meaningful language use, giving young learners a positive impression of international tests, and
promoting effective learning and teaching in EFL contexts. Although it is an excellent
assessment of young EFL learners, some areas for improvement are raised in this paper:
improving the story-telling task by keeping a balance between supporting weaker candidates and
rewarding strong performance; improving the oral examiners training for more objectivity of
assessment; revising score reporting to reflect the individual student’s strengths and weaknesses in his or her oral English; and revising the speaking test procedures for greater ease and accuracy of administration. With these improvements and the widening use of the test, I believe that the reliability and validity of the Movers speaking test in Cambridge Young Learners English can be greatly enhanced.
References
Bailey, A. (2005). Test review: Cambridge Young Learners English (YLE) Tests. Language Testing, 22(2), 242-252.
Cameron, L. (2001). Teaching Languages to Young Learners. Cambridge: Cambridge University Press.
Hughes, A. (2004). Testing for Language Teachers. Cambridge: Cambridge University Press.
Jones, N. (2002). Linking YLE levels into a single framework. Research Notes, 10. Cambridge: UCLES. Available at http://www.cambridgesol.org/rs_notes/index.cfm
MacGregor, L. (2001). Testing young learners with CYLE: the new kids on the block. JALT Testing and Evaluation SIG Newsletter, 5, 4-6.
UCLES (2003a). Cambridge Young Learners English Tests Handbook: Starters, Movers and Flyers. Cambridge: UCLES. Available at http://www.cambridgeeslo.org/support/dloads/yle/yle_hb_03.pdf (February 2005).