LANGUAGE TEST CRITIQUES

I. Test Purpose and Use

The Cambridge Young Learners English (CYLE) tests are a new English proficiency instrument produced by the University of Cambridge Local Examinations Syndicate (UCLES). They are intended to serve as a bridge to the UCLES Main Suite examinations (i.e., KET, the Key English Test, and PET, the Preliminary English Test). The tests were first administered in 1997 in Cambridge and are currently taken by over 310,000 children in about 55 countries around the world. CYLE came to China in 1996. The purpose of the Cambridge Young Learners English Tests is "to offer a comprehensive approach to testing the English of primary learners between the ages of 7 and 12" (UCLES, 2003a: 2). The YLE tests span three ability levels: Starters, Movers, and Flyers. Starters is designed for children aged 7, Movers for candidates aged 8 to 11, and Flyers for candidates aged 9 to 12. The YLE tests are designed to assess the listening, reading, writing, and speaking skills of children learning English as a foreign language (EFL). Each level has three components: Listening, Reading and Writing, and Speaking.

While one purpose of the test is to measure overall English proficiency, the CYLE tests also aim to give young learners a positive impression of English testing in order to encourage and motivate them to continue their learning. To this end, the test items use large, colorful illustrations and emphasize communicative discourse and vocabulary. The tests aim to: "1) sample relevant and meaningful language use; 2) measure ability accurately and fairly; 3) present a positive impression of international tests; 4) promote and encourage effective learning and teaching" (Cambridge Young Learners English Tests, 2003).

II. Content Validity

Content validity is the relevance of the test content to the goal of the test. A test is said to have content validity if its content constitutes a representative sample of the language skills, structures, and so on with which it is meant to be concerned (Hughes, 2004). This language test critique will focus on the speaking test of Movers, the second level of the Cambridge Young Learners English Tests. The purpose of this speaking test is to identify EFL young learners' interactive listening ability, pronunciation, and ability to produce words and phrases. Speaking ability in Movers is measured by candidates' ability to complete four tasks.

Part | Main Skill Focus | Input | Response
1 | Identifying and describing differences between two pictures | Two similar pictures | Words/short phrases
2 | Narrating a story | Four-picture story | Extended narrative
3 | Identifying the "odd one out" and giving a reason | Picture sets | Words/phrases
4 | Answering personal questions (open-ended questions) | Examiner's questions | Answers to personalized questions (words/short phrases)

Figure 1: The four parts of the Movers Speaking Test

One argument for the validity of the speaking test is that it is a direct test (testing speaking by speaking). Hughes (2004) suggests that direct testing improves the validity of a test since it promotes authentic tasks. Another point is that the four skills (comparing pictures, telling a story, categorizing and exploring, talking about oneself) constructed for the Movers speaking test are clearly geared toward testing the speaking ability of EFL young learners aged between 8 and 11. These four skills are also a representative sample of the speaking skills necessary at this age level.
In addition, the speaking test is age-appropriate for young learners because it is not a pencil-and-paper test. Many sets of colorful pictures in each task are used to elicit describing, story-telling, explaining, and other communicative responses from the young learners. Items in the test consist primarily of everyday vocabulary covering children's toys, activities, and general interests, as well as concepts such as weather, animals, days of the week, and shapes.

On the other hand, some vocabulary items in the speaking test suggest that the test does not always assess the skills or knowledge it intends to assess. For example, some pictures of American foods such as "sandwich" and "salad" are inappropriate because many Chinese young learners, especially children from rural areas, have never eaten these foods and may well be unable to answer. Rather than testing knowledge of the vocabulary items, the test is testing knowledge of another culture. Moreover, if a student is not used to seeing the pictorial representations of the words, then even though they know the words, the validity of the test could be called into question. For example, a telephone or a cinema might be represented differently in another country, so the test would be biased against students from countries where a telephone or cinema is depicted differently. Students from low-income backgrounds who have never seen telephones or cinemas would also be at a disadvantage.

III. Reliability

Reliability is the consistency of measurement. According to Hughes (2004), there are two components of test reliability: the performance of candidates from occasion to occasion, and the reliability of the scoring. I will analyze the reliability of the Movers speaking test from the perspective of the performance of candidates and of oral examiners.

The Movers speaking test is a face-to-face interview between one oral examiner and one candidate and takes seven minutes per candidate. Oral examiners are required to be trained in how to carry out the test, how to give positive feedback to the candidate, how to follow the "interlocutor frames" strictly, and how to score. The interlocutor frames and the training requirement for oral examiners do help establish the reliability of the test.

However, some factors threaten the reliability of the test. First, it is not easy to guarantee the quality, objectivity, and consistency of oral examiners, although perfect consistency is not to be expected in interview performance. In conducting the speaking test in China, it has become clear that unavoidable differences among oral examiners are a serious threat to the reliability of the test. An oral examiner's performance has a strong impact on candidates' performance: some examiners pronounce English clearly for young learners, but some do not; some follow the interlocutor frame strictly, while others cannot help being flexible even though they have been trained and monitored before the test; some give positive feedback, while others never use encouraging language and even correct candidates' errors repeatedly during the test; some allow candidates long wait times, while others give no wait time at all; and some provide clear, explicit task instructions, while others do not.
Second, in some areas of China it is not possible to ensure that all candidates have the opportunity to familiarize themselves with the format and techniques of the speaking test in order to learn what will be required of them and improve their performance on the exam. As an oral examiner, I repeatedly found that candidates who were not familiar with the test procedures did a poorer job than those who knew what was expected of them. Not all candidates have access to sample tests or practice materials, so if any aspect of the speaking test is unfamiliar to candidates, they are likely not to perform as well as they otherwise would.

Third, administration of the test is another threat to reliability. The greater the differences between one administration of a test and another, the greater the differences one can expect between a candidate's performance on the two occasions. It is hard to ensure uniformity in conducting the speaking test in China: it is almost impossible for every testing center to adhere strictly to the timing and to provide a quiet setting with no distracting sounds or movements.

Fourth, the scoring of the speaking test is largely subjective, and some oral examiners whose scoring deviates markedly from the norm during the training session are still used, owing to the shortage of English teachers in some areas of China.

Fifth, in order to reduce young learners' anxiety and deliver the test to children in an enjoyable and non-threatening way, only one examiner is used in the interview. However, it is difficult for one oral examiner to conduct the test and keep track of the candidate's performance at the same time. In such subjective testing, having the candidate's performance scored by only one rater is itself a threat to reliability.

Finally, some candidates have to wait longer than others to be interviewed. Fatigue can become an issue for the last students of the day and threaten the reliability of the assessment.

IV. Scoring Method and Score Reporting

The Movers speaking test is criterion-referenced, "which compares learner's performance, not to other learners, but to a set of criteria of expected performance or learning targets. Criterion-referenced assessment can match the child's performance against an expected response on an item, or it may make use of a set of descriptors along a scale on which a learner is placed" (Cameron, 2001). In the Movers speaking test, examples of an expected response and a descriptive scale are given with respect to the candidates' speaking skills. The criterion used to assess the child's speaking skills is the production of answers in single words or short phrases, rated on three aspects: 1) interactive listening ability; 2) pronunciation; 3) production of words and phrases. Each criterion carries a maximum mark of 3 (UCLES, 2003a; see Appendix).

A strong motivational characteristic of this test is that there is no pass or fail; it is designed to test what candidates know rather than what they do not know. Speaking is scored locally by the oral examiner. All candidates are given a certificate from Cambridge University to reward their efforts and abilities. Candidates taking the Movers speaking test receive certificates showing an array of shields (1-5): the best candidates receive five shields for their speaking proficiency, and the lowest-scoring examinees receive one shield.
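To make the mark scheme just described concrete, the following sketch is a purely hypothetical illustration in Python: the class, field, and function names are my own, and I assume each criterion is marked from 0 to 3. It simply records the three criterion marks awarded to a Movers speaking candidate and totals them into a raw speaking mark; how UCLES converts raw marks into the shield bands is not published, so no such conversion is attempted.

```python
# Hypothetical sketch of the Movers speaking mark scheme described above: three
# criteria (interactive listening ability, pronunciation, and production of
# words and phrases), each carrying a maximum mark of 3 (assumed minimum: 0).
# The names are illustrative only; UCLES does not publish how raw marks map to
# the 1-5 shield bands, so that step is deliberately omitted.

from dataclasses import dataclass

MAX_MARK_PER_CRITERION = 3

@dataclass
class MoversSpeakingMarks:
    interactive_listening: int
    pronunciation: int
    production_of_words_and_phrases: int

    def raw_total(self) -> int:
        """Sum of the three criterion marks (maximum 3 x 3 = 9)."""
        marks = (self.interactive_listening,
                 self.pronunciation,
                 self.production_of_words_and_phrases)
        if any(not 0 <= m <= MAX_MARK_PER_CRITERION for m in marks):
            raise ValueError("each criterion mark must be between 0 and 3")
        return sum(marks)

# Example: a candidate awarded 3, 2, and 3 has a raw speaking mark of 8 out of 9.
candidate = MoversSpeakingMarks(interactive_listening=3,
                                pronunciation=2,
                                production_of_words_and_phrases=3)
print(candidate.raw_total())  # prints 8
```

A per-task extension of such a record is essentially what the score-reporting suggestion made later in this critique calls for: a breakdown of where a candidate did well or poorly, rather than a single shield count.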
Although the Handbook and Research Notes have understandably de-emphasized numerical scores at this early stage of a young learner's career, MacGregor (2001: 5) points out that "one obvious weakness in this score reporting system is that no indication is given of what these shield scores mean, and therefore the scores cannot be translated into descriptions of what the examinee is and is not able to do". It is therefore difficult for parents, teachers, researchers, and oral examiners to understand how the raw scores are translated into shield bands each year. This is a serious problem for candidates who are almost ready to move up to the Flyers level, and even worse for students at the Flyers level who are ready to move on to the Main Suite examinations. According to Jones (2002), UCLES has now recognized the need to investigate the relationships between the YLE levels, and between the YLE and the Main Suite, because of increasing demand from users to interpret results within such wider frameworks.

One suggestion I offer is that a score report describing the individual student's strengths and weaknesses across the different task types in the speaking test would greatly help students, parents, and teachers evaluate the English teaching and learning process and its results. The speaking test would become a powerful instructional tool in EFL contexts if such a report of individual student performance were provided. The score report would also serve one of the stated purposes of the test: to promote and encourage effective learning and teaching.

V. Impact of the Test

Given their stated purposes, the Cambridge Young Learners English Tests are thought to be far from "high-stakes" for any individual child's scholastic career (Bailey, 2005). Although the Movers speaking test can give young learners, their teachers, and their parents some measure of how well spoken English is being acquired, the result does not bear heavily on any important decisions to be made about the child.

The washback that young learners receive from taking the Movers speaking test does much to promote their learning and use of English as a foreign language. It is a test of real language skills: language use is tested in meaningful, realistic, and interesting ways, using materials especially suited to children aged between 8 and 11. The test has a positive impact on children's spoken English, as they experience real communication in a foreign language. The tests are fun, and children enjoy taking them. In this sense, the test does promote effective oral English learning.

The test also has positive instructional washback. For example, in China, UCLES has been working closely with the Sino-British Center of the China National Education Examinations Authority (NEEA) to promote and support the Cambridge Young Learners English Tests.
The impact of the test on instruction is evidenced not only by the teacher training, the network of English training and testing, and the national conferences and seminars on educational measurement for Cambridge Young Learners managed by UCLES and the Sino-British Center, but also by the large number of instructional materials linked to the test content, which "span the instructional spectrum from multi-unit formal classroom programs to cheerful puzzle books for independent study, and these materials appear to focus on language instruction rather than rehearsal of test taking skills" (Bailey, 2005).

One positive consequence of using the test is that it does promote English teaching and learning in an EFL context. Both EFL teachers and students can benefit greatly from the principles of Cambridge Young Learners English: student-centered, activity-centered, listening and speaking before writing and reading, interest first, communicative teaching and learning, and motivational testing. These are qualities that formal EFL classrooms usually lack. In this case, the purpose of the test is consistent with the instructional goals of EFL teachers and curricula.

There are also potentially negative consequences of using this test for society and the educational system. For example, with the good reputation of the Cambridge tests and support from the national education authorities in China, the test has given rise to an integrated system of after-school English classrooms ("second English classrooms"). This means that Chinese young learners have to take Cambridge English classes and tests either on weekends or during their holidays, which places some pressure on both parents and students. In addition, the cost of Cambridge Young Learners English training and testing is high for some Chinese families. Parents have to sacrifice a great deal to pay for the test, and some children cannot afford this opportunity at all; society should look for ways to give these children opportunities in this area. Moreover, another unfairness appears in the "English starting point" when young learners move on to middle school. Children with many years of Cambridge English training and testing easily outscore those who have had no English learning experience, or only limited formal English learning, in elementary school. In this sense, Cambridge English may influence some children's confidence and even their success in school life. Finally, while the test encourages communicative language teaching, not all teachers are given adequate guidance and training, so there is a real concern that not all students involved in Cambridge English are being given excellent teaching and fair testing.

In conclusion, the Cambridge Young Learners English (YLE) Tests "offer one of few options for comprehensively assessing EFL abilities in young students" (Bailey, 2005: 251). Indeed, the Movers speaking test has filled a gap in assessing young learners' oral English proficiency in some EFL countries, such as China. It has significant strengths in encouraging meaningful language use, giving young learners a positive impression of international tests, and promoting effective learning and teaching in EFL contexts.
Although it is an excellent assessment for young EFL learners, several areas for improvement have been raised in this paper: improving the story-telling task by keeping a balance between supporting weaker candidates and rewarding strong performance; improving oral examiner training for greater objectivity of assessment; revising score reporting so that it reflects the individual student's strengths and weaknesses in oral English; and revising the speaking test procedures for greater ease and accuracy of administration. With these improvements, and as the test comes into wider use, I believe that the reliability and validity of the Movers speaking test in Cambridge Young Learners English can be greatly enhanced.

References

Bailey, A. (2005). Test review: Cambridge Young Learners English (YLE) Tests. Language Testing, 22(2), 242-252.

Cameron, L. (2001). Teaching Languages to Young Learners. Cambridge: Cambridge University Press.

Hughes, A. (2004). Testing for Language Teachers. Cambridge: Cambridge University Press.

Jones, N. (2002). Linking YLE levels into a single framework. Research Notes, 10. Cambridge: UCLES. Available at http://www.cambridgeesol.org/rs_notes/index.cfm (February 2005).

MacGregor, L. (2001). Testing young learners with CYLE: the new kids on the block. JALT Testing and Evaluation SIG Newsletter, 5, 4-6.

UCLES (2003a). Cambridge Young Learners English Tests Handbook: Starters, Movers and Flyers. Cambridge: UCLES. Available at http://www.cambridgeeslo.org/support/dloads/yle/yle_hb_03.pdf (February 2005).