Concurrent Validity of TOEIC and CSEPT: A Case Study on Lower

Concurrent Validity of TOEIC and CSEPT: A Partial-range Case Study

劉冠麟, Lecturer, College of General Education, Shu-Te University

Abstract

The literature in measurement indicates a conventional five-step method for correlating different assessment results: equating, calibrating, statistical moderation, benchmarking, and social moderation (Angoff, 1971; North, 2000; Council of Europe, 2003). This study examined the concurrent validity of TOEIC and CSEPT using a condensed two-step approach. The first step compared the “can-do” lists of the two tests; the second was a statistical analysis of 35 sets of scores from students who took both TOEIC and CSEPT within a span of 5 months. The Pearson product-moment correlation between the two tests was computed to examine how the scores relate. A regression model was also produced to predict the correlation pattern within and beyond the range of the collected data. The results computed from the model were compared with the scores currently stated by each test administrator in relation to the CEFR. Suggestions and adjustments are proposed based on the findings.

Keywords: concurrent validity, TOEIC, CSEPT, proficiency test, language assessment

1. Introduction

Various English proficiency tests exist to serve the needs of test takers from all walks of life, each with the features it was designed to target. In Taiwan, policy holds that citizens should possess adequate English proficiency in order to play their role as members of the global village. English learners and educational authorities in Taiwan, where many kinds of tests are available, therefore depend on scores from these validated tests to determine whether someone has achieved a certain level of English. When different tests are used for the same purpose, a rigorously established correlation among them is needed.
The comparison of different test scores traces back to the development of standardized tests, when different forms of the same test needed “equating” to achieve reliability (Angoff, 1971). The idea later expanded to establishing correlations among different tests by comparing correlation coefficients computed from data gathered from the same group of test takers taking different tests, i.e. “concurrent validity” (Bachman, 1990).

In Taiwan, several English proficiency tests are available. TOEFL (Test of English as a Foreign Language) and TOEIC (Test of English for International Communication) were introduced to Taiwan by ETS (Educational Testing Service) for university admission services and business workforce training. IELTS (International English Language Testing System) was introduced later for admission to Commonwealth universities. KET (Cambridge Key English Test), one of the main suite tests, has also been adopted by many preschool and children’s English educators. In addition to these tests, the GEPT (General English Proficiency Test), developed locally by LTTC (Language Training and Testing Center, affiliated with National Taiwan University), has been widely recognized as a valid English proficiency test by Taiwanese governmental and educational institutions. Many collegiate institutions in Taiwan use CSEPT (College Student English Proficiency Test) to assess students’ English proficiency.

Because these tests, each developed by a specific test administrator, vary in design and structure, clear-cut equivalences are difficult to draw, and stating exact correlated scores between any two tests without robust validation would be too abrupt and rough. As for studies of correlation between different tests, the author has so far located only a series of comparison studies between UCLES and ETS tests conducted by the University of Cambridge Local Examinations Syndicate (UCLES), such as the comparison of FCE and TOEFL (Bachman, Davidson, Ryan, and Choi, 1995).
Among the six tests mentioned, the pairs TOEFL and TOEIC, IELTS and KET, and GEPT and CSEPT each come from the same test developer (ETS or LTTC) or from the same testing system (Cambridge ESOL, English for Speakers of Other Languages, Examinations). Correlations within these pairs would attract fewer disputes because each pair comes from the same test administrator. This study examines two tests from different test administrators: TOEIC from ETS and CSEPT from LTTC. Both are used and recognized in Taiwanese universities as valid tests of students’ English proficiency. The correlation between the two could serve as a reference for students who hold a score on one test and need a quick estimate of their likely score on the other.

2. Literature review

Before discussing the correlation of the two tests, some concepts about correlating different test results should be made clear.

2.1 Concurrent validity

The tests we use nowadays are validated under two theories: Classical Test Theory (CTT) and Item Response Theory (IRT). The former is the conventional approach, likened to a “hand tool” (Davidson, 2000), which examines the effectiveness of test items by calculating Item Facility and the Discrimination Index. The latter works as a “power tool” that takes probability, test-taker ability, and item difficulty into consideration.

Concurrent validity concerns whether two measures, tests, or scales are correlated. Davies (1990) proposed that concurrent validity is achieved by administering the test and the criterion at the same time. The concept is usually demonstrated by using a previously validated test to validate a second test. For example, psychologists use it to establish the validity of a newly developed intelligence test.
It is also used by social science researchers to validate cross-lingual research instruments. Hughes (2003) stated that there are essentially two kinds of criterion-related validity: concurrent validity and predictive validity. Concurrent validity is established when the test and the criterion are administered at about the same time. Nall (2003) explained that “concurrent and predictive validity” are both forms of criterion-related validity. Concurrent validity is determined by comparing results from one test format with those of another instrument that is assumed to be testing “the same thing”, that is, to be held in reference to the same language construct. This is typically accomplished by examining the correlation between the tests’ results, looking for a high positive correlation coefficient. ETS’s own TOEIC documentation, under a section labeled “Construct-related validity”, provides a definition of validity that does not differ from our definition of concurrent validity. It would seem that ETS accepts concurrent validity as sufficient proof of construct validity. Therefore, we can use the scores obtained by the same subjects on the two tests, TOEIC and CSEPT, to establish their concurrent validity. If the scores are highly correlated, we can say that the two tests have high concurrent validity.

2.2 Correlation, comparability, and equivalent scores

2.2.1 CEFR descriptors

The Council of Europe (2001) developed a “Framework of Reference” (CEFR) describing how language users at different proficiency levels vary in performance. The reference levels classify users as proficient, independent, or basic, with two subcategories in each classification. The language performance that reflects each level of proficiency is set out in the descriptors for that level (see table 1). Language proficiency tests thus base their test items on these “can-do” descriptors in order to share a common reference for language proficiency.
The reference levels are recognized within European countries, and more and more test administrators and educational authorities outside Europe are adopting the reference table for testing or educational purposes. In Taiwan, it has become a reference tool for correlating, comparing, and calculating equivalent scores across different language proficiency tests.

Table 1: Global scale of the Common Reference Levels

Proficient user
C2: Can understand with ease virtually everything heard or read. Can summarize information from different spoken and written sources, reconstructing arguments and accounts in a coherent presentation. Can express him/herself spontaneously, very fluently and precisely, differentiating finer shades of meaning even in more complex situations.
C1: Can understand a wide range of demanding, longer texts, and recognize implicit meaning. Can express him/herself fluently and spontaneously without much obvious searching for expressions. Can use language flexibly and effectively for social, academic and professional purposes. Can produce clear, well-structured, detailed text on complex subjects, showing controlled use of organizational patterns, connectors and cohesive devices.

Independent user
B2: Can understand the main ideas of complex text on both concrete and abstract topics, including technical discussions in his/her field of specialization. Can interact with a degree of fluency and spontaneity that makes regular interaction with native speakers quite possible without strain for either party. Can produce clear, detailed text on a wide range of subjects and explain a viewpoint on a topical issue giving the advantages and disadvantages of various options.
B1: Can understand the main points of clear standard input on familiar matters regularly encountered in work, school, leisure, etc. Can deal with most situations likely to arise whilst traveling in an area where the language is spoken. Can produce simple connected text on topics which are familiar or of personal interest. Can describe experiences and events, dreams, hopes and ambitions and briefly give reasons and explanations for opinions and plans.

Basic user
A2: Can understand sentences and frequently used expressions related to areas of most immediate relevance (e.g. very basic personal and family information, shopping, local geography, employment). Can communicate in simple and routine tasks requiring a simple and direct exchange of information on familiar and routine matters. Can describe in simple terms aspects of his/her background, immediate environment and matters in areas of immediate need.
A1: Can understand and use familiar everyday expressions and very basic phrases aimed at the satisfaction of needs of a concrete type. Can introduce him/herself and others and can ask and answer questions about personal details such as where he/she lives, people he/she knows and things he/she has. Can interact in a simple way provided the other person talks slowly and clearly and is prepared to help.

Source: Council of Europe

2.2.2 CEF correlation process

North (2000) and the Council of Europe (2001) stated that:

“The scales for the Common Reference Levels are intended to facilitate the description of the level of proficiency attained in existing qualifications, and so aid comparison between systems. The measurement literature recognizes five classic ways of linking separate assessments: (1) equating; (2) calibrating; (3) statistical moderation; (4) benchmarking, and (5) social moderation.”

This corresponds to the measurement literature, which identifies the equating and calibrating of two forms of the same test as the initial development in test score comparison.
(Angoff, 1971). North continues:

“The first three methods are traditional: (1) producing alternative versions of the same test (equating), (2) linking the results from different tests to a common scale (calibrating), and (3) correcting for the difficulty of test papers or the severity of examiners (statistical moderation). The last two methods involve building up a common understanding through discussion (social moderation) and the comparison of work samples in relation to standardized definitions and examples (benchmarking). Supporting this process of building a common understanding is one of the aims of the Framework. This is the reason why the scales of descriptors to be used for this purpose have been standardized with a rigorous development methodology. In education this approach is increasingly described as standards-oriented assessment. It is generally acknowledged that the development of a standards-oriented approach takes time, as partners acquire a feel for the meaning of the standards through the process of exemplification and exchange of opinions.”

The use of expert judgment in social moderation and benchmarking can also be traced back to early developments in educational measurement. Thorndike (1982) proposed a test-equating method that uses experts’ judgments to estimate item difficulties and to equate tests, which Rogosa (1982) described as novel and potentially important. The expert judgment method has indeed proved important, as shown by developments in recent years and by the issues discussed in this paper. Furthermore, Kolen and Brennan (2004) proposed the term “linking” for the concept of putting scores from two or more tests on the same scale. If the tests conform to the same content and statistical specifications, then we are entitled to call the resulting linking an equating.
2.3 Tests in consideration

2.3.1 TOEIC

The official TOEIC can-do guide (ETS, 2004) introduces TOEIC (Test of English for International Communication) as follows:

“TOEIC measures the listening and reading comprehension skills of non-native speakers of English. The TOEIC test is designed for use by organizations working in an international market where English is the primary language of communication. These organizations use TOEIC scores to make employment decisions about selection, assignment to overseas posts, promotion, training needs, and training effectiveness.”

The TOEIC test consists of 200 multiple-choice questions: 100 listening comprehension questions and 100 reading comprehension questions. The listening comprehension section is administered by audiotape; the reading comprehension section is administered in a standard paper-and-pencil format. The answers from both sections are recorded on a scannable answer sheet. Examinees receive two subscores, one each for listening comprehension and reading comprehension, along with a total score (listening comprehension plus reading comprehension). Each standardized subscore ranges from 5 to 495, giving a total score range of 10 to 990.

2.3.2 CSEPT

CSEPT (College Student English Proficiency Test) is developed by LTTC (Language Training and Testing Center), which is affiliated with National Taiwan University. It is used only for college students in Taiwan and is not open to the general public, unlike LTTC’s other test, the GEPT (General English Proficiency Test). It is conducted only at the request of individual university administrations, on a specific date arranged jointly by LTTC and the university, to test students’ English proficiency for curriculum or teaching-effectiveness purposes. The test comprises two sections, listening and reading. There are 25 multiple-choice questions in the listening section and 60 multiple-choice questions in the reading section.
Test takers record their responses on an optically scannable sheet for computer scoring. Score reports give separate scores for listening and reading, each out of 120, for a maximum total of 240. More information about CSEPT can be found at http://www.lttc.ntu.edu.tw.

3. Methodology

The five steps proposed in the literature (North, 2000) are regarded as the “standard” or “legitimate” method for correlating, or establishing the relationship between, two tests. However, the five steps are best suited to large-scale projects and to test administrators themselves. This paper limits the process to two techniques. The first compares the available can-do descriptions stated by the TOEIC and CSEPT administrators. The second is a statistical analysis that calculates the Pearson correlation between the collected scores. A regression model is also computed to predict correlated scores for higher performance.

3.1 Data collection

The data discussed in this study come from two sources. The TOEIC can-do descriptors are from studies published by ETS (www.ets.org). The CSEPT can-do list is extracted and translated from LTTC score reports, since no other can-do descriptors for CSEPT could be located.

For the statistical data, the university where the data were collected (a technological university in southern Taiwan) upholds a policy of an “exit threshold for English proficiency”. In accordance with Ministry of Education policy, students have to achieve “CEF A2 English proficiency” before graduation; otherwise, after two official test attempts, they may opt for a makeup course, for which they must submit their score reports in order to enroll. A total of 35 sets of scores were included in the analysis. Because the scores were submitted for this purpose, all of them are below CSEPT 170 and TOEIC 400 (the correlation currently adopted by the university).
Both tests report scores for listening, reading, and the total. Each pair of scores can be examined through the Pearson correlation, and a regression model can be computed to predict correlated scores on the two tests for higher performance.

4. Findings and discussion

The two steps taken to establish the correlation of TOEIC and CSEPT are described as follows.

4.1 Can-do list comparison

Detailed descriptors are stated for listening, speaking, reading, writing, and interacting skills. In the TOEIC can-do list, listening and reading scores are classified into bands of 5-100, 105-225, 230-350, 355-425, and 430-495, while CSEPT divides listening and reading performance into bands of below 29, 30-49, 50-69, 70-89, 90-109, and 110 and higher. Here the 105-225 band of TOEIC and the 70-89 band of CSEPT are considered. The TOEIC can-do list is more detailed, with “can do”, “can do with difficulty”, and “cannot do” categories for the “reading”, “listening” and “interactive” descriptions. For the 105-225 bands in both reading and listening, there is no clear “can-do” description, only “can do with difficulty” descriptions. We would therefore say that language users have reached this level of ability if they can perform easier, adjusted versions of the tasks described in the TOEIC can-do list.
(See tables 2 and 3.)

Table 2: TOEIC Reading Score of 105 - 225
Categories: Can Do, Can Do with Difficulty, Cannot Do

Reading
• read, on storefronts, the type of store or services provided (e.g., “dry cleaning,” “book store”)
• read and understand a restaurant menu
• read and understand a train or bus schedule
• find information that I need in a telephone directory
• read office memoranda written to me in which the writer has used simple words or sentences
• read and understand traffic signs
• read and understand simple, step-by-step instructions
• read and understand a travel brochure
• read and understand directions and explanations presented in computer manuals written for beginning users
• read and understand a letter of thanks from a client or customer
• read and understand an agenda for a meeting
• read and understand magazine articles like those found in Time or Newsweek, without using a dictionary
• read highly technical material in my field or area of expertise with no use or only infrequent use of a dictionary
• identify inconsistencies or differences in points of view in two newspaper interviews with politicians of opposing parties
• read and understand a popular novel

Writing
• write a list for items to take on a weekend trip
• write a one- or two-sentence thank-you note for a gift a friend sent to me
• write a brief note to a co-worker explaining why I will not be able to attend the scheduled meeting
• write a postcard to a friend describing what I have been doing on my vacation
• write clear directions on how to get to my house or apartment
• fill out an application form for a class at night school
• write a letter requesting information about hotel accommodations for a future vacation
• write a short note to a co-worker describing how to operate a standard piece of office equipment (e.g., photocopier, fax machine)
• write a memorandum to my supervisor explaining why I need time off from work
• write a letter introducing myself and describing my qualifications to accompany an employment application
• write a memorandum to my supervisor describing the progress being made on a current project or assignment
• write a complaint to a store manager about my dissatisfaction with an appliance I recently purchased
• write a letter to a potential client describing the services and/or products of my company
• write a 5-page formal report on a project in which I participated
• write a memorandum summarizing the main points of a meeting I recently attended

The descriptions in the listening part show that tasks in this band are quite fundamental to daily life: for example, understanding “How are you?”, understanding prices quoted when shopping, and ordering food in restaurants.

Table 3: TOEIC Listening Score of 105 - 225
Categories: Can Do, Can Do with Difficulty, Cannot Do

Listening
• understand simple questions in social situations such as “How are you?” “Where do you live?” and “How do you feel?”
• understand a salesperson when she or he tells me prices of various items
• understand someone speaking slowly and deliberately, who is giving me directions on how to walk to a nearby location
• understand a person’s name when she or he gives it to me over the telephone
• understand directions about what time to come to a meeting and the room in which it will be held
• understand explanations about how to perform a routine task related to my job
• understand a co-worker discussing a simple problem that arose at work
• understand announcements at a railway station indicating the track my train is on and the time it is scheduled to leave
• understand headline news broadcasts on the radio
• understand a client’s request made on the telephone for one of my company’s major products or services
• understand play-by-play descriptions on the radio of sports events that I like (e.g., soccer, baseball)
• understand an explanation given over the radio of why a road has been temporarily closed
• understand someone who is speaking slowly and deliberately about his or her hobbies, interests, and plans for the weekend
• understand a discussion of current events taking place among a group of persons speaking English
• understand an explanation of why one restaurant is better than another

Speaking
• introduce myself in social situations and use appropriate greeting and leave-taking expressions
• state simple biographical information about myself (e.g., place of birth, composition of family)
• order food at a restaurant
• describe my daily routine (e.g., when I get up, what time I eat lunch)
• describe the plot of a movie or television program that I have seen
• describe a friend in detail, including physical and personality characteristics
• describe my academic training or my present job responsibilities in detail
• talk about topics of general interest (e.g., current events, the weather)
• talk about my future professional goals and intentions (e.g., what I plan to be doing next year)
• tell a co-worker how to perform a routine job task
• telephone the airline to change my flight reservations to a different time and day
• tell a colleague at work about a humorous event that recently happened to me
• adjust my speaking to address a variety of listeners (e.g., professional staff, a friend, children)
• tell someone directions on how to get to my house or apartment
• give a prepared half-hour formal presentation on a topic of interest

However, the interactive descriptions in this 105-225 band seem more difficult than the corresponding listening and reading descriptions: for example, “explaining policies”, “discussing the best way to do a task”, and “discussing a house purchase with a real-estate agent” (see table 4). These descriptions seem too difficult for English learners at this beginning level to perform, if the level is defined as being around CEF A2.
If the descriptions in the interactive part involve only listening comprehension, the author finds it reasonable to place them at this level. Even so, some of these tasks would be too complicated for English learners at this level.

Table 4: TOEIC Interactive Score of 105 - 225

Interacting (Can Do)
• explain written company policies to a new employee
• discuss with a co-worker the best way to accomplish a job task
• meet with a doctor and explain the physical symptoms of my illness
• meet with a real-estate agent to discuss the type of house I would like to buy
• discuss world events with an English-speaking guest
• discuss with my boss ways to improve customer service or product quality
• conduct an interview with an applicant for a job in my area of expertise
• conduct simple business transactions at places such as the post office, bank, drugstore
• telephone a restaurant to make dinner reservations for a party of three
• give and take messages over the telephone
• discuss with an electronics salesperson the features I want on a new videocassette recorder (VCR)
• explain to a repairman what is wrong with an appliance that I want fixed
• request information over the telephone (e.g., check airline schedules with a travel agent)
• talk to an elementary school class about what I do for a living
• telephone a department store and find out if a certain item is currently in stock

The CSEPT can-do descriptions are shorter than those in the TOEIC can-do list. The main gist of the can-do list in the 70-89 band also focuses on tasks in daily life. Comparing the content of the TOEIC and CSEPT can-do lists, we find that most descriptions match in the listening and reading parts but not in the interactive part (see table 5).

Table 5: CSEPT can-do statement (band 70 ~ 89)

Listening
Can understand basic conversations in school and daily contexts. Speakers usually do not need to slow down, but repetition and/or explanation is still needed. Can grasp the main ideas and some of the important details.
(能聽懂與學校及日常生活相關之基礎會話，說話者通常無須放慢速度，但仍須重覆或解釋；能掌握大意及部分之重要細部資訊。)

Composite
Can understand the grammatical structure of basic sentences, with occasional errors when using simple sentences. Still has difficulty handling complex sentences. Can read material written with a vocabulary of about 3,000 to 4,000 words and related phrases. Can broadly use word structure and context to infer the meaning of words and sentences.
(能瞭解基本句子之語法結構，使用簡單句時偶有錯誤；對複雜句之掌握仍有困難。能閱讀約 3,000~4,000 字彙及相關片語之讀物，能大致利用字詞結構及上下文推測字詞意義或句子內容。)

Note: The English can-do statements are translated by the author, with the original Chinese in brackets.

Comparing the descriptors in the TOEIC can-do list with the CSEPT can-do list again, we find that the CSEPT descriptors match the TOEIC descriptors in this band. We can therefore say that TOEIC scores of 210 to 450 correspond to CSEPT scores of 140 to 180.

4.2 Statistical data analysis

Table 6 indicates that the listening scores for TOEIC and CSEPT are not correlated. The Pearson product-moment correlation between the TOEIC and CSEPT listening scores is .060 with P=.733 (>.05), which is not significant. This might be the result of having too few samples in this study: the sample is too small to smooth out the fluctuation in the scores.

Table 6: CSEPT and TOEIC Listening (N=35)
                                     CSEPT-LISTEN   TOEIC-LISTEN
CSEPT-LISTEN   Pearson Correlation   1.000          .060
               Sig. (2-tailed)       .              .733

Table 7 indicates that the reading scores for TOEIC and CSEPT are correlated, with a Pearson correlation of .342 (P=.045).

Table 7: CSEPT and TOEIC reading (N=35)
                                     CSEPT-READ     TOEIC-READ
CSEPT-READ     Pearson Correlation   1.000          .342 *
               Sig. (2-tailed)       .              .045
* Correlation is significant at the 0.05 level (2-tailed).
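The significance verdicts for the listening and reading coefficients can be cross-checked by hand. The sketch below is not part of the original analysis: it converts each reported r into a t statistic with N - 2 degrees of freedom and compares it with a critical value. The function name `t_statistic` and the constant `T_CRIT_33` are hypothetical, and the critical value is an approximate figure from a standard t table rather than something reported in the study.

```python
import math


def t_statistic(r: float, n: int) -> float:
    """t statistic for testing a zero correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


N = 35  # number of paired score sets in the study

# Assumption: approximate two-tailed 5% critical value of Student's t
# for 33 degrees of freedom, taken from a standard table.
T_CRIT_33 = 2.035

# Pearson coefficients as reported in Tables 6 and 7.
reported = {"listening": 0.060, "reading": 0.342}

for skill, r in reported.items():
    t = t_statistic(r, N)
    verdict = "significant" if abs(t) > T_CRIT_33 else "not significant"
    print(f"{skill}: r = {r:.3f}, t = {t:.3f}, {verdict} at the .05 level")
```

Under these assumptions the reading coefficient clears the threshold only narrowly, which agrees with its reported P of .045, while the listening coefficient falls far short of it.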
Table 8 indicates that the total scores for TOEIC and CSEPT are correlated, with a Pearson correlation of .393 (P=.020). The total score correlation is a stronger indication that TOEIC and CSEPT scores are correlated, so either test can be used to predict the likely score on the other.

Table 8: TOEIC and CSEPT correlation (N=35)
                                     CSEPT-TOTAL    TOEIC-TOTAL
CSEPT-TOTAL    Pearson Correlation   1.000          .393 *
               Sig. (2-tailed)       .              .020
* Correlation is significant at the 0.05 level (2-tailed).

The regression analysis agrees with the Pearson correlation test. The critical F value at the .05 significance level with 35 denominator degrees of freedom is 4.12, and the regression analysis reported an F value of 6.021. The model therefore stands as a valid one (see table 9).

Table 9: ANOVA analysis
Model 1        Sum of Squares   df   Mean Square   F       Sig.
Regression     7721.032         1    7721.032      6.021   .020
Residual       42318.968        33   1282.393
Total          50040.000        34
a. Predictors: (Constant), CTOTAL
b. Dependent Variable: TTOTAL

Table 10: Coefficients of TOEIC and CSEPT
Model 1        Unstandardized B   Std. Error   Standardized Beta   t       Sig.
(Constant)     177.694            24.521                           7.247   .000
CTOTAL         .563               .229         .393                2.454   .020
a. Dependent Variable: TTOTAL

Since the model has proved valid, we can use table 10 to arrive at a formula for predicting scores at higher performance. Table 10 gives a constant of 177.694 and an unstandardized coefficient B = 0.563. The model is as follows:

Y = 177.694 + 0.563X (where Y = TOEIC, X = CSEPT)

The currently adopted correlation is CSEPT 170 = CEF A2 and TOEIC 350 = CEF A2, so CSEPT 170 should be equal to TOEIC 350. However, the regression model based on the collected data gives a different result: when CSEPT is 170, the predicted TOEIC score is 273.404. The small sample may also have introduced a restricted-range problem. The correlation holds within the data range, i.e. below TOEIC 400 and CSEPT 170.
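The arithmetic behind the first model can be reproduced directly from the figures reported in Tables 9 and 10. The following is a minimal sketch rather than the original analysis: it recomputes the F ratio and the correlation from the reported sums of squares, then evaluates the fitted line at two CSEPT totals. The function name `predict_toeic` and the constant names are hypothetical.

```python
import math

# Unstandardized coefficients as reported in Table 10
# (Y = TOEIC total, X = CSEPT total).
INTERCEPT = 177.694
SLOPE = 0.563


def predict_toeic(csept_total: float) -> float:
    """Predicted TOEIC total for a given CSEPT total under the fitted line."""
    return INTERCEPT + SLOPE * csept_total


# Sums of squares and degrees of freedom as reported in Table 9.
SS_REGRESSION, DF_REGRESSION = 7721.032, 1
SS_RESIDUAL, DF_RESIDUAL = 42318.968, 33

# F is the ratio of the regression mean square to the residual mean square.
f_value = (SS_REGRESSION / DF_REGRESSION) / (SS_RESIDUAL / DF_RESIDUAL)

# R squared is the share of total variation explained by the model; with a
# single predictor and a positive slope, its square root is Pearson's r.
r_squared = SS_REGRESSION / (SS_REGRESSION + SS_RESIDUAL)
r = math.sqrt(r_squared)

print(f"F = {f_value:.3f}, R^2 = {r_squared:.3f}, r = {r:.3f}")
for csept in (130, 170):
    print(f"CSEPT {csept} -> predicted TOEIC {predict_toeic(csept):.3f}")
```

The recomputed values agree with the reported F of 6.021 and r of .393. The R squared of roughly 0.15 suggests that CSEPT totals account for only about 15% of the variation in TOEIC totals in this sample, so any single predicted score should be read as a rough central estimate.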
To see the correlation above this range, we would need data for the upper range, which were not available when the study was done. However, we can use reasonable hypothetical data to see how the correlation would be established. This study adds the same number of dummy samples with full marks on both tests, i.e. 35 sets of TOEIC 990 and CSEPT 240. The results are shown in table 11.

Table 11: ANOVA analysis
Model 1        Sum of Squares   df   Mean Square   F          Sig.
Regression     328914.980       1    328914.980    1062.618   .000
Residual       21048.220        68   309.533
Total          349963.200       69
a. Predictors: (Constant), TOEIC
b. Dependent Variable: CSEPT

The analysis indicates that the correlation is significant: the F value of 1062.618 is far above the critical value, which indicates a valid regression model.

Table 12: Coefficients of TOEIC and CSEPT
Model 1        Unstandardized B   Std. Error   Standardized Beta   t        Sig.
(Constant)     60.621             4.007                            15.130   .000
TOEIC          .181               .006         .969                32.598   .000
a. Dependent Variable: CSEPT

According to the data in table 12, the regression model is as follows:

Y = 60.621 + 0.181X (where Y = CSEPT, X = TOEIC)

The second model offers a totally different view of the correlation. Solving the model for X gives TOEIC = (CSEPT - 60.621) / 0.181. If we take the LTTC-stated CSEPT 170 as CEF A2, we get a TOEIC score of about 604 for CSEPT 170. Moving down to CEF A1 at CSEPT 130, we get a TOEIC score of about 383. These figures, in contrast to the first model, are higher than the currently stated TOEIC scores correlated to CSEPT. Again, more data are needed for further examination.

5. Conclusion

This paper investigates the concurrent validity of TOEIC and CSEPT using two condensed validation measures. First, the “can-do” list comparison indicated that TOEIC scores of 210 to 450 correspond to CSEPT scores of 140 to 180. In addition, the statistical analysis of the collected data shows the following results. The listening scores of TOEIC and CSEPT are not correlated.
The reading scores and the total scores of TOEIC and CSEPT are positively and significantly correlated. We can therefore say that TOEIC and CSEPT show statistically significant concurrent validity, and that a person’s TOEIC score can be used to estimate her or his performance on CSEPT, or vice versa.

The regression model based on the collected scores, all within the band below TOEIC 400 and CSEPT 170, reported correlations of “CSEPT 170 = TOEIC 273.404” and “CSEPT 130 = TOEIC 250.884”. These TOEIC scores are lower than the currently used “CSEPT 170 = TOEIC 350” correlation based on CEF, which suggests lowering the TOEIC score correlated to the CSEPT test. The second trial with hypothetical data, however, turned the correlation around. After adding an equal number of full-mark samples for both TOEIC and CSEPT, we obtained correlations of “CSEPT 170 = TOEIC 604.304” and “CSEPT 130 = TOEIC 383”. This result suggests, by contrast, raising the TOEIC score correlated to the CSEPT test.

These initial results leave a lot of room for improvement. First, the correlation is theoretically valid only within the range supported by data, i.e. under TOEIC 400 and CSEPT 170. Second, the correlation should be refined through better research design and data collection. Third, the number of samples should be increased for more accurate calculation. Nevertheless, the findings can serve as a quick reference for understanding how these two tests are used and correlated. It is hoped that the data can be enriched in the future to produce a more accurate correlation.

References:

Angoff, W. (1971). Scales, norms, and equivalent scores. In Thorndike, R. (Ed.) Educational measurement. Washington D. C.: American Council on Education.
Bachman, L. (1990). Fundamental Considerations in Language Testing. London: Oxford University Press.
Bachman, L., Davidson, F., Ryan, K., and Choi, I. (1995). An investigation into the comparability of two tests of English as a foreign language: the Cambridge-TOEFL comparability study. Cambridge: Cambridge University Press.
Council of Europe (2001). Common European Framework of Reference for Languages. Council of Europe/Cambridge University Press.
Davidson, F. (2000). The language tester’s statistical toolbox. System, 28, 605-617.
Davies, A. (1990). Principles of language testing. Oxford: Basil Blackwell.
ETS (2004). TOEIC Can-Do guide: linking TOEIC scores to activities performed using English. Chauncey Group International.
Hughes, A. (2003). Testing for language teachers. Cambridge: Cambridge University Press.
Kolen, M. & Brennan, R. (2004). Test equating, scaling, and linking: methods and practices. NY: Springer.
LTTC (2003). Concurrent validity of GEPT. Taipei: LTTC.
Nall, T. (2003). TOEIC: a discussion and analysis. Retrieved on 20 January 2006. From http://www.geocities.com.two centselfcafe/teach/toeic.htm.
North, B. (2000). Linking language assessments: an example in a low stakes context. System, 28, 555-577.
Rogosa, D. (1982). Discussion of “item and score conversion by pooled judgment”. In Holland, P. & Rubin, D. (Eds.) Test equating. (pp. 319-326) NY: Academic Press.
Thorndike, R. (1982). Item and score conversion by pooled judgment. In Holland, P. & Rubin, D. (Eds.) Test equating. (pp. 309-318) NY: Academic Press.
