Running head: EXPLORING THE WRITING SECTION OF THE TOEFL IBT TEST

Exploring the Writing Section of the TOEFL iBT Test: Analysis of Tasks and Scoring Process
Augar M. Khoshaba
The Monterey Institute of International Studies
September 30, 2013

Exploring the Writing Section of the TOEFL iBT Test

When I applied for a scholarship to pursue my Master’s degree in the United States, the sponsoring committee required me to provide proof of eligibility, including a graduation transcript and a medical report. Additionally, they asked for what seemed to concern them the most: a report of my English language skills. In response, I prepared my documents and registered for the only language test available at the time: the paper-based version of the Test of English as a Foreign Language (TOEFL PBT). My success on this test was my passport through the different stages of evaluation and, eventually, to the United States.

While I thought that my experience with standardized tests had ended with the TOEFL PBT, a second round of testing started when I arrived in the U.S. This time, the Admission Office at my school requested that I take the newest and most communicative version of the TOEFL series: the TOEFL iBT. According to ETS (2008a), this Internet-based test enables students to “get into more than 6,000 universities worldwide,” and proves that they can “communicate effectively in an academic environment.” Although I achieved the target score on my second attempt, my sub-score in the writing section did not meet my program’s requirements. I had to take the test two more times before I received a better score. This struggle with the writing section of the TOEFL iBT made me question the integrity of its scoring system and whether the tasks actually resembled those in college courses. Hence, I decided to review the writing portion of the TOEFL iBT test to seek answers to my questions.
This paper is divided into four sections: history of the TOEFL, description of the writing section, scoring system, and analysis.

History of the TOEFL Test

The growing population of non-native English-speaking students in the United States in the late 1950s and early 1960s created an urgent need for a language test that accommodated their academic needs. As a result, the National Council on the Testing of English as a Foreign Language was established in 1961 and launched its first TOEFL test in 1964. Nine years later, in 1973, the Educational Testing Service (ETS) became the main operator of the TOEFL test. In 1976, this organization changed the format of the first TOEFL from five multiple-choice sections into three new parts: reading comprehension and vocabulary; listening comprehension; and structure and written expression (ETS, 2007).

Although the first two components of the early TOEFL offered direct measures of test takers’ English proficiency, the third section was heavily criticized by English teachers and score users alike. They argued that discrete-point tests of English structure and written expression do not assess examinees’ writing skills, since examinees do not produce any written responses. Thus, they requested a more appropriate measure that would require test takers to produce academic essays equivalent to those they encounter in college classes (Chapelle et al., 2009).

Accordingly, the Test of Written English (TWE) was introduced in July 1986. This test included one writing task, in which examinees had 30 minutes to write an essay describing a chart, making a comparison, or expressing an opinion. It was scored on a 6-point scale and was first offered separately from the TOEFL, but was soon integrated into it (Greenberg, 1986). Although people were satisfied with this stronger measure of writing, many questioned its validity because it used a scale different from that of the TOEFL.
These concerns, along with a growing interest in applying the theory of communicative competence to language testing, led to the development of a more communicative version of the test. In 1998, ETS introduced the computer-based version of the test: the TOEFL cBT. In addition to bringing new item formats and visual aids to the listening and reading sections, this computerized version added a more communicative writing task alongside the structure component. Nevertheless, this test was criticized for not being completely communicative. As a result, a comprehensive and more integrative version was designed in 2005: the “next generation” TOEFL iBT (ETS, 2005).

This Internet-based TOEFL consists of four sections: reading, listening, speaking, and writing. The total score of the test is 120, with 30 points per section. The most salient feature of the TOEFL iBT is the addition of the speaking section, which measures examinees’ abilities to “communicate in English in an academic setting” (Sharpe, 2010). The test lasts for four hours, with a ten-minute break given in the middle, between the listening and speaking sections. It is offered more than 50 times per year in 110 countries and has so far been taken by 27 million examinees worldwide (ETS, 2008a, 2013b).

The TOEFL iBT is a norm-referenced test, meaning that it is used to “spread students out in percentile terms for proficiency or placement testing purposes” (Brown, 2005, p. 76). ETS (2013a) recently published data covering January to December 2012 that report the means and standard deviations of examinees’ scores by gender and country (Appendix A). To register, test takers need to fill out an online registration form on the ETS website (Appendix B) and pay a fee of $160 to $250, depending on the country or testing center.
The software used to operate the test is straightforward and uses clear written and audio instructions, which make it easy to use even for first-time test takers. I remember having total control during the test even though I was not very familiar with the program. In the following two sections, I will offer a general description of the writing portion of the TOEFL iBT and its scoring system.

Description of the Writing Section of the TOEFL iBT

The writing portion of the TOEFL iBT is the last part of the test, following the speaking section. It is a direct measure of examinees’ ability to write integrative and opinion essays similar to those they produce in college. Unlike the TWE or the TOEFL cBT, this recent version includes two essay-writing tasks.

The Integrated Essay

In this task, students respond with an essay after they read a passage and listen to a lecture on the same topic (Appendix C). The goal of the task is to measure examinees’ ability to synthesize or connect ideas from the two sources. The time allocated for this task is 20 minutes, in which test takers are expected to write 150-225 words (Sharpe, 2010).

The Independent Essay

In this task, examinees read a prompt on the screen asking them to compose an essay that reflects their opinion on a common topic (Appendix D). They are expected to write 300-350 words and are therefore given 30 minutes to finish the essay. In both tasks, a timer on the screen shows test takers how much time remains to complete their essays.

Scoring System

ETS repeatedly stresses its use of a wide range of security measures to maintain integrity in the scoring process. Most notable is the well-protected location where scoring takes place: tests are not graded at the testing centers but are scored through a centralized network by professional raters representing an array of cultural backgrounds.
As soon as the students finish the test, their essays are sent to the Online Scoring Network (OSN), where each essay is marked on a scale from 0-5 points using a holistic scoring approach. As Brown (2005) puts it, the holistic scoring model “uses a single general scale to give a single global rating for each student’s language production” (p. 54). Usually, two raters mark each task according to the writing scoring rubric (Appendix E). Occasionally, the scores assigned by the two raters might differ by 1 point, a situation that requires a third rater to mark the task to determine the final score. If the three scores are close to one another, the final score is the mean of all three. However, if the three scores are still inconsistent, the mean of the two closest scores is assigned to the task (Sawaki et al., 2008). To calculate the total score for the writing section, the average of the two task scores (integrated and independent) is converted to a score on a 30-point scale (ETS, 2005). Appendix F provides a practical conversion chart.

The holistic scoring model used in the writing measure has its strengths and weaknesses. Bailey (1998) notes that the holistic approach is “fast” and results in a high level of rater reliability. Moreover, it focuses on the positive qualities of writers’ essays. On the other hand, “A single score may mask differences across individual composition”; that is, two papers with a score of “3” on the scale might exhibit different qualities (p. 189).

Regarding the TOEFL iBT test, ETS often reports that the organization offers intensive training sessions for raters. Bailey (1998) discusses a particular workshop where raters first review “benchmark papers” (samples that best represent each point on the scale) to familiarize themselves with the scale.
Then, by following the same scale, they read and mark similar papers and discuss their scores with their peers (p. 189). This process is intended to create a unified, more accurate measure.

Fifteen business days after the test date, examinees receive their score reports via email. The scores are listed in a table divided into four sections: reading, listening, speaking, and writing (Appendix G). In the writing section, the level of performance is indicated by four categories: weak (0-1.0), limited (1.0-2.0), fair (2.5-3.5), and good (4.0-5.0). Next to each level, there is a short description of the examinee’s general performance, with bullet points explaining particular weaknesses (Appendix H). Moreover, the reports include small tables of score interpretations on the back page. Hard copies of the reports are mailed to the examinees within a week of their receiving the electronic versions.

Analysis

In analyzing the writing section of the TOEFL iBT, I adapted two test analysis frameworks developed by Wesche (1983) and Swain (1984). Under each analysis, I will include a summary table of each component or principle, followed by a detailed discussion of how these principles are reflected in the writing tasks of the Internet-based test.

Wesche’s Framework

Wesche (1983) proposed a practical framework to analyze test structure. It includes four key components: stimulus material, task posed to the learner, learner’s response, and scoring criteria. Table 1 shows how the writing tasks in the TOEFL iBT correspond to Wesche’s framework.

Table 1. The writing tasks and Wesche’s test analysis framework

Wesche’s Test Analysis Components | Task 1 (Integrated Essay) | Task 2 (Independent Essay)
Stimulus material | Reading and listening passages. | Prompts.
Task posed to the learner | Understanding the passages. Synthesizing. | Relying on experience.
Learner’s response | Constructive: essay writing. | Constructive: essay writing.
Scoring criteria | Scoring approach: holistic scoring on a scale from 0-5 points. Main focus: quality, completeness, and accurate content. | Scoring approach: holistic scoring on a scale from 0-5 points. Main focus: quality and development.

Stimulus material. The stimulus material, as defined by Bailey (1998), refers to “whatever linguistic or non-linguistic information presented to the learners to get them to demonstrate the skills or knowledge we [teachers] want to assess” (p. 13). In the integrated essay task, the stimulus materials are the reading passage and the lecture, whereas in the independent essay task, the stimulus is simply the prompt that appears on the screen.

Task posed to the learner. This point refers to the mental processes that examinees activate in order to understand the task and produce output (Bailey, 1998). In the integrated essay, the task posed to the learner is understanding and making connections between the reading passage and the lecture. In the independent essay task, examinees need to rely on their creativity to relate the topic to their personal experiences.

Learner’s response. As the name suggests, this component deals with examinees’ output, which demonstrates their ability to perform the task assigned to them (Bailey, 1998). In both tasks, test takers respond with essays of varying lengths, depending on the length indicated in the directions and on the time limit.

Scoring criteria. As discussed earlier, examinees’ writing on the TOEFL iBT is scored holistically on a scale from 0-5 points, and every point is accompanied by a description of an examinee’s performance at that particular level (Appendix H). The focus of grading in the two tasks is slightly different: while the integrated task looks at the connectedness of ideas, the independent task focuses more on the development of the topic.
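The adjudication procedure described under Scoring System can be sketched in a few lines of Python. This is only an illustration of the steps as this paper summarizes them, not ETS’s actual implementation: the function names, the discrepancy threshold of 1 point, the reading of “close to each other” as a spread no larger than that threshold, and the averaging of two agreeing ratings are all assumptions made for the example. The conversion to the 30-point scale follows the chart in Appendix F and is deliberately left out.

```python
from itertools import combinations
from statistics import mean


def final_task_score(rater1, rater2, rater3=None, discrepancy=1.0):
    """Return one task's final score on the 0-5 holistic scale.

    Hypothetical helper: `discrepancy` is the gap between the first two
    ratings that triggers a third rating (assumed to be 1 point here).
    """
    if abs(rater1 - rater2) < discrepancy:
        # The two ratings agree, so their mean is used (assumption).
        return mean([rater1, rater2])
    if rater3 is None:
        raise ValueError("ratings disagree; a third rating is required")
    scores = [rater1, rater2, rater3]
    if max(scores) - min(scores) <= discrepancy:
        # All three ratings are close to one another: mean of all three.
        return mean(scores)
    # Otherwise, use the mean of the two closest ratings.
    closest = min(combinations(scores, 2), key=lambda p: abs(p[0] - p[1]))
    return mean(closest)


def writing_raw_average(integrated, independent):
    """Average the two task scores before conversion to the 30-point scale."""
    return mean([integrated, independent])
```

For example, under these assumptions, ratings of 2, 5, and 3 are too far apart to average together, so the two closest ratings (2 and 3) give a task score of 2.5.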
Swain’s Framework

Swain (1984) highlights four principles that test developers need to follow when designing sound communicative tests: start from somewhere, concentrate on content, bias for best, and work for washback. Table 2 displays these principles and their application to the writing tasks of the TOEFL iBT.

Table 2. Swain’s analysis of communicative tests

Swain’s Analysis Principles | Task 1 (Integrated Essay) | Task 2 (Independent Essay)
Start from somewhere | Communicative competence theory | Communicative competence theory
Concentrate on content | Academic content | Interactive content
Bias for best | Visual aids (picture), reading passage, clock, and note-taking | Prompt, clock, and note-taking
Work for washback | Teaching of writing and test preparation courses | Teaching of writing and test preparation courses

Start from somewhere. This principle indicates that tests should be built on a theoretical foundation. The TOEFL iBT grew out of the need to apply the theory of communicative competence to language testing. The integrated task in the writing measure requires examinees to integrate three skills (reading, listening, and writing) in order to produce essays similar to those they write in university courses.

Concentrate on content. This principle “refers to both the content of the material used as the basis of communicative language activities and the tasks used to elicit communicative language behavior” (Swain, 1984, p. 190). Since the writing section examines test takers’ writing skills in a school environment, the content of the TOEFL iBT writing tasks is academic. For instance, a test taker would compose a comprehensive essay after reading a passage and listening to a lecture on language acquisition. This example echoes Brown’s (2007) idea of language contextuality, which is essential in promoting communicative competence.
Swain (1984) categorizes the content of large-scale communicative tests into four types: motivating, substantive, integrated, and interactive. She defines the last as “the provision of content that includes opinions or controversial ideas” (p. 194). This type of content appears in the independent task of the test, in which examinees express their attitudes toward general topics.

Bias for best. This principle focuses on test developers’ efforts to maximize examinees’ opportunities for successful performance (Swain, 1984). ETS has invested a great deal of energy in designing user-friendly software that meets test takers’ visual and auditory preferences. In the integrated writing task, a picture of a professor appears on the screen while examinees listen to the lecture in order to simulate a real lecture environment. Additionally, test takers are allowed to take notes during the task and, above all, the reading passage reappears along with a timer when they start writing. On a broader level, ETS has published several editions of TOEFL iBT preparation textbooks and online test samples which students can use in their daily practice.

Work for washback. The last category in Swain’s framework is the influence of a test on language teaching or, more precisely, on “the curriculum that is related to” the test (Brown, 2005, p. 242). The writing measure of the TOEFL iBT promotes positive washback because of its communicative nature. Many ESL programs nowadays offer TOEFL iBT preparation courses to help test takers achieve their target scores. As a former ESL student, I benefited greatly from the preparatory course, especially from the timed-writing activities. This class was especially helpful for one of the two students whom I interviewed recently about the writing test. He said that the course had even helped him improve his typing skills.
He talked about his frustration during his first test, when he could not complete his essay because he was a “very slow” typist.

Reliability and Validity

In addition to using Swain’s and Wesche’s frameworks, it is equally important to examine the quality of a test in terms of its reliability and validity. Test reliability refers to the consistency of ratings in a given test (Brown, 2005). In the TOEFL iBT, reliability is reported as a coefficient between 0 and 1; the closer the value is to 1, the more reliable the test is. In 2011, ETS published operational data that indicate the reliability levels of the different sections (Appendix I). On a scale of 30 points, the reliability of the writing section was 0.74, with a standard error of measurement of 2.76. Though this value is the lowest among the sections, it is still considered to be high since it is only 0.26 below 1.0.

ETS and other sources describe the organization’s efforts to maintain high reliability by minimizing the impact of interrater issues. On the organization’s Online Scoring Network, raters are supervised by their leaders through a “toll-free phone arrangement” during the scoring session. Next, specialists at ETS examine the quality of ratings, and finally, ETS reviews supervisors’ reports on raters’ performance to further ensure consistency (Chapelle et al., 2009, p. 266).

However, reliability alone does not tell us enough about the quality of a test. A test’s validity should also be investigated. The validity of the TOEFL iBT could be evaluated on the basis of numerous propositions, such as those presented in Appendix J. In this review, however, I focus only on the first aspect in the list: the extent to which the writing tasks measure what they are supposed to measure. In other words, I ask how closely the writing tasks of the TOEFL iBT resemble those in college courses.
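As a brief aside on the reliability figures reported earlier in this section, the sketch below shows how a reliability coefficient and a standard error of measurement (SEM) relate under classical test theory, where SEM = SD × sqrt(1 − reliability). The function names are hypothetical, and the standard deviation it derives is only what the reported values of 0.74 and 2.76 would imply under that assumption; it is not a figure taken from ETS.

```python
import math


def implied_sd(reliability, sem):
    """Score standard deviation implied by SEM = SD * sqrt(1 - reliability)."""
    return sem / math.sqrt(1.0 - reliability)


def score_band(observed, sem, low=0.0, high=30.0):
    """Range of one SEM on either side of an observed score, kept within the scale."""
    return (max(low, observed - sem), min(high, observed + sem))


# Values reported for the writing section: reliability 0.74, SEM 2.76.
writing_sd = implied_sd(0.74, 2.76)   # roughly 5.4 points on the 30-point scale
band = score_band(22, 2.76)           # roughly 19.2 to 24.8
```

Read this way, an observed writing score of 22 is best understood as a range of roughly 19 to 25 rather than as a single exact value.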
To examine how closely the tasks resemble college writing, Cumming et al. (2004) conducted a study on the authenticity of communicative tasks in the TOEFL iBT. They interviewed seven highly experienced ESL teachers in the U.S. and Canada about whether the tasks of the then-new writing section actually resembled those in college classes. In response, the teachers had their students take prototype TOEFL tasks. The results indicated that 70% of students’ performances were similar to their performances when they write English in class (Cumming et al., 2004). As a test taker, I agree that most of the writing that I produce in graduate school requires me to state my opinion and synthesize ideas from different sources. Similarly, my interviewees expressed their overall satisfaction with the nature of the tasks. Their main problem, besides one of them being a slow typist, was writing under time pressure, which they gradually overcame with the help of test preparation courses.

Final Thoughts

This review of the writing section of the TOEFL iBT test gave me a rich understanding of what goes into designing a high-stakes communicative test. In applying Wesche’s (1983) and Swain’s (1984) analytical frameworks, I discovered that the writing portion of the TOEFL iBT test exhibits features of a valid test. It measures test takers’ abilities to write essays similar to those they compose at the university by providing academic content supported with auditory and visual aids to maximize the quality of their performances. Additionally, the writing section appears to be reliable in that it shows stability in the scoring process. One of the interesting facts about the ETS scoring system is that more than one rater scores a single task to ensure consistency. This elaborate process made me think that my low score in the writing section was more likely due to anxiety than to interrater problems.
Another feature that makes the TOEFL iBT a good candidate for an effective communicative test is its positive influence on language teaching. Many language institutes offer test preparation classes to equip international students, especially those with limited computer skills, with the tips and practice they need to achieve their target scores. Lastly, as a result of this review, and particularly of the long history of the TOEFL test, I came to realize that tests can never be perfect; they change over time to keep pace with current pedagogical practices. As a language teacher, I will always bear this idea in mind because a test designed for a specific group in a particular context might not work well for another population in a different setting.

References

Bailey, K. M. (1998). Learning about language assessment: Dilemmas, decisions and directions. Boston, MA: Heinle & Heinle.
Brown, H. D. (2007). Teaching by principles: An interactive approach to language pedagogy. White Plains, NY: Pearson Education.
Brown, J. D. (2005). Testing in language programs: A comprehensive guide to English language assessment (New ed.). Upper Saddle River, NJ: Prentice Hall Regents.
Chapelle, C. A., Enright, M. K., & Jamieson, J. M. (2009). Building a validity argument for the Test of English as a Foreign Language. New York, NY: Routledge.
Cumming, A., Grant, L., & Mulcahy-Ernt, P. (2004). A teacher-verification study of speaking and writing prototype tasks for a new TOEFL. Language Testing, 21(1), 107-145.
Educational Testing Service. (2005). How to prepare for the next generation TOEFL test and communicate with confidence. Retrieved September 28, 2013, from http://www.transint.boun.edu.tr/toefl/belgeler/tips.pdf
Educational Testing Service. (2007). TOEFL computer-based and paper-based tests. Retrieved September 28, 2013, from http://www.ets.org/Media/Research/pdf/TOEFL-SUM-0506CBT.pdf
Educational Testing Service. (2008a). TOEFL iBT at a glance.
Retrieved September 28, 2013, from http://www.ets.org/Media/Tests/TOEFL/pdf/TOEFL_at_a_Glance.pdf
Educational Testing Service. (2008b). Validity evidence supporting the interpretation and use of TOEFL iBT scores. Retrieved September 28, 2013, from http://www.ets.org/s/toefl/pdf/toefl_ibt_insight_s1v4.pdf
Educational Testing Service. (2011). Reliability and comparability of TOEFL iBT scores. Retrieved September 28, 2013, from http://www.ets.org/s/toefl/pdf/toefl_ibt_research_s1v3.pdf
Educational Testing Service. (2013a). Test and score data summary for TOEFL iBT tests and TOEFL PBT tests. Retrieved September 28, 2013, from http://www.ets.org/s/toefl/pdf/94227_unlweb.pdf
Educational Testing Service. (2013b). About the TOEFL iBT test. Retrieved September 28, 2013, from http://www.ets.org/toefl/ibt/about
Educational Testing Service. (2013c). TOEFL iBT test scores. Retrieved September 28, 2013, from http://www.ets.org/toefl/ibt/scores/
Greenberg, K. L. (1986). The development and validation of the TOEFL writing test: A discussion of TOEFL research reports 15 and 19. TESOL Quarterly.
Sawaki, Y., Stricker, L., & Oranje, A. (2008). Factor structure of the TOEFL internet-based test (iBT): Exploration in a field trial sample. Educational Testing Service.
Sharpe, P. J. (2010). TOEFL iBT (13th ed.). Hauppauge, NY: Barron's.
Swain, M. (1984). Large-scale communicative language testing: A case study. In S. J. Savignon, & M. Berns (Eds.), Initiatives in communicative language teaching (pp. 185-201). Reading, MA: Addison-Wesley.