Running head: EXPLORING THE WRITING SECTION OF THE TOEFL IBT TEST
Exploring the Writing Section of the TOEFL iBT Test:
Analysis of Tasks and Scoring Process
Augar M. Khoshaba
The Monterey Institute of International Studies
September 30, 2013
Exploring the Writing Section of the TOEFL iBT Test
When I applied for a scholarship to pursue my Master’s degree in the United States, the
sponsoring committee required me to provide proof of eligibility, including a graduation
transcript and a medical report. Additionally, they asked for what seemed to concern them the
most: a report of my English language skills. In response, I prepared my documents and
registered for the only available language test back then: the paper version of the Test of English
as a Foreign Language (TOEFL PBT). My success in this test was my passport to pass different
stages of evaluation and, eventually, fly to the United States.
While I thought that my experience with standardized tests had ended with the TOEFL
PBT, a second round of testing started as I arrived in the U.S. This time, the Admission Office at
my school requested that I take the newest and most communicative version of the TOEFL
series: TOEFL iBT. According to ETS (2008a), this Internet-based test enables students to “get
into more than 6,000 universities worldwide,” and proves that they can “communicate effectively
in an academic environment.” Although I achieved the target score in the second attempt, my
sub-score in the writing section did not meet my program’s requirements. I had to take the test
two more times until I received a better score. This struggle with the writing section of the
TOEFL iBT made me question the integrity of its scoring system and whether the tasks actually
resembled those in college courses. Hence, I decided to review the writing portion of the TOEFL
iBT test to seek answers to my questions. This paper is divided into four sections: history of the
TOEFL, description of the writing section, scoring system, and analysis.
History of the TOEFL Test
The increasing population of non-native English-speaking students in the United States in the late 1950s and
early 1960s created an urgent need for a language test that accommodated their academic needs.
As a result, the National Council on the Testing of English as a Foreign Language was
established in 1961, which launched its first TOEFL test in 1964. Nine years later, in 1973, the
Educational Testing Service (ETS) became the main operator of the TOEFL test. This
organization changed the format of the first TOEFL in 1976, from the five multiple-choice
sections into three new parts: reading comprehension and vocabulary; listening comprehension;
and structure and written expression (ETS, 2007).
Despite the fact that the first two components of the early TOEFL offered direct measures
of test takers’ English proficiency, the third section was heavily criticized by English teachers
and score users alike. They argued that using discrete-point tests of English structure and written
expression does not assess examinees’ writing skills since they do not produce any written
responses. Thus, they requested a more appropriate measure that requires test takers to produce
academic essays equivalent to those they encounter in college classes (Chapelle et al., 2009).
Accordingly, the Test of Written English (TWE) was introduced in July 1986. This test included
one writing task, in which examinees had 30 minutes to write an essay describing a chart, making
a comparison, or expressing an opinion. It was scored on a 6-point scale and was first offered separately
from the TOEFL, but was soon integrated into it (Greenberg, 1986).
Although many people were satisfied with this stronger measure of writing, a number of them
questioned its validity since it used a scale different from that of the TOEFL. These concerns,
along with a growing interest in applying the theory of communicative competence to language
testing, led to the development of a more communicative version of the test. In 1998, ETS
introduced the computer-based version of the test: TOEFL cBT. In addition to introducing new
item formats and visual aids in the listening and reading sections, this computerized version
added a more communicative writing task alongside the structure component. Nevertheless, this
test was criticized for not being completely communicative. As a result, a comprehensive and
more integrative version was designed in 2005: the “next generation” TOEFL iBT (ETS, 2005).
This Internet-based TOEFL consists of four sections: reading, listening, speaking, and
writing. The total score of the test is 120, with 30 points per section. The most salient feature of the
TOEFL iBT is the addition of the speaking section, which measures examinees’ abilities to
“communicate in English in an academic setting” (Sharpe, 2010). The test lasts for four hours
with a ten-minute break given in the middle, between the listening and speaking sections. It is
offered more than 50 times per year in 110 countries and has so far been taken by 27 million
examinees worldwide (ETS, 2008a, 2013b).
TOEFL iBT is a norm-referenced test, meaning that it is used to “spread students out in
percentile terms for proficiency or placement testing purposes” (Brown, 2005, p.76). Recently,
ETS (2013a) has published data from January 2012 – December 2012 representing the means
and standard deviations of examinees’ scores based on their gender or country (Appendix A). In
terms of test registration, test takers need to fill out an online registration form on the ETS website
(Appendix B) and pay a fee of $160 to $250, depending on the country or testing center. The
software used to operate the test is straightforward and uses clear written and audio instructions,
which facilitate its use even by first-time test takers. I remember having total control in the test
even though I was not quite familiar with the program. In the following two sections, I will offer
a general description of the writing portion in the TOEFL iBT and its scoring system.
Description of the Writing Section of the TOEFL iBT
The writing portion of the TOEFL iBT is the last part of the test after the speaking
section. It is a direct measure of examinees’ ability to write integrative and opinion essays
similar to those they produce in college. Unlike the TWE or the TOEFL cBT, this recent version
includes two essay writing tasks.
The Integrated Essay
In this task, students respond with an essay after they read a passage and listen to a
lecture discussing the same topic (Appendix C). The goal of the task is to measure examinees’
ability to synthesize or connect ideas from the two sources. The time allocated for this task is
20 minutes, in which test takers are expected to write 150-225 words (Sharpe, 2010).
The Independent Essay
In this task, examinees read a prompt on the screen asking them to compose an essay
that reflects their opinion about common topics (Appendix D). They are expected to write
300-350 words and are given 30 minutes to finish the essay. In both tasks, a timer appears
on the screen to notify test takers about the time remaining to complete their essays.
Scoring System
ETS repeatedly stresses its use of a wide range of security measures to maintain integrity
in the scoring process. Most notable is the well-protected location where scoring takes place;
tests are not graded at the testing centers; rather, they are scored in centralized networks by
professional raters representing an array of different cultural backgrounds. As soon as the
students finish the test, their essays will be sent to the Online Scoring Network (OSN), where
each essay will be marked on a scale from 0-5 points by using a holistic scoring approach. As
Brown (2005) puts it, the holistic scoring model “uses a single general scale to give a single
global rating for each student’s language production” (p.54). Usually, two raters mark each task
according to the writing scoring rubric (Appendix E).
Occasionally, the scores assigned by the two raters might differ by 1 point, a situation
that requires a third rater to mark the task to determine the final score. If the three
scores are close to each other, the final score is the mean of all three scores. However, if
the three scores are still inconsistent, the mean of the two closest scores is the grade
assigned to the task (Sawaki et al., 2008). To calculate the total score for the writing section,
raters convert the average of the two task scores (integrated and independent) to a score on a scale
of 30 points (ETS, 2005). Appendix F provides a practical conversion chart.
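The adjudication rule described in this paragraph can be sketched in a few lines of Python. This is only an illustration, not ETS’s actual procedure. The function name final_task_score, the 1-point trigger for calling a third rater, and the close_range threshold for what counts as "close to each other" are all assumptions, since the sources do not define closeness numerically. The sketch also assumes that when no third rater is needed, the task score is the mean of the two ratings.

```python
from itertools import combinations
from statistics import mean


def final_task_score(first, second, third=None, trigger=1.0, close_range=1.0):
    """Combine rater scores (0-5 scale) for one writing task.

    trigger and close_range are hypothetical thresholds chosen for
    illustration; the source text does not give exact values.
    """
    # No discrepancy: assume the task score is the mean of the two ratings.
    if abs(first - second) < trigger:
        return mean([first, second])

    # Discrepancy: a third rating is required to settle the score.
    if third is None:
        raise ValueError("ratings differ; a third rating is needed")

    scores = [first, second, third]
    # All three ratings close together: use the mean of all three.
    if max(scores) - min(scores) <= close_range:
        return mean(scores)

    # Otherwise average the two ratings that are closest to each other.
    closest_pair = min(combinations(scores, 2), key=lambda p: abs(p[0] - p[1]))
    return mean(closest_pair)
```

For example, ratings of 2, 4, and 4 are not all within one point of each other, so the sketch averages the two closest ratings and returns 4.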
The holistic scoring model used in the writing measure has its strengths and weaknesses.
Bailey (1998) notes that the holistic approach is “fast” and results in a high level of rater
reliability. Moreover, it focuses on positive qualities of writers’ essays. On the other hand, “a
single score may mask differences across individual composition”; that is, two papers with a score
of “3” on the scale might exhibit different qualities (p.189). Regarding the TOEFL iBT test, ETS
often reports that the organization offers intensive training sessions for raters. Bailey (1998)
discusses a particular workshop where raters first review “benchmark papers”—samples that best
represent each point on the scale—to familiarize themselves with the scale. Then, by following
the same scale, they read and mark similar papers and discuss their scores with their peers
(p.189). This process is intended to create a unified, more accurate measure.
Fifteen business days after the test date, examinees receive their score reports via email.
The scores are listed in a table divided into four sections: reading, listening, speaking, and
writing (Appendix G). In the writing section, the level of performance is indicated by four
categories: weak (0-1.0), limited (1.0-2.0), fair (2.5-3.5), and good (4.0-5.0). Next to each level,
there is a short description of the examinee’s general performance with bullet points explaining
particular weaknesses (Appendix H). Moreover, the reports include small tables of score
interpretations on the back page. Hard copies of the reports are mailed to the examinees within a
week of their receiving the electronic versions.
Analysis
In analyzing the writing section of the TOEFL iBT, I adapted two test analysis
frameworks developed by Wesche (1983) and Swain (1984). Under each analysis, I will include
a summary table of each component or principle, followed by a detailed discussion of how these
principles are reflected in the writing tasks of the Internet-based test.
Wesche’s Framework
Wesche (1983) proposed a practical framework to analyze test structure. It includes four
key components: stimulus material, task posed to the learner, learner’s response, and scoring criteria.
Table 1 shows how the writing tasks in the TOEFL iBT correspond to Wesche’s framework.
Table 1. The writing tasks and Wesche’s test analysis framework
Wesche’s Test Analysis Components | Task 1 (Integrated Essay) | Task 2 (Independent Essay)
Stimulus material | Reading and listening passages | Prompts
Task posed to the learner | Understanding the passages; synthesizing | Relying on experience
Learner’s response | Constructive: essay writing | Constructive: essay writing
Scoring criteria | Scoring approach: holistic scoring on a scale from 0-5 points. Main focus: quality, completeness, and accurate content. | Scoring approach: holistic scoring on a scale from 0-5 points. Main focus: quality and development.
Stimulus material. The term stimulus material, as defined by Bailey (1998), refers
to “whatever linguistic or non-linguistic information presented to the learners to get them to
demonstrate the skills or knowledge we [teachers] want to assess” (p.13). In the integrated essay
task, the stimulus materials are the reading passage and the listening lecture, whereas in the
independent essay writing, it is simply the prompt that appears on the screen.
Task posed to the learner. This point refers to the mental processes that examinees
activate in order to understand the task and produce output (Bailey, 1998). In the integrated
essay, the task posed to the learner is understanding and making connections between the reading
and listening passages. In the independent essay task, examinees need to rely on their creativity
to relate the topic to their personal experiences.
Learner’s response. As the name suggests, this component deals with examinees’
output, which demonstrates their ability to perform the task assigned to them (Bailey, 1998). In both
tasks, test takers respond with essays of varying lengths, depending on the preferences indicated
in the directions or time limit.
Scoring criteria. As discussed earlier, examinees’ writings on the TOEFL iBT are
scored holistically on a 5-point scale, and every point is accompanied by a description of an
examinee’s performance at that particular level, as shown in Appendix H. The focus of grading
in the two tasks is slightly different; while the integrated task looks at connectedness of ideas, the
independent task focuses more on the development of the topic.
Swain’s Framework
Swain (1984) highlights four principles that test developers need to utilize when
designing sound communicative tests: start from somewhere, concentrate on content, bias for
best, and work for washback. Table 2 displays these principles and their application to the
writing tasks of the TOEFL iBT.
Table 2. Swain’s analysis of communicative tests
Swain’s Analysis Principles | Task 1 (Integrated Essay) | Task 2 (Independent Essay)
Start from somewhere | Communicative competence theory | Communicative competence theory
Concentrate on content | Academic content | Interactive content
Bias for best | Visual aids (picture), reading passage, clock, and note-taking | Prompt, clock, and note-taking
Work for washback | Teaching of writing and test preparation courses | Teaching of writing and test preparation courses
Start from somewhere. This principle indicates that tests should be built on a theoretical
foundation. The TOEFL iBT grew out of the need to apply the theory of
communicative competence to language testing. The integrated task in the writing measure
requires examinees to integrate three skills (reading, listening, and writing) in order to produce
essays similar to those they write in university courses.
Concentrate on content. This principle “refers to both the content of the material used
as the basis of communicative language activities and the tasks used to elicit communicative
language behavior” (Swain, 1984, p.190). Since the writing section examines test takers’ writing
skills in a school environment, the content of the TOEFL iBT writing tasks is academic. For
instance, a test taker would compose a comprehensive essay after reading and listening to
passages on language acquisition. This example echoes Brown’s (2007) idea of language
contextuality, which is essential in promoting communicative competence.
Swain (1984) categorizes the content of large-scale communicative tests into four types:
motivating, substantive, integrated, and interactive. She defines the last as “the provision of
content that includes opinions or controversial ideas” (p.194). This type of content appears in the
independent task of the test, in which examinees express their attitudes toward general topics.
Bias for best. This principle focuses on test developers’ efforts to maximize examinees’
opportunities for successful performance (Swain, 1984). ETS has invested a great deal of energy
in designing user-friendly software that meets test takers’ visual and auditory preferences. In the
integrated writing task, a picture of a professor appears on the screen when examinees listen to a
lecture in order to simulate a real lecture environment. Additionally, test takers are allowed to
take notes during the task and, above all, the reading passage re-appears along with a timer when
they start writing. On a broader level, ETS has published several editions of TOEFL iBT
preparation textbooks and online test samples which students can use in their daily practice.
Work for washback. The last category in Swain’s framework is the influence of a test
on language teaching, or more precisely, on “the curriculum that is related to” the test (Brown,
2005, p.242). The writing measure of the TOEFL iBT promotes positive washback because of its
communicative nature. Many ESL programs nowadays offer TOEFL iBT preparation courses to
help test takers achieve their target scores. As a former ESL student, I benefited greatly from the
preparatory course, especially from the timed-writing activities. This class was especially helpful
for one of the two students that I interviewed recently about the writing test. He said that the
course had even helped him improve his typing skills. He talked about his frustration during his first
test, when he could not complete his essay because he was a “very slow” typist.
Reliability and Validity
In addition to using Swain’s and Wesche’s frameworks, it is equally important to examine
the quality of a test in terms of its reliability and validity. Test reliability refers to the
consistency of ratings in a given test (Brown, 2005). For the TOEFL iBT, reliability is
reported as a coefficient between 0 and 1; the closer the value is to 1, the more reliable the test is.
In 2011, ETS published operational data that indicate the reliability levels of different sections
(Appendix I). On a scale of 30 points, the reliability of the writing section was 0.74, with a
standard error of measurement of 2.76 points. Though this value is the lowest among the
sections, it is still considered to be high since it is only 0.26 below 1.0.
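The two figures reported for the writing section can be related through the standard classical test theory formula, SEM = SD * sqrt(1 - reliability). The sketch below is only an illustration under that assumption: it back-calculates the score standard deviation implied by the reported values and builds a rough band of one SEM around an observed score. The function names and the example score of 22 are hypothetical and do not come from the ETS report.

```python
import math


def implied_sd(sem, reliability):
    """Standard deviation implied by SEM = SD * sqrt(1 - reliability)."""
    return sem / math.sqrt(1.0 - reliability)


def score_band(observed, sem, low=0.0, high=30.0):
    """Observed score plus or minus one SEM, clipped to the 0-30 section scale."""
    return max(low, observed - sem), min(high, observed + sem)


# Values reported for the writing section in the text above.
WRITING_RELIABILITY = 0.74
WRITING_SEM = 2.76

sd = implied_sd(WRITING_SEM, WRITING_RELIABILITY)
band = score_band(22, WRITING_SEM)  # 22 is a hypothetical observed score
print(f"implied SD: {sd:.2f}")
print(f"one-SEM band around 22: {band[0]:.2f} to {band[1]:.2f}")
```

Under these assumptions, the implied standard deviation is about 5.4 points, and an observed writing score of 22 would correspond to a band of roughly 19 to 25 on the 30-point scale.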
ETS and other sources describe the organization’s efforts to maintain high reliability
by minimizing the impact of interrater issues. On the Online Scoring Network,
raters are supervised by their leaders through a “toll-free phone arrangement” during the scoring
session. Next, specialists at ETS examine the quality of ratings, and finally, ETS reviews
supervisors’ reports on raters’ performance to further ensure consistency (Chapelle et al., 2009,
p.266).
However, reliability alone does not tell us enough about the quality of a test. A test’s
validity should also be investigated. The validity of the TOEFL iBT could be
measured on the basis of numerous propositions, such as those presented in Appendix J. In this
review, however, I focus only on the first proposition in the list, which concerns the extent to which
the writing tasks measure what they are supposed to measure; in other words, how closely the
writing tasks of the TOEFL iBT resemble those in college courses. Cumming et al. (2004)
addressed this question in a study on the authenticity of communicative tasks in the
TOEFL iBT. They interviewed seven highly experienced ESL teachers in the U.S. and Canada
about whether the tasks of the then-new writing section actually resembled those in college
classes. In response, the teachers had their students take prototype TOEFL tasks. The results
indicated that 70% of students’ performances were similar to their performances when writing
in English in class (Cumming et al., 2004).
As a test taker, I agree that most of the writings that I produce in graduate school require
me to state my opinion and synthesize ideas from different sources. Similarly, my interviewees
expressed their overall satisfaction with the nature of the tasks. Their main problem, besides one
of them being a slow typist, was writing under time pressure, which they gradually overcame
with the help of test preparation courses.
Final Thoughts
This review of the writing section of the TOEFL iBT test served as a rich resource on
what goes into designing a high-stakes communicative test and how it is done. In applying Wesche’s
(1983) and Swain’s (1984) analytical frameworks, I discovered that the writing portion of the
TOEFL iBT test exhibits features of a valid test. It measures test takers’ abilities to write essays
similar to those they compose at the university, by providing academic content supported with
auditory and visual aids to maximize the quality of their performances. Additionally, the writing
section appears to be reliable in that it manifests stability in the scoring process. One of the
interesting facts about the ETS scoring system is that more than one rater scores a single task to
ensure consistency. This complicated process made me think that my low score in the writing
section was more likely due to anxiety than to interrater problems.
Another feature that makes the TOEFL iBT a good example of an effective
communicative test is its positive influence on language teaching. Many language institutes offer
test preparation classes to equip international students, especially those with limited computer
skills, with the necessary tips and practice to achieve their target score. Lastly, as a result of this
review, and particularly the long history of the TOEFL test, I came to realize that tests can never
be perfect; they change over time to reflect current pedagogical practices. As a language
teacher, I will always bear this idea in mind because a test designed for a specific group in a
particular context might not work well for another population in different settings.
References
Bailey, K. M. (1998). Learning about language assessment: Dilemmas, decisions and directions.
Boston, MA: Heinle & Heinle.
Brown, J. D. (2005). Testing in language programs: A comprehensive guide to English language
assessment (New ed.). Upper Saddle River, NJ: Prentice Hall Regents.
Brown, H. D. (2007). Teaching by principles: An interactive approach to language pedagogy.
White Plains, NY: Pearson Education.
Chapelle, C. A., Enright, M. K., & Jamieson, J. M. (2009). Building a validity argument for the
Test of English as a Foreign Language. New York, NY: Routledge.
Cumming, A., Grant, L., & Mulcahy-Ernt, P. (2004). A teacher-verification study of speaking
and writing prototype tasks for a new TOEFL. Language Testing, 21(1), 107-145.
Educational Testing Service. (2005). How to prepare for the next generation TOEFL test and
communicate with confidence. Retrieved September 28, 2013, from
http://www.transint.boun.edu.tr/toefl/belgeler/tips.pdf
Educational Testing Service. (2007). TOEFL computer-based and paper-based tests. Retrieved
September 28, 2013, from http://www.ets.org/Media/Research/pdf/TOEFL-SUM-0506CBT.pdf
Educational Testing Service. (2008a). TOEFL iBT at a glance. Retrieved September 28, 2013,
from http://www.ets.org/Media/Tests/TOEFL/pdf/TOEFL_at_a_Glance.pdf
Educational Testing Service. (2008b). Validity evidence supporting the interpretation and use of
TOEFL iBT scores. Retrieved September 28, 2013, from
http://www.ets.org/s/toefl/pdf/toefl_ibt_insight_s1v4.pdf
Educational Testing Service. (2011). Reliability and comparability of TOEFL iBT scores.
Retrieved September 28, 2013, from
http://www.ets.org/s/toefl/pdf/toefl_ibt_research_s1v3.pdf
Educational Testing Service. (2013a). Test and score data summary for TOEFL iBT tests and
TOEFL PBT tests. Retrieved September 28, 2013, from
http://www.ets.org/s/toefl/pdf/94227_unlweb.pdf
Educational Testing Service. (2013b). About the TOEFL iBT test. Retrieved September 28,
2013, from http://www.ets.org/toefl/ibt/about
Educational Testing Service. (2013c). TOEFL iBT test scores. Retrieved September 28, 2013, from
http://www.ets.org/toefl/ibt/scores/
Greenberg, K. L. (1986). The development and validation of the TOEFL writing test: A
discussion of TOEFL research reports 15 and 19. TESOL Quarterly.
Sawaki, Y., Stricker, L., & Oranje, A. (2008). Factor structure of the TOEFL internet-based test
(iBT): Exploration in a field trial sample. Educational Testing Service.
Sharpe, P. J. (2010). TOEFL iBT (13th ed.). Hauppauge, NY: Barron's.
Swain, M. (1984). Large-scale communicative language testing: A case study. In S. J. Savignon,
& M. Berns (Eds.), Initiatives in communicative language teaching (pp. 185-201).
Reading, MA: Addison-Wesley.